A streamlined and customizable framework for efficient large model evaluation and performance benchmarking