The article examines the "benchmarking chaos" in current evaluation systems for large models, noting the widespread phenomenon of "everyone being number one" on the leaderboards. Open-source benchmark datasets encourage a "problem-solving" (teaching-to-the-test) mentality, since models can be tuned on the very questions they are later scored on, while fully closed proprietary datasets raise fairness concerns because outside parties cannot scrutinize them. In addition, some rankings lack scientific and comprehensive evaluation dimensions. The article proposes establishing an authoritative evaluation system: open-source the evaluation tools and processes to ensure fairness, while adopting an "open historical datasets + closed official test sets" model for the evaluation data itself. It further argues that commercializing large models matters far more than parameter counts or leaderboard rankings.
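To make the proposed "open tooling + open historical sets + closed official set" split concrete, below is a minimal Python sketch of how such an evaluation harness could be organized. It is an illustration under stated assumptions, not anything described in the article: all names (`Question`, `exact_match_score`, `run_official_round`, `HISTORICAL_2023`) are hypothetical. The scoring logic is public and reproducible, past question sets are released for development, and each round's official questions stay with the evaluator until the round closes.

```python
# Hypothetical sketch of an "open tooling, open historical sets, closed
# official set" evaluation harness; names are illustrative, not from the article.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    prompt: str
    answer: str


def exact_match_score(model_fn: Callable[[str], str],
                      questions: Sequence[Question]) -> float:
    """Open-source scoring logic: anyone can inspect and rerun it."""
    correct = sum(model_fn(q.prompt).strip() == q.answer for q in questions)
    return correct / len(questions)


# Historical benchmark rounds are published after each cycle, so teams can
# study them freely; over-fitting to them no longer affects the official score.
HISTORICAL_2023 = [
    Question("What is 2 + 2?", "4"),
    Question("Capital of France?", "Paris"),
]


def run_official_round(model_fn: Callable[[str], str],
                       private_questions: Sequence[Question]) -> float:
    """The current round's questions stay with the evaluator; only the
    aggregate score is released, then the set rotates into the public
    historical pool for the next cycle."""
    return exact_match_score(model_fn, private_questions)


if __name__ == "__main__":
    # A toy "model" used only to show how teams would self-evaluate on the
    # open historical set before submitting to the closed official round.
    toy_model = lambda prompt: {"What is 2 + 2?": "4"}.get(prompt, "unknown")
    print("dev score on historical set:",
          exact_match_score(toy_model, HISTORICAL_2023))
```

The key design choice this sketch illustrates is the rotation: every official test set eventually becomes a historical set, so openness and resistance to test-set gaming are balanced over time rather than traded off once.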