Recent research warns of risks in large-scale model benchmark evaluation. A study led by Renmin University of China and collaborators finds that benchmark test-set data can leak into pre-training corpora, inflating scores and creating hidden hazards when models are deployed. In controlled experiments, models whose pre-training data included a benchmark's test set scored higher on that benchmark but performed worse on other benchmarks, suggesting the gains reflect memorization of test items rather than genuine capability. To mitigate the issue, the authors recommend evaluating on multiple benchmarks and disclosing the provenance of test data. The work underscores the need for greater transparency and diversity in large-scale model benchmark evaluation and provides a useful reference for future studies.
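
The kind of contamination the study describes is often screened for with simple text-overlap heuristics. Below is a minimal illustrative sketch in Python that flags benchmark test items sharing long word n-grams with a pre-training corpus; the n-gram length, function names, and toy data are assumptions chosen for illustration, not the study's actual protocol.

```python
# Illustrative contamination screen: flag benchmark test examples whose
# word-level n-grams also appear in the pre-training corpus. The n-gram
# size (8 by default, 5 in the toy demo) is an assumption, not a value
# taken from the study.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(test_examples: list[str],
                       corpus_docs: list[str],
                       n: int = 8) -> float:
    """Fraction of test examples sharing at least one n-gram with the corpus."""
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    flagged = sum(1 for ex in test_examples if ngrams(ex, n) & corpus_grams)
    return flagged / len(test_examples) if test_examples else 0.0

if __name__ == "__main__":
    # Hypothetical corpus document that quotes a benchmark item verbatim.
    corpus = [
        "blog post quoting a benchmark item: what is the capital "
        "of france and when was it founded"
    ]
    tests = [
        "what is the capital of france and when was it founded",  # leaked
        "name the largest planet in the solar system",            # clean
    ]
    print(f"contaminated: {contamination_rate(tests, corpus, n=5):.0%}")  # 50%
```

A screen like this cannot prove a model memorized a test item, but a high overlap rate is exactly the signal that would motivate the study's recommendations: report where test data came from, and corroborate results across several benchmarks.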