During the 2024 college entrance examination season, nine large AI models took on an unusual challenge: sitting the national college entrance examination, specifically the demanding New Curriculum Standard Volume I as used in Henan. The test, organized by the media, assessed the models' academic abilities and offered a distinctive view of how AI and human intelligence differ.


Four of the nine AI models tested scored above the first-tier undergraduate line of the Henan college entrance examination. GPT-4o took the top spot with 562 points, 41 points above the first-tier line, and ByteDance's Doubao followed with 542.5 points, making it one of the top domestic models.

Image: robots taking the college entrance exam. Image source note: generated by AI; authorized service provider Midjourney.

The models performed very well in liberal arts subjects, especially Chinese and English, but noticeably worse in science subjects, particularly mathematics. Their advantage in language subjects is clear, and their comprehension of classical poetry was impressive.

The models handled simple reasoning questions adequately but did poorly on questions requiring complex derivations and proofs, which points to room for improvement in logical reasoning. In the comprehensive liberal arts section, scores were lowest in geography, while in the comprehensive science section, scores were relatively better in biology. GPT-4o stood out with a score of 91.5 in politics.

Testing Method and Scoring Criteria

Test Rounds: To reduce the impact of randomness, all subjects were tested twice, with the average score serving as the final result.

Input Format: Formulas were entered in Markdown/LaTeX format, and questions containing images were entered with the corresponding images and text, according to each model's image recognition capabilities.

Test Operation: Professional AI data service providers ran the tests and captured standardized screenshots of the process to keep the test fair.

Scoring Method: Answers were graded against the same scoring standards applied to human candidates to keep the scoring fair.
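The averaging and cutoff arithmetic described above can be sketched roughly as follows. This is only an illustration, not the organizers' actual tooling: the function names are hypothetical, the per-round totals are made-up placeholders (the article does not publish per-round scores), and the first-tier line of 521 is derived from the reported figures (562 minus 41) rather than quoted directly.

```python
from statistics import mean

# Derived from the article: GPT-4o scored 562, which is 41 points
# above the first-tier line, so the line works out to 521.
FIRST_TIER_LINE = 562 - 41


def final_score(round_scores):
    """Average the per-round totals into a final result (hypothetical helper)."""
    return mean(round_scores)


def margin_over_line(score, line=FIRST_TIER_LINE):
    """Points above (positive) or below (negative) the first-tier line."""
    return score - line


# Placeholder round totals for illustration only; not real test data.
example_rounds = [500.0, 510.0]
example_final = final_score(example_rounds)

# Final scores as reported in the article.
reported = {"GPT-4o": 562.0, "Doubao": 542.5}
margins = {name: margin_over_line(score) for name, score in reported.items()}

print(example_final)
print(margins)
```

With the reported final scores, the same subtraction gives Doubao's margin over the derived first-tier line, a figure the article does not state explicitly.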

Having AI sit the college entrance examination showcased its strengths in specific fields and exposed its weaknesses in logical reasoning and mathematical proofs. As one AI candidate quoted in its essay: "The journey is long and arduous, and I will seek knowledge both above and below." The line reflects the development of AI and also vividly describes humanity's continuous exploration of the unknown. The test gives a deeper understanding of AI's current intellectual level and offers valuable insights into the direction of its future development.

The candidates included well-known AI products such as OpenAI's GPT-4o, ByteDance's Doubao, and Baidu's Wenxin 4.0. Their results in this college entrance examination are expected to influence how AI technology develops.