April 3, 2025: According to the latest large language model math ability evaluation results released by MathArena, Google's Gemini-2.5-pro has taken a commanding lead, demonstrating remarkable performance on uncontaminated, high-difficulty math competitions.


Groundbreaking Achievement

Gemini-2.5-pro achieved a 24.40% accuracy rate in the rigorous evaluation on the MathArena platform. This result not only ranks first but also stands in stark contrast to DeepSeek-R1's 4.76%, a lead of roughly fivefold. This achievement signifies a qualitative leap in Gemini-2.5-pro's advanced mathematical reasoning capabilities.

Excellent Performance in Multiple Competitions

Particularly noteworthy is Gemini-2.5-pro's 93% score on "AIME 2025 I", a widely recognized high-difficulty math competition. Meanwhile, its 50% performance on "USAMO 2025" further demonstrates its ability to solve extremely challenging mathematical problems.

Technical Significance

The uniqueness of the MathArena evaluation lies in its rigor and fairness. It uses only math competition problems published after each model's release, ensuring that no model can gain an advantage from having seen the problems in its pre-training data. Under such stringent conditions, Gemini-2.5-pro's high success rate reflects Google's significant breakthrough in large model mathematical reasoning capabilities.

Industry Impact

The outstanding performance of Gemini-2.5-pro not only demonstrates the immense potential of large language models in advanced mathematical thinking but also opens up new possibilities for AI-assisted education, research, and complex problem-solving. This achievement will further drive competition and innovation in the AI industry regarding reasoning capabilities and professional field applications.

Compared to other models such as Claude-3.7-Sonnet (Think) at 3.65% accuracy and o1-pro (high) at 2.83%, Gemini-2.5-pro's lead is even more pronounced, signaling that the development of large language model mathematical capabilities may have entered a new phase.

Data Source: https://matharena.ai/