Recently, an AI system developed by Google DeepMind, AlphaGeometry2, surpassed the average performance of gold medalists at the International Mathematical Olympiad (IMO), showing excellent results on geometry problems. AlphaGeometry2 is an upgraded version of the AlphaGeometry system that DeepMind released last year. In their latest study, the research team reports that the system can solve 84% of the geometry problems posed at the IMO over the past 25 years.

So why is DeepMind interested in a high school mathematics competition? The researchers believe that new methods for solving complex geometry problems, especially in Euclidean geometry, could be key to enhancing AI capabilities. Proving mathematical theorems requires both reasoning ability and the capacity to choose appropriate solving steps, and DeepMind believes these problem-solving skills may be crucial for the development of future general-purpose AI models.

Last summer, DeepMind also showcased a system that combines AlphaGeometry2 with AlphaProof (an AI model for formal mathematical reasoning), which solved four of the six problems from the 2024 IMO. Beyond geometry, this approach may extend to other areas of mathematics and science, potentially aiding complex engineering calculations.

At the core of AlphaGeometry2 are a language model from Google's Gemini family and a "symbolic engine." The Gemini model helps the symbolic engine work toward solutions using mathematical rules. The workflow is as follows: the Gemini model predicts which constructs (such as points, lines, and circles) might be helpful for solving the problem, and the symbolic engine then carries out logical deduction over these constructs. Through a search process, AlphaGeometry2 combines the Gemini model's suggestions with known principles to arrive at a proof.
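
To make that loop concrete, here is a minimal, illustrative Python sketch of the propose-then-deduce cycle. Everything in it is a toy stand-in: deductive_closure plays the role of the symbolic engine using a single geometric rule (transitivity of parallelism), propose_constructs plays the role of the Gemini model with a canned suggestion, and none of the names reflect DeepMind's actual code.

```python
def deductive_closure(facts):
    """Toy 'symbolic engine': repeatedly apply one rule, transitivity of
    parallelism (l || m and m || n implies l || n), until no new fact appears."""
    facts = set(facts)
    while True:
        new = {("parallel", a, d)
               for (r1, a, b) in facts if r1 == "parallel"
               for (r2, c, d) in facts if r2 == "parallel" and b == c and a != d}
        if new <= facts:
            return facts
        facts |= new

def propose_constructs(facts, goal):
    """Stand-in for the Gemini model: the real system predicts helpful
    auxiliary points, lines, or circles; here we return one canned
    suggestion that bridges the premises and the goal."""
    return [("parallel", "m", "n")]  # e.g. "draw m parallel to n"

def prove(premises, goal, max_rounds=5):
    """Alternate symbolic deduction with model-suggested constructs,
    mirroring the workflow described above: deduce until progress stalls,
    then inject a suggestion and deduce again."""
    facts = set(premises)
    for _ in range(max_rounds):
        facts = deductive_closure(facts)
        if goal in facts:
            return True
        facts.update(propose_constructs(facts, goal))
    return goal in facts

# Example: given l || m and n || o, the goal l || o is only reachable
# after the suggested bridging construct m || n is added.
premises = [("parallel", "l", "m"), ("parallel", "n", "o")]
print(prove(premises, ("parallel", "l", "o")))  # True
```

The design point the sketch tries to capture is the division of labor: deduction is exhaustive but blind, so it runs until it stalls, and only then does the "model" inject a new construct to reopen the search.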

Although AlphaGeometry2 successfully answered 42 of the 50 IMO problems in the benchmark, surpassing the average score of gold medalists, it still has limitations: it cannot handle problems involving a variable number of points, nonlinear equations, or inequalities. On a separate set of 29 more challenging problems, its performance was less impressive, with only 20 solved.

This research has once again sparked debate over whether AI systems should be built on symbolic manipulation or on more brain-like neural networks. AlphaGeometry2 takes a hybrid approach, pairing a neural network with a rule-based symbolic engine. The DeepMind team notes that while large language models can sometimes generate partial solutions without external tools, the symbolic engine remains an important tool for mathematical applications for now.
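
As a toy illustration of what the rule-based half contributes, the sketch below accepts a proposed proof step only when a known rule derives it from established facts, which is the sense in which an engine can check suggestions that a language model merely makes plausible. The rule table, fact encoding, and function names are invented for this example and do not describe DeepMind's engine.

```python
# Map a candidate conclusion to the premises that a known geometric rule
# requires. The one rule here is the inscribed angle theorem: angles
# subtending the same arc of a circle are equal.
RULES = {
    ("angle_eq", "BAC", "BDC"): {("concyclic", "A", "B", "C", "D")},
}

def verify(step, known_facts):
    """Accept a proposed step only if some rule derives it from known facts."""
    premises = RULES.get(step)
    return premises is not None and premises <= known_facts

known = {("concyclic", "A", "B", "C", "D")}
print(verify(("angle_eq", "BAC", "BDC"), known))  # True: rule applies
print(verify(("angle_eq", "BAC", "ABC"), known))  # False: no rule supports it
```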