In the latest sensational news from the mathematical community, a group of Tsinghua University alumni, leveraging the power of AI, have successfully proven 162 previously unproven mathematical theorems. Even more impressively, their intelligent agent, named LeanAgent, has taken on the formalization challenge of Terence Tao's Polynomial Freiman-Ruzsa (PFR) conjecture project! One cannot help but marvel at how thoroughly AI is transforming the research methods of basic science.
As is well known, today's large language models (LLMs) are dazzling, but most remain static: they cannot learn new knowledge online, and proving advanced mathematical theorems is next to impossible for them. Now, however, a research team from Caltech, Stanford University, and the University of Wisconsin-Madison has jointly developed LeanAgent, an AI assistant with lifelong-learning capabilities that can continuously learn and prove theorems.
LeanAgent tackles mathematics of increasing difficulty through a carefully designed learning curriculum and uses a dynamic database to manage its ever-growing body of mathematical knowledge, ensuring that it does not forget acquired skills while learning new ones. In experiments, it proved 162 previously unproven theorems drawn from 23 different Lean code repositories, outperforming the static baseline model by a factor of 11 at proving new theorems, a truly astonishing result!
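A rough sketch of what such a dynamic database might look like, assuming it simply tracks repositories, theorem statements, and which proofs have been found (the names and fields below are illustrative, not the paper's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class TheoremRecord:
    full_name: str
    statement: str
    proof: str | None = None  # None while the theorem remains unproven

@dataclass
class DynamicDatabase:
    """Grows as new repositories are processed; proofs discovered later are
    written back, so earlier knowledge is never discarded."""
    repos: dict[str, list[TheoremRecord]] = field(default_factory=dict)

    def add_repo(self, name: str, theorems: list[TheoremRecord]) -> None:
        self.repos[name] = theorems

    def record_proof(self, repo: str, full_name: str, proof: str) -> None:
        for thm in self.repos[repo]:
            if thm.full_name == full_name:
                thm.proof = proof

    def unproven(self) -> list[TheoremRecord]:
        return [t for ts in self.repos.values() for t in ts if t.proof is None]
```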
The newly proven theorems span many fields of advanced mathematics, including challenging problems in abstract algebra and algebraic topology. LeanAgent not only starts from simple concepts and gradually works up to complex topics, but also scores well on the lifelong-learning measures of stability and backward transfer.
Even so, Tao's challenge still leaves AI largely helpless. Although interactive theorem provers (ITPs) such as Lean play a major role in formalizing and verifying mathematical proofs, constructing such proofs is often complex and time-consuming, demanding meticulous steps and large mathematical codebases. Even advanced large models such as o1 and Claude are prone to errors when turning informal proofs into formal ones, underscoring the shortcomings of LLMs in the accuracy and reliability of mathematical proof.
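For readers unfamiliar with Lean, an "unproven theorem" typically sits in a repository as a statement whose proof is the placeholder `sorry`, which a prover must replace with a real proof. Below is a toy example (not taken from any of the repositories in the paper) showing such a placeholder and the one-lemma proof that closes it, assuming mathlib is available:

```lean
import Mathlib

-- Before: a statement whose proof is missing; `sorry` is Lean's placeholder.
theorem inv_of_mul {G : Type*} [Group G] (a b : G) :
    (a * b)⁻¹ = b⁻¹ * a⁻¹ := by
  sorry

-- After: the goal is closed by locating the right mathlib lemma, `mul_inv_rev`.
-- Finding such lemmas inside a huge codebase is part of what makes formalization laborious.
example {G : Type*} [Group G] (a b : G) : (a * b)⁻¹ = b⁻¹ * a⁻¹ :=
  mul_inv_rev a b
```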
Previous research has tried to use LLMs to generate complete proof steps. LeanDojo, for example, built a theorem prover by training a model on a specific dataset, but data for formal theorem proving is extremely scarce, which limits how broadly this approach applies. Its retrieval-augmented prover, ReProver, was optimized for mathlib4, the main Lean theorem-proving codebase; although mathlib4 covers more than 100,000 formal theorems and definitions, its coverage is largely limited to undergraduate-level mathematics, so ReProver performs poorly on more complex problems.
It is also worth noting that the dynamic nature of mathematical research poses an even greater challenge to AI. Mathematicians often work on multiple projects simultaneously or in alternation; Terence Tao, for instance, advances several formalization efforts at once, including the PFR conjecture and inequalities for symmetric means of real numbers. Such examples highlight a key deficiency of current AI theorem-proving methods: the lack of an AI system that can adaptively learn and improve across different mathematical fields, especially given how little Lean data exists.
For this reason, the LeanDojo team created LeanAgent, a brand-new lifelong-learning framework designed to solve the above problems. LeanAgent's workflow estimates each theorem's complexity to build a learning curriculum, balances stability and plasticity through progressive training, and uses best-first tree search to hunt for proofs of theorems that have not yet been proven.
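As a rough illustration of the curriculum step, the sketch below orders theorems by a complexity score based on proof length, with unproven theorems treated as maximally complex; the exact scoring function and the `Theorem` structure are simplifications, not LeanAgent's actual code.

```python
import math
from dataclasses import dataclass

@dataclass
class Theorem:
    name: str
    repo: str
    proof_steps: int | None  # None for theorems with no known proof yet

def complexity(thm: Theorem) -> float:
    """Weight longer proofs sharply by exponentiating the step count;
    unproven theorems get infinite complexity and sort last."""
    if thm.proof_steps is None:
        return math.inf
    return math.exp(thm.proof_steps)

def build_curriculum(theorems: list[Theorem]) -> list[Theorem]:
    """Order theorems from easiest to hardest so the agent trains on
    simpler material before attempting harder repositories."""
    return sorted(theorems, key=complexity)

# Hypothetical usage with made-up theorem names
curriculum = build_curriculum([
    Theorem("add_comm_variant", "tutorial_repo", 2),
    Theorem("pfr_open_lemma", "pfr", None),
    Theorem("ring_hom_lemma", "mathlib4", 7),
])
print([t.name for t in curriculum])  # easiest first, unproven last
```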
LeanAgent can be paired with any large model, enhancing its generalization through retrieval. Its innovations are a custom dynamic database for managing the expanding body of mathematical knowledge and a curriculum-learning strategy based on the structure of Lean proofs, which smooths the path toward more complex mathematical content.
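"Retrieval" here means premise retrieval in the LeanDojo/ReProver style: given the current proof state, fetch the most relevant lemmas before generating the next tactic. The sketch below ranks candidates by cosine similarity of embeddings; the toy `embed` function is a placeholder, not the trained retriever LeanAgent actually uses.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedder (a seeded random vector per string); a real
    system would use a trained retrieval encoder instead."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

def retrieve_premises(state: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank candidate premises (lemma names or statements) by cosine
    similarity to the current proof state and return the top k."""
    q = embed(state)
    scored = sorted(corpus, key=lambda p: float(q @ embed(p)), reverse=True)
    return scored[:k]

# Hypothetical usage on a small lemma corpus
premises = retrieve_premises(
    "⊢ (a * b)⁻¹ = b⁻¹ * a⁻¹",
    ["mul_inv_rev", "add_comm", "inv_inv", "mul_assoc"],
)
print(premises)
```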
To counter catastrophic forgetting, the researchers adopted progressive training, which lets LeanAgent keep adapting to new mathematical knowledge without forgetting what it has already learned. In practice, this means training incrementally on each new dataset to strike the best balance between stability and plasticity.
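A minimal sketch of what such incremental training could look like, assuming a PyTorch-style retriever that returns a loss and a list of per-repository datasets arriving in curriculum order; this is only a schematic, not the paper's training code.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def progressive_train(model: torch.nn.Module,
                      repo_datasets: list[Dataset],
                      epochs_per_repo: int = 1,
                      lr: float = 1e-5) -> None:
    """Fine-tune one model on each repository's data in sequence, rather than
    retraining from scratch, so earlier skills are retained as new ones are added."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for dataset in repo_datasets:            # datasets arrive in curriculum order
        loader = DataLoader(dataset, batch_size=8, shuffle=True)
        model.train()
        for _ in range(epochs_per_repo):     # keep epochs low to limit drift
            for batch in loader:
                optimizer.zero_grad()
                loss = model(**batch).loss   # assumes an HF-style model exposing .loss
                loss.backward()
                optimizer.step()
        # After each repository, re-evaluate on all previous repositories to
        # measure stability and backward transfer (omitted here).
```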
With this progressive training, LeanAgent performs exceptionally well at theorem proving: it proved the 162 previously unproven theorems mentioned above, excelling in particular on challenging theorems from abstract algebra and algebraic topology. Its rate of proving new theorems is up to 11 times that of the static ReProver, while it retains the ability to prove theorems it has already seen.
Throughout the proving process, LeanAgent exhibits progressive learning, moving from simple theorems to ever more complex ones and steadily deepening its command of mathematics. For example, it proved foundational results about algebraic structures in group theory and ring theory, demonstrating a solid grasp of the underlying mathematics. Overall, with its powerful capacity for continuous learning and improvement, LeanAgent brings exciting prospects to the mathematical community!
Paper link: https://arxiv.org/pdf/2410.06209