In the latest sensational news from the mathematical community, a group of Tsinghua University alumni, leveraging the power of AI, have successfully proven 162 previously unproven mathematical theorems. Even more impressively, their intelligent agent, named LeanAgent, has taken on the formalization challenge of Terence Tao's Polynomial Freiman-Ruzsa (PFR) conjecture project! One cannot help but marvel at how thoroughly AI is transforming the research methods of basic science.
As is well known, today's large language models (LLMs) are dazzling, but most remain static: they cannot learn new knowledge online, and proving advanced mathematical theorems is next to impossible for them. Now, however, a research team from Caltech, Stanford University, and the University of Wisconsin-Madison has jointly developed LeanAgent, an AI assistant with lifelong-learning capabilities that can continuously learn and prove theorems.
LeanAgent tackles mathematics of increasing difficulty through a carefully designed learning curriculum and uses a dynamic database to manage its ever-growing body of mathematical knowledge, ensuring that it does not forget acquired skills while learning new ones. In experiments, it proved 162 previously unproven theorems drawn from 23 different Lean code repositories, outperforming the static baseline model by a factor of 11 at proving new theorems, a truly astonishing result!
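A rough sketch of what such a dynamic database might look like, assuming it simply tracks repositories, theorem statements, and which proofs have been found (the names and fields below are illustrative, not the paper's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class TheoremRecord:
    full_name: str
    statement: str
    proof: str | None = None  # None while the theorem remains unproven

@dataclass
class DynamicDatabase:
    """Grows as new repositories are processed; proofs discovered later are
    written back, so earlier knowledge is never discarded."""
    repos: dict[str, list[TheoremRecord]] = field(default_factory=dict)

    def add_repo(self, name: str, theorems: list[TheoremRecord]) -> None:
        self.repos[name] = theorems

    def record_proof(self, repo: str, full_name: str, proof: str) -> None:
        for thm in self.repos[repo]:
            if thm.full_name == full_name:
                thm.proof = proof

    def unproven(self) -> list[TheoremRecord]:
        return [t for ts in self.repos.values() for t in ts if t.proof is None]
```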
The newly proven theorems span many fields of advanced mathematics, including challenging problems in abstract algebra and algebraic topology. LeanAgent not only starts from simple concepts and gradually works up to complex topics, but also scores well on the lifelong-learning measures of stability and backward transfer.
Even so, Tao's challenge still leaves AI largely helpless. Although interactive theorem provers (ITPs) such as Lean play a major role in formalizing and verifying mathematical proofs, constructing such proofs is often complex and time-consuming, demanding meticulous steps and large mathematical codebases. Even advanced large models such as o1 and Claude are prone to errors when turning informal proofs into formal ones, underscoring the shortcomings of LLMs in the accuracy and reliability of mathematical proof.
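For readers unfamiliar with Lean, an "unproven theorem" typically sits in a repository as a statement whose proof is the placeholder `sorry`, which a prover must replace with a real proof. Below is a toy example (not taken from any of the repositories in the paper) showing such a placeholder and the one-lemma proof that closes it, assuming mathlib is available:

```lean
import Mathlib

-- Before: a statement whose proof is missing; `sorry` is Lean's placeholder.
theorem inv_of_mul {G : Type*} [Group G] (a b : G) :
    (a * b)⁻¹ = b⁻¹ * a⁻¹ := by
  sorry

-- After: the goal is closed by locating the right mathlib lemma, `mul_inv_rev`.
-- Finding such lemmas inside a huge codebase is part of what makes formalization laborious.
example {G : Type*} [Group G] (a b : G) : (a * b)⁻¹ = b⁻¹ * a⁻¹ :=
  mul_inv_rev a b
```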
Previous research has tried to use LLMs to generate complete proof steps. LeanDojo, for example, built a theorem prover by training a model on a specific dataset, but data for formal theorem proving is extremely scarce, which limits how broadly this approach applies. Its retrieval-augmented prover, ReProver, was optimized for mathlib4, the main Lean theorem-proving codebase; although mathlib4 covers more than 100,000 formal theorems and definitions, its coverage is largely limited to undergraduate-level mathematics, so ReProver performs poorly on more complex problems.
It is also worth noting that the dynamic nature of mathematical research poses an even greater challenge to AI. Mathematicians often work on multiple projects simultaneously or in alternation; Terence Tao, for instance, advances several formalization efforts at once, including the PFR conjecture and inequalities for symmetric means of real numbers. Such examples highlight a key deficiency of current AI theorem-proving methods: the lack of an AI system that can adaptively learn and improve across different mathematical fields, especially given how little Lean data exists.
For this reason, the LeanDojo team created LeanAgent, a brand-new lifelong-learning framework designed to solve the above problems. LeanAgent's workflow estimates each theorem's complexity to build a learning curriculum, balances stability and plasticity through progressive training, and uses best-first tree search to hunt for proofs of theorems that have not yet been proven.
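As a rough illustration of the curriculum step, the sketch below orders theorems by a complexity score based on proof length, with unproven theorems treated as maximally complex; the exact scoring function and the `Theorem` structure are simplifications, not LeanAgent's actual code.

```python
import math
from dataclasses import dataclass

@dataclass
class Theorem:
    name: str
    repo: str
    proof_steps: int | None  # None for theorems with no known proof yet

def complexity(thm: Theorem) -> float:
    """Weight longer proofs sharply by exponentiating the step count;
    unproven theorems get infinite complexity and sort last."""
    if thm.proof_steps is None:
        return math.inf
    return math.exp(thm.proof_steps)

def build_curriculum(theorems: list[Theorem]) -> list[Theorem]:
    """Order theorems from easiest to hardest so the agent trains on
    simpler material before attempting harder repositories."""
    return sorted(theorems, key=complexity)

# Hypothetical usage with made-up theorem names
curriculum = build_curriculum([
    Theorem("add_comm_variant", "tutorial_repo", 2),
    Theorem("pfr_open_lemma", "pfr", None),
    Theorem("ring_hom_lemma", "mathlib4", 7),
])
print([t.name for t in curriculum])  # easiest first, unproven last
```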
LeanAgent can be paired with any large model, enhancing its generalization through retrieval. Its innovations are a custom dynamic database for managing the expanding body of mathematical knowledge and a curriculum-learning strategy based on the structure of Lean proofs, which smooths the path toward more complex mathematical content.
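"Retrieval" here means premise retrieval in the LeanDojo/ReProver style: given the current proof state, fetch the most relevant lemmas before generating the next tactic. The sketch below ranks candidates by cosine similarity of embeddings; the toy `embed` function is a placeholder, not the trained retriever LeanAgent actually uses.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedder (a seeded random vector per string); a real
    system would use a trained retrieval encoder instead."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

def retrieve_premises(state: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank candidate premises (lemma names or statements) by cosine
    similarity to the current proof state and return the top k."""
    q = embed(state)
    scored = sorted(corpus, key=lambda p: float(q @ embed(p)), reverse=True)
    return scored[:k]

# Hypothetical usage on a small lemma corpus
premises = retrieve_premises(
    "⊢ (a * b)⁻¹ = b⁻¹ * a⁻¹",
    ["mul_inv_rev", "add_comm", "inv_inv", "mul_assoc"],
)
print(premises)
```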
To counter catastrophic forgetting, the researchers adopted progressive training, which lets LeanAgent keep adapting to new mathematical knowledge without forgetting what it has already learned. In practice, this means training incrementally on each new dataset to strike the best balance between stability and plasticity.
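A minimal sketch of what such incremental training could look like, assuming a PyTorch-style retriever that returns a loss and a list of per-repository datasets arriving in curriculum order; this is only a schematic, not the paper's training code.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def progressive_train(model: torch.nn.Module,
                      repo_datasets: list[Dataset],
                      epochs_per_repo: int = 1,
                      lr: float = 1e-5) -> None:
    """Fine-tune one model on each repository's data in sequence, rather than
    retraining from scratch, so earlier skills are retained as new ones are added."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for dataset in repo_datasets:            # datasets arrive in curriculum order
        loader = DataLoader(dataset, batch_size=8, shuffle=True)
        model.train()
        for _ in range(epochs_per_repo):     # keep epochs low to limit drift
            for batch in loader:
                optimizer.zero_grad()
                loss = model(**batch).loss   # assumes an HF-style model exposing .loss
                loss.backward()
                optimizer.step()
        # After each repository, re-evaluate on all previous repositories to
        # measure stability and backward transfer (omitted here).
```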
With this progressive training, LeanAgent performs exceptionally well at theorem proving: it proved the 162 previously unproven theorems mentioned above, excelling in particular on challenging theorems from abstract algebra and algebraic topology. Its rate of proving new theorems is up to 11 times that of the static ReProver, while it retains the ability to prove theorems it has already seen.
Throughout the proving process, LeanAgent exhibits progressive learning, moving from simple theorems to ever more complex ones and steadily deepening its command of mathematics. For example, it proved foundational results about algebraic structures in group theory and ring theory, demonstrating a solid grasp of the underlying mathematics. Overall, with its powerful capacity for continuous learning and improvement, LeanAgent brings exciting prospects to the mathematical community!
Paper link: https://arxiv.org/pdf/2410.06209