Microsoft recently announced rStar-Math, a reasoning technique for small language models (SLMs) that significantly improves their performance on mathematical problems and, in some cases, surpasses OpenAI's o1-preview model. The technique is still in the research phase; the accompanying paper, published on arXiv.org, is co-authored by eight researchers from Microsoft, Peking University, and Tsinghua University.
In tests, rStar-Math was applied to several small open-source models, including Microsoft's Phi-3 mini, Alibaba's Qwen-1.5B (1.5 billion parameters), and Qwen-7B (7 billion parameters). All of the tested models improved, and rStar-Math even outperformed OpenAI's previously leading model, o1-preview, on the MATH benchmark.
The research team plans to release the relevant code and data on GitHub, although the release is still under internal review and the materials are not yet publicly available. The community has shown great interest in the technique: many members have praised its combination of step-by-step reasoning with Monte Carlo Tree Search (MCTS) and believe the approach has broad application prospects in areas such as geometric proofs and symbolic reasoning.
The core of rStar-Math is Monte Carlo Tree Search, a method that simulates human "deep thinking" by gradually refining the solutions to mathematical problems, which helps small models self-evolve. In addition to applying MCTS, the researchers required the models to output their reasoning steps in natural language together with Python code. This requirement made it easier to train the models effectively.
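To make the search loop concrete, here is a minimal sketch of generic MCTS (selection, expansion, simulation, backpropagation) on a toy arithmetic task. It is not the rStar-Math implementation: the task, the constants TARGET and MAX_STEPS, the hard-coded ACTIONS list, and all function names are hypothetical stand-ins. In rStar-Math the candidate steps come from the language model; here each hard-coded step simply pairs a natural-language description with Python code, and discarding a step whose code fails to execute is an assumption about how such a requirement might be used.

```python
import math
import random

# Hypothetical toy task: starting from x = 1, reach TARGET in at most MAX_STEPS steps.
TARGET, MAX_STEPS = 10, 4
# Each candidate step pairs a natural-language description with Python code.
ACTIONS = [("add one", "x = x + 1"), ("double", "x = x * 2"), ("add three", "x = x + 3")]


class Node:
    def __init__(self, x, depth, parent=None, step=None):
        self.x, self.depth, self.parent, self.step = x, depth, parent, step
        self.children = []
        self.untried = list(ACTIONS)
        self.visits, self.value = 0, 0.0

    def is_terminal(self):
        return self.x == TARGET or self.depth == MAX_STEPS


def apply_step(x, code):
    """Run a step's Python code; return the new x, or None if execution fails."""
    scope = {"x": x}
    try:
        exec(code, {}, scope)
    except Exception:
        return None
    return scope["x"]


def uct_child(node, c=1.4):
    """Pick the child with the best UCT score (mean reward plus exploration bonus)."""
    return max(
        node.children,
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(x, depth, rng):
    """Finish the trajectory with random steps; reward 1.0 if TARGET is reached."""
    while x != TARGET and depth < MAX_STEPS:
        x = apply_step(x, rng.choice(ACTIONS)[1])
        depth += 1
    return 1.0 if x == TARGET else 0.0


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(x=1, depth=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT.
        while not node.is_terminal() and not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: try one new step, keeping it only if its code executes.
        if not node.is_terminal() and node.untried:
            step = node.untried.pop()
            new_x = apply_step(node.x, step[1])
            if new_x is not None:
                child = Node(new_x, node.depth + 1, node, step)
                node.children.append(child)
                node = child
        # 3. Simulation: estimate the value of this node with a random rollout.
        reward = rollout(node.x, node.depth, rng)
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def best_path(root):
    """Follow the most-visited child at each level; return the steps and the final x."""
    path, node = [], root
    while node.children and not node.is_terminal():
        node = max(node.children, key=lambda ch: ch.visits)
        path.append(node.step)
    return path, node.x
```

Calling best_path(search()) returns the most-visited sequence of (description, code) steps together with the value of x it ends on. With enough iterations the search tends to concentrate on step sequences that reach TARGET, although this sketch does not guarantee it. A real system would replace the random rollout and the 0/1 reward with model-generated steps and a learned or verified score.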
After four rounds of self-evolution, rStar-Math posted strong results across multiple benchmarks. On the MATH benchmark, the Qwen2.5-Math-7B model's accuracy jumped from 58.8% to 90.0%, surpassing OpenAI's o1-preview. On the American Invitational Mathematics Examination (AIME), the model solved 53.3% of the problems (8 of 15), placing it in the top 20% of high school competitors.
In recent years, progress in artificial intelligence has relied primarily on increasing model parameter counts, but the high costs involved have raised questions about whether this kind of scaling is sustainable. With rStar-Math, Microsoft demonstrates the potential of small models and points toward a more efficient direction. The work suggests that specialized small models can serve as a powerful alternative to large systems, providing cutting-edge capabilities to medium-sized organizations and academic researchers without the burden of substantial financial and environmental costs.
Paper link: https://arxiv.org/pdf/2501.04519
Key Points:
🌟 Microsoft launches rStar-Math technology to enhance small models' performance on mathematical problems.
📊 This technology has been tested on various open-source models, with some outperforming OpenAI's o1-preview.
🔍 The research team plans to release the code on GitHub; the work has attracted community interest and showcases the vast potential of small models.