Microsoft recently announced rStar-Math, an innovative reasoning approach that can be applied to small language models (SLMs) to significantly improve their performance on mathematical problems, in some cases even surpassing OpenAI's o1-preview model. The technology is still in the research phase; a related paper, co-authored by eight researchers from Microsoft, Peking University, and Tsinghua University, has been published on arXiv.org.


In tests, rStar-Math was applied to several small open-source models, including Microsoft's Phi-3 mini model and Alibaba's Qwen-1.5B (1.5 billion parameters) and Qwen-7B (7 billion parameters). The results showed improved performance across all of the models tested, and on the MATH benchmark rStar-Math even outperformed OpenAI's o1-preview.

The research team plans to release the code and data on GitHub, though they are currently under internal review and not yet publicly available. The community has shown strong interest in the work, with many praising the combination of step-by-step reasoning and Monte Carlo Tree Search (MCTS) and suggesting the approach could extend to areas such as geometric proofs and symbolic reasoning.

The core of rStar-Math is Monte Carlo Tree Search, a method that emulates human "deep thinking" by gradually refining candidate solutions to mathematical problems, allowing small models to self-evolve. Beyond applying MCTS, the researchers required the models to output each reasoning step both as a natural-language explanation and as Python code; checking that the code executes helps filter out faulty steps and yields higher-quality training data.
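To make the search loop concrete, below is a minimal, self-contained Python sketch of MCTS in the spirit described above. It is not Microsoft's implementation: rStar-Math pairs a policy SLM that proposes code-augmented reasoning steps with a learned reward signal that scores them, whereas this toy stand-in uses a fixed list of candidate arithmetic steps (`CANDIDATE_STEPS`), a made-up number puzzle (`START`, `TARGET`), and random rollouts. Only the selection / expansion / simulation / backpropagation skeleton and the "keep a step only if its code executes" filter carry over.

```python
# Illustrative sketch only: a toy MCTS loop in the spirit of rStar-Math's
# code-augmented step search. The real system uses a policy SLM to propose
# steps and a reward model to score them; here both are replaced by
# simple stand-ins so the example runs on its own.
import math
import random

START = 3            # toy problem: starting value (hypothetical example)
TARGET = 24          # toy goal: reach 24 by applying candidate steps
MAX_DEPTH = 4
CANDIDATE_STEPS = ["x + 1", "x * 2", "x * 3", "x - 2"]  # stand-in for SLM proposals


class Node:
    def __init__(self, value, expr, parent=None):
        self.value = value        # current intermediate result
        self.expr = expr          # the Python snippet that produced it
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward = 0.0

    def uct(self, c=1.4):
        # Upper-confidence bound used to pick which branch to explore next.
        if self.visits == 0:
            return float("inf")
        return self.reward / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )


def expand(node):
    # Each candidate step is a small Python expression; we execute it and
    # keep only steps that run successfully (the "code verification" idea).
    for step in CANDIDATE_STEPS:
        try:
            value = eval(step, {}, {"x": node.value})
        except Exception:
            continue  # discard steps whose code fails to execute
        node.children.append(Node(value, step, parent=node))


def rollout(node, depth):
    # Random playout to estimate how promising this partial solution is.
    value = node.value
    for _ in range(MAX_DEPTH - depth):
        value = eval(random.choice(CANDIDATE_STEPS), {}, {"x": value})
    return 1.0 if value == TARGET else 0.0


def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.reward += reward
        node = node.parent


def search(iterations=500):
    root = Node(START, "x = START")
    for _ in range(iterations):
        # Selection: walk down the tree by UCT until reaching a leaf.
        node, depth = root, 0
        while node.children:
            node = max(node.children, key=Node.uct)
            depth += 1
        # Expansion and simulation.
        if depth < MAX_DEPTH and node.visits > 0:
            expand(node)
            if node.children:
                node = random.choice(node.children)
                depth += 1
        backpropagate(node, rollout(node, depth))
    # Read out the most-visited path as the "solution trajectory".
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        path.append((node.expr, node.value))
    return path


if __name__ == "__main__":
    for step, value in search():
        print(f"{step}  ->  {value}")
```

In the paper's setup, the quality signal comes from whether a trajectory reaches a verified correct final answer (and, in later self-evolution rounds, from a trained reward model), rather than from random playouts as in this sketch.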

After four rounds of self-evolution, rStar-Math achieved strong results across multiple benchmarks. On the MATH benchmark, the Qwen2.5-Math-7B model's accuracy jumped from 58.8% to 90.0%, surpassing OpenAI's o1-preview. On the American Invitational Mathematics Examination (AIME), the model solved 53.3% of the problems, placing it among the top 20% of high school competitors.

In recent years, progress in artificial intelligence has come largely from scaling up model parameters, but the high cost of that scaling has raised questions about its sustainability. With rStar-Math, Microsoft demonstrates the potential of small models and points toward a more efficient direction: specialized small models can serve as a powerful alternative to large systems, offering cutting-edge capabilities to mid-sized organizations and academic researchers without heavy financial and environmental costs.

Paper link: https://arxiv.org/pdf/2501.04519

Key Points:  

🌟 Microsoft launches rStar-Math technology to enhance small models' performance on mathematical problems.  

📊 This technology has been tested on various open-source models, with some outperforming OpenAI's o1-preview.  

🔍 The research team plans to release the code on GitHub; the work has drawn strong community interest and showcases the potential of small models.