Today, the Alibaba Cloud Tongyi team officially released a new mathematical reasoning process reward model, Qwen2.5-Math-PRM. The model comes in two sizes, 72B and 7B, both of which significantly outperform comparable open-source process reward models, particularly at identifying reasoning errors.
Notably, the 7B version of Qwen2.5-Math-PRM surpasses the widely used GPT-4o, an important milestone for Alibaba Cloud's development of reasoning models. To comprehensively assess model performance in mathematical reasoning, the Tongyi team also open-sourced ProcessBench, the first step-level evaluation benchmark. It covers 3,400 mathematical test cases, including challenging problems at the level of the International Mathematical Olympiad, with each case annotated by human experts to ensure a scientific and comprehensive evaluation.
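A step-level benchmark of this kind typically asks a model to locate the earliest erroneous step in a solution (or declare the solution fully correct), then combines accuracy on the erroneous and correct subsets. The sketch below illustrates one such scoring scheme; the field names and the harmonic-mean aggregation are assumptions for illustration, not ProcessBench's published schema.

```python
# Hypothetical sketch of a step-level evaluation metric: each test case
# carries a 'label' (index of the earliest erroneous step, or -1 if the
# solution is fully correct) and a 'pred' (the model's prediction).
# Accuracies on the two subsets are combined via their harmonic mean.

def step_level_score(cases):
    """cases: list of dicts with 'label' and 'pred' step indices."""
    err = [c for c in cases if c["label"] != -1]   # solutions with an error
    ok = [c for c in cases if c["label"] == -1]    # fully correct solutions
    acc_err = sum(c["pred"] == c["label"] for c in err) / len(err) if err else 0.0
    acc_ok = sum(c["pred"] == c["label"] for c in ok) / len(ok) if ok else 0.0
    if acc_err + acc_ok == 0:
        return 0.0
    return 2 * acc_err * acc_ok / (acc_err + acc_ok)  # F1-style combination

cases = [
    {"label": 2, "pred": 2},    # error at step 2, correctly located
    {"label": 1, "pred": 3},    # error at step 1, missed
    {"label": -1, "pred": -1},  # correct solution, recognized as such
    {"label": -1, "pred": 0},   # correct solution, false alarm
]
print(step_level_score(cases))  # 0.5: both subset accuracies are 0.5
```

Reporting a combined score prevents a trivial strategy (always flagging an error, or never flagging one) from scoring well on only one subset.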
Evaluating Qwen2.5-Math-PRM on ProcessBench, the research team found that both the 72B and 7B models performed exceptionally well. Notably, the 7B version not only surpassed other open-source models of the same size but even exceeded the closed-source GPT-4o-0806 in certain respects. This demonstrates the strong potential of process reward models (PRMs) for improving reasoning reliability, and offers new insights for the future development of reasoning process supervision techniques.
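One common way a PRM improves reasoning reliability at inference time is best-of-N selection: score every step of several candidate solutions and keep the candidate whose weakest step scores highest. The sketch below illustrates that idea; the "min" aggregation and the dummy scores are illustrative assumptions, not Qwen2.5-Math-PRM's documented interface.

```python
# Illustrative best-of-N selection with a process reward model (PRM).
# step_scores[i][j] stands in for the PRM's score (in [0, 1]) of step j
# of candidate solution i; real scores would come from the model itself.

def select_best(candidates, step_scores):
    """Return the candidate whose minimum step score is largest,
    i.e. the solution with the most trustworthy weakest step."""
    best_i = max(range(len(candidates)),
                 key=lambda i: min(step_scores[i]))
    return candidates[best_i]

candidates = [
    ["step A1", "step A2"],  # solution A
    ["step B1", "step B2"],  # solution B
]
step_scores = [[0.9, 0.2], [0.7, 0.8]]  # dummy PRM scores

# Solution B wins: its weakest step (0.7) beats A's weakest step (0.2).
print(select_best(candidates, step_scores))
```

Min-aggregation reflects the intuition that a single flawed step can invalidate an otherwise plausible chain of reasoning; averaging or product aggregation are common alternatives.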
The innovative work of the Alibaba Cloud Tongyi team not only advances artificial intelligence reasoning technology but also provides a valuable reference for other developers in the industry. By open-sourcing these models and benchmarks, the Tongyi team hopes to share its experience with more researchers and promote technological progress across the industry.