With OpenAI's o1 and DeepSeek's R1 models drawing significant attention, the reasoning capabilities and test-time scaling (TTS) techniques of large language models (LLMs) have become a major focus of research. However, accurately evaluating the quality of each step of a model's response on complex reasoning problems remains a challenge. To address this, Tsinghua University and Shanghai AI Lab have jointly proposed the Generative Process Reward Model (GenPRM), an innovative approach to process-supervised reasoning.
Traditional Process Reward Models (PRMs), while capable of verifying the correctness of reasoning steps, struggle to capture deeper logical errors because they reduce each step to a scalar score. Their discriminative modeling approach also limits how well they scale at test time. GenPRM addresses these limitations by incorporating generative chain-of-thought reasoning and code verification, and by introducing a test-time scaling mechanism, opening up a new research direction.
GenPRM's design mimics the human problem-solving process: the model produces a natural language analysis of each reasoning step, which makes step evaluation more transparent and interpretable. At the same time, GenPRM generates and executes Python code tied to the step, so that correctness is checked by execution rather than guessed. This "explain-then-verify" mechanism not only judges whether a step is right but also provides concrete suggestions for improvement, significantly enhancing the effectiveness of process supervision.
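To make this concrete, here is a minimal sketch of an explain-then-verify loop. The `generate` callable, the prompts, and the PASS/FAIL convention are illustrative assumptions rather than the released GenPRM interface, and model-generated code would need a real sandbox in practice.

```python
# Minimal sketch of an "explain-then-verify" judgment for one reasoning step.
# `generate` is a placeholder for any LLM call; prompts and conventions are assumptions.
import contextlib
import io

def run_verification_code(code: str) -> str:
    """Execute model-generated Python in a throwaway namespace and capture its stdout."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustrative only; real use needs proper isolation
        return buffer.getvalue().strip()
    except Exception as exc:
        return f"ERROR: {exc}"

def judge_step(generate, problem: str, step: str) -> dict:
    """Critique one reasoning step in natural language, then check it with executed code."""
    # 1) Chain-of-thought critique of the step.
    critique = generate(
        f"Problem: {problem}\nStep: {step}\nAnalyze whether this step is correct."
    )
    # 2) Code that re-derives the step's claim so correctness can be executed, not estimated.
    code = generate(
        f"Write Python that checks this step and prints PASS or FAIL.\nStep: {step}"
    )
    verdict = run_verification_code(code)
    # 3) Final judgment combines the written critique with the executed check.
    return {"critique": critique, "code_output": verdict, "correct": verdict.endswith("PASS")}
```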
Remarkably, GenPRM outperformed GPT-4o using only 23K training samples. On mathematical reasoning benchmarks such as ProcessBench, the 1.5B-parameter GenPRM performed exceptionally well when boosted by test-time scaling, and the 7B-parameter version surpassed the 72B-parameter Qwen2.5-Math-PRM, demonstrating strong step-level critique capabilities.
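Test-time scaling for a generative reward model can be as simple as sampling several independent critiques of the same step and aggregating their verdicts. The sketch below uses majority voting over a `judge_once` callable (such as the one above); the aggregation actually used by GenPRM may differ.

```python
# Sketch of test-time scaling for a generative PRM: sample several independent
# judgments of the same step and take a majority vote (illustrative aggregation).
from collections import Counter

def scaled_judgment(judge_once, problem: str, step: str, n_samples: int = 8) -> bool:
    """judge_once is any callable returning True/False for one sampled critique."""
    votes = Counter(judge_once(problem, step) for _ in range(n_samples))
    return votes[True] >= votes[False]  # accept the step if most samples call it correct
```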
GenPRM also benefits from an efficient data synthesis method. Using Relative Progress Estimation (RPE) together with code verification, the authors generated high-quality process supervision data, greatly reducing the need for manually labeled examples. They synthesized candidate data with the QwQ-32B model and applied consensus filtering to retain only high-quality samples, yielding the 23K training set.
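The sketch below shows one plausible way RPE-derived labels and consensus filtering could fit together: a step is labeled by comparing Monte Carlo success estimates before and after it, and synthesized critiques are kept only when their verdict agrees with that label. The threshold, the exact RPE formula, and the sample fields are assumptions for illustration, not the paper's specification.

```python
# Hedged sketch of label synthesis: Relative Progress Estimation (RPE) compares
# Monte Carlo success estimates around a step; consensus filtering keeps samples
# whose generated verdict matches the RPE label. Details here are assumptions.

def rpe_label(mc_value_prev: float, mc_value_curr: float, threshold: float = 0.0) -> bool:
    """A step counts as correct if it does not reduce the estimated success rate."""
    return (mc_value_curr - mc_value_prev) >= threshold

def consensus_filter(samples):
    """Keep synthesized critiques whose verdict agrees with the RPE-derived label."""
    kept = []
    for s in samples:  # each s: {"mc_prev": float, "mc_curr": float, "model_verdict": bool, ...}
        if s["model_verdict"] == rpe_label(s["mc_prev"], s["mc_curr"]):
            kept.append(s)
    return kept
```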
Looking ahead, GenPRM can serve not only as an answer verifier but also as a "coach" that guides the iterative refinement of policy models through its feedback. This "generate-critique-reflect" loop offers a new path toward self-improvement in large language models, and may be extended to areas such as code generation and multimodal reasoning.