Recently, a joint research team from Google, Carnegie Mellon University, and MultiOn published a new study on the application of synthetic data in large model training. According to a report by Epoch AI, a research institute focusing on AI development, there are currently about 300 trillion publicly available high-quality text training tokens. However, with the rapid development of large models like ChatGPT, the demand for training data is growing exponentially, and it's projected that this data will be exhausted before 2026. Therefore, synthetic data is becoming a crucial alternative.
The researchers explored two main types of synthetic data: positive data and negative data. Positive data consists of correct problem solutions generated by capable large models (such as GPT-4 and Gemini 1.5 Pro), giving the model worked examples of how to solve mathematical problems. However, relying solely on positive data for training has limitations. First, this approach may not expose the underlying logic of the problem-solving process; the model might learn by pattern matching rather than genuine understanding. Second, as the amount of training data grows, the model may pick up spurious correlations, hurting its ability to generalize to new problems.
Therefore, the researchers introduced negative data: problem-solving traces whose steps have been verified as incorrect. Exposure to these traces helps the model recognize and avoid errors, strengthening its logical reasoning. Using negative data is not straightforward, since incorrect steps can carry misleading signals, but the researchers were able to make the model learn from its mistakes with Direct Preference Optimization (DPO), which weights the importance of each problem-solving step.
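As a concrete illustration, the sketch below shows the standard pairwise DPO objective applied to pairs of verified-correct and verified-incorrect solutions to the same problem. This is a minimal sketch of the general technique, not the paper's exact implementation; the function name, argument names, and the β value are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_correct_logp, policy_incorrect_logp,
             ref_correct_logp, ref_incorrect_logp, beta=0.1):
    """Pairwise DPO loss over (correct, incorrect) solutions to the same problem.

    Each argument is a tensor of summed log-probabilities of a full solution
    under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-probability ratios against the reference model.
    correct_reward = beta * (policy_correct_logp - ref_correct_logp)
    incorrect_reward = beta * (policy_incorrect_logp - ref_incorrect_logp)
    # Push the policy to prefer verified-correct solutions over verified-incorrect ones.
    return -F.logsigmoid(correct_reward - incorrect_reward).mean()
```

Treating the incorrect solution as the "rejected" response is what lets the model extract a learning signal from negative data rather than simply imitating correct answers.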
The DPO method assigns an advantage value to each problem-solving step, reflecting its value relative to the ideal solution. The study shows that high-advantage steps are key to correct solutions, while low-advantage steps may indicate problems in the model's reasoning. Using these advantage values, the model can dynamically adjust its strategy within a reinforcement learning framework to learn and improve more efficiently from synthetic data.
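In DPO, the trained policy's log-probability ratio against the frozen reference model acts as an implicit reward, and one natural way to read off a per-step signal is to evaluate that ratio step by step. The formulas below are a hedged sketch of this idea in our own notation; the paper's exact advantage estimator may differ.

```latex
% Sketch: DPO's implicit reward and a per-step advantage proxy (notation ours).
\[
  r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
  \qquad
  A_t \approx \beta \log \frac{\pi_\theta(y_t \mid x, y_{<t})}{\pi_{\mathrm{ref}}(y_t \mid x, y_{<t})}
\]
```

Here \(y_t\) denotes the \(t\)-th problem-solving step; steps with high advantage are credited with driving correct solutions, while low-advantage steps flag likely reasoning errors.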
To validate the effectiveness of synthetic data, the research team ran comprehensive tests on the GSM8K and MATH datasets using models such as DeepSeek-Math-7B and LLaMA-2-7B. The results show that models trained on both positive and negative synthetic data reached a given level of mathematical reasoning performance with roughly eight times less positive data, an eight-fold improvement in data efficiency. The research demonstrates the considerable potential of synthetic data for improving the logical reasoning capabilities of large models.
Key Highlights:
📊 Synthetic data offers an effective solution to the growing demand for training data.
🧩 Combining positive and negative data enhances the model's mathematical reasoning and logical abilities.
🚀 The study reports an eight-fold gain in data efficiency on mathematical reasoning after training large models on synthetic data.