Research teams from Stanford University and the University of Washington recently released a groundbreaking AI training method called s1. Its core idea is to substantially improve a language model's reasoning ability with a minimal test-time scaling technique. Unlike earlier approaches that relied on massive compute or complex algorithms, s1 achieves its performance gains simply by controlling how much computation the model spends at test time.

The researchers first curated a small dataset named s1K, containing 1,000 high-quality reasoning questions. The selection criteria were strict: every question had to satisfy three conditions, namely high difficulty, broad diversity, and high quality. Detailed ablation experiments confirmed that all three criteria matter; random selection, or focusing on only one criterion, led to a significant drop in performance. Notably, even training on a superset of 59,000 samples did not match the results of the carefully selected 1,000, underscoring how critical data selection is.
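For intuition, the selection process can be pictured as a three-stage filter. The sketch below only illustrates that idea: the field names, the difficulty proxy, and the sampling scheme are assumptions made for the example, not the authors' actual pipeline.

```python
# A minimal sketch of a three-stage selection filter (quality -> difficulty -> diversity).
# Field names ("well_formatted", "trace_length", "domain", ...) and the difficulty proxy
# are illustrative assumptions, not the authors' exact pipeline.
from collections import defaultdict
import random

def select_reasoning_subset(candidates, target_size=1000):
    # 1) Quality: drop malformed samples (bad formatting, missing answers, ...).
    pool = [s for s in candidates if s["well_formatted"] and s["has_answer"]]
    # 2) Difficulty: keep questions a baseline model still gets wrong, preferring
    #    those with longer reasoning traces as a rough proxy for hardness.
    pool = [s for s in pool if not s["solved_by_baseline"]]
    pool.sort(key=lambda s: s["trace_length"], reverse=True)
    # 3) Diversity: spread the final picks across topic domains so that no
    #    single subject dominates the selected questions.
    by_domain = defaultdict(list)
    for s in pool:
        by_domain[s["domain"]].append(s)
    selected, domains = [], list(by_domain)
    while len(selected) < target_size and domains:
        d = random.choice(domains)
        if by_domain[d]:
            selected.append(by_domain[d].pop(0))
        else:
            domains.remove(d)
    return selected
```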


After training the model, the researchers used a technique called "budget forcing" to control the amount of computation spent at test time. In simple terms, the method either cuts thinking short, by forcing the model to end its reasoning once a token budget is exhausted, or extends it, by appending "Wait" when the model tries to stop too early. The extra thinking time pushes the model to explore and verify more deeply, repeatedly recheck its reasoning steps, and correct its own errors.
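As an illustration, budget forcing can be sketched as a thin wrapper around a decoding call. The `generate` helper, the end-of-thinking delimiter, and the budget parameters below are hypothetical; this is a sketch of the idea, not the authors' implementation.

```python
# A minimal sketch of budget forcing, assuming a hypothetical `generate(prompt, stop,
# max_new_tokens)` decoding helper and an end-of-thinking delimiter string.
END_OF_THINKING = "<|end_of_thinking|>"  # hypothetical delimiter

def budget_forced_reasoning(generate, prompt, min_tokens=512, max_tokens=4096, max_waits=2):
    """Cap or extend a model's reasoning trace to fit a test-time compute budget."""
    trace = generate(prompt, stop=END_OF_THINKING, max_new_tokens=max_tokens)
    waits = 0
    # Extend: if the model stops "too early", suppress the end-of-thinking delimiter
    # and append "Wait", nudging it to keep reasoning and re-check its steps.
    # (len(trace.split()) is a crude word-count proxy for the token budget.)
    while len(trace.split()) < min_tokens and waits < max_waits:
        trace += "\nWait"
        trace += generate(prompt + trace, stop=END_OF_THINKING,
                          max_new_tokens=max_tokens - len(trace.split()))
        waits += 1
    # Cap: once the budget is spent, forcibly terminate thinking so the model
    # moves on to producing its final answer.
    return trace + END_OF_THINKING
```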

Experimental results show that after fine-tuning on s1K and applying budget forcing, the s1-32B model outperformed OpenAI's o1-preview by up to 27% on competition-level math problems. Even more strikingly, scaling with budget forcing let s1-32B extrapolate beyond its performance without test-time intervention, lifting its AIME24 score from 50% to 57%.


The core contribution of this research is a simple, efficient recipe for building a dataset that teaches strong reasoning and for scaling performance at test time. On this basis, the team built s1-32B, a model that rivals or even surpasses closed-source models while remaining open source and highly sample-efficient. The code, model, and data have been released on GitHub.

The researchers also ran in-depth ablations on the data and on test-time scaling. On the data side, they found that difficulty, diversity, and quality must be considered together. On the scaling side, budget forcing showed excellent controllability and consistent performance gains. The work also compares two ways of scaling test-time compute, parallel and sequential, and examines more advanced techniques such as REBASE, offering useful pointers for future research.
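To make the parallel-versus-sequential distinction concrete, parallel scaling can be sketched as majority voting over independent samples, while sequential scaling extends a single trace as in the budget forcing sketch above. The `generate` and `extract_answer` helpers here are assumed placeholders, and REBASE is not reproduced.

```python
# A rough contrast between the two scaling modes, using assumed `generate` and
# `extract_answer` helpers; the paper's exact setups (and REBASE) are not shown.
from collections import Counter

def parallel_scaling(generate, extract_answer, prompt, n_samples=8):
    """Parallel scaling: draw several independent solutions and majority-vote.

    Sequential scaling, by contrast, extends one trace step by step (as in the
    budget forcing sketch above) so later computation builds on earlier results.
    """
    answers = [extract_answer(generate(prompt, temperature=0.7))
               for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```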

This research not only offers a low-cost, high-return approach to AI training but also lays a solid foundation for broader AI applications.