Recently, researchers from Stanford University and the University of Washington trained an AI reasoning model named s1 for less than $50 in cloud computing credits. Their paper, released last Friday, reports that s1 performs comparably to OpenAI's o1 model and DeepSeek's R1 model on tests of math and coding ability. The code and data for s1 have been published on GitHub for other researchers to use.
The research team stated that they started from an off-the-shelf base model and fine-tuned it via distillation to extract the desired reasoning capabilities. For s1, the distillation used Google's Gemini 2.0 Flash Thinking Experimental model as the teacher, an approach similar to the one researchers at the University of California, Berkeley used last month to train another AI reasoning model for around $450.
This result has excited many in the AI field, since it suggests researchers can still innovate without substantial financial backing. However, the emergence of s1 has also prompted reflection on the commercialization of AI models: if anyone can replicate multi-million-dollar models at relatively low cost, where does the competitive advantage of the large companies lie?
Unsurprisingly, large AI labs are not pleased with this situation; OpenAI has accused DeepSeek of improperly using its API data for model distillation. The s1 research team set out to find a simple method that achieves strong reasoning performance along with "test-time scaling", giving an AI model more time to think before answering a question. This is the breakthrough demonstrated by OpenAI's o1 model, which DeepSeek and other AI labs are also attempting to replicate through various methods.
The s1 research indicates that effective distillation of a reasoning model can be achieved with a relatively small dataset and supervised fine-tuning (SFT), which is generally cheaper than the large-scale reinforcement learning approach used by DeepSeek. Google offers free access to Gemini 2.0 Flash Thinking Experimental, but the platform imposes daily usage limits and its terms prohibit reverse engineering its models to build competing services.
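As a rough illustration of this SFT-style distillation, the sketch below fine-tunes an off-the-shelf open model on a small file of teacher-generated reasoning traces using Hugging Face's `trl` and `datasets` libraries. The model name, data file, record schema, and hyperparameters are illustrative assumptions, not the exact configuration from the s1 paper, and the `trl` API details vary somewhat across versions.

```python
# Hedged sketch: supervised fine-tuning (SFT) on ~1,000 distilled reasoning traces.
# Model name, data file, field names, and hyperparameters are assumptions for
# illustration only, not the s1 paper's exact setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each JSONL record is assumed to hold a question, a teacher-generated
# "thinking" trace, and a final answer.
dataset = load_dataset("json", data_files="distilled_traces.jsonl", split="train")

def to_text(example):
    # Flatten one record into a single training string the model learns to imitate.
    return {
        "text": (
            f"Question: {example['question']}\n"
            f"Thinking: {example['thinking']}\n"
            f"Answer: {example['answer']}"
        )
    }

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",    # any capable open base model, passed as a hub id
    train_dataset=dataset,               # SFTTrainer trains on the "text" field by default
    args=SFTConfig(
        output_dir="s1-style-sft",
        num_train_epochs=3,              # a few epochs is plenty for ~1k examples
        per_device_train_batch_size=2,
        learning_rate=1e-5,
    ),
)
trainer.train()
```

The point of the sketch is the scale: with only on the order of a thousand curated examples, this is a short fine-tuning run rather than a from-scratch training effort, which is what keeps the compute bill so small.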
To train s1, the researchers constructed a dataset of 1,000 carefully selected questions, each paired with its answer and the "thinking" trace behind it. Training used 16 Nvidia H100 GPUs and took less than 30 minutes; according to the researchers, the necessary compute can now be rented for about $20. In addition, the team used a simple trick at inference time: appending the word "wait" when the model tries to stop reasoning prompts it to keep thinking, which improves the accuracy of its answers.
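A rough sketch of the "wait" trick (the s1 paper refers to this kind of test-time control as budget forcing) is shown below: generate a reasoning trace, and whenever the model tries to move on to its final answer too early, truncate at that point and append "Wait" so the next generation call continues the thinking. The model id, delimiter string, token budget, and extension count are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of the "Wait" trick: if the model tries to end its reasoning
# early, cut it off at that point and append "Wait" so it keeps thinking.
# The model id, delimiter, token budget, and extension count are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"    # stand-in for a fine-tuned reasoning model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Question: How many primes are there below 30?\nThinking:"
end_of_thinking = "Final Answer:"        # assumed marker separating thinking from the answer
max_extensions = 2                       # how many times to force extra thinking

def generate(text, max_new_tokens=512):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

text = prompt
for _ in range(max_extensions):
    text = generate(text)
    if end_of_thinking not in text:
        break                            # still thinking; in practice you'd cap total tokens
    # The model tried to wrap up: drop everything from the marker onward and
    # append "Wait" so the next call continues the reasoning instead of answering.
    text = text[: text.index(end_of_thinking)] + "Wait,"

# Final pass: let the model finish its thought and produce the answer.
print(generate(text))
```

The design choice here is that extra accuracy is bought purely with extra inference-time compute: no retraining is needed, only a longer reasoning budget enforced by the wrapper loop.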
In 2025, Meta, Google, and Microsoft plan to invest hundreds of billions of dollars in AI infrastructure, part of which will go toward training the next generation of AI models. And although distillation has proven effective at replicating existing AI capabilities at lower cost, it has not yet produced models that significantly surpass the originals.
Paper: https://arxiv.org/pdf/2501.19393
Code: https://github.com/simplescaling/s1
Key Points:
🌟 The training cost of the s1 model is under $50, performing comparably to top reasoning models.
🛠️ The research team extracted reasoning capabilities from a pre-existing model using distillation techniques, achieving a fast and efficient training process.
🚀 Large AI labs express concerns over low-cost model replication, with future investments focusing on AI infrastructure.