The NovaSky research team from the Sky Computing Lab at the University of California, Berkeley, released the Sky-T1-32B-Preview reasoning model on Friday. The model performs exceptionally well on several key benchmarks, rivaling an early version of OpenAI's o1, and, notably, was trained at extremely low cost.
Sky-T1-32B-Preview is the first truly open-source reasoning model: the NovaSky team released not only the model but also the dataset used for training and the necessary training code, which means the model can be replicated from scratch. "The training cost of Sky-T1-32B-Preview is less than $450," the team wrote in a blog post, "indicating that advanced reasoning capabilities can be replicated cost-effectively." Until recently, training a model of comparable performance often cost millions of dollars; the dramatic cost reduction comes primarily from synthetic training data, that is, data generated by other models. For example, the AI company Writer recently released Palmyra X004, a model trained almost entirely on synthetic data, for a development cost of only $700,000.
Reasoning models differ from ordinary AI models in that they can effectively fact-check their own work, avoiding some common pitfalls. The trade-off is that reasoning models take longer to reach a solution, typically anywhere from a few seconds to several minutes. In return, they tend to be more reliable in domains such as physics, science, and mathematics.
The NovaSky team says it generated the initial training data for Sky-T1 with Alibaba's QwQ-32B-Preview reasoning model, then "cleaned" the data and used OpenAI's GPT-4o-mini to restructure it into a more usable format. Training the 32-billion-parameter Sky-T1 on a rack of eight Nvidia H100 GPUs took about 19 hours; parameter count roughly corresponds to a model's problem-solving capability.
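The pipeline described above, sampling reasoning traces from a teacher model, filtering out bad ones, and rewriting the survivors into a uniform training format, can be sketched roughly as follows. This is a hypothetical illustration, not the NovaSky team's actual code: the model calls (QwQ-32B-Preview for generation, GPT-4o-mini for rewriting) are stubbed out, and the function names (`generate_trace`, `is_clean`, `restructure`) are invented for the example.

```python
# Hypothetical sketch of the three-stage data pipeline described above.
# Real model calls are stubbed out; only the pipeline shape is shown.

def generate_trace(problem):
    # Stand-in for sampling a reasoning trace from the teacher model
    # (QwQ-32B-Preview in the article). The stub echoes the known answer.
    return {"question": problem["question"],
            "reasoning": "Step-by-step work for: " + problem["question"],
            "answer": problem["expected"]}

def is_clean(trace, problem):
    # "Cleaning": keep only traces whose final answer matches the
    # reference answer -- a simple rejection-style filter.
    return trace["answer"] == problem["expected"]

def restructure(trace):
    # Stand-in for the rewriting pass (GPT-4o-mini in the article):
    # emit a uniform instruction/response pair for supervised fine-tuning.
    return {"instruction": trace["question"],
            "response": trace["reasoning"]
                        + "\nFinal answer: " + str(trace["answer"])}

problems = [{"question": "What is 2 + 2?", "expected": 4}]
dataset = []
for p in problems:
    trace = generate_trace(p)
    if is_clean(trace, p):
        dataset.append(restructure(trace))

print(len(dataset))  # -> 1
```

In a real run, the generation step would be sampled many times per problem and the filter would discard most candidates; only the cleaned, restructured pairs would go into the fine-tuning set.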
In benchmark testing, Sky-T1 outperformed the early preview version of o1 on MATH500, a set of "competition-level" math challenges, and also beat it on a set of problems from LiveCodeBench, a coding evaluation. However, Sky-T1 fell short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry questions a PhD graduate would be expected to know. It is also worth noting that the GA (generally available) release of OpenAI's o1 is more powerful than the preview version, and OpenAI is expected to release an even more capable reasoning model, o3, in the coming weeks.
Nevertheless, the NovaSky team says Sky-T1 is only the starting point of its effort to develop open-source models with advanced reasoning capabilities. "Moving forward, we will focus on developing more efficient models that maintain strong reasoning performance, and on exploring advanced techniques that further improve the models' efficiency and accuracy at test time," the team wrote in the post. "Stay tuned for our progress on these exciting plans." The arrival of this open-source reasoning model brings new opportunities and challenges to the field of artificial intelligence, and its further development will be worth watching.