Recently, the NovaSky team at the Sky Computing Lab of the University of California, Berkeley, released Sky-T1-32B-Preview, an open-source reasoning model that makes developing reasoning AI easier and more affordable. The model performs well on several key benchmarks, rivaling an earlier version of OpenAI's o1.
Sky-T1 cost only about $450 to train, which shows that high-level reasoning capabilities can now be replicated cheaply and efficiently. Although $450 is not a trivial sum, it is a steep drop from the millions of dollars that training such a model often cost just a few years ago. Much of the saving comes from synthetic training data, that is, data generated by other models. The AI company Writer, for example, recently released Palmyra X004, which was trained almost entirely on synthetic data, at a development cost of only $700,000.
Unlike most AI models, reasoning models effectively check their own work, which helps them avoid common pitfalls and makes their answers more reliable. They typically take longer to reach a solution, from a few seconds to a few minutes, but the gain in reliability is significant in fields such as physics, science, and mathematics.
The NovaSky team used another reasoning model, Alibaba's QwQ-32B-Preview, to generate the initial training data for Sky-T1. The team then curated the data and used OpenAI's GPT-4o-mini to restructure it into a more workable format. Training the 32-billion-parameter Sky-T1 took about 19 hours on a set of 8 Nvidia H100 GPUs. Parameter count roughly corresponds to a model's problem-solving ability.
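To put the reported figures in perspective, the short Python sketch below works out the GPU time behind the training run and the hourly rate it would imply. It assumes, purely for illustration, that the entire $450 went to renting the 8 GPUs for the 19-hour run; the article does not break the cost down, and the function name `implied_gpu_hour_rate` is hypothetical.

```python
def implied_gpu_hour_rate(total_cost_usd: float, num_gpus: int, wall_clock_hours: float) -> float:
    """Return the cost per GPU-hour implied by a total cost and a run size.

    Assumption (not stated in the article): the whole cost is GPU rental
    for this single training run.
    """
    gpu_hours = num_gpus * wall_clock_hours
    return total_cost_usd / gpu_hours


# Figures reported in the article: 8 H100 GPUs, about 19 hours, about $450.
gpu_hours = 8 * 19
rate = implied_gpu_hour_rate(450.0, 8, 19.0)

print(f"{gpu_hours} GPU-hours, about ${rate:.2f} per GPU-hour")
# prints "152 GPU-hours, about $2.96 per GPU-hour"
```

Under that assumption, the run amounts to roughly 152 GPU-hours at a little under $3 per GPU-hour.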
According to the NovaSky team, Sky-T1 outperformed an earlier preview version of o1 on MATH500, a collection of "competition-level" math challenges. Sky-T1 also beat the o1 preview on a set of difficult problems from LiveCodeBench, a coding benchmark. On GPQA-Diamond, however, which covers questions in fields such as physics, biology, and chemistry, Sky-T1 fell short of the o1 preview.
It is worth noting that OpenAI's generally available (GA) version of o1 is more powerful than the preview version, and OpenAI is expected to release an even more advanced reasoning model, o3, in the coming weeks. Nevertheless, the NovaSky team says that Sky-T1 is only the beginning of its effort to develop open-source models with advanced reasoning capabilities.
"Looking ahead, we will focus on developing more efficient models that maintain strong reasoning performance and explore advanced techniques to further enhance model efficiency and accuracy," the team wrote in a blog post. "Please stay tuned for our progress on these exciting projects."