On April 13th, Kunlun Wanwei's Tiangong team announced the launch of its upgraded Skywork-OR1 (Open Reasoner 1) series of models, following the February 2025 release of Skywork-o1, the team's first Chinese large language model for logical reasoning. The new series achieves industry-leading reasoning performance among models of comparable parameter scale, pushing the boundaries of large language models' capabilities in logical understanding and complex problem-solving.
The open-sourced Skywork-OR1 series includes three high-performance models: Skywork-OR1-Math-7B, a specialized model focusing on mathematics with strong coding capabilities; Skywork-OR1-7B-Preview, which combines mathematical and coding abilities, offering both general-purpose and specialized functionalities; and Skywork-OR1-32B-Preview, a flagship version designed for more complex tasks and featuring superior reasoning abilities.
In competitive programming tasks, the general-purpose models Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview achieved the best performance on the LiveCodeBench dataset among models of their respective parameter scales. Skywork-OR1-32B-Preview stood out in particular: its code generation and problem-solving capabilities approach those of DeepSeek-R1 (a 671B-parameter model) at a fraction of the model size, a level of cost-effectiveness that reflects the Tiangong team's advanced training strategies.
The performance breakthroughs of the Skywork-OR1 series are attributable to the Tiangong team's long-term, self-developed advancements and technical expertise in the post-training phase. For data selection and preprocessing, Skywork-OR1 constructed a high-quality mathematics and code dataset for reinforcement learning to strengthen the models' reasoning in these areas. The team applied three criteria for initial screening: verifiability, correctness, and difficulty. This eliminated proof-based problems that could not be automatically verified, problems with incorrect answers, and code problems lacking unit tests. To avoid the inefficiency of "all correct" or "all incorrect" groups in policy learning, each problem underwent multiple rounds of sampling and answer verification, and problems at either extreme of difficulty were filtered out based on model performance, as sketched below.
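A minimal sketch of this kind of pass-rate-based filtering, assuming binary correctness rewards and sampling from the current model; the function names and signatures here are illustrative placeholders, not taken from the Skywork-OR1 codebase:

```python
# Hypothetical sketch: keep only problems whose sampled pass rate is strictly
# between 0 and 1, discarding items the model always or never solves.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Problem:
    prompt: str
    reference_answer: str

def filter_by_pass_rate(
    problems: list[Problem],
    sample_answers: Callable[[str, int], list[str]],  # prompt, n -> candidate answers
    verify: Callable[[str, str], bool],               # candidate, reference -> correct?
    n_samples: int = 16,
) -> list[Problem]:
    kept = []
    for p in problems:
        candidates = sample_answers(p.prompt, n_samples)
        pass_rate = sum(verify(c, p.reference_answer) for c in candidates) / n_samples
        # Groups where every sample is correct or every sample is wrong provide
        # no learning signal under group-relative policy learning, so drop them.
        if 0.0 < pass_rate < 1.0:
            kept.append(p)
    return kept
```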
Furthermore, Skywork-OR1 used Group Relative Policy Optimization (GRPO) for model training and incorporated several optimizations covering the training data, the training pipeline, model exploration during training, and the training loss.
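For illustration, below is a minimal sketch of GRPO's group-relative advantage normalization as commonly described in the public GRPO literature; it assumes the standard formulation (per-group mean/std normalization of rewards) and is not the Skywork-OR1 implementation or its additional optimizations:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards, one per sampled response.
    Returns advantages normalized within each group of responses to the same prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each, binary correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [1.0, 1.0, 0.0, 1.0]])
advantages = group_relative_advantages(rewards)
```

The normalized advantages would then weight a clipped policy-gradient loss over the sampled tokens, in the same spirit as PPO but without a separate value model.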
Skywork-OR1 Series Open Source Address: https://github.com/SkyworkAI/Skywork-OR1