Recently, the 360 ZhiNao team announced that it had reproduced DeepSeek's reinforcement learning gains and officially released the open-source reasoning model Light-R1-14B-DS. The model outperforms DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B, making it, per the team, the first 14B-parameter model to successfully apply reinforcement learning at this scale. Its mathematical reasoning is substantially improved, surpassing most 32B-level models.
Compared to DeepSeek-R1-14B, Light-R1-14B-DS excels on mathematical competition tasks: a 4.3-point improvement on the AIME24 test and a notable 10-point improvement on AIME25. It also scored 61.7 on the GPQA reasoning benchmark.
To achieve this, the 360 ZhiNao team employed two training methods. The first is Curriculum SFT (curriculum supervised fine-tuning), a staged training approach in which the model progresses from simpler to more difficult mathematical problems, strengthening its logical reasoning step by step. The second is reinforcement learning (RL), applied successfully for the first time to a 14B-scale reasoning model, improving reasoning accuracy while largely preserving the model's other skills.
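The staged idea behind curriculum SFT can be sketched as follows. This is a minimal illustration, not the team's published recipe: the `difficulty` scores, the two-stage split, and the `model_update` callback are all hypothetical stand-ins for however difficulty is estimated and fine-tuning is actually run.

```python
# Hypothetical sketch of curriculum SFT: partition training examples into
# difficulty tiers, then fine-tune on each tier in order, easy to hard.
# The difficulty scores and two-stage split below are illustrative
# assumptions, not the 360 ZhiNao team's actual pipeline.

def build_curriculum(examples, num_stages=2):
    """Sort examples by a difficulty score and split them into stages."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    stage_size = (len(ordered) + num_stages - 1) // num_stages
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

def curriculum_sft(model_update, examples, num_stages=2):
    """Run SFT stage by stage; `model_update` stands in for one fine-tuning pass."""
    for stage, batch in enumerate(build_curriculum(examples, num_stages), start=1):
        model_update(stage, batch)  # e.g. one trainer pass over this subset

# Toy data: difficulty could come from pass rates of a baseline model.
problems = [
    {"id": "a", "difficulty": 0.9},  # hard competition problem
    {"id": "b", "difficulty": 0.2},  # easy warm-up problem
    {"id": "c", "difficulty": 0.6},
]
stages = build_curriculum(problems, num_stages=2)
# Stage 1 holds the easier problems; stage 2 holds the hardest.
```

The point of the staging is that the model first consolidates basic problem-solving patterns before the harder competition-style problems are introduced.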
Alongside the model itself, the release open-sources the SFT data, training code, and a technical report, providing valuable resources for the community. The result marks meaningful progress in applying reinforcement learning to smaller models and may further promote the adoption and development of AI reasoning capabilities.
Project Address: https://github.com/Qihoo360/Light-R1
Model Address: https://huggingface.co/qihoo360/Light-R1-14B-DS
Data Address: https://huggingface.co/datasets/qihoo360/Light-R1-SFTData