Recently, the 360 ZhiNao team announced that it had reproduced DeepSeek's reinforcement learning gains and officially released the open-source reasoning model Light-R1-14B-DS. The model outperforms DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B, making it, per the team, the first 14B-parameter model to successfully apply reinforcement learning at this scale. Its mathematical reasoning is substantially improved, surpassing most 32B-level models.
Compared to DeepSeek-R1-14B, Light-R1-14B-DS excels on mathematical competition tasks: a 4.3-point improvement on the AIME24 test and a notable 10-point improvement on AIME25. It also scored 61.7 on the GPQA reasoning benchmark.
To achieve this, the 360 ZhiNao team employed two training methods. The first is Curriculum SFT (curriculum supervised fine-tuning), a staged training approach in which the model progresses from simpler to more difficult mathematical problems, strengthening its logical reasoning step by step. The second is reinforcement learning (RL), applied successfully for the first time to a 14B-scale reasoning model, improving reasoning accuracy while largely preserving the model's other skills.
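The staged idea behind curriculum SFT can be sketched as follows. This is a minimal illustration, not the team's published recipe: the `difficulty` scores, the two-stage split, and the `model_update` callback are all hypothetical stand-ins for however difficulty is estimated and fine-tuning is actually run.

```python
# Hypothetical sketch of curriculum SFT: partition training examples into
# difficulty tiers, then fine-tune on each tier in order, easy to hard.
# The difficulty scores and two-stage split below are illustrative
# assumptions, not the 360 ZhiNao team's actual pipeline.

def build_curriculum(examples, num_stages=2):
    """Sort examples by a difficulty score and split them into stages."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    stage_size = (len(ordered) + num_stages - 1) // num_stages
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

def curriculum_sft(model_update, examples, num_stages=2):
    """Run SFT stage by stage; `model_update` stands in for one fine-tuning pass."""
    for stage, batch in enumerate(build_curriculum(examples, num_stages), start=1):
        model_update(stage, batch)  # e.g. one trainer pass over this subset

# Toy data: difficulty could come from pass rates of a baseline model.
problems = [
    {"id": "a", "difficulty": 0.9},  # hard competition problem
    {"id": "b", "difficulty": 0.2},  # easy warm-up problem
    {"id": "c", "difficulty": 0.6},
]
stages = build_curriculum(problems, num_stages=2)
# Stage 1 holds the easier problems; stage 2 holds the hardest.
```

The point of the staging is that the model first consolidates basic problem-solving patterns before the harder competition-style problems are introduced.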
Alongside the model itself, the release open-sources the SFT data, training code, and a technical report, providing valuable resources for the community. The result marks meaningful progress in applying reinforcement learning to smaller models and may further promote the adoption and development of AI reasoning capabilities.
Project Address: https://github.com/Qihoo360/Light-R1
Model Address: https://huggingface.co/qihoo360/Light-R1-14B-DS
Data Address: https://huggingface.co/datasets/qihoo360/Light-R1-SFTData