2025-03-14 10:07:07 · AIbase
360 ZhiNao Team Successfully Replicates DeepSeek Reinforcement Learning Results, Releases Open-Source Model Light-R1-14B-DS
Recently, the 360 ZhiNao team announced that it has replicated DeepSeek's reinforcement learning results and officially released the open-source reasoning model Light-R1-14B-DS. According to the team, the model outperforms DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B, making it the industry's first 14B-parameter model to show gains from reinforcement learning. Its mathematical reasoning improves significantly, and it outperforms most 32B-parameter models.