Light-R1 is an open-source project from Qihoo360 for training long-chain reasoning models through a staged pipeline: curriculum-style supervised fine-tuning (SFT), then direct preference optimization (DPO), then reinforcement learning (RL). It builds long-chain reasoning capabilities from scratch, using decontaminated datasets and efficient training methods. Its main advantages are openly released training data, low training cost, and strong performance in mathematical reasoning. The project responds to the current demand for training long-chain reasoning models and aims to provide a transparent, reproducible training method. It is free and open-source, and is suited to research institutions and developers.