This model improves the reasoning capabilities of diffusion large language models through reinforcement learning and masked self-supervised fine-tuning with high-quality reasoning trajectories. The importance of this technology lies in its ability to optimize the model's reasoning process, reduce computational costs, while ensuring the stability of learning dynamics. Suitable for users who want to improve efficiency in writing and reasoning tasks.