ReFT (Reinforced Fine-Tuning) is a simple yet effective method for enhancing the reasoning capabilities of large language models (LLMs). It first warms up the model with supervised fine-tuning (SFT), then further fine-tunes it with online reinforcement learning, specifically the PPO algorithm employed in this paper. During this second stage, ReFT automatically samples a large number of reasoning paths for each question and derives rewards directly from the ground-truth answers, which allows it to significantly outperform SFT. ReFT's performance can be improved further by combining it with inference-time strategies such as majority voting and re-ranking. Notably, ReFT achieves these improvements by learning from the same training questions as SFT, without relying on additional or augmented training questions, which indicates stronger generalization ability.
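To make the reward and voting ideas concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's implementation: the helper names (`extract_answer`, `terminal_reward`, `majority_vote`) are hypothetical, the final answer is assumed to be the last number appearing in a sampled reasoning path, and the reward is assumed to be a plain 0/1 signal based on whether that number matches the true answer.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Return the last number in the text, or None if no number is present.

    Assumption: the final answer is the last number in the reasoning path.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", reasoning_path)
    return matches[-1] if matches else None


def terminal_reward(reasoning_path: str, true_answer: str) -> float:
    """Assumed 0/1 reward: 1.0 if the extracted answer equals the true answer."""
    predicted = extract_answer(reasoning_path)
    if predicted is None:
        return 0.0
    return 1.0 if float(predicted) == float(true_answer) else 0.0


def majority_vote(reasoning_paths: List[str]) -> Optional[str]:
    """Return the most frequent extracted answer among the sampled paths."""
    answers = [a for a in map(extract_answer, reasoning_paths) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Three hypothetical reasoning paths sampled for the question "What is 3 + 4?"
sampled_paths = [
    "3 + 4 = 7. The answer is 7",
    "3 * 4 = 12. The answer is 12",
    "Adding 4 to 3 gives 7",
]
rewards = [terminal_reward(path, "7") for path in sampled_paths]
voted_answer = majority_vote(sampled_paths)
```

In this sketch no per-path annotation is needed: every sampled path is scored against the single true answer of its question, which is why the same training questions used for SFT suffice. At inference time, `majority_vote` picks the answer that most sampled paths agree on.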