ReFT
ReFT enhances the reasoning ability of LLM
CommonProductProductivityArtificial IntelligenceReasoning
ReFT is a simple yet effective method for enhancing the reasoning capabilities of large language models (LLMs). It first preheats the model through supervised fine-tuning (SFT), and then further fine-tunes the model using online reinforcement learning, specifically the PPO algorithm presented in this paper. ReFT significantly outperforms SFT by automatically sampling a large number of reasoning paths for a given problem and naturally deriving rewards from the true answers. ReFT's performance can be further improved by combining reasoning strategies (such as majority voting and re-ranking). It's noteworthy that ReFT achieves improvements by learning from the same training questions as SFT, without relying on additional or enhanced training questions. This demonstrates ReFT's stronger generalization ability.
ReFT Visit Over Time
Monthly Visits
17788201
Bounce Rate
44.87%
Page per Visit
5.4
Visit Duration
00:05:32