Large language models (LLMs) such as GPT and Llama have revolutionized artificial intelligence, but efficiently training these massive models while aligning them with human values remains a challenge.

Reinforcement Learning from Human Feedback (RLHF) has become a widely adopted method for training and aligning LLMs in recent years, but traditional RLHF frameworks face limitations in flexibility, efficiency, and scalability.


To address these issues, ByteDance's Doubao Large Model team has open-sourced a new RLHF framework called HybridFlow, which introduces new possibilities for LLM training.

RLHF typically involves three stages:

First, the actor model generates text based on the input prompt; then, the critic model, reference model, and reward model evaluate the generated text and compute the corresponding value estimates, reference probabilities, and reward scores;


Finally, these evaluation results are used to train the actor model to produce text that aligns more closely with human preferences. Traditional RLHF frameworks often use a single controller to manage the entire data flow, which is inefficient for LLMs requiring distributed computing.
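
To make this data flow concrete, here is a minimal, framework-agnostic Python sketch of one PPO-style RLHF step. The model classes and scalar scores below are toy stand-ins, not HybridFlow code; a real system operates on batched token sequences distributed across many devices.

```python
import random

# Toy stand-ins for the four RLHF roles; real systems use large neural networks
# and batched, distributed computation. All names here are illustrative only.
class ActorModel:
    def generate(self, prompt: str) -> str:
        return prompt + " -> generated response"   # stand-in for autoregressive sampling

    def log_prob(self, prompt: str, response: str) -> float:
        return random.uniform(-2.0, 0.0)           # log-prob under the current policy

    def update(self, prompt: str, response: str, advantage: float) -> None:
        # Stand-in for a policy-gradient (e.g., PPO) update step.
        print(f"updating actor with advantage {advantage:+.3f}")

class CriticModel:
    def value(self, prompt: str, response: str) -> float:
        return random.uniform(0.0, 1.0)            # estimated value (baseline)

class ReferenceModel:
    def log_prob(self, prompt: str, response: str) -> float:
        return random.uniform(-2.0, 0.0)           # log-prob under the frozen reference policy

class RewardModel:
    def score(self, prompt: str, response: str) -> float:
        return random.uniform(0.0, 1.0)            # human-preference reward

def rlhf_step(prompt, actor, critic, reference, reward_model, kl_coef=0.1):
    # Stage 1: the actor generates a response for the prompt.
    response = actor.generate(prompt)

    # Stage 2: critic, reference, and reward models evaluate the response.
    value = critic.value(prompt, response)
    kl_penalty = kl_coef * (actor.log_prob(prompt, response)
                            - reference.log_prob(prompt, response))
    reward = reward_model.score(prompt, response)

    # Stage 3: the evaluation results are combined into an advantage signal
    # (KL-penalized reward minus the critic's baseline) that trains the actor.
    advantage = (reward - kl_penalty) - value
    actor.update(prompt, response, advantage)

if __name__ == "__main__":
    rlhf_step("Explain RLHF in one sentence.",
              ActorModel(), CriticModel(), ReferenceModel(), RewardModel())
```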

The HybridFlow framework innovatively combines single-controller and multi-controller modes and decouples complex computation and data dependencies through a hierarchical API design, enabling flexible representation and efficient execution of the RLHF data flow.
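
This hybrid programming model can be pictured roughly as follows: a single-controller script expresses the high-level data flow, while each logical call fans out to a group of workers that execute the same operation in SPMD fashion, as a multi-controller engine would. The `Worker` and `WorkerGroup` names in this sketch are hypothetical illustrations, not the framework's actual API.

```python
class Worker:
    """One SPMD worker; in a real system it holds a shard of a model on one GPU."""
    def __init__(self, rank: int):
        self.rank = rank

    def run(self, op: str, shard):
        # Every worker in a group runs the same operation on its own data shard.
        return f"[rank {self.rank}] {op}({shard})"

class WorkerGroup:
    """Single-controller handle over a set of SPMD workers for one model role."""
    def __init__(self, role: str, world_size: int):
        self.role = role
        self.workers = [Worker(r) for r in range(world_size)]

    def call(self, op: str, batch):
        # Scatter the batch, run the op on every worker, gather the results.
        n = len(self.workers)
        shards = [batch[r::n] for r in range(n)]
        return [w.run(op, shard) for w, shard in zip(self.workers, shards)]

if __name__ == "__main__":
    # The controller script reads like single-process code, but each line
    # dispatches a distributed computation to one model's worker group.
    actor = WorkerGroup("actor", world_size=2)
    critic = WorkerGroup("critic", world_size=2)

    prompts = ["p0", "p1", "p2", "p3"]
    responses = actor.call("generate", prompts)
    values = critic.call("compute_values", responses)
    print(values)
```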


The advantages of HybridFlow are primarily reflected in the following three aspects:

Flexible support for various RLHF algorithms and models: HybridFlow offers modular APIs, allowing users to easily implement and extend various RLHF algorithms, such as PPO, ReMax, and Safe-RLHF (a sketch of this idea follows the list below).

Efficient model weight reorganization: The 3D-HybridEngine component efficiently reshards the actor model's weights between the training and generation phases, minimizing memory redundancy and communication overhead (also sketched after this list).

Automated model deployment and parallel strategy selection: The Auto Mapping component automatically maps models to different devices based on model load and data dependencies, and selects the optimal parallel strategy, thereby simplifying the model deployment process and enhancing training efficiency.
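
As an illustration of the first point, the sketch below shows how different RLHF algorithms could reuse the same generation and update primitives while differing only in how the advantage is computed: a PPO-style step uses a learned critic as the baseline, whereas a ReMax-style step uses the reward of a greedy rollout instead and needs no critic. The helper functions are hypothetical stand-ins, not HybridFlow's actual API.

```python
import random

# Hypothetical stand-ins for distributed model calls; in the real framework
# these would be dispatched to worker groups as in the earlier sketch.
def generate(prompt, greedy=False):
    return prompt + (" greedy response" if greedy else " sampled response")

def reward(response):        return random.uniform(0.0, 1.0)
def critic_value(response):  return random.uniform(0.0, 1.0)
def update_actor(response, advantage):
    print(f"advantage {advantage:+.3f} for {response!r}")

def ppo_step(prompt):
    # PPO-style: a learned critic provides the baseline for the advantage.
    response = generate(prompt)
    advantage = reward(response) - critic_value(response)
    update_actor(response, advantage)

def remax_step(prompt):
    # ReMax-style: the reward of a greedy rollout replaces the critic baseline,
    # so no critic model needs to be trained or deployed.
    response = generate(prompt)
    baseline = reward(generate(prompt, greedy=True))
    advantage = reward(response) - baseline
    update_actor(response, advantage)

if __name__ == "__main__":
    ppo_step("Explain RLHF briefly.")
    remax_step("Explain RLHF briefly.")
```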
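
For the second point, the following toy NumPy example illustrates the general idea of resharding weights between a training layout and a generation layout by regrouping existing shards rather than materializing a full copy per device. It is a simplified illustration only; the actual 3D-HybridEngine reshards across full 3D parallel configurations with communication-minimizing collectives.

```python
import numpy as np

# Toy illustration (not the 3D-HybridEngine implementation): move a weight
# matrix from a 4-way model-parallel training layout to a 2-way layout used
# for generation, by regrouping shards instead of gathering the full weight.
full_weight = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)

# Training phase: weight rows are split across 4 model-parallel ranks.
train_shards = np.split(full_weight, 4, axis=0)   # 4 shards of shape (2, 4)

# Generation phase: only 2-way parallelism is used, so each generation rank
# owns the concatenation of two training shards.
gen_shards = [np.concatenate(train_shards[2 * r: 2 * r + 2], axis=0)
              for r in range(2)]                   # 2 shards of shape (4, 4)

# The regrouped shards still reconstruct the original weight exactly.
assert np.array_equal(np.concatenate(gen_shards, axis=0), full_weight)
print([s.shape for s in gen_shards])
```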

Experimental results show that HybridFlow significantly improves throughput across various RLHF algorithms, by up to 20.57 times compared with baseline frameworks. The open-source release of HybridFlow provides a powerful tool for RLHF research and development, driving further advances in LLM technology.

Paper link: https://arxiv.org/pdf/2409.19256