Today, DeepSeek, a leading Chinese AI company, officially unveiled the fourth day's results of its open-source initiative, Optimized Parallelism Strategies, highlighting the DualPipe bidirectional pipeline parallelism algorithm, the Expert Parallel Load Balancer (EPLB), and deep optimizations to the computation-communication overlap mechanism. The release directly addresses core pain points of large-scale language model training and offers a new approach to running clusters of more than 10,000 GPUs efficiently.


1. DualPipe: Bidirectional Pipeline Parallel Algorithm

As one of the core technologies in this release, DualPipe is designed specifically for the V3/R1 architecture. Through an innovative bidirectional pipeline, in which micro-batches are fed into the pipeline from both ends, it achieves a high degree of overlap between computation and communication. Compared with a traditional unidirectional pipeline, this significantly improves computational throughput and is especially suitable for training models with hundreds of billions to trillions of parameters. According to the GitHub repository, DualPipe's scheduling mechanism executes forward computation concurrently with the backpropagation phase, raising hardware utilization by approximately 30%.

(Project link: https://github.com/deepseek-ai/DualPipe).
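For readers who want a concrete picture of the scheduling idea, the toy sketch below enumerates a simplified bidirectional pipeline schedule in which micro-batches enter from both ends and each steady-state slot pairs a forward chunk of one direction with a backward chunk of the other. The function name, the schedule format, and the fixed interleaving are illustrative assumptions for exposition only, not DeepSeek's DualPipe implementation.

```python
# Toy sketch of a bidirectional pipeline schedule (NOT DualPipe itself).
def bidirectional_schedule(num_stages: int, num_microbatches: int):
    """Build a simplified per-stage schedule for a pipeline fed from both
    ends ("down" micro-batches enter at stage 0, "up" ones at the last
    stage). Each steady-state slot pairs a forward chunk of one direction
    with a backward chunk of the other, which is where the computation/
    communication overlap described above comes from."""
    slots = {s: [] for s in range(num_stages)}
    for s in range(num_stages):
        for m in range(num_microbatches):
            # Forward of one direction runs alongside backward of the other.
            slots[s].append((("F", "down", m), ("B", "up", m)))
            slots[s].append((("F", "up", m), ("B", "down", m)))
    return slots

if __name__ == "__main__":
    sched = bidirectional_schedule(num_stages=4, num_microbatches=2)
    for stage, pairs in sched.items():
        pretty = ["{}+{}".format("".join(map(str, a)), "".join(map(str, b)))
                  for a, b in pairs]
        print(f"stage {stage}: " + "  ".join(pretty))
```

In a real system the two chunks in each pair run concurrently, so the communication of one direction is hidden behind the computation of the other; the sketch only shows how the pairing is laid out per stage.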

2. EPLB: Dynamic Load Balancer

Addressing the persistent "hot expert" problem in Mixture-of-Experts (MoE) model training, EPLB achieves dynamic load balancing for expert parallelism for the first time. Traditional approaches often overload some GPUs because expert workloads are distributed unevenly. Through real-time monitoring and adaptive allocation, EPLB raises the overall utilization of a 10,000-GPU cluster to over 92%, effectively avoiding idle resources (project link: https://github.com/deepseek-ai/EPLB).
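To illustrate the kind of rebalancing involved, here is a minimal greedy sketch that spreads experts across GPUs based on measured per-expert load. The input format (per-expert token counts), the function name, and the longest-processing-time greedy strategy are assumptions for illustration; they are not the algorithm used by the EPLB repository.

```python
# Minimal greedy sketch of expert load balancing (NOT the EPLB algorithm).
from heapq import heapify, heappush, heappop

def balance_experts(expert_load: list[int], num_gpus: int) -> list[list[int]]:
    """Assign experts to GPUs so the heaviest-loaded GPU carries as little
    load as possible: sort experts by load, then always place the next
    expert on the currently lightest GPU (longest-processing-time greedy)."""
    heap = [(0, gpu) for gpu in range(num_gpus)]   # (current load, gpu id)
    heapify(heap)
    placement = [[] for _ in range(num_gpus)]
    for expert_id in sorted(range(len(expert_load)),
                            key=lambda e: expert_load[e], reverse=True):
        load, gpu = heappop(heap)
        placement[gpu].append(expert_id)
        heappush(heap, (load + expert_load[expert_id], gpu))
    return placement

if __name__ == "__main__":
    # Skewed load with a few "hot" experts, as in the MoE scenario above.
    loads = [900, 850, 120, 110, 100, 90, 80, 70]
    for gpu, experts in enumerate(balance_experts(loads, num_gpus=4)):
        total = sum(loads[e] for e in experts)
        print(f"GPU {gpu}: experts {experts}, total load {total}")
```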

3. Computation-Communication Overlap Optimization

Using a communication-overlap analysis tool for the V3/R1 architecture, DeepSeek has built a spatio-temporal efficiency model for 3D parallelism (data, pipeline, and tensor parallelism) for the first time. With the open-source profiling dataset (link: https://github.com/deepseek-ai/profile-data), developers can pinpoint conflict points between computation and communication, providing a tuning benchmark for ultra-large-scale model training. Tests show a reduction of approximately 15% in end-to-end training time.
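As a rough idea of how such timelines are gathered, the sketch below runs one training step under the PyTorch profiler and exports a Chrome-format trace, the same kind of file the profile-data repository publishes for inspection in a tracing UI. The function name, the toy model, and the output file name are placeholders and do not represent DeepSeek's actual tooling.

```python
# Hedged sketch of collecting a compute/communication timeline with the
# standard PyTorch profiler (placeholders only, not DeepSeek's tooling).
import torch
from torch.profiler import profile, ProfilerActivity

def profile_step(model, batch, optimizer, trace_path="step_trace.json"):
    """Run one training step under the profiler and export a Chrome trace,
    so compute kernels and (in a distributed run) NCCL communication
    kernels can be inspected on a timeline to find overlap or conflicts."""
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)   # capture GPU kernels too
    with profile(activities=activities, record_shapes=True) as prof:
        loss = model(batch).sum()      # placeholder forward pass and loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    prof.export_chrome_trace(trace_path)

if __name__ == "__main__":
    model = torch.nn.Linear(1024, 1024)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    profile_step(model, torch.randn(64, 1024), optimizer)
```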

Industry Impact: Breaking Through Bottlenecks in Large Model Training

This technology release has garnered significant industry attention. Experts point out that the combined innovation of DualPipe and EPLB directly addresses two major challenges in current large model training: first, as model sizes grow exponentially, the scalability bottleneck of traditional parallel strategies becomes increasingly prominent; second, the spread of Mixture-of-Experts models makes dynamic load balancing a necessity. A technical lead at a cloud computing vendor commented: "These tools will significantly lower the hardware threshold for training hundred-billion-parameter models and are expected to cut training costs by 20%-30%."

DeepSeek's CTO emphasized in the technical documentation that these open-sourced strategies have been validated in the company's internal training of multiple hundred-billion-parameter models and will continue to be iteratively optimized. All three technologies are now open-sourced on GitHub, and developers can adapt them to different hardware environments.

As the global AI competition enters the "scale wins" phase, DeepSeek, through four consecutive days of key technology open-sourcing, not only demonstrates the technological strength of Chinese AI companies but also provides reusable infrastructure for the industry. This technological innovation, driven by "open collaboration," may reshape the industrial ecosystem of large model training.