DeepSeek has announced the open-source release for day two of its open-source week: DeepEP, the first open-source EP (expert parallelism) communication library for Mixture-of-Experts (MoE) models, supporting full-stack optimization of MoE model training and inference.

DeepEP is a highly efficient communication library designed specifically for Mixture-of-Experts (MoE) models and expert parallelism (EP). It provides high-throughput, low-latency all-to-all GPU communication kernels, commonly known as MoE dispatch and combine.
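To make the terms concrete, the following single-GPU PyTorch sketch shows what "dispatch" and "combine" mean logically for one batch of tokens. It is purely illustrative and is not DeepEP's API; DeepEP implements these two exchanges as optimized all-to-all kernels across GPUs over NVLink and RDMA.

```python
import torch

num_tokens, hidden, num_experts, top_k = 16, 32, 4, 2
x = torch.randn(num_tokens, hidden)
gate_logits = torch.randn(num_tokens, num_experts)

# Gating: each token picks its top-k experts; weights are renormalized.
weights, expert_ids = gate_logits.softmax(dim=-1).topk(top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)

# "Dispatch": group token copies by destination expert (in a real EP setup
# this is an all-to-all exchange between GPUs holding different experts).
expert_inputs = [x[(expert_ids == e).any(dim=-1)] for e in range(num_experts)]

# Each expert runs its own FFN; an identity stands in here for brevity.
expert_outputs = [inp.clone() for inp in expert_inputs]

# "Combine": route expert outputs back to token order, weighted by the gate.
y = torch.zeros_like(x)
for e in range(num_experts):
    token_idx, slot_idx = (expert_ids == e).nonzero(as_tuple=True)
    y[token_idx] += weights[token_idx, slot_idx, None] * expert_outputs[e]
```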


DeepEP not only supports low-precision operations such as FP8 but also aligns with the group-limited gating algorithm proposed in the DeepSeek-V3 paper, with kernels optimized for asymmetric-domain bandwidth forwarding, such as forwarding data from the NVLink domain to the RDMA domain. These kernels deliver high throughput, making them well suited to prefilling in both training and inference, and they allow control over the number of streaming multiprocessors (SMs) used.
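For reference, the sketch below shows the idea behind group-limited gating: experts are partitioned into groups (for example, one group per node), only the best-scoring groups survive, and the final top-k experts are chosen from those groups, which bounds how many nodes a token's communication can touch. Names, shapes, and the per-group ranking rule (best expert score per group) are illustrative, not the exact formulation in the DeepSeek-V3 paper.

```python
import torch

num_tokens, num_experts, num_groups, top_groups, top_k = 8, 16, 4, 2, 4
scores = torch.rand(num_tokens, num_experts)             # per-expert gate scores
grouped = scores.view(num_tokens, num_groups, -1)        # (tokens, groups, experts per group)

# Rank groups by their best expert score and keep only the top `top_groups`.
group_scores = grouped.max(dim=-1).values                # (tokens, groups)
keep = group_scores.topk(top_groups, dim=-1).indices     # (tokens, top_groups)
group_mask = torch.zeros_like(group_scores).scatter_(-1, keep, 1.0)

# Zero out experts in dropped groups, then take the global top-k.
masked = (grouped * group_mask.unsqueeze(-1)).view(num_tokens, num_experts)
topk_scores, topk_experts = masked.topk(top_k, dim=-1)
```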


For latency-sensitive inference decoding, DeepEP also provides a set of low-latency kernels that use pure RDMA to minimize delay. It further introduces a hook-based communication-computation overlap method that does not occupy any SM resources.
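The sketch below conveys the hook-based overlap idea in plain Python: the "dispatch" starts in the background and returns a hook, other computation (for example, another micro-batch) proceeds in the meantime, and the hook is only invoked when the transferred data is actually needed. The names and the thread-based transfer are stand-ins for illustration, not DeepEP's API or its RDMA mechanism.

```python
import threading
import time

def async_dispatch(tokens):
    """Start a simulated background transfer and return a completion hook."""
    result = {}
    def transfer():                       # stands in for a background RDMA transfer
        time.sleep(0.01)
        result["data"] = [t * 2 for t in tokens]
    worker = threading.Thread(target=transfer)
    worker.start()
    def hook():                           # finalize the transfer only on demand
        worker.join()
        return result["data"]
    return hook

hook = async_dispatch([1, 2, 3])          # kick off communication
overlap_work = sum(range(1_000_000))      # overlapped computation (e.g. another micro-batch)
expert_inputs = hook()                    # collect the transferred tokens when needed
```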

Performance tests were conducted on H800 GPUs with CX7 InfiniBand 400 Gb/s RDMA network cards. The regular kernels showed excellent bandwidth for both intra-node and inter-node communication, and the low-latency kernels met expectations for both latency and bandwidth: with 8 experts, the low-latency kernel achieved a latency of 163 microseconds and a bandwidth of 46 GB/s.

DeepEP has been extensively tested on and is primarily compatible with InfiniBand networks, but in principle it can also run over RDMA over Converged Ethernet (RoCE). To prevent interference between different traffic types, it is recommended to isolate the regular and low-latency kernels in separate virtual lanes so they do not affect each other.

DeepEP provides an efficient communication solution for Mixture-of-Experts models, with significant advantages in performance, latency, and configuration flexibility.

Project Link: https://x.com/deepseek_ai/status/1894211757604049133

Key Highlights:

🌟 DeepEP is designed for Mixture-of-Experts models, providing high-throughput and low-latency communication solutions.

⚙️ Supports low-precision operations such as FP8 and optimizes bandwidth for asymmetric-domain data transfer.

💡 Tested primarily on InfiniBand networks, DeepEP supports isolating different traffic types in separate virtual lanes.