DeepSeek has announced the open-source release for day two of its open-source week: DeepEP, the first open-source EP (expert parallelism) communication library for Mixture-of-Experts (MoE) models, supporting full-stack optimization of MoE model training and inference.

DeepEP is a highly efficient communication library designed specifically for Mixture-of-Experts (MoE) models and expert parallelism (EP). It provides high-throughput, low-latency all-to-all GPU communication kernels, commonly known as MoE dispatch and combine.
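To make the terms concrete, the following single-GPU PyTorch sketch shows what "dispatch" and "combine" mean logically for one batch of tokens. It is purely illustrative and is not DeepEP's API; DeepEP implements these two exchanges as optimized all-to-all kernels across GPUs over NVLink and RDMA.

```python
import torch

num_tokens, hidden, num_experts, top_k = 16, 32, 4, 2
x = torch.randn(num_tokens, hidden)
gate_logits = torch.randn(num_tokens, num_experts)

# Gating: each token picks its top-k experts; weights are renormalized.
weights, expert_ids = gate_logits.softmax(dim=-1).topk(top_k, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)

# "Dispatch": group token copies by destination expert (in a real EP setup
# this is an all-to-all exchange between GPUs holding different experts).
expert_inputs = [x[(expert_ids == e).any(dim=-1)] for e in range(num_experts)]

# Each expert runs its own FFN; an identity stands in here for brevity.
expert_outputs = [inp.clone() for inp in expert_inputs]

# "Combine": route expert outputs back to token order, weighted by the gate.
y = torch.zeros_like(x)
for e in range(num_experts):
    token_idx, slot_idx = (expert_ids == e).nonzero(as_tuple=True)
    y[token_idx] += weights[token_idx, slot_idx, None] * expert_outputs[e]
```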


DeepEP not only supports low-precision operations such as FP8 but also aligns with the group-limited gating algorithm proposed in the DeepSeek-V3 paper, with kernels optimized for asymmetric-domain bandwidth forwarding, such as forwarding data from the NVLink domain to the RDMA domain. These kernels deliver high throughput, making them well suited to prefilling in both training and inference, and they allow control over the number of streaming multiprocessors (SMs) used.
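For reference, the sketch below shows the idea behind group-limited gating: experts are partitioned into groups (for example, one group per node), only the best-scoring groups survive, and the final top-k experts are chosen from those groups, which bounds how many nodes a token's communication can touch. Names, shapes, and the per-group ranking rule (best expert score per group) are illustrative, not the exact formulation in the DeepSeek-V3 paper.

```python
import torch

num_tokens, num_experts, num_groups, top_groups, top_k = 8, 16, 4, 2, 4
scores = torch.rand(num_tokens, num_experts)             # per-expert gate scores
grouped = scores.view(num_tokens, num_groups, -1)        # (tokens, groups, experts per group)

# Rank groups by their best expert score and keep only the top `top_groups`.
group_scores = grouped.max(dim=-1).values                # (tokens, groups)
keep = group_scores.topk(top_groups, dim=-1).indices     # (tokens, top_groups)
group_mask = torch.zeros_like(group_scores).scatter_(-1, keep, 1.0)

# Zero out experts in dropped groups, then take the global top-k.
masked = (grouped * group_mask.unsqueeze(-1)).view(num_tokens, num_experts)
topk_scores, topk_experts = masked.topk(top_k, dim=-1)
```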


For latency-sensitive inference decoding, DeepEP also provides a set of low-latency kernels that use pure RDMA to minimize delay. It further introduces a hook-based communication-computation overlap method that does not occupy any SM resources.
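The sketch below conveys the hook-based overlap idea in plain Python: the "dispatch" starts in the background and returns a hook, other computation (for example, another micro-batch) proceeds in the meantime, and the hook is only invoked when the transferred data is actually needed. The names and the thread-based transfer are stand-ins for illustration, not DeepEP's API or its RDMA mechanism.

```python
import threading
import time

def async_dispatch(tokens):
    """Start a simulated background transfer and return a completion hook."""
    result = {}
    def transfer():                       # stands in for a background RDMA transfer
        time.sleep(0.01)
        result["data"] = [t * 2 for t in tokens]
    worker = threading.Thread(target=transfer)
    worker.start()
    def hook():                           # finalize the transfer only on demand
        worker.join()
        return result["data"]
    return hook

hook = async_dispatch([1, 2, 3])          # kick off communication
overlap_work = sum(range(1_000_000))      # overlapped computation (e.g. another micro-batch)
expert_inputs = hook()                    # collect the transferred tokens when needed
```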

Performance tests were conducted on H800 GPUs with CX7 InfiniBand 400 Gb/s RDMA network cards. The regular kernels showed excellent bandwidth for both intra-node and inter-node communication, and the low-latency kernels met expectations for both latency and bandwidth: with 8 experts, the low-latency kernel achieved a latency of 163 microseconds and a bandwidth of 46 GB/s.

DeepEP has been extensively tested on and is primarily compatible with InfiniBand networks, but in principle it can also run over RDMA over Converged Ethernet (RoCE). To prevent interference between different traffic types, it is recommended to isolate the regular and low-latency kernels in separate virtual lanes so they do not affect each other.

DeepEP provides an efficient communication solution for Mixture-of-Experts models, with significant advantages in performance, latency, and configuration flexibility.

Project Link: https://x.com/deepseek_ai/status/1894211757604049133

Key Highlights:

🌟 DeepEP is designed for Mixture-of-Experts models, providing high-throughput and low-latency communication solutions.

⚙️ Supports low-precision operations such as FP8 and optimizes bandwidth for asymmetric-domain data transfer.

💡 Tested primarily on InfiniBand networks, DeepEP supports isolating different traffic types in separate virtual lanes.