DeepEP is a communication library designed specifically for Mixture-of-Experts (MoE) and expert-parallel (EP) models. It provides high-throughput, low-latency all-to-all GPU kernels (MoE dispatch and combine) and supports low-precision operations such as FP8. The kernels are optimized for asymmetric-domain bandwidth forwarding (for example, forwarding data from the NVLink domain to the RDMA domain), which makes the library well suited to training and inference prefilling. It also supports Streaming Multiprocessor (SM) count control and offers a hook-based communication-computation overlap method that consumes no SM resources. Although its implementation differs slightly from the DeepSeek-V3 paper, DeepEP's optimized kernels and low-latency design deliver excellent performance in large-scale distributed training and inference.
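The sketch below illustrates how the features above fit together in a typical MoE forward pass: capping the SM count used by the communication kernels, then dispatching tokens to the ranks that own their experts and combining the results. `Buffer`, `set_num_sms`, `dispatch`, and `combine` are names from DeepEP's Python bindings, but the argument and return lists are simplified assumptions here, not the library's exact API.

```python
# A minimal sketch of driving an MoE layer through DeepEP from PyTorch.
# The argument/return lists below are simplified assumptions (the real
# dispatch also takes layout tensors from get_dispatch_layout); consult
# the DeepEP README for the exact signatures.
import torch
import torch.distributed as dist
from deep_ep import Buffer

dist.init_process_group(backend="nccl")
group = dist.new_group(list(range(dist.get_world_size())))

# SM count control: cap how many SMs the high-throughput kernels may occupy,
# leaving the remaining SMs free for the overlapped expert computation.
Buffer.set_num_sms(24)

# One communication buffer per rank; the NVLink/RDMA byte sizes are placeholders.
buffer = Buffer(group, num_nvl_bytes=256 << 20, num_rdma_bytes=0)

def moe_forward(x, topk_idx, topk_weights, num_experts):
    # Dispatch: send each token to the ranks that host its top-k experts
    # (the kernels also accept low-precision FP8 payloads in this direction).
    recv_x, recv_idx, recv_weights, counts, handle, _ = buffer.dispatch(
        x, topk_idx=topk_idx, topk_weights=topk_weights, num_experts=num_experts)

    expert_out = recv_x  # placeholder for the local expert computation

    # Combine: route expert outputs back to their source ranks and reduce per token.
    combined_x, _, _ = buffer.combine(expert_out, handle, topk_weights=recv_weights)
    return combined_x
```

For latency-critical inference decoding, the library's low-latency kernel variants can instead return a receive hook, so the transfer completes in the background without occupying any SMs; the calling pattern is analogous to the sketch above.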