The DeepSeek-V3/R1 inference system is a high-performance inference architecture developed by the DeepSeek team, aiming to optimize the inference efficiency of large-scale sparse models. It significantly improves GPU matrix computation efficiency and reduces latency through cross-node expert parallelism (EP) technology. The system employs a double-batch overlapping strategy and a multi-level load balancing mechanism to ensure efficient operation in large-scale distributed environments. Its main advantages include high throughput, low latency, and optimized resource utilization, making it suitable for high-performance computing and AI inference scenarios.