DeepSeek has launched an official account on Zhihu and published a technical article titled "DeepSeek-V3/R1 Inference System Overview." The article, which closes the highly anticipated "DeepSeek Open Source Week," gives the first detailed account of how the company optimizes its model inference system and discloses its cost-profit margin.
The article names two optimization goals for the DeepSeek-V3/R1 inference system: "higher throughput and lower latency." To meet them, DeepSeek uses large-scale cross-node expert parallelism (EP), accepting the added system complexity it brings. The article then explains how EP is used to increase batch size, hide transfer latency, and balance load.
Notably, DeepSeek also took the unusual step of disclosing its cost and profit margin figures. The article states: "Assuming a GPU rental cost of $2/hour, the total cost is $87,072/day. If all tokens are priced according to DeepSeek R1 pricing, the theoretical total daily revenue is $562,027, resulting in a cost-profit margin of 545%."
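The 545% figure can be checked from the two dollar amounts in the quote. The short sketch below assumes that "cost-profit margin" means profit divided by cost, which is the reading that reproduces the quoted percentage. It also assumes the daily cost consists entirely of GPU rental at the stated $2/hour, which gives a rough implied GPU count. The variable and function names are illustrative and do not come from the article.

```python
# Figures quoted in the article.
GPU_RENTAL_USD_PER_HOUR = 2.0
DAILY_COST_USD = 87_072
DAILY_REVENUE_USD = 562_027


def cost_profit_margin(revenue: float, cost: float) -> float:
    """Profit as a fraction of cost (assumed definition of 'cost-profit margin')."""
    return (revenue - cost) / cost


margin = cost_profit_margin(DAILY_REVENUE_USD, DAILY_COST_USD)

# Assumption: the whole daily cost is GPU rental at the stated hourly rate.
gpu_hours_per_day = DAILY_COST_USD / GPU_RENTAL_USD_PER_HOUR
average_gpus_in_use = gpu_hours_per_day / 24

print(f"Cost-profit margin: {margin:.0%}")                     # 545%
print(f"Implied GPU-hours per day: {gpu_hours_per_day:,.0f}")  # 43,536
print(f"Implied average GPUs in use: {average_gpus_in_use:,.0f}")  # 1,814
```

Under these assumptions the daily profit is $474,955, or about 5.45 times the daily cost. The revenue figure is theoretical, since it assumes every token is billed at DeepSeek R1 pricing.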