Lenovo today announced that its first AMD-based AI server for large model training, the Lenovo ThinkSystem WA7785a G3, achieved a peak throughput of 6708 tokens/s when running the full-scale 671B DeepSeek model on a single machine. Lenovo describes this as a new record for single-server performance on ultra-large-scale models.

Lenovo attributes the result to its Wanquan heterogeneous computing platform. Lenovo optimized memory access and GPU memory usage, implemented a PCIe 5.0 fully interconnected architecture, and selected the best-performing operators from the SGLang framework. According to Lenovo, these measures optimize the entire large-model workflow, from pre-training and post-training through inference. In testing, the ThinkSystem WA7785a G3 reached the 6708 tokens/s peak with the DeepSeek 671B model deployed on a single machine.


In simulated question-and-answer scenarios (context sequence length 128/1K), the server supports up to 158 concurrent users, with a TPOT (Time Per Output Token) of 93 milliseconds and a TTFT (Time To First Token) of 2.01 seconds. In simulated code generation scenarios (context sequence length 512/4K), concurrency reaches 140, with a TPOT of 100 milliseconds and a TTFT of 5.53 seconds. Lenovo stated that this level of performance means a single ThinkSystem WA7785a G3 server can support the normal use of a 1,500-person company. It follows the earlier result of over 2500 tokens/s total throughput on the Lenovo ThinkSystem WA7780 G3 server and marks another significant gain in single-machine inference performance for the full-scale DeepSeek model.
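
To put the latency figures in perspective, the short Python sketch below derives two rough quantities from the reported numbers: the aggregate decode rate if every concurrent user streams at 1/TPOT, and the time for one user to receive a full response. It is a back-of-the-envelope approximation, not Lenovo's methodology. In particular, it assumes that "128/1K" and "512/4K" mean input/output token lengths and that 1K equals 1024 tokens; the article states neither. The article also does not say under what conditions the 6708 tokens/s peak was measured, so the derived figures are not expected to match it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    concurrency: int    # concurrent users, as reported in the article
    tpot_s: float       # time per output token, in seconds
    ttft_s: float       # time to first token, in seconds
    output_tokens: int  # ASSUMPTION: second number in "128/1K" is output length


def decode_throughput(s: Scenario) -> float:
    """Approximate aggregate decode rate (tokens/s) if each user streams at 1/TPOT."""
    return s.concurrency / s.tpot_s


def response_latency(s: Scenario) -> float:
    """Approximate seconds to finish one response: first token, then the rest at TPOT."""
    return s.ttft_s + (s.output_tokens - 1) * s.tpot_s


# Reported figures; output lengths are assumed (1K = 1024, 4K = 4096).
qa = Scenario("Q&A", 158, 0.093, 2.01, 1024)
code = Scenario("code generation", 140, 0.100, 5.53, 4096)

for s in (qa, code):
    print(
        f"{s.name}: ~{decode_throughput(s):.0f} tokens/s aggregate decode, "
        f"~{response_latency(s):.0f} s per full-length response"
    )
```

Under these assumptions the per-user experience is dominated by TPOT rather than TTFT: a full-length response takes far longer to stream than to start.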

Lenovo emphasized that the result is the product of joint design, collaborative optimization, and shared implementation by Lenovo's China Infrastructure Business Group, Lenovo Research's ICI lab, and AMD. Lenovo added that this is not the final result: Lenovo and AMD are continuing to explore deeper optimizations to push performance higher.