NVIDIA recently unveiled its new Blackwell platform and showcased its first results in the MLPerf Training 4.1 benchmarks. According to the results, Blackwell delivered up to double the performance of the previous-generation Hopper platform on some workloads, drawing widespread attention across the industry.
In MLPerf Training 4.1, Blackwell posted 2.2 times Hopper's performance on the Llama 2 70B fine-tuning task in the LLM (large language model) benchmark, and a twofold improvement on GPT-3 175B pre-training. In other benchmarks, such as Stable Diffusion v2 training, the new generation also outpaced its predecessor by 1.7 times.
Notably, Hopper is still improving as well: its language-model pre-training performance rose 1.3 times over the previous round of MLPerf Training, a sign that NVIDIA's software stack continues to advance. In the latest GPT-3 175B benchmark, NVIDIA submitted a run spanning 11,616 Hopper GPUs, setting a new scaling record.
On the technical side, NVIDIA said the new architecture uses optimized Tensor Cores and faster high-bandwidth memory. This allows the GPT-3 175B benchmark to be run on just 64 GPUs, whereas the Hopper platform needed 256 GPUs to reach the same performance.
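To make the memory argument concrete, here is a back-of-envelope sketch of how per-GPU HBM capacity bounds the minimum GPU count for a GPT-3-class model. Every number in it is an illustrative assumption, not a figure from NVIDIA's submissions:

```python
import math

# Illustrative assumptions only -- not NVIDIA's published configuration.
PARAMS = 175e9        # GPT-3-class parameter count
BYTES_PER_PARAM = 16  # rough mixed-precision training state:
                      # FP16 weights + gradients + FP32 Adam states

def min_gpus(hbm_gb_per_gpu: float, usable_fraction: float = 0.7) -> int:
    """GPUs needed just to hold sharded model state, assuming only
    `usable_fraction` of each GPU's HBM is free for it (the rest is
    left for activations, buffers, and framework overhead)."""
    total_gb = PARAMS * BYTES_PER_PARAM / 1e9
    return math.ceil(total_gb / (hbm_gb_per_gpu * usable_fraction))

print("80 GB HBM per GPU: ", min_gpus(80))   # Hopper-class capacity
print("192 GB HBM per GPU:", min_gpus(192))  # Blackwell-class capacity
```

This toy calculation will not reproduce the 256-versus-64 split exactly, since real submissions also balance activation memory, parallelism strategy, and throughput targets, but it shows the direction of the effect: more HBM per GPU means fewer GPUs are needed just to fit the model.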
At the event, NVIDIA also highlighted the performance gains that software and networking updates have delivered for Hopper-generation products, and said it expects Blackwell to keep improving with future submissions. In addition, NVIDIA plans to release its next-generation AI accelerator, Blackwell Ultra, next year, which is expected to offer more memory and greater compute performance.
Blackwell had already debuted in the MLPerf Inference v4.1 benchmarks last September, delivering up to four times the per-GPU AI inference performance of the H100, notably by using the lower-precision FP4 format. This trend targets the growing demand for low-latency chatbots and for the inference compute that models such as OpenAI's o1 require.
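For readers unfamiliar with FP4, the sketch below fake-quantizes values onto the magnitude grid of E2M1, the 4-bit floating-point layout commonly associated with FP4. It is a minimal illustration of the idea with simple per-tensor scaling, not NVIDIA's actual inference implementation:

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 layout (1 sign bit,
# 2 exponent bits, 1 mantissa bit). Hardware details may differ;
# this only illustrates how coarse the format is.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each value to the nearest FP4-representable value,
    using a single per-tensor scale (a common, simple choice)."""
    scale = max(np.abs(x).max() / FP4_GRID[-1], 1e-12)  # map max |x| to 6.0
    magnitudes = np.abs(x) / scale
    # Nearest-neighbor rounding onto the FP4 magnitude grid.
    idx = np.abs(magnitudes[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

rng = np.random.default_rng(0)
w = rng.normal(size=8).astype(np.float32)
print("original:", np.round(w, 3))
print("fp4-ish :", np.round(fake_quantize_fp4(w), 3))
```

With only eight representable magnitudes per sign, FP4 trades precision for memory footprint and bandwidth, which is exactly the trade that favors high-throughput, low-latency inference.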
Key Highlights:
- 🚀 **NVIDIA's Blackwell platform doubles performance in AI training, setting a new industry standard!**
- 📈 **In the GPT-3 175B benchmark, Blackwell needed only 64 GPUs, significantly improving efficiency!**
- 🔍 **Blackwell Ultra, launching next year, is expected to offer more memory and greater compute capability!**