Recently, the technology analysis firm SemiAnalysis released a report, based on a five-month investigation, revealing significant software problems with AMD's newly launched MI300X AI chip. According to the report, these problems hinder the chip's performance and keep it from challenging Nvidia's dominance in the AI chip market.
The report indicates that AMD's software has numerous bugs that make it nearly impossible to train AI models out of the box, forcing users to spend a significant amount of time debugging. Meanwhile, Nvidia continues to roll out new features, libraries, and performance updates, further widening the gap between the two. Analysts conducted extensive testing, including GEMM benchmarks and single-node training, and found that AMD consistently failed to overcome the so-called "CUDA moat," Nvidia's strong advantage in software.
On paper, the MI300X's hardware specifications are impressive: 1307 TeraFLOPS of FP16 compute and 192GB of HBM3 memory. By comparison, Nvidia's H100 offers 989 TeraFLOPS and 80GB of memory, although Nvidia's latest H200 narrows the memory gap with 141GB. AMD systems also have an advantage in total cost of ownership, with lower prices and more affordable Ethernet networking.
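To put those headline figures side by side, the short Python sketch below computes the on-paper ratios from the numbers quoted in this article. The dictionary layout and the `ratio` helper are illustrative assumptions, not anything taken from the SemiAnalysis report, and the results describe specifications only, not measured performance.

```python
# Paper specifications as quoted in the article
# (peak FP16 TeraFLOPS, memory capacity in GB).
# These are headline numbers, not measured throughput.
SPECS = {
    "MI300X": {"fp16_tflops": 1307, "memory_gb": 192},
    "H100": {"fp16_tflops": 989, "memory_gb": 80},
    "H200": {"memory_gb": 141},  # the article quotes only memory for the H200
}


def ratio(a, b, field):
    """Return spec `field` of chip `a` divided by the same spec of chip `b`."""
    return SPECS[a][field] / SPECS[b][field]


compute_vs_h100 = ratio("MI300X", "H100", "fp16_tflops")
memory_vs_h100 = ratio("MI300X", "H100", "memory_gb")
memory_vs_h200 = ratio("MI300X", "H200", "memory_gb")

print(f"FP16 compute vs H100: {compute_vs_h100:.2f}x")  # 1.32x
print(f"Memory vs H100:       {memory_vs_h100:.2f}x")   # 2.40x
print(f"Memory vs H200:       {memory_vs_h200:.2f}x")   # 1.36x
```

By these figures the MI300X leads the H100 by roughly a third in peak FP16 compute and by 2.4 times in memory capacity, which is the paper advantage the report goes on to test.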
However, these hardware advantages have not translated into the expected results in real-world use. SemiAnalysis likens judging the chips by their specifications to "comparing cameras solely by pixel count," suggesting that AMD has been caught up in a numbers game while failing to deliver sufficient practical performance. To obtain usable benchmark results, analysts had to work directly with AMD engineers to resolve multiple software bugs, whereas Nvidia's systems worked out of the box without additional tuning.
The report also noted that Tensorwave, AMD's largest GPU cloud service provider, even had to give the AMD team free access to GPUs it had purchased in order to help resolve software issues. SemiAnalysis therefore recommends that AMD CEO Lisa Su increase investment in software development and testing, in particular by allocating a significant number of MI300X chips to automated testing, simplifying the complex set of environment variables, and improving default settings to enhance the out-of-the-box experience.
Although SemiAnalysis hopes AMD can become a strong competitor to Nvidia, it also stated, "Unfortunately, there is still much work to be done." Without substantial improvements to its software, AMD risks falling further behind, especially as Nvidia prepares to launch its next-generation Blackwell chip. That said, reports have also indicated that the Blackwell launch may not go smoothly.
Key Points:
🌟 The AMD MI300X AI chip faces serious software issues, making AI model training difficult.
🔧 Nvidia continues to expand its market advantage with a strong CUDA platform and frequent software updates.
💡 SemiAnalysis recommends that AMD increase investment in software development to improve user experience and enhance competitiveness.