Chinese AI company DeepSeek announced the launch of DeepGEMM, an open-source library for FP8 general matrix multiplication (GEMM), on day three of its "Open Source Week." The library handles both dense and Mixture-of-Experts (MoE) matrix operations and supports training and inference for the DeepSeek V3 and R1 models. The official announcement, made via X, quickly generated significant excitement within the tech community.


According to DeepSeek's official X post, DeepGEMM achieves up to 1350+ TFLOPS of FP8 compute on NVIDIA Hopper GPUs. Its core logic is only about 300 lines of code, yet it outperforms expert-tuned kernels on most matrix sizes. The library has no heavy dependencies, compiles its kernels with Just-In-Time (JIT) compilation, supports a dense layout and two MoE layouts, and is written to be clean enough to read like a tutorial.
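For readers unfamiliar with how a TFLOPS figure relates to a matrix multiplication, the arithmetic can be sketched as follows. A GEMM of an M×K matrix by a K×N matrix performs roughly 2·M·N·K floating-point operations (one multiply and one add per term). The function name and the example shape below are hypothetical, chosen only for illustration; they are not part of DeepGEMM's API, and the timing is derived from the quoted figure rather than measured.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return achieved throughput in TFLOPS for an (m x k) @ (k x n) GEMM.

    Counts one multiply and one add per inner-product term, i.e. 2*m*n*k
    floating-point operations in total.
    """
    flops = 2 * m * n * k
    return flops / seconds / 1e12


def gemm_seconds_at(m: int, n: int, k: int, tflops: float) -> float:
    """Return the time a GEMM would take at a given sustained TFLOPS rate."""
    return 2 * m * n * k / (tflops * 1e12)


if __name__ == "__main__":
    # Illustrative shape only (an assumption, not taken from the announcement).
    m = n = k = 4096
    t = gemm_seconds_at(m, n, k, 1350.0)
    print(f"{m}x{n}x{k} GEMM at 1350 TFLOPS takes about {t * 1e6:.1f} microseconds")
    print(f"round trip: {gemm_tflops(m, n, k, t):.1f} TFLOPS")
```

In other words, at the quoted peak rate a GEMM of this size would finish in on the order of a hundred microseconds; real throughput varies with matrix shape, which is why the figure is stated as "up to."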

X user @TechBitDaily commented: "The release of DeepGEMM is a highlight of DeepSeek's Open Source Week; its FP8 performance and concise design are impressive." Another user, @AIObserverCN, said the library offers significant advantages for training MoE models efficiently and could drive further innovation on the Hopper architecture within the AI community.

As part of Open Source Week, the release of DeepGEMM continues DeepSeek's commitment to transparency and community collaboration in AI technology. In the first two days, the company released FlashMLA, an efficient Multi-head Latent Attention (MLA) decoding kernel for Hopper GPUs, and DeepEP, a communication library for expert parallelism. DeepGEMM further showcases the company's strength in AI infrastructure development. Industry experts believe the library will not only improve the performance of DeepSeek's own models but also give developers worldwide an efficient, easy-to-use matrix operation tool. DeepGEMM is now available on GitHub for use in AI training and inference.

Project Address: https://github.com/deepseek-ai/DeepGEMM