DeepSeek Open Source Week Day 3: Announcing DeepGEMM, an FP8 GEMM Library for AI Training and Inference

AIbase基地

Published in AI News · 3 min read · Feb 26, 2025


Chinese AI company DeepSeek announced DeepGEMM, an open-source library for FP8 general matrix multiplication (GEMM), on day three of its "Open Source Week." The library is designed for dense and Mixture-of-Experts (MoE) matrix operations and supports the training and inference of the DeepSeek V3 and R1 models. The official announcement, made via X, quickly generated significant excitement within the tech community.
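To make the term concrete, the following minimal Python sketch simulates an FP8 GEMM in software. It assumes the E4M3 variant of FP8 (3 mantissa bits, largest finite value 448) and a single scale factor per matrix. Both are illustrative assumptions, since the announcement does not say which FP8 format or scaling scheme DeepGEMM uses. The function names are hypothetical and are not part of DeepGEMM's API.

```python
import math

E4M3_MAX = 448.0      # largest finite value in FP8 E4M3 (assumed format)
E4M3_MIN_EXP = -6     # exponent of the smallest normal E4M3 number
MANTISSA_BITS = 3     # explicit mantissa bits in E4M3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    _, e = math.frexp(x)                    # x = m * 2**e with 0.5 <= |m| < 1
    exp = max(e - 1, E4M3_MIN_EXP)          # below 2**-6 the spacing stays fixed
    step = 2.0 ** (exp - MANTISSA_BITS)     # spacing between neighbouring values
    return round(x / step) * step           # round-half-to-even on the grid


def quantize(matrix):
    """Scale a matrix into the E4M3 range and round every entry (hypothetical helper)."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [[round_to_e4m3(v / scale) for v in row] for row in matrix], scale


def fp8_gemm(a, b):
    """Multiply a (n x k) by b (k x m) using simulated FP8 inputs.

    Products are accumulated in ordinary Python floats, then the two
    per-matrix scales are applied to recover the original magnitude.
    """
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    k, m = len(b), len(b[0])
    return [
        [sa * sb * sum(row[p] * qb[p][j] for p in range(k)) for j in range(m)]
        for row in qa
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(fp8_gemm(a, identity))  # close to a, but not bit-exact
```

With only 3 mantissa bits, each quantized input in the normal range can be off by roughly 6% at most, so the result approximates the full-precision product rather than matching it exactly.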

According to DeepSeek's official X post, DeepGEMM reaches up to 1350+ TFLOPS of FP8 compute on NVIDIA Hopper GPUs. Its core logic is only about 300 lines of code, yet it outperforms expert-tuned kernels on most matrix sizes. The library has no heavy dependencies, compiles its kernels Just-In-Time (JIT), supports a dense layout and two MoE layouts, and is written to be clean enough to read like a tutorial.
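For a sense of scale, the reported peak can be turned into a rough lower bound on GEMM run time. The sketch below uses the standard count of 2·M·N·K floating-point operations for an M×K by K×N multiplication. The 4096×4096×4096 shape is a hypothetical example, not a benchmark from the announcement, and real kernels will take longer than this ideal figure.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in an (m x k) @ (k x n) GEMM: one multiply and one add per term."""
    return 2 * m * n * k


def ideal_time_us(m: int, n: int, k: int, tflops: float = 1350.0) -> float:
    """Best-case run time in microseconds if the kernel sustained `tflops` throughout.

    The default of 1350 TFLOPS is the peak figure quoted in the announcement;
    treat the result as a lower bound, not a prediction.
    """
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e6


if __name__ == "__main__":
    shape = (4096, 4096, 4096)  # hypothetical example shape
    print(f"{gemm_flops(*shape):,} FLOPs")
    print(f"{ideal_time_us(*shape):.1f} microseconds at peak throughput")
```

At the quoted peak, this example shape would take on the order of 100 microseconds.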

X user @TechBitDaily commented: "The release of DeepGEMM is a highlight of DeepSeek's Open Source Week; its FP8 performance and concise design are impressive." Another user, @AIObserverCN, noted the library's significant advantages in supporting efficient training of MoE models, potentially driving further innovation within the AI community on Hopper architectures.

The release of DeepGEMM continues the commitment to transparency and community collaboration that DeepSeek has shown throughout Open Source Week. In the first two days, the company released FlashMLA, an efficient Multi-head Latent Attention (MLA) decoding kernel, and DeepEP, an expert-parallel communication library. DeepGEMM further showcases DeepSeek's strength in AI infrastructure development. Industry experts believe the library will not only improve the performance of DeepSeek's own models but also give developers worldwide an efficient, easy-to-use matrix operation tool. DeepGEMM is available now on GitHub.

Project Address: https://github.com/deepseek-ai/DeepGEMM


This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we cover hot topics in the AI field with a focus on developers, helping you follow technical trends and learn about innovative AI product applications.

—— Created by the AIbase Daily Team


