AIbase · 2025-02-24 10:18:41
DeepSeek Open Source Week Day 1: Open-Source Large-Model Acceleration Kernel FlashMLA Reaches 3000 GB/s Decoding Throughput
On the first day of DeepSeek Open Source Week, DeepSeek officially released its latest technical achievement, FlashMLA, an efficient Multi-head Latent Attention (MLA) decoding kernel designed specifically for NVIDIA Hopper-architecture GPUs. FlashMLA is optimized for variable-length sequence workloads and can significantly improve large-model inference performance. Its core technical features include full support for BF16 precision and a paged key-value cache with a block size of 64.
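To illustrate the paged key-value-cache idea mentioned above, here is a minimal, simplified Python sketch (not FlashMLA's actual CUDA implementation): tokens are stored in fixed-size blocks of 64, so memory for variable-length sequences is allocated on demand rather than as one contiguous maximum-length buffer. The class and variable names are illustrative only, and float16 stands in for BF16, which NumPy does not provide natively.

```python
import numpy as np

BLOCK_SIZE = 64   # block size used by FlashMLA's paged KV cache
HEAD_DIM = 8      # toy head dimension, for illustration only

class PagedKVCache:
    """Toy paged KV cache: each sequence owns a list of fixed-size
    physical blocks, tracked through a per-sequence block table."""

    def __init__(self):
        self.blocks = []        # pool of physical blocks
        self.block_table = {}   # seq_id -> list of block indices
        self.lengths = {}       # seq_id -> tokens stored so far

    def append(self, seq_id, kv_vec):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:
            # current block is full (or none allocated yet): grab a new one
            self.blocks.append(np.zeros((BLOCK_SIZE, HEAD_DIM), dtype=np.float16))
            self.block_table.setdefault(seq_id, []).append(len(self.blocks) - 1)
        blk = self.block_table[seq_id][-1]
        self.blocks[blk][n % BLOCK_SIZE] = kv_vec
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id):
        """Reassemble the logical KV sequence from its physical blocks."""
        n = self.lengths[seq_id]
        full = np.concatenate([self.blocks[b] for b in self.block_table[seq_id]])
        return full[:n]

cache = PagedKVCache()
for t in range(100):  # 100 tokens -> two blocks (64 + 36 slots used)
    cache.append("seq0", np.full(HEAD_DIM, t, dtype=np.float16))

print(len(cache.block_table["seq0"]))  # → 2
print(cache.gather("seq0").shape)      # → (100, 8)
```

With blocks this size, a sequence only ever wastes at most 63 slots of padding, which is why paging suits the variable-length decoding scenarios the article describes.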