# MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention
An efficient and scalable attention module that reduces memory usage and improves inference speed in large language models. This repo implements Multi-Head Latent Attention (MLA) as a drop-in replacement for traditional multi-head attention (MHA).
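As a rough sketch of the core idea (illustrative names like `d_latent` and `LatentAttention` are assumptions, not necessarily this repo's API, and details such as DeepSeek's decoupled RoPE keys are omitted): keys and values are derived from a small shared latent vector, so the inference-time KV cache stores only that latent instead of full per-head keys and values.

```python
# Minimal MLA-style sketch in PyTorch, assuming hypothetical names/shapes.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Down-projection to a shared KV latent; this is what gets cached.
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections from the latent to per-head keys and values.
        self.k_up = nn.Linear(d_latent, d_model, bias=False)
        self.v_up = nn.Linear(d_latent, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, latent_cache: Optional[torch.Tensor] = None):
        B, T, _ = x.shape
        latent = self.kv_down(x)  # (B, T, d_latent)
        if latent_cache is not None:
            # Decode step (assumes T == 1): append to cached latents.
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.size(1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask during prefill; during cached decode every past position is visible.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), latent  # return the latent as the KV cache
```

Under these assumptions the cache per token shrinks from `2 * n_heads * d_head` values (MHA keys plus values) to `d_latent` values, which is where the memory saving and faster inference come from when `d_latent` is much smaller than the full KV width.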