
MiniGPT-and-DeepSeek-MLA-Multi-Head-Latent-Attention


An efficient and scalable attention module that reduces memory usage and improves inference speed in large language models. The Multi-Head Latent Attention (MLA) module is designed and implemented as a drop-in replacement for traditional multi-head attention (MHA).
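The sketch below illustrates the core idea behind this kind of drop-in replacement: keys and values are jointly compressed into a small latent vector and expanded back per head, so only the latent needs to be cached at inference time. It is a minimal illustration assuming PyTorch; the dimension names (d_model, d_latent, n_heads) are illustrative and not taken from the repository, and rotary embeddings and KV caching are omitted.

```python
# Minimal sketch of Multi-Head Latent Attention (MLA), assuming PyTorch.
# Dimension names are illustrative; RoPE handling and KV caching are omitted.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 128):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Queries are projected as in standard MHA.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        # Keys and values are jointly compressed into a small latent vector ...
        self.w_down_kv = nn.Linear(d_model, d_latent, bias=False)
        # ... and expanded back per head, so only the latent needs caching.
        self.w_up_k = nn.Linear(d_latent, d_model, bias=False)
        self.w_up_v = nn.Linear(d_latent, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, mask: torch.Tensor | None = None) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.w_q(x)
        c_kv = self.w_down_kv(x)          # (b, t, d_latent): the compressed KV latent
        k = self.w_up_k(c_kv)
        v = self.w_up_v(c_kv)

        # Reshape to (b, n_heads, t, d_head) for scaled dot-product attention.
        def split(z: torch.Tensor) -> torch.Tensor:
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        if mask is not None:
            attn = attn.masked_fill(mask == 0, float("-inf"))
        out = F.softmax(attn, dim=-1) @ v
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    mla = MultiHeadLatentAttention()
    print(mla(x).shape)  # torch.Size([2, 16, 512])
```

Because the up-projections are applied after the latent is read back, the per-token cache shrinks from 2 * d_model values (keys plus values) to d_latent values, which is where the memory and inference-speed gains come from.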

Created: 2025-04-08T23:10:50
Updated: 2025-04-09T06:41:35
Stars: 1
Stars Increase: 0
