MInference 1.0
Accelerates long-context pre-fill processing for large language models
MInference 1.0 is a dynamic sparse attention method that accelerates the pre-fill stage of long-context large language models (LLMs). By identifying three distinct sparse patterns in long-context attention matrices and computing only the corresponding entries, it speeds up pre-fill for prompts of up to 1M tokens while preserving the model's capabilities, especially retrieval.
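To make the idea concrete, here is a minimal toy sketch of block-sparse attention: scores are computed only for a chosen set of (query-block, key-block) pairs and everything else is masked out before the softmax. The function name, the block layout, and the fixed "diagonal" pattern are all illustrative assumptions; MInference selects its patterns dynamically per head with optimized kernels, which this sketch does not reproduce.

```python
import numpy as np

def sparse_prefill_attention(q, k, v, keep_blocks, block=4):
    """Toy block-sparse attention (illustrative, not MInference's kernels).

    Only the (query-block, key-block) pairs listed in `keep_blocks` are
    scored; all other positions stay at -inf and vanish in the softmax.
    """
    n, d = q.shape
    scores = np.full((n, n), -np.inf)
    for qb, kb in keep_blocks:
        qs = slice(qb * block, (qb + 1) * block)
        ks = slice(kb * block, (kb + 1) * block)
        scores[qs, ks] = q[qs] @ k[ks].T / np.sqrt(d)
    # Row-wise softmax over the kept positions only
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n, d = 8, 16
q, k, v = rng.normal(size=(3, n, d))
# Keep only the diagonal blocks (a "local" pattern) -- a stand-in for
# the sparse patterns MInference detects in the attention matrix.
out = sparse_prefill_attention(q, k, v, keep_blocks=[(0, 0), (1, 1)], block=4)
print(out.shape)  # (8, 16)
```

With all blocks kept, the result matches dense softmax attention; the speedup comes from skipping the score computation for the masked blocks.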
MInference 1.0 Visits Over Time
Monthly Visits: 690
Bounce Rate: 39.63%
Pages per Visit: 2.2
Visit Duration: 00:01:19