MInference 1.0

Accelerates long-context pre-fill processing for large language models

Common Product, Programming, Natural Language Processing, Machine Learning
MInference 1.0 is a sparse computation method that accelerates the pre-fill stage of long-sequence processing. It applies dynamic sparse attention to long-context large language models (LLMs): it identifies three distinct patterns in the long-context attention matrix and uses them to speed up the pre-fill stage for 1M-token prompts while preserving the model's capabilities, especially retrieval.
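To make the idea of sparse attention during pre-fill concrete, here is a minimal toy sketch in plain Python. It is not MInference's implementation or API. The listing does not name the three patterns, so the sketch assumes one illustrative pattern: each query attends only to a few initial "sink" tokens plus a local window of recent tokens. The function names (`a_shape_mask`, `masked_attention`, `causal_density`) and parameters (`n_sink`, `window`) are hypothetical and chosen only for this example.

```python
import math

def a_shape_mask(n, n_sink, window):
    """Causal boolean mask (assumed pattern): query i may attend to key j
    when j <= i and j is either one of the first n_sink tokens or within
    the last `window` positions ending at i. Requires window >= 1."""
    return [[j <= i and (j < n_sink or i - j < window) for j in range(n)]
            for i in range(n)]

def masked_attention(q, k, v, mask):
    """Softmax attention that evaluates only the (i, j) pairs kept by mask."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        cols = [j for j, keep in enumerate(mask[i]) if keep]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in cols]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, cols)) / total
                    for c in range(len(v[0]))])
    return out

def causal_density(mask):
    """Fraction of the causal (lower-triangular) entries that are computed."""
    n = len(mask)
    kept = sum(sum(row) for row in mask)
    return kept / (n * (n + 1) / 2)

# Toy usage: 8 tokens, 1 sink token, local window of 2.
mask = a_shape_mask(8, n_sink=1, window=2)
x = [[float(i), 1.0] for i in range(8)]  # stand-in for Q, K and V
out = masked_attention(x, x, x, mask)
print(f"computed {causal_density(mask):.0%} of the causal attention entries")
```

In this toy, the number of score computations per query stays bounded by `n_sink + window` instead of growing with the prompt length, which is the source of the savings. A practical system would realize those savings in GPU kernels that skip the masked blocks, rather than in a Python loop.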

MInference 1.0 Visit Over Time

Monthly Visits: 690
Bounce Rate: 39.63%
Pages per Visit: 2.2
Visit Duration: 00:01:19
