MInference 1.0

Accelerates long-context pre-fill processing for large language models

Categories: Common Product, Programming, Natural Language Processing, Machine Learning
MInference 1.0 is a sparse-computation method for accelerating the pre-fill stage of long-sequence processing. It applies dynamic sparse attention to long-context large language models (LLMs). It identifies three distinct patterns in the long-context attention matrix (A-shape, Vertical-Slash, and Block-Sparse) and computes attention only where those patterns indicate it is needed. This speeds up pre-filling for prompts of up to 1M tokens while preserving the model's capabilities, especially retrieval.
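
To make "dynamic sparse attention" concrete, here is a minimal pure-Python sketch of one commonly described long-context pattern, a "vertical-slash" shape. In this shape, attention mass concentrates on a few key columns (vertical lines) and a few diagonals (slash lines). The sketch assumes a single causal attention head. It estimates which columns and diagonals matter from only the last few queries, then computes attention for every query over just those entries. The function names (`estimate_vertical_slash`, `sparse_attention`) and parameters (`last_q`, `top_v`, `top_s`) are hypothetical. They are not the MInference library's API, and the real method relies on optimized GPU kernels rather than Python loops.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(probs, idx, v):
    """Return sum over t of probs[t] * v[idx[t]], as a list."""
    return [sum(p * v[j][c] for p, j in zip(probs, idx)) for c in range(len(v[0]))]


def dense_attention(q, k, v):
    """Reference causal attention: query i attends to every key j <= i."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        idx = list(range(i + 1))
        probs = softmax([dot(q[i], k[j]) * scale for j in idx])
        out.append(weighted_sum(probs, idx, v))
    return out


def estimate_vertical_slash(q, k, last_q=4, top_v=3, top_s=3):
    """Hypothetical estimator: score key columns ("vertical" lines) and
    diagonal offsets i - j ("slash" lines) using only the last `last_q`
    queries, then keep the top_v columns and the top_s offsets."""
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    col_score = [0.0] * n
    diag_score = [0.0] * n  # indexed by offset i - j, which lies in [0, n)
    for i in range(max(0, n - last_q), n):
        probs = softmax([dot(q[i], k[j]) * scale for j in range(i + 1)])
        for j, p in enumerate(probs):
            col_score[j] += p
            diag_score[i - j] += p
    verticals = sorted(range(n), key=lambda j: -col_score[j])[:top_v]
    slashes = sorted(range(n), key=lambda o: -diag_score[o])[:top_s]
    return set(verticals), set(slashes)


def sparse_attention(q, k, v, verticals, slashes):
    """Causal attention restricted to the selected columns and diagonals.
    Each query also keeps its own position so no row is empty (a
    simplification made for this sketch)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        idx = [j for j in range(i + 1)
               if j in verticals or (i - j) in slashes or j == i]
        probs = softmax([dot(q[i], k[j]) * scale for j in idx])
        out.append(weighted_sum(probs, idx, v))
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d = 16, 8
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    vs, ss = estimate_vertical_slash(q, k)
    approx = sparse_attention(q, k, v, vs, ss)
    exact = dense_attention(q, k, v)
    err = max(abs(a - b) for ra, rb in zip(approx, exact) for a, b in zip(ra, rb))
    print("kept columns:", sorted(vs), "kept offsets:", sorted(ss))
    print("max abs error vs dense:", err)
```

The trade-off this illustrates is paying for a small dense pass over a handful of queries in exchange for skipping most of the roughly n²/2 causal attention scores. If every column and offset is kept, `sparse_attention` reduces exactly to `dense_attention`. This toy version only demonstrates the indexing logic. Any real speedup depends on specialized kernels.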

MInference 1.0 Visit Over Time

Monthly Visits: 240
Bounce Rate: 43.89%
Pages per Visit: 1.0
Visit Duration: 00:00:00
