Following the release of DeepSeek R1, Alibaba Cloud's Tongyi Qianwen (Qwen) team has announced its latest open-source models, Qwen2.5-1M, once again drawing industry attention.

The newly released Qwen2.5-1M series includes two open-source models: Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M. This is the first time Tongyi Qianwen has released models that natively support a context length of one million tokens, along with significant improvements in inference speed.


The core highlight of Qwen2.5-1M is its native support for processing contexts of up to one million tokens. This lets the model handle extremely long documents, such as books, lengthy reports, and legal files, without cumbersome segmentation. It also supports longer and deeper conversations: the model can retain a much longer dialogue history, providing a more coherent and natural interaction experience. Qwen2.5-1M additionally shows stronger capability on complex tasks such as code comprehension, intricate reasoning, and multi-turn dialogue.
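As a rough illustration, a long document can be fed to the model in a single pass through the standard Hugging Face transformers chat interface. The sketch below makes assumptions not stated in the announcement: the file name is hypothetical, and genuinely million-token inputs demand substantial GPU memory (the team's vLLM-based framework, covered next, is the more practical route for those).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct-1M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # spread weights across available GPUs
)

with open("annual_report.txt") as f:  # hypothetical long document
    document = f.read()

# Pass the entire document in one prompt, with no manual segmentation.
messages = [
    {"role": "user",
     "content": f"{document}\n\nSummarize the key findings of the document above."}
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens and keep only the newly generated answer.
answer = tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(answer)
```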

Beyond the million-token context window, Qwen2.5-1M brings another significant advance: a much faster inference framework. The Tongyi Qianwen team has fully open-sourced an inference framework based on vLLM that integrates a sparse attention mechanism, allowing Qwen2.5-1M to process million-token inputs 3 to 7 times faster. This means users can run ultra-long-context models far more efficiently, greatly improving both throughput and experience in practical applications.
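For context, running the model through vLLM's standard offline API might look like the sketch below. The specific settings (context length, parallelism degree, chunked prefill) are assumptions for illustration, not the team's published configuration; the open-sourced framework may expose additional sparse-attention options, so consult its documentation for recommended values.

```python
from vllm import LLM, SamplingParams

# Minimal offline-inference sketch using vLLM's standard Python API.
# All numeric settings below are assumed, not taken from the release.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    max_model_len=1_000_000,      # request the full million-token window
    tensor_parallel_size=4,       # assumed multi-GPU setup
    enable_chunked_prefill=True,  # process the long prompt in chunks
)

sampling = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["<a very long prompt goes here>"], sampling)
print(outputs[0].outputs[0].text)
```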