Aliyun's Bailian platform recently announced the launch of Qwen2.5-Turbo, a million-token long-context model developed by the Tongyi Qianwen (Qwen) team. The model supports ultra-long contexts of up to 1 million tokens, roughly equivalent to 1 million English words or 1.5 million Chinese characters.
The new model achieved 100% accuracy on long-text retrieval tasks and scored 93.1 on the long-text benchmark RULER, surpassing GPT-4. On near-real-world long-text tasks such as LV-Eval and LongBench-Chat, Qwen2.5-Turbo outperformed GPT-4o-mini on most dimensions. It also performed strongly on short-text benchmarks, significantly outperforming previous open-source models that support a 1M-token context.
Qwen2.5-Turbo targets a wide range of applications, including deep understanding of long novels, large-scale code assistance, and reading multiple papers at once: a single context can hold 10 long novels, 150 hours of speech transcripts, or 30,000 lines of code. On the inference side, the Tongyi Qianwen team used a sparse attention mechanism to cut the attention computation by a factor of about 12.5, reducing the response time for a 1M-token context from 4.9 minutes to 68 seconds, a speedup of roughly 4.3x.
Aliyun's Bailian platform lets all users call the Qwen2.5-Turbo API directly and is offering a limited-time gift of 10 million free tokens. After that, usage costs only 0.3 yuan per million tokens.
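As a sketch of what a direct API call looks like, the snippet below builds an OpenAI-style chat-completions request body. The endpoint URL and model name here are assumptions based on the platform's OpenAI-compatible mode; verify both against the Bailian documentation before use, and note that actually sending the request requires a platform API key.

```python
import json

# Assumed OpenAI-compatible endpoint and model name on the Bailian platform;
# check the official documentation, these identifiers may differ.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
MODEL = "qwen-turbo"

def build_chat_request(user_text: str) -> dict:
    """Construct the JSON body for a POST to {BASE_URL}/chat/completions."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
    }

body = build_chat_request("Summarize the plot of this novel in three sentences.")
print(json.dumps(body, ensure_ascii=False)[:80])
# Sending the request would add an auth header, e.g.
#   Authorization: Bearer <your Bailian API key>
```

Because the endpoint follows the OpenAI wire format, the same body also works with OpenAI-compatible client SDKs pointed at the assumed base URL.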
Currently, Aliyun's Bailian platform hosts over 200 mainstream domestic and international open-source and closed-source large models, including Qwen, Llama, and ChatGLM, supporting direct invocation, fine-tuning, and building RAG applications.