Alibaba Cloud has launched the newly upgraded Qwen2.5-Turbo large language model, featuring an astonishing context length of 1 million tokens. What does that mean in practice? It is roughly equivalent to 10 full-length novels the size of "The Three-Body Problem," 150 hours of audio transcripts, or 30,000 lines of code. "Reading ten novels in one sitting" is no longer a dream!
The Qwen2.5-Turbo model achieved 100% accuracy on the Passkey Retrieval task and outperformed models such as GPT-4 in long-text understanding. On the RULER long-text benchmark, it scored an impressive 93.1, versus 91.6 for GPT-4 and 89.9 for GLM4-9B-1M.
Beyond handling long texts, Qwen2.5-Turbo also excels at short-text processing, with short-text benchmark results comparable to GPT-4o-mini and Qwen2.5-14B-Instruct.
By adopting a sparse attention mechanism, Qwen2.5-Turbo cuts the time to first token for a 1-million-token input from 4.9 minutes to just 68 seconds, a 4.3-fold speedup in inference.
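The reported speedup follows directly from the two latency figures; a quick sanity check:

```python
# Sanity-check the reported speedup: time to first token drops
# from 4.9 minutes to 68 seconds for a 1M-token input.
before_s = 4.9 * 60   # before sparse attention, in seconds (294 s)
after_s = 68          # after sparse attention, in seconds

speedup = before_s / after_s
print(round(speedup, 1))  # → 4.3, matching the article's 4.3x claim
```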
Moreover, processing 1 million tokens costs only 0.3 RMB, meaning 3.6 times as much content can be processed as with GPT-4o-mini at the same cost.
Alibaba Cloud has prepared a series of demonstrations for the Qwen2.5-Turbo model, showcasing its applications in deep understanding of long novels, code assistance, and reading multiple papers at once. For example, after a user uploaded the Chinese "The Three-Body Problem" trilogy, totaling 690,000 tokens, the model successfully summarized the plot of each book in English.
Users can experience the powerful capabilities of the Qwen2.5-Turbo model through the API services of Alibaba Cloud Model Studio, HuggingFace Demo, or ModelScope Demo.
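For readers who want to try the API route, Model Studio exposes an OpenAI-compatible interface. Below is a minimal sketch of building a chat request; the endpoint URL and the model name `qwen-turbo` are assumptions based on the service's public documentation, so verify them against the API docs linked at the end of this article before use.

```python
# Sketch: calling Qwen2.5-Turbo via Alibaba Cloud Model Studio's
# OpenAI-compatible endpoint. Endpoint URL and model name are
# assumptions -- check the official API documentation for current values.
import json

BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"  # assumed endpoint


def build_chat_request(prompt: str, model: str = "qwen-turbo") -> dict:
    """Build an OpenAI-style chat-completion payload for a long-context prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


if __name__ == "__main__":
    payload = build_chat_request("Summarize the plot of each book in the trilogy.")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
    # To actually send the request, set your API key and point any
    # OpenAI-compatible client (e.g. the openai Python SDK) at BASE_URL.
```

Because the interface is OpenAI-compatible, existing tooling built around that API shape should work with only the base URL and key swapped out.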
Alibaba Cloud stated that it will continue to optimize the model in the future to enhance its alignment with human preferences in long sequence tasks, further improve inference efficiency, reduce computation time, and explore the release of larger and more powerful long-context models.
Official introduction: https://qwenlm.github.io/blog/qwen2.5-turbo/
Online demo: https://huggingface.co/spaces/Qwen/Qwen2.5-Turbo-1M-Demo
API documentation: https://help.aliyun.com/zh/model-studio/getting-started/first-api-call-to-qwen