On March 3, 2025, Tongyi Lingma announced the launch of its latest reasoning model, Qwen2.5-Max, which gives developers strong programming and mathematical capabilities. Qwen2.5-Max was pre-trained on more than 20 trillion tokens and refined with a carefully designed post-training scheme, and it delivers exceptional performance.
Qwen2.5-Max excelled across numerous benchmarks. For example, it outperformed other leading models, including DeepSeek V3, GPT-4o, and Claude-3.5-Sonnet, on Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond. It also achieved highly competitive scores on evaluations such as MMLU-Pro.
In the base-model comparison, Qwen2.5-Max was benchmarked comprehensively against models such as DeepSeek V3, Llama-3.1-405B, and Qwen2.5-72B. The results showed that the Tongyi Qianwen (Qwen) base model held significant advantages on most benchmarks.
Notably, on the latest blind-test leaderboard published by the third-party evaluation platform Chatbot Arena, Qwen2.5-Max surpassed models such as DeepSeek-V3, OpenAI o1-mini, and Claude-3.5-Sonnet, scoring 1332 and ranking seventh globally. It placed first in the mathematics and programming categories and second in the hard prompts category. Chatbot Arena publicly commended Alibaba's Qwen2.5-Max for its strong performance across multiple domains, particularly its technical strength in programming, mathematics, and hard prompts.
Currently, Qwen2.5-Max is integrated into Tongyi Lingma, and users can experience its powerful programming capabilities by downloading the Tongyi Lingma plugin.