On February 27, 2025, Tencent officially released Hunyuan Turbo S, a new-generation fast-thinking model that marks a significant advance in the response speed and performance of large language models. Unlike slow-thinking models such as DeepSeek R1 and Hunyuan T1, Hunyuan Turbo S delivers "instant replies," doubling its word output speed and cutting first-word latency by 44%. The model also performs well in areas such as knowledge, mathematics, and creative writing, offering a new approach to fast response in large models.

The design of Hunyuan Turbo S is inspired by human fast thinking: people rely on intuition for 90% to 95% of their daily decisions. Combining this fast-thinking mode with the rational analysis of slow thinking gives the model more intelligent and efficient problem-solving capabilities. By fusing long and short reasoning chains, the model keeps its fast response on humanities questions while significantly improving its scientific reasoning, yielding a substantial overall performance gain. On several widely used public benchmarks, Hunyuan Turbo S performs comparably to leading models such as DeepSeek V3, GPT-4o, and Claude.


Architecturally, Hunyuan Turbo S adopts a hybrid Mamba-Transformer design that reduces the computational complexity and KV-cache footprint of the traditional Transformer architecture, significantly lowering training and inference costs. The hybrid design addresses the high cost of training and serving traditional large models on long texts: it uses Mamba's efficiency on long sequences while retaining the Transformer's ability to capture complex context. This marks the industry's first successful application of the Mamba architecture to an ultra-large MoE model without loss of performance.
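To make the KV-cache point concrete, the following is a rough sketch of how attention-layer cache size scales with sequence length, and why replacing some attention layers with Mamba layers shrinks it. All layer counts and dimensions here are hypothetical and chosen only for illustration; the article does not disclose the actual configuration of Hunyuan Turbo S. The sketch also ignores the small fixed-size state that each Mamba layer keeps, which does not grow with sequence length.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch=1):
    """Estimate KV-cache size for the attention layers of a model.

    Each attention layer stores one key and one value vector per token
    per KV head, hence the leading factor of 2.
    All arguments are illustrative assumptions, not published figures.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical pure-Transformer model: all 32 layers use attention.
pure = kv_cache_bytes(attn_layers=32, kv_heads=8, head_dim=128, seq_len=1000)

# Hypothetical hybrid model: only 8 of the 32 layers use attention;
# the remaining layers are Mamba layers with no per-token cache.
hybrid = kv_cache_bytes(attn_layers=8, kv_heads=8, head_dim=128, seq_len=1000)

print(f"pure Transformer: {pure / 1e6:.1f} MB")
print(f"hybrid:           {hybrid / 1e6:.1f} MB")
print(f"reduction factor: {pure / hybrid:.1f}x")
```

Under these assumed numbers the cache shrinks in proportion to the share of layers that still use attention, and the saving grows in absolute terms as the sequence gets longer.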

As the core foundation of Tencent's Hunyuan series, Hunyuan Turbo S will serve as the base for future derivative models specializing in reasoning, long texts, and code. Building on Turbo S, Tencent has also launched T1, a reasoning model with deep thinking capabilities. T1 is fully available on Tencent Yuanbao and will soon be accessible via API.

Developers and enterprise users can now access Hunyuan Turbo S via API on the Tencent Cloud website, with a one-week free trial. Pricing is 0.8 yuan per million input tokens and 2 yuan per million output tokens, a significant reduction from the previous-generation Hunyuan Turbo model. Hunyuan Turbo S is also being gradually rolled out on Tencent Yuanbao, where users can try it by selecting the "Hunyuan" model and disabling the deep thinking function.
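As a quick illustration of the pricing above, the following sketch estimates the cost of a workload from its token counts. The function and constant names are made up for this example, and it assumes billing is strictly proportional to token count, with no free quota, minimum charge, or rounding rules applied.

```python
# Prices quoted in the article, in yuan per million tokens.
INPUT_YUAN_PER_MILLION = 0.8
OUTPUT_YUAN_PER_MILLION = 2.0


def estimate_cost_yuan(input_tokens, output_tokens):
    """Estimate cost in yuan, assuming simple per-token proportional billing."""
    total = (input_tokens * INPUT_YUAN_PER_MILLION
             + output_tokens * OUTPUT_YUAN_PER_MILLION)
    return total / 1_000_000


# Example: 3 million input tokens and 1 million output tokens.
print(f"{estimate_cost_yuan(3_000_000, 1_000_000):.2f} yuan")
```

With these prices, output tokens cost 2.5 times as much as input tokens, so workloads that generate long answers are dominated by the output charge.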

Tencent Hunyuan Turbo Model API Free Trial Application: https://cloud.tencent.com/apply/p/i2zophus2x8