In the rapidly evolving field of Large Language Models (LLMs), the costs of model training and inference have become a focal point of research and application. Recently, the Tencent Hunyuan team released a study that delves into the scaling laws of low-bit floating-point quantization training, i.e., the principles governing how model performance scales when training is carried out at reduced floating-point precision. The core of the research is to explore how far computational and storage costs can be cut by lowering the model's numerical precision without sacrificing performance.

The research team conducted 366 floating-point quantization training experiments spanning different parameter scales and precision settings. They systematically analyzed the factors that affect training outcomes, including model size (N), training data volume (D), exponent bits (E), mantissa bits (M), and quantization block granularity (B). From these experiments, the researchers derived a unified Scaling Law that describes how training data and model parameters should be allocated to achieve the best results at each precision level.
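
To make the roles of these variables concrete, the sketch below shows a Chinchilla-style loss model extended with a precision-dependent penalty term. The functional form and every coefficient value are illustrative assumptions for exposition, not the law or coefficients fitted in the paper.

```python
import math

def predicted_loss(N, D, E, M, B,
                   n=406.4, d=410.7, alpha=0.34, beta=0.28, eps=1.69,
                   gamma=0.05, delta=0.5, nu=0.5):
    """Hypothetical Chinchilla-style loss with a low-precision penalty.

    N: model parameters, D: training tokens, E/M: exponent/mantissa bits,
    B: quantization block size. All coefficients are placeholders, not
    the values fitted in the paper.
    """
    # Classic parameter- and data-scaling terms plus an irreducible loss.
    base = n / N**alpha + d / D**beta + eps
    # Assumed penalty: grows with coarser blocks and more data, shrinks
    # as exponent/mantissa bits or model size increase.
    penalty = gamma * math.log2(B) * D**beta / (N**alpha * (E + 1)**delta * (M + 1)**nu)
    return base + penalty

# Example: a 1B-parameter model trained on 100B tokens in an FP8-like E4M3 format.
print(predicted_loss(N=1e9, D=1e11, E=4, M=3, B=128))
```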

Crucially, the study points out that low-precision floating-point quantization training exhibits a "limit effect": for a given model and precision there is a critical amount of training data at which performance peaks, and training on more data beyond that point can actually degrade the model. The research also finds that the precision offering the best theoretical cost-performance trade-off lies between 4 and 8 bits, a result of significant practical guidance for developing efficient LLMs.
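
Under the same illustrative loss form sketched above, this "limit effect" can be made explicit: the two data-dependent terms trade off, so setting the derivative with respect to D to zero gives a closed-form critical data size. The derivation and all coefficients below are assumptions for illustration; only the qualitative behavior (lower precision saturates at less data) mirrors the paper's claim.

```python
import math

def critical_data_size(N, E, M, B,
                       d=410.7, beta=0.28, alpha=0.34,
                       gamma=0.05, delta=0.5, nu=0.5):
    """Token count at which the illustrative loss above stops improving.

    With L(D) = d * D**-beta + k * D**beta + const, solving dL/dD = 0
    yields D_crit = (d / k) ** (1 / (2 * beta)). Coefficients are
    placeholders, not values fitted in the paper.
    """
    k = gamma * math.log2(B) / (N**alpha * (E + 1)**delta * (M + 1)**nu)
    return (d / k) ** (1 / (2 * beta))

# Example: critical token counts for a 1B-parameter model at several precisions.
for E, M in [(1, 2), (4, 3), (5, 10)]:   # roughly FP4-, FP8-, FP16-like formats
    print(f"E{E}M{M}: ~{critical_data_size(1e9, E, M, B=128):.2e} tokens")
```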

This study not only fills a gap in the research on floating-point quantization training but also offers a reference for hardware manufacturers, helping them optimize floating-point compute capabilities at different precision levels. Ultimately, the work gives a clear direction for training large models in practice, showing that efficient training results can still be achieved even with limited resources.

Paper link: https://arxiv.org/pdf/2501.02423