In the rapidly evolving field of Large Language Models (LLMs), the costs of model training and inference have become a focal point of research and application. Recently, the Tencent Hunyuan team released a study that delves into the scaling laws of low-bit floating-point quantization training, i.e., the principles governing how model performance scales when training is carried out at reduced floating-point precision. The core of the research is to explore how far computational and storage costs can be cut by lowering the model's numerical precision without sacrificing performance.

The research team conducted 366 floating-point quantization training experiments spanning different parameter scales and precision settings. They systematically analyzed the factors that affect training outcomes, including model size (N), training data volume (D), exponent bits (E), mantissa bits (M), and quantization block granularity (B). From these experiments, the researchers derived a unified Scaling Law that describes how training data and model parameters should be allocated to achieve the best results at each precision level.
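
To make the roles of these variables concrete, the sketch below shows a Chinchilla-style loss model extended with a precision-dependent penalty term. The functional form and every coefficient value are illustrative assumptions for exposition, not the law or coefficients fitted in the paper.

```python
import math

def predicted_loss(N, D, E, M, B,
                   n=406.4, d=410.7, alpha=0.34, beta=0.28, eps=1.69,
                   gamma=0.05, delta=0.5, nu=0.5):
    """Hypothetical Chinchilla-style loss with a low-precision penalty.

    N: model parameters, D: training tokens, E/M: exponent/mantissa bits,
    B: quantization block size. All coefficients are placeholders, not
    the values fitted in the paper.
    """
    # Classic parameter- and data-scaling terms plus an irreducible loss.
    base = n / N**alpha + d / D**beta + eps
    # Assumed penalty: grows with coarser blocks and more data, shrinks
    # as exponent/mantissa bits or model size increase.
    penalty = gamma * math.log2(B) * D**beta / (N**alpha * (E + 1)**delta * (M + 1)**nu)
    return base + penalty

# Example: a 1B-parameter model trained on 100B tokens in an FP8-like E4M3 format.
print(predicted_loss(N=1e9, D=1e11, E=4, M=3, B=128))
```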

Crucially, the study points out that low-precision floating-point quantization training exhibits a "limit effect": for a given model and precision there is a critical amount of training data at which performance peaks, and training on more data beyond that point can actually degrade the model. The research also finds that the precision offering the best theoretical cost-performance trade-off lies between 4 and 8 bits, a result of significant practical guidance for developing efficient LLMs.
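
Under the same illustrative loss form sketched above, this "limit effect" can be made explicit: the two data-dependent terms trade off, so setting the derivative with respect to D to zero gives a closed-form critical data size. The derivation and all coefficients below are assumptions for illustration; only the qualitative behavior (lower precision saturates at less data) mirrors the paper's claim.

```python
import math

def critical_data_size(N, E, M, B,
                       d=410.7, beta=0.28, alpha=0.34,
                       gamma=0.05, delta=0.5, nu=0.5):
    """Token count at which the illustrative loss above stops improving.

    With L(D) = d * D**-beta + k * D**beta + const, solving dL/dD = 0
    yields D_crit = (d / k) ** (1 / (2 * beta)). Coefficients are
    placeholders, not values fitted in the paper.
    """
    k = gamma * math.log2(B) / (N**alpha * (E + 1)**delta * (M + 1)**nu)
    return (d / k) ** (1 / (2 * beta))

# Example: critical token counts for a 1B-parameter model at several precisions.
for E, M in [(1, 2), (4, 3), (5, 10)]:   # roughly FP4-, FP8-, FP16-like formats
    print(f"E{E}M{M}: ~{critical_data_size(1e9, E, M, B=128):.2e} tokens")
```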

This study not only fills a gap in the research on floating-point quantization training but also offers a reference for hardware manufacturers, helping them optimize floating-point compute capabilities at different precision levels. Ultimately, the work gives a clear direction for training large models in practice, showing that efficient training results can still be achieved even with limited resources.

Paper link: https://arxiv.org/pdf/2501.02423