Artificial intelligence (AI) is becoming increasingly capable, especially large language models (LLMs), whose ability to process natural language is astonishing. But behind these intelligent AI brains lies a need for substantial computational power and storage space to support them.
BLOOM, a multilingual model with 176 billion parameters, requires at least 350 GB of space just to store its weights, plus several high-end GPUs to run. This makes it not only costly to operate but also difficult to deploy widely.
To address this problem, researchers have proposed a technique called "quantization." Quantization is like putting the AI brain on a diet: it maps the model's weights and activations to a lower-bit data format, which both shrinks the model and speeds up inference. The process carries a risk, however: it can sacrifice some accuracy.
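As a minimal sketch of the idea, here is symmetric per-tensor int8 quantization, one common scheme (chosen for illustration; it is not necessarily the scheme any particular method in LLMC uses):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map floats onto the int8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the stored integers.
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.5, 0.03, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# w_hat only approximates w: that gap is the accuracy loss quantization risks,
# while storing int8 instead of float32 cuts the weight memory by 4x.
```

Practical toolkits go further, with per-channel scales, activation quantization, and bit widths below 8, but the trade-off is the same: fewer bits, smaller and faster model, more rounding error.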
Facing this challenge, researchers from Beihang University and SenseTime have jointly developed the LLMC toolkit. LLMC is like a personal fitness coach for AI, helping researchers and developers find the most suitable "weight loss plan," making the AI model lighter without affecting its "intelligence level."
The LLMC toolkit has three main features:
Diversity: LLMC offers 16 different quantization methods, like 16 different diet plans prepared for your AI. Whether the model needs weight-only quantization or quantization of both weights and activations, LLMC can meet the need.
Low cost: LLMC is highly resource-efficient and can handle even ultra-large models with minimal hardware. A single 40 GB A100 GPU is enough to calibrate and evaluate a 175-billion-parameter model such as OPT-175B. It's as efficient as training an Olympic champion on a home treadmill!
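A back-of-the-envelope calculation shows why fitting OPT-175B onto one 40 GB GPU is notable (a sketch; the bit widths are chosen for illustration):

```python
# Weights-only memory for a 175-billion-parameter model at various precisions.
# (Ignores activations, KV cache, and any optimizer state.)
params = 175e9
for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
# Even the fp16 weights alone (~350 GB) dwarf a single 40 GB A100, so a tool
# like LLMC presumably processes the model piece by piece rather than loading
# it whole (an assumption about the mechanism, not a claim from this article).
```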
High compatibility: LLMC supports multiple quantization settings and model formats and is compatible with a variety of backends and hardware platforms. It's like a versatile coach: whatever equipment you have, it can devise a suitable training plan.
LLMC's practical application: Making AI smarter and more energy-efficient
The LLMC toolkit provides a comprehensive and fair benchmark for quantizing large language models. It considers three key factors: calibration data, algorithms, and data formats, helping users find the best performance optimization strategy.
In practical applications, LLMC can help researchers and developers more efficiently integrate suitable algorithms and low-bit formats, promoting the compression and popularization of large language models. This means that in the future, we may see more lightweight yet equally powerful AI applications.
The authors of the paper also shared some interesting findings and suggestions:
When selecting calibration data, choose datasets whose vocabulary distribution is close to that of the test data, just as people should choose diets suited to their own condition.
For quantization algorithms, they explored the impact of three main techniques: transformation, clipping, and reconstruction, much like comparing how different exercise regimens affect weight loss.
When choosing between integer and floating-point quantization, they found that floating-point quantization has the edge in handling complex cases, while integer quantization can be better in certain special situations. This is like needing different exercise intensities at different stages of a weight-loss program.
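One way to see intuitively why the two formats behave differently is to enumerate the values each can represent. The sketch below compares int4 against a hypothetical fp4-style layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit); real low-bit float formats vary:

```python
# Uniform int4: 16 evenly spaced levels.
int4_levels = list(range(-8, 8))

def fp4_levels():
    # Hypothetical fp4 layout: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit.
    vals = set()
    for s in (1, -1):
        for e in range(4):
            for m in (0, 1):
                if e == 0:  # subnormal: no implicit leading 1
                    v = s * (m / 2) * 2 ** (1 - 1)
                else:
                    v = s * (1 + m / 2) * 2 ** (e - 1)
                vals.add(v)
    return sorted(vals)

print(fp4_levels())
# Floating-point levels cluster near zero and spread out at large magnitudes,
# a good match for the long-tailed value distributions common in LLMs;
# integer levels are evenly spaced, which can win when values are more uniform.
```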
The arrival of the LLMC toolkit brings a breath of fresh air to the AI field. It not only gives researchers and developers a powerful assistant but also points the way for the future development of AI. With LLMC, we can look forward to more lightweight yet high-performance AI applications, truly bringing AI into our daily lives.
Project address: https://github.com/ModelTC/llmc
Paper address: https://arxiv.org/pdf/2405.06001