In recent years, training large language models (LLMs) has become increasingly expensive and complex, with only a few large tech companies possessing the necessary computational resources. However, Google has recently introduced a new method called SALT (Small Model-Assisted Large Model Training), which could revolutionize the landscape of AI training.


According to a recent research paper from Google Research and DeepMind titled "A Little Help Goes a Long Way: Efficient LLM Training Through Small Language Models," SALT introduces a two-stage training process that is both more efficient and more practical than conventional pretraining, changing how large models have traditionally been trained.

The first stage of SALT is knowledge distillation. In this phase, the small language model (SLM) acts as a teacher, transferring its understanding to the larger model. The small model shares its learned knowledge through "soft labels," helping the large model grasp foundational concepts early in training. This stage is most effective on the "easy" regions of the data where the small model makes confident, reliable predictions, as sketched below.
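The paper's exact loss formulation is not reproduced in this article, but the idea of soft-label distillation is standard. The following is a minimal PyTorch-style sketch, assuming Hugging Face-style models that expose a `.logits` attribute; the function names (`distillation_loss`, `stage1_step`) and the temperature value are illustrative choices, not the paper's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the small teacher's softened distribution
    and the large student's predictions (soft-label distillation)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

def stage1_step(large_model, small_model, batch, optimizer):
    """One stage-1 update: the frozen small LM teaches the large LM."""
    with torch.no_grad():
        teacher_logits = small_model(batch["input_ids"]).logits
    student_logits = large_model(batch["input_ids"]).logits
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice this distillation term is typically blended with the model's own next-token cross-entropy loss rather than used alone; the weighting between the two is what the second stage gradually shifts.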

The second stage is self-supervised learning. During this phase, the large model begins to learn independently, focusing on mastering more complex patterns and harder tasks. The hand-off requires carefully designed schedules, including linear decay and linear proportional decay of the distillation weight, so that the large model transitions smoothly while gradually reducing its dependence on the small model (see the sketch below).
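As a rough illustration of how such a hand-off can be scheduled, the sketch below implements only the simple linear-decay case; the linear proportional variant and the exact transition length used in the paper are not reproduced here, and the helper names are hypothetical.

```python
def distillation_weight(step, transition_steps):
    """Linearly decay the weight on the teacher's soft-label loss from 1 to 0
    over the transition window; after that the large model trains on its own."""
    if step >= transition_steps:
        return 0.0
    return 1.0 - step / transition_steps

def combined_loss(self_supervised_loss, kd_loss, step, transition_steps):
    """Blend the two objectives so the hand-off from teacher guidance to
    independent self-supervised training is gradual rather than abrupt."""
    w = distillation_weight(step, transition_steps)
    return w * kd_loss + (1.0 - w) * self_supervised_loss
```

The design intuition is that an abrupt cut-off would waste the structure the student inherited from the teacher, while decaying the weight too slowly would let the small model's weaker predictions hold the large model back on harder examples.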

In their experiments, the Google researchers found that training a 2.8-billion-parameter model with the help of a 1.5-billion-parameter small model on the Stack dataset cut training time by 28%. After fine-tuning, the large model's accuracy on mathematical problems improved from 31.84% to 34.87%, and its reading comprehension accuracy rose from 63.7% to 67%. The approach therefore not only improves training efficiency but also delivers measurable performance gains.

The emergence of SALT is expected to lower the barriers to AI development, allowing many smaller research institutions and companies that were previously limited by resources to participate in AI model development. Research and development opportunities will become more widespread, potentially leading to more unique and specialized AI solutions, driving innovation and application in related fields.

Key Points:

🌟 The SALT method can reduce the training time of large models by 28%, significantly lowering computational costs.

📈 Using small models for knowledge distillation can significantly enhance the performance of large models on complex tasks.

🔍 SALT's innovation may lower the barriers to AI development, enabling more small institutions to engage in AI research.