With the increasing popularity of large language models (LLMs), efficient deployment in resource-constrained environments has become a crucial challenge. To address this, DistilQwen2.5, a series of lightweight LLMs based on Qwen2.5, has been officially released. The models employ an innovative two-stage distillation framework that combines optimized training data with parameter fusion techniques, preserving model performance while significantly reducing computational resource consumption.
The success of DistilQwen2.5 stems from its knowledge distillation pipeline. The process begins with a large pool of high-quality instruction data drawn from open-source and proprietary synthetic datasets. To ensure diversity, the research team used Qwen-max to augment the Chinese and English data, balancing the mix of tasks and languages. The models are then trained with "black-box distillation": only the teacher model's outputs are used to expand, select, and rewrite instructions. This approach improves data quality and strengthens the model's multi-task capabilities.
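The black-box stage can be pictured as a simple data pipeline: prompt the teacher to produce instruction variants and responses, then filter the pairs. The sketch below is illustrative only; the helper names, prompts, and the toy stand-in teacher are assumptions, and a real setup would call an actual teacher model such as Qwen-max with the team's (unpublished) prompts and selection criteria.

```python
# Hedged sketch of a black-box distillation data pipeline.
# All function names and prompts are hypothetical, not DistilQwen2.5's exact recipe.

def expand_instruction(teacher, instruction):
    """Ask the teacher model to produce a variant of an instruction."""
    return teacher(f"Rewrite the following instruction differently: {instruction}")

def select_pairs(pairs, min_len=5):
    """Keep (instruction, response) pairs passing a simple quality filter.
    A real pipeline would score responses with the teacher or a reward model."""
    return [(i, r) for i, r in pairs if len(r.split()) >= min_len]

def build_distillation_set(teacher, seed_instructions):
    """Expand seeds with the teacher, collect teacher responses, then filter."""
    dataset = []
    for inst in seed_instructions:
        for variant in (inst, expand_instruction(teacher, inst)):
            response = teacher(variant)  # black-box: only outputs are used
            dataset.append((variant, response))
    return select_pairs(dataset)

# Toy stand-in teacher for illustration (a real setup would query Qwen-max).
def toy_teacher(prompt):
    return f"Answer to: {prompt} with several explanatory words appended."

data = build_distillation_set(toy_teacher, ["Summarize photosynthesis."])
```

The key property of the black-box setting is visible in the sketch: the pipeline touches only the teacher's text outputs, never its weights or internal activations.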
Notably, DistilQwen2.5 also introduces a white-box distillation technique. By mimicking the teacher model's intermediate representations (its output distributions), the student model acquires knowledge more efficiently. The approach sidesteps the excessive GPU memory consumption, slow storage and retrieval of teacher outputs, and other problems associated with traditional white-box distillation.
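A common way to train a student against a teacher's distributions is a KL-divergence loss over next-token probabilities. The sketch below shows that idea in minimal form; the function names and the temperature value are assumptions for illustration, not DistilQwen2.5's actual loss.

```python
# Hedged sketch of a white-box distillation loss: KL divergence between the
# teacher's and student's next-token distributions at one position.
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for a single token position; a training loop
    would average this over all positions and sequences in a batch."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; diverging logits give a positive loss.
same = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the loss needs the teacher's full per-token distributions rather than just its sampled text, storing or recomputing them efficiently is exactly the bottleneck the article says DistilQwen2.5's white-box technique addresses.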
Tested on several authoritative instruction-following benchmarks, DistilQwen2.5 delivers strong results, excelling in particular on AlpacaEval 2.0 and MT-Bench. This marks a new stage in the development of lightweight LLMs: computational costs drop significantly while performance is maintained, further promoting the application of AI technology across various scenarios.
The open-source release of DistilQwen2.5 will also benefit developers, giving them easier access to a powerful tool and contributing to the broader adoption of AI technology.