Tencent Youtu Lab and a research team from Shanghai Jiao Tong University have jointly introduced a knowledge-enhancement method that opens a new path for optimizing large models. Instead of relying on conventional fine-tuning with large labeled datasets, the method extracts knowledge directly from open-source data. This simplifies the optimization process, and the method outperforms the current state of the art (SOTA) on multiple tasks.

In recent years, large language models (LLMs) have advanced significantly across many domains, yet they still face numerous challenges in practical applications. Traditional fine-tuning requires extensive labeled data and computational resources, which many real-world businesses cannot afford. The open-source community offers a wealth of fine-tuned models and instruction datasets, but using these resources effectively, with only a few labeled samples, to improve a model's task-specific capability and generalization has remained a persistent challenge for the industry.

To address this, the research team proposed a new experimental framework that uses open-source knowledge to enhance model capabilities when only K labeled samples of real-world business data are available (the K-shot setting). The framework makes full use of these limited samples to deliver targeted performance improvements on the task at hand.

The core innovations of this research include:

Efficient Model Selection: Candidate open-source models are evaluated jointly on inference perplexity, task performance, and knowledge richness, so that the most promising existing models can be identified and exploited even when labeled data is scarce.

Knowledge Extraction Optimization: The team designed a method for extracting task-relevant knowledge from open-source data. Its data selection strategy balances similarity to the target task against diversity, supplying the model with complementary information while reducing the risk of overfitting.

Adaptive Model System: An adaptive system built on a mixture-of-experts (MoE) architecture lets several effective models complement one another's knowledge, improving overall performance.
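The article does not give the exact model-selection formula. As a simplified illustration of merging several criteria into one ranking, the sketch below scores each candidate by its accuracy on the K labeled samples minus a weighted log-perplexity penalty. The `Candidate` fields, the scoring rule, the weight `alpha`, and the model names are all hypothetical assumptions; the knowledge-richness criterion is left out because the article does not describe how it is measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical model identifier
    k_shot_acc: float  # accuracy on the K labeled samples, in [0, 1]
    perplexity: float  # mean perplexity on the K-shot inputs (lower is better)


def selection_score(c: Candidate, alpha: float = 0.1) -> float:
    # Assumed scoring rule (not from the paper): reward K-shot accuracy and
    # penalize log-perplexity; alpha trades off the two criteria.
    return c.k_shot_acc - alpha * math.log(c.perplexity)


def select_top_models(candidates: list[Candidate], top_n: int,
                      alpha: float = 0.1) -> list[str]:
    # Rank all candidates by the combined score and keep the best top_n names.
    ranked = sorted(candidates, key=lambda c: selection_score(c, alpha),
                    reverse=True)
    return [c.name for c in ranked[:top_n]]


pool = [
    Candidate("model_a", k_shot_acc=0.62, perplexity=8.0),
    Candidate("model_b", k_shot_acc=0.70, perplexity=12.0),
    Candidate("model_c", k_shot_acc=0.55, perplexity=5.0),
]
print(select_top_models(pool, top_n=2))  # ['model_b', 'model_a']
```

Raising `alpha` shifts the ranking toward lower-perplexity models: with `alpha=0.5`, `model_c` moves to first place. A weighted sum is only one way to combine criteria; a real system might normalize each criterion first or use a learned selector.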
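The article does not specify how the data selection strategy weighs similarity against diversity. One standard way to express that trade-off is Maximal Marginal Relevance (MMR), used below purely as a stand-in rather than as the team's algorithm: at each step it picks the open-source sample most similar to the K-shot data while penalizing similarity to samples already chosen. The toy vectors are assumed to be embeddings from some text encoder, and `lam` is a hypothetical trade-off parameter.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between two embedding vectors (0.0 for a zero vector).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_embs: list[list[float]], pool_embs: list[list[float]],
               k: int, lam: float = 0.7) -> list[int]:
    """Greedy MMR selection of up to k pool indices.

    Relevance  = highest cosine similarity to any K-shot sample embedding.
    Redundancy = highest cosine similarity to an already selected item.
    lam = 1.0 ranks by relevance only; smaller lam favors diversity.
    """
    relevance = [max(cosine(p, q) for q in query_embs) for p in pool_embs]
    selected: list[int] = []
    remaining = list(range(len(pool_embs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((cosine(pool_embs[i], pool_embs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 3-d "embeddings": two K-shot samples and three open-source candidates.
k_shot = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
candidates = [
    [1.0, 0.0, 0.0],  # 0: highly relevant
    [0.9, 0.1, 0.0],  # 1: relevant, but a near-duplicate of 0
    [0.6, 0.0, 0.8],  # 2: less relevant, but covers the second K-shot sample
]
print(mmr_select(k_shot, candidates, k=2, lam=1.0))  # [0, 1]
print(mmr_select(k_shot, candidates, k=2, lam=0.6))  # [0, 2]
```

With `lam=1.0` the near-duplicate sample 1 is chosen; at `lam=0.6` the redundancy penalty favors sample 2, which adds coverage of the second K-shot example. This is the sense in which diversity can reduce overfitting to a narrow slice of the data.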

In experiments, the research team conducted comprehensive evaluations on six open-source datasets. The new method outperformed the baselines and other state-of-the-art methods on all tasks. Visualizations of expert activation patterns further showed that every expert makes an indispensable contribution to the model, which supports the effectiveness of the approach.
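The article does not describe how the adaptive mixture-of-experts system routes inputs. As a generic illustration, the sketch below uses top-k softmax gating, a common MoE routing scheme, with a hypothetical linear gate and toy scalar experts standing in for the selected models. It also counts how often each expert is activated, which is the kind of statistic an expert-activation visualization summarizes. None of these names or parameters come from the paper.

```python
import math
from collections import Counter
from typing import Callable


def softmax(xs: list[float]) -> list[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    # Pick the k experts with the highest gate logits and renormalize their
    # weights with a softmax over just those k logits.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i],
                 reverse=True)[:k]
    return list(zip(top, softmax([gate_logits[i] for i in top])))


def moe_forward(x: float, experts: list[Callable[[float], float]],
                gate: list[tuple[float, float]], k: int,
                counts: Counter) -> float:
    # Hypothetical linear gate: one (weight, bias) pair per expert.
    logits = [w * x + b for w, b in gate]
    out = 0.0
    for idx, weight in route_top_k(logits, k):
        counts[idx] += 1  # record which experts this input activated
        out += weight * experts[idx](x)
    return out


# Toy experts standing in for the selected fine-tuned models.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
gate = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]
counts: Counter = Counter()
for x in [2.0, -2.0, 0.1]:
    moe_forward(x, experts, gate, k=2, counts=counts)
print(sorted(counts.items()))  # [(0, 2), (1, 1), (2, 3)]
```

Per-expert counts like these, aggregated over a dataset, are what an activation heatmap would plot. In a real model the gate would be learned jointly with the experts rather than fixed by hand.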

This research not only demonstrates the considerable potential of open-source knowledge for large models but also offers new insights for the future development of artificial intelligence. By moving beyond the limitations of traditional model optimization, it gives businesses and research institutions a practical way to improve model performance with limited resources.

As the technology matures and is more widely adopted, we have reason to believe it will play a significant role in the intelligent upgrading of many industries. The collaboration between Tencent Youtu and Shanghai Jiao Tong University is both a model of cooperation between academia and industry and an important step toward advancing artificial intelligence technology.

Paper Link: https://www.arxiv.org/pdf/2408.15915