Researchers at Microsoft have introduced Auto Evol-Instruct, an AI framework that automatically evolves instruction datasets without human intervention.

A central goal in developing large language models (LLMs) is improving their ability to follow detailed instructions. One line of research focuses on improving the instruction datasets used to train LLMs, so that the resulting models perform better and adapt more readily to new tasks.


Earlier approaches such as Evol-Instruct rely on evolution rules written by human experts. Designing these rules is costly and time-consuming, and they must be redesigned whenever the method is adapted to a new task. Auto Evol-Instruct automates this process instead: an LLM first analyzes the input instructions and designs initial evolution rules on its own. An optimizer LLM then refines these rules iteratively, identifying and fixing problems that arise during evolution so that the final evolved instructions are both more complex and stable.

Because LLMs, rather than human experts, analyze the input instructions and design the evolution methods, the framework can increase the complexity and diversity of a dataset without task-specific manual effort.
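The optimization loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the LLM roles (evolving an instruction, analyzing an evolution trajectory for issues, and proposing a revised evolving method) are passed in as plain callables, and the names `auto_evolve_method`, `failure_rate`, and the `toy_*` stand-ins are hypothetical. The toy stand-ins treat an instruction that comes back unchanged as a failed evolution, and candidate methods are scored by their failure rate on a small development set.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures for the LLM roles; a real system would wrap model calls.
Evolve = Callable[[str, str], str]               # (method, instruction) -> evolved instruction
Analyze = Callable[[str, List[str]], List[str]]  # (method, trajectory) -> detected issues
Optimize = Callable[[str, List[str], int], str]  # (method, issues, sample index) -> revised method
IsFailure = Callable[[str, str], bool]           # (before, after) -> True if evolution failed


def failure_rate(method: str, dev_set: Sequence[str], evolve: Evolve, is_failure: IsFailure) -> float:
    """Fraction of dev instructions whose evolution step is judged a failure."""
    failures = sum(is_failure(x, evolve(method, x)) for x in dev_set)
    return failures / len(dev_set)


def auto_evolve_method(method: str, batches: Sequence[Sequence[str]], dev_set: Sequence[str],
                       evolve: Evolve, analyze: Analyze, optimize: Optimize,
                       is_failure: IsFailure, rounds: int = 3, num_candidates: int = 2) -> str:
    for batch in batches:
        issues: List[str] = []
        for instruction in batch:
            # 1. Evolve the instruction several times and record the trajectory.
            trajectory = [instruction]
            for _ in range(rounds):
                trajectory.append(evolve(method, trajectory[-1]))
            # 2. The optimizer role inspects the trajectory for problems.
            issues.extend(analyze(method, trajectory))
        if not issues:
            continue
        # 3. Sample revised methods and keep the one that fails least on the dev set.
        #    Assumption of this sketch: the current method stays in the pool (and wins ties).
        pool = [method] + [optimize(method, issues, k) for k in range(num_candidates)]
        method = min(pool, key=lambda m: failure_rate(m, dev_set, evolve, is_failure))
    return method


# --- Toy stand-ins (hypothetical) so the sketch runs without any model ---
def toy_evolve(method: str, instruction: str) -> str:
    # Only a method that mentions adding a constraint actually changes the instruction.
    return instruction + " Explain your reasoning." if "add a constraint" in method else instruction


def toy_is_failure(before: str, after: str) -> bool:
    return before == after  # an unchanged instruction counts as a failed evolution


def toy_analyze(method: str, trajectory: List[str]) -> List[str]:
    return [f"step {i} left the instruction unchanged"
            for i in range(1, len(trajectory)) if trajectory[i] == trajectory[i - 1]]


def toy_optimize(method: str, issues: List[str], k: int) -> str:
    # Sample 0 adds a useful rule; sample 1 only rephrases (a weaker candidate).
    return method + "; add a constraint" if k == 0 else method + " (rephrased)"


best = auto_evolve_method(
    "Rewrite the instruction to make it more complex",
    batches=[["Sort a list of numbers.", "Summarize this paragraph."]],
    dev_set=["Translate 'hello' to French."],
    evolve=toy_evolve, analyze=toy_analyze, optimize=toy_optimize, is_failure=toy_is_failure,
)
print(best)  # -> Rewrite the instruction to make it more complex; add a constraint
```

In this toy run the initial method never changes an instruction, so the analyzer reports issues, and the candidate that adds a constraint is selected because its dev-set failure rate drops to zero.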

Auto Evol-Instruct also performs well across several benchmarks. Mixtral-8x7B fine-tuned on only 10K evolved ShareGPT samples scored 8.09 on MT-Bench and 91.4 on AlpacaEval, surpassing GPT-3.5-Turbo and WizardLM-70B and matching Claude 2.0.

For mathematical reasoning, fine-tuning on only 7K evolved GSM8K training samples yielded a score of 82.49 on GSM8K. For code generation, DeepSeek-Coder-Base-33B fine-tuned on 20K evolved Code Alpaca samples reached 77.4 on HumanEval, outperforming the competing models it was compared against.


Taken together, the results on MT-Bench, AlpacaEval, GSM8K, and HumanEval show the framework's potential to improve instruction following, mathematical reasoning, and code generation.

Paper link: https://arxiv.org/abs/2406.00770

Key Points:

🔍 Auto Evol-Instruct is a fully automated AI framework that analyzes and evolves instruction datasets without human intervention.

🚀 By optimizing its evolution methods, the framework increases the complexity and diversity of datasets, improving the performance and adaptability of LLMs across tasks.

💡 The results of Auto Evol-Instruct indicate that automating the evolution of instruction datasets is effective.