A team from Peking University and the Hong Kong University of Science and Technology has drawn wide attention with a new training method that achieves GPT-4-level performance with an 8B-parameter medical expert model. This is no small feat, and they have also introduced a new concept, the "stability gap," to explain phenomena observed during the continual pre-training of large language models.
First, they found that during continual pre-training, the model's performance on the target domain initially drops before it improves, like a rollercoaster ride. To address this, they propose three strategies. The first is to run multiple epochs of pre-training on an appropriately sized data subset, which recovers performance faster than a single pass over a large dataset. The second is to select only the highest-quality subset of the corpus for this multi-epoch pre-training. The third is to mix in data that approximates the original pre-training distribution, which keeps the model more stable.
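To make the three strategies concrete, here is a minimal sketch in Python of how such a recipe could be wired together. It assumes plain lists of documents, a user-supplied quality_scorer function, and a placeholder model.train_step update; the names and hyperparameters (subset_size, general_mix_ratio, 4 epochs) are illustrative assumptions for this sketch, not the authors' released code.

```python
# Hypothetical sketch of the three strategies; names and numbers are
# illustrative assumptions, not the authors' implementation.
import random

def build_continual_corpus(domain_docs, general_docs, quality_scorer,
                           subset_size=100_000, general_mix_ratio=0.2, seed=0):
    """Build a training corpus following the three strategies."""
    rng = random.Random(seed)

    # Strategy 2: keep only the highest-quality domain documents.
    ranked = sorted(domain_docs, key=quality_scorer, reverse=True)
    subset = ranked[:subset_size]  # Strategy 1: an appropriately sized subset

    # Strategy 3: mix in general-domain text so the training distribution
    # stays close to the original pre-training distribution.
    n_general = int(len(subset) * general_mix_ratio)
    mixed = subset + rng.sample(general_docs, min(n_general, len(general_docs)))
    rng.shuffle(mixed)
    return mixed

def continual_pretrain(model, corpus, epochs=4):
    """Strategy 1 (continued): several epochs over the small subset
    rather than a single pass over the full domain corpus."""
    for _ in range(epochs):
        for doc in corpus:
            model.train_step(doc)  # placeholder for the actual LM update
```

The design point is that the domain subset stays small enough to be repeated for several epochs, while the mixed-in general-domain text keeps the update distribution close to the original pre-training data.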
These strategies deliver significant gains in continual pre-training and instruction tuning in the medical domain, improving effectiveness while reducing computational cost. Moreover, their open-source Llama-3-Physician-8B model is now available on HuggingFace.
The significance of this research goes further. They also found that with these strategies, the OpenLLaMa model needs only 4 epochs of training on 5 billion tokens of high-quality data to significantly outperform all baselines on medical tasks. This not only improves performance but also greatly reduces the computational resources consumed.
Even more impressive, their Llama-3-Physician-8B-instruct model not only outperforms other models of the same size on medical question-answering tasks but also surpasses the closed-source GPT-3.5, approaching the level of GPT-4. This is a revolution in the medical field.
This research not only provides a new training method but also demonstrates the tremendous potential of large language models in the medical field. Through continual pre-training and instruction fine-tuning, we can achieve higher performance in specific domains while reducing computational costs. This is undoubtedly a great boon for the medical industry.
This research also reminds us that training large language models is not something achieved overnight; it requires continuous optimization and adjustment. By introducing the concept of the "stability gap," we can better understand and solve problems in model training, allowing models to play a greater role in specific domains. This is not only a technical breakthrough but also a valuable insight for the medical industry.
Link to the paper: https://arxiv.org/abs/2406.14833
Open Source Link: https://huggingface.co/YiDuo1999/Llama-3-Physician-8B-Instruct