A recent study from the Swiss Federal Institute of Technology in Lausanne (EPFL) compared two leading methods for adapting large language models (LLMs) to follow instructions: In-Context Learning (ICL) and Instruction Fine-Tuning (IFT). The researchers used the MT-Bench benchmark to assess how well the models follow instructions and found that each method has its own strengths and weaknesses, depending on the scenario.

The study revealed that when the number of available training samples is small (e.g., no more than 50), ICL and IFT perform very similarly. This suggests that in situations with limited data, ICL could potentially serve as an alternative to IFT.

However, as task complexity increases, such as in multi-turn dialogue scenarios, the advantages of IFT become more apparent. The researchers believe that models relying on ICL tend to overfit to the style of the individual demonstrations, which leads to poor performance on complex dialogues and sometimes even to results below those of the base models.

The study also examined the URIAL method, which aligns base language models purely in context, using only three demonstrations and a set of instruction-following rules, without updating any model weights. Although URIAL achieved some success, it still lagged behind models trained with IFT. The EPFL researchers improved URIAL's performance by refining the strategy for selecting demonstrations, bringing it closer to fine-tuned models. This highlights the importance of high-quality data for ICL, IFT, and base model training alike.
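To make the idea concrete, here is a minimal sketch of how a prompt built from rules plus three demonstrations might be assembled for a base model. The function name `build_icl_prompt`, the `# Query:` and `# Answer:` markers, and the example texts are illustrative assumptions, not the actual URIAL template or the demonstrations used in the study.

```python
from typing import List, Tuple


def build_icl_prompt(rules: str, demos: List[Tuple[str, str]], query: str) -> str:
    """Join rules, demonstrations, and a new query into a single prompt.

    The "# Query:" / "# Answer:" markers are hypothetical placeholders,
    not the real URIAL format.
    """
    parts = [rules.strip(), ""]
    for instruction, answer in demos:
        parts.append(f"# Query:\n{instruction.strip()}\n")
        parts.append(f"# Answer:\n{answer.strip()}\n")
    # Leave the last answer slot open so the base model continues from here.
    parts.append(f"# Query:\n{query.strip()}\n")
    parts.append("# Answer:\n")
    return "\n".join(parts)


# Made-up rules and demonstrations, for illustration only.
rules = "Below are conversations in which the assistant answers helpfully and politely."
demos = [
    ("Name a prime number.", "2 is a prime number."),
    ("Translate 'bonjour' to English.", "'Bonjour' means 'hello'."),
    ("Give one synonym for 'fast'.", "One synonym is 'quick'."),
]
prompt = build_icl_prompt(rules, demos, "What is the capital of France?")
print(prompt)
```

Because nothing is trained here, the quality of the result depends entirely on which demonstrations are placed in `demos`, which is why the selection strategy matters so much.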

Additionally, the study found that decoding parameters significantly affect model performance. These parameters control how the model chooses each token while generating text, and they are crucial both for base LLMs and for base models prompted with URIAL.

The researchers noted that even base models can follow instructions to some extent when the decoding parameters are set appropriately.
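As an illustration of what decoding parameters do, the sketch below applies two commonly used ones, temperature and nucleus (top-p) filtering, to a toy next-token distribution. The article does not say which parameters or values the study used, so the function name `next_token_distribution`, the logits, and the settings are all assumptions chosen for the example.

```python
import math
from typing import Dict


def next_token_distribution(
    logits: Dict[str, float], temperature: float = 1.0, top_p: float = 1.0
) -> Dict[str, float]:
    """Turn raw logits into sampling probabilities (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Nucleus filtering: keep the most likely tokens until their
    # cumulative probability reaches top_p, then renormalize.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


toy_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0}  # made-up values
default = next_token_distribution(toy_logits)
sharper = next_token_distribution(toy_logits, temperature=0.5)
nucleus = next_token_distribution(toy_logits, top_p=0.9)
print(default, sharper, nucleus, sep="\n")
```

Lowering the temperature concentrates probability on the most likely token, while a smaller `top_p` removes unlikely tokens altogether. Either change can noticeably alter what a model writes.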

The significance of this research lies in showing that In-Context Learning can adapt language models quickly and effectively, especially when training samples are limited. However, for complex tasks such as multi-turn dialogues, Instruction Fine-Tuning remains the superior choice.

As the dataset grows, IFT's performance continues to improve, while ICL's performance plateaus once a certain number of samples is reached. The researchers emphasize that the choice between ICL and IFT depends on several factors, such as available resources, data volume, and the needs of the specific application. Regardless of the method chosen, high-quality data is crucial.