2024-07-29 11:34:37 · AIbase · 10.6k
Llama 4 Training Launch: Meta Scientists Reveal the Story Behind Llama 3.1's Training
Meta scientist Thomas Scialom reveals the story behind the development of Llama 3.1, whose 405B-parameter scale was designed to compete with GPT-4. By increasing the number of training tokens rather than changing the architecture, Llama 3.1 strikes an optimized balance between model size and total training data, yielding significant gains in both the depth and breadth of its knowledge. On data selection, Scialom favors synthetic data over publicly available internet text. For evaluation and improvement, Llama 3.1 relies on reward models and a diverse set of benchmarks.