Recently, Meta AI researcher Thomas Scialom shared insights into their latest project, Llama3, in an interview. He candidly pointed out that text on the internet varies widely in quality, and that training on such data is a waste of resources. For this reason, Llama3's post-training does not rely on any human-written answers; it is built entirely on synthetic data generated by Llama2.
Discussing Llama3's training details, Scialom elaborated on how synthetic data was applied across different domains. For code generation, they used three methods to produce synthetic data: feedback from code execution, translation between programming languages, and back-translation of documentation. For mathematical reasoning, they drew on the approach of the "Let's Verify Step by Step" research for data generation. For multilingual ability, Llama3 continued pre-training on a mix of 90% multilingual tokens, and high-quality human annotations were collected, which is particularly important for multilingual processing.
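The interview gives no implementation details, but the execution-feedback idea can be pictured as a filter: pair a model-generated solution with tests, run them, and keep only samples that execute cleanly. Below is a minimal sketch of that pattern in Python; the function name and the sample are hypothetical, not Meta's actual pipeline.

```python
import os
import subprocess
import sys
import tempfile

def passes_execution_feedback(candidate: str, tests: str, timeout: float = 5.0) -> bool:
    """Run a model-generated solution against its tests in a subprocess.

    Returns True only if the combined program exits cleanly, so failing
    samples can be dropped (or sent back to the model for another attempt)
    before the pair enters the synthetic training set.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate + "\n\n" + tests + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # hung programs count as failures
    finally:
        os.unlink(path)

# Hypothetical synthetic sample: keep it only if its own tests pass.
sample = {
    "completion": "def reverse(s):\n    return s[::-1]",
    "tests": "assert reverse('abc') == 'cba'",
}
if passes_execution_feedback(sample["completion"], sample["tests"]):
    print("keep sample for the synthetic training set")
```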
Long-context processing is another focus of Llama3, which relies on synthetic data for long-text question answering, long-document summarization, and reasoning over codebases. For tool use, Llama3 was trained to call Brave Search, Wolfram Alpha, and a Python interpreter, supporting single, nested, parallel, and multi-turn function calls.
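The interview does not specify Llama3's call format, but the call patterns it names can be illustrated with a small dispatch loop. The sketch below is purely hypothetical: the tool stubs echo the tools mentioned above, and the JSON shape is an assumption, not Meta's actual protocol.

```python
import json

# Hypothetical stubs standing in for the tools named in the interview.
def brave_search(query: str) -> str:
    return f"<search results for {query!r}>"

def wolfram_alpha(expression: str) -> str:
    return f"<computed {expression!r}>"

TOOLS = {"brave_search": brave_search, "wolfram_alpha": wolfram_alpha}

def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Execute one batch of model-emitted tool calls.

    A batch with several entries is a *parallel* call; feeding results
    back and letting the model emit another batch gives *multi-turn*
    calls, and building one call's arguments from a previous call's
    result gives *nested* calls.
    """
    results = []
    for call in calls:
        fn = TOOLS[call["name"]]
        results.append({"name": call["name"], "result": fn(**call["arguments"])})
    return results

# One hypothetical parallel call, emitted by the model as JSON:
emitted = json.loads(
    '[{"name": "brave_search", "arguments": {"query": "Llama3 release date"}},'
    ' {"name": "wolfram_alpha", "arguments": {"expression": "2^10"}}]'
)
print(run_tool_calls(emitted))
```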
Scialom also stressed the importance of Reinforcement Learning from Human Feedback (RLHF) in Llama3's training. They made extensive use of human preference data to train the model, emphasizing that humans are far better at choosing between outputs (such as preferring one poem over another) than at creating answers from scratch.
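A preference record of the kind he describes is simple: a prompt plus two model outputs, one chosen and one rejected by an annotator. The sketch below shows that shape along with the Bradley-Terry style loss commonly used to train reward models on such pairs; the interview does not confirm Llama3's exact objective, and the example data is invented.

```python
import math
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One human-preference record as used in RLHF: annotators do not
    write answers, they only pick between two model outputs."""
    prompt: str
    chosen: str    # output the annotator preferred
    rejected: str  # output the annotator ranked lower

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): standard Bradley-Terry loss
    for reward models, shown here for illustration (an assumption, not
    a confirmed detail of Llama3's training)."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Hypothetical pair; real Llama3 preference data is not public.
pair = PreferencePair(
    prompt="Write a short poem about the sea.",
    chosen="Salt wind, grey swell, the gulls lean on the air...",
    rejected="The sea is big. The sea is wet.",
)
print(preference_loss(reward_chosen=1.2, reward_rejected=0.3))
```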
Meta began training Llama4 in June, and Scialom revealed that a major focus of Llama4 will be agents. He also mentioned a multimodal version of Llama with more parameters, planned for release in the near future.
Scialom's interview sheds light on Meta AI's latest advances and future directions in artificial intelligence, particularly its use of synthetic data and human feedback to improve model performance.