On the Latent Space podcast, Meta scientist Thomas Scialom shared behind-the-scenes details of how Llama 3.1 was built and previewed the upcoming Llama 4.

The birth of Llama 3.1 reflects a careful balance between parameter scale, training time, and hardware constraints. Its 405B parameter count was not an arbitrary choice but a deliberate bid by Meta to compete head-on with GPT-4o. Although hardware requirements mean Llama 3.1 cannot run on every household computer, Meta is betting that the open-source community will find ways to put it to work.

During the development of Llama 3.1, Scialom and his team revisited scaling laws. They concluded that while model size matters, the total number of training tokens matters at least as much, so Llama 3.1 was trained on more tokens even though that demanded more compute.
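
As a rough illustration of that trade-off, the sketch below applies the widely used approximation that training compute is about six FLOPs per parameter per token (C ≈ 6ND). The 405B-parameter and 15T-token figures are the ones publicly cited for Llama 3.1; the per-GPU throughput is an assumed number for illustration only, not Meta's.

```python
# Back-of-envelope training-compute sketch using the common approximation
# C ~= 6 * N * D (FLOPs ~= 6 * parameters * training tokens).
# Parameter and token counts are the publicly stated Llama 3.1 figures;
# the per-GPU throughput below is an illustrative assumption.

N = 405e9          # parameters
D = 15e12          # training tokens
flops = 6 * N * D  # total training FLOPs under the C ~= 6ND rule of thumb

gpu_flops_per_s = 400e12  # assumed sustained throughput per GPU (hypothetical)
gpu_seconds = flops / gpu_flops_per_s
gpu_days = gpu_seconds / 86400

print(f"Estimated training compute: {flops:.2e} FLOPs")
print(f"Roughly {gpu_days:,.0f} GPU-days at the assumed throughput")
```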

Llama 3.1 did not make revolutionary changes to its architecture; instead, Meta invested heavily in the scale and quality of the training data. Roughly 15T tokens of training data give Llama 3.1 a qualitative leap in the depth and breadth of its knowledge.

On data selection, Scialom is blunt: much of the text on the open internet is low quality, and the real gold is synthetic data. Llama 3.1's post-training used no human-written answers at all, relying entirely on synthetic data generated by Llama 2.
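
The exact pipeline was not detailed on the podcast, but one common recipe for building such data is rejection sampling: generate several candidate answers, score them with a reward model, and keep only the best. The snippet below is a toy sketch of that general idea with stand-in functions; it is not Meta's actual process.

```python
import random

# Toy sketch of rejection-sampling-style synthetic data generation:
# sample several candidate answers, score each with a reward model,
# and keep the highest-scoring one as a training example.
# Both helpers are stand-ins, not a real model or API.

def generate_candidates(prompt: str, n: int = 8) -> list[str]:
    # Stand-in for sampling n answers from a generator model.
    return [f"candidate answer {i} to: {prompt}" for i in range(n)]

def reward_score(prompt: str, answer: str) -> float:
    # Stand-in for a learned reward model's scalar score.
    return random.random()

def build_synthetic_dataset(prompts: list[str]) -> list[dict]:
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt)
        best = max(candidates, key=lambda a: reward_score(prompt, a))
        dataset.append({"prompt": prompt, "answer": best})
    return dataset

if __name__ == "__main__":
    print(build_synthetic_dataset(["Explain scaling laws in one sentence."]))
```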

Model evaluation remains one of the hardest problems in the field. For Llama 3.1 the team experimented with a range of evaluation and improvement methods, including reward models and diverse benchmarks. The real difficulty, however, is finding prompts that can still expose the weaknesses of an already powerful model.
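
One common setup, sketched below under the assumption of a reward model acting as judge, compares two models prompt by prompt, reports a win rate, and collects the prompts that the stronger model still loses on; those "defeating" prompts are exactly what is hard to find. All three callables are placeholders, not any real API.

```python
from typing import Callable

def win_rate_and_failures(
    prompts: list[str],
    model_a: Callable[[str], str],            # model under test
    model_b: Callable[[str], str],            # baseline model
    judge: Callable[[str, str, str], float],  # >0 means model_a's answer wins
):
    # Compare the two models on every prompt and keep the ones model_a loses.
    wins, failures = 0, []
    for p in prompts:
        margin = judge(p, model_a(p), model_b(p))
        if margin > 0:
            wins += 1
        else:
            failures.append(p)  # prompts that "defeat" the model under test
    return wins / len(prompts), failures

if __name__ == "__main__":
    rate, hard_prompts = win_rate_and_failures(
        ["2+2?", "Capital of France?"],
        model_a=lambda p: "a placeholder answer",
        model_b=lambda p: "another placeholder answer",
        judge=lambda p, a, b: len(a) - len(b),  # toy judge, not a reward model
    )
    print(rate, hard_prompts)
```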

Meta began training Llama 4 in June, and this time the focus is on agent capabilities. Work on agent tools such as Toolformer signals Meta's next direction of exploration in AI.
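
Toolformer's core idea is that the model emits inline API calls inside its own text, which a thin runtime detects, executes, and splices back into the output. The toy sketch below illustrates only that general idea; it is not Llama 4's agent stack or Meta's implementation.

```python
import re

# Toy illustration of Toolformer-style inline tool calls: the model writes
# markup like "[Calculator(12*7)]" in its output, and a runtime replaces it
# with the tool's result. Simplified sketch of the concept only.

TOOLS = {
    # Restricted eval as a toy calculator; never eval untrusted input in real code.
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, {})),
}

CALL_PATTERN = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_tool_calls(text: str) -> str:
    def run(match: re.Match) -> str:
        tool, arg = match.group(1), match.group(2)
        return TOOLS[tool](arg) if tool in TOOLS else match.group(0)
    return CALL_PATTERN.sub(run, text)

print(execute_tool_calls("The answer is [Calculator(12*7)]."))
# -> "The answer is 84."
```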

The open-source release of Llama 3.1 is both a bold move by Meta and a statement about where AI is headed. With Llama 4 on the way, there is good reason to expect Meta to keep pushing the frontier; it will be worth watching how Llama 4 and agent technology redefine what AI can do.