Meta, the American tech giant, has launched its most powerful open-source AI model family yet: Llama 4. The initial release includes two models: Llama 4 Scout and Llama 4 Maverick.
Llama 4 Scout has 109 billion total parameters, 17 billion active parameters, and 16 experts. Its standout feature is a 10-million-token context window, roughly equivalent to processing over 20 hours of video, all while running on a single H100 GPU (after Int4 quantization). Benchmark tests show it outperforming Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1.
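As a back-of-the-envelope check on the single-H100 claim, the weight memory at Int4 precision can be estimated directly (a rough sketch only; a real deployment also needs memory for activations and the KV cache, which this ignores):

```python
# Rough memory estimate for Llama 4 Scout's weights at Int4 precision.
# Assumption: 4 bits (0.5 bytes) per parameter; activation and KV-cache
# memory are ignored, so the true footprint is somewhat larger.
total_params = 109e9           # 109 billion total parameters
bytes_per_param = 0.5          # Int4 quantization: 4 bits per weight
weight_gb = total_params * bytes_per_param / 1e9

h100_memory_gb = 80            # a single H100 has 80 GB of HBM
print(f"Weights: {weight_gb:.1f} GB of {h100_memory_gb} GB")
# → Weights: 54.5 GB of 80 GB
```

At roughly 54.5 GB of weights, the quantized model leaves headroom on an 80 GB H100, which is consistent with Meta's single-GPU claim.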
Llama 4 Maverick features 400 billion total parameters, 17 billion active parameters, and 128 experts, with a context window of 1 million tokens. On the LMSYS large model leaderboard, Llama 4 Maverick achieved second place (ELO score 1417), trailing only the closed-source Gemini 2.5 Pro. Notably, it matches DeepSeek-V3-0324 in reasoning and coding while using less than half the active parameters.
Furthermore, an even more powerful model, the 2-trillion-parameter Llama 4 Behemoth, is slated for release in the coming months. It will have 288 billion active parameters and 16 experts. In STEM benchmarks, it already surpasses GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro.
The Llama 4 series is the first in the family to use a Mixture-of-Experts (MoE) architecture, in which each token activates only a small subset of the model's parameters, improving efficiency in both training and query serving. Llama 4 is also a natively multimodal model that uses early fusion to integrate text and visual tokens seamlessly. Meta has additionally upgraded its vision encoder and developed a new training method, MetaP, to optimize hyperparameters. Developers can download these latest models from llama.com and Hugging Face starting today.
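The efficiency gain from MoE comes from sparse routing: a small router picks the top-k experts for each token, so only those experts' weights are computed against. The following is a minimal, hypothetical sketch of top-k routing (the router design, shapes, and function names here are illustrative assumptions, not Meta's actual implementation):

```python
import numpy as np

# Minimal sketch of MoE top-k routing (illustrative only).
# A router scores each token against every expert; only the top_k
# highest-scoring experts run, so the active parameter count per
# token stays far below the model's total parameter count.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 1   # Scout-like: 16 experts, 1 active

def moe_layer(x, router_w, experts_w):
    """x: (d_model,) token vector; mix the outputs of the top_k experts."""
    logits = router_w @ x                   # (n_experts,) routing scores
    top = np.argsort(logits)[-top_k:]       # indices of the chosen experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                    # softmax over selected experts
    out = np.zeros_like(x)
    for gate, idx in zip(gates, top):
        out += gate * (experts_w[idx] @ x)  # only top_k experts compute
    return out, top

router_w = rng.normal(size=(n_experts, d_model))
experts_w = rng.normal(size=(n_experts, d_model, d_model))
x = rng.normal(size=d_model)
y, chosen = moe_layer(x, router_w, experts_w)
print(f"token routed to expert(s) {chosen}, output shape {y.shape}")
```

With 16 experts and one active per token, a Scout-like model touches only a fraction of its 109 billion weights on each forward step, which is why its active parameter count (17 billion) is so much smaller than its total.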
Key Highlights:
- 🌟 Meta releases the open-source multimodal Llama 4 family, initially launching Llama 4 Scout and Llama 4 Maverick, with Llama 4 Behemoth to follow.
- 💪 Llama 4 performs strongly, ranking high on large-model leaderboards with reasoning and coding capabilities comparable to, or exceeding, other top models.
- 🛠️ Built as a natively multimodal MoE model with an upgraded vision encoder and the new MetaP training method, it is available for developers to download today.