Recently, Meta's open-source large language model, Llama-4-Maverick, plummeted from second to 32nd place on the LMArena leaderboard, sparking widespread skepticism among developers who suspect Meta of submitting a specialized version to manipulate the rankings.

The controversy began on April 6th, when Meta released its latest large language model, Llama 4, encompassing three versions: Scout, Maverick, and Behemoth. Initially, Llama-4-Maverick performed impressively, securing second place on the LMArena leaderboard, trailing only Gemini 2.5 Pro.

However, as user feedback on the publicly available Llama 4 surfaced, the model's reputation quickly deteriorated. Developers discovered significant discrepancies between the version Meta submitted to LMArena and the openly released version, fueling allegations of ranking manipulation.

According to Chatbot Arena, Meta's initial submission, Llama-4-Maverick-03-26-Experimental, was an experimentally optimized version, and it was this variant that ranked second. The standard open-source release, Llama-4-Maverick-17B-128E-Instruct, despite boasting 17 billion active parameters and 128 MoE experts, only achieved a 32nd-place ranking, significantly lagging behind top performers like Gemini 2.5 Pro and GPT-4o, and even underperforming Llama-3.3-Nemotron-Super-49B-v1, a model built on the previous-generation Llama 3.3.

Addressing the gap between the two versions, Meta explained at a recent conference that Llama-4-Maverick-03-26-Experimental was "specifically optimized for dialogue," which accounts for its relatively high score on LMArena. That optimization, while yielding a strong leaderboard position, makes the ranking a poor predictor of the model's performance across other scenarios.

A Meta spokesperson told TechCrunch that Meta will continue exploring customized versions and expects developers to adapt and improve Llama 4 based on their needs. The company welcomes developers' creativity and values their feedback.