Meta released its new flagship AI model, Maverick, on Saturday, and it quickly claimed second place on LM Arena, a benchmark in which human raters compare outputs from different models and choose the one they prefer. However, several AI researchers soon noticed a significant discrepancy between the version of Maverick Meta deployed to LM Arena and the version made widely available to developers.
Meta acknowledged in its announcement that the Maverick on LM Arena was an "experimental chat version." Meanwhile, a chart on the official Llama website indicates that Meta's LM Arena test used "Llama 4 Maverick optimized for conversationality." This discrepancy sparked questions within the research community.
AI researchers on the social media platform X noted a clear behavioral difference between the publicly downloadable Maverick and the version hosted on LM Arena: the LM Arena version leaned heavily on emojis and produced long-winded responses, behavior rarely seen in the standard release. Researcher Nathan Lambert shared this observation on X, sarcastically commenting, "Okay, Llama 4 is definitely a little cooked, haha. What yap city is this?", along with screenshots.
Tailoring a model to a specific benchmark while releasing a different, supposedly "vanilla" version to the public raises serious concerns. Chiefly, it makes it difficult for developers to predict how the model will actually perform in real-world applications. It is also misleading, since the purpose of a benchmark is to provide a snapshot of a single model's strengths and weaknesses across a range of tasks.
For various reasons, LM Arena has never been the most reliable measure of an AI model's performance; even so, AI companies generally have not admitted to tuning their models specifically to score better on benchmarks. Meta's approach appears to break that convention, prompting a broader discussion about transparency in AI model evaluation.