The So-Called Most Powerful Model Reflection 70B Faces Doubts, Founder Under 'Fraud' Allegations

AIbase基地

Published in AI News · 6 min read · Sep 10, 2024


The recently released open-source AI model Reflection 70B has faced widespread industry skepticism shortly after its debut.

The model, launched by New York-based startup HyperWrite, is billed as a variant of Meta's Llama 3.1 and drew attention for its impressive performance in third-party tests. However, as further test results were disclosed, Reflection 70B's reputation began to be challenged.

The controversy began when Matt Shumer, co-founder and CEO of HyperWrite, announced Reflection 70B on the social media platform X on September 6, confidently calling it "the world's strongest open-source model."

Shumer also described the "reflection tuning" technique behind the model, claiming that it allows the model to audit its own output before generating content, thereby improving accuracy.

However, the day after HyperWrite's announcement, Artificial Analysis, an organization that specializes in "independent analysis of AI models and hosting providers," posted its own analysis on X. It reported that the MMLU (Massive Multitask Language Understanding) score it measured for Reflection Llama 3.1 70B was identical to that of Llama 3 70B and significantly lower than that of Meta's Llama 3.1 70B, a major discrepancy from the results initially published by HyperWrite and Shumer.

Shumer subsequently explained that there had been an issue with the weights (the trained parameters of the open-source model) during the upload to Hugging Face, a third-party AI model hosting repository and company, which might have resulted in performance inferior to that of HyperWrite's "internal API" version.

Artificial Analysis later stated that it had been given access to the private API and had seen impressive performance, though not at the level initially claimed. Because this test was conducted on a private API, the organization could not independently verify what it was testing.

The organization raised two key questions that cast serious doubt on HyperWrite and Shumer's initial performance claims:

Why the released version was not the one they tested through the Reflection private API.
Why the model weights of the version they tested have not been released.

Meanwhile, users in several machine learning and AI communities on Reddit also questioned the claimed performance and origin of Reflection 70B. Some pointed out that, according to model comparisons posted by third parties on GitHub, Reflection 70B appears to be a variant of Llama 3 rather than Llama 3.1, casting further doubt on Shumer and HyperWrite's initial claims.

On September 8 at 8:07 PM Eastern Time, at least one X user, Shin Megami Boson, publicly accused Shumer of "fraudulent behavior" in the AI research community, posting a series of screenshots and other evidence.

Others alleged that the model is actually a "wrapper," that is, an application built on top of Claude 3, the proprietary, closed-source model from competitor Anthropic.

However, other X users came forward to defend Shumer and Reflection 70B, with some posting impressive performance results of their own.

The AI research community is currently awaiting Shumer's response to these fraud allegations, as well as the updated model weights on Hugging Face.

🚀 Since the release of Reflection 70B, its performance has been questioned, as independent tests have failed to reproduce the initially claimed results.
⚙️ HyperWrite's founder explained that a model upload issue led to reduced performance and called for attention to the updated version.
👥 The social media discussion on the model is heated, with both accusations and defenses, creating a complex situation.


This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI with a focus on developers, helping you follow technical trends and learn about innovative AI product applications.

—— Created by the AIbase Daily Team
