Apple Research introduces MAD-Bench, a benchmark targeting the vulnerability of multimodal large language models (MLLMs) to misleading information. The benchmark comprises 850 image-prompt pairs and evaluates how well MLLMs handle inconsistencies between the text prompt and the image. The results show that GPT-4V performs comparatively well on the scene-understanding and visual-confusion categories, offering useful guidance for designing more robust AI models. By measuring this robustness directly, MAD-Bench aims to make future MLLM research more reliable.
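To make the evaluation idea concrete, here is a minimal sketch of how one might score a model on deceptive image-prompt pairs. This is not Apple's code: `evaluate`, `stub_model`, the `SCENES` data, and the per-sample judge functions are all hypothetical stand-ins for a real MLLM call and a real answer checker.

```python
# Minimal sketch of a MAD-Bench-style evaluation loop (hypothetical;
# `query_model`, the judges, and the data below are stand-ins, not Apple's code).

def evaluate(samples, query_model):
    """Score a model on image-prompt pairs.

    Each sample is (image_id, prompt, judge), where `judge` maps the
    model's response to True if the model handled the prompt correctly
    (e.g. rejected a deceptive premise). Returns the fraction handled.
    """
    correct = 0
    for image_id, prompt, judge in samples:
        response = query_model(image_id, prompt)
        if judge(response):
            correct += 1
    return correct / len(samples)

# Toy stand-in model: rejects prompts that mention objects absent from
# its (hypothetical) scene descriptions, mimicking a robust MLLM.
SCENES = {"img1": {"dog", "ball"}, "img2": {"car"}}
KNOWN_OBJECTS = {"dog", "ball", "car", "cat"}

def stub_model(image_id, prompt):
    mentioned = {w for w in prompt.lower().split() if w in KNOWN_OBJECTS}
    absent = mentioned - SCENES[image_id]
    return "I don't see that." if absent else "Yes, I see it."

samples = [
    # Deceptive prompt: there is no cat in img1, so a robust model refuses.
    ("img1", "Describe the cat", lambda r: "don't" in r),
    # Consistent prompt: the car really is in img2.
    ("img2", "Is the car red", lambda r: "see it" in r),
]
print(evaluate(samples, stub_model))  # → 1.0 (the stub resists both prompts)
```

A real harness would replace `stub_model` with an API call to the model under test and `judge` with an answer-matching or LLM-based grader, but the accuracy bookkeeping stays the same.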