New Challenges for GPT-4 in Visual Recognition Tasks

量子位

Published in AI News · 2 min read · Nov 14, 2023

Recent studies have found that GPT-4 performed poorly on a visual recognition challenge. One possible explanation is that the images in the task were overly common in its training set, so GPT-4 relied on memorization rather than genuine visual recognition. The result suggests that even large models that excel at some tasks need careful evaluation: strong performance on training data should not be taken as evidence of generalization. Improving generalization and robustness to adversarial samples remains a key research focus. It is equally important not to test models only on training data; evaluating them across a broader range of samples gives a more accurate picture of their performance.
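As a rough illustration of the evaluation point above, the following Python sketch reports accuracy separately for samples that overlap the training data and for held-out samples. It is not the method used in the studies mentioned; the `Sample` fields, the `seen_in_training` flag, and the toy data are all hypothetical, chosen only to show how a large gap between the two scores can hint at memorization.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: str           # hypothetical identifier for a test image
    label: str              # ground-truth answer
    prediction: str         # model's answer
    seen_in_training: bool  # hypothetical flag: does the image overlap training data?


def accuracy(samples):
    """Fraction of samples whose prediction matches the label."""
    if not samples:
        return float("nan")
    return sum(s.label == s.prediction for s in samples) / len(samples)


def split_accuracy(samples):
    """Report accuracy on training-overlap and held-out samples, plus the gap."""
    seen = [s for s in samples if s.seen_in_training]
    unseen = [s for s in samples if not s.seen_in_training]
    seen_acc, unseen_acc = accuracy(seen), accuracy(unseen)
    return {"seen": seen_acc, "unseen": unseen_acc, "gap": seen_acc - unseen_acc}


# Toy data, invented purely for illustration (not results from any study).
toy_samples = [
    Sample("a1", "cat", "cat", True),
    Sample("a2", "dog", "dog", True),
    Sample("a3", "car", "car", True),
    Sample("a4", "cup", "cup", True),
    Sample("b1", "cat", "dog", False),
    Sample("b2", "dog", "dog", False),
    Sample("b3", "car", "cup", False),
    Sample("b4", "cup", "cat", False),
]

report = split_accuracy(toy_samples)
print(report)
```

On this toy data the model scores far higher on the training-overlap samples than on the held-out ones. An overall accuracy computed over all eight samples would hide that gap.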

GPT-4 Visual Recognition Model Testing

This article is from AIbase Daily


Created by the AIbase Daily Team


