Xi Xiaoyao Technology Talk | Stop Saying GPT-4V is Amazing! It Can't Even Recognize Peking Duck, Can You Believe It??
Xi Xiaoyao Technology Talk (夕小瑶科技说)
Researchers have built a new benchmark, HallusionBench, to evaluate the image reasoning capabilities of visual language models, including the much-discussed GPT-4V. The results show that GPT-4V and similar models perform poorly on HallusionBench. They frequently fall into language hallucinations driven by their parametric memory (the knowledge stored in the model's weights), answering from what they have memorized rather than from what the image shows, with error rates as high as 90%. GPT-4V also performs poorly on geometry-related visual tasks, which highlights the current limits of its visual abilities. Simple image manipulations can easily mislead it, exposing further weaknesses. By contrast, LLaVA-1.5 has less extensive knowledge than GPT-4V but makes fewer common-sense errors. The study reveals the limitations of current visual language models in image reasoning and offers insights for future improvements.
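The article does not describe how HallusionBench scores a model. As a rough illustration only, the following is a minimal sketch, assuming yes/no questions where each original image is paired with a manipulated copy whose correct answer differs. It computes a plain error rate plus a pair-level accuracy that exposes answers driven by memorized knowledge rather than by the image. The `Item` record, its fields, and both metric functions are hypothetical and are not the benchmark's actual code or scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record for one yes/no question; not HallusionBench's real schema.
    pair_id: str    # links an original image to its manipulated copy
    edited: bool    # True if the image was manipulated
    expected: str   # ground-truth answer: "yes" or "no"
    predicted: str  # the model's answer: "yes" or "no"


def error_rate(items: list[Item]) -> float:
    """Fraction of questions answered incorrectly."""
    if not items:
        return 0.0
    wrong = sum(1 for it in items if it.predicted != it.expected)
    return wrong / len(items)


def pair_accuracy(items: list[Item]) -> float:
    """Fraction of original/edited pairs where BOTH answers are correct.

    A model that answers from memorized knowledge tends to give the same
    answer for both images, so it fails whichever pair member was edited.
    """
    groups: dict[str, list[Item]] = {}
    for it in items:
        groups.setdefault(it.pair_id, []).append(it)
    pairs = [g for g in groups.values() if len(g) == 2]
    if not pairs:
        return 0.0
    ok = sum(1 for g in pairs if all(it.predicted == it.expected for it in g))
    return ok / len(pairs)


# Toy data: in pair "q1" the model repeats its memorized answer ("yes")
# even after the image is edited so that the correct answer becomes "no".
items = [
    Item("q1", False, "yes", "yes"),
    Item("q1", True, "no", "yes"),
    Item("q2", False, "no", "no"),
    Item("q2", True, "yes", "yes"),
]

print(error_rate(items))     # 0.25
print(pair_accuracy(items))  # 0.5
```

Reporting a pair-level score alongside the plain error rate is a design choice in this sketch: it penalizes a model that is right on the unedited image only because the correct answer happens to match what it memorized.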
© Copyright AIbase Base 2024. Source: https://www.aibase.com/news/2491