2023-10-25 15:39:22 · AIbase · 2.5k
Xi Xiaoyao Technology Talk | Stop Saying GPT-4V is Amazing! It Can't Even Recognize Peking Duck, Can You Believe It??
The newly proposed image-reasoning benchmark HallusionBench probes vision-language models such as GPT-4V for two failure modes: language hallucinations and visual hallucinations. On HallusionBench, GPT-4V and similar models show error rates as high as 90% on questions where language hallucinations, driven by the model's parametric memory, override the actual visual evidence. These models are also susceptible to geometric and other visual illusions, indicating that their visual capabilities remain limited: simple image manipulations can easily mislead them, exposing their fragility.