To evaluate the image reasoning capabilities of GPT-4V, the much-discussed vision-language model, researchers have constructed a new benchmark called HallusionBench. The results show that models like GPT-4V perform poorly on HallusionBench, often succumbing to language hallucinations driven by their parametric memory, with error rates as high as 90%. GPT-4V's performance on visual tasks involving geometry is also unsatisfactory, highlighting the current limits of its visual abilities, and simple image manipulations can easily mislead it, exposing further vulnerabilities. By contrast, LLaVA-1.5, while less knowledgeable than GPT-4V, makes fewer common-sense errors. The study reveals the limitations of current vision-language models in image reasoning and offers insights for future improvements.