Recently, several large AI models have drawn widespread attention for getting simple numerical comparisons wrong. Prominent models, including ByteDance's Doubao, OpenAI's GPT-4o, Kimi from Moonshot AI (known in Chinese as "Dark Side of the Moon"), StepFun's Yuewen, and Baichuan AI's Baixiaoying, all answered basic questions such as "Which is larger, 9.11 or 9.9?" incorrectly. The correct answer is 9.9, since 9.90 exceeds 9.11. Earlier reports had also shown that multiple large models miscounted the number of "r"s in the word "strawberry," which contains three.
Image source: AI-generated image, licensed via Midjourney
In response, Moonshot AI, the company behind Kimi, issued a statement noting that humanity's exploration of what large models can do is still at an early stage, and that understanding what they can and cannot do will require further research and testing.
The company said it warmly welcomes users who discover and report more edge cases in everyday use. Such cases, whether the recent numerical-comparison errors or the earlier letter-counting errors, contribute to a deeper understanding of what large models can do.
However, the company pointed out that these issues cannot be resolved simply by fixing each case individually. It likened them to the corner cases encountered in autonomous driving, which are impossible to enumerate exhaustively. What matters more, it said, is to keep raising the intelligence of the underlying foundation models, making large models more robust and well-rounded so that they perform well under a wide range of complex and extreme conditions.
The incident has sparked industry discussion about the foundational capabilities of large AI models and highlighted the difficulties current AI technology still has with seemingly simple tasks. With further research and technical progress, these problems are expected to be gradually addressed.