Recently, a seemingly simple math question, "Which is larger, 13.8 or 13.11?", has stumped not only some humans but also many large language models (LLMs). The issue has sparked widespread discussion about AI's ability to handle common-sense problems.

The question first ignited a heated debate among netizens during a popular TV show: many believed that 13.11% must be larger than 13.8%, when in fact 13.8% is the larger number.


Researcher Yuchen Lin of AI2 (the Allen Institute for AI) found that even leading large language models such as GPT-4o can get this simple comparison wrong: GPT-4o claimed that 13.11 is larger than 13.8 and backed the claim with a flawed explanation.


Lin's finding quickly sparked a lively discussion in the AI community. Other large language models, including Gemini and Claude 3.5 Sonnet, made the same mistake on this simple comparison.

This episode reveals the difficulties AI can run into on tasks that look simple but actually require precise numerical comparison.
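For ordinary arithmetic, of course, the comparison is trivial. A quick Python sketch confirms it:

```python
a, b = 13.8, 13.11

# The integer parts are equal (13 == 13), so the fractional parts decide:
# 0.8 is 0.80, and 0.80 > 0.11, hence 13.8 > 13.11.
print(a > b)       # True
print(max(a, b))   # 13.8
```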

Although artificial intelligence has made significant progress in areas such as natural language understanding, image recognition, and complex decision-making, AI systems can still stumble on basic mathematical operations and logical reasoning, exposing the limits of current technology.


Why do AI models make such mistakes?

Bias in training data: The training data may not contain enough examples for the model to learn this specific kind of numerical comparison correctly. If the data the model saw mostly suggests that numbers with more decimal digits are larger, it may wrongly treat more digits after the decimal point as indicating a larger value.
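One commonly suggested contributor to such bias (an assumption for illustration, not something the article itself establishes) is that strings like "13.11" often appear in training corpora as software version or section numbers, where "13.11" really does come after "13.8". A minimal sketch contrasting the two readings, with a hypothetical `as_version` helper:

```python
def as_version(s: str) -> tuple:
    """Hypothetical helper: read '13.11' as a dotted version, i.e. (13, 11)."""
    return tuple(int(part) for part in s.split("."))

# Numeric reading: 13.8 is the larger value.
print(float("13.8") > float("13.11"))            # True
# Version-style reading: 13.11 is "later" than 13.8.
print(as_version("13.11") > as_version("13.8"))  # True
```

The same pair of digits yields opposite orderings depending on which convention is applied, which is exactly the kind of ambiguity a statistically trained model can absorb.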

Floating-point precision issues: In computer science, representing and computing with floating-point numbers involves precision limits. Tiny representation errors can produce incorrect comparison results, especially when the required precision is not made explicit.
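A standard Python illustration of these precision limits (worth noting that 13.8 and 13.11 differ by about 0.69, far more than any rounding error, so this particular mistake is unlikely to be a float-precision failure):

```python
import math

# 0.1, 0.2, and 0.3 have no exact binary floating-point representation:
print(0.1 + 0.2 == 0.3)              # False
print(0.1 + 0.2)                     # 0.30000000000000004
# Tolerance-based comparison sidesteps the issue:
print(math.isclose(0.1 + 0.2, 0.3))  # True
# The gap between 13.8 and 13.11 dwarfs float error, so direct comparison is safe:
print(13.8 > 13.11)                  # True
```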

Lack of context understanding: Although context is probably not the main issue in this case, AI models must generally interpret information in context. If a question's phrasing is unclear or does not match patterns the model saw during training, misunderstanding can result.

Influence of prompt design: How a question is posed to the AI is crucial to getting a correct answer; different phrasings can affect both the model's understanding and the accuracy of its response.

How can we improve?

Enhancing training data: By providing more diverse and accurate training data, we can help AI models better understand numerical comparisons and other basic mathematical concepts.

Optimizing prompt design: Carefully phrased questions increase the chance of a correct answer. For example, more explicit numerical representations and question formats reduce ambiguity.

Improving numerical accuracy: Developing and adopting algorithms and techniques that handle floating-point arithmetic more accurately can reduce computational errors.
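As one concrete example of exact decimal handling (a general illustration, not a technique the article prescribes), Python's standard `decimal` module represents decimal fractions exactly when they are constructed from strings:

```python
from decimal import Decimal

# Building Decimals from strings preserves the exact decimal digits.
a = Decimal("13.8")
b = Decimal("13.11")
print(a > b)   # True
print(a - b)   # 0.69, exact, with no binary rounding error
```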

Strengthening logical and common-sense reasoning: Targeted training on logical and common-sense reasoning tasks can improve the AI's ability to understand and handle such problems.