AI Models Falter: Lunar Dark Side Claims 9.11 is Greater than 9.9

AIbase

Published in AI News · 4 min read · Jul 17, 2024

A simple elementary school math question recently stumped many large AI models: 8 of the 12 internationally known large models tested gave the wrong answer to "Which is larger, 9.11 or 9.9?"

In the test, most of the models wrongly concluded that 9.11 is greater than 9.9 when comparing the digits after the decimal point. Even when the question was explicitly framed as a mathematical one, some models still answered incorrectly, exposing weaknesses in their mathematical ability.
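
The difference between the two comparisons can be made concrete with a minimal Python sketch. It assumes, as one plausible reading of the error and not something the test itself establishes, that a model treats the digits after the decimal point as a whole number (11 versus 9), the way software version numbers are compared. Both function names are hypothetical.

```python
from decimal import Decimal


def compare_as_decimals(a: str, b: str) -> str:
    """Return whichever string is larger when read as a decimal number."""
    return a if Decimal(a) > Decimal(b) else b


def compare_like_versions(a: str, b: str) -> str:
    """Mistaken rule (assumed): compare the parts around the dot as separate integers."""
    a_parts = tuple(int(p) for p in a.split("."))
    b_parts = tuple(int(p) for p in b.split("."))
    return a if a_parts > b_parts else b


print(compare_as_decimals("9.11", "9.9"))    # 9.9
print(compare_like_versions("9.11", "9.9"))  # 9.11
```

Read as decimals, 9.9 equals 9.90, which is greater than 9.11. The version-style rule compares 11 with 9 and picks 9.11.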

Of the 12 large models tested, four answered correctly: Alibaba's Tongyi Qianwen, Baidu's Wenxin Yiyan, Minimax, and Tencent's Yuanbao. The other eight answered incorrectly: ChatGPT-4o, ByteDance's Doubao, the Dark Side of the Moon's Kimi, Zhipu Qingyan, Lingyi Wanzhi, Jietu Xingchen Yuewen, Baichuan Zhihui's Bai Xiaoying, and SenseTime's Shangliang.

Some industry insiders attribute the poor performance of large models on math problems to their design, which makes them more like liberal arts students than science students. Generative language models are typically trained to predict the next word, which makes them strong at processing language but weak at mathematical reasoning.
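
As a rough illustration of what "predicting the next word" means, the sketch below builds a toy bigram predictor. It is a deliberately simplified stand-in, not how any of the tested models works: the three-sentence corpus is made up for this example, and `predict_next` is a hypothetical helper. It only shows that such a predictor returns whatever most often followed a word in its training text, without doing any arithmetic.

```python
from collections import Counter, defaultdict

# Made-up toy corpus, for illustration only.
corpus = [
    "version 9.11 is newer than version 9.9",
    "version 9.11 is newer than version 9.10",
    "the number 9.9 is larger than 9.11",
]

# Count how often each word follows each other word.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1


def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the toy corpus."""
    return next_word_counts[word].most_common(1)[0][0]


print(predict_next("is"))  # newer
```

In this toy corpus "is" is followed by "newer" twice and by "larger" once, so the predictor answers "newer" regardless of which numbers are being discussed.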

In response to this issue, the Dark Side of the Moon stated: "Our exploration of what large models can and cannot do is still in its very early stages."

"We eagerly anticipate users discovering and reporting more boundary cases, whether it's recent questions like 'Which is larger, 9.9 or 9.11, 13.8 or 13.11?' or previous ones like 'How many 'r's are in 'strawberry'?' These boundary cases help us better understand the capabilities of large models. However, to completely solve the problem, it's not enough to rely solely on fixing each case individually, as these situations are as inexhaustible as scenarios encountered by self-driving cars. Instead, we need to continuously enhance the intelligence level of the underlying foundational models, making large models more powerful and comprehensive, capable of performing well in various complex and extreme situations."
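
The boundary cases named in the statement all have answers that ordinary code can compute exactly, which makes them easy to collect into a small check. The sketch below is one hypothetical way to do that; the `BOUNDARY_CASES` table and the `grade` function are assumptions made for illustration, and `grade` expects the model's reply to have already been reduced to a bare answer.

```python
from decimal import Decimal

# Hypothetical boundary cases: each prompt is paired with a reference answer
# computed deterministically instead of being typed in by hand.
BOUNDARY_CASES = {
    "Which is larger, 9.9 or 9.11?": max("9.9", "9.11", key=Decimal),
    "Which is larger, 13.8 or 13.11?": max("13.8", "13.11", key=Decimal),
    "How many 'r's are in 'strawberry'?": str("strawberry".count("r")),
}


def grade(prompt: str, model_answer: str) -> bool:
    """Return True if the bare answer matches the computed reference answer."""
    return model_answer.strip() == BOUNDARY_CASES[prompt]


for prompt, reference in BOUNDARY_CASES.items():
    print(f"{prompt} -> {reference}")
```

A table like this only records known cases. As the statement notes, the list of such cases is open-ended, so a check of this kind measures a model against them but does not fix the underlying weakness.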

Some experts believe that the key to improving the mathematical abilities of large models lies in the training corpus. Large language models are trained primarily on text from the internet, which contains relatively few mathematical problems and solutions. Future training corpora therefore need to be constructed more systematically, especially for complex reasoning.

This article is from AIbase Daily
