A new study suggests that OpenAI's o1-preview AI system may outperform human doctors in diagnosing complex medical cases. Research teams from Harvard Medical School and Stanford University ran o1-preview through comprehensive medical diagnostic tests, finding significant improvements over earlier models.
According to the study, o1-preview achieved a correct diagnosis rate of 78.3% across all tested cases. In a head-to-head comparison on 70 cases, its accuracy rose to 88.6%, significantly surpassing its predecessor GPT-4's 72.9%. Its medical reasoning was also noteworthy: on the R-IDEA scale, a standard measure of medical reasoning quality, the system earned a perfect score on 78 of 80 cases, whereas experienced doctors did so on only 28 cases and medical residents on just 16.
The researchers acknowledged that o1-preview might have included some test cases in its training data. However, when they tested the system on new cases, its performance only slightly declined. Dr. Adam Rodman, one of the authors of the study, emphasized that while this is a benchmark study, the findings have important implications for medical practice.
o1-preview particularly excelled in handling complex management cases specifically designed by 25 experts. "Humans struggle with these challenging problems, but o1's performance is impressive," Rodman explained. In these complex cases, o1-preview scored 86%, while doctors using GPT-4 only scored 41%, and traditional tools scored just 34%.
However, o1-preview is not without flaws. The system showed no significant improvement in probability assessment: when estimating the likelihood of pneumonia, for example, it gave a 70% estimate, well above the evidence-based range of 25%-42%. The researchers found that o1-preview excelled at tasks requiring critical thinking but struggled with more abstract challenges, such as estimating probabilities.
Moreover, o1-preview typically gives detailed answers, which may have inflated its scores. The study also evaluated only o1-preview working on its own, without assessing how effectively it collaborates with doctors. Some critics noted that the diagnostic tests o1-preview suggests are often costly and impractical.
Although OpenAI has since released newer o1 and o3 models that excel at complex reasoning tasks, these more powerful models still do not resolve the practical application and cost issues raised by critics. Rodman called on researchers to develop better evaluation methods for medical AI systems that capture the complexity of real-world medical decision-making. He stressed that the study does not imply AI can replace doctors; real medical care still requires human involvement.
Paper: https://arxiv.org/abs/2412.10849
Key Points:
🌟 o1-preview achieved a diagnostic accuracy of 88.6% on 70 head-to-head cases, well above GPT-4's 72.9%.
🧠 In medical reasoning, o1-preview scored full marks on 78 out of 80 cases, far exceeding doctor performance.
💰 Despite its strong performance, o1-preview's costly and often impractical test suggestions remain a barrier to real-world application.