A recent study reveals that even advanced AI language models, such as OpenAI's latest o1-preview, struggle with complex planning tasks.

This research, conducted jointly by scientists from Fudan University, Carnegie Mellon University, ByteDance, and The Ohio State University, evaluated the performance of AI models on two planning benchmarks: BlocksWorld and TravelPlanner.


In the classic planning task of BlocksWorld, most models had accuracy rates below 50%, with only o1-mini (slightly below 60%) and o1-preview (close to 100%) performing relatively well.

However, when researchers turned their attention to the more complex TravelPlanner, the performance of all models was disappointing. GPT-4o achieved a final success rate of only 7.8%, while o1-preview reached 15.6%. Other models, such as GPT-4o-Mini, Llama3.1, and Qwen2, scored between 0 and 2.2%. Although o1-preview showed improvement over GPT-4o, it still fell far short of human planning capabilities.

The researchers identified two main issues. First, the models struggled to integrate rules and conditions, often producing plans that violated preset guidelines. Second, as plans grew longer, the models gradually lost focus on the original problem. To measure how much each input component influenced the planning process, the research team used a "permutation feature importance" method.
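The article does not detail how the team applied this analysis, but the general idea behind permutation feature importance can be sketched as follows: shuffle one input component (for example, the task rules) across examples, re-score the model, and treat the drop in score as that component's importance. The function names and data layout below are illustrative assumptions, not the paper's actual code.

```python
import random

def permutation_importance(score_fn, inputs, component, n_repeats=5, seed=0):
    """Estimate how much one input component matters by shuffling its
    values across examples and measuring the drop in the score.

    score_fn:  callable taking a list of example dicts, returning a score.
    inputs:    list of dicts sharing the same keys (e.g. "rules", "query").
    component: the key whose values are permuted across examples.
    """
    rng = random.Random(seed)
    baseline = score_fn(inputs)
    drops = []
    for _ in range(n_repeats):
        # Shuffle just this component's values across all examples.
        values = [ex[component] for ex in inputs]
        rng.shuffle(values)
        shuffled = [{**ex, component: v} for ex, v in zip(inputs, values)]
        drops.append(baseline - score_fn(shuffled))
    # Average drop over repeats: large drop = important component.
    return sum(drops) / len(drops)

# Toy illustration: the score depends on "signal" but not on "noise".
data = [{"signal": i, "noise": 0, "target": i} for i in range(20)]

def score(examples):
    return sum(ex["signal"] == ex["target"] for ex in examples) / len(examples)

important = permutation_importance(score, data, "signal")
irrelevant = permutation_importance(score, data, "noise")
```

Shuffling the component the score actually depends on produces a large drop, while shuffling an irrelevant one produces none; applied to a planner's inputs, this separates the components the model truly attends to from those it ignores.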

Additionally, the research team tested two common strategies for enhancing AI planning abilities. The first, episodic memory updating, draws knowledge from previous planning attempts; it improved the models' understanding of constraints but did not lead to more careful consideration of individual rules. The second, parametric memory updating, strengthens the influence of task information on planning through fine-tuning, but the core issue of diminishing influence over extended plans remained. Both methods yielded some improvement yet failed to resolve the fundamental problems.

The related code and data will soon be made publicly available on GitHub.

Code repository: https://github.com/hsaest/Agent-Planning-Analysis

Key points:

🌍 The study shows that AI models like OpenAI's o1-preview perform poorly in complex travel planning, with GPT-4o achieving a success rate of only 7.8%.  

📉 Most models perform reasonably well in BlocksWorld but struggle to achieve ideal results in TravelPlanner.  

🧠 The research found that models primarily suffer from inadequate integration of rules and a loss of focus over time.