Baichuan Intelligence and Tianjin University Launch 'Sibyl System' Agent Framework, Leading GAIA Complex Task Rankings

AIbase基地

Published in AI News · 4 min read · Jul 24, 2024

Baichuan Intelligence has partnered with Tianjin University to launch the "Sibyl System" agent framework, which took first place on the GAIA leaderboard. GAIA, proposed by Meta, Hugging Face, and AutoGPT in November 2023, is a benchmark that assesses how well agents carry out complex tasks. It exposes the shortcomings of existing models and points to directions for improving both models and agents.

The GAIA test questions are close to real-world tasks and require AI to combine reasoning, multi-modal understanding (text, images, audio, and video), web browsing, and tool use. The questions are easy for humans to understand but pose significant challenges for models: GPT-4 equipped with plugins achieves a success rate of only 15%, while human respondents achieve 92%. Answering a question usually takes a long chain of reasoning and considerable time, with multiple steps and tools involved.

The "Sibyl System" framework has four main features:

A human-like browser interface that replaces retrieval-augmented generation as the channel for acquiring information.
Question answering instead of dialogue: stateless question-answering functions simplify the system architecture.
Only two general-purpose tools, a web browser and a Python environment, which reduces reliance on specialized tools.
A move from System 1 to System 2 thinking: a "jury" mechanism performs self-criticism and correction through multi-agent debate, and uses the information in the global workspace to improve answer accuracy.
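
To make the stateless question-answering and "jury" ideas concrete, here is a minimal sketch in Python. It is not taken from the Sibyl code base: the names `QAFunction`, `jury_answer`, and the stand-in jurors are hypothetical, and a simple majority vote is swapped in for the multi-agent debate described in the report.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type: a stateless question-answering function.
# It receives the question and all context explicitly and keeps no
# conversation history between calls.
QAFunction = Callable[[str, str], str]


def jury_answer(question: str, context: str, jurors: List[QAFunction]) -> str:
    """Ask every juror independently and return the most common answer.

    Assumption: majority voting stands in for the debate and
    self-correction rounds of a real jury mechanism.
    """
    votes = [juror(question, context).strip().lower() for juror in jurors]
    winner, _count = Counter(votes).most_common(1)[0]
    return winner


# Stand-in jurors; in practice each would wrap a call to a language model.
def juror_a(question: str, context: str) -> str:
    return "Paris"


def juror_b(question: str, context: str) -> str:
    return "paris "


def juror_c(question: str, context: str) -> str:
    return "Lyon"


result = jury_answer(
    "What is the capital of France?", "", [juror_a, juror_b, juror_c]
)
print(result)
```

Because each juror is a plain function of its inputs, any one of them can be replaced or tested in isolation, which is the architectural simplification the stateless design aims for.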

Sibyl System is a structurally simple yet powerful agent framework based on large language models that can solve complex reasoning problems with only a few tools. By introducing the Global Workspace and multi-agent mechanisms, together with a browser-based general channel for acquiring information, it reduces system complexity while extending the complexity of the problems it can solve, shifting the model from "fast thinking" to "slow thinking." Sibyl System is also highly scalable and easy to debug: its agent modules can easily be swapped for ones built on other models to enhance capabilities.
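
The global workspace and swappable modules can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the framework's actual design: `GlobalWorkspace`, `run_pipeline`, and the fake browser and Python modules are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlobalWorkspace:
    """Hypothetical shared store that every module reads from and writes to."""

    question: str
    notes: List[str] = field(default_factory=list)

    def add(self, source: str, note: str) -> None:
        # Record which module produced the note so a run is easy to debug.
        self.notes.append(f"[{source}] {note}")

    def render(self) -> str:
        # Flatten the workspace into text that could be passed to a model.
        return "\n".join([f"Question: {self.question}", *self.notes])


# A module is any function from the workspace to a new note, so swapping
# one implementation for another only means changing a dictionary entry.
Module = Callable[[GlobalWorkspace], str]


def run_pipeline(ws: GlobalWorkspace, modules: Dict[str, Module]) -> GlobalWorkspace:
    for name, module in modules.items():
        ws.add(name, module(ws))
    return ws


# Stand-ins for the two general tools named in the article.
def fake_browser(ws: GlobalWorkspace) -> str:
    return "The page lists 3 items priced 4, 5 and 6."


def fake_python(ws: GlobalWorkspace) -> str:
    return f"sum = {4 + 5 + 6}"


ws = run_pipeline(
    GlobalWorkspace("What is the total price?"),
    {"browser": fake_browser, "python": fake_python},
)
print(ws.render())
```

Keeping all intermediate findings in one labeled store is one plausible way to get the ease of debugging the article mentions, since every note can be traced back to the module that wrote it.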

Technical Report: https://arxiv.org/pdf/2407.10718

This article is from AIbase Daily
