As artificial intelligence advances, multi-agent systems are becoming increasingly capable of handling complex tasks across many fields. These systems consist of multiple specialized agents that collaborate, pooling their strengths toward a common goal. Such collaboration excels in areas like complex reasoning, programming, drug discovery, and safety assurance: structured interaction between agents not only improves problem-solving efficiency but also enables mutual correction, refining each agent's output. Research indicates that this collaborative approach often outperforms a single agent on tasks requiring strict reasoning or fact verification.
However, optimizing multi-agent systems remains challenging. A central issue is obtaining an appropriate training signal for each agent: while task-level reward feedback is available, it is unclear how to distribute credit among the agents. Because language models reason through complex, unstructured processes, attributing success or failure to a specific agent's decisions and reasoning steps is difficult, mirroring the credit assignment problem in multi-agent reinforcement learning.
To address this, researchers at Stanford University introduced SIRIUS, a self-improving optimization framework for multi-agent systems driven by reasoning-based learning. SIRIUS builds an experience library by retaining successful reasoning trajectories, yielding a high-quality training set, and it augments unsuccessful attempts to further enrich that dataset. Experiments show that SIRIUS improves performance on reasoning and biomedical question answering by 2.86% to 21.88% and strengthens agent negotiation in competitive settings. By learning from successful interactions, agents iteratively refine their collaboration strategies, achieving self-optimization without direct supervision.
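The experience-library idea can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: `run_agents`, `augment`, and the success check are hypothetical stand-ins for a real multi-agent rollout, a real trajectory-augmentation step, and a real task evaluator.

```python
def run_agents(problem):
    """Toy stand-in for a multi-agent rollout: returns the agents'
    reasoning steps and a final answer (here a trivial placeholder)."""
    steps = [f"agent_{i} reasons about {problem}" for i in range(2)]
    answer = problem % 2  # placeholder "solution"
    return steps, answer

def augment(problem, steps):
    """Toy stand-in for augmentation: in SIRIUS, failed trajectories
    are revised so they too can join the training set."""
    return steps + [f"revised reasoning for {problem}"]

def build_experience_library(problems, labels):
    """Keep successful trajectories as-is; augment unsuccessful ones."""
    library = []
    for problem, label in zip(problems, labels):
        steps, answer = run_agents(problem)
        if answer != label:          # failure: augment before storing
            steps = augment(problem, steps)
        library.append({"problem": problem, "trajectory": steps})
    return library

library = build_experience_library([1, 2, 3, 4], [1, 0, 0, 0])
```

The point of the sketch is the branch: every problem contributes a trajectory to the training set, but only failures pass through the augmentation step first.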
SIRIUS also includes an iterative fine-tuning process: agents interact in a natural-language environment, generate responses, evaluate them, improve low-quality outputs, and update their policies through supervised learning. Through this continual refinement of responses, SIRIUS strengthens reasoning and decision-making in language-based multi-agent systems, producing more effective and coherent interactions over time.
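The generate-evaluate-improve-update loop above can be sketched as follows. Everything here is a hypothetical placeholder: `generate` stands in for an LLM call, `score` for an evaluator, `improve` for response augmentation, and `fine_tune` for a supervised update; the paper's actual components are not shown.

```python
def generate(agent, prompt):
    # Placeholder LLM call: the response encodes the agent's policy version.
    return f"{prompt}:v{agent['version']}"

def score(response):
    # Placeholder evaluator: later policy versions score higher.
    return int(response.rsplit("v", 1)[-1])

def improve(response):
    # Placeholder augmentation of a low-quality response.
    return response + ":revised"

def fine_tune(agent, dataset):
    # Placeholder supervised update: bump the agent's policy version.
    agent["version"] += int(len(dataset) > 0)

def sirius_iteration(agents, prompts, threshold=0):
    """One round: generate, evaluate, improve weak outputs, update."""
    for agent in agents:
        dataset = []
        for prompt in prompts:
            response = generate(agent, prompt)
            if score(response) <= threshold:   # low quality: improve it
                response = improve(response)
            dataset.append((prompt, response))
        fine_tune(agent, dataset)              # supervised update per agent

agents = [{"version": 0}, {"version": 0}]
for _ in range(3):
    sirius_iteration(agents, ["q1", "q2"])
```

Each round collects a fresh dataset of (prompt, response) pairs per agent, repairing weak responses before they are used for the update, which mirrors the loop described in the paragraph above.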
In experiments, SIRIUS was compared against baselines including a single agent, STaR, CoMM, and TextGrad, and excelled at problem solving, task decomposition, and agent collaboration. Ablation studies showed that specialized agent roles, multi-agent optimization, and experience augmentation are key contributors to the performance gains. SIRIUS also performed well in actor-critic and competitive settings, outperforming other methods on tasks such as PubMedQA and resource-exchange games.
In summary, SIRIUS optimizes multi-agent systems by learning from successful interactions and improving on failures. It builds a library of high-quality reasoning steps as a training set for system optimization, and enriches that library by augmenting unsuccessful trajectories. The framework substantially improves reasoning, biomedical question answering, and agent negotiation, supporting continuous self-improvement in multi-agent collaboration.
Paper: https://arxiv.org/pdf/2502.04780
Key Points:
🌟 The SIRIUS framework optimizes the performance of multi-agent systems through self-improvement and learning from successful experiences.
📈 Research shows that SIRIUS achieves performance improvements of 2.86% to 21.88% in tasks like reasoning and biomedical question answering.
🤝 Multi-agent interaction and the construction of an experience library are central to SIRIUS's optimization process, helping agents collaborate more effectively on complex tasks.