Magentic-One: A Multi-Agent System for Complex Tasks

Paper: https://arxiv.org/html/2411.04468v1

GitHub: https://github.com/microsoft/autogen/tree/main/python/packages/autogen-magentic-one

Abstract

Magentic-One is an open-source multi-agent system that solves complex tasks by planning and acting much as a human would. At its core is an agent named Orchestrator, which formulates and revises the task plan, monitors progress, and directs the other agents. Magentic-One achieves performance competitive with state-of-the-art systems on several benchmarks, including GAIA, AssistantBench, and WebArena, which cover tasks ranging from web browsing to file processing and coding.

Keywords

Multi-agent system, complex tasks, autonomous planning, error recovery, modular design

1. Introduction

With the rapid development of artificial intelligence and foundation models, we are moving closer to agent systems that can complete complex tasks on behalf of humans. Such systems could not only raise our productivity but also fundamentally change how we live. Magentic-One grew out of this vision, and it shows strong performance and generalization across diverse tasks through multi-agent collaboration.

2. Related Work

Single-agent approaches: This part reviews autonomous agents built on large language models (LLMs), which have shown significant skill in software development, web interaction, and other areas.
Multi-agent approaches: The multi-agent paradigm offers a modular and flexible way to handle complex tasks, since each agent can access different tools or play a different role within a team.
Agent evaluation: To assess agents’ performance on general multi-step tasks, the literature proposes many benchmarks, often involving interaction with real websites so that they reflect real-world tasks more accurately.

3. Problem Setup

The goal of Magentic-One is to build a generalist agent system capable of solving complex tasks across multiple domains. A task is considered complex if it requires, or significantly benefits from, a process of planning, acting, observing, and reflecting, possibly repeated several times. Acting here means more than generating text: the agent system must also be able to execute code, use tools, or interact with an environment.
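
The plan, act, observe, reflect cycle described above can be illustrated with a minimal sketch. This is a toy example and not code from Magentic-One: the "environment" is just a counter, and the names LoopState, plan, act, observe, reflect, and run are hypothetical. In a real system each step would involve a model call or a tool invocation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LoopState:
    """Toy environment: reach `goal` by incrementing `value` (hypothetical)."""
    goal: int
    value: int = 0
    history: List[Tuple[str, int]] = field(default_factory=list)


def plan(state: LoopState) -> str:
    # Plan: pick the next action from the gap between value and goal.
    return "increment" if state.value < state.goal else "stop"


def act(state: LoopState, action: str) -> None:
    # Act: change the environment. A real agent would run code or a tool here.
    if action == "increment":
        state.value += 1


def observe(state: LoopState) -> int:
    # Observe: read back the state of the environment after acting.
    return state.value


def reflect(state: LoopState, observation: int) -> bool:
    # Reflect: decide whether the goal has been reached.
    return observation >= state.goal


def run(goal: int, max_steps: int = 10) -> LoopState:
    state = LoopState(goal=goal)
    for _ in range(max_steps):  # the cycle may repeat several times
        action = plan(state)
        act(state, action)
        observation = observe(state)
        state.history.append((action, observation))
        if reflect(state, observation):
            break
    return state


final_state = run(goal=3)
print(final_state.value, len(final_state.history))
```

The step limit (max_steps) is a design choice of this sketch: it keeps the loop from running indefinitely when the goal cannot be reached.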

4. Magentic-One Overview

Magentic-One is based on a multi-agent architecture, where the Orchestrator agent is responsible for task decomposition and planning, guiding other agents to execute sub-tasks, tracking overall progress, and taking corrective actions when necessary. The design of the system allows agents to autonomously adapt and act in dynamically changing environments. The agents in Magentic-One include:

Orchestrator: Responsible for high-level planning and task decomposition.
WebSurfer: An agent specialized in browsing the web and executing related operations.
FileSurfer: An agent for processing and reading local files.
Coder: An agent responsible for writing and debugging code.
ComputerTerminal: Provides an environment for executing code and commands.

This multi-agent design enables Magentic-One to efficiently solve open-ended problems.
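
The division of labor described above can be sketched in a few lines. This is an illustrative toy and does not use the actual autogen-magentic-one API: the Orchestrator class, its make_plan and run methods, the fixed plan, the retry policy, and the stub agents are all assumptions made for the example. In the real system the plan and the choice of agent come from a language model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent interface: takes a sub-task, returns (success, result).
Agent = Callable[[str], Tuple[bool, str]]


def make_stub_agent(name: str) -> Agent:
    """Build a stand-in agent that always succeeds (assumption for the demo)."""
    def agent(subtask: str) -> Tuple[bool, str]:
        return True, f"{name} handled: {subtask}"
    return agent


class Orchestrator:
    """Toy orchestrator: plans, delegates, tracks progress, retries on failure."""

    def __init__(self, agents: Dict[str, Agent], max_retries: int = 1) -> None:
        self.agents = agents
        self.max_retries = max_retries
        self.log: List[str] = []  # simple progress record

    def make_plan(self, task: str) -> List[Tuple[str, str]]:
        # Fixed plan for illustration; a real orchestrator would generate it.
        return [
            ("WebSurfer", f"find sources for '{task}'"),
            ("Coder", "write an analysis script"),
            ("ComputerTerminal", "run the analysis script"),
        ]

    def run(self, task: str) -> bool:
        for agent_name, subtask in self.make_plan(task):
            for attempt in range(self.max_retries + 1):
                ok, result = self.agents[agent_name](subtask)
                self.log.append(f"{agent_name} (attempt {attempt + 1}): {result}")
                if ok:
                    break  # sub-task done, move to the next plan step
            else:
                # All attempts failed. This sketch gives up; a fuller version
                # would revise the plan instead.
                return False
        return True


agents = {
    name: make_stub_agent(name)
    for name in ("WebSurfer", "FileSurfer", "Coder", "ComputerTerminal")
}
orchestrator = Orchestrator(agents)
succeeded = orchestrator.run("summarize a report")
for line in orchestrator.log:
    print(line)
```

Keeping every agent behind the same callable interface is what makes the team modular in this sketch: an agent can be swapped or added without changing the orchestration loop.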

5. Experiments

Magentic-One achieves performance competitive with state-of-the-art systems on three benchmarks: GAIA, AssistantBench, and WebArena. The evaluation was carried out with the AutoGenBench tool. Magentic-One reached a task completion rate of 38% on GAIA and 32.8% on WebArena, and an accuracy of 27.7% on AssistantBench. These results indicate that the system is effective and versatile in handling complex tasks.

6. Discussion

The paper discusses the advantages of a multi-agent design, including ease of development and system flexibility. Magentic-One’s modular design allows agents to be adapted, added, or removed quickly, which the paper presents as an advantage over monolithic single-agent systems. The paper also points out the system’s current limitations, including high cost, latency, and limited modality support. Finally, it discusses the risks and potential societal impacts of agent systems and stresses that safety and ethics must be considered when designing agents.

7. Conclusion

Magentic-One represents a significant advancement in agent systems for solving open-ended tasks, showcasing strong performance and generalization capabilities. Through open-source implementation, Magentic-One provides an important foundation for future agent research and applications. The paper emphasizes the importance of continuing to improve and expand agent systems to address increasingly complex tasks and environments.

Notes

Using Magentic-One involves interactions with a digital world designed for humans, which brings inherent risks. To minimize these risks, consider the following precautions:

Run all tasks in a Docker container to isolate agents and prevent direct system attacks;
Run agents in a virtual environment to prevent them from accessing sensitive data;
Closely monitor logs during and after execution to detect and mitigate risky behaviors;
Run examples under human supervision to oversee agents and prevent unintended consequences;
Limit agents’ access to the internet and other resources to prevent unauthorized actions;
Ensure agents cannot access sensitive data or resources that could be leaked, and do not share sensitive information with agents.

Be aware that agents may occasionally attempt risky actions, such as recruiting human assistance or accepting cookie agreements without human involvement. Always ensure agents are monitored and run in a controlled environment to prevent unintended consequences. Magentic-One may also be vulnerable to prompt injection attacks from web pages.
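
As one way to apply the container and access-limiting precautions above, the sketch below assembles a `docker run` command with conservative defaults. It only builds the argument list and does not start a container. The image name, the script name, and the /workspace mount point are placeholders, and the function build_docker_command is hypothetical, not part of Magentic-One. Note that disabling the network suits offline tasks only; an agent that browses the web needs network access, so that option is left configurable.

```python
from typing import List, Optional


def build_docker_command(
    image: str,
    command: List[str],
    workdir_on_host: Optional[str] = None,
    allow_network: bool = False,
) -> List[str]:
    """Return an argument list for `docker run` with restrictive defaults."""
    args = ["docker", "run", "--rm"]  # --rm removes the container on exit
    if not allow_network:
        args += ["--network", "none"]  # no network access inside the container
    if workdir_on_host is not None:
        # Mount the host directory read-only so the agent cannot modify it.
        args += ["-v", f"{workdir_on_host}:/workspace:ro", "-w", "/workspace"]
    args.append(image)
    args += command
    return args


# Placeholder image and script names, for illustration only.
cmd = build_docker_command(
    "example/agent-image:latest",
    ["python", "run_task.py"],
    workdir_on_host="/tmp/task",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the image and script exist. Building the command as a list rather than a single string avoids shell quoting problems.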