A Multi-Agent System for Complex Tasks
About a week ago, Microsoft released a new agentic system called Magentic-One, aimed at “solving complex tasks,” and the release seems to have gone largely unnoticed. With the recent discussion surrounding Anthropic’s computer use capability, Microsoft appears keen to re-establish its credentials in this area.
In this article, we will introduce Magentic-One, explain its capabilities, and discuss how to use it to accomplish useful work.
According to Microsoft’s own announcement, Magentic-One is…
“…a high-performance general-purpose agent system designed to solve such tasks. Magentic-One adopts a multi-agent architecture, where the main agent—the orchestrator—commands four other agents to solve tasks. The orchestrator performs task planning, tracks progress, and replans when errors occur, while directing specialized agents to execute tasks such as operating a web browser, navigating local files, or writing and executing Python code.”
Magentic-One is built on top of AutoGen, Microsoft’s existing open-source multi-agent framework.
Magentic-One has five key components.
1/ Orchestrator Agent
Responsible for task decomposition and planning, assigning subtasks to other agents for execution. Tracks the progress of task completion and takes corrective actions as needed.
2/ Web Browsing Agent
Focuses on controlling and managing the state of a Chromium-based web browser. For each incoming request, the agent performs the specified action in the browser and then reports the updated state of the web page. Its operations include:
• Navigation (e.g., visiting URLs, performing web searches),
• Page Interaction (e.g., clicking elements, entering content),
• Reading and Understanding (e.g., summarizing content, answering questions).
To carry out these actions, the agent relies on the browser’s accessibility tree and on set-of-marks prompting.
3/ File Browsing Agent
Can read most types of local files and perform common navigation tasks such as listing directory contents and navigating folder structures.
4/ Coding Agent
An LLM-based agent dedicated to writing code, analyzing information collected from other agents, or creating new artifacts.
5/ Terminal Agent
Provides access to the console shell, where programs from the coding agent can be executed and new programming libraries can be installed.
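To make the division of labour concrete, here is a minimal sketch of the orchestrator pattern described above. It is not Magentic-One’s code: `ToyOrchestrator`, the stand-in agent functions and the `max_retries` cap are all hypothetical names invented for illustration. The sketch only shows the general idea of one coordinator handing plan steps to specialised agents and retrying a step when it fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a specialised agent: it takes an instruction
# string and returns a result string.
Agent = Callable[[str], str]


@dataclass
class ToyOrchestrator:
    agents: Dict[str, Agent]
    max_retries: int = 2  # assumed cap so a failing step cannot loop forever
    log: List[str] = field(default_factory=list)

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        """Execute (agent_name, instruction) steps in order, retrying failures."""
        results = []
        for agent_name, instruction in plan:
            agent = self.agents[agent_name]
            # One initial attempt plus up to max_retries retries.
            for attempt in range(1, self.max_retries + 2):
                try:
                    result = agent(instruction)
                except Exception as exc:
                    self.log.append(f"{agent_name} failed (attempt {attempt}): {exc}")
                    continue
                self.log.append(f"{agent_name} ok: {instruction}")
                results.append(result)
                break
            else:
                # Every attempt failed: stop instead of acting blindly.
                raise RuntimeError(f"step for {agent_name} could not be completed")
        return results


def fake_web_surfer(instruction: str) -> str:
    return f"page summary for: {instruction}"


def fake_coder(instruction: str) -> str:
    return f"code for: {instruction}"


orchestrator = ToyOrchestrator(agents={"WebSurfer": fake_web_surfer, "Coder": fake_coder})
results = orchestrator.run([
    ("WebSurfer", "find the forecast"),
    ("Coder", "format the forecast"),
])
print(results)
```

The real orchestrator plans and replans with an LLM; the fixed plan list here simply keeps the example deterministic.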
Risks
Before proceeding, I want to highlight a particular aspect mentioned by Microsoft in its announcement regarding the risks associated with using agentic systems like this. This is indeed noteworthy.
Agentic systems like Magentic-One represent a phase shift in the opportunities and risks of having AI systems in the world. Magentic-One interacts with a digital world designed for and inhabited by humans. It can take actions that change the state of the world and lead to potentially irreversible consequences. This brings inherent and undeniable risks, and we have observed examples of emerging risks during testing. For instance, during development, a misconfiguration led the agent to fail to log in successfully to a specific WebArena site. The agent attempted to log in until repeated attempts caused the account to be temporarily suspended. Then, the agent attempted to reset the account’s password. More concerning, in a few instances—and without being prompted—the agent occasionally tried to recruit others for assistance (e.g., by posting on social media, emailing textbook authors, or in one case, drafting a Freedom of Information request to a government entity). In these cases, the agents all failed as they did not have access to the necessary tools or accounts, and/or were blocked by human observers.
Now, let’s look at how to use Magentic-One to perform some useful work. Hopefully, we won’t end up destroying the world in the process. 😉
Installing Magentic-One
I am a Windows user, but I will install the code using WSL2 Ubuntu for Windows. If you want to follow along, I have a complete guide on installing WSL2 Ubuntu here.
The Magentic-One code lives in Microsoft’s AutoGen repository on GitHub. Run the following commands on your local system (in the usual place where you keep projects).
git clone https://github.com/microsoft/autogen.git
cd autogen/python
uv sync --all-extras
source .venv/bin/activate
cd packages/autogen-magentic-one
Next, configure the environment variables for the chat completion client. Currently, Magentic-One only supports OpenAI’s GPT-4o as the underlying LLM.
You can set this configuration up through OpenAI or Azure Active Directory. Here are the instructions for using OpenAI.
export CHAT_COMPLETION_PROVIDER='openai'
export CHAT_COMPLETION_KWARGS_JSON='{"api_key": "REPLACE_WITH_YOUR_API_KEY", "model": "gpt-4o"}'
One important note: if you have a GitHub account, you can access the GPT-4o model through GitHub Models free of charge. However, the usage limits may be fairly strict.
To go through GitHub Models, open the GitHub Models site and log in with your GitHub account, or create one if you don’t have an account yet. Click the GPT-4o button. In the top right corner of the displayed page, there will be a green Get API Key button. Click it, and then from there click the Get Developer Key button.
Finally, you should see a screen where you can generate a classic personal access token. Go ahead and do it. You will need to enter a note describing the purpose of the key, but you do not need to grant it any additional permissions. Make a note of the generated key.
To use the GitHub-hosted GPT-4o model, change your environment variables as follows:
export CHAT_COMPLETION_PROVIDER='openai'
export CHAT_COMPLETION_KWARGS_JSON='{"base_url": "https://models.inference.ai.azure.com", "api_key": "REPLACE_WITH_YOUR_GITHUB_TOKEN", "model": "gpt-4o"}'
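Hand-writing JSON inside a shell string is easy to get wrong, so here is a small helper sketch that generates the export line. The function name `build_export_line` and the placeholder key are my own inventions; only the variable name, the base URL and the model name come from the commands above.

```python
import json
import shlex
from typing import Optional


def build_export_line(api_key: str, model: str = "gpt-4o",
                      base_url: Optional[str] = None) -> str:
    """Return a shell export line with correctly quoted JSON kwargs."""
    kwargs = {"api_key": api_key, "model": model}
    if base_url is not None:
        kwargs["base_url"] = base_url
    payload = json.dumps(kwargs)
    # shlex.quote wraps the JSON in single quotes so the shell leaves it alone.
    return f"export CHAT_COMPLETION_KWARGS_JSON={shlex.quote(payload)}"


# "YOUR_API_KEY" is a placeholder, not a real credential.
line = build_export_line(
    "YOUR_API_KEY",
    base_url="https://models.inference.ai.azure.com",
)
print(line)
```

Paste the printed line into your shell, or leave out `base_url` when talking to OpenAI directly.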
Before running some example code, we need to install two final dependencies.
Magentic-One uses Playwright to interact with web pages, so you must install the Playwright dependencies.
playwright install --with-deps chromium
To allow Magentic-One to run Python code, we need to install and run Docker. Please check this link for instructions on how to do this.
Finally, I was ready to try out Magentic-One.
Example 1 — Writing Some Python Code.
(base) tom@tpr-desktop:~/projects/autogen/python/packages$ python examples/example --logs_dir ./logs
/home/tom/projects/autogen/python/.venv/lib/python3.11/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
User input ('exit' to quit): Write a Python program to calculate and display
the first 5 fibonacci numbers
There was a lot of output, but after a while, Magentic-One asked me if I wanted to run the Python code it created, and I answered yes.
...
...
Executor is about to execute code (lang: python):
## filename: fibonacci.py
def fibonacci_sequence(n):
    fib_numbers = [0, 1]
    for i in range(2, n):
        next_value = fib_numbers[i - 1] + fib_numbers[i - 2]
        fib_numbers.append(next_value)
    return fib_numbers
first_five_fib = fibonacci_sequence(5)
print("The first 5 Fibonacci numbers are:", first_five_fib)
Do you want to proceed? (yes/no): yes
---------------------------------------------------------------------------
[2024-11-10T13:25:40.508594], Executor:
The script ran, then exited with Unix exit code: 0
Its output was:
The first 5 Fibonacci numbers are: [0, 1, 1, 2, 3]
...
...
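The “Do you want to proceed?” prompt in the transcript is a simple human-in-the-loop gate. Below is a rough sketch of that pattern. `run_with_approval` and its `ask` and `execute` parameters are hypothetical names chosen for this example, not part of Magentic-One; in real use `ask` could be the built-in `input` and `execute` whatever actually runs the code.

```python
from typing import Callable, Optional


def run_with_approval(code: str,
                      ask: Callable[[str], str],
                      execute: Callable[[str], str]) -> Optional[str]:
    """Show the code, ask for confirmation, and execute only on an explicit 'yes'."""
    prompt = (
        "Executor is about to execute code:\n"
        f"{code}\n"
        "Do you want to proceed? (yes/no): "
    )
    answer = ask(prompt)
    if answer.strip().lower() != "yes":
        return None  # anything other than 'yes' is treated as a refusal
    return execute(code)


# Deterministic demo: canned answers stand in for a human at the keyboard.
approved = run_with_approval("print('hi')", ask=lambda _: "yes", execute=lambda c: f"ran: {c}")
declined = run_with_approval("print('hi')", ask=lambda _: "no", execute=lambda c: f"ran: {c}")
print(approved, declined)
```

Treating every answer except an explicit “yes” as a refusal is a deliberately conservative choice.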
Example 2 — Searching the Web
To use Magentic-One to search the web, you need a Bing API key. You can set this key up through Microsoft Azure (Bing Search V7).
If you choose the lowest available “F” tier, you can set it up at no cost. However, this tier limits you to 3 searches per second, and there are also restrictions on the total number of search calls per month.
Setting this up is a bit complicated, but basically, you need to follow these steps:
• If you don’t have a Microsoft Azure account, sign up for a free account
• Create a Bing Search resource in the Azure portal; make sure to select the lowest F tier, which is free but has the restrictions mentioned above
• Obtain your API key from the resource overview
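If you script your own calls against a tier that allows only 3 searches per second, a client-side throttle helps you stay under the cap. The sketch below is a generic sliding-window limiter; the class name `SlidingWindowLimiter` is hypothetical and nothing here is part of Magentic-One or the Bing SDK. The clock and sleep functions are injectable purely so the behaviour can be checked without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window of period seconds."""

    def __init__(self, max_calls: int = 3, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> float:
        """Block until a call is allowed and return the seconds slept."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        slept = 0.0
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call falls out of the window.
            slept = self.period - (now - self._calls[0])
            self._sleep(slept)
            self._calls.popleft()
            now = self._clock()
        self._calls.append(now)
        return slept


# Demo with a fake clock so nothing actually sleeps.
fake_time = [0.0]
limiter = SlidingWindowLimiter(
    clock=lambda: fake_time[0],
    sleep=lambda seconds: fake_time.__setitem__(0, fake_time[0] + seconds),
)
delays = [limiter.wait() for _ in range(5)]
print(delays)
```

Call `limiter.wait()` immediately before each search request; with the default clock it sleeps for real.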
Once you have the Bing API key, assign its value to the BING_API_KEY environment variable.
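Before launching the example script, it can save time to confirm that the environment variables mentioned in this article are actually set. This is a small preflight sketch of my own; `missing_env_vars` is a hypothetical helper, and the list of required names is simply the three variables used in this walkthrough.

```python
import os
from typing import List, Mapping, Sequence

# The three variables configured earlier in this article.
REQUIRED_VARS = (
    "CHAT_COMPLETION_PROVIDER",
    "CHAT_COMPLETION_KWARGS_JSON",
    "BING_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ,
                     required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Demo against a hand-made mapping instead of the real environment.
sample_env = {"CHAT_COMPLETION_PROVIDER": "openai", "BING_API_KEY": ""}
print(missing_env_vars(sample_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment; an empty list means everything is in place.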
(base) tom@tpr-desktop:~/projects/autogen/python/packages$ python examples/example --logs_dir ./logs
/home/tom/projects/autogen/python/.venv/lib/python3.11/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
User input ('exit' to quit): search the web and find the current weather
forecast for Edinburgh UK
Again, there was a lot of output. Here are some of the notable parts.
...
...
Initial plan:
We are working to address the following user request:
search the web and find the current weather forecast for Edinburgh UK
To answer this request we have assembled the following team:
WebSurfer: A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, and interact with content (e.g., clicking links, scrolling the viewport, etc., filling in form fields, etc.) It can also summarize the entire page, or answer questions based on the content of the page. It can also be asked to sleep and wait for pages to load, in cases where the pages seem to be taking a while to load.
Coder: A helpful and general-purpose AI assistant that has strong language skills, Python skills, and Linux command line skills.
Executor: A agent for executing code
file_surfer: An agent that can handle local files.
Here is an initial fact sheet to consider:
1. GIVEN OR VERIFIED FACTS
- The request is asking for the current weather forecast for Edinburgh, UK.
2. FACTS TO LOOK UP
- The current weather forecast for Edinburgh, UK can be found on various weather websites such as the BBC Weather, Met Office, or Weather.com.
3. FACTS TO DERIVE
- N/A
4. EDUCATED GUESSES
- The current weather forecast will likely include details such as temperature, precipitation chance, wind speed, and potential weather warnings, which are typically part of a standard weather forecast.
Here is the plan to follow as best as possible:
- Request WebSurfer to search for the current weather forecast for Edinburgh, UK on a reliable weather website such as BBC Weather, Met Office, or Weather.com.
- Instruct WebSurfer to summarize the weather forecast details including temperature, precipitation chance, wind speed, and any potential weather warnings.
- Present the gathered weather information for Edinburgh, UK from WebSurfer.
...
...
I typed 'Edinburgh UK current weather forecast' into the browser search bar.
Here is a screenshot of [Edinburgh UK current weather forecast - Search](https://www.bing.com/search?q=Edinburgh+UK+current+weather+forecast&FORM=QBLH). The viewport shows 28% of the webpage, and is positioned at the top of the page.
The following metadata was extracted from the webpage:
{
    "meta_tags": {
        "referrer": "origin-when-cross-origin",
        "og:description": "Intelligent search from Bing makes it easier to quickly find what you’re looking for and rewards you.",
        "og:site_name": "Bing",
        "og:title": "Edinburgh UK current weather forecast - Bing",
        "og:url": "https://www.bing.com/search?q=Edinburgh+UK+current+weather+forecast&FORM=QBLH",
        "fb:app_id": "3732605936979161",
        "og:image": "http://www.bing.com/sa/simg/facebook_sharing_5.png",
        "og:type": "website",
        "og:image:width": "600",
        "og:image:height": "315"
    }
}
Automatic OCR of the page screenshot has detected the following text:
**Page Content:**
Microsoft Bing
Search input field: Edinburgh UK current weather forecast
**Menu:**
- Search
- Copilot
- News
- Images
- Videos
- Maps
- Shopping
- More
- Tools
Deep search
Sign in
Mobile
**Weather Information:**
About 3,180,000 results
Edinburgh
Capital city of Scotland, UK
Buttons:
- Map
- Things to do
- Weather (Selected)
- Covid-19
- Flights
- History
- Travel guide
**Weather Widget:**
**Weather Details:**
12°C / °F
13°
6°
Wind: 17 KMPH
Humidity: 90%
Cloudy · Sun 10, 13:44
**Hourly Forecast:**
14:00 17:00 20:00 23:00 2:00 5:00 8:00 11:00
**Weekly Forecast:**
- Sun 10: 13°/6°
- Mon 11: 🌞 11°/2°
- Tue 12: 🌧 9°/5°
- Wed 13: 🌥 12°/8°
- Thu 14: 🌧 10°/8°
- Fri 15: 🌧 11°/7°
- Sat 16: 🌧 10°/7°
- Sun 17: 🌥 7°/2°
**Sidebar Information:**
- UV index: No forecast
- Moderate breeze: 17 KMPH, WSW
- Sunrise: 07:39 AM
- Sunset: 04:12 PM
...
...
Here is the final answer, which was completely accurate.
[2024-11-10T13:44:43.570437], Orchestrator (final answer):
The current weather in Edinburgh is 12°C with cloudy conditions.
There's a moderate breeze at 17 KMPH, and the humidity is at 90%.
The temperature is expected to range between 13°C and 6°C today.
Example 3 — Clicking on Website Links
As I write this article, there is a major rugby match going on in the UK between Wales and Fiji. I wanted to know the latest score of the Wales vs Fiji match.
(base) tom@tpr-desktop:~/projects/autogen/python/packages$ python examples/example --logs_dir ./logs
/home/tom/projects/autogen/python/.venv/lib/python3.11/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
User input ('exit' to quit): Click on the bbc.co.uk website, click on the
Sport link near the top of the page. Look for a link in the page that
displays about the Wales v Fiji rugby match. Click on that link and tell me
what the latest score is
Again, I have omitted much of the output to save space.
...
...
...
Page screenshot's automatic OCR detected the following text:
Of course, here is the transcribed text:
---
**BBC**
Sign in
Home
News
Sport
Weather
iPlayer
Sounds
Little Knowledge
Sport
Home | Football | Cricket | Formula 1 | Rugby U | Rugby L | Tennis | Golf | Boxing | Athletics
Discover your BBC
Sign in or create an account to watch, listen and participate
Log in or register
Request has been satisfied.
...
...
...
---------------------------------------------------------------------------
[2024-11-10T13:55:10.606578], Orchestrator (final answer):
According to the BBC Sports website, the latest score of the Wales vs Fiji match is Wales 7-0 Fiji, with Murray scoring.
---------------------------------------------------------------------------
[2024-11-10T13:55:10.617212], Orchestrator (termination condition):
This is a screenshot I took shortly after the model answered (Fiji quickly scored after Wales’s initial score).
[Image: screenshot of the match score, taken shortly after the model’s answer]
Example 4 — Reading Local XL Files.
I have a local Excel file. Let’s see if Magentic-One can find it, open it, and answer questions about it.
[Image: the local Excel file]
(base) tom@tpr-desktop:~/projects/autogen/python/packages$ python examples/example --logs_dir ./logs
/home/tom/projects/autogen/python/.venv/lib/python3.11/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
User input ('exit' to quit): I have a file called fake_data.xlsx in my /mnt/d/data directory. Can you tell me what the third record in the file is?
...
...
Next speaker file_surfer
---------------------------------------------------------------------------
[2024-11-10T14:16:57.676137], file_surfer:
Address: file:///mnt/d/data/fake_data.xlsx
Viewport position: displaying page 1 of 1.
=======================
### Sheet1
| Date | Sales | Expenses |
| --- | --- | --- |
| 2024-01-31 | 302 | 187 |
| 2024-02-29 | 635 | 472 |
| 2024-03-31 | 470 | 199 |
| 2024-04-30 | 306 | 459 |
| 2024-05-31 | 271 | 251 |
| 2024-06-30 | 900 | 230 |
| 2024-07-31 | 220 | 249 |
| 2024-08-31 | 814 | 408 |
| 2024-09-30 | 321 | 357 |
| 2024-10-31 | 666 | 443 |
| 2024-11-30 | 414 | 393 |
| 2024-12-31 | 530 | 485 |
---------------------------------------------------------------------------
[2024-11-10T14:17:00.613740], Orchestrator (thinking):
Updated ledger:
{
    "is_request_satisfied": {
        "reason": "Successfully retrieved and displayed the content of the third record in the file 'fake_data.xlsx'.",
        "answer": true
    },
    "is_in_loop": {
        "reason": "The task was completed in a straightforward manner without repetitive actions.",
        "answer": false
    },
    "is_progress_being_made": {
        "reason": "The content of the third record was successfully retrieved and displayed, indicating progress.",
        "answer": true
    },
    "next_speaker": {
        "reason": "The task is completed, so no further action is needed.",
        "answer": "file_surfer"
    },
    "instruction_or_question": {
        "reason": "By providing the content of the third record, the request was satisfied.",
        "answer": "The third record in the file is: Date: 2024-03-31, Sales: 470, Expenses: 199."
    }
}
---------------------------------------------------------------------------
[2024-11-10T14:17:01.465848], Orchestrator (final answer):
The third record in the file "fake_data.xlsx" contains the following information:
- Date: 2024-03-31
- Sales: 470
- Expenses: 199
---------------------------------------------------------------------------
[2024-11-10T14:17:01.465908], Orchestrator (termination condition):
No agent selected.
(base) tom@tpr-desktop:~/projects/autogen/python/packages$
I love that the agent worked out that the first row was the header, so it returned the actual third data record. That was amazing.
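The “Updated ledger” in the log above is plain JSON, so it is easy to inspect programmatically. The sketch below parses a ledger with the same five keys and derives a next step from it. The decision rules in `decide_next_action` are my own assumption about how such a ledger might be used, not Magentic-One’s actual logic, and the function name is hypothetical.

```python
import json

# The five keys that appear in the ledger printed in the log above.
REQUIRED_KEYS = (
    "is_request_satisfied",
    "is_in_loop",
    "is_progress_being_made",
    "next_speaker",
    "instruction_or_question",
)


def decide_next_action(ledger_json: str) -> dict:
    """Turn a ledger into a simple action (assumed rules, for illustration only)."""
    ledger = json.loads(ledger_json)
    missing = [key for key in REQUIRED_KEYS if key not in ledger]
    if missing:
        raise ValueError(f"ledger is missing keys: {missing}")
    answers = {key: ledger[key]["answer"] for key in REQUIRED_KEYS}
    if answers["is_request_satisfied"]:
        return {"action": "finish", "detail": answers["instruction_or_question"]}
    if answers["is_in_loop"] or not answers["is_progress_being_made"]:
        return {"action": "replan", "detail": None}
    return {
        "action": "delegate",
        "speaker": answers["next_speaker"],
        "detail": answers["instruction_or_question"],
    }


# A trimmed ledger with the same answers as the one in the log ("reason" text shortened).
example_ledger = json.dumps({
    "is_request_satisfied": {"reason": "done", "answer": True},
    "is_in_loop": {"reason": "no repetition", "answer": False},
    "is_progress_being_made": {"reason": "record retrieved", "answer": True},
    "next_speaker": {"reason": "task complete", "answer": "file_surfer"},
    "instruction_or_question": {
        "reason": "request satisfied",
        "answer": "The third record in the file is: Date: 2024-03-31, Sales: 470, Expenses: 199.",
    },
})
print(decide_next_action(example_ledger))
```

Keeping the ledger as structured JSON is what makes this kind of check cheap: a supervising script can tell at a glance whether a run finished, stalled or is still delegating.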
Summary
Well, I don’t know about you, but I think that was a pretty impressive set of demonstrations. Microsoft has developed a remarkable agent system and appears to intend to integrate it fully into its AutoGen framework in the near future.
In this article, I explained what Magentic-One is and how to download and run it to accomplish some useful tasks. I explained that its key components are:
• Orchestration
• Web and file browsing
• Coding and terminal operations
I demonstrated the workings of these components through a series of examples, including:
• Creating and running Python code
• Checking local files and answering questions about their content
• Searching for information online
• Clicking on website links