An Analysis of MetaGPT Source Code for AI Agents

Introduction

In my work I need to use AI Agents, so I want to understand them in depth. Reading articles alone often isn't enough to grasp the details, so I started studying various open-source AI Agent projects. This is the first article in that effort, in which I study the source code of MetaGPT.

Basic Goals

MetaGPT is a multi-agent framework that abstracts the main roles in a software company and uses different AI Agents to play them: Product Manager, Software Architect, Project Manager, and Engineer. These agents interact according to the SOP (standard operating procedure) designed by the development team and ultimately deliver a project.

An old habit: I never read code just for the sake of reading it, but to solve specific problems or clarify specific concepts. So, facing MetaGPT, I set the following goals:

  • How does MetaGPT abstract the software development process of a company? How is the SOP specifically implemented in the code?
  • How do different AI Agents interact in MetaGPT?
  • From an output perspective, the content MetaGPT generates is highly structured. How are the prompts written to achieve such an effect?
  • How does MetaGPT abstract specific professions? For example, how is the Product Manager abstracted?

This article mainly aims to find answers to the above questions from the source code.

Entry Point

Although the README.md mentions that many users run into issues while getting it running, I found it quite easy. Compared with the early sd-webui it is far more user-friendly, so I won't elaborate and will simply run it per the README instructions.

Let’s look at the entry method directly.

# startup.py 

async def startup(
    idea: str,
    investment: float = 3.0,
    n_round: int = 5,
    code_review: bool = False,
    run_tests: bool = False,
    implement: bool = True
):
    # 1. Start a company
    company = SoftwareCompany()
    # 2. Hire employees
    company.hire([
        ProductManager(),  # Product Manager
        Architect(), # Architect
        ProjectManager(), # Project Manager
    ])

    if implement or code_review:
        # 3. Hire developers
        company.hire([Engineer(n_borg=5, use_code_review=code_review)])

    if run_tests:
        company.hire([QaEngineer()])
    # 4. Set the amount (maximum amount of money that can be spent on GPT4 for this run)
    company.invest(investment)
    # 5. Boss’s requirements
    company.start_project(idea)
    # 6. Run for several rounds
    await company.run(n_round=n_round)

From the startup method, the entire process is very clear and highly readable!

Let’s first look at the SoftwareCompany class, part of the code is as follows:

# metagpt/software_company.py

class SoftwareCompany(BaseModel):
    # Environment
    environment: Environment = Field(default_factory=Environment)
    investment: float = Field(default=10.0)
    idea: str = Field(default="")

    async def run(self, n_round=3):
        """Run company until target round or no money"""
        while n_round > 0:
            # self._save()
            n_round -= 1
            logger.debug(f"{n_round=}")
            self._check_balance()
            await self.environment.run()
        return self.environment.history

Through the SoftwareCompany class, we abstract a software company. After reading the code, you will find that the environment object in the SoftwareCompany class is very important. Different AI Agents in the company will interact with the environment object. This abstraction is also quite clever; in a company, communication among colleagues indeed spreads within this ‘environment’.

Next is the run method. At the end of the startup method, the run method of the SoftwareCompany class is called. It loops for the requested number of rounds, checking the balance each round; the run stops when the balance is exhausted or the rounds are used up.

Since MetaGPT's underlying implementation calls the OpenAI API, every request costs money. You set a budget, which defaults to $3.0; once spending reaches the budget, the run is forcibly ended, even if the task is not yet complete.
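The budget guard can be pictured as a simple cost tracker. The sketch below is hypothetical (the class and method names are mine, not MetaGPT's): it mimics the idea of a balance check raising an exception once accumulated spend reaches the budget.

```python
# Hypothetical sketch of the budget guard; names are simplified stand-ins
# for MetaGPT's internals, not its actual API.

class NoMoneyException(Exception):
    """Raised when the accumulated API cost reaches the budget."""

class CostTracker:
    def __init__(self, max_budget: float):
        self.max_budget = max_budget
        self.total_cost = 0.0

    def record(self, cost: float) -> None:
        self.total_cost += cost

    def check_balance(self) -> None:
        # Force-stop the run once spending reaches the budget,
        # even if the task is unfinished.
        if self.total_cost >= self.max_budget:
            raise NoMoneyException(
                f"spent ${self.total_cost:.2f} of ${self.max_budget:.2f}")

tracker = CostTracker(max_budget=3.0)
tracker.record(1.2)
tracker.check_balance()  # fine: still under budget
tracker.record(2.0)
# tracker.check_balance() would now raise NoMoneyException
```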

The run method of the SoftwareCompany class will call the run method of the environment object, as shown in the code below:

# metagpt/environment.py/Environment

    async def run(self, k=1):
        """
        Process all Role runs at once
        """
        for _ in range(k):
            futures = []
            for role in self.roles.values():
                future = role.run()
                futures.append(future)
            # asyncio.gather schedules all the collected coroutines concurrently
            # on the event loop and waits for them all to complete
            await asyncio.gather(*futures)

From the above code we can see that this is the real entry point: it loops through all roles and calls each role's run method. The roles run concurrently as Python coroutines, which improves the program's execution efficiency.
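As a toy illustration of this concurrency pattern (the role names and delays below are made up, and asyncio.sleep stands in for an LLM request), gathering the coroutines lets their awaits interleave on a single event-loop thread:

```python
import asyncio

order = []

async def role_run(name: str, delay: float) -> str:
    order.append(f"{name} start")
    await asyncio.sleep(delay)  # stands in for a slow OpenAI API call
    order.append(f"{name} done")
    return name

async def main() -> list[str]:
    # Like Environment.run: collect the coroutines, then await them together.
    futures = [role_run("ProductManager", 0.02), role_run("Architect", 0.01)]
    return await asyncio.gather(*futures)

results = asyncio.run(main())
# Both roles start before either finishes; the shorter "request" returns first,
# yet gather preserves the original ordering of results.
```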

Here, a role corresponds to a different employee identity; under the hood, each role is essentially code that sends different prompts to OpenAI.

We need to read some role-related code to understand how the ordering between roles comes about.

Role

In the startup method, we first hired three different roles:

company.hire([
    ProductManager(),  # Product Manager
    Architect(),       # Architect
    ProjectManager(),  # Project Manager
])

The order of hiring is significant. The first hired is an instance of ProductManager, which means the instantiation code of the ProductManager class has already been executed here:

# metagpt/roles/product_manager.py

class ProductManager(Role):

    def __init__(self, 
                 name: str = "Alice", 
                 profile: str = "Product Manager", 
                 goal: str = "Efficiently create a successful product",
                 constraints: str = "") -> None:
       
        super().__init__(name, profile, goal, constraints)
        # 1. Write the Product Requirement Document (PRD)
        self._init_actions([WritePRD])
        # 2. Observe the boss's requirements
        self._watch([BossRequirement])

In the code above, _init_actions and _watch are both methods of the Role class, so they belong to the current Product Manager role. The role initializes its action, writing the requirements document, and declares what it needs to observe: the boss's requirements.

How is a BossRequirement message created? In the startup method, we call the start_project method of SoftwareCompany, which publishes the requirement, in the name of the BOSS, to the environment of SoftwareCompany. The code is as follows:

# metagpt/software_company.py/SoftwareCompany

    def start_project(self, idea):
        """Start a project from publishing boss requirement."""
        self.idea = idea
        # Initial message observed by Role
        self.environment.publish_message(Message(role="BOSS", content=idea, cause_by=BossRequirement))

This process is like a boss walking into the product development area and shouting, 'I want to create xxxx', after which everyone knows. MetaGPT abstracts this process so that every role can read the boss's (and the other roles') messages through the SoftwareCompany environment.

When the run method of the environment object executes, each role's run method is called:

    async def run(self, message=None):
        if message:
            # 1. If someone sends the role a message, store it in the role's short-term memory
            if isinstance(message, str):
                message = Message(message)
            if isinstance(message, Message):
                self.recv(message)
            if isinstance(message, list):
                self.recv(Message("\n".join(message)))
        # 2. Observe information in the environment to see if there is anything to process
        elif not await self._observe():
            # If there is no new information, suspend and wait
            logger.debug(f"{self._setting}: no news. waiting.")
            return
        # 3. If there is something to process, handle it through _react and obtain the result
        rsp = await self._react()
        # 4. Send the processing result back to the environment
        self._publish_message(rsp)
        return rsp

The logic of the run method is also very clear. It first observes the information in the environment. If there is something to process, it will handle it through _react and send the result back to the environment.

So how does it observe? Let’s take a look at the code for the _observe method:

# metagpt/roles/role.py/Role
 
    async def _observe(self) -> int:
        if not self._rc.env:
            return 0
        # Get information from the env's short-term memory
        env_msgs = self._rc.env.memory.get()
        # Observe the objects to be observed from env, obtaining the corresponding message
        observed = self._rc.env.memory.get_by_actions(self._rc.watch)
        # Record it down (role's own memory)
        self._rc.news = self._rc.memory.remember(observed)  # remember recent exact or similar memories

        for i in env_msgs:
            # Record information from the environment into role memory
            self.recv(i)

        news_text = [f"{i.role}: {i.content[:20]}..." for i in self._rc.news]
        if news_text:
            logger.debug(f'{self._setting} observed: {news_text}')
        return len(self._rc.news)

    def recv(self, message: Message) -> None:
        if message in self._rc.memory.get():
            return
        # Record the information
        self._rc.memory.add(message)

In the above code, self._rc.env is the environment the current role lives in; the common bidirectional-association pattern is used here (the environment holds its roles, and each role holds a reference back to the environment).

When other roles send messages to the environment using the publish_message method, the messages actually exist in the memory of the environment. In the _observe method, it first reads the messages from the memory of the environment, and then uses get_by_actions to obtain the messages corresponding to the actions that need to be observed by the current role. For the Product Manager, it sets BossRequirement as the object to be observed through the _watch method.

When the role calls get_by_actions, it looks up the messages corresponding to BossRequirement (i.e., the requirement message that will later be processed by GPT-4). The code for get_by_actions is as follows:

# metagpt/memory.py/Memory

    def get_by_action(self, action: Type[Action]) -> list[Message]:
        """Return all messages triggered by a specified Action"""
        return self.index[action]

    def get_by_actions(self, actions: Iterable[Type[Action]]) -> list[Message]:
        """Return all messages triggered by specified Actions"""
        rsp = []
        for action in actions:
            if action not in self.index:
                continue
            rsp += self.index[action]
        return rsp

    def add(self, message: Message):
        """Add a new message to storage, while updating the index"""
        if message in self.storage:
            return
        self.storage.append(message)
        if message.cause_by:
            self.index[message.cause_by].append(message)

Since get_by_actions simply looks up the key action in self.index, what self.index contains becomes crucial, which is why we need the add method: every message is indexed under its cause_by action, so the boss's message ends up stored under the BossRequirement key.

In simple terms, the get_by_actions method will obtain the messages corresponding to the objects that the role specifies to observe through the _watch method, which will be stored as self._rc.news, and then the role will save these news into its memory.
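The storage-plus-index behavior can be demonstrated with a standalone, simplified Memory (the Msg dataclass and the string action names are my simplification; MetaGPT indexes by Action classes):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Msg:
    content: str
    cause_by: str  # simplified: an action name instead of an Action class

class Memory:
    def __init__(self):
        self.storage = []                 # all messages, in arrival order
        self.index = defaultdict(list)    # cause_by -> messages

    def add(self, message: Msg) -> None:
        if message in self.storage:
            return  # the same message arriving twice is ignored
        self.storage.append(message)
        if message.cause_by:
            self.index[message.cause_by].append(message)

    def get_by_actions(self, actions) -> list:
        # Only actions that have produced messages contribute anything.
        return [m for a in actions if a in self.index for m in self.index[a]]

mem = Memory()
boss = Msg("Develop a quantitative trading system", "BossRequirement")
mem.add(boss)
mem.add(boss)  # duplicate: ignored
```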

If the role observes news through the _observe method, it will need to execute the _react method:

# metagpt/roles/role.py

    async def _react(self) -> Message:
        """Think first, then act"""
        await self._think()
        logger.debug(f"{self._setting}: {self._rc.state=}, will do {self._rc.todo}")
        return await self._act()

First, let’s look at the _think method:

# metagpt/roles/role.py/Role

    async def _think(self) -> None:
        # 1. If there is only one action, directly execute that action
        if len(self._actions) == 1:
            self._set_state(0)
            return
        prompt = self._get_prefix()
        # 2. Integrate the role's memory and state into the prompt, letting GPT4 process it
        prompt += STATE_TEMPLATE.format(history=self._rc.history, states="\n".join(self._states),
                                        n_states=len(self._states) - 1)
        next_state = await self._llm.aask(prompt)
        logger.debug(f"{prompt=}")
        if not next_state.isdigit() or int(next_state) not in range(len(self._states)):
            logger.warning(f'Invalid answer of state, {next_state=}')
            next_state = "0"
        self._set_state(int(next_state))

The _think method's job is to set the state; _act then decides which action to execute based on that state. If the role has only one action, the state is set directly to 0. If there are multiple actions, GPT-4 decides: all the information in the role's memory is taken as the history, all of the role's states are listed, and both are assembled into the prompt so that GPT-4 can choose the best action to take given the current history, returning its index as the state.
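The validation at the end of _think is worth noting: any answer that is not the digit of a known state falls back to state 0. A tiny standalone version of that logic (the function name is mine):

```python
def choose_state(llm_answer: str, n_states: int) -> int:
    """Mirror _think's guard: only a digit naming a known state is accepted."""
    if not llm_answer.isdigit() or int(llm_answer) not in range(n_states):
        return 0  # invalid answer: fall back to the first action
    return int(llm_answer)

assert choose_state("2", 3) == 2          # valid index
assert choose_state("7", 3) == 0          # out of range
assert choose_state("write PRD", 3) == 0  # not a digit
```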

Next, let’s look at the _act method:

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: ready to {self._rc.todo}")
        # Let the role execute the corresponding action
        response = await self._rc.todo.run(self._rc.important_memory)
        if isinstance(response, ActionOutput):
            msg = Message(content=response.content, instruct_content=response.instruct_content,
                        role=self.profile, cause_by=type(self._rc.todo))
        else:
            msg = Message(content=response, role=self.profile, cause_by=type(self._rc.todo))
        # Record the current action's returned message
        self._rc.memory.add(msg)
        # logger.debug(f"{response}")

        return msg

The logic of _act is to execute the action held in self._rc.todo, which was chosen by _think via _set_state. For example, since the Product Manager has only one action, WritePRD, it executes WritePRD's run method. Note that, to let different roles interact, _act also produces a message, and every message sets the cause_by parameter to indicate which action generated it, so that other roles can pick up the messages they need via _watch.

Subsequently, we see the details of WritePRD, as shown in the code below:

#  metagpt/actions/write_prd.py
class WritePRD(Action):
    def __init__(self, name="", context=None, llm=None):
        super().__init__(name, context, llm)

    async def run(self, requirements, *args, **kwargs) -> ActionOutput:
        sas = SearchAndSummarize()
        rsp = ""
        info = f"### Search Results\n{sas.result}\n\n### Search Summary\n{rsp}"
        if sas.result:
            logger.info(sas.result)
            logger.info(rsp)

        prompt = PROMPT_TEMPLATE.format(requirements=requirements, search_information=info,
                                        format_example=FORMAT_EXAMPLE)
        logger.debug(prompt)
        prd = await self._aask_v1(prompt, "prd", OUTPUT_MAPPING)
        return prd
    

The WritePRD class is essentially a wrapper around a prompt. It accepts the requirements parameter which, in MetaGPT's overall flow, is the boss's requirement message. The Product Manager writes the PRD (Product Requirements Document) based on that message, and the result is then recorded into the role's memory.
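The PROMPT_TEMPLATE and FORMAT_EXAMPLE used by WritePRD live in write_prd.py and are much longer than what fits here; the miniature below is a hypothetical reconstruction of the pattern (Markdown sections for context plus a format example the model is told to fill in):

```python
# Hypothetical miniature of the WritePRD prompt-assembly pattern.
PROMPT_TEMPLATE = """
# Context
## Original Requirements
{requirements}

## Search Information
{search_information}

# Format Example
{format_example}

Role: You are a professional Product Manager.
Output: Fill in the sections of the format example based on the context.
"""

FORMAT_EXAMPLE = """## Product Goals
- goal 1
- goal 2

## User Stories
- As a user, I want ..."""

prompt = PROMPT_TEMPLATE.format(
    requirements="Develop a quantitative trading system",
    search_information="(none)",
    format_example=FORMAT_EXAMPLE,
)
```

Constraining the output through a filled-in example is what makes the generated documents so uniformly structured.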

Interaction Between Roles

Initially, the user inputs a requirement message as the BOSS role. This message is sent to the environment, as shown in the code below, mainly using the publish_message method where cause_by uses BossRequirement. For convenience in explanation, I will set the requirement as: ‘Develop a quantitative trading system for crypto’.

# metagpt/software_company.py/SoftwareCompany

    def start_project(self, idea):
        self.idea = idea
        # Initial message observed by Role
        self.environment.publish_message(Message(role="BOSS", content=idea, cause_by=BossRequirement))

The Product Manager focuses on BossRequirement through the _watch method. When the Product Manager runs, the _observe method will receive the message [Develop a quantitative trading system for crypto] as self._rc.news. Then it calls the _think and _act methods. Since there is only one action (WritePRD), it will call the run method of WritePRD and pass the message [Develop a quantitative trading system for crypto] as requirements into the prompt.

From the startup code, the role after the Product Manager is the Architect, who focuses on WritePRD through the _watch method.

In the _act of the Product Manager, it adds the Message to the environment, with cause_by being WritePRD. At this point, the Architect can focus on it through the _watch method and obtain the output produced by WritePRD, storing it in the Architect’s memory. This will then serve as input for the WriteDesign action, which will include the content of WritePRD in its prompt.

Thus, the interaction process between roles becomes quite clear.
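The whole chain can be condensed into a toy pipeline (all names illustrative, using plain strings instead of Action classes): each step consumes the messages caused by the action it watches and publishes a new message tagged with its own action.

```python
from collections import defaultdict

index = defaultdict(list)  # cause_by -> messages, like the environment memory

def publish(content: str, cause_by: str) -> None:
    index[cause_by].append(content)

def react(watch: str, action: str, transform) -> None:
    news = index[watch]
    if news:  # only act when there is something to process
        publish(transform(news[-1]), cause_by=action)

publish("Develop a quantitative trading system", "BossRequirement")
react("BossRequirement", "WritePRD", lambda req: f"PRD for: {req}")
react("WritePRD", "WriteDesign", lambda prd: f"Design based on: {prd}")
```

The ordering of the SOP falls out of the cause_by tags alone: the Architect step does nothing until a WritePRD message exists.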

Answering Questions

MetaGPT has many code details, but having read this far, we can already answer the initial questions:

1. How does MetaGPT abstract the software development process of a company? How is the SOP specifically implemented in the code? The SOP is the sequence of interactions between different roles. For example, the Product Manager needs the BOSS’s message as a constraint for their prompt, and the output from the Product Manager is the input for the Architect. The input and output between different roles constitute the SOP.

2. How do different AI Agents interact in MetaGPT? Roles declare, via the _watch method, which actions' output (from which roles) they want; the actual retrieval happens in the _observe method. After obtaining messages from other roles, if the current role has multiple actions, it uses _think to select one (based on its memory of the events currently unfolding), and then executes that action through _act, retrieving the necessary messages from the role's memory.

3. From an output perspective, the content generated by MetaGPT is highly structured. How are the prompts written to achieve such an effect? MetaGPT's prompt design is worth learning from. It mainly uses Markdown to structure prompts, and most prompts include context and examples, letting GPT-4 exploit its in-context (few-shot) abilities. For the specifics, I recommend pulling the code and having a look.

4. How does MetaGPT abstract specific professions? From the perspective of professions, it mainly abstracts through actions and the messages they carry. For example, the Product Manager is abstracted as receiving the boss’s requirements, producing the Product Requirement Document, and then handing that document to the Architect. The results generated by each role will be placed in the environment, allowing others to see them (many roles have only one action, so they won’t use the messages in the environment).

Conclusion

I have run MetaGPT several times and still see significant limitations. For instance, when I ask it to write a Python game in line with its documented examples, the process is smooth. But when I instead ask MetaGPT to help me design an ad recommendation system like Toutiao's, what it produces is clearly not very usable.


Additionally, during the process of reading the source code, I also discovered some issues raised by the MetaGPT team itself, such as the limitation of large chunks of code being constrained by GPT4’s token limit, which makes it impossible to generate them together. This may require providing additional code as context, and the effect may not be ideal.

Of course, I still have great respect for the MetaGPT team. The code design is clear, and achieving the goal of self-bootstrapping with MetaGPT is also very cool.

That’s all.
