Author: Wang Lei, Senior Architect of MindSpore, AI Engineering Expert

Organizer: Qingke AI


Introduction to MetaGPT

In recent years, large language models have become a hot topic in the AI field due to their powerful natural language processing capabilities. They can not only generate and understand text but also perform complex analysis and reasoning. Meanwhile, large language models have also sparked interest in agents (i.e., AI Agents). An agent is an intelligent entity capable of perceiving its environment, making decisions, and executing actions. Unlike traditional artificial intelligence, agents have the ability to independently think and use tools to gradually achieve given goals. With large language models as their brains, agents are equipped with the capability for automated handling of general issues.

MetaGPT is a multi-agent framework that uses Standard Operating Procedures (SOPs) to coordinate a multi-agent system built on large language models, an approach its authors describe as meta-programming. The framework simulates a virtual software team with roles such as product manager, architect, project manager, engineer, and quality engineer, and adopts SOPs as that team's development process. It focuses on software development and covers the entire lifecycle from requirement analysis to code implementation.

In MetaGPT, multi-agents are viewed as an agent society, where multi-agents = agents + environment + standard processes (SOP) + communication + economy. Each component plays an essential role:

• Agents: Extends the definition of a single agent to multiple agents. In a multi-agent system, several single agents collaborate, each with its own LLM, observations, thinking, actions, and memory.
• Environment: The environment is the common space where agents exist and interact. Agents observe important information from the environment and publish action output results for other agents to use.
• Standard Processes (SOP): Established procedures that manage agent actions and interactions, ensuring orderly and efficient operation within the system.
• Communication: The process of information exchange between agents. It is crucial for collaboration, negotiation, and competition within the system.
• Economy: Refers to the value exchange system within the multi-agent environment, determining resource allocation and task priorities.

SOP defines the work roles and workflows within the society. In software engineering, the Waterfall Model outlines the steps from analysis to delivery, facilitating cross-role team collaboration. MetaGPT’s approach demonstrates its ability to decompose high-level tasks into detailed, actionable components handled by different roles (product managers, architects, project managers, engineers, quality engineers), thus promoting role-specific expertise and coordination. This diagram illustrates how MetaGPT is designed to handle complex tasks and facilitate clear role delineation, making it a valuable tool in complex software development scenarios.

MetaGPT Framework

The design of MetaGPT is divided into two levels: the Basic Component Layer and the Collaboration Layer.

Basic Component Layer

The Basic Component Layer centers around AI Agents, providing capabilities such as observation and thinking. It establishes the core modules necessary for individual agent operation and information exchange across the system, including environment, memory, roles, actions, and tools, as shown in Figure 2.

• Environment: Provides a collaborative workspace and communication platform for agents.
• Memory: Stores and retrieves historical messages.
• Roles: Encapsulates professional skills and workflows based on the domain.
• Actions: Executes modular sub-tasks.
• Tools: Provides common services and tools.

This layer provides the infrastructure for agents to operate within their assigned roles, allowing them to interact with each other and the system.
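As a rough illustration of how these five components might fit together, the sketch below models them with plain Python dataclasses. Every class, field, and function name here (Message, Memory, Action, Role, Environment, TOOLS) is a hypothetical stand-in chosen for this example and does not reproduce MetaGPT's actual classes; the lambdas take the place of LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Memory:
    """Stores and retrieves historical messages."""
    history: List[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        self.history.append(message)

    def from_sender(self, sender: str) -> List[Message]:
        return [m for m in self.history if m.sender == sender]


@dataclass
class Action:
    """A modular sub-task; `run` stands in for an LLM-backed step."""
    name: str
    run: Callable[[str], str]


@dataclass
class Role:
    """Bundles domain-specific actions with a private memory."""
    name: str
    actions: List[Action]
    memory: Memory = field(default_factory=Memory)


@dataclass
class Environment:
    """Shared workspace where roles publish and receive messages."""
    roles: Dict[str, Role] = field(default_factory=dict)
    log: List[Message] = field(default_factory=list)

    def add_role(self, role: Role) -> None:
        self.roles[role.name] = role

    def publish(self, message: Message) -> None:
        self.log.append(message)
        for role in self.roles.values():
            if role.name != message.sender:  # do not echo back to the sender
                role.memory.add(message)


# Tools: common helpers that any action may call (hypothetical example).
TOOLS: Dict[str, Callable[[str], int]] = {"word_count": lambda text: len(text.split())}

env = Environment()
env.add_role(Role("ProductManager", [Action("write_prd", lambda req: f"PRD for: {req}")]))
env.add_role(Role("Architect", [Action("design", lambda prd: f"Design based on: {prd}")]))

prd = env.roles["ProductManager"].actions[0].run("a snake game")
env.publish(Message("ProductManager", prd))
```

After the last line, the Architect's memory holds the product manager's message, while the product manager's own memory stays empty because the sketch does not echo messages back to their sender.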

Collaboration Layer

Built on the Basic Component Layer, the Collaboration Layer coordinates the individual agents so that they can solve complex problems together. It provides two basic mechanisms: knowledge sharing and workflow encapsulation.

• Knowledge Sharing: This mechanism allows agents to effectively exchange information and contribute to a shared knowledge base. Agents can store, retrieve, and share data at different granularities. It not only enhances coordination but also reduces redundant communication, improving overall operational efficiency.
• Workflow Encapsulation: This mechanism uses SOP to decompose complex tasks into smaller, manageable sub-tasks. It assigns these sub-tasks to suitable agents and monitors them through standardized outputs to ensure their actions align with the overall goals.
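The two mechanisms can be sketched together in a few lines. In this simplified, assumed model, an SOP is an ordered list of stages, each stage names a role, a sub-task function, and the keys its output must contain, and a single dictionary plays the part of the shared knowledge base. The names run_sop and Stage, and the stage contents, are hypothetical and are not taken from MetaGPT.

```python
from typing import Callable, Dict, List, Tuple

# Each stage: (role name, sub-task function, keys its output must contain).
Stage = Tuple[str, Callable[[Dict[str, str]], Dict[str, str]], List[str]]


def run_sop(stages: List[Stage], requirement: str) -> Dict[str, str]:
    """Run the stages in order, sharing results through one dictionary."""
    shared: Dict[str, str] = {"requirement": requirement}
    for role, task, required_keys in stages:
        output = task(shared)
        # Standardized output check: every stage must deliver its agreed keys.
        missing = [key for key in required_keys if key not in output]
        if missing:
            raise ValueError(f"{role} output is missing {missing}")
        shared.update(output)  # knowledge sharing: later roles see earlier results
    return shared


stages: List[Stage] = [
    ("ProductManager", lambda s: {"prd": f"PRD: {s['requirement']}"}, ["prd"]),
    ("Architect", lambda s: {"design": f"Design for [{s['prd']}]"}, ["design"]),
    ("Engineer", lambda s: {"code": f"# code implementing [{s['design']}]"}, ["code"]),
]

result = run_sop(stages, "command-line Snake game")
```

A stage that returns an incomplete result stops the pipeline with a ValueError, which is one simple way to monitor sub-tasks through their outputs.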

In this framework, the capabilities of agents in MetaGPT are significantly enhanced. The instantiation of agents, guided by specialized role prompts, is referred to as “anchored agents,” providing roles with the ability to observe, think, reflect, and accumulate knowledge. These roles interact with the environment through established subscription and publishing methods.

The separation of the Basic and Collaboration Layers facilitates modularization while ensuring the individual and collective capabilities of agents. The Basic Components provide reusable building blocks and tools, while the collaboration module implements purposeful coordination.


Implementation Mechanism of MetaGPT

Role Definition

The MetaGPT framework supports the creation of various specialized roles, such as product managers and architects. The basic role class is defined by a set of key attributes: name, profile, goals, constraints, and description. The goals represent the primary responsibilities or objectives the role seeks to accomplish. Constraints indicate the limitations or principles that the role must follow when executing actions. Constraints can specify, for example, “The code you write should comply with coding standards such as PEP8, with modularity, readability, and maintainability characteristics.” The description provides additional specific information to help establish a more comprehensive role definition.

The comprehensive role definitions provided by the MetaGPT framework enable the creation of highly specialized LLM-based agents, each tailored to specific domains and objectives. Role definitions not only introduce behavior guidance based on expected functions but also contribute to creating diverse and specialized agents, each an expert in its field.
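To make the role attributes concrete, here is a minimal sketch that stores them in a dataclass and renders them as a prompt prefix. The class name RoleSpec, the method to_prefix, the prefix wording, and the sample values (including the name "Alex") are all assumptions made for illustration; MetaGPT's own role class and prompt template may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleSpec:
    """Hypothetical container for the key role attributes."""
    name: str
    profile: str
    goal: str
    constraints: str
    desc: str = ""

    def to_prefix(self) -> str:
        """Render the attributes as a prompt prefix for the LLM."""
        prefix = (
            f"Role: {self.profile} ({self.name}). "
            f"Goal: {self.goal}. "
            f"Constraints: {self.constraints}."
        )
        return f"{prefix} {self.desc}".strip()


engineer = RoleSpec(
    name="Alex",
    profile="Engineer",
    goal="write elegant, readable, maintainable code",
    constraints=(
        "the code should comply with coding standards such as PEP8 "
        "and be modular, readable, and maintainable"
    ),
)

print(engineer.to_prefix())
```

Keeping the definition immutable (frozen=True) is a design choice of this sketch: a role's identity stays fixed while its memory and state change elsewhere.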

• Think & Reflect: Roles can retrieve role descriptions to build thoughts and then reflect on what needs to be done and decide on the next course of action using the _think() function.
• Observe: Roles can observe the environment and use the _observe() function to think and act based on their observations. They will focus on important information and incorporate it into memory to enrich their contextual understanding and inform future decisions.
• Broadcast Messages: Roles can use the _publish_message() function to broadcast messages to the environment. These messages contain details about current execution results and related action records for publishing and sharing information.
• Knowledge Precipitation & Act: Roles are not only broadcasters but also receivers of information from the environment. Roles can assess the relevance and timeliness of incoming messages, extract relevant knowledge from the shared environment, and maintain an internal knowledge base to support decision-making. They execute actions by consulting LLMs and leveraging their rich contextual information and self-knowledge. The execution results are encapsulated as messages, while normative components are shared by the environment.
• State Management: Roles can track their actions by updating work states and monitoring to-do lists. This allows roles to process multiple actions sequentially without interruption. Before executing each action, roles first lock their state. After completing the action, the state is marked as unlocked. This prevents other actions from interrupting the workflow.
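The capabilities listed above can be combined into one observe, think, act, publish cycle. The sketch below reuses the method names mentioned in the text (_observe, _think, _publish_message), but their bodies, the Board and SketchRole classes, the message layout, and the boolean lock are assumptions for illustration rather than MetaGPT's implementation. A plain function stands in for the LLM call.

```python
from typing import Callable, Dict, List, Optional

Msg = Dict[str, str]


class Board:
    """Stand-in for the shared environment: an append-only message list."""

    def __init__(self) -> None:
        self.messages: List[Msg] = []

    def publish(self, message: Msg) -> None:
        self.messages.append(message)


class SketchRole:
    def __init__(self, name: str, watch: str, action_name: str,
                 action: Callable[[str], str], board: Board) -> None:
        self.name = name
        self.watch = watch              # message type this role reacts to
        self.action_name = action_name
        self.action = action            # stand-in for an LLM-backed action
        self.board = board
        self.memory: List[Msg] = []
        self.todo: Optional[str] = None
        self.locked = False
        self._cursor = 0                # how much of the board has been read

    def _observe(self) -> int:
        """Copy unread messages of the watched type into memory."""
        unread = self.board.messages[self._cursor:]
        self._cursor = len(self.board.messages)
        fresh = [m for m in unread if m["type"] == self.watch]
        self.memory.extend(fresh)
        return len(fresh)

    def _think(self) -> None:
        """Choose the next action; this sketch only ever has one candidate."""
        self.todo = self.action_name if self.memory else None

    def _act(self) -> Msg:
        if self.locked:
            raise RuntimeError(f"{self.name} is already executing an action")
        self.locked = True              # lock state before acting
        try:
            result = self.action(self.memory[-1]["content"])
        finally:
            self.locked = False         # unlock once the action completes
            self.todo = None
        return {"type": self.action_name, "sender": self.name, "content": result}

    def _publish_message(self, message: Msg) -> None:
        self.board.publish(message)

    def run(self) -> bool:
        """One observe-think-act-publish cycle; False means nothing to do."""
        if self._observe() == 0:
            return False
        self._think()
        if self.todo is None:
            return False
        self._publish_message(self._act())
        return True


board = Board()
board.publish({"type": "requirement", "sender": "user", "content": "Snake game"})
pm = SketchRole("ProductManager", "requirement", "write_prd", lambda c: f"PRD for {c}", board)
architect = SketchRole("Architect", "write_prd", "write_design", lambda c: f"Design from {c}", board)

ran = [pm.run(), architect.run(), architect.run()]
```

The second call to architect.run() returns False because no new message of the watched type has arrived, which shows how a role stays idle until its input appears in the environment.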

Instantiating SOP with Prompts

MetaGPT uses prompts to transform real-world Standard Operating Procedures (SOP) into clearly defined agent workflows. This process involves using prompts to instantiate SOPs and providing step-by-step guidance based on established practices to ensure consistent and structured execution of complex sequential tasks.

First, we detail the Action class and then demonstrate how to design standardized action-level fine-grained prompts. In the MetaGPT framework, Action serves as the atomic unit for agents to perform specific tasks, specified through natural language. Key attributes include:

• Prefix: Injects role-specific prefixes into prompts to establish role context. Use the set_prefix() method to configure identifiers for role-specific prompts.
• LLM Proxy: Each Action contains an LLM proxy that can be invoked via the aask() method, using context inputs expressed in natural language prompts to enrich action details. Additionally, various role-specific context parsing functions can be implemented in the Action class. These functions aim to extract and provide sufficient contextual information to the LLM from inputs.
• Standardized Output Schema: Defines the expected output format as a structured representation, so that structured data can be extracted from the LLM's reply.
• Retry Mechanism: Implements retries for actions by defining the number of attempts and wait time to enhance robustness.
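The four attributes can be pictured in one small class. The sketch below borrows the method names set_prefix() and aask() from the text, but the class SketchAction, its constructor parameters, the section-based output check, and the fake LLM are all assumptions for illustration. The fake LLM deliberately returns a malformed first reply so that the retry path runs.

```python
import asyncio
from typing import Awaitable, Callable, List


class SketchAction:
    """Hypothetical action: prefix + LLM proxy + output check + retries."""

    def __init__(self, name: str, llm: Callable[[str], Awaitable[str]],
                 required_sections: List[str], max_attempts: int = 3,
                 wait_seconds: float = 0.0) -> None:
        self.name = name
        self.llm = llm                          # LLM proxy (any async callable)
        self.required_sections = required_sections
        self.max_attempts = max_attempts
        self.wait_seconds = wait_seconds
        self.prefix = ""

    def set_prefix(self, prefix: str) -> None:
        """Store the role-specific prefix injected into every prompt."""
        self.prefix = prefix

    async def aask(self, prompt: str) -> str:
        """Send the prefixed prompt to the LLM proxy."""
        return await self.llm(f"{self.prefix}\n{prompt}".strip())

    async def run(self, context: str) -> str:
        """Ask until the reply contains every required section, or give up."""
        missing = list(self.required_sections)
        for _ in range(self.max_attempts):
            reply = await self.aask(context)
            missing = [s for s in self.required_sections if s not in reply]
            if not missing:
                return reply
            await asyncio.sleep(self.wait_seconds)  # wait before retrying
        raise ValueError(
            f"{self.name}: output still missing {missing} "
            f"after {self.max_attempts} attempts"
        )


calls: List[str] = []


async def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    # The first reply is malformed on purpose so the retry path is exercised.
    return "no structure" if len(calls) == 1 else "## Requirements\n- play snake"


action = SketchAction("WritePRD", fake_llm, ["## Requirements"])
action.set_prefix("You are a Product Manager.")
reply = asyncio.run(action.run("Write a PRD for a Snake game"))
```

In a real setup the fake_llm callable would be replaced by a client for an actual model, and wait_seconds would usually be greater than zero.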

Standardized Outputs for Actions

The effectiveness of instantiating workflows in MetaGPT largely depends on the standardized outputs of each action. These outputs leverage expert domain knowledge and industry best practices, tailoring workflows to specific roles and contexts. The design of structured outputs serves the following purposes:

Standardized outputs ensure consistent LLM results that are predictable, repeatable, and aligned with the responsibilities of the agents. They guide high-quality, structured, and task-specific LLM generation by setting output expectations.

Furthermore, standardized patterns serve as blueprints that constrain LLM behavior within suitable boundaries for roles. This helps maintain focus on the target task and prevents deviation from objectives. Since actions are part of a comprehensive role baseline guide, this role-awareness guidance ensures outputs align with real-world quality standards.

In summary, the design and implementation of standardized outputs in MetaGPT provide powerful tools for handling complex tasks. Transforming complex tasks defined in natural language into standardized outputs promotes consistency in collaboration, thereby reducing the potential for incoherent multi-turn dialogue interactions. Additionally, it can clearly and consistently represent structured information, which may be challenging to convey solely through natural language, especially for LLM-based agents. Moreover, by providing structured and standardized outputs, different agents can achieve a clear and consistent understanding of their tasks and responsibilities.
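One way to picture a standardized output is as a schema that every reply must satisfy before it is passed on. The sketch below assumes the LLM is asked to reply in JSON and checks field names and types against a small schema. The JSON format, the function parse_structured_output, and the field names in DESIGN_SCHEMA are hypothetical choices for this example, since the text does not specify the concrete format MetaGPT uses.

```python
import json
from typing import Any, Dict

# Hypothetical schema for an architect's design output.
DESIGN_SCHEMA: Dict[str, type] = {
    "implementation_approach": str,
    "file_list": list,
    "data_structures": str,
}


def parse_structured_output(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse a JSON reply and verify required fields and their types."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"field {key!r} should be of type {expected.__name__}")
    return data


good_reply = json.dumps({
    "implementation_approach": "Use the curses library for terminal rendering",
    "file_list": ["main.py", "game.py"],
    "data_structures": "Snake is a deque of (row, col) tuples",
})
design = parse_structured_output(good_reply, DESIGN_SCHEMA)
```

Because a reply either passes the check or raises an error, a downstream role never has to guess at the structure of what it receives.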

Knowledge Sharing Mechanism and Custom Knowledge Management

In MetaGPT, each agent actively curates personalized knowledge by retrieving relevant historical information from shared environment logs. Agents do not passively rely on dialogue but utilize role-based interests to extract relevant information.

As mentioned earlier, each agent in MetaGPT maintains a memory cache and indexes subscription messages related to its role, achieving personalized knowledge curation. Specifically, the centralized replication of messages creates a unified data source. Agents can register subscriptions to automatically receive messages relevant to their roles from this data source. Internally, agents index their memory cache by content, sender, and receiver to enable rapid retrieval in relevant contexts.

• Message Sharing: When an agent generates a message, it is copied to the shared environment log, creating a single source of truth. This ensures all agents can access the same information.
• Role-Based Subscriptions: Agents can register subscriptions for message types that are meaningful to their roles. This is done based on predefined criteria consistent with the responsibilities and tasks of the agents.
• Message Distribution: When new messages meet subscription criteria, notifications are automatically distributed to relevant agents. This proactive information dissemination prevents agents from missing important updates.
• Memory Cache and Indexing: Agents maintain an internal memory cache where subscribed messages are stored and indexed by content, sender, and receiver. This ensures efficient information storage and retrieval.
• Context Retrieval: The environment maintains a shared memory pool that supports caching and indexing. Meanwhile, agents can query their internal memory as needed to obtain contextual details relevant to their current tasks. This helps improve their understanding and make better decisions.
• Update Synchronization: Any updates or changes made to messages will be synchronized across all linked agents’ memories to maintain a consistent view of the information. This ensures all agents have access to the latest data.

Organizing the flow of information around agent roles keeps collaboration among multiple agents orderly. Combining centralized knowledge sharing with role-based, personalized memory caches gives each agent customized knowledge management. This reduces irrelevant data while still providing a common context, striking a balance between team collaboration and individual efficiency.
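The publish and subscribe flow described in this section can be sketched with a shared log plus per-agent memories. The classes SharedLog and Agent, the topic field used as the subscription criterion, and the simple keyword lookup are assumptions made for this example and are not MetaGPT's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Message:
    content: str
    sender: str
    receiver: str
    topic: str


class Agent:
    """Keeps a private memory cache indexed by sender."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: List[Message] = []
        self.by_sender: DefaultDict[str, List[Message]] = defaultdict(list)

    def receive(self, message: Message) -> None:
        self.memory.append(message)
        self.by_sender[message.sender].append(message)

    def recall(self, keyword: str) -> List[Message]:
        """Context retrieval: find remembered messages by content."""
        return [m for m in self.memory if keyword.lower() in m.content.lower()]


class SharedLog:
    """Single shared record of all messages, with role-based subscriptions."""

    def __init__(self) -> None:
        self.messages: List[Message] = []
        self.subscribers: DefaultDict[str, List[Agent]] = defaultdict(list)

    def subscribe(self, agent: Agent, topic: str) -> None:
        self.subscribers[topic].append(agent)

    def publish(self, message: Message) -> None:
        self.messages.append(message)               # message sharing
        for agent in self.subscribers[message.topic]:
            agent.receive(message)                  # message distribution


log = SharedLog()
architect = Agent("Architect")
engineer = Agent("Engineer")
log.subscribe(architect, "prd")
log.subscribe(engineer, "design")

log.publish(Message("Snake game PRD", "ProductManager", "Architect", "prd"))
log.publish(Message("Snake game design", "Architect", "Engineer", "design"))
```

Both messages end up in the shared log, but each agent's memory contains only the message whose topic it subscribed to, which is the balance between a common context and a personalized cache that the text describes.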

Advantages and Disadvantages of MetaGPT

Advantages

• Multi-agent composition collaboratively handles more complex tasks.
• By integrating SOPs into multi-agent collaboration, it mitigates the hallucination problems that existing methods exhibit on complex tasks.
• Through structured coordination and modular outputs, it effectively resolves complex multi-agent collaboration issues.
• Agents can actively acquire relevant knowledge from the environment rather than simply obtaining information through dialogue. This design aligns better with how human organizations operate.

Disadvantages

• The generated code occasionally references resource files that do not exist, such as images and audio.
• When executing complex tasks, the generated code is prone to calling classes or variables that are undefined or not imported.
• Roles and processes are still relatively fixed and cannot be extended dynamically, for example by adding a UI designer role.

MetaGPT Experience

Preparation

A server or local computer that can access OpenAI's API.

Since the OpenAI API needs to be called, prepare your OPENAI_API_KEY in advance.

Installing MetaGPT

• 1. Ensure that Python 3.9 or higher is installed on your system. If not installed, please do so.

python --version

• 2. Clone the repository to your local machine and install it.

git clone https://github.com/geekan/MetaGPT.git
cd MetaGPT
pip3 install -e .  # install from the cloned source
# or: pip3 install metagpt  # install the stable version

• 3. Configure OPENAI_API_KEY in the config/key.yaml file.

# Copy the configuration file config.yaml to key.yaml, then edit key.yaml as follows
cp config/config.yaml config/key.yaml

Variable Name                                 config/key.yaml
OPENAI_API_KEY (replace with your own key)    OPENAI_API_KEY: "sk-…"
OPENAI_API_BASE (optional)                    OPENAI_API_BASE: "https:///v1"

• 4. (Optional) If you want to save outputs like quadrant diagrams, system designs, sequence flows, etc., during execution, you can install mermaid-js first.

# Make sure npm is installed on your system, then use it to install mermaid-js.
npm --version
sudo npm install -g @mermaid-js/mermaid-cli

• 5. Start using it.

python startup.py "Write a command-line Snake game"

# Enabling code review mode will cost more money but will improve code quality and success rate
python startup.py "Write a command-line Snake game" --code_review True

After running the script, you can find your new project in the workspace/ directory.

References

[1] Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, et al. MetaGPT: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.
[2] https://github.com/geekan/MetaGPT
[3] https://docs.deepwisdom.ai/zhcn/guide/get_started/introduction.html


MetaGPT Framework – An In-Depth Look at AI Agents