Key Points of MetaGPT Technology and Open Source Model Practice

MetaGPT takes its name from "GPT-based Meta-Programming framework": it is a meta-programming framework that uses Standard Operating Procedures (SOPs) to enhance the problem-solving capability of multi-agent systems built on large language models (LLMs). The framework simulates a virtual software team that includes roles such as product manager, architect, project manager, engineer, and QA engineer.

MetaGPT integrates efficient human workflows into LLM-based multi-agent collaboration: SOPs are encoded as prompt sequences to streamline the workflow, allowing agents with human-like domain expertise to validate intermediate results and reduce errors. These SOPs play a crucial role in task decomposition and effective coordination.

MetaGPT requires agents to generate structured outputs, such as high-quality requirement documents, design artifacts, flowcharts, and interface specifications. These structured outputs significantly increase the success rate of target code generation (essentially acting as a higher-level CoT) and reduce the risk of hallucinations introduced when LLMs communicate with each other.
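
To make "structured output" concrete, here is a minimal sketch of a PRD represented as a typed object instead of free text. The class and field names (PRD, user_stories, requirement_pool) are assumptions chosen for illustration, not MetaGPT's actual schema.

from dataclasses import dataclass, field
from typing import List

# Illustrative only: MetaGPT's real PRD artifact has its own format.
@dataclass
class UserStory:
    role: str       # "As a <role> ..."
    feature: str    # "... I want <feature> ..."
    benefit: str    # "... so that <benefit>"

@dataclass
class PRD:
    product_goal: str
    user_stories: List[UserStory] = field(default_factory=list)
    requirement_pool: List[str] = field(default_factory=list)  # prioritized requirements

    def validate(self) -> None:
        # A downstream agent (e.g. the architect) checks structure instead of
        # parsing free-form prose, which is what curbs "telephone-game" drift.
        assert self.product_goal, "PRD must state a product goal"
        assert self.user_stories, "PRD must contain at least one user story"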

[Figure: MetaGPT's performance on the public benchmarks HumanEval and MBPP]
[Figure: MetaGPT's performance on the self-built dataset SoftwareDev]

Compared to mainstream agent projects (AutoGPT, LangChain, AgentVerse, ChatDev, etc.), MetaGPT also stands out in handling higher levels of software complexity and offering a broader range of functionalities.

[Figure: Comparison of MetaGPT's functionalities with other projects]

MetaGPT has the following important innovative mechanisms compared to other agent projects:

Role Specialization

Clear role specialization allows complex work to be broken down into smaller, more specific tasks. MetaGPT designates each agent's role (name, profile, goal, and constraints) and initializes role-specific context and skills. For example, a product manager can use web search tools, while an engineer can execute code. A rough sketch of such a role card follows.
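
As an illustration only (not MetaGPT's actual API), a role card of this kind could be modeled as below; the RoleCard class and its fields are assumptions for the sketch, and the goal/constraints strings paraphrase the engineer role card shown in the paper.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Illustrative role card; MetaGPT's real Role class has a different interface.
@dataclass
class RoleCard:
    name: str
    profile: str                                               # e.g. "Engineer"
    goal: str
    constraints: str
    tools: Dict[str, Callable] = field(default_factory=dict)   # role-specific skills
    memory: List[str] = field(default_factory=list)            # role-specific context

def run_python(code: str) -> None:
    exec(code)                                                 # engineers may execute code

engineer = RoleCard(
    name="Alex",
    profile="Engineer",
    goal="Write elegant, readable, extensible, efficient code",
    constraints="The code should conform to standards like PEP8 and be modular",
    tools={"run_code": run_python},
)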

[Figure: Engineer Role Card]

The paper also reports role ablation experiments, showing that as more roles are added, costs rise, but the performance gains far outweigh the added expense.

Workflow/SOP

Following SOPs in software development allows all agents to work sequentially. After obtaining user requirements, the product manager conducts detailed analysis and creates a comprehensive PRD that includes user stories and a pool of requirements. This is a preliminary functional decomposition.

The structured PRD is then passed to the architect, who translates the requirements into system design components such as a file list, data structures, and interface definitions. The system design is then handed to the project manager for task breakdown and assignment.

Engineers then implement the specified classes and functions, and in the subsequent phase the QA engineer writes test cases to improve code quality.

Incorporating SOPs is akin to injecting a chain of thought (CoT) into LLMs, significantly enhancing code generation capability.
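
The SOP can be pictured as a fixed, ordered handoff in which each role consumes the upstream artifact and produces its own. The following sketch uses stub functions and invented names purely for illustration; MetaGPT itself drives this flow through roles, actions, and messages.

from typing import Callable, List, Tuple

# Stub actions: each consumes the upstream artifact and returns a new one.
def write_prd(requirement: str) -> str:
    return f"PRD({requirement})"            # user stories, requirement pool

def write_design(prd: str) -> str:
    return f"Design({prd})"                 # file list, data structures, interfaces

def assign_tasks(design: str) -> str:
    return f"Tasks({design})"               # task breakdown

def write_code(tasks: str) -> str:
    return f"Code({tasks})"                 # implemented classes and functions

def write_tests(code: str) -> str:
    return f"TestReport({code})"            # unit tests and quality review

SOP: List[Tuple[str, Callable[[str], str]]] = [
    ("ProductManager", write_prd),
    ("Architect", write_design),
    ("ProjectManager", assign_tasks),
    ("Engineer", write_code),
    ("QaEngineer", write_tests),
]

def run_sop(user_requirement: str) -> str:
    artifact = user_requirement
    for role, action in SOP:                # strictly sequential, as the SOP dictates
        artifact = action(artifact)
    return artifact

print(run_sop("Create a 2048 game"))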

Structured Communication Interface

Previous multi-agent projects used unconstrained natural language as a communication interface, which could lead to distortions of the original information, similar to a “telephone game”.

In contrast, MetaGPT communicates through structured outputs like documents and diagrams, significantly reducing the risk of distortion of the original information.

Publish-Subscribe Mechanism

MetaGPT stores information in a shared message pool.

The shared message pool allows all agents to exchange messages directly. Agents not only publish their structured messages to the pool but can also transparently read messages from other agents. Any agent can retrieve the information it needs directly from the shared pool, without having to ask other agents and wait for their replies, which greatly improves communication efficiency.

During task execution, the subscription mechanism lets each agent receive only the information relevant to its task and avoid being distracted by unrelated details. For example, the architect mainly attends to the PRD from the product manager, while documents from roles such as the QA engineer receive less attention.
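
A minimal sketch of such a shared pool with subscription filtering is shown below. The Message fields (sent_from, cause_by) mirror the idea of tagging each message with its producer, but this is an assumption for illustration, not MetaGPT's exact implementation.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Message:
    sent_from: str      # producing role, e.g. "ProductManager"
    cause_by: str       # producing action, e.g. "WritePRD"
    content: str        # structured artifact serialized as text

@dataclass
class MessagePool:
    messages: List[Message] = field(default_factory=list)

    def publish(self, msg: Message) -> None:
        self.messages.append(msg)           # every agent writes to the shared pool

    def subscribe(self, interested_in: Set[str]) -> List[Message]:
        # Each agent pulls only messages whose originating action it cares about.
        return [m for m in self.messages if m.cause_by in interested_in]

pool = MessagePool()
pool.publish(Message("ProductManager", "WritePRD", "PRD: user stories ..."))
pool.publish(Message("QaEngineer", "WriteTest", "Test report ..."))
architect_inbox = pool.subscribe({"WritePRD"})  # the Architect sees the PRD, not the QA report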

Program Iteration and Executable Feedback

MetaGPT explores a self-referential mechanism that recursively modifies agents' constraints and prompts based on what they observe during software development. This allows engineers to continuously improve their code by drawing on their execution and debugging history. To gather more information, engineers write and execute unit test cases and then receive the test results. If the results are satisfactory, additional development tasks are started; otherwise, debugging continues. This iterative testing process repeats until the tests pass or a maximum of three retries is reached. This program iteration strategy significantly improves code generation quality (an absolute improvement of 5.4% on MBPP).
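
The write-test-debug loop described above can be sketched as follows. The helper callables and their signatures are hypothetical, but the retry logic matches the three-attempt limit in the text.

from typing import Callable, Tuple

MAX_RETRIES = 3  # the paper caps debugging at three retries

def iterate_with_feedback(
    write_code: Callable[[str, str], str],         # (task, feedback) -> code
    run_tests: Callable[[str], Tuple[bool, str]],  # code -> (passed, test_output)
    task: str,
) -> str:
    feedback = ""
    code = write_code(task, feedback)
    for _ in range(MAX_RETRIES):
        passed, test_output = run_tests(code)      # execute the unit tests
        if passed:
            break                                  # satisfactory: move on to new tasks
        feedback = test_output                     # keep execution/debugging memory
        code = write_code(task, feedback)          # regenerate code with that memory
    return code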

Open Source Model Practice

  1. Use FastChat to serve local open-source models behind an OpenAI-compatible API.
# Install FastChat
pip3 install "fschat[model_worker,webui]"
# Start the controller
python3 -m fastchat.serve.controller
# Launch a model worker serving Llama-2-13b
python3 -m fastchat.serve.model_worker --model-path meta-llama/Llama-2-13b-chat-hf
# Start the OpenAI-compatible RESTful API server
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
  2. Install MetaGPT.
# MetaGPT's dependencies conflict with FastChat's, so create a separate Python (3.9+) environment
# Clone the repository, you can choose a release version branch
git clone https://github.com/geekan/MetaGPT.git
# Install in development mode
cd MetaGPT
pip3 install -e . 
  3. Create a key.yaml file in the MetaGPT/config folder, adding the following content:
OPENAI_BASE_URL: "http://localhost:8000/v1"
OPENAI_API_KEY: "sk-xxx"
OPENAI_API_MODEL: "Llama-2-13b-chat-hf"
MAX_TOKENS: 4096
  4. Run with minimal setup.
metagpt "Create a 2048 game"
