MetaGPT: Empowering Unique Intelligence in the Era of Agents as a Service

Background Introduction

Amazon Web Services recently partnered with Extraordinary Capital to host an in-person event for the “AI Globalization Special Acceleration Program” in Shanghai, where leading AI startups from China and abroad gathered to exchange ideas with more than a hundred AI entrepreneurs.

This content is compiled from the roundtable discussion at the event.

Previous articles from this event series:

“Hugging Face: Open Source Machine Learning – From Pre-trained Models to Running State-of-the-Art Open Source LLMs in Production”

“Opportunities and Challenges for AI Applications Going Abroad”

“Comparison of Commercialization Ecosystems of AI Companies in China and the U.S.”

“Babel CEO Zhang Hailong: Misunderstandings and Key Points to Note When Building Complex LLM Applications”

Speaker Introduction

Wu Chenglin, Founder & CEO of DeepWisdom

He has experience building large-scale, complex AI systems serving billions of users and hundreds of billions of data records at companies such as Tencent. He is the author of the open-source multi-agent framework MetaGPT and a world champion in top competitions such as NeurIPS AutoDL, NeurIPS AutoWSL, and KDD Cup Graph. He has published multiple papers in top conferences and journals such as TPAMI, KDD, CVPR, and ACL. He has also been named to the Forbes 30 Under 30 and Hurun 30 Under 30 lists and has received numerous internal awards at Tencent and Huawei.

Key Points:

Agents built on large models combine memory, planning, tools, and so on; multi-agent systems additionally incorporate the environment, SOPs, and so on.
In building this intelligent system, we derived a key formula: Code = SOP (Team). We are now turning SOPs into an intelligent system, and because both SOPs and teams can be brought online, code generation will eventually be fully automated.
Assessing the quality of an SOP is like assessing the quality of code: ultimately, there needs to be a benchmark. Only with a unified, reasonable evaluation metric can we judge SOPs scientifically.
Like OpenAI co-founder and chief scientist Ilya Sutskever, who keeps asking whether AI should be made to obey humans or to love humans, we have been pondering this question as well.
We have been considering which tasks should be handled by large models and which by agents, and where the boundary between them lies. From an effectiveness standpoint, agents are currently highly necessary, but whether this effectiveness can eventually be internalized into large models remains uncertain. The boundary is still very vague.
Agents excel at long-horizon code planning and code generation. In software development, the core metric for comparing assistive tools (Copilot) with agents is leverage: assistive tools offer limited production leverage, while agents can achieve infinite leverage.
The era of natural language programming is approaching quickly, which will effectively solve many issues in the software development and delivery process.
Whether it is AutoGPT, AutoGen, or MetaGPT, each framework has its own design features and advantages. MetaGPT's research approach is closer to the Android route, which is, of course, a very challenging path.

1. Practical Applications and Advantages of MetaGPT

In the MetaGPT project, we are currently focusing on two main tasks. First, we are dedicated to building a natural language programming framework that allows users to guide agents through natural language to easily write code. Second, we are also creating a society of agents, particularly emphasizing typical scenarios of multi-agent collaboration.

At the end of last year, we sensed a major milestone approaching: natural language programming would become a reality within two years. Programming has evolved from C and C++ to Java and Python, and it will ultimately return to natural language. Programming languages are designed for machines, while human language has always been natural language. Languages like Python are therefore just historical stages of evolution, not the endpoint.

Previously, one line of Python might correspond to around 10 lines of code in a lower-level language. With an agent writing the Python, a single sentence can correspond to an unlimited number of lines, because natural language can express any coding task. Agents and the Copilot model also differ significantly in how they write code: Copilot has limited production leverage, while agents can achieve infinite leverage.

Since MetaGPT was released on GitHub, it has gathered over 30,000 stars. With just a one-line prompt, it can generate a mini-game, at a cost of only about one dollar per prompt.

MetaGPT can build software for a wide range of tasks and can also play the role of that software's user. By choosing a large language model suited to the prompt, users can easily generate a specific piece of software. “Building something” and “using something” are two sides of the same coin, and this is how MetaGPT forms a complete agent. MetaGPT emphasizes playing both roles, builder and user, at the same time. This dual role lets MetaGPT iterate on itself, continuously testing and fixing problems in software development, including in complex games and CRM systems.
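The build-and-use loop can be sketched minimally under some assumptions of our own. Both roles are stubbed with plain Python functions (`hypothetical_builder` and `hypothetical_user` are illustrative names, not MetaGPT APIs). The user role supplies expected-behavior scenarios, and the builder is asked for revisions until every scenario passes.

```python
from typing import Callable, Optional

Scenario = tuple[tuple[int, int], int]  # (arguments, expected result)

def hypothetical_builder(feedback: list[Scenario], attempt: int) -> Callable[[int, int], int]:
    # Stand-in for an LLM call: the first draft is buggy, later drafts are fixed.
    # A real builder would condition its next draft on `feedback`.
    if attempt == 0:
        return lambda a, b: a - b
    return lambda a, b: a + b

def hypothetical_user() -> list[Scenario]:
    # The "user" role: how the requested software is expected to behave.
    return [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]

def build_and_use(max_rounds: int = 5) -> tuple[Optional[Callable[[int, int], int]], int]:
    scenarios = hypothetical_user()
    feedback: list[Scenario] = []
    for attempt in range(max_rounds):
        candidate = hypothetical_builder(feedback, attempt)
        # Use the candidate the way the user would; collect every failing scenario.
        feedback = [(args, want) for args, want in scenarios if candidate(*args) != want]
        if not feedback:
            return candidate, attempt + 1
    return None, max_rounds

fn, rounds = build_and_use()
print(rounds, fn(2, 3))  # 2 5
```

The stubbed builder fixes its bug on the second attempt, so the loop stops after two rounds. In a real system, the failing scenarios collected in `feedback` would be passed to the model as context for its next draft.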

For the Minecraft game agent, the MetaGPT team improved the agent's performance and adaptability by building task-oriented strategies. By looping through publishing tasks, executing them, and learning from them, the MetaGPT game agent unlocked the diamond level of the tech tree in just 16 rounds of tasks, five times faster than VOYAGER.
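The publish, execute, and learn loop can be sketched as follows. This is not the MetaGPT team's implementation. The tech tree is a hypothetical, heavily simplified stand-in for Minecraft's, `GameAgent` is an illustrative name, and execution is simulated rather than performed in the game.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, heavily simplified tech tree: item -> prerequisite items.
TECH_TREE = {
    "wood": [],
    "crafting_table": ["wood"],
    "wooden_pickaxe": ["crafting_table"],
    "stone": ["wooden_pickaxe"],
    "stone_pickaxe": ["stone", "crafting_table"],
    "iron": ["stone_pickaxe"],
    "iron_pickaxe": ["iron", "crafting_table"],
    "diamond": ["iron_pickaxe"],
}

@dataclass
class GameAgent:
    skills: set[str] = field(default_factory=set)

    def publish_task(self) -> Optional[str]:
        """Propose the next item whose prerequisites have all been learned."""
        for item, prereqs in TECH_TREE.items():
            if item not in self.skills and all(p in self.skills for p in prereqs):
                return item
        return None

    def execute_task(self, task: str) -> bool:
        """Stand-in for acting in the game: succeeds if prerequisites are met."""
        return all(p in self.skills for p in TECH_TREE[task])

    def learn_task(self, task: str) -> None:
        """Store the completed task as a reusable skill."""
        self.skills.add(task)

def rounds_to_goal(goal: str = "diamond", max_rounds: int = 50) -> int:
    agent = GameAgent()
    for round_no in range(1, max_rounds + 1):
        task = agent.publish_task()
        if task is None:
            break
        if agent.execute_task(task):
            agent.learn_task(task)
        if goal in agent.skills:
            return round_no
    return -1

print(rounds_to_goal())  # 8
```

On this toy tree, the loop reaches the goal in 8 rounds. That number reflects only the toy tree and says nothing about the 16-round result reported for the real game agent.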

The MetaGPT team also designed a refined multi-agent communication mechanism for the party game Werewolf. They established a complex communication topology among the agents and tuned the reflection and experience-learning mechanisms, and preliminary experiments show improved agent performance. This project offers new ideas and references for applying large language models to games.
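One way to picture a communication topology is as a set of channels, each with its own membership. The sketch below assumes a hypothetical five-player game with a public day channel and a wolves-only night channel. It shows only message routing, not the reflection or experience-learning mechanisms.

```python
from collections import defaultdict

# Hypothetical 5-player game; roles are illustrative.
ROLES = {"p1": "werewolf", "p2": "werewolf", "p3": "seer", "p4": "villager", "p5": "villager"}

def build_topology(roles: dict[str, str]) -> dict[str, set[str]]:
    """Map each channel to the set of players who may read and post on it."""
    channels: dict[str, set[str]] = defaultdict(set)
    for player, role in roles.items():
        channels["day_discussion"].add(player)      # public channel for everyone
        if role == "werewolf":
            channels["werewolf_night"].add(player)  # private channel for wolves only
    return dict(channels)

def deliver(topology: dict[str, set[str]], channel: str, sender: str, text: str) -> dict[str, list[str]]:
    """Deliver a message to every other member of the channel."""
    members = topology[channel]
    if sender not in members:
        raise PermissionError(f"{sender} may not post on {channel}")
    return {reader: [f"[{channel}] {sender}: {text}"] for reader in members if reader != sender}

topology = build_topology(ROLES)
night = deliver(topology, "werewolf_night", "p1", "Let's target p3 tonight.")
day = deliver(topology, "day_discussion", "p3", "I checked p1 last night.")
print(sorted(night))  # ['p2']
print(sorted(day))    # ['p1', 'p2', 'p4', 'p5']
```

Restricting who can read each channel keeps the werewolves' plans hidden from the seer and villagers, while everyone still shares the day discussion.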

The current technical challenge is whether MetaGPT can break through the limits of coding capability. Our primary goal is an unprecedented HumanEval Pass@1 score of 100. Using MetaGPT, we have reached 86 (for reference, GPT-4 currently scores 67), and projections from our data suggest it can reach 97 with almost no risk.
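HumanEval scores are usually reported with the unbiased pass@k estimator from the paper that introduced the benchmark. With k = 1 and one sample per problem, it reduces to the percentage of problems solved. The sketch below implements that formula. The function names and the four-problem example are ours, for illustration only.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples generated,
    c of which pass the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage; results = [(n, c), ...]."""
    return 100 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results: one sample per problem, three of four problems solved.
print(benchmark_score([(1, 1), (1, 1), (1, 0), (1, 1)]))  # 75.0
```

On this scale, a score of 86 means that a single generated attempt passes the unit tests for 86% of the problems.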

MetaGPT envisions a programming future with unlimited possibilities. Once it can generate all code correctly in one pass, with every function sufficiently accurate, the core problem of programming changes. The problem is no longer generating a single function. Instead, it splits into two parts: generating the list of functions, and then generating each individual function. If the score keeps approaching 100, generating unlimited amounts of code becomes possible.
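A quick way to see why the gap between 86 and 100 matters is to treat the Pass@1 score as a per-function pass rate and to assume that a program's functions are generated independently. Both are simplifying assumptions of ours, and the 10-function program is hypothetical. Under these assumptions, the whole program is correct with probability p raised to the number of functions.

```python
def program_success_rate(per_function_pass: float, num_functions: int) -> float:
    """Chance that every function in a program is correct, assuming each
    function is generated independently with the same pass rate."""
    return per_function_pass ** num_functions

# Hypothetical 10-function program at the pass rates mentioned in the text.
for p in (0.67, 0.86, 0.97):
    print(f"pass rate {p:.2f} -> whole program correct {program_success_rate(p, 10):.3f}")
```

Under these assumptions, a pass rate of 0.86 leaves only about a 22% chance that all ten functions are correct, compared with about 74% at 0.97. This illustrates why scores close to 100 matter for large-scale generation.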

2. Key Formula for Building Agents: Code = SOP (Team)

Many believe that code generation is a very complex problem, but in fact we have already found a solution. Here I will briefly explain how MetaGPT works, specifically how it relies on collaboration among multiple roles. For instance, if a user submits a requirement to create a 2048 game, the entire process is a multi-agent collaboration. The roles involved include product manager, architect, project manager, engineer, and QA. Together they perform requirement analysis, requirement breakdown, and competitive analysis, draw competitive quadrant charts, and output a project requirement pool.

This process follows a typical software company's development workflow. It contains a large number of standard operating procedures (SOPs) and simulates the Scrum (Agile Development) process, forming a small, iterative waterfall model.
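The role-by-role handoff can be sketched as a chain of functions, each adding its artifact to a shared record. The role functions below are hypothetical stubs, not MetaGPT's actual roles. One waterfall pass produces the requirements, design, and tasks, and then the engineer and QA steps iterate until QA passes.

```python
from typing import Callable

Artifact = dict

# Hypothetical role stubs: each reads the upstream artifact and adds its own
# output. In a real multi-agent system each role would call an LLM.
def product_manager(a: Artifact) -> Artifact:
    return {**a, "prd": f"User stories and requirement pool for: {a['requirement']}"}

def architect(a: Artifact) -> Artifact:
    return {**a, "design": ["game_logic.py", "ui.py"]}

def project_manager(a: Artifact) -> Artifact:
    return {**a, "tasks": [f"implement {f}" for f in a["design"]]}

def engineer(a: Artifact) -> Artifact:
    return {**a, "code": {t.split(" ", 1)[1]: "..." for t in a["tasks"]}}

def qa(a: Artifact) -> Artifact:
    missing = [f for f in a["design"] if f not in a["code"]]
    return {**a, "qa_passed": not missing, "qa_missing": missing}

WATERFALL: list[Callable[[Artifact], Artifact]] = [product_manager, architect, project_manager]

def run(requirement: str, max_iterations: int = 3) -> Artifact:
    a: Artifact = {"requirement": requirement}
    for step in WATERFALL:            # one pass of the "small waterfall"
        a = step(a)
    for _ in range(max_iterations):   # engineer and QA iterate until tests pass
        a = qa(engineer(a))
        if a["qa_passed"]:
            break
    return a

result = run("a 2048 game")
print(result["qa_passed"], sorted(result["code"]))  # True ['game_logic.py', 'ui.py']
```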

During this process, we observed a curious phenomenon. For example, when we requested a system similar to Toutiao's (Today's Headlines) recommendation system, MetaGPT accurately drew a complete architecture diagram, even though no such project exists on GitHub. We believe that in this process, MetaGPT's underlying neural network acts as an information compressor. Humans solve complex problems laboriously, whereas MetaGPT can distill the underlying patterns from large amounts of fragmented information, which offers new ways to solve problems.

Such scenarios are essentially abstractions of real-world standard operating procedures (SOPs). Every company has a large number of SOPs. If we could obtain the SOPs of all Fortune 500 companies, perhaps we could simulate a new Fortune 500 company. Likewise, if we move a software company's SOPs online, we could create an entirely new online software company.

In building this intelligent system, we derived a key formula: Code = SOP (Team). The experience of using SOPs to drive teamwork has greatly inspired me. We are now turning SOPs into an intelligent system, and because both SOPs and teams can be moved online, code generation may become fully automated. Once all SOPs are online, it becomes possible to generate a powerful team of intelligent agents that can meet certain scaffolding needs.
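Read as a programming statement, Code = SOP (Team) says that an SOP is a function that, once bound to a team, turns a requirement into code. The sketch below follows that reading. The role names are hypothetical, and the agents are string-building stand-ins.

```python
from typing import Callable

Team = dict[str, Callable[[str], str]]  # role name -> agent
SOP = list[str]                         # ordered roles the work passes through

def apply_sop(sop: SOP, team: Team) -> Callable[[str], str]:
    """Code = SOP(Team): bind a procedure to a team to get requirement -> code."""
    def produce(requirement: str) -> str:
        artifact = requirement
        for role in sop:
            artifact = team[role](artifact)  # each role transforms the artifact in turn
        return artifact
    return produce

software_sop: SOP = ["product_manager", "architect", "engineer"]

# Hypothetical agents; any of them could be an LLM agent, a script, or a person.
team: Team = {
    "product_manager": lambda x: f"PRD({x})",
    "architect": lambda x: f"Design({x})",
    "engineer": lambda x: f"Code({x})",
}

build = apply_sop(software_sop, team)
print(build("2048 game"))  # Code(Design(PRD(2048 game)))
```

Because the SOP and the team are separate values, any member can be swapped without changing the procedure. For example, the `engineer` entry could be replaced with a different agent or with a human.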

3. Promoting Agents Towards High-Level Self-Optimization and Development

A key issue in running MetaGPT is the “human-in-the-loop” concept: human intervention is needed to resolve problems that agents handle poorly. For instance, agents have shortcomings with visual language, because GPT is primarily a text model and has semantic gaps when processing visual content. One way to address this is to bring a human employee into the loop as part of the team's overall iteration, responsible for the UI drafts and for solving specific problems. MetaGPT also has its own UI agent, built on a fine-tuned Stable Diffusion model, which can help generate software interfaces. It does not perform as well as a human employee, but it reaches a usable level of quality.
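The human-in-the-loop idea can be sketched as a simple router. The sketch assumes that task kinds are labeled up front and that `ui_design` is the weak area. The handlers are stubs for a text agent, a Stable Diffusion-based UI agent, and a human employee.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    kind: str          # e.g. "code" or "ui_design"
    description: str

Handler = Callable[[Task], str]

# Hypothetical: task kinds where text-only agents are known to be weak.
WEAK_AREAS = {"ui_design"}

def route(task: Task, agent: Handler, ui_agent: Handler, human: Handler,
          human_available: bool = True) -> str:
    """Send routine work to the agent; escalate weak areas to a human,
    falling back to the specialized UI agent when no human is available."""
    if task.kind not in WEAK_AREAS:
        return agent(task)
    return human(task) if human_available else ui_agent(task)

# Stub handlers standing in for an LLM agent, an image-model UI agent, and a person.
agent = lambda t: f"agent wrote {t.description}"
ui_agent = lambda t: f"ui-agent drafted {t.description}"
human = lambda t: f"human designed {t.description}"

print(route(Task("code", "game loop"), agent, ui_agent, human))
print(route(Task("ui_design", "start screen"), agent, ui_agent, human))
print(route(Task("ui_design", "start screen"), agent, ui_agent, human, human_available=False))
```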

MetaGPT itself is a complete framework. If software companies can operate in the virtual world, other kinds of companies, such as e-commerce and gaming companies, should also be able to migrate there. MetaGPT's developer community even includes e-commerce CEOs who have moved their entire business logic into the virtual world by writing code. This further demonstrates MetaGPT's potential to help companies of all kinds pursue their goals in the virtual world.

The MetaGPT project is a typical three-layer stack. At the bottom are the operating system (OS) and AI Infra. AI Infra handles all AI-related work, including model training, fine-tuning, inference, deployment optimization, data processing, and feature engineering. MetaGPT sits at the core of the stack, acting internally as an IDE (Integrated Development Environment) that works at the natural language level.

MetaGPT encompasses a large number of agent tasks. Its top layer is the Agent Store, where agents can interact in various ways. A one-stop AI Infra service lets AI truly serve a range of industries: self-optimizing groups of agents accelerate model development, improve quality, and make management and operations more efficient. The ultimate goal is to push agents toward higher levels of self-optimization and self-development.

From a human perspective, an agent is essentially a “humanoid structure.” OpenAI's goal is to scale GPT's parameter count up by 100 times, to a level comparable to the human brain. However, OpenAI has always focused more on the completion abilities and individual capabilities of large language models, and neither of these addresses the problems the human brain solves. The brain has many specific mechanisms, one of which is memory. Memory is not the same as a vector database. A vector database can retrieve the right item with about 60% to 70% accuracy from around 100 pieces of text, something the human brain cannot do.
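For contrast with human memory, vector-database retrieval roughly works like this: it ranks stored items by cosine similarity to a query vector and returns the top k. The 3-dimensional vectors below are made up for illustration. Real systems use learned embeddings and approximate nearest-neighbor indexes.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    return sorted(store, key=lambda text: cosine(query, store[text]), reverse=True)[:k]

# Toy 3-dimensional "embeddings"; a real system would use a model's embeddings.
store = {
    "meeting notes": [0.9, 0.1, 0.0],
    "grocery list": [0.0, 0.2, 0.9],
    "project plan": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], store))  # ['meeting notes', 'project plan']
```

Nothing in this scheme decays or weighs importance. Every item stays retrievable forever, and that is where it differs from human memory.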

A vector database can serve as infrastructure, but it is hard to use as a general-purpose memory tool. Human memory actually comprises five types, including procedural memory, which lets us recall and execute a specific procedure at the right moment. The human visual and memory systems also have many special mechanisms, the most important of which is forgetting. Human memory is better at forgetting than at remembering: most things are not retained, and only the important ones leave a deep impression. These retained memories make up our semantic memory, episodic memory, procedural memory, and so on.
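A forgetting mechanism can be approximated by letting each memory's strength decay over time and discarding items that fall below a threshold. The sketch assumes exponential decay scaled by an importance score. The half-life and threshold are arbitrary illustrative values, and the sketch does not model any of the five human memory types.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    importance: float  # 0..1: how much the item matters
    created_at: float  # time stored, in hours

@dataclass
class ForgettingMemory:
    half_life_hours: float = 24.0  # hypothetical decay rate
    threshold: float = 0.2         # hypothetical cut-off below which items are forgotten
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, text: str, importance: float, now: float) -> None:
        self.items.append(MemoryItem(text, importance, now))

    def strength(self, item: MemoryItem, now: float) -> float:
        # Exponential decay that halves every half_life_hours, scaled by
        # importance, so important items stay above the threshold longer.
        age = now - item.created_at
        return item.importance * 0.5 ** (age / self.half_life_hours)

    def forget(self, now: float) -> None:
        # Drop every item whose current strength has fallen below the threshold.
        self.items = [m for m in self.items if self.strength(m, now) >= self.threshold]

memory = ForgettingMemory()
memory.add("what I had for lunch", importance=0.3, now=0)
memory.add("root cause of last week's outage", importance=0.9, now=0)
memory.forget(now=48)
print([m.text for m in memory.items])  # ["root cause of last week's outage"]
```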

4. New Paradigm of Human-Machine Collaboration: Agents Co-Creating the Future with Humans

When discussing multi-agent systems, we introduced two important concepts: SOPs (Standard Operating Procedures) and review. An SOP is essentially code for humans, distilled from extensive practice. This human code is fully compatible with machine code (such as Python), because both can be brought into the digital world. This means we can work on all kinds of problems in the digital world, with machine code and human code completely unified.

A major challenge in developing large language models is how to handle SOPs. SOPs have been stored in the information systems and document libraries of leading companies, so this data is hard to access. Model builders therefore need to work closely with different industries to obtain it.

Review is particularly important in the training process of large language models. Humans, too, often fall prey to hallucinations, and a review process can correct the errors a model makes. In the human world, this kind of review mechanism can prevent 95% of errors.
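One narrow, concrete form of review is checking generated code for calls to functions that do not exist, a common kind of model hallucination. The sketch below uses Python's standard `ast` module. It illustrates a single review check of our own devising, not MetaGPT's review mechanism, and `review_calls` is a hypothetical name.

```python
import ast
import builtins

def review_calls(source: str, known_api: set[str]) -> list[str]:
    """Flag calls to plain names that are not builtins, not defined in the
    snippet, and not in `known_api`: a cheap check for hallucinated functions.
    Attribute calls such as obj.method() and imported names are not tracked."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    allowed = known_api | defined | set(dir(builtins))
    issues = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in allowed:
                issues.append(f"line {node.lineno}: unknown function '{node.func.id}'")
    return issues

# Hypothetical model output containing one invented helper.
generated = """
def score(board):
    return sum(max(row) for row in board)

print(score([[2, 4], [8, 16]]))
print(auto_solve_2048([[2, 4], [8, 16]]))
"""
print(review_calls(generated, known_api=set()))
# ["line 6: unknown function 'auto_solve_2048'"]
```

In practice, a check like this would sit alongside running tests and other reviewers; on its own it catches only invented top-level function names.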

Ideally, large language models and agents might communicate with each other through vectors: even though we cannot interpret these vectors, they still complete the overall review process.

Over the past few years, human-machine collaboration has evolved from the embedded mode to the assistive mode to the agent mode, and it is heading toward a social mode. We are currently in the agent mode, which has already greatly changed how we work and live. In the future, an agent might become a company's CEO, while we only need to be members of the company's board, holding a button that controls the agents to ensure they do no harm.

5. In the Era of Large Language Models, All Internet Applications are Worth Redesigning

Over the past 30 years, changes in information efficiency have shaped the different stages of the internet. From yellow pages to search to recommendation, the efficiency of information retrieval has steadily improved, and large language models now add the capability of fuzzy logic computation. Papers, for example, can first be digested by a large language model and then peer-reviewed. Large language models already perform close to humans in peer review, which suggests that future peer review might no longer require human involvement. In short, most internet applications are worth building again.

The emergence of large language models is prompting many applications to be redesigned. From WeChat to Taobao, these applications are considering how to make better use of large language models. Their future direction is still uncertain: these platforms may integrate such services themselves, or third-party applications may innovate and capture their traffic.

In summary, large language models have brought the capability of fuzzy logic computation, and agents are, in a sense, an abstraction of the logic of the human brain. If we can solve most of these brain-like problems, including standardized ones such as thinking and memory, we may truly usher in the era of agents.

– END –

Editor | Lin Qiaoran

Proofreader | Qiu Ping
