A New Era of AI Evolution: Getting Started with HuggingGPT & MetaGPT

Follow the official account “Carl’s AI Watts” and set it as “Starred” to get the latest AIGC news

Author: Carl

AIGC Open Source Free Tutorial 2.0 (supports ChatGPT, Midjourney, Stable Diffusion, Runway, AI Agents, Digital Humans, AI Voice & Music):

https://www.learnprompt.pro

In the first three articles, we discussed the principles and components of Agents, as well as AutoGPT and AgentGPT. In this section, I will introduce two more Agent projects worth exploring in depth, explain their working principles and differences, and walk you through using them hands-on 🎉

If you want to understand Agents from scratch, feel free to check out the previous three articles.

A New Era of AI Evolution: AI Agents Era (3/9) — Practical Use of AutoGPT & AgentGPT

A New Era of AI Evolution: AI Agents Era (2/9) — Decoding Agents Capabilities: Planning + Memory + Tool Usage

A New Era of AI Evolution: AI Agents Era (1/9) — What is an Agent?

🟢 HuggingGPT

HuggingGPT is a multi-model orchestration Agent framework. It uses ChatGPT as a task planner, selects suitable models on the Hugging Face platform based on each model's description, and finally generates a summary response from the models' execution results.

This project is currently open source on GitHub under the very cool name JARVIS (Iron Man's assistant). The research involves two main players: the well-known ChatGPT and the Hugging Face AI community.


What is Hugging Face?

Simply put, Hugging Face is an open-source community platform focused on artificial intelligence where users can publish and share pre-trained models, datasets, and demo files. Currently, over 100,000 pre-trained models and more than 10,000 datasets have been shared on Hugging Face. More than 10,000 organizations from various industries, including Microsoft, Google, Bloomberg, and Intel, are using Hugging Face’s products.


In HuggingGPT, ChatGPT acts as the “operational brain,” automatically parsing user requests, and performing automatic model selection, execution, and reporting from the AI model library of Hugging Face, greatly facilitating the development of more complex AI programs.

Working Principle of HuggingGPT

This system consists of four stages:


1. Task Planning

Using LLM as the brain, the user’s request is parsed into multiple tasks. Each task has four attributes: task type, ID, dependencies, and parameters. The system uses some examples to guide the LLM in task parsing and planning.

The specific instructions are as follows:

[{"task": task, "id": task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}]
  • The "dep" field indicates the IDs of the prerequisite tasks that generate the resources the current task depends on.
  • The special tag "<resource>-task_id" refers to the text, image, audio, or video generated by the dependent task with ID task_id.
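As an illustration of this format, here is a minimal Python sketch (with a hypothetical task plan and helper function, not HuggingGPT's actual code) that parses such a plan and resolves "<resource>-task_id" references:

```python
import json

# Hypothetical example of a task plan the LLM might return for
# "generate audio describing the image at example.jpg".
plan_json = '''
[
  {"task": "image-to-text", "id": 0, "dep": [-1],
   "args": {"image": "example.jpg"}},
  {"task": "text-to-speech", "id": 1, "dep": [0],
   "args": {"text": "<resource>-0"}}
]
'''

def resolve_args(task, results):
    """Replace "<resource>-task_id" placeholders with the real outputs."""
    resolved = {}
    for key, value in task["args"].items():
        if isinstance(value, str) and value.startswith("<resource>-"):
            dep_id = int(value.split("-")[-1])
            resolved[key] = results[dep_id]  # output of the dependency task
        else:
            resolved[key] = value
    return resolved

plan = json.loads(plan_json)
results = {0: "a cat sitting on a windowsill"}  # pretend task 0 already ran
print(resolve_args(plan[1], results))
# {'text': 'a cat sitting on a windowsill'}
```

A "dep" of [-1] marks a task with no prerequisites, so it can run immediately.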

2. Model Selection

The LLM assigns tasks to specialized models by framing each assignment as a multiple-choice question: the LLM is presented with a list of candidate models and selects one. Because of context-length limitations, the candidates are first filtered by task type.

The specific instructions are as follows:

Based on user requests and calling commands, the Agent helps users select a suitable model from the model list to handle user requests. The Agent only outputs the model ID of the most suitable model. The output must follow a strict JSON format: {"id": "modelID", "reason": "Detailed reason for choosing this model"}.

Afterwards, HuggingGPT ranks the candidate models by download count, since downloads are treated as a reliable indicator of model quality, and the selected model is chosen from the top-K of this ranking. K is simply a constant controlling how many models are kept; if set to 3, the three most-downloaded models are considered.
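The filter-then-rank step can be sketched as follows. This is a hypothetical illustration; the model records and field names are made up and do not reflect the real Hugging Face API:

```python
# Filter candidate models by task type, rank by download count,
# and keep the top-K — the selection strategy described above.
def top_k_candidates(models, task_type, k=3):
    candidates = [m for m in models if m["task"] == task_type]
    candidates.sort(key=lambda m: m["downloads"], reverse=True)
    return candidates[:k]

# Illustrative model records (not real Hugging Face entries).
models = [
    {"id": "model-a", "task": "image-classification", "downloads": 500},
    {"id": "model-b", "task": "image-classification", "downloads": 9000},
    {"id": "model-c", "task": "text-to-speech",       "downloads": 1200},
    {"id": "model-d", "task": "image-classification", "downloads": 3000},
]

print([m["id"] for m in top_k_candidates(models, "image-classification", k=2)])
# ['model-b', 'model-d']
```

The shortlist that survives this step is what gets packed into the multiple-choice prompt for the LLM.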

3. Task Execution

Expert models execute the assigned tasks and record the results.

Based on inputs and inference results, the Agent needs to describe the process and results. The previous stages can form the following input:

User Input: {{User Input}}, Task Planning: {{Task}}, Model Selection: {{Model Assignment}}, Task Execution: {{Prediction Result}}.

To improve the efficiency of this process, HuggingGPT can run different models simultaneously, as long as they do not require the same resources. For example, if you prompt it to generate images of cats and dogs, separate models can run in parallel to execute the two sub-tasks.

However, sometimes models may need the same resources, which is why HuggingGPT maintains an attribute to track resources. It ensures that resources are utilized effectively.
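The scheduling idea can be illustrated with a small Python sketch (a simplified stand-in, not HuggingGPT's actual scheduler; `run_task` is a placeholder for a real model call): tasks whose dependencies are all finished run in parallel, while tasks sharing a dependency wait for it.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(task, results):
    # Placeholder for expert-model inference.
    return f"output-of-{task['id']}"

def execute_plan(tasks):
    results, pending = {}, list(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # All tasks whose dependencies are satisfied can run together.
            ready = [t for t in pending
                     if all(d in results or d == -1 for d in t["dep"])]
            futures = {t["id"]: pool.submit(run_task, t, results) for t in ready}
            for task_id, fut in futures.items():
                results[task_id] = fut.result()
            pending = [t for t in pending if t["id"] not in results]
    return results

tasks = [
    {"id": 0, "dep": [-1]},    # "draw a cat" – independent
    {"id": 1, "dep": [-1]},    # "draw a dog" – runs in parallel with task 0
    {"id": 2, "dep": [0, 1]},  # "combine both images" – waits for 0 and 1
]
print(execute_plan(tasks))
# {0: 'output-of-0', 1: 'output-of-1', 2: 'output-of-2'}
```

Tasks 0 and 1 are dispatched in the same wave, while task 2 only starts once both of its dependencies have produced results.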

4. Response Generation

LLM receives execution results and provides summary results to the user.
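The prompt for this final stage can be assembled from the outputs of the earlier stages. A hypothetical sketch (the template wording paraphrases the input structure described above; `build_response_prompt` is an illustrative helper, not HuggingGPT's code):

```python
# Assemble the summarization prompt from the four stages.
def build_response_prompt(user_input, tasks, assignments, predictions):
    return (
        f"User Input: {user_input}, "
        f"Task Planning: {tasks}, "
        f"Model Selection: {assignments}, "
        f"Task Execution: {predictions}. "
        "Please describe the process and results in a friendly way."
    )

prompt = build_response_prompt(
    "read this image aloud",
    [{"task": "image-to-text", "id": 0}],
    {"0": "some-captioning-model"},
    {"0": "a cat sitting on a windowsill"},
)
print(prompt)
```

The LLM then turns this structured record into a natural-language answer for the user.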

However, to apply HuggingGPT in practical scenarios, we need to address some challenges:

  • Improving efficiency: multiple rounds of LLM reasoning and interactions with other models slow down processing.
  • Dependence on long context windows: the LLM needs long contextual information to convey complex task content.
  • Enhancing stability: both the output quality of the LLM and the stability of external model services need improvement.

Now, let’s assume you want the model to generate audio based on an image. HuggingGPT will execute this task in the most suitable way. You can see more detailed response information in the diagram below:


Quick Experience

Experiencing HuggingGPT is very simple; just enter your OpenAI API key and Hugging Face token:


Access address:

https://huggingface.co/spaces/microsoft/HuggingGPT

After learning how AutoGPT, AgentGPT, and HuggingGPT work, I believe you now have a good sense of what Agents can do. So how did MetaGPT, born after them, become another sensational Agent project? Let's break down MetaGPT.

🟡 MetaGPT

MetaGPT introduces a framework that seamlessly integrates human workflows with multi-agent collaboration. By encoding standardized operating procedures (SOPs) into prompts, MetaGPT ensures a structured approach to problem-solving, thereby reducing the likelihood of errors.


Current Agent solutions share a problem: although these language-model-driven Agents have made significant progress on simple conversational tasks, they struggle with complex ones, as if seeing things that do not exist (hallucination). Chaining such Agents together can then trigger a chaotic cascade of errors.

Now MetaGPT introduces standard operating procedures. These procedures act like cheat codes to smoothly coordinate work. They inform the agents about what is happening, guiding them in an orderly manner.

With these procedures, agents can become familiar with their work like domain experts and validate outputs to avoid errors. Like a high-tech assembly line, each agent plays a unique role, collectively understanding complex teamwork.

Why MetaGPT is Important

MetaGPT offers a fresh perspective. This is why it has stirred up waves:

  • Stable Solutions: With SOPs, MetaGPT has been shown to generate more consistent and correct solutions than other Agents.

  • Diverse Role Assignments: The ability to assign different roles to LLMs ensures comprehensive problem-solving.

MetaGPT Software Development Process

  1. Requirement Analysis: The process begins upon receiving requirements. This stage focuses on clarifying the functions and requirements needed for the software.
  2. Acting as Product Manager: The product manager initiates the entire process based on requirements and feasibility analysis. They are responsible for understanding the requirements and setting a clear direction for the project.
  3. Acting as Architect: Once the requirements are clear, the architect creates a technical design plan for the project. They are responsible for constructing system interface designs, ensuring that the technical implementation meets the requirements. In MetaGPT, the architect Agent can automatically generate system interface designs, such as the development of a content recommendation engine.
  4. Acting as Project Manager: The project manager uses sequence flow diagrams to satisfy each requirement. They ensure that the project progresses as planned, with each phase executed in a timely manner.
  5. Acting as Engineer: Engineers are responsible for the actual code development. They use designs and flow diagrams to convert them into fully functional code.
  6. Acting as Quality Assurance (QA) Engineer: After the development phase, QA engineers conduct comprehensive testing. They ensure that the software meets the required standards and has no errors or issues.
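The role pipeline above can be sketched in a few lines of Python. This is a deliberately simplified illustration of the SOP idea, not MetaGPT's actual implementation: each role is an agent with its own prompt template, and each role's output becomes the next role's input (`call_llm` is a stand-in for a real LLM API call).

```python
# Each role's prompt template consumes the previous role's artifact.
ROLES = [
    ("Product Manager", "Write a PRD for: {input}"),
    ("Architect",       "Design the system architecture for: {input}"),
    ("Project Manager", "Break the design into tasks: {input}"),
    ("Engineer",        "Implement the tasks: {input}"),
    ("QA Engineer",     "Write tests for the implementation: {input}"),
]

def call_llm(prompt):
    # Placeholder for a real LLM call.
    return f"[result of: {prompt[:40]}...]"

def run_sop(requirement):
    artifact, history = requirement, []
    for role, template in ROLES:
        artifact = call_llm(template.format(input=artifact))
        history.append((role, artifact))
    return history

for role, output in run_sop("Design a RecSys like Toutiao"):
    print(f"{role}: {output}")
```

Because each role validates and transforms the previous role's artifact, errors are caught at a stage boundary instead of propagating silently through a single long prompt.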

Example

For example, when you input:

python startup.py "Design a RecSys like Toutiao"

MetaGPT will give you multiple outputs, one of which is guidance on data and API design.


The cost of generating an analysis and design example is about $0.2 (using the GPT-4 API), while the cost of a complete project is about $2.0. In this way, MetaGPT provides a cost-effective solution, allowing you to quickly obtain the information and guidance you need.

Quick Experience

Currently, there is no online demo of MetaGPT. Here is the Docker installation method, which minimizes the environment issues you might otherwise face:

```shell
# Step 1: Download the official MetaGPT image and prepare config.yaml
docker pull metagpt/metagpt:v0.3.1
mkdir -p /opt/metagpt/{config,workspace}
docker run --rm metagpt/metagpt:v0.3.1 cat /app/metagpt/config/config.yaml > /opt/metagpt/config/key.yaml
vim /opt/metagpt/config/key.yaml  # Change the config (e.g. set your API key)
# Step 2: Run the MetaGPT demo in a container
docker run --rm \
    --privileged \
    -v /opt/metagpt/config/key.yaml:/app/metagpt/config/key.yaml \
    -v /opt/metagpt/workspace:/app/metagpt/workspace \
    metagpt/metagpt:v0.3.1 \
    python startup.py "Write a cli snake game"
# You can also start a container and execute commands inside it
docker run --name metagpt -d \
    --privileged \
    -v /opt/metagpt/config/key.yaml:/app/metagpt/config/key.yaml \
    -v /opt/metagpt/workspace:/app/metagpt/workspace \
    metagpt/metagpt:v0.3.1
docker exec -it metagpt /bin/bash
$ python startup.py "Write a cli snake game"
```

Try replacing “Write a cli snake game” with your preferred command!

For more installation tutorials, please refer to the official guide.

In the next section, we will introduce AI Town, welcome to follow “Carl’s AI Watts” 🧙

AIGC Communication Group

“Carl’s AI Watts” is a Chinese AIGC learning community. Our open-source courses at https://www.learnprompt.pro/ now support ChatGPT, Midjourney, Runway, Stable Diffusion, AI Agents, AI Digital Humans, AI Voice & Music, Large Model Fine-tuning and other popular projects. We will also update AI-related popular science knowledge and hands-on fine-tuning of large models in the field. Everyone is welcome to exchange ideas on the future scenarios of AIGC here!


Reference

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  • HuggingGPT: The Secret Weapon to Solve Complex AI Tasks
  • MetaGPT: The Multi-Agent Framework
  • MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
  • MetaGPT: Multi-Agent Harmony for Complex Problem Solving
  • MetaGPT: The Future of Multi-Agent Collaboration in AI (A Brief Guide)
