Understanding AI Agents: Classic Cases and Frameworks

Hello everyone, I am Student Zhang. I keep learning and keep publishing useful content. Follow me, and let’s learn large AI model technology together!

If you have any questions, feel free to add me on WeChat: jasper_8017. Looking forward to discussing and progressing together with like-minded friends!

This series of articles follows the MetaGPT Multi-Agent Course, delving into the understanding and practice of multi-agent system development.

This article serves as notes for Chapter 2 (Overview of Agents and Introduction to Multi-Agent Frameworks) of the course.

0. Review and Learn – Reexamining What an AI Agent Is

Earlier articles in this series already described how I understand the concept of an AI Agent.

  • An agent is like a human, capable of understanding real-world affairs, having memory, thinking, summarizing, learning, planning, making decisions, and using various tools to accomplish tasks.

  • Multi-agent systems are like a team, akin to the society we live in, where each agent has its own functions and domains. Through collaboration between individuals, more complex and larger goals can be achieved.

Now let’s look at this classic diagram:

[Figure: the classic AI Agent diagram, with the Agent in the middle]

A more accurate reading of the diagram would replace the agent in the middle with the LLM and call the entire diagram the Agent. In other words, the Agent is the collection of all the capabilities in the diagram above: the LLM acts as the brain, working out the steps needed to complete a task, which tools to use, and what result to aim for. Combined with memory, the whole system can make decisions and take actions autonomously to achieve its goal, much as a human would.
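To make this reading concrete, here is a minimal sketch of an Agent as "LLM brain plus tools plus memory". Everything in it is an assumption for illustration: `fake_llm` is a hypothetical stand-in for a real model call, and the `Agent` class, its fields, and the two toy tools are not taken from any framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def fake_llm(goal: str, memory: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: picks the next step from memory."""
    if not any(m.startswith("search:") for m in memory):
        return "search"
    if not any(m.startswith("summarize:") for m in memory):
        return "summarize"
    return "finish"


@dataclass
class Agent:
    brain: Callable[[str, List[str]], str]            # the LLM that plans the next step
    tools: Dict[str, Callable[[str], str]]            # tools the brain can choose from
    memory: List[str] = field(default_factory=list)   # record of past steps and results

    def run(self, goal: str, max_steps: int = 10) -> List[str]:
        for _ in range(max_steps):                    # hard cap so the loop always ends
            action = self.brain(goal, self.memory)    # plan
            if action == "finish":
                break
            result = self.tools[action](goal)         # act by calling a tool
            self.memory.append(f"{action}: {result}")  # remember
        return self.memory


agent = Agent(
    brain=fake_llm,
    tools={
        "search": lambda goal: f"notes about {goal}",
        "summarize": lambda goal: f"summary of {goal}",
    },
)
steps = agent.run("AI agents")
print(steps)
```

The brain only decides what to do next; the tools do the work, and memory is what lets the brain see that a step has already been done.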

1. Introduction to an AI Agent Example – BabyAGI

Project address: BabyAGI on GitHub

The operation process is as follows:

  (1) Extract the first task from the task list.
  (2) Send the task to the Execution Agent, which uses the LLM to complete the task based on the context.
  (3) Enrich the result and store it in a vector database.
  (4) Create new tasks and re-prioritize the task list based on the objective and the result of the previous task.
  (5) Repeat the above steps.
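The loop above can be sketched roughly as follows. This is not BabyAGI's actual code: `execute`, `create_tasks`, and `prioritize` are hypothetical stand-ins for the LLM-backed agents, a plain list stands in for the vector database, and `max_iterations` is an added safety cap.

```python
from collections import deque
from typing import List, Tuple


def execute(objective: str, task: str, context: List[str]) -> str:
    """Stand-in for the Execution Agent's LLM call."""
    return f"result of '{task}'"


def create_tasks(objective: str, result: str, task: str, pending: List[str]) -> List[str]:
    """Stand-in for the Task Creation Agent: one follow-up per original task."""
    if task.startswith("follow up"):
        return []
    return [f"follow up on {task}"]


def prioritize(tasks: List[str]) -> List[str]:
    """Stand-in for the Prioritization Agent: shortest description first."""
    return sorted(tasks, key=len)


def run(objective: str, first_task: str, max_iterations: int = 5) -> List[Tuple[str, str]]:
    tasks = deque([first_task])
    results: List[Tuple[str, str]] = []    # stands in for the vector database
    iterations = 0
    while tasks and iterations < max_iterations:
        task = tasks.popleft()                                       # (1) take the first task
        context = [r for _, r in results]
        result = execute(objective, task, context)                   # (2) execute it
        results.append((task, result))                               # (3) store the result
        new_tasks = create_tasks(objective, result, task, list(tasks))
        tasks = deque(prioritize(list(tasks) + new_tasks))           # (4) create and re-prioritize
        iterations += 1                                              # (5) repeat
    return results


history = run("write a blog post", "draft an outline")
for task, result in history:
    print(task, "->", result)
```

With real LLM calls the task list can keep growing indefinitely, which is why the sketch stops after `max_iterations` even if tasks remain.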

This involves four agents, with the first three utilizing the capabilities of large models for task planning and summarization:

  • The Execution Agent receives the objective and a task, and calls the LLM to generate the task result.

  • The Task Creation Agent uses the LLM to create new tasks based on the objective and the result of the previous task. Its inputs are: the objective, the result of the previous task, the task description, and the current task list.

  • The Prioritization Agent uses the LLM to reorder the task list. It accepts one parameter: the ID of the current task.

  • The Context Agent stores task results in a vector store and retrieves them to obtain context.
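The Context Agent is the one piece that does not rely on the LLM, so it is worth a small sketch of its own. The version below is a toy under loud assumptions: `embed` uses word counts instead of a real embedding model, and `ResultStore` is a hypothetical in-memory class rather than the vector database BabyAGI uses.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ResultStore:
    """Hypothetical in-memory stand-in for a vector database of task results."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, result: str) -> None:
        self._items.append((embed(result), result))

    def query(self, text: str, top_k: int = 2) -> List[str]:
        q = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [result for _, result in ranked[:top_k]]


store = ResultStore()
store.add("vector databases store embeddings for similarity search")
store.add("the weather was sunny on launch day")
store.add("task results are saved as embeddings")

context = store.query("how are embeddings stored", top_k=2)
print(context)
```

The retrieved results are what the Execution Agent would receive as context for the next task.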

For what a run looks like in practice, you can refer to this article: BabyAGI: AI Task Management System

2. Comparison of Multi-Agent Frameworks

I am not yet familiar with most of the multi-agent frameworks available. So far, I have only used MetaGPT and AutoGPT, and have heard of AutoGen. For a broader view, you can refer to the article AI Agents Based on Large Language Models – Part 3, which summarizes and compares common multi-agent frameworks:

[Figure: comparison of common multi-agent frameworks, from the article above]

Here, I will only compare and explain my experiences with MetaGPT and AutoGPT. These are my personal feelings and opinions, and feedback is welcome.

  • AutoGPT leaves planning and action largely to the large model, which I feel makes it overly dependent on the model’s capabilities. Large models are not yet capable of fully autonomous planning and decision-making, so AutoGPT’s results are very unpredictable and it is hard to get the outcome you want. Moreover, it is a single-agent system, and the tasks it can accomplish are relatively simple. Its interface is also not very user-friendly.

  • MetaGPT relies more on predefined SOPs (Standard Operating Procedures), which keeps the task execution process relatively controllable and makes correct execution as likely as possible. It also supports multiple agents, which allows for relatively complex tasks. Its interfaces are cleanly encapsulated; in particular, the three abstractions of Team, Role, and Action model agents the way we think about people. In summary, MetaGPT’s workflow looks like this: the SOP defines a standardized assembly line, and the Roles are workers stationed at positions along that line, each doing their own job and cooperating toward a shared result.
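The assembly-line picture can be sketched in a few lines. Note that this is not MetaGPT's API: the `Action`, `Role`, and `Team` classes below only borrow the names of the three abstractions, and their fields and methods are hypothetical. The order in which roles are added stands in for the SOP.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str
    run: Callable[[str], str]    # takes the upstream output, returns this step's output


@dataclass
class Role:
    name: str
    action: Action

    def act(self, message: str) -> str:
        return self.action.run(message)


@dataclass
class Team:
    roles: List[Role] = field(default_factory=list)   # list order plays the part of the SOP

    def add(self, role: Role) -> None:
        self.roles.append(role)

    def run(self, idea: str) -> List[str]:
        log: List[str] = []
        message = idea
        for role in self.roles:            # each worker handles the previous worker's output
            message = role.act(message)
            log.append(f"{role.name}: {message}")
        return log


team = Team()
team.add(Role("ProductManager", Action("write_prd", lambda m: f"PRD for [{m}]")))
team.add(Role("Engineer", Action("write_code", lambda m: f"code implementing [{m}]")))

log = team.run("a todo app")
for line in log:
    print(line)
```

Because the order of steps is fixed up front rather than chosen by the model at run time, the flow stays predictable even when an individual Action's output is not.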

3. Warning?

Agents run through many steps, and since large models are not as capable as we tend to imagine, an agent left without limits can easily get stuck in an endless loop. Every iteration calls the model, so a runaway loop translates directly into money lost. Caution is essential, as the warning in BabyAGI points out:

[Figure: the warning shown in BabyAGI]
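One simple way to act on this warning is to put a hard cap on both the number of steps and the estimated spend. The sketch below is only an illustration: `RunGuard`, `BudgetExceeded`, and the per-call cost of 0.02 are made-up names and numbers, not part of BabyAGI or any other framework.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes past its step or cost limit."""


class RunGuard:
    def __init__(self, max_steps: int, max_cost_usd: float) -> None:
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, step_cost_usd: float) -> None:
        """Call once before every model call; raises when a limit is crossed."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of {self.max_cost_usd} USD reached")


def run_stuck_agent(guard: RunGuard) -> int:
    """A deliberately broken agent that never decides it is done."""
    completed = 0
    try:
        while True:
            guard.charge(0.02)   # assumed cost per model call, purely illustrative
            completed += 1       # the (pretend) model call happens here
    except BudgetExceeded as exc:
        print("stopped:", exc)
    return completed


completed = run_stuck_agent(RunGuard(max_steps=5, max_cost_usd=1.0))
print("calls made before stopping:", completed)
```

The guard is checked before each call, so the agent stops before spending on the step that would cross the limit.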