
Content Source:
https://cookbook.openai.com/examples/orchestrating_agents
https://github.com/openai/swarm?tab=readme-ov-file
AI Multi-agent Concept and Origins
When using language models, achieving stable performance usually only requires a good prompt and the right tools. However, things can get tricky when a system must handle many distinct processes.
Basic Concepts:
1. Routines:
The concept of a routine is not strictly defined; it is intended to capture the idea of a set of steps. Specifically, we define a routine as a list of instructions in natural language (which we represent with a system prompt) together with the tools required to carry out those instructions. It is similar to a standard operating procedure in some jobs, which specifies the order of execution and any particular requirements.
Let’s look at an example. Below, we define a routine for a customer service agent, instructing it to categorize user questions and then suggest fixes or provide refunds. We also define the necessary functions execute_refund and look_up_item. We can call this a customer service routine, agent, assistant, and so on; the idea itself is the same: a set of steps and the tools to execute those steps.
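A minimal sketch of what such a routine could look like, along the lines of the cookbook example: a system prompt encoding the steps, plus the two tools it names. The tool bodies here are illustrative stubs, not a real backend.

```python
# The routine: natural-language steps encoded in a system prompt.
system_message = (
    "You are a customer support agent for ACME Inc. "
    "Follow this routine with the user:\n"
    "1. Ask probing questions to understand the user's problem, "
    "unless the user has already given a reason.\n"
    "2. Propose a fix.\n"
    "3. Only if the user is not satisfied, offer a refund.\n"
    "4. If accepted, look up the item ID and execute the refund."
)

def look_up_item(search_query: str) -> str:
    """Find an item ID from a search query (stubbed for illustration)."""
    return "item_132612938"

def execute_refund(item_id: str, reason: str = "not provided") -> str:
    """Issue a refund for the given item (stubbed for illustration)."""
    print(f"Refunding {item_id} because: {reason}")
    return "success"

# The routine's tools, which would be exposed to the model as functions.
tools = [look_up_item, execute_refund]
```

The routine is just the pairing of the prompt and the tool list; any framework-specific wiring (how the functions are exposed to the model) is omitted here.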

2. Handoffs:
A handoff is when an agent (or routine) transfers an active conversation to another agent, just as you might transfer a phone call to a colleague, except that here the receiving agent fully understands the previous conversation. This differs from the handoffs defined in workflow tools such as fastgpt: it is not a fixed workflow transition but something closer to a help request between people. Each agent judges whether the task is its own job, does what it needs to do, and then passes the conversation on; alternatively, the model itself decides whom to hand it to.
For example, consider a customer service conversation designed so that English messages are handled by an English-speaking customer service agent, while Chinese messages are handed off to a Chinese-speaking one.
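A minimal sketch of this bilingual handoff, assuming a Swarm-style Agent with a `functions` list; the simple character-range check stands in for the LLM, which in the real system decides on its own to call the handoff tool.

```python
# Minimal Agent stand-in: name, instructions, and a list of tools.
class Agent:
    def __init__(self, name, instructions, functions=None):
        self.name = name
        self.instructions = instructions
        self.functions = functions or []

chinese_agent = Agent(
    name="Chinese Agent",
    instructions="You only speak Chinese.",
)

def transfer_to_chinese_agent():
    """Handoff tool: returns the agent that should take over."""
    return chinese_agent

english_agent = Agent(
    name="English Agent",
    instructions="You only speak English. "
                 "If the user writes in Chinese, transfer to the Chinese agent.",
)
english_agent.functions.append(transfer_to_chinese_agent)

def route(message: str, agent: Agent) -> Agent:
    # Stand-in for the LLM: detect CJK characters and, if present,
    # invoke the handoff tool; a returned Agent becomes the active agent.
    if any("\u4e00" <= ch <= "\u9fff" for ch in message):
        for fn in agent.functions:
            result = fn()
            if isinstance(result, Agent):
                return result
    return agent

print(route("你好", english_agent).name)   # → Chinese Agent
print(route("Hello", english_agent).name)  # → English Agent
```

The key idea is that a handoff is just a tool call whose return value is another Agent; nothing else about the conversation state changes.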

Results are as follows:

Key Code: english_agent.functions.append(transfer_to_chinese_agent) gives the English customer service agent the Chinese customer service agent as a tool. The workflow can be understood as: customer issue → English customer service. If the message is in English, English agent A replies directly; if it is in Chinese → it is handed off to Chinese agent B, who replies.
This example is only meant to demonstrate the switching mechanism; a large model can switch languages on its own, so the setup has little practical value. For demonstration purposes, each reply must clearly state whether it comes from customer service A or customer service B. The example above also shows the difference between handoffs and workflows: a handoff is more like a working guideline than a fixed route. When agent A cannot choose a suitable "tool" for the task, Chinese-speaking agent B is itself just another tool for A to call.
Differences Between Workflow-Based AI and Multi-Agents
What are AI plugins in workflows?
Currently common tools such as coze, fastgpt, and dify are workflow-centric. In a workflow, one plugin interacts with the LLM, and together with various other plugins it forms a relatively stable pipeline that solves specific tasks requiring the LLM's capabilities. The characteristics of such a workflow are:
1. Each node in the workflow is fixed and cannot be changed.
2. The LLM is just a fixed plugin; the workflow could essentially be built without an LLM at all, which would make it no different from traditional workflow engineering.

Key Aspects of Multi-Agent Workflows
The key aspect of multi-agent workflows is that each node can hand its work to the next node based on its own state; the power to choose which node to hand off to, or which tool to call, is given to the LLM. A multi-agent workflow can therefore be understood as a workflow centered on the LLM.
Viewed from the perspective of the final execution path: in an AI workflow, the relationships and decisions between nodes are all planned when the workflow is designed. A multi-agent system, by contrast, defines only a relationship graph between agents and does not prescribe the execution path.
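This contrast can be sketched in a few lines. In the fixed workflow, the next node is an edge chosen at design time; in the multi-agent version, only a relationship graph exists, and a decision function (standing in here for the LLM) picks the next node at runtime. All node names are illustrative.

```python
# Fixed workflow: every transition is an edge decided at design time.
workflow_edges = {
    "classify": "answer",
    "answer": "done",
}

def run_workflow(start: str) -> list:
    path, node = [], start
    while node != "done":
        path.append(node)
        node = workflow_edges[node]  # edge fixed when the workflow was built
    return path

# Multi-agent: only a relationship graph; which neighbour to hand off to
# is decided at runtime by the LLM (stubbed here as `decide`).
agent_graph = {
    "triage": ["refunds", "sales"],
    "refunds": [],
    "sales": [],
}

def decide(node: str, message: str):
    # Stand-in for the LLM's runtime choice of handoff target.
    for neighbour in agent_graph[node]:
        if neighbour in message:
            return neighbour
    return None

def run_agents(start: str, message: str) -> list:
    path, node = [start], start
    while (nxt := decide(node, message)) is not None:
        path.append(nxt)
        node = nxt
    return path

print(run_workflow("classify"))                        # → ['classify', 'answer']
print(run_agents("triage", "I need a refunds agent"))  # → ['triage', 'refunds']
```

The execution path of `run_workflow` is fully determined by `workflow_edges`; the path of `run_agents` only emerges at runtime from the decision function's choices.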
Introduction to the Swarm Framework
Swarm focuses on making agent coordination and execution lightweight, highly controllable, and easy to test. It achieves this through two primitive abstractions: Agents and handoffs. An Agent contains instructions and tools, and can at any point choose to hand the conversation over to another Agent.
This approach is semantically powerful on its own; however, OpenAI developed Swarm as an educational and experimental framework for multi-agents, so it cannot be applied directly to production systems.
The Design Philosophy of Swarm
One interesting aspect of Swarm's design follows from the fact that the next handoff in a multi-agent workflow is decided by the LLM: each Agent can be seen as a collection of tools (functions) under a specific routine. The LLM selects a function based on the conversation so far and then executes it. The corresponding advantages and disadvantages are:
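This "agent as a collection of tools" view can be sketched as a run loop: the LLM (stubbed below) either replies or picks one of the active agent's functions; if the chosen function returns an Agent, the active agent is swapped. This mirrors Swarm's loop in spirit only; the names and the choice heuristic are illustrative, not Swarm's actual API.

```python
# Sketch of a Swarm-style run loop: the active agent's functions are
# offered to the model as tools; a function returning an Agent is a handoff.
class Agent:
    def __init__(self, name, instructions, functions=None):
        self.name = name
        self.instructions = instructions
        self.functions = functions or []

def fake_llm_choose(agent: Agent, message: str):
    """Stand-in for the LLM's tool choice: pick the first function whose
    name appears in the message, else answer directly (return None)."""
    for fn in agent.functions:
        if fn.__name__ in message:
            return fn
    return None

def run(agent: Agent, message: str) -> str:
    # Loop until the model produces a plain reply instead of a tool call.
    while True:
        choice = fake_llm_choose(agent, message)
        if choice is None:
            return f"[{agent.name}] handled: {message}"
        result = choice()
        if isinstance(result, Agent):  # handoff: swap the active agent
            agent = result
        else:                          # ordinary tool: feed the result back
            message = f"{message} (tool said: {result})"

refunds = Agent("Refunds", "Process refunds.")

def transfer_to_refunds():
    return refunds

triage = Agent("Triage", "Route the user.", [transfer_to_refunds])

print(run(triage, "please transfer_to_refunds now"))
# → [Refunds] handled: please transfer_to_refunds now
```

Because the choice function is the only place where "intelligence" enters the loop, the flexibility and the uncertainty discussed below both live in that single step.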
Advantages: more flexible workflow definitions, which specify norms rather than concrete working steps.
Disadvantages:
1. LLM hallucination and nondeterminism introduce significant uncertainty into the workflow's output; results depend on the LLM's understanding of the specific routine and on its capability (much like a person's competence at a job).
2. Multi-agent workflows are relatively difficult to test because of this large uncertainty.
Finally:
As LLM capabilities improve, the multi-agent model will certainly find more applications, but how to design, develop, test, and maintain such systems remains a significant open problem.