In-Depth Analysis of OpenAI Swarm Source Code: Multi-Agent Scheduling Framework

Author: Chen Dihao — Senior Engineer at SF Technology AI Platform

Responsible for AI and large model infrastructure at SF Group, previously a platform architect at Fourth Paradigm and a PMC member of the OpenMLDB project, and before that an architect of Xiaomi's cloud deep learning platform and head of the storage and container team at USTC. Active in open source communities around distributed systems and machine learning, and a contributor to open source projects such as HBase, OpenStack, TensorFlow, and TVM.

Introduction

OpenAI Swarm is an open-source multi-agent scheduling framework from OpenAI. The entire implementation is fewer than 500 lines of Python code, yet it supports multi-agent scheduling, interactive multi-turn dialogues, and other functionality.

This article analyzes the conceptual abstractions and implementation principles of OpenAI Swarm from the source code, tracing the interaction among multiple agents at the lowest level of the code, and discussing more advanced functionality that could be built on top of the scheduling framework itself.


User Interface

The OpenAI Swarm project ships several usage examples, mainly covering single-turn dialogues and interactive multi-turn dialogues. The single-turn example introduces the Swarm and Agent code abstractions; see the example below.

from swarm import Swarm, Agent

client = Swarm()

def transfer_to_agent_b():
    return agent_b

agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in Haikus.",
)

response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to agent B."}],
)

print(response.messages[-1]["content"])

Executing the above code yields text output from the large model. Turning on the debug logs shows that the code actually calls the OpenAI API multiple times and returns the final result only after scheduling across multiple agents. The logic of Swarm and Agent is introduced in detail later.

The interactive multi-turn interface is a thin Python function wrapper; most examples provide an entry point similar to the one below.

from swarm.repl import run_demo_loop
from agents import triage_agent

if __name__ == "__main__":
    run_demo_loop(triage_agent)

Executing the above code first waits for user input, runs the agents, then waits for the next input, and so on. The logic of the run_demo_loop function is simple enough to introduce up front.

Implementation Principles of Interactive Multi-Turn Dialogue

OpenAI Swarm provides a simple run_demo_loop function that offers a command-line interactive interface, allowing users to input multiple times and interact with multiple agents conveniently. The function implementation is also very simple, as shown in the code below.

def run_demo_loop(
    starting_agent, context_variables=None, stream=False, debug=False
) -> None:
    client = Swarm()
    print("Starting Swarm CLI  ")

    messages = []
    agent = starting_agent

    while True:
        user_input = input("\033[90mUser\033[0m: ")
        messages.append({"role": "user", "content": user_input})

        response = client.run(
            agent=agent,
            messages=messages,
            context_variables=context_variables or {},
            stream=stream,
            debug=debug,
        )

        if stream:
            response = process_and_print_streaming_response(response)
        else:
            pretty_print_messages(response.messages)

        messages.extend(response.messages)
        agent = response.agent

First, the function enters an infinite while True loop, reads command-line input via Python's input function, and then calls the Swarm object's run interface with the starting agent, passing the user's input string as the message content. It provides print helpers for both the streaming and non-streaming cases, printing the final agent's name and output. Note that a single user question may trigger multiple agent function calls and multiple agent outputs, but only the final agent's information is printed. Also note that the messages list and the active agent carry over between iterations, so each new question is answered with the full history by whichever agent responded last.
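The history-accumulation behavior described above can be sketched with a standalone mimic; fake_run below is a hypothetical stand-in for Swarm.run, not the real implementation.

```python
# Standalone sketch of how run_demo_loop accumulates conversation history
# across turns; fake_run is a hypothetical stand-in for Swarm.run.
def fake_run(agent, messages):
    last = messages[-1]["content"]
    return [{"role": "assistant", "sender": agent, "content": f"echo: {last}"}]

messages = []
agent = "Agent A"
for user_input in ["hello", "bye"]:        # simulates two input() turns
    messages.append({"role": "user", "content": user_input})
    replies = fake_run(agent, messages)    # may contain several agent messages
    messages.extend(replies)               # full history is resent next turn

print(len(messages))
```

Because the entire messages list is passed back into run on every turn, each agent always sees the complete conversation so far.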

To understand the underlying implementation details, we will now delve into the lower-level implementation of Swarm and Agent interfaces.

Domain Abstraction of Swarm

Whether in single-turn dialogues or interactive multi-turn dialogues, all agent scheduling entry points are the Swarm class and the run function. Therefore, we first look at the definition of the Swarm class, simplifying the specific implementation to obtain the following class definition.

class Swarm:
    def __init__(self, client=None):
        self.client = client

    def run(self, agent, messages, ...) -> Response: ...

    def get_chat_completion(self, ...) -> ChatCompletionMessage: ...

    def handle_function_result(self, result, debug) -> Result: ...

    def handle_tool_calls(self, ...) -> Response: ...

    def run_and_stream(self, agent, messages, ...): ...

First, Swarm is a utility class with a single member variable: the client used to access the OpenAI API for the large model. It exposes the run function to drive the agents; the remaining methods are internal helpers used by run. Since the concrete logic of run involves the internals of the Agent object, we focus first on the abstractions that do not depend on Agent.

The run function returns a Response object; in streaming mode, run_and_stream yields the same Response at the end of the stream. The definition of this object is as follows.

class Response(BaseModel):
    messages: List = []
    agent: Optional[Agent] = None
    context_variables: dict = {}

This is the result that users of the Swarm framework receive. The first field, messages, contains the full output history of all agents; an example follows. It includes not only the model's text output but also information about whether functions and tools were called.

[{'content': 'The capital of China is Beijing.', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': None, 'sender': 'agent'}]

The second field is agent. If it is empty, no further agent needs to be called; if it is not empty, the loop inside run keeps scheduling the next agent, as the following excerpt shows.

while len(history) - init_len < max_turns and active_agent:
  ......
  partial_response = self.handle_tool_calls(
      message.tool_calls, active_agent.functions, context_variables, debug
  )
  ......
  if partial_response.agent:
      active_agent = partial_response.agent
  ......

The third field is the global context_variables, which is passed to every agent and to each agent's functions. Users can put arbitrary entries into this global map so that functions can access shared information; this is elaborated in the Agent section below.
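Putting these pieces together, the scheduling loop can be sketched as a standalone mimic. All names below (transfer_to_agent_b, answer_in_haiku, the agents dict) are hypothetical stand-ins, not the real Swarm code: each turn an agent either answers with text or hands control to another agent.

```python
# Standalone sketch of the run loop: an agent function either returns final
# text or hands off to another agent, mirroring the partial_response.agent check.
def transfer_to_agent_b(history):
    return {"handoff": "agent_b"}

def answer_in_haiku(history):
    return {"content": "Agents pass the torch"}

agents = {"agent_a": transfer_to_agent_b, "agent_b": answer_in_haiku}

def run(active_agent, history, max_turns=10):
    turns = 0
    while turns < max_turns and active_agent:
        result = agents[active_agent](history)
        if "handoff" in result:
            active_agent = result["handoff"]  # switch agents, keep looping
        else:
            history.append(result)
            active_agent = None               # no further agent: stop looping
        turns += 1
    return history

final = run("agent_a", [])
print(final[-1]["content"])
```

The max_turns guard plays the same role as the len(history) bound in the real loop: it prevents two agents from handing off to each other forever.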

Domain Abstraction of Agent

Before understanding the logic of the run function of Swarm, we first introduce the conceptual abstraction of Agent. The definition of the Agent class is also quite simple, as shown in the code below.

class Agent(BaseModel):
    name: str = "Agent"
    model: str = "gpt-4o"
    instructions: Union[str, Callable[[], str]] = "You are a helpful agent."
    functions: List[AgentFunction] = []
    tool_choice: str = None
    parallel_tool_calls: bool = True

First, each agent has a string name, which is mainly used to fill the sender field attached to the messages produced when calling the OpenAI API, so the output can later be traced back to a specific agent. The actual dependencies among agents are wired through Python Agent object references, so behavior does not hinge on whether name is configured.

The second parameter is the model to use. The Swarm object determines which OpenAI API service is accessed, so all agents currently share the same model service; however, each agent can choose its own model via the model attribute, which is sensible when agents handle tasks of different complexity.

The third parameter is instructions, which is effectively the system prompt: it is used to generate the system message sent to the large model. It can be either a string or a Python function; a function can read the context_variables mentioned earlier to build a more complex system prompt, as shown below.

def instructions(context_variables):
    user_name = context_variables["user_name"]
    return f"Help the user, {user_name}, do whatever they want."

agent = Agent(
    instructions=instructions
)
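How Swarm resolves this field into a system prompt can be sketched as follows; resolve_instructions is a hypothetical helper name mirroring the callable-versus-string check, not the actual function in the source.

```python
# Minimal sketch of instructions resolution: call it if it is callable,
# otherwise use the string as-is (simplified; resolve_instructions is a
# hypothetical name, not the real Swarm code).
def resolve_instructions(instructions, context_variables):
    if callable(instructions):
        return instructions(context_variables)
    return instructions

def instructions_fn(context_variables):
    user_name = context_variables["user_name"]
    return f"Help the user, {user_name}, do whatever they want."

print(resolve_instructions("You are a helpful agent.", {}))
print(resolve_instructions(instructions_fn, {"user_name": "James"}))
```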

Clearly, the type annotation of the instructions field in the Agent class has a bug and should be changed to the code below. The community has not yet merged the corresponding PR: https://github.com/openai/swarm/pull/44/files

instructions: Union[str, Callable[[dict[str, Any]], str]] = "You are a helpful agent."

The fourth parameter is functions, a list of AgentFunction objects. The AgentFunction type is defined as follows.

AgentFunction = Callable[[], Union[str, "Agent", dict]]

This is just an ordinary Python function type that the framework expects agent functions to conform to. The declared return types (string, "Agent", or dict) actually expose another bug, because function results are dispatched with the following pattern matching.

    def handle_function_result(self, result, debug) -> Result:
        match result:
            case Result() as result:
                return result

            case Agent() as agent:
                return Result(
                    value=json.dumps({"assistant": agent.name}),
                    agent=agent,
                )
            case _:
                try:
                    return Result(value=str(result))
                except Exception as e:
                    error_message = f"Failed to cast response to string: {result}. Make sure agent functions return a string or Result object. Error: {str(e)}"
                    debug_print(debug, error_message)
                    raise TypeError(error_message)

In the handling logic above, if a user function returns a plain string or a dict, it is simply coerced to a string; in most examples, agent functions return Agent objects directly. Since the official OpenAI Swarm repository has closed issue and PR comments, this code fix will be maintained in orchard-swarm: https://github.com/OrchardUniverse/orchard-swarm


Finally, the fifth and sixth parameters, tool_choice and parallel_tool_calls, are OpenAI Chat Completions API parameters that control tool calling. Since the OpenAI Swarm project currently neither uses nor modifies them, they can be ignored.

Conclusion

We have walked through the definitions of the Swarm utility class and the Agent abstraction in the OpenAI Swarm project. With these in hand, understanding how agents are scheduled and how the large model is called becomes straightforward.

If you want to learn more about agent applications, come to the AI+ R&D Digital Summit (AiDD) on November 8-9 in Shenzhen, where I will give a keynote speech on Transforming Dify to Achieve Production-Ready AI Agent Applications, introducing SF Technology's internal modifications to Dify and real application cases.


In the “AI Agents R&D Implementation Practice” forum, we have also invited senior technical product experts from Tencent Cloud, Wang Shengjie, CTO of Future Intelligent, and Zhao Bing, senior engineer at Tencent Cloud, to discuss pilot projects and the possibilities of AI agents in software engineering, focusing on how agents simulate human cognitive understanding of task requirements and integrate large models with traditional software engineering toolchains. Welcome to scan the QR code for on-site communication!


Recommended Activities

The survey on “2024 Software R&D Applications of Large Models”, initiated by the AiDD Organizing Committee in collaboration with multiple communities, has ended. The report will be released on November 8 in the keynote at the AiDD Summit Shenzhen main venue. To download it online, follow the “China Intelligent Kai Ling” WeChat public account and enter “AiDD2024 Survey” to reserve a free download.

