Author: Chen Dihao — Senior Engineer at SF Technology AI Platform
Responsible for AI and large-model infrastructure at SF Group. Previously a platform architect at Fourth Paradigm and a PMC member of the OpenMLDB project, as well as an architect of Xiaomi's cloud deep learning platform and head of the storage and container team at USTC. Active in open-source communities around distributed systems and machine learning, and a contributor to open-source projects such as HBase, OpenStack, TensorFlow, and TVM.
Introduction
OpenAI Swarm is an open-source multi-agent scheduling framework developed by OpenAI. The entire implementation is less than 500 lines of Python code, yet it supports multi-agent scheduling, interactive multi-turn dialogue, and other functionality.
This article analyzes the conceptual abstractions and implementation principles of OpenAI Swarm from the source code, tracing the interactions among multiple agents at the lowest level of the code, and discussing how the scheduling framework itself can be extended with more advanced functionality.
User Interface
The OpenAI Swarm project provides multiple usage examples, mainly covering single-turn dialogue and interactive multi-turn dialogue. The single-turn dialogue introduces the Swarm and Agent abstractions; see the code example below.
```python
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_agent_b():
    return agent_b

agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in Haikus.",
)

response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to agent B."}],
)

print(response.messages[-1]["content"])
```
Executing the above code yields text output from a large model. Enabling debug logging shows that the code actually calls the OpenAI API multiple times and returns the final result after scheduling across multiple agents. The logic of Swarm and Agent is introduced in detail later.
The interactive multi-turn interface is wrapped in a Python function, and most examples provide an entry point similar to the one below.
```python
from swarm.repl import run_demo_loop
from agents import triage_agent

if __name__ == "__main__":
    run_demo_loop(triage_agent)
```
Executing the above code first waits for user input; after the agents finish their computations, it waits for the next user input. The logic of the run_demo_loop function is relatively simple, so it is introduced first.
Implementation Principles of Interactive Multi-Turn Dialogue
OpenAI Swarm provides a simple run_demo_loop function that offers a command-line interactive interface, letting users enter input repeatedly and interact with multiple agents conveniently. Its implementation is also very simple, as shown in the code below.
```python
def run_demo_loop(
    starting_agent, context_variables=None, stream=False, debug=False
) -> None:
    client = Swarm()
    print("Starting Swarm CLI ")

    messages = []
    agent = starting_agent

    while True:
        user_input = input("\033[90mUser\033[0m: ")
        messages.append({"role": "user", "content": user_input})

        response = client.run(
            agent=agent,
            messages=messages,
            context_variables=context_variables or {},
            stream=stream,
            debug=debug,
        )

        if stream:
            response = process_and_print_streaming_response(response)
        else:
            pretty_print_messages(response.messages)

        messages.extend(response.messages)
        agent = response.agent
```
The function enters an infinite while True loop, reads the user's command-line input through Python's input function, then calls the Swarm object's run interface to start the first agent, passing the user's input string as the message content. It provides print helpers for both the streaming and non-streaming cases, printing the returned agent name and agent output. Note that a single user question may involve multiple agent function calls or multiple agent outputs, but only the final agent's information is printed, after which the loop returns to waiting for user input.
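The key detail in this loop is how the conversation history accumulates: each user message is appended before the call, and everything the agents produced is extended back into the shared messages list. A minimal, self-contained sketch of that pattern, using a hypothetical FakeResponse class and fake_run function as stand-ins so no real Swarm client or API call is needed:

```python
# Stand-in mirroring the two Response fields that run_demo_loop reads.
class FakeResponse:
    def __init__(self, messages, agent):
        self.messages = messages
        self.agent = agent

def fake_run(agent, messages):
    # Pretend the agent answers the latest user message.
    reply = {"role": "assistant", "sender": agent,
             "content": f"echo: {messages[-1]['content']}"}
    return FakeResponse([reply], agent)

# Same accumulation pattern as run_demo_loop, without input()/print().
messages = []
agent = "Agent A"
for user_input in ["hello", "bye"]:
    messages.append({"role": "user", "content": user_input})
    response = fake_run(agent, messages)
    messages.extend(response.messages)
    agent = response.agent

print(len(messages))  # 4: two user turns, two assistant replies
```

Because the full history is resent on every turn, each agent call sees all earlier user and assistant messages.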
To understand the underlying implementation details, we will now delve into the lower-level implementation of Swarm and Agent interfaces.
Domain Abstraction of Swarm
Whether in single-turn or interactive multi-turn dialogue, all agent scheduling goes through the Swarm class and its run function. We therefore first look at the definition of the Swarm class; simplifying the implementation yields the following class definition.
```python
class Swarm:
    def __init__(self, client=None):
        self.client = client

    def run() -> Response: ...
    def get_chat_completion() -> ChatCompletionMessage: ...
    def handle_function_result() -> Result: ...
    def handle_tool_calls() -> Response: ...
    def run_and_stream() -> None: ...
```
Swarm is a utility class with a single member variable: the client used to access the OpenAI API for large models. It exposes the run function to drive the agents; the remaining functions are internal helpers used by run. Since the logic of run involves the internals of the Agent object, we first focus on the abstractions unrelated to Agent.
The run function returns a Response object (in streaming mode it yields the Response object instead). The definition of this object is as follows.
```python
class Response(BaseModel):
    messages: List = []
    agent: Optional[Agent] = None
    context_variables: dict = {}
```
This is the result users obtain from the Swarm framework. The first field, messages, contains the historical output of all agents; an example follows. It includes not only the large model's output but also information about whether functions and tools were called.
```python
[{'content': 'The capital of China is Beijing.', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': None, 'sender': 'agent'}]
```
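Since each entry is a plain dict, the history can be post-processed with ordinary list operations. A small sketch (the sender names here are made up for illustration; the field names match the example above):

```python
messages = [
    {"role": "assistant", "sender": "Triage Agent",
     "content": "Transferring you.", "tool_calls": None},
    {"role": "assistant", "sender": "Sales Agent",
     "content": "The capital of China is Beijing.", "tool_calls": None},
]

# Final assistant reply, as printed by the single-turn example.
final_reply = messages[-1]["content"]

# Which agents spoke, in order.
senders = [m["sender"] for m in messages if m["role"] == "assistant"]

print(final_reply)  # The capital of China is Beijing.
print(senders)      # ['Triage Agent', 'Sales Agent']
```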
The second field is agent. If it is empty, no further agent needs to be called. If it is not empty, the run function contains logic that checks whether another agent must be invoked, in which case the loop continues.
```python
while len(history) - init_len < max_turns and active_agent:
    ...
    partial_response = self.handle_tool_calls(
        message.tool_calls, active_agent.functions, context_variables, debug
    )
    ...
    if partial_response.agent:
        active_agent = partial_response.agent
    ...
```
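Stripped of the API calls, this control flow can be illustrated with stand-in objects (a hypothetical StubResponse class and a handoff lookup table, not the real Swarm internals): the loop runs while there is an active_agent and the turn budget is not exhausted, and a tool call that returns an agent swaps active_agent for the next iteration.

```python
class StubResponse:
    def __init__(self, agent=None):
        self.agent = agent

def simulate_run(start_agent, handoffs, max_turns=10):
    """handoffs maps an agent name to the agent it transfers to (or None)."""
    history = []
    active_agent = start_agent
    while len(history) < max_turns and active_agent:
        history.append(active_agent)
        # In Swarm this comes from handle_tool_calls(); here a lookup table.
        partial_response = StubResponse(agent=handoffs.get(active_agent))
        if partial_response.agent:
            active_agent = partial_response.agent
        else:
            active_agent = None  # no handoff: the turn ends

    return history

print(simulate_run("Agent A", {"Agent A": "Agent B"}))
# ['Agent A', 'Agent B']
```

The max_turns bound also guards against agents that hand off to each other forever, which is why the real loop checks `len(history) - init_len < max_turns`.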
The third field is the global context_variables, which is passed to every agent and every agent function. Users can put arbitrary entries into this global map so that functions can access more global information; this is elaborated in the Agent section below.
Domain Abstraction of Agent
Before walking through the logic of Swarm's run function, we first introduce the conceptual abstraction of the Agent. The definition of the Agent class is also quite simple, as shown in the code below.
```python
class Agent(BaseModel):
    name: str = "Agent"
    model: str = "gpt-4o"
    instructions: Union[str, Callable[[], str]] = "You are a helpful agent."
    functions: List[AgentFunction] = []
    tool_choice: str = None
    parallel_tool_calls: bool = True
```
First, each agent has a string name, used mainly to populate the sender field when calling the OpenAI API large model, so that later output can be attributed to the agent that produced it. The actual dependencies among agents are wired through Python Agent class objects and do not depend on whether a name is configured.
The second parameter is the large model to use. The Swarm object fixes which OpenAI API service is used, so all agents currently must share the same large-model service; however, each agent can choose its own model via the model attribute, which is reasonable for agents handling tasks of varying complexity.
The third parameter is instructions, which could arguably be renamed to system prompt: it is used to generate the system prompt when requesting the large model. It accepts not only strings but also Python functions that can read the previously mentioned context_variables to generate a more complex system prompt, as shown below.
```python
def instructions(context_variables):
    user_name = context_variables["user_name"]
    return f"Help the user, {user_name}, do whatever they want."

agent = Agent(
    instructions=instructions
)
```
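Swarm resolves this string-or-function union when building the system message for a chat completion request: a callable is invoked with the shared context, while a plain string is used as-is. A minimal sketch of that resolution (the helper name resolve_instructions is hypothetical):

```python
def resolve_instructions(instructions, context_variables):
    # Call the function with the shared context, or use the string directly.
    if callable(instructions):
        return instructions(context_variables)
    return instructions

def dynamic(context_variables):
    return f"Help the user, {context_variables['user_name']}."

print(resolve_instructions("You are a helpful agent.", {}))
print(resolve_instructions(dynamic, {"user_name": "Alice"}))
# You are a helpful agent.
# Help the user, Alice.
```

This is why the type annotation matters: the callable form is only useful if its declared signature actually accepts the context_variables dict.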
Clearly, the type annotation of the instructions field in the Agent class has a bug; it should be changed to the code below. The community has not yet merged the corresponding PR fix: https://github.com/openai/swarm/pull/44/files
```python
instructions: Union[str, Callable[[dict[str, Any]], str]] = "You are a helpful agent."
```
The fourth parameter is functions, a list of AgentFunction objects. The type definition of AgentFunction is as follows.
```python
AgentFunction = Callable[[], Union[str, "Agent", dict]]
```
This is just a standard Python function type: the kind of function the OpenAI Swarm framework expects to be passed to an agent. In fact, the declared return types of string, Agent, or dict also reflect a bug, because the result of an agent function is handled with the following pattern matching.
```python
def handle_function_result(self, result, debug) -> Result:
    match result:
        case Result() as result:
            return result

        case Agent() as agent:
            return Result(
                value=json.dumps({"assistant": agent.name}),
                agent=agent,
            )
        case _:
            try:
                return Result(value=str(result))
            except Exception as e:
                error_message = f"Failed to cast response to string: {result}. Make sure agent functions return a string or Result object. Error: {str(e)}"
                debug_print(debug, error_message)
                raise TypeError(error_message)
```
In the handling logic above, a returned string or dict is simply coerced to a string, while a returned Agent triggers a handoff; the preferred Result type is not even listed in the AgentFunction annotation. In most examples, agent functions directly return Agent objects. Since the official OpenAI Swarm project has closed issues and PR comments, fixes to this part of the code will be maintained in orchard-swarm: https://github.com/OrchardUniverse/orchard-swarm

Finally, the fifth and sixth parameters, tool_choice and parallel_tool_calls, are OpenAI Chat Completions API parameters that control tool calling. The OpenAI Swarm project currently neither uses nor modifies them, so they can be ignored.
Conclusion
We have walked through the definitions of the Swarm utility class and the Agent abstraction in the OpenAI Swarm project. With these in hand, understanding how agents are scheduled and how large models are called becomes straightforward.
If you want to learn more about agent applications, come to the AI+ R&D Digital Summit (AiDD) on November 8-9 in Shenzhen, where I will give a keynote titled "Transforming Dify to Achieve Production-Ready AI Agent Applications," introducing SF Technology's internal modifications to Dify and real application cases.