Understanding Agent Orchestration with OpenAI in 300 Lines of Code

When using large language models, stable performance usually requires only a good prompt and the right tools. Things get trickier, however, when an application has to juggle many distinct processes. To address this, OpenAI published a post on its official blog titled “Orchestrating Agents: Routines and Handoffs”, which introduces the concepts of routines and handoffs. The code demonstrated in that post was later packaged into the open-source swarm library, which shows that a framework for agent applications can be remarkably simple; with just a few extra lines it can even work with Chinese domestic large models.

Reading OpenAI’s article “Orchestrating Agents: Routines and Handoffs” and its example code makes the core mechanics of so-called agent orchestration easy to grasp, and reveals that swarm, as an agent framework, is really just about 300 lines of code:

https://cookbook.openai.com/examples/orchestrating_agents

Take the first core concept, the routine: it can be defined as a list of natural-language instructions (represented as a system prompt) together with the tools required to carry those instructions out. In essence, a routine is a single chat-with-tools call, which means the heart of an agent is the large model’s function-calling capability, and the instability of the various agent frameworks ultimately traces back to that same function-calling ability:
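The idea can be sketched as a single chat-with-tools call. Everything below is a hedged illustration: the model name, the refund_item tool, and the message contents are placeholders of my own, not code from the post.

```python
# A minimal sketch of a routine: a system prompt carrying the instructions,
# plus the JSON tool definitions, sent together in one chat-with-tools call.
# The tool, order id, and model name are illustrative placeholders.
system_prompt = (
    "You are a customer support agent. "
    "1. Ask for the order id. 2. Offer a refund if the order is eligible."
)

tools = [{
    "type": "function",
    "function": {
        "name": "refund_item",
        "description": "Issue a refund for an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "I want a refund for order 42."},
]

# One call to the model with the tools attached (uncomment with a configured client):
# response = client.chat.completions.create(
#     model="gpt-4o-mini", messages=messages, tools=tools)
```

The entire “agent” at this point is nothing more than the system prompt plus the tools list; everything else is plumbing around this one call.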


The diagram comes from Anthropic’s recently published summary on building effective agents, a piece full of practical insights.

When doing function calling, the most tedious part is building the tool-definition list that gets passed to the large model as a parameter. OpenAI’s post demonstrates a neat Python trick: take any Python function and generate from it, directly, the JSON description required for the tools parameter of a model call:

import inspect
def function_to_schema(func) -> dict:
    type_map = {
        str: "string",
        int: "integer",
        float: "number",
        bool: "boolean",
        list: "array",
        dict: "object",
        type(None): "null",
    }
    try:
        signature = inspect.signature(func)
    except ValueError as e:
        raise ValueError(
            f"Failed to get signature for function {func.__name__}: {str(e)}"
        )
    parameters = {}
    for param in signature.parameters.values():
        # Fall back to "string" for unannotated or unknown parameter types
        # (dict.get never raises, so no exception handling is needed here)
        param_type = type_map.get(param.annotation, "string")
        parameters[param.name] = {"type": param_type}
    # Parameters without a default value are marked as required
    required = [
        param.name
        for param in signature.parameters.values()
        if param.default is inspect._empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": parameters,
                "required": required,
            },
        },
    }

The function_to_schema function from the blog post has been further polished in the swarm library into a utility function, function_to_json:

from swarm import util

# Define a tool function
def get_weather(location) -> str:
    return "{'temp': 67, 'unit': 'F'}"

# Convert it to the JSON format required for the function-calling parameters
f1 = util.function_to_json(get_weather)
print(f1)
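For reference, running the cookbook’s function_to_schema (and likewise swarm’s function_to_json) on get_weather should yield roughly the following, formatted here for readability. Note how the unannotated location parameter falls back to "string", and the missing docstring leaves the description empty:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {"type": "string"}
      },
      "required": ["location"]
    }
  }
}
```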

As for the second concept, handoffs: a handoff is defined as a routine transferring the active conversation to another agent. The implementation is refreshingly direct: define “transfer to agent XXX” as a function transfer_to_XXX, include it among the tools passed to the LLM, and let the model itself decide whether a transfer is needed. Simple and effective.
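That mechanism can be sketched in a few lines. This is a hedged illustration, not swarm’s actual API: the Agent class, the agent names, and transfer_to_refunds are made up for the example.

```python
# A sketch of a handoff: "transfer to agent XXX" is just another tool function.
# The Agent class and agent names here are illustrative, not swarm's real API.
class Agent:
    def __init__(self, name, instructions, tools):
        self.name = name
        self.instructions = instructions
        self.tools = tools

refunds_agent = Agent("Refunds Agent", "Handle refund requests.", [])

def transfer_to_refunds():
    """Transfer the conversation to the refunds agent."""
    return refunds_agent

triage_agent = Agent(
    "Triage Agent",
    "Route the user to the most suitable agent.",
    [transfer_to_refunds],
)

# In the orchestration loop: when the model calls a transfer_to_XXX tool and
# the tool returns an Agent, that agent simply becomes the active one.
active_agent = triage_agent
result = active_agent.tools[0]()   # pretend the model chose this tool call
if isinstance(result, Agent):
    active_agent = result
print(active_agent.name)  # → Refunds Agent
```

The elegance is that the model never needs a special “routing” mechanism: handoffs ride on the same function-calling machinery as every other tool.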

After reading the article and skimming swarm’s source code, it is tempting to conclude that for building an LLM-native application you can set aside the hotly discussed agent frameworks and simply take the direct approach.


To generate similar images, you can feed an image to a tool such as Zhipu Qingyan to have it described, then lightly edit the description into a prompt for text-to-image generation.
