Comprehensive Analysis of AI Agents Technology Stack in 2024

Letta is a company focused on AI agents, providing open-source tools and cloud services to help developers build, deploy, and manage AI agents with memory and tool-calling capabilities.

Original text: https://www.letta.com/blog/ai-agents-stack

(Translated by ChatGPT)

Understanding the Ecosystem of AI Agents

While there are many maps of the agent technology stack and market landscape available, we often disagree with how they categorize things, and we find they rarely reflect the tools and technologies developers actually use. In recent months, significant advances in memory mechanisms, tool usage, secure execution, and deployment have driven notable developments in the agent software ecosystem. So, drawing on more than a year of open-source AI development experience and over seven years of AI research, we decided to share our own “agent technology stack”.

The AI agents technology stack at the end of 2024 is divided into three key layers: agent hosting/serving, agent frameworks, and LLM models & storage.
From LLMs to LLM agents
In 2022 and 2023, we witnessed the rise of LLM frameworks and SDKs, such as LangChain (released in October 2022) and LlamaIndex (released in November 2022). At the same time, several “standard” platforms emerged for consuming LLMs through APIs, as well as self-deployed LLM inference tools (like vLLM and Ollama).
By 2024, market interest had shifted significantly toward AI “agents” and, more broadly, compound systems. Although the term “agent” has existed in the AI field for decades (especially in reinforcement learning), in the post-ChatGPT era it has become a loosely defined concept, typically referring to an LLM that is prompted to output actions (tool calls) and runs in an autonomous setting.
The combination of tool usage, autonomous execution, and memory mechanisms required to transition from LLM to agent has driven the development of a new agents technology stack.
The Uniqueness of the Agents Technology Stack
Compared to basic LLM chatbots, agents pose far greater engineering challenges, because they require both state management (retaining message/event history, storing long-term memory, executing multiple LLM calls in an agentic loop) and tool execution (safely running actions output by the LLM and returning the results).
Thus, the technology stack for AI agents is significantly different from that of traditional LLMs. Next, we will dissect today’s AI agents technology stack, starting from the lowest layer, model serving:
Model Serving
At the core of AI agents is the LLM. To use an LLM, the model needs to be served through an inference engine, which typically runs behind a paid API service.

Among closed-source, API-based inference providers, OpenAI and Anthropic lead with their proprietary frontier models. Together.AI, Fireworks, and Groq are popular options that serve open-weight models (like Llama 3) behind paid APIs. Among local inference engines, vLLM leads for production-grade, GPU-based serving workloads, while SGLang is an emerging project targeting a similar developer community.

Among hobbyists (“AI enthusiasts”), Ollama and LM Studio are two popular choices for running models locally, particularly on personal computers (such as M-series Apple MacBooks).
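
Many of these options expose an OpenAI-compatible API, so switching providers is often just a matter of changing a base URL. Here is a minimal sketch, assuming a local Ollama server on its default port with the llama3 model already pulled:

```python
# A minimal sketch: chatting with a locally served model through the
# OpenAI-compatible endpoint that Ollama exposes on its default port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Name three uses for an AI agent."}],
)
print(response.choices[0].message.content)
```

Pointing base_url at a vLLM server or a hosted provider's endpoint works the same way, which is what makes these serving options largely interchangeable.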

Storage

Comprehensive Analysis of AI Agents Technology Stack in 2024

Storage is one of the foundational building blocks of stateful agents. Agents are defined by their persistent state, such as conversation history, memories, and the external data sources they use for RAG (retrieval-augmented generation).

Vector databases such as Chroma, Weaviate, Pinecone, Qdrant, and Milvus are very popular for storing an agent’s “external memory,” allowing agents to draw on data sources and conversation histories far too large to fit into the context window. In addition, Postgres, a traditional database that has been around since the 1980s, now supports vector search through the pgvector extension.

Postgres-based companies such as Neon (serverless Postgres) and Supabase, which offer embedding-based search and storage for agents, are also important players in this space.
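
As a brief illustration, here is a minimal sketch of agent “external memory” using Chroma; the collection name and stored snippets are illustrative:

```python
# A minimal sketch of "external memory" for an agent using Chroma.
import chromadb

client = chromadb.Client()  # in-memory client; use a persistent client in production
memory = client.create_collection(name="agent_memory")

# Store facts and conversation snippets that no longer fit in the context window.
memory.add(
    ids=["fact-1", "fact-2"],
    documents=[
        "The user prefers replies in French.",
        "The user's project deadline is March 15.",
    ],
)

# Before the next LLM call, retrieve the most relevant memories.
results = memory.query(query_texts=["Which language should I reply in?"], n_results=1)
print(results["documents"][0])
```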

Tools & Libraries

Comprehensive Analysis of AI Agents Technology Stack in 2024

One of the main differences between standard AI chatbots and AI agents is that agents can call “tools” (or “functions”). In most cases, this calling mechanism is implemented by the LLM generating structured outputs (such as JSON objects) that specify which function to call and its parameters.

A common misconception about agent tool execution is that the LLM provider executes the tools itself. In fact, the LLM’s responsibility ends at choosing which tool to call and what parameters to pass; the actual execution happens on the agent side. Agent services that support arbitrary tools or arbitrary parameters must run them in sandboxed environments (such as Modal or E2B) to ensure safe execution.

Tool calls across agent frameworks generally follow the JSON schema format defined by OpenAI, which means agents and tools from different frameworks can be compatible with one another. For example, Letta agents can call tools from LangChain, CrewAI, and Composio because they all adhere to the same schema. As a result, an ecosystem of providers serving common tools is growing rapidly.
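
As a concrete illustration, here is a minimal sketch of this pattern using the OpenAI Python client; the get_weather tool is hypothetical:

```python
# A sketch of OpenAI-style tool calling; get_weather is a hypothetical tool.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# The LLM only *selects* the tool and its arguments; our code executes it.
call = response.choices[0].message.tool_calls[0]  # assuming the model chose to call the tool
args = json.loads(call.function.arguments)
print(call.function.name, args)  # e.g. get_weather {'city': 'Tokyo'}
```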

Composio is a popular general-purpose tool library that also handles authorization management; Browserbase provides a dedicated tool for web browsing; and Exa offers a tool focused on web search. As more agents are built, we expect the tool ecosystem to keep expanding and to introduce new agent-oriented capabilities, such as authentication and access control services.


Agent Framework


Agent Framework: Orchestrating LLM Calls and Managing Agent State

The agent framework is responsible for coordinating LLM calls and managing the state of the agent. Different frameworks are designed differently in several aspects:
1. Agent State Management
Most frameworks introduce some notion of “state serialization,” allowing an agent’s state (such as its conversation history, memories, and execution stage) to be serialized to a file (as JSON or bytes) and reloaded later into the same script; a minimal sketch of this pattern follows this list. In Letta, all state is backed by a database (message tables, agent state tables, memory block tables), so there is no concept of “serialization”: state is always persisted. This makes it easy to query agent state, for example, to look up past messages by date.
How state is represented and managed determines not only whether an agent application can scale as conversation histories grow and the number of agents increases, but also how flexibly that state can be accessed and modified.
2. Structure of Agent Context Window
Each time an LLM is called, the framework compiles the agent’s state into the context window. Different frameworks organize data (such as instructions, message buffers, etc.) in the context window differently, affecting performance. We recommend choosing a framework that makes the context window transparent, as this directly impacts your control over agent behavior.
3. Communication Between Agents (Multi-Agent Systems)
LlamaIndex implements communication between agents through message queues, while CrewAI and AutoGen provide explicit abstractions for multi-agent systems. Letta and LangGraph allow agents to call each other directly, supporting either centralized (via a supervisor agent) or distributed communication. Most frameworks now support both single-agent and multi-agent setups, since a well-designed single-agent system should scale easily to multi-agent collaboration.
4. Methods of Memory Management
Because of the LLM’s limited context window, memory must be managed with dedicated techniques. Some frameworks have built-in memory management, while others expect developers to handle it themselves. CrewAI and AutoGen rely solely on RAG-based memory, while phidata and Letta use additional techniques such as self-editing memory (from MemGPT) and recursive summarization (sketched after this list). Letta agents also come with a set of memory management tools that let agents search historical messages by text or date, write memories, and edit their own context windows.
5. Support for Open Models
Model providers often do a lot of work behind the scenes to ensure the LLM generates text in the correct format (such as tool calls), for example by resampling the output or adding hints to the prompt (like “please output in JSON format”) when the model produces malformed tool arguments. Supporting open models means the framework has to handle these challenges itself, so some frameworks limit support to the major model providers.
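
To make the state-serialization idea from point 1 concrete, here is a minimal, framework-agnostic sketch; the AgentState dataclass and its fields are illustrative assumptions, not any particular framework’s API:

```python
# A minimal, framework-agnostic sketch of agent state serialization.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentState:
    messages: list = field(default_factory=list)  # conversation history
    memory: dict = field(default_factory=dict)    # long-term memory blocks
    step: int = 0                                 # execution stage

def save_state(state: AgentState, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(state), f)

def load_state(path: str) -> AgentState:
    with open(path) as f:
        return AgentState(**json.load(f))

# Serialize after a run, then reload later in a new script.
state = AgentState(messages=[{"role": "user", "content": "hi"}], step=1)
save_state(state, "agent_state.json")
restored = load_state("agent_state.json")
assert restored.step == 1
```

A database-backed design like Letta’s replaces these files with tables, which is what makes state queryable rather than merely reloadable.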
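
And as a rough illustration of the recursive summarization mentioned in point 4 (with summarize() as a stand-in for a real LLM call):

```python
# A sketch of recursive summarization for memory management.
def summarize(text: str) -> str:
    return text[:200] + "..."  # placeholder; a real agent would call the LLM here

def compact_history(messages: list, keep_recent: int = 10) -> list:
    """Fold messages that overflow the context budget into a running summary."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(older))
    return [f"[Summary of earlier conversation] {summary}"] + recent

# Applied every turn, the summary itself gets re-summarized as history grows,
# which is what makes the approach recursive.
```
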
How to Choose the Right Framework
When building an agent, the choice of framework should depend on the specific application, such as whether you are building a conversational agent or a workflow agent, whether you want to run the agent in a notebook or as a service, and the requirements for support of open-weight models.
We expect that differences in how frameworks handle deployment will become a major point of competition, since design choices around state/memory management and tool execution have an outsized impact on real-world applications.
Agent Hosting and Agent Services
The Current State and Future Trends of Most Agent Frameworks

Currently, most agent frameworks design agents to exist only within the Python script or Jupyter notebook in which they were written. We believe the future of agents is as services, deployed to on-premises or cloud infrastructure and accessed via REST APIs. Just as OpenAI’s ChatCompletion API became the industry standard for interacting with LLM services, we anticipate that a unified Agents API standard will eventually emerge; that “winner” has yet to appear.

Deploying agents as services is much more complex than deploying LLMs as services, primarily because of state management and secure tool execution. Tools, along with their required dependencies and environments, must be stored explicitly in a database, since the service needs to recreate the environment in which each tool runs (this is not a problem when tools run in the same script as the agent).

Additionally, an application may need to run millions of agents, each with an ever-growing conversation history. Moving from prototype to production inevitably means normalizing agent state into a database schema and defining agent interactions through a REST API. Today, this step is usually handled by developers hand-writing FastAPI and database code, but we expect these capabilities to become more deeply embedded in frameworks as agents mature.
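
To illustrate the kind of service layer this implies, here is a minimal, hedged sketch of an agent exposed through a REST API with FastAPI. The AGENTS store, the run_agent_step stub, and the route shape are all illustrative assumptions, not any emerging standard:

```python
# A minimal sketch of an agent exposed as a REST service with FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
AGENTS: dict = {}  # agent_id -> message history (stand-in for a real database)

class MessageIn(BaseModel):
    content: str

def run_agent_step(history: list, user_message: str) -> str:
    # Stand-in for the real agentic loop: LLM call(s), tool execution, memory updates.
    return f"You said: {user_message}"

@app.post("/agents/{agent_id}/messages")
def send_message(agent_id: str, msg: MessageIn):
    history = AGENTS.setdefault(agent_id, [])
    history.append({"role": "user", "content": msg.content})
    reply = run_agent_step(history, msg.content)
    history.append({"role": "assistant", "content": reply})
    return {"reply": reply}
```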

Conclusion

The agent technology stack is still in a very early stage of development, but we are excited about the future expansion and evolution of its ecosystem.
