As 2025 begins, many are calling it the year of “Agentic Systems“, predicting that “2025 will see the emergence of true Agents“. With that in mind, here is an overview of the AI Agents technology stack.
Understanding the AI Agents Ecosystem
Although we have seen numerous diagrams classifying the Agent stack and market, we often disagree with how they draw the categories, and they rarely reflect what developers actually use. In the past few months, with advancements in memory, tool usage, secure execution, and deployment, the Agent software ecosystem has developed significantly. So, what should a truly practical “Agent Technology Stack” look like?
The AI Agents technology stack is organized into three key layers: Agent Hosting/Services, Agent Frameworks, and LLM Models and Storage.
From LLM to LLM Agent
In 2022 and 2023, we saw the rise of LLM frameworks and SDKs, such as LangChain (released in October 2022) and LlamaIndex (released in November 2022). At the same time, we saw several “standard” platforms emerge for consuming LLMs via API and for self-hosting LLM inference (e.g., vLLM and Ollama).
In 2024, we observed a sharp shift in interest towards AI “Agents”, and more broadly, towards composite systems. Although the term “Agent” has existed in AI for decades (particularly in reinforcement learning), in the post-ChatGPT era, “Agent” has become a loosely defined term, often referring to an LLM that can emit actions (tool calls) as output and operate in an autonomous setting. The combination of tool usage, autonomous execution, and memory required to transition from LLM to Agent has prompted the development of a new Agent stack.
The uniqueness of the Agent Technology Stack
Compared to basic LLM chatbots, Agents pose a more complex engineering challenge, as they require state management (retaining message/event history, storing long-term memory, executing multiple LLM calls within a single Agent loop) and tool execution (safely executing actions from LLM outputs and returning results).
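The sketch below illustrates that loop in broad strokes; `call_llm` and `execute_tool` are hypothetical placeholders rather than any specific framework’s API.

```python
# A minimal sketch of the Agent loop described above (assumed structure, not a real library):
# the LLM is called repeatedly, actions it outputs are executed by the application,
# and results are appended back into the Agent's state.
def run_agent(user_message: str, state: dict) -> str:
    state["messages"].append({"role": "user", "content": user_message})
    while True:
        reply = call_llm(state["messages"])        # one of possibly many LLM calls per loop
        state["messages"].append(reply)            # persist message/event history
        if reply.get("tool_call") is None:         # no action requested: return the final answer
            return reply["content"]
        result = execute_tool(reply["tool_call"])  # safely execute the LLM's chosen action
        state["messages"].append({"role": "tool", "content": result})  # feed the result into the next call
```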
Therefore, the AI Agent stack looks very different from the standard LLM stack. Let’s start from the model service layer and break down today’s AI Agent stack:
The core of an AI Agent is the LLM. To use an LLM, the model needs to be served through an inference engine, typically running behind a paid API service.
OpenAI and Anthropic lead the closed, API-based model inference providers with their proprietary cutting-edge models. Together.AI, Fireworks, and Groq are popular options offering paid APIs for open-weight models (e.g., Llama 3). Among local inference options, vLLM leads for production-grade, GPU-based serving workloads. SGLang is an emerging project with a similar developer audience. Among hobbyists (“AI enthusiasts”), Ollama and LM Studio are two popular options for running models on your own computer (e.g., an M-series Apple MacBook).
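Because vLLM and Ollama both expose OpenAI-compatible endpoints, the same client code can target a hosted API or a locally served open-weight model. A minimal sketch (the local URL and model name below are illustrative and depend on your deployment):

```python
# Point the standard OpenAI client at a local, OpenAI-compatible server (e.g., Ollama or vLLM).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g., a local Ollama server; vLLM serves a similar endpoint
    api_key="not-needed-locally",          # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama3",  # an open-weight model pulled locally
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```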
Storage is a fundamental building block defining stateful Agents—Agents are defined by persistent states such as their conversation history, memories, and external data sources used for RAG. Vector databases like Chroma, Weaviate, Pinecone, Qdrant, and Milvus are popular for storing Agents’ “external memories”, allowing Agents to leverage data sources and conversation histories that are too large to fit into a context window. Postgres is a traditional database that has existed since the 1980s and now also supports vector searches through pgvector. Companies based on Postgres, such as Neon (serverless Postgres) and Supabase, also provide embedding-based search and storage for Agents.
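As a concrete illustration, here is a minimal sketch of using a vector database as an Agent’s “external memory” (Chroma here; the collection name and documents are illustrative):

```python
import chromadb

client = chromadb.Client()  # in-memory for demonstration; use a persistent client in production
memory = client.create_collection("agent_memory")

# Store past conversation snippets or external documents as the Agent's external memory.
memory.add(
    ids=["m1", "m2"],
    documents=["User prefers concise answers.", "Project deadline is Friday."],
)

# Retrieve the most relevant memories for the current turn and inject them into the context window.
results = memory.query(query_texts=["When is the deadline?"], n_results=1)
print(results["documents"])
```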
A major distinction between standard AI chatbots and AI Agents is the ability of Agents to invoke “tools” (or “functions”). In most cases, the mechanism is for the LLM to generate structured output (e.g., a JSON object) specifying which function to call and what parameters to pass. A common point of confusion is that tool execution is _not_ performed by the LLM provider itself—the LLM only selects which tool to call and what parameters to pass. Agent services that support arbitrary tools or tools with arbitrary parameter inputs must use sandboxes (e.g., Modal, E2B) to ensure safe execution.
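The sketch below shows this division of labor: the provider returns a structured tool call, and the application (not the LLM) executes it. The `get_weather` tool and its schema are illustrative examples, not from the article; in production the execution step would run inside a sandbox.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is configured

def get_weather(city: str) -> str:
    # Placeholder implementation; a real tool might call an external API.
    return f"It is sunny in {city}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# The provider only tells us *which* tool to call and with what arguments...
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# ...executing it is the application's job (ideally inside a sandbox such as Modal or E2B).
print(get_weather(**args))
```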
Agents invoke tools via a JSON schema defined by OpenAI—which means Agents and tools can actually be compatible across different frameworks. The Letta Agent can invoke tools from LangChain, CrewAI, and Composio because they are all defined by the same schema. Thus, for common tools, there is a growing ecosystem of tool providers. Composio is a popular general-purpose tool library that also manages authorization. Browserbase is an example of a dedicated tool for web browsing, while Exa provides a dedicated tool for searching the web. As more Agents are built, we expect the tool ecosystem to grow and to offer new features for Agents, such as authentication and access control.
Agent frameworks coordinate LLM calls and manage Agent states. Different frameworks will have different designs for the following aspects:
- Managing Agent States: Most frameworks introduce the concept of “serializing” state, allowing Agents to save serialized state (e.g., JSON, bytes) to a file and load it back into the same script at a later time; this includes state such as conversation history, Agent memory, and execution phase (a minimal sketch of the serialize/load pattern follows this list). In Letta, all state is backed by a database (e.g., a message table, an Agent state table, a memory block table), and there is no concept of “serialization” because Agent state is always persisted. This allows easy querying of Agent state (e.g., finding past information by date). How state is represented and managed determines how an Agent application will scale with longer conversation histories or more Agents, as well as how flexibly state can be accessed or modified.
- Context Window Structure of Agents: Every time an LLM is called, the framework will “compile” the Agent’s state into the context window. Different frameworks will place data into the context window in different ways (e.g., instructions, message buffers, etc.), which can affect performance. We recommend choosing a framework that makes the context window transparent, as this ultimately is how you can control your Agent’s behavior.
- Cross-Agent Communication (i.e., Multi-Agent): LlamaIndex enables Agent communication through message queues, while CrewAI and AutoGen have explicit abstractions for multi-Agent setups. Letta and LangGraph both support direct calls between Agents, allowing both centralized (through a supervisor Agent) and distributed communication across Agents. Most frameworks now support both multi-Agent and single-Agent setups, as a well-designed single-Agent system should make cross-Agent collaboration easy to implement.
- Memory Approaches: The fundamental limitation of LLMs is their limited context window, which makes memory management necessary over time. Some frameworks have built-in memory management, while others expect developers to manage memory themselves. CrewAI and AutoGen rely entirely on RAG-based memory, while phidata and Letta use additional techniques such as self-editing memory (from MemGPT) and recursive summarization. Letta Agents come automatically equipped with a set of memory management tools, allowing Agents to search previous messages by text or date, write memories, and edit their own context windows (you can read more about this here).
- Supporting Open Models: Model providers actually do a lot of behind-the-scenes work to get LLMs to generate text in the right format (e.g., for tool calls)—for example, resampling LLM outputs when they do not produce valid tool parameters, or adding hints to the prompt (e.g., “Please output JSON”). Supporting open models requires a framework to handle these challenges itself, so some frameworks limit support to the major model providers.
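As referenced in the “Managing Agent States” item above, here is a minimal sketch of the serialize/load pattern. The field names are illustrative rather than any specific framework’s schema; in Letta, this state would instead live in database tables, with no explicit serialization step.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentState:
    messages: list = field(default_factory=list)  # conversation/event history
    memory: dict = field(default_factory=dict)    # long-term memory blocks
    step: int = 0                                 # execution phase

state = AgentState(
    messages=[{"role": "user", "content": "hi"}],
    memory={"persona": "a helpful assistant"},
    step=3,
)

# Save the serialized state (JSON) to a file between runs...
with open("agent_state.json", "w") as f:
    json.dump(asdict(state), f)

# ...and load it back later to resume the same Agent.
with open("agent_state.json") as f:
    state = AgentState(**json.load(f))
```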
When building Agents today, the right framework choice depends on your application, such as whether you are building a conversational Agent or a workflow, whether you want to run the Agent in a notebook or as a service, and your requirements for supporting open-weight models.
We expect the main differences between frameworks to emerge in their deployment workflows, with state/memory management and design choices for tool execution becoming increasingly important.
Agent Hosting and Agent Services
Most Agent frameworks today are designed for Agents that do not exist outside the Python scripts or Jupyter notebooks they are written in. We believe the future of Agents is to treat them as a _service_ that is deployed to local or cloud infrastructure and can be accessed via REST APIs. Just as OpenAI’s ChatCompletion API has become the industry standard for interacting with LLM services, we expect there will eventually be a winner for the Agent API. But there is not one yet…
Deploying Agents as a service is significantly more complex than deploying LLMs as a service because of state management and secure tool execution. Tools, their dependencies, and their environment requirements need to be stored explicitly in the database, because the service must recreate the environment that runs them (this is not an issue when your tools and Agents run in the same script). Applications may need to run millions of Agents, each accumulating an ever-growing conversation history. When transitioning from prototype to production, Agent state inevitably must go through a data normalization process, and Agent interactions must be defined by REST APIs. Today, this process is often accomplished by developers writing their own FastAPI and database code, but we expect this functionality to become more embedded in frameworks as Agents mature.
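A minimal sketch of that hand-rolled pattern: a REST endpoint that loads an Agent’s persisted state, runs one step, and saves it back. The `load_state`, `run_agent`, and `save_state` helpers are hypothetical placeholders for your own database and Agent-loop code.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MessageIn(BaseModel):
    content: str

@app.post("/agents/{agent_id}/messages")
def send_message(agent_id: str, message: MessageIn):
    state = load_state(agent_id)               # fetch the Agent's persisted state from the database
    reply = run_agent(message.content, state)  # the Agent loop: LLM calls plus tool execution
    save_state(agent_id, state)                # persist the updated history and memory
    return {"reply": reply}
```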
The Agent technology stack is still very early, and we are excited about how the ecosystem will expand and evolve. Do you have any additional insights on the future development of the Agent technology stack?
https://www.letta.com/blog/ai-agents-stack