Understanding Agentic RAG: AI-Driven Retrieval Augmentation

Despite Retrieval-Augmented Generation (RAG) dominating in 2023, agentic workflows bring significant advancements in 2024. The application of AI agents opens up new possibilities for developing more powerful, robust, and versatile applications driven by large language models (LLMs). One possibility is to leverage AI agents in agentic RAG pipelines to enhance RAG processes.

This article will introduce you to the concept of agentic RAG, its implementation methods, and its advantages and disadvantages.

# Agentic RAG

Agentic RAG describes an AI agent-based implementation of RAG. Before diving deeper, let’s quickly review the basic concepts of RAG and AI agents.

## What is Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique for building LLM-driven applications. It utilizes external knowledge sources to provide relevant context for LLMs and reduce hallucinations.

A simple RAG pipeline consists of a retrieval component (usually made up of an embedding model and a vector database) and a generation component (LLM). During inference, user queries are used to perform similarity searches on indexed documents to retrieve the most relevant documents and provide additional context to the LLM.

Understanding Agentic RAG: AI-Driven Retrieval Augmentation

Typical RAG applications have two significant limitations:

A simple RAG pipeline considers only one external knowledge source. However, some solutions may require two external knowledge sources, while others may need external tools and APIs, such as web searches.
They are one-time solutions, meaning context is only retrieved once. There is no reasoning or validation of the quality of the retrieved context.

## What is an Agent in AI Systems?

With the popularity of LLMs, new paradigms of AI agents and multi-agent systems have emerged. AI agents are LLMs with roles and tasks that can access intrinsic or external tools. The reasoning capabilities of LLMs help agents plan the necessary steps and take actions to complete the tasks at hand.

Thus, the core components of AI agents are:

LLM (with roles and tasks)
Memory (short-term and long-term)
Planning (e.g., reflection, self-criticism, query routing, etc.)
Tools (e.g., calculators, web searches, etc.)

Understanding Agentic RAG: AI-Driven Retrieval Augmentation

A popular framework is the ReAct framework. ReAct agents can handle sequential multi-part queries by combining routing, query planning, and tool usage into a single entity while maintaining state (in memory).

ReAct = Reason + Act (LLM)

The process involves the following steps:

Think: After receiving a user query, the agent infers the next action to take
Action: The agent decides on an action and executes it (e.g., using a tool)
Observe: The agent observes the feedback from the action

This process repeats continuously until the agent completes the task and responds to the user.

## What is Agentic RAG

Agentic RAG describes an AI agent-based implementation of RAG. Specifically, it integrates AI agents into the RAG pipeline to coordinate its components and perform additional operations beyond simple information retrieval and generation, overcoming the limitations of non-agentic pipelines.

Agentic RAG describes an AI agent-based implementation of RAG.

### How Agentic RAG Works

While agents can be incorporated into various stages of the RAG pipeline, agentic RAG most commonly refers to the use of agents in the retrieval (Retrieval) component.

Specifically, the retrieval component becomes agentic by using retrieval agents that can access different retrieval tools, such as:

Vector search engines (also known as query engines), which perform vector searches through vector indexing (as in a typical RAG pipeline)
Web searches
Calculators
Any API that can be programmatically accessed, such as email or chat programs
And so on.

Then, RAG agents can reason and take actions based on the following example retrieval scenarios:

Decide whether to retrieve information
Decide which tool to use to retrieve relevant information
Formulate the query itself
Evaluate the retrieved context and decide whether re-retrieval is needed.

### Agentic RAG Architecture

Compared to sequential simple RAG architectures, the core of agentic RAG architecture is the agent. Agentic RAG architectures can have varying degrees of complexity. In its simplest form, a single-agent RAG architecture is a simple router. However, you can also add multiple agents to a multi-agent RAG architecture. This section discusses two basic RAG architectures.

#### Single-Agent RAG (Router)

The simplest form is that agentic RAG is a router. This means you have at least two external knowledge sources, and the agent decides from which source to retrieve additional context. However, the external knowledge sources do not have to be limited to (vector) databases. You can also retrieve more information from tools. For example, you can perform web searches, or you can use APIs to retrieve additional information from Slack channels or your email accounts.

#### Multi-Agent RAG System

As you might guess, single-agent systems also have their limitations, as they are limited to one agent, integrating reasoning, retrieval, and answer generation into one unit. Therefore, linking multiple agents into multi-agent RAG applications is beneficial.

For example, you could have a primary agent coordinating information retrieval among multiple specialized retrieval agents. One agent could retrieve information from proprietary internal data sources. Another agent could specialize in retrieving information from your personal accounts (e.g., email or chat). Another agent could also specialize in retrieving public information from web searches.

### Beyond Retrieval Agents

The examples above demonstrate the use of different retrieval agents. However, you can also use agents for purposes beyond retrieval. The possibilities of agents in RAG systems are diverse.

Agentic RAG vs. (Vanilla) RAG

While the basic concept of RAG (sending queries, retrieving information, and generating responses) remains unchanged, the use of tools generalizes it, making it more flexible and powerful.

Think of it this way: a standard (vanilla) RAG is like answering specific questions in a library (before smartphones existed). On the other hand, agentic RAG is like having a smartphone with a web browser, calculator, email, etc.

## Implementing Agentic

As mentioned earlier, agents consist of multiple components. To build an agentic RAG pipeline, there are two options: language models with function calling or agent frameworks. Both implementations yield the same results; it merely depends on the control and flexibility you desire.

### Large Language Models with Function Calling

Language models are the primary components of agentic RAG systems. Another component is tools, which enable language models to access external services. Language models with function calling provide a way to build agent systems that allow the model to interact with predefined tools. Language model providers have added this functionality to their offerings.

In June 2023, OpenAI released function calling and gpt-3.5-turbo. gpt-4 enables these models to reliably connect GPT capabilities with external tools and APIs. Developers quickly began building applications that could plug in gpt-4 code executors, databases, calculators, and more.

Cohere further introduced its connector API, adding tools to the Command-R model suite. Additionally, Anthropic and Google have also launched function calling for Claude and Gemini. By providing these models with external services, they can access and reference web resources, execute code, and more.

Function calling is not limited to proprietary models. Ollama has introduced tool support for popular open-source models (like Llama3.2, nemotron-mini, etc.).

To build tools, you first need to define functions. In this code snippet, we are writing a function that retrieves objects from a database using Weaviate’s hybrid search:

Then, we will pass this function to the language model through tools_schema. This schema will be used in the language model’s prompt (think of it as an open API description):

Since you are directly connected to the language model API, you need to write a loop that routes between the language model and the tools:

Your queries will look like this:

Below is the GitHub Jupyter Book demonstration code

### Agent Frameworks

Agent frameworks such as DSPy, LangChain, CrewAI, LlamaIndex, and Letta have emerged to facilitate the use of language models in building applications. These frameworks simplify the construction of agentic RAG systems by combining pre-built templates.

DSPy supports ReAct agents and Avatar optimization. Avatar optimization describes the use of automatic prompt engineering in each tool description.
LangChain offers various services for using tools. LangChain’s LCEL and LangGraph frameworks further provide built-in tools.
LlamaIndex further introduces QueryEngineTool, a collection of templates for retrieval tools.
CrewAI is one of the leading frameworks for developing multi-agent systems. One of the key concepts of tool usage is sharing tools among agents.
Swarm is a framework built by OpenAI for multi-agent orchestration. Swarm also focuses on how agents share tools.
Letta reflects and improves the internal world model as a function for interaction. Beyond answering questions, this may also use search results to update the agent’s memory of chatbot users.

## Why Enterprises Adopt Agentic

Companies are transitioning from standard RAG to building agentic RAG applications. Replit has released an agent that helps developers build and debug software. Additionally, Microsoft has announced a co-pilot that works with users to provide task completion suggestions. These are just a few examples of agents in production, and their possibilities are endless.

### Advantages of Agentic RAG

The shift from standard RAG to agentic RAG enables these systems to produce more accurate responses, autonomously execute tasks, and better collaborate with humans.

The advantages of agentic RAG primarily lie in the improved quality of the retrieved additional information. By adding agents with tool usage capabilities, retrieval agents can route queries to specialized knowledge sources. Furthermore, the reasoning capabilities of agents allow the retrieved context to undergo a layer of validation before being used for further processing. Therefore, agentic RAG pipelines can deliver more robust and accurate responses.

### Limitations of Agentic RAG

However, everything has two sides. Using AI agents to perform subtasks means combining LLMs to complete the tasks. This brings limitations associated with using LLMs in any application, such as increased latency and unreliability. Depending on the reasoning capabilities of LLMs, agents may not be able to sufficiently complete tasks (or may not be able to complete them at all). It is crucial to incorporate appropriate failure modes to help AI agents navigate challenges when they cannot complete a task.

## Refer

https://arxiv.org/abs/2210.03629 ReAct
https://github.com/weaviate/recipes/blob/main/integrations/llm-frameworks/function-calling/ollama/ollama-weaviate-agents.ipynb
https://dspy.ai/deep-dive/modules/react/
https://www.langchain.com/
https://www.llamaindex.ai/
https://www.crewai.com/
https://github.com/openai/swarm
https://docs.letta.com/introduction
https://docs.replit.com/replitai/agent
https://blogs.microsoft.com/blog/2024/10/21/new-autonomous-agents-scale-your-team-like-never-before/
https://weaviate.io/blog/what-is-agentic-rag Original Article

# Agentic RAG

## What is Retrieval-Augmented Generation (RAG)

## What is an Agent in AI Systems?

Leave a Comment Cancel reply