Implementing RAG Queries in LlamaIndex Agent

Overview

This article explains how to integrate a RAG query engine into an Agent, enabling the Agent to utilize external knowledge bases for data queries, thus enhancing its capabilities.

This approach is useful in many scenarios. For instance, we often need to query or compute a specific metric first, and then, if the data meets certain conditions, process it with tool functions. The experiments in this article are based on an example provided by LlamaIndex and can be extended to fit your own business needs.

The code presented in this article runs entirely on a local x86 machine and does not rely on any OpenAI interfaces, so it is completely free to run.

Implementation Strategy

To demonstrate how to use the RAG engine within an Agent, we will create a very simple RAG query engine. Our source data will be the Wikipedia page on the 2023 Canadian federal budget, saved as a PDF.

By combining RAG and Agent, we can first retrieve the corresponding data from the document and then invoke tool functions to process the data retrieved from the document.

Note: The large model executes these steps automatically, using its semantic understanding to select the appropriate tools. We only need to tell it, in the prompt, what we want.

Running Environment

My environment is a virtual machine without a GPU, configured as follows:

  • CPU Type: x86

  • Core Count: 16 cores

  • Memory: 32 GB

Implementation Steps

  1. Prepare the Local Embedding Model

Download all files of the bge-base-en-v1.5 model from Hugging Face or other sources to the following directory:

/opt/models/BAAI/bge-base-en-v1.5
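
If you do not already have the model files locally, one way to fetch them is with the huggingface_hub package (a minimal sketch; it assumes huggingface_hub is installed, and any other download method works just as well):

# Sketch: download all bge-base-en-v1.5 files into the target directory
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/bge-base-en-v1.5",
    local_dir="/opt/models/BAAI/bge-base-en-v1.5",
)
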
  2. Prepare the Local Large Model

You can directly obtain the large model locally using Ollama; I am using gemma2 here.

ollama pull gemma2
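
Before wiring the model into the agent, you can quickly confirm that Ollama is serving it and that LlamaIndex can reach it (a small sanity check, not part of the final script):

# Sketch: verify the local gemma2 model responds through LlamaIndex
from llama_index.llms.ollama import Ollama

llm = Ollama(model="gemma2", request_timeout=360)
print(llm.complete("Reply with one short sentence."))
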
  3. Add the RAG Query Engine to the Agent’s Tool List

# rag pipeline as a tool
budget_tool = QueryEngineTool.from_defaults(
    query_engine, 
    name="canadian_budget_2023",
    description="A RAG engine with some basic facts about the 2023 Canadian federal budget."
)

# Add the rag query engine as a tool in the agent's tool list
agent = ReActAgent.from_tools([multiply_tool, add_tool, budget_tool], verbose=True)
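
The name and description given to QueryEngineTool matter: the ReAct agent includes them in its prompt, and the large model relies on them to decide when a question should be routed to the knowledge base rather than to the math tools. The multiply_tool and add_tool referenced here are the two function tools defined in the full code listing below.
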
  4. The Document Content We Want to Query Is as Follows

[Image: the 2023 Canadian federal budget Wikipedia page, saved as a PDF]

Save https://en.wikipedia.org/wiki/2023_Canadian_federal_budget as a PDF document (its content is shown in the image above) and place it in the ./agent_rag_data directory.
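
Before building the index, you can check that the PDF loads correctly (a quick sanity check; SimpleDirectoryReader relies on a PDF reader dependency such as pypdf to parse PDF files):

# Sketch: confirm the PDF in ./agent_rag_data can be read
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./agent_rag_data").load_data()
print(f"Loaded {len(docs)} document(s)")  # typically one entry per PDF page
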

Code Implementation

First, pull the large model locally with Ollama, and download and save the local embedding model. I am using the bge-base-en-v1.5 embedding model, with all of its files stored in the /opt/models/BAAI/bge-base-en-v1.5 directory.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Create the local large model
#Settings.llm = Ollama(model="llama3.2", request_timeout=360)
Settings.llm = Ollama(model="gemma2", request_timeout=360)

local_model = "/opt/models/BAAI/bge-base-en-v1.5"
# Create the local embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model)

# Create other data processing tool functions
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

def add(a: float, b: float) -> float:
    """Add two numbers and returns the sum"""
    return a + b

add_tool = FunctionTool.from_defaults(fn=add)

# Build the knowledge base query engine
documents = SimpleDirectoryReader("./agent_rag_data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
query_engine = index.as_query_engine()
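# Optional sanity check: query the engine directly to confirm retrieval works
# before handing it to the agent (uncomment to try it)
# print(query_engine.query("What is the total amount of the 2023 Canadian federal budget?"))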

# Wrap the RAG query engine as a tool
budget_tool = QueryEngineTool.from_defaults(
    query_engine, 
    name="canadian_budget_2023",
    description="A RAG engine with some basic facts about the 2023 Canadian federal budget."
)

agent = ReActAgent.from_tools([multiply_tool, add_tool, budget_tool], verbose=True)
response = agent.chat("What is the total amount of the 2023 Canadian federal budget multiplied by 3? Go step by step, using a tool to do any math.")
print(response)

Using the gemma2 local large model, the output result is as follows:

python agent_rag.py 
Parsing nodes: 100%|█████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1161.94it/s]
Generating embeddings: 100%|███████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.92it/s]
> Running step a8beded3-6a08-4d4d-acec-189c63f0fbf3. Step input: What is the total amount of the 2023 Canadian federal budget multiplied by 3? Go step by step, using a tool to do any math.
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: canadian_budget_2023
Action Input: {'input': 'What is the total amount of the 2023 Canadian federal budget?'}
Observation: $496.9 billion 

> Running step a783cc6d-ee83-4166-af2c-e2ea58333a9a. Step input: None
Thought: I need to multiply this by three.
Action: multiply
Action Input: {'a': 496.9, 'b': 3}
Observation: 1490.6999999999998
> Running step 8dfa9d5c-c729-454d-bc97-7c59beba99b2. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The total amount of the 2023 Canadian federal budget multiplied by 3 is $1,490.7 billion.
The total amount of the 2023 Canadian federal budget multiplied by 3 is $1,490.7 billion.

From the above result, we can see that gemma2, combined with the embedding model, retrieved the figure of $496.9 billion from the document and then multiplied it by 3 to obtain the required result.
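
As mentioned in the Overview, the same pattern extends to conditional processing: retrieve a metric first, then let the agent decide which tool to apply. A rough sketch of such a prompt, reusing the agent built above (the exact wording and behavior will vary by model):

# Sketch: query a metric, then process it conditionally with tools
response = agent.chat(
    "Look up the total amount of the 2023 Canadian federal budget. "
    "If it is greater than 400 billion, multiply it by 2 using a tool; "
    "otherwise add 100 to it. Go step by step."
)
print(response)
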

Note: You can try other large models, such as llama3, but the results are sometimes unsatisfactory and you may not get the desired answers, so further testing and tuning are needed.
