LlamaIndex Practical Application – ChatEngine ReAct Agent Mode

Overview

ReAct is an agent-based chat mode built on top of a data query engine. For each chat interaction, the agent enters a ReAct loop:

  • First, decide whether to use the query engine tool and propose appropriate input

  • (Optional) Use the query engine tool and observe its output

  • Decide whether to repeat or give a final response

This approach is flexible because the agent decides on its own whether to query the knowledge base, and it is implemented on top of agents. However, its performance depends more heavily on the quality of the LLM: you may need stronger prompting to make sure it queries the knowledge base at the right time rather than generating hallucinated answers.
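
Conceptually, the loop looks roughly like the following sketch. This is illustrative pseudocode only, not LlamaIndex's actual implementation; decide_step, run_tool, and the step fields are assumed names supplied by the caller.

def react_loop(decide_step, run_tool, max_steps=5):
    """Minimal sketch of a ReAct loop.

    decide_step(history) -> dict with keys "action" ("tool" or "answer"),
    "tool_input", and "answer"; run_tool(tool_input) -> str observation.
    """
    history = []
    for _ in range(max_steps):
        # 1. The LLM reasons about whether a tool call is needed
        step = decide_step(history)
        if step["action"] == "tool":
            # 2. Use the query engine tool and record the observation
            observation = run_tool(step["tool_input"])
            history.append(f"Observation: {observation}")
        else:
            # 3. The LLM is confident enough to give a final response
            return step["answer"]
    return "No final answer within the step limit."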

Implementation Logic

  1. Set up a local large model. The model used here is gemma2; other large models can also be configured.

  2. Build an index from the documents

  3. Convert the index into a chat engine with index.as_chat_engine, and set chat_mode to react.

Note: I am using the local large model gemma2 here; its performance may not match OpenAI's models.
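
If you have access to OpenAI, you can swap in one of its models instead. The snippet below sketches that substitution; it requires the llama-index-llms-openai package, and the model name gpt-4o-mini is only an example.

from llama_index.llms.openai import OpenAI

# Alternative: use an OpenAI-hosted model instead of the local gemma2
Settings.llm = OpenAI(model="gpt-4o-mini")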

Implementation Code

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

local_model = "/opt/models/BAAI/bge-base-en-v1.5"

# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model)
# ollama
Settings.llm = Ollama(model="gemma2", request_timeout=360.0)

data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

# Set to use react mode
chat_engine = index.as_chat_engine(chat_mode="react", llm=Settings.llm, verbose=True)

response = chat_engine.chat("Use the tool to answer what did Paul Graham do in the summer of 1995?")
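
The chat engine keeps conversation history, so you can ask follow-up questions in the same session and clear the memory with reset(). The sketch below is an optional extension and is not reflected in the output shown next; the follow-up question is just an example.

# Follow-up question in the same session; the agent can use earlier turns as context
followup = chat_engine.chat("What was he working on before that summer?")
print(followup)

# Start over with a fresh conversation memory
chat_engine.reset()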

Output

As the output below shows, different large models produce different traces. The agent retrieved the relevant index and text information through the query engine tool.

$ python chat_react.py 
> Running step 3e748b23-a1bb-4807-89f6-7bda3b418b86. Step input: Use the tool to answer what did Paul Graham do in the summer of 1995?
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: query_engine_tool
Action Input: {'input': 'What did Paul Graham do in the summer of 1995?'}
Observation: He worked on his Lisp-based web server.  

> Running step 5f4592b6-f1d0-4fcf-8b03-a50d46641ef2. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: In the summer of 1995, Paul Graham worked on his Lisp-based web server.

Implementation Analysis

The following excerpt from the implementation of as_chat_engine shows that when the chat mode is REACT (as well as OPENAI or BEST), an AgentRunner is created and the query engine is wrapped as a tool in the agent's tool list.

    def as_chat_engine(
        self,
        chat_mode: ChatMode = ChatMode.BEST,
        llm: Optional[LLMType] = None,
        **kwargs: Any,
    ) -> BaseChatEngine:
        ...
        if chat_mode in [ChatMode.REACT, ChatMode.OPENAI, ChatMode.BEST]:
            # use an agent with query engine tool in these chat modes
            # NOTE: lazy import
            from llama_index.core.agent import AgentRunner
            from llama_index.core.tools.query_engine import QueryEngineTool

            # convert query engine to tool
            query_engine_tool = QueryEngineTool.from_defaults(query_engine=query_engine)

            return AgentRunner.from_llm(
                tools=[query_engine_tool],
                llm=llm,
                **kwargs,
            )
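
If you prefer not to go through as_chat_engine, you can build the same setup explicitly. The sketch below wraps the query engine in a QueryEngineTool and hands it to a ReActAgent; the tool name and description are my own choices, not values from the library.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Expose the index's query engine as a named tool
query_engine = index.as_query_engine(llm=Settings.llm)
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="query_engine_tool",
    description="Answers questions about the Paul Graham essay.",
)

# Build the ReAct agent explicitly with that single tool
agent = ReActAgent.from_tools(
    tools=[query_engine_tool],
    llm=Settings.llm,
    verbose=True,
)
print(agent.chat("What did Paul Graham do in the summer of 1995?"))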

Summary

In REACT mode, the chat engine creates an agent and registers the query engine as a tool within it. The agent then retrieves the desired content through the query engine's capabilities.
