Practical Implementation of Context Mode in ChatEngine

Overview

The ContextChatEngine class is a contextual chat engine designed to provide a fluid chat experience: for each user message it retrieves relevant context from an index, injects that context into the system prompt, and uses a language model (LLM) to generate the response.

It is a simple chat mode built on top of a data retriever. For each chat interaction:

  • First, retrieve text from the index using the user message.

  • Set the retrieved text as context in the system prompt.

  • Return the answer to the user’s message.

This method is straightforward and suitable both for questions directly related to the knowledge base and for general interactions. A sketch of one such turn follows below.
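To make the three steps concrete, here is a minimal sketch of one chat turn in context mode. It is an illustration of the idea rather than the library's actual implementation; retriever, llm, and memory stand in for whatever components you have configured.

from llama_index.core.llms import ChatMessage

# Illustrative sketch of one chat turn in context mode
# (not the library's actual source).
def context_chat_turn(retriever, llm, memory, user_message, system_prompt):
    # 1. Retrieve text from the index using the user message.
    nodes = retriever.retrieve(user_message)
    context = "\n\n".join(n.get_content() for n in nodes)

    # 2. Set the retrieved text as context in the system prompt.
    system = ChatMessage(
        role="system", content=f"{system_prompt}\n\nContext:\n{context}"
    )

    # 3. Answer the user's message, including prior chat history.
    user = ChatMessage(role="user", content=user_message)
    response = llm.chat([system, *memory.get(), user])

    # Record the turn so later messages can refer back to it.
    memory.put(user)
    memory.put(response.message)
    return response.message.content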

Implementation Logic

  1. Configure a local large model. Here, the gemma2 model served through Ollama is used, but other large models can be configured in the same way.

  2. Build an index from the documents.

  3. Define a memory buffer to save the historical chat content.

  4. Convert the index into a chat engine with index.as_chat_engine(), setting chat_mode="context" and passing the memory buffer for historical messages.

Note: Since the retrieved context may occupy a large portion of the available LLM context window, we must make sure to configure a smaller token limit for the chat history:

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
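The 1500-token figure is a sensible default rather than a hard rule. If you prefer to derive the limit from the model actually in use, here is a hedged sketch; it assumes Settings.llm has already been configured, and the 20% ratio is an assumption, not a library recommendation:

from llama_index.core import Settings
from llama_index.core.memory import ChatMemoryBuffer

# Budget roughly 20% of the model's context window for chat history,
# leaving the rest for retrieved context and the response.
context_window = Settings.llm.metadata.context_window
memory = ChatMemoryBuffer.from_defaults(token_limit=int(context_window * 0.2))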

Implementation Code

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# bge-base embedding model (local path)
local_model = "/opt/models/BAAI/bge-base-en-v1.5"
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model)

# gemma2 served locally via Ollama
Settings.llm = Ollama(model="gemma2", request_timeout=360.0)

data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Build chat engine
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Graham's life."
    ),
)

# Test results
response = chat_engine.chat("Hello!")
print(response)

response = chat_engine.chat("What did Paul Graham do growing up?")
print(response)

response = chat_engine.chat("Can you tell me more?")
print(response)


print("--------------reset chat-------------------------")
chat_engine.reset()
response = chat_engine.chat("Hello! What do you know?")
print(response)
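The chat engine also supports streaming responses. As a small variant of the test above (using the same chat_engine), stream_chat returns a streaming response whose tokens can be printed as they arrive:

# Streaming variant: print tokens as they are generated.
streaming_response = chat_engine.stream_chat("What did Paul Graham do growing up?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()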

Output

As the following output shows, different large models produce slightly different responses, which aligns with our expectations.

$ python chat_context.py 
Hello! πŸ‘‹

I'm ready to chat.  Is there anything you'd like to know about Paul Graham or his essay? I have access to the text at "/root/work/do_llamaindex/data/paul_graham/paul_graham_essay.txt", so I can answer questions about it directly. 

What are you curious about? 😊  

According to the essay, Paul Graham was a bit of a loner growing up and spent a lot of time reading science fiction and learning to program. 

He even started writing his own programs at a young age, showing early signs of his future success in the tech world.  He wasn't particularly interested in traditional school activities but found joy and fulfillment in exploring the world of computers.


Do you want to know more about his childhood or something else from the essay?
Sure! 

The essay describes Graham as being quite solitary as a child. He wasn't very interested in sports or typical social activities that other kids enjoyed. Instead, he preferred immersing himself in books, particularly science fiction, and teaching himself to program.  He even built his own computer from scratch at one point! 

His parents recognized his unique interests and supported him in pursuing them. They encouraged his love of learning and provided him with the space and resources to explore his passions. 

This early focus on self-directed learning and technical pursuits would undoubtedly shape Graham's future path as a successful programmer, entrepreneur, and influential figure in the tech world.


Is there anything specific about his childhood that you'd like to know more about?
--------------reset chat-------------------------
Hello! I know that I have access to an essay about Paul Graham's life located at the file path /root/work/do_llamaindex/data/paul_graham/paul_graham_essay.txt. 

I can tell you things about the essay's content, but I haven't actually read it yet. Would you like me to open the file and summarize it for you? Or perhaps you have some specific questions about Paul Graham that you'd like me to try and answer based on the information in the essay?  

Summary

By keeping historical messages in a memory buffer, the model gains conversational context that makes its responses more accurate. That said, we cannot rely entirely on this mechanism: the amount of history that fits within the token limit is bounded, and the retrieval step may not always surface the most relevant context.
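To see how much history actually survives the token limit, the buffer can be inspected directly. A small sketch using the memory object created earlier:

# List what the memory buffer currently retains; older messages
# that exceed the token limit will have been dropped.
for msg in memory.get():
    print(f"{msg.role}: {str(msg.content)[:80]}")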
