Overview
The Condense Question mode is a simple chat mode built on top of a query engine. It provides a flexible chat engine by condensing the conversation context and the latest message into a standalone question, which is then sent to the query engine.
For each chat interaction:
1. First, generate a standalone question based on the conversation context and the latest message;
2. Then, use the condensed question to query the query engine for a response.
This method is straightforward and is suitable for questions directly related to the knowledge base. Since it always queries the knowledge base and relies on the knowledge therein, it may struggle to answer meta-questions like “What did I ask you before?”
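Conceptually, each turn boils down to the following sketch (an illustrative simplification, not the library's actual implementation; the prompt text and helper name are assumptions):

def condense_and_query(llm, query_engine, chat_history, message):
    # Step 1: rewrite the latest message as a standalone question, using the history
    prompt = (
        f"Chat history:\n{chat_history}\n"
        f"Follow-up message: {message}\n"
        "Rewrite the follow-up message as a standalone question:"
    )
    standalone_question = llm.complete(prompt).text
    # Step 2: answer the standalone question with the query engine
    return query_engine.query(standalone_question)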
Implementation Logic
- Configure a local large model. Here, the llama3.2 model is used, but other large models can also be configured.
- Build an index from the documents.
- Convert the index into a chat engine with index.as_chat_engine, setting chat_mode to condense_question.
Note: I am using the local large model llama3.2; with gemma2 the results were not as good.
Implementation Code
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Local bge-base embedding model
local_model = "/opt/models/BAAI/bge-base-en-v1.5"
Settings.embed_model = HuggingFaceEmbedding(model_name=local_model)

# Local LLM served by Ollama
Settings.llm = Ollama(model="llama3.2", request_timeout=360.0)

# Load the documents and build a vector index
data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

# Use condense_question mode
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

print("What did Paul Graham do after YC?")
response = chat_engine.chat("What did Paul Graham do after YC?")
print(response)

print("What about after that?")
response = chat_engine.chat("What about after that?")
print(response)

print("Can you tell me more?")
response = chat_engine.chat("Can you tell me more?")
print(response)

# Reset the conversation state, then ask a follow-up with no history
chat_engine.reset()
print("What about after that?")
response = chat_engine.chat("What about after that?")
print(response)
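Besides the blocking chat call, the chat engine also supports streaming output. A minimal usage sketch, assuming the standard LlamaIndex stream_chat interface:

# Stream the answer token by token instead of waiting for the full response
streaming_response = chat_engine.stream_chat("What did Paul Graham do after YC?")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
print()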
Output
The output below shows that this chat mode pulls the relevant context out of the conversation history, which makes the chat more coherent. The result also depends on the capability of the underlying model, however: with gemma2, for example, the results were noticeably worse.
$python chat_condense_question.py
What did Paul Graham do after YC?
Querying with: What did Paul Graham do after YC?
After his time at Y Combinator, he went on to found a number of companies, including Foundry Group, which is an early stage venture capital firm that focuses on investing in startups.
What about after that?
Querying with: What were Paul Graham's activities and ventures after founding Foundry Group?
Paul Graham went on to establish Y Combinator, a well-known startup accelerator. He also continued to invest in various startups through his venture capital firm, Founders Fund. Additionally, he has been involved in other business activities and investments outside of these endeavors.
Can you tell me more?
Querying with: Here is the rewritten message:
"What happened after Paul Graham co-founded Y Combinator?"
This standalone question captures all relevant context from the conversation, including his role as a venture capitalist through Foundry Group and his involvement with Founders Fund.
Paul Graham's co-founding of Y Combinator marked a significant turning point in his career. Following its inception, he shifted his focus to investing, leveraging his experience as a founder to guide early-stage companies. He also became involved with Foundry Group and eventually Founders Fund, solidifying himself as a prominent venture capitalist.
# Results after reset
What about after that?
Querying with: What about after that?
It seems like a natural progression would be to look at what follows immediately after in the text. The original passage appears to be setting up a question or scenario that leads into further discussion or consideration of the topic at hand.
Source Code Analysis
The CondenseQuestionChatEngine class provides a flexible chat engine by condensing the conversation context and the latest message into a standalone question and sending it to the query engine.
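If you need more control than index.as_chat_engine gives you, the engine can also be constructed directly. The sketch below assumes the from_defaults constructor and its condense_question_prompt parameter; check both against your LlamaIndex version:

from llama_index.core import PromptTemplate
from llama_index.core.chat_engine import CondenseQuestionChatEngine

# Custom prompt used to rewrite the follow-up into a standalone question
custom_prompt = PromptTemplate(
    "Given the following conversation and a follow-up message, "
    "rewrite the follow-up as a standalone question.\n"
    "<Chat History>\n{chat_history}\n"
    "<Follow Up Message>\n{question}\n"
    "<Standalone question>\n"
)

query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    verbose=True,
)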
The main steps of the chat function are (a simplified sketch follows this list):
- Retrieve historical messages: if no chat history is provided in the call, fetch the chat history related to the current message from memory.
- Generate a standalone question from the chat history and the latest message.
- Process the question with the query engine and obtain a response.
- Record the user message and the assistant response in memory.
- Return the assistant's response and the related tool outputs.
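Put together, the flow corresponds roughly to the following simplified sketch (illustrative only, not the actual library source; the helper name and prompt text are assumptions):

from llama_index.core.llms import ChatMessage, MessageRole

def condense_question_chat(llm, query_engine, memory, message, chat_history=None):
    # 1. Retrieve history: fall back to memory when no chat_history is passed in
    if chat_history is None:
        chat_history = memory.get()
    # 2. Condense the history and the latest message into a standalone question
    history_str = "\n".join(f"{m.role.value}: {m.content}" for m in chat_history)
    standalone_question = llm.complete(
        f"Chat history:\n{history_str}\n"
        f"Follow-up message: {message}\n"
        "Rewrite the follow-up message as a standalone question:"
    ).text
    # 3. Answer the standalone question with the query engine
    query_response = query_engine.query(standalone_question)
    # 4. Record the user message and the assistant response in memory
    memory.put(ChatMessage(role=MessageRole.USER, content=message))
    memory.put(ChatMessage(role=MessageRole.ASSISTANT, content=str(query_response)))
    # 5. Return the response; the retrieved source nodes serve as the tool output
    return query_response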
Summary
The condense_question mode generates a standalone question from the conversation history and the latest message, then answers it through the query engine. This keeps the chat coherent across turns while still grounding every answer in the knowledge base.