Prompts are the fundamental input that gives an LLM its expressive power. LlamaIndex uses prompts to build indexes, perform insertions, retrieve during queries, and synthesize final answers.
LlamaIndex provides a set of out-of-the-box default prompt templates:
https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/default_prompts.py
In addition, there are prompts written specifically for chat models such as gpt-3.5-turbo:
https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/chat_prompts.py
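If you just want to inspect a default template without opening the source file, something like the following should work (a sketch, assuming the DEFAULT_TEXT_QA_PROMPT_TMPL name defined in default_prompts.py):
from llama_index.core.prompts import default_prompts
# print the built-in text QA template (name taken from default_prompts.py)
print(default_prompts.DEFAULT_TEXT_QA_PROMPT_TMPL)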
Custom Prompts
Users can also provide their own prompt templates to further customize the framework’s behavior. The best way to customize is to copy the default prompts from the links above and modify them:
from llama_index.core import PromptTemplate
template = (
    "We have provided context information below.\n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)
qa_template = PromptTemplate(template)
# you can create text prompt (for completion API)
prompt = qa_template.format(context_str=..., query_str=...)
# or easily convert to message prompts (for chat API)
messages = qa_template.format_messages(context_str=..., query_str=...)
You can even translate the English prompt directly into Chinese and use it with your preferred Chinese LLM.
Note: do not change the placeholders {context_str} and {query_str}.
We can also customize the message templates for chat:
from llama_index.core import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
message_templates = [
    ChatMessage(content="You are an expert system.", role=MessageRole.SYSTEM),
    ChatMessage(
        content="Generate a short story about {topic}",
        role=MessageRole.USER,
    ),
]
chat_template = ChatPromptTemplate(message_templates=message_templates)
# you can create message prompts (for chat API)
messages = chat_template.format_messages(topic=...)
# or easily convert to text prompt (for completion API)
prompt = chat_template.format(topic=...)
Getting and Setting Custom Prompts
The most commonly used prompt templates are as follows:
- text_qa_template: used to get an initial answer to the query from the retrieved nodes.
- refine_template: used when the retrieved text does not fit into a single LLM call with response_mode="compact" (the default), or when more than one node is retrieved with response_mode="refine". The answer from the first LLM call is inserted as the existing answer, and the LLM must update or repeat that answer based on the new context (see the sketch after this list).
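For reference, here is a minimal sketch of a custom refine template. It assumes the variable names used by the default refine prompt ({query_str}, {existing_answer}, {context_msg}), which must be kept:
from llama_index.core import PromptTemplate

custom_refine_prompt = PromptTemplate(
    "The original question is: {query_str}\n"
    "We have an existing answer: {existing_answer}\n"
    "New context is provided below.\n"
    "---------------------\n"
    "{context_msg}\n"
    "---------------------\n"
    "Update the existing answer with the new context if it helps; "
    "otherwise repeat the existing answer.\n"
)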
We can call get_prompts on many modules in LlamaIndex to get the list of prompts used in the module and nested submodules:
query_engine = index.as_query_engine(response_mode="compact")
prompts_dict = query_engine.get_prompts()
print(list(prompts_dict.keys()))
The output is as follows:
['response_synthesizer:text_qa_template', 'response_synthesizer:refine_template']
Note that each prompt key is namespaced by the submodule that owns it.
We can use the update_prompts function to update prompts based on keys:
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_template}
)
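To confirm the update took effect, you can fetch the prompts again and print the template text (a sketch, assuming the get_template() accessor on prompt objects):
prompts_dict = query_engine.get_prompts()
print(prompts_dict["response_synthesizer:text_qa_template"].get_template())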
Changing Prompts in the Query Engine
For the query engine, we can also directly pass custom prompts during queries:
1. Through the high-level API:
query_engine = index.as_query_engine(
    text_qa_template=custom_qa_prompt,
    refine_template=custom_refine_prompt,
)
2. Through the low-level API:
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = index.as_retriever()
synth = get_response_synthesizer(
    text_qa_template=custom_qa_prompt,
    refine_template=custom_refine_prompt,
)
query_engine = RetrieverQueryEngine(retriever, synth)
The two approaches are equivalent: method 1 is essentially syntactic sugar for method 2 and hides the underlying complexity. Use method 1 to quickly change common parameters, and method 2 for finer-grained control.
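Either way, the resulting query engine is used identically:
# hypothetical question, for illustration only
response = query_engine.query("What does the document say about prompts?")
print(response)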
Changing the Prompts Used When Building the Index
The most commonly used VectorStoreIndex and SummaryIndex do not use any prompts during construction.
However, some indexes do use prompts during construction. For example, TreeIndex uses a summary prompt to hierarchically summarize the nodes, and KeywordTableIndex uses a keyword-extraction prompt to extract keywords.
There are two equivalent methods to override the default prompts:
index = TreeIndex(nodes, summary_template=custom_prompt)
and
index = TreeIndex.from_documents(docs, summary_template=custom_prompt)
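Likewise for KeywordTableIndex, the keyword-extraction prompt can be overridden. A sketch, assuming the keyword_extract_template parameter and the {text} and {max_keywords} variables used by the default prompt:
from llama_index.core import KeywordTableIndex, PromptTemplate

# sketch of a custom keyword-extraction prompt; {text} and {max_keywords}
# mirror the variables of the default template
custom_kw_prompt = PromptTemplate(
    "Extract up to {max_keywords} keywords from the text below, "
    "as a comma-separated list in the form KEYWORDS: <keywords>\n"
    "---------------------\n"
    "{text}\n"
)
index = KeywordTableIndex.from_documents(
    docs, keyword_extract_template=custom_kw_prompt
)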
Some Advanced Prompt Techniques
Partial formatting: fill in some of the variables first and the rest later:
from llama_index.core import PromptTemplate
prompt_tmpl_str = "{foo} {bar}"
prompt_tmpl = PromptTemplate(prompt_tmpl_str)
partial_prompt_tmpl = prompt_tmpl.partial_format(foo="abc")
fmt_str = partial_prompt_tmpl.format(bar="def")
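A more concrete use (a sketch with a made-up {instruction} variable): pre-fill an instruction once, leaving {context_str} and {query_str} to be filled at query time:
# hypothetical template: {instruction} is filled once, the rest at query time
qa_tmpl = PromptTemplate(
    "{instruction}\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
)
qa_tmpl = qa_tmpl.partial_format(
    instruction="Answer using only the context below."
)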
Template Variable Mapping
LlamaIndex prompt abstractions expect specific template keys (in the sense of Python string formatting). For example, text_qa_template requires context_str for the retrieved context and query_str for the user query.
If you find a good prompt template elsewhere but its keys are different, you can either rename the keys manually or define template_var_mappings:
# assume qa_prompt_tmpl_str is a template written with {my_context} and {my_query}
template_var_mappings = {"context_str": "my_context", "query_str": "my_query"}
prompt_tmpl = PromptTemplate(
    qa_prompt_tmpl_str, template_var_mappings=template_var_mappings
)
Function Mapping
Pass functions as template variables instead of fixed values. This is an advanced and powerful feature that enables dynamic prompts.
Here is an example of reformatting context_str:
def format_context_fn(**kwargs):
    # format the context as a bulleted list
    context_list = kwargs["context_str"].split("\n\n")
    fmtted_context = "\n\n".join([f"- {c}" for c in context_list])
    return fmtted_context

prompt_tmpl = PromptTemplate(
    qa_prompt_tmpl_str, function_mappings={"context_str": format_context_fn}
)

prompt_tmpl.format(context_str="context", query_str="query")
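With this mapping in place, a context_str containing several paragraphs separated by blank lines is rendered as a bulleted list before being substituted into the prompt, while query_str is passed through unchanged.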