This article explains the core concepts of RAG, illustrated with code examples from the “Building a Personal Knowledge Base with ERNIE SDK + LangChain” project.

Concept
In 2020, the Facebook AI Research (FAIR) team published a paper titled “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, which first introduced and explained in detail the concept of RAG, now an important idea in the field of large language models.
The diagram in the paper gives an overview of the FAIR team’s approach: it combines a pre-trained retriever (query encoder + document index) with a pre-trained seq2seq generator and fine-tunes them end to end. For a query x, the authors use maximum inner product search (MIPS) to find the top-K documents zi; for the final prediction y, they treat z as a latent variable and marginalize the seq2seq predictions over the different retrieved documents.
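The retrieve-then-marginalize idea can be sketched numerically. Everything below is a toy illustration (the document embeddings, the choice of K=2, and the per-document generator probabilities are all made up for this sketch), not the paper’s actual model:

```python
import numpy as np

# Hypothetical index: 4 document embeddings (one per row) and a query embedding
doc_index = np.array([
    [0.1, 0.9],
    [0.8, 0.2],
    [0.9, 0.1],
    [0.2, 0.7],
])
query = np.array([1.0, 0.0])

# Maximum inner product search (MIPS): score every document against the
# query and keep the indices of the top-K highest inner products
scores = doc_index @ query
top_k = np.argsort(scores)[::-1][:2]

# Marginalization sketch: normalize the top-K scores into p(z|x), then
# weight hypothetical generator probabilities p(y|x,z) by them and sum
p_z = np.exp(scores[top_k]) / np.exp(scores[top_k]).sum()
p_y_given_xz = np.array([0.6, 0.3])  # hypothetical per-document predictions
p_y = float((p_z * p_y_given_xz).sum())
print(top_k, p_y)
```

The final probability of an answer is thus a retrieval-weighted mixture over the top-K documents, which is what lets the model be trained end to end without supervising which document to use.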
The RAG model combines language models and information retrieval techniques. Specifically, when the model needs to generate text or answer questions, it first retrieves relevant information from a large collection of documents, and then uses this retrieved information to guide text generation, thereby improving the quality and accuracy of predictions.
Here, “Retrieval”, “Utilization”, and “Generation” are the three key stages of RAG. How can we understand these three stages more intuitively?
Let’s take a simple example:
You are writing an article about puppies, but your knowledge about puppies is limited. At this point, you are likely to perform the following actions:
1. Retrieval: First, you open your computer and search for the keyword “puppy”, retrieving a large number of articles, blogs, and other materials about puppies from the internet.
2. Utilization: Next, you analyze these search results and extract the important information, including dog breeds, behavior patterns, feeding methods, and so on. You organize this information into a knowledge base, like an encyclopedia of facts about puppies.
3. Generation: Now you write the article. You might open with a question such as “How long do puppies live?” and then use the previously retrieved and organized information to answer it or to generate paragraphs. This step is not simple copy-and-paste: it produces natural, fluent text that fits the context and follows grammatical rules.
In fact, the workflow “you” followed above is exactly the workflow of RAG: “you” acted as a RAG model, performing Retrieval, Utilization, and Generation in turn.
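The three stages can be sketched in a few lines of Python. The mini-corpus, the word-overlap scorer, and the prompt format below are all toy stand-ins for a real search index and language model:

```python
import re

# Hypothetical mini-corpus standing in for the internet search results
corpus = [
    "Golden Retrievers typically live 10 to 12 years.",
    "Puppies should be fed three to four times a day.",
    "The Eiffel Tower is in Paris.",
]

def retrieve(query, corpus, k=2):
    # 1. Retrieval: rank documents by naive word overlap with the query
    q_words = set(re.findall(r"\w+", query.lower()))
    def score(text):
        return len(q_words & set(re.findall(r"\w+", text.lower())))
    return sorted(corpus, key=score, reverse=True)[:k]

def utilize(docs):
    # 2. Utilization: organize the retrieved passages into a context block
    return "\n".join(f"- {d}" for d in docs)

def generate(query, context):
    # 3. Generation: a real system would feed this prompt to an LLM;
    # here we only assemble the prompt it would receive
    return f"Question: {query}\nContext:\n{context}\nAnswer:"

query = "How long do puppies live?"
print(generate(query, utilize(retrieve(query, corpus))))
```

Swapping the overlap scorer for embedding similarity and the prompt assembly for a call to an LLM turns this sketch into the real pipeline built later in this article.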
After understanding the basic workflow of RAG, you may wonder: in what scenarios is RAG mainly used?
And when RAG performs “Retrieval”, “Utilization”, and “Generation” in those scenarios, what specific work does each stage involve?
RAG technology can play a role in the following common natural language processing tasks:
1. Question Answering Systems (QA Systems): RAG can be used to build powerful question-answering systems that answer a wide variety of user questions. By retrieving from a large collection of documents, it can provide accurate answers without being trained on each individual question.
2. Document Generation and Automatic Summarization: RAG can automatically generate article paragraphs, documents, or summaries based on retrieved knowledge, making the generated content more informative.
3. Intelligent Assistants and Virtual Agents: RAG can be used to build intelligent assistants or virtual agents that answer user questions, provide information, and perform tasks based on chat history, without task-specific fine-tuning.
4. Information Retrieval: RAG can improve information retrieval systems, making their results more accurate and substantive. Users can pose more specific queries rather than being limited to keyword matching.
5. Knowledge Graph Population: RAG can populate entity relationships in knowledge graphs by retrieving documents to identify and add new knowledge points.
The above are some common application scenarios of RAG. After clarifying the application scope of RAG, you may wonder: Why do these scenarios need to use RAG instead of fine-tuning or other methods?
Next, let’s further understand the advantages of RAG.
The following are the specific advantages of RAG:
1. Utilization of External Knowledge: The RAG model can effectively draw on external knowledge bases, citing a large amount of information to provide deeper, more accurate, and more valuable answers, which improves the reliability of the generated text.
2. Timeliness of Data Updates: The RAG model has a mechanism for updating its retrieval library, allowing knowledge to be refreshed instantly without retraining the model. This means RAG can answer with the latest information, making it well suited to applications that demand timeliness.
3. Explainable Responses: Because the RAG model’s answers come directly from the retrieval library, its responses are highly interpretable, which reduces large-model hallucinations. Users can verify the accuracy of an answer against its information source.
4. High Customizability: The RAG model can be customized with a domain-specific knowledge base and prompts, quickly acquiring capabilities in that domain. This makes RAG broadly applicable across fields and applications, such as virtual companions and virtual pets.
5. Security and Privacy Management: The RAG model can enforce security controls by restricting access permissions to the knowledge base, ensuring sensitive information is not leaked and improving data security.
6. Reduced Training Costs: The RAG model scales well with data: large amounts of new data can be added directly to the knowledge base to update its knowledge. This process requires no model retraining, making it more economical.
Comparison with Fine-Tuning
Next, we compare RAG with fine-tuning to help you choose the appropriate strategy for your specific business needs:
- Task Specificity vs. Generality: Fine-tuning is usually optimized for a specific task, while RAG is general-purpose and can serve multiple tasks. A fine-tuned model performs well on its target task but lacks flexibility across tasks.
- Knowledge Referencing vs. Learning: The RAG model generates answers by referencing a knowledge base, while fine-tuning generates answers from task-specific data learned during training. Because RAG’s answers come directly from external knowledge, they are easier to verify.
- Timeliness vs. Retraining: The RAG model supports instant knowledge updates without retraining, giving it an advantage in applications with strict timeliness requirements. Fine-tuning usually requires retraining the model, which is time-consuming.
- Interpretability vs. Opacity: RAG’s answers are highly interpretable because they come from the knowledge base. What a fine-tuned model has learned internally can be difficult to explain.
- Customization vs. Per-Task Effort: RAG can be customized for a specific domain simply by swapping the knowledge base, while fine-tuning requires separate fine-tuning for each task and more task-specific data.
Combining the above comparisons, we can clearly see that the advantages of RAG lie in generality, knowledge referencing, timeliness, and interpretability, while fine-tuning may be more suitable for specific tasks but requires more task-specific data and training. The choice of which method to use should be determined based on specific application needs and tasks.
So how is RAG actually implemented?
We will walk through a simple code example: building a personal knowledge base with the ERNIE SDK and LangChain.
!pip install --upgrade erniebot
# Test embedding
import erniebot
erniebot.api_type = "aistudio"
erniebot.access_token = "<your_token>"
response = erniebot.Embedding.create(
    model="ernie-text-embedding",
    input=[
        "I am an artificial intelligence language model developed by Baidu. My Chinese name is Wenxin Yiyan, and my English name is ERNIE-Bot. I can assist you in completing a wide range of tasks and provide information on various topics, such as answering questions, providing definitions and explanations, and suggestions. If you have any questions, feel free to ask me."
    ])
print(response.get_result())
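To see how such embeddings are used downstream: the vector database compares them with a similarity measure, commonly cosine similarity. A minimal sketch follows; the two toy vectors are made up for illustration, whereas real ERNIE embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # 1.0 = same direction (very similar texts), 0.0 = unrelated
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Texts with similar meanings get embedding vectors that point in similar directions, which is what makes semantic retrieval possible.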
Import Chromadb Vector Database
Custom Embedding Function
Define a custom embedding function that converts text into embedding vectors. The ERNIE Bot library is used to create the text embeddings, and the Chromadb library is used to store and manage these embedding vectors.
import os
import erniebot
from typing import Dict, List, Optional
import chromadb
from chromadb.api.types import Documents, EmbeddingFunction, Embeddings
def embed_query(content):
    # Call the ERNIE text-embedding API for a single piece of text
    response = erniebot.Embedding.create(
        model="ernie-text-embedding",
        input=[content])
    result = response.get_result()
    return result

class ErnieEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        embeddings = []
        for text in input:
            response = embed_query(text)
            try:
                embedding = response[0]
                embeddings.append(embedding)
            except (IndexError, TypeError, KeyError) as e:
                print(f"Error processing text: {text}, Error: {e}")
        return embeddings

# Persist data to disk at "chromac"; use chromadb.Client() for in-memory only
chroma_client = chromadb.PersistentClient(path="chromac")
collection = chroma_client.create_collection(name="demo", embedding_function=ErnieEmbeddingFunction())
print(collection)
Select course content as the knowledge base:
https://aistudio.baidu.com/datasetdetail/260836
Use the LangChain library to process and split text documents
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
loader = TextLoader('./AI大课逐字稿.txt', encoding='utf-8')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=600, chunk_overlap=20)
docs = text_splitter.split_documents(documents)
docs
Convert the list of split documents into embedding vectors for further analysis and processing.
import uuid
docs_list=[]
metadatas=[]
ids=[]
for item in docs:
    docs_list.append(item.page_content)
    metadatas.append({"source": "AI大课逐字稿"})
    ids.append(str(uuid.uuid4()))
collection.add(
    documents=docs_list,
    metadatas=metadatas,
    ids=ids
)
query = "The lecturer said there are two erroneous mindsets when meeting VCs, what are they?"
results = collection.query(
    query_texts=[query],
    n_results=2
)
content = results['documents'][0]
prompt = f"""
User question: {query}
<context>
{content}
</context>
Answer the user question based on the knowledge points in <context>
"""
response = erniebot.ChatCompletion.create(model="ernie-4.0", messages=[{"role": "user", "content": prompt}])
print(response.get_result())
# Example output:
# The lecturer said there are two erroneous mindsets when meeting VCs:
# 1. Applying past methods to today's AI, such as comparing it to an OS. Once it is compared to an operating system, the conclusion is that there will inevitably be two or three monopolies in the world, and you might think no opportunities remain, which is a typical case of missing the boat.
# 2. People tend to be overly critical of new-generation entrepreneurs who are innovating, and some VCs easily fall into this trap. OpenAI has succeeded, and its success is plain for everyone to see, so we tend to worship it. But for entrepreneurs whose ideas have not yet taken shape, because they have not even started, we see only their shortcomings. This value judgment is wrong and easily misses promising projects.
Finally, we wrap everything into a single function. The collection contains the text embeddings stored in the previous steps; the function receives a user query, retrieves relevant passages from the database, and generates an answer.
def main(query):
    # Retrieve the two most relevant chunks for the query
    results = collection.query(
        query_texts=[query],
        n_results=2
    )
    content = results['documents'][0]
    # Build a prompt that grounds the answer in the retrieved context
    prompt = f"""
User question: {query}
<context>
{content}
</context>
Answer the user question based on the knowledge points in <context>
"""
    response = erniebot.ChatCompletion.create(model="ernie-4.0", messages=[{"role": "user", "content": prompt}])
    return response.get_result()

query = input("Please enter your query:")
print(main(query))
https://aistudio.baidu.com/projectdetail/7431640
Clearly, the application of RAG is not limited to this; various advanced applications of RAG have also emerged.
By continuously optimizing RAG to enhance its information understanding capabilities, it can understand questions more thoroughly and find highly relevant information to generate more accurate answers. For example, in response to the instruction “Talk about the characteristics of Golden Retrievers”, an advanced RAG model can understand that this is a question about a specific breed of dog and extract detailed information about Golden Retrievers from the knowledge base, such as physique, personality, history, etc., to align with the granularity of the question and provide detailed answers.
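One simple way an advanced pipeline improves question understanding is query expansion before retrieval. The sketch below uses a hypothetical hand-written synonym table; a production system might instead use an LLM to rewrite the query:

```python
def expand_query(query, synonyms):
    # Append known synonyms so retrieval also matches documents that
    # phrase the same concept differently
    extra = [s for word in query.lower().split() for s in synonyms.get(word, [])]
    return query + " " + " ".join(extra) if extra else query

# Hypothetical synonym table for illustration only
synonyms = {"characteristics": ["traits", "temperament"], "retrievers": ["dogs"]}
print(expand_query("Talk about the characteristics of Golden Retrievers", synonyms))
```

The expanded query now also matches passages that talk about “traits” or “temperament”, aligning retrieval with the granularity of the question.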
During the optimization of RAG, a series of related methods have also emerged.
In the fields of information retrieval and search engine optimization, a series of strategies can significantly enhance the performance of retrieval systems:
1. Index optimization improves retrieval accuracy and efficiency through finer data granularity, better index structures, added metadata, alignment optimization, and hybrid retrieval.
2. Optimization of the vector representation model deepens its understanding of a specific domain or problem through fine-tuning and dynamic embedding techniques.
3. Post-retrieval processing, such as reranking and prompt compression, further improves the relevance of retrieval results and user satisfaction.
4. Recursive retrieval and search engine optimization meet more complex and precise retrieval requirements through techniques such as recursive retrieval and subqueries.
5. RAG evaluation ensures the retrieval system meets user needs through independent evaluation and end-to-end evaluation methods.
Together, these strategies advance retrieval technology and provide users with more efficient and precise information services.
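Post-retrieval reranking, for example, can be sketched as follows. The word-overlap scorer here is only a placeholder standing in for a real cross-encoder relevance model, and the candidate chunks are made up:

```python
def rerank(query, candidates, scorer, top_n=2):
    # Rescore candidate chunks with a (typically more expensive) relevance
    # model and keep only the best top_n for the final prompt
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)[:top_n]

def overlap_scorer(query, chunk):
    # Placeholder scorer: word overlap stands in for a cross-encoder
    return len(set(query.lower().split()) & set(chunk.lower().split()))

candidates = [
    "unrelated text about cooking",
    "two erroneous mindsets when meeting vcs",
    "weather report for tomorrow",
]
print(rerank("erroneous mindsets when meeting vcs", candidates, overlap_scorer))
```

Keeping only the top few reranked chunks also acts as a crude form of prompt compression, since less irrelevant text reaches the generator.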
Refer to the diagram below:
In addition to these five methods, there are many other advanced RAG applications. Everyone can refer to relevant papers on the parts they are interested in for further study and understanding.