1. What is RAG?
Retrieval-Augmented Generation (RAG) is the process of optimizing a large language model's output so that it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast amounts of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to access internal knowledge bases specific to a domain or organization, all without the need to retrain the model. It is a cost-effective way to improve LLM output, keeping it relevant, accurate, and useful in various contexts.
Why is Retrieval-Augmented Generation Important?
LLMs are a key artificial intelligence (AI) technology that powers intelligent chatbots and other natural language processing (NLP) applications. The goal is to create bots that can answer user questions across various environments by cross-referencing authoritative knowledge sources. Unfortunately, the nature of LLM technology introduces unpredictability into LLM responses. Additionally, LLM training data is static, introducing a cutoff date for the knowledge it possesses. Known challenges faced by LLMs include:
- Providing false information when there is no answer.
- Providing outdated or generic information when users need specific, current responses.
- Generating responses based on non-authoritative sources.
- Producing inaccurate responses due to term confusion, where different training sources use the same terms to discuss different things.
You can think of a large language model as an overly enthusiastic new employee who refuses to keep up with current events but always answers every question with complete confidence. Unfortunately, such an attitude can undermine user trust, and it is not something you want your chatbot to emulate! RAG is one approach to addressing these challenges. It directs the LLM to retrieve relevant information from authoritative, pre-determined knowledge sources. Organizations gain better control over the generated output, and users gain insight into how the LLM produced its response.
How Does Retrieval-Augmented Generation Work?
Without RAG, the LLM takes user input and creates responses based on the information it has been trained on or what it already knows. RAG introduces an information retrieval component that extracts information from new data sources based on user input. Both the user query and relevant information are provided to the LLM. The LLM uses the new knowledge along with its training data to create better responses. The following sections outline the process.
Creating External Data
New data outside the LLM's original training dataset is referred to as external data. It can come from multiple sources such as APIs, databases, or document repositories, and it may exist in various formats, such as files, database records, or long-form text. An AI technique called embedding, which uses an embedding language model, converts the data into numerical representations and stores them in a vector database. This process creates a knowledge base that the generative AI model can understand.
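As a concrete illustration, the sketch below embeds a single text snippet and stores it in a vector store. It assumes LangChain4j (the Java port of LangChain, which the Spring Boot code in section 3 presumably builds on), its in-memory embedding store, and the local all-MiniLM embedding model; package names and signatures vary between library versions, so treat this as a sketch rather than the exact implementation:

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class EmbeddingDemo {
    public static void main(String[] args) {
        // The embedding model turns text into a numerical vector
        EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();

        // The vector database; in-memory here, a real vector store in production
        InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

        // Each external document (or chunk of one) is embedded and stored
        TextSegment segment = TextSegment.from("Employees receive 15 paid vacation days per year.");
        Embedding embedding = embeddingModel.embed(segment).content();
        store.add(embedding, segment);
    }
}
```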
Retrieving Relevant Information
The next step is to perform a relevance search. The user query is converted into a vector representation and matched against the vector database. For example, consider an intelligent chatbot that can answer HR questions for an organization. If an employee searches for “How many vacation days do I have?”, the system will retrieve the vacation policy document and the employee’s past vacation records. These specific documents will be returned because they are highly relevant to the employee’s input. Relevance is calculated and established using mathematical vector calculations and representations.
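Continuing the sketch above, the relevance search embeds the user query with the same embedding model and asks the vector store for the closest matches. The `findRelevant` call below follows the older LangChain4j embedding-store API and is an assumption; newer versions expose an equivalent search method:

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import java.util.List;

public class RelevanceSearchDemo {

    // Embed the query with the same model used for the documents,
    // then return the most similar stored segments with their scores.
    static List<EmbeddingMatch<TextSegment>> search(EmbeddingModel embeddingModel,
                                                    EmbeddingStore<TextSegment> store,
                                                    String query) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();
        return store.findRelevant(queryEmbedding, 3); // top 3 matches by vector similarity
    }
}
```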
Enhancing LLM Prompts
Next, the RAG model enhances the user input (or prompt) by adding the retrieved relevant data in context. This step utilizes prompt engineering techniques to effectively communicate with the LLM. Enhanced prompts allow the large language model to generate accurate answers to user queries.
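In its simplest form, this augmentation is just prompt assembly: the retrieved segments are placed into the prompt ahead of the user's question. A plain Java sketch (the template wording is only an example, not the exact prompt used in the demo project):

```java
import java.util.List;

public class PromptBuilder {

    // Combine the retrieved context with the original user question
    static String buildPrompt(String question, List<String> retrievedSegments) {
        StringBuilder context = new StringBuilder();
        for (String segment : retrievedSegments) {
            context.append("- ").append(segment).append('\n');
        }
        return "Answer the question using only the information below.\n"
                + "Information:\n" + context
                + "Question: " + question;
    }
}
```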
Updating External Data
The next question might be: what if the external data becomes outdated? To keep the information available for retrieval current, the documents should be updated asynchronously, along with their embedding representations. This can be done through automated real-time processes or periodic batch updates. It is a common challenge in data analytics, and the change management can be handled with various data science methods.
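In a Spring Boot application, one simple way to keep the embeddings fresh is a scheduled batch job that re-ingests the source documents. The sketch below assumes a hypothetical `DocumentIngestionService` that re-loads and re-embeds the documents; only the scheduling mechanics are shown:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class KnowledgeBaseRefresher {

    // Hypothetical service that re-loads the documents and rebuilds their embeddings
    private final DocumentIngestionService ingestionService;

    public KnowledgeBaseRefresher(DocumentIngestionService ingestionService) {
        this.ingestionService = ingestionService;
    }

    // Nightly batch update at 02:00; requires @EnableScheduling on the application class
    @Scheduled(cron = "0 0 2 * * *")
    public void refresh() {
        ingestionService.ingestDocuments();
    }
}
```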
2. What is LangChain?
LangChain is a framework for developing applications powered by language models. It primarily has two capabilities:
- It can connect LLM models with external data sources.
- It allows interaction with LLM models.
LLM Model: Large Language Model
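The code later in this article is Java with Spring Boot, so the sketches below assume LangChain4j, the Java counterpart of LangChain; its class names and signatures differ between versions. Once a chat model is configured, interacting with an LLM is a single call (the OpenAI model and the API key here are placeholders):

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class ChatDemo {
    public static void main(String[] args) {
        // Build a chat model; replace the placeholder key with a real one
        ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey("sk-...")
                .build();

        // Plain interaction with the LLM, no retrieval involved yet
        String answer = model.generate("What is retrieval-augmented generation?");
        System.out.println(answer);
    }
}
```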
3. Code Engineering
Experimental Purpose
Use LangChain to implement a RAG application.
pom.xml
Controller
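The controller is most likely a thin REST endpoint that forwards the user's question to a RAG service. A minimal sketch; the `RagService` name and the `/chat` route are assumptions rather than the repository's actual identifiers:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    // Assumed service that wraps the LangChain retrieval chain (see the Service sketch below)
    private final RagService ragService;

    public ChatController(RagService ragService) {
        this.ragService = ragService;
    }

    // Called from index.html; forwards the question and returns the generated answer
    @GetMapping("/chat")
    public String chat(@RequestParam("message") String message) {
        return ragService.chat(message);
    }
}
```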
Service
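The service wires retrieval and generation together. With the older LangChain4j API this is typically a `ConversationalRetrievalChain`, which performs retrieval, prompt augmentation, and generation in one `execute` call; treat the exact types as version-dependent assumptions:

```java
import dev.langchain4j.chain.ConversationalRetrievalChain;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    // Chain combining the retriever and the chat model, defined under "Initializing LangChain" below
    private final ConversationalRetrievalChain chain;

    public RagService(ConversationalRetrievalChain chain) {
        this.chain = chain;
    }

    // Retrieval, prompt augmentation, and generation all happen inside the chain
    public String chat(String message) {
        return chain.execute(message);
    }
}
```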
EmbeddingStoreLoggingRetriever
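Judging by its name, `EmbeddingStoreLoggingRetriever` delegates to the standard embedding-store retriever and logs what was retrieved, which makes it easy to see exactly which segments are fed to the LLM. A sketch against the older LangChain4j `Retriever` interface (an assumption; newer versions use `ContentRetriever` instead):

```java
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.retriever.Retriever;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.List;

public class EmbeddingStoreLoggingRetriever implements Retriever<TextSegment> {

    private static final Logger log = LoggerFactory.getLogger(EmbeddingStoreLoggingRetriever.class);

    // The real retriever backed by the embedding store
    private final Retriever<TextSegment> delegate;

    public EmbeddingStoreLoggingRetriever(Retriever<TextSegment> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<TextSegment> findRelevant(String text) {
        List<TextSegment> relevant = delegate.findRelevant(text);
        // Log the retrieved segments so the augmented context is visible
        relevant.forEach(segment -> log.info("Retrieved: {}", segment.text()));
        return relevant;
    }
}
```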
Components
Initializing documents
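Initializing the documents presumably means loading the knowledge-base files at startup, splitting them into segments, embedding them, and filling the embedding store. A sketch using LangChain4j's `EmbeddingStoreIngestor`; the document text is passed in as a string to keep the example self-contained, while the real project would load files from disk or the classpath:

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;

public class DocumentInitializer {

    // Split, embed, and store a document so it can be retrieved later
    static void ingest(String text, EmbeddingModel embeddingModel, EmbeddingStore<TextSegment> store) {
        Document document = Document.from(text);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 30)) // max segment size and overlap
                .embeddingModel(embeddingModel)
                .embeddingStore(store)
                .build();
        ingestor.ingest(document);
    }
}
```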
Initializing LangChain
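Initializing LangChain then comes down to exposing the embedding model, the embedding store, the logging retriever, and the chain as Spring beans. A sketch assuming the older LangChain4j builder API; in the real project the OpenAI key would come from application.yaml rather than being hard-coded:

```java
import dev.langchain4j.chain.ConversationalRetrievalChain;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.retriever.EmbeddingStoreRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LangChainConfig {

    @Bean
    public EmbeddingModel embeddingModel() {
        return new AllMiniLmL6V2EmbeddingModel(); // local embedding model, no API key required
    }

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return new InMemoryEmbeddingStore<>(); // swap for a real vector database in production
    }

    @Bean
    public ConversationalRetrievalChain chain(EmbeddingModel embeddingModel,
                                              EmbeddingStore<TextSegment> embeddingStore) {
        return ConversationalRetrievalChain.builder()
                .chatLanguageModel(OpenAiChatModel.builder().apiKey("sk-...").build()) // placeholder key
                .retriever(new EmbeddingStoreLoggingRetriever(
                        EmbeddingStoreRetriever.from(embeddingStore, embeddingModel, 3)))
                .build();
    }
}
```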
application.yaml
index.html
Only key code is shown here; for the complete code, please refer to the repository below.
Code Repository
- https://github.com/Harries/springboot-demo (rag)
4. Testing
Start the Spring Boot application and visit http://127.0.0.1:8080/
5. References
- https://github.com/miliariadnane/spring-boot-doc-rag-bot
- https://aws.amazon.com/cn/what-is/retrieval-augmented-generation/
- https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide
- http://www.liuhaihua.cn/archives/711424.html