In this tutorial, we will learn how to build an intelligent document retrieval system using LangGraph. The system extracts information from web pages, segments it intelligently, and answers questions precisely through query analysis and vector retrieval.
1. Install Dependencies
pip install beautifulsoup4 langchain langchain-openai langchain-community langchain-text-splitters langgraph
2. Import Necessary Libraries
import bs4
from typing import Literal
from typing_extensions import List, TypedDict, Annotated
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
3. Load Web Content
WebBaseLoader is a powerful web content loader provided by LangChain. Its workflow is as follows:

1. URL Fetching: use the urllib library to fetch the raw HTML from the specified URL
2. HTML Parsing: use the BeautifulSoup4 library to parse the HTML content
3. Content Filtering: customize the parsing rules through the bs_kwargs parameter
   - In our example, we use SoupStrainer("li") to extract only the list item content
   - This effectively filters out irrelevant content such as navigation bars and footers from the web page
loader = WebBaseLoader(
    web_paths=("https://github.com/jobbole/awesome-python-cn/blob/master/README.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer("li")
    ),
)
docs = loader.load()
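Before moving on, it helps to confirm that the filter captured what we expect. A minimal sanity check (output will vary with the live page content):
# WebBaseLoader returns one Document per URL
print(len(docs))
# Preview the extracted list-item text
print(docs[0].page_content[:300])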
4. Intelligent Document Segmentation
The text splitter uses a recursive strategy, with the following steps:

1. Initial Splitting: first attempt to split on the highest-level delimiters (such as newline characters and paragraph markers)
2. Recursive Processing: if the resulting chunks are still too large, continue splitting on secondary delimiters (such as periods and semicolons)
3. Overlap Processing:
   - chunk_overlap=200 means that each pair of adjacent chunks shares 200 characters
   - This overlap ensures continuity of context, preventing sentences from being abruptly cut off
   - For example, if an important concept spans two chunks, the overlap allows it to be fully captured during retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
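A quick look at the split results (actual counts depend on the page at load time):
# Inspect the chunking results
print(f"Split into {len(all_splits)} chunks")
print(f"First chunk is {len(all_splits[0].page_content)} characters long")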
5. Metadata Enhancement
To enable smarter retrieval, we add position-based metadata to the documents; this metadata lets us filter more precisely at query time:
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"
By adding section metadata, we can:
- Perform targeted searches during retrieval
- Restrict a search to the beginning, middle, or end of the document
- Improve retrieval accuracy
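To verify the labeling, a quick tally of the sections (a minimal sketch; the counts shown are hypothetical):
from collections import Counter
# Count how many chunks landed in each section
print(Counter(doc.metadata["section"] for doc in all_splits))
# e.g. Counter({'end': 35, 'beginning': 34, 'middle': 34})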
6. Define Query Structure
Use TypedDict to define the data structure for queries to ensure standardization and maintainability:
class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]
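For a question such as "What libraries are recommended at the end?", a conforming query value would look like this (a hypothetical example, not produced by the model):
# A hypothetical structured query matching the Search schema
example_query: Search = {
    "query": "recommended Python libraries",
    "section": "end",
}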
7. Vector Store Setup
InMemoryVectorStore provides efficient vector storage and retrieval capabilities:
- Uses OpenAI's text-embedding-3-large model to convert text into high-dimensional vectors
- Each document chunk is converted into a unique vector representation
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
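A quick retrieval check against the store (the query string is illustrative):
# Retrieve the two chunks most similar to an example query
for doc in vector_store.similarity_search("web frameworks", k=2):
    print(doc.metadata["section"], "->", doc.page_content[:80])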
8. Set Up Language Model and Prompt Template
The design of the prompt template considers the following key points:

1. Context Injection:
   - Provide the retrieved document content as context to the language model
   - Use {context} and {question} placeholders to dynamically insert content
2. Answer Constraints:
   - Limit answers to a maximum of three sentences to keep them concise
   - Explicitly instruct the model to say so when it does not know the answer, avoiding fabricated answers
   - Add a fixed closing phrase "thanks for asking!" to maintain a consistent interaction style
llm = ChatOpenAI(model="gpt-4o-mini")
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
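To see exactly what the model will receive, the template can be rendered with placeholder values (a minimal sketch):
# Render the template to inspect the final prompt text
preview = prompt.invoke({"context": "<retrieved chunks>", "question": "<user question>"})
print(preview.to_string())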
9. Build Processing Flow
LangGraph provides a flexible workflow management system that allows us to break complex processing flows into multiple independent steps and coordinate the data flow between these steps through state management.
9.1 State Management
First, we define a TypedDict to manage the state throughout the processing flow:
class State(TypedDict):
    question: str            # User's original question
    query: Search            # Structured query information
    context: List[Document]  # Retrieved relevant documents
    answer: str              # Final answer
This state dictionary contains all the key data in the processing flow:
- question: stores the user's original input question
- query: stores the structured query after analysis (using the previously defined Search type)
- context: stores the relevant documents retrieved from the vector store
- answer: stores the final generated answer
9.2 Processing Steps
The processing flow is broken down into three main steps. Each step is an independent function that receives the current state and returns only the portion of the state it updates:
1. Query Analysis (analyze_query):
def analyze_query(state: State):
    # Use the LLM to convert the natural language question into a structured query
    structured_llm = llm.with_structured_output(Search)
    # Invoke the LLM for structured output
    query = structured_llm.invoke(state["question"])
    # Return the updated portion of the state
    return {"query": query}
The purpose of this step is:
- Receive the user's natural language question
- Use the LLM to analyze the question and generate a structured query
- Determine which part of the document (beginning, middle, or end) the query should search
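Assuming a valid OpenAI API key is configured, this step can be exercised on its own (the printed output is illustrative):
# Run the analysis step in isolation
print(analyze_query({"question": "What is recommended at the end of the article?"}))
# e.g. {'query': {'query': 'recommended libraries', 'section': 'end'}}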
2. Document Retrieval (retrieve):
def retrieve(state: State):
    query = state["query"]
    # Perform a similarity search against the vector store
    retrieved_docs = vector_store.similarity_search(
        query["query"],  # Use the query text from the structured query
        filter=lambda doc: doc.metadata.get("section") == query["section"],  # Metadata filtering
    )
    # Return the retrieved documents
    return {"context": retrieved_docs}
This step performs the following:

- Retrieves the structured query from the state
- Uses the query text to search for similar documents in the vector store
- Filters documents using the section metadata
- Returns the list of most relevant documents
3. Answer Generation (generate):
def generate(state: State):
    # Merge the content of the retrieved documents
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    # Use the prompt template to construct the input message
    messages = prompt.invoke({
        "question": state["question"],  # Original question
        "context": docs_content,        # Merged document content
    })
    # Use the LLM to generate the answer
    response = llm.invoke(messages)
    # Return the generated answer
    return {"answer": response.content}
The processing flow of this step is:
- Merge the content of all retrieved documents into a single text
- Use the prompt template to construct an input containing the context and the question
- Call the LLM to generate the final answer
- Return the generated answer text
10. Assemble Processing Graph
Use LangGraph to chain the various processing steps into a directed acyclic graph:
- The output of each step automatically updates the state for use by the next step
- Conditional branching and parallel processing are supported (this example uses a simple linear flow)
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
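Besides streaming, the compiled graph can also be invoked synchronously, which returns the final state as a plain dict (a minimal sketch):
# Synchronous invocation: the result contains every State field
result = graph.invoke({"question": "What libraries are recommended at the end of the article?"})
print(result["answer"])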
11. Usage Example
for message, metadata in graph.stream(
    {"question": "Please list the recommended Python libraries at the end of the article?"},
    stream_mode="messages",
):
    print(message.content, end="")
# The recommended Python libraries at the end of the article include: Pyro, PyUserInput, scapy, wifi, Pingo, keyboard, mouse, Python-Future, Six, and modernize. Thanks for asking!
12. Summary
This project demonstrates how to build a complete intelligent document retrieval system using LangGraph. The main features of the system include:
1. Intelligent web content extraction
2. Intelligent segmentation and metadata enhancement of documents
3. Vectorized storage and similarity retrieval
4. LLM-based intelligent Q&A
5. Process-oriented architecture
Through this system, we can easily implement intelligent retrieval and Q&A functionality for large documents. This architecture is not only suitable for web content but can also be extended to other types of document processing scenarios.
13. Complete Code
import bs4
from typing import Literal
from typing_extensions import List, TypedDict, Annotated
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
# Web content loading
# WebBaseLoader uses urllib to load HTML from web URLs and BeautifulSoup to parse it into text.
# We can customize the HTML-to-text parsing by passing parameters to the BeautifulSoup parser;
# here we parse only the "li" HTML tags
loader = WebBaseLoader(
    web_paths=("https://github.com/jobbole/awesome-python-cn/blob/master/README.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer("li")
    ),
)
docs = loader.load()
# print("docs: ", docs)
# Document segmentation
# The recursive character text splitter will recursively split the document using common delimiters (like new lines)
# until each chunk is of appropriate size. This is the recommended text splitter for general text use cases.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
# Query analysis: Add metadata to documents to transform or build optimized search queries from the original user input
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"
# Define a pattern for our search queries
class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]
# Document embedding storage
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
# Large language model
llm = ChatOpenAI(model="gpt-4o-mini")
# Prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
# Define the state of the graph, including: question, query pattern, context, and answer
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str
# Query analysis step: Extract input question information into the specified query pattern Search
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}
# Retrieval step: run a similarity search using the structured query
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}
# Generation step: Format the retrieved context and original question into the prompt for the chat model
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
# Compile the graph
# Connect the analysis, retrieval, and generation steps into a single sequence
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
for message, metadata in graph.stream(
    {"question": "Please list the recommended Python libraries at the end of the article?"},
    stream_mode="messages",
):
    print(message.content, end="")
# The recommended Python libraries at the end of the article include: Pyro, PyUserInput, scapy, wifi, Pingo, keyboard, mouse, Python-Future, Six, and modernize. Thanks for asking!