Building a RAG Q&A System Using LangGraph

In this tutorial, we will learn how to build an intelligent document retrieval system using LangGraph. The system extracts information from web pages, segments it intelligently, and delivers precise Q&A through query analysis and vector retrieval.

1. Install Dependencies

pip install beautifulsoup4 langchain-openai langchain-community langchain-text-splitters langgraph

2. Import Necessary Libraries

import bs4
from typing import Literal
from typing_extensions import List, TypedDict, Annotated
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate

3. Load Web Content

WebBaseLoader is a powerful web content loader provided by LangChain, and its workflow is as follows:

  1. URL Fetching: Use the urllib library to fetch the raw HTML content from the specified URL
  2. HTML Parsing: Parse the HTML content with the BeautifulSoup4 library
  3. Content Filtering: Customize the parsing rules through the bs_kwargs parameter
     • In our example, we use SoupStrainer("li") to extract only the list-item content
     • This effectively filters out irrelevant content such as navigation bars and footers from the web page

loader = WebBaseLoader(
    web_paths=("https://github.com/jobbole/awesome-python-cn/blob/master/README.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer("li")
    ),
)
docs = loader.load()
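
If you want to confirm that the SoupStrainer filtering worked as expected, a quick check (not part of the original flow) is to look at what was loaded:

# Quick sanity check on the loaded documents
print(len(docs))                   # number of Document objects (typically one per URL)
print(docs[0].metadata)            # source URL and other metadata
print(docs[0].page_content[:200])  # first 200 characters of the extracted text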

4. Intelligent Document Segmentation

The text splitter uses a recursive strategy, with the following steps:

  1. Initial Splitting: First, attempt to split on the highest-level delimiters (such as newline characters and paragraph markers)
  2. Recursive Processing: If the resulting chunks are still too large, continue splitting with lower-level delimiters (such as periods and semicolons)
  3. Overlap Processing:
     • chunk_overlap=200 means that each pair of adjacent chunks shares 200 characters
     • This overlap preserves continuity of context and prevents sentences from being cut off abruptly
     • For example, if an important concept spans two chunks, the overlap allows it to be fully captured during retrieval

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
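
To see the recursive splitting and the overlap in action, here is a small illustrative sketch; the sample text and the tiny chunk sizes are made up purely for demonstration:

# Toy splitter with small sizes so the overlap is easy to see
demo_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
sample_text = "LangChain splits long documents into smaller chunks. " * 4
for chunk in demo_splitter.split_text(sample_text):
    print(repr(chunk))
# Adjacent chunks share roughly chunk_overlap characters, so context is not cut off abruptly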

5. Metadata Enhancement

To achieve smarter retrieval, we add location-related metadata to the documents. This metadata can help us filter more precisely during retrieval:

total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

By adding section metadata, we can:

  • Perform targeted searches during retrieval
  • Search only the content at the beginning, middle, or end of the document
  • Improve retrieval accuracy

6. Define Query Structure

Use TypedDict to define the data structure for queries to ensure standardization and maintainability:

class Search(TypedDict):
    """Search query."""
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ..., 
        "Section to query."
    ]
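
For reference, a value that conforms to this schema is just a dictionary; the example below is hypothetical and shows roughly what the structured output produced later by the LLM looks like:

# Hypothetical structured query for a question about content near the end of the page
example_query: Search = {
    "query": "recommended Python libraries",
    "section": "end",
}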

7. Vector Store Setup

InMemoryVectorStore provides efficient vector storage and retrieval capabilities:

  • Use OpenAI's text-embedding-3-large model to convert text into high-dimensional vectors
  • Each document chunk is converted into a unique vector representation

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
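
At this point you can already query the store directly; a minimal sketch follows (the query string is made up, and k is the standard parameter limiting the number of returned documents):

# Manual similarity search against the vector store
results = vector_store.similarity_search("HTTP client libraries", k=2)
for doc in results:
    print(doc.metadata.get("section"), "-", doc.page_content[:80])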

8. Set Up Language Model and Prompt Template

The design of the prompt template considers the following key points:

  1. Context Injection:
     • Provide the retrieved document content to the language model as context
     • Use {context} and {question} placeholders to insert content dynamically
  2. Answer Constraints:
     • Limit answers to a maximum of three sentences to keep them concise
     • Clearly instruct the model to acknowledge uncertainty if the answer is unknown, avoiding fabricated answers
     • Add a fixed closing phrase "thanks for asking!" to maintain a consistent interaction style

llm = ChatOpenAI(model="gpt-4o-mini")

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
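
To see exactly what the model will receive, you can render the template with sample values (the strings below are placeholders, not real retrieved content):

# Preview the fully formatted prompt
preview = prompt.invoke({
    "context": "Example context retrieved from the document.",
    "question": "What does this article recommend?",
})
print(preview.to_string())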

9. Build Processing Flow

LangGraph provides a flexible workflow management system that allows us to break complex processing flows into multiple independent steps and coordinate the data flow between these steps through state management.

9.1 State Management

First, we define a TypedDict to manage the state throughout the processing flow:

class State(TypedDict):
    question: str             # User's original question
    query: Search             # Structured query information
    context: List[Document]   # Retrieved relevant documents
    answer: str               # Final answer

This state dictionary contains all the key data in the processing flow:

  • question: Stores the user's original input question
  • query: Stores the structured query produced by query analysis (using the previously defined Search type)
  • context: Stores the relevant documents retrieved from the vector store
  • answer: Stores the final generated answer

9.2 Processing Steps

The processing flow is broken down into three main steps. Each step is an independent function that receives the current state and returns the portion of the state it updates:

1. Query Analysis (analyze_query):

def analyze_query(state: State):
    # Use the LLM to convert the natural-language question into a structured query
    structured_llm = llm.with_structured_output(Search)
    # Call the LLM for structured output
    query = structured_llm.invoke(state["question"])
    # Return the updated state portion
    return {"query": query}

The purpose of this step is to:

  • Receive the user's natural-language question
  • Use the LLM to analyze the question and generate a structured query
  • Determine which part of the document (beginning, middle, or end) the query should search

2. Document Retrieval (retrieve):

def retrieve(state: State):
    query = state["query"]
    # Use the vector store for similarity search
    retrieved_docs = vector_store.similarity_search(
        query["query"],  # Use the query text from the structured query
        filter=lambda doc: doc.metadata.get("section") == query["section"],  # Filter by metadata
    )
    # Return the retrieved documents
    return {"context": retrieved_docs}

This step does the following:

  • Retrieves the structured query from the state
  • Uses the query text to search for similar documents in the vector store
  • Filters documents using the section metadata
  • Returns the list of most relevant documents

3. Answer Generation (generate):

def generate(state: State):
    # Merge the content of the retrieved documents
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    # Use the prompt template to construct the input message
    messages = prompt.invoke({
        "question": state["question"],  # Original question
        "context": docs_content         # Merged document content
    })
    # Use the LLM to generate the answer
    response = llm.invoke(messages)
    # Return the generated answer
    return {"answer": response.content}

The processing flow of this step is:

  • Merge the content of all retrieved documents into a single text
  • Use the prompt template to construct an input containing the context and the question
  • Call the LLM to generate the final answer
  • Return the generated answer text

10. Assemble the Processing Graph

Use LangGraph to chain the processing steps into a directed acyclic graph:

  • The output of each step automatically updates the state for use in the next step
  • Conditional branching and parallel processing are supported (this example uses a simple linear flow)

graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
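
Optionally, you can print a Mermaid description of the compiled graph to confirm the analyze_query → retrieve → generate sequence; treat this as an optional extra rather than part of the original tutorial:

# Print a Mermaid diagram of the graph structure
print(graph.get_graph().draw_mermaid())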

11. Usage Example

for message, metadata in graph.stream(
    {"question": "Please list the recommended Python libraries at the end of the article?"},
    stream_mode="messages",
):
    print(message.content, end="")
# The recommended Python libraries at the end of the article include: Pyro, PyUserInput, scapy, wifi, Pingo, keyboard, mouse, Python-Future, Six, and modernize. Thanks for asking!
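
If token-level streaming is not needed, the same graph can be run with invoke, which returns the final state as a dictionary; a minimal sketch:

# Non-streaming alternative: run the whole graph and read the final state
result = graph.invoke({"question": "Please list the recommended Python libraries at the end of the article?"})
print(result["query"])   # structured query produced by analyze_query
print(result["answer"])  # final generated answer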

12. Summary

This project demonstrates how to build a complete intelligent document retrieval system using LangGraph. The main features of the system include:

  1. Intelligent web content extraction
  2. Intelligent segmentation and metadata enhancement of documents
  3. Vectorized storage and similarity retrieval
  4. Intelligent Q&A based on an LLM
  5. A process-oriented, graph-based architecture

With this system, we can easily implement intelligent retrieval and Q&A functionality for large documents. The architecture is not only suitable for web content but can also be extended to other types of document processing scenarios.

13. Complete Code

import bs4
from typing import Literal
from typing_extensions import List, TypedDict, Annotated
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate


# Web content loading
# WebBaseLoader uses urllib to load HTML from web URLs and BeautifulSoup to parse it into text.
# We can customize the HTML-to-text parsing by passing parameters to the BeautifulSoup parser;
# here we only parse "li" HTML tags
loader = WebBaseLoader(
    web_paths=("https://github.com/jobbole/awesome-python-cn/blob/master/README.md",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer("li")
    ),
)
docs = loader.load()
# print("docs: ", docs)

# Document segmentation
# The recursive character text splitter recursively splits the document on common delimiters (like newlines)
# until each chunk is an appropriate size. This is the recommended text splitter for generic text use cases.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Query analysis: add metadata to documents so optimized search queries can be built from the raw user input
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

# Define a schema for our search queries
class Search(TypedDict):
    """Search query."""
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query."
    ]


# Document embedding storage
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)

# Large language model
llm = ChatOpenAI(model="gpt-4o-mini")

# Prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

# Define the state of the graph: question, structured query, context, and answer
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str

# Query analysis step: extract the input question into the Search query schema
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}

# Retrieval step: use the structured query for similarity search
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}

# Generation step: format the retrieved context and the original question into the prompt for the chat model
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# Compile the graph
# Connect the query analysis, retrieval, and generation steps into a single sequence
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

for message, metadata in graph.stream(
    {"question": "Please list the recommended Python libraries at the end of the article?"},
    stream_mode="messages",
):
    print(message.content, end="")
# The recommended Python libraries at the end of the article include: Pyro, PyUserInput, scapy, wifi, Pingo, keyboard, mouse, Python-Future, Six, and modernize. Thanks for asking!

