Building a PDF Q&A Application Using Ollama and LangChain

Yesterday, I wrote about how Ollama builds a knowledge base for Q&A from existing documents (Ollama + LangChain). That post left out the Streamlit graphical interface for the knowledge-base Q&A because the code did not run at the time, so today I am filling in that part. At the end there is also a section on installing Ollama models offline, which you can refer to if needed.

We will build a simple application similar to ChatPDF, where users can upload PDF files and ask questions that an existing large model answers using the submitted documents. It primarily uses LangChain, Ollama, and Streamlit. The code is not complex: we split the documents into chunks, use an embedding model to convert the chunks into vectors, and store them in a Chroma database.

1. Running Ollama

Install the appropriate version of Ollama from the official website and start the service in the background. Enter http://localhost:11434/ in your browser; if you see “Ollama is running”, the background service is active. You can also start the Ollama service manually: enter ollama serve on the command line, and do not close that command window, as it displays backend information.
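If you prefer to verify the service from code rather than the browser, here is a minimal sketch (assuming the requests package is installed):

import requests

# The root endpoint replies with "Ollama is running" when the service is up
resp = requests.get("http://localhost:11434/")
print(resp.status_code, resp.text)  # expect: 200 Ollama is running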

Personally, I find the service started from this command window slightly more efficient than the one started from the tray icon in the bottom-right corner, as it seems to make better use of the GPU; this is just my impression. With the tray-icon service, the CPU was fully utilized and the machine froze, while running the service from the command line reduced CPU usage and increased GPU usage.

To pull Mistral-7B, enter the following command in the command line:

ollama pull mistral

To check if the pull was successful, you can use:

ollama list

to list the installed models.

2. Building RAG

Retrieval-Augmented Generation (RAG) is a technique that uses information from private or proprietary data sources to assist text generation. It combines a retrieval model, which searches large datasets or knowledge bases, with a generative model, such as a large language model (LLM), which produces readable text responses based on the retrieved information.

The process is summarized in the diagram below.

First, the documents are split into smaller chunks to fit the token limit of the LLM; second, an embedding model vectorizes these chunks, and the vectors are stored in Chroma. User queries are then handled as follows: when a user asks a question, the retriever uses vector similarity search to fetch the relevant context (document chunks); based on the question and the retrieved context, we build a prompt and ask the LLM server for a prediction.
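As a minimal sketch of this flow (assuming the same langchain-community packages used in the full code below, and that mistral has already been pulled into Ollama):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

docs = [Document(page_content="Ollama runs large language models locally.")]

# 1. Split the documents into chunks that fit the model's context window
chunks = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100).split_documents(docs)

# 2. Vectorize the chunks and store them in Chroma
store = Chroma.from_documents(chunks, embedding=FastEmbedEmbeddings())

# 3. Retrieve the chunks most similar to the question
question = "What does Ollama do?"
context = store.as_retriever(search_kwargs={"k": 3}).invoke(question)

# 4. Build a prompt from the question and context, then query the local model
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(ChatOllama(model="mistral").invoke(prompt).content)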

Loading PDF Files

from langchain_community.document_loaders import PyPDFLoader

# load_and_split returns a list of Documents, one per page by default
loader = PyPDFLoader("./you_dont_know_javascript_volume1.pdf")
pages = loader.load_and_split()
print(pages)

(https://stackoverflow.com/questions/76431655/langchain-pypdfloader)

After loading the PDF file, we still need to split it, convert the chunks into vectors, and store them in the database. The steps and code are the same as in yesterday's post, with only a small change to the data source, so you can load existing PDFs.

Loading Files with Streamlit

Streamlit is an open-source Python library for creating data applications. It allows developers to quickly build interactive data science and machine learning applications using Python without writing a lot of front-end code. With Streamlit, developers can easily integrate data processing, visualization, machine learning model deployment, and other functions into a user-friendly web application.

Streamlit provides a simple API, allowing interactive components such as sliders, dropdown menus, checkboxes, and data visualization components like charts and maps to be created with just a few lines of Python code. Streamlit automatically converts the code into a web application that users can access through their browser.
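For example, a few lines are enough for a complete interactive page (a minimal sketch; the widget labels and values are illustrative):

import streamlit as st

# A slider and a dropdown, rendered as a web page by `streamlit run demo.py`
n = st.slider("Number of chunks to retrieve", 1, 10, 3)
model = st.selectbox("Model", ["mistral", "llama2"])
st.write(f"Retrieving {n} chunks with {model}")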

Using Streamlit to upload a PDF file:

import streamlit as st
from PyPDF2 import PdfReader

uploaded_file = st.file_uploader("Upload your PDF")
if uploaded_file is not None:
    reader = PdfReader(uploaded_file)

Combining the uploader with text extraction, each page is wrapped in a LangChain Document, with its page number recorded in the metadata:

import streamlit as st
from PyPDF2 import PdfReader
from langchain_core.documents import Document
from langchain_community.chat_models import ChatOllama

model_local = ChatOllama(model="mistral")
uploaded_file = st.file_uploader("Upload your PDF")
docs = []
if uploaded_file is not None:
    reader = PdfReader(uploaded_file)
    # Wrap each page's text in a Document, recording the page number
    for i, page in enumerate(reader.pages, start=1):
        docs.append(Document(page_content=page.extract_text(), metadata={'page': i}))

Add a sidebar in Streamlit:

with st.sidebar:
    st.title('🤗💬 PDF Chat App')
    st.markdown('## About')
    st.markdown('This app is an LLM-powered chatbot built using:')
    st.markdown('- [Streamlit](https://streamlit.io/)')
    st.markdown('- [LangChain](https://python.langchain.com/)')
    st.markdown('- [OpenAI](https://platform.openai.com/docs/models) LLM model')
    #add_vertical_space(5)
    st.write('Made with Feng Ge Python Notes')

To run the Streamlit code, enter streamlit run app.py (substituting your script name) on the command line.

3. Final Interactive Interface

(Screenshot: the final Streamlit chat interface)

4. Code

import os
import tempfile

import streamlit as st
from streamlit_chat import message
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter


class ChatPDF:
    vector_store = None
    retriever = None
    chain = None

    def __init__(self):
        self.model = ChatOllama(model="mistral")
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
        self.prompt = PromptTemplate.from_template(
            """
            <s> [INST] You are an assistant for question-answering tasks. Use the following pieces of retrieved context 
            to answer the question. If you don't know the answer, just say that you don't know. Use three sentences
             maximum and keep the answer concise. [/INST] </s> 
            [INST] Question: {question} 
            Context: {context} 
            Answer: [/INST]
            """
        )

    def ingest(self, pdf_file_path: str):
        docs = PyPDFLoader(file_path=pdf_file_path).load()
        chunks = self.text_splitter.split_documents(docs)
        chunks = filter_complex_metadata(chunks)

        self.vector_store = Chroma.from_documents(documents=chunks, embedding=FastEmbedEmbeddings())
        self.retriever = self.vector_store.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={
                "k": 3,
                "score_threshold": 0.5,
            },
        )

        self.chain = ({"context": self.retriever, "question": RunnablePassthrough()}
                      | self.prompt
                      | self.model
                      | StrOutputParser())

    def ask(self, query: str):
        if not self.chain:
            return "Please, add a PDF document first."

        return self.chain.invoke(query)

    def clear(self):
        # Called when new files are uploaded; drop the old index and chain
        self.vector_store = None
        self.retriever = None
        self.chain = None

st.set_page_config(page_title="ChatPDF")
with st.sidebar:
    st.title('🤗💬 PDF Chat App')
    st.markdown('## About')
    st.markdown('This app is an LLM-powered chatbot built using:')
    st.markdown('- [Streamlit](https://streamlit.io/)')
    st.markdown('- [LangChain](https://python.langchain.com/)')
    st.markdown('- [OpenAI](https://platform.openai.com/docs/models) LLM model')
    #add_vertical_space(5)
    st.write('Made with Feng Ge Python Notes')


def display_messages():
    st.subheader("Chat")
    for i, (msg, is_user) in enumerate(st.session_state["messages"]):
        message(msg, is_user=is_user, key=str(i))
    st.session_state["thinking_spinner"] = st.empty()


def process_input():
    if st.session_state["user_input"] and len(st.session_state["user_input"].strip()) > 0:
        user_text = st.session_state["user_input"].strip()
        with st.session_state["thinking_spinner"], st.spinner("Thinking"):
            agent_text = st.session_state["assistant"].ask(user_text)

        st.session_state["messages"].append((user_text, True))
        st.session_state["messages"].append((agent_text, False))


def read_and_save_file():
    st.session_state["assistant"].clear()
    st.session_state["messages"] = []
    st.session_state["user_input"] = ""

    for file in st.session_state["file_uploader"]:
        with tempfile.NamedTemporaryFile(delete=False) as tf:
            tf.write(file.getbuffer())
            file_path = tf.name

        with st.session_state["ingestion_spinner"], st.spinner(f"Ingesting {file.name}"):
            st.session_state["assistant"].ingest(file_path)
        os.remove(file_path)


def main():
    if len(st.session_state) == 0:
        st.session_state["messages"] = []
        st.session_state["assistant"] = ChatPDF()

    st.header("ChatPDF")

    st.subheader("Upload a document")
    st.file_uploader(
        "Upload document",
        type=["pdf"],
        key="file_uploader",
        on_change=read_and_save_file,
        label_visibility="collapsed",
        accept_multiple_files=True,
    )

    st.session_state["ingestion_spinner"] = st.empty()

    display_messages()
    st.text_input("Message", key="user_input", on_change=process_input)

if __name__ == "__main__":
    main()

This code uses the Streamlit library to build the user interface; it processes uploaded PDF files and generates chat messages by invoking methods of the ChatPDF class. Users type a message into the text box, and the program generates a response based on the input and displays it on the page.

  1. Using st.set_page_config to set the page configuration, including the page title as “ChatPDF”.
  2. Using st.sidebar to create a sidebar that displays the application title and about information.
  3. Defining the display_messages function to show chat messages, and the process_input function to handle user input and generate reply messages.
  4. Defining the read_and_save_file function to read and save uploaded PDF files, and invoking the ingest method from the ChatPDF class to process the document.
  5. Defining the main function as the main page of the application, which includes uploading PDF documents, displaying chat messages, and user input sections.

In Streamlit, st.session_state is a dictionary used to store session state for the application. Session state is data tied to a specific user session; it persists across reruns and remains available throughout the session. Using st.session_state lets different parts of a Streamlit application share data and maintain state across multiple user interactions, which is useful for tracking user input, storing configuration options, or preserving session state. Here, st.session_state["messages"] holds the user's messages and the model's replies, while st.session_state["assistant"] is the ChatPDF object that wraps Ollama: uploading PDFs, splitting, converting to vectors, storing in the database, and invoking the model. The message function (from the streamlit_chat package) renders the conversation stored in st.session_state["messages"].
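As a minimal, self-contained sketch of how st.session_state survives reruns (the key name clicks is illustrative):

import streamlit as st

# Initialize once; the value persists across the reruns Streamlit
# triggers on every widget interaction
if "clicks" not in st.session_state:
    st.session_state["clicks"] = 0

if st.button("Click me"):
    st.session_state["clicks"] += 1

st.write(f"Button clicked {st.session_state['clicks']} times")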

In Streamlit, st.spinner("Thinking") is a widget used to display a loading indicator while a long-running task executes. It tells users that the application is processing, which improves the experience by making the application feel responsive.
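A minimal usage sketch (time.sleep stands in for the LLM call):

import streamlit as st
import time

# Everything inside the with-block runs while the spinner is shown
with st.spinner("Thinking"):
    time.sleep(2)
st.success("Done")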

5. Code Acquisition

Reply with ‘streamlit-chat’ in the public-account backend to get the relevant example code.

6. Using Ollama with Self-Downloaded Models

Someone in the group asked how to use models downloaded from Hugging Face in .gguf format. After some searching, I found the method:

1. Manual Download

Download the file mistrallite.Q4_K_M.gguf to your local machine; any other file works too, as long as it is a model format supported by Ollama.

2. Create a Modelfile

Write the correct path to the downloaded file in a Modelfile:

FROM ./mistrallite.Q4_K_M.gguf

3. Create a New Model

Create a new model using the Modelfile:

ollama create mistrallite -f Modelfile

The -f option points to the Modelfile; the command above creates a model named mistrallite, which you can then run with ollama run mistrallite to ask questions.
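A Modelfile can also carry extra settings. For example (a sketch using standard Modelfile instructions; the values are illustrative):

FROM ./mistrallite.Q4_K_M.gguf

# Optional: tune sampling and set a default system prompt
PARAMETER temperature 0.7
SYSTEM You are a concise assistant for question answering.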


If your network is unstable, you can use this method to install a model manually.

7. Learning and Exchange Groups

Two Python learning groups have been formed with a very good learning atmosphere. Various large models are emerging: ChatGPT-4 is already impressive, now there is Claude 3, and open-source tools like Ollama let you run large models on consumer-grade machines. Yesterday, Baidu's Robin Li said, “In the future, there will be no such profession as programmer”; everyone will have programming skills. Just a few days earlier, Nvidia's Jensen Huang said something similar: leave programming to AI. But where does your foundation come from? Let's learn about large models together; I want to form a large-model exchange group. If you are interested in large models, add me on WeChat with a note, and I will add you to the group. More than 30 people joined yesterday alone. I will share some benefits in the group, including a useful tool that integrates ChatGPT, Claude, Bard, and other large models; reply with “large model” in the backend to get it.

The PyQt6 Learning Exchange Group 1 has been running for almost a year with active discussion. It now has over 400 members, including many experts and industry veterans, forming a good Python/PyQt6 exchange ecosystem that has benefited me greatly. The group is loosely managed, and open access can become chaotic, so entry is now by invitation only. PyQt6 Learning Exchange Group 2 is now forming; if you want to join, follow the public account below and enter the group according to the instructions. The group periodically shares learning materials, code, videos, and so on. Also, if my writing helps you, remember to like, share, and click “read”.


Recently, I have also been learning JavaScript and want to form a group for joint study; those interested can scan the code to join the discussion.

Evening of March 13, 2024
