Ollama recently released an official Python library. This article is a guide to integrating Ollama into Python, showing how developers can easily make use of local AI capabilities.
The previous article, "Integrating a Large Model into Your Laptop: Ollama + Open WebUI," discussed how to deploy a large model locally with Ollama. This time, we build on that foundation to create a web Q&A bot.
Design Idea:
1
First, we create a new Python environment and install the Ollama Python library.
conda create -n ollama python=3.11
conda activate ollama
pip3 install ollama
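Before going further, it is worth checking that the library can talk to the local Ollama service. The following is a minimal sketch, assuming the mistral model has already been pulled with ollama pull mistral:
# Quick check that the Ollama Python library can reach the local service
# (assumes the mistral model has been pulled with `ollama pull mistral`)
import ollama

response = ollama.chat(
    model='mistral',
    messages=[{'role': 'user', 'content': 'Say hello in one sentence.'}]
)
print(response['message']['content'])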
Next, we install the necessary dependencies: langchain (prompt and chain orchestration), beautifulsoup4 (HTML parsing), chromadb (vector storage for embeddings), and gradio (the web interface), and then create a Python file.
pip3 install langchain beautifulsoup4 chromadb gradio
# Create Python file
New-Item QAmachine.py
2
In the file, first import the dependencies and set up a web content loader, a text splitter, and an embedding store so documents can be loaded, split, and retrieved later. Most of this uses LangChain's interfaces, which are very handy in large-model applications. Note that the LLM used here is Mistral, served locally through Ollama and well suited to text processing.
import gradio as gr
import bs4
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
import ollama
# Function to load, split, and retrieve documents
def load_and_retrieve_docs(url):
    # Initialize the document loader with the given URL
    loader = WebBaseLoader(
        web_paths=(url,),
        bs_kwargs=dict()  # Empty dictionary of BeautifulSoup initialization parameters
    )
    # Load the documents
    docs = loader.load()
    # Initialize the document splitter, setting chunk size and overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    # Split the documents
    splits = text_splitter.split_documents(docs)
    # Initialize the embedding model
    embeddings = OllamaEmbeddings(model="mistral")
    # Create a vector store from the document splits and embeddings
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
    # Convert the vector store to a retriever and return it
    return vectorstore.as_retriever()
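You can sanity-check this retriever on its own before wiring it into the full chain. The sketch below uses a placeholder URL and query, so substitute any page you like:
# Quick test of the retriever on its own (URL and query are placeholders)
retriever = load_and_retrieve_docs("https://example.com/some-article")
docs = retriever.invoke("What is this article about?")
print(f"Retrieved {len(docs)} chunks")
print(docs[0].page_content[:200])  # preview the first chunk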
3
Next, merge the retrieved document contents and define the RAG chain function, so the model answers questions based on the retrieved text, implementing RAG.
# Function to format documents
def format_docs(docs):
    # Merge the document contents into a single string and return it
    return "\n\n".join(doc.page_content for doc in docs)

# Define the RAG chain execution function
def rag_chain(url, question):
    # Load the URL and build a retriever over its contents
    retriever = load_and_retrieve_docs(url)
    # Use the retriever to fetch documents relevant to the question
    retrieved_docs = retriever.invoke(question)
    # Format the retrieved documents
    formatted_context = format_docs(retrieved_docs)
    # Construct the input prompt for the Ollama model
    formatted_prompt = f"Question: {question}\n\nContext: {formatted_context}"
    # Use the Ollama model to generate an answer
    response = ollama.chat(model='mistral', messages=[{'role': 'user', 'content': formatted_prompt}])
    return response['message']['content']
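You can also call the chain directly from a script or REPL before adding the UI. The URL and question below are only placeholders:
# Example: run the RAG chain without the web UI (URL and question are placeholders)
answer = rag_chain("https://example.com/news-article", "What is the main topic of this article?")
print(answer)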
4
Finally, create a web UI with Gradio and write the launch command.
# Create the Gradio interface
iface = gr.Interface(
    fn=rag_chain,
    inputs=["text", "text"],
    outputs="text",
    title="RAG Chain Web Q&A",
    description="Enter a URL and a question to get an answer from the RAG chain."
)
# Launch the app
iface.launch()
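If you want the app on a fixed port or reachable from other machines on your network, Gradio's launch() accepts optional arguments; the values below are just examples:
# Optional: bind to all network interfaces on a fixed port (values are examples)
iface.launch(server_name="0.0.0.0", server_port=7860)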
After saving the code, run it from the command line with python QAmachine.py, then click the local URL shown in the terminal to open the web Q&A bot interface~
Here, I tested it with a news report from China Daily, and the answers were highly accurate, a nice demonstration of RAG. I will continue to explore more local LLM applications. Bye~