Practical Programming with Local Large Models (20): Implementing RAG with LangGraph and Agents (4)

In the previous article, we practiced a [RAG (Retrieval Augmented Generation) system implemented with `langgraph`](http://wfcoding.com/articles/practice/0319/). This article builds on it by adding automatic chat-history tracking. We will also use an `Agent` to achieve almost the same functionality, so we can explore the differences between implementing the RAG system with `langgraph` and with an `Agent`.

– The `LangGraph` chain structure we are building is shown in the figure below:

[Figure: structure of the `LangGraph` chain]

As shown in the figure, `query_or_respond` is a conditional node that decides whether to query the vector knowledge base, based on whether the model generates tool calls for the user's question: if the tool calls list is empty, the question is handled directly by the large language model; otherwise, the `tools` node is invoked to perform the retrieval.
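As a refresher, here is a minimal sketch of this node, following the LangGraph RAG tutorial that the previous article builds on (`llm` is assumed to be the `ChatOllama` chat model and `retrieve` the retrieval tool defined there; the actual code may differ in detail):

```python
from langgraph.graph import MessagesState

def query_or_respond(state: MessagesState):
    """Answer directly, or emit a tool call to `retrieve`."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    # If response.tool_calls is empty, the graph routes straight to the end;
    # otherwise it routes to the `tools` node, which runs the retrieval.
    return {"messages": [response]}
```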

– The structure of the agent that achieves similar functionality is shown in the figure below:

[Figure: structure of the agent]

It is easy to see that the `Agent` version is simpler.

> Using `qwen2.5`, `deepseek`, and `llama3.1` for experiments, with `shaw/dmeta-embedding-zh` for Chinese embedding and retrieval.

Preparation

Before we start coding, we need to prepare the programming environment.

1. Computer

All the code in this article can run on a machine without a GPU. My machine configuration is:

– CPU: Intel i5-8400 2.80GHz

– Memory: 16GB

2. Visual Studio Code and venv

`Visual Studio Code` is a very popular development tool, and all the code related to this article can be developed and debugged in it. We use `python`'s `venv` to create a virtual environment; see:

[Configuring venv in Visual Studio Code](http://wfcoding.com/articles/practice/0101%E5%9C%A8visual-studio-code%E4%B8%AD%E9%85%8D%E7%BD%AEvenv/).

3. Ollama

Deploying local large models on the `Ollama` platform is very convenient. Based on this platform, we can use various local large models such as `llama3.1`, `qwen2.5`, `deepseek`, etc. with `langchain`. See:

[Using locally deployed llama3.1 model in langchain](http://wfcoding.com/articles/practice/0102%E5%9C%A8langchian%E4%B8%AD%E4%BD%BF%E7%94%A8%E6%9C%AC%E5%9C%B0%E9%83%A8%E7%BD%B2%E7%9A%84llama3%E5%A4%A7%E6%A8%A1%E5%9E%8B/).

Managing the `state` of chat history

In a production environment, Q&A applications typically save chat history to a database and can read and update it appropriately.

`LangGraph` implements a built-in persistence layer to support chat applications with multiple dialogue rounds.

To manage multiple dialogue rounds and threads, all we need to do is specify a `checkpointer` when compiling the application. The nodes in the figure will append messages to the `state`.

We will use a simple `MemorySaver` to keep chat history in memory; of course, a database (e.g., SQLite or Postgres) can also be used for persistent storage.
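For example, a SQLite-backed checkpointer could be wired in roughly like this (a minimal sketch, assuming the separate `langgraph-checkpoint-sqlite` package is installed; the API may vary between versions):

```python
import sqlite3

# pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.sqlite import SqliteSaver

# Persist checkpoints to a local SQLite file instead of process memory.
conn = sqlite3.connect("chat_history.db", check_same_thread=False)
memory = SqliteSaver(conn)
# Later: graph = graph_builder.compile(checkpointer=memory)
```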

The chain built in this article is almost identical to the code in the previous article: [RAG (Retrieval Augmented Generation) implemented with `langgraph` (3)](http://wfcoding.com/articles/practice/0319/), with the only difference being:

```python
graph = graph_builder.compile()
```

changed to:

```python
memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)
```
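The test code below calls a `build_graph_with_memory` helper. Here is a minimal sketch of what it could look like, assuming the `query_or_respond` and `generate` node functions and the `retrieve` tool from the previous article; the wiring follows the figure at the top of this article:

```python
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, MessagesState, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver

def build_graph_with_memory(llm_model_name):
    """Build the RAG chain and compile it with an in-memory checkpointer."""
    global llm  # the node functions from the previous article close over `llm`
    llm = ChatOllama(model=llm_model_name, temperature=0)

    graph_builder = StateGraph(MessagesState)
    graph_builder.add_node(query_or_respond)
    graph_builder.add_node(ToolNode([retrieve]))  # registered under the name "tools"
    graph_builder.add_node(generate)

    graph_builder.set_entry_point("query_or_respond")
    graph_builder.add_conditional_edges(
        "query_or_respond",
        tools_condition,  # routes on whether a tool call was produced
        {END: END, "tools": "tools"},
    )
    graph_builder.add_edge("tools", "generate")
    graph_builder.add_edge("generate", END)

    memory = MemorySaver()
    return graph_builder.compile(checkpointer=memory)
```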

Testing the `LangGraph` chain

We will test the defined `LangGraph` chain with different models. First, we define a test method:

```python
def ask_with_history(graph, thread_id, question):
    """Ask a question and record chat history"""
    print('---ask_with_history---')
    conf = {"configurable": {"thread_id": thread_id}}
    for step in graph.stream(
        {"messages": [{"role": "user", "content": question}]},
        stream_mode="values",
        config=conf,
    ):
        step["messages"][-1].pretty_print()

def test_model(llm_model_name):
    """Test the large language model"""
    print(f'------{llm_model_name}------')
    question1 = "What is the scientific name of sheep?"
    question2 = "What are its characteristics?"
    thread_id = "liu2233"
    graph = build_graph_with_memory(llm_model_name)
    ask_with_history(graph, thread_id, question1)
    ask_with_history(graph, thread_id, question2)
```

`qwen2.5`

– Question 1: “What is the scientific name of sheep?”

```
================================ Human Message =================================

What is the scientific name of sheep?
================================== Ai Message ==================================
Tool Calls:
  retrieve (3bd81e7c-75b9-4c30-be0c-fedb8c0f18a4)
 Call ID: 3bd81e7c-75b9-4c30-be0c-fedb8c0f18a4
  Args:
    query: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
================================= Tool Message =================================
Name: retrieve

Source: {'row': 4, 'source': 'D:\project\programming-with-local-large-language-model\server\services\practice\assert/animals.csv'}
Content: Name: Sheep
Scientific Name: Ovis aries
Characteristics: Gentle, easy to raise, wool and milk have great contributions to humans
Role: Wool (clothing), milk (dairy products), mutton (food source)
================================== Ai Message ==================================

The scientific name of sheep is Ovis aries.
```

Everything went as expected: the large language model inferred a tool call in `query_or_respond`, the `retrieve` tool returned the most similar result, and the `generate` node then organized these messages into a prompt for the large language model to answer.

Perfect!
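For reference, here is a minimal sketch of what that `generate` node does, following the LangGraph RAG tutorial the previous article builds on (`llm` is assumed to be the chat model; details may differ from the actual code):

```python
from langchain_core.messages import SystemMessage
from langgraph.graph import MessagesState

def generate(state: MessagesState):
    """Fold the retrieved tool messages into a prompt and answer."""
    # Collect the tool messages produced by the most recent retrieval.
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    docs_content = "\n\n".join(m.content for m in recent_tool_messages[::-1])

    system_message = SystemMessage(
        "You are an assistant for question-answering tasks. "
        "Use the following retrieved context to answer the question.\n\n"
        + docs_content
    )
    # Keep only the conversational messages (drop the tool-call bookkeeping).
    conversation = [
        m for m in state["messages"]
        if m.type in ("human", "system") or (m.type == "ai" and not m.tool_calls)
    ]
    response = llm.invoke([system_message] + conversation)
    return {"messages": [response]}
```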

– Question 2: “What are its characteristics?”

```
================================ Human Message =================================

What are its characteristics?
================================== Ai Message ==================================

The characteristics of sheep include being gentle, easy to raise, and wool and milk have great contributions to humans.
```

As we can see, there was no retrieval call this time: in `query_or_respond`, the large language model performed anaphora resolution based on the message history in the `state`, inferred that "its" refers to the sheep, and produced the answer directly from the retrieval result of the previous turn. The tool calls list was empty, so the returned content already contained the inferred answer.

Perfect!

`llama3.1`

– Question 1: “What is the scientific name of sheep?”

It may have issues processing Chinese; the inferred tool calls were:

```
================================== Ai Message ==================================
Tool Calls:
  retrieve (9522bb3e-d961-4022-90c7-2ce3669efe71)
 Call ID: 9522bb3e-d961-4022-90c7-2ce3669efe71
  Args:
    query: What is the scientific name of sheep?, Ovis aries
```

Clearly, the content of this query is incorrect.

`MFDoom/deepseek-r1-tool-calling:7b`

– Question 1: “What is the scientific name of sheep?”

```
================================ Human Message =================================

What is the scientific name of sheep?
================================== Ai Message ==================================
Tool Calls:
  retrieve (1893c5a2-b132-4682-8bcd-95d776855bc1)
 Call ID: 1893c5a2-b132-4682-8bcd-95d776855bc1
  Args:
    query: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
================================= Tool Message =================================
Name: retrieve

Source: {'row': 4, 'source': 'D:\project\programming-with-local-large-language-model\server\services\practice\assert/animals.csv'}
Content: Name: Sheep
Scientific Name: Ovis aries
Characteristics: Gentle, easy to raise, wool and milk have great contributions to humans
Role: Wool (clothing), milk (dairy products), mutton (food source)
================================== Ai Message ==================================

<think>Okay, I need to answer the user's question: "What is the scientific name of sheep?" Based on the provided context, it is clearly mentioned in the content: "Scientific Name: Ovis aries". Therefore, I can directly quote this information to answer. Additionally, considering that the user may not be familiar with the scientific name, I should ensure the answer is concise and clear, so my response should be: "The scientific name of sheep is Ovis aries."</think>
The scientific name of sheep is Ovis aries.
```

Perfect!

– Question 2: “What are its characteristics?”

```
================================ Human Message =================================

What are its characteristics?
================================== Ai Message ==================================
Tool Calls:
  retrieve (8768eef4-0379-4723-88f7-9c792fd58460)
 Call ID: 8768eef4-0379-4723-88f7-9c792fd58460
  Args:
    query: What are its characteristics?
start retrieve: What are its characteristics?
================================= Tool Message =================================
Name: retrieve

Source: {'row': 4, 'source': 'D:\project\programming-with-local-large-language-model\server\services\practice\assert/animals.csv'}
Content: Name: Sheep
Scientific Name: Ovis aries
Characteristics: Gentle, easy to raise, wool and milk have great contributions to humans
Role: Wool (clothing), milk (dairy products), mutton (food source)
================================== Ai Message ==================================
```
In `query_or_respond` it again correctly performed the anaphora resolution and determined good retrieval parameters; however, it did perform the retrieval a second time.

Very well!

Replacing with an agent

Agents utilize the reasoning capabilities of LLMs to make decisions during execution. Using an agent can relieve you of additional judgment logic in the retrieval process.

While their behavior is harder to predict than that of the "chain" above, they can perform multiple retrieval steps to handle a query, or iterate within a single retrieval.

Next, we will use a `ReAct (Reasoning + Acting)` agent to implement similar functionality.

> The `ReAct` agent combines reasoning and acting, allowing the agent to think and execute tasks more flexibly. For more information, see: [ReACT Agent Model](https://klu.ai/glossary/react-agent-model)

```python
def create_agent(llm_model_name):
    """Create an agent"""
    llm = ChatOllama(model=llm_model_name, temperature=0, verbose=True)
    memory = MemorySaver()
    agent_executor = create_react_agent(llm, tools=[retrieve], checkpointer=memory)
    return agent_executor
```

It looks much simpler; aside from telling the agent which tools to use, the specifics are left to the agent to determine.
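For completeness, here is a minimal sketch of the shared `retrieve` tool as the previous article defines it (`vector_store` is assumed to be the vector store built there with the `shaw/dmeta-embedding-zh` embedding model; the logs above show it returning the single best match):

```python
from langchain_core.tools import tool

@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    print(f"start retrieve: {query}")
    retrieved_docs = vector_store.similarity_search(query, k=1)
    # Serialize each hit as the "Source: ... / Content: ..." text seen in the logs.
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}"
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
```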

Now let’s redefine the test method:

```python
def ask_agent(agent, thread_id, question):
    """Consult the agent"""
    print('---ask_agent---')
    conf = {"configurable": {"thread_id": thread_id}}
    for step in agent.stream(
        {"messages": [{"role": "user", "content": question}]},
        stream_mode="values",
        config=conf,
    ):
        step["messages"][-1].pretty_print()

def test_model(llm_model_name):
    """Test the large language model"""
    print(f'------{llm_model_name}------')
    question1 = "What is the scientific name of sheep?"
    question2 = "What are its characteristics?"
    thread_id = "liu2233"
    agent = create_agent(llm_model_name)
    ask_agent(agent, thread_id, question1)
    ask_agent(agent, thread_id, question2)
```

Our questions remain unchanged; now let’s see how the large models perform:

`qwen2.5`

– Question 1: “What is the scientific name of sheep?”

```
================================ Human Message =================================

What is the scientific name of sheep?
================================== Ai Message ==================================
Tool Calls:
  retrieve (0bd293dc-ef9e-402d-b2c0-4b138c3bff38)
 Call ID: 0bd293dc-ef9e-402d-b2c0-4b138c3bff38
  Args:
    query: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
================================= Tool Message =================================
Name: retrieve

Source: {'row': 4, 'source': 'D:\project\programming-with-local-large-language-model\server\services\practice\assert/animals.csv'}
Content: Name: Sheep
Scientific Name: Ovis aries
Characteristics: Gentle, easy to raise, wool and milk have great contributions to humans
Role: Wool (clothing), milk (dairy products), mutton (food source)
================================== Ai Message ==================================

The scientific name of sheep is Ovis aries.
```

`qwen2.5` called `retrieve` once and then properly answered the question.

Perfect!

– Question 2: “What are its characteristics?”

```
================================ Human Message =================================

What are its characteristics?
================================== Ai Message ==================================

The characteristics of sheep include being gentle, easy to raise, and wool and milk have great contributions to humans.
```

This time, the large model did not call `retrieve` again and directly provided the answer.

Perfect!

`llama3.1`

– Question 1: “What is the scientific name of sheep?”

```
================================ Human Message =================================

What is the scientific name of sheep?
================================== Ai Message ==================================
Tool Calls:
  retrieve (d26c50f4-91bd-4839-a0dd-edd1dd42b956)
 Call ID: d26c50f4-91bd-4839-a0dd-edd1dd42b956
  Args:
    query: What is the scientific name of sheep?, Ovis aries
start retrieve: What is the scientific name of sheep?, Ovis aries
================================= Tool Message =================================
Name: retrieve

Sorry, I cannot find any relevant information.
================================== Ai Message ==================================

Based on the tool call results, the answer is:

The scientific name of sheep is Ovis aries.
```

The large model inferred incorrect retrieval parameters.

– Question 2: “What are its characteristics?”

```
================================ Human Message =================================

What are its characteristics?
================================== Ai Message ==================================
Tool Calls:
  retrieve (67816d7f-664d-49c3-ac07-21d293b0c8b3)
 Call ID: 67816d7f-664d-49c3-ac07-21d293b0c8b3
  Args:
    query: What are its characteristics?
start retrieve: What are its characteristics?
================================= Tool Message =================================
Name: retrieve

Source: {'row': 4, 'source': 'D:\project\programming-with-local-large-language-model\server\services\practice\assert/animals.csv'}
Content: Name: Sheep
Scientific Name: Ovis aries
Characteristics: Gentle, easy to raise, wool and milk have great contributions to humans
Role: Wool (clothing), milk (dairy products), mutton (food source)
================================== Ai Message ==================================

Based on the tool call results, the answer is:

The characteristics of sheep include being gentle, easy to raise, and providing important resources such as wool, milk, and mutton.
```

This time the agent correctly performed anaphora resolution and determined decent retrieval parameters.

Very good!

`MFDoom/deepseek-r1-tool-calling:7b`

– Question 1: “What is the scientific name of sheep?”

This run did not finish: the model kept generating identical tool calls and repeating the same retrievals, apparently stuck in a loop:

```
...
================================== Ai Message ==================================
Tool Calls:
  retrieve (a799739e-21ed-4e92-ae14-c97bb0ab5f23)
 Call ID: a799739e-21ed-4e92-ae14-c97bb0ab5f23
  Args:
    query: What is the scientific name of sheep?
  retrieve (f45032aa-ec11-4201-a35f-606a1a6f59e7)
 Call ID: f45032aa-ec11-4201-a35f-606a1a6f59e7
  Args:
    query: What is the scientific name of sheep?
  retrieve (a0611f8d-dd8f-4d06-99cf-73c99da14073)
 Call ID: a0611f8d-dd8f-4d06-99cf-73c99da14073
  Args:
    query: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
start retrieve: What is the scientific name of sheep?
...
```
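As an aside (not part of the original code), LangGraph's `recursion_limit` config can bound such runaway runs. A minimal sketch of a variant of `ask_agent` that aborts instead of looping indefinitely:

```python
from langgraph.errors import GraphRecursionError

def ask_agent_bounded(agent, thread_id, question, limit=10):
    """Like ask_agent, but abort if the agent loops past `limit` graph steps."""
    conf = {"configurable": {"thread_id": thread_id}, "recursion_limit": limit}
    try:
        for step in agent.stream(
            {"messages": [{"role": "user", "content": question}]},
            stream_mode="values",
            config=conf,
        ):
            step["messages"][-1].pretty_print()
    except GraphRecursionError:
        print(f"Aborted after {limit} steps: the agent appears to be looping.")
```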

Summary

We implemented a simple `RAG` system using both the `langgraph` chain and the `agent`, achieving basic state management functionality.

Clearly, the code implemented with the `ReAct` agent is more concise, but the intermediate decision logic is left entirely to the agent, like a black box; implementing with `langgraph` allows finer control of the details, like a white box.

It seems that `qwen2.5` is the most reliable.

> If you want to implement a complete `RAG` system with both frontend and backend, [building a RAG system from scratch with langchain, local large models, and local vector databases](http://wfcoding.com/articles/practice/01%E4%BB%8E%E9%9B%B6%E6%90%AD%E5%BB%BAlangchain+%E6%9C%AC%E5%9C%B0%E5%A4%A7%E6%A8%A1%E5%9E%8B+%E6%9C%AC%E5%9C%B0%E7%9F%A2%E9%87%8F%E6%95%B0%E6%8D%AE%E5%BA%93%E7%9A%84rag%E7%B3%BB%E7%BB%9F/) might help you get started.

Code

All the code and related resources involved in this article have been shared, see:

– [github](https://github.com/liupras/Practical-local-LLM-programming/tree/main/server/services/practice)

– [gitee](https://gitee.com/liupras/programming-with-local-large-language-model/tree/master/server/services/practice)

> To make it easier to find the code, the file names of the programs start with the same number as the document number of this series of articles.

πŸͺ Thank you for watching, good luck πŸͺ
