Dissecting the Design and Technical Details of Multi-Agent Projects Based on LangGraph

Andrew Ng recently said in a public lecture: "I believe that AI Agent workflows will drive significant advances in artificial intelligence this year, possibly surpassing the next generation of foundation models. This is an important trend, and I urge everyone working in AI to pay attention to it." Coupled with the four paradigms for implementing Agent workflows that he shared at the Sequoia AI Summit last month, it is clear that Ng has been focusing on the development of Agents.


Let me explain these four paradigms:

  1. Reflection: The LLM reflects on its work and identifies ways to improve.

  2. Tool Use: The LLM is equipped with tools such as web search, code execution, or any other function to help it gather information, take action, or process data.

  3. Planning: The LLM proposes and executes a multi-step plan to achieve a goal (e.g., drafting an outline for an article, conducting online research, then writing a draft, etc.).

  4. Multi-agent Collaboration: Multiple AI Agents work together, breaking down tasks, discussing and debating ideas, and proposing better solutions than a single agent could.

Purpose

In this article, we will dissect a GitHub project called GPT-researcher (https://github.com/assafelovic/gpt-researcher). In this project, multiple Agents collaborate to search for information online, analyze it, and write about a user-provided topic, ultimately outputting a research report several pages long in PDF or Markdown format. The project was also inspired by Stanford University's STORM research paper (https://arxiv.org/abs/2402.14207).

The overall execution process involves the four paradigms Ng mentioned: multi-agent collaboration, plan-and-execute, tool use (searching for information online), and reflection.

The collaboration between the Agents is implemented with LangGraph, an extension of LangChain. I will discuss it later in the article, where I also want to offer some criticism of LangGraph.

Multi-Agent Collaboration Architecture

The project has a core module called gpt-researcher, which lets the LLM choose a suitable Agent persona based on the topic provided by the user. It generates a plan from the topic, then derives multiple sub-queries from that plan. Each sub-query searches for information online, organizes it, and records its sources; finally, the outputs of all sub-queries are summarized and formatted into a report. This is functionality the author implemented early on, and I will explain it in detail later; this article mainly covers the multi-agent collaboration built on LangGraph.
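
For orientation, here is a rough sketch of that core pipeline. All four helper functions are hypothetical stand-ins for illustration, not the project's actual function names:

```python
import asyncio

# Hypothetical stand-ins for the real implementations (stubs for illustration):
async def choose_agent(query: str) -> str: ...                 # LLM picks a persona for the topic
async def plan_research(query: str, role: str) -> list: ...    # LLM plans and splits into sub-queries
async def search_and_summarize(sub_query: str) -> str: ...     # web search + source-tracked summary
async def write_report(query: str, summaries: list) -> str: ...  # merge and format the report

# Rough sketch of the gpt-researcher core pipeline described above.
async def research(query: str) -> str:
    agent_role = await choose_agent(query)
    sub_queries = await plan_research(query, agent_role)
    # Run every sub-query's search-and-summarize step concurrently
    summaries = await asyncio.gather(*(search_and_summarize(q) for q in sub_queries))
    return await write_report(query, summaries)
```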

It defines seven Agents that simulate a collaborating research team:

  1. Chief Editor: The Chief Editor supervises the research process and manages the team. This is the "master" Agent that coordinates the other Agents using LangGraph (a minimal graph sketch follows this list).

  2. Researcher: The Researcher uses the core module gpt-researcher to conduct in-depth research on the given topic.

  3. Editor: The Editor is responsible for planning the research outline and structure.

  4. Reviewer: The Reviewer verifies the correctness of research results based on a set of standards.

  5. Revisor: The Revisor revises the research results based on feedback from the Reviewer.

  6. Writer: The Writer is responsible for drafting the final report.

  7. Publisher: The Publisher is responsible for publishing the final report in various formats such as PDF, Word, and Markdown.
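
Here is a minimal sketch of how such a master graph can be wired with LangGraph's StateGraph. The state fields, node bodies, and task payload are illustrative assumptions rather than the project's actual code, and the per-section research subgraph is omitted:

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

# Illustrative shared state; the field names are assumptions, not the
# project's exact schema.
class ResearchState(TypedDict, total=False):
    task: dict
    initial_research: str
    sections: list
    report: str

# Placeholder node bodies; in the project, each Agent exposes a run method
# that reads from and writes back into the shared state.
def browse(state: ResearchState) -> dict:
    return {"initial_research": "..."}

def plan(state: ResearchState) -> dict:
    return {"sections": ["..."]}

def write(state: ResearchState) -> dict:
    return {"report": "..."}

def publish(state: ResearchState) -> dict:
    return {}

workflow = StateGraph(ResearchState)
workflow.add_node("browser", browse)      # the Researcher's preliminary research
workflow.add_node("planner", plan)        # the Editor's outline planning
workflow.add_node("writer", write)        # the Writer's final report
workflow.add_node("publisher", publish)   # the Publisher's format conversion

workflow.set_entry_point("browser")
workflow.add_edge("browser", "planner")
# The per-section Researcher/Reviewer/Revisor subgraph sits between
# planner and writer in the real project; omitted here.
workflow.add_edge("planner", "writer")
workflow.add_edge("writer", "publisher")
workflow.add_edge("publisher", END)

chain = workflow.compile()
result = chain.invoke({"task": {"query": "Is AI in a hype cycle?", "max_sections": 3}})
```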

Flowchart

(Flowchart of the multi-agent research process)

Main Process:

  1. The user provides a task, including the topic query, maximum sections, and other requirements.

  2. The Chief Editor, the master Agent, receives the task, initializes the process executor (a LangGraph workflow), and assigns the task to the Browser.

  3. The Browser in the diagram above is actually the Researcher Agent; it conducts preliminary research on the internet based on the given research task.

  4. The Editor, a more complex Agent, plans the report outline and structure based on the preliminary research, following the Plan-and-Execute paradigm. The Editor's work runs as a LangGraph subgraph.

  5. For each outline topic (in parallel; this is where LangGraph's advantage shows, since it can run iterative subprocesses):

    1. The Researcher invokes gpt-researcher to conduct in-depth research on the sub-topic and write a draft.
    2. The Reviewer verifies the correctness of the draft against a set of criteria derived from the task's requirements and provides feedback.
    3. The Revisor revises the draft based on the Reviewer's feedback and sends it back to the Reviewer, repeating until the Reviewer is satisfied, i.e., gives no further revision notes. (A LangGraph sketch of this loop appears after this list.)

  6. The Writer compiles and writes the final report, including the introduction, conclusion, and references.

  7. The Publisher publishes the final report in various formats such as PDF, Docx, and Markdown.
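
As referenced in step 5, here is a minimal sketch of the Reviewer-Revisor loop expressed as a LangGraph conditional edge. The state schema and the critique()/rewrite() stubs are hypothetical stand-ins for the two Agents' LLM calls, not the project's actual code:

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class DraftState(TypedDict, total=False):
    draft: str
    review: str  # the Reviewer's notes; None when satisfied

def critique(draft: str):
    """Hypothetical Reviewer LLM call: returns revision notes, or None when satisfied."""
    ...

def rewrite(draft: str, notes: str) -> str:
    """Hypothetical Revisor LLM call: returns the revised draft."""
    ...

def reviewer(state: DraftState) -> dict:
    return {"review": critique(state["draft"])}

def revisor(state: DraftState) -> dict:
    return {"draft": rewrite(state["draft"], state["review"])}

def route_after_review(state: DraftState) -> str:
    # Loop back to the Revisor while the Reviewer still has notes.
    return "revisor" if state.get("review") else END

subgraph = StateGraph(DraftState)
subgraph.add_node("reviewer", reviewer)
subgraph.add_node("revisor", revisor)
subgraph.set_entry_point("reviewer")
subgraph.add_conditional_edges("reviewer", route_after_review)
subgraph.add_edge("revisor", "reviewer")
review_chain = subgraph.compile()
```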

The flowchart above is not very well drawn: the Chief Editor is not represented, and the Task goes directly to the Browser, which will confuse most readers. Didn't I say that seven Agents were defined? Where did the Browser come from? In reality, it is the Researcher Agent. The Researcher defines two functions: one generates the outline, and the other analyzes each topic in detail against the outline and produces a draft. In the diagram, the second function looks like a loop between the Researcher and the Revisor, but in fact the Researcher does not participate in the loop: once the draft is completed, the Researcher's role ends, and the subsequent process is the Reviewer-Revisor loop. As for the third, empty subprocess on the right side, I understand the author's intent is to show that the outline is split into multiple topics, each of which gets a subprocess like the one on the left. But with only a few empty boxes drawn, it would be hard to understand without reading the source code.

Some Technical Details

Using tools is an important feature of Agents, and almost all LLMs now support tool calling. LangChain also provides several ways to implement tools, along with a large number of well-packaged community tools, allowing developers to easily give LLMs additional capabilities: the LLM can autonomously decide whether to call a tool upon receiving input and extract the necessary parameters from that input. However, the Agents in this project do not use any of this; the author implemented plain Python methods (mostly run methods) and set them directly as LangGraph nodes. That is workable, but it means the Agent here serves only as an object-oriented wrapper.
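
For contrast, this is roughly what the LangChain tool-calling style looks like, which the project chose not to use. Here the model, not the developer, decides whether web_search gets called; the tool body is a placeholder for illustration:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# The tool-calling style the author contrasts with: the LLM decides whether
# to call the tool and extracts its arguments from the user input.
@tool
def web_search(query: str) -> str:
    """Search the web and return a summary of the results."""
    return f"results for: {query}"  # a real implementation would call a search API

llm = ChatOpenAI(model="gpt-4o").bind_tools([web_search])
msg = llm.invoke("What did Andrew Ng say about agent workflows?")
print(msg.tool_calls)  # e.g. [{"name": "web_search", "args": {"query": "..."}, ...}]
```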

The core of each Agent, aside from its execution logic, is the Prompt, which defines the Agent's persona and what it should do when interacting with the LLM. For example, the main Prompt for the Revisor is as follows (reformatted for readability; draft_report, review, and sample_revision_notes are template variables filled in at runtime):

```python
prompt = [
    {
        "role": "system",
        "content": "You are an expert writer. Your goal is to revise drafts based on reviewer notes.",
    },
    {
        "role": "user",
        "content": f"Draft:\n{draft_report}\n\n"
                   f"Reviewer's notes:\n{review}\n\n"
                   "You have been tasked by your reviewer with revising the following draft, "
                   "which was written by a non-expert. If you decide to follow the reviewer's notes, "
                   "please write a new draft and make sure to address all of the points they raised. "
                   "Please keep all other aspects of the draft the same. "
                   f"You MUST return nothing but a JSON in the following format:\n{sample_revision_notes}",
    },
]
```

In essence: the system message sets the persona ("You are an expert writer..."), and the user message hands over the draft and the reviewer's notes, instructs the model to address every point raised while keeping the rest of the draft unchanged, and requires the response to be nothing but JSON in the given format.

Search Tools

As a project that outputs research reports, obtaining information from the web is a given. Information sources fall into three categories: first, the knowledge internalized by the LLM, which depends on its training data and has already been compressed; second, well-maintained databases, business data, or personal documents; and third, real-time information obtained from the web via search engines.

Several popular search services include:
  • Tavily: A startup of the AI era that provides Search and News APIs, returning real-time data in an LLM-friendly format. I used it during development and the results were quite good, though the free tier allows only 1,000 calls, and a single run of this project consumed 6 of them. (A minimal call sketch appears below.)

  • SerpApi: A long-established search API aggregation service that even includes Baidu's API, although it is a bit pricey.

  • DuckDuckGo: A search engine that emphasizes privacy, tracking neither users' search history nor their browsing habits.

  • Bing Search: Microsoft's search engine, which can be activated through Azure cloud services, though the returned results are still in a fairly traditional format.

  • SearxNG: Open-source and self-deployable; I found its results average when I tried it, so I abandoned it, although the allure of open-source, free tools remains strong.

There are many other sources as well, including Yahoo Finance news, stock data, Wikipedia, YouTube videos, arXiv papers, and so on.
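
For illustration, a minimal Tavily call might look like this, assuming the tavily-python client; the exact parameters and response fields may differ between versions:

```python
from tavily import TavilyClient  # pip install tavily-python

# Minimal Tavily search call; results come back in an LLM-friendly format.
client = TavilyClient(api_key="tvly-...")
response = client.search("LangGraph multi-agent research", max_results=5)
for item in response["results"]:
    print(item["title"], item["url"])
```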

Reflection

The documents this multi-Agent architecture outputs look quite decent, but the content is rather average and falls short of human standards. This is a common issue with most AI applications: at first glance they seem impressive, but when it comes to actually solving problems they always fall a bit short. Businesses want a score of 90, while most outputs land around 70. As a research copilot it is still useful, saving the effort of searching for information online, similar to what Metaso (秘塔 AI Search) offers. Let's take a look at the output:

(Screenshot: a sample research report generated by the project)

AI Agents are drawing intense attention: prominent figures like Andrew Ng and Sam Altman keep highlighting them, and recently Baidu's Robin Li also urged everyone to compete on applications. Although Agents are not a new concept, it is still hard to pin down what exactly constitutes an Agent, and whether this project's implementation qualifies is debatable. But there is no need to get caught up in semantics; as Deng Xiaoping said, it doesn't matter whether a cat is black or white, as long as it catches mice it's a good cat. The core question is whether it solves users' real problems; users only care about results and whether their needs are met, and most will never look at the underlying implementation. The project still holds significant reference value. I hope AI can help me read and research the papers and blogs in a field, then summarize and organize the information into the structure I need; I will attempt to implement this in the future.

Critique of LangGraph

LangGraph is a wrapper built on LangChain's LCEL that implements many ready-to-use capabilities. However, those wrappers complicate customized implementations. For example, in a multi-turn dialogue scenario, suppose I want to change the logic at runtime based on user input: normally the LLM decides whether to use a tool, but in the current turn I want to force a specific tool. Implementing this becomes quite cumbersome and requires rewriting a significant part of the wiring.
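
To illustrate the kind of re-wiring this pushes you into, here is a rough sketch. The forced_tool flag, the node names, and the surrounding graph are all made up for illustration:

```python
# Illustrative only: forcing a specific tool at runtime means encoding the
# override in the graph state and writing the routing yourself, instead of
# letting the LLM decide via tool calling.
def route(state: dict) -> str:
    if state.get("forced_tool"):        # e.g. set from the current user turn
        return state["forced_tool"]     # jump straight to that tool's node
    return "agent"                      # default path: the LLM decides

# workflow.add_conditional_edges("entry", route)  # wired into a graph like the ones above
```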

LangGraph is designed to cover general needs as far as possible, letting developers get started quickly and keeping development simple, but this comes at the cost of flexibility. Generality and flexibility are inherently different design directions, and together with complexity they form an impossible triangle.

For example, sedans are designed around paved roads, which is generally the right call, since 95% of use cases involve driving on pavement; the payoff is comfort, good handling, and fuel efficiency. But when a user hits the rare 5% of rough roads, the sedan's chassis cannot cope. LangGraph is like a car built for paved roads: it cannot handle that 5% of rough terrain, and adapting it to do so is as hard as converting a sedan into an off-road vehicle.

Dify, a popular open-source AI application project, announced last month that it had removed LangChain from its codebase. For a relatively mature project like Dify, implementing its own architecture should not be too difficult; another likely reason is that LangChain frequently ships breaking changes, which are exhausting to keep up with.

Overall, LangChain and LangGraph are still excellent projects, and designs like LCEL and dynamic configuration are very good. They can save a lot of time when starting an AI application project, and I still recommend using them.
