Design Patterns for Compound AI Systems (Conversational AI, CoPilots & RAG)

Author: Raunak Jain | March 18, 2024

Translator: Chen Zhiyan
Proofreader: zrx

This article is approximately 3,500 words and takes about 10 minutes to read.
How to build a flow-configurable compound artificial intelligence system using open-source tools.
In the previous section, we introduced what a compound artificial intelligence system is, its components and how they interact to build complex systems, LLM-based autonomous agents as a key module of compound AI systems, and the definitions and considerations to clarify before choosing among the design patterns.
In the following sections, we will introduce four deployment patterns for compound AI systems.
Deployment Pattern 1 — RAG / Conversational RAG
The diagram below shows the main responsibilities of each module in a RAG / conversational RAG system. RAG traditionally belongs to the information retrieval (IR) field: it was first improved by neural search and knowledge graphs, and then closed into a generative loop using LLMs. Viewed from another angle, this is a conversational IR system in which IR and conversational systems merge, treating queries as objects that transform the context.
The key to a successful RAG system is understanding the user’s query, mapping it to the underlying knowledge (structured or unstructured), and passing both, together with appropriate instructions, to the generator / conversation manager. These actions can be executed as a clearly defined workflow or by agent modules, where the modules themselves decide which steps to execute (elaborated in the next section).
Figure: the RAG flow hands off to the conversation manager; if the conversation manager is an agent, RAG becomes one of its tools.
Let’s take a look at some intermediate modules/tools that help agents navigate the complex RAG world.
Query Understanding and Reconstruction
Query Expansion / Multi-query
Using LLMs to expand queries can improve search results when using sparse and statistical retrievers.
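As a rough, framework-agnostic sketch of this idea (the `llm` and `retriever` objects are hypothetical stand-ins for your own client and index), an LLM proposes a few paraphrases of the query, each variant is retrieved separately, and the results are merged:

```python
# Sketch: multi-query expansion. `llm` and `retriever` are placeholders for
# your own LLM client and search index; any API with a similar shape works.

def expand_query(llm, query: str, n: int = 3) -> list[str]:
    """Ask the LLM for n alternative phrasings of the user query."""
    prompt = (
        f"Rewrite the following search query in {n} different ways, "
        f"one per line, keeping the original meaning:\n\n{query}"
    )
    variants = [v.strip() for v in llm.complete(prompt).splitlines() if v.strip()]
    return [query] + variants[:n]

def multi_query_retrieve(llm, retriever, query: str, k: int = 5):
    """Run every variant against the retriever and merge results, de-duplicated by id."""
    seen, merged = set(), []
    for q in expand_query(llm, query):
        for doc in retriever.search(q, top_k=k):
            if doc.id not in seen:
                seen.add(doc.id)
                merged.append(doc)
    return merged
```

The merged results can then be re-ranked before being passed to the generator.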
Query Rewriting / Self-query
A self-query retriever, as the name suggests, can query itself: given any natural-language query, it uses an LLM chain to write a structured query and then applies that structured query to its underlying vector store. This allows the retriever not only to compare the user’s input query against document content for semantic similarity, but also to extract filters over document metadata from the query and apply them.
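Here is a minimal, framework-agnostic sketch of the same pattern (LangChain’s SelfQueryRetriever implements a production version of it): the LLM turns the natural-language question into a structured object with a semantic query plus a metadata filter, and both are applied to a hypothetical vector-store API.

```python
import json

# Sketch: self-query retrieval. The LLM extracts a semantic query plus metadata
# filters; both are passed to a hypothetical `vectorstore.similarity_search` API.

FILTER_PROMPT = """Given the user query below, return JSON with two fields:
"query": the semantic search string, and
"filter": a dict of metadata constraints, e.g. {{"year": 2023, "author": "smith"}}.

User query: {question}"""

def self_query(llm, vectorstore, question: str, k: int = 5):
    raw = llm.complete(FILTER_PROMPT.format(question=question))
    structured = json.loads(raw)                 # assumes the LLM returned valid JSON
    return vectorstore.similarity_search(
        structured["query"],
        filter=structured.get("filter", {}),     # metadata filter, not just similarity
        k=k,
    )
```

So a question like “RAG papers by Smith after 2023” might become the semantic query “retrieval augmented generation” plus a filter on author and year.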
Other intermediate modules commonly found in this pipeline include:
  • Entity Recognition
  • Query Enhancement
  • Knowledge or Intent Retrieval
  • Multi-document Search
  • Conversation Management
  • Response Generation
Agent-based RAG
Agent-based RAG is a design pattern in which a module driven by an LLM reasons about and plans how to answer a question with the toolset available to it. In advanced scenarios, multiple agents can be connected to solve RAG problems creatively, where agents not only retrieve but also validate, summarize, and so on. For more information, see the multi-agent section.
Key steps and components that need refinement:
1. Planning based on reasoning: formulating subtasks and arranging them systematically.
2. Self-correction based on self-consistency: because they generate and reason over multiple paths, planning-based RAG methods (e.g. ReWoo, Plan-and-Execute) perform better than purely reasoning-based ones (e.g. ReAct).
3. Adaptability in execution, with more diverse agent examples.
These are typically executed using the following patterns:
Reasoning-based Agent RAG
ReAct
Figure: ReAct uses search tools through interleaved reasoning and actions.
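A minimal sketch of the ReAct loop, assuming a hypothetical `llm` client and a `search` tool: the model alternates Thought / Action / Observation steps until it emits a final answer.

```python
# Sketch of a ReAct-style loop. `llm.complete` and `search` are placeholders;
# real implementations add more robust parsing and error handling.

REACT_PROMPT = """Answer the question using this loop:
Thought: reason about what to do next
Action: search[<query>]  OR  finish[<answer>]
Observation: (filled in by the system)

Question: {question}
{scratchpad}"""

def react_agent(llm, search, question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        step = llm.complete(REACT_PROMPT.format(question=question, scratchpad=scratchpad))
        scratchpad += step + "\n"
        if "finish[" in step:                               # the model has an answer
            return step.split("finish[", 1)[1].rstrip("]\n ")
        if "search[" in step:                               # the model asked to retrieve
            query = step.split("search[", 1)[1].split("]", 1)[0]
            scratchpad += f"Observation: {search(query)}\n"
    return "Could not answer within the step budget."
```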
Planning-based Agent RAG
https://blog.langchain.dev/planning-agents/
ReWoo
Figure: ReWoo generates fewer tokens than ReAct.
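ReWoo is cheaper because the whole plan is generated once up front, with placeholder variables (#E1, #E2, ...) standing in for tool outputs, so the model does not re-read a growing trajectory at every step. A minimal sketch, assuming a hypothetical `llm` client and a tool registry:

```python
import re

# Sketch of a ReWoo-style planner / worker / solver split. `llm` is a placeholder
# LLM client; `tools` maps tool names to plain Python callables.

PLAN_PROMPT = """Make a step-by-step plan to answer the question.
Each step must be a line of the form:
Plan: <reasoning>  #E<n> = <ToolName>[<input, may reference earlier #E variables>]

Question: {question}"""

def rewoo(llm, tools: dict, question: str) -> str:
    plan = llm.complete(PLAN_PROMPT.format(question=question))   # single planning call
    evidence: dict[str, str] = {}
    # Worker: execute each tool call in the plan, substituting earlier results.
    for var, tool, arg in re.findall(r"(#E\d+) = (\w+)\[(.+?)\]", plan):
        for prev_var, value in evidence.items():
            arg = arg.replace(prev_var, value)
        evidence[var] = str(tools[tool](arg))
    # Solver: one final call combines the plan and all gathered evidence.
    return llm.complete(f"Question: {question}\nPlan:\n{plan}\nEvidence: {evidence}\nAnswer:")
```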
PlanRAG
It consists of two components: a planner that divides the overall task into smaller subtasks, and an executor that carries out the subtasks according to the plan.
Deployment Pattern 2 — Conversational AI
Traditionally, conversational flows have been highly scripted, represented as “bot says” -> “user says” -> “bot says”… interactions that encode different hypotheses about real-world scenarios, known as “stories” to Rasa developers. Each user intent can be expressed as hundreds of such stories, depending on the user’s state and interactions; the bot takes actions to execute the predefined story and respond accordingly. For example, if a user wants to subscribe to a newsletter, there are two possible paths:
1. The user is already subscribed.
2. The user is not yet subscribed.
If the user says “How do I subscribe to the newsletter”, triggering the intent, the bot needs to check whether that user is already subscribed and then take the appropriate next step. This decision of “what to do next” is a manually hardcoded path. If the conversation deviates from the path, the bot says something like “Sorry, I am still learning, I can help you do xyz…”.
The real cost of building and maintaining a bot comes from these stories. The point of this tedious setup is to let the bot navigate diverse real-world scenarios in an organized way and to make it easy to add new paths. Path writers almost always work with some “conditions to check”, “actions to execute”, and “the ultimate goal of the conversation” to build a goal-oriented execution script.
With LLMs, we can try to automate this script writing, or path planning, by exploiting their “reasoning” and “planning” capabilities.
Imagine you are a customer-service agent and a user comes to you wanting to subscribe to your service. How would you decide the next step to take? Can the conversation be completely open-ended? Probably not; but from a maintenance-cost perspective, it cannot be fully scripted either. Suppose I tell you the following:
Condition: if an email address is provided, the user can subscribe.
Tools: check_subscription, add_subscription
As a self-respecting human, you would be able to weave the following story in your mind:
1. The user wants to subscribe based on the statement — “How do I subscribe?”
2. Ask the user for their email — “What is your email?”
3. If they provide a valid email, trigger the tool — check_subscription.
4. If the user is not yet subscribed, trigger add_subscription.
5. Respond with success or failure.
This is what we want the LLM to do: generate a “plan” that it can reference at runtime and act on.
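A minimal sketch of that loop, reusing the check_subscription / add_subscription tools from the example above; the plan format and the `llm` client are illustrative assumptions rather than any particular framework’s API.

```python
# Sketch: let the LLM draft the "story" (a plan) up front, then follow it at runtime.
# check_subscription / add_subscription come from the example above; the `llm`
# client and the plan format are illustrative placeholders.

SUBSCRIBERS: set[str] = set()

def check_subscription(email: str) -> bool:
    return email in SUBSCRIBERS

def add_subscription(email: str) -> None:
    SUBSCRIBERS.add(email)

PLANNER_PROMPT = """Condition: a user can subscribe only if they provide an email.
Tools: check_subscription(email), add_subscription(email)
User said: "{utterance}"
Write a numbered plan of questions to ask and tools to call."""

def handle_turn(llm, utterance: str) -> str:
    plan = llm.complete(PLANNER_PROMPT.format(utterance=utterance))
    print("Generated plan:\n", plan)
    # For brevity, the steps below hard-code the expected plan instead of parsing it.
    email = input("What is your email? ")            # step 2: ask for the email
    if check_subscription(email):                    # step 3: check the subscription
        return "You are already subscribed."
    add_subscription(email)                          # step 4: subscribe the user
    return "Subscription successful!"                # step 5: report the outcome
```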
Back to the module template, let’s see what a planner looks like:
Figure: module template for a planner.
The above planner constructs plans or stories using tools and conditions; let’s look at a real example from research:
Figure: KnowAgent (knowledge-augmented planning for LLM-based agents).
What tools can assist planners in deciding paths based on reliable reasoning?
1. Paths triggered by previous similar statements.
2. Enterprise action graphs and the dependencies between actions, which help the planner determine whether an action will yield the required result before moving on to the next action, and so on, until the problem is recursively solved.
3. The current state of the user/conversation.
Deployment Pattern 3 — Multi-agent
In a multi-agent setup, the goal is to define LLM-backed roles and responsibilities, each equipped with precise tools, that work collaboratively to generate intelligent answers/solutions.
Thanks to clearly defined roles and underlying models, agents delegate sub-goals or parts of “plans” to “experts” and then decide what to do next based on the output.
The communication patterns below control which module is permitted to take the next step.
Figure: how agents/modules communicate to build real-world CoPilots (https://arxiv.org/pdf/2402.01680v1.pdf).
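As an illustration, here is a minimal sketch of one such communication pattern, a supervisor that routes turns between two worker agents; the roles, prompts, and `llm` client are assumptions for the example (frameworks such as AutoGen or LangGraph provide production versions of this loop).

```python
# Sketch: supervisor-style multi-agent routing. Each agent is just an LLM call with
# its own role instructions; the supervisor decides who acts next, or when to stop.

AGENTS = {
    "researcher": "You retrieve and summarize facts relevant to the task.",
    "writer": "You turn the gathered facts into a polished answer.",
}

def run_team(llm, task: str, max_turns: int = 6) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        # Supervisor: pick the next speaker (or FINISH) based on the transcript so far.
        choice = llm.complete(
            f"{transcript}\nWho should act next, {list(AGENTS)} or FINISH? Answer with one word."
        ).strip().lower()
        if choice not in AGENTS:                      # includes the FINISH case
            break
        # Worker: the chosen agent replies under its own role instructions.
        reply = llm.complete(f"{AGENTS[choice]}\n\n{transcript}\n{choice}:")
        transcript += f"{choice}: {reply}\n"
    return llm.complete(f"{transcript}\nWrite the final answer for the user:")
```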
Advantages of multi-agent design:
  • Separation of concerns: each agent can have its own instructions and few-shot examples, backed by a separately fine-tuned language model and its own set of tools. Splitting tasks across agents yields better results, because each agent focuses on a specific task rather than choosing among numerous tools.
  • Modularity: multi-agent design breaks complex problems into manageable units of work that are executed by specialized agents and language models. Each agent can be evaluated and improved independently without disrupting the entire application, and grouping tools and responsibilities leads to better outcomes because agents are more likely to succeed at focused tasks.
  • Diversity: a strong team of agents brings different perspectives that refine the output and reduce hallucinations and bias (much like a typical human team).
  • Reusability: once agents are built, they can be applied to other use cases, creating an ecosystem of agents that, with an appropriate orchestration/coordination framework (such as AutoGen, Crew.ai, etc.), come together to solve problems.
Deployment Pattern 4 — CoPilot
The distinguishing feature of CoPilot systems is their ability to learn from user interactions and to test their own work.
More content to come…
Framework and Implementation
It is important to distinguish frameworks for building CoPilots from actual CoPilot implementations (such as GPT Pilot and aider). In most cases, open-source CoPilots are not built on existing frameworks; the implementations are developed from scratch.
Review popular implementations: OpenDevin, GPT Pilot.
Review popular research papers: AutoDev, AgentCoder.
Popular frameworks — Fabric, LangGraph, DSPy, Crew AI, AutoGen, Meta GPT, Super AGI, etc.
This entire article strives to adhere to this definition of LLM-based multi-agent systems.
In-depth Study — APPS
GPT Pilot
The GPT Pilot project is an outstanding example of creative prompt engineering and of “layering” chains of LLM responses to execute seemingly complex tasks.
Several agent profiles communicate with one another in a layered manner; see the green boxes below:
Figure: GPT Pilot’s agent profiles and their layered communication (https://www.reddit.com/r/MachineLearning/comments/165gqam/p_i_created_gpt_pilot_a_research_project_for_a/).
Each agent interacts in a layered manner, triggering from one node to the next, with decision-making agents not listed in the diagram.
The product works well in deployment because it is built on a few elegant principles:
1. Break tasks down into small modules that LLM can generate code for.
2. Test-driven development, collecting good human use cases to accurately validate and iterate.
3. Context rollback and code summarization.
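As an illustration of principles 1 and 2 (this is not GPT Pilot’s actual code), a sketch of the inner loop: generate one small module, run the human-curated tests against it, and feed failures back to the model until the tests pass.

```python
import subprocess

# Illustrative sketch of principles 1-2 above (not GPT Pilot's actual code):
# generate one small module at a time and iterate until its tests pass.

def generate_until_green(llm, spec: str, test_file: str, max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        code = llm.complete(f"Write a small Python module for:\n{spec}\n{feedback}")
        with open("module_under_test.py", "w") as f:
            f.write(code)
        # Run the human-curated test cases against the generated module.
        result = subprocess.run(
            ["python", "-m", "pytest", test_file, "-q"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return code                              # tests pass: accept the module
        feedback = f"\nThe tests failed with:\n{result.stdout[-2000:]}\nFix the code."
    raise RuntimeError("Could not produce a passing module within the attempt budget.")
```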
Despite the complexity of the prompt engineering and process design above, the benefits of fine-tuning each agent are evident: it not only lowers costs but also improves accuracy.

Author Bio:
Raunak Jain. Looking for patterns and building abstractions.

Original Title: Design Patterns for Compound AI Systems (Conversational AI, CoPilots & RAG)

Original Link: https://medium.com/@raunak-jain/design-patterns-for-compound-ai-systems-copilot-rag-fa911c7a62e0

Editor: Wang Jing

Translator Bio


Chen Zhiyan graduated from Beijing Jiaotong University with a Master’s degree in Communication and Control Engineering. He previously worked as an engineer at Great Wall Computer Software and Systems Co., Ltd. and Datang Microelectronics Company, and currently provides technical support at Beijing Wuyichaoqun Technology Co., Ltd., where he operates and maintains an intelligent translation teaching system. He has experience in deep learning and natural language processing (NLP). In his spare time he enjoys translation; his works include IEC-ISO 7816, the Iraq Oil Engineering Project, and the New Fiscal Taxism Declaration, whose Chinese-to-English translation was published in GLOBAL TIMES. He hopes to join the translation volunteer group of the THU Data Party platform to communicate, share, and progress together with everyone.
