Solving RAG’s Challenges: From Demo to Production

Introduction
Many product managers and engineers familiar with RAG often complain, “It only takes a week to produce a demo with RAG, but it takes at least six months to reach a production-level standard!”
This is a realistic issue for the current industrial implementation of RAG. The RAG framework is very simple and understandable, and there are numerous ways to optimize the entire RAG process, which I have detailed in previous articles, such as “The Development History of RAG Systems: From Simple to Advanced to Modular”.
However, whether used internally within companies or for end-users, the most direct feedback is that RAG can only retrieve and answer relatively simple and intuitive questions.
Hebbia, a unicorn in the enterprise knowledge base field, has run experiments showing that RAG can address only 16% of internal enterprise questions. Why is that?
The answer lies in the classification of questions.
Any user query can be categorized into four types: explicit fact queries, implicit fact queries, explainable reasoning queries, and implicit reasoning queries. The complexity and difficulty of solving these questions increase in that order.
The following diagram lists the challenges and solutions for each type of question. It can be seen that only explicit fact queries and some implicit fact queries can be solved using RAG, Iterative RAG, or GraphRAG. However, for explainable reasoning queries and implicit reasoning queries, RAG is ineffective and requires more complex and targeted solutions.
In practical enterprise application scenarios, the vast majority of valuable questions for business departments fall into Level 3 and Level 4, which leads to the predicament of “RAG can produce a demo in a week, but it takes six months to go live”.
Next, I will elaborate on the characteristics and solutions for these four types of questions. By the end, I believe you will gain valuable insights!
Explicit Fact Queries
Explicit facts are factual information or data that exist directly in the external data and require no additional reasoning. For example: “Where was the 2016 Olympics held?”, “What is the brand and operating temperature of a certain sensor?”, “What was the revenue of store A last month?”.
Explicit fact queries are the simplest form of query, directly retrieving clear factual information from the provided data without the need for complex reasoning or thought, making them very suitable for RAG.
Of course, to accurately and efficiently retrieve and generate relevant content, the RAG system also needs optimization. This can be achieved through methods previously introduced, and here is a brief review.

Index Construction

Block Optimization: Use sliding windows, metadata enrichment, and similar methods to split content into blocks with more appropriate size, structure, and relevance.

Multi-level Index: This involves creating two indexes, one consisting of document summaries and the other consisting of document blocks, and searching in two steps: first filtering out relevant documents through summaries, and then searching only within that relevant group.

Knowledge Graph: Extract entities and the relationships between them to build a global view of the information, thereby improving RAG’s accuracy.
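
The two-step search of a multi-level index can be sketched as follows. This is a minimal illustration with toy word-overlap scoring and invented documents; a real system would score summaries and chunks with an embedding model.

```python
# Minimal sketch of a two-level index: filter documents by summary
# relevance first, then search chunks only within the selected docs.
# Scoring here is toy word-overlap; a real system would use embeddings.

def score(query, text):
    """Relevance = number of query words that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split()))

def two_level_search(query, docs, top_docs=1, top_chunks=2):
    # Level 1: rank whole documents by their summaries.
    ranked = sorted(docs, key=lambda d: score(query, d["summary"]), reverse=True)
    candidates = ranked[:top_docs]
    # Level 2: rank chunks, but only inside the selected documents.
    chunks = [c for d in candidates for c in d["chunks"]]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_chunks]

docs = [
    {"summary": "sensor specifications and operating temperature",
     "chunks": ["The sensor operating temperature is -40C to 85C.",
                "The sensor brand is ACME."]},
    {"summary": "store revenue reports by month",
     "chunks": ["Store A revenue was 1.2M last month."]},
]

print(two_level_search("sensor operating temperature", docs))
```

Because irrelevant documents are filtered out at the summary level, the chunk search never touches them, which is the efficiency gain this index structure aims for.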

Pre-retrieval

Multi-query: Use prompt engineering to expand queries with large language models, transforming the original query into multiple similar queries and executing them in parallel.

Sub-query: By decomposing and planning complex problems, the original query is broken down into multiple sub-queries, which are then summarized and merged.

Query Transformation: Rewrite the user’s original query into a new form better suited for retrieval and generation.

Query Construction: Convert natural language queries into a language that specific machines or software can understand, such as text2SQL or text2Cypher.
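
The multi-query idea above can be sketched as below. `expand_query` is a stand-in for an LLM paraphraser (here just fixed synonym rewrites), and the corpus and overlap scoring are toy examples, not a real retriever.

```python
# Sketch of multi-query expansion: run several query variants and merge
# the retrieved documents, keeping each document's best score across
# variants. `expand_query` stands in for an LLM call.

def expand_query(query):
    """Hypothetical LLM expansion, stubbed with simple synonym rewrites."""
    synonyms = {"revenue": "income", "store": "shop"}
    variants = [query]
    for old, new in synonyms.items():
        if old in query:
            variants.append(query.replace(old, new))
    return variants

def retrieve(query, corpus):
    q = set(query.lower().split())
    return {doc: len(q & set(doc.lower().split())) for doc in corpus}

def multi_query_search(query, corpus, top_k=2):
    merged = {}
    for variant in expand_query(query):
        for doc, s in retrieve(variant, corpus).items():
            merged[doc] = max(merged.get(doc, 0), s)  # best score per doc
    return sorted(merged, key=merged.get, reverse=True)[:top_k]

corpus = ["shop income grew 5% in March",
          "store revenue fell in April",
          "weather report for March"]
print(multi_query_search("store revenue in March", corpus))
```

Note how the first document, which shares no surface words with the original query beyond “in March”, is surfaced only because a paraphrased variant matches it.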

Retrieval

Sparse Retriever: Convert queries and documents into sparse vectors using statistical methods. Its advantage is high efficiency when handling large datasets, focusing only on non-zero elements.

Dense Retriever: Provide dense representations for queries and documents using pre-trained language models (PLMs). Although the computational and storage costs are high, it offers more complex semantic representations.

Retriever Fine-tuning: Fine-tune retrieval models based on labeled domain data, usually using contrastive learning to achieve this.
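
The contrast between sparse and dense representations can be illustrated with a minimal sketch. The sparse vector stores only non-zero term counts; the dense “embeddings” below are hand-made stand-ins for PLM output, not real model vectors.

```python
# Sketch contrasting sparse and dense retrieval representations.

def sparse_vector(text):
    """Term-count vector: only non-zero entries are stored."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def sparse_dot(a, b):
    # Iterate the smaller vector: efficiency comes from skipping zeros.
    if len(b) < len(a):
        a, b = b, a
    return sum(v * b.get(k, 0) for k, v in a.items())

def dense_dot(a, b):
    # Dense vectors are fixed-length; every dimension participates.
    return sum(x * y for x, y in zip(a, b))

query_s = sparse_vector("sensor temperature")
doc_s = sparse_vector("the sensor operating temperature is 85C")
print(sparse_dot(query_s, doc_s))

# Toy 3-dim "embeddings" standing in for PLM encoder output.
query_d = [0.9, 0.1, 0.0]
doc_d = [0.8, 0.2, 0.1]
print(dense_dot(query_d, doc_d))
```

The sparse score only rises on exact word matches, while a dense score can stay high for semantically related wording, which is the trade-off the two bullets above describe.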

Post-retrieval

Re-ranking: For the retrieved content blocks, use a specialized ranking model to recalculate the relevance scores of the context.

Compression: For the retrieved content blocks, do not input them directly into the large model, but first remove irrelevant content and highlight important context, thereby reducing the overall prompt length and minimizing the interference of redundant information on the large model.
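
The two post-retrieval steps can be combined in a small sketch. The scorer below is toy word overlap standing in for a cross-encoder re-ranker, and the threshold-based compression is a simplification of real context-compression methods.

```python
# Sketch of post-retrieval: re-rank retrieved chunks with a (toy) scorer,
# then compress by keeping only chunks above a relevance threshold so the
# final prompt stays short.

def rerank_score(query, chunk):
    q = set(query.lower().split())
    words = set(chunk.lower().split())
    return len(q & words) / max(len(words), 1)

def rerank_and_compress(query, chunks, threshold=0.2):
    scored = sorted(chunks, key=lambda c: rerank_score(query, c), reverse=True)
    kept = [c for c in scored if rerank_score(query, c) >= threshold]
    return "\n".join(kept)  # compact context handed to the LLM

chunks = ["sensor temperature is 85C",
          "unrelated shipping policy text here today",
          "the sensor brand is ACME"]
context = rerank_and_compress("sensor temperature", chunks)
print(context)
```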

Implicit Fact Queries

Implicit facts do not directly appear in the original data and require some reasoning and logical judgment. Moreover, the information to deduce implicit facts may be dispersed across multiple paragraphs or data tables, necessitating cross-document retrieval or cross-table querying.
For example, “Query the store with the highest revenue growth rate in the past month” is a typical implicit fact query. It requires obtaining the revenue of all stores for the current and previous months, calculating the revenue growth rate for each store, and then sorting the results.
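
As a sketch of the reasoning this query requires, the computation can be expressed in a few lines of Python. The revenue figures are invented for illustration.

```python
# The implicit-fact query above combines two explicit facts per store
# (last month's and this month's revenue) with a computation and a sort.

revenue = {  # store -> (last_month, this_month), illustrative figures
    "A": (100, 110),
    "B": (200, 260),
    "C": (150, 145),
}

def growth_rate(prev, curr):
    return (curr - prev) / prev

best = max(revenue, key=lambda s: growth_rate(*revenue[s]))
print(best, growth_rate(*revenue[best]))
```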
The main challenge of implicit fact queries is that the required data sources and reasoning logic vary from question to question, making it difficult to ensure the large model generalizes during the reasoning process.
The primary solution approach for implicit fact queries includes the following methods:
Multi-hop Retrieval and Reasoning
Iterative RAG: Generate a retrieval plan before searching, and continuously optimize based on retrieval results during the search process. For example, using the ReAct framework, approach the correct answer step by step along the Thought – Action – Observation analysis path.
Self-RAG: Build four key scorers: retrieval demand scorer, retrieval relevance scorer, generation relevance scorer, and answer quality scorer, allowing the large model to autonomously decide when to start retrieving, when to use external search tools, and when to output the final answer.
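
A minimal sketch of such an iterative Thought–Action–Observation loop is shown below. The decision policy is hand-coded and the search tool is stubbed; in a real ReAct agent, an LLM produces the thoughts and chooses the actions.

```python
# Minimal sketch of an iterative (ReAct-style) loop: decide an action
# from the current state, observe the result, and repeat until the
# question can be answered or the step budget runs out.

def react_loop(question, tools, max_steps=3):
    observations = []
    for _ in range(max_steps):
        # Thought: do we already have enough information to answer?
        if any("revenue" in o for o in observations):
            return f"Answer based on: {observations[-1]}"
        # Action: pick a tool; Observation: record its output.
        observations.append(tools["search"](question))
    return "No answer found"

# Stubbed search tool returning a canned observation.
tools = {"search": lambda q: "store A revenue was 1.2M last month"}
print(react_loop("What was store A's revenue?", tools))
```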
Using Graph and Tree Structures
Raptor: RAPTOR recursively clusters text blocks based on their vectors and generates a text summary for each cluster, building a tree from the bottom up. Chunks clustered together become sibling nodes, and their parent node contains the summary of that cluster. This structure lets RAPTOR load context blocks representing different levels of the text into the LLM’s context, enabling effective and efficient answers to questions at different levels of granularity.
GraphRAG: A technological paradigm that combines knowledge graphs with RAG. Traditional RAG retrieves from vector databases, whereas GraphRAG retrieves from knowledge graphs stored in graph databases to obtain related knowledge and enhance generation.
Converting Natural Language into SQL Queries
text2SQL: Mainly used for database queries, especially in multi-table query scenarios. Refer to “Understanding the Challenges and Solutions of ChatBI: Answer Accuracy Exceeding 99%”.

Explainable Reasoning Queries

Explainable reasoning refers to problems that cannot be derived from explicit or implicit facts and require comprehensive data for relatively complex reasoning, induction, and summarization, with the reasoning process being business-explainable.

Attribution analysis in ChatBI is a typical example of explainable reasoning. For instance, “What caused the 5% revenue decline in the South China region in the past month?”. This question cannot be answered directly but can be reasoned through certain means, as shown below:

Total Revenue = New Customers * Conversion Rate * Average Transaction Value + Existing Customers * Repurchase Rate * Average Transaction Value

Analysis shows that the number of new customers, conversion rate, and average transaction value did not change significantly, while the repurchase rate of existing customers declined by about 10%. Therefore, it can be inferred that reasons such as “service quality and competition from rival products” led to the decline in the repurchase rate of existing customers, consequently causing the total revenue to decrease.
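
The attribution logic above can be sketched in code: decompose revenue using the formula, compare each factor month over month, and flag the one that moved most. All figures are illustrative, not real data.

```python
# Sketch of attribution analysis: evaluate the revenue formula for two
# months and find the factor with the largest relative decline.

def total_revenue(m):
    return (m["new_customers"] * m["conversion"] * m["avg_value"]
            + m["existing_customers"] * m["repurchase"] * m["avg_value"])

last = {"new_customers": 1000, "conversion": 0.10, "avg_value": 50,
        "existing_customers": 2000, "repurchase": 0.40}
this = {"new_customers": 1000, "conversion": 0.10, "avg_value": 50,
        "existing_customers": 2000, "repurchase": 0.36}  # repurchase -10%

def biggest_mover(last, this):
    # Relative change of each factor; the minimum is the largest drop.
    changes = {k: (this[k] - last[k]) / last[k] for k in last}
    return min(changes, key=lambda k: changes[k])

print(biggest_mover(last, this), total_revenue(last), total_revenue(this))
```

The large model’s role in this pattern is not to invent the formula but to apply it: the business rule is supplied by experts, and the reasoning trace stays auditable.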

Explainable reasoning problems face two main challenges: diverse prompts and limited explainability.

Diverse Prompts: Different queries require specific business knowledge and decision-making criteria. For example, reasoning about a revenue decline can use the business rules above, but reasoning about a decline in gross margin needs a different rule set. This diversity requires industry experts to distill the rules and convert them into prompts the large model can follow.

Limited Explainability: The impact of prompts on large models is opaque, making it difficult to assess their influence, thereby hindering the construction of consistent explainability.

In the face of such challenges, I mainly have the following suggestions:

Prompt Engineering Optimization

Optimizing Prompts: It is essential to effectively integrate business reasoning logic into large language models, which tests the industry know-how of prompt designers.

Prompt Fine-tuning: Manually designing prompts can be time-consuming; this issue can be addressed through prompt fine-tuning techniques. For example, using reinforcement learning, the probability of the large model generating correct answers can be used as a reward to guide the model in discovering the best prompt configuration across different datasets.


Building Decision Trees

Decision Trees: Transform decision-making processes into state machines, decision trees, or pseudocode for execution by large models. For instance, in equipment maintenance, constructing fault trees is a highly effective fault detection solution.
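
One way to encode such a fault tree is as a simple nested structure that can be walked deterministically, or handed to a large model as pseudocode. The tree content below is invented for illustration.

```python
# Sketch of a maintenance fault tree as a nested dict. Inner nodes name
# a check; the "yes"/"no" branches lead to further checks or to a leaf
# diagnosis string. Walking the tree is a plain loop.

fault_tree = {
    "check": "power_on",
    "yes": {"check": "error_code_E2",
            "yes": "Replace the temperature sensor",
            "no": "Inspect the drive belt"},
    "no": "Check the power supply and fuse",
}

def diagnose(tree, facts):
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if facts.get(node["check"]) else node["no"]
    return node

print(diagnose(fault_tree, {"power_on": True, "error_code_E2": True}))
```

Because every branch is explicit, the diagnosis is fully explainable: the path of checks taken is the explanation.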


Using Agentic Workflows

Agentic Workflow: Construct specific steps for the large model’s thinking and actions through workflows, thereby constraining its thought direction. The advantage of this method is that it can provide relatively stable outputs, but the downside is a lack of flexibility, requiring workflows to be designed for each type of problem.

Implicit Reasoning Queries

Implicit reasoning queries refer to those that cannot be judged based on pre-established business rules or decision logic but must be inferred through observation and analysis of external data.

For example, in IT intelligent operations and maintenance, there are no pre-existing comprehensive documents detailing the handling methods and rules for each type of problem. The operations and maintenance team only has records of various fault events and solutions handled in the past. The large model needs to mine the best handling solutions for different faults from this data, which constitutes implicit reasoning queries.

Similarly, scenarios such as intelligent operations and maintenance on production lines and intelligent quantitative trading involve numerous implicit reasoning query issues.

The main challenges of implicit reasoning problems include difficulty in logical extraction, data dispersion, and insufficiency, making them the most complex and challenging issues.

Difficulty in Logical Extraction: Mining implicit logic from vast amounts of data requires developing complex and effective algorithms capable of parsing and identifying the logic hidden behind the data. Therefore, relying solely on superficial semantic similarity is insufficient; dedicated small models need to be built to address this.

Data Dispersion and Insufficiency: Implicit logic is often hidden in very dispersed knowledge, requiring the model to possess strong data mining and comprehensive reasoning capabilities. Moreover, when external data is limited or of insufficient quality, it becomes challenging to extract valuable information from it.

For the challenges faced by implicit reasoning problems, the following solution approaches can be considered:

Machine Learning: Summarize potential rules from historical data and cases using traditional machine learning methods.


Context Learning: Include relevant examples in the prompt for the model to reference. The limitation of this method is that the large model may still struggle to master reasoning capabilities beyond its training domain.

Model Fine-tuning: Fine-tune the model using a large amount of business data and case data, internalizing domain knowledge. However, this method can be resource-intensive, so small and medium-sized enterprises should use it cautiously.

Reinforcement Learning: Encourage the model to generate reasoning logic and answers that best align with business realities through a reward mechanism.
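
As a sketch of the context learning approach above: retrieve the most similar past cases and place them in the prompt as worked examples. Similarity here is toy word overlap, and the fault/solution case data is invented.

```python
# Sketch of in-context learning for implicit reasoning: pick the k most
# similar historical cases and format them as few-shot examples before
# the new query.

cases = [  # (fault record, solution) pairs, invented for illustration
    ("disk usage at 100%", "rotate logs and expand the volume"),
    ("service returns 502", "restart the upstream and check health probes"),
    ("high CPU on node 3", "profile the process and cap its threads"),
]

def similarity(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_prompt(query, cases, k=2):
    best = sorted(cases, key=lambda c: similarity(query, c[0]), reverse=True)[:k]
    examples = "\n".join(f"Fault: {f}\nFix: {s}" for f, s in best)
    return f"{examples}\nFault: {query}\nFix:"

print(build_prompt("service returns 502 errors", cases))
```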

Conclusion

In this article, addressing the issue of RAG being easy to start but difficult to go live, I introduced the four types of user queries and the corresponding problem-solving approaches for each type.

For explicit fact queries and implicit fact queries, various RAG optimization schemes can be employed. However, when facing explainable reasoning and implicit reasoning problems, relying solely on RAG becomes inadequate, necessitating the introduction of methods such as prompt engineering, decision trees, agentic workflows, machine learning, model fine-tuning, and reinforcement learning.

Each of these methods could be elaborated on in a separate series. Therefore, this article merely presents these problem-solving directions without detailed exposition. In the future, I will provide detailed introductions based on practical cases.

This is my last article of 2024, and I wish everyone a Happy New Year, hoping for great prospects in 2025!

More Exciting Articles:

What is AI Agent that the big shots are paying attention to? Analyzing AI Agent using the 5W1H framework (Part 1)

AI Large Model Practical Series: AI Agent Design Pattern – ReAct

RAG Practical Series: Building a Minimum Viable RAG System

In-depth Analysis of Core Application Scenarios of AI Large Models in the Retail Industry

After leaving Tencent after seven years, I officially embark on my entrepreneurial journey in the AI large model field.


I am Feng Shu, an entrepreneur in the AI large model field, former product director at Tencent, with over ten years of product design and commercialization experience, possessing rich practical experience in e-commerce, marketing, AI, and big data products. I will continue to share my summaries and reflections, hoping they will be helpful to you.
