
1. Overview of LLM Hallucinations
Before delving into the Agentic approach, it is necessary to understand the three main types of LLM hallucinations (LLM Hallucinations: Phenomenon Analysis, Impact, and Coping Strategies):
(1) Intrinsic Hallucination
Intrinsic hallucination refers to an LLM response that contradicts the context provided by the user; given that context, the response can be clearly verified as incorrect. For example, when a user provides an accurate description of a specific historical event and the answer given by the LLM contradicts these known facts, it falls under intrinsic hallucination. This may be due to the model’s inaccurate understanding of knowledge or incorrect information associations acquired during training.
(2) Extrinsic Hallucination
Extrinsic hallucination refers to the responses of LLMs that cannot be verified through the context provided by users. Although the response may be correct or incorrect, its truth cannot be determined solely based on the current context. This situation is common when the model attempts to infer beyond the given information but lacks sufficient basis to confirm the validity of its inference. For instance, when asked about a prediction of a future event not mentioned in the context, the model’s answer becomes difficult to verify from the existing context.
(3) Incoherent Hallucination
Incoherent hallucination manifests as LLM responses that do not answer the question or are meaningless. This means that the model fails to follow instructions and cannot generate answers that are relevant and logically coherent to the question. For example, for a clear mathematical calculation question, if the model provides a passage unrelated to mathematics, this is an example of incoherent hallucination. This situation may arise due to the model’s misunderstanding of the question or its failure to follow the correct logical path while generating the answer.
The existence of these hallucination phenomena severely affects the performance of LLMs in tasks such as question answering, information extraction, and text generation. Therefore, finding effective methods to reduce or even eliminate these hallucination phenomena has become an important topic in current artificial intelligence research.
2. Principles of the Agentic Approach
The Agentic approach is a workflow based on agents, aiming to reduce hallucinations by verifying the answers generated by LLMs through a series of steps (Solutions to LLM Hallucination Problems Based on Chain of Verification). The core idea of this method is to utilize the logical judgment ability of LLMs to self-verify the generated answers. The specific steps are as follows:
- Include Context and Ask: First, provide the question and its relevant context to the LLM to obtain a preliminary answer together with the relevant context the LLM used to generate it. The purpose of this step is to obtain a preliminary candidate answer and the context information supporting that answer.
- Rephrase the Question and Answer as a Declarative Statement: Next, rephrase the question and the preliminary answer into a single declarative statement. The purpose of this step is to integrate the question and answer into an easily verifiable statement, preparing for the subsequent verification step.
- Verify the Statement: Finally, ask the LLM to analyze the provided context and the declarative statement and determine whether the context entails the statement. This step is the core of the verification process, confirming the correctness of the answer through the logical judgment ability of the LLM. (A minimal code sketch of this workflow is shown below.)
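As an illustration only, and assuming a hypothetical ask_llm(prompt) helper that is not part of the original text, the three-step workflow can be sketched in Python roughly as follows; the prompt wording and return structure are assumptions, not a definitive implementation.

def ask_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its text reply."""
    raise NotImplementedError  # plug in your provider's client here

def agentic_answer(question: str, context: str) -> dict:
    # Step 1: include the context and ask, allowing an explicit "N/A" exit.
    answer = ask_llm(
        f"Context:\n{context}\n\n"
        f"Answer this question, or reply 'N/A' if the context is insufficient: {question}"
    )
    if answer.strip() == "N/A":
        return {"answer": answer, "verified": False}

    # Step 2: rephrase the question and candidate answer as a single declarative statement.
    statement = ask_llm(
        "Reformulate this question and its answer as a single assertion.\n"
        f"Question: {question}\nAnswer: {answer}"
    )

    # Step 3: self-verification -- does the context entail the statement?
    verdict = ask_llm(
        f"Context:\n{context}\n\nAssertion: {statement}\n"
        "Does the context entail the assertion? Reply 'yes' or 'no'."
    )
    return {"answer": answer, "statement": statement, "verified": verdict.strip().lower() == "yes"}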
3. Techniques to Reduce Hallucinations
(1) Use of Base Settings
Base settings involve providing relevant additional context in the input when posing a task to the LLM. This gives the LLM the information it needs to answer correctly and thereby reduces the likelihood of hallucination. For example, asking a mathematical question on its own versus also supplying the relevant chapter from a math textbook will yield different results, with the latter far more likely to produce the correct answer. In practical applications, such as document-related tasks, providing context extracted from the document helps the LLM better understand the question and give accurate answers. This is also the core principle of Retrieval-Augmented Generation (RAG): the input is supplemented with relevant information retrieved from an external knowledge base, making the model’s answers better substantiated.
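As an illustration only (the retrieval helper, prompt wording, and function names below are assumptions, not part of the original text), grounding a question with context retrieved from a document might be sketched like this:

def retrieve_relevant_chunks(question: str, document_chunks: list[str], top_k: int = 3) -> list[str]:
    """Hypothetical retriever: a real RAG setup would use embeddings or a search index."""
    # Naive keyword-overlap scoring, purely for illustration.
    scored = sorted(
        document_chunks,
        key=lambda chunk: sum(word.lower() in chunk.lower() for word in question.split()),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(question: str, document_chunks: list[str]) -> str:
    # Place the retrieved context directly in the input so the answer is grounded in it.
    context = "\n\n".join(retrieve_relevant_chunks(question, document_chunks))
    return (
        "Use only the context below to answer. If the context is insufficient, reply 'N/A'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )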
(2) Use of Structured Outputs
Forcing the LLM to output valid JSON or YAML is the structured-outputs method. It reduces unnecessary verbose expressions and yields answers that “cut to the chase”, and structured outputs also make it easier to verify the LLM’s responses afterwards. Taking Gemini’s API as an example, a Pydantic schema can be defined and sent as part of the query in the “response_schema” field. The LLM is then required to follow this schema in its response, which makes the output easy to parse. For example, when asking the LLM about the causes of hallucinations, a specific schema can be defined so that the LLM outputs its answer accordingly, such as {“answer”: “Reasons for LLM hallucinations include biases in training data, inherent limitations in the model’s understanding of the real world, and the model’s tendency to prioritize fluency and coherence over accuracy.”}, allowing the required information to be retrieved cleanly and making the answer easier to verify and process.
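As an illustration only (not taken from the original article): the sketch below uses Google’s newer google-genai client, whose structured-output support accepts Pydantic models directly, while the node code later in this article uses the older google.generativeai interface with the same response_mime_type/response_schema idea. The model name and schema fields here are assumptions.

import json

from google import genai  # pip install google-genai
from pydantic import BaseModel

# Hypothetical schema: force the model to return a single JSON field.
class HallucinationAnswer(BaseModel):
    answer: str

client = genai.Client(api_key="YOUR_API_KEY")  # assumption: API key supplied directly

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumption: any Gemini model with JSON-mode support
    contents="Why do LLMs hallucinate? Answer in one or two sentences.",
    config={
        "response_mime_type": "application/json",
        "response_schema": HallucinationAnswer,  # the SDK converts the Pydantic model to a schema
        "temperature": 0.0,
    },
)

# The reply is valid JSON matching the schema, so parsing is straightforward.
parsed = HallucinationAnswer(**json.loads(response.text))
print(parsed.answer)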
(3) Use of Chain of Thought and Better Prompts
- Chain of Thought
Giving the LLM space to think through its response before committing to a final answer helps produce higher-quality output; this is the chain-of-thought technique, which is widely applicable and easy to implement. For example, consider the question “When did Davis Jefferson die?” posed against a context that actually discusses Thomas Jefferson. With a naive approach, the model may incorrectly answer 1826 simply because the context mentions a Jefferson, even though the context refers to Thomas Jefferson and never mentions the Davis Jefferson asked about in the question. With chain of thought, the LLM is asked to provide its reasoning first; it then recognizes that the context discusses Thomas Jefferson and does not mention Davis Jefferson, answers “N/A”, and avoids the error.
- Explicitly Request an Answer of “N/A”
When the LLM cannot find sufficient context to generate a high-quality response, explicitly asking it to answer “N/A” gives the model a simple exit strategy instead of forcing it to answer a question it cannot answer. This helps reduce hallucinations caused by blind guessing. In the earlier question about the year Davis Jefferson died, for example, once the model realizes it cannot obtain an answer from the given context, it responds “N/A” rather than providing an incorrect year. A minimal schema sketch combining chain of thought with an explicit “N/A” option is shown below.
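This sketch is in the same spirit as the AnswerChainOfThoughts schema used in the code below, but the field names and prompt wording here are assumptions rather than the article’s exact definitions. Putting the reasoning fields before the answer field encourages the model to reason first, and the prompt explicitly allows an “N/A” exit.

from pydantic import BaseModel, Field

class AnswerWithReasoning(BaseModel):
    # Reasoning fields come first so the model "thinks" before committing to an answer.
    relevant_context: str = Field(description="Verbatim passage from the context that supports the answer")
    reasoning: str = Field(description="Step-by-step reasoning about whether the context answers the question")
    answer: str = Field(description="Final answer, or 'N/A' if the context does not contain it")

PROMPT_TEMPLATE = (
    "Context:\n{context}\n\n"
    "Quote the relevant passage, explain your reasoning step by step, and only then answer.\n"
    "If the context does not contain the answer, reply exactly 'N/A'.\n"
    "Question: {question}"
)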
(4) Agentic Approach
- Overview of the Three-Step Process
The agentic approach is implemented by constructing a simple agent that carries out three steps: 1) provide the context and question to the LLM to obtain a first candidate answer and the relevant context the LLM used to answer; 2) rephrase the question and the first candidate answer as a declarative statement; 3) ask the LLM to verify whether the relevant context entails the candidate answer, a process known as “self-verification”.
- Specific Implementation Steps
Define three nodes: the first node poses the question while including the context, the second node uses the LLM to reformulate the question and answer as a declarative statement, and the third node checks whether the input context entails that statement.
First Node:
def answer_question(self, state: DocumentQAState):
    logger.info(f"Responding to question '{state.question}'")
    assert (
        state.pages_as_base64_jpeg_images or state.pages_as_text
    ), "Input text or images"
    messages = (
        [
            {"mime_type": "image/jpeg", "data": base64_jpeg}
            for base64_jpeg in state.pages_as_base64_jpeg_images
        ]
        + state.pages_as_text
        + [
            f"Answer this question: {state.question}",
        ]
        + [
            f"Use this schema for your answer: {self.answer_cot_schema}",
        ]
    )
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.answer_cot_schema,
            "temperature": 0.0,
        },
    )
    answer_cot = AnswerChainOfThoughts(**json.loads(response.text))
    return {"answer_cot": answer_cot}
Second Node:
def reformulate_answer(self, state: DocumentQAState):
    logger.info("Reformulating answer")
    if state.answer_cot.answer == "N/A":
        return
    messages = [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Reformulate this question and its answer as a single assertion."
                },
                {"text": f"Question: {state.question}"},
                {"text": f"Answer: {state.answer_cot.answer}"},
            ]
            + [
                {
                    "text": f"Use this schema for your answer: {self.declarative_answer_schema}"
                }
            ],
        }
    ]
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.declarative_answer_schema,
            "temperature": 0.0,
        },
    )
    answer_reformulation = AnswerReformulation(**json.loads(response.text))
    return {"answer_reformulation": answer_reformulation}
Third Node:
def verify_answer(self, state: DocumentQAState):
    logger.info(f"Verifying answer '{state.answer_cot.answer}'")
    if state.answer_cot.answer == "N/A":
        return
    messages = [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Analyse the following context and the assertion and decide whether the context "
                    "entails the assertion or not."
                },
                {"text": f"Context: {state.answer_cot.relevant_context}"},
                {
                    "text": f"Assertion: {state.answer_reformulation.declarative_answer}"
                },
                {
                    "text": f"Use this schema for your answer: {self.verification_cot_schema}. Be Factual."
                },
            ],
        }
    ]
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.verification_cot_schema,
            "temperature": 0.0,
        },
    )
    verification_cot = VerificationChainOfThoughts(**json.loads(response.text))
    return {"verification_cot": verification_cot}
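The three node functions above reference several classes that are not defined in this article: DocumentQAState, AnswerChainOfThoughts, AnswerReformulation, and VerificationChainOfThoughts. Their exact definitions are not shown; judging from how their fields are used, they might be sketched roughly as follows, where any field not visible in the code above (such as reasoning and entailment) is an assumption.

from typing import Optional

from pydantic import BaseModel

class AnswerChainOfThoughts(BaseModel):
    relevant_context: str  # passage of the input the model relied on (used by verify_answer)
    reasoning: str         # assumption: intermediate chain-of-thought text
    answer: str            # candidate answer, or "N/A"

class AnswerReformulation(BaseModel):
    declarative_answer: str  # question and answer merged into a single assertion

class VerificationChainOfThoughts(BaseModel):
    reasoning: str     # assumption: why the context does or does not entail the assertion
    entailment: bool   # assumption: the final self-verification verdict

class DocumentQAState(BaseModel):
    question: str
    pages_as_base64_jpeg_images: list[str] = []
    pages_as_text: list[str] = []
    answer_cot: Optional[AnswerChainOfThoughts] = None
    answer_reformulation: Optional[AnswerReformulation] = None
    verification_cot: Optional[VerificationChainOfThoughts] = None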
(5) Use of a Stronger LLM
This technique is not always easy to apply due to budget or latency constraints, but stronger LLMs are indeed less prone to hallucinations (RELAI Validation Agent: A New Approach to LLM Hallucination Detection). Hallucination benchmarks show that the best-performing models (those with the fewest hallucinations) also rank high on traditional natural language processing leaderboards. Therefore, for the most sensitive use cases, a more powerful LLM should be chosen whenever possible. For example, in fields requiring high accuracy, such as medical diagnosis and legal document processing, using a more powerful model reduces the risks associated with hallucinations.
The Agentic approach, as an effective strategy for reducing LLM hallucinations, has broad application prospects in the field of artificial intelligence. Through the self-verification mechanism, this method can significantly reduce the hallucination phenomena in the answers generated by LLMs, improving the accuracy and reliability of the answers. However, this method also has some limitations and challenges, such as high requirements for LLM models, limited ability to handle complex problems, and cost-effectiveness issues.
To overcome these limitations and challenges, future research can explore the following aspects:
- Develop Stronger LLM Models
Improve model architectures and training algorithms to enhance the logical judgment and contextual understanding of LLMs, making them better suited to the needs of the Agentic approach.
- Optimize Verification Mechanisms
Research more efficient verification methods and algorithms to reduce the computational resources and time required by the verification steps, improving the cost-effectiveness of the Agentic approach.
- Expand Application Areas
Apply the Agentic approach to more tasks and scenarios, such as dialogue systems, recommendation systems, and knowledge graph construction, to verify its generalization ability and practicality.
In summary, the Agentic approach provides an effective solution for reducing LLM hallucinations (RAG (Retrieval-Augmented Generation) Assessment: Evaluating Hallucination Phenomena in LLMs) and has shown significant results in practical applications.