Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases

Abstract

With the growing adoption of Retrieval-Augmented Generation (RAG) systems in real-world services, concerns about their security are increasing. RAG systems enhance the generative capabilities of Large Language Models (LLMs) with a retrieval mechanism that operates on private knowledge bases. However, unintended exposure of this mechanism can have severe consequences, including the leakage of private and sensitive information. This paper proposes a black-box attack that compels RAG systems to leak their private knowledge bases. Unlike existing approaches, the attack is adaptive and automated: by generating effective queries with a relevance-based mechanism and open-source LLMs running on the attacker's side, it can leak most of the hidden knowledge base. Extensive experiments show that the proposed algorithm performs well across different RAG pipelines and domains and is more adaptable and effective than competing methods. These findings underscore the urgent need for stronger privacy safeguards in the design and deployment of RAG systems.

1. Introduction

Retrieval-Augmented Generation (RAG) systems improve the accuracy and relevance of LLM outputs by integrating external knowledge bases: they retrieve information relevant to the input and incorporate it into the model's response. This approach substantially improves the quality of generated content, especially in applications such as customer support, organizational tools, and medical chatbots that must protect confidential information. At the same time, RAG systems face significant challenges regarding data privacy and security.

Existing attacks on RAG systems and LLMs include Membership Inference Attacks (MIA), which try to determine whether specific data points belong to the training set, as well as more advanced techniques that directly steal sensitive knowledge from RAG systems. Methods such as TGTB and PIDE rely on static prompts, which limits their adaptability. The Dynamic Greedy Embedding Attack (DGEA) introduces adaptive algorithms, but its complexity and resource consumption make it less practical. RAG-Thief (RThief) uses a memory mechanism to extract text blocks, yet its flexibility depends on predefined conditions. Overall, existing attacks still face challenges in efficiency, adaptability, and effectiveness, while RAG systems remain exposed to privacy leaks.

2. Methods

To address these privacy issues, researchers from the University of Perugia, the University of Siena, and the University of Pisa proposed a relevance-based framework that extracts private knowledge while avoiding redundant leakage. The framework employs open-source language models and sentence encoders to automatically explore the hidden knowledge base, without relying on pay-per-use services or prior knowledge of the target system. Unlike other methods, the approach learns progressively from what it has already extracted and aims to maximize coverage of the private knowledge base.

The framework operates blindly, exploring the private knowledge base through feature representations and adaptive strategies. It is implemented as a black-box attack that runs on a standard home computer, with no specialized hardware or external APIs. The method transfers across RAG configurations and offers a simpler, more cost-effective way to expose these vulnerabilities; a minimal sketch of such a relevance-driven loop is given below.
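
The following Python sketch illustrates how such a loop could work. It is an illustration rather than the authors' implementation: `encode`, `generate_query`, and `ask_rag_agent` are assumed stand-ins for a sentence encoder, an attacker-side open-source LLM, and the target chatbot, and the anchor/relevance bookkeeping is a simplified reading of the adaptive strategy described above.

```python
# Illustrative sketch of a relevance-driven leakage loop (not the authors' code).
# `encode`, `generate_query`, and `ask_rag_agent` are assumed stand-ins for a
# sentence encoder, an attacker-side open-source LLM, and the target RAG chatbot.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def attack(seed_topics, encode, generate_query, ask_rag_agent,
           max_queries=200, dup_threshold=0.95, penalty=0.5):
    # Each anchor is [text, embedding, relevance]; the relevance score decides
    # which topic the next attack query is built around.
    anchors = [[t, encode(t), 1.0] for t in seed_topics]
    leaked, leaked_embs = [], []

    for _ in range(max_queries):
        live = [a for a in anchors if a[2] > 0.0]
        if not live:                 # stop once every anchor's relevance is zero
            break
        anchor = max(live, key=lambda a: a[2])
        query = generate_query(anchor[0])     # attacker-side LLM crafts the query
        answer = ask_rag_agent(query)         # black-box call to the target agent

        new_blocks = 0
        for block in (s.strip() for s in answer.split("\n")):
            if not block:
                continue
            emb = encode(block)
            # Keep only blocks that are not near-duplicates of earlier leaks.
            if all(cosine(emb, e) < dup_threshold for e in leaked_embs):
                leaked.append(block)
                leaked_embs.append(emb)
                anchors.append([block, emb, 1.0])  # leaked text becomes a new anchor
                new_blocks += 1

        if new_blocks == 0:          # unproductive anchors lose relevance and
            anchor[2] -= penalty     # are eventually abandoned
    return leaked
```

In this reading, anchors that stop yielding new material lose relevance, which both limits redundant leakage and gives the attack a natural stopping point once every anchor's relevance reaches zero.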

3. Experimental Design

The researchers simulated real-world attack scenarios, running different attacking LLMs against three distinct RAG systems with the goal of extracting as much of the private knowledge base as possible. Each RAG system implemented a chatbot-like virtual agent that users interact with through natural-language queries. Three agents were defined: Agent A, a diagnostic-support chatbot; Agent B, a chemistry and medical research assistant; and Agent C, a children's education assistant. Each agent's private knowledge base was simulated from existing datasets by sampling 1,000 blocks. The proposed method was compared against competitors such as TGTB, PIDE, DGEA, RThief, and GPTGEN, using navigation coverage, leaked knowledge, leaked blocks, unique leaked blocks, and attack-query generation time as evaluation metrics.
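
For intuition, a coverage-style metric can be approximated as in the snippet below. This is a simplified interpretation, assuming a knowledge-base block counts as leaked when at least one extracted block is sufficiently similar to it in embedding space; the paper's exact definitions may differ.

```python
# Assumed, simplified reading of a coverage-style metric: a knowledge-base block
# counts as leaked if some extracted block is close enough in embedding space.
import numpy as np

def coverage(kb_embs: np.ndarray, leaked_embs: np.ndarray, thr: float = 0.9) -> float:
    """Fraction of knowledge-base blocks matched by at least one leaked block."""
    if leaked_embs.size == 0:
        return 0.0
    kb = kb_embs / np.linalg.norm(kb_embs, axis=1, keepdims=True)
    lk = leaked_embs / np.linalg.norm(leaked_embs, axis=1, keepdims=True)
    sims = kb @ lk.T                    # cosine similarity, shape (|KB|, |leaked|)
    return float((sims.max(axis=1) >= thr).mean())
```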

4. Results Analysis

The results show that the proposed method outperforms its competitors in both navigation coverage and leaked knowledge, in both the bounded and unbounded scenarios. In the bounded scenario, the Pirate method extracts more unique knowledge blocks while remaining efficient in attack-query generation time. In the unbounded scenario it is even stronger, continuing to extract new information until the relevance of every anchor drops to zero.

5. Discussion and Future Directions

The adaptive attack procedure proposed in this paper provides an effective way to extract private knowledge from RAG systems. While the study exposes the vulnerabilities of RAG systems, stronger defense mechanisms and more targeted attack methods still need to be explored. Future work will consider how to introduce targeted strategies into the attack to improve its accuracy and efficiency.

6. Conclusion

Through a series of experiments, this paper validates the effectiveness of the adaptive attack procedure in extracting private knowledge from RAG systems. The results underscore the need for stronger privacy protection in the design of RAG systems. As RAG systems become widely deployed in practice, ensuring their security and privacy will be an important direction for future research.

paper: https://arxiv.org/abs/2412.18295