ConfusedPilot: UT Austin & Symmetry Systems Uncover Novel Attack on RAG-based AI Systems
Published 11/12/2024
Originally published by Symmetry Systems.
Written by Claude Mandy.
Executive Summary
Researchers at the Spark Research Lab (University of Texas at Austin)[1], under the supervision of Symmetry CEO Professor Mohit Tiwari uncovered a novel attack method, dubbed ConfusedPilot. This novel attack method targets widely used Retrieval Augmented Generation (RAG) based AI systems, such as Microsoft 365 Copilot. This attack allows manipulation of AI responses simply by adding malicious content to any documents the AI system might reference, potentially leading to widespread misinformation and compromised decision-making processes within the organization.. With 65% of Fortune 500 companies currently implementing or planning to implement RAG-based AI systems, the potential impact of these attacks cannot be overstated.
Key Findings
- Requires only basic access to manipulate responses by RAG based AI Systems.
- Affects all major RAG implementations
- Can persist even after malicious content is removed
- Bypasses current AI security measures
In this document, we provide a high-level overview of ConfusedPilot and its implications for organizations using RAG-based AI systems. Given the widespread and rapid adoption of AI Copilots and the potential sensitivity of this vulnerability, we have chosen to withhold certain technical details and specific exploit information at this time.
What is Confused Pilot?
ConfusedPilot was discovered and researched by a team of cybersecurity experts and computer scientists (Note 1) from the University of Texas at Austin, under the supervision of Professor Mohit Tiwari, who directs the SPARK lab at UT Austin and is also the CEO of Symmetry Systems.
In normal circumstances, a retrieval Augmented Generation (RAG) based AI system will use a retrieval mechanism to extract relevant keywords to search and match with resources stored in a Vector database, using that embedded context to create a new prompt containing the relevant information to reference.
The researchers demonstrated that due to the architecture used by Retrieval Augmented Generation (RAG) to essentially leverage content as a prompt in certain circumstances, an attacker could indirectly manipulate AI-generated responses by adding content to any documents the AI system might reference, potentially leading to widespread misinformation and compromised decision-making processes within the organization.
The Attack Flow
An adversary attempting ConfusedPilot attack would likely follow these steps:
- Data Environment Poisoning: An attacker introduces an innocuous document that contains specifically crafted strings into the target’s environment. This could be achieved by any identity with access to save documents or data to an environment indexed by the AI copilot.
- Document used in Query Response: When a user makes a relevant query, the RAG system retrieves the document containing these strings.
- AI Copilot interprets strings as user Instructions: The document contains strings that could act as instructions to the AI system, including:
- Content Suppression: The malicious instructions cause the AI to disregard other relevant, legitimate content.
- Misinformation Generation: The AI generates a response using only the corrupted information.
- False Attribution: The response may be falsely attributed to legitimate sources, increasing its perceived credibility.
- AI Copilot retains instructions: Even if the malicious document is later removed, the corrupted information may persist in the system’s responses for a period of time.
Who can be Affected?
While these types of attack could be directed at any organization or individual using RAG-based AI systems, they are especially relevant for large enterprises and service providers that allow multiple users or departments to contribute to the data pool used by these AI systems. Any environment that allows the input of data from multiple sources or users – either internally or from external partners – is at higher risk, given that this attack only requires data to be indexed by the AI Copilots. Essentially the attack surface to introduce harmful data, manipulate the AI’s responses, and potentially use this to target the organization’s decision making tools is increased exponentially as demonstrated by the below image.
The urgency with which you should take steps to defend against these forms of attacks depends on your organization’s use of RAG-based AI systems, the level of trust required and boundaries you place around the data sources used by these systems.
A few illustrative examples:
- Enterprise knowledge management systems: If an attacker introduces a malicious document into the company’s knowledge base copilot (as a result of social engineering or intentional sabotage, for example), the attacker could then manipulate AI-generated responses across the organization to spread misinformation throughout the organization, potentially affecting critical business decisions.
- AI-assisted decision support systems: In environments where AI systems are used to analyze data and provide recommendations for strategic decisions, an attacker could inject false information that persists even after the original malicious content is removed. This could lead to a series of poor decisions over time due to reliance on the use of AI, with the source of the problem remaining elusive without thorough forensic investigation.
- Customer-facing AI services: For organizations providing AI-powered services to customers, Confused Pilot becomes even more dangerous. An attacker could potentially inject malicious data that affects the AI’s responses to multiple customers, leading to widespread misinformation, loss of trust, and potential legal liabilities.
- End Users Relying on AI-generated Content: Whether it’s employees, or executives, any end user using AI assistants for daily tasks or synthesizing AI-generated insights could make critically flawed decisions and unknowingly spread misinformation throughout the organization.
Industry Response
The disclosure of ConfusedPilot at DEF CON’s AI Village sparked significant industry attention, with Microsoft taking the lead in reaching out to the research team. While the demonstration at DEF CON focused on Microsoft’s Copilot, the researchers emphasized that the attack was possible on RAG architectures broadly, prompting interest from various organizations planning or implementing AI systems. The industry’s response has been notably positive, particularly regarding the team’s approach to not just highlighting the problem but providing practical mitigation strategies using existing tools.
Mitigation: Securing the Data+AI Ecosystem
Given the deeply intertwined nature of data and AI copilot’s in RAG architectures, protecting against ConfusedPilot requires a multi-faceted approach that addresses the security of both the data inputs and the AI outputs. Longer term, better architectural models will be required to try to separate the data plan from the control plan in these models. Key current strategies for mitigation include:
- Data Access Controls: Limit and scrutinize who can upload, modify, or delete data that RAG-based systems reference. Implementing strict data governance policies is essential.
- Data Integrity Audits: Regularly audit and verify the integrity of your data repositories to detect unauthorized changes or the introduction of malicious content early.
- Data Segmentation: Where possible, keep sensitive data isolated from broader data sets to prevent the spread of corrupted information across the AI system.
- AI-Specific Security Tools: Use specialized and research driven AI security solutions throughout the LLM pipeline, such as Guard LLM’s, fact checkers on grounding documents, Guard LLM’s and prompt shields, or anomaly detection systems that monitor and compare responses over time to detect irregularities in AI outputs resulting from poisoned data.
- Human Oversight: Maintain human oversight over critical AI-generated content, especially in decision-making contexts, to validate the accuracy of information before acting upon it. Train employees to critically evaluate AI-generated content, raising awareness of the potential risks associated with blindly trusting AI outputs.
Key Takeaways
The emergence of ConfusedPilot has revealed the inseparable relationship between data and AI security in RAG-based systems. This novel attack technique demonstrates how data poisoning can effectively compromise AI outputs, emphasizing the need to treat data as a core component of AI security architecture. The performance and reliability of AI systems are intrinsically linked to the quality and security of their training data, requiring a fundamental shift in security approaches.
The insider threat vector is particularly significant in this context. ConfusedPilot shows how individuals with even minimal data access can potentially influence AI system outputs. This highlights the limitations of current security models and underscores the need for comprehensive data security posture management.
Addressing these challenges requires an integrated defense strategy centered on robust data security posture management (DSPM) tools. These tools provide critical capabilities such as continuous data discovery, classification, and risk assessment across an organization’s entire data estate. When combined with access controls and employee training, DSPM tools form the cornerstone of a strong defense against data poisoning attacks. Organizations should prioritize implementing DSPM solutions that can monitor data lineage, detect anomalous changes, and ensure the integrity of data feeding into AI systems.
While AI continues to drive innovation, the discovery of ConfusedPilot marks a turning point in how organizations must think about Data+AI security. The integrity of AI systems is intrinsically tied to the integrity of the data they reference. As AI adoption accelerates, ensuring that data is protected from manipulation will be a cornerstone of securing the future of AI-driven business operations.
[1] UT AUSTIN Research TEAM
Ayush Roychowdhury (UT Austin) Ayush RoyChowdhury is a first year masters student at the Chandra Department of Electrical and Computer Engineering at the University of Texas Austin. His research interests include language model security, data security, and explainable artificial intelligence for security. |
Mulong Luo (UT Austin) Mulong Luo is currently a postdoctoral researcher at the University of Texas at Austin. His research interests are in computer architecture, side channel, and machine learning. He won best paper award at CPS-SPC workshop. He got his Ph.D. from Cornell University in 2023. |
Prateek Sahu (UT Austin) Prateek Sahu is currently a Ph.D. student at the University of Texas at Austin. His research interests are microservices, service mesh, cloud computing, function-as-a-service measurement. |
Sarbartha Banerjee (UT Austin) Sarbartha Banerjee is currently a Ph.D. candidate at the University of Texas at Austin. His research interests are secure accelerators, side channel defense, machine learning security. |
Mohit Tiwari (Symmetry Systems / UT Austin) Mohit Tiwari is an associate professor and he directs the SPARK lab at the University of Texas at Austin. He is also the CEO of Symmetry Systems, Inc. His current research focuses on building secure systems, all the way from hardware to system software to applications that run on them. Prof. Tiwari received a PhD from UC Santa Barbara (2011) and was a post-doctoral fellow at UC Berkeley (2011-13) before joining UT. |
Related Resources
Related Articles:
10 Fast Facts About Cybersecurity for Financial Services—And How ASPM Can Help
Published: 12/20/2024
The EU AI Act and SMB Compliance
Published: 12/18/2024
Decoding the Volt Typhoon Attacks: In-Depth Analysis and Defense Strategies
Published: 12/17/2024
Threats in Transit: Cyberattacks Disrupting the Transportation Industry
Published: 12/17/2024