Never Trust User Inputs—And AI Isn't an Exception: A Security-First Approach

Published 09/13/2024

Originally published by Tenable.

Written by Rémy Marot.

Artificial intelligence (AI) is transforming industries, and software developers are beginning to adopt it widely to build business applications. As they do, organizations must ensure the security of their users, their data and their infrastructure.

In cybersecurity, a core rule is: “Never trust user inputs.” This rule should be extended to AI technologies. AI systems such as chatbots act as intermediaries: they process user inputs and generate outputs based on them. Those outputs should therefore be treated as a new form of input and subjected to the same level of scrutiny and security.

In this blog post, we’ll explain the importance of a security-first approach in AI development and the risks of open-source tools.

AI tools’ lack of security by design

Most AI tools are open-source and designed to be ready-to-use locally on a developer’s machine. Many of these tools do not adhere to robust security practices by default, which makes them vulnerable to exploitation. While analyzing common projects available on GitHub, Tenable Research discovered that, for example, most of them do not offer any authentication by default. Their attack surface is large because they have a web interface, an API and a command-line interface.

It’s likely that strong market interest in AI-related tools and applications has negatively influenced their development, favoring the rapid release of proof-of-concept (POC) software, which quickly becomes popular, over the building of battle-tested software.

In the cloud era, it’s possible to quickly build new services or rely on pre-existing Docker images without first vetting them for exposure. However, it can be a major risk for organizations to leave this door open. In such situations, deploying, for example, an internal AI model on a tool that lacks proper authentication could have dramatic outcomes. To cite a recent example: the Ollama tool allowed remote code execution (RCE) without any specific configuration other than having its API exposed.
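As a minimal sketch of the two checks that are often missing in such deployments, the Python below verifies a bearer token in constant time and flags a bind address that listens beyond the loopback interface. The function names and the `Authorization: Bearer` header format are assumptions chosen for illustration; they are not the API of Ollama or of any other tool mentioned here.

```python
import hmac
import ipaddress


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the expected bearer token."""
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    # Refuse everything when no token is configured: no "open by default" mode.
    if not expected_token or not supplied.startswith(prefix):
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(
        supplied[len(prefix):].encode(), expected_token.encode()
    )


def is_exposed_bind(host: str) -> bool:
    """Return True if the bind address listens beyond the loopback interface."""
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty or unrecognized host: assume the worst and report exposure.
        return True
```

A deployment check could refuse to start a service when `is_exposed_bind` returns True and no token is configured, which turns a silent exposure into an explicit decision.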

During our research, we discovered several zero-day vulnerabilities in projects that are very popular on GitHub. However, despite many coordinated disclosure attempts, the projects’ maintainers have not responded in a reasonable amount of time (and sometimes not at all). We think this shows the lack of security maturity in this ecosystem, which seems to prioritize speed of delivery to the detriment of security.

While conducting our research, we found that previous vulnerability patches could be bypassed, as was the case with a server-side request forgery (SSRF) vulnerability in NextChat. Our analysis of a well-known tool named Langflow also highlighted a vulnerability in the implementation of its permission model, allowing a low-privileged user to gain super-admin privileges without any interaction.
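To illustrate why SSRF patches are easy to bypass, here is a hedged sketch of a destination check that validates the resolved IP addresses rather than the URL string. It is not the NextChat fix; `is_url_allowed`, `resolve_host` and the injectable `resolver` parameter are hypothetical names introduced for this example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses (may perform a DNS lookup)."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_url_allowed(url: str, resolver=resolve_host) -> bool:
    """Allow only http(s) URLs whose every resolved address is globally routable."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = resolver(host)
    except OSError:
        return False
    if not addresses:
        return False
    try:
        # Loopback, private, link-local and other reserved ranges are rejected.
        return all(ipaddress.ip_address(a).is_global for a in addresses)
    except ValueError:
        return False
```

A check like this is only one layer. The request must then connect to the address that was validated, and redirects must be re-checked, otherwise DNS rebinding or a redirect to an internal host can defeat it.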

The risks of relying on third-party LLMs

Large language models (LLMs) require substantial compute and storage resources. For this reason, many organizations find it difficult to deploy and maintain them on-premises, and it is often easier to rely on third-party providers to manage these resource-intensive models. However, it’s difficult to blindly trust third-party service providers with potentially critical business data.

The critical risks related to such usage are real and should be handled on different levels:

Data breach on the provider side: Processed data could be compromised if the provider suffers a data breach. That’s why you must vet third-party providers and ensure they follow privacy and data-protection policies.
Credential leakage: Accessing third-party services requires handling credentials and authentication data. As with any secret, these credentials can be inadvertently leaked in places such as public source code management (SCM) repositories or web application front-ends.
Model trustworthiness: Third-party services can provide numerous models to their customers. You must assess their reliability, safety and adherence to ethical guidelines.
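The credential-leakage risk above can be reduced with two simple habits, sketched here in Python: load provider keys from the environment instead of source code, and scan code for hard-coded secrets before it is committed. The variable name `LLM_API_KEY` and the regular expression are assumptions for illustration; a dedicated secret scanner will cover far more cases.

```python
import os
import re

# Hypothetical pattern: an identifier containing a secret-like word,
# assigned a quoted literal of at least eight characters.
SECRET_PATTERN = re.compile(
    r"""(api[_-]?key|secret|token|password)\w*\s*[:=]\s*["']([^"']{8,})["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, matched keyword) pairs; the secret itself is not returned."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((number, match.group(1)))
    return findings


def load_api_key(variable: str = "LLM_API_KEY") -> str:
    """Read the provider key from the environment and fail loudly if it is absent."""
    value = os.environ.get(variable, "")
    if not value:
        raise RuntimeError(f"{variable} is not set")
    return value
```

The scanner reports only the line and the keyword that matched, so its own output does not become another place where the secret leaks.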

As organizations embrace these new technologies to enhance their business, they should ensure that their AI governance rules cover these risks.

The perils of inadequate datasets

AI is built to fully leverage the data that it consumes. One of its goals is to help organizations take full advantage of the data and knowledge gained over the years.

You should see the dataset used to train the model as an input and carefully analyze it. An inadvertent leak of confidential data via model outputs could cause a significant security breach. Biased data can lead AI software to make unfair or harmful decisions.

To properly handle model security, you should focus on confidentiality, integrity and availability. Some examples include:

Only include data that is safe for exposure to intended users in datasets. When possible, use data-anonymization techniques to help safeguard sensitive information such as personally identifiable information (PII) and decrease risks of failing to comply with laws and regulations.
Properly implement and monitor data collection processes to ensure that data comes only from trusted sources and is accessible only to authorized users, and that the model uses the data and operates according to expectations.
Ensure data availability so that the model is trained on a complete dataset that matches business requirements. Model availability is also a concern for applications that call the model synchronously: review and test the application’s fallback behavior as carefully as any other failure mode in conventional development.
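As a small illustration of the anonymization point above, the sketch below masks email addresses and phone numbers before text is added to a dataset. The two patterns are deliberately simple assumptions and will miss many kinds of PII; real pipelines typically combine several detection methods.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running the redaction at ingestion time, before data reaches the training set, means the model never sees the raw values and cannot reproduce them in its outputs.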

Emerging AI vulnerabilities

LLMs introduce new classes of vulnerabilities that traditional security measures may not address properly. The most prevalent AI-related vulnerabilities are prompt injection attacks, model theft and training data poisoning.

In prompt injection attacks, malicious users craft inputs to manipulate LLMs into generating harmful or unauthorized outputs. Remember the “Never trust user inputs” cardinal rule? In this case, the LLM acts as an intermediary between the user inputs and the system. This could result in the system exposing sensitive information, executing malicious commands or becoming an attack vector for other common vulnerabilities such as stored cross-site scripting. As an example, Vanna.AI, a Python-based library designed to simplify generating SQL queries from natural language inputs, was recently found to be vulnerable to prompt injection attacks that lead to remote code execution on vulnerable systems.
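Treating LLM output as untrusted input can be sketched in Python as follows: generated SQL is accepted only if it is a single read-only statement, and generated text is escaped before it is rendered in HTML. This is a generic illustration, not how Vanna.AI or any other library works; the function names and the keyword denylist are assumptions, and the denylist is intentionally conservative.

```python
import html
import re

# Keywords that should never appear in a query meant to be read-only.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma|grant)\b",
    re.IGNORECASE,
)


def is_read_only_sql(statement: str) -> bool:
    """Accept a single SELECT statement with no write or DDL keywords."""
    stripped = statement.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # more than one statement
    if not re.match(r"select\b", stripped, re.IGNORECASE):
        return False
    return FORBIDDEN.search(stripped) is None


def render_llm_output(text: str) -> str:
    """Escape model output so it cannot inject markup into a web page."""
    return html.escape(text)
```

A filter like this can reject legitimate queries, for example one that contains the word "delete" in a string literal, and it does not replace running generated queries under a database account that has read-only permissions.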

We should protect models in the same way we protect confidential and business-critical data. The first part of this blog post described how easily some AI tools can expose data to unauthorized actors. Applying defense-in-depth principles will help minimize intellectual property leakage if model theft occurs. It’s also important to harden model security with techniques such as encryption and obfuscation, combined with proper monitoring.

Finally, poisoning of AI training data is a modern form of supply-chain attack. By altering the data used to train the model, attackers can corrupt its behavior and trigger biased or harmful output, which directly affects the applications that rely on it to achieve business goals.
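One way to detect tampering with training data after it has been reviewed is to compare each file against a manifest of known-good SHA-256 digests, as in the hedged sketch below. The manifest format (a mapping of relative file names to hex digests) and the function names are assumptions for illustration. This check catches changes made after approval; it cannot tell whether the data was already poisoned at its source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tampered_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing or whose digest differs."""
    tampered = []
    for relative_name, expected in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            tampered.append(relative_name)
    return sorted(tampered)
```

A training job could call `find_tampered_files` before loading any data and abort if the returned list is not empty.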

As in more traditional fields, developers should stay up to date with the latest security guidelines and incorporate strategies from the OWASP Top 10 for LLM Applications. Techniques such as input validation, anomaly detection and robust monitoring of the AI ecosystem’s behavior can help detect and mitigate potential threats.
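As a very small example of the anomaly detection mentioned above, the sketch below flags a metric value that deviates strongly from its recent history using a z-score. The metric (for instance, tokens per request), the threshold of 3.0 and the function name are assumptions; production monitoring usually relies on more robust methods.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag a value whose z-score against the history exceeds the threshold."""
    if len(history) < 2:
        return False  # not enough data to judge
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is unusual.
        return value != history[0]
    return abs(value - mean(history)) / spread > threshold
```

Flagged values are best treated as a prompt for investigation rather than as proof of an attack, since legitimate usage spikes will also trigger the check.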

Balancing innovation and risk

AI technologies are promising and can transform many industries and businesses, offering opportunities for innovation and efficiency. However, they also pose a major security challenge at many levels of an organization, and this should not be overlooked.

By adopting a security-first approach, following best practices and having robust governance, organizations can harness the power of AI and mitigate the emerging threats related to its adoption.

Check out the Tenable white papers “7 Steps to Harden Cloud Security Posture” and “Elevate Your Vulnerability Remediation Maturity.”

About the Author

Rémy joined Tenable in 2020 as a Senior Research Engineer on the Web Application Scanning Content team. Over the past decade, he led the IT managed-services team of a web hosting provider and was responsible for designing and building innovative security services in a Research & Development team. He also contributed to open-source security software, helping organizations increase their security posture.

Interests outside of work: Rémy enjoys spending time with his family, cooking and traveling the world. Passionate about offensive security, he enjoys doing ethical hacking in his spare time.
