AI Security Risks Start with Poor Data Visibility

Published 04/06/2026

For a lot of organizations, AI has become the answer to almost every security question.

Need faster detection? Add AI.

Need better prioritization? Add AI.

Need help managing an exploding volume of files, messages, logs, and documents? Definitely add AI.

But CSA’s new survey report, commissioned by Thales, offers a more grounded takeaway. AI can help improve security, but only if the fundamentals are already in place.

For unstructured data security, the real story is not simply that AI is powerful. AI increases both opportunity and risk. The outcome depends on whether organizations have built the necessary data visibility, classification, labeling, and governance foundations.

Unstructured data is not a niche problem anymore. It is the everyday content of the enterprise: documents and files, communication data, and logs or observational data.

In the survey, documents and files accounted for 73% of unstructured data volume. Communication data accounted for 62% and logs for 43%. This is the data employees create, share, copy, store, and move across cloud apps, platforms, file servers, and public cloud environments all day long.

Now add AI to that environment.

In the report, 47% of respondents identified AI-driven threats as the top risk to unstructured data in 2026. At the same time, organizations are planning to use AI as a core security capability:

40% plan to use it for threat detection and automating security workflows
37% plan to apply it to classification and labeling as well as discovery and inventory

Defenders and attackers are both bringing AI to the same fight. This is where the new report gets especially useful for security practitioners. It avoids the easy “AI is coming, therefore buy more AI” narrative. Instead, it shows that if you deploy AI on top of weak data security foundations, you may just automate confusion at scale.

Why Foundations Matter More Than AI Hype

Only 35% of organizations report full visibility into where unstructured data resides. Just 9% have real-time scanning capabilities, and 23% cannot scan unstructured data for risks at all. These gaps act as structural limits on what any AI system can do well.

AI models do not magically compensate for missing inventory, incomplete coverage, or inconsistent controls. If an organization doesn't know the following, then AI has little reliable context to work with:

Where its sensitive files live
How they're labeled
Who can access them
How quickly risky content can be scanned

The likely result is more noise, more blind spots, or faster decisions based on incomplete information.
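
As a minimal sketch of that dependency, the snippet below checks whether an inventory record carries the four pieces of context listed above before it is handed to any AI-driven analysis. The `FileRecord` type, its field names, and the `missing_context` helper are hypothetical and chosen for illustration; they do not come from the report. The `last_scanned` timestamp is only a rough stand-in for scanning speed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical inventory entry for one unstructured file."""
    path: Optional[str] = None                # where the file lives
    label: Optional[str] = None               # sensitivity label, if any
    allowed_principals: list[str] = field(default_factory=list)  # who can access it
    last_scanned: Optional[datetime] = None   # when it was last scanned for risk


def missing_context(record: FileRecord) -> list[str]:
    """Return the names of the context fields this record lacks."""
    gaps = []
    if not record.path:
        gaps.append("location")
    if not record.label:
        gaps.append("label")
    if not record.allowed_principals:
        gaps.append("access")
    if record.last_scanned is None:
        gaps.append("scan")
    return gaps


# Illustrative records only; paths and group names are made up.
records = [
    FileRecord(path="/shares/finance/q1.xlsx", label="confidential",
               allowed_principals=["finance-team"],
               last_scanned=datetime(2026, 3, 1)),
    FileRecord(path="/shares/misc/notes.docx"),
]
for r in records:
    print(r.path, "->", missing_context(r) or "ready")
```

A record with any gaps could be routed to discovery or labeling work first, rather than fed to a model that would have to guess at the missing context.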

Unstructured data is also distributed and dynamic. The report notes that sensitive unstructured data is commonly stored in:

Cloud applications (58%)
File servers (57%)
Public cloud environments (47%)
On-premises databases (46%)
Cloud collaboration tools (45%)

As data moves across those locations, visibility, protection, and operational readiness lag behind. AI cannot retroactively supply visibility into a data estate that the organization does not already understand.

The Classification and Labeling Problem is About to Get Bigger

The report also shows how much organizations want AI to help with classification and labeling. That makes sense, since classification is tedious, inconsistent, and difficult to scale manually. At the same time, it's foundational to unstructured data environments. If you cannot distinguish sensitive content from lower-risk data, then access control, monitoring, and incident response all become less effective.

But the report also shows that organizations are not starting from a clean baseline. One in ten organizations report having no sensitivity labeling for unstructured data at all, and classification practices remain inconsistent. That means many enterprises want to use AI to accelerate a process that they have not yet consistently defined.

AI can absolutely help classify large volumes of content, surface patterns, and improve discovery and inventory across sprawling environments.

However, AI-driven classification still depends on policy clarity. Teams have to define what counts as confidential, regulated, restricted, or business-sensitive information. They have to decide on follow-up actions. They need governance for exceptions, ownership, and review.

Otherwise, classification and labeling becomes another checkbox capability that looks mature on a roadmap and behaves unpredictably in production.

AI Access Risks Are Not Theoretical

The report points to another issue as well: AI creates new forms of exposure. AI-generated data and AI access risks are already emerging drivers of unstructured data growth. Organizations are dealing with:

More content created by AI systems
More systems touching enterprise data
More opportunities for prompts, integrations, model training workflows, or over-permissioned tooling to expose sensitive information

Organizations need clear policies for access, usage, and oversight governing how AI systems interact with sensitive unstructured data. This is necessary to reduce the chances of non-compliance, IP loss, or reputational damage as AI adoption expands.

Security teams have seen this pattern before. A new technology promises scale and efficiency. Organizations adopt it quickly. Governance arrives later, usually after a few painful lessons.

With AI and unstructured data, the cost of learning late can be high. The data involved often includes exactly the information organizations can least afford to expose.

What Security Leaders Should Do Next

Do not treat AI as a shortcut around foundational data security work. If your organization is investing in AI for unstructured data security, start by asking a few critical questions:

Do we have full data visibility into where sensitive unstructured data resides?

AI thrives on context. Without an accurate and comprehensive data inventory, even the most advanced models will produce incomplete or misleading outputs. Security teams should prioritize building a centralized view of unstructured data that covers all cloud apps, collaboration tools, file systems, and hybrid environments.

Knowing where your organization stores the data isn't enough. You also need to understand:

Who owns it
Who has access to it
How it is being used or shared

Without this baseline, AI-driven detection or classification becomes guesswork at scale.

Are our classification and labeling policies clearly defined and consistently applied?

Using AI for classification and labeling will only succeed if you clearly define policies upfront. AI can accelerate classification, but it cannot define what “sensitive” means for your organization.

Security leaders should:

Establish standardized classification schemas (e.g., public, internal, confidential, restricted)
Align those schemas with regulatory and business requirements
Define enforcement actions tied to each classification level

Once those guardrails are in place, AI can help apply them consistently across large volumes of unstructured data.
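
One way to picture those guardrails is a small lookup that ties each classification level to its enforcement actions. This is a sketch under stated assumptions: the four levels mirror the example schema above, while the specific actions (`encrypt`, `external_sharing`) and the `actions_for` helper are hypothetical placeholders for whatever your own policy defines.

```python
from enum import Enum


class Sensitivity(Enum):
    """Example schema levels: public, internal, confidential, restricted."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Hypothetical enforcement actions per level; real ones come from policy.
ENFORCEMENT = {
    Sensitivity.PUBLIC: {"encrypt": False, "external_sharing": True},
    Sensitivity.INTERNAL: {"encrypt": False, "external_sharing": False},
    Sensitivity.CONFIDENTIAL: {"encrypt": True, "external_sharing": False},
    Sensitivity.RESTRICTED: {"encrypt": True, "external_sharing": False},
}


def actions_for(label: str) -> dict:
    """Look up enforcement actions for a label.

    Unknown labels fall back to the strictest level, so a mislabeled
    or unlabeled file is never treated more leniently than intended.
    """
    try:
        level = Sensitivity[label.upper()]
    except KeyError:
        level = Sensitivity.RESTRICTED
    return ENFORCEMENT[level]


print(actions_for("confidential"))
print(actions_for("not-a-real-label"))
```

Falling back to the strictest level for unrecognized labels is a design choice, not a requirement; some teams may prefer to flag such files for human review instead.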

Do we understand which AI systems can access which data and under what controls?

AI is unquestionably expanding the attack surface. AI systems interact directly with unstructured data, whether through prompts, APIs, integrations, or training pipelines.

That means security teams must account for:

AI access risks, including over-permissioned tools or unintended data exposure
The creation of AI-generated data, which may introduce new governance and lifecycle challenges
The potential for sensitive data leakage through model interactions

Establishing clear policies for access, usage, and baseline protection is now a requirement for managing AI securely in data-rich environments.
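
To make "which AI systems can access which data" concrete, the sketch below models a deny-by-default check. Everything in it is an illustrative assumption: the system names, the `AI_ACCESS_POLICY` mapping, and the `may_access` helper are hypothetical, and a real deployment would enforce this in its identity and access tooling rather than in application code.

```python
# Sensitivity levels ordered from least to most sensitive (illustrative).
LEVELS = ["public", "internal", "confidential", "restricted"]

# Hypothetical policy: the highest level each AI system may read.
AI_ACCESS_POLICY = {
    "chat-assistant": "internal",
    "dlp-classifier": "restricted",
}


def may_access(system: str, data_label: str) -> bool:
    """Deny by default: unknown systems or unknown labels get no access."""
    ceiling = AI_ACCESS_POLICY.get(system)
    if ceiling is None or data_label not in LEVELS:
        return False
    return LEVELS.index(data_label) <= LEVELS.index(ceiling)


for system, label in [
    ("chat-assistant", "internal"),
    ("chat-assistant", "confidential"),
    ("unregistered-bot", "public"),
]:
    print(system, label, may_access(system, label))
```

The check only works if data is already labeled and AI systems are already inventoried, which is the same foundational dependency the earlier questions describe.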

Can we scan and assess unstructured data quickly enough to support meaningful detection and response?

Unstructured data environments are inherently distributed and constantly changing. Point solutions that address a single repository or use case will struggle to keep up. Instead, organizations should focus on scalable approaches that integrate visibility, classification, and monitoring across environments.

This may include:

Unified data security platforms with posture management for unstructured and structured data types
Automation workflows that integrate with existing tools
Continuous scanning and monitoring capabilities

The goal is to create an ecosystem where AI can operate effectively and reliably.

Conclusion

The questions above may sound foundational, but that’s exactly the point. CSA's new report makes it clear that many organizations are still operating with partial visibility, inconsistent classification, and limited scanning capabilities. Layering AI on top of those gaps often amplifies them instead of closing them.

Foundational readiness determines whether AI becomes a force multiplier for defenders or a magnifier of existing weaknesses. In the context of AI security risks, this means organizations need to shift their mindset. Instead of asking, “How can AI solve our data security challenges?”, the better question is, “Are we ready for AI to interact with our data securely?”

To understand how your organization compares, download The Rise in Unstructured Data and AI Security Risks. The survey report shows where the industry stands today and what it will take to secure unstructured data in an AI-driven future.

To quote Cybersecurity & Data Protection Leader Hans Vargas: "This survey report examines the growing security challenges associated with unstructured data in modern enterprises. It reveals that while this data—comprising documents, emails, and logs—is the primary driver of corporate data growth, organizations suffer from significant visibility and classification gaps. A notable disconnect exists between high executive confidence and the reality of fragmented tools, manual processes, and unprotected data sets. Furthermore, the report highlights Artificial Intelligence as both a sophisticated security threat and a vital tool for future defense. Ultimately, the authors argue that establishing strong foundational controls is essential for managing risk as data continues to disperse across cloud environments."

For an expert analysis of the report that touches on all the key findings, check out our joint webinar with Jon G Shende, Chief Technology Officer at Thales, and Hillary Baron, AVP of Research at CSA (now available on-demand).
