What Is Dark Data and Why Must You Find It?

Published 03/11/2022

This blog was originally published by BigID.

Written by Kimberly Steele, BigID.

In the most straightforward terms, dark data is data that organizations don’t know they have. It is part of the massive, complex, sprawling world of Big Data — and the biggest part, at that.

Think about all the data that organizations collect and process for a specific purpose. If they’re actively analyzing it, chances are they know about it. But then there’s the rest of the data that organizations collect and store: the data that doesn’t get used, processed, or analyzed; the data that lurks in the shadows and hides below the surface, accumulating risk and sitting on missed business opportunities; the unorganized, untapped, unprotected, and unknown data that organizations inevitably have but don’t know about.

That’s dark data. And there’s a lot of it — likely more than half of your organization’s total data, right now.

Dark Data Challenges

Dark data often gets captured right alongside purpose-driven data — and therefore regularly contains sensitive, personal, regulated, vulnerable, or high-risk information that must be kept out of the wrong hands. The fact that this data remains unanalyzed creates both active and passive problems for companies — problems that can lead to substantial costs.

Actively, dark data increases security risk merely by existing in a company’s system, unnoticed, without having the proper safeguards around it — sometimes for a very long time. Since the data is unknown, it also goes without the necessary regulatory processes a company would normally put in place for compliance. And since unknown data is essentially ignored, malicious attackers consider it ripe for the picking.

Passively, untapped data may contain valuable information that companies could leverage for insight if they only knew that it existed, what it contained, and how to locate and use it. Businesses might spend millions collecting or analyzing new data to derive insights they could have drawn from relevant information they already hold, and could uncover with the right technology.

Types of Dark Data

Data that organizations hold breaks down into three categories:

critical business data, the highly valuable information that is relevant to a business’s continuous growth and the meeting of goals
redundant, obsolete, and trivial (ROT) data hiding in internal networks that, once discovered, can be marked for deletion or moved into remediation workflows
dark data that companies don’t know they have, don’t use — and that poses constant risk
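
As a rough illustration of the three categories above, here is a minimal Python sketch. The `DataAsset` fields, the one-year staleness threshold, and the rules themselves are assumptions made for this example, not a standard definition or any vendor's method: an asset that was never inventoried is treated as dark, an inventoried asset that is a duplicate or has gone unused for a long time is treated as ROT, and everything else is treated as critical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataAsset:
    name: str
    in_catalog: bool    # hypothetical flag: has the asset been inventoried?
    is_duplicate: bool  # hypothetical flag: is it a redundant copy?
    last_used: date     # hypothetical field: last time it was read or analyzed


def categorize(asset: DataAsset, today: date, stale_after_days: int = 365) -> str:
    """Assign one of the article's three categories using illustrative rules."""
    if not asset.in_catalog:
        # Nobody knows the asset exists, so it counts as dark data.
        return "dark"
    if asset.is_duplicate or (today - asset.last_used) > timedelta(days=stale_after_days):
        # Known but redundant or long unused: a candidate for deletion or remediation.
        return "rot"
    return "critical"


today = date(2022, 3, 11)
assets = [
    DataAsset("orders_db", True, False, date(2022, 3, 1)),
    DataAsset("old_export.zip", True, True, date(2019, 6, 1)),
    DataAsset("ex_employee_notes", False, False, date(2018, 1, 1)),
]
labels = {a.name: categorize(a, today) for a in assets}
print(labels)
```

In practice the hard part is the `in_catalog` flag: by definition, dark data is not in any inventory until a discovery process finds it.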

Unknown data can be anywhere. Unstructured data makes up the lion’s share of dark data, but dark data can reside in any type of data source, on-prem or in the cloud.

Untapped data may consist of forgotten data, metadata, expired time-sensitive data that is no longer relevant, and more. Some common examples include:

emails and email attachments
zip files that are downloaded and then forgotten
former employee data, including project files and notes
presentations and spreadsheets
geolocation data
log files and account information
transaction histories
customer call logs and records
audio, video, image, and text files
financial statements

Where Is Dark Data Generated?

Gartner calls dark data “the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.”

Therefore, unused data is often collected right along with data that gets utilized and processed. Any data, anywhere — stored across any type of data source, on-prem or in the cloud — can be dark. Of the average organization’s data, 15% is critical business data, 33% is ROT data, and 52% is dark — and dark data by its very hidden nature is vulnerable and subject to constant risk.
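
To make those average shares concrete, the short Python sketch below applies them to a total data volume. The 200 TB figure and the function name are hypothetical, chosen only for illustration; the shares are the averages quoted above and will differ for any real organization.

```python
# Average shares quoted in the article; an individual organization will differ.
SHARES = {"critical": 0.15, "rot": 0.33, "dark": 0.52}


def estimate_breakdown(total_tb: float) -> dict[str, float]:
    """Split a total data volume (in TB) according to the average shares."""
    return {category: round(total_tb * share, 2) for category, share in SHARES.items()}


# Hypothetical organization holding 200 TB of data in total.
breakdown = estimate_breakdown(200.0)
print(breakdown)
```

Under these assumptions, more than half of the storage footprint would be data the organization has never examined.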

How Should You Handle Dark Data?

Finding and classifying unknown data is critical for organizations' privacy, security, and compliance initiatives.

If you don’t know your data is there, you can’t ensure that it meets compliance — and you can’t meet data privacy standards if you can’t associate your data with an identity. Additionally, you can’t protect what you don’t know you have — or know what level of protection it needs. Therefore, unknown data carries unknown levels of risk, but is often more vulnerable to data breaches and data leaks — which is pretty scary news, considering it very likely contains personal and sensitive information.

For many businesses, beginning to capture untapped data may seem overwhelming, but the process of finding, classifying, analyzing, and unlocking value from it is just a matter of implementing the right discovery solution. Companies need ML-driven technology with a deep discovery foundation that can find data across all systems and sources — everywhere in an organization, no matter where it’s hiding.
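
To give a feel for what the simplest form of discovery looks like, here is a toy Python sketch that walks a directory and flags files containing email-like or US SSN-like strings. It is an assumption-laden illustration, not the ML-driven approach described above and not any product's implementation: the two regular expressions, the function names, and the choice to read every file as text are all hypothetical simplifications.

```python
import re
from pathlib import Path

# Deliberately simple, hypothetical patterns; real tools use far richer classifiers.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> dict[str, int]:
    """Count matches per pattern, keeping only patterns that matched."""
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    return {name: n for name, n in counts.items() if n > 0}


def scan_directory(root: str) -> dict[str, dict[str, int]]:
    """Scan every readable file under root and return hits keyed by file path."""
    findings: dict[str, dict[str, int]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are ignored so binary files do not raise.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # Skip files that cannot be opened.
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `scan_directory` on a shared drive or an export folder would return a map from file path to the kinds of sensitive-looking strings found there. A sketch like this misses most real-world cases (databases, archives, images, names without a fixed format), which is why the article points to purpose-built discovery technology.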

Dark Data Analytics

Dark data analytics refers to the technology solutions that companies use to locate unknown data so that its value can be unlocked to inform better business decisions.

Companies that prioritize mining dark data are well positioned to reduce risk and unlock valuable business insights that can help their organization grow and thrive. Moving previously untapped data to a data analytics platform provides a broader and far more accurate view of customer data across an entire enterprise.
