The Current State of Cloud Data Security

Published 11/02/2023

Originally published by Dig Security.

Written by Sharon Farber.

Cloud computing has become a go-to solution for businesses worldwide. While cloud services offer benefits such as flexibility, scalability, and cost-effectiveness, they also bring challenges, especially when handling sensitive data. According to IBM, 45% of breaches involve the cloud, so organizations need to be extremely careful about where their sensitive data is stored, who can access it, and where it flows. Failure to manage this data securely can lead to severe consequences such as reputational damage, direct costs, and legal implications.

Dig Security has comprehensively studied what sensitive data organizations actually store in the cloud. This eye-opening study highlights the challenges of managing cloud data and how to take control and reduce the attack surface.

More than 30% of cloud data assets contain sensitive information

Investigating Data Use Patterns

The study started with a very large sample set to gather the most relevant information possible. Our data security research team scoured billions of files and collected petabytes of data stored in the cloud. With this volume of information, we could build a clear picture of utilization patterns across various cloud services. From how different varieties of data were stored to the potential risks created by that utilization, we left no stone unturned in our pursuit of understanding cloud data management.

Knowing Your Data

With the ease of development and rapid prototyping that comes with the cloud, it is easy for organizations to have sensitive data make its way into cloud storage without considering the security implications. Whether it was intentionally planned for storage or used as samples for testing, uncontrolled data creates risks and opens up the potential for exposure. It is necessary to have a plan in place to secure sensitive data stored in the cloud to mitigate these risks.

Sensitive data that is uncontrolled and unsecured in the cloud can be a costly mistake for organizations. This is especially true for dev environments, which are often less secure. In our study, we found that over 25% of all sensitive data categories existed in development environments. LastPass discovered the dangers of this when its development environment was breached, leading to a loss of source code and proprietary information and significantly damaging its reputation.

25% of all sensitive data categories existed in development environments

Knowing what data exists in all of your cloud environments and understanding its composition on a granular level is challenging but necessary for mitigating this risk. Databases may hold various data types, and unstructured data stores, such as S3 buckets, often contain files with more than one data type. Knowing this information helps to determine whether data is sensitive and governed by internal policies, compliance mandates, or legal requirements.
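To make the idea of granular composition concrete, here is a minimal Python sketch that counts how many data types appear in a single piece of text, such as the contents of one file. The patterns and the function name `detect_data_types` are illustrative assumptions, not the method used in the study; production classifiers rely on validation and context rather than bare regular expressions.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real classification needs checksums, context, and far broader coverage.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def detect_data_types(text: str) -> dict[str, int]:
    """Return the number of matches per data type found in the text."""
    counts = {}
    for name, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            counts[name] = len(hits)
    return counts


if __name__ == "__main__":
    sample = "Contact jane@example.com, SSN 123-45-6789"
    print(detect_data_types(sample))
```

A file that returns more than one key here is an example of the mixed-content case described above, where a single object needs to be treated according to its most sensitive data type.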

Sensitive Vs. Non-Sensitive

Basic data classification can be broken down into two distinct categories of non-sensitive and sensitive data. Non-sensitive data can be publicly available or harmless if accessed by unauthorized individuals. On the other hand, sensitive data includes information that should remain private, such as personally identifiable, financial, and medical information.

Understanding Classification

Sensitive data classification is more nuanced than a simple split between sensitive and non-sensitive, as sensitive data has more specific groupings. These are often based on internal governance, legal, or regulatory mandates. For example, healthcare information may be protected by HIPAA, but if it belongs to an individual in the EU, it is also covered under GDPR. Making a proper classification requires understanding the context of the data as well as its composition.

Our research discovered over 5 billion occurrences of PII

Context also becomes more critical as the data needs to be assessed in relation to other information existing in its proximity. Some data fields that are low-risk on their own may become far more sensitive when combined with other data types, such as name and address. This combination does not necessarily need to be in the same database table but could reside in related tables or files where the data can be referenced. A similar risk can arise if the information is distributed across dev and prod environments or different services.
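The combination effect can be sketched in a few lines of Python. The field names, the two field groups, and the three risk labels below are hypothetical choices made for illustration, and the sketch assumes the listed sources can be joined or cross-referenced; it is not the scoring model used in the research.

```python
# Hypothetical field groups, chosen only to illustrate the idea.
IDENTIFIERS = {"name", "address", "email"}
LINKABLE = {"date_of_birth", "diagnosis", "account_number"}


def combined_risk(sources: dict[str, set[str]]) -> str:
    """Rate the fields of related sources as if they were one data set.

    `sources` maps a location (table, file, environment) to the field
    names found there. The locations are assumed to be linkable.
    """
    all_fields = set().union(*sources.values())
    has_identifier = bool(all_fields & IDENTIFIERS)
    has_linkable = bool(all_fields & LINKABLE)
    if has_identifier and has_linkable:
        return "high"
    if has_identifier or has_linkable:
        return "moderate"
    return "low"


if __name__ == "__main__":
    related = {
        "dev.users": {"name", "address"},
        "prod.visits": {"diagnosis", "visit_id"},
    }
    print(combined_risk(related))
```

Evaluating the union of fields rather than each location separately is the point of the sketch: neither source alone pairs an identifier with a linkable field, but together they do.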

Is My Data At Risk?

Determining if your sensitive data is at risk can be challenging, as it's not just about what data is being stored but also the controls put in place to secure it. Even if your data is stored in a secure location, it may still be at risk if proper controls are not implemented. Deep visibility into the cloud ecosystem is crucial for making this determination.

Organizations need to be able to see which services are open to the public, have logging disabled, lack delete protection, or hold data that is not encrypted at rest and in transit. Many services run without basic data protections, such as enabling encryption and blocking public access, which creates unnecessary attack surface. This places sensitive data stored in these services at high risk of being accessed by unauthorized users.
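The checks listed above can be expressed as a simple audit over an inventory of data stores. The following Python sketch assumes the settings have already been collected into a hypothetical `DataStore` record; the class, its field names, and `find_gaps` are illustrative and do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass
class DataStore:
    """Hypothetical snapshot of one data store's security settings."""
    name: str
    publicly_accessible: bool
    logging_enabled: bool
    delete_protection: bool
    encrypted_at_rest: bool
    encrypted_in_transit: bool


def find_gaps(store: DataStore) -> list[str]:
    """Return the missing controls for a store, in a fixed order."""
    gaps = []
    if store.publicly_accessible:
        gaps.append("open to the public")
    if not store.logging_enabled:
        gaps.append("logging disabled")
    if not store.delete_protection:
        gaps.append("delete protection not enabled")
    if not store.encrypted_at_rest:
        gaps.append("not encrypted at rest")
    if not store.encrypted_in_transit:
        gaps.append("not encrypted in transit")
    return gaps


if __name__ == "__main__":
    db = DataStore("customers-db", False, True, False, False, True)
    print(db.name, find_gaps(db))
```

In practice, the gaps would be weighed against the sensitivity of the data each store holds, so that an unencrypted store containing PII is prioritized over one holding only non-sensitive data.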

In databases, 91% of sensitive data is not encrypted.

Without adequate visibility, the organization remains blind to these risks and cannot determine appropriate controls for the data. Protecting sensitive data requires a proactive approach, and organizations must continuously monitor and update their controls to stay ahead of emerging threats and provide defenses appropriate to the data's sensitivity level.

Enhancing cloud security strategy