
5 Essential Components of a Cloud DLP Solution

Published 05/23/2023

Originally published by Dig Security.

Written by Yotam Ben-Ezra.

The DLP landscape has taken a long time to catch up with the realities of the public cloud. Below we’ll explain why we think DLP tooling developed in the on-premises era is no longer fit for purpose. We’ll then suggest an alternative framework for designing cloud DLP, based on five core components.

A Quick Definition of DLP

Data loss prevention (DLP) is a security strategy and set of associated tools used to protect organizations from data breaches and other threats to sensitive data. DLP is a crucial part of the cybersecurity landscape because most organizations store data that must remain private – to comply with regulations, protect customers' data privacy, or prevent trade secrets from leaking.

How It Used to Work

Traditional approaches to DLP were developed when organizations stored data on their own physical server infrastructure, and data movement was restricted to internal networks. This was the perimeter that needed securing – and DLP tools would detect sensitive data and block attempts to exfiltrate it. The simplest way to do so was through monitoring the network and using agent-based solutions: software that was installed on servers and endpoints, and which continuously monitored data and user activity.

An agent has the advantage of 'seeing' everything:

  • Misconfigurations, such as unused open ports
  • Policy violations such as unencrypted data
  • Suspicious endpoint activity, such as a thumb drive inserted into a laptop connected to the VPN

Once installed, DLP tools could scan data records at rest in the company's databases in order to detect and classify sensitive data. They could also monitor data in motion as it traveled through the corporate network, and every known endpoint – so that if a breach or leak was happening, the DLP tool could identify it in real time, and alert security teams who would take steps to remediate.

How the Cloud Changed Everything

As in many areas of software development, the twin forces of cloud adoption and digital transformation have shaken up DLP and created a need for new types of solutions. There are four main ways the cloud challenges traditional approaches to DLP:

1. There’s much more data to secure. Organizations want to collect, retain, and process more data than ever before, and the cloud’s elasticity and ease of use enable them to do so with minimal IT overhead. Competitive pressures have created a sense of urgency to accelerate data innovation, which leads to a business environment that’s very supportive of new data initiatives – and these come with additional storage, analytics, and reporting requirements.

2. Cloud environments are complex and in constant flux. Rather than having the corporate network as a single perimeter to secure, and the enterprise data warehouse as the main destination for analytical processing, data is now spread across a multitude of private and public cloud services.

These services can be spun up and down as needed, and new ones can be added at any time. Data itself is constantly moving between data stores, and it’s all but impossible to predict the flow of data in advance.

3. Cloud environments are incompatible with agent-based solutions. The cloud abstracts infrastructure behind interfaces (PaaS, DBaaS) or declarative scripting (IaC). In most of these situations, the business doesn’t have access to the physical hardware – meaning it can no longer install its own software on the machines that store and process data. Tooling available from the cloud providers can only provide a partial picture – particularly in multi-cloud deployments.

Even in cases where it is still technically possible to use agent-based solutions (such as IaaS), the pace at which new servers and clusters are added – which can happen automatically, e.g. through auto-scaling – makes this type of monitoring unmanageable due to the quantity and volatility of assets.

4. Security teams are overwhelmed. They must keep track of a plethora of services, configurations, and data flows, and struggle to maintain a holistic view of the cloud environment – which is incredibly complex to begin with, and mostly the purview of cloud experts who are always in high demand. With their resources already stretched thin, security teams struggle to stay on top of every alert and notification.

The end result is that the cloud has taken DLP multiple steps backward. From a mature ecosystem of tools that could provide end-to-end data security, businesses are now left with a patchwork of tools, APIs, and policies to manage. Each of these covers only a small aspect of overall data security – e.g., data classification on Amazon S3, data protection in Snowflake, or Purview on Azure – and security teams are left with the heavy lift of integrating them into a cohesive, consistent security strategy.

Figure: The old perimeter – the internal network that DLP has historically covered

Components of a DLP Solution for the Public Cloud

Needless to say, organizations aren’t going to give up on data security and accept the massive financial and reputational risks of a data breach. Accordingly, a new type of cloud data security solution has emerged – one that’s built to address the unique characteristics of the public cloud, and which does not suffer from the same limitations as its legacy counterparts. This means a different design from the ground up, built around five core components.

Figure: The new perimeter – public clouds that need to be covered by cloud DLP

1. Agentless data discovery in fractured, complex environments

Any cloud DLP solution needs to address the reality of modern cloud deployments, which are no longer built around a monolithic data platform such as an Oracle data warehouse. Instead, companies rely on a diverse combination of best-of-breed tools to satisfy the analytical requirements of different teams, and to shorten time to value from data initiatives. Across teams and business units, an enterprise might be managing dozens of data services and thousands of data assets. And as data teams adopt principles from microservices-based development, we are likely to see an even more fractured data stack in the future (and a higher potential for shadow data).

A cloud DLP tool needs to automate the legwork involved in discovering sensitive data in managed and unmanaged databases, as well as in object storage such as Amazon S3. Since agent-based solutions are not fit for purpose, this discovery will have to be agentless.

Instead of installing software on the machines that hold the data, a modern cloud DLP tool would use APIs, log analysis, or other means to retrieve a representative sample of the data, scan it for sensitive records, and perform further analytical operations, without disrupting production. For security reasons, this must all be done without moving data to a cloud account that’s external to the organization.
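As a rough illustration, here’s a minimal Python sketch of agentless discovery over Amazon S3 using the AWS API (boto3): it reads a small byte-range sample of each object in place and flags likely-sensitive records. The patterns, sample sizes, and scope are assumptions for illustration only – a real scanner would also cover managed and unmanaged databases, sample far more intelligently, and keep all processing inside the organization’s own account.

```python
# Minimal agentless-discovery sketch (illustrative, not production code):
# sample S3 objects in place via the AWS API and flag likely-sensitive
# records, without copying data out of the account.
import re

import boto3

# Toy patterns for demonstration; real classifiers use far richer rules.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

s3 = boto3.client("s3")

def sample_bucket(bucket: str, max_objects: int = 10, max_bytes: int = 65536):
    """Scan a small, representative sample of each object without moving it."""
    findings = []
    resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=max_objects)
    for obj in resp.get("Contents", []):
        # Fetch only the first few KB of each object as the sample.
        body = s3.get_object(
            Bucket=bucket, Key=obj["Key"], Range=f"bytes=0-{max_bytes - 1}"
        )["Body"].read().decode("utf-8", errors="ignore")
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(body):
                findings.append((bucket, obj["Key"], label))
    return findings

for b in s3.list_buckets()["Buckets"]:
    for finding in sample_bucket(b["Name"]):
        print("sensitive data found:", finding)
```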

2. Data classification and inventory

Once the data is discovered, it needs to be classified according to the organization’s own data security policies. This could include data that comes with specific regulatory requirements such as PII, PCI, or PHI, as well as custom sensitive fields such as customer IDs or product codes.
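To make that concrete, here’s a hedged sketch of a rule-based classifier in Python. The rule set mirrors the categories above (PII, PCI, PHI, plus a custom customer-ID field); the specific patterns and ID formats are invented for illustration, and production classifiers typically layer validation logic (e.g., Luhn checks for card numbers) and ML-based matching on top of plain patterns.

```python
# Sketch of a rule-based classifier mapping detected fields to policy
# classes. Patterns and formats below are assumptions for illustration.
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    policy_class: str  # e.g. "PII", "PCI", "PHI", "CUSTOM"
    pattern: re.Pattern

RULES = [
    Rule("email", "PII", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")),
    Rule("card_number", "PCI", re.compile(r"\b(?:\d[ -]?){13,16}\b")),
    Rule("medical_record", "PHI", re.compile(r"\bMRN-\d{6,}\b")),  # assumed format
    Rule("customer_id", "CUSTOM", re.compile(r"\bCUST-\d{8}\b")),  # assumed format
]

def classify(sample: str) -> set[str]:
    """Return the set of policy classes detected in a data sample."""
    return {r.policy_class for r in RULES if r.pattern.search(sample)}

print(classify("contact: jane@example.com, account CUST-00417311"))
# -> {'PII', 'CUSTOM'}
```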

At the end of the classification process, the security team should have a clear inventory of all the sensitive data residing in its cloud account, including shadow data on cloud object storage or unmanaged data stores, and the ability to prioritize risks and policy violations based on the contents and context of the data.

3. Data-aware posture and static risk analysis

A cloud DLP tool needs to continuously monitor the cloud account for changes in data flows, misconfigurations, and new services that are added to the environment. This includes a “posture” analysis of the account – a real-time check of whether the cloud account is set up according to industry and domain-specific best practices, such as encryption, access control, or well-defined retention periods.

Taking data context and classification into account allows security teams to focus their posture hardening efforts on sensitive data assets, rather than attempt to chase misconfigurations across the entire cloud account.
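As one concrete example of data-aware posture checking, the sketch below uses boto3 to test two common S3 misconfigurations – missing default encryption and incomplete public-access blocking – but only on buckets that the classification step marked as sensitive. The SENSITIVE_BUCKETS set and the bucket names are hypothetical, and equivalent checks would be needed for every other service and cloud in the environment.

```python
# Data-aware posture sketch: check only the buckets classified as
# sensitive, instead of sweeping the entire cloud account.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Hypothetical output of the classification step.
SENSITIVE_BUCKETS = {"customer-exports", "billing-raw"}

def check_posture(bucket: str) -> list[str]:
    issues = []
    try:
        s3.get_bucket_encryption(Bucket=bucket)
    except ClientError:
        issues.append("no default encryption configuration found")
    try:
        cfg = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
        if not all(cfg.values()):
            issues.append("public access is not fully blocked")
    except ClientError:
        issues.append("no public access block configured")
    return issues

for bucket in SENSITIVE_BUCKETS:
    for issue in check_posture(bucket):
        print(f"{bucket}: {issue}")
```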

4. Agentless dynamic monitoring and detection

The previous components all fall broadly under the umbrella of DSPM (data security posture management). They help organizations understand their data environments and establish a realistic data security strategy. However, there is still one major gap compared to the previous generation of DLP solutions – the ability to detect and respond to critical incidents in real time, such as a high-risk policy violation or an actual data breach.

To provide a solution for real-time monitoring, cloud DLP tools need to include data detection and response (DDR) capabilities. Similar to agent-based tools, they should be able to identify records being exfiltrated, as well as suspicious user activity such as a sudden spike in API calls or a user logging in from a new location. By applying real-time predictive analytics to the logs generated by the cloud providers themselves, cloud DLP is able to provide a good level of real-time protection without the need to install agents.
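To illustrate what log-based detection might look like, here’s a minimal Python sketch that scans CloudTrail-style event records for the two signals mentioned above: an API-call spike per principal, and a source IP never seen before for that principal. The threshold and the in-memory baseline are stand-ins – real DDR builds statistical baselines over the providers’ native log streams (CloudTrail, Azure Monitor, GCP audit logs) and evaluates them continuously.

```python
# DDR sketch over CloudTrail-style event records (field names follow
# CloudTrail's JSON schema; thresholds and state storage are assumptions).
from collections import Counter

CALL_SPIKE_THRESHOLD = 500           # calls per time window (assumed)
known_ips: dict[str, set[str]] = {}  # principal ARN -> previously seen IPs

def detect(events: list[dict]) -> list[str]:
    alerts = []
    # Signal 1: a sudden spike in API calls by a single principal.
    calls = Counter(e["userIdentity"]["arn"] for e in events)
    for principal, count in calls.items():
        if count > CALL_SPIKE_THRESHOLD:
            alerts.append(f"API call spike: {principal} made {count} calls")
    # Signal 2: activity from a source IP not previously seen for a principal.
    for e in events:
        principal, ip = e["userIdentity"]["arn"], e.get("sourceIPAddress", "")
        if ip and ip not in known_ips.setdefault(principal, set()):
            alerts.append(f"new source IP {ip} for {principal}")
            known_ips[principal].add(ip)
    return alerts
```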

5. A unified and up-to-date threat model

Businesses want to continue moving fast and adopting new technologies, without having to update their security stack or implementation whenever a new data service is added. Cloud DLP should support this motion by providing a unified threat model that is instantly applied to any new component in the data stack – including in multi-cloud and hybrid cloud environments.

The threat model itself should be regularly updated with learnings from the latest data breach incidents, attack pathways, and vulnerability reports. Cloud DLP providers will need to supply not just the technological means, but also the domain expertise to fine-tune these threat models. The threat model must remain accurate and sensitive enough to surface important threats, without contributing to further notification overload.
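One way to picture such a unified threat model: provider-agnostic rules that a thin adapter per service translates into concrete checks. In the hypothetical Python sketch below, the ProviderAdapter interface, control names, and rule fields are all assumptions – the point is that onboarding a new data store only requires a new adapter, while the rules themselves stay unchanged across clouds.

```python
# Unified threat-model sketch: one rule set, applied to any data store
# through a per-provider adapter. All names here are illustrative.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Rule:
    id: str
    applies_to: set[str]  # policy classes, e.g. {"PII", "PHI"}
    requires: set[str]    # abstract controls, e.g. "encryption"

class ProviderAdapter(Protocol):
    """Translates one cloud service's settings into abstract control names."""
    def controls_for(self, store_id: str) -> set[str]: ...

UNIFIED_RULES = [
    Rule("sensitive-at-rest", {"PII", "PHI", "PCI"},
         {"encryption", "no_public_access"}),
]

def evaluate(store_id: str, classes: set[str], adapter: ProviderAdapter) -> list[str]:
    """Apply every matching rule to a store, regardless of which cloud hosts it."""
    present = adapter.controls_for(store_id)
    return [
        f"{store_id} violates {rule.id}: missing {rule.requires - present}"
        for rule in UNIFIED_RULES
        if rule.applies_to & classes and not rule.requires <= present
    ]
```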
