Data Flow Security: Mitigating the Risks of Continuous Data Movement in the Cloud
Published 05/09/2023
Originally published by Dig Security.
Written by Yotam Ben-Ezra.
Executive Summary
- Data movement is ubiquitous in cloud environments due to diffuse architectural patterns and broad organizational access to data.
- Uncontrolled data flows can create compliance issues and lead to poor visibility into potential breach incidents.
- To effectively monitor data movement, security teams need to know the baseline for data locations, track suspicious activity, and prioritize incidents related to sensitive or compliance-related data.
What is cloud data flow?
Cloud data flow or data movement refers to the transfer, replication, or ingestion of data between cloud data stores (managed databases, object storage, and virtual machines).
An example of how sensitive data travels through cloud environments
Let’s imagine how a single, PII-containing CRM record can travel – and then consider the security implications:
- The flow starts with a user signing up for a web app and submitting their PII to create an account.
- User data is immediately stored in the app’s DynamoDB backend.
- App data is dumped to AWS S3 every 24 hours.
- The customer’s data is then replicated into three different data platforms in order to support different analytical use cases - RevOps, data science, and financial reporting.
- Within one day, the same PII record now exists in at least five different locations, often after undergoing various transformations and enrichments.
This is a very simplified example which does not account for staging and testing environments, unmanaged databases, or shadow data. But you can still see how within 24 hours, the same record now exists in five different data stores, is processed by applications that are potentially running in different regions, and – depending on the architecture in place – might exist on more than one public cloud.
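To make one leg of this flow concrete, here is a minimal sketch of the nightly dump from DynamoDB to S3 described above. It assumes boto3 is available; the table and bucket names ("signups", "analytics-raw-dumps") are hypothetical placeholders, not part of the original example.

```python
import json
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")


def dump_table_to_s3(table_name: str, bucket: str) -> str:
    """Scan a DynamoDB table and write every item to S3 as a dated JSON dump."""
    table = dynamodb.Table(table_name)
    items = []
    response = table.scan()
    items.extend(response["Items"])
    # Keep scanning until the whole table has been paged through.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])

    key = f"{table_name}/{datetime.now(timezone.utc):%Y-%m-%d}.json"
    # default=str handles DynamoDB's Decimal values.
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(items, default=str))
    return key


if __name__ == "__main__":
    dump_table_to_s3("signups", "analytics-raw-dumps")
```

Every job like this is another place where the same PII record lands, which is exactly why the downstream copies need to be tracked.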
Security Concerns Related to Cloud Data Flows
1. Complying with data residency and sovereignty gets complicated
In the example above, the PII belonged to Mr. Heidelberg of Berlin, Germany. This data is likely subject to the GDPR. To comply with the data sovereignty requirements, the data must reside in servers located in the European Union (or in countries with similar data protection laws).
It’s quite likely that the cloud account spans multiple regions, including non-compliant ones. If the data is being moved or duplicated across different data stores and cloud regions, it can become difficult to track and ensure that the data is being stored in the right region.
To prevent this, the business needs a way to identify PII across its cloud environment(s), and to find connected records. For example, the CRM record might have been joined with additional data such as support tickets or marketing interactions as part of a ‘customer 360’ initiative – creating additional PII records that also need to be accounted for.
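As a rough illustration of what identifying PII across a cloud environment can look like in practice, here is a minimal sketch of a residency check built on boto3. It assumes PII-bearing buckets are marked with a hypothetical data-classification=pii tag and that only the listed EU regions are acceptable; a real deployment would rely on actual classification results rather than tags alone.

```python
import boto3
from botocore.exceptions import ClientError

# Regions considered acceptable for EU-resident PII (illustrative list).
ALLOWED_REGIONS = {"eu-central-1", "eu-west-1", "eu-west-3", "eu-north-1"}

s3 = boto3.client("s3")


def find_noncompliant_pii_buckets() -> list[str]:
    """Return buckets tagged as holding PII that live outside the allowed regions."""
    violations = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        # LocationConstraint is None for buckets in us-east-1.
        region = s3.get_bucket_location(Bucket=name)["LocationConstraint"] or "us-east-1"
        try:
            tags = s3.get_bucket_tagging(Bucket=name)["TagSet"]
        except ClientError:
            continue  # Bucket has no tags.
        holds_pii = any(
            t["Key"] == "data-classification" and t["Value"] == "pii" for t in tags
        )
        if holds_pii and region not in ALLOWED_REGIONS:
            violations.append(f"{name} ({region})")
    return violations


if __name__ == "__main__":
    for violation in find_noncompliant_pii_buckets():
        print("PII stored outside allowed regions:", violation)
```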
2. Sensitive data travels between environments instead of staying segregated
Sensitive data needs to be kept separately (segregated), and under stricter access control policies. This is a basic data security best practice but also a matter of compliance with various regulations – for example, ISO 27002:2022 requires organizations to separate development, test, and production environments.
However, it is very common for developers to download data onto their own machine or to move it between environments for testing purposes – mainly because it makes their life easier and allows them to run more realistic tests. But if sensitive data is being moved, it can put the business at risk. (See: Achieving Cloud Data Compliance)
The business needs to understand data flows between environments, identify when segregation requirements are being ignored, and find out if unauthorized principals are getting access to it along the way.
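One way to back the segregation requirement with a technical control is a deny policy on the production data store. The sketch below, assuming boto3, a hypothetical bucket name (prod-customer-data), and principals carrying an environment tag, denies bucket access to any principal not tagged as production. It is an illustration, not a drop-in policy.

```python
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "prod-customer-data"  # hypothetical production bucket

# Deny all S3 actions on the bucket to any principal not tagged environment=prod.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyNonProdPrincipals",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {
                "StringNotEquals": {"aws:PrincipalTag/environment": "prod"}
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(POLICY))
```

Note that a blanket deny like this also blocks principals that carry no environment tag at all, including service roles, so in practice you would scope the actions and carve out explicit exceptions.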
3. Security incidents can fly under the radar
When nonstop data movement is considered business-as-usual, it’s easy for an actual data breach to go unnoticed. An organization that works with multiple vendors might not bat an eyelid when an external account accesses a sensitive database – but this type of nonchalance can be exploited by bad actors. There are countless examples of how insecure access to resources has led to data breach incidents (here is an example from May 2023).
Mapping data flows helps to separate the wheat from the chaff and identify the flows that should trigger an immediate response from SOC teams. If they see data going to an external account, they need to know immediately whether this is in fact a vendor, and whether that vendor has the right permissions to access this dataset.
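A simple starting point for answering the "is this really a vendor?" question is to enumerate which external accounts are granted access in the first place. The sketch below, assuming boto3 and a hypothetical allowlist of vendor account IDs, flags S3 bucket policies that grant access to accounts outside that allowlist.

```python
import json
import re

import boto3
from botocore.exceptions import ClientError

KNOWN_VENDOR_ACCOUNTS = {"111122223333"}  # hypothetical vendor account IDs

s3 = boto3.client("s3")
sts = boto3.client("sts")

OWN_ACCOUNT = sts.get_caller_identity()["Account"]
ARN_ACCOUNT = re.compile(r"arn:aws:iam::(\d{12}):")


def external_grants() -> list[tuple[str, str]]:
    """Return (bucket, account) pairs where a policy allows an unreviewed account."""
    findings = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            policy = json.loads(s3.get_bucket_policy(Bucket=name)["Policy"])
        except ClientError:
            continue  # No bucket policy attached.
        for statement in policy.get("Statement", []):
            if statement.get("Effect") != "Allow":
                continue
            principals = json.dumps(statement.get("Principal", {}))
            for account in ARN_ACCOUNT.findall(principals):
                if account not in KNOWN_VENDOR_ACCOUNTS | {OWN_ACCOUNT}:
                    findings.append((name, account))
    return findings


if __name__ == "__main__":
    for bucket_name, account_id in external_grants():
        print(f"Bucket {bucket_name} grants access to unreviewed account {account_id}")
```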
Considerations for Monitoring Data Movement in Public Clouds
While data movement poses risks, it’s an inherent part of the way businesses leverage the public cloud. Similarly to dealing with shadow data, security policies and solutions need to balance between business agility and preventing data breach events. Below are key guidelines for achieving this:
Set a baseline: Effective monitoring requires that the security organization has a ‘baseline’ of where data is meant to reside (physically or logically), can identify deviations, and can prioritize the incidents that pose a risk – usually the ones that involve sensitive data flowing into unauthorized or unmonitored data stores.
Look at the past, present, and future of your data: In some cases, you’ll be able to prevent the policy violation before it happens; in others, your monitoring should lead to further analysis and forensic investigation. Monitoring should encompass:
- The past – where did the data get created? Where does it belong? Are there additional versions of this data?
- The present – where is the data right now? How did it get there? Does it belong in this data store, or should it be elsewhere? Is it moving where it shouldn't?
- The future – is there an opportunity for the data to move where it shouldn't? Can we prevent wrong data movement if it happens?
Data context and content matter: Not all data is created equal. Classifying your data and creating an inventory of sensitive assets should help you prioritize risks. For each data asset, you should be able to answer the following (see the inventory sketch after this list):
- Does this data have locality or sovereignty characteristics (e.g., data related to EU or California residents)?
- Does your data have other compliance-related characteristics (e.g., electronic health records)?
- Are the same sensitive records stored in multiple data stores?
- Which security policy does the data fall under?
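A minimal sketch of what an entry in such an inventory might capture is shown below; the field names and example values are hypothetical and not tied to any specific product’s schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str                    # e.g. the "crm_customers" table
    store: str                   # data store currently holding the asset
    region: str                  # where the asset physically resides
    classifications: list[str]   # e.g. ["pii"], ["phi"], ["pci"]
    residency_scope: str | None  # e.g. "EU", "California", or None
    duplicated_in: list[str] = field(default_factory=list)  # other stores with the same records
    policy: str = "default"      # security policy the asset falls under


inventory = [
    DataAsset(
        name="crm_customers",
        store="dynamodb:signups",
        region="eu-central-1",
        classifications=["pii"],
        residency_scope="EU",
        duplicated_in=["s3://analytics-raw-dumps", "warehouse:revops.customers"],
        policy="gdpr-pii",
    ),
]

# Assets with residency or compliance constraints are the ones to prioritize.
high_priority = [a for a in inventory if a.residency_scope or a.classifications]
```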
Make data movement transparent for security teams: Once you’ve identified and prioritized your sensitive data assets, you want to provide visibility into how this data moves between regions, between environments, and to external consumers such as vendors or service providers.
Develop the ability to identify when an incident has occurred: Your monitoring solution should not stop at reducing static risk to data. Monitoring data flows in real time should alert you to high-risk incidents (see the detection sketch after this list) such as:
- Sensitive data being moved between development and production environments
- A configuration change which creates an opportunity for unauthorized access to the data
- Data being moved into non-compliant storage locations (e.g., EU resident data moved out of the EU)
- Unusually large volumes of company data being downloaded or copied locally
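As an illustration of one item in the list (configuration changes that open the door to unauthorized access or risky data movement), the sketch below polls recent CloudTrail management events for a few watched event names. It assumes boto3; the event names and the one-hour window are illustrative choices, not a prescribed rule set.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Management events that often precede risky data movement (illustrative list).
WATCHED_EVENTS = ["PutBucketPolicy", "PutBucketAcl", "PutBucketReplication"]

cloudtrail = boto3.client("cloudtrail")


def recent_risky_changes(hours: int = 1) -> list[dict]:
    """Return watched CloudTrail events recorded in the last N hours."""
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    findings = []
    for event_name in WATCHED_EVENTS:
        events = cloudtrail.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": event_name}],
            StartTime=start,
        )["Events"]
        findings.extend(events)
    return findings


if __name__ == "__main__":
    for event in recent_risky_changes():
        print(event["EventName"], event.get("Username", "unknown"), event["EventTime"])
```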
Steps to Reduce Data Flow Risk
Of course, monitoring is just the first step. Once you have a clear picture of your cloud data flows, you can design an effective data security strategy that protects data at rest and in motion. Here’s what you can do to minimize the risk of costly and damaging data leaks:
- Reduce the attack surface: Remove duplicate records and review data retention policies to prevent your organization’s data footprint from growing unnecessarily large.
- Enforce compliance: Use automated controls to monitor and identify data movement that causes non-compliance with GDPR, PCI DSS, or any other regulation.
- Notify owners: Whenever an incident is detected, have an automated process in place to alert the data owners (and other stakeholders if needed); see the notification sketch after this list.
- Trigger workflows to fix compliance and security issues: Automate processes to address policy violations as soon as they occur.
- Enforce data hygiene: Periodically review your data assets and associated permissions to prevent data sprawl and shadow data.
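To show how the notification step might be wired up, here is a minimal sketch that publishes a detected violation to an SNS topic so the data owner is alerted. It assumes boto3; the topic ARN, message format, and violation structure are hypothetical.

```python
import json

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:eu-central-1:123456789012:data-flow-alerts"  # hypothetical topic


def notify_owner(violation: dict) -> None:
    """Publish a detected violation so the data owner (a topic subscriber) is alerted."""
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"Data flow violation: {violation['type']}",
        Message=json.dumps(violation),
    )


notify_owner({
    "type": "residency",
    "asset": "crm_customers",
    "detail": "PII dump found in a us-east-1 bucket",
    "owner": "revops-data-team",
})
```

The same publish call can also feed a remediation workflow (for example, a subscribed Lambda function) so that policy violations are addressed as soon as they occur.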