Implementing a Data-Centric Approach to Security

Published 04/19/2024

Written by Uday Srinivasan, CTO, Acante.

We previously discussed how the modern data stack has changed the threat landscape. In Part II, below, we outline how security and data teams can enable modern data teams to innovate rapidly without compromising the security and access governance of enterprise data.

The Need: A Data-centric Approach to Security

Historically, the term “data security” has been largely equated with encryption and similar control measures. However, with the proliferation of data access to human and service identities, existing data protection controls, while important, are not nearly enough and have become mostly a compliance checkbox. Most modern data breaches involve authorized users or applications that have been taken over by malicious actors.

The imperative now is to:

Get full visibility into the data context - all its security attributes and where it resides
Keep track of its fluidity - where it moves and how it is transformed
Express security controls such as threat detection, leakage and access controls in the language of the data and its attributes

We call this “data-centric security”. Let’s delve into what is missing from today’s security stack. We broadly categorize the gaps into three key aspects:

Security Observability for Data: As the adage goes, “you can’t improve what you don’t measure”. Every customer we speak with is struggling with the foundational observability questions: “What sensitive data do I have?”, “Where is it going and in what form?”, “Who (humans, machines or applications) is accessing it?” and “Which data is most at risk?” Capturing this security context for data is not only the bane of data teams responding to requests from their GRC counterparts; it also guides the focus of the security and access governance programs.

Modern data platforms make these questions significantly more complex. They hold petabytes of data – structured, semi-structured and unstructured – and serve massive query volumes from thousands of users. A whole new set of identities is involved: notebooks, dashboards, AI models, LLMs, analytical pipelines and jobs all access and transform the data. New data sharing mechanisms such as Delta Sharing (Databricks), Secure Data Sharing (Snowflake), cleanrooms and data marketplaces make it possible to share data with other teams, organizations or partners in just a few clicks. Forward-leaning customers we are working with have repeatedly expressed their concerns about this “ticking time bomb”: the unknown security risks that build up when these massive blind spots are left unaddressed.
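To make the observability questions above concrete, here is a minimal sketch in Python. It assumes two inputs that a real deployment would have to collect from its own platforms: a set of sensitivity tags per data asset and a stream of access events. The names `AccessEvent`, `SENSITIVITY`, `summarize` and `riskiest` are hypothetical and used only for illustration, and counting distinct accessors is a deliberately crude stand-in for a real risk score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    identity: str       # who touched the data
    identity_kind: str  # "human", "machine" or "application"
    asset: str          # which table or dataset was touched


# Hypothetical sensitivity tags, e.g. as supplied by a classifier or catalog.
SENSITIVITY = {
    "sales.customers": {"PII"},
    "finance.ledger": {"FINANCIAL"},
    "web.clickstream": set(),
}

# Hypothetical access events, e.g. as extracted from query logs.
EVENTS = [
    AccessEvent("alice", "human", "sales.customers"),
    AccessEvent("etl-job-7", "machine", "sales.customers"),
    AccessEvent("bi-dashboard", "application", "sales.customers"),
    AccessEvent("bob", "human", "finance.ledger"),
    AccessEvent("alice", "human", "web.clickstream"),
]


def summarize(events, sensitivity):
    """Answer 'who is accessing my sensitive data?' per asset."""
    accessors = defaultdict(set)
    for event in events:
        # Only assets carrying at least one sensitivity tag are tracked.
        if sensitivity.get(event.asset):
            accessors[event.asset].add((event.identity, event.identity_kind))
    return dict(accessors)


def riskiest(summary):
    """Rank sensitive assets by number of distinct accessors (a crude proxy)."""
    return sorted(summary, key=lambda asset: (-len(summary[asset]), asset))


if __name__ == "__main__":
    summary = summarize(EVENTS, SENSITIVITY)
    for asset in riskiest(summary):
        print(asset, sorted(summary[asset]))
```

Even this toy version shows why the problem is hard at scale: the same logic has to run continuously over every platform, every identity type and every copy of the data.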

As an analogy to recent developments in the industry, container security observability needs became critical with the platform shift to K8s and containers. A new category of Cloud Workload Protection solutions became a must-have for platform security teams to overcome the new blind spots in K8s environments.

Zero Standing Data Access Privileges: More than 10 years of effort to achieve Zero Trust privileges broadly across all resources has fallen far short of the goal. However, if we could achieve least-privilege access specifically to DATA assets, that alone could reduce the risk of data breaches by more than 10x. Gartner has been advocating this approach under the moniker of “Zero Standing Privileges”.

The challenge starts with the fact that most existing identity and privilege management solutions understand identities well but operate with little or none of the relevant data context described above. That context is distributed across multiple domains: IdPs, IAM systems, the data platforms’ native policy stores, metadata stores, data catalogs and, of course, the tribal knowledge of every data owner. As a result, data access governance teams struggle to answer the basic question, “Who has access to what data?” Mapping Identity → Data privileges has become a complex, multi-hop graph synthesis problem, and the resulting graph has to be continuously analyzed for undesired exposure. Unfortunately, in most organizations data entitlement reviews are manual, spreadsheet-driven checkbox exercises that don’t end up tightening the access privileges to data. According to one study, 99 percent of users, roles, services and resources were granted excessive privileges. Viewed through the lens of the “data attack surface”, that’s a scary reality but also an extremely high-value opportunity to reduce risk. Giving data access governance teams the capability to identify over-provisioned access privileges to data and remediate them, in a highly automated and actionable manner, promises to be a game changer in the fight against data breaches.
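The multi-hop Identity → Data mapping described above can be sketched as a simple graph traversal. The example below is an illustration under stated assumptions, not a description of any product: the edge list, the `data:` prefix convention and the `USED` map of privileges actually exercised are all hypothetical, and a real graph would be synthesized from IdPs, IAM systems and each platform's policy store.

```python
from collections import deque

# Hypothetical edges: identity -> group/role -> ... -> data asset.
EDGES = {
    "user:alice": ["group:analysts"],
    "user:bob": ["group:analysts", "role:finance_admin"],
    "svc:etl-job": ["role:pipeline"],
    "group:analysts": ["role:reader"],
    "role:reader": ["data:sales.orders"],
    "role:pipeline": ["data:sales.orders", "data:sales.customers"],
    "role:finance_admin": ["role:reader", "data:finance.ledger"],
}

# Hypothetical record of which data each identity actually accessed.
USED = {
    "user:bob": {"data:sales.orders"},
    "svc:etl-job": {"data:sales.orders", "data:sales.customers"},
}


def reachable_data(identity, edges):
    """Breadth-first search over membership and grant edges.

    Returns every data asset the identity can reach through any
    chain of groups and roles ("Who has access to what data?").
    """
    seen = {identity}
    queue = deque([identity])
    data = set()
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt.startswith("data:"):
                data.add(nxt)      # leaf: a data asset
            else:
                queue.append(nxt)  # intermediate group or role
    return data


def unused_privileges(identity, edges, used):
    """Granted-but-never-used access: candidates for removal."""
    return reachable_data(identity, edges) - used.get(identity, set())


if __name__ == "__main__":
    for who in ("user:alice", "user:bob", "svc:etl-job"):
        print(who, sorted(unused_privileges(who, EDGES, USED)))
```

Comparing granted access against observed usage is one simple way to surface over-provisioned privileges; a production system would also need to account for time windows, inherited grants and data that has been copied or shared elsewhere.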

Multi-Platform Data Access Controls: As enterprises mature, organically or through mergers and acquisitions, they come to own a vast number of platforms in their data stack. Snowflake, Databricks, Kafka, Fivetran, Pinecone, Redshift and others each have their own access control layer, policy language and schema. Learning each language is cumbersome for lean data governance teams. In addition, as data moves between these systems, keeping policies consistent for the same data becomes a challenge. High on the wish list for these customers is an out-of-band policy management engine that automatically orchestrates the life cycle of the policies in the underlying data platforms. In effect, they want a single policy language that can express dynamic policies based on the dynamic attributes of the Identity and the Data. In conjunction, approving these access grants requires all the relevant data context in order to make a risk-informed decision. It must also tie into the customer’s existing provisioning workflows in their tool of choice, whether Slack, Jira, ServiceNow or others.
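As a rough sketch of what a single, attribute-based policy layer might look like, the Python below evaluates one policy against identity and data attributes and then renders per-platform statements. Everything here is an assumption made for illustration: the `Policy` shape, the attribute names, the sample tables and the `RENDERERS` table are hypothetical, and the generated strings are only modeled on SQL `GRANT` statements. Each platform's own documentation should be consulted for its exact syntax.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    identity_attrs: dict  # attributes the identity must have
    data_tags: frozenset  # policy applies to assets with any of these tags
    privilege: str


def decide(policy, identity, asset):
    """True if the identity's attributes and the asset's tags match the policy."""
    attrs_ok = all(
        identity["attrs"].get(key) == value
        for key, value in policy.identity_attrs.items()
    )
    return attrs_ok and bool(policy.data_tags & asset["tags"])


# Illustrative renderers only; the statement shapes are assumptions.
RENDERERS = {
    "snowflake": lambda priv, table, who: f"GRANT {priv} ON TABLE {table} TO ROLE {who};",
    "databricks": lambda priv, table, who: f"GRANT {priv} ON TABLE {table} TO `{who}`;",
}


def compile_policy(policy, identities, assets):
    """Expand one abstract policy into platform-specific statements."""
    statements = []
    for asset in assets:
        for identity in identities:
            if decide(policy, identity, asset):
                render = RENDERERS[asset["platform"]]
                statements.append(
                    render(policy.privilege, asset["name"], identity["principal"])
                )
    return statements


IDENTITIES = [
    {"principal": "emea_analysts", "attrs": {"department": "analytics", "region": "EMEA"}},
    {"principal": "apac_analysts", "attrs": {"department": "analytics", "region": "APAC"}},
]
ASSETS = [
    {"name": "crm.public.customers", "platform": "snowflake", "tags": {"PII", "EMEA"}},
    {"name": "main.crm.customers_clean", "platform": "databricks", "tags": {"PII", "EMEA"}},
    {"name": "main.web.clicks", "platform": "databricks", "tags": set()},
]
POLICY = Policy(
    name="emea-pii-read",
    identity_attrs={"department": "analytics", "region": "EMEA"},
    data_tags=frozenset({"PII"}),
    privilege="SELECT",
)

if __name__ == "__main__":
    for statement in compile_policy(POLICY, IDENTITIES, ASSETS):
        print(statement)
```

The design point is that the policy is written once, in terms of identity and data attributes, and the same tagged data receives consistent treatment wherever it lives. When attributes change, recompiling the policy yields the updated set of grants.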

An Exciting Mission Ahead of Us

We are on the cusp of hockey-stick growth in the use of data to power AI/ML applications, LLMs, business intelligence and workflow automations that will affect every aspect of our organizations. Modern data teams need fast and friction-free access to the relevant data to ultimately create value for the business. Responsible, secure and compliant use of the data in these platforms is going to be critical to provide trust and remove obstacles to data democratization.

Check out part I of this blog series to learn how the modern data stack has changed the threat landscape today.

Uday Srinivasan is the co-founder and CTO for Acante, the leader in data security and access governance for modern data stacks.
