The Modern Data Stack Has Changed the Security Landscape
Published 04/05/2024
Written by Uday Srinivasan, CTO, Acante.
The way businesses analyze, transform and share data has changed radically over the past few years. We are in the post-Hadoop era, with the Apache Software Foundation retiring over 10 Hadoop-related projects in the last three years. The shift of enterprise data to the cloud, the need for rapid online analysis, a trend towards decoupling storage and compute, accelerating data growth and, not least, the needs of AI/ML applications have placed new demands on data management and analysis. Today’s modern data stack is largely built on top of public cloud infrastructure and offered as a service, assembled from a heterogeneous set of best-in-class components. Databricks, Snowflake, S3, a variety of vector databases, dbt, Kafka and Fivetran, to name a few, have become the new core technologies for analytics, model training, transformation and ingestion.
If we look back at the last decade of the security industry, most of our investments revolved around securing the cloud infrastructure itself. These solutions secure cloud instances, configurations, containers, application APIs and code. I’ve worked on many of these products over my career. What I saw was that the new data platforms completely abstract away the underlying cloud infrastructure. They have their own compute management layers, their own access controls, and new data sharing protocols and data movement mechanisms. To take a simple example, an API security solution has no idea which notebooks are reading sensitive data in Databricks, or leaking it to unauthorized users. It became clear to me that securing the underlying infrastructure addresses only part of the problem in the ultimate quest to secure the enterprise’s data.
When I looked at the emerging trends while starting Acante, engineering teams were already shifting their focus to building analytical applications, customer segmentation models and AI/ML workloads. Traditional cloud security capabilities are completely blind to these “new APIs” to data. Notebooks, dashboards, AI models, LLMs, pipelines and jobs are the new ways data is accessed, manipulated and transformed, which makes the data itself much more fluid. As a result, cloud infrastructure security solutions that lack the relevant data context (for example schemas, content type, sensitivity, lineage, leakage risk, metadata tags and privileges) cannot provide the appropriate level of data protection. In addition, a whole new class of attack vectors has opened up on the modern data stack. I concluded that securing the attack surface of these new data platforms is a major problem in its own right and requires a purpose-built solution.
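To make the point about data context concrete, here is a minimal Python sketch. It is purely illustrative and does not describe any particular product or platform API: the `DataContext` and `AccessEvent` types, the `assess` function, and the table and principal names are all hypothetical. It shows how the same read, arriving through a notebook or dashboard, can only be judged once the table's sensitivity and privileges are known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContext:
    """Hypothetical data-level context for one table."""
    table: str
    sensitivity: str                          # e.g. "public", "internal", "restricted"
    tags: frozenset = frozenset()             # metadata tags such as "pii"
    allowed_principals: frozenset = frozenset()


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical record of a read performed through a data platform."""
    principal: str   # user or service identity
    via: str         # "notebook", "dashboard", "job", ...
    table: str


def assess(event: AccessEvent, catalog: dict) -> str:
    """Classify a read using data context; without context no verdict is possible."""
    ctx = catalog.get(event.table)
    if ctx is None:
        return "unknown: no data context for this table"
    if ctx.sensitivity == "restricted" and event.principal not in ctx.allowed_principals:
        return "alert: unauthorized read of restricted data"
    return "ok"


# Illustrative catalog with a single sensitive table (assumed names).
catalog = {
    "sales.customers": DataContext(
        table="sales.customers",
        sensitivity="restricted",
        tags=frozenset({"pii"}),
        allowed_principals=frozenset({"analyst_a"}),
    )
}

events = [
    AccessEvent("analyst_a", "notebook", "sales.customers"),
    AccessEvent("intern_b", "notebook", "sales.customers"),
    AccessEvent("intern_b", "dashboard", "web.page_views"),
]

for e in events:
    print(e.principal, e.via, e.table, "->", assess(e, catalog))
```

The third event is the interesting one: with no context recorded for the table, the sketch can only report "unknown", which is roughly the position of a tool that sees the infrastructure but not the data.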
Check out part II of this blog series to learn more about how security and data teams can implement a data-centric approach to security.
Uday Srinivasan is the co-founder and CTO for Acante, the leader in data security and access governance for modern data stacks.