The modern data stack is largely built as a set of as-a-service platforms delivered on top of public cloud infrastructure. Databricks, S3, ADLS, a variety of vector databases, dbt, Kafka, Fivetran and others have become the new core technologies for analytics, model training, transformation and ingestion.
Unfortunately, traditional cloud security capabilities are blind to the fluid data and dynamic data access mechanisms in these platforms. They lack the relevant data context (for example, schemas, content type, sensitivity, lineage, leakage risk, tags and privileges) needed to provide the appropriate level of data security and governance. This lack of visibility and context leaves today’s data platform teams in a constant struggle with their security counterparts as they try to democratize access to data while still meeting security and compliance mandates.
Unity Catalog: The New Data Governance Portal from Databricks
Databricks has taken a huge step forward toward simplifying data management with the evolution of the Unity Catalog for unified governance for data. “With Unity Catalog, organizations can seamlessly govern their structured and unstructured data, machine learning models, notebooks, dashboards and files on any cloud or platform. Data scientists, analysts and engineers can use Unity Catalog to securely discover, access and collaborate on trusted data and AI assets, leveraging AI to boost productivity and unlock the full potential of the lakehouse architecture.”
Unity Catalog is clearly the go-forward path for all Databricks customers. Given that, Acante has built a seamless and deep integration with Unity Catalog.
Setting up the Acante Data Security Intelligence Platform in a Unity Catalog environment is extremely simple. The Acante deployment is distributed as a notebook, the Metadata Discovery notebook. It is supported by a second provisioning notebook or, alternatively, a Terraform module. The provisioning notebook automates the entire setup: it creates the necessary resources, service principals and configurations, and sets up the Metadata Discovery notebook as a job. Databricks customers can onboard multiple workspaces at once and discover all the catalogs automatically. The setup also enforces a tight security model that prevents customer data within Databricks from leaving their environment. By running just the provisioning notebook, the whole provisioning process can be completed in less than five minutes.
Acante Captures the Extensive Telemetry Exposed by Unity Catalog
Unity Catalog exposes an extensive set of telemetry, all of which is automatically captured by Acante. This includes platform configurations, catalog schemas, identities, access policies, data lineage, audit logs, metadata about all workloads, clean rooms, Delta Sharing shares and other security information. Collecting this telemetry is fairly involved: it is captured from multiple sources in Databricks and requires significant transformation to derive relevant security insights. Some examples of telemetry sources include:
- system.information_schema: This includes a variety of tables that carry details about the metastores, catalogs, schema details, masking and filter functions, access privileges, information about Shares and much more.
- Control Plane APIs: There are endpoints available to gather information such as identities and groups, external locations, view definitions and details about all the workloads such as notebooks, jobs, dashboards and pipelines, including their sharing and permissions.
- system.access.audit: This provides granular query-level audit logs with multiple options for logging verbosity.
- system.access.column_lineage and system.access.table_lineage: Unity Catalog natively provides data lineage information down to the column level, along with the source that transformed the data.
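To make the lineage source more concrete, here is a minimal Python sketch of how column-level lineage records can be stitched into a graph and walked to find every column that a sensitive source column flows into. It is an illustration only, not Acante's implementation. The sample rows are invented, and the field names (source_table_full_name, source_column_name, target_table_full_name, target_column_name) are assumed to mirror columns of system.access.column_lineage; check them against your own workspace before relying on them.

```python
from collections import defaultdict, deque

# Invented sample rows; field names are assumed to mirror a subset of the
# columns in system.access.column_lineage.
LINEAGE_ROWS = [
    {"source_table_full_name": "main.raw.customers", "source_column_name": "email",
     "target_table_full_name": "main.silver.customers", "target_column_name": "email"},
    {"source_table_full_name": "main.silver.customers", "source_column_name": "email",
     "target_table_full_name": "main.gold.marketing_list", "target_column_name": "contact_email"},
    {"source_table_full_name": "main.silver.customers", "source_column_name": "email",
     "target_table_full_name": "main.export.partner_feed", "target_column_name": "email"},
    {"source_table_full_name": "main.raw.customers", "source_column_name": "id",
     "target_table_full_name": "main.silver.customers", "target_column_name": "customer_id"},
]


def downstream_columns(rows, table, column):
    """Return every (table, column) reachable from the given source column."""
    # Build an adjacency map: source column -> set of target columns.
    edges = defaultdict(set)
    for row in rows:
        src = (row["source_table_full_name"], row["source_column_name"])
        dst = (row["target_table_full_name"], row["target_column_name"])
        edges[src].add(dst)

    # Breadth-first walk; the seen set also guards against lineage cycles.
    start = (table, column)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    for tbl, col in sorted(downstream_columns(LINEAGE_ROWS, "main.raw.customers", "email")):
        print(f"{tbl}.{col}")
```

In a real workspace the rows would come from a query against the lineage system table rather than a hard-coded list; the traversal itself stays the same.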
From Data (telemetry) to Insights
By analyzing and stitching together all this rich telemetry, Acante is able to generate a host of powerful data security and access insights across the Databricks lakehouse and its ecosystem, including ingress / egress systems, external volumes and connected cloud datastores (e.g. AWS S3, RDS and others). The Acante Dynamic Identity-Data Threat Graph™ is at the core of the platform, powering these insights, and is the only solution in the industry for modern data stacks that brings together data security observability, access intelligence and access governance in a single platform. The security analytics empower data teams to:
- Easily approve, provision and right-size data access privileges (Data Privilege Access Management), all with complete data risk context
- Automatically discover, classify and assess risk for all sensitive data at petabyte-order scale
- Track sensitive data flows and prevent leakage of the data by any identity – including by users, service principals or workloads
- Implement granular data security guardrails to ensure compliant and secure data use
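As a rough illustration of what right-sizing privileges can involve, the Python sketch below compares granted table privileges with accesses observed in audit logs and flags grants that were never exercised. This is a simplified assumption about the general technique, not a description of how Acante computes it. All records and field names (grantee, table, privilege, identity) are hypothetical stand-ins for data derived from sources such as system.information_schema and system.access.audit.

```python
# Hypothetical grant records, e.g. derived from privilege tables in
# system.information_schema. Field names are illustrative only.
GRANTS = [
    {"grantee": "alice@example.com", "table": "main.gold.orders", "privilege": "SELECT"},
    {"grantee": "alice@example.com", "table": "main.gold.customers", "privilege": "SELECT"},
    {"grantee": "etl-service", "table": "main.gold.orders", "privilege": "MODIFY"},
]

# Hypothetical access events, e.g. summarized from audit logs over some
# lookback window. Field names are illustrative only.
ACCESSES = [
    {"identity": "alice@example.com", "table": "main.gold.orders"},
    {"identity": "etl-service", "table": "main.gold.orders"},
]


def unused_grants(grants, accesses):
    """Return grants whose grantee never touched the table in the window."""
    used = {(a["identity"], a["table"]) for a in accesses}
    return [g for g in grants if (g["grantee"], g["table"]) not in used]


if __name__ == "__main__":
    for grant in unused_grants(GRANTS, ACCESSES):
        print(f'{grant["grantee"]} has unused {grant["privilege"]} on {grant["table"]}')
```

A production version would also need to account for group membership, inherited privileges and the length of the lookback window, all of which this sketch ignores.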
Ultimately, this empowers data teams to deliver fast, secure and compliant access to data. They can accelerate their Databricks adoption journey and democratize access to their data while confidently adopting Unity Catalog as their primary repository for access governance policies. Acante’s tight integration with Unity Catalog allows organizations to get the most out of their Databricks investments.
This is the first blog in a series that will discuss the above capabilities and use cases in more detail.