Our team is still absorbing all the customer conversations from the Databricks Data + AI Summit, held during the week of June 10th in San Francisco. Governance was a hot topic, as evidenced by the numerous sessions and announcements from Databricks and its partners. During the keynote chat between Ali Ghodsi and Jensen Huang, Huang remarked that Databricks has pivoted from data processing to data governance. That statement accurately reflects what Databricks has accomplished with Unity Catalog over the past few years. Ghodsi also highlighted in the keynote that quality and security are the primary reasons 85% of AI workloads have not yet moved to production. At Acante, we simplify access governance for data and AI, so I will focus on access governance for the rest of this blog.
A modern data stack consists of the following components, with governance ensuring quality, security, and compliance across the entire stack.
We love the lakehouse model that Databricks supports, because it separates data from compute and gives customers the flexibility to build out the rest of the stack. With the acquisition of Tabular, Databricks appears committed to UniForm, which lets a single stored copy of data be read by different data platforms. This is a significant win for customers, since it lets them choose different data processing tools for different use cases. What is still missing, both in our customer conversations and in our observations of the industry, is a unifying governance layer across these systems. Without one, customers are left to manage complex access governance across their data stacks, because each platform provides its own access control layer and its own telemetry for tracking access and lineage. Given how fluidly data moves between systems, a unified access governance layer would preserve customer choice and drive innovation across the industry.
Each vendor is attempting to make its own governance layer the unifying one across data platforms. For example, Databricks supports Lakehouse Federation, which includes both database federation (e.g., Redshift, Snowflake) and catalog federation (e.g., Hive, AWS Glue), allowing Unity Catalog to serve as the governance layer across these systems for some customers. However, there is concern about vendor lock-in, since each data platform will inevitably prioritize advancing its own data processing stack. This is why we were so excited to learn that Unity Catalog is being open-sourced! The news was received with great interest at the conference and at the community meetings that began the week of June 24th. You can read more about Databricks' intentions behind open-sourcing it.
Note that Databricks did not open-source its entire Unity Catalog; rather, it planted a seed with a 0.1 release that offers a very limited set of features. For Unity Catalog OSS (UC OSS) to succeed, it will require significant investment from both Databricks and the broader ecosystem. We have already started identifying areas where we can innovate. As a validated Databricks partner, Acante welcomes the open-sourcing of Unity Catalog, as it could unleash a new wave of innovation that benefits customers!