In our previous blog, The Next Wave of AI Safety Needs to Focus on Data Governance, we discussed why governance and security at the data layer are a major and largely unaddressed technical challenge. Since then, we have collaborated with several organizations and spoken with dozens of data and AI leaders who are either grappling with this issue or anticipating it as they push to bring enterprise AI applications into production. Our observations echoed broader industry surveys showing that nearly 4 out of 5 enterprise AI applications still haven’t reached production. From our conversations with customers, several consistent themes emerged:
- To build high-ROI AI applications, teams need to use proprietary enterprise data—but they often face legal resistance or lack mature internal AI policies to enable that.
- The security and privacy of the data used for training or as retrieval context is paramount. Overexposure to internal users or, worse, external leakage of sensitive data risks setting an AI program back by several quarters.
- “Responsible AI”, and “AI Safety” as part of that philosophy, have become standard corporate mandates as enterprises govern their use of AI.
Depending on where they are in their AI journey, almost half of the enterprises we spoke with have adopted an extreme measure: excluding any data source that might contain proprietary data, sensitive customer data, or consumer PII. This approach isn’t sustainable, as forward-leaning customers across the healthcare, finance, and insurance verticals demonstrate. As these customers built their internal AI platforms, they either built robust data security and governance capabilities themselves or looked outward for capabilities they could embed into the platform. This is where Acante came in to help them.
Acante’s Data Governance & Security Framework for AI
The need for data governance and security spans two common scenarios:
- Training of AI/ML pipelines and models
- Runtime for RAG-based or similar application architectures
At a fundamental level, the challenges can be expressed in very simple terms:
- At ingestion: Is the data I am bringing in safe for use by the AI systems? Is it being appropriately anonymized where needed?
- At retrieval: Who should be able to see what data? In what form (whether anonymized or actual)?
The ingestion questions apply to both the training and runtime scenarios, while the retrieval questions apply at runtime, when RAG or similar AI architectures respond to prompts/queries from users or agents.
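To make these two decision points concrete, here is a minimal Python sketch. Every name and policy shape in it is an illustrative assumption rather than a real API; the point is simply that ingestion decides whether each chunk is admitted, anonymized, or dropped, and retrieval decides what form each user gets to see.

```python
# Illustrative sketch of the two data-layer decision points.
# Tags, roles, and return values are hypothetical, not a real API.

def ingest_decision(chunk_tags: set) -> str:
    """At ingestion: is this chunk safe for the AI system, and in what form?"""
    if "proprietary" in chunk_tags:
        return "drop"        # never index
    if "pii" in chunk_tags:
        return "anonymize"   # index a token-anonymized form
    return "admit"           # index as-is

def retrieval_decision(user_roles: set, chunk_tags: set) -> str:
    """At retrieval: who sees what, and in clear or anonymized form?"""
    if "pii" in chunk_tags and "pii_authorized" not in user_roles:
        return "anonymized"
    return "clear"

# Example: a support agent without PII clearance sees masked values.
print(retrieval_decision({"support"}, {"pii"}))  # -> anonymized
```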

In the world of structured data lakes and warehouses such as Snowflake, Databricks, or others, basic primitives for role-based access control (RBAC) exist. Acante’s existing data access governance solution provides a multi-platform unified control plane for advanced data compliance and security, a dynamic access control and privacy co-pilot, and more, to drive operational efficiency and data risk reduction.
However, with new AI applications, the data sources we see with customers are much more varied: there is traditional structured and semi-structured data already stored in data platforms such as Snowflake and Databricks and in blob stores such as AWS S3. Increasingly, though, we see use cases connecting unstructured data and documents from sources such as:
- Enterprise file systems, e.g., SharePoint, Office 365, Google Drive
- Chat transcripts from customer support apps, Slack etc.
- Enterprise SaaS applications such as Confluence, Workday, Salesforce and so on
In this context, even basic primitives that address the above data governance challenges are completely lacking across structured and (particularly) unstructured data used in AI workloads. By extending our access governance solution to unstructured data used for AI workloads, Acante has created the first unified approach to Data + AI Governance & Security, precisely to address this market need.

Mirroring the two primary challenges at ingestion and retrieval outlined above, the Acante solution can be broken down into two broad components:
- Ingestion filter: As data teams connect a variety of data sources into their AI pipelines, they are looking for continuous monitoring of the ingested data. Given the dynamic nature of this source data, they look to ensure that (see the sketch after this list):
  - At a coarse level, the documents and data sensitivity levels being ingested are limited to what the AI agents or applications need for their purpose. Undesired proprietary documents or chunks should be automatically filtered out or redacted.
  - Any sensitive customer or consumer PII is automatically detected and anonymized at the token level, while still allowing authorized users to retrieve it in clear form for the right purpose.
  - A robust set of guardrails and reports is in place to confidently demonstrate these data ingestion controls.
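As a concrete illustration of the ingestion filter, here is a minimal, self-contained Python sketch. The blocked tags and regex-based PII patterns are stand-ins we chose for the example; a production system would use a trained PII detector and reversible tokenization (so authorized users can still recover clear values), not regexes.

```python
import re
from dataclasses import dataclass, field

# Illustrative placeholders only; real deployments would use ML-based PII
# detection and a token vault for reversible anonymization.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLOCKED_TAGS = {"trade-secret", "board-material"}  # hypothetical coarse filter

@dataclass
class IngestReport:
    """Evidence for the guardrail and reporting requirement above."""
    admitted: int = 0
    dropped: int = 0
    redactions: dict = field(default_factory=dict)

def ingestion_filter(chunks, report):
    """Yield index-safe chunks: drop blocked documents, redact PII tokens."""
    for text, tags in chunks:
        if BLOCKED_TAGS & tags:  # coarse, document-level filtering
            report.dropped += 1
            continue
        for label, pattern in PII_PATTERNS.items():  # token-level anonymization
            text, n = pattern.subn(f"<{label}>", text)
            report.redactions[label] = report.redactions.get(label, 0) + n
        report.admitted += 1
        yield text, tags

report = IngestReport()
safe = list(ingestion_filter([("Reach Jane at jane@acme.com", {"support"})], report))
print(safe, report)  # the email is replaced with <EMAIL>; counts feed reporting
```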
- Retrieval firewall: To deliver production-ready AI applications with multiple user personas, a dynamic permissioning system with comprehensive auditing becomes a must-have. Customers’ requirements for such an access control system include the ability to (a sketch follows this list):
  - Define policies based on the data’s metadata: the metadata could be based on the context (topics) of the chunks, the sensitivity level of the tokens (e.g., PII types such as names, emails, …), ingested document tags, etc.
  - Automatically honor the document access privileges of the source systems, i.e., a user’s query should only retrieve document chunks that they have access to in the source document system. These privileges tend to be quite dynamic, especially for file systems such as Office 365 or Google Drive where documents are created and shared continuously.
  - Infer and define finer-grained policies based on user context. Most source document systems are significantly over-provisioned. Today this carries a lower risk of data overexposure, because humans usually don’t know the extent of their access. Once an AI system indexes all this data, however, retrieving and overexposing undesired sensitive data becomes a near certainty.
  - Lastly, provide detailed auditing of what data (documents, chunks, etc.) was filtered out or excluded from retrieval in response to a user’s query. This serves the purposes of both retrieval-quality analysis and data governance reporting.
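For illustration, here is a minimal, self-contained sketch of a retrieval firewall along these lines. The metadata keys (doc_id, allowed_groups, topics) and the example topic policy are assumptions we made for the sketch, not Acante’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # assumed keys: "doc_id", "allowed_groups", "topics"

def retrieval_firewall(user, chunks, audit_log):
    """Filter retrieved chunks against mirrored source ACLs and metadata
    policies, auditing every exclusion for governance reporting."""
    allowed = []
    for c in chunks:
        doc_id = c.metadata.get("doc_id")
        # 1. Honor source-system privileges mirrored into chunk metadata.
        if not set(c.metadata.get("allowed_groups", [])) & user["groups"]:
            audit_log.append({"user": user["id"], "doc": doc_id, "reason": "source-acl"})
            continue
        # 2. Apply finer-grained policy on chunk metadata (topics, PII types, tags).
        if "compensation" in c.metadata.get("topics", []) and "hr" not in user["groups"]:
            audit_log.append({"user": user["id"], "doc": doc_id, "reason": "topic-policy"})
            continue
        allowed.append(c)
    return allowed

audit = []
user = {"id": "u42", "groups": {"eng"}}
hits = [Chunk("Q3 comp bands", {"doc_id": "d1", "allowed_groups": ["eng"], "topics": ["compensation"]})]
print(retrieval_firewall(user, hits, audit))  # [] - the chunk is excluded
print(audit)                                  # and the exclusion is audited
```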
We have integrated these data-layer governance capabilities into customers’ AI applications across a range of cloud platforms, including AWS, Databricks, and Snowflake. We work with the leading AI development frameworks such as LangChain, LangGraph, LlamaIndex, and Unstructured. The approach offers clear advantages over relying solely on prompt-layer guardrails. By enforcing governance and security at the data layer, customers are experiencing minimal latency overhead, reduced LLM usage costs, and the ability to apply more granular and expansive data guardrails and access controls.
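As one example of what framework integration can look like, the sketch below wraps a data-layer filter around any LangChain retriever. This is our own illustrative wrapper, not Acante’s actual integration, and the allowed_groups metadata key is an assumption carried over from the sketches above.

```python
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class FirewalledRetriever(BaseRetriever):
    """Enforces a data-layer policy on the results of any wrapped retriever."""

    inner: BaseRetriever    # e.g., a vector-store retriever
    user_groups: List[str]  # caller identity, resolved upstream of the chain

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        docs = self.inner.invoke(query)
        # Keep only chunks whose mirrored source ACLs intersect the caller's
        # groups; "allowed_groups" is an assumed metadata key.
        return [
            d for d in docs
            if set(d.metadata.get("allowed_groups", [])) & set(self.user_groups)
        ]
```

Because the filter runs before any tokens reach the model, excluded chunks never consume context-window space, which is where the latency and LLM-cost savings mentioned above come from.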
While prompt-layer safeguards remain valuable for catching model-level issues such as toxicity or bias, they are insufficient on their own. Robust governance and security at the data layer is essential for confidently deploying enterprise-grade, production AI systems.
Data Security & Governance: Helping Deliver AI-Ready Data
Referring to Gartner’s Market Guide for AI TRiSM, these capabilities provide comprehensive coverage of the Information Governance layer of the TRiSM Technology Functions pyramid, along with Runtime Controls for data access and inspection.

After deploying these capabilities in production, we observed the data and AI teams gaining the confidence to integrate a broader range of structured and unstructured data sources into their AI applications. By unlocking access to proprietary enterprise data, organizations are significantly increasing the ROI of their AI initiatives (and making their CFOs happy!).
For example, one of our healthcare customers initially limited ingestion to publicly available research papers. With Acante in place, they expanded to connect data assets from SharePoint, Confluence, and AWS S3. Enabling legal and privacy teams to sign off on the associated risk controls—particularly regarding data exposure and leakage—has unlocked development of a host of new and more powerful use cases for enterprise AI.
If you are building AI applications and running into some of these challenges, we’d love to hear from you and explore how we can partner on eliminating these blockers to your AI program. Please reach out to us at dhruv@acante.ai.