In our previous blog, The Next Wave of AI Safety Needs to Focus on Data Governance, we discussed why governance and security at the data layer are a major and largely unaddressed technical challenge. Since then, we have collaborated with several organizations and spoken with dozens of data and AI leaders who are either grappling with this issue or anticipating it as they push to bring enterprise AI applications into production. Our observations echoed broader industry surveys showing that nearly 4 out of 5 enterprise AI applications still haven’t reached production. From our conversations with customers, several consistent themes emerged:
- To build high-ROI AI applications, teams need to use proprietary enterprise data—but they often face legal resistance or lack mature internal AI policies to enable that.
- The security and privacy of the data used for training or as retrieval context is paramount. Overexposure to internal users or, worse, external leakage of sensitive data risks setting an AI program back by several quarters.
- “Responsible AI”, and “AI Safety” as part of that philosophy, have become standard corporate mandates as enterprises govern their use of AI.
Depending on where they are in their AI journey, almost half of the enterprises we spoke with have adopted an extreme measure: excluding any data source that might contain proprietary data, sensitive customer data, or consumer PII. This approach isn’t sustainable, as forward-leaning customers across the healthcare, finance, and insurance verticals demonstrate. As these customers built their internal AI platforms, they either built robust data security and governance capabilities themselves or looked outward for capabilities they could embed into the platform. This is where Acante came in to help them.
Acante’s Data Governance & Security Framework for AI
The need for data governance and security spans two common scenarios:
- Training of AI/ML pipelines and models
- Runtime for RAG-based or similar application architectures
At a fundamental level, the challenges can be expressed in very simple terms:
- At ingestion: Is the data I am bringing in safe for use by the AI systems? Is it being appropriately anonymized where needed?
- At retrieval: Who should be able to see what data? In what form (whether anonymized or actual)?
The ingestion questions apply to both the training and runtime scenarios, while the retrieval questions apply at runtime, when RAG or similar AI architectures respond to prompts/queries from users or agents.
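To make these two decision points concrete, here is a minimal Python sketch. Every name and policy shape in it is an illustrative assumption rather than a real API; the point is simply that ingestion decides whether each chunk is admitted, anonymized, or dropped, and retrieval decides what form each user gets to see.

```python
# Illustrative sketch of the two data-layer decision points.
# Tags, roles, and return values are hypothetical, not a real API.

def ingest_decision(chunk_tags: set) -> str:
    """At ingestion: is this chunk safe for the AI system, and in what form?"""
    if "proprietary" in chunk_tags:
        return "drop"        # never index
    if "pii" in chunk_tags:
        return "anonymize"   # index a token-anonymized form
    return "admit"           # index as-is

def retrieval_decision(user_roles: set, chunk_tags: set) -> str:
    """At retrieval: who sees what, and in clear or anonymized form?"""
    if "pii" in chunk_tags and "pii_authorized" not in user_roles:
        return "anonymized"
    return "clear"

# Example: a support agent without PII clearance sees masked values.
print(retrieval_decision({"support"}, {"pii"}))  # -> anonymized
```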

In the world of structured data lakes and warehouses such as Snowflake, Databricks, or others, basic primitives for role-based access control (RBAC) exist. Acante’s existing data access governance solution provides a multi-platform unified control plane for advanced data compliance and security, a dynamic access control and privacy co-pilot, and more, to drive operational efficiency and data risk reduction.
However, with new AI applications, the data sources we see with customers are much more varied: there is traditional structured and semi-structured data already stored in data platforms such as Snowflake and Databricks and in blob stores such as AWS S3. Increasingly, though, we see use cases connecting unstructured data and documents from sources such as:
- Enterprise file systems, e.g., SharePoint, Office 365, Google Drive
- Chat transcripts from customer support apps, Slack etc.
- Enterprise SaaS applications such as Confluence, Workday, Salesforce and so on
In this context, even basic primitives that address the above data governance challenges are completely lacking across structured and (particularly) unstructured data used in AI workloads. By extending our access governance solution to unstructured data used for AI workloads, Acante has created the first unified approach to Data + AI Governance & Security, precisely to address this market need.

Mirroring the two primary challenges at ingestion and retrieval outlined above, the Acante solution can be broken down into two broad components:
- Ingestion filter: As data teams connect a variety of data sources into their AI pipelines, they are looking for continuous monitoring of the ingested data. Given the dynamic nature of this source data, they look to ensure that (see the sketch after this list):
  - At a coarse level, the documents and data sensitivity levels being ingested are limited to what the AI agents or applications need for their purpose. Undesired proprietary documents or chunks should be automatically filtered out or redacted.
  - Any sensitive customer or consumer PII is automatically detected and anonymized at the token level, while still allowing authorized users to retrieve it in clear form for the right purpose.
  - A robust set of guardrails and reports is in place to confidently demonstrate these data ingestion controls.
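As a concrete illustration of the ingestion filter, here is a minimal, self-contained Python sketch. The blocked tags and regex-based PII patterns are stand-ins we chose for the example; a production system would use a trained PII detector and reversible tokenization (so authorized users can still recover clear values), not regexes.

```python
import re
from dataclasses import dataclass, field

# Illustrative placeholders only; real deployments would use ML-based PII
# detection and a token vault for reversible anonymization.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLOCKED_TAGS = {"trade-secret", "board-material"}  # hypothetical coarse filter

@dataclass
class IngestReport:
    """Evidence for the guardrail and reporting requirement above."""
    admitted: int = 0
    dropped: int = 0
    redactions: dict = field(default_factory=dict)

def ingestion_filter(chunks, report):
    """Yield index-safe chunks: drop blocked documents, redact PII tokens."""
    for text, tags in chunks:
        if BLOCKED_TAGS & tags:  # coarse, document-level filtering
            report.dropped += 1
            continue
        for label, pattern in PII_PATTERNS.items():  # token-level anonymization
            text, n = pattern.subn(f"<{label}>", text)
            report.redactions[label] = report.redactions.get(label, 0) + n
        report.admitted += 1
        yield text, tags

report = IngestReport()
safe = list(ingestion_filter([("Reach Jane at jane@acme.com", {"support"})], report))
print(safe, report)  # the email is replaced with <EMAIL>; counts feed reporting
```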
- Retrieval firewall: To deliver production-ready AI applications with multiple user personas, a dynamic permissioning system with comprehensive auditing becomes a must-have. Customers’ requirements for such an access control system include the ability to (a sketch follows this list):
  - Define policies based on the data’s metadata: the metadata could be based on the context (topics) of the chunks, the sensitivity level of the tokens (e.g., PII types such as names, emails, …), ingested document tags, etc.
  - Automatically honor the document access privileges of the source systems, i.e., a user’s query should only retrieve document chunks that they have access to in the source document system. These privileges tend to be quite dynamic, especially for file systems such as Office 365 or Google Drive where documents are created and shared continuously.
  - Infer and define finer-grained policies based on user context. Most source document systems are significantly over-provisioned. Today this carries a lower risk of data overexposure, because humans usually don’t know the extent of their access. Once an AI system indexes all this data, however, retrieving and overexposing undesired sensitive data becomes a near certainty.
  - Lastly, provide detailed auditing of what data (documents, chunks, etc.) was filtered out or excluded from retrieval in response to a user’s query. This serves the purposes of both retrieval-quality analysis and data governance reporting.
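For illustration, here is a minimal, self-contained sketch of a retrieval firewall along these lines. The metadata keys (doc_id, allowed_groups, topics) and the example topic policy are assumptions we made for the sketch, not Acante’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # assumed keys: "doc_id", "allowed_groups", "topics"

def retrieval_firewall(user, chunks, audit_log):
    """Filter retrieved chunks against mirrored source ACLs and metadata
    policies, auditing every exclusion for governance reporting."""
    allowed = []
    for c in chunks:
        doc_id = c.metadata.get("doc_id")
        # 1. Honor source-system privileges mirrored into chunk metadata.
        if not set(c.metadata.get("allowed_groups", [])) & user["groups"]:
            audit_log.append({"user": user["id"], "doc": doc_id, "reason": "source-acl"})
            continue
        # 2. Apply finer-grained policy on chunk metadata (topics, PII types, tags).
        if "compensation" in c.metadata.get("topics", []) and "hr" not in user["groups"]:
            audit_log.append({"user": user["id"], "doc": doc_id, "reason": "topic-policy"})
            continue
        allowed.append(c)
    return allowed

audit = []
user = {"id": "u42", "groups": {"eng"}}
hits = [Chunk("Q3 comp bands", {"doc_id": "d1", "allowed_groups": ["eng"], "topics": ["compensation"]})]
print(retrieval_firewall(user, hits, audit))  # [] - the chunk is excluded
print(audit)                                  # and the exclusion is audited
```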
We have integrated these data-layer governance capabilities into customers’ AI applications across a range of cloud platforms, including AWS, Databricks, and Snowflake. We work with the leading AI development frameworks such as LangChain, LangGraph, LlamaIndex, and Unstructured. The approach offers clear advantages over relying solely on prompt-layer guardrails. By enforcing governance and security at the data layer, customers are experiencing minimal latency overhead, reduced LLM usage costs, and the ability to apply more granular and expansive data guardrails and access controls.
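As one example of what framework integration can look like, the sketch below wraps a data-layer filter around any LangChain retriever. This is our own illustrative wrapper, not Acante’s actual integration, and the allowed_groups metadata key is an assumption carried over from the sketches above.

```python
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class FirewalledRetriever(BaseRetriever):
    """Enforces a data-layer policy on the results of any wrapped retriever."""

    inner: BaseRetriever    # e.g., a vector-store retriever
    user_groups: List[str]  # caller identity, resolved upstream of the chain

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        docs = self.inner.invoke(query)
        # Keep only chunks whose mirrored source ACLs intersect the caller's
        # groups; "allowed_groups" is an assumed metadata key.
        return [
            d for d in docs
            if set(d.metadata.get("allowed_groups", [])) & set(self.user_groups)
        ]
```

Because the filter runs before any tokens reach the model, excluded chunks never consume context-window space, which is where the latency and LLM-cost savings mentioned above come from.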
While prompt-layer safeguards remain valuable for catching model-level issues such as toxicity or bias, they are insufficient on their own. Robust governance and security at the data layer is essential for confidently deploying enterprise-grade, production AI systems.
Data Security & Governance: Helping Deliver AI-Ready Data
Referring to Gartner’s Market Guide for AI TRiSM, these capabilities provide comprehensive coverage of the Information Governance layer of the TRiSM Technology Functions pyramid, along with Runtime Controls for data access and inspection.

After deploying these capabilities in production, we observed the data and AI teams gaining the confidence to integrate a broader range of structured and unstructured data sources into their AI applications. By unlocking access to proprietary enterprise data, organizations are significantly increasing the ROI of their AI initiatives (and making their CFOs happy!).
For example, one of our healthcare customers initially limited ingestion to publicly available research papers. With Acante in place, they expanded to connect data assets from SharePoint, Confluence, and AWS S3. Enabling legal and privacy teams to sign off on the associated risk controls—particularly regarding data exposure and leakage—has unlocked development of a host of new and more powerful use cases for enterprise AI.
If you are building AI applications and running into some of these challenges, we’d love to hear from you and explore how we can partner on eliminating these blockers to your AI program. Please reach out to us at dhruv@acante.ai.