DEPA Private Inferencing

At iSPIRT, our work on digital public infrastructure for data has consistently focused on a core question: how do we unlock the value of data while preserving the rights and trust of individuals? With DEPA (Data Empowerment and Protection Architecture), we established a consent-driven framework that empowers individuals to control how their data is shared for a specific purpose. But there are many business scenarios where consent alone is not sufficient.

Consider a simple scenario: a user consents to share their bank transaction data with multiple lenders for the purpose of credit underwriting. The consent artifact clearly specifies purpose, duration, and data scope. However, once that data is shared with the lenders, neither the user nor the bank has any technical visibility into how exactly the data is being used. Are the lenders only computing a credit score, or also deriving behavioral insights for cross-selling? Are any of the lenders retaining the data longer than necessary? Is the data being combined with other datasets in ways that go beyond the original intent? These questions are becoming increasingly critical as the use of AI for processing personal data expands to virtually every sphere, from agriculture to finance and healthcare. Consent defines permission, but today there are no technical means to ensure that consent is respected.

DEPA Private Inferencing (part of DEPA 2.0) addresses these concerns. It is a framework that enables high-value scenarios, such as AI inference, while ensuring end-to-end, cryptographically verifiable privacy. DEPA Private Inferencing introduces a new paradigm: controlled data sharing within clean room environments. It allows data to be shared, but only within secure, purpose-limited and verifiable execution environments. This ensures that users’ consent is enforced, while data consumers retain the flexibility to run complex computations such as AI models and keep compliance costs low.


Highlights

👉 Unlocks cross-institution inference without exposing customer data to counterparties

👉 Today: 2 million+ private customer inferences processed in the BFSI sector within 2 weeks

👉 Open-source stack available for ecosystem adoption and innovation


Before we dive in, let’s quickly recall what the Data Empowerment and Protection Architecture (DEPA 1.0) really is.

What is DEPA and why does it matter?

India Stack is evolving at population scale, enabling the flow of people (Aadhaar, eKYC, DigiLocker, DigiYatra, etc), money (UPI, OCEN), and information (DEPA and Account Aggregator) through Digital Public Infrastructure (DPI). DEPA is foundational to this third layer. It enables the responsible flow and use of data between individuals and organisations for higher-order economic activity such as cross-sell, analytics, AI model training, and AI inference.

As the name suggests, DEPA rests on two key elements. The first is protection, founded on the bedrock of privacy, consent, accountability and purpose limitation of data. The second is empowerment, democratizing data access and enabling the ecosystem to responsibly innovate with it: training AI models, personalizing products and services, advancing scientific research, and a lot more.

The first instantiation of DEPA is the Account Aggregator (AA) framework, which enables real-time, user-consented data sharing between Financial Information Providers (FIPs), entities that hold citizen data, and Financial Information Users (FIUs), entities that require citizen data to provide a service. That model has been transformative for many use cases and remains foundational to India’s data ecosystem.

What is DEPA Private Inferencing?

DEPA Private Inferencing builds on this foundation by introducing a new layer of control at the point of computation. Instead of transferring data to the consumer’s environment, data is released into a Confidential Clean Room (CCR), a hardware-based secure execution environment where computation happens under strict policy enforcement.

In this model, a data provider (DP) grants a data consumer (DC) access to relevant data, governed by user consent (obtained using AA or otherwise). However, instead of getting raw access to the user’s data, the data consumer brings their inference logic (for example, a credit risk model) into a confidential clean room. The clean room itself enforces constraints on what data can be accessed, how it can be processed, and what outputs can be generated.

In its first incarnation, DEPA Private Inferencing supports a clean room environment that allows the shared data to be enriched using the DC’s own data hosted in the clean room, followed by inference using the DC’s AI model. During this process, the clean room environment ensures that

a) personal data remains encrypted at all times, even during use,

b) inferencing is stateless, i.e., the AI model cannot retain any state once the data has been processed, and

c) only the results of inference in a predetermined format are returned to the DP.

Importantly, the clean room environment ensures that the DC (or, for that matter, the cloud provider) cannot observe or tamper with the data or the computation, even though the DC hosts it. This is a subtle but important shift: data does move, but only into an environment where its usage is technically controlled and purpose-limited, even from the DC.

DEPA Private Inferencing provides these assurances using confidential computing, a new set of primitives in modern CPUs (from Intel, AMD and ARM) and GPUs (from NVIDIA) that enable the creation of cryptographically verifiable, isolated execution environments called Trusted Execution Environments (TEEs). TEEs ensure that data remains protected throughout its lifecycle: at rest, in transit, and during use. TEEs are now broadly available on most cloud platforms and have been battle-tested in large-scale deployments such as WhatsApp and Signal.

If the confidential clean room is where trust is enforced, key management is what ensures that this trust cannot be bypassed. In DEPA Private Inferencing, the keys used to encrypt data as it is transferred from the DP to the DC are generated and governed within a dedicated, transparent key management service (KMS) operated by Samyog, a neutral, not-for-profit self-regulatory organization (SRO) set up for this purpose. Access to private keys is tightly coupled with attestation. Personal data is encrypted using public keys from the KMS. However, before an inferencing service hosted by a DC can decrypt data, it must prove, using hardware-backed attestation, that it is running clean room code approved by Samyog inside a genuine TEE. Only after this verification does the KMS release the required decryption key, securely wrapped so that it can only be used within that specific clean room environment. This creates a separation of control, ensuring that no single participant in the ecosystem can unilaterally access sensitive data.

Similar to the clean room, the KMS itself runs within a tamper-evident, fault-tolerant TEE, with transparency and auditability over key release policies, ensuring even administrators of the KMS cannot access private keys.
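
To make this control flow concrete, here is a minimal sketch of attestation-gated key release. The function and field names are hypothetical and PyNaCl is used only as an illustrative choice for key wrapping; the actual DEPA/Samyog KMS protocol, attestation formats and APIs are defined in the open-source stack and are considerably more involved.

```python
# Minimal sketch of attestation-gated key release (hypothetical names; not the
# actual DEPA/Samyog KMS API). Real deployments verify vendor-signed hardware
# attestation reports (e.g. AMD SEV-SNP, Intel TDX) rather than a boolean flag.
from dataclasses import dataclass

from nacl.public import PrivateKey, PublicKey, SealedBox

# Measurements (hashes) of clean room builds approved by the SRO (placeholder value).
APPROVED_MEASUREMENTS = {"sha256:aaaa...cleanroom-v1"}


@dataclass
class AttestationReport:
    measurement: str              # hash of the code running inside the TEE
    hardware_signature_ok: bool   # stand-in for verifying the vendor-signed report
    wrapping_public_key: bytes    # key generated inside the TEE, used to wrap secrets


def release_key(report: AttestationReport, data_key: bytes) -> bytes:
    """Return the data-decryption key wrapped for this specific TEE, or refuse."""
    if not report.hardware_signature_ok:
        raise PermissionError("report is not signed by genuine TEE hardware")
    if report.measurement not in APPROVED_MEASUREMENTS:
        raise PermissionError("clean room code is not on the SRO-approved list")
    # Wrap (encrypt) the key to the TEE-held public key, so only that specific
    # clean room instance can unwrap and use it.
    return SealedBox(PublicKey(report.wrapping_public_key)).encrypt(data_key)


if __name__ == "__main__":
    tee_key = PrivateKey.generate()            # in reality generated inside the TEE
    report = AttestationReport(
        measurement="sha256:aaaa...cleanroom-v1",
        hardware_signature_ok=True,
        wrapping_public_key=bytes(tee_key.public_key),
    )
    wrapped = release_key(report, data_key=b"data-encryption-key")
    # Only the holder of the TEE's private key can recover the data key.
    assert SealedBox(tee_key).decrypt(wrapped) == b"data-encryption-key"
```

The essential property is visible even in this toy version: the key is released only against an approved code measurement, and even then only in a form that is usable inside that one attested environment.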

DEPA Private Inferencing in action

Our first production deployment of DEPA Private Inferencing connects a leading commercial bank (Data Consumer) with a regulated fintech (Data Provider) to run a simple private inference: Does this fintech customer already hold an account at the bank?

Once the DPDP Act is operationalized, even answering this simple question will be challenging, because as soon as the customer’s data (e.g., mobile number and PAN) is exposed to the bank, the bank becomes responsible for managing the entire consent and data lifecycle. Given that banks today typically integrate with dozens of such partners, this creates significant operational and compliance overhead.

With DEPA Private Inferencing, the bank can receive this data in encrypted form inside a CCR pre-loaded with its own customer information, process it, and generate a yes/no response – while proving to the fintech partner that the data was used only for this specific purpose, and that the bank has no technical means to access or retain this data. This builds trust, reduces exposure of raw data, and significantly simplifies partner integrations at scale.
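
To illustrate what runs inside such a clean room (field names and matching logic here are hypothetical; the production schemas and I/O contract are defined by the DEPA stack and the participating institutions), the core check can be as small as a salted-hash membership lookup whose only output is a fixed-format yes/no:

```python
# Illustrative sketch of the logic inside the CCR for the "existing customer?"
# check; hypothetical field names, not the production implementation.
import hashlib


def fingerprint(mobile: str, pan: str, salt: bytes) -> str:
    """Salted hash so identifiers are never compared or stored in the clear."""
    return hashlib.sha256(salt + mobile.encode() + b"|" + pan.encode()).hexdigest()


def has_existing_account(decrypted_request: dict,
                         bank_customer_fingerprints: set,
                         salt: bytes) -> dict:
    """Runs inside the CCR; only this fixed-format result leaves the environment."""
    fp = fingerprint(decrypted_request["mobile"], decrypted_request["pan"], salt)
    # Stateless by design: nothing derived from the request is retained.
    return {"existing_customer": fp in bank_customer_fingerprints}
```

Because the clean room attests to exactly this code, the fintech partner can verify that nothing beyond the boolean answer is computed or released.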

Already, within the first two weeks of deployment, DEPA Private Inferencing has processed over 2 million cross-institution inferences, with a p99 latency of < 100ms, while meeting all of the bank’s security and compliance requirements.

Figure: Example Private Inference flow between a Fintech and a Bank.

A New Primitive for the Data Economy

DEPA Private Inferencing represents a shift from thinking about data as something that must be transferred, to thinking about computation as something that can be safely hosted. The clean room becomes the unit of trust, a neutral, enforceable space where data and algorithms interact under well-defined constraints. As India’s digital public infrastructure continues to evolve, this approach lays the groundwork for a more trustworthy and scalable data economy, one where innovation does not come at the cost of control, and where consent is not just captured, but meaningfully upheld.

Get started

👉 Dive into the code: DEPA‑Inferencing on GitHub 🛠️
👉 Reach out to [email protected] for more info.
👉 Watch the Open House video: YouTube 🎬

👉 Think big: What challenges has data privacy kept off-limits? What data has felt forever inaccessible? With DEPA, those doors may finally open. 💡

Interested in contributing to DEPA? Join our group of no-greed no-glory volunteers! Apply here

DEPA-Training: Tech Updates

We’ve rolled out some exciting updates for DEPA‑Training, making it easier to rapidly prototype and run diverse training scenarios, complete with electronic contracts, confidential clean rooms, privacy preservation, and configurable training SDKs.


✨ What’s new

👉 GUI for end-to-end execution

👉 Step-by-step guide to create and run your own training scenarios

👉 New scenarios introduced for complex multi-party training: MRI brain tumor segmentation, credit default risk prediction


Before we dive in, let’s quickly recall what the Data Empowerment and Protection Architecture (DEPA) really is.

What is DEPA and why does it matter?

India Stack is evolving at population scale, enabling the flow of people (Aadhaar, eKYC, DigiLocker, DigiYatra, etc), money (UPI, OCEN), and information (DEPA and Account Aggregator) through Digital Public Infrastructure (DPI). DEPA is critical in this third layer as it enables the responsible flow of data between individuals and organisations for more complex tasks such as AI model training, AI inference and analytics. 

As the name suggests, DEPA rests on two key elements. The first is protection, founded on the bedrock of privacy, consent, accountability and purpose limitation of data. The second is empowerment, democratizing data access and enabling the ecosystem to responsibly innovate with it: training AI models, personalizing products and services, advancing scientific research, and a lot more.

In light of emerging data protection laws such as the DPDP Act, GDPR, and others, there is a need for a framework that enables the responsible use of data — unlocking its value while ensuring regulatory compliance and serving the broader public interest.

Ultimately, DEPA solves for two core challenges at the heart of data sharing — Trust and Flow — keeping the rest open and flexible for innovation.

What is DEPA‑Training?

The vision behind DEPA for Training (aka DEPA‑Training) is simple: for India to be not only a consumer of AI but also a producer of AI, in a responsible and democratized manner.

AI’s first big leap came from public data. That well is running dry. Our belief is that for the next wave of AI innovation — smarter AI for healthcare, personalized finance, scientific discovery and more — proprietary data will be crucial. But today, that data is fragmented, locked in silos, and difficult to use — often running into challenges around privacy, compliance, and regulatory constraints.

Enter DEPA-Training — a techno-legal Digital Public Infrastructure (DPI) designed to enable secure, agile, and scalable AI model training on sensitive data. It does so by assembling a set of frontier technological primitives:

  • Confidential Clean Rooms (CCRs): Isolated compute environments that can cryptographically attest to their integrity, where data can be processed securely without external exposure.
  • Electronic Contracts: Code-enforced legal agreements between transacting parties that give data providers control over how their data is used, e.g. through purpose limitation, privacy safeguards and monetization (see the sketch after this list).
  • Secure Training Sandbox: Modular and configurable sandboxes and SDKs for building privacy-preserving and compliant training pipelines across diverse model architectures and data types.
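
To give a flavour of what “code-enforced” means in practice, here is a simplified, hypothetical contract fragment. The field names are illustrative only; the actual electronic-contract schema is defined in the DEPA-Training repository and documentation.

```python
# Hypothetical, simplified electronic-contract fragment (illustrative only; see
# the DEPA-Training repo for the real schema). The CCR enforces these terms.
contract = {
    "parties": {
        "data_provider": "hospital-a",
        "data_consumer": "research-lab-x",
        "data_collab": "onco-data-collab",
    },
    "purpose": "train-brain-tumor-segmentation-model",         # purpose limitation
    "datasets": ["mri-scans-2021-2023"],                        # artefacts in scope
    "privacy": {"mechanism": "dp-sgd", "epsilon": 3.0, "delta": 1e-5},
    "allowed_outputs": ["model-weights", "training-logs"],      # nothing else leaves the CCR
    "validity": {"from": "2025-01-01", "until": "2025-12-31"},
    "value_sharing": {"currency": "INR", "split": {"hospital-a": 1.0}},
}
```

The point is that terms like purpose, privacy budget and permitted outputs are machine-readable, so the clean room can check them before any computation runs.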

What’s new in DEPA-Training?

Graphical user interface

We’ve introduced an interactive GUI that enables users to explore, configure, and execute DEPA-Training scenarios end to end. The application automatically discovers available scenarios in the repository and provides an intuitive interface to run them — eliminating the need for command-line interaction. A similar GUI workflow is also provided for contract signing.

Scenarios you can try out today

To bring DEPA-Training to life, we showcase a diverse set of scenarios that demonstrate what’s possible in practice. These examples illustrate pathways toward solving larger global challenges and span multiple data modalities (e.g., tabular, images), model paradigms (e.g., classical ML, MLPs, CNNs), and prediction tasks (e.g., regression, classification, image segmentation).

Disease Surveillance Modeling

Pandemics don’t wait. Timely, accurate data can save millions of lives. Yet most infection data is scattered, siloed, and too sensitive to share. With differential privacy, institutions can securely pool data to track virus spread, map risk patterns, and test interventions — powering real-time, data-driven epidemic response.

Example: COVID-19 scenario

Medical Image Modeling

From cancer to cardiovascular disease, from neurology to rare disorders — modern medicine increasingly depends on imaging. Yet medical images are among the hardest datasets to share, trapped in hospital silos and governed by strict privacy laws. DEPA makes it possible to combine imaging data across borders and institutions, unlocking AI models that are more accurate, generalizable, and equitable. This accelerates breakthroughs in diagnostics, improves treatment planning, and addresses one of healthcare’s biggest global challenges: scaling precision medicine while safeguarding patient trust.

Example: BraTS scenario 

Financial Credit Risk Modeling

Access to fair credit fuels economic growth, but risk assessment is often limited by partial data. By safely combining insights across financial institutions, DEPA enables more accurate credit scoring, reduces defaults, and strengthens financial stability — empowering individuals and businesses alike with better access to capital.

Example: Credit Risk scenario

Build your own Scenarios

A new step-by-step guide walks you through building and running your own DEPA-Training scenarios — making it easy to rapidly prototype and iterate with training use-cases of your own.

Currently, DEPA-Training supports the following training frameworks, libraries and file formats (more will be included soon):

  • Frameworks: PyTorch, Scikit‑Learn, XGBoost (LLM Finetuning to be added soon!)
  • Libraries: Opacus, PySpark, Pandas (HuggingFace support coming soon!)
  • Formats: ONNX, Safetensors, Parquet, CSV, HDF5, PNG (No pickle-based formats for security reasons)
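
For a flavour of what a privacy-preserving training step looks like with this stack, here is a minimal DP-SGD sketch using PyTorch and Opacus on toy data. It is a generic Opacus example rather than DEPA-specific code; scenario wiring, data mounting, contract enforcement and CCR plumbing are handled by the DEPA-Training sandbox and SDKs.

```python
# Minimal DP-SGD sketch with PyTorch + Opacus (generic example, not the
# DEPA-Training SDK itself).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy tabular data standing in for a provider's dataset mounted inside the CCR.
X, y = torch.randn(512, 16), torch.randint(0, 2, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=64)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Opacus wraps the model/optimizer/loader so each step clips per-sample
# gradients and adds calibrated noise (differential privacy).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

for epoch in range(3):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()

print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```

In a DEPA-Training scenario, the same kind of training loop runs inside the CCR, with the noise and clipping parameters taken from the electronic contract rather than hard-coded.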

What’s in it for the ecosystem?

DEPA-Training democratizes responsible data sharing and model training for all!

  • Enterprises & Startups → Unlock the value of private data to build smarter products and services, while remaining compliant with data laws. Collaborate across organizations to create solutions that no single dataset could power.
  • Research Institutions → Pool data at scale to tackle grand challenges, drive scientific discovery, and advance knowledge for the public good.
  • Policy & Legal Experts → Shape the future of data governance by operationalizing privacy, consent, purpose limitation, and accountability in practice.
  • Builders & Researchers → Join us in co-creating this framework!

Get started

👉 Get your hands dirty: DEPA‑Training on GitHub 🛠️

👉 Explore the documentation: DEPA.World 📜
👉 Watch the Open Houses: YouTube Playlist 🎬

👉 Think big: What challenges has data privacy kept off-limits? What data has felt forever inaccessible? With DEPA-Training, those doors may finally open. 💡

Interested in contributing to DEPA? Join our group of no-greed no-glory volunteers! Apply here

Please note: The blog post is authored by our volunteers, Sarang Galada, Dr. Shyam Sundaram, Kapil Vaswani and Pavan Kumar Adukuri

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

This is a two-part blog series; the following is the second part.

In Part 1, we traced how data collaborations are being reimagined and laid out the conceptual foundations, from redefining consent through the Account Aggregator framework to recognizing the limits of consent. We explored how privacy-preserving frameworks like differential privacy protect individuals even when models are built from data; how electronic contracts replace slow, manual agreements with enforceable digital rules; and how confidential clean rooms combine secure hardware and privacy guarantees to enable computation without revealing raw data.

In Part 2, we explore how these building blocks come together in practice.

The Connective Tissue: Data Collabs

Technology alone cannot guarantee privacy, fairness, or effective collaboration. Data-sharing ecosystems need institutional scaffolding — entities that can operationalize trust, manage relationships, and abstract away complexity for participants.

This is where Data Collaboratives (or Data Collabs for short) come in.

A Data Collab isn’t a regulator or a government body. Rather, it is a facilitator organization — a neutral yet entrepreneurial entity that enables, orchestrates, and sustains data collaborations using the DEPA Framework behind the scenes, following the standards and processes set by trusted bodies like a Self-Regulatory Organization (SRO) and a Technology Standards Organization (TSO).

You can think of a Data Collab as the connective tissue of a data ecosystem — linking data providers, data consumers, and service providers.

In practice, a Data Collab:

  1. Provides tools and interfaces for participants to register, onboard, sign electronic contracts, and set up secure collaboration environments such as CCRs.
  2. Signs agreements with data providers to clean, prepare, and catalogue datasets so that they can be safely shared with authorized data consumers.
  3. Manages the flow of value — usually collecting payments from data consumers and distributing them fairly to data providers, while covering operational costs.
  4. Assumes accountability for ensuring that all interactions, permissions, and computations are compliant with the DEPA rules and contractual terms.
  5. Adds value beyond infrastructure — offering domain expertise, workflow design, governance and audit support — streamlining data collaborations.

Data Collabs will likely take different forms depending on the domain they serve. For example, some might focus on oncology research, others on financial fraud detection or climate-risk modeling. Each field has its own kinds of data, privacy rules, and ways of working — so it is natural for Data Collabs to specialize.

Because running these collaborations requires significant operational and technical effort, most Data Collabs will probably be for-profit enterprises. At the same time, because they operate on open, interoperable digital public infrastructure like DEPA, they are not monopolistic platforms. Instead, they enable a competitive marketplace where multiple Data Collabs can coexist, offering participants better choices, fairer pricing, and higher-quality services.

In this way, Data Collabs create a persistent institutional layer for responsible data use; enabling long-term, multi-party cooperation that would be impractical to coordinate through ad hoc agreements.

A real-world example: Accelerating Drug Discovery

Imagine three pharmaceutical companies, each developing treatments for the same rare disease. Each has conducted clinical trials with a few hundred patients — but individually, none has enough data in quantity, diversity, or parameter richness to train a robust predictive model of treatment response. 

Much like pieces of a puzzle, valuable insights often emerge only when data from different sources fit together — yet no single party should hold or see the entire picture.

If these companies could combine their datasets, and enrich them with other sources like gene expression profiles, cell imaging results, or public molecular databases, they could uncover deeper patterns and dramatically speed up drug discovery.

But three major barriers stand in their way:

  1. Competitive concerns: Each company treats its clinical data as proprietary and doesn’t want to reveal it to others.
  2. Privacy regulations: Patients gave consent only to the company that ran their trial — not to share data across firms.
  3. Practical limits: Many patients can’t be re-contacted to renew consent, making manual legal processes infeasible.

This is where the DEPA Framework fits in. Here’s how it would work:

A Data Collab is formed for long-term drug discovery collaborations. It signs electronic contracts with each company, defining rights, responsibilities, and permitted use of data. It handles registration, onboarding, and compliance checks through standardized interfaces.

Electronic contracts set out the exact terms of collaboration — specifying each party’s role, the artefacts they contribute, and the rules that govern privacy, usage, and value-sharing.

Each company uploads its encrypted trial data or model into a Confidential Clean Room. Data inside the CCR is decrypted only after checks confirm that all security and compliance conditions are met.

Data is programmatically joined and enriched within the CCR, followed by AI model training using privacy-enhancing techniques such as differential privacy, which bounds the risk of re-identifying individual patients.
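
For readers who want to know what that bound means precisely, this is the standard (ε, δ) definition of differential privacy (a general definition, not something DEPA-specific): a training mechanism M is (ε, δ)-differentially private if, for any two datasets D and D′ differing in a single patient’s record and for every set of possible outputs S,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta$$

The smaller the privacy budget ε (and the slack δ), the less any one patient’s participation can change what the trained model, or anything derived from it, reveals.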

Only the final trained model and its accompanying logs — never the underlying data — leave the CCR. The model can be decrypted solely by the authorized data consumer(s) (i.e. the modellers), protecting their trade secrets.

Auditors can review logs and trace the provenance of all artefacts at any time — via the DEPA AI Chain — to verify compliance and resolve disputes.

This framework delivers several benefits for all concerned stakeholders:

  • For society: Promising treatments reach patients faster, while a reusable governance and technology blueprint emerges for future biomedical collaborations. 
  • For the economy: A new data-driven economy is unlocked, enabling novel business interactions and boosting meaningful economic activity.
  • For companies: They can innovate together without exposing trade secrets or breaking regulatory rules, expanding what’s possible in research and development.
  • For regulators and auditors: Every transaction leaves a verifiable trail, simplifying oversight and boosting trust in the ecosystem.

Summing up

India’s journey toward responsible data use has been progressive and layered.

  • It began with the Account Aggregator framework — making consent Open, Revocable, Granular, Auditable, Notifying and Secure (ORGANS principle).
  • For model training and analytics, Privacy-Enhancing Technologies (PETs) — such as Differential Privacy — introduce mechanisms like the privacy budget to safeguard individuals while enabling learning.
  • To make collaboration faster and more reliable, Electronic Contracts replace traditional paper/PDF agreements with machine-readable, enforceable commitments — cutting through the friction of slow legal processes.
  • Confidential Clean Rooms (CCRs) operationalize these safeguards — enabling computation on sensitive data.
  • Finally, Data Collaboratives weave all these elements together — creating institutional and economic frameworks that make responsible, long-term data collaboration practical and sustainable.

This is the next frontier of Digital Public Infrastructure for AI — proving that protection and innovation are not opposites. With the right frameworks, we can have both.

Read Part 1: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/