DEPA-Training: Tech Updates

We’ve rolled out some exciting updates for DEPA‑Training, making it easier to rapidly prototype and run diverse training scenarios — complete with electronic contracts, confidential clean rooms, privacy preservation, and configurable training SDKs.


✨ What’s new

👉 GUI for end-to-end execution

👉 Step-by-step guide to create and run your own training scenarios

👉 New scenarios introduced for complex multi-party training: MRI brain tumor segmentation, credit default risk prediction


Before we dive in, let’s quickly recall what the Data Empowerment and Protection Architecture (DEPA) really is.

What is DEPA and why does it matter?

India Stack is evolving at population scale, enabling the flow of people (Aadhaar, eKYC, DigiLocker, DigiYatra, etc.), money (UPI, OCEN), and information (DEPA and Account Aggregator) through Digital Public Infrastructure (DPI). DEPA is critical in this third layer, as it enables the responsible flow of data between individuals and organisations for more complex tasks such as AI model training, AI inference and analytics.

As the name suggests, DEPA rests on two key elements. The first is protection, founded on the bedrock of privacy, consent, accountability and purpose limitation of data. The second is empowerment, democratizing data access and enabling the ecosystem to responsibly innovate with it, whether for training AI models, personalizing products and services, advancing scientific research, or a lot more.

In light of emerging data protection laws such as the DPDP, GDPR, and others, there is a need for a framework that enables the responsible use of data — unlocking its value while ensuring regulatory compliance and serving the broader public interest.

Ultimately, DEPA solves for two core challenges at the heart of data sharing — Trust and Flow — keeping the rest open and flexible for innovation.

What is DEPA‑Training?

The vision behind DEPA for Training (aka DEPA‑Training) is simple: For India to not only be a consumer of AI, but also a producer of AI, and in a responsible and democratized manner.

AI’s first big leap came from public data. That well is running dry. Our belief is that for the next wave of AI innovation — smarter AI for healthcare, personalized finance, scientific discovery and more — proprietary data will be crucial. But today, that data is fragmented, locked in silos, and difficult to use — often running into challenges around privacy, compliance, and regulatory constraints.

Enter DEPA-Training — a techno-legal Digital Public Infrastructure (DPI) designed to enable secure, agile, and scalable AI model training on sensitive data. It does so by assembling a set of frontier technological primitives:

  • Confidential Clean Rooms (CCRs): Isolated compute environments that can cryptographically attest to their integrity, where data can be processed securely without external exposure.
  • Electronic Contracts: Code-enforced legal agreements between transacting parties that give data providers control over how their data is used, e.g. through purpose limitation, privacy safeguards and monetization.
  • Secure Training Sandbox: Modular and configurable sandboxes and SDKs for building privacy-preserving and compliant training pipelines across diverse model architectures and data types.
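To make the first two primitives concrete, here is a minimal sketch of what a machine-readable electronic contract and its admission check might look like. This is purely illustrative: the field names, values, and the `admit_job` helper are hypothetical, not the actual DEPA contract schema or SDK.

```python
# Illustrative sketch only — all field names and admit_job() are hypothetical,
# not the DEPA schema. A contract encodes purpose limitation, a privacy
# safeguard, and which artefacts may leave the clean room.

contract = {
    "data_provider": "hospital-a",
    "data_consumer": "research-lab-b",
    "purpose": "tumor-segmentation-training",            # purpose limitation
    "privacy": {"mechanism": "differential-privacy",
                "epsilon_budget": 3.0},                  # privacy safeguard
    "allowed_outputs": ["model.onnx", "training.log"],   # raw data never leaves
}

def admit_job(contract, job):
    """Admit a training job into the clean room only if it satisfies the contract."""
    if job["purpose"] != contract["purpose"]:
        return False
    if job["epsilon"] > contract["privacy"]["epsilon_budget"]:
        return False
    return all(out in contract["allowed_outputs"] for out in job["outputs"])

print(admit_job(contract, {"purpose": "tumor-segmentation-training",
                           "epsilon": 1.0, "outputs": ["model.onnx"]}))
```

In the real system such checks are code-enforced inside the attested clean room, so a job that violates the contract never touches the data.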

What’s new in DEPA-Training?

Graphical user interface

We’ve introduced an interactive GUI that enables users to explore, configure, and execute DEPA-Training scenarios end to end. The application automatically discovers available scenarios in the repository and provides an intuitive interface to run them — eliminating the need for command-line interaction. A similar GUI workflow is also provided for contract signing.

Scenarios you can try out today

To bring DEPA-Training to life, we showcase a diverse set of scenarios that demonstrate what’s possible in practice. These examples illustrate pathways toward solving larger global challenges and span multiple data modalities (e.g., tabular, images), model paradigms (e.g., classical ML, MLPs, CNNs), and prediction tasks (e.g., regression, classification, image segmentation).

Disease Surveillance Modeling

Pandemics don’t wait. Timely, accurate data can save millions of lives. Yet most infection data is scattered, siloed, and too sensitive to share. With differential privacy, institutions can securely pool data to track virus spread, map risk patterns, and test interventions — powering real-time, data-driven epidemic response.

Example: COVID-19 scenario
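The actual scenarios use libraries like Opacus for differentially private training; as a toy illustration of the underlying idea, here is the classic Laplace mechanism applied to an aggregate infection count. This is a pure-Python sketch of the concept, not the DEPA-Training SDK.

```python
import math
import random

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    One person changes a count by at most `sensitivity`, so Laplace noise with
    scale sensitivity/epsilon makes the released value statistically similar
    whether or not any single individual is in the data.
    """
    scale = sensitivity / epsilon
    u = random.random() - 0.5                                # inverse-CDF sampling
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(42)
# A hospital releases a noisy infection count instead of the exact one,
# so pooled statistics never reveal whether one specific patient is included.
print(dp_count(1342, epsilon=0.5))
```

A smaller epsilon means more noise and stronger privacy; the scenario's contract caps the total epsilon spent, which is exactly the "privacy budget" idea.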

Medical Image Modeling

From cancer to cardiovascular disease, from neurology to rare disorders — modern medicine increasingly depends on imaging. Yet medical images are among the hardest datasets to share, trapped in hospital silos and governed by strict privacy laws. DEPA makes it possible to combine imaging data across borders and institutions, unlocking AI models that are more accurate, generalizable, and equitable. This accelerates breakthroughs in diagnostics, improves treatment planning, and addresses one of healthcare’s biggest global challenges: scaling precision medicine while safeguarding patient trust.

Example: BraTS scenario 

Financial Credit Risk Modeling

Access to fair credit fuels economic growth, but risk assessment is often limited by partial data. By safely combining insights across financial institutions, DEPA enables more accurate credit scoring, reduces defaults, and strengthens financial stability — empowering individuals and businesses alike with better access to capital.

Example: Credit Risk scenario

Build your own Scenarios

A new step-by-step guide walks you through building and running your own DEPA-Training scenarios — making it easy to rapidly prototype and iterate with training use-cases of your own.

Currently, DEPA-Training supports the following training frameworks, libraries and file formats (more will be included soon):

  • Frameworks: PyTorch, Scikit‑Learn, XGBoost (LLM Finetuning to be added soon!)
  • Libraries: Opacus, PySpark, Pandas (HuggingFace support coming soon!)
  • Formats: ONNX, Safetensors, Parquet, CSV, HDF5, PNG (No pickle-based formats for security reasons)
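The no-pickle rule exists because unpickling a file can execute arbitrary code, which is unacceptable inside a clean room. A format gate at artefact ingestion could enforce it with a simple allowlist; the helper below is hypothetical and only mirrors the list above, not the actual SDK.

```python
import pathlib

# Hypothetical ingestion gate mirroring the supported-format list above;
# the helper name and policy are illustrative, not the DEPA-Training SDK.
ALLOWED = {".onnx", ".safetensors", ".parquet", ".csv", ".hdf5", ".png"}
BLOCKED = {".pkl", ".pickle", ".joblib"}  # unpickling can run arbitrary code

def check_artefact(path):
    """Accept only allow-listed formats; reject pickle-based ones outright."""
    ext = pathlib.Path(path).suffix.lower()
    if ext in BLOCKED:
        raise ValueError(f"pickle-based format rejected: {path}")
    return ext in ALLOWED

print(check_artefact("weights.safetensors"))  # True
```

This is why model weights travel as ONNX or Safetensors rather than framework-native pickled checkpoints.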

What’s in it for the ecosystem?

DEPA-Training democratizes responsible data sharing and model training for all!

  • Enterprises & Startups → Unlock the value of private data to build smarter products and services, while remaining compliant to data laws. Collaborate across organizations to create solutions that no single dataset could power.
  • Research Institutions → Pool data at scale to tackle grand challenges, drive scientific discovery, and advance knowledge for the public good.
  • Policy & Legal Experts → Shape the future of data governance by operationalizing privacy, consent, purpose limitation, and accountability in practice.
  • Builders & researchers → Join us in co-creating this framework!

Get started

👉 Get your hands dirty: DEPA‑Training on GitHub 🛠️

👉 Explore the documentation: DEPA.World 📜
👉 Watch the Open Houses: YouTube Playlist 🎬

👉 Think big: What challenges has data privacy kept off-limits? What data has felt forever inaccessible? With DEPA-Training, those doors may finally open. 💡

Interested in contributing to DEPA? Join our group of no-greed no-glory volunteers! Apply here

Please note: The blog post is authored by our volunteers, Sarang Galada, Dr. Shyam Sundaram, Kapil Vaswani and Pavan Kumar Adukuri

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

This is a two-part blog series. The following is the second part.

In Part 1, we traced how data collaborations are being reimagined and laid out the conceptual foundations, from redefining consent through the Account Aggregator framework to recognizing the limits of consent. We explored how privacy-preserving frameworks like differential privacy protect individuals even when models are built from their data; how electronic contracts replace slow, manual agreements with enforceable digital rules; and how confidential clean rooms combine secure hardware and privacy guarantees to enable computation without revealing raw data.

In Part 2, we explore how these building blocks come together in practice.

The Connective Tissue: Data Collabs

Technology alone cannot guarantee privacy, fairness, or effective collaboration. Data-sharing ecosystems need institutional scaffolding — entities that can operationalize trust, manage relationships, and abstract away complexity for participants.

This is where Data Collaboratives (or Data Collabs for short) come in.

A Data Collab isn’t a regulator or a government body. Rather, it is a facilitator organization — a neutral yet entrepreneurial entity that enables, orchestrates, and sustains data collaborations using the DEPA Framework behind the scenes, following the standards and processes set by trusted bodies like a Self-Regulatory Organization (SRO) and a Technology Standards Organization (TSO).

You can think of a Data Collab as the connective tissue of a data ecosystem — linking data providers, data consumers, and service providers.

In practice, a Data Collab:

  1. Provides tools and interfaces for participants to register, onboard, sign electronic contracts, and set up secure collaboration environments such as CCRs.
  2. Signs agreements with data providers to clean, prepare, and catalogue datasets so that they can be safely shared with authorized data consumers.
  3. Manages the flow of value — usually collecting payments from data consumers and distributing them fairly to data providers, while covering operational costs.
  4. Assumes accountability for ensuring that all interactions, permissions, and computations are compliant with the DEPA rules and contractual terms.
  5. Adds value beyond infrastructure — offering domain expertise, workflow design, governance and audit support — streamlining data collaborations.

Data Collabs will likely take different forms depending on the domain they serve. For example, some might focus on oncology research, others on financial fraud detection or climate-risk modeling. Each field has its own kinds of data, privacy rules, and ways of working — so it is natural for Data Collabs to specialize.

Because running these collaborations requires significant operational and technical effort, most Data Collabs will probably be for-profit enterprises. At the same time, because they operate on open, interoperable digital public infrastructure like DEPA, they are not monopolistic platforms. Instead, they enable a competitive marketplace where multiple Data Collabs can coexist, offering participants better choices, fairer pricing, and higher-quality services.

In this way, Data Collabs create a persistent institutional layer for responsible data use, enabling long-term, multi-party cooperation that would be impractical to coordinate through ad hoc agreements.

A real-world example: Accelerating Drug Discovery

Imagine three pharmaceutical companies, each developing treatments for the same rare disease. Each has conducted clinical trials with a few hundred patients — but individually, none has enough data in quantity, diversity, or parameter richness to train a robust predictive model of treatment response. 

Much like pieces of a puzzle, valuable insights often emerge only when data from different sources fit together — yet no single party should hold or see the entire picture.

If these companies could combine their datasets, and enrich them with other sources like gene expression profiles, cell imaging results, or public molecular databases, they could uncover deeper patterns and dramatically speed up drug discovery.

But three major barriers stand in their way:

  1. Competitive concerns: Each company treats its clinical data as proprietary and doesn’t want to reveal it to others.
  2. Privacy regulations: Patients gave consent only to the company that ran their trial — not to share data across firms.
  3. Practical limits: Many patients can’t be re-contacted to renew consent, making manual legal processes infeasible.

This is where the DEPA Framework fits in. Here’s how it would work:

A Data Collab is formed for long-term drug discovery collaborations. It signs electronic contracts with each company, defining rights, responsibilities, and permitted use of data. It handles registration, onboarding, and compliance checks through standardized interfaces.

Electronic contracts set out the exact terms of collaboration — specifying each party’s role, the artefacts they contribute, and the rules that govern privacy, usage, and value-sharing.

Each company uploads its encrypted trial data or model into a Confidential Clean Room. Data inside the CCR is decrypted only after checks confirm that all security and compliance conditions are met.

Data is programmatically joined and enriched within the CCR, followed by AI model training using privacy-enhancing techniques like differential privacy, which appropriately bound the chance of re-identifying patients.

Only the final trained model and its accompanying logs — never the underlying data — leave the CCR. The model can be decrypted solely by the authorized data consumer(s) (i.e. the modellers), protecting their trade secrets.

Auditors can review logs and trace the provenance of all artefacts at any time — via the DEPA AI Chain — to verify compliance and resolve disputes.
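The steps above can be sketched end to end in a few lines. All class and method names below are hypothetical, chosen only to make the flow concrete; this is not the actual DEPA-Training API.

```python
import hashlib

# Hypothetical sketch of the drug-discovery flow: contracted ingestion,
# in-CCR training, and an audit log that leaves with the model.

class CleanRoomJob:
    def __init__(self, contract):
        self.contract = contract
        self.audit_log = []          # reviewable later, e.g. via the DEPA AI Chain

    def ingest(self, party, encrypted_blob):
        # Data is admitted (and later decrypted) only for contracted parties;
        # the log records a digest of the blob, never the data itself.
        if party not in self.contract["parties"]:
            raise PermissionError(f"{party} is not a contracted party")
        digest = hashlib.sha256(encrypted_blob).hexdigest()[:8]
        self.audit_log.append(f"ingest:{party}:{digest}")

    def train(self):
        # Join, enrich and train with differential privacy inside the CCR;
        # only the model artefact and the audit log leave the clean room.
        self.audit_log.append("train:differential-privacy")
        return {"model": "model.onnx", "audit_log": self.audit_log}

job = CleanRoomJob({"parties": ["pharma-a", "pharma-b", "pharma-c"]})
for party in ["pharma-a", "pharma-b", "pharma-c"]:
    job.ingest(party, b"encrypted-trial-data")
result = job.train()
print(result["model"], len(result["audit_log"]))  # model.onnx 4
```

The key design point is that the raw trial data never appears in any output: parties exchange only encrypted inputs going in and a trained model plus logs coming out.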

This framework delivers several benefits for all concerned stakeholders:

  • For society: Promising treatments reach patients faster, while a reusable governance and technology blueprint emerges for future biomedical collaborations. 
  • For the economy: A new data-driven economy is unlocked, enabling novel business interactions and boosting meaningful economic activity.
  • For companies: They can innovate together without exposing trade secrets or breaking regulatory rules, expanding what’s possible in research and development.
  • For regulators and auditors: Every transaction leaves a verifiable trail, simplifying oversight and boosting trust in the ecosystem.

Summing up

India’s journey toward responsible data use has been progressive and layered.

  • It began with the Account Aggregator framework — making consent Open, Revocable, Granular, Auditable, Notifying and Secure (ORGANS principle).
  • For model training and analytics, Privacy-Enhancing Technologies (PETs) — such as Differential Privacy — introduce mechanisms like the privacy budget to safeguard individuals while enabling learning.
  • To make collaboration faster and more reliable, Electronic Contracts replace traditional paper/PDF agreements with machine-readable, enforceable commitments — cutting through the friction of slow legal processes.
  • Confidential Clean Rooms (CCRs) operationalize these safeguards — enabling computation on sensitive data.
  • Finally, Data Collaboratives weave all these elements together — creating institutional and economic frameworks that make responsible, long-term data collaboration practical and sustainable.

This is the next frontier of Digital Public Infrastructure for AI — proving that protection and innovation are not opposites. With the right frameworks, we can have both.

Read Part 1: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/

How To Empower 1.3 Billion Citizens With Their Data

2018 has been a significant year in our relationship with Data. Globally, the Cambridge Analytica incident made people realise that democracy itself can be vulnerable to data. Closer to home, we got a first glimpse at the draft bill for Privacy by the Justice Sri Krishna Committee.

The writing on the wall is obvious. We cannot continue the way we have. This is a problem at every level – individuals need to be more careful with whom they share their data, and data controllers need to show more transparency and responsibility in handling user data. But we cannot expect an organic shift to a more responsible, transparent, privacy-protecting regime without the intervention of the state. The draft bill, if it becomes law, will be a great win, as it finally prescribes meaningful penalties for transgressions by controllers.

But we must not forget that the flip side of the coin is that data can also help empower people. India has much more socio-economic diversity than other countries where a data protection law has been enacted. Our concerns go beyond limiting the exploitation of user data by data controllers. We must look at data as an opportunity and ask how we can help users generate wealth out of their own data. Thus we propose that we design an India-specific Data Protection & Empowerment Architecture (DEPA). Empowerment and protection are neither opposite nor orthogonal but co-dependent activities. We must think of them together, else we will miss the forest for the trees.

In my talk linked below which took place at IDFC Dialogues Goa, I expand more on these ideas. I also talk about the exciting new technology tools that actually help us realise a future where Data can empower.

I hope you take away something of value from the talk. The larger message though, is that it is still early days for the internet. We can participate in shaping its culture, maybe even lead the way, instead of being passive observers. The Indian approach is finding deep resonance globally, and many countries, developing as well as developed, are looking to us for inspiration on how to deal with their own data problem. But it is going to take a lot more collaboration and co-creation before we get there. I hope you will join us on this mission to create a Data Democracy.

Understanding iSPIRT’s Entrepreneur Connect

There is confusion about how iSPIRT engages with entrepreneurs. This post explains our engagement model so that expectations are clear. iSPIRT’s mission is to make India into a Product Nation. iSPIRT believes that startups are a critical catalyst in this mission. In line with the mission, we help entrepreneurs navigate market and mindset shifts so that some of them can become trailblazers and category leaders.

Market Shifts

Some years back global mid-market business applications, delivered as SaaS, had to deal with the ubiquity of mobile. This shift upended the SaaS industry. Now, another such market shift is underway in global SaaS – with AI/ML being one factor in this evolution.

Similar shifts are happening in the India market too. UPI is shaking up the old payments market. JIO’s cheap bandwidth is shifting the digital entertainment landscape. And, India Stack is opening up Bharat (India-2) to digital financial products.

At iSPIRT, we try to help market players navigate these shifts through Bootcamps, Teardowns, Roundtables, and Cohorts (BTRC).

We know that reading market shifts isn’t easy. Like stock market bubbles, market shifts are fully clear only in hindsight. In the middle, there is an open question whether this is a valid market shift or not (similar to whether the stock market is in a bubble or not). There are strong opinions on both sides till the singularity moment happens. The singularity moment is usually someone going bust by failing to see the shift (e.g. Chillr going bust due to UPI) or becoming a trailblazer by leveraging the shift (e.g. PhonePe’s meteoric rise).

Startups are made or unmade on their bets on market shifts. Bill Gates’ epiphany that the browser was a big market shift saved Microsoft. Netflix is what it is today on account of its proactive shift from ground to cloud. Closer home, Zoho has constantly reinvented itself.

Founders have a responsibility to catch the shifts. At iSPIRT, we have a strong opinion on some market shifts and work with the founders who embrace these shifts.

Creating Trailblazers through Winning Implementations

We are now tying our BTRC work to specific market-shifts and mindset-shifts. We will only work with startups that have conviction about these market/mindset-shifts (i.e., they are not on the fence), are hungry (willing to exploit the shift to get ahead), and can apply what they have learned from iSPIRT Mavens to make better products.

Another change is that we will now work with startups young or old, big or small. In the past, we worked only with startups in the “happy-confused” stage.

We are making these changes to improve outcomes. Over the last four years, our BTRC engagements have generated very high NPS (Net Promoter Scores), but many of our startups continue to struggle with their growth ceilings, be it an ARR threshold of $1M, $5M, $10M… or a scalable yet repeatable product-market fit.

What hasn’t changed is our bias for working with a few startups instead of many. Right from the beginning, iSPIRT’s Playbooks Pillar has been about making a deep impact on a few startups rather than a shallow impact on many. For instance, our first PNGrowth had 186 startups. They had been selected from 600+ that applied. In the end, we concluded that we needed even better curation. So, our PNGrowth#2 had only 50 startups.

The other thing that hasn’t changed is that we remain blind to whether a startup is VC-funded or bootstrapped. All we are looking for are startups that have conviction about the market/mindset-shift, the hunger to make a difference, and the inner capacity to apply what they learn. We want them to be trailblazers in the ecosystem.

Supported Market/Mindset Shifts

Presently we support 10 market/mindset-shifts. These are:

  1. AI/ML Shift in SaaS – Adapt AI into your SaaS products and business models to create meaningful differentiation and compete on a global level playing field.

  2. Shift to Platform Products – Develop and leverage internal platforms to power a product bouquet. Building enterprise-grade products on a common base at fractional cost allows for a defensible strategy against market shifts or expanding market segments.

  3. Engaging Potential Strategic Partners (PSP) – PSPs are critical for scale and pitching to them is very different from pitching to customers and investors. Additionally, PSPs also offer an opportunity to co-create a growth path to future products & investments.

  4. Flow-based lending – Going after the untapped “largest lending opportunity in the world”.

  5. Bill payments – What credit and corporate cards were to the West, bill payments will be to India due to the Bharat Bill Pay System (BBPS).

  6. UPI 2.0 – Mass-market payments and new-age collections.

  7. Mutual Fund democratization – Build products and platforms that bring informal savings into the formal sector.

  8. From License Raj to Permissions Artefact for Drones – Platform approach to provisioning airspace from the government.

  9. Microinsurance for Bharat – Build products and platforms that reimagine Agri insurance on the back of India Stack and upcoming Digital Sky drone policy.

  10. Data Empowerment and Protection Architecture (DEPA) – with usage in financial, healthcare and telecom sectors.

This is a fluid list. There will be additions and deletions over time.

Keep in mind that we are trying to replicate for all these market/mindset-shifts what we managed to do for Desk Marketing and Selling (DMS). We focussed on DMS in early 2014 thanks to Mavens like Suresh Sambandam (KissFlow), Girish Mathrubootham (Freshworks), and Krish Subramaniam (Chargebee). Now DMS has gone mainstream and many sources of help are available to the founders.

Seeking Wave#2 Partners

The DMS success has been important for iSPIRT. It has given us the confidence that our BTRC work can meaningfully help startups navigate market/mindset-shifts. We have also learned that a market/mindset-shift happens in two waves. Wave#1 touches a few early adopters. If one or more of them create winning implementations and become trailblazers, the rest of the ecosystem jumps in. This is Wave#2. The majority of our startups embrace the market-shift in Wave#2.

iSPIRT’s model is geared to help only Wave#1 players. We falter when it comes to supporting Wave#2 folks. Our volunteer model works best with cutting-edge stuff and small cohorts.

Accelerators and commercial players are better positioned to serve the hundreds of startups embracing the market/mindset-shift in Wave#2. Together, Wave#1 and Wave#2 can produce great outcomes like the thriving AI ecosystem in Toronto.

To ensure that Wave#2 goes well, we have decided to include potential Wave#2 helpers (e.g., accelerators, VCs, boutique advisory firms and other ecosystem builders) in our Wave#1 work (free of charge, needless to say). Some of these BTRC Scale Partners have already been identified. If you see yourself as a Wave#2 helper who would like to get involved in our Wave#1 work, please reach out to us.

Best Adopters

As many of you know, iSPIRT isn’t an accelerator (like TLabs), a community (like Headstart), a coworking space (like THub) or a trade body. We are a think-and-do-tank that builds playbooks, societal platforms, policies, and markets. Market players like startups use these public goods to offer the best solutions to the market.

If we are missing out on helping you, please let us know by filling out this form. You can also reach out to one of our volunteers here:

Chintan Mehta: AI shift in SaaS, Shift to Platform Products, Engaging PSPs

Praveen Hari: Flow-based lending

Jaishankar AL: Bill payments

Tanuj Bhojwani: Permissions Artefact for Drones

Nikhil Kumar: UPI2.0, MF democratization, Microinsurance for Bharat

Siddharth Shetty: Data Empowerment and Protection Architecture (DEPA)

Meghana Reddyreddy: Wave#2 Partners

We are always looking for high-quality volunteers. In case you’re interested in volunteering, please reach out to one of the existing volunteers or write to us at [email protected]