DEPA-Training: Tech Updates

We’ve rolled out some exciting updates for DEPA‑Training, making it easier to rapidly prototype and run diverse training scenarios — complete with electronic contracts, confidential cleanrooms, privacy-preservation and configurable training SDKs.


✨ What’s new

👉 GUI for end-to-end execution

👉 Step-by-step guide to create and run your own training scenarios

👉 New scenarios introduced for complex multi-party training: MRI brain tumor segmentation, credit default risk prediction


Before we dive in, let’s quickly recall what the Data Empowerment and Protection Architecture (DEPA) really is.

What is DEPA and why does it matter?

India Stack is evolving at population scale, enabling the flow of people (Aadhaar, eKYC, DigiLocker, DigiYatra, etc), money (UPI, OCEN), and information (DEPA and Account Aggregator) through Digital Public Infrastructure (DPI). DEPA is critical in this third layer as it enables the responsible flow of data between individuals and organisations for more complex tasks such as AI model training, AI inference and analytics. 

As the name suggests, DEPA rests on two key elements. The first is protection, founded on the bedrock of privacy, consent, accountability and purpose limitation of data. The second is empowerment, democratizing data access and enabling the ecosystem to responsibly innovate with it, whether for training AI models, personalizing products and services, advancing scientific research, and a lot more.

In light of emerging data protection laws such as the DPDP, GDPR, and others, there is a need for a framework that enables the responsible use of data — unlocking its value while ensuring regulatory compliance and serving the broader public interest.

Ultimately, DEPA solves for two core challenges at the heart of data sharing — Trust and Flow — keeping the rest open and flexible for innovation.

What is DEPA‑Training?

The vision behind DEPA for Training (aka DEPA‑Training) is simple: For India to not only be a consumer of AI, but also a producer of AI, and in a responsible and democratized manner.

AI’s first big leap came from public data. That well is running dry. Our belief is that for the next wave of AI innovation — smarter AI for healthcare, personalized finance, scientific discovery and more — proprietary data will be crucial. But today, that data is fragmented, locked in silos, and difficult to use — often running into challenges around privacy, compliance, and regulatory constraints.

Enter DEPA-Training — a techno-legal Digital Public Infrastructure (DPI) designed to enable secure, agile, and scalable AI model training on sensitive data. It does so by assembling a set of frontier technological primitives:

  • Confidential Clean Rooms (CCRs): Isolated compute environments that can cryptographically attest to their integrity, where data can be processed securely without external exposure.
  • Electronic Contracts: Code-enforced legal agreements between transacting parties, that give data providers control over how their data is used, for eg. through purpose limitation, privacy safeguards and monetization.
  • Secure Training Sandbox: Modular and configurable sandboxes and SDKs for building privacy-preserving and compliant training pipelines across diverse model architectures and data types.

What’s new in DEPA-Training?

Graphical user interface

We’ve introduced an interactive GUI that enables users to explore, configure, and execute DEPA-Training scenarios end to end. The application automatically discovers available scenarios in the repository and provides an intuitive interface to run them — eliminating the need for command-line interaction. A similar GUI workflow is also provided for contract signing.

Scenarios you can try out today

To bring DEPA-Training to life, we showcase a diverse set of scenarios that demonstrate what’s possible in practice. These examples illustrate pathways toward solving larger global challenges and span multiple data modalities (e.g., tabular, images), model paradigms (e.g., classical ML, MLPs, CNNs), and prediction tasks (e.g., regression, classification, image segmentation).

Disease Surveillance Modeling

Pandemics don’t wait. Timely, accurate data can save millions of lives. Yet most infection data is scattered, siloed, and too sensitive to share. With differential privacy, institutions can securely pool data to track virus spread, map risk patterns, and test interventions — powering real-time, data-driven epidemic response.

Example: COVID-19 scenario

Medical Image Modeling

From cancer to cardiovascular disease, from neurology to rare disorders — modern medicine increasingly depends on imaging. Yet medical images are among the hardest datasets to share, trapped in hospital silos and governed by strict privacy laws. DEPA makes it possible to combine imaging data across borders and institutions, unlocking AI models that are more accurate, generalizable, and equitable. This accelerates breakthroughs in diagnostics, improves treatment planning, and addresses one of healthcare’s biggest global challenges: scaling precision medicine while safeguarding patient trust.

Example: BraTS scenario 

Financial Credit Risk Modeling

Access to fair credit fuels economic growth, but risk assessment is often limited by partial data. By safely combining insights across financial institutions, DEPA enables more accurate credit scoring, reduces defaults, and strengthens financial stability — empowering individuals and businesses alike with better access to capital.

Example: Credit Risk scenario

Build your own Scenarios

A new step-by-step guide walks you through building and running your own DEPA-Training scenarios — making it easy to rapidly prototype and iterate with training use-cases of your own.

Currently, DEPA-Training supports the following training frameworks, libraries and file formats (more will be included soon):

  • Frameworks: PyTorch, Scikit‑Learn, XGBoost (LLM Finetuning to be added soon!)
  • Libraries: Opacus, PySpark, Pandas (HuggingFace support coming soon!)
  • Formats: ONNX, Safetensors, Parquet, CSV, HDF5, PNG (No pickle-based formats for security reasons)

What’s in it for the ecosystem?

DEPA-Training democratizes responsible data sharing and model training for all!

  • Enterprises & Startups → Unlock the value of private data to build smarter products and services, while remaining compliant to data laws. Collaborate across organizations to create solutions that no single dataset could power.
  • Research Institutions → Pool data at scale to tackle grand challenges, drive scientific discovery, and advance knowledge for the public good.
  • Policy & Legal Experts → Shape the future of data governance by operationalizing privacy, consent, purpose limitation, and accountability in practice.
  • Builders & researchers → Join us in co-creating this framework!

Get started

👉 Get your hands dirty: DEPA‑Training on GitHub 🛠️

👉 Explore the documentation: DEPA.World 📜
👉 Watch the Open Houses: YouTube Playlist 🎬

👉 Think big: What challenges has data privacy kept off-limits? What data has felt forever inaccessible? With DEPA-Training, those doors may finally open. 💡

Interested in contributing to DEPA? Join our group of no-greed no-glory volunteers! Apply here

Please note: The blog post is authored by our volunteers, Sarang GaladaDr. Shyam Sundaram, Kapil Vaswani and Pavan kumar Adukuri

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

This is a two part blog series. The following is the second part.

In Part 1, we traced how data collaborations are being reimagined, and laid out the conceptual foundations. From redefining consent through the Account Aggregator framework, to recognizing the limits of consent. We explored how privacy-preserving frameworks like differential privacy protect individuals even when models are built from data; how electronic contracts replace slow, manual agreements with enforceable digital rules; and how confidential clean rooms combine secure hardware and privacy guarantees to enable computation without revealing raw data.

In Part 2, we explore how these building blocks come together in practice.

The Connective Tissue: Data Collabs

Technology alone cannot guarantee privacy, fairness, or effective collaboration. Data-sharing ecosystems need institutional scaffolding — entities that can operationalize trust, manage relationships, and abstract away complexity for participants.

This is where Data Collaboratives (or Data Collabs for short) come in.

A Data Collab isn’t a regulator or a government body. Rather, it is a facilitator organization — a neutral yet entrepreneurial entity that enables, orchestrates, and sustains data collaborations using the DEPA Framework behind the scenes, following its standards and processes set by trusted bodies like an Self-Regulatory Organization (SRO) and a Technology Standards Organization (TSO).

You can think of a Data Collab as the connective tissue of a data ecosystem — linking data providers, data consumers, and service providers.

In practice, a Data Collab:

  1. Provides tools and interfaces for participants to register, onboard, sign electronic contracts, and set up secure collaboration environments such as CCRs.
  2. Signs agreements with data providers to clean, prepare, and catalogue datasets so that they can be safely shared with authorized data consumers.
  3. Manages the flow of value — usually collecting payments from data consumers and distributing them fairly to data providers, while covering operational costs.
  4. Assumes accountability for ensuring that all interactions, permissions, and computations are compliant with the DEPA rules and contractual terms.
  5. Adds value beyond infrastructure — offering domain expertise, workflow design, governance and audit support — streamlining data collaborations.

Data Collabs will likely take different forms depending on the domain they serve. For example, some might focus on oncology research, others on financial fraud detection or climate-risk modeling. Each field has its own kinds of data, privacy rules, and ways of working — so it is natural for Data Collabs to specialize.

Because running these collaborations requires significant operational and technical effort, most Data Collabs will probably be for-profit enterprises. At the same time, because they operate on open, interoperable digital public infrastructure like DEPA, they are not monopolistic platforms. Instead, they enable a competitive marketplace where multiple Data Collabs can coexist, offering participants better choices, fairer pricing, and higher-quality services.

In this way, Data Collabs create a persistent institutional layer for responsible data use; enabling long-term, multi-party cooperation that would be impractical to coordinate through ad hoc agreements.

A real-world example: Accelerating Drug Discovery

Imagine three pharmaceutical companies, each developing treatments for the same rare disease. Each has conducted clinical trials with a few hundred patients — but individually, none has enough data in quantity, diversity, or parameter richness to train a robust predictive model of treatment response. 

Much like pieces of a puzzle, valuable insights often emerge only when data from different sources fit together — yet no single party should hold or see the entire picture.

If these companies could combine their datasets, and enrich them with other sources like gene expression profiles, cell imaging results, or public molecular databases, they could uncover deeper patterns and dramatically speed up drug discovery.

But three major barriers stand in their way:

  1. Competitive concerns: Each company treats its clinical data as proprietary and doesn’t want to reveal it to others.
  2. Privacy regulations: Patients gave consent only to the company that ran their trial — not to share data across firms.
  3. Practical limits: Many patients can’t be re-contacted to renew consent, making manual legal processes infeasible.

This is where the DEPA Framework fits in. Here’s how it would work:

A Data Collab is formed for long-term drug discovery collaborations. It signs electronic contracts with each company, defining rights, responsibilities, and permitted use of data. It handles registration, onboarding, and compliance checks through standardized interfaces.

Electronic contracts set out the exact terms of collaboration — specifying each party’s role, the artefacts they contribute, and the rules that govern privacy, usage, and value-sharing.

Each company uploads its encrypted trial data or model into a Confidential Clean Room. Data inside the CCR is decrypted only after checks confirm that all security and compliance conditions are met.

Data is programmatically joined and enriched within the CCR, followed by AI model training using privacy-enhancing techniques like differential privacy, which appropriately bound the chance of re-identifying patients.

Only the final trained model and its accompanying logs — never the underlying data — leave the CCR. The model can be decrypted solely by the authorized data consumer(s) (i.e. the modellers), protecting their trade secrets.

Auditors can review logs and trace the provenance of all artefacts at any time — via the DEPA AI Chain — to verify compliance and resolve disputes.

This framework delivers several benefits for all concerned stakeholders:

  • For society: Promising treatments reach patients faster, while a reusable governance and technology blueprint emerges for future biomedical collaborations. 
  • For the economy: A new data-driven economy is unlocked, enabling novel business interactions and boosting meaningful economic activity.
  • For companies: They can innovate together without exposing trade secrets or breaking regulatory rules, expanding what’s possible in research and development.
  • For regulators and auditors: Every transaction leaves a verifiable trail, simplifying oversight and boosting trust in the ecosystem.

Summing up

India’s journey toward responsible data use has been progressive and layered.

  • It began with the Account Aggregator framework — making consent Open, Revocable, Granular, Auditable, Notifying and Secure (ORGANS principle).
  • For model training and analytics, Privacy-Enhancing Technologies (PETs) — such as Differential Privacy — introduce mechanisms like the privacy budget to safeguard individuals while enabling learning.
  • To make collaboration faster and more reliable, Electronic Contracts replace traditional paper/PDF agreements with machine-readable, enforceable commitments — cutting through the friction of slow legal processes.
  • Confidential Clean Rooms (CCRs) operationalize these safeguards — enabling computation on sensitive data.
  • Finally, Data Collaboratives weave all these elements together — creating institutional and economic frameworks that make responsible, long-term data collaboration practical and sustainable.

This is the next frontier of Digital Public Infrastructure for AI — proving that protection and innovation are not opposites. With the right frameworks, we can have both.

Read Part 1: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

This is a two part blog series. The following is the first part.

Every day, we generate vast amounts of digital data — withdrawing cash, visiting doctors, ordering groceries, using various mobile apps. These data trails have the potential to streamline services, personalize experiences, and drive breakthroughs in fields from medicine to finance. Yet they also carry risks: unfair profiling, intrusive targeting, and exposure of sensitive personal information.

This presents a fundamental challenge: How can we harness the value of data while preserving individual privacy?

Understanding Privacy

In the age of AI, privacy violations no longer just expose personal information. They erode autonomy and tilt power toward those who control data and algorithms. As AI systems harvest behavioral cues, digital footprints, and social networks, people lose control, not just over their information, but also over how they are profiled and influenced. This enables subtle yet pervasive forms of coercion, from tailored manipulation of choices to algorithmic exclusion from opportunities.

At scale, such surveillance dynamics erode trust and weaken democratic agency. In this era, privacy is not merely about secrecy, it is a precondition for freedom, dignity and meaningful participation in society.

Privacy is often mistaken for confidentiality, but it’s not simply about hiding information. Privacy is the property of not being able to identify individuals from the signals they produce. Confidentiality, on the other hand, is about limiting access to those signals in the first place. To protect privacy and confidentiality while respecting individual autonomy, we need strong control mechanisms that let people decide what data is shared, with whom, for what purpose, and for how long.

And privacy isn’t a one-time setting. Data moves through a lifecycle — it is collected, used, stored, reused, and eventually deleted. These protections must hold at every stage, or they are lost.

The Mechanics of Consent

Today, consent remains the most common mechanism for privacy — the basic control primitive intended to let people decide how their data is collected, shared, and used. The concept of consent actually predates the digital era — it began in a paper-based world, where signatures and written permissions served as the primary means of authorizing data use. 

It is important to distinguish between two kinds of consent:

  1. Consent to collect data – allowing an entity to initially gather your data (for example, an app accessing your camera).
  2. Consent to share data – granting permission for that data to be used or passed on for a specific purpose (for example, a bank sharing your salary details with a loan underwriter).

Our focus in this article is on consent to share data, since that is where both the greatest privacy challenges and the most meaningful opportunities for value creation lie.

Here is the problem with how consent is currently implemented today. Under frameworks like GDPR, consent has been defined as a very coarse-grained and blunt artifact. The same entity collects your data, gathers your consent, and enforces the rules around its use. For individuals, this typically means an all-or-nothing choice — share everything or nothing at all. And for innovators, it stifles the ability to responsibly explore new uses of data.

India’s Innovation: Unbundling Consent

When India designed its Account Aggregator system for financial data sharing, it chose a different path. Consent to share data was unbundled into two parts:

  • Collect consent: Managed by trusted intermediaries called Account Aggregators.
  • Enforce consent: Managed downstream by Financial Information Users (like banks or wealth advisors), under ecosystem oversight.

https://sahamati.org.in/what-is-account-aggregator/

At the heart of this design lies a set of principles that make consent Open, Revocable, Granular, Auditable, Notifying, and Secure or ORGANS for short.

The Account Aggregator (AA) framework became the first manifestation of DEPA — the Data Empowerment and Protection Architecture. It is now India’s go-to model for user-consented data sharing between institutions, especially for straightforward data transfers and simple inference tasks.

Consent works well for inferences — one-time decisions like a bank checking your last six months of transactions to approve a loan. Yet, in practice, consent has well-known limits. People are asked to grant permission repeatedly, often through long, opaque terms they don’t fully understand, leading to consent fatigue and a loss of meaningful control.

These limitations become clearer when we move from individual decisions to model training and large-scale analytics, where algorithms learn patterns from millions of records. Seeking or managing consent at that scale is neither practical nor effective. 

What’s worse is that models can sometimes memorize sensitive data and inadvertently reveal it later. This highlights the need for new, complementary control primitives that uphold privacy and accountability even when explicit consent isn’t feasible.

Attempts at de-identification — the process of removing or masking identifiers to anonymize data – have significant limitations in practice. Although anonymization is meant to ensure that individuals cannot be re-identified, de-identification techniques are often reversible when datasets are combined with external information. As a result, such approaches offer only weak privacy guarantees, and numerous cases have shown how easily supposedly “anonymous” data can be linked back to individuals.

Privacy-preserving Algorithms: A New Control Primitive for Training and Analytics

To address these limits, a new class of algorithms has emerged under the broad umbrella of Privacy-Enhancing Technologies (PETs). Let us call these privacy-preserving algorithms, to differentiate them from other classes of PETs. They provide a spectrum of technical safeguards that preserve privacy while still enabling useful computation and collaboration on sensitive data.

Among these, Differential Privacy (DP), a mathematical framework for preserving individual privacy in datasets, stands out as a powerful privacy primitive for model training and data analysis.

The key idea: DP adds carefully calibrated noise to queries or model updates so that the results are statistically indistinguishable whether or not any single individual’s data is included. This ensures that nothing specific about an individual can be reliably inferred.

To make this guarantee rigorous, DP introduces the concept of a privacy budget (often represented by the parameters epsilon ε and delta δ):

  • Each query or training step “spends” some of this budget.
  • With more queries or training epochs, the cumulative privacy loss increases.
  • Once the budget is exhausted, no further queries or training is allowed, keeping the risk of re-identification mathematically bounded.

Think of this as a quantitative accounting system for privacy loss. Note, however, that DP comes with a utility tradeoff: adding calibrated noise can reduce model accuracy or data usefulness. Hence, depending on the use-case, the right privacy controls may be achieved through other privacy-preserving algorithms, or a combination thereof.

Electronic Contracts: Digitizing Trust

While privacy-preserving computation enables data to be used securely, participants still need clear agreements defining who may use it, for what purpose, or under what conditions. For such collaborations to function effectively, there must be a well-defined and enforceable contractual framework that specifies each party’s rights, obligations, and permissions.

The need for such a framework becomes even more pressing as organizations seek to unlock real value from data. No single dataset is enough; the most meaningful insights arise when information from multiple sources — hospitals, banks, labs, startups, or agencies — can be combined and analyzed responsibly. Yet each participant brings its own rules, contracts, and compliance obligations, creating a patchwork of agreements that are difficult to align.

Traditionally, contracts are legal documents — PDFs or paper agreements — written in human language, interpreted by lawyers, and enforced by institutions. They work well when a few parties are involved, but in modern data collaborations, this model quickly breaks down.

Today, every new collaboration means drafting, signing, and managing a maze of separate legal agreements, often in different formats, scattered across systems, and maintained by hand. With every participant added, the web of contracts grows bulkier, making coordination slow, expensive and error-prone. Every change or dispute requires human intervention and can take weeks or months to resolve.

This contractual friction has long been the viscous drag holding back scalable, compliant data collaboration. Not because trust is missing, but because it is buried under paperwork.

Electronic contracts transform this equation. They are machine-readable, digitally signed, and executable agreements that translate legal promises into enforceable code. Instead of being static documents, they are active digital objects that the DEPA orchestration layer can interpret and act upon — automatically initiating workflows, enforcing permissions, and ensuring compliance.

In effect, electronic contracts bridge law and computation.  They enable trust, automation, and accountability at digital speed, replacing manual paperwork with a system that can verify, execute, and audit commitments in real time.

Confidential Clean Rooms (CCR)

To operationalize the above elements, we need infrastructure that embeds privacy and compliance mechanisms by design, while also supporting diverse collaboration modalities — from data analytics and model training to various forms of inference.

That’s where Confidential Clean Rooms (CCRs) come in. A CCR is a secure computing environment that allows organizations to collaborate on data without ever sharing it in plain form. You can think of it as a locked, monitored laboratory where data from multiple parties can be brought together for analysis — yet no participant, not even the operator of the lab, can peek inside.

At the heart of every CCR is Confidential Computing — a technology that uses Trusted Execution Environments (TEEs) built into modern processors.  When data enters a TEE, it is encrypted and isolated from the rest of the system, ensuring that even cloud providers or system administrators cannot access it. Computations run inside this protected enclave, and only verified results can leave. Each TEE also produces a cryptographic attestation, a proof that the computation was executed correctly and under the agreed conditions.

https://depa.world/training/architecture

On their own, CCRs provide secure execution. But when combined with other DEPA primitives..

  1. Electronic Contracts, which specify who can use what data for what purpose, and
  2. Privacy-preserving algorithms, which provide mathematical controls about what information can or cannot leak,

..they form a complete privacy-preserving data-sharing stack.

In essence, Confidential Clean Rooms (CCRs) enable confidential, techno-legal, and privacy-preserving computation on data. They make it possible to conduct large-scale data inference, analytics and modelling responsibly, without transferring raw data to any third party, and thereby eliminating the need for consent specifically for data sharing.

But technology alone doesn’t build ecosystems. Who brings this framework to life, abstracting away its complexity for everyday organizations? How might it help us confront our most urgent global challenges — in health, climate and finance? And how could it unlock entirely new kinds of enterprises, fueling a vibrant and responsible data economy for the Intelligence Age?

Data Collabs!

Read Part 2: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/

DEPA AI Chain: Empowerment Through Provenance

The DEPA AI Chain is central to operationalising data sharing for AI development and runtime use, while preserving privacy and maintaining verifiable provenance across the entire AI lifecycle — spanning dataset creation and licensing through training, release, inference, and content distribution. Risks and returns are managed through contracts and programmable controls; oversight is delivered via transparency logs and lightweight audits by a self-regulatory organisation (SRO), yielding an efficient and effective supervisory mechanism.

1.0 Unpacking Provenance

Provenance, in digital systems, refers to the systematic tracking of the origin of data and the complete history of the transformations and processes it undergoes throughout its lifecycle. It captures metadata about where the data came from, how it was created, and how it has been modified, combined, or interpreted over time.

Data provenance plays a critical role across a wide range of applications and scenarios. It is essential for ensuring the reproducibility of scientific experiments and computational workflows, enabling others to independently validate results. It supports fault diagnosis and fault tolerance by providing a traceable record that helps isolate and correct errors in complex systems. Provenance is also key to explainability (but also vastly different), as it clarifies how specific outcomes or decisions were derived, particularly in contexts such as AI and automated decision-making. In addition, provenance provides vital support for forensic investigations and auditing, where establishing the trustworthiness and integrity of data is crucial for compliance, accountability, and legal defensibility. By making the history of data transparent and verifiable, provenance serves as a foundational element of trustworthy digital systems.

In the context of personal data sharing, consent without provenance is an unauditable promise. There is a need to include a machine-readable trail linking consent or data protection compliance (the promise) to verifiable facts. 

The concept of provenance is increasingly critical in the context of modern AI systems, which are pervasive across numerous domains. In such systems — often characterised by Markovian or black-box behaviours — establishing clear causal relationships between inputs and outputs is inherently challenging. The opacity of many AI models, particularly deep learning models, makes it difficult to trace how specific outcomes arise, raising significant concerns around trust, accountability, and reproducibility.

Although parallel efforts exist under the banners of Explainable AI (XAI) and Trustworthy AI (TAI), provenance offers a complementary and, in many cases, more scalable and cost-effective approach to enhancing transparency. When thoughtfully designed and integrated into AI pipelines, provenance can provide a systematic, audit-friendly mechanism to capture the lineage and transformations of data and models, often with fewer assumptions than model-specific explainability techniques.

At its core, provenance in AI systems addresses concerns such as: (i) authenticity (of data and its origins), (ii) ownership, (iii) traceability, and (iv) (approximate) reproducibility. In contrast, frameworks such as TAI tend to emphasise aspects including (i) accuracy, (ii) fairness, (iii) explainability, and (iv) safety.

Yet, even with these clear distinctions, provenance is sometimes misframed in policy discussions. Treating any and all provenance artefacts as something that inevitably leads to identity disclosure is an error, one that conflates transparency with surveillance or identity tracking. As critics often put it in “Road to Perdition” terms, unfettered access to provenance data may indeed pose risks — but such access is not meant to be unfettered. It must come with safeguards, constrained by law and subject to due oversight. Framing the choice as either no provenance or dystopia ignores both context and the inevitability of provenance as part of the solution. Even references to Puttaswamy’s judgement, frequently invoked in this debate, are incomplete if not situated within its broader framework of proportionality and legitimate state aim. After all, without engaging with principles such as purpose limitation, retention bounds, or penalties for misuse, how else are systems meant to achieve reliability and harm reduction at scale? The answer lies not in abandoning provenance, but in advancing privacy-preserving provenance — mechanisms that preserve accountability and auditability without compromising individual rights.

1.1 Promise and Potential of AI Chain

The AI Chain is fundamentally a mechanism for capturing the lineage and transformations of data and models in a systematic, effective way, offering a complementary approach to XAI. The AI Chain promises to meet the following requirements:

  • Lineage: Lineage captures the complete journey of data and AI outputs—from consent and licensing, through training, to distribution—ensuring traceability, authenticity, and near-precise reproducibility of AI outcomes. It provides a granular record by assigning unique IDs to datasets and linking a Data Principal’s ID to their data and consent artefact, documenting how data is introduced, modified, combined, and interpreted. To preserve privacy, lineage can be applied to metadata rather than raw data. Cryptographic mechanisms such as hash chains and Merkle trees secure the integrity of the entire lineage.

  • Effective Verification and Its Impact on Liability Allocation: Verifiers can check provenance artefacts—including signatures, attestations, and log proofs—at scale. This may assist in liability and accountability allocation, since the responsibilities of Training Data Providers, Training Data Consumers, publishers, and platforms are clearly stated through policies and contracts, and their actions are immutably recorded in provenance artefacts.

Finally, this approach has second-order effects on data quality: established provenance artefacts increase the value of well-curated datasets.

1.2 What AI Chain Is Not Intended to Do

  • Truthfulness or correctness guarantees: The chain reveals who, what, when, and how a piece of content was created or modified—but it cannot confirm whether the content depicts reality.
  • Bias/fairness or safety adjudication: The chain records facts; value judgements belong to governance, post-facto audits, and external assessments.
  • Enforcement on off-chain actors: Entities falling outside the chain are not snapshotted and can ignore the guardrails.
  • Eliminate the need for legal process: The chain provides strong factual and indisputable evidence, not automatic verdicts.

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by Subodh Sharma, with inputs from Sunu Engineer and Raj Shekhar, all volunteers with iSPIRT.

FAQs and Facts on Techno-Legal Regulation 2.0

This blog continues our discussion on the techno-legal regulation of artificial intelligence (AI), building on our original post from 03.09.25—with a focus on key outstanding issues that required in-depth consideration, alongside the responses and questions we received from stakeholders as of 12.09.25.

Question 1: Since technology is constantly evolving, wouldn’t relying on technology to enable regulation be a flawed approach?

No—what would be flawed is mandating the use of specific technologies for regulation. In fast-evolving domains like AI, rigid technological mandates risk becoming obsolete within a short time—both stifling innovation and undermining public safety. A fundamental insight from systems theory reinforces this: to regulate or control a system that operates at speed x, the regulatory system itself must react and adapt at comparable or greater speed.

AI is evolving at breakneck speed and our understanding of the associated risks and failure pathways remains incomplete. This inherent uncertainty calls for a regulatory framework that is both flexible and adaptive. The most effective way to achieve this is by combining technological agility with failure-related metrics, all governed under lightweight legal constraints and conditions. The techno-legal approach is designed precisely for this: it sets clear outcome-focused obligations for system developers and operators, without prescribing rigid technical solutions, while promoting continuous system monitoring and adaptability to emerging risks.

For example, instead of mandating a particular technique for privacy preservation in AI training, policymakers under the techno-legal approach mandate only the regulatory outcome—i.e., privacy preservation—allowing developers to implement the latest techniques, such as differential privacy or federated learning, to achieve it. As a result, regulation remains effective and adaptive in the face of advancing technology and emerging risks.

Question 2: Isn’t a techno-legal approach most suitable when the subject of regulation is clearly defined? If so, doesn’t AI’s rapidly evolving and non-deterministic nature make it a poor candidate for such regulation?

A precise definition of the regulatory subject is essential for traditional command-and-control regulation. This model relies on ex ante identification and enumeration of risks and corresponding mitigation measures, typically framed as detailed, positive obligations that regulatees must follow. Without a clear regulatory subject, risk assessments can be inaccurate, leading to over-regulation in some areas and under-regulation in others. Given AI’s rapidly evolving and non-deterministic nature, it is ill-suited for such rigid regulation.

In contrast, a techno-legal approach focuses on defining the regulatory outcome, rather than the precise subject of regulation. The regulator requires that the outcome—such as privacy preservation in AI training—be embedded into the technical design of any system that could affect it, without prescribing specific methods to achieve compliance. This removes the need for exhaustive risk enumeration upfront and avoids the pitfalls of narrowly defining the regulatory subject. By focusing on outcomes rather than rigid processes, techno-legal regulation enables continuous adaptability, making it uniquely well-suited to govern AI systems that are non-deterministic and continuously evolving in capability and complexity.

For example, Musical AI’s Rights Management Platform is a techno-legal solution that embeds the regulatory objective of copyright protection directly into the AI model development process. The platform achieves this by restricting training of music generation models to licensed content and integrating attribution technology that logs each output, linking it to the original artist or song. This ensures seamless copyright enforcement and fair revenue sharing. Crucially, the focus remains exclusively on the outcome, i.e.—safeguarding creators’ exclusive rights over the use and distribution of their works, as mandated by copyright laws globally. For such a techno-legal solution to function, the regulator need not define specific AI model types for music generation as the regulatory subject, nor prescribe a particular rights management platform as a compliance mandate. Instead, technologists and companies remain free to innovate in AI music generation, applying any method or architecture they choose—as long as the regulatory outcome of effective copyright protection is achieved.

Question 3: How can techno-legal regulation be designed to avoid becoming redundant or leading to unintended or undesirable consequences?

Techno-legal approaches are intended to tackle the very problem of redundancy in AI regulation, setting clear, outcome-focused obligations for system developers and operators while enabling continuous monitoring and adaptability to emerging risks (as explained in response to Question 1 above).

That said, in addition to having clearly defined regulatory outcomes, techno-legal regulation depends on two key conditions to remain effective and adaptive, ensuring it does not ironically render itself redundant. First, the efficacy of any techno-legal solution must be assessed using well-defined metrics to track its progress toward the regulatory objective. Where direct measurement is impractical, appropriate proxy indicators can be used. Importantly, these metrics should be subject to regular review, ensuring they stay relevant and responsive to emerging externalities and shifts in the operating environment. Second, the techno-legal solution should undergo regular audits to verify its effectiveness and continued alignment with the regulatory objective. This ensures that the system continues to function as intended. When designed with clear objectives, measurable metrics, and periodic auditing—techno-legal regulation remains robust, avoiding potential redundancy and the risk of unintended or undesirable consequences.

Question 4: Wouldn’t the AI Chain architecture under DEPA 2.0 restrict the diversity of relationships in the value chain, thereby limiting novel pathways for innovation?

On the contrary, the AI Chain architecture is specifically designed to enable the broadest diversity of relationships in the AI value chain. Its open, modular design and transparent accountability mechanisms allow various actors—including developers, data providers, service operators, and others—to collaborate with trust and innovate without rigid barriers. This flexibility, in turn, fosters the emergence of novel and unexpected pathways for value creation.

Question 5: Can the allocation of liability—an inherently nuanced area of jurisprudence that has evolved over centuries—be effectively codified into a technology framework?

The allocation of liability, grounded in centuries of jurisprudence, becomes particularly complex when applied to AI. While techno-legal approaches may not be suited to directly assign liability and enforce penalties for AI harms on their own, they could certainly provide valuable tools to help navigate this complexity. For example, the AI Chain architecture under DEPA 2.0 leverages distributed ledger technology to provide end-to-end tracking of system activities and participant actions at a fine-grained level—capturing who performed which action, when, and using which model or dataset, with precise timestamps. Cryptographic proofs such as Merkle trees ensure that every step is irrefutably recorded and immutable. These detailed traces create a tamper-proof, transparent record of events, which auditors, courts, and regulators can use to reconstruct the sequence of actions leading to an AI-related harm.

The technological observability and causal traceability enabled by the architecture could incentivise good behaviour among ecosystem actors, reduce ambiguity in legal and adjudicatory processes, and support the development of robust AI liability jurisprudence—making liability allocation for AI harms streamlined, scalable, transparent, and fair.

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by Raj Shekhar, with inputs from Sunu Engineer and review by Subodh Sharma, all volunteers with iSPIRT.

FAQs and Facts on Techno-Legal Regulation

This blog is an invitation to advance public discourse on techno-legal regulation of artificial intelligence (AI). It builds on an article by Rahul Matthan (15 January 2025), in which he raised reservations about applying techno-legal regulation to AI governance and expressed concerns about the practicability of techno-legal artefacts-particularly their ability to establish liability chains among ecosystem actors-as a tool for enforcing good behaviour and ensuring accountability for AI harms. Through a Q&A format, this blog addresses those reservations and concerns directly, while explaining why techno-legal regulation is not only feasible but also the only practicable and scalable way to regulate AI effectively

Techno-legal regulation isn’t a monolithic concept, it can assume multiple implementations for different problems. DEPA Training embeds privacy and sovereignty requirements directly into AI training pipelines through confidential clean rooms and differential privacy. DEPA Inference creates consent-based data sharing. The proposed AI Chain architecture would establish liability tracking through distributed ledgers. Each solves a different problem using the same core principle: making regulatory compliance systematically enforced rather than legally suggested.

The confusion arises because people conflate these distinct systems. DEPA Training ensures AI models can do data collaboration. Privacy budgets will prevent individual contributions from being traced. DEPA Inference ensures PII based data can’t be accessed without consent because the cryptographic handshake fails without a valid consent artifact. AI Chain would ensure accountability can’t be avoided because every inference generates a log trace. Three different problems, three different techno-legal solutions, one underlying philosophy: architecture enforces what law requires.

Moreover, tools don’t meet the bar of techno-legal: that is precisely why one would want to craft techno-legal docs to accept technology substrates as keys ideas which are accepted and acknowledged as such to be mechanisable to meet certain key properties and invariants (in the real world). Tools are just instances of realising these mechanisable properties/invariants. For instance — can policy be put as attestable and executable code — why not? Policy is a set of unambiguous rules and so long as they are unambiguous and computable, they are automatable. If exceptions to the rule exist then they must also be documented.

There is a general worry that introducing identities into AI systems will erode privacy. From a computer-systems standpoint, that conclusion doesn’t follow. What matters is how identifiers are created and managed and what is recorded. With pairwise (service-scoped) identifiers, selective disclosure, and tamper-evident logging of metadata (not payloads), systems can offer accountability and simultaneously uphold Privacy by Design (PbD). These are not speculative ideas: the web and major identity programs already run variants at scale.

OpenID Connect has long supported pairwise subject identifiers, which purposely give each relying party a different, opaque value, curbing cross-service linkability. Aadhaar’s Virtual ID (VID) and UID tokenization make the same design choice in India: a revocable, tokenized identifier is presented instead of the Aadhaar number, and per-agency tokens prevent easy correlation across services while remaining auditable. In both cases, the principle is the same—identity is scoped to a context.

On the web, the W3C Verifiable Credentials (VC) 2.0 model and cryptographic suites such as BBS+ allow a holder to prove only the claims that are necessary (for example, “over 18”) while withholding the rest; the SD-JWT work in the IETF ecosystem supports similar selective-disclosure for JWTs (JSON Web Tokens). The direction of travel — both in standards and deployments — is to treat “need-to-know” as a first-class property.

Every time a browser trusts a public TLS certificate, it relies on Certificate Transparency (CT) — append-only Merkle-tree logs with efficient inclusion and consistency proofs—to keep Certificate Authorities honest. Chrome and Apple have required CT for certificates issued after 2018. Therein lies a lesson for AI: append-only, publicly auditable logs are one mature way to record event receipts without exposing content.

PbD’s “positive-sum” stance is compatible with a metadata-only accountability layer. Instead of retaining prompts, outputs, or personal payloads, systems can emit signed, append-only receipts that capture who/what/which/when: a scoped user identifier, model and dataset versions, operation type (e.g., generate/transform/moderate), timestamp, and the responsible (but not necessarily trusted) operator or process. Auditors later verify that events occurred and in which order via Merkle proofs; when a lawful process requires more detail, selective-disclosure credentials release the minimum necessary information. This is the same architectural separation that keeps web PKI and identity wallets both auditable and privacy-preserving.

When we track things securely, we do not create a surveillance state. We create a modelable, measurable, manageable state. When the tracking data is misused by parties – parties in power or parties with power to access the data, bypassing access checks – then they have the ability to create a surveillance state or cause damage. DEPA liability chains are designed to establish the connections between different parts of the data economy ecosystem, but using strong cryptographic techniques to detect and protect against unauthorised access.

Traceability and agency/activity chains are needed to construct the data economy ecosystem robustly.

India needs techno-legal regulation because we can’t afford not to have it. We don’t have thousands of judges to adjudicate AI harm. We don’t have armies of auditors to verify compliance. We have scale challenges the West doesn’t face, governing AI for 1.4 billion people requires architectural enforcement. We need to protect our people and enable our innovators.

The question isn’t whether we need techno-legal regulation, it’s whether we’re honest about what happens without it. Without DEPA Training’s cryptographic enforcement, AI systems will train on unauthorized data because detection is impossible at scale. Without immutable audit trails, companies will claim compliance while violating every principle because verification requires resources we don’t have. Without architectural enforcement, the most vulnerable Indians, those who can’t afford lawyers, don’t understand technology, can’t navigate bureaucracy, will be harmed first and most.

AI space is an unknown space. To define legal regulation in a space we need to be able to enumerate ( exhaustively if possible) all the failure modes in the system and then frame the regulations to prevent, to detect, to curtail impact, to correct after the event etc. When we know the details one can compute the legal implications and consequences and define a legal regulation ( 80 percent) supported by technology ( 20 percent). When we are dealing with an unknown space, unknown in the sense that the failure modes are not enumerable, then we can do techno-legal regulation in an evolutionary manner ( even more so when the activity is distributed in space and time and occurring with a high frequency ). Here we start with a base implementation and evolve it based on the discovery of failure modes. We can argue that such an evolutionary approach to creating regulation that not only protects but also fosters growth needs to be implemented on a technology substrate (80 percent tech 20 percent human). Otherwise the evolution will be very slow and the regulation will be out of sync with market needs.

True, current technologies may not be able to solve use limitation and/or data minimisation in the world of AI ex-ante, however, the question should be can we construct testable tech mechanisms to check violations of these requirements ex-post. I believe that is certainly possible — challenging but doable.

DEPA does solve for this indirectly. Retention restrictions, usage limitation, data minimization, all require deep understanding of how and where data is being used. DEPA chains track and trace and provide this information which will enable the DEPA framework itself to implement and enforce these and other constraints and conditions on data use. Without a technology framework to do this, it is likely that there will be many more violations of these kinds of conditions without coming to light. The more complex the regulations get, the more technologically advanced and evolutionary the substrate needs to be.

We’re not encoding Platonic ideals of fairness, we’re implementing specific, measurable requirements that regulators and courts have already defined. DEPA Training’s architecture can use techno-legal solutions to enforce fairness principles, it may work like this: when a dataset enters the clean room, the system automatically computes demographic distributions and compares them against regulatory baselines. If biases are detected appropriate remedial measures are effected.

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by our volunteers, Sunu Engineer, Subodh Sharma, Raj Shekhar and Harshit Kacholiya

Lessons from India’s Digital Public Infrastructure Journey

In just a decade, India has redefined how nations can harness technology for the public good. Through Digital Public Infrastructure (DPI), such as Aadhaar, UPI, and Account Aggregator, followed by newer innovations like OCEN and ONDC, India has shown the world how open, interoperable, and inclusive digital systems when designed as privately provisioned, public infrastructure; can spark innovation, scale rapidly, and empower communities at the grassroots.

To capture these lessons and provide a practical guide for policymakers, technologists, and global stakeholders, iSPIRT Foundation has contributed to the development of the DPI Handbook: Foundations of Digital Public Infrastructure. This handbook distills a decade of India’s pioneering experience into actionable insights, frameworks, and design principles that can help other nations build their own inclusive and interoperable DPI. The paper is now published on the Research and Information System for Developing Countries (RIS)

This Handbook is not the product of a single author, but rather the culmination of years of dedicated volunteerism at iSPIRT, where technologists, policymakers, entrepreneurs, and thinkers came together to exchange ideas, build prototypes, and debate design choices. Each page reflects this collaborative spirit, proof that when diverse minds work in concert, they can create frameworks that transform entire societies.

We extend our deepest gratitude to the iSPIRT volunteer community, past and present, whose passion and commitment, have been instrumental in shaping India’s DPI journey. Their contributions embody the ethos of building digital public infrastructure as a shared national mission.

We hope the DPI Handbook becomes both a guide and an inspiration, for nations building their own digital public infrastructure, and for all who believe that technology, when designed for the public good, can change the course of societies.

Please note: The blog post is co-authored by our volunteer, Arun Iyer

iSPIRT would like to extend its gratitude to Shri. Rajeev Chawla – IAS, Strategic Advisor and Chief Knowledge Officer- Ministry of Agriculture & Farmer’s Welfare who co-authored this Handbook, for his insightful perspectives. We would also like to thank Shri. Sachin Chaturvedi, Director General – RIS for graciously writing the Preface to this Handbook

As the AI race across the world heats up, a post: “𝐈𝐧𝐝𝐢𝐚 𝐝𝐨𝐞𝐬𝐧’𝐭 𝐰𝐢𝐬𝐡 𝐭𝐨 𝐛𝐞 𝐣𝐮𝐬𝐭 𝐚 𝐭𝐫𝐚𝐝𝐞 𝐜𝐨𝐥𝐨𝐧𝐲 𝐨𝐟 𝐂𝐡𝐢𝐧𝐚 𝐨𝐫 𝐭𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲 𝐜𝐨𝐥𝐨𝐧𝐲 𝐨𝐟 𝐭𝐡𝐞 𝐔𝐒”

To succeed at AI, we need a whole-of-nation approach involving deep-tech startups, enabling industrial policy and pre-commercial publicly-funded research.

When the Biden Administration released its AI Diffusion Executive Order a few weeks back restricting GPUs to countries, it became clear that having strategic autonomy in AI was of paramount importance to India.

Just being the use-case capital for AI wasn’t the right way to go.

India doesn’t wish to be a trade colony of China or the technology colony of the US.

What makes AI different is that it needs a whole-of-nation approach. To win at AI we need deep-tech startups, enabling industrial policy and pre-commercial publicly-funded research. It is only when all three come together that magic can happen.

Our resistance to the whole-of-nation approach is understandable. After all, our IT Services and SaaS industry came up without the whole-of-nation approach. So, many people thought that the same playbook would apply to AI.

China has proved with DeepSeek’s R1 and Moonshot AI [another Chinese company’s] Kimi k1.5 that a whole-of-nation approach can have big payoffs. In India, this approach has worked for cryogenic engines, 4G/5G telecom equipment and India Stack. We do remarkable things when we set our mind to it!

Yes, we have lost some time due to the use-case captial camp. But all is not lost. The field is still young and many areas like neurosymbolic AI are very much open.

The Biden AI Diffusion order, and Chinese success has given new vigour to the whole-of-nation camp within government, private sector and civil society. The debate is now over: You will see some good developments become visible in the coming months  #AI #StrategicAutonomy

Also see: https://www.moneycontrol.com/technology/deepseek-s-llm-success-triggers-big-debate-is-india-s-hesitation-a-strategic-mistake-article-12921811.html

Digital Public Infrastructures

This workshop was organized by the Indian think tank iSPIRT Foundation, French Embassy in India, Consulate General of France in Bangalore, and La French Tech in India, based on the following principles:

  • Gathering high level contributors from India and France: industrials, transdisciplinary academics, diplomats, officials, business founders, think tank members, technology makers;
  • Pushing a Workshop format (not an event, not a round table, not a scientific conference), organizing 3 different days with 3 different viewpoints:
    • Philosophical/epistemological/ human sciences,
    • Economical/techno-legal/social sciences/adoption,
    • Application domains and use cases (Health, Culture, Creative Cities, Agriculture);
  • Targeting recommendations toward the AI Action Summit (Paris, February 2025).

Results:

  • More than 80 speakers, 100 participants in person (in Bangalore or in Paris), 200 participants online;
  • 14 different countries represented all over the world (India, France, Canada, USA, Mexico, Guatemala, Brazil, Germany, Netherland, Italia, Spain, Portugal, Belgium, Thailand);
  • An opening session figuring the Ambassador of India to France H.E. Mr. Jawed Ashraf, the French Digital Affairs Ambassador H.E. Mr. Henri Verdier, the Consul general of France in Bangalore Mr. Marc Lamy;
  • A rich repository of content, including speaker presentations, slides and recordings (https://drive.google.com/drive/folders/1eTDbRgw1g8EOBtXS7uRim9I6gtQRbCZB).

Open House on DPI for AI #4: Why India is best suited to be the breeding ground for AI innovation!

This is the 4th blog in a series of blogs describing and signifying the importance of DPI for AI, a privacy-preserving techno-legal framework for AI data collaboration. Readers are encouraged to first go over the earlier blogs for better understanding and continuity. 

We are at the cusp of history with regard to how AI advancements are unfolding and the potential to build a man-machine society of the future economically, socially, and politically. There is a great opportunity to understand and deliver on potentially breakthrough business and societal use cases while developing and advancing foundational capabilities that can adapt to new ideas and challenges in the future. The major startups in Silicon Valley and big techs are focused first on bringing the advancements of AI to first-world problems – optimized and trained for their contexts. However, we know that first world’s solutions may not work in diverse and unstructured contexts in the rest of the world – may not even for all sections of the developed world.

Let’s address the elephant in the room – what are the critical ingredients that an AI ecosystem needs to succeed –  Data, enabling regulatory framework, talent, computing, capital, and a large market. In this open house

we make a case that India is the place that excels in all these dimensions, making it literally a no-brainer whether you are an investor, a researcher, an AI startup, or a product company to come and do it in India for your own success. 

India has one of the most vibrant, diverse, and eager markets in the world, making it a treasure chest of diverse data at scale, which is vital for AI models. While much of this data happens to be proprietary, the DPI for AI data collaboration framework makes it available in an easy and privacy-preserving way to innovators in India. Literally, no other country has such a scale and game plan for training data. One may ask that diversity and scale are indeed India’s strengths but where is the data? Isn’t most of our data with the US-based platforms? In this context, there are three types of Data: 

a. Public Data,
b. Non-Personal Data (NPD), and
c. Proprietary Datasets.

Let’s look at health. India has far more proprietary datasets than the US. It is just frozen in the current setup. Unfreezing this will give us a play in AI. This is exactly what DPI for AI is doing – in a privacy-preserving manner. In the US, health data platforms like those of Apple and Google are entering into agreements with big hospital chains – to supplement their user health data that comes from wearables. How do we better that? This is the US Big Tech-oriented approach – not exactly an ecosystem approach. Democratic unfreezing of health data with hospitals is the key today. DPI for AI would do that even for all – small or big, developers or researchers! We have continental-scale data with more diversity than any other nation. We need a unique way to unlock them to enable the entire ecosystem, not just big corporations. If we can do that, and we think we can via DPI for AI, we will have AI winners from India.

Combine this with India’s forward looking regulatory thought process that balances Regulation for AI and Regulation of AI in a unique way that encourages innovation without compromising on individual privacy and other potential harms of the technology. The diversity and scale of the Indian market act like a forcing function for innovators to think of robustness, safety, and efficiency from the very start which is critical for the innovations in AI to actually result in financial and societal benefits at scale. There are more engineers and scientists of Indian origin who are both creating AI models or developing innovative applications around AI models. Given our demographic dividend, this is one of our strengths for decades to come. Capital and Compute are clearly not our strong points, but capital literally follows the opportunity. Given India’s position of strength on data, regulation, market, and talent, capital is finding its way to India!

So, what are you all waiting for? India welcomes you with continental scale data with a lightweight but safe regulatory regime and talent like no place else to come build, invest, and innovate in India. India has done it in the past in various sectors, and it is strongly positioned to do it again in AI. Let’s do this together. We are just getting started, and, as always, are very eager for your feedback, suggestions, and participation in this journey!

Please share your feedback here
For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Sharad Sharma, Gaurav Aggarwal, Umakant Soni, and Sunu Engineer

Open House on DEPA Training #3: The Regulatory and Legal Aspects

This is the third in a series of blogs describing  the structure and importance of Digital Public Infrastructure for Artificial Intelligence (DPI for AI), a privacy-preserving techno-legal framework for data and AI model building  collaborations. Readers are encouraged to go over the first and second blogs for better understanding and continuity.

Open House on DEPA Training #1

Open House on DEPA Training #2: DPI to Unfreeze Data Markets. Let’s Make India an AI Nation!

The techno-legal framework of DEPA, elaborated upon in the earlier blogs, provides the foundations. From multiple discussions and history, it is clear that building and growing a vibrant AI economy that can create a product nation in India, requires a regulatory framework. This regulatory structure will serve as the legal partner to the technology aspect and work hand in hand with it. Upon this reliable techno-legal foundation will the ecosystem and global product companies from India be materialized.

‘Data Empowerment And Protection Architecture’ – or DEPA’s – worldview of ‘regulation for AI’, rather than the more conventional ‘Regulation of AI’ espoused by US, EU and so on sets DEPA apart and drives India towards an AI product nation with a global footprint.

How does one envisage the form and function of ‘Regulation for AI’?  In this open house, we have a dialog between technology and legal sides of the approach to explain the significant facets.

In a nutshell, ‘Regulation for AI’ will focus on 

  • what standards the AI models need to adhere to
  • define a lightweight but foolproof path for getting there for startups as well as the big players 
  • provide an environment which deals with many of the compliance and safety aspects ab initio 
  • define ways to remove hurdles from the innovator’s paths

In contrast, ‘Regulation of AI’ deals with what AI models cannot be and do and the tests and conditions that they have to pass depending on the risk classes that they are placed into. This is akin to certification processes in many fields such as pharma, transportation and so on which impose heavy cost burdens, especially on new innovators. For instance, many pharma companies which develop potentially good drug candidates run out of steam trying to meet the clinical trial conditions. Very often they are unable to find a valid and sizeable sample population to test their products as a part of the mandatory certification process. 

The current standards in the new Regulation of AI in the US, EU and so on leave many aspects such as the risk model classification process undefined, leading to regulatory uncertainty. This also works against investment driven innovation and consequent growth of the ecosystem in multiple ways.

The path to value both for the economy and the users, lies in the power of the data being projected into the universe of applications. These applications will be powered by the AI models in addition to other algorithmic engines. The earlier blogs already addressed the need and the way for data to make their way into models. 

For the models to exhibit their power, we must make sure they are reliable and used widely. This requires the AI models be accessible and available and most importantly, ‘do no harm’ when they are applied, through mistakes, misuse or malfeasance.  In addition to this, humans or their agents must not be allowed to harm the markets and users through monopoly control of the AI models. Large scale monopolistic control of these models which have global use and relevance can lead to situations which are beyond national or international legislation to control or curb. 

In the DEPA model, this benign, and in most ways, benevolent environment is created by a concinnous combination of technology and legal principles. Having analyzed the technological aspects of data privacy in the earlier blogs in this chain, here we will talk about the regulations implemented via a Self-Regulatory Organization – the SRO.  

Though not fully fleshed out, the SRO provides functions such as registration and roles to participants such as TDP (Training Data Provider), TDC (Training Data Consumer) and CCRP (Confidential Clean Room Provider). Many of these functions have been implemented in part to support the tech stack that we have released with respect to the CCR (Ref: DEPA Open House #1). This tech stack currently supports registration and allows the interactions between participants to be mediated via electronic contracts (the technological counterpart of legal contracts). 

The technology that validates the models through pre-deployment analysis based on complex adaptive system models is under development and is based on diverse research efforts across the world. This technology is designed for measuring the positive and negative impact of use of these models on societies at small and large scale and in short and long timescales.   

‘Complex adaptive system models’ are dynamic models which can capture agents with their state information and the multiple feedback loops which determine the changes in the system at different scales, sometimes simultaneously. The large number of components and the many kinds of feedback loops with their dynamic nature are what make these models complex and adaptive. These models, while still in their infancy in many ways, are critical to the question of understanding the AI models’ impact on societies. 

The SRO guides and supports the ecosystem players in building and deploying their models in a safe and secure way with lightweight regulatory ceilings so that large product companies in many fields like finance, healthcare, and education can grow and reach a happy consumer base. This is key to growing the ecosystem and connecting it to other parts of the India stack. 

We envisage leveraging the current legal system in terms of the different Acts (DPDP, IT Act, Copyright etc.) and models of Data Protection through CDO ( Chief Data Office) and CGO ( Grievance Office) in companies in India in defining the SRO’s role and features further.

The regulatory model also looks at the question of data ownership and copyright issues, especially in the context of Generative AI. We require large foundation models independent of the ‘Big Tech’ to fight potential monopolies. These models should be reflective of the local diversity to serve as reliable engines in the context of India. We need these models built and deployed locally, to be able to play a role as a product nation without being subverted or subjugated in our cyberspace strategies. 

To light up the AI sky with these many ‘fireflies’ in different parts of India, infrastructure for compute as well as market access is needed. The SRO creates policies that are not restrictive or protective but promotes participation and value realization. The data players, compute providers, market creators and users need to be able to play with each other in a safe space. Sufficient protection of copyright and creative invention will be provided via existing IP law to incentivize participation while not restricting to the point of killing innovation – this is the balance that the regulatory framework of SRO strives to reach. 

Drawing upon ideas of risk-based categorization of models (such as in the EU AI Act) and regulatory models (including punitive and compensatory measures) proportional to them, the models in India Stack will be easily compatible with international standards, as well as a universal or global standard, should an organization such as a UN agency define it. This makes global market reach of   AI models and products built in India, an easier target to achieve. 

We conjecture that these different aspects of DEPA will release the data from its silos. AI models will proliferate with multiple players profiting from infrastructure, model building, and exporting them to the world. Many applications will be built which will be used both in India (as part of the India Stack) and the world. It is through these models and applications that the latent potential and knowledge in the vast stores of data in India will be realized.

Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Antara Vats, Vibhav Mithal and Sunu Engineer

Open House on DEPA Training #2: DPI to Unfreeze Data Markets. Let’s Make India an AI Nation!

This is the 2nd blog in a series of blogs describing and signifying the importance of DPI for AI, a privacy-preserving techno-legal framework for AI data collaboration. Readers are encouraged to first go over the 1st blog for better understanding and continuity.

What is unique about the techno-legal framework in DPI for AI is that it allows for data collaboration without compromising on data privacy. Now let’s put this in perspective of Indian enterprises and users. This framework can potentially revolutionize the entire ecosystem to slingshot India towards an AI product nation where we are not just using AI models developed within India but exporting the same. What is the biggest roadblock in this dream? In this open house (https://bit.ly/DEPA-2), we make a case that privileged access to data from Indian contexts is not only necessary to develop AI-based systems that are much more relatable to Indians but in fact, gives Indian innovators a distinct advantage over much larger and better funded big tech companies from the west.

Let’s get started. Clearly, there is a race to build larger and larger AI models these days trained on as much training data as possible. Most training data used in the models is publicly available on the web. Given that Indian enterprises are quite behind in this race, it is unlikely that we will catch up by simply following their footsteps. But what many folks outside of AI research circles often miss is that there has been credible research that shows that access to even relatively small amounts of contextual data can drastically reduce the data and compute requirements to achieve the same level of performance.

This sounds great, right, but (there is always a but!) much of this Indian context data is not in one place and is hidden behind numerous government and corporate walls. What makes the situation worse is most of these data silos are enterprises of traditional nature and are not the typical centers of innovation, at least for modern technologies like AI. This is a fertile ground for DPI for AI. The three core concepts of DPI for AI ensure that this data sitting in silos can be seamlessly (thanks to digital contracts) and democratically shared with innovators around India in a privacy-preserving manner (thanks to differential privacy). The innovators also do not need to worry one bit about the confidentiality of their IP (thanks to confidential computing). The techno-legal framework makes it super easy for anyone to abide by the privacy regulations without sweat. This will keep them safe from future litigations as long as they follow easy-to-follow guidelines provided in the framework. This is what we refer to as the unfreezing of data markets in this Open House. This unfreezing is critical for our innovators to get easy access to contextual data to give them a much-needed leg up against the Western onslaught in the field of AI. This is India’s moment to leapfrog in the field of AI as we have done in so many domains (payments, identity, internet, etc.). Given the enormity of the goal and the need to get it right, we seek participation from folks from varied expertise and backgrounds. Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Hari Subramanian and Gaurav Aggarwal.

Introducing DEPA for Training: DPI for Responsible AI

In the last decade, we’ve seen an extraordinary explosion in the volume of data that we, as a species, generate. The possibilities that this data-driven era unlocks are mind-boggling. Large language models, trained on vast datasets, are already capable of performing a wide array of tasks, from text completion to image generation and understanding. The potential applications of AI, especially for societal problems, are limitless. However, lurking in the shadows are significant concerns such as security and privacy, abuse and mis-information, fairness and bias.

These concerns have led to stringent data protection laws worldwide, such as the European Union’s General Data Protection Regulation (GDPR) and California’s Consumer Privacy Act (CCPA), and the European AI Act. India has recently joined this global privacy protection movement with the Data Protection and Privacy Act of 2023 (DPDP Act). These laws emphasize the importance of individuals’ right to privacy and the need for real-time, granular, and specific consent when sharing personal data.

In parallel with privacy laws, India has also adopted a techno-legal approach for data sharing, led by the Data Empowerment and Protection Architecture (DEPA). This new-age digital infrastructure introduces a streamlined and compliant approach to consent-driven data sharing.

Today, we are taking the next step in this journey by extending DEPA to support training of AI models in accordance with responsible AI principles. This new digital public infrastructure, which we call DEPA for Training, is designed to address critical scenarios such as detecting fraud using datasets from multiple banks, helping with tracking and diagnosis of diseases, all without compromising the privacy of data principals.

DEPA for Training is founded on three core concepts, digital contracts, confidential clean rooms, and differential privacy. Digital contracts backed by transparent contract services make it simpler for organizations to share datasets and collaborate by recording data sharing agreements transparently. Confidential clean rooms ensure data security and privacy by processing datasets and training models in hardware protected secure environments. Differential privacy further fortifies this approach, allowing AI models to learn from data without risking individuals’ privacy. You can find more details how these concepts come together to create an open and fair ecosystem at https://depa.world.

DEPA for Training represents the first step towards a more responsible and secure AI landscape, where data privacy and technological advancement can thrive side by side. We believe that collaboration and feedback from experts, stakeholders, and the wider community are essential in shaping the future of this approach. Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteer, Kapil Vaswani