DEPA-Training: Tech Updates

We’ve rolled out some exciting updates for DEPA‑Training, making it easier to rapidly prototype and run diverse training scenarios — complete with electronic contracts, confidential clean rooms, privacy preservation and configurable training SDKs.


✨ What’s new

👉 GUI for end-to-end execution

👉 Step-by-step guide to create and run your own training scenarios

👉 New scenarios introduced for complex multi-party training: MRI brain tumor segmentation, credit default risk prediction


Before we dive in, let’s quickly recall what the Data Empowerment and Protection Architecture (DEPA) really is.

What is DEPA and why does it matter?

India Stack is evolving at population scale, enabling the flow of people (Aadhaar, eKYC, DigiLocker, DigiYatra, etc.), money (UPI, OCEN), and information (DEPA and Account Aggregator) through Digital Public Infrastructure (DPI). DEPA is critical in this third layer as it enables the responsible flow of data between individuals and organisations for more complex tasks such as AI model training, AI inference and analytics.

As the name suggests, DEPA rests on two key elements. The first is protection, founded on the bedrock of privacy, consent, accountability and purpose limitation of data. The second is empowerment, democratizing data access and enabling the ecosystem to responsibly innovate with it, whether for training AI models, personalizing products and services, advancing scientific research, or much else.

In light of emerging data protection laws such as the DPDP, GDPR, and others, there is a need for a framework that enables the responsible use of data — unlocking its value while ensuring regulatory compliance and serving the broader public interest.

Ultimately, DEPA solves for two core challenges at the heart of data sharing — Trust and Flow — keeping the rest open and flexible for innovation.

What is DEPA‑Training?

The vision behind DEPA for Training (aka DEPA‑Training) is simple: for India to be not only a consumer of AI but also a producer of AI, in a responsible and democratized manner.

AI’s first big leap came from public data. That well is running dry. Our belief is that for the next wave of AI innovation — smarter AI for healthcare, personalized finance, scientific discovery and more — proprietary data will be crucial. But today, that data is fragmented, locked in silos, and difficult to use — often running into challenges around privacy, compliance, and regulatory constraints.

Enter DEPA-Training — a techno-legal Digital Public Infrastructure (DPI) designed to enable secure, agile, and scalable AI model training on sensitive data. It does so by assembling a set of frontier technological primitives:

  • Confidential Clean Rooms (CCRs): Isolated compute environments that can cryptographically attest to their integrity, where data can be processed securely without external exposure.
  • Electronic Contracts: Code-enforced legal agreements between transacting parties that give data providers control over how their data is used, e.g. through purpose limitation, privacy safeguards and monetization (see the sketch after this list).
  • Secure Training Sandbox: Modular and configurable sandboxes and SDKs for building privacy-preserving and compliant training pipelines across diverse model architectures and data types.
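
To make this concrete, here is a glimpse of what an electronic contract artefact might contain. This is a hypothetical sketch; the field names are purely illustrative and not DEPA’s actual contract schema.

```python
# Hypothetical electronic contract artefact (illustrative only; these field
# names are not DEPA's actual contract schema).
contract = {
    "contract_id": "c-2025-0001",
    "parties": {
        "data_provider": "hospital-a",
        "data_consumer": "research-lab-x",
        "ccr_operator": "ccr-provider-y",
    },
    # Purpose limitation: the data may be used for this task and nothing else.
    "purpose": "brain-tumor-segmentation-training",
    # Privacy safeguard: training must stay within this differential-privacy budget.
    "privacy": {"mechanism": "differential-privacy", "epsilon": 3.0, "delta": 1e-5},
    # Only these artefacts may leave the clean room; raw data never does.
    "allowed_outputs": ["model-weights", "training-logs"],
    # Monetization: how the data consumer compensates the data provider.
    "payment": {"amount_inr": 500_000, "payee": "hospital-a"},
    "valid_until": "2026-12-31",
}
```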

What’s new in DEPA-Training?

Graphical user interface

We’ve introduced an interactive GUI that enables users to explore, configure, and execute DEPA-Training scenarios end to end. The application automatically discovers available scenarios in the repository and provides an intuitive interface to run them — eliminating the need for command-line interaction. A similar GUI workflow is also provided for contract signing.

Scenarios you can try out today

To bring DEPA-Training to life, we showcase a diverse set of scenarios that demonstrate what’s possible in practice. These examples illustrate pathways toward solving larger global challenges and span multiple data modalities (e.g., tabular, images), model paradigms (e.g., classical ML, MLPs, CNNs), and prediction tasks (e.g., regression, classification, image segmentation).

Disease Surveillance Modeling

Pandemics don’t wait. Timely, accurate data can save millions of lives. Yet most infection data is scattered, siloed, and too sensitive to share. With differential privacy, institutions can securely pool data to track virus spread, map risk patterns, and test interventions — powering real-time, data-driven epidemic response.

Example: COVID-19 scenario

Medical Image Modeling

From cancer to cardiovascular disease, from neurology to rare disorders — modern medicine increasingly depends on imaging. Yet medical images are among the hardest datasets to share, trapped in hospital silos and governed by strict privacy laws. DEPA makes it possible to combine imaging data across borders and institutions, unlocking AI models that are more accurate, generalizable, and equitable. This accelerates breakthroughs in diagnostics, improves treatment planning, and addresses one of healthcare’s biggest global challenges: scaling precision medicine while safeguarding patient trust.

Example: BraTS scenario 

Financial Credit Risk Modeling

Access to fair credit fuels economic growth, but risk assessment is often limited by partial data. By safely combining insights across financial institutions, DEPA enables more accurate credit scoring, reduces defaults, and strengthens financial stability — empowering individuals and businesses alike with better access to capital.

Example: Credit Risk scenario

Build your own Scenarios

A new step-by-step guide walks you through building and running your own DEPA-Training scenarios — making it easy to rapidly prototype and iterate with training use-cases of your own.

Currently, DEPA-Training supports the following training frameworks, libraries and file formats (more will be included soon); a small illustrative sketch follows the list:

  • Frameworks: PyTorch, Scikit‑Learn, XGBoost (LLM Finetuning to be added soon!)
  • Libraries: Opacus, PySpark, Pandas (HuggingFace support coming soon!)
  • Formats: ONNX, Safetensors, Parquet, CSV, HDF5, PNG (No pickle-based formats for security reasons)
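
To give a flavour of the kind of pipeline these components enable, here is a minimal differentially private training loop using PyTorch and Opacus. This is a generic Opacus example on synthetic data, not the DEPA-Training SDK itself, and every hyperparameter below is a placeholder.

```python
# Minimal DP-SGD sketch with PyTorch + Opacus (generic example on synthetic
# data; not the DEPA-Training SDK, and all hyperparameters are illustrative).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

X, y = torch.randn(1000, 20), torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=64)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0,  # calibrated noise added to clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for _ in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()

print("epsilon spent so far:", privacy_engine.get_epsilon(delta=1e-5))
```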

What’s in it for the ecosystem?

DEPA-Training democratizes responsible data sharing and model training for all!

  • Enterprises & Startups → Unlock the value of private data to build smarter products and services, while remaining compliant to data laws. Collaborate across organizations to create solutions that no single dataset could power.
  • Research Institutions → Pool data at scale to tackle grand challenges, drive scientific discovery, and advance knowledge for the public good.
  • Policy & Legal Experts → Shape the future of data governance by operationalizing privacy, consent, purpose limitation, and accountability in practice.
  • Builders & researchers → Join us in co-creating this framework!

Get started

👉 Get your hands dirty: DEPA‑Training on GitHub 🛠️

👉 Explore the documentation: DEPA.World 📜
👉 Watch the Open Houses: YouTube Playlist 🎬

👉 Think big: What challenges has data privacy kept off-limits? What data has felt forever inaccessible? With DEPA-Training, those doors may finally open. 💡

Interested in contributing to DEPA? Join our group of no-greed no-glory volunteers! Apply here

Please note: The blog post is authored by our volunteers, Sarang Galada, Dr. Shyam Sundaram, Kapil Vaswani and Pavan Kumar Adukuri

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

This is a two-part blog series. The following is the second part.

In Part 1, we traced how data collaborations are being reimagined and laid out the conceptual foundations, from redefining consent through the Account Aggregator framework to recognizing the limits of consent. We explored how privacy-preserving frameworks like differential privacy protect individuals even when models are built from data; how electronic contracts replace slow, manual agreements with enforceable digital rules; and how confidential clean rooms combine secure hardware and privacy guarantees to enable computation without revealing raw data.

In Part 2, we explore how these building blocks come together in practice.

The Connective Tissue: Data Collabs

Technology alone cannot guarantee privacy, fairness, or effective collaboration. Data-sharing ecosystems need institutional scaffolding — entities that can operationalize trust, manage relationships, and abstract away complexity for participants.

This is where Data Collaboratives (or Data Collabs for short) come in.

A Data Collab isn’t a regulator or a government body. Rather, it is a facilitator organization — a neutral yet entrepreneurial entity that enables, orchestrates, and sustains data collaborations using the DEPA Framework behind the scenes, following standards and processes set by trusted bodies like a Self-Regulatory Organization (SRO) and a Technology Standards Organization (TSO).

You can think of a Data Collab as the connective tissue of a data ecosystem — linking data providers, data consumers, and service providers.

In practice, a Data Collab:

  1. Provides tools and interfaces for participants to register, onboard, sign electronic contracts, and set up secure collaboration environments such as CCRs.
  2. Signs agreements with data providers to clean, prepare, and catalogue datasets so that they can be safely shared with authorized data consumers.
  3. Manages the flow of value — usually collecting payments from data consumers and distributing them fairly to data providers, while covering operational costs.
  4. Assumes accountability for ensuring that all interactions, permissions, and computations are compliant with the DEPA rules and contractual terms.
  5. Adds value beyond infrastructure — offering domain expertise, workflow design, governance and audit support — streamlining data collaborations.

Data Collabs will likely take different forms depending on the domain they serve. For example, some might focus on oncology research, others on financial fraud detection or climate-risk modeling. Each field has its own kinds of data, privacy rules, and ways of working — so it is natural for Data Collabs to specialize.

Because running these collaborations requires significant operational and technical effort, most Data Collabs will probably be for-profit enterprises. At the same time, because they operate on open, interoperable digital public infrastructure like DEPA, they are not monopolistic platforms. Instead, they enable a competitive marketplace where multiple Data Collabs can coexist, offering participants better choices, fairer pricing, and higher-quality services.

In this way, Data Collabs create a persistent institutional layer for responsible data use, enabling long-term, multi-party cooperation that would be impractical to coordinate through ad hoc agreements.

A real-world example: Accelerating Drug Discovery

Imagine three pharmaceutical companies, each developing treatments for the same rare disease. Each has conducted clinical trials with a few hundred patients — but individually, none has enough data in quantity, diversity, or parameter richness to train a robust predictive model of treatment response. 

Much like pieces of a puzzle, valuable insights often emerge only when data from different sources fit together — yet no single party should hold or see the entire picture.

If these companies could combine their datasets, and enrich them with other sources like gene expression profiles, cell imaging results, or public molecular databases, they could uncover deeper patterns and dramatically speed up drug discovery.

But three major barriers stand in their way:

  1. Competitive concerns: Each company treats its clinical data as proprietary and doesn’t want to reveal it to others.
  2. Privacy regulations: Patients gave consent only to the company that ran their trial — not to share data across firms.
  3. Practical limits: Many patients can’t be re-contacted to renew consent, making manual legal processes infeasible.

This is where the DEPA Framework fits in. Here’s how it would work:

A Data Collab is formed for long-term drug discovery collaborations. It signs electronic contracts with each company, defining rights, responsibilities, and permitted use of data. It handles registration, onboarding, and compliance checks through standardized interfaces.

Electronic contracts set out the exact terms of collaboration — specifying each party’s role, the artefacts they contribute, and the rules that govern privacy, usage, and value-sharing.

Each company uploads its encrypted trial data or model into a Confidential Clean Room. Data inside the CCR is decrypted only after checks confirm that all security and compliance conditions are met.

Data is programmatically joined and enriched within the CCR, followed by AI model training using privacy-enhancing techniques like differential privacy, which appropriately bound the chance of re-identifying patients.

Only the final trained model and its accompanying logs — never the underlying data — leave the CCR. The model can be decrypted solely by the authorized data consumer(s) (i.e. the modellers), protecting their trade secrets.

Auditors can review logs and trace the provenance of all artefacts at any time — via the DEPA AI Chain — to verify compliance and resolve disputes.

This framework delivers several benefits for all concerned stakeholders:

  • For society: Promising treatments reach patients faster, while a reusable governance and technology blueprint emerges for future biomedical collaborations. 
  • For the economy: A new data-driven economy is unlocked, enabling novel business interactions and boosting meaningful economic activity.
  • For companies: They can innovate together without exposing trade secrets or breaking regulatory rules, expanding what’s possible in research and development.
  • For regulators and auditors: Every transaction leaves a verifiable trail, simplifying oversight and boosting trust in the ecosystem.

Summing up

India’s journey toward responsible data use has been progressive and layered.

  • It began with the Account Aggregator framework — making consent Open, Revocable, Granular, Auditable, Notifying and Secure (ORGANS principle).
  • For model training and analytics, Privacy-Enhancing Technologies (PETs) — such as Differential Privacy — introduce mechanisms like the privacy budget to safeguard individuals while enabling learning.
  • To make collaboration faster and more reliable, Electronic Contracts replace traditional paper/PDF agreements with machine-readable, enforceable commitments — cutting through the friction of slow legal processes.
  • Confidential Clean Rooms (CCRs) operationalize these safeguards — enabling computation on sensitive data.
  • Finally, Data Collaboratives weave all these elements together — creating institutional and economic frameworks that make responsible, long-term data collaboration practical and sustainable.

This is the next frontier of Digital Public Infrastructure for AI — proving that protection and innovation are not opposites. With the right frameworks, we can have both.

Read Part 1: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/

Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-1

This is a two-part blog series. The following is the first part.

Every day, we generate vast amounts of digital data — withdrawing cash, visiting doctors, ordering groceries, using various mobile apps. These data trails have the potential to streamline services, personalize experiences, and drive breakthroughs in fields from medicine to finance. Yet they also carry risks: unfair profiling, intrusive targeting, and exposure of sensitive personal information.

This presents a fundamental challenge: How can we harness the value of data while preserving individual privacy?

Understanding Privacy

In the age of AI, privacy violations no longer just expose personal information. They erode autonomy and tilt power toward those who control data and algorithms. As AI systems harvest behavioral cues, digital footprints, and social networks, people lose control, not just over their information, but also over how they are profiled and influenced. This enables subtle yet pervasive forms of coercion, from tailored manipulation of choices to algorithmic exclusion from opportunities.

At scale, such surveillance dynamics erode trust and weaken democratic agency. In this era, privacy is not merely about secrecy; it is a precondition for freedom, dignity and meaningful participation in society.

Privacy is often mistaken for confidentiality, but it’s not simply about hiding information. Privacy is the property of not being able to identify individuals from the signals they produce. Confidentiality, on the other hand, is about limiting access to those signals in the first place. To protect privacy and confidentiality while respecting individual autonomy, we need strong control mechanisms that let people decide what data is shared, with whom, for what purpose, and for how long.

And privacy isn’t a one-time setting. Data moves through a lifecycle — it is collected, used, stored, reused, and eventually deleted. These protections must hold at every stage, or they are lost.

The Mechanics of Consent

Today, consent remains the most common mechanism for privacy — the basic control primitive intended to let people decide how their data is collected, shared, and used. The concept of consent actually predates the digital era — it began in a paper-based world, where signatures and written permissions served as the primary means of authorizing data use. 

It is important to distinguish between two kinds of consent:

  1. Consent to collect data – allowing an entity to initially gather your data (for example, an app accessing your camera).
  2. Consent to share data – granting permission for that data to be used or passed on for a specific purpose (for example, a bank sharing your salary details with a loan underwriter).

Our focus in this article is on consent to share data, since that is where both the greatest privacy challenges and the most meaningful opportunities for value creation lie.

Here is the problem with how consent is implemented today. Under frameworks like GDPR, consent has been defined as a very coarse-grained and blunt artifact. The same entity collects your data, gathers your consent, and enforces the rules around its use. For individuals, this typically means an all-or-nothing choice — share everything or nothing at all. And for innovators, it stifles the ability to responsibly explore new uses of data.

India’s Innovation: Unbundling Consent

When India designed its Account Aggregator system for financial data sharing, it chose a different path. Consent to share data was unbundled into two parts:

  • Collect consent: Managed by trusted intermediaries called Account Aggregators.
  • Enforce consent: Managed downstream by Financial Information Users (like banks or wealth advisors), under ecosystem oversight.

(Image source: https://sahamati.org.in/what-is-account-aggregator/)

At the heart of this design lies a set of principles that make consent Open, Revocable, Granular, Auditable, Notifying, and Secure or ORGANS for short.

The Account Aggregator (AA) framework became the first manifestation of DEPA — the Data Empowerment and Protection Architecture. It is now India’s go-to model for user-consented data sharing between institutions, especially for straightforward data transfers and simple inference tasks.

Consent works well for inferences — one-time decisions like a bank checking your last six months of transactions to approve a loan. Yet, in practice, consent has well-known limits. People are asked to grant permission repeatedly, often through long, opaque terms they don’t fully understand, leading to consent fatigue and a loss of meaningful control.

These limitations become clearer when we move from individual decisions to model training and large-scale analytics, where algorithms learn patterns from millions of records. Seeking or managing consent at that scale is neither practical nor effective. 

What’s worse is that models can sometimes memorize sensitive data and inadvertently reveal it later. This highlights the need for new, complementary control primitives that uphold privacy and accountability even when explicit consent isn’t feasible.

Attempts at de-identification — the process of removing or masking identifiers to anonymize data — have significant limitations in practice. Although anonymization is meant to ensure that individuals cannot be re-identified, de-identification techniques are often reversible when datasets are combined with external information. As a result, such approaches offer only weak privacy guarantees, and numerous cases have shown how easily supposedly “anonymous” data can be linked back to individuals.

Privacy-preserving Algorithms: A New Control Primitive for Training and Analytics

To address these limits, a new class of algorithms has emerged under the broad umbrella of Privacy-Enhancing Technologies (PETs). Let us call these privacy-preserving algorithms, to differentiate them from other classes of PETs. They provide a spectrum of technical safeguards that preserve privacy while still enabling useful computation and collaboration on sensitive data.

Among these, Differential Privacy (DP), a mathematical framework for preserving individual privacy in datasets, stands out as a powerful privacy primitive for model training and data analysis.

The key idea: DP adds carefully calibrated noise to queries or model updates so that the results are statistically indistinguishable whether or not any single individual’s data is included. This ensures that nothing specific about an individual can be reliably inferred.

To make this guarantee rigorous, DP introduces the concept of a privacy budget (often represented by the parameters epsilon ε and delta δ):

  • Each query or training step “spends” some of this budget.
  • With more queries or training epochs, the cumulative privacy loss increases.
  • Once the budget is exhausted, no further queries or training is allowed, keeping the risk of re-identification mathematically bounded.

Think of this as a quantitative accounting system for privacy loss. Note, however, that DP comes with a utility tradeoff: adding calibrated noise can reduce model accuracy or data usefulness. Hence, depending on the use-case, the right privacy controls may be achieved through other privacy-preserving algorithms, or a combination thereof.
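
For intuition, here is a toy sketch of the classic Laplace mechanism with a naive epsilon ledger. Real deployments use far more careful accounting (for delta and for composition across queries), but the shape of the idea is the same.

```python
# Toy Laplace mechanism with a naive epsilon ledger (illustrative only;
# production systems account much more carefully for composition and delta).
import numpy as np

class PrivacyBudget:
    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted: query refused")
        self.remaining -= epsilon

def dp_count(data, predicate, epsilon, budget, sensitivity=1.0):
    """Differentially private count: true count + Laplace(sensitivity/epsilon) noise."""
    budget.spend(epsilon)
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(0.0, sensitivity / epsilon)

budget = PrivacyBudget(total_epsilon=1.0)
patients = [{"infected": True}] * 130 + [{"infected": False}] * 870
print(dp_count(patients, lambda r: r["infected"], epsilon=0.5, budget=budget))
print(dp_count(patients, lambda r: r["infected"], epsilon=0.5, budget=budget))
# A third epsilon-0.5 query would now be refused: the budget is spent.
```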

Electronic Contracts: Digitizing Trust

While privacy-preserving computation enables data to be used securely, participants still need clear agreements defining who may use it, for what purpose, or under what conditions. For such collaborations to function effectively, there must be a well-defined and enforceable contractual framework that specifies each party’s rights, obligations, and permissions.

The need for such a framework becomes even more pressing as organizations seek to unlock real value from data. No single dataset is enough; the most meaningful insights arise when information from multiple sources — hospitals, banks, labs, startups, or agencies — can be combined and analyzed responsibly. Yet each participant brings its own rules, contracts, and compliance obligations, creating a patchwork of agreements that are difficult to align.

Traditionally, contracts are legal documents — PDFs or paper agreements — written in human language, interpreted by lawyers, and enforced by institutions. They work well when a few parties are involved, but in modern data collaborations, this model quickly breaks down.

Today, every new collaboration means drafting, signing, and managing a maze of separate legal agreements, often in different formats, scattered across systems, and maintained by hand. With every participant added, the web of contracts grows bulkier, making coordination slow, expensive and error-prone. Every change or dispute requires human intervention and can take weeks or months to resolve.

This contractual friction has long been the viscous drag holding back scalable, compliant data collaboration. Not because trust is missing, but because it is buried under paperwork.

Electronic contracts transform this equation. They are machine-readable, digitally signed, and executable agreements that translate legal promises into enforceable code. Instead of being static documents, they are active digital objects that the DEPA orchestration layer can interpret and act upon — automatically initiating workflows, enforcing permissions, and ensuring compliance.

In effect, electronic contracts bridge law and computation.  They enable trust, automation, and accountability at digital speed, replacing manual paperwork with a system that can verify, execute, and audit commitments in real time.
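
As a toy illustration of what “contracts as executable code” can mean, consider a hypothetical orchestration-layer gate that refuses to start a workflow unless the signed contract permits it. The fields and checks below are invented for illustration and are not DEPA’s actual schema.

```python
# Hypothetical orchestration-layer gate: a workflow runs only if the signed
# contract permits it (fields and checks invented for illustration).
from datetime import date

contract = {
    "parties": ["pharma-a", "pharma-b", "collab-x"],
    "permitted_purpose": "treatment-response-model-training",
    "max_epsilon": 3.0,
    "valid_until": date(2026, 12, 31),
}

def authorize(request, contract):
    if request["party"] not in contract["parties"]:
        raise PermissionError("unknown party")
    if request["purpose"] != contract["permitted_purpose"]:
        raise PermissionError("purpose not permitted by contract")
    if request["epsilon"] > contract["max_epsilon"]:
        raise PermissionError("requested privacy budget exceeds contract")
    if date.today() > contract["valid_until"]:
        raise PermissionError("contract has expired")
    return True  # the workflow may now be initiated inside the CCR

authorize({"party": "pharma-a",
           "purpose": "treatment-response-model-training",
           "epsilon": 2.0}, contract)
```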

Confidential Clean Rooms (CCR)

To operationalize the above elements, we need infrastructure that embeds privacy and compliance mechanisms by design, while also supporting diverse collaboration modalities — from data analytics and model training to various forms of inference.

That’s where Confidential Clean Rooms (CCRs) come in. A CCR is a secure computing environment that allows organizations to collaborate on data without ever sharing it in plain form. You can think of it as a locked, monitored laboratory where data from multiple parties can be brought together for analysis — yet no participant, not even the operator of the lab, can peek inside.

At the heart of every CCR is Confidential Computing — a technology that uses Trusted Execution Environments (TEEs) built into modern processors.  When data enters a TEE, it is encrypted and isolated from the rest of the system, ensuring that even cloud providers or system administrators cannot access it. Computations run inside this protected enclave, and only verified results can leave. Each TEE also produces a cryptographic attestation, a proof that the computation was executed correctly and under the agreed conditions.

(Image source: https://depa.world/training/architecture)

On their own, CCRs provide secure execution. But when combined with two other DEPA primitives:

  1. Electronic Contracts, which specify who can use what data for what purpose, and
  2. Privacy-preserving algorithms, which provide mathematical controls about what information can or cannot leak,

they form a complete privacy-preserving data-sharing stack.

In essence, Confidential Clean Rooms (CCRs) enable confidential, techno-legal, and privacy-preserving computation on data. They make it possible to conduct large-scale data inference, analytics and modelling responsibly, without transferring raw data to any third party, thereby eliminating the need for consent specifically for data sharing.
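
To sketch the gating logic in miniature: a dataset decryption key is released into a CCR only after the enclave proves, via attestation, that it is running the approved pipeline and is bound to the agreed contract. The toy code below illustrates only the decision logic; real CCRs rely on hardware TEE attestation protocols that this does not implement.

```python
# Toy attestation-gated key release (decision logic only; real CCRs rely on
# hardware TEE attestation protocols that this sketch does not implement).
import hashlib

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-training-pipeline-v1").hexdigest()
CONTRACT_HASH = hashlib.sha256(b"signed-contract-bytes").hexdigest()

def release_data_key(attestation_report, key_vault):
    """Release the dataset key only to an enclave running the approved code
    and bound to the agreed contract."""
    if attestation_report["measurement"] != EXPECTED_MEASUREMENT:
        raise PermissionError("enclave is not running the approved pipeline")
    if attestation_report["contract_hash"] != CONTRACT_HASH:
        raise PermissionError("enclave is not bound to the agreed contract")
    return key_vault["dataset-key"]

report = {"measurement": EXPECTED_MEASUREMENT, "contract_hash": CONTRACT_HASH}
key = release_data_key(report, {"dataset-key": b"\x00" * 32})  # toy key material
```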

But technology alone doesn’t build ecosystems. Who brings this framework to life, abstracting away its complexity for everyday organizations? How might it help us confront our most urgent global challenges — in health, climate and finance? And how could it unlock entirely new kinds of enterprises, fueling a vibrant and responsible data economy for the Intelligence Age?

Data Collabs!

Read Part 2: Privacy in the Age of AI: New Frameworks for Data Collaboration-Part-2

Please note: The blog post is authored by our volunteers, Hari Subramanian and Sarang Galada

For more information, please visit: https://depa.world/

DEPA AI Chain: Empowerment Through Provenance

The DEPA AI Chain is central to operationalising data sharing for AI development and runtime use, while preserving privacy and maintaining verifiable provenance across the entire AI lifecycle — spanning dataset creation and licensing through training, release, inference, and content distribution. Risks and returns are managed through contracts and programmable controls; oversight is delivered via transparency logs and lightweight audits by a self-regulatory organisation (SRO), yielding an efficient and effective supervisory mechanism.

1.0 Unpacking Provenance

Provenance, in digital systems, refers to the systematic tracking of the origin of data and the complete history of the transformations and processes it undergoes throughout its lifecycle. It captures metadata about where the data came from, how it was created, and how it has been modified, combined, or interpreted over time.

Data provenance plays a critical role across a wide range of applications and scenarios. It is essential for ensuring the reproducibility of scientific experiments and computational workflows, enabling others to independently validate results. It supports fault diagnosis and fault tolerance by providing a traceable record that helps isolate and correct errors in complex systems. Provenance is also key to explainability (though the two are distinct), as it clarifies how specific outcomes or decisions were derived, particularly in contexts such as AI and automated decision-making. In addition, provenance provides vital support for forensic investigations and auditing, where establishing the trustworthiness and integrity of data is crucial for compliance, accountability, and legal defensibility. By making the history of data transparent and verifiable, provenance serves as a foundational element of trustworthy digital systems.

In the context of personal data sharing, consent without provenance is an unauditable promise. There is a need to include a machine-readable trail linking consent or data protection compliance (the promise) to verifiable facts. 

The concept of provenance is increasingly critical in the context of modern AI systems, which are pervasive across numerous domains. In such systems — often characterised by Markovian or black-box behaviours — establishing clear causal relationships between inputs and outputs is inherently challenging. The opacity of many AI models, particularly deep learning models, makes it difficult to trace how specific outcomes arise, raising significant concerns around trust, accountability, and reproducibility.

Although parallel efforts exist under the banners of Explainable AI (XAI) and Trustworthy AI (TAI), provenance offers a complementary and, in many cases, more scalable and cost-effective approach to enhancing transparency. When thoughtfully designed and integrated into AI pipelines, provenance can provide a systematic, audit-friendly mechanism to capture the lineage and transformations of data and models, often with fewer assumptions than model-specific explainability techniques.

At its core, provenance in AI systems addresses concerns such as: (i) authenticity (of data and its origins), (ii) ownership, (iii) traceability, and (iv) (approximate) reproducibility. In contrast, frameworks such as TAI tend to emphasise aspects including (i) accuracy, (ii) fairness, (iii) explainability, and (iv) safety.

Yet, even with these clear distinctions, provenance is sometimes misframed in policy discussions. Treating any and all provenance artefacts as something that inevitably leads to identity disclosure is an error, one that conflates transparency with surveillance or identity tracking. As critics often put it in “Road to Perdition” terms, unfettered access to provenance data may indeed pose risks — but such access is not meant to be unfettered. It must come with safeguards, constrained by law and subject to due oversight. Framing the choice as either no provenance or dystopia ignores both context and the inevitability of provenance as part of the solution. Even references to Puttaswamy’s judgement, frequently invoked in this debate, are incomplete if not situated within its broader framework of proportionality and legitimate state aim. After all, without engaging with principles such as purpose limitation, retention bounds, or penalties for misuse, how else are systems meant to achieve reliability and harm reduction at scale? The answer lies not in abandoning provenance, but in advancing privacy-preserving provenance — mechanisms that preserve accountability and auditability without compromising individual rights.

1.1 Promise and Potential of AI Chain

The AI Chain is fundamentally a mechanism for capturing the lineage and transformations of data and models in a systematic, effective way, offering a complementary approach to XAI. The AI Chain promises to meet the following requirements:

  • Lineage: Lineage captures the complete journey of data and AI outputs—from consent and licensing, through training, to distribution—ensuring traceability, authenticity, and near-precise reproducibility of AI outcomes. It provides a granular record by assigning unique IDs to datasets and linking a Data Principal’s ID to their data and consent artefact, documenting how data is introduced, modified, combined, and interpreted. To preserve privacy, lineage can be applied to metadata rather than raw data. Cryptographic mechanisms such as hash chains and Merkle trees secure the integrity of the entire lineage.

  • Effective Verification and Its Impact on Liability Allocation: Verifiers can check provenance artefacts—including signatures, attestations, and log proofs—at scale. This may assist in liability and accountability allocation, since the responsibilities of Training Data Providers, Training Data Consumers, publishers, and platforms are clearly stated through policies and contracts, and their actions are immutably recorded in provenance artefacts.

Finally, this approach has second-order effects on data quality: established provenance artefacts increase the value of well-curated datasets.
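
As a small illustration of how Merkle trees can secure lineage, the sketch below hashes provenance records into a tree, publishes only the root, and later proves that a given record is included. This is the textbook construction, not the AI Chain’s actual design.

```python
# Toy Merkle tree over provenance records: publish only the root, then prove
# that a given record is included (textbook construction, not the AI Chain).
import hashlib, json

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    proof, level = [], [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # sibling + our side (0 = left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, we_are_right in proof:
        node = h(sibling + node) if we_are_right else h(node + sibling)
    return node == root

records = [json.dumps(r).encode() for r in [
    {"dataset": "d1", "consent": "c1", "op": "ingest"},
    {"dataset": "d1", "op": "join", "with": "d2"},
    {"model": "m1", "op": "train", "inputs": ["d1", "d2"]},
]]
root = merkle_root(records)
assert verify(records[2], inclusion_proof(records, 2), root)  # training step proven
```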

1.2 What AI Chain Is Not Intended to Do

  • Truthfulness or correctness guarantees: The chain reveals who, what, when, and how a piece of content was created or modified—but it cannot confirm whether the content depicts reality.
  • Bias/fairness or safety adjudication: The chain records facts; value judgements belong to governance, post-facto audits, and external assessments.
  • Enforcement on off-chain actors: Entities falling outside the chain are not snapshotted and can ignore the guardrails.
  • Eliminate the need for legal process: The chain provides strong factual and indisputable evidence, not automatic verdicts.

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by Subodh Sharma, with inputs from Sunu Engineer and Raj Shekhar, all volunteers with iSPIRT.

FAQs and Facts on Techno-Legal Regulation 2.0

This blog continues our discussion on the techno-legal regulation of artificial intelligence (AI), building on our original post from 03.09.25—with a focus on key outstanding issues that required in-depth consideration, alongside the responses and questions we received from stakeholders as of 12.09.25.

Question 1: Since technology is constantly evolving, wouldn’t relying on technology to enable regulation be a flawed approach?

No—what would be flawed is mandating the use of specific technologies for regulation. In fast-evolving domains like AI, rigid technological mandates risk becoming obsolete within a short time—both stifling innovation and undermining public safety. A fundamental insight from systems theory reinforces this: to regulate or control a system that operates at speed x, the regulatory system itself must react and adapt at comparable or greater speed.

AI is evolving at breakneck speed and our understanding of the associated risks and failure pathways remains incomplete. This inherent uncertainty calls for a regulatory framework that is both flexible and adaptive. The most effective way to achieve this is by combining technological agility with failure-related metrics, all governed under lightweight legal constraints and conditions. The techno-legal approach is designed precisely for this: it sets clear outcome-focused obligations for system developers and operators, without prescribing rigid technical solutions, while promoting continuous system monitoring and adaptability to emerging risks.

For example, instead of mandating a particular technique for privacy preservation in AI training, policymakers under the techno-legal approach mandate only the regulatory outcome—i.e., privacy preservation—allowing developers to implement the latest techniques, such as differential privacy or federated learning, to achieve it. As a result, regulation remains effective and adaptive in the face of advancing technology and emerging risks.

Question 2: Isn’t a techno-legal approach most suitable when the subject of regulation is clearly defined? If so, doesn’t AI’s rapidly evolving and non-deterministic nature make it a poor candidate for such regulation?

A precise definition of the regulatory subject is essential for traditional command-and-control regulation. This model relies on ex ante identification and enumeration of risks and corresponding mitigation measures, typically framed as detailed, positive obligations that regulatees must follow. Without a clear regulatory subject, risk assessments can be inaccurate, leading to over-regulation in some areas and under-regulation in others. Given AI’s rapidly evolving and non-deterministic nature, it is ill-suited for such rigid regulation.

In contrast, a techno-legal approach focuses on defining the regulatory outcome, rather than the precise subject of regulation. The regulator requires that the outcome—such as privacy preservation in AI training—be embedded into the technical design of any system that could affect it, without prescribing specific methods to achieve compliance. This removes the need for exhaustive risk enumeration upfront and avoids the pitfalls of narrowly defining the regulatory subject. By focusing on outcomes rather than rigid processes, techno-legal regulation enables continuous adaptability, making it uniquely well-suited to govern AI systems that are non-deterministic and continuously evolving in capability and complexity.

For example, Musical AI’s Rights Management Platform is a techno-legal solution that embeds the regulatory objective of copyright protection directly into the AI model development process. The platform achieves this by restricting training of music generation models to licensed content and integrating attribution technology that logs each output, linking it to the original artist or song. This ensures seamless copyright enforcement and fair revenue sharing. Crucially, the focus remains exclusively on the outcome, i.e. safeguarding creators’ exclusive rights over the use and distribution of their works, as mandated by copyright laws globally. For such a techno-legal solution to function, the regulator need not define specific AI model types for music generation as the regulatory subject, nor prescribe a particular rights management platform as a compliance mandate. Instead, technologists and companies remain free to innovate in AI music generation, applying any method or architecture they choose—as long as the regulatory outcome of effective copyright protection is achieved.

Question 3: How can techno-legal regulation be designed to avoid becoming redundant or leading to unintended or undesirable consequences?

Techno-legal approaches are intended to tackle the very problem of redundancy in AI regulation, setting clear, outcome-focused obligations for system developers and operators while enabling continuous monitoring and adaptability to emerging risks (as explained in response to Question 1 above).

That said, in addition to having clearly defined regulatory outcomes, techno-legal regulation depends on two key conditions to remain effective and adaptive, ensuring it does not ironically render itself redundant. First, the efficacy of any techno-legal solution must be assessed using well-defined metrics to track its progress toward the regulatory objective. Where direct measurement is impractical, appropriate proxy indicators can be used. Importantly, these metrics should be subject to regular review, ensuring they stay relevant and responsive to emerging externalities and shifts in the operating environment. Second, the techno-legal solution should undergo regular audits to verify its effectiveness and continued alignment with the regulatory objective. This ensures that the system continues to function as intended. When designed with clear objectives, measurable metrics, and periodic auditing, techno-legal regulation remains robust, avoiding potential redundancy and the risk of unintended or undesirable consequences.

Question 4: Wouldn’t the AI Chain architecture under DEPA 2.0 restrict the diversity of relationships in the value chain, thereby limiting novel pathways for innovation?

On the contrary, the AI Chain architecture is specifically designed to enable the broadest diversity of relationships in the AI value chain. Its open, modular design and transparent accountability mechanisms allow various actors—including developers, data providers, service operators, and others—to collaborate with trust and innovate without rigid barriers. This flexibility, in turn, fosters the emergence of novel and unexpected pathways for value creation.

Question 5: Can the allocation of liability—an inherently nuanced area of jurisprudence that has evolved over centuries—be effectively codified into a technology framework?

The allocation of liability, grounded in centuries of jurisprudence, becomes particularly complex when applied to AI. While techno-legal approaches may not be suited to directly assign liability and enforce penalties for AI harms on their own, they could certainly provide valuable tools to help navigate this complexity. For example, the AI Chain architecture under DEPA 2.0 leverages distributed ledger technology to provide end-to-end tracking of system activities and participant actions at a fine-grained level—capturing who performed which action, when, and using which model or dataset, with precise timestamps. Cryptographic proofs such as Merkle trees ensure that every step is irrefutably recorded and immutable. These detailed traces create a tamper-proof, transparent record of events, which auditors, courts, and regulators can use to reconstruct the sequence of actions leading to an AI-related harm.

The technological observability and causal traceability enabled by the architecture could incentivise good behaviour among ecosystem actors, reduce ambiguity in legal and adjudicatory processes, and support the development of robust AI liability jurisprudence—making liability allocation for AI harms streamlined, scalable, transparent, and fair.

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by Raj Shekhar, with inputs from Sunu Engineer and review by Subodh Sharma, all volunteers with iSPIRT.

FAQs and Facts on Techno-Legal Regulation

This blog is an invitation to advance public discourse on techno-legal regulation of artificial intelligence (AI). It builds on an article by Rahul Matthan (15 January 2025), in which he raised reservations about applying techno-legal regulation to AI governance and expressed concerns about the practicability of techno-legal artefacts (particularly their ability to establish liability chains among ecosystem actors) as a tool for enforcing good behaviour and ensuring accountability for AI harms. Through a Q&A format, this blog addresses those reservations and concerns directly, while explaining why techno-legal regulation is not only feasible but also the only practicable and scalable way to regulate AI effectively.

Techno-legal regulation isn’t a monolithic concept; it can assume multiple implementations for different problems. DEPA Training embeds privacy and sovereignty requirements directly into AI training pipelines through confidential clean rooms and differential privacy. DEPA Inference creates consent-based data sharing. The proposed AI Chain architecture would establish liability tracking through distributed ledgers. Each solves a different problem using the same core principle: making regulatory compliance systematically enforced rather than legally suggested.

The confusion arises because people conflate these distinct systems. DEPA Training ensures AI models can be trained through data collaboration without exposing raw data; privacy budgets prevent individual contributions from being traced. DEPA Inference ensures PII-based data can’t be accessed without consent, because the cryptographic handshake fails without a valid consent artifact. AI Chain would ensure accountability can’t be avoided, because every inference generates a log trace. Three different problems, three different techno-legal solutions, one underlying philosophy: architecture enforces what law requires.

Moreover, the objection that tools alone don’t meet the bar of techno-legal regulation misses the point: that is precisely why one would craft techno-legal documents that accept technology substrates as key ideas, accepted and acknowledged as mechanisable, in order to meet certain key properties and invariants in the real world. Tools are just instances of realising these mechanisable properties and invariants. For instance, can policy be expressed as attestable and executable code? Why not? Policy is a set of unambiguous rules, and so long as those rules are unambiguous and computable, they are automatable. If exceptions to a rule exist, then they must also be documented.

There is a general worry that introducing identities into AI systems will erode privacy. From a computer-systems standpoint, that conclusion doesn’t follow. What matters is how identifiers are created and managed and what is recorded. With pairwise (service-scoped) identifiers, selective disclosure, and tamper-evident logging of metadata (not payloads), systems can offer accountability and simultaneously uphold Privacy by Design (PbD). These are not speculative ideas: the web and major identity programs already run variants at scale.

OpenID Connect has long supported pairwise subject identifiers, which purposely give each relying party a different, opaque value, curbing cross-service linkability. Aadhaar’s Virtual ID (VID) and UID tokenization make the same design choice in India: a revocable, tokenized identifier is presented instead of the Aadhaar number, and per-agency tokens prevent easy correlation across services while remaining auditable. In both cases, the principle is the same—identity is scoped to a context.
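
For intuition, a common construction derives each pairwise identifier with a keyed hash over the user and the relying party, so the same user yields unlinkable pseudonyms across services. The toy sketch below illustrates the idea; it is not OpenID Connect’s or Aadhaar’s actual algorithm.

```python
# Toy pairwise (service-scoped) identifier: each relying party sees a different,
# unlinkable pseudonym for the same user (not OIDC's or Aadhaar's actual scheme).
import hmac, hashlib

SECRET_SALT = b"issuer-held secret"  # known only to the identity provider

def pairwise_id(user_id: str, relying_party: str) -> str:
    msg = f"{user_id}|{relying_party}".encode()
    return hmac.new(SECRET_SALT, msg, hashlib.sha256).hexdigest()[:16]

print(pairwise_id("user-42", "bank.example"))   # one pseudonym for the bank...
print(pairwise_id("user-42", "telco.example"))  # ...a different one for the telco
```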

On the web, the W3C Verifiable Credentials (VC) 2.0 model and cryptographic suites such as BBS+ allow a holder to prove only the claims that are necessary (for example, “over 18”) while withholding the rest; the SD-JWT work in the IETF ecosystem supports similar selective-disclosure for JWTs (JSON Web Tokens). The direction of travel — both in standards and deployments — is to treat “need-to-know” as a first-class property.

Every time a browser trusts a public TLS certificate, it relies on Certificate Transparency (CT) — append-only Merkle-tree logs with efficient inclusion and consistency proofs—to keep Certificate Authorities honest. Chrome and Apple have required CT for certificates issued after 2018. Therein lies a lesson for AI: append-only, publicly auditable logs are one mature way to record event receipts without exposing content.

PbD’s “positive-sum” stance is compatible with a metadata-only accountability layer. Instead of retaining prompts, outputs, or personal payloads, systems can emit signed, append-only receipts that capture who/what/which/when: a scoped user identifier, model and dataset versions, operation type (e.g., generate/transform/moderate), timestamp, and the responsible (but not necessarily trusted) operator or process. Auditors later verify that events occurred and in which order via Merkle proofs; when a lawful process requires more detail, selective-disclosure credentials release the minimum necessary information. This is the same architectural separation that keeps web PKI and identity wallets both auditable and privacy-preserving.
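
A minimal sketch of such receipts appears below: each entry records who/what/which/when, chains to the previous entry for tamper evidence, and carries an HMAC as a stand-in for a real digital signature and CT-style log proofs. All names are illustrative.

```python
# Minimal metadata-only receipt log: who/what/which/when, hash-chained for
# tamper evidence; the HMAC stands in for a real digital signature (illustrative).
import hashlib, hmac, json, time

OPERATOR_KEY = b"operator signing key"
log = []

def emit_receipt(subject_id, operation, model_version, dataset_version):
    body = {
        "who": subject_id,    # a scoped identifier, never a raw identity
        "what": operation,    # e.g. "generate" / "transform" / "moderate"
        "which": {"model": model_version, "dataset": dataset_version},
        "when": time.time(),
        "prev": log[-1]["hash"] if log else "0" * 64,  # append-only chaining
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["hash"] = hashlib.sha256(payload).hexdigest()
    body["sig"] = hmac.new(OPERATOR_KEY, payload, hashlib.sha256).hexdigest()
    log.append(body)
    return body

emit_receipt("pairwise-ab12", "generate", "model-v3", "dataset-v7")
emit_receipt("pairwise-ab12", "moderate", "model-v3", "dataset-v7")
# An auditor can recompute the hashes along the chain to verify order and
# integrity without ever seeing prompts, outputs, or personal payloads.
```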

When we track things securely, we do not create a surveillance state. We create a modelable, measurable, manageable state. When tracking data is misused by parties in power, or by parties with the power to access the data while bypassing access checks, then they have the ability to create a surveillance state or cause damage. DEPA liability chains are designed to establish the connections between different parts of the data-economy ecosystem while using strong cryptographic techniques to detect and protect against unauthorised access.

Traceability and agency/activity chains are needed to construct the data economy ecosystem robustly.

India needs techno-legal regulation because we can’t afford not to have it. We don’t have thousands of judges to adjudicate AI harm. We don’t have armies of auditors to verify compliance. We have scale challenges the West doesn’t face: governing AI for 1.4 billion people requires architectural enforcement. We need to protect our people and enable our innovators.

The question isn’t whether we need techno-legal regulation; it’s whether we’re honest about what happens without it. Without DEPA Training’s cryptographic enforcement, AI systems will train on unauthorized data because detection is impossible at scale. Without immutable audit trails, companies will claim compliance while violating every principle, because verification requires resources we don’t have. Without architectural enforcement, the most vulnerable Indians (those who can’t afford lawyers, don’t understand technology, and can’t navigate bureaucracy) will be harmed first and most.

The AI space is an unknown space. To define legal regulation in a space, we need to be able to enumerate (exhaustively, if possible) all the failure modes in the system and then frame the regulations to prevent, detect, curtail the impact of, and correct after the event. When we know the details, we can compute the legal implications and consequences and define a legal regulation (80 percent) supported by technology (20 percent). When we are dealing with an unknown space, unknown in the sense that the failure modes are not enumerable, we can do techno-legal regulation in an evolutionary manner (even more so when the activity is distributed in space and time and occurring at high frequency). Here we start with a base implementation and evolve it based on the discovery of failure modes. We can argue that such an evolutionary approach to creating regulation that not only protects but also fosters growth needs to be implemented on a technology substrate (80 percent tech, 20 percent human). Otherwise the evolution will be very slow and the regulation will be out of sync with market needs.

True, current technologies may not be able to solve use limitation and/or data minimisation in the world of AI ex-ante; however, the question should be whether we can construct testable tech mechanisms to check violations of these requirements ex-post. I believe that is certainly possible — challenging but doable.

DEPA does solve for this indirectly. Retention restrictions, usage limitation, and data minimization all require a deep understanding of how and where data is being used. DEPA chains track and trace data use and provide this information, which enables the DEPA framework itself to implement and enforce these and other constraints and conditions on data use. Without a technology framework to do this, many violations of these conditions would likely never come to light. The more complex the regulations get, the more technologically advanced and evolutionary the substrate needs to be.

We’re not encoding Platonic ideals of fairness; we’re implementing specific, measurable requirements that regulators and courts have already defined. DEPA Training’s architecture can use techno-legal solutions to enforce fairness principles. It may work like this: when a dataset enters the clean room, the system automatically computes demographic distributions and compares them against regulatory baselines. If biases are detected, appropriate remedial measures are applied.
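
A toy sketch of what such a clean-room check might look like follows; the baseline distribution and tolerance here are invented purely for illustration.

```python
# Toy bias gate: compare a dataset's demographic distribution against a
# regulatory baseline before training proceeds (baseline and tolerance invented).
import pandas as pd

BASELINE = {"female": 0.48, "male": 0.48, "other": 0.04}  # hypothetical baseline
MAX_DEVIATION = 0.10                                      # hypothetical tolerance

def check_demographics(df: pd.DataFrame, column: str = "gender"):
    observed = df[column].value_counts(normalize=True).to_dict()
    for group, expected in BASELINE.items():
        deviation = abs(observed.get(group, 0.0) - expected)
        if deviation > MAX_DEVIATION:
            raise ValueError(f"{group}: deviation {deviation:.2f} exceeds tolerance;"
                             " remedial measures (e.g. reweighting) required")
    return "dataset passes the demographic baseline check"

df = pd.DataFrame({"gender": ["female"] * 470 + ["male"] * 490 + ["other"] * 40})
print(check_demographics(df))
```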

We welcome feedback and suggestions from all stakeholders at [email protected]

Please note: The blog post is authored by our volunteers, Sunu Engineer, Subodh Sharma, Raj Shekhar and Harshit Kacholiya

As the AI race across the world heats up, a post: “India doesn’t wish to be just a trade colony of China or technology colony of the US”

To succeed at AI, we need a whole-of-nation approach involving deep-tech startups, enabling industrial policy and pre-commercial publicly-funded research.

When the Biden Administration released its AI Diffusion Executive Order a few weeks back, restricting the export of GPUs to other countries, it became clear that having strategic autonomy in AI was of paramount importance to India.

Just being the use-case capital for AI wasn’t the right way to go.

India doesn’t wish to be a trade colony of China or the technology colony of the US.

What makes AI different is that it needs a whole-of-nation approach. To win at AI we need deep-tech startups, enabling industrial policy and pre-commercial publicly-funded research. It is only when all three come together that magic can happen.

Our resistance to the whole-of-nation approach is understandable. After all, our IT Services and SaaS industry came up without the whole-of-nation approach. So, many people thought that the same playbook would apply to AI.

China has proved with DeepSeek’s R1 and Moonshot AI’s Kimi k1.5 that a whole-of-nation approach can have big payoffs. In India, this approach has worked for cryogenic engines, 4G/5G telecom equipment and India Stack. We do remarkable things when we set our mind to it!

Yes, we have lost some time due to the use-case capital camp. But all is not lost. The field is still young and many areas like neurosymbolic AI are very much open.

The Biden AI Diffusion order and Chinese success have given new vigour to the whole-of-nation camp within government, the private sector and civil society. The debate is now over: you will see good developments become visible in the coming months. #AI #StrategicAutonomy

Also see: https://www.moneycontrol.com/technology/deepseek-s-llm-success-triggers-big-debate-is-india-s-hesitation-a-strategic-mistake-article-12921811.html

Open House on DPI for AI #4: Why India is best suited to be the breeding ground for AI innovation!

This is the fourth blog in a series describing the importance of DPI for AI, a privacy-preserving techno-legal framework for AI data collaboration. Readers are encouraged to first go over the earlier blogs for better understanding and continuity.

We are at the cusp of history with regard to how AI advancements are unfolding and the potential to build a man-machine society of the future economically, socially, and politically. There is a great opportunity to understand and deliver on potentially breakthrough business and societal use cases while developing and advancing foundational capabilities that can adapt to new ideas and challenges in the future. The major startups in Silicon Valley and the big tech companies are focused first on bringing the advancements of AI to first-world problems, optimized and trained for their contexts. However, we know that the first world’s solutions may not work in the diverse and unstructured contexts of the rest of the world, and perhaps not even for all sections of the developed world.

Let’s address the elephant in the room: what are the critical ingredients an AI ecosystem needs to succeed? Data, an enabling regulatory framework, talent, compute, capital, and a large market. In this open house, we make the case that India excels in all these dimensions, making it a no-brainer for investors, researchers, AI startups, and product companies alike to come and build in India for their own success.

India has one of the most vibrant, diverse, and eager markets in the world, making it a treasure chest of diverse data at scale, which is vital for AI models. While much of this data happens to be proprietary, the DPI for AI data collaboration framework makes it available in an easy and privacy-preserving way to innovators in India. Literally no other country has such a scale and game plan for training data. One may ask: diversity and scale are indeed India’s strengths, but where is the data? Isn’t most of our data with US-based platforms? In this context, there are three types of data:

a. Public Data,
b. Non-Personal Data (NPD), and
c. Proprietary Datasets.

Let's look at health. India has far more proprietary datasets than the US; they are just frozen in the current setup. Unfreezing them will give us a play in AI. This is exactly what DPI for AI is doing, in a privacy-preserving manner. In the US, health data platforms like those of Apple and Google are entering into agreements with big hospital chains to supplement the user health data that comes from wearables. How do we better that? Theirs is a US Big Tech-oriented approach, not an ecosystem approach. Democratic unfreezing of the health data held by hospitals is the key today, and DPI for AI would do that for all: small or big, developers or researchers! We have continental-scale data with more diversity than any other nation. We need a unique way to unlock it for the entire ecosystem, not just big corporations. If we can do that, and we think we can via DPI for AI, we will have AI winners from India.

Combine this with India's forward-looking regulatory thought process, which balances Regulation for AI and Regulation of AI in a unique way that encourages innovation without compromising on individual privacy or ignoring the technology's other potential harms. The diversity and scale of the Indian market act as a forcing function for innovators to think about robustness, safety, and efficiency from the very start, which is critical for AI innovations to actually deliver financial and societal benefits at scale. Engineers and scientists of Indian origin are prominent both in creating AI models and in developing innovative applications around them; given our demographic dividend, this will remain one of our strengths for decades to come. Capital and compute are clearly not our strong points, but capital follows opportunity. Given India's position of strength on data, regulation, market, and talent, capital is finding its way to India!

So, what are you waiting for? India welcomes you with continental-scale data, a lightweight but safe regulatory regime, and talent like nowhere else: come build, invest, and innovate in India. India has done it before in various sectors, and it is strongly positioned to do it again in AI. Let's do this together. We are just getting started and, as always, are eager for your feedback, suggestions, and participation in this journey!

Please share your feedback here
For more information, please visit depa.world

Please note: This blog post was authored by our volunteers Sharad Sharma, Gaurav Aggarwal, Umakant Soni, and Sunu Engineer

Ready for India’s AI ambitions: We are now one step closer to having a modern regulation for and of AI

The passage of the Digital Personal Data Protection Bill 2023 (DPDP) by the Lok Sabha is significant in more ways than one. The Bill aims to enforce and promote lawful usage of digital personal data and stipulates how organisations and individuals should navigate privacy rights and handle personal data.

Creating effective mechanisms to enable data governance has become one of the top priorities for countries around the world. The challenge for policymakers is designing legal and regulatory frameworks that clearly lay down the rights of data principals and obligations for data fiduciaries.

The Digital Personal Data Protection Bill is a much-needed step in this direction, taken after months of deliberations and discussions. Such normative frameworks are critical to securing regulatory certainty for enterprises. However, innovative technical measures are required to support their operationalisation.

In the past couple of years, India has made significant strides in adopting a techno-legal approach to data governance. Through this approach, India is building technical infrastructure for authorising access to datasets that embeds privacy and security principles in its design.

Data also lies at the heart of AI innovations that can address significant global challenges. India's unique techno-legal approach to data governance is applicable across the life cycle of machine learning systems. It complements the country's ambition of supporting its growing AI start-up ecosystem while providing privacy guarantees.

As part of India Stack, the launch of the Data Empowerment and Protection Architecture (DEPA) in 2017 was India's paradigm-defining moment for the inference cycle of the machine learning life cycle. It proposed the setting up of Consent Managers (CMs), also known as Account Aggregators in the financial sector.

This approach, also reflected in the current iteration of the DPDP (Chapter 2, Sections 7-9), ensures individuals can exercise control over their data and can provide revocable, granular, auditable, and secure consent for every piece of data through standard Application Programming Interfaces (APIs). The secured consent artefact records an individual's consent for the stated purpose.
It allows users to transfer their data from the data businesses that hold it to those that need it to provide individuals certain services, while ensuring purpose limitation. For instance, individuals can share financial data residing with their banks with potential loan service providers to get the best loan package.
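To make this concrete, here is a minimal sketch of what such a consent artefact might contain. The field names and structure are illustrative assumptions for this post, not the actual DEPA or Account Aggregator schema:

```python
# A minimal, illustrative consent artefact as a Python dict.
# All field names below are hypothetical, chosen to mirror the
# properties described above: granular, purpose-limited, time-bound,
# revocable, and auditable.
import json
import uuid
from datetime import datetime, timedelta, timezone

consent_artefact = {
    "id": str(uuid.uuid4()),                    # unique, auditable reference
    "data_principal": "user@aa-handle",         # the individual granting consent
    "data_provider": "bank-fip-001",            # who holds the data (e.g. a bank)
    "data_consumer": "loan-provider-042",       # who receives it
    "purpose": "loan-underwriting",             # purpose limitation
    "data_types": ["bank-statement"],           # granular scope
    "valid_from": datetime.now(timezone.utc).isoformat(),
    "valid_until": (datetime.now(timezone.utc) + timedelta(days=30)).isoformat(),
    "revocable": True,                          # the principal can withdraw consent
}

# In practice the artefact would be digitally signed and logged, so every
# data flow can be traced back to an explicit, revocable grant of consent.
print(json.dumps(consent_artefact, indent=2))
```

The key design point is that every flow of data carries this machine-readable record with it, so purpose limitation and revocation can be enforced in code rather than by policy alone.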

DEPA is India's attempt at securing a consent-based data-sharing framework, and it has facilitated the financial inclusion of millions of citizens. Eight of India's largest banks were early adopters of the framework, starting in 2021. Currently, 415 entities, including CMs, Financial Information Providers, and Financial Information Users, participate across various DEPA implementation stages.

However, the training cycle of an AI model demands substantially more data than the inference cycle in order to make accurate predictions. As such, there is a need for more such robust technical solutions that disrupt data silos and connect data providers with model developers, while providing privacy and security guarantees to the individuals who are the real owners of the data.

With DEPA 2.0, India is already experimenting with a solution inspired by confidential computing: Confidential Clean Rooms, or CCRs. CCRs are hardware-protected secure computing environments where sensitive data can be accessed in an algorithmically controlled manner for model training.

These algorithms create an environment where data can be used while privacy and security guarantees for citizens are upheld and data does not change hands. Techniques like differential privacy introduce controlled noise into the training process, making it harder to identify individuals or extract sensitive information.
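To illustrate the idea, here is a minimal sketch of differentially private gradient aggregation in the style of DP-SGD. The function name, clipping norm, and noise multiplier are illustrative assumptions, not parameters prescribed by DEPA:

```python
# A sketch of the differential-privacy idea described above: bound each
# individual's contribution to the gradient, then add calibrated Gaussian
# noise before the model update.
import numpy as np

def noisy_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                   rng=np.random.default_rng(0)):
    # Clip each example's gradient so no single record dominates the update.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    # Gaussian noise scaled to the clipping norm; a higher noise_multiplier
    # means stronger privacy at the cost of noisier training.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Two toy per-example gradients for a 2-parameter model.
grads = [np.array([0.5, -2.0]), np.array([3.0, 1.0])]
print(noisy_gradient(grads))
```

Because the noise is calibrated to the maximum influence any one record can have, an observer of the trained model learns little about any single individual in the training set.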

To make CCRs work, model certifications and e-contracts are essential. The model sent to a CCR for training has to be certified to ensure it upholds privacy and security guidelines, and the e-contracts facilitate authorised and auditable access to datasets. For example, loan providers can authorise model developers to access a representative sample of the datasets residing with them, via a CCR, for model training. This arrangement is facilitated via e-contracts once the CCR verifies the validity of the model certification provided by the modeller.
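As a rough illustration of this gating logic, here is a hedged sketch of how a CCR might admit a training job. The function and field names are hypothetical; a real CCR would rely on hardware attestation and cryptographic signature verification rather than the simple dictionary checks shown here:

```python
# Illustrative admission check for a CCR training job: certification first,
# then contract-based purpose limitation, and only then access to data.
def admit_training_job(model_certificate, e_contract, requested_purpose):
    # 1. The CCR first verifies the model's certification.
    if not model_certificate.get("valid", False):
        raise PermissionError("model certification invalid or revoked")
    # 2. The e-contract must authorise this purpose (purpose limitation).
    if requested_purpose not in e_contract.get("permitted_purposes", []):
        raise PermissionError("purpose not permitted by e-contract")
    # 3. Only then is the dataset sample unlocked inside the clean room;
    #    raw data never leaves, only the trained model artefact does.
    return {"admitted": True, "contract_id": e_contract["id"]}

# Example: a loan provider's dataset authorised for credit-risk training only.
contract = {"id": "ec-2023-017", "permitted_purposes": ["credit-risk-training"]}
certificate = {"valid": True}
print(admit_training_job(certificate, contract, "credit-risk-training"))
```

The ordering matters: certification and contract checks both pass before any data is touched, which is what makes the access auditable end to end.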

India's significant progress with technical measures aligned with domestic legal frameworks gives it a head start in the AI innovation landscape. Countries across the globe are struggling to find solutions that facilitate personal data sharing for model development while prioritising security and privacy. Multiple lawsuits have recently been filed against OpenAI across numerous jurisdictions for unlawfully using personal data to train its models.

India's unique approach to data governance, where technical and legal frameworks fit together like puzzle pieces and walk the thin line between promoting AI innovation and providing privacy guarantees, is well-positioned to guide global approaches to data governance.

In a quiet and disciplined fashion, over the last six years, India has put the critical techno-legal pieces in place to become a significant AI player in the world alongside the US and China. Like them, we have continental-scale data and the talent to shape our future. With the passage of the DPDP Bill, we are now one step closer to having modern regulatory tools for effective regulation of AI and regulation for AI.

Co-Authored by Antara Vats and Sharad Sharma
A version of this was published in the Financial Express on August 9th, 2023.

Deep Learning Session with Julia Computing


An evening with Julia

iSPIRT, in association with Julia Computing, is proud to announce an open session with Prof. Alan Edelman and Dr. Viral Shah, co-creators of Julia, an open-source programming language, and co-founders of Julia Computing Inc.

The event will be hosted in Koramangala, Bangalore, on the 22nd of January 2018, from 5 – 7pm. Register now for an invite to the session or to join the live cast (venue details will be shared along with the invite).

What is Julia?

Julia is a modern, high-level, high-performance programming language for numerical computing, data science and AI. With syntax familiar to users of other technical computing environments, Julia solves the eternal two-language problem by providing productivity similar to Matlab or R and performance similar to C for writing mathematical and statistical software. Julia is open source, its research has been anchored at MIT since 2009, and its adoption is growing rapidly across industries, from finance to life sciences.

Julia … can even be used by those who aren’t programmers by training

Why Should You Care?

Julia's deep mathematical roots and comprehensive customizability make it very friendly for data scientists, who are often constrained by the customizability and efficiency limitations of popular machine-learning frameworks.

This 90-minute session will cover a quick introduction to Julia, showcase a few challenging and compute-intensive case studies that Julia has helped solve across domains, and demonstrate how Julia as a framework enables next-gen AI and ML modelling and computing with the AI tools of your choice, including popular libraries like Mocha, MXNet and TensorFlow. This will be a great opportunity to interact with Prof. Alan and Dr. Viral on the best ways to approach an AI/ML strategy.

About the Speakers:

Prof. Alan Edelman is a Professor of Applied Mathematics, Computer Science and AI at MIT. He is a co-creator of the Julia language, and a co-founder and Chief Scientific Officer of Julia Computing, Inc.

Dr. Viral Shah is a co-creator of the Julia language, and a co-founder and CEO of Julia Computing, Inc. He was an important part of the Aadhaar team from 2009 to 2014, and has co-authored the book Rebooting India with Nandan Nilekani.

Julia Computing was founded in 2015 by the creators of the open source Julia language to develop products and provide support for businesses and researchers who use Julia.

Register now for an invite to the session or join the live cast.

The workshop will also be streamed live on YouTube for those who join us virtually. The invite will be shared with registered participants on 21st Jan 2018.

Are AI and Automation dirty words for some?

Man being replaced by machines is a topic well documented in our academic and social history. While designing machines that can replicate human intelligence is 'the dream' for many, the idea has seen its fair share of resistance from anxious workers afraid of losing their livelihoods. It would be a mistake to think the phenomenon is recent. The Luddite movement, which began in Nottingham in 1811, was named after a disgruntled weaver who broke two stocking frames in a fit of rage. Destruction of machinery as a form of protest was carried out throughout England by groups of English textile workers and self-employed weavers. Since then, the term 'Luddite' has come to refer to someone opposed to industrialisation, automation, computerisation or new technologies in general.

Back in the 21st century, Infosys's human resources head Krishnamurthy Shankar revealed that the company had "released" 8,000-9,000 employees in the last 12 months due to automation of lower-end jobs. The employees are not necessarily jobless; they have been retrained and absorbed into 'more advanced projects'. The company also reduced its hiring in the January-to-December 2016 period to 5,700, compared to 17,000 in the first nine months of the previous fiscal year. Infosys is not alone in its journey towards automation: most Indian and global IT services companies are investing in automating processes in core businesses such as Application Management, Infrastructure Management and Business Process Management (BPM).

India's IT giants are leaving no stone unturned to fill the gaps in their digital portfolios of products and services. Internet of Things, Cloud, Artificial Intelligence and Automation figure high in each company's organic strategy and also in their shopping lists for inorganic growth (Table 1).

Table 1: Select Digital Acquisitions by Indian IT majors

Acquirer | Target | Value (USD mn) | Brief
Infosys | Panaya | 200 | Provider of automation technology for large-scale enterprise software management
Wipro | HealthPlan Services | 460 | A technology and Business Process as a Service (BPaaS) provider in the US health insurance market
Wipro | Appirio | 500 | A services company that helps customers create next-generation worker and customer experiences using the latest cloud technologies
Infosys | Skava | 120 | A provider of digital experience solutions, including mobile commerce and in-store shopping experiences, to large retail clients
Tech Mahindra | The BIO Agency | 52 | UK-based digital transformation firm
Tech Mahindra | Target Group | 164 | A provider of business process outsourcing and software solutions

Automation is heralding the age of Industry 4.0, characterised by a diminishing boundary between cyber and physical systems. In October 2016, World Bank research estimated that automation threatens 69% of jobs in India and 77% in China. Google's AI research lab, Google Brain, is working on building AI software that can build more AI software. I wouldn't blame anyone for starting to think of Skynet from Terminator or the writings of James Barrat in Our Final Invention: Artificial Intelligence and the End of the Human Era.

As per research by Gartner, IT process automation (ITPA) is very underpenetrated (only 15-20%) and will move towards maturity over the next 5-10 years. Most leading vendors in the IT services space have launched automation platforms to boost delivery efficiency (Table 2).

Table 2: Automation/ AI Platforms of Indian IT Players

Company | Platform | Offerings
Wipro | Holmes | An artificial-intelligence platform built on open-source computing, aimed at optimising resource utilisation and reducing costs
Infosys | Aikido | Enables creation of intelligent robots that can resolve incidents related to customer orders
TCS | Ignio | An artificial-intelligence-based automation platform that automates and optimises IT processes within an organisation
Tech Mahindra | Carexa Uno | Customer care, with agent virtualisation, analytics, assisted interactions and digital channels
HCL Technologies | DryIce | A digital service exchange platform enabled by ServiceNOW

Source: NASSCOM, Edelweiss

Platforms based on novel technologies will minimise the human effort required. Are the coders coding away their own jobs, then? Thankfully, there are learned people who believe otherwise. As per NASSCOM, the future may not be as dire. Repetitive and labour-intensive jobs such as data entry and testing may get completely automated, but cognitive jobs will be augmented. New roles will emerge focused on the training, learning and maintenance requirements of AI systems. Indian companies will also need to invest in retraining their employees or importing talent in the short term; in the long term, a joint effort with technology schools such as the IITs and IISc will be needed to build a supply chain of talent. Notably, 65% of Google DeepMind's hires came directly from academia.

The Indian IT services sector is worth approximately USD 150 billion and is largely export-dependent. Indian players need to enhance their digital capabilities to compete globally. Automation is a key area of this digital growth, and so is the evolution of the skilled workforce and its job profiles. The fear of technology destroying all jobs is as unreasonable now as it was in the Luddites' time, and history shows that technology has always created more jobs than it has destroyed.

The workforce engaged in IT services is by nature flexible and open to evolving work profiles. Workers in some other sectors may not have that option, especially in jobs requiring less complexity. HDFC Bank just announced that it witnessed a headcount reduction of 4,500 in the last quarter alone, due to efficiency improvements and attrition. The bank is planning to install up to 20 humanoids named 'Ira' at its branches over the next two years to assist customers. Ira has been developed by Kochi-based Asimov Robotics, which has already received queries from airports, the hospitality industry and retail chains to deploy similar humanoids. It would be a good move for professionals in all sectors to ask themselves, 'Can a robot do my job?', and upgrade their skills accordingly.

This is a guest post by Arvind Yadav, Principal at Aurum Equity Partners LLP.

Industry 4.0: The New Normal

If you are a manufacturing company just beginning to explore how investment in Artificial Intelligence and the Internet of Things could help your top and bottom lines, you may already have fallen behind. The fourth industrial revolution, or 'Industry 4.0', is already upon us, and the opportunities to transform the way we carry out production are limitless. Industry 4.0 may be broadly defined as a collective term for a number of contemporary automation, data-exchange and manufacturing technologies. It is characterised by a diminishing boundary between cyber and physical systems, enhancing productivity and reducing costs. 'Smart' and 'Connected' are two of the most important keywords in the new industrial universe: 'Smart' takes us into the domain of Artificial Intelligence (AI), while 'Connected' is more the purview of the 'Internet of Things' (IoT).


‘Smart’ – A detour into Artificial Intelligence

AI finds its roots in 1956, when the name 'Artificial Intelligence' was adopted, or even further back, with Alan Turing in 1950, or in 1943, when McCulloch and Pitts introduced the Boolean circuit model of the brain. It is still, however, difficult to settle on one universal definition of AI. For our purposes, we may define AI as the development of computer systems able to perform tasks normally requiring human intelligence. These may include (but are not limited to) visual perception, speech recognition, decision-making, and translation between languages. More passionate people define AI as the ability to 'solve new problems'.

The lack of a single definition has not deterred investors from recognising the potential of AI, and they have been pouring in money like never before. As per Zinnov Consulting, in the last five years alone, investments in AI have grown ten-fold, from USD 94 million in 2011 to USD 1 billion in 2016. As per CB Insights, equity investments in AI were north of USD 2 billion in both 2014 and 2015. We may attribute the differing investment figures to different ways of defining AI, but we can agree that investments have skyrocketed. While venture capital firms have been at the forefront of backing early-stage companies, strong corporate interest in acquiring AI start-ups has also created a buzz in the M&A markets. Some of the biggest acquirers in AI include Google, Apple, Salesforce, Amazon, Microsoft, Intel and IBM.

India is holding its own in AI-related action. As per Zinnov, India has emerged as the third-largest AI ecosystem in the world, with 170 start-ups. Niki.ai, SnapShopr, YANA, HealthNextGen, Aindra Systems and Hire Alchemy are some of the notable firms trying to disrupt value chains across sectors. Global technology companies have acquired more than half a dozen India-based AI start-ups in the last 18 months. And it is not all one-way traffic: Indian IT services firms like Infosys (UNSILO, Cloudyn, TidalScale) and Wipro (Vicarious, Vectra Ventures) have been looking abroad for targets to augment their AI capabilities.

Table 1: AI use cases across sectors



‘Connected’ – the Industrial IoT

The Industrial Internet of Things (IIoT) refers to the network of equipment, comprising a very large volume of sensors, devices and "things", that produces information and adds value to manufacturing processes. This information acts as a feed to AI systems. As per Cisco, 50 billion devices will be connected by 2020 and 500 billion by 2030. McKinsey projects that IoT will generate 11% of global GDP by 2025, driven by optimised industry performance and cost efficiencies.

IIoT on the Factory Floor

Global IIoT spending is estimated at USD 250 billion and is expected to reach USD 575 billion by 2020. The key components of the IIoT ecosystem include sensors/modules, connectivity, customisation, and platforms/IoT cloud/applications.

As per NASSCOM, the Indian IoT market is expected to reach USD 15 billion with 2.7 billion connected units by 2020, up from the current USD 5.6 billion and 200 million connected units. This growth is expected to be largely driven by applications in manufacturing, automotive, and transportation and logistics.

In India, the IIoT segment has caught the attention of the largest manufacturers. In November 2016, Reliance and GE announced a partnership to build applications for GE's Predix platform. The partnership will provide industrial IoT solutions to customers in industries such as oil and gas, fertilisers, power, healthcare and telecom. Mahindra & Mahindra uses bots to build car body frames at its Nashik plant. Plants operated by Godrej and Welspun use the Intelligent Plant Framework from Covacis Technologies to run their factory floors.

Industry 4.0 is an exciting phase, and the possibilities seem limitless. The Indian government is playing its part through the Digital India mission, driving projects such as smart cities, smart transportation and smart grids, which are expected to further propel the use of IoT technology. It is imperative for promoters and companies in the manufacturing segment to find their place in the new digital world order through organic or inorganic investment.

This is a guest post by Arvind Yadav, Principal at Aurum Equity Partners LLP.