Open House on DPI for AI #4: Why India is best suited to be the breeding ground for AI innovation!

This is the 4th blog in a series describing the structure and significance of DPI for AI, a privacy-preserving techno-legal framework for AI data collaboration. Readers are encouraged to go over the earlier blogs first for better understanding and continuity.

We are at the cusp of history in how AI advancements are unfolding and in the potential to build the man-machine society of the future economically, socially, and politically. There is a great opportunity to understand and deliver on potentially breakthrough business and societal use cases while developing foundational capabilities that can adapt to new ideas and challenges. The major Silicon Valley startups and big tech companies are focused first on bringing the advancements of AI to first-world problems, optimized and trained for their contexts. However, we know that the first world’s solutions may not work in the diverse and unstructured contexts of the rest of the world – and may not even work for all sections of the developed world.

Let’s address the elephant in the room: what are the critical ingredients an AI ecosystem needs to succeed? Data, an enabling regulatory framework, talent, compute, capital, and a large market. In this open house, we make the case that India excels in all of these dimensions, making it a no-brainer – whether you are an investor, a researcher, an AI startup, or a product company – to come and build in India for your own success.

India has one of the most vibrant, diverse, and eager markets in the world, making it a treasure chest of diverse data at scale, which is vital for AI models. While much of this data happens to be proprietary, the DPI for AI data collaboration framework makes it available in an easy and privacy-preserving way to innovators in India. Literally no other country has such a scale and game plan for training data. One may ask: diversity and scale are indeed India’s strengths, but where is the data? Isn’t most of our data with US-based platforms? In this context, there are three types of data:

a. Public Data,
b. Non-Personal Data (NPD), and
c. Proprietary Datasets.

Let’s look at health. India has far more proprietary datasets than the US; they are just frozen in the current setup. Unfreezing them will give us a play in AI. This is exactly what DPI for AI does – in a privacy-preserving manner. In the US, health data platforms like those of Apple and Google are entering into agreements with big hospital chains to supplement the user health data that comes from wearables. How do we better that? That is the US Big Tech-oriented approach – not exactly an ecosystem approach. Democratic unfreezing of the health data held by hospitals is the key today. DPI for AI would do that for all – small or big, developers or researchers! We have continental-scale data with more diversity than any other nation. We need a unique way to unlock it for the entire ecosystem, not just big corporations. If we can do that – and we think we can, via DPI for AI – we will have AI winners from India.

Combine this with India’s forward-looking regulatory thought process, which balances Regulation for AI and Regulation of AI in a unique way that encourages innovation without compromising on individual privacy or ignoring the other potential harms of the technology. The diversity and scale of the Indian market act as a forcing function for innovators to think about robustness, safety, and efficiency from the very start, which is critical if innovations in AI are to actually deliver financial and societal benefits at scale. A large number of engineers and scientists of Indian origin are creating AI models and developing innovative applications around them. Given our demographic dividend, this will remain one of our strengths for decades to come. Capital and compute are clearly not our strong points, but capital follows opportunity. Given India’s position of strength on data, regulation, market, and talent, capital is finding its way to India!

So, what are you all waiting for? India welcomes you with continental-scale data, a lightweight but safe regulatory regime, and talent like nowhere else: come build, invest, and innovate in India. India has done it in the past in various sectors, and it is strongly positioned to do it again in AI. Let’s do this together. We are just getting started, and, as always, are very eager for your feedback, suggestions, and participation in this journey!

Please share your feedback here
For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Sharad Sharma, Gaurav Aggarwal, Umakant Soni, and Sunu Engineer

Open House on DEPA Training #3: The Regulatory and Legal Aspects

This is the third in a series of blogs describing the structure and importance of Digital Public Infrastructure for Artificial Intelligence (DPI for AI), a privacy-preserving techno-legal framework for data and AI model-building collaborations. Readers are encouraged to go over the first and second blogs for better understanding and continuity.

Open House on DEPA Training #1

Open House on DEPA Training #2: DPI to Unfreeze Data Markets. Let’s Make India an AI Nation!

The techno-legal framework of DEPA, elaborated upon in the earlier blogs, provides the foundations. From multiple discussions and from history, it is clear that building and growing a vibrant AI economy that can make India a product nation requires a regulatory framework. This regulatory structure serves as the legal partner to the technology and works hand in hand with it. Upon this reliable techno-legal foundation, the ecosystem and global product companies from India will materialize.

The worldview of the ‘Data Empowerment and Protection Architecture’ (DEPA) – ‘Regulation for AI’, rather than the more conventional ‘Regulation of AI’ espoused by the US, the EU, and others – sets DEPA apart and drives India towards becoming an AI product nation with a global footprint.

How does one envisage the form and function of ‘Regulation for AI’? In this open house, we have a dialogue between the technology and legal sides of the approach to explain its significant facets.

In a nutshell, ‘Regulation for AI’ will focus on:

  • defining the standards that AI models need to adhere to
  • defining a lightweight but foolproof path for getting there, for startups as well as the big players
  • providing an environment that deals with many of the compliance and safety aspects ab initio
  • defining ways to remove hurdles from innovators’ paths

In contrast, ‘Regulation of AI’ deals with what AI models cannot be and do, and with the tests and conditions they have to pass depending on the risk classes into which they are placed. This is akin to certification processes in fields such as pharma and transportation, which impose heavy cost burdens, especially on new innovators. For instance, many pharma companies that develop potentially good drug candidates run out of steam trying to meet clinical trial conditions; very often they are unable to find a valid and sizeable sample population on which to test their products as part of the mandatory certification process.

The current standards in the new Regulation of AI regimes in the US, the EU, and elsewhere leave many aspects, such as the risk-class assignment process, undefined, leading to regulatory uncertainty. This also works against investment-driven innovation and the consequent growth of the ecosystem in multiple ways.

The path to value, both for the economy and for users, lies in the power of data being projected into the universe of applications. These applications will be powered by AI models in addition to other algorithmic engines. The earlier blogs have already addressed the need, and the way, for data to make its way into models.

For the models to exhibit their power, we must make sure they are reliable and widely used. This requires that AI models be accessible and available and, most importantly, that they ‘do no harm’ when applied, whether through mistake, misuse, or malfeasance. In addition, humans or their agents must not be allowed to harm markets and users through monopoly control of AI models. Large-scale monopolistic control of models with global use and relevance can lead to situations that are beyond national or international legislation to control or curb.

In the DEPA model, this benign, and in most ways benevolent, environment is created by a harmonious combination of technology and legal principles. Having analyzed the technological aspects of data privacy in the earlier blogs in this series, here we will talk about the regulations implemented via a Self-Regulatory Organization (SRO).

Though its design is not yet fully fleshed out, the SRO provides functions such as registration, and assigns roles to participants such as the TDP (Training Data Provider), TDC (Training Data Consumer), and CCRP (Confidential Clean Room Provider). Many of these functions have been implemented in part to support the tech stack we have released for the CCR (Ref: DEPA Open House #1). This tech stack currently supports registration and allows interactions between participants to be mediated via electronic contracts, the technological counterpart of legal contracts.
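To make the moving parts concrete, here is a minimal, illustrative sketch in Python of how registration and an electronic contract might be represented. All names and fields here are our assumptions for exposition, not the actual interfaces of the released stack.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import hashlib, json, uuid

class Role(str, Enum):
    TDP = "training_data_provider"            # Training Data Provider
    TDC = "training_data_consumer"            # Training Data Consumer
    CCRP = "confidential_clean_room_provider" # Confidential Clean Room Provider

@dataclass
class Participant:
    name: str
    role: Role
    participant_id: str = field(default_factory=lambda: str(uuid.uuid4()))

@dataclass
class ElectronicContract:
    """Machine-readable counterpart of a legal data-sharing agreement."""
    provider: Participant   # TDP offering the dataset
    consumer: Participant   # TDC training a model
    clean_room: Participant # CCRP hosting the training run
    purpose: str            # declared purpose of the training job
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def fingerprint(self) -> str:
        """Hash of the terms, recordable by a contract service for later audit."""
        terms = json.dumps({
            "provider": self.provider.participant_id,
            "consumer": self.consumer.participant_id,
            "clean_room": self.clean_room.participant_id,
            "purpose": self.purpose,
            "created_at": self.created_at,
        }, sort_keys=True)
        return hashlib.sha256(terms.encode()).hexdigest()

# Registration is the first function the SRO mediates
registry: dict = {}
def register(p: Participant) -> str:
    registry[p.participant_id] = p
    return p.participant_id
```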

The technology that validates models through pre-deployment analysis, based on complex adaptive system models, is under development and draws on diverse research efforts across the world. It is designed to measure the positive and negative impacts of these models on societies at small and large scales, and over short and long timescales.

‘Complex adaptive system models’ are dynamic models that capture agents, with their state information, and the multiple feedback loops that determine how the system changes at different scales, sometimes simultaneously. The large number of components and the many kinds of dynamic feedback loops are what make these models complex and adaptive. While still in their infancy in many ways, they are critical to understanding AI models’ impact on societies.
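For a flavour of what such models look like, below is a deliberately tiny, hypothetical agent-based sketch: adoption of an AI service grows through positive feedback (peers adopting) and is damped by negative feedback (reported harms). It is a toy for intuition only, not the validation technology under development.

```python
import random

def simulate_adoption(n_agents=1000, steps=50, seed=7):
    """Toy complex-adaptive-system model: agents adopt an AI service with a
    probability that rises with current adoption (positive feedback) and
    drop out as harms accumulate (negative feedback)."""
    random.seed(seed)
    adopted = [False] * n_agents
    harms = 0
    history = []
    for _ in range(steps):
        share = sum(adopted) / n_agents
        for i in range(n_agents):
            if not adopted[i]:
                # positive feedback: the more peers adopt, the likelier adoption
                if random.random() < 0.02 + 0.3 * share:
                    adopted[i] = True
            elif random.random() < 0.001:
                harms += 1  # rare harmful outcome from use of the service
        # negative feedback: accumulated harms make some agents drop out
        for i in range(n_agents):
            if adopted[i] and random.random() < 0.0005 * harms:
                adopted[i] = False
        history.append(sum(adopted) / n_agents)
    return history

print(simulate_adoption()[-5:])  # adoption share over the last few steps
```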

The SRO guides and supports ecosystem players in building and deploying their models safely and securely, under lightweight regulatory ceilings, so that large product companies in fields like finance, healthcare, and education can grow and reach a happy consumer base. This is key to growing the ecosystem and connecting it to other parts of the India Stack.

We envisage leveraging the current legal system – the various Acts (DPDP, IT Act, Copyright, etc.) and the models of data protection through the CDO (Chief Data Officer) and CGO (Chief Grievance Officer) in Indian companies – in further defining the SRO’s role and features.

The regulatory model also addresses questions of data ownership and copyright, especially in the context of generative AI. We require large foundation models independent of ‘Big Tech’ to counter potential monopolies. These models should reflect local diversity so they can serve as reliable engines in the Indian context. We need them built and deployed locally if we are to play the role of a product nation without being subverted or subjugated in our cyberspace strategies.

To light up the AI sky with these many ‘fireflies’ in different parts of India, infrastructure for compute as well as market access is needed. The SRO creates policies that are not restrictive or protectionist but promote participation and value realization. Data players, compute providers, market creators, and users need to be able to interact in a safe space. Sufficient protection of copyright and creative invention will be provided via existing IP law to incentivize participation, without restricting to the point of killing innovation – this is the balance that the SRO’s regulatory framework strives to reach.

Drawing upon ideas of risk-based categorization of models (such as in the EU AI Act) and regulatory measures (including punitive and compensatory ones) proportional to those risks, the models in the India Stack will be easily compatible with international standards, as well as with a universal or global standard, should an organization such as a UN agency define one. This makes global market reach for AI models and products built in India an easier target to achieve.

We conjecture that these different aspects of DEPA will release the data from its silos. AI models will proliferate with multiple players profiting from infrastructure, model building, and exporting them to the world. Many applications will be built which will be used both in India (as part of the India Stack) and the world. It is through these models and applications that the latent potential and knowledge in the vast stores of data in India will be realized.

Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Antara Vats, Vibhav Mithal and Sunu Engineer

Open House on DEPA Training #2: DPI to Unfreeze Data Markets. Let’s Make India an AI Nation!

This is the 2nd blog in a series describing the structure and significance of DPI for AI, a privacy-preserving techno-legal framework for AI data collaboration. Readers are encouraged to go over the 1st blog for better understanding and continuity.

What is unique about the techno-legal framework of DPI for AI is that it allows data collaboration without compromising data privacy. Now let’s put this in the perspective of Indian enterprises and users. This framework can potentially revolutionize the entire ecosystem and slingshot India towards becoming an AI product nation, where we are not just using AI models developed within India but exporting them. What is the biggest roadblock to this dream? In this open house (https://bit.ly/DEPA-2), we make the case that privileged access to data from Indian contexts is not only necessary to develop AI-based systems that are much more relatable to Indians but in fact gives Indian innovators a distinct advantage over much larger and better-funded big tech companies from the West.

Let’s get started. There is clearly a race to build larger and larger AI models these days, trained on as much data as possible. Most of the training data used in these models is publicly available on the web. Given that Indian enterprises are quite far behind in this race, it is unlikely that we will catch up by simply following in their footsteps. But what many folks outside AI research circles often miss is that credible research shows that access to even relatively small amounts of contextual data can drastically reduce the data and compute requirements needed to achieve the same level of performance.
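One common mechanism behind this finding is transfer learning: a model pre-trained on generic web-scale data is adapted with a small contextual dataset by training only a small task head. The PyTorch sketch below assumes hypothetical `pretrained` and `contextual_loader` objects, and is meant only to illustrate why a few thousand contextual examples can substitute for massive data and compute.

```python
import torch
import torch.nn as nn

# Assume `pretrained` is any large model trained on generic web-scale data and
# `contextual_loader` yields a few thousand labelled Indian-context examples.
def finetune_head(pretrained: nn.Module, feature_dim: int, n_classes: int,
                  contextual_loader, epochs: int = 3) -> nn.Module:
    pretrained.eval()                    # reuse generic features as-is
    for p in pretrained.parameters():
        p.requires_grad = False          # no full retraining of the big model
    head = nn.Linear(feature_dim, n_classes)  # only this small layer is trained
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in contextual_loader:
            feats = pretrained(x)        # frozen feature extraction
            loss = loss_fn(head(feats), y)
            opt.zero_grad()
            loss.backward()              # gradients flow only into the head
            opt.step()
    return head
```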

This sounds great, right? But (there is always a but!) much of this Indian-context data is not in one place; it is hidden behind numerous government and corporate walls. What makes the situation worse is that most of these data silos belong to enterprises of a traditional nature, which are not typical centres of innovation, at least for modern technologies like AI.

This is fertile ground for DPI for AI. Its three core concepts ensure that this data sitting in silos can be shared with innovators around India seamlessly (thanks to digital contracts), democratically, and in a privacy-preserving manner (thanks to differential privacy). Innovators also do not need to worry one bit about the confidentiality of their IP (thanks to confidential computing). The techno-legal framework makes it easy for anyone to abide by privacy regulations without breaking a sweat, keeping them safe from future litigation as long as they follow the straightforward guidelines provided in the framework.

This is what we refer to as the unfreezing of data markets in this Open House. This unfreezing is critical for our innovators to get easy access to contextual data and gain a much-needed leg up against the Western onslaught in the field of AI. This is India’s moment to leapfrog in the field of AI, as we have done in so many domains (payments, identity, internet, etc.). Given the enormity of the goal and the need to get it right, we seek participation from folks with varied expertise and backgrounds.

Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteers, Hari Subramanian and Gaurav Aggarwal.

Introducing DEPA for Training: DPI for Responsible AI

In the last decade, we’ve seen an extraordinary explosion in the volume of data that we, as a species, generate. The possibilities that this data-driven era unlocks are mind-boggling. Large language models, trained on vast datasets, are already capable of performing a wide array of tasks, from text completion to image generation and understanding. The potential applications of AI, especially for societal problems, are limitless. However, lurking in the shadows are significant concerns such as security and privacy, abuse and misinformation, and fairness and bias.

These concerns have led to stringent data protection laws worldwide, such as the European Union’s General Data Protection Regulation (GDPR), California’s Consumer Privacy Act (CCPA), and the European AI Act. India has recently joined this global privacy protection movement with the Digital Personal Data Protection Act of 2023 (DPDP Act). These laws emphasize individuals’ right to privacy and the need for real-time, granular, and specific consent when sharing personal data.

In parallel with privacy laws, India has also adopted a techno-legal approach for data sharing, led by the Data Empowerment and Protection Architecture (DEPA). This new-age digital infrastructure introduces a streamlined and compliant approach to consent-driven data sharing.

Today, we are taking the next step in this journey by extending DEPA to support the training of AI models in accordance with responsible AI principles. This new digital public infrastructure, which we call DEPA for Training, is designed to address critical scenarios such as detecting fraud using datasets from multiple banks, or helping with the tracking and diagnosis of diseases, all without compromising the privacy of data principals.

DEPA for Training is founded on three core concepts: digital contracts, confidential clean rooms, and differential privacy. Digital contracts, backed by transparent contract services, make it simpler for organizations to share datasets and collaborate by recording data-sharing agreements transparently. Confidential clean rooms ensure data security and privacy by processing datasets and training models in hardware-protected secure environments. Differential privacy further fortifies this approach, allowing AI models to learn from data without risking individuals’ privacy. You can find more details on how these concepts come together to create an open and fair ecosystem at https://depa.world.
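For a flavour of how differential privacy protects individuals during training, here is a standard DP-SGD-style sketch (an illustration of the general technique, not DEPA’s actual implementation): each example’s gradient is clipped so no single person can dominate an update, and calibrated Gaussian noise is added before averaging.

```python
import torch

def dp_noisy_gradient(per_example_grads: torch.Tensor,
                      clip_norm: float = 1.0,
                      noise_multiplier: float = 1.1) -> torch.Tensor:
    """DP-SGD-style update from a [batch, dim] tensor of per-example gradients:
    clip each example's gradient so no individual dominates, then add
    calibrated Gaussian noise before averaging."""
    norms = per_example_grads.norm(dim=1, keepdim=True)               # [batch, 1]
    clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)  # per-example clipping
    noise = torch.randn_like(clipped[0]) * noise_multiplier * clip_norm
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]
```

A clean room would apply an update like this inside its hardware-protected environment, with the privacy budget (epsilon) tracked against the terms of the digital contract.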

DEPA for Training represents the first step towards a more responsible and secure AI landscape, where data privacy and technological advancement can thrive side by side. We believe that collaboration and feedback from experts, stakeholders, and the wider community are essential in shaping the future of this approach. Please share your feedback here

For more information, please visit depa.world

Please note: The blog post is authored by our volunteer, Kapil Vaswani

Virtual Meeting on Data Empowerment (August 31, 2021)

Senior policymakers met to discuss data empowerment approaches that ensure privacy and encourage innovation

The digitalization of economies, particularly in critical sectors such as health, mobility, energy, and finance, has seen significant generation of data. The ubiquity of data should lead to greater user-centric innovation, while preserving the trust that users have in an open, secure, and safe internet. This is among the foremost goals of policymakers and regulators today. 

Governments have adopted or are in the process of introducing legislation to provide a foundation for robust data governance. Their policy goals can be complemented and advanced with the help of common, open, and interoperable protocols that increase the choice of digital services available to a user and enhance user privacy. By implementing technical protocols that reflect privacy principles, a ‘techno-legal’ approach to data governance brings transparency and accountability to the way in which data is shared, thus empowering the user.

The global and seamless nature of the internet, and growing interdependence among digital economies calls for cooperation among like-minded partners on data empowerment. As part of a consultative process, a collective of senior policymakers met virtually for the first time on August 31, 2021.

Key participants at the meeting included:

  • Ms. Margrethe Vestager, Executive Vice President for A Europe Fit for the Digital Age and Competition, European Commission
  • Mr. Nikolai Astrup, Minister for Local Government and Modernisation, Norway
  • Dr. Agustin Carstens, General Manager, Bank for International Settlements, Switzerland
  • Dr. Rajiv Kumar, Vice-Chairman, NITI Aayog, India

Senior officials from Rwanda, Japan, France, and Australia also participated and made brief remarks in the meeting.

Participants at the meeting affirmed the importance of reinforcing the twin policy goals of privacy and data-driven innovation through open, interoperable technical protocols. They also underscored the need to reach out to more like-minded countries, and promote an inclusive and sustained dialogue on data empowerment. 

Zoom Meeting Capture (Image.1)
Zoom Meeting Capture (Image.2)

Data Empowerment and Protection Architecture Explained – Video

More commonly known as the ‘Consent Layer of the India Stack’, Data Empowerment and Protection Architecture (DEPA) is a new approach, a paradigm shift in personal data management and processing that transforms the currently prevalent organization-centric system to a human-centric system. By giving people the power to decide how their data can be used, DEPA enables the collection and use of personal data in ways that empower people to access better financial, healthcare, and other socio-economically important services in a safe, secure, and privacy-preserving manner.

It gives every Indian control over their data, democratizes access, and enables the portability of trusted data between service providers. This architecture will help Indians access better financial services, healthcare services, and other socio-economically important services. The rollout of DEPA for financial data and telecom data is already taking place through Account Aggregators licensed by the RBI. It covers all asset data, liabilities data, and telecom data.

We, at iSPIRT, organised a learning session on the 18th of May to give relevant and interested stakeholders a detailed primer on DEPA. We had 60-odd very animated and engaged people in the audience. The purpose of the session was to understand the technological, institutional, market, and regulatory architecture of DEPA, its impact on existing data-consuming businesses, and how people could contribute to this new data-sharing infrastructure being built in India.

The session was anchored by Siddarth Shetty, Data Empowerment And Protection Architecture Lead & Fellow, iSPIRT Foundation (Email – sid@ispirt.in). Please feel free to reach out to him for any queries regarding DEPA.

For other queries, please write to [email protected].

#5 What is the Federated PHR Component of the Health Stack?

PHR – Personal Health Record – is a mechanism to access a longitudinal view of a patient’s health history and to be able to use it for different purposes. It is a component of the health stack.


It relies on two building blocks – (a) registries, to know the source of the data; and (b) a health identifier, to know whom the data belongs to. Separating out the building blocks, with each serving a singular function, helps design a more scalable and sustainable system. We follow certain principles for both of these building blocks:

1. Registries are master databases with information about different entities in the healthcare ecosystem, for example, hospitals, doctors, care beneficiaries, etc. There should be checks and balances built in to ensure the correctness of data (such as digital signatures, audit trails, etc.), and this information should be made accessible for different use cases (through open APIs, and with consent). Opening access to this information will have the positive effect of increased demand, thus improving quality and leading to convergence towards singular sources.

2. Health identifier is a mechanism to integrate a patient’s health records. This identifier should incorporate the following features:

  • The identifier need not be unique. This means that a patient should have the ability to create multiple health identifiers for different health records – think of different digital folders for mental health cases and cancer cases (a common practice in the physical world).
  • The power to unify health records should lie with the patient. In the physical world, this would translate to the patient having the right to either keep two folders or merge them into one. The same should be allowed digitally.
  • Patients should be allowed to use any identifier to verify themselves. However, since we are creating an electronic system of health records, it is important that these identifiers be digitally verifiable – such as a mobile number, email ID, or Aadhaar.

3. Electronic consent, as specified by MeitY, is a mechanism to give consent electronically in a manner that follows the ORGANS principles – Open, Revocable, Granular, Auditable, Notifiable, Secure. (A minimal sketch of such a consent artifact follows this list.)
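Below is an illustrative sketch of what an ORGANS-style consent artifact could look like in code; the field names and methods are our assumptions for exposition, not the MeitY specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentArtifact:
    """Illustrative electronic consent record following the ORGANS principles.
    Open: plain, machine-readable structure any system can parse."""
    patient_id: str          # any digitally verifiable identifier
    data_consumer: str       # who is requesting the records
    scope: list              # Granular: e.g. ["lab-reports:2023"]
    expires_at: str          # consent is time-bound
    revoked: bool = False    # Revocable
    audit_log: list = field(default_factory=list)  # Auditable
    signature: str = ""      # Secure: signed with the patient's key

    def revoke(self):
        self.revoked = True
        self.log("consent revoked")
        self.notify()        # Notifiable: inform all parties of the change

    def log(self, event: str):
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), event))

    def notify(self):
        # placeholder: push a notification to patient and data consumer
        print(f"notify: consent for {self.data_consumer} changed")
```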

_____________________________________________________________________

With these building blocks in place, we come to features of the PHR architecture:

1. Federated – instead of having a centralised repository of all health records, we propose a federated framework where data resides at the source of generation. This has many benefits – (i) ease of operations, as data is not stored with a single entity; (ii) lower costs, as no additional repository is being built; (iii) better security, as data is stored at different nodes; and (iv) patient empowerment, as data is shared directly with the patient.

2. Schema level standardisation – we believe that only standardising the schema without enforcing codification standards (which require a significant behavioural shift) should be sufficient for a number of use cases. Since this standardisation is at an IT systems level, it only requires a one-time mapping and does not require any change in clinical workflows.

3. Health data access fiduciaries – these would be entities that would route the consent and data requests between information users and information providers. In doing so, they would play the role of privacy protection, consent management and user education.

4. Health data vault – this is an option for the patient to store his/her records in a personal storage space. While most hospitals that capture data continue to store it for a long period of time, an individual might still choose to store this information separately (for long-term access, trust deficit between patient and provider, etc.). In such a case, the patient can request a copy of the record to be pushed to his/her health data vault.

_____________________________________________________________________

Proposed architecture:

Workflow:

Patient goes to a healthcare provider. At the time of issuance:

Option 1: patient shares mobile number/email ID/Aadhaar number
1. Provider authenticates the user using one of the digital identifiers
2. (a) Provider sends a link to the patient for downloading the report; the patient can later link these records with his/her HDAF; or
2. (b) Patient signs up with an HDAF and searches for the provider to link records

Option 2: patient shares HDAF ID
1. Provider links the patient’s records to the HDAF

Post linkage, the patient can approve requests from data consumers through the HDAF for different use cases.
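To make the workflow concrete, here is a small, hypothetical sketch of an HDAF mediating linkage and consent. All class and method names are illustrative; in the real design, the records themselves flow from provider to consumer, with the HDAF handling only consent routing.

```python
class HDAF:
    """Health Data Access Fiduciary: routes consent and data requests,
    never stores the health records themselves."""
    def __init__(self):
        self.linked = {}   # patient_id -> list of (provider, record_ref)
        self.pending = []  # consent requests awaiting patient approval

    def link_record(self, patient_id, provider, record_ref):
        # Options 1/2 above: provider links a record after authenticating the patient
        self.linked.setdefault(patient_id, []).append((provider, record_ref))

    def request_access(self, consumer, patient_id, purpose):
        req = {"consumer": consumer, "patient": patient_id,
               "purpose": purpose, "status": "pending"}
        self.pending.append(req)
        return req

    def approve(self, req):
        req["status"] = "approved"
        # in practice, data would flow provider -> consumer directly;
        # the HDAF only mediates the consent
        return [ref for _, ref in self.linked.get(req["patient"], [])]

hdaf = HDAF()
hdaf.link_record("patient-1", "hospital-A", "report-42")
req = hdaf.request_access("insurer-X", "patient-1", "claim pre-authorisation")
print(hdaf.approve(req))  # -> ['report-42']
```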

_____________________________________________________________________

We believe that building PHR as a public good will enable interesting use cases to come to life that will together improve the healthcare ecosystem. While we continue our quest for these, we would love to receive feedback on our thinking! If you work in this space and have comments, or would like to understand how this could help your product, please drop me a line at [email protected].

#4 Reimagining Cancer Care

In the last few months, I have had the opportunity to work closely with the National Cancer Grid – a network of 150+ cancer centres in India – and in the process, better understand the workflows involved in different medical processes and the requirements of medical professionals. I have closely observed care delivery, interviewed cancer patients and oncologists, learnt about current challenges and about initiatives being undertaken by NCG and other organisations to tackle them.

This blog post is an evolved version of an earlier post, where I had talked about the use cases of health data and the implementation of a PHR (Personal Health Record). Of these, I believe that the biggest use of health data will be in improving the quality of care in complex medical cases (either acute like surgical procedures, or chronic like cancer). In this post, I will use cancer care to exemplify this.

Core idea
Let us visualise a specific application for cancer care, with oncologists as its primary users. There are only around 1000 trained oncologists in India, so let’s assume that all of them are users of this application. Let us also assume that clinical data of all patients treated by these oncologists is conveniently accessible through this application (with due privacy and security measures). What will these users do now?

Expert consultation
I attended a Virtual Tumour Board run by the National Cancer Grid – a weekly remote consultation program run on Saturday mornings where teams of doctors voluntarily join to discuss well-documented cases and their potential treatment plans. VTBs are run separately for each speciality (like head & neck tumour, gynaecology, neurology, etc.), which means it can take 4-6 weeks for one’s turn. Doctors usually do not have the luxury of such long waiting periods, and therefore turn to individual consultations, which are often not documented, depend on informal connects, and are sometimes made with incomplete data. Formalising this process and making it asynchronous can be of huge benefit to all medical professionals.

Care team collaboration

Complex medical procedures often involve a team of doctors and other medical professionals, working responsibly for a given patient. A significant percentage of deaths due to medical negligence is caused by a lack of communication between care team members. The communication process today is paper-based and unstructured, leading to accidents that could in fact be prevented – especially with the growing use of IoT devices and voice-based inputs. (I saw one such application at Narayana Health being used by their ICU teams.)

Performance evaluation

Lack of organised data, changing patient care-providers and long feedback loops make it difficult for medical professionals to monitor their performance. Can we empower them with tools to do so? Doctors today lack visibility on the outcome of the treatment given and rely on intuition, experience or techniques tested in developed countries for care delivery. Such a tool would not only help doctors improve their performance, but also improve the trust equation with their patients.

User Experience
There are three crucial elements for enabling a good user experience:

Data input – Most EHR systems require text input to be typed in by doctors. This makes them difficult to use. Other input techniques for automated data transcription like touch, voice, or other innovative methods for data capture will need to be explored. Additionally, interoperability across all systems and devices will be key in enabling access to all data.

Data interpretation – Sorting through a patient’s health records takes up a substantial amount of time of a physician, especially when the data is unstructured. Developing intelligence to sort the relevant records as per the case in question will significantly enhance the user experience of the product.

Safety and privacy – All solutions should ensure complete privacy of patients. This could mean access controls, electronic consent, digital signatures, digital logs, tools for data anonymisation, etc. It might also be important to perform basic verification of the platform’s users.

Value Discovery
The value of the platform will increase as more and more physicians become a part of it. For example, an endocrinologist might need to consult a cardiologist in a case of disease progression, or an ENT specialist might need to consult an oncologist to confirm a diagnosis. More importantly, the platform will also drive innovation, i.e., other use cases can be developed on top of it. For example, the expert opinions mentioned above can also be used for consulting patients remotely, pre-authorising claims, forming medical peer review groups, etc. Similarly, working care groups can simultaneously enrol staff for upskilling (as practised today in an offline setting), and information about treatment outcomes can help guide better research.

Next steps
We remain on a quest to find use-cases for PHR since we believe technology pilots alone would not be enough to drive its adoption. In that context, we are looking for partners to experiment with this in different healthcare domains. If you are interested, please reach out to me at [email protected]!

iSPIRT Final Comments on India’s Personal Data Protection Bill

Below are iSPIRT’s comments and recommendations on the draft Personal Data Protection Bill. iSPIRT’s overall data privacy and data empowerment philosophy is covered here.

Table of Contents

Major Comments
1. Include Consent Dashboards
2. Financial Understanding and Informed Consent for all Indians
3. Data Fiduciary Trust Scores Similar to App Store Ratings
4. Comments & Complaints on Data Fiduciaries are Public, Aggregatable Data
5. Warn of Potential Credit and Reputation Hazards
6. A Right to View and Edit Inferred Personal Data
7. Sharing and Processing of Health Data

Suggestions and Questions

  • Fund Data Rights Education
  • Limit Impact Assessment Requirement
  • Passwords should be treated differently than other Sensitive Personal Data.
  • Does the Bill intend to ban automatic person-tagging in photos and image search of people?
  • Notifications about updates to personal data should be handled by a Consent Dashboard, not every data fiduciary.
  • Need for an Authority appeal process when data principal rights conflict
  • Do not outlaw private fraud detection
  • Limit record keeping use and disclosure to the Authority and the company itself.
  • Filings may be performed digitally
  • Request for Definition Clarifications
  • Author Comments
  • Links
  • Appendix – Sample User Interface Screens

Major Comments

1. Include Consent Dashboards

We support the idea of a Consent Dashboard as suggested in the Data Protection Committee Report (page 38) and recommend that it be incorporated into the Bill in Section 26 – Right to Data Portability – and Section 30(2) – Transparency.

We envision all of a user’s personal and inferred data that is known by data fiduciaries (i.e. companies) being exposed on a consent dashboard, provided by a third party consent collector or account aggregator (to use the RBI’s parlance). Below is an example user interface:

This mandate would enable users to have one place – their consent collector-provided dashboard – to discover, view and edit all data about them. It would also allow users to see any pending, approved and denied data requests.

Furthermore, in the event of data breaches, especially when a user’s password and identifier (mobile, email, etc) have been compromised, the breach and recommended action steps could be made clear on the consent dashboard.
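As a rough illustration of what such a dashboard could aggregate, the sketch below models dashboard entries with pending/approved/denied statuses and a breach flag. The names and fields are our assumptions, not a proposed standard.

```python
from enum import Enum

class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"

class DashboardEntry:
    """One row on a user's consent dashboard (fields illustrative)."""
    def __init__(self, fiduciary, data_category,
                 status=RequestStatus.PENDING, breached=False):
        self.fiduciary = fiduciary          # company holding or requesting the data
        self.data_category = data_category  # what kind of data is involved
        self.status = status                # pending / approved / denied
        self.breached = breached            # set when the fiduciary reports a breach

entries = [
    DashboardEntry("bank-A", "financial:transactions", RequestStatus.APPROVED),
    DashboardEntry("lender-B", "financial:credit-score"),
    DashboardEntry("telco-C", "telecom:usage", RequestStatus.DENIED, breached=True),
]
for e in entries:
    flag = "  << breach reported: recommended actions shown here" if e.breached else ""
    print(f"{e.fiduciary:10} {e.data_category:28} {e.status.value}{flag}")
```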

Given the scope of this suggestion, we recommend an iterative or domain specific approach, wherein financial data is first listed in a dashboard limited to financial data and for its scope to grow with time.

2. Financial Understanding and Informed Consent for all Indians

We applaud the Bill’s Right to Confirmation and Access (Chapter IV, Section 24):

The data fiduciary shall provide the information as required under this section to the data principal in a clear and concise manner that is easily comprehensible to a reasonable person.

That said, we’ve found in practice that it’s difficult to appreciate the implications of digital policies until real user interfaces are presented to end users and then tested for their usability and understanding. Hence, we’ve put together a set of sample interfaces (see Appendix) that incorporate many of the proposed Bill’s provisions and our recommendations. Even so, much more work is needed before we can confidently assert that most Indians understand these interfaces and what they are truly consenting to share.

The concepts behind this bill are complicated and yet important. Most people do not understand concepts such as “revocable data access rights” and other rather jargon-filled phrases often present in the discussion of data privacy rights. Hence, we believe the best practices from interface design must be employed to help all Indians – even those who are illiterate and may only speak one of our many non-dominant languages – understand how to control their data.

For example, multi-language interfaces with audio assistance and help videos could be created to aid understanding and create informed consent.  Toll-free voice hotlines could be available for users to ask questions. Importantly, we recognize that the interfaces of informed consent and privacy control need rigorous study and will need to evolve in the years ahead.

In particular, we recommend user interface research in the following areas:

  • Interfaces for low-education and traditionally marginalized communities
  • Voice-only and augmented interfaces
  • Smart and “candy-bar” phone interfaces
  • Both self-serving and assisted interfaces (such that a user can consensually and legally delegate consent, as tax-payers do to accountants).

After user interface research has been completed and one can confidently assert that certain interface patterns can be understood by most Indian adults, we can imagine that templated designs representing best practices are recommended for the industry, much like the design guidelines for credit card products published by US Consumer Financial Protection Bureau or nutritional labelling.

3. Data Fiduciary Trust Scores Similar to App Store Ratings

We support the government’s effort to improve the trust environment and believe users should have appropriate, easy, and fast ways to give informed consent, while ensuring bad actors can’t thrive. Conversely, we believe that the best actors should benefit from a seamless UI and rise to the top.

The courts and data auditors can’t be the only way to highlight good, mediocre and bad players. From experience, we know that there will be a continuum of good to bad experiences provided by data fiduciaries, with only the worst and often most egregious actions being illegal.

People should be able to see the experiences of other users – both good and bad – to make more meaningful and informed choices. For example, a lender that also cross-sells other products to loan recipients and shares their mobile numbers may not be engaging in an illegal activity but users may find it simply annoying.

Hence, we recommend that data fiduciary trust scores be informed by user-created negative reviews (aka complaints) and positive reviews.

In addition to Data Auditors (as the Bill envisions), user-created public ratings will create additional data points and business incentives for data fiduciaries to remain in full compliance with this law, without a company’s data protection assessment being the sole domain of its paid data auditors.

We would note that crowdsourced rating systems are an ever-evolving tech problem in their own right (subject to gaming, spam, etc.), and hence trust rating and score maintenance may be best provided by multiple market actors and tech platforms.

4. Comments & Complaints on Data Fiduciaries are Public, Aggregatable Data

…so 3rd party actors and civil society can act on behalf of users.

A privacy framework will not change the power dynamics of our society overnight. Desperate people in need of money will often sign over almost anything, especially abstract rights. Additionally, individual citizens will rarely be able to see larger patterns in the behaviour of lenders or other data fiduciaries, and are ill-equipped to fight for small rewards on behalf of their community. Hence, we believe that user ratings and complaint data about data fiduciaries must be made available in machine-readable form not only to the State but also to third parties, civil society, and researchers, so that they may identify patterns of good and bad behaviour, acting as additional data rights watchdogs on behalf of all of us.

5. Warn of Potential Credit and Reputation Hazards

We are concerned about the rise of digital and mobile loans in other countries in recent years. Kenya – a country with high mobile payment penetration and hence, like India, one that has become data-rich before becoming economically rich – had seen more than 10% of its adult population on credit blacklists by 2017; three percent of all digital loans were reportedly used for gambling. These new loan products were largely made possible by digital money systems and lenders’ ability to create automated risk profiles based on personal data; they clearly have the potential to cause societal harm and must be considered carefully.

Potential remedies to widespread and multiple loans are being proposed (e.g. real-time credit reporting services), but the fact that a user’s reputation and credit score will be affected by an action (such as taking out a loan) must also be known and understood by users. For example, users need to know that an offered loan will be reported to other banks, and that if they don’t pay, they will be reported and unable to get other loans.

Furthermore, shared usage-based patterns – such as whether a customer pays their bills on time or buys certain types of products – must be available for review by end users.

6. A Right to View and Edit Inferred Personal Data

The Machine Learning and AI community has made incredible strides in computers’ ability to predict or infer almost anything. For example, in 2017, a babajob.com researcher showed the company could predict whether a job seeker earned more or less than Rs 12,000/month with more than 80% accuracy, using just their photo. She did this using 3,000 job seeker photos, 10 lines of code, and Google’s TensorFlow for Poets sample code. Note the project was never deployed or made publicly available.

As these techniques become ever more commonplace in the years to come, it’s reasonable to assume that public facing camera and sensor systems will be able to accurately infer most of the personal data of their subjects – e.g. their gender, emotional state, health, caste, religion, income – and then connect this data to other personally identifiable data such as a photo of their credit card and purchase history. Doing so will improve training data so that systems become even more accurate. In time, these systems – especially ones with large databases of labelled photos – like the governments’, popular social networks’ or a mall’s point of sale + video surveillance system – truly will be able to precisely identify individuals and their most marketable traits from any video feed.

Europe’s GDPR has enshrined the right for people to view data inferred about them, but in conjunction with the idea of a third party consent dashboard or Account Aggregator (in the RBI’s case), we believe we can do better.

In particular, any entity that collects or infers data about an individual, associated with an identifier such as an email address, mobile number, credit card, or Aadhaar number, should make that data viewable and editable by end users via their consent dashboard. For example, if a payment gateway provider analyses your purchase history, infers you are diabetic, and sells this information as a categorization parameter to medical advertisers, that payment gateway must notify you that it believes you are diabetic and enable you to view and remove this data. Google, for example, lists these inferences as Interests and allows users to edit them.

Using the Consent Dashboard mentioned in Major Comment 1, we believe users should have one place where they can discover, view and correct all personal and inferred data relevant to them.
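A minimal sketch of what exposing inferred data could look like follows. The registry class and its methods are hypothetical, intended only to show the view/edit/remove loop between a fiduciary and the consent dashboard.

```python
class InferredDataRegistry:
    """Sketch: inferred attributes a fiduciary would expose for view and edit."""
    def __init__(self):
        self.inferences = {}  # user_id -> {attribute: confidence}

    def record(self, user_id, attribute, confidence):
        self.inferences.setdefault(user_id, {})[attribute] = confidence
        # the consent dashboard is told a new inference exists
        print(f"dashboard: new inferred attribute '{attribute}' for {user_id}")

    def view(self, user_id):
        return dict(self.inferences.get(user_id, {}))

    def remove(self, user_id, attribute):
        self.inferences.get(user_id, {}).pop(attribute, None)

reg = InferredDataRegistry()
reg.record("user-7", "health:diabetic", 0.86)  # e.g. inferred from purchase history
print(reg.view("user-7"))
reg.remove("user-7", "health:diabetic")        # the user edits it away
```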

Finally, more clarity is needed on how data gathered or inferred from secondary sources should be regulated and what consent may be required. For example, many mobile apps ask for a user’s consent to read their SMS inbox and then read their bank confirmation SMSs to create a credit score. In our view, the inferred credit score should be viewable by the end user before it is shared, given that it is personal data that deeply affects the user’s ability to access a service (in this case, often a loan at a given interest rate).

7. Sharing and Processing of Health Data

The Bill requires capturing the purpose for data sharing:

Chapter II, point 5:

“Purpose limitation.— (1) Personal data shall be processed only for purposes that are clear, specific and lawful. (2) Personal data shall be processed only for purposes specified or for any other incidental purpose that the data principal would reasonably expect the personal data to be used for, having regard to the specified purposes, and the context and circumstances in which the personal data was collected.”

In the healthcare domain, recording the purpose for which data is being shared might itself be quite revealing. For example, if data is being shared for a potential cancer biopsy or HIV testing, the purpose alone might be enough to make inferences and private determinations about the patient and, say, deny insurance coverage. On the other hand, stating high-level, blanket purposes might not be enough for future audits. A regulation must be in place to ensure the confidentiality of the stated purpose.

The Bill has a provision for processing sensitive personal data for prompt action:

Chapter IV, point 21:

“Processing of certain categories of sensitive personal data for prompt action. — Passwords, financial data, health data, official identifiers, genetic data, and biometric data may be processed where such processing is strictly necessary— (a) to respond to any medical emergency involving a threat to the life or a severe threat to the health of the data principal; (b) to undertake any measure to provide medical treatment or health services to any individual during an epidemic, outbreak of disease or any other threat to public health; or (c) to undertake any measure to ensure safety of, or provide assistance or services to, any individual during any disaster or any breakdown of public order.”

While this is indeed a necessity, we believe a middle ground could be achieved by providing an option for users to appoint consent nominees, in a manner similar to granting power of attorney. In cases of emergency, consent nominees such as family members could grant consent on behalf of the user. Processing without consent would happen only in cases where a consent nominee is unavailable or has not been appointed. This creates an additional layer of protection against misuse of the user’s health data.
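The order of precedence we have in mind can be stated compactly in code. The sketch below is purely illustrative: patient consent first, then any reachable consent nominee, and processing without consent only as an emergency fallback.

```python
def resolve_consent(patient: dict, nominees: list, emergency: bool) -> str:
    """Illustrative precedence for processing sensitive health data."""
    if patient.get("conscious") and patient.get("consents"):
        return "patient consent"
    for nominee in nominees:  # e.g. family members with a power-of-attorney-like grant
        if nominee.get("reachable") and nominee.get("consents"):
            return f"nominee consent: {nominee['name']}"
    if emergency:
        # only when no nominee is available or appointed
        return "processing without consent (Chapter IV, point 21)"
    return "no lawful basis: do not process"

print(resolve_consent({"conscious": False},
                      [{"name": "spouse", "reachable": True, "consents": True}],
                      emergency=True))  # -> nominee consent: spouse
```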

Suggestions and Questions

Fund Data Rights Education

We believe a larger, public education program may be necessary to educate the public on their data rights.

Limit Impact Assessment Requirement

Section 33 – Data Protection Impact Assessment —

  • Where the data fiduciary intends to undertake any processing involving new technologies or large scale profiling or use of sensitive personal data such as genetic data or biometric data, or any other processing which carries a risk of significant harm to data principals, such processing shall not be commenced unless the data fiduciary has undertaken a data protection impact assessment in accordance with the provisions of this section. …
  • On receipt of the assessment, if the Authority has reason to believe that the processing is likely to cause harm to the data principals, the Authority may direct the data fiduciary to cease such processing or direct that such processing shall be subject to such conditions as may be issued by the Authority.

We believe that the public must be protected from egregious data profiling, but this provision does not strike an appropriate balance with respect to innovation. It mandates that companies and other researchers ask government permission to innovate around large-scale data processing before any work, public deployment, or evidence of harm takes place. We believe this provision will be a large hindrance to experimentation and will cause significant AI research to simply leave India. A more appropriate balance might be to ask data fiduciaries to privately create such an impact assessment, but only submit it to the Authority for approval once small-scale testing has been completed (with potential harms better understood) and large-scale deployments are imminent.

Passwords should be treated differently than other sensitive personal data.

Chapter IV – Section 18. Sensitive Personal Data. Passwords are different from other types of Sensitive Personal Data in that they are a data security artifact rather than a piece of data pertinent to a person’s being. We believe that data protection should be overridable in extraordinary circumstances without forcing companies to provide a backdoor to reveal passwords. We fully acknowledge that it is useful and sometimes necessary to provide backdoors to personal data – e.g. one’s medical history in the event of a medical emergency – but requiring such a backdoor for passwords would likely introduce large potential security breaches throughout the entire personal data ecosystem.

Does the Bill intend to ban automatic person-tagging in photos and image search of people?

Chapter I.3.8 – Biometric Data – The Bill defines Biometric Data to be:

“facial images, fingerprints, iris scans, or any other similar personal data resulting from measurements or technical processing operations carried out on physical, physiological, or behavioural characteristics of a data principal, which allow or confirm the unique identification of that natural person;”

The Bill includes Biometric Data in its definition of Sensitive Personal Data (section 3.35) which may only be processed with explicit consent:

Section 18. Processing of sensitive personal data based on explicit consent. — (1) Sensitive personal data may be processed on the basis of explicit consent

From our reading, we can see a variety of features available today around image search and person tagging being disallowed based on these provisions. E.g. Google’s image search contains many facial images which have been processed to enable identification of natural persons. Facebook’s “friend auto-suggestion” feature on photos employs similar techniques. Does the Bill intend for these features and others like them to be banned in India? It can certainly be argued that non-public people have a right to explicitly consent before they are publicly identified in a photo but we feel the Bill’s authors should clarify this position. Furthermore, does the purpose of unique identification processing matter with respect to its legality?  For example, we can imagine mobile phone-based, machine learning algorithms automatically identifying a user’s friends to make a photo easier to share with those friends; would such an algorithm require explicit consent from those friends before it may suggest them to the user?

Notifications about updates to personal data should be handled by a Consent Dashboard, not every data fiduciary.

Chapter IV – Section 25.4 – Right to correction, etc

Where the data fiduciary corrects, completes, or updates personal data in accordance with sub-section (1), the data fiduciary shall also take reasonable steps to notify all relevant entities or individuals to whom such personal data may have been disclosed regarding the relevant correction, completion or updating, particularly where such action would have an impact on the rights and interests of the data principal or on decisions made regarding them.

We believe the mandate on a data fiduciary to notify all relevant entities of a personal data change is too great a burden, and this task is better performed by a consent dashboard, which maintains which other entities have a valid, up-to-date consent request for a user’s data. Hence, upon a data change, the data fiduciary would update the consent dashboard, and the consent dashboard would then notify all other relevant entities.

It may be useful to keep the user in this loop, so that this sharing is done with their knowledge and approval.
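A compact sketch of this fan-out, with the user kept in the loop, might look as follows; the class and method names are hypothetical.

```python
class ConsentDashboard:
    """Sketch: the dashboard, not each fiduciary, fans out correction notices."""
    def __init__(self):
        self.consents = {}  # (user_id, fiduciary) -> entities holding valid consent

    def grant(self, user_id, fiduciary, entity):
        self.consents.setdefault((user_id, fiduciary), set()).add(entity)

    def on_data_corrected(self, user_id, fiduciary, change, user_approves=True):
        if not user_approves:  # keep the user in the loop, as suggested above
            return
        for entity in self.consents.get((user_id, fiduciary), set()):
            print(f"notify {entity}: {fiduciary} corrected {change} for {user_id}")

dash = ConsentDashboard()
dash.grant("user-1", "bank-A", "lender-B")
dash.on_data_corrected("user-1", "bank-A", "address")
```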

Need for an Authority appeal process when data principal rights conflict

Section 28.5 – General conditions for the exercise of rights in this Chapter. —  

The data fiduciary is not obliged to comply with any request made under this Chapter where such compliance would harm the rights of any other data principal under this Act.

This portion of the law enables a data fiduciary to deny a user’s data change request if it believes doing so would harm another data principal. We believe it should not be up to the sole discretion of the data fiduciary to determine which data principal’s rights are more important, and hence would like to see an appeal process to the Data Protection Authority made available if a request is refused for this reason.

Do not outlaw private fraud detection

Section 43.1 Prevention, detection, investigation and prosecution of contraventions of law

(1) Processing of personal data in the interests of prevention, detection, investigation and prosecution of any offence or any other contravention of law shall not be permitted unless it is authorised by a law made by Parliament and State Legislature and is necessary for, and proportionate to, such interests being achieved.

We worry that the above clause would effectively outlaw fraud detection research, development, and services by private companies in India. For instance, if a payment processor wishes to implement a fraud detection mechanism, it should be able to do so without leaving that task to the State. These innovations have a long track record of protecting users and businesses and reducing transaction costs. We recommend a clarification of this section and/or that its restrictions be applied only to the State.

Limit record keeping use and disclosure to the Authority and the company itself.

Section 34.1.a. Record – Keeping –

The data fiduciary shall maintain accurate and up-to-date records of the following

(a) important operations in the data life-cycle including collection, transfers, and erasure of personal data to demonstrate compliance as required under section 11;

We expect sensitive metadata and identifiers will need to be maintained for the purposes of record keeping; we suggest that this record-keeping information be allowed, but its sharing limited to this use alone, and shared only with the company, its record-keeping contractors (if any), and the Authority.

Filings may be performed digitally

Section 27.4 – Right to be Forgotten

The right under sub-section (1) shall be exercised by filing an application in such form and manner as may be prescribed.

The Bill contains many references to filing an application; we’d suggest a definition that is broad and includes digital filings.

This also applies to sections which include “in writing” – which must include digital communications that can be stored (for instance, email).

Request for Definition Clarifications

What is “publicly available personal data”?

  • Section 17.2.g – We believe greater clarity is needed around the term “publicly available personal data.” There are questionably obtained databases for sale that list the mobile numbers and addresses of millions of Indians – would these be included as publicly available personal data?
  • We recommend that the DPA define rules around what counts as publicly available personal data so that it is taken out of the ambit of the Bill.
  • The same can be said for data where there is no reasonable expectation of privacy (with the exception that systematic data collection on one subject cannot be considered such a situation).

Clarity of “Privacy by Design”

Section 29 – Privacy by Design

Privacy by Design is an established set of principles (see here and in the GDPR), and we would like to see the Bill reference those patterns explicitly, or use a different name if it wishes to employ another definition.

Define “prevent continuing disclosure”

Section 27.1 – Right to be Forgotten

The data principal shall have the right to restrict or prevent continuing disclosure of personal data by a data fiduciary…

We request further clarification on the meaning of “prevent continuing disclosure” and an example use case of harm.

Define “standard contractual clauses” for Cross-Border Transfers

Section 41.3.5 – Conditions for Cross-Border Transfer of Personal Data

(5) The Authority may only approve standard contractual clauses or intra-group schemes under clause (a) of sub-section (1) where such clauses or schemes effectively protect the rights of data principals under this Act, including in relation with further transfers from the transferees of personal data under this subsection to any other person or entity.

We would like to see standard contractual clauses clearly defined.

Define “trade secret”

Section 26.2.c – Right to Data Portability

compliance with the request in sub-section (1) would reveal a trade secret of any data fiduciary or would not be technically feasible.

We request further clarification on the meaning of “trade secret” and an example of the same.

Author Comments

Compiled by iSPIRT Volunteers:

  • Sean Blagsvedt – sean _@_ blagsvedt.com
  • Siddharth Shetty – siddharth _@_ siddharthshetty.com
  • Anukriti Chaudhari – anukriti.chaudhari _@_ gmail.com
  • Sanjay Jain – snjyjn _@_ gmail.com

Links

Comments and feedback are appreciated. Please mail us at [email protected].

Appendix – Sample User Interface Screens

Link: https://docs.google.com/presentation/d/1Eyszb3Xyy5deaaKf-jjnu0ahbNDxl7HOicImNVjSpFY/edit?usp=sharing

******

How To Empower 1.3 Billion Citizens With Their Data

2018 has been a significant year in our relationship with data. Globally, the Cambridge Analytica incident made people realise that democracy itself can be vulnerable to data. Closer to home, we got a first glimpse of the draft privacy bill from the Justice Srikrishna Committee.

The writing on the wall is obvious: we cannot continue the way we have. This is a problem at every level – individuals need to be more careful with whom they share their data, and data controllers need to show more transparency and responsibility in handling user data. But one cannot expect that we will just organically shift to a more responsible, transparent, privacy-protecting regime without the intervention of the state. The draft bill, if it becomes law, will be a great win, as it finally prescribes meaningful penalties for transgressions by controllers.

But we must not forget that the flip side of the coin is that data can also empower people. India has much more socio-economic diversity than the other countries where data protection laws have been enacted. Our concerns go beyond limiting the exploitation of user data by data controllers. We must look at data as an opportunity and ask how we can help users generate wealth out of their own data. Thus we propose that we design an India-specific Data Empowerment & Protection Architecture (DEPA). Empowerment and protection are neither opposite nor orthogonal but co-dependent activities. We must think of them together, or else we will miss the forest for the trees.

In my talk linked below, which took place at IDFC Dialogues Goa, I expand on these ideas. I also talk about the exciting new technology tools that can actually help us realise a future where data empowers.

I hope you take away something of value from the talk. The larger message though, is that it is still early days for the internet. We can participate in shaping its culture, maybe even lead the way, instead of being passive observers. The Indian approach is finding deep resonance globally, and many countries, developing as well as developed, are looking to us for inspiration on how to deal with their own data problem. But it is going to take a lot more collaboration and co-creation before we get there. I hope you will join us on this mission to create a Data Democracy.