The best way forward for Privacy is to open up your data

Since its conception, the India Stack has always been presented as having 4 layers. The first 3, paperless, presence-less and cashless, essentially combat the cost of doing transactions. Whether government to consumer or business to consumer, these layers significantly drive down the cost of interacting with the end consumer. With reduced cost came increased access. Sachetization of services became possible, and we started seeing more and more businesses target India-2 and India-3, making them data rich.

Slide Showing the 4 layers of the India Stack

The 4th layer, nicknamed the “consent layer”, is different and hasn’t received much attention thus far. Unlike the other 3 layers, it does not seek to drive down the cost of delivering services, but is a tool to, as Nandan Nilekani puts it, “invert the data”. That means it allows the user to assert ownership over the data and exercise certain choices in how it is used. The Data Empowerment & Protection Architecture (DEPA) brings us closer to achieving a Data Democracy, where the user can share his data with service providers. The slides attached here present iSPIRT’s outlook on what DEPA is, what it does and, most importantly, how it empowers the user.

Before we get into details of Data Empowerment, it is useful to acknowledge that Data is not a homogeneous commodity. There is a hierarchy within Data, and not all forms of Data can be treated equally.

The five types of Data that need regulation

In the table above, as we go from left to right, data goes from more intimate to more public. Even in today’s muddled regulatory framework around Data, non-shareable data such as biometrics or passwords is seen as user-owned and a big no-no to share. But as we move towards the right, the point where the ownership of the user ends and that of the Data Controller begins is murky at best.

Second, as you go in reverse from right to left, the data becomes more individualistic. Anonymous Datasets and Public Datasets are clearly about group data, whereas the rest are coupled with one or more individuals. Generated Data on e-commerce marketplaces, for example, may involve 3 or even 4 parties. Non-shareable data is typically intimate to a single user.

Everything that we talk about in this post, or in the slides that follow, focuses on the shareable middle of this chart, highlighted in yellow. As a matter of principle, we assert that any data that has an individual identifier attached to it is co-owned by every individual whose identifier is present in the data set. This may not give you complete rights over your data, but it gives you two rights that DEPA enshrines in technology.

The first right is that you can ask for access to your data, in a machine-readable format, from the data provider where this data originated. The second right is to share, with your consent, your personal, generated or derived data with any other service provider you wish. To be clear, this is a right to consented sharing, and not consented collection. The question of what data may be collected is a tricky issue, and requires policy, legal & regulatory clarity before we can build tools to protect it. But we believe that the right for a user to claim a stake in collected, generated or derived data about themselves has clear legal and moral precedent.

DEPA engages with Consent to Share, not Consent to Collect

How do we enshrine these rights in technology? The Electronic Data Consent. But before we introduce the hero, I’d like to get into a little back story to set the stage.

When UIDAI first issued Aadhaar cards, it noticed that despite its portable, digital nature, people used Aadhaar cards as Proof of Identity in the same way they used other IDs: through self-attested photocopies. Most of the time, these photocopies would not state the explicit purpose for which they were made. They were impossible to manage, and inevitably some bad actors would steal, say, PAN or Aadhaar xeroxes and use them as paper identity documents for fraudulent transactions.

So the UIDAI launched eKYC. The premise was simple: UIDAI could authenticate the identity of the individual. Combined with the explicit consent of the user to share their data, the encrypted data would go directly to the service provider, digitally signed by UIDAI. This copy of the KYC document was safer and more trustworthy, as well as faster and cheaper.

So the basic equation became :

The eKYC equation

But this thinking was pretty powerful, and MeitY decided to abstract it and create the Digital Locker Ecosystem, where instead of only one source of truth (UIDAI), any government or private entity can become an issuer of documents. Authentication was also abstracted and need not be tied to Aadhaar. You could retrieve marksheets linked to your roll number, or mobile bills linked to your mobile number. This led to the following equation:

The Digital Locker System Equation

If you’ve been following along, you’ll realize there’s a pretty big missing piece in these equations. The “User Consent to Share” bit doesn’t seem to have the same sort of granularity as the other two parts of the equation. Consent is more nuanced than a simple yes or no. By forcing consent into a binary, data providers reduce their offerings to a “take it” or “leave it” choice. This is a meaningless choice for the consumer.

To really capture user intent, we must expand our understanding of consent. We must try to capture the granularity of the customer’s intent to consent. Does the customer consent to sharing of his data forever or for a limited period of time? Does the customer consent to further downstream sharing of the data, or does he not want his data to leave the service provider? This is where the hero we mentioned earlier enters.

Sample Flows of Data and Consent under EDC

Introducing Electronic Data Consent (EDC). It is a mechanism to abstract consent flows from data flows. This means that you can capture the user’s intent to consent in bits, digitally signed by the user for authenticity, and share it with other providers to retrieve user data seamlessly.
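To make the separation of consent flow from data flow concrete, here is a minimal sketch, not the actual EDC protocol. The field names, the `sign_consent` and `release_data` helpers and the demo key are all hypothetical. To stay within the Python standard library, an HMAC over a shared secret stands in for the user’s digital signature; a real deployment would presumably use an asymmetric signature that the data provider verifies with the user’s public key.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the user's signing key (assumption: a real
# system would use an asymmetric key pair, not a shared secret).
USER_KEY = b"demo-user-key"


def sign_consent(consent: dict, key: bytes) -> str:
    """Return a hex tag computed over a canonical JSON encoding of the consent."""
    payload = json.dumps(consent, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def release_data(consent: dict, tag: str, key: bytes, records: dict) -> dict:
    """Data provider side: release only the fields named in a validly signed consent."""
    expected = sign_consent(consent, key)
    if not hmac.compare_digest(expected, tag):
        raise PermissionError("consent signature does not verify")
    return {name: records[name] for name in consent["fields"] if name in records}


# Consent flow: the user states what may be shared, and signs it.
consent = {"user": "user-123", "consumer": "lender-abc", "fields": ["monthly_income"]}
tag = sign_consent(consent, USER_KEY)

# Data flow: the provider checks the consent before any data moves.
records = {"monthly_income": 52000, "transactions": ["..."]}
released = release_data(consent, tag, USER_KEY, records)
```

The point of the sketch is that the consent travels as its own signed object: if anyone alters the list of fields after signing, verification fails and no data is released.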

Flowing through the pipes of EDC is an open, extensible XML file called the Consent Artefact. The Consent Artefact has some pretty cool features. It captures all the parties involved in the transaction, and it explicitly states what data is being shared and for how long. There are options for the user to decide if the data consumer is allowed storage and further sharing of this data. Also, all parties are immediately notified of any updates to the consent, and all changes are logged. This facilitates data audits, not just for regulators, but also to enhance trust between Data Providers and Consumers, and unlock the data economy.
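As a rough illustration of what such a file might carry, the sketch below builds an XML document with the features listed above: the parties, the data being shared, the validity period, and the storage and re-sharing options. Every element and attribute name here (`ConsentArtefact`, `Parties`, `Validity`, `Permissions` and so on) is invented for this example and is not taken from the published Consent Artefact schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def build_consent_artefact(user, provider, consumer, data_types,
                           valid_days, allow_storage, allow_resharing,
                           issued_at=None):
    """Build a hypothetical consent artefact as an XML element tree."""
    issued_at = issued_at or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(days=valid_days)

    root = ET.Element("ConsentArtefact")

    # All parties involved in the transaction.
    parties = ET.SubElement(root, "Parties")
    ET.SubElement(parties, "User").text = user
    ET.SubElement(parties, "DataProvider").text = provider
    ET.SubElement(parties, "DataConsumer").text = consumer

    # What data is being shared.
    data = ET.SubElement(root, "Data")
    for data_type in data_types:
        ET.SubElement(data, "Type").text = data_type

    # For how long the consent holds.
    ET.SubElement(root, "Validity",
                  {"from": issued_at.isoformat(), "to": expires_at.isoformat()})

    # Whether the consumer may store or further share the data.
    ET.SubElement(root, "Permissions",
                  {"storage": str(allow_storage).lower(),
                   "resharing": str(allow_resharing).lower()})
    return root


def is_valid(artefact, at):
    """Check whether the consent is in force at the given moment."""
    validity = artefact.find("Validity")
    start = datetime.fromisoformat(validity.get("from"))
    end = datetime.fromisoformat(validity.get("to"))
    return start <= at <= end


artefact = build_consent_artefact(
    user="user-123", provider="bank-xyz", consumer="lender-abc",
    data_types=["bank_statement"], valid_days=30,
    allow_storage=False, allow_resharing=False,
    issued_at=datetime(2017, 8, 1, tzinfo=timezone.utc),
)
xml_text = ET.tostring(artefact, encoding="unicode")
```

Notifications and change logs are left out of the sketch; they would sit around the artefact rather than inside it.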

The Consent Artefact enables privacy-preserving measures such as Virtual Data Rooms. For example, a lending startup could know if the income in your bank account matches the one on your salary slip, without your entire financial transaction history having to be handed over from your bank to the NBFC. It can just raise a query to your bank, “Is income > x?”, and get a simple yes/no in return. The Consent Artefact’s logging and notice can enable newer ways of pricing and doing business on data. The Consent Artefact deserves another post just for itself, and you will get one in the next couple of weeks.
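The yes/no query described above can be sketched in a few lines. This is an illustrative toy, not how any bank or Virtual Data Room is actually implemented: the `Bank` class, its `income_exceeds` method and the set of consented users are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """Hypothetical data provider that answers threshold queries only."""
    monthly_income: dict                      # user id -> income
    audit_log: list = field(default_factory=list)

    def income_exceeds(self, user_id, threshold, consented_users):
        """Answer "Is income > threshold?" without revealing the income itself."""
        if user_id not in consented_users:
            raise PermissionError("no consent on record for this user")
        answer = self.monthly_income[user_id] > threshold
        # Every query is logged so that the data flow can be audited later.
        self.audit_log.append((user_id, threshold, answer))
        return answer


bank = Bank(monthly_income={"user-123": 52000})
consented = {"user-123"}

# The lender learns a single bit, not the transaction history.
eligible = bank.income_exceeds("user-123", 40000, consented)
```

A real system would also need to limit repeated queries, since asking with many different thresholds would eventually reveal the exact income.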

But to summarize, the EDC abstracts consent flows from data flows. It allows for collection, management and audit of granular consent to share data in an open XML file called the Consent Artefact. Now, time for the big question. So what?

Well, today you can open a Bank Account instantly with eKYC. You can get your bank statements on a digital locker without lifting a finger, if the bank is enrolled as an issuer on a digital locker. But to get a personalized flow-based loan, you need EDC. Electronic consent unlocks the value of your data sitting in multiple databases of multiple service providers and gives you granular control over who gets to see what. Together these 3 tools give us a stack for Consented Data Sharing that we call the Data Empowerment & Protection Architecture. DEPA opens up whole new models for Privacy Protection and Auditing Data flows while keeping the user at the center. More tools, such as the Health Record Combiner, will be added to this Architecture to encourage the unlocking of value in disparate data sources.

The 3 Tools that make up DEPA. Upcoming tools such as the Health Record Combiner will be introduced in another blog post.

We believe that the 4th layer of the India Stack is the most critical. While the other 3 were useful operationally, the consent layer is useful strategically. It forces you to think about how best to align Data for the empowerment of the user. By opening up the data, it removes all monopoly value attached to the Data, whilst still retaining the inherent value of the Data. Innovation moves away from who can hoard the most user Data, to who can make the best use of the Data for the user.

If you’re curious to see EDC in action, please do have a look at the talk by Sanjay Jain & team here. The slides used in that talk are shared here.