iSPIRT Foundation’s Official Response to the Non-Personal Data Report

Last month, an expert committee chaired by Kris Gopalakrishnan submitted a report on a framework for regulating Non-Personal Data in India. The rest of this blog post sets out iSPIRT’s response to that report.

At iSPIRT Foundation, our view on data laws stems from three fundamental beliefs: 

  1. The merits of a data democracy (that is, the user must be in charge);
  2. Competitive effects must be well understood so that a level playing field is created amongst all Indian companies, and some ring-fencing must exist to protect against global data monopolies;
  3. Careful design enables both high compliance and high convenience.

It is from these three perspectives that we have analyzed the Non-Personal Data report in our response. 

The report makes a well-cited and articulate case for regulating data in India, such that “[data’s] benefits accrue to India, and Indians”. It defines and outlines the roles of various entities involved in the capture, processing and governance of Non-Personal Data (NPD). While we are wholeheartedly aligned with the committee’s vision of an Aatmanirbhar Bharat and share its view on the importance of a robust AI industry in the country, we find that the report is sound in premise but murky in detail. Although it proposes an overarching legal framework for NPD, on closer inspection this framework falls short of the goals it sets out: protection for Indians and prosperity for Indian businesses. More work must therefore be done before a final version of the report is released. 

In what follows, we take the example of one industry that will be affected by the new Non-Personal Data framework and attempt to understand the report in relation to it. 

Case Study: X-ray Data

Consider the case of a (fictional) health-tech startup called rad.ai that seeks to generate radiology summaries from lung X-rays. These summaries will help pulmonologists (lung-specialist doctors) identify various pulmonary diseases or early signs of cancer. Such a company requires two things to succeed: X-ray data and the technology to build models. 

Under the tenets of the NPD report, it would seem that (given the right technology foundations) such a startup would be well placed to succeed. In particular, the report suggests that X-ray data of this kind will be classified as community data, and it outlines mechanisms for collecting and sharing it. It suggests that the dual goal of enabling rad.ai while protecting the subjects of the data can be accomplished through the following framework:

  1. rad.ai will first learn that appropriately anonymized lung X-rays (termed community data) are available for access, through meta-data published by the radiology labs that collect such data. 
  2. rad.ai will then be able to access this data for free through a data trust (presumably a virtual API on the cloud) set up by the radiology labs. 
  3. All of this will happen with the consent and oversight of a data trustee, an entity that represents the persons from whom the data is collected. 

While this sets out a broad-strokes framework with the right intentions, a few key problems arise when we dive deeper into the proposed solution. We have outlined some areas where clarifications are needed and details must be made available. These include the definition and value of community data, the obligations of the data custodian, access for data users and the rights of the data principal. Towards the end, we also discuss NPD’s interplay with personal data and the definition and use of public data.

Definition and Value of Community Data 

The case of X-ray data seems straightforward, but the scope of community data in the report appears to be vast. At the same time, the report seems to suggest that while data explicitly extracted from users is classified as community data, data generated in the course of a business’s activities may be considered private business data, which may not be mandated to be shared. If so, e-commerce, delivery and employee data would appear to fall under private business data, although an argument could be made for classifying them as community data. 

Further, the report distinguishes between raw and processed data: raw data must be shared free of cost, whereas the price of processed data could be determined based on market or FRAND (fair, reasonable and non-discriminatory) principles. In the case of lung X-ray data, labs obtain such data by anonymizing raw personal data, so presumably it counts as processed and need not be shared without remuneration. In the general case, however, the distinction is far less obvious, and since it decides whether data is free or priced, we recommend that the lines between raw and (various levels of) processed data be clearly demarcated. 

Data Custodian Perspective

Consider next the situation from the perspective of the diagnostic labs. There are about 70,000 labs and hospitals offering radiology services in the country [ref]. Each of these uses its own technology implementation or third-party TSPs (technical service providers) for collecting and storing patient data. How can all of these labs and hospitals be mandated to hand control of this data to the data trustee so that X-ray information can be shared with companies like rad.ai? And who is the data trustee in this case anyway: the MoHFW, the NHA, or an NGO serving the interests of cancer patients? If multiple data trustees have an interest in the same underlying data, how is it ensured that every decision each of them takes is in the best interests of the user? How are labs even supposed to publish the meta-data (information about what data they have collected) in a standard, machine-readable format, so that companies like rad.ai can discover it seamlessly? Finally, since such compliance comes with costs, what concrete benefits do labs gain from making this data available?

Data User Perspective

Then there is the case of rad.ai itself. Let us assume, for now, that the provisions described above are in place and set aside the concerns raised in the previous paragraph. Say that, thanks to access to this X-ray data, rad.ai has been able to build excellent ML models that predict the early risks of major diseases and help pulmonologists. It does well and begins to expand across the country. A few years later, a foreign health-tech giant (again, fictional) called Hooli enters the fray by investing in rad.ai. Backed by foreign investment, rad.ai is now stronger than ever. However, the report identifies “to promote and encourage the development of domestic industry and startups that can scale” as one of its key objectives. rad.ai was a fully Indian company when it was granted access to community data from radiology labs to develop its models. Now that it has foreign investment, should it still be allowed to benefit from Indian community data? What if Hooli owns a majority stake? What if Hooli acquires rad.ai outright? 

Data Principal Perspective

Finally, consider this from the perspective of the patient herself, the “data principal”. How is she guaranteed that her interests are protected and that the data is impervious to de-anonymization? Health data is considered sensitive, and the patient has every right to ensure that it is handled with extreme care. The report also recommends that consent be taken from members of the community before their data is anonymized and used, but what form will this consent take? How will its purpose be determined, and how will its collection be enforced before any data is anonymized and used? How can anonymization be assured when multiple anonymized datasets can be combined? Finally, the report acknowledges that all sharing of community data should be based on the concept of “beneficial ownership”, under which the benefits of data sharing also accrue to the data principal. How does the patient directly benefit from such a data-sharing mechanism, and what mechanisms are in place to deliver those benefits?  
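To illustrate why combining anonymized datasets is a real concern, here is a minimal Python sketch of a so-called linkage attack. It assumes two hypothetical datasets: a lab’s X-ray findings with names removed but PIN code, birth year and sex retained, and an unrelated customer list that carries names alongside the same fields. All records, field names and the `link` function are invented for illustration; this is not a mechanism described in the report.

```python
from collections import defaultdict

# Hypothetical "anonymized" X-ray findings released by a lab: names removed,
# but quasi-identifiers (PIN code, birth year, sex) retained.
xray_records = [
    {"pin": "560001", "birth_year": 1985, "sex": "F", "finding": "nodule, upper left lobe"},
    {"pin": "560001", "birth_year": 1990, "sex": "M", "finding": "no abnormality"},
    {"pin": "110011", "birth_year": 1972, "sex": "F", "finding": "signs of fibrosis"},
]

# Hypothetical second dataset (e.g. a delivery app's customer list) that
# carries names alongside the same quasi-identifiers.
customer_records = [
    {"name": "Asha", "pin": "560001", "birth_year": 1985, "sex": "F"},
    {"name": "Ravi", "pin": "560001", "birth_year": 1990, "sex": "M"},
    {"name": "Meera", "pin": "110011", "birth_year": 1972, "sex": "F"},
    {"name": "Kiran", "pin": "110011", "birth_year": 1972, "sex": "F"},
]

QUASI_IDENTIFIERS = ("pin", "birth_year", "sex")


def key(record):
    """Build the join key from the fields both datasets share."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(anonymized, identified):
    """Return {name: finding} for every anonymized record whose
    quasi-identifiers match exactly one named person."""
    names_by_key = defaultdict(list)
    for r in identified:
        names_by_key[key(r)].append(r["name"])

    reidentified = {}
    for r in anonymized:
        candidates = names_by_key.get(key(r), [])
        if len(candidates) == 1:  # a unique match pins down the individual
            reidentified[candidates[0]] = r["finding"]
    return reidentified


print(link(xray_records, customer_records))
```

In this toy example, Asha’s and Ravi’s X-ray findings are re-identified because their combination of PIN code, birth year and sex is unique, while the two women born in 1972 in the same PIN code are not, since each anonymized record matches two people. The more attributes two datasets share, the more likely such unique matches become, which is why anonymizing each dataset in isolation offers limited assurance.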

Interplay With PDP Bill

Consider next the case of the health-tech giant Hooli, and assume that it also operates a search engine. As per the suggestions of the Srikrishna PDP Bill [Clause 14], Hooli could be allowed to collect personal user data and use it directly, since operating a search engine falls under “other reasonable uses of personal data”. Combined with the NPD report’s proposal that sharing of all raw NPD be mandatory, this means Hooli is actually disincentivized from anonymizing its personal search data: by keeping this data personally identifiable, it avoids having to share it with competing search engines and startups. Could misaligned incentives such as these hinder progress towards the report’s stated goals? 

On Public Data

Consider finally that some of the labs and hospitals conducting X-rays will be government-owned. The report recommends that all data generated through government and government-funded activities be classified as public data, which in turn is treated as a national resource. The report offers no guidance on the compliance requirements for the government bodies controlling this public data in such a scenario. Will government hospitals still be required to make it available? More importantly, does the community (i.e., the patients) have no claim to their data merely because it was collected by the government rather than by a private entity? 

Concluding Remarks

In summary, the report makes the important contribution of conceiving legal notions of community, private and public non-personal data, and outlines a framework in which such data might be shared for the advancement of economic, sovereign and core public interests. However, more work must be done to detail compliance requirements and standardize the mechanisms for sharing such data before the final version of the report is released. In its current state, the report only touches upon the rights of the data principal, the obligations of the data trustee and the compliance requirements of the data custodian. In a State with limited regulatory capacity for creating standards and enforcement bodies, these rights, obligations and compliance requirements should be comprehensively and clearly defined before a law mandating them is enacted. Custodians must be incentivized to share, startups must be enabled to access the data, and the community must be empowered to protect itself before the vision of an Aatmanirbhar Bharat with a booming AI industry can be realized. 


For further queries, drop us a note at [email protected]