The Internet’s Truth or Dare game of anonymization

A while ago, I broke my self-imposed “participate in only one social network” policy when I installed the popular networking app Secret. I did it out of sheer curiosity and a wish to experiment with its technology for anonymous social networking. The underlying premise is very interesting: the people participating come from your extended friend circles but are anonymized by a combination of encryption, one-way hashes and time lags. As expected, both vigilantism and mud-slinging are rampant, and a lot of scandalous material gets posted: taboo subjects, gossip about people and sometimes very personal, intimate information which, if traced back to a real person, could cause serious legal trouble, embarrassment and pain in many spheres of life. But can one really be anonymous online, even assuming that the technology and intent behind such apps are secure and trustworthy?

Consider the recent case of a data release in New York City, which published anonymized records of all cab trips, including types of cabs, number of passengers, routes, times, fares and a treasure trove of other information that could be used for intelligent planning of traffic, roads, cab capacity, parking and public transport. Some of the fields, such as drivers’ license numbers and cab license plates, are sensitive, since they lend themselves to malicious use as well as breaches of privacy. So that information was masked with a one-way hash (for the uninitiated, a function that turns text into a fixed-length digest which cannot be decoded back to the original text even when the algorithm is known; unlike encryption, it involves no key). However, one sharp researcher looked at the data and realized that license numbers follow a fixed format, so there is only a finite and fairly small number of possible inputs, and hence of possible hashes. He simply computed the hash of every possible license number and matched the results against the 173 million trip records, which re-identified the cab drivers along with their incomes and whereabouts.
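
The attack described above can be sketched in a few lines of Python. This is only an illustration, not the researcher’s actual code: the four-character format (digit, letter, digit, digit), the sample plate "7B42", the helper names hash_id and build_lookup, and the choice of MD5 are all assumptions made for the example.

```python
import hashlib
import string
from itertools import product


def hash_id(plate: str) -> str:
    """One-way hash of an identifier (MD5 is an assumption, used for illustration)."""
    return hashlib.md5(plate.encode("ascii")).hexdigest()


def build_lookup() -> dict:
    """Hash every identifier of the hypothetical format digit-letter-digit-digit."""
    lookup = {}
    for d1, letter, d2, d3 in product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        plate = f"{d1}{letter}{d2}{d3}"
        lookup[hash_id(plate)] = plate
    return lookup


# The whole input space is tiny: 10 * 26 * 10 * 10 = 26,000 identifiers.
lookup = build_lookup()

# A "masked" value as it might appear in a released record (hypothetical plate).
masked = hash_id("7B42")

# Reversing the hash is now just a dictionary lookup.
recovered = lookup[masked]
```

The hash function itself is never broken here. The weakness is that the set of possible inputs is small enough to enumerate, so a precomputed table undoes the masking.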

There is considerable research happening in the area of re-identification, starting from the observation that only 33 bits of information are needed to uniquely identify everyone on the internet (2^33 is roughly 8.6 billion, more than the world’s population). Combined with a slew of public data such as census records and innocuous social network data, this means the task of re-identifying anonymized information is easier than ever. As with everything on the internet, such techniques are becoming commoditized thanks to cheap processing power.
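
The arithmetic behind the 33-bit figure is easy to check. The sketch below assumes a world population of about 8 billion, and the attribute fractions are rough, illustrative guesses rather than measured statistics; bits_revealed is a hypothetical helper name.

```python
import math

WORLD_POPULATION = 8_000_000_000  # rough assumption

# Bits needed to give every person a distinct label.
bits_needed = math.log2(WORLD_POPULATION)


def bits_revealed(fraction_sharing: float) -> float:
    """Bits of identifying information in a fact shared by this fraction of people."""
    return -math.log2(fraction_sharing)


# Illustrative guesses: each known fact chips away at the 33 bits.
gender_bits = bits_revealed(1 / 2)       # a fact shared by half of all people
birthday_bits = bits_revealed(1 / 365)   # day and month of birth
total_bits = gender_bits + birthday_bits
```

Each seemingly harmless attribute contributes a few bits, and independent attributes add up, which is why a handful of them can be enough to single out one person.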

Now let’s apply this concept back to Secret. The app tells you roughly how far away the person posting is and whether he or she is a friend. Add a little pandering to one’s inner Sherlock Holmes, a.k.a. deductive logic, and one can potentially figure out who the people participating in a discussion are. This is possible even with the integrity of the app and its security technologies intact. Obviously, any malicious hack could lead to far more devastating consequences.
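
The deduction described above amounts to filtering a list of people you already know by the hints the app leaks. The sketch below is purely hypothetical: the Contact record, the candidates helper, the names and the cities are invented for illustration and have nothing to do with Secret’s actual data or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    is_friend: bool  # direct friend, as opposed to friend of a friend
    city: str        # coarse location the observer already knows


def candidates(contacts, *, is_friend: bool, city: str):
    """Keep only the contacts consistent with the hints shown next to a post."""
    return [c.name for c in contacts if c.is_friend == is_friend and c.city == city]


# Hypothetical address book of the person doing the guessing.
contacts = [
    Contact("Asha", True, "Pune"),
    Contact("Ben", True, "Boston"),
    Contact("Chen", False, "Pune"),
    Contact("Dina", True, "Pune"),
]

# The post is marked as coming from a friend located nearby (here: Pune).
suspects = candidates(contacts, is_friend=True, city="Pune")
```

Two hints already cut four contacts down to two. One more clue, such as a writing quirk or a detail mentioned in the post, may be enough to settle on a single name.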

While Secret is a somewhat frivolous, albeit interesting, use case, there are many valid business and legal reasons for protecting people’s identities while sharing relevant data for academic and business purposes. Strong anonymization is desirable in many cases, but it often strips out the very detail that makes the data useful. The next few years will see an interesting race between anonymization and re-identification, with both finding useful applications in various contexts.