Research: Anonymous data is not so anonymous anymore


07/30/2019

Databases held by medical institutions, insurance companies, telecom operators or entertainment services are not treated as personal data under the law once people's first and last names have been removed, and can therefore be freely sold or transferred to third parties. Scientists say that this supposedly anonymous data makes it easy to identify almost any person, with an accuracy of 99.8%.



About two-thirds of the world's population use the Internet, leaving digital traces every day and handing personal information to websites. Thousands of companies collect and process this information and can freely share it. To do so, it is enough to remove people's real names and surnames before passing a database to someone else. Personal data protection laws around the world do not regard anonymized information as personal data, which means that it can be freely used, transmitted or sold.

However, anonymized data can easily be de-anonymized, and the way companies store and transfer customers' personal data does not guarantee respect for citizens' right to privacy. This is the conclusion reached by scientists at Imperial College London and the Université catholique de Louvain, who have shown how to re-identify people using pieces of information from anonymous databases.

It is not necessary to know a name and surname to work out which person a database record refers to. It is enough to combine a few attributes. Knowing even three of them, for example zip code, date of birth and gender, one can narrow the candidates down to a very small group of people. With 15 attributes drawn from databases of socio-demographic characteristics, surveys and medical records, it is possible to pinpoint a specific person.
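The effect of combining attributes can be illustrated with a minimal Python sketch. It is not the researchers' model: it only counts, on a made-up table of six records with no names, what share of records becomes unique as more attributes are combined. The records, the field names `zip`, `birth` and `gender`, and the helper `unique_fraction` are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy records invented for this illustration; no names are stored.
RECORDS = [
    {"zip": "02139", "birth": "1985-03-14", "gender": "F"},
    {"zip": "02139", "birth": "1985-03-14", "gender": "M"},
    {"zip": "02139", "birth": "1990-07-01", "gender": "F"},
    {"zip": "02140", "birth": "1985-03-14", "gender": "F"},
    {"zip": "02140", "birth": "1972-11-30", "gender": "M"},
    {"zip": "02140", "birth": "1972-11-30", "gender": "M"},
]


def unique_fraction(records, attributes):
    """Return the share of records whose combination of `attributes` occurs exactly once."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


if __name__ == "__main__":
    # Add one attribute at a time and watch the share of unique records grow.
    for attrs in (["gender"], ["zip", "gender"], ["zip", "birth", "gender"]):
        print(attrs, round(unique_fraction(RECORDS, attrs), 2))
```

In this toy table no record is unique by gender alone, two of six are unique by zip code and gender, and four of six are unique once date of birth is added. A record that is unique on such attributes can be matched to a named person by anyone who knows those same attributes from another source.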

According to the scientists' article, published in the journal Nature Communications, their artificial intelligence model for identifying citizens gives the correct result with a probability of 99.8% when the data covers the entire US population, or 99.6% when it covers only 1% of the population.

source: nature.com