Gabriel Kaptchuk, Research Assistant Professor in Computer Science, Boston University.
Elissa M. Redmiles, Faculty member & Research Group Leader, Max Planck Institute.
Rachel Cummings, Assistant Professor of Industrial and Systems Engineering, Georgia Institute of Technology.
The Trump administration’s move to ban the popular video app TikTok has stoked fears about the Chinese government collecting personal information of people who use the app. These fears underscore growing concerns Americans have about digital privacy generally.
Debates around privacy might seem simple: Something is private or it’s not. However, the technology that provides digital privacy is anything but simple.
Our data privacy research shows that people’s hesitancy to share their data stems in part from not knowing who would have access to it and how organizations that collect data keep it private. We’ve also found that when people are aware of data privacy technologies, they might not get what they expect.
Differential privacy explained
Imagine your local tourism committee wanted to find out the most popular places in your area. A simple solution would be to collect the list of all the locations you have visited from your mobile device, combine it with similar lists for everyone else in your area, and count how often each location was visited. While efficient, collecting people’s sensitive data in this way can have dire consequences. Even if the data is stripped of names, it may still be possible for a data analyst or a hacker to identify and stalk individuals.
Differential privacy can be used to protect everyone’s personal data while gleaning useful information from it. Differential privacy disguises individuals’ information by randomly changing the lists of places they have visited, possibly by removing some locations and adding others. These introduced errors make it virtually impossible to compare people’s information and use the process of elimination to determine someone’s identity. Importantly, these random changes are small enough to ensure that the summary statistics – in this case, the most popular places – are accurate.
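To make the idea of randomly removing and adding locations concrete, here is a minimal Python sketch of one classic approach, known as randomized response. It is an illustration under stated assumptions, not a description of any particular deployed system: the function names, the `keep_prob` parameter and the place names in the usage note are all hypothetical. Each person's device keeps the true answer for a place with probability `keep_prob` and otherwise reports a coin flip. Because the analyst knows how much noise was added on average, the totals can be corrected afterward.

```python
import random


def randomize_visits(visited, places, keep_prob, rng):
    """Return a noisy copy of one person's set of visited places.

    For each place, the true answer is kept with probability keep_prob;
    otherwise it is replaced by a fair coin flip. This can both remove
    places the person visited and add places they never visited.
    """
    report = set()
    for place in places:
        truth = place in visited
        if rng.random() < keep_prob:
            answer = truth                 # report honestly
        else:
            answer = rng.random() < 0.5    # report a random answer
        if answer:
            report.add(place)
    return report


def estimate_counts(reports, places, keep_prob):
    """Estimate how many people truly visited each place.

    On average, observed = keep_prob * true + (1 - keep_prob) * n / 2,
    so the expected noise is subtracted and the result rescaled.
    """
    n = len(reports)
    estimates = {}
    for place in places:
        observed = sum(place in report for report in reports)
        estimates[place] = (observed - n * (1 - keep_prob) / 2) / keep_prob
    return estimates
```

For example, if 10,000 simulated people each pass their list through `randomize_visits` with `keep_prob=0.5`, no single noisy list can be trusted, yet `estimate_counts` should still rank a park visited by 7,000 people above a museum visited by 3,000.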
In practice, differential privacy isn’t perfect. The randomization process must be calibrated carefully. Too much randomness will make the summary statistics inaccurate. Too little will leave people vulnerable to being identified. Also, if the randomization takes place after everyone’s unaltered data has been collected, as is common in some versions of differential privacy, hackers may still be able to get at the original data.
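The calibration trade-off can be sketched with a second common technique, the Laplace mechanism, which adds random noise to a total after the unaltered data has been collected. This is a simplified illustration, not a production implementation: the function names are hypothetical, and the code assumes each person can change a place's visit count by at most one. The privacy parameter `epsilon` controls the amount of noise, with smaller values meaning more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return a visit count with Laplace noise of scale 1 / epsilon.

    Assumes one person changes the count by at most 1 (sensitivity 1).
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def average_error(true_count, epsilon, trials, rng):
    """Average distance between the noisy count and the true count."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials
```

With `epsilon=1.0` the reported count is typically off by about one visit, while with `epsilon=0.01` it is typically off by about 100. That gap is the accuracy cost of stronger privacy, and choosing where to sit between the two is the calibration problem.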