Social Science One and Facebook this week hosted training for more than 50 independent researchers, announced last month by the Social Science Research Council, who will study the role of social media in elections and democracy. Drawn from 25 universities across eight countries and five continents, the researchers participated in hands-on sessions and discussions to learn the new, highly secure tool Facebook created to share aggregated and anonymized data with researchers for the study of social media’s role in elections.
At the training, researchers were given access to this new Facebook research tool, along with an initial data set of URLs shared publicly on Facebook (rather than privately or with specific friends or Groups) by 100 or more unique users. This aggregated and anonymized data includes the URL link, the URL’s “share title,” a text summary of the content, information on the country where the URL was shared most often, and any ratings from Facebook’s third-party fact-checking partners. Researchers will be able to use this information in conjunction with data already available to them from CrowdTangle and Facebook’s Ads Library API to analyze topics ranging from “Measuring the Effects of Peer Sharing on Fake and Polarized News Consumption” to “False News on Facebook During the 2017 Chilean Elections: Analyzing Its Content, Diffusion, and Audience Characteristics” to “Mapping Disinformation Campaigns Across Platforms: The German General Election.”
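As a purely illustrative example (the field names and values below are hypothetical, not Facebook’s actual schema), a single aggregated, anonymized record in such a data set might look something like this:

```python
# Hypothetical example of one aggregated, anonymized URL record.
# Field names are illustrative only; they are not Facebook's actual schema.
example_record = {
    "url": "https://example.com/news/article-123",
    "share_title": "Example headline as it appeared when shared",
    "share_summary": "A short text summary of the linked content.",
    "top_country": "US",            # country where the URL was shared most often
    "public_share_count": 150,      # only URLs shared publicly by 100+ unique users qualify
    "fact_check_rating": "false",   # rating from a third-party fact-checking partner, if any
}
```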
This first-of-its-kind partnership between the academic research community and Facebook holds the potential to unlock findings of broad societal importance. At the same time, this work must be performed in a manner that protects people’s privacy. To achieve these dual goals, Facebook worked with the academic, privacy, and security communities to build a system that lets researchers query the data for aggregate insights without revealing the identities of individual people.
This new system is designed to help address challenges encountered in past data-sharing efforts.
A key innovation in the development of the research tool has been building in formal privacy protections, most notably differential privacy. Differential privacy is a mathematical framework for adding statistical “noise” to data sets and query results, protecting against reidentification attacks that can defeat conventional anonymization techniques. For this project, the tool uses differential privacy to prevent anyone with access to the data from determining whether a specific individual contributed to the data set.
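To make the idea concrete, the sketch below shows the Laplace mechanism, one standard way of achieving differential privacy for a count query. It illustrates the general technique rather than this project’s actual implementation; the query, counts, and epsilon value are hypothetical.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Return a differentially private count using the Laplace mechanism.

    A count query has sensitivity 1: adding or removing any one person
    changes the result by at most 1. Adding Laplace noise with scale
    sensitivity/epsilon makes it statistically hard to tell whether any
    specific individual contributed to the underlying data set.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical query: how many unique users shared a given URL?
true_shares = 1042
print(dp_count(true_shares, epsilon=0.5))  # e.g. ~1039.7; varies per run
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; choosing that trade-off is a central design question for any system of this kind.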
In designing, building, and testing this approach to data sharing, Facebook sought guidance from a wide range of experts. For example, in late 2018, we worked with a third-party security vendor to conduct penetration tests, which allowed the team to identify and fix potential vulnerabilities before any data was shared. We also worked with Nick Nikiforakis (Stony Brook University) to validate that we followed best practices for removing personally identifiable information from the URLs in the data set. Much of our work to ensure privacy in these data-sharing efforts has also involved privacy experts from the academic community, including Michael Hay (Colgate University), Daniel Kifer (Penn State University), Aaron Roth (University of Pennsylvania), Abhradeep Thakurta (UC Santa Cruz), and Danfeng Zhang (Penn State University). Along with Social Science One, they will continue to advise our team and test our systems to help ensure that the tool offers strong privacy protection while still facilitating reliable research. This work will result in two formal white papers on our system’s ability to 1) protect privacy and 2) serve the research community.
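To illustrate one piece of that PII-removal work: query strings and URL fragments often carry session tokens, click identifiers, or user IDs. The sketch below shows the general technique of stripping them before a URL enters a shared data set; it is a simplified illustration, not the actual rules used in this project, which a real pipeline would extend considerably.

```python
from urllib.parse import urlparse, urlunparse

def strip_url_pii(url: str) -> str:
    """Drop URL components that commonly carry personally identifiable
    information: params, query strings, and fragments. A production
    pipeline would apply many more rules (e.g., path-based filters)."""
    parts = urlparse(url)
    return urlunparse((parts.scheme, parts.netloc, parts.path, "", "", ""))

print(strip_url_pii("https://example.com/story?fbclid=abc123&user=42#section"))
# -> https://example.com/story
```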
Over the next several months, Facebook will continue to work with our privacy advisers and Social Science One to confirm that the system we’re building offers strong privacy protections while also providing valuable insights for researchers. Additionally, we are continuing to review what other types of aggregated and anonymized data could safely be released to further aid researchers, and we look forward to these ongoing discussions.