In 2018, Facebook began an initiative to support independent academic research on social media’s role in elections and democracy. This first-of-its-kind project seeks to provide researchers with access to privacy-preserving data sets in order to support research on these important topics.
Today, we are announcing that we have substantially increased the amount of data we’re providing to 60 academic researchers across 17 labs and 30 universities around the world. This release delivers on the commitment we made in July 2018 to share a data set that enables researchers to study information and misinformation on Facebook, while also ensuring that we protect the privacy of our users.
This new data release supplants data we released in the fall of 2019. That 2019 data set consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. It included information about share counts, ratings by Facebook’s third-party fact-checkers, and user reporting on spam, hate speech, and false news associated with those links. We have expanded the data set to now include more than 38 million unique links with new aggregated information to help academic researchers analyze how many people saw these links on Facebook and how they interacted with that content – including views, clicks, shares, likes, and other reactions. We’ve also aggregated these shares by age, gender, country, and month. And, we have expanded the time frame covered by the data from January 2017 – February 2019 to January 2017 – August 2019.
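To make the shape of the data concrete, here is a minimal sketch of what one row of such an aggregated table could look like. The field names and values below are hypothetical illustrations for exposition only, not the actual schema of the released data set:

```python
# Hypothetical illustration of one row in an aggregated URL engagement
# table. All field names and values are invented for exposition; they
# are not the actual schema of the released data set.
example_row = {
    "url": "https://example.com/some-news-story",  # shared publicly by 100+ users
    "country": "US",
    "year_month": "2019-03",
    "age_bucket": "25-34",
    "gender": "female",
    "views": 10_483,   # engagement counts are aggregated, and noise is
    "clicks": 1_214,   # added before release (see the discussion of
    "shares": 308,     # differential privacy below)
    "likes": 642,
    "other_reactions": 97,
    "fact_check_rating": "false",  # rating from third-party fact-checkers
}
```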
With this data, researchers will be able to understand important aspects of how social media shapes our world. They’ll be able to make progress on the research questions they proposed, such as “how to characterize mainstream and non-mainstream online news sources in social media” and “studying polarization, misinformation, and manipulation across multiple platforms and the larger information ecosystem.”
In addition to the data set of URLs, researchers will continue to have access to CrowdTangle and Facebook’s Ad Library API to augment their analyses. As laid out in the original plan for this project, these researchers will be able to publish their findings without approval from Facebook, subject only to a limited review to ensure that no confidential or user data is inadvertently released.
We are sharing this data with researchers while continuing to prioritize the privacy of the people who use our services. This new data set, like the data we released before it, is protected by a method known as differential privacy. Researchers have access to data tables from which they can learn about aggregated groups but cannot identify any individual user. As Harvard University’s Privacy Tools project puts it:
“The guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset — anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information. … This gives a formal guarantee that individual-level information about participants in the database is not leaked.”
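Stated in the standard notation of the differential privacy literature: a randomized algorithm $M$ satisfies $\varepsilon$-differential privacy if, for every pair of data sets $D$ and $D'$ that differ in one person’s records, and every set of possible outputs $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S].$$

The parameter $\varepsilon$ tunes the trade-off between accuracy and privacy: the smaller it is, the closer the two output distributions must be, and the stronger the guarantee.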
At its core, differential privacy operates by adding enough random noise to the data that there are mathematical guarantees of individuals’ protection from reidentification. The results of any analysis are nearly the same whether or not a given individual’s records are included, meaning that each person can plausibly deny that their information is contained in the data set.
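As a concrete illustration of the mechanism, the sketch below adds Laplace noise to a count query, the textbook $\varepsilon$-differentially private mechanism. This is a generic example, not Facebook’s production implementation, and the epsilon value is chosen arbitrarily for illustration:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Return an epsilon-differentially private answer to a count query.

    A count changes by at most 1 when a single person joins or leaves
    the data set (sensitivity = 1), so Laplace noise with scale
    1/epsilon suffices for epsilon-differential privacy.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Neighboring data sets (with and without one person's records) yield
# statistically near-indistinguishable answers:
print(dp_count(10_483, epsilon=0.5))  # count including the individual
print(dp_count(10_482, epsilon=0.5))  # count excluding the individual
```

Smaller values of epsilon add more noise and therefore give a stronger privacy guarantee, at the cost of less accurate aggregates.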
We have worked with a committee of differential privacy experts, including Daniel Kifer, Aaron Roth, Abhradeep Thakurta, and Danfeng Zhang, to ensure that we have built differential privacy into our data set in a rigorous way. This committee has also submitted a white paper for journal publication that discusses guidelines for implementing and auditing differentially private systems. This white paper summarizes our learnings on implementing differential privacy and serves as a roadmap for other organizations seeking to implement similar privacy protections.
Over the past two years, we have dedicated more than $11 million and more than 20 full-time staff members to this work, making Facebook the largest contributor to the project. This announcement does not mark the end of our commitment. We will continue to provide access to data for independent academic research while ensuring that we also protect people’s privacy. We are currently onboarding another round of researchers, and interested researchers can continue to apply for data access through Social Science One’s request for proposal process. We look forward to sharing additional updates on this work over the coming months.