Natural disasters force millions of people to leave their homes every year. In 2020, the Internal Displacement Monitoring Centre estimated that 30.7 million people were displaced as a result of weather-related or geophysical events. Given the severity and increasing frequency of disasters, Data for Good has been collaborating with humanitarian organizations to better understand displacement patterns. For example, we have been sharing privacy-protective, gender-disaggregated displacement maps that help humanitarian organizations understand how many men and women are displaced after a disaster, as well as when people are able to return home.
Today, we are announcing a new re-weighted version of displacement maps that aims to better represent displacement trends on the ground. While Facebook-specific data can provide valuable information to our partners, it does not typically represent the trends of the total population, since not all people use Facebook and not all people using Facebook choose to share their location history (LH) information, which our displacement maps rely on. To account for these issues, we have implemented a re-weighting methodology to reduce the biases in our underlying data and better represent ground-truth trends in displacement across the world.
To reduce biases in our Data for Good displacement maps, we use a two-step re-weighting process. First, we aim to make Facebook users more representative of the population as a whole. Then, we aim to make Facebook location history users more representative of that re-weighted Facebook population. To conduct this weighting, we leverage two sources of data: (1) known characteristics of Facebook users, such as age and gender, and (2) gold-standard data sets on population, relative wealth, and road density, the last of which serves as a proxy for distinguishing urban from rural areas.
Stage 1. Re-weighting the Facebook population to the population as a whole
In stage one, we aim to make Facebook users representative of the population on the ground in terms of age, gender, relative wealth, and a proxy of rural/urban characteristics. To conduct this weighting, we aggregate Facebook users into age and gender buckets at tiles that are 2.4 kilometers on a side and calculate the average relative wealth index and road density within each. Then, we re-weight these tile-level aggregates against the ground-level population estimates for the same area.
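The aggregation idea in stage one can be sketched in a few lines. This is a deliberately simplified illustration, not the method used for the maps: it computes a plain cell-ratio weight (reference population divided by observed users) rather than the regression-based weighting the maps rely on, and it ignores relative wealth and road density. The tile ID, the age bucket edges, and all counts are hypothetical.

```python
from collections import Counter


def age_bucket(age):
    """Assign an age to a coarse bucket (bucket edges are illustrative only)."""
    if age < 25:
        return "15-24"
    if age < 50:
        return "25-49"
    return "50+"


def cell_counts(users):
    """Count users in each (tile, age bucket, gender) cell."""
    return Counter((u["tile_id"], age_bucket(u["age"]), u["gender"]) for u in users)


def cell_weights(user_counts, population_counts):
    """Simplified weight per cell: reference population / observed users."""
    return {
        cell: population_counts[cell] / n_users
        for cell, n_users in user_counts.items()
        if cell in population_counts
    }


# Hypothetical users in a single 2.4 km tile.
users = [
    {"tile_id": "t1", "age": 19, "gender": "F"},
    {"tile_id": "t1", "age": 31, "gender": "F"},
    {"tile_id": "t1", "age": 34, "gender": "F"},
    {"tile_id": "t1", "age": 42, "gender": "M"},
]
# Hypothetical ground-level population estimates for the same tile.
population = {
    ("t1", "15-24", "F"): 6,
    ("t1", "25-49", "F"): 4,
    ("t1", "25-49", "M"): 5,
}

weights = cell_weights(cell_counts(users), population)
```

In this toy tile, the single young woman stands in for six people while each woman aged 25 to 49 stands in for two, which is the kind of correction that compensates for an under-represented group.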
Stage 2. Re-weighting Location History users to the Facebook population as a whole
In stage two, we use Facebook users' characteristics, such as age and gender, to make Facebook location history users more similar to the re-weighted Facebook population obtained in the first stage. Because some areas may have only a sparse number of location history users, we run this step at a coarser geographic level that covers the whole area affected by a recent disaster.
For both steps in our methodology, we use Inverse Probability Weighting (IPW) calculated with a Lasso regression. By applying IPW to our LH users, we aim to remove systematic differences between a statistical estimate and the true population parameter caused by problems with the composition of the sample. In our case, Facebook location history users have different probabilities of being selected into our sample than the rest of the population, so IPW weights each person by the inverse of their probability of selection. This process is commonly used to re-weight nonrandom samples in survey research.
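To make the IPW idea concrete, here is a minimal sketch in pure Python. It assumes that "Lasso" here means a logistic regression with an L1 penalty that estimates each person's probability of being in the sample; that reading, the proximal gradient fitting routine, the two features, and the toy counts are all assumptions made for illustration and do not describe the production pipeline.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero, as the L1 (Lasso) penalty requires."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, selected, l1_penalty=0.01, step=0.5, n_iter=1000):
    """Fit an L1-penalized logistic regression by proximal gradient descent.

    rows: list of feature lists; selected: 1 if the person is in the sample.
    Returns (intercept, coefficients). The intercept is not penalized.
    """
    n, d = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * d
    for _ in range(n_iter):
        # Prediction error for every person under the current parameters.
        errors = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - s
            for row, s in zip(rows, selected)
        ]
        grads = [sum(e * row[j] for e, row in zip(errors, rows)) / n for j in range(d)]
        intercept -= step * sum(errors) / n
        coefs = [
            soft_threshold(c - step * g, step * l1_penalty)
            for c, g in zip(coefs, grads)
        ]
    return intercept, coefs


def ipw_weight(row, intercept, coefs):
    """Inverse of the estimated probability of being selected into the sample."""
    return 1.0 / sigmoid(intercept + sum(c * x for c, x in zip(coefs, row)))


# Hypothetical toy population; features are [is_female, is_under_25].
# Each tuple is (features, people in the population, people in the sample).
groups = [
    ([0, 0], 20, 12),
    ([1, 0], 20, 16),
    ([0, 1], 20, 4),
    ([1, 1], 20, 6),
]
rows, selected = [], []
for features, n_people, n_sampled in groups:
    for i in range(n_people):
        rows.append(features)
        selected.append(1 if i < n_sampled else 0)

intercept, coefs = fit_l1_logistic(rows, selected)
weight_older_man = ipw_weight([0, 0], intercept, coefs)
weight_young_man = ipw_weight([0, 1], intercept, coefs)
```

Because people under 25 are much less likely to appear in this toy sample, their estimated selection probability is lower and their weight is higher, so each sampled young person counts for more in any re-weighted total.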
Understanding bias correction in the Saint Vincent volcanic eruption
On April 8, 2021, the government of St. Vincent and the Grenadines issued mass evacuation orders, and on April 9 the La Soufriere volcano erupted for the first time in 42 years. Three days later, a second eruption released large amounts of hot ash, lava, and toxic gas into surrounding areas. Approximately two-thirds of the island was covered in a thick layer of ash.
Humanitarian organizations and media outlets report that approximately 20,000 people were displaced by the volcanic eruption in St. Vincent. However, as is common with displacement data, response organizations have struggled to get more detailed information about the true level of displacement. While governments and nonprofits may have a sense of how many people ended up in shelters, they lack information on how many people left their homes to stay with friends and family, or when people who were temporarily displaced returned home.
The new re-weighted displacement maps for St. Vincent and the Grenadines estimate that around 13,700 people over the age of 15 were displaced by the volcanic eruption. If we rescale this number to represent the total population, we get around 18,200 displaced people, which is within 10 percent of the official estimate of 20,000 people from the government and humanitarian organizations. This is a reassuring sign that our re-weighted estimates reflect on-the-ground reality.
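The arithmetic behind this comparison can be checked directly. The three figures below come from the text; the scaling factor is not stated, so the share of the population over 15 is only backed out here under the assumption that the rescaling is a simple division by that share.

```python
# Figures quoted in the text.
displaced_over_15 = 13_700   # re-weighted estimate, people over the age of 15
rescaled_total = 18_200      # same estimate rescaled to the total population
official_estimate = 20_000   # government and humanitarian figure

# Assumption: rescaling divides by the share of the population over 15,
# so the implied share is the ratio of the two estimates (about three quarters).
implied_share_over_15 = displaced_over_15 / rescaled_total

# Relative gap between the rescaled estimate and the official figure.
relative_gap = (official_estimate - rescaled_total) / official_estimate

print(f"Implied share over 15: {implied_share_over_15:.3f}")
print(f"Gap to official estimate: {relative_gap:.1%}")
```

The rescaled estimate sits 1,800 people, or 9 percent, below the official figure, which is where the "within 10 percent" statement comes from.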
When comparing unweighted and weighted displacement data, we can see that our unweighted estimates tended to overestimate both men and women between the ages of 25 and 50, women more so than men, and to underestimate those under 25. By reducing these biases, we create a displacement map that is more reflective of the true population.
In addition to better estimating absolute levels of displacement after disasters, these new re-weighted estimates should help our partners interpret gender-specific trends in displacement with more confidence. By reducing biases in the underlying data, we can better understand how men and women are affected differently after a disaster.
The plot below shows how the bias correction changes what we learn from the final displacement findings. If we look at the unweighted trends by gender in red, we might conclude that more women were displaced than men and that, three months after the disaster, similar shares of men and women were still displaced. The weighted trends in blue, however, tell a different story: men and women were displaced at a similar level, and three months after the disaster a higher percentage of men than women remained displaced. These discrepancies between weighted and unweighted data arise because the original Location History user data overestimated the true number of displaced women in St. Vincent and the Grenadines.