September 29, 2020

Friendship across Europe: How geography and history shape social networks

By: Michael Bailey, Drew Johnston, Theresa Kuchler, Dominic Russel, Bogdan State, Johannes Stroebel

Social connections shape many aspects of global society. While understanding the geographic structure of these connections is important for a wide range of social science and public policy issues, researchers have traditionally been limited by a lack of large-scale representative data on connectedness. Using the Social Connectedness Index (SCI), an aggregated measure constructed from the friendship networks of Facebook’s more than 2.5 billion monthly active users, we study social connections between European regions. Our results suggest that geographic distance and political borders are important determinants of European connectedness. In fact, we find that the relationship between borders and connectedness persists even long after boundaries change (for example, within former Czechoslovakia and the Austro-Hungarian Empire). We also find that social connections in Europe are stronger between regions with residents of similar ages and education levels, as well as between regions that share a language and religion. In contrast, European region-pairs with dissimilar incomes tend to be more connected, likely due to patterns of migration.

Social Connectedness Index

The SCI uses aggregated friendship connections on Facebook to measure the intensity of connectedness between locations. Locations are assigned to users based on information they provide, connection information, and location services they have opted into. These friendships are used to estimate the probability that a pair of users in two given geographies are Facebook friends, and this probability is then mapped to an index score called the Social Connectedness Index. If the SCI for one pair of geographies is twice as large as the SCI for another pair, users in the first pair are about twice as likely to be connected as users in the second pair.
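To make the "twice as likely" interpretation concrete, here is a minimal sketch of an SCI-style calculation. It assumes the index is proportional to the number of friendship links between two regions divided by the product of their user counts, and it assumes a rescaling so the most connected pair receives 1,000,000,000; the region names and counts are made up for illustration.

```python
# Hypothetical sketch of an SCI-style index. The formula and the scaling
# constant are assumptions for illustration, not the production method.


def relative_probability(links: int, users_i: int, users_j: int) -> float:
    """Share of all possible user pairs between regions i and j that are friendships."""
    return links / (users_i * users_j)


def scale_to_index(probs: dict, top: float = 1_000_000_000) -> dict:
    """Rescale so the most connected pair gets `top`; ratios between pairs are preserved."""
    max_p = max(probs.values())
    return {pair: p / max_p * top for pair, p in probs.items()}


# Made-up counts for two hypothetical region pairs.
probs = {
    ("A", "B"): relative_probability(links=400, users_i=1_000, users_j=2_000),
    ("C", "D"): relative_probability(links=100, users_i=1_000, users_j=1_000),
}
sci = scale_to_index(probs)

# Users in A-B are about twice as likely to be friends as users in C-D.
print(sci[("A", "B")] / sci[("C", "D")])
```

Because the rescaling divides every pair by the same constant, it changes the units of the index but not the ratio between any two pairs, which is what the "twice as large" comparison relies on.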

More details on the methodology can be found here and in the paper Social Connectedness: Measurement, Determinants, and Effects, published in the Journal of Economic Perspectives.


To explore the factors that shape social connectedness in Europe, we looked at the SCI between NUTS2 regions, which have between 800,000 and 3 million inhabitants. Using this data, we first constructed a number of case studies. For example, we plotted the social connectedness of the Limburg and Namur regions in Belgium. We found that the strongest social connections for both were to other nearby areas within Belgium. Yet, while the capitals of the two regions (Hasselt and Namur, respectively) are less than 70 km apart, the two regions’ connections outside Belgium differ substantially. The official and most commonly spoken language in Limburg is Dutch, whereas in Namur it is French. Accordingly, Limburg is more strongly connected to the entire Netherlands to the north, and Namur is more strongly connected to areas throughout France to the south. This suggests that language has an important relationship with patterns of connectedness.

We then sought to understand what patterns of connectedness would emerge if we grouped the regions into 20 or 50 communities with strong connections to each other (instead of the existing 37 countries). To do so, we used hierarchical agglomerative linkage clustering to create clusters that maximize within-cluster pairwise social connectedness.
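A minimal sketch of greedy agglomerative clustering on SCI values follows. It assumes average linkage (the mean SCI across all cross-cluster region pairs) as the merge criterion and treats the SCI directly as a similarity; the post does not specify the exact linkage rule, and the regions and SCI values below are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise SCI values, keyed by unordered region pairs.
sci = {
    frozenset(("A", "B")): 100,
    frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 5,
    frozenset(("A", "D")): 4,
    frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 3,
}
regions = ["A", "B", "C", "D"]


def average_linkage(sci, a, b):
    """Mean SCI over all region pairs with one region in cluster a and one in b."""
    total = sum(sci[frozenset((i, j))] for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(regions, sci, k):
    """Start from singletons and repeatedly merge the two clusters with the
    highest average cross-cluster SCI until only k clusters remain."""
    clusters = [frozenset([r]) for r in regions]
    while len(clusters) > k:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(sci, *pair),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


communities = agglomerate(regions, sci, k=2)
print(communities)
```

With these toy values, A and B merge first (SCI of 100), then C and D (80) rather than either joining the A-B cluster, giving two communities. Running the same procedure with k set to 20 or 50 over all regions would correspond to the two maps described next, under the stated linkage assumption.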

In the 20-unit map, nearly all the community borders (denoted by a change in area color) line up with country borders (denoted by thick black lines). This suggests that individuals are more likely to be connected to distant individuals within their own country than to equally distant or closer individuals in other countries. Furthermore, cross-country communities mostly line up with historical borders: for example, every region in the countries that made up Yugoslavia until the early 1990s (NUTS2 regions are defined for Slovenia, Croatia, Serbia, North Macedonia, and Montenegro) is grouped together in one community. We also see the importance of migration in shaping connectedness: Outer London West and North West, which have welcomed a large number of Romanian immigrants in recent years, are grouped together with Romania.

In the 50-unit map, countries begin to break apart internally. Most of these resulting subcountry communities are spatially contiguous, consistent with distance being an important determinant of social connections. We also see linguistic communities form: Belgium splits into French- and Dutch-speaking communities, and Catalan and Andalusian Spanish communities emerge in Spain.

Finally, we used a formal regression approach to assess the relationship between certain factors and European connectedness. Consistent with our exploration, we found that connections are strongest between areas that are physically close to each other: A 10 percent increase in distance is associated with a 13 percent decline in social connectedness. Social connectedness also drops off sharply at country borders. Controlling for geographic distance, the probability of friendship between two individuals living in the same country is five to 18 times as large as it is for two individuals living in different countries.
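As a rough illustration of how estimates like these are read, the sketch below fits a log-log regression of SCI on distance with a same-country indicator, using pure-Python least squares on synthetic data. The data and the "true" coefficients (a distance elasticity of -1.3 and a same-country coefficient of 2.0) are made up; the post does not state the authors' exact specification or controls.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical "true" parameters, chosen only for illustration.
TRUE_CONST, TRUE_ELASTICITY, TRUE_SAME_COUNTRY = 10.0, -1.3, 2.0

# Synthetic region pairs: distance in km and a same-country indicator.
pairs = [(d, s) for d in (50, 100, 200, 400, 800, 1600) for s in (0, 1)]
X = [[1.0, math.log(d), float(s)] for d, s in pairs]
y = [TRUE_CONST + TRUE_ELASTICITY * math.log(d) + TRUE_SAME_COUNTRY * s for d, s in pairs]

const, elasticity, same_country = ols(X, y)
print(f"distance elasticity: {elasticity:.2f}")
print(f"same-country multiplier: {math.exp(same_country):.1f}x")
print(f"effect of +10% distance: {1.1 ** elasticity - 1:.3f} "
      f"(linear approx {elasticity * 0.10:.3f})")
```

In a log-log model the distance coefficient is an elasticity, and the exponentiated coefficient on a 0/1 indicator is a multiplicative effect on the SCI. If the 13 percent figure reflects an elasticity of about -1.3, it is the first-order approximation 1.3 × 10%; the exact change, 1.1^(-1.3) ≈ 0.883, is a decline of roughly 11.7%.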

Using a number of 20th-century European border changes, we also found that this relationship between political borders and connectedness can persist decades after boundary changes. For example, we found higher social connectedness across regions that were originally part of the Austro-Hungarian Empire, even after controlling for distance, current country borders, and a number of other relevant factors.

In addition to distance and political borders, we found that regions that are more similar along measures such as language, religion, education, and age are more socially connected. In particular, social connectedness between two regions with the same most common language is about 4.5 times larger than for two regions without a common language (again controlling for same-country and border-country effects, distance, and other factors). In contrast, we saw that pairs of regions with dissimilar incomes are more connected. Our exploratory analyses suggest this trend may be explained by patterns of migration from regions with lower average incomes to regions with higher average incomes.
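The "times larger" comparisons can be read through a stylized log-linear specification. The form below is an assumption for illustration with hypothetical variable names; the post does not give the authors' exact regression. Holding the other regressors fixed, the coefficient on a 0/1 indicator maps to a multiplicative effect on the SCI:

```latex
\begin{aligned}
\log \mathrm{SCI}_{ij} &= \beta_0 + \beta_1 \log \mathrm{dist}_{ij}
  + \beta_2\, \mathrm{SameCountry}_{ij} + \beta_3\, \mathrm{BorderCountry}_{ij}
  + \beta_4\, \mathrm{SameLanguage}_{ij} + \dots + \varepsilon_{ij} \\[4pt]
\frac{\mathrm{SCI}_{ij} \mid \mathrm{SameLanguage}_{ij} = 1}
     {\mathrm{SCI}_{ij} \mid \mathrm{SameLanguage}_{ij} = 0}
  &= e^{\beta_4} \approx 4.5
  \quad\Longrightarrow\quad \beta_4 \approx \ln 4.5 \approx 1.50
\end{aligned}
```

Under the same form, the same-country range of five to 18 times would correspond to a coefficient between ln 5 ≈ 1.61 and ln 18 ≈ 2.89.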

A full version of our working paper with additional details on our methodology is available here.

The social connectedness data used is available here.