August 6, 2015

The Not-So-Universal Language of Laughter

By: Lada Adamic, Mike Develin, Udi Weinsberg

Several weeks ago, Sarah Larson from The New Yorker published a fun article about e-laughter (all the hahas and lols we use to communicate with our friends online) and their social subtleties. Like any “dialect,” e-laughing is evolving. Curious as to whether her usage followed up-to-date social norms, she consulted her savvy friends for answers. Anecdotally, she found that laughter tended to vary by age and gender.

But why rely on anecdotes when you have data? We analyzed de-identified posts and comments made on Facebook in the last week of May that contained at least one string of characters matching laughter [1]. We did the matching with regular expressions, which automatically identified laughter in the text, including variants of haha, hehe, emoji, and lol [2].

As denizens of the Internet will know, laughter is quite common: 15% of people included laughter in a post or comment that week. The most common laugh is haha, followed by various emoji and hehe. Age, gender and geographic location play a role in laughter type and length: young people and women prefer emoji, whereas men prefer longer hehes. People in Chicago and New York prefer emoji, while Seattle and San Francisco prefer hahas. Let’s dive in.

Ms. Larson’s first concern was that she laughs a lot, whereas some of her friends are “above it and don’t use has too much.” We found that roughly 15% of the people who posted or commented during that week used at least one e-laugh. So Ms. Larson – have no worries – it’s pretty common to laugh online!

Among the people who laughed, we analyzed how many times they laughed. The plot below shows the distribution of the number of laughs, indicating that around 46% of the people posted only a single laugh during the week, and 85% posted fewer than five laughs. The plot also shows how many different laughs people used (labeled as ‘Unique’) – 52% of people used a single type of laugh, and roughly 20% used two different types.

Since most people used a single type of laugh, we classified people in our dataset into four categories based on their most commonly used laugh. For brevity of the plots, we write these as haha, hehe, lol and emoji. Keep in mind that the class label includes a wide range of laughs, e.g., haha includes terms like haha, hahaha, haahhhaa, etc. Here’s the breakdown:

As the pie chart shows, the vast majority of people in our dataset are haha-ers (51.4%), then there are the emoji lovers (33.7%), the hehe-ers (12.7%), and finally, the lol-ers (1.9%).
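The classification step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the chart: the input format (pairs of a person identifier and a laugh class), the function name dominant_laugh, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def dominant_laugh(events):
    """Map each person to the laugh class they used most often.

    `events` is an iterable of (person_id, laugh_class) pairs, where
    laugh_class is one of "haha", "hehe", "lol" or "emoji".
    Assumption: ties go to the class the person used first; the post
    does not say how ties were handled.
    """
    counts = defaultdict(Counter)
    for person, laugh_class in events:
        counts[person][laugh_class] += 1
    # most_common(1) returns the highest count; among equal counts it
    # keeps the order in which the classes were first seen.
    return {person: c.most_common(1)[0][0] for person, c in counts.items()}


# Toy data, invented for illustration.
toy_events = [
    ("p1", "haha"), ("p1", "haha"), ("p1", "emoji"),
    ("p2", "lol"),
    ("p3", "emoji"), ("p3", "hehe"), ("p3", "emoji"),
]
labels = dominant_laugh(toy_events)
print(labels)
print(Counter(labels.values()))
```

Counting the resulting labels, as in the last line, gives the per-class totals that a breakdown like the one above would be built from.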

Ms. Larson discusses the emergence of the peculiar hehe, which is “foisted upon us by the youth.” Are the hehes really a more youthful expression than hahas? The data say: not so! We found that across all age groups, from 13 to 70, the most common laughs are still haha, hahaha, hahahaha, and only then followed by hehe. If hehe is not particularly favored by younger people, are there other distinctive ways youth express themselves? To answer this we collected all emoji, hahas, hehes and even the lols, and looked at the distribution of ages [3]:

This plot shows that the median person (the dashed line) who uses emoji is slightly younger than the median haha person, but both of these are younger than the people using hehes and lols!

Ms. Larson also suggests that a ha is like a lego piece, which people use to convey different “levels” of laughter, ranging from the polite haha to a deranged hahahahahahaha. So we look at the lengths of laughs by type:

Indeed, as Ms. Larson points out, the peaks in the even numbers indicate that people treat the has and hes as building blocks, and usually prefer not to add extra letters. No heh hehs here. (Settle down, Beavis.) The most common are the four-letter hahas and hehes. The six-letter hahaha is also very common, and in general, the haha-ers use longer laughter. The haha-ers are also slightly more open than the hehe-ers to using an odd number of letters, and we do see the occasional hahaas and hhhhaaahhhaas. The lol almost always stands by itself, though some rare specimens of lolz and loll were found. A single emoji is used 50% of the time, and it’s quite rare to see people use more than 5 identical consecutive emoji. Perhaps emoji offer a concise way to convey various forms of laughter?
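For readers who want to reproduce this kind of length tally on their own text, here is a minimal Python sketch. It assumes the laughs have already been extracted as strings and that whitespace inside a laugh (as in "ha ha") does not count toward its length; the post does not say how such cases were counted, and the function name and sample laughs are invented for the example.

```python
from collections import Counter


def length_histogram(laughs):
    """Count how many laughs have each letter length.

    Assumption: whitespace inside a laugh is ignored, so "ha ha"
    counts as four letters.
    """
    return Counter(len("".join(laugh.split())) for laugh in laughs)


# Toy laughs, invented for illustration.
sample = ["haha", "hehe", "hahaha", "hahaa", "ha ha"]
hist = length_histogram(sample)
even = sum(n for length, n in hist.items() if length % 2 == 0)
print(hist)
print("even-length laughs:", even, "of", len(sample))
```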

You might have noticed that we cut the plot at 20 letters, but as with any behavior on the Internet, there is a long tail of laughter lengths. Our automatic regular expression parser gave up after trying to get through a haha over 600 letters long! Computers have a long way to go before they can truly understand the human condition. We weren’t laughing that day.

Finally, Ms. Larson raised the suspicion that hehe is a more masculine laugh, since it’s made up of “a bunch of he’s.” Well then, let’s take a look at the distribution of laughter across genders:

Both men and women like their hahas and emoji, followed by hehes and lols. The hahas and to some extent the hehes are preferred by men, whereas emoji are clearly dominated by women, who also seem to like the lols a bit more than men.

So we see that there are patterns in laughter on Facebook, but they are quite different from the anecdotal evidence presented in the New Yorker article. Then it hit us: maybe the difference is because Ms. Larson is hanging out with cool people from New York City. So we plotted the distribution of laughter across a bunch of cities. We focused on New York, San Francisco, Boston, Phoenix, Chicago and Seattle, and got the following:

Indeed laughter varies by city, so we created heatmaps to see the popularity of the different types of laughter across states in the USA. For each laughter type, the map shows the fraction of laughter in each state out of the total laughter. The darker the color is, the more popular a laugh is compared to other states:

The maps broadly show that haha and hehe are more popular on the west coast, emoji are the weapon of choice in the midwest, and southern states are fond of lol. Presidential campaigns, take note: the battleground states of Ohio and Virginia are haha states, while the candidates’ emoji games will surely be key in determining who emerges victorious in Florida.
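The per-state numbers behind such maps can be approximated with a short Python sketch. It takes one reading of the description above, namely that each map shows, for every state, the share of that state's laughs that belong to a given type. That reading, the input format, and the function name are assumptions made for illustration, and the sample data is invented.

```python
from collections import Counter


def laugh_share_by_state(events):
    """Return {(state, laugh_type): share of that state's laughs}.

    `events` is an iterable of (state, laugh_type) pairs, one per laugh.
    """
    per_state = Counter()
    per_state_type = Counter()
    for state, laugh_type in events:
        per_state[state] += 1
        per_state_type[(state, laugh_type)] += 1
    return {
        (state, laugh_type): n / per_state[state]
        for (state, laugh_type), n in per_state_type.items()
    }


# Toy data, invented for illustration.
toy = [("OH", "haha"), ("OH", "haha"), ("OH", "emoji"), ("FL", "emoji")]
shares = laugh_share_by_state(toy)
for key in sorted(shares):
    print(key, round(shares[key], 3))
```

Comparing the share of one laugh type across states then gives the relative darkness of each state on that type's map.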

Special thanks to Moira Burke and the team members of Core Data Science.

References

[1.] We limit this study to posts and comments and do not look at direct messages through Messenger. In her article, Ms. Larson discusses conversations on messaging apps that might have different nuances from text posted on Facebook. Additionally, although we consider people from all around the world, we focus on English laughter and emoji.

[2.] Although we looked at shares from around the world, we restrict this study to the following (mostly English) regular expressions:

(l+o+l+z*)+|([abw]?h+a)[ha\s]+\b|(h+e)[he\s]+\b
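As a rough illustration of how the expression above behaves, here is a minimal Python sketch that applies it to a string. The pattern is copied verbatim; the case-insensitive flag, the function name find_laughs, and the stripping of trailing whitespace are assumptions added for the example and are not described in this note.

```python
import re

# Pattern copied verbatim from the note above.
LAUGH_PATTERN = r"(l+o+l+z*)+|([abw]?h+a)[ha\s]+\b|(h+e)[he\s]+\b"

# Assumption: matching ignores case; the note does not say either way.
LAUGH_RE = re.compile(LAUGH_PATTERN, re.IGNORECASE)


def find_laughs(text):
    """Return every substring of `text` that the pattern matches."""
    # The character classes include \s, so a match can end in
    # whitespace; strip it so "hahaha " is reported as "hahaha".
    return [m.group(0).strip() for m in LAUGH_RE.finditer(text)]


print(find_laughs("Hahaha that was great lol"))
print(find_laughs("he he he"))
print(find_laughs("no jokes today"))
```

One caveat when reusing the pattern: as printed, it has no leading word boundary, so it can also fire inside ordinary words (for example on the "he " in "the cat"). The note does not say how that was handled in the study, so the sketch leaves the pattern untouched.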

In addition, we used the following emoji, written as UTF-16 surrogate pairs:

(\udbb8\udf34)+|(\ud83d\ude0c)+|(\ud83d\ude01)+|(\ud83d\ude1b)+|(\ud83d\ude1d)+|(\ud83d\ude1c)+|(\ud83d\ude09)+|(\ud83d\ude0a)+|(\ud83d\ude00)+|(\ud83d\ude03)+|(\ud83d\ude04)+|(\ud83d\ude06)+|(\ud83d\ude0b)+
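The pairs above are UTF-16 surrogate pairs, which some regex engines (Python 3 among them) do not match directly. The sketch below converts each pair to a single code point using the standard UTF-16 decoding formula and rebuilds an equivalent pattern in which every alternative matches a run of one identical emoji. The helper names are hypothetical, and the use of non-capturing groups is a choice made for the example rather than something taken from the study.

```python
import re

# (high, low) surrogate pairs copied from the expression above.
SURROGATE_PAIRS = [
    (0xDBB8, 0xDF34), (0xD83D, 0xDE0C), (0xD83D, 0xDE01), (0xD83D, 0xDE1B),
    (0xD83D, 0xDE1D), (0xD83D, 0xDE1C), (0xD83D, 0xDE09), (0xD83D, 0xDE0A),
    (0xD83D, 0xDE00), (0xD83D, 0xDE03), (0xD83D, 0xDE04), (0xD83D, 0xDE06),
    (0xD83D, 0xDE0B),
]


def to_code_point(high, low):
    """Decode one UTF-16 surrogate pair into a Unicode code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


EMOJI_CHARS = [chr(to_code_point(h, l)) for h, l in SURROGATE_PAIRS]

# One alternative per emoji, each matching one or more repeats of it.
EMOJI_RE = re.compile("|".join("(?:%s)+" % re.escape(c) for c in EMOJI_CHARS))


def find_emoji_runs(text):
    """Return each run of identical consecutive emoji from the list."""
    return [m.group(0) for m in EMOJI_RE.finditer(text)]


grin, wink = chr(0x1F604), chr(0x1F609)
runs = find_emoji_runs("so good " + grin * 3 + wink)
print([len(r) for r in runs])
```

Because each alternative repeats a single emoji, a string of three identical emoji followed by a different one yields two separate runs, which matches the "identical consecutive emoji" counting used in the length analysis.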

That translates to the following emoji “words” (a sequence of one or more of the following):

[3.] This is a violin plot that visualizes the distribution of measurements with markers for the median (dashed line), the 25th and the 75th percentiles (lower and upper dotted lines, respectively). The width of each “violin” is relative to the number of samples at each value point over the total samples in the group.
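The three markers described in this note can be computed with Python's standard library, as in the minimal sketch below. It assumes Python 3.8 or later for statistics.quantiles, and the choice of the "inclusive" interpolation method is an assumption, since the note does not say how the percentiles were computed. The ages are toy values invented for the example.

```python
import statistics


def violin_markers(values):
    """Return (25th percentile, median, 75th percentile) of `values`."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Toy ages, invented for illustration.
toy_ages = [15, 20, 25, 30, 35]
markers = violin_markers(toy_ages)
print(markers)
```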