Favorite books are something friends like to share and discuss. A Facebook meme facilitates this very interaction. You may have seen one of your friends post something like “List 10 books that have stayed with you in some way. Don’t take more than a few minutes, and don’t think too hard. They do not have to be the ‘right’ books or great works of literature, just ones that have affected you in some way.” If not great works of literature, what are the books that have stayed with us?
The following analysis was conducted on anonymized, aggregate data.
To answer this question we gathered a de-identified sample of over 130,000 status updates matching “10 books” or “ten books” appearing in the last two weeks of August 2014 (although the meme has been active for at least a year). The demographics of those posting were as follows: 63.7% were in the US, followed by 9.3% in India and 6.3% in the UK. Women outnumbered men 3.1:1. The average age was 37. We therefore expect the books chosen to reflect this subset of the population.
We programmatically segmented the posts into lists, and found the most frequently occurring substrings, which corresponded to different books, e.g. “Anna Karenina by Leo Tolstoy”. However, the same book could appear as different substrings: e.g. just “Anna Karenina” or “Anna Karenina – Leo Tolstoy”. We clustered similar variants programmatically, hand tuning where the algorithm had failed to merge two popular variants. We then used the clusters to automatically match the book lists against the common variants of the top 500 most popular books.
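To make the variant-merging step concrete, here is a minimal Python sketch. It is not the clustering algorithm used for this analysis, which is not described in detail. It assumes a simple heuristic instead: lowercase the entry, cut off anything after " by " or a spaced dash, and drop punctuation and a leading article. The function names `normalize` and `cluster_variants` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def normalize(entry: str) -> str:
    """Reduce a list entry to a rough title key (assumed heuristic)."""
    text = entry.lower().strip()
    # Cut a trailing author introduced by " by " or a spaced hyphen/dash.
    text = re.split(r"\s+by\s+|\s+[-\u2013\u2014]\s+", text, maxsplit=1)[0]
    # Remove punctuation, then a leading article.
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"^(the|a|an)\s+", "", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def cluster_variants(entries):
    """Group raw entries by normalized key, counting each surface form."""
    clusters = defaultdict(Counter)
    for entry in entries:
        clusters[normalize(entry)][entry] += 1
    return clusters


entries = [
    "Anna Karenina by Leo Tolstoy",
    "Anna Karenina",
    "Anna Karenina \u2013 Leo Tolstoy",
    "The Stand",
]
for key, variants in cluster_variants(entries).items():
    print(key, dict(variants))
```

A heuristic like this fails in predictable ways. A title that itself contains " by " would be cut short, and misspellings would not merge at all, which is the kind of case where hand tuning becomes necessary.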
Here are the top 20 books, along with a percentage of all lists (having at least one of the top 500 books) that contained them.
While there are many great ‘serious’ books on the list, the Hitchhiker’s Guide to the Galaxy makes an appearance at #7, and Harry Potter reigns supreme (although enjoying the advantage that it was most often referred to as a series and our clustering algorithm lumped all Harry Potter books into the same cluster). Stephen King’s dark novels have stayed with their readers as well (The Stand at #14 and the Dark Tower series at #64). In the complete list of the top 100, included at the end of this post, we see a number of children’s books appear as well. Although these may not normally be considered great works of literature, they tend to stay with us through the decades. In particular, two of Shel Silverstein’s books (the Giving Tree and Where the Sidewalk Ends) make it into the top 100, as does the Little Prince.
One can also look at connections between the books, e.g. ‘people who listed X also listed Y’, using pointwise mutual information. In the network visualization, each node represents a book, sized by the frequency with which it was mentioned, and an edge represents an unusual number of co-occurrences of the two books in the lists.
Each book is linked to another it occurs with more often than expected. The color represents whether the book was more often mentioned by women (red) or men (blue).
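For readers unfamiliar with pointwise mutual information (PMI), the sketch below shows one way to score book pairs. PMI compares how often two books appear in the same list with how often they would if they were chosen independently: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The function name `pmi_edges` and the `min_count` threshold are assumptions made for illustration; the post does not state which threshold or log base was used.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(lists, min_count=2):
    """Return {(book_a, book_b): pmi} for pairs that co-occur more than chance."""
    n = len(lists)
    book_counts = Counter()
    pair_counts = Counter()
    for books in lists:
        unique = sorted(set(books))  # count each book once per list
        book_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    edges = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_count:
            continue  # skip rare pairs, whose PMI is noisy (assumed cutoff)
        # p(a,b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b)
        pmi = math.log2(n_ab * n / (book_counts[a] * book_counts[b]))
        if pmi > 0:  # keep only pairs seen together more often than expected
            edges[(a, b)] = pmi
    return edges


lists = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"], ["A", "C"]]
for pair, score in pmi_edges(lists).items():
    print(pair, round(score, 3))
```

A positive PMI means the pair appears together more often than independence would predict, which matches the caption's description of an edge. The minimum-count filter is there because PMI is known to inflate scores for rare pairs.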
There is actually another kind of network that forms. While some people shared the meme without tagging, calling on all their friends to make their own posts, others tagged specific friends whose favorite books they’d like to know about. Even a small fragment of the cascade shows long (tangled) tagging chains through which it diffused.
Tagging links posts about favorite books.
Do friends tend to like the same books? We computed the number of books shared between lists linked via tags, which was a mere 0.4 books on average! Still, this was 4 times the overlap of 0.1 books between two randomly chosen lists. The 0.4 figure is also an underestimate, since our automated matching identifies only 5.3 books per list on average (rather than the full 10), because it matches on just the 500 most commonly mentioned titles. Nevertheless, the low overlap underlines that even in a world of relatively few highly successful bestsellers, lists of favorites tend to be rather different, even between friends.
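The comparison above can be sketched in a few lines of Python. This is an illustration under assumed data structures, not the code behind the reported numbers: `lists` maps a hypothetical post id to its matched books, `tagged_pairs` holds pairs of post ids linked by a tag, and the baseline is drawn by sampling random pairs of posts.

```python
import random


def mean_overlap(pairs, lists):
    """Average number of books shared by each pair of lists."""
    overlaps = [len(set(lists[a]) & set(lists[b])) for a, b in pairs]
    return sum(overlaps) / len(overlaps)


def random_pairs(ids, k, seed=0):
    """Draw k pairs of distinct post ids as a chance baseline."""
    rng = random.Random(seed)
    return [tuple(rng.sample(ids, 2)) for _ in range(k)]


# Toy data: post ids and book titles are placeholders.
lists = {
    "p1": ["Harry Potter", "The Stand"],
    "p2": ["Harry Potter", "Anna Karenina"],
    "p3": ["The Little Prince"],
    "p4": ["The Giving Tree", "Anna Karenina"],
}
tagged_pairs = [("p1", "p2"), ("p2", "p4")]

tagged = mean_overlap(tagged_pairs, lists)
baseline = mean_overlap(random_pairs(list(lists), 100), lists)
print(tagged, baseline)
```

With the figures reported above, the ratio of tagged to random overlap is 0.4 / 0.1 = 4.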
Finally, the remaining top 100 books were:
[An earlier version of this post had 2 clusters representing the Chronicles of Narnia series. When these were merged, the series rose up to #10]