September 15, 2014

What’s in a Name?

By: Adrien Friggeri, Mike Develin

For years, parents have been naming their children in clever, or, if you feel less charitable, “clever” ways. This dates back at least to Roman times, when the twins Romulus and Remus shared more than just genes and a wolf-mother: their names themselves bear an obvious typographical and phonetic resemblance.

This trend is alive and well today. Roger Clemens loved strikeouts so much that he gave all of his children names starting with the letter K. It’s sad to imagine the Kardashian family helplessly struggling to name its children in a world missing that same letter. The winner in this department, however, has to be boxer George Foreman, who famously named all five of his sons George.

How common is this alliteration? It turns out to be quite common. We looked at pairs of siblings on Facebook who both live in the United States, and found that at all ages, they were much more likely to share a first initial than chance would predict:

As you can see, a pair of siblings has about an 11% chance of having the same first initial, compared to the roughly 7% we’d expect from random chance (that is, if their first names were drawn independently from the pool of all users of their respective ages). This rate does not appear to fluctuate much across ages; the late-teens drop is due to a cultural phenomenon where best friends of that age will often list each other as siblings.
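To make the comparison concrete, here is a minimal sketch of how an observed same-initial rate and a random-chance baseline could be computed. It is not the code behind the numbers above: the function names and toy data are invented for illustration, and it simplifies by drawing both names from a single pool rather than from separate pools for each sibling's age.

```python
from collections import Counter

def observed_same_initial_rate(pairs):
    """Fraction of sibling pairs whose first names share a first initial."""
    same = sum(1 for a, b in pairs if a[0].upper() == b[0].upper())
    return same / len(pairs)

def expected_same_initial_rate(names):
    """Chance that two names drawn independently (with replacement)
    from the pool start with the same letter: sum of p(letter)^2."""
    counts = Counter(name[0].upper() for name in names)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

# Toy data, invented purely for illustration.
pairs = [("Kim", "Khloe"), ("Anna", "Ben"), ("Trent", "Troy"), ("Maria", "Juana")]
pool = [name for pair in pairs for name in pair]

observed = observed_same_initial_rate(pairs)
expected = expected_same_initial_rate(pool)
print(f"observed: {observed:.3f}, expected by chance: {expected:.3f}")
```

A real analysis would presumably compute the baseline separately for each pair of ages, as described above, before averaging.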

When we examine the most common pairs of sibling names, we see even more similarity, and other themes emerge. These are the pairs of sibling names that occur most often relative to random chance (that is, the number of observed pairs divided by the number of pairs we would expect if names were assigned at random within each age):

1: Yvette-Yvonne: 37.4 times as often as expected

2: Faith-Hope: 31.4

3: Charity-Faith: 24.3

4: Jami-Jodi: 23.8

5: Gretchen-Heidi: 22.0

6: Charity-Hope: 21.2

7: Kelli-Kerri: 18.9

8: Latasha-Latoya: 18.5

9: Eileen-Maureen: 18.4

10: Colleen-Maureen: 17.0

11-20: Cesar-Oscar, Landon-Logan, Edgar-Oscar, Blair-Blake, Tammi-Terri, Adriana-Alejandra, Dalton-Dillon, Autumn-Summer, Edgar-Omar, Juana-Maria

21-30: Trent-Trevor, Kory-Kyle, Trent-Troy, Brendan-Colin, Eduardo-Jorge, Javier-Jorge, Garrett-Grant, Kathleen-Maureen, Jesus-Jose, Chance-Chase

The patterns are striking. There are two categories of non-typographical associations: names associated with the same ethnicity, and the Faith-Hope-Charity trinity of virtuous names. Aside from these (and even within the coethnic names), the dominant theme is extreme typographical and phonetic similarity, above and beyond having the same first initial. It looks like the entire country, at least on this front, is keeping up with the Kardashians.
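The ratio behind the ranking above (observed pairs divided by expected pairs) can be sketched as follows. This is an illustrative approximation, not the original analysis: `pair_lift` is a hypothetical helper, the data is made up, and name frequencies come from one pooled sample instead of being conditioned on age.

```python
from collections import Counter

def pair_lift(pairs):
    """Map each unordered name pair to observed count / expected count.

    Expected count assumes the two names in a pair are drawn
    independently from the overall name frequencies (an assumption).
    """
    # Drop same-name pairs, as the article does, and ignore order.
    kept = [tuple(sorted(p)) for p in pairs if p[0] != p[1]]
    pair_counts = Counter(kept)
    name_counts = Counter(name for pair in kept for name in pair)
    n_pairs = len(kept)
    n_names = 2 * n_pairs

    lift = {}
    for (a, b), observed in pair_counts.items():
        p_a = name_counts[a] / n_names
        p_b = name_counts[b] / n_names
        # Factor of 2: either sibling can carry either name.
        expected = n_pairs * 2 * p_a * p_b
        lift[(a, b)] = observed / expected
    return lift

# Toy data, invented purely for illustration.
pairs = [("Faith", "Hope"), ("Hope", "Faith"), ("Faith", "Maria"), ("Jose", "Maria")]
lift = pair_lift(pairs)
for names, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print("-".join(names), round(value, 2))
```

On tiny samples like this one, rare pairs produce inflated ratios, so a practical version would likely require a minimum number of observed pairs before ranking.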

You may have noticed an unusual sawtooth pattern in the graph above, along with a spike for young ages. These both come from a set of siblings that’s even more sibling than ordinary siblings: twins! Romulus and Remus are just one in a series of twin pairs with similar names. Compared to the 11% rate for siblings and the 7% expected random-chance rate, twins have a much higher chance of having the same first initial (surprisingly, looking at the rates by gender, this is only slightly higher for same-gender pairs, and thus for identical twins):

Interestingly, this rate has gone down quite a bit over the last 40 years, but many twins are still saddled with the same first initial, resulting in what is no doubt a lifetime of frustration around login names.

We close by looking at another surprising influence on names: where you live. In the state of Virginia, 46% more people are named Virginia than in the country at large; in West Virginia, it’s 28% more. We see no bump in surrounding states, suggesting that it’s not just cultural: either people named Virginia are drawn to these states or, perhaps more likely, parents are trying to be clever again. Oh, those parents.
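One way to read a figure like "46% more" is as a ratio of shares: the fraction of people in the state with that name, divided by the national fraction, minus one. The sketch below follows that reading, which is an assumption about the method; `excess_share` and the counts passed to it are invented for illustration.

```python
def excess_share(state_named, state_total, national_named, national_total):
    """Relative excess of a name's share in a state versus the whole country."""
    state_share = state_named / state_total
    national_share = national_named / national_total
    return state_share / national_share - 1.0

# Made-up counts: 146 per 100,000 in the state vs. 1,000 per 1,000,000 nationally.
excess = excess_share(146, 100_000, 1_000, 1_000_000)
print(f"{excess:.0%} more common than in the country at large")
```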

All data was anonymized and aggregated. We removed cases where the siblings have the same name (sorry, Mr. Foreman, these are more likely to be multiple profiles of the same person than genuine siblings), and for the state case we adjusted for non-person user profiles (e.g. “Virginia Highway Patrol”).