What’s in a Surname

When I was growing up, my family would occasionally get a piece of junk mail inviting us to send off fifty bucks or so for a handsome book. The brochure always featured a photo of a faux leather tome, stamped with a heraldic seal, and complete with a forked satin ribbon to mark one’s place amid the hundreds of gilt-edged pages. The book purported to be a history of ‘The Pearson Family’, ostensibly an ancient clan descended from some Norseman named Per.

Ours was one of dozens of Pearson households listed in the St. Louis phone directory (as there would be in any big American city), and I assume the book company targeted us through such public listings (presumably the McConaghy family down the block didn’t get the same ad). But our family’s claim to the storied patrimony of Per is, uh, tenuous. My great-grandfather adopted the surname around 1920, at the behest of his brother, who worried that immigrants settling in Hamilton, Ontario, would have a hard time finding work under so conspicuously Jewish a surname as Pinkovits. In light of this, we always got a kick out of the heraldic junk-mail come-ons, imagining ourselves mingling at a worldwide Pearson family reunion, too short to find the punchbowl without stopping a Viking for directions.

Our family’s recent adoption of a surname is far from unusual, of course. Many North Americans today were born with surnames that were adopted by immigrant ancestors, or their early descendants, within the past few centuries. As in much of the world, these names are usually passed on patrilineally (i.e., only by fathers). In this sense, surname transmission is the mirror image of mitochondrial transmission (mitochondria are passed on only by mothers) and effectively matches that of Y chromosome lineages. There are exceptions, of course, in which Y lineages are knowingly (through formal adoption) or unknowingly (through what human geneticists often call ‘non-paternity’, but would be better termed ‘cryptic paternity’ or, most simply, ‘cuckoldry’) given new surnames.

Considering the dynamics of this process, one might wonder how Y chromosome diversity will be distributed within, versus among, surname lineages in a given human population. You might intuit that a few basic parameters will largely govern that distribution. On the name side, there will be rates of willful surname change, cryptic paternity, and differentiation of pronunciation/spelling. On the genetic side, there will be the background mix of Y chromosome haplotypes in the given population, and, as time passes, the mutation rate of those haplotypes, and the degree of variation in breeding success among them.
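
To make the name-side parameters a little more concrete, here is a minimal sketch of how small per-generation rates compound. It assumes, purely for illustration, that each generation the link between a surname and its founder’s Y lineage can break in one of two ways, cryptic paternity (rate c) or a man adopting the surname rather than inheriting it (rate a), independently and at constant rates. Under that assumption, the chance that a present-day carrier’s surname and Y lineage trace back unbroken over g generations is ((1 − c)(1 − a))^g. The function name and parameters are hypothetical, and the model ignores drift and differences in breeding success.

```python
def retention_probability(cryptic_rate: float, adoption_rate: float, generations: int) -> float:
    """Chance that a present-day carrier's surname and Y lineage trace back,
    unbroken, to the same ancestor `generations` generations ago.

    Hypothetical model: each generation the surname-Y link breaks with
    probability `cryptic_rate` (cryptic paternity) or `adoption_rate`
    (the man adopted the surname rather than inheriting it), independently
    and at constant rates. Drift and breeding-success variation are ignored.
    """
    if not (0.0 <= cryptic_rate <= 1.0 and 0.0 <= adoption_rate <= 1.0):
        raise ValueError("rates must be probabilities in [0, 1]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    per_generation_keep = (1.0 - cryptic_rate) * (1.0 - adoption_rate)
    return per_generation_keep ** generations


# Illustrative only: a made-up 1% cryptic-paternity rate, no adoption, 20 generations.
print(round(retention_probability(0.01, 0.0, 20), 2))  # → 0.82
```

Even a rate that sounds small per generation leaves a noticeable fraction of carriers off the founding patriline after a few centuries, which is one reason surname-specific Y compositions are rarely pure.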

In a new paper, Turi King and Mark Jobling set out to assess some of these parameters in the British population, focusing on 40 surnames that, according to records, have been established in Britain for a long time. In doing so, they extend a stream of data that started with a 2000 paper focusing on just one surname: Sykes. That paper, by Brian Sykes (no coincidence there, of course) and Catherine Irven, suggested that nearly all modern Sykeses share a patrilineal ancestor roughly 20 generations ago (a blink of the evolutionary eye). King and Jobling’s more comprehensive data paint a more complex picture, but one that ultimately suggests a strong, lasting relationship between surnames and the Y lineages that carry them: many long-established British surnames show distinctive Y haplotype compositions, strongly biased toward one or a few haplotypes, which King and Jobling call ‘descent clusters’.

For some surnames, the Y lineages that make up those surname-specific ‘descent clusters’ happen to be similarly frequent in samples from the overall British population. In these cases, it’s hard to infer just how faithfully the surname has been passed on patrilineally; after all, the pattern might just as readily reflect free adoption of the surname by various local patrilines over the centuries. As one might guess, some particularly common British surnames, such as Smith and King, show this relatively hard-to-interpret pattern.

For many other surnames, however, the most common ‘descent cluster’ Y haplotype(s) were quite rare in the general British population or, if not generally rare, were so overwhelmingly common among carriers of the given surname that one could safely infer a robust association between surname and patriline(s). A striking example of such clearly interpretable patrilineal clustering was Attenborough: nearly all sampled British men with this name carried a Y lineage that is particularly common in East Africa (and, to a lesser extent, in other parts of Africa, the Mediterranean basin, and southwest Asia) but very rare, overall, in Britain. In King and Jobling’s data, many other surnames showed similarly distinctive, significantly ‘clustered’ Y haplotype compositions.
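
One simple way to ask whether a surname’s modal haplotype is overrepresented relative to the general population is a one-sided binomial test against a background frequency. The sketch below illustrates that idea only; it is not King and Jobling’s actual statistical procedure, the function names are hypothetical, and the background frequency is assumed to be known from a general-population sample.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of sampled men with a given surname.
    k: how many of them carry the haplotype of interest.
    p: that haplotype's (assumed known) background frequency.
    A tiny value means the surname carries the haplotype far more often
    than random sampling from the background population would predict.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(k: int, n: int, p: float) -> float:
    """Haplotype frequency among surname carriers divided by its background frequency (p > 0)."""
    return (k / n) / p


# Made-up numbers: 20 of 25 carriers share a haplotype seen in 2% of the background.
print(binomial_upper_tail(20, 25, 0.02), enrichment(20, 25, 0.02))
```

The two numbers answer different questions: the tail probability says whether the excess could plausibly be chance, while the enrichment ratio says how large the excess is. For a common haplotype such as those seen among Smiths, both would tend to stay unremarkable.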

King and Jobling are careful to note (and to verify by computer simulation), however, that the Y lineage composition seen today for a particular surname need not closely resemble the Y lineage composition among carriers of that surname, say, 20 generations ago. Some patrilines that originally carried the surname can readily have gone extinct (meaning that, at some point, the last remaining man carrying both the surname in question and that Y type died without having a son), leaving fewer and fewer distinctive Y lineages carrying the surname. Small populations (such as the male carriers of a given surname) are particularly prone to such random loss of lineages; this random process, termed genetic drift, can quickly and drastically change the Y lineage composition of a given surname. King and Jobling infer that some cases of strong clustering seen for single surnames in their data may well reflect such drift, rather than original foundation of the surname by just one or a few men.
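
The drift described above can be illustrated with a toy Wright-Fisher simulation, in which each man in a new generation picks a father at random from the previous generation and inherits his Y lineage. This is a generic textbook model swapped in for illustration, not the simulation in King and Jobling’s paper; it assumes a constant number of male surname carriers and ignores mutation, migration, and surname adoption. The function names and inputs are hypothetical.

```python
import random
from collections import Counter


def wright_fisher_lineages(founder_counts, generations, seed=None):
    """Track Y lineages among a fixed number of male surname carriers.

    founder_counts[i] is how many men initially carry lineage i. Each
    generation, every man's father is drawn uniformly at random (with
    replacement) from the previous generation, so lineages drift in
    frequency and can be lost entirely. Returns a Counter of lineage -> men.
    """
    rng = random.Random(seed)
    population = [lineage for lineage, count in enumerate(founder_counts) for _ in range(count)]
    if not population:
        raise ValueError("need at least one founding man")
    for _ in range(generations):
        population = [rng.choice(population) for _ in range(len(population))]
    return Counter(population)


def modal_share(lineage_counts):
    """Proportion of men belonging to the single most common lineage."""
    return max(lineage_counts.values()) / sum(lineage_counts.values())


# Ten equally common founding lineages, 50 men in total, 20 generations.
result = wright_fisher_lineages([5] * 10, generations=20, seed=1)
print(len(result), "lineages survive; modal share", round(modal_share(result), 2))
```

Rerunning with different seeds, or with fewer men, shows how founding lineages can be lost and how the modal share can drift well away from its starting 10%, which is why strong clustering today need not imply a single founder.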

A strange pattern in King and Jobling’s data goes unnoted in their paper: a modest positive correlation between surname length (syllable or letter count) and degree of patrilineal clustering (measured as the proportion of samples assigned to the largest lineage cluster). In my discussion of this pattern with Jobling, a couple of potential explanations came up. First, long names inherently contain more information than short ones, which might let researchers identify variants of longer names more accurately, with fewer ‘false positive’ matches that are likely to carry distinct Y lineages. Second, short surnames might be adopted wholesale more frequently than long ones, preferentially adding to their patrilineal diversity. On the question of surname adoption frequency, I’m curious about the degree of patrilineal diversity that might be found among carriers of surnames, such as Esposito (Italian ‘exposed’, as in ‘left out’), that were commonly assigned to foundlings in many parts of Europe. Such names may represent one extreme of the ‘founder diversity’ range and, as such, might offer a good opportunity to gauge the effects of (a) background haplotype composition in a given population and (b) genetic drift.
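
The length-clustering comparison can be computed roughly as sketched below: letter count versus the proportion of a surname’s samples in its largest cluster, with Pearson’s r as the measure of association. The helper names are hypothetical, syllable counting is left out, and the names and cluster labels in the usage lines are toy values for exercising the code, not King and Jobling’s data.

```python
from collections import Counter
from math import sqrt


def letter_count(surname: str) -> int:
    """Number of alphabetic characters (ignores spaces, hyphens, apostrophes)."""
    return sum(ch.isalpha() for ch in surname)


def largest_cluster_proportion(cluster_labels) -> float:
    """Share of one surname's samples assigned to its most common cluster."""
    counts = Counter(cluster_labels)
    return max(counts.values()) / len(cluster_labels)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy inputs only: made-up names with made-up per-sample cluster labels.
samples = {
    "Ab": ["a", "b", "c", "c"],
    "Abcd": ["a", "a", "a", "b"],
    "Abcdef": ["a", "a", "a", "a"],
}
lengths = [letter_count(name) for name in samples]
shares = [largest_cluster_proportion(labels) for labels in samples.values()]
print(round(pearson_r(lengths, shares), 3))  # → 1.0
```

With only a few dozen surnames, Pearson’s r is sensitive to one or two outlying names; a rank correlation would be a reasonable alternative check.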



Plots of surname length, as measured by letter count or syllable count (American pronunciation), versus the proportion of carriers of that surname whose Y chromosome lineage belongs to the commonest (‘modal’) surname-specific cluster, in the data of King and Jobling (2009)

The prospect of picking particular surnames for further study highlights unusual problems with sample-donor confidentiality posed by studies of surname-specific genetic variation. It would clearly be wrong to assume, from otherwise anonymous data published in a paper such as King and Jobling’s, that a given person (say, naturalist David Attenborough) carries a given allele, no matter how common it is in a reported sample. To preclude such potentially highly personal overinterpretation, some earlier authors have, in various ways, avoided fully specifying data for particular surnames. Such ‘redacted’ results may, however, be less usefully interpretable, especially by workers in tangentially related fields, than detailed name-specific data.

Arguably, even King and Jobling’s relatively detailed data are, ultimately, demographic trivia, overly specific to a particular population. Yet they may offer some modest but potentially useful comparative ethnographic insights, particularly regarding early surnaming conventions (they discuss whether patterns of patrilineal diversity might be distinguishable by geographic, occupational, or other types of surname origin) and, more weakly, the sex lives of ancestral Britons. And the new paper carries one more nugget of potential interest, too: data on the surname Jefferson. Nearly all of the newly sampled British Jeffersons carry one of the two most common Y lineages in Britain. Only about 4% of them, by contrast, carry the distinctive Y lineage found among patrilineal descendants of American statesman-scientist Thomas Jefferson (a lineage that is also found, at much higher frequency than in Britain, in Mediterranean populations). As famously reported in another paper by Jobling and colleagues, those American descendants likely include descendants of Jefferson’s son Eston, whose personal story exemplifies the complex association of surnames with Y haplotypes. Born into slavery (and never publicly acknowledged by his slave-owning father), Eston used his mother Sally’s surname, Hemings. Sally, in turn, got that surname from her own mother, likely because her father, too (reportedly a slave owner of European ancestry, like Jefferson), refused to acknowledge his extramaritally fathered children, especially those understood to have African ancestry.