Rare variants, disease, and population size

Three new papers spotlight a glut of rare variants in our genomes, with key insights for human history and health.

Rare fruit burdens the boughs. Female gingko, Chicago
(image copyright Nathaniel Pearson)

Rarity abounds
Data-rich new papers from teams led by Josh Akey and John Novembre, and a brief theory paper from Alon Keinan and the prolific Andy Clark, highlight a bounty of rare genetic variants in our genomes — and point out why we should care.

Bolstered by the papers’ data from more than 80 million copies of individual human genes, the growing catalog of such rare variants casts our recent ancestors’ rampant population growth into sharper temporal relief — and should, in the long run, help finely trace the geographic sojourns of particular copies of human chromosome segments. More importantly, however, many of those rare variants likely figure centrally in our health.

These basic insights have been clear to geneticists for a long time, and it’s great to see them percolate through the lay press. The new data papers scoured every letter of many genes in thousands of people, and found a bumper crop of spelling variants that are each found in just one or a few of those people. The third paper summarized what such findings suggest about precisely how big the human population has been over time, and roughly what they mean for efforts to understand disease.

Altogether, the findings cast such bright light on our origins and health because, under simple assumptions[1], geneticists can predict how often variants that do (or don’t) greatly alter proteins should pop up in a given proportion of people, if our ancestors were steady in number, and if proteins weren’t especially important for health. And those are two big ifs.
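To make that null expectation concrete, here is a minimal Python sketch. It assumes the textbook neutral model with a constant population size, under which the expected number of variant sites where the new spelling shows up in i of n sampled chromosome copies is proportional to 1/i. That choice of model, and the function name `neutral_sfs_proportions`, are illustrative assumptions rather than details taken from the papers.

```python
def neutral_sfs_proportions(n):
    """Expected share of variant sites at which the new (derived) spelling
    appears in exactly i of n sampled chromosome copies, for i = 1 .. n-1.

    Assumes a constant-size population and no selection (the standard
    neutral model), where the expected count at frequency i scales as 1/i.
    """
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


# Share of variant sites expected to be singletons (seen in just one copy).
singleton_share = neutral_sfs_proportions(1000)[0]
```

With 1,000 sampled copies, this null predicts that only roughly 13% of variant sites should be singletons. A real data set with a much larger share of such one-off variants is the kind of departure the post describes.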

The new data highlight that real patterns of such variant frequencies in our genomes drastically flout those null expectations — and they call sensible attention to rare variants, which underlie that deviation, as we search for the genetic basis of disease. More specifically, the papers all underscore two broad insights that have been clear for several years:

• Our population has skyrocketed, but just for the past few millennia, a trend that’s left a strong signature of many young, rare spelling variants in our genomes.

• Many of those rare variants may be making us sick.

A tippy tree, laden with rare fruit
The findings in the new papers hinge on a simple insight: the more widely common a genetic variant is, the older it likely is. This is because old variants have typically been carried down many branches of the growing human family tree, spreading far and wide on the planet. By contrast, variants that just arose recently are typically confined to recently sprouted, geographically narrow branches of the tree.

While details of very early human population dynamics are hard to precisely infer[2], the new data, along with much other genetic and ancillary historical evidence (see Keinan and Clark’s reference citations, for starters), suggest that our population has grown extremely fast in the past few millennia. Such growth has, effectively, stretched the human family tree at its tips: the tree’s young twigs look longer[3], in units of generations, than we’d otherwise expect, given how long the trunk and inner branches are. And because new genetic variants pop up roughly randomly (by mutation) on the branches as they grow, the long, fast-growing tips of the tree harbor more of its total load of mutations than they would have, had the tree grown at a constant rate.
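The stretching of the tips can be illustrated with a small coalescent simulation. The sketch below is only a toy: it assumes exponential growth (one of many possible growth models, not necessarily the one fitted in the papers), measures time in units where any two lineages merge at rate 1 in the present-day population, and uses a hypothetical function name and parameter values. It reports what share of the tree's total branch length sits in the tips, meaning the branches that lead to a single sampled copy.

```python
import math
import random


def external_branch_share(n, growth_rate, replicates=500, seed=1):
    """Share of total tree length found in tip (external) branches,
    pooled over many simulated family trees of n sampled copies.

    growth_rate = 0 gives a constant-size population. A positive value
    means the population shrinks exponentially as we look back in time,
    i.e. it has been growing toward the present.
    """
    rng = random.Random(seed)
    external_total = 0.0
    tree_total = 0.0
    for _ in range(replicates):
        leaves_below = [1] * n  # sampled copies descending from each lineage
        t = 0.0                 # time before the present
        while len(leaves_below) > 1:
            k = len(leaves_below)
            # Waiting time to the next merger in a constant-size population.
            x = rng.expovariate(k * (k - 1) / 2)
            if growth_rate > 0:
                # Rescale for exponential growth (time-change of the coalescent).
                wait = math.log1p(growth_rate * x * math.exp(-growth_rate * t)) / growth_rate
            else:
                wait = x
            tree_total += k * wait
            external_total += wait * sum(1 for d in leaves_below if d == 1)
            t += wait
            # Merge two lineages chosen at random.
            i, j = rng.sample(range(k), 2)
            merged = leaves_below[i] + leaves_below[j]
            leaves_below = [d for idx, d in enumerate(leaves_below) if idx not in (i, j)]
            leaves_below.append(merged)
    return external_total / tree_total


constant = external_branch_share(20, growth_rate=0.0)
growing = external_branch_share(20, growth_rate=100.0)
```

In this toy setup, `growing` comes out larger than `constant`: with fast recent growth, more of the tree's length, and hence more of its mutations, lies in the tips.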

You can picture each such mutation as if it were a little brainstorm in the head of the late Dr. Seuss. Had Seuss drawn genomic family trees, he might have represented each mutation as an odd, never-before-seen kind of fruit, confined to the branch (big or small, and including its sub-branches) where the mutation struck. Many of the rife rare variants in our genomes can thus be thought of as distinctive fruits, each confined to just one or a few twigs amid a great, bushy tree.

In this light, the new papers affirm what’s become clear over the past few years, as we sequence more and more people’s whole genomes: we’ll still be finding new human genetic variants for a long time, even after having sequenced many more of us.[4]

And, as long as our population continues to dramatically balloon — a system out of equilibrium, in population genetic terms — the tree will continue to loosely resemble an inflationary universe, its various branches speeding apart from each other via new mutations. In this analogy, the genetic counterpart of the red-shift that signals cosmic expansion is, roughly speaking, the overall skew in frequency, toward rarity, of our genetic variants.

Rare variants in disease
Visions of the human family tree, tips bent toward our inquisitive grasp by newfound fruit, may recall the myth of another tree. Apt, then, that the crop of rare variants in our genomes may include much of the fruit of human affliction.

Rare variants are thought to figure centrally in disease for two related reasons: as we’ve seen, most such variants are rare because they arose recently, so haven’t had time to spread widely among people; and young variants, by definition, haven’t withstood natural selection for long.

Such selection — often assisted by chance — tends to keep harmful variants rare, or purge them from the population altogether. Non-harmful rare variants, by contrast, are in principle free to get more common (though chance often strikes them down too).

That is, over time, consistently harmful variants tend to vanish, especially if the population is big enough to stably harbor a rich variety of alternative variants; meanwhile, variants that happen not to harm their carriers are free to spread, whether by chance or, in rare cases, by helping their carriers have more kids than others do.

Together, these trends mean that a snapshot of the rare variants we carry today, like a minute’s worth of the world’s newest tweets, is likely enriched for items that will soon be either gone[5] or, in a few cases, more common.

And they help explain why surveys of the common genetic variants covered by fast, cheap SNP chip screens rarely offer clear insight into disease risk. For a given stretch of the genome, such common variants do distinguish big branches of the human family tree from each other, making them quite informative of ancestry. But a consensus has emerged that the long tail of human genetic diversity — all those rare variants — is where we’ll find much of the genetic contribution to disease risk.

Spotting which rare variants harm us, however, turns out to be tough.

Proof of burden
Take the extreme case of a variant found in just one woman, among everyone on earth. If we split humanity into those who get a given disease in life, and those who don’t, our chosen woman must fall into one group or the other. And if we look at enough diseases, she’ll eventually fall into the sick group for at least one of them.

But it’s clearly too far a leap to infer that the unique variant she carries made her sick. That is, the variant’s distribution among people with and without the disease simply can’t be statistically significant, given how rare it is overall.
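A quick calculation shows why. The sketch below is a hand-rolled one-sided Fisher's exact test (one reasonable choice of test, assumed here for illustration, with hypothetical cohort sizes). It asks how surprising it is, by chance alone, for the carriers of a variant to land among the sick.

```python
from math import comb


def fisher_one_sided(carriers_in_cases, carriers_total, cases, controls):
    """Chance that at least `carriers_in_cases` of the `carriers_total`
    carriers fall among the cases if carrier status is unrelated to disease
    (hypergeometric null, as in a one-sided Fisher's exact test)."""
    total = cases + controls
    upper = min(carriers_total, cases)
    tail = sum(
        comb(cases, k) * comb(controls, carriers_total - k)
        for k in range(carriers_in_cases, upper + 1)
    )
    return tail / comb(total, carriers_total)


# One carrier in the whole cohort, and she happens to be sick.
p_even_split = fisher_one_sided(1, 1, cases=1000, controls=1000)  # 0.5
p_rare_disease = fisher_one_sided(1, 1, cases=100, controls=9900)  # 0.01
```

For a lone carrier, the p-value is simply the share of the cohort that is sick, so it says nothing about the variant itself. And after testing hundreds of thousands of variants, even the 0.01 in the second case falls far short of any sensible significance threshold.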

To meet this inherent challenge in squarely implicating a given rare variant in a given disease, geneticists lean on other insights. If the variant really is too rare to show up on further screening of more sick or healthy people (a place where the new data are already helping us at Knome, as we shortlist intriguing variants for research clients), they next ask how readily it may affect physiology by altering the amount or chemical makeup of a protein encoded by a gene that either harbors the variant itself, or sits near it in the genome.

And, next, they may look at more people with the disease in question, and ask whether other rare variants tend to cluster nearby in their genomes, more so than in other people’s. In recent years, as richly detailed data on human genetic variation have started to flow, geneticists have been honing rare variant burden tests specifically to find such regions. Refining such tests, and gathering more genetic and phenotypic data to feed them, stands to bring many key insights into the genetic basis of disease (and on a time frame shorter, we can certainly hope, than that needed for natural selection itself to weed all those harmful variants from the crop of rare variants we carry!).
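As a rough illustration of the idea behind a burden test, the sketch below pools all the rare variants each person carries in one region, then uses a simple permutation test to ask whether sick people carry more of them than chance would predict. This is only one simple flavor of burden test, chosen here for brevity; the function name, inputs, and toy data are hypothetical.

```python
import random


def burden_permutation_pvalue(rare_allele_counts, is_case, permutations=2000, seed=0):
    """One-sided permutation p-value for an excess of rare variants in cases.

    rare_allele_counts[i] is the number of rare variants person i carries
    in the region of interest; is_case[i] is True if that person is sick.
    """
    n_cases = sum(is_case)
    observed = sum(c for c, case in zip(rare_allele_counts, is_case) if case)
    rng = random.Random(seed)
    shuffled = list(rare_allele_counts)
    at_least_as_extreme = 0
    for _ in range(permutations):
        # Shuffle burdens among people, breaking any real link with disease.
        rng.shuffle(shuffled)
        if sum(shuffled[:n_cases]) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (permutations + 1)


# Toy cohort: 8 of 20 cases carry a rare variant in the region, versus 1 of 20 controls.
counts = [1] * 8 + [0] * 12 + [0] * 19 + [1]
status = [True] * 20 + [False] * 20
p_value = burden_permutation_pvalue(counts, status)
```

No single variant in this toy cohort could reach significance on its own, yet the pooled excess among cases stands out. That pooling is what lets burden tests flag a region even when each variant in it is vanishingly rare.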

A new drug
To thoroughly catalog the rare variants that pepper our genomes, of course, we have to read what DNA letters we carry at each site in the genome, rather than just at those sites already known to vary in spelling (as in SNP chips). The newly published work furthers that effort, by carefully sifting through particular sets of genes in many thousands of people — more people than have ever been so comprehensively sequenced together.

Notably, the Novembre group’s paper focuses on a few hundred genes already thought to help govern how the body responds to particular drugs. Such genes are actually an intriguing testing ground for the notion that rare variants crucially shape not just disease risk, but other phenotypes (outward traits) too.

Many drugs derive from defense chemicals made by plants and molds (nature’s organic chemists extraordinaire) that our ancestors have long eaten, breathed, and otherwise touched. But modern folks have also tinkered greatly with such drugs, concentrating, combining, and diversifying them in our quest to prevent and cure diseases. As such, many drugs, and cocktails thereof, are (like other facets of our overall diets) fairly new parts of the human environment.

Drugs we take are thus exposing even the most common (read: oldest) variants in our genomes to novel regimes of natural selection. Many such drugs work better, at particular doses, in some people than others — and such variation may often trace largely to variation in our genomes.

Looking ahead, I’m intrigued to see whether rare genetic variants turn out to explain unusual responses to particular drugs as well as (or better than) they explain particular diseases or, alternatively, whether such variation in drug response traces largely to common variants in our genomes.

Tall trees: the diversity skyline
An intriguing tidbit in the Akey group’s paper is a spatial contour of overall genetic diversity across thousands of genes in our genomes. Plotting the classic measure of nucleotide diversity (that is, how often two randomly chosen chromosomal copies of a genome site differ in spelling), Akey’s post-doc Jacob Tennessen et al. predictably found the strongest peak in diversity in the HLA gene cluster on chromosome 6. Expressed on the surface of immune response cells, these genes work, in large part, to help us fight infection, a job thought to be well served by great genetic diversity within a genome, which presumably helps its carrier respond to many kinds of germs.
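For readers who like to see the arithmetic, nucleotide diversity can be computed directly from aligned sequences. The sketch below is a minimal version that assumes equal-length, gap-free sequences; the function name and the toy sequences are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average share of sites at which two sampled copies differ,
    taken over every pair of copies (assumes aligned, equal-length input)."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second))
        for first, second in pairs
    )
    return differences / (len(pairs) * length)


# Three toy copies of a four-letter stretch; only the last site varies.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

Here two of the three pairs differ at one of four sites, so `pi` is 2/12, or about 0.17. Real human values are far lower, since most sites are identical in most pairs of copies.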

Byzantine in its sequence variation, HLA turns out to play surprising functional roles in mate choice, drug response, and diseases from multiple sclerosis to narcolepsy. Notably, women and other female great apes likely pick their mates in part (and unconsciously) by how they smell, thanks partly to what versions of HLA they and their suitors carry. Such preferences are thought to help preserve genetic variation longer here than elsewhere in the genome: so well, in fact, that your copies of some HLA genes more closely resemble some gorillas’ copies than some other people’s copies…and those gorillas’ HLA genes are likewise closer to yours than to each other’s!

Essentially, even the inner branches of the family tree of this part of the genome are incredibly long, stretching back ten-fold more generations than is typical. As we’ll see in a coming post, the overall depth of the tree for a given part of the genome can be thought of as a rough proxy for how big the ancestral population for that part of the genome has, over time, tended to be.

Other peaks in genetic diversity — lower than HLA, but still prominent — include odorant receptor and keratin (hair/skin protein-making) genes, which are widely presumed to accumulate functionally unimportant variation, reflecting less stringent evolutionary constraint in people than in some other mammals. Strikingly, however, the Akey group also found that another immune response gene, DEFB108B, marks a peak in genetic diversity roughly as tall as that of the much better known HLA cluster. It’ll be intriguing to learn more about what DEFB108B does in our bodies, and whether its remarkable diversity reflects HLA-like importance, or keratin-like dispensability.

Stay tuned on that front. As more of us are sequenced and phenotyped, we’ll learn much more about which of our variants — among the common ones, and the newly commonplace rare ones — matter most, and how. Much of what we learn will speak directly to the pending challenges of genomically personalized medicine, as framed in fervent discussion of another recent paper, both at large, and in these pages.


[1] Back-of-the-envelope estimates typically ignore any complications from non-random mating, variation in mutation rate, and so forth, but are quite robust.
[2] Moreover, the history of human population change has, of course, varied in space (among regional sub-populations), as well as over time. Notably, the new papers suggest that such variation may be fairly minor in the grand scheme, dwarfed by the remarkable overall recent growth. And Keinan and Clark note that sample sizes, in particular, may add roughly as much noise to the picture as do real underlying variables.
[3] Ultimately, the length of these twigs tracks how long many randomly chosen pairs of extant copies of our chromosomes have descended along separate lines.
[4] In the end, you likely harbor a dozen or so brand new genetic variants that arose by mutation only in you. But you also likely harbor plenty of other very rare variants that, until we sequence your genome, will have never been spotted in anyone else.
[5] Note that this doesn’t mean that no one with harmful variants has kids; after all, everyone carries some such variants, and people are breeding just fine. Rather, because a given variant can be inherited independently of other variants in the same genome, and may wreak harm only in combination with another copy of itself (or some other variant), people simply tend to have more kids who inherit more copies of healthier alternative variants than kids who inherit more copies of harmful ones. Moreover, much of the natural selection in question likely happens beyond our view, before pregnancy begins, when unhealthy early embryos fail to implant and thrive in the womb.