Friday, November 30, 2012

Can we or can't we explain common disease?

Rare variants don't explain disease risk
We're still catching up on readings after a long Thanksgiving weekend, so are just getting to last week's Science.  Here's a piece that's of interest -- 'Genetic Influences on Disease Remain Hidden,' Jocelyn Kaiser -- in part because it touches on a subject we often write about here, and in part because it seems to contradict a story getting big press this week, published in this week's Nature.

Kaiser reports from the Human Genetics meetings in San Francisco that finding genes for common disease is proving to be difficult.  GWAS, it turns out, are finding lots of genes with little effect on disease.  This is of course not news.  The Common Variant/Common Disease hypothesis -- the idea that there would be many common alleles that explain common diseases like heart disease and type 2 diabetes -- never had any serious rationale (as some of us said clearly at the time, we may not-so-humbly add), and it died far too slowly given what was obvious from the beginning.  The rare variants hypothesis that replaced it is rather inexplicably still gasping.  Or, as Kaiser writes, "...a popular hypothesis in the field—that the general population carries somewhat rare variants that greatly increase or decrease a person's disease risk—is not yet panning out."

Apparently the idea, then, is that there's still hope. Indeed, many geneticists believe that larger samples are the answer: studies that include tens or hundreds of thousands of individuals, which would be powerful enough to detect any strong effect rare variants may have on disease, in theory explaining the risk in the center of the graph from the paper, which we reproduce here.  Kaiser cites geneticist Mark McCarthy of the University of Oxford in the United Kingdom: "We're still in the foothills, really. We need larger sample sizes."  Further, he says, "The view that there would be lots of low frequency variants with really big effects does not look to be well supported at the moment."

Fig from Kaiser. New studies failing to explain the genetics of common disease.  


Even with larger sample sizes, it turns out that some variants are so rare that they're only seen once.  And probably explain only a small proportion of risk anyway, even in that single individual. And certainly can't be used to predict disease. But this doesn't stop geneticists from wanting to increase sample sizes, at this point usually by doing exome sequencing (sequencing all the exons, or protein coding regions) of tens of thousands of people and looking for rare variants with large effects.  Ever hopeful.  McCarthy, a seriously non-disinterested party to any such discussion, is not likely to give up on ever-larger scale operations; that would be research-budget suicide, regardless of the plausibility of the rationales.
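
To put rough numbers on "so rare that they're only seen once," here is a minimal sketch in Python.  It is ours, not from Kaiser's piece or any of the studies, and it assumes the simplest possible model: each of the 2n chromosomes in a sample of n people carries the variant independently with probability equal to its population frequency.  The function names and the frequencies in the loop are purely illustrative.

```python
def expected_copies(freq, n_people):
    """Expected number of copies of the variant among 2 * n_people chromosomes."""
    return 2 * n_people * freq


def prob_seen_at_least_once(freq, n_people):
    """Probability that the sample contains at least one copy of the variant."""
    return 1.0 - (1.0 - freq) ** (2 * n_people)


def prob_singleton(freq, n_people):
    """Probability that the variant appears exactly once in the sample (binomial)."""
    chromosomes = 2 * n_people
    return chromosomes * freq * (1.0 - freq) ** (chromosomes - 1)


if __name__ == "__main__":
    # Illustrative (assumed) allele frequencies and sample sizes.
    for n in (1_000, 10_000, 100_000):
        for f in (1e-5, 1e-4):
            print(
                f"n={n:>7,}  freq={f:.0e}  "
                f"expected copies={expected_copies(f, n):.2f}  "
                f"P(seen)={prob_seen_at_least_once(f, n):.3f}  "
                f"P(singleton)={prob_singleton(f, n):.3f}"
            )
```

Under these assumptions, a variant carried on 1 in 10,000 chromosomes is expected to turn up only about twice in a sample of 10,000 people, which gives a sense of why a handful of carriers is all that even a very large study may have to work with.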

Rare variants do explain disease risk
Which brings us to the big news story of the week, a paper in Nature by geneticist Josh Akey et al., described in a News piece by Nidhi Subbaraman in the same journal, 'Past 5,000 years prolific for changes to human genome.'  The idea is that the rapid population growth of the last 5,000 years has resulted in many rare genetic variants, because every generation brings new mutations, and that these are the variants that are most likely to be responsible for disease because they haven't yet been weeded out of the population for being deleterious.

The research group sequenced 15,336 genes from 6,515 European Americans and African Americans and determined the age of the 1,146,401 variants they found.  "The average age across all SNVs was 34,200±900 years (±s.d.) in European Americans and 47,600±1,500 years in African Americans..."  They estimated that the large majority of the protein-coding, or exonic single nucleotide variants (SNVs) "predicted to be deleterious arose in the past 5,000-10,000 years."  Genes known to be associated with disease had more recent variants than did non-disease genes, and European Americans "had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans..."

They conclude that their "results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery."  Indeed, the proportion of SNVs in genes associated with Mendelian disorders, complex diseases and "essential genes" (those for which mouse knockouts are associated with sterility or death) that were 50,000 to 100,000 years old was higher in European Americans than in African Americans.  The authors propose that this is because these variants are associated with the Out-of-Africa bottleneck as humans migrated into the Middle East and Europe, which "led to less efficient purging of weakly deleterious alleles."

The researchers conclude:
In summary, the spectrum of protein-coding variation is considerably different today compared to what existed as recently as 200 to 400 generations ago. Of the putatively deleterious protein-coding SNVs, 86.4% arose in the last 5,000 to 10,000 years, and they are enriched for mutations of large effect as selection has not had sufficient time to purge them from the population. Thus, it seems likely that rare variants have an important role in heritable phenotypic variation, disease susceptibility and adverse drug responses. In principle, our results provide a framework for developing new methods to prioritize potential disease-causing variants in gene-mapping studies.  More generally, the recent dramatic increase in human population size, resulting in a deluge of rare functionally important variation, has important implications for understanding and predicting current and future patterns of human disease and evolution. For example, the increased mutational capacity of recent human populations has led to a larger burden of Mendelian disorders, increased the allelic and genetic heterogeneity of traits, and may have created a new repository of recently arisen advantageous alleles that adaptive evolution will act upon in subsequent generations.
This does seem to contradict the Kaiser piece we mention above, which concludes that rare variants with large effect will not turn out to explain much common disease.  This paper suggests they will -- which we don't think is right, for reasons we write about all the time.  But it does lend support to the idea that the Common Variant/Common Disease hypothesis is dead and buried. 

Serious questions
It is curious, and serious if true, that Africans harbor fewer rare variants than Eurasians.  African populations have expanded rapidly since agriculture, just as Eurasians have.  It could be that Africa is a more dangerous place to live, even for carriers of only mildly harmful variants, but that seems like rather post-hoc rationalizing.  Rapid expansion (the human gene lineages have expanded a million-fold in the last 10,000 years) will lead to many slightly harmful variants being around at low frequency, because slight effects aren't purged by selection as fast as they are generated in an expanding population.

In a sense the deluge has been not of functionally important variants but rather of functionally minimal ones.  Maybe what matters is the raised probability that a person will carry a combination of such variants, and the variants could be found with massive samples.  But then their individual effect probably isn't worth the cost of finding them, as a rule.

But where's the nod to complexity?
But, environments change, and genes now considered to be deleterious may not have been so in previous environments, or may even have been beneficial.  And African Americans don't represent a random sample from the entire African continent, as their ancestry is predominantly West African, and SNV patterns are likely to be different in different parts of Africa.  And, numerous studies have found that healthy people carry multiple 'deleterious' alleles, so the idea that most of the putatively deleterious SNVs (86.4% of which the authors date to the last 5,000 to 10,000 years) will lead to disease is probably greatly exaggerated. Geneticists just can't bring themselves to acknowledge that complexity trivializes most individual genetic effects.

The more likely explanation for complex disease continues to be, "It's complex."

15 comments:

  1. Hello Anne

    First sorry for my bad english and for my ignorance, because I'm quite confused here.

    Why?

    Because another study (Tennessen et al 2012) states that African Americans had
    significantly more SNVs per exome than European Americans.

    I'm sure there's something I don't understand because of my ignorance. Can you explain the apparent contradiction of the results of these two studies?

    Thank you in advance!

    Best regards

    Replies
    1. Thanks, Hans. I'll try to clarify. As I understand it, the number of SNVs is indeed higher in African Americans, and not surprisingly, in both of these studies. In fact, both of these studies are reporting on the same data. But the proportion of rare or singleton variants is higher in European Americans, and the explanation is that this is because of the Out-of-Africa bottleneck. And, the Nature paper suggests, the recency of these variants means that a high-ish proportion (14%) of them are potentially deleterious. That is, a higher proportion of rare SNVs in European Americans may be associated with disease.

      I hope that helps! And I hope that's right!

    2. Thank you for your response Anne, it really helps me!

      However I have a last question because I thought that SNVs are a kind of rare variants. But apparently they are not exactly the same thing. So as I understand the topic, SNVs are indeed rare but by the term "rare variant" (some) scientists mean "rare and recent" and/or "very rare" (rarer than the average SNV).

      But perhaps I'm entirely wrong.

      Anyway thank you again

    3. Glad if I helped! No, single nucleotide variants are simply sites that have been found to vary at any frequency, not only rare variants.

      The definition of 'rare' is itself variable -- 5, 3, 2 or 1%. And some rare variants are seen once and only once. And, age of the variant is not implied by the term, simply that more than 1 nucleotide has been seen at that site at least once. Age is determined separately.

  2. Thank you again!

    About the false equivalence "SNV = rare variant" I think I was initially confused by this page:

    «A SNV is a private mutation while a SNP is a mutation that is shared amongst a population.»

    But unfortunately it gets worse, because by reading again the Tennessen study, I realize that this study really states that African-Americans have more rare SNVs than European Americans (see this graph).

    So you really helped me to understand the present Nature study but I still see a real contradiction with the Tennessen study.

    So I think there's still an important thing that I don't understand and perhaps can you also explain me what I don't understand here.

    Naturally I know it's often difficult to explain simply to a non-scientist a complex scientific topic and I don't want to annoy you with my questions (I hope I didn't). So just in case I specify that if you don't have the time to answer me I will not be offended.

    Best regards

    Replies
    1. No problem. I'll try to clarify at least how I see this. Because there's no right answer to the definition of 'rare,' differentiating between SNPs and SNVs based on nucleotide frequency is semantics, not science. And, since African Americans don't represent the entire continent of Africa, it's impossible to know how well or poorly this sample, even as large as it is, represents the frequency of variants, rare or otherwise, in Africa. So, I'd not worry too much about this question, if it were me.

      We know that variation in Africa is greater than in Europe because of the age of the population. We know that new variants arise in every population all the time, presumably at the same frequency. We know that complex diseases are generally polygenic, and responses to environmental provocation, not due to single genes, or variants, rare or otherwise. I think that, therefore, it's not correct to draw the kinds of conclusions being drawn from these studies about susceptibility to disease, and that's the basic point we were trying to make in this post.

    2. P.S. You're a nonscientist, Hans? I'm impressed with how much effort you're putting into trying to understand this issue.

    3. In fact I’m a student in social anthropology but I’m also strongly interested by life science and biological anthropology.

      Initially I already had an e-mail correspondence with your colleague Ken Weiss on this topic because I was “troubled” by some strange speculative assertions about rare variants on The Internet and as you said this is what you discuss here.

      And indeed I see that there is no scientifically determined about rare variants and even distribution of rare variants between populations and their possible consequences. And I also see that I have a lot to learn in genetics.

      Anyway I am grateful for the answers you have provided me.

      Sincerely

      PS: My real name is not Hans but Bruno. Ken Weiss also knows my last name because, as you probably already guess, I prefer to remain as anonymous as possible on the Internet.

  3. It's really frustrating to hear 'We need larger sample sizes' as the way forward. I think some of the points in this article talking about the fallacy of big data apply to genomics http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-≠-information-≠-insights/

    Replies
    1. I agree. What we need is innovative ideas, not just more of the same.

    2. Hi Anne,

      I'm all in favor of innovative ideas, but I'm not convinced that they'll have much value without simultaneously generating "more of the same" (sequence data). Complex models will, if anything, require even larger sample sizes (along with additional information about environmental variables) than the current simple genetic models. So I just can't see how incorporating "complexity" will magically allow us to make do with less sequence data than we have currently.

    3. There is a long history of these issues. Ideas alone can't solve the problem, but massive (expensive) new data gathering may not do it, either.

      Waving terms around, like 'complexity,' can be just as fashionable and empty as demanding ever more data to be collected.

      The key is to have a focused question and design to determine what kinds of data would be illuminating.

      But if many-to-many causal landscapes are the reality, new conceptual approaches may be needed.

      It is fair to point out, I think, the various reasons that new toys and technology and the grants that go with them are appealing, in an age where we are expected (for better or worse) to keep the mill churning out 'results'. We live in an industrial society and that culture affects (or infects) science as well.

      Look at this another way: if funds become limited, or 'translation' is really the goal, to reduce disease loads, then the funds could be better spent on other things that, perhaps without the techiness or even without understanding complex underlying mechanisms, can achieve results.

    4. Daniel, can you explain what larger samples can tell us that we don't already know? Ok, yes, the identification of more rare variants perhaps, but we already don't know what to make of all the rare variants that have already been identified, unless they are associated with Mendelian disease.

      And when do we stop collecting data, since every new meiosis means potentially new deleterious rare variants (not to mention every mitosis)? And, of course larger samples mean increased heterogeneity; how does that help clarify things?

      What we're arguing is that the picture is already pretty clear. We already know that complex diseases are polygenic, that there are many pathways to the same phenotype, that there's gene by environment interaction, whatever that means. Given that reality, what more can larger samples tell us? They won't change the reality.

      And, will larger studies produce information that's useful for prediction at the individual level? We can be pretty sure that the answer to that is no for complex diseases (we've written a lot about that), and we've already got the statistical methods and conceptual understanding to do that for Mendelian diseases, for which we do not need larger and larger samples.

      So, until I understand what I'm missing here, I agree with Ken -- given what I think is the likely low return on the huge investment that sequencing larger samples will require, the money could be better spent on other things.

  4. Well, speaking of rare variants of large effect, deCODE came out with a paper in November in NEJM in which they discovered a rare non-syn SNP that has an effect size of nearly 3 for risk of late-onset Alzheimer's. They sequenced 2000 individuals and the variant was segregating at less than 1%. I'm not sure exactly what the demographic history of Iceland is, but would it be more fruitful to focus on sequencing homogeneous population isolates around the world?

    Replies
    1. This is the strategy of using isolates or populations with large inbred families, as found in some Middle Eastern cultures, or Finland (and Iceland, etc.). It's an old and respectable idea.

      The issue is that such variants are likely to be very rare in larger more variable populations. If the variant has a strong effect size, why has it not been found elsewhere? Generally, because it's too rare to show up enough times in GWAS to generate a signal.

      Others advocate looking in families segregating the trait, even in heterogeneous societies, and to find co-segregating genes, essentially using the families as a kind of isolate. This, too, is an old idea.

      I think there is no one answer, and we need to learn what we gain from identifying very rare variants. Advocates of unrestrained omics approaches have various answers, often based on assuming that computational power can make sense of each person's assemblage of variants, etc.

      Others would argue that we should show that knowing a 'real' variant (one with substantial effect) can lead to treatment or solid new understanding of the underlying biology.

      So, to use your word, to be fruitful what is the 'fruit' one wants? And what is the most cost-effective way to do this?

      When environmental causation is far more powerful than genetics, or than a single variant, should the environmental causes be addressed first, saving the exotic genetic approaches for the remaining cases that really are genetic?

      I have my own views, but these are things that need to be decided by the community at large and the public who is paying the bill. Clearly at present technological approaches focusing on genes are ruling. How long that will last is impossible to say, I think.
