Tuesday, October 16, 2012


A new report out of 23andMe suggests that one can learn more about risk for common traits from relatives than from individual DNA sequences.  The fact is about as surprising as a sunrise, and has been known for a long time (as has the reason for it).  Acknowledging this seems more forthcoming than is typical for this company, but they do put their own spin on it, saying that the combination of family history and the kind of genetic risk estimation they sell is best:  family history predicts the common diseases, and genetic risk estimates predict the rare ones.

The reason that family history is so predictive is quite simple: family history integrates all your relatives' genetic variation to reveal, at least within their respective environmental contexts, the net effect of that variation.  You inherit half your variation from each parent (given some reasonable assumptions), so the net risk experienced by your parents is much greater than that likely to be predicted from a single variant in a single gene in your DNA sequence.

Several papers have shown this in various ways.  A clever one a few years ago noted that Francis Galton's family correlations from Darwin's time (he was a cousin of Darwin's) provide a better prediction than modern genotype sequencing.

We might be expected to revel in this confession by 23andLess, since we have criticized this whole endeavor of personalized genomic medicine as a bit of snake-oil selling.  Of course, there are countless variants that are highly predictive of some generally rare, mostly pediatric traits.  Companies can provide this information if you have these variants, but even then, unless they are recessive in their effects, if you have the gene you'd already have the disease or trait.  For decades there have been honorable practitioners, called genetic counselors, who generally worked within medical schools and were carefully licensed to do this.  They identified known genetic risks and advised about recurrence risks and the like.  But they were professionals, working with physicians, not hustling to the public for corporate profit.

At the same time, let's step back and ask about the idea of family risk.  If that's based on genetic variation (that is, if you filter out environmental effects), then certainly your DNA sequence would contain the variants involved!  That means that, properly done, sequencing should be able to identify the variants that account for your resemblance to your parents. But studies to date have shown that variants identified by GWAS and other approaches only account for a small fraction of the parent-offspring correlation. That is the gist of the current report, stated in another way.   The issue, then, is how to identify what's causal and what isn't, among the huge amount of DNA sequence you share with any individual relative.

Various authors have made suggestions. Some say the problem is that each of us is affected by one or more very rare variants, and even if you carry the same one as your parent, finding it in a sea of sequence data is nigh impossible.  Authors favoring the rare-variant idea are trying their best to devise methods for finding such variants.  One is to look at multiple close, affected relatives and winnow down the shared sequence to the part that's actually causal.  Another prominent group is trying to argue that it is not the sum of individual genetic variants that determines risk, but that interactions among variants are what contribute to risk.  This is a statistical nightmare to work out, but if it were true it would mean that we really have already identified the variants in question, just not the way they interact.

The most likely truth at this stage is that common traits such as heart disease, or how tall or heavy you are, are determined by a very large number of genes, mostly with individually very small effects.  Each person with the 'same' trait--each diabetic, say--has that trait for a different genetic reason.  Individual genetic variants may be causal contributors, but they are not very important.
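
The many-genes, small-effects picture above can be illustrated with a toy simulation.  What follows is only a sketch under loudly simplified assumptions that are not from the 23andMe report or any real data: every locus has allele frequency 0.5 and the same tiny additive effect, loci are inherited independently, there are no interactions, and the environment is pure random noise.  All names and numbers (`simulate`, `n_loci`, `n_known`, `env_sd`) are invented for illustration.  It compares how well a child's trait is predicted by the parents' average trait (a stand-in for family history) versus a score built from only a handful of 'known' variants.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def random_person(n_loci, rng):
    # Each locus is a pair of alleles (0 or 1); allele frequency 0.5 is an assumption.
    return [(int(rng.random() < 0.5), int(rng.random() < 0.5)) for _ in range(n_loci)]


def make_child(mother, father, rng):
    # The child gets one randomly chosen allele per locus from each parent.
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


def allele_count(person, n_first=None):
    # Count risk alleles over the first n_first loci (all loci if None).
    loci = person if n_first is None else person[:n_first]
    return sum(a + b for a, b in loci)


def trait(person, env_sd, rng):
    # Trait = equal small effect per risk allele + random environmental noise.
    return allele_count(person) + rng.gauss(0, env_sd)


def simulate(n_families=2000, n_loci=100, n_known=5, env_sd=5.0, seed=1):
    """Return (r_family, r_known): correlations with the child's trait."""
    rng = random.Random(seed)
    midparent, known_score, child_trait = [], [], []
    for _ in range(n_families):
        mom = random_person(n_loci, rng)
        dad = random_person(n_loci, rng)
        kid = make_child(mom, dad, rng)
        midparent.append((trait(mom, env_sd, rng) + trait(dad, env_sd, rng)) / 2)
        # Pretend only the first n_known loci have been 'discovered'.
        known_score.append(allele_count(kid, n_known))
        child_trait.append(trait(kid, env_sd, rng))
    return pearson(midparent, child_trait), pearson(known_score, child_trait)


if __name__ == "__main__":
    r_family, r_known = simulate()
    print(f"midparent trait vs child trait:     r = {r_family:.2f}")
    print(f"known-variant score vs child trait: r = {r_known:.2f}")
```

Under these assumptions the parents' traits should track the child's trait noticeably better than the few-variant score does, simply because the parents' traits already sum over all the loci, discovered or not.  Raising `n_known` toward `n_loci` closes the gap in the toy model, which is the catch: it presumes every small effect could actually be found and measured.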

If this is so, and we could find a way to document the individual effects, we could use each person's genome sequence to tally up his or her particular set of risk variants and compute the resulting risk, even if specific approaches usually couldn't be tailored to that specific risk set.  In essence, we would gain nothing but a usually false sense of precision by doing so.  We'd be just as well off, in practice, to look at family history (or even more so, directly relevant risk traits like glucose levels, obesity, etc.).  And this is under the assumption that we know about, or don't have to worry about, environmental effects.  That is a huge can of worms that everyone is just conveniently ignoring.

The same, by the way, applies to attempts to identify or characterize traits in terms of genes responsible for their adaptive evolution--and for essentially the same reasons.


  1. It's hard not to see it through the eyes of a conspiracy theorist. 23andMe customers aren't doing family health research as thoroughly, and sharing their family histories as frequently, as needed to boost the company's discovery pipeline. Giving the customers a stronger reason to be thorough and generous with their family data just might strengthen it....

    1. I'm not sure I understand your point. More information might be better, of course. But what would we require for validation of what a customer happens to know or chooses to report?

      The idea that just family history is better than specific genotypes is itself important, but of course family history needs to be known reliably....and many things like diagnostic efficacy, accurate recording, etc. can make a huge difference.

      I would not be a conspiracy theorist, but I would say that commercial self-interest is not (in my personal opinion) a good thing for health research of this type.

      Even rare variants might not show inheritance patterns we could recognize, since each person's trait is determined by many different variants. All that each sib, say, needs to inherit is some combination that puts them at risk. A parent (or parents) at high risk and with the trait could pass on different subsets of his/her/their risk variants and lead to elevated risk in different offspring.

      Telling people that if they just report their family history carefully, all will be all right, will be a step towards better data, but won't solve that problem.

      It's a problem no matter how you look at it.

      The Google approach to the world is to collect huge--very huge--amounts of data and trust that computer analysis will find the pattern.

      This is 21st century inductionism at its height. But if causation is complex, it may not lead to the kind of simple, reliable prediction that is being promised, explicitly or implicitly.