Tuesday, August 6, 2013

If I'm healthy, why should I have my genome sequenced?

Seriously, why?  Many people have embraced direct-to-consumer (DTC) genotyping, or whole genome sequencing, for reasons that we admit we don't understand.  But we are clearly missing something.  Do people believe that, as a general statement, future disease is truly predictable from their genome?  We think that most geneticists, at least, would say not.

Just to be clear, here we're referring to diseases the person doesn't yet have.  If one already has some disease, the usefulness, if any, of genome testing would be those instances where specific causal variants are known to respond to specific kinds of treatment.  This, however, is only a small minority of cases, in which the causal variants have relatively clear, strong, and consistent effect. If this isn't the case, why do people do it?  In today's post we lay out our view, hoping it might elicit insights from people who see it differently.

Disease risk prediction
We'll start with GWAS (genomewide association studies), the most common method these days for looking for causal genes.  A few recent exchanges on Twitter make it obvious that what people think about the success of GWAS is a glass half full, glass half empty kind of thing.  Everyone agrees that GWAS have not explained much variation in most traits, but that's where the agreement ends.  Supporters say it doesn't matter because GWAS are teaching us a lot about causal pathways, and have found thousands of replicated signals for complex disease, even if with small effect. Detractors say GWAS are an expensive way to gain very little, and if the point is to be able to predict disease, they can't get us there.  A more sanguine view, the glass half full and half empty view, which we hold, is that GWAS have very successfully revealed the general shape of genomic effects on traits -- but did so years ago, and we need not continue to expand and increase the same approach just to identify ever-more-minuscule effects.

Dr Muin Khoury at the CDC, in a recent blog post about the potential public health impact of GWAS, notes the glass half full/half empty quality of the endeavor as well.  These studies, he writes, have produced massive amounts of data, but with little application to public health as of yet.  Though he cites a recent paper by Teri Manolio reporting that specific applications are beginning to be seen, he cautions that it will be decades before the full benefit is felt.

But is it possible?
Of course, effect sizes will generally need to be larger.  Further, such time and size estimates depend on extrapolation from what is known today, and may be wildly inaccurate if some deeper insight about genomic causation comes along.  From our point of view, what we know today does not suggest a high payoff from continuing business as usual.  And we have to be especially circumspect about messages from on high, that is, from heavily funded investigators or, even more so, from NIH staffers (like Teri) who are fine people but who fund the work and hence have a very clear if not unavoidable interest in touting its results.

That said, we think it's still fair to say that many people, including human geneticists, agree that we're far from being able to accurately predict complex disease -- from GWAS or anything else, including whole genome sequencing (WGS).  Even so, many people look forward to the day when newborns leave the hospital with their genome on a chip.  The drumbeat for personalized genomic medicine, backed by administrative decisions to push much of the research funding toward that promise, is not trivial.  This is curious, given that a major lesson of the genome era is that environment is a huge factor in the risk of common diseases that fell us, and yet future environments are inherently unpredictable. We've also learned that there are multiple genetic pathways to many traits.

Risk is elusive
And, there are methodological issues with risk prediction from direct-to-consumer companies.  A new paper in Genetics in Medicine ("Variations in predicted risks in personal genome testing for common complex diseases", Kalf et al.) reports a comparison of disease prediction from three DTC companies.  The authors found substantial differences in predicted risk estimates of specific diseases because the companies use different SNPs, different average population risk estimates, and different formulas for calculating risk.  Indeed, average population risk estimates can change with every new study because calculated risk is never the same in different population samples. Is anyone 'right' here?  Is everyone 'wrong' and if so, to a measurable or knowable extent?
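To make the three sources of disagreement concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the SNP names, the per-allele effect sizes, the population risks, and the two formulas are hypothetical and are not taken from Kalf et al. or from any actual company. Treating a per-allele effect as a relative risk in the "multiply_risk" formula is itself a simplification.

```python
# Hypothetical genotype: number of risk alleles (0, 1, or 2) carried at each SNP.
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}

# Three hypothetical companies that differ in the SNPs they use, the average
# population risk they assume, and the formula they apply. All numbers invented.
companies = {
    "Company 1": {"effects": {"snp_a": 1.3, "snp_b": 1.2},
                  "population_risk": 0.20, "formula": "odds"},
    "Company 2": {"effects": {"snp_a": 1.3, "snp_c": 1.5, "snp_d": 0.8},
                  "population_risk": 0.30, "formula": "multiply_risk"},
    "Company 3": {"effects": {"snp_b": 1.2, "snp_d": 0.8},
                  "population_risk": 0.20, "formula": "odds"},
}


def predicted_risk(genotype, effects, population_risk, formula):
    """Combine per-allele effects multiplicatively, then apply one of two formulas."""
    combined = 1.0
    for snp, per_allele_effect in effects.items():
        combined *= per_allele_effect ** genotype.get(snp, 0)
    if formula == "multiply_risk":
        # Scale the population risk directly, capped at 100%.
        return min(1.0, population_risk * combined)
    # "odds": convert risk to odds, scale the odds, convert back to a risk.
    odds = population_risk / (1.0 - population_risk) * combined
    return odds / (1.0 + odds)


# Same person, same genotype, three different answers.
risks = {name: predicted_risk(genotype, **spec) for name, spec in companies.items()}
for name, risk in risks.items():
    print(f"{name}: predicted risk {risk:.1%}")
```

The genotype never changes between companies. Only the choice of SNPs, baseline, and formula does, and that alone is enough to move the same person's predicted risk substantially.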

To date, the best predictor of future disease is family history.  That's because if a disease follows Mendelian patterns of inheritance, or risk levels are correlated among close family members, you know that risk genotypes are common in your family, and you can infer that you are at higher risk without knowing specifically which one or ten or hundreds of genomic variants are responsible.  That is, we don't need GWAS for these diseases.

For the clearer, single-gene caused diseases, we already have an informed medical system, with professional genetic counselors and physicians, that sorted these out long ago and has long been set up to use various kinds of data to provide very important and useful advice.  Which is not to say that the system is at all infallible for people with such diseases.  Yes, if the cause of the disease or disorder hasn't yet been identified, WGS or WES (whole exome sequencing) may be helpful for doing so, though certainly not always, and as it's the very rare Mendelian disorders that remain unexplained, the search can be complex -- the causal variant may be due to somatic mutation, or the cause may be multiple interacting genes, or the variant may lie in a regulatory region rather than in protein-coding sequence.

In fact, it may not be widely understood that GWAS works because of inheritance and is essentially itself a kind of family data -- but one with unknown, very deep family connections.  It has some advantages in that respect, because if cases are only distantly related, they may only share narrow chromosome regions (at the causal genes), whereas close relatives share huge fractions of their genomes.  By contrast, if the disease is genetically caused, unaffected controls will be less closely related to each other than are the cases.

In this sense, GWAS removes the close-relationship 'noise' of shared variation. The problem is that when traits are caused by many different genes, family members may have simpler sets of causal genes than random sets of cases and controls.  So there are statistical issues at play here.  Nonetheless, for the important, common complex diseases, family data are generally more informative than GWAS kinds of data.  [There are other issues too much to go into here, such as the guesspothesis that common diseases are caused by very rare genetic variants that might be found in genome sequence data in families.]

So, as we see it, GWAS -- or any other way to identify genetic causation -- won't be very useful for predicting common complex diseases in individuals, or at least only rarely.  Again, environment is the often intractable but most important wild-card. So, why are currently healthy people interested in having their genomes typed or sequenced? If you've done it, we'd love to know. 

18 comments:

Anonymous said...

I only have 23andMe genotype data, not sequencing, but would also pay a small fee for the latter. I agree genomic prediction using common alleles is not well powered at the moment (and in many cases, in principle) to give answers that would motivate me to take any action. It's mostly for curiosity, some current utility, and some investment to potential future utility that I got myself typed.

1) Curiosity about ancestry. With reference panels from all around the world, individual haplotypes can be traced to founder populations for an illuminating picture of "where I come from", while birth records only go back a century or two.
2) Actionable large effect alleles (drug dosing, BRCA). Recently, a family member had trouble during surgery due to increased warfarin sensitivity that could have been prevented (or at least the surgeons notified) with this information at hand.
3) Carrier status for severe Mendelian diseases. When planning kids, we could double check whether conditions and alleles not covered by common tests (or tests not available in our country) have potential to yield compound heterozygosity that we should test for.
4) Cumulative gain of knowledge over time. There has only been about a decade of sequencing and array powered genomic discovery, I am sure the utility of genotyping data will increase with time.
5) Contributing my genome data to understanding. Some heritable traits are quaintly interesting (e.g. detached earlobes), but public money should not be spent on mapping them. For some large reference panels (like the one 23andMe is amassing), gathering information on such traits is cheap, and my data can help in the mapping.

Ken Weiss said...

These are some good reasons for genomic data collection and use. Personally, I think one cannot trust, at all, what private interests like '23' will do with the data.

If there were not some real epistemic issues that are being largely swept under the rug, and if/when whole genome sequencing etc becomes truly affordable on a large scale, then developing whole-population registries would have a lot of appeal.

But the data (made confidential, to the extent possible) should be made public, and collected by personally disinterested government staff. That is, not a huge boondoggle for some university professors, and public immediately, not after a moratorium that let some investigators mine it for publications.

The real sticking points are the idea that DNA sequence is all you inherit that's relevant, that the 'constitutive' genotype (from saliva, cheek swab, blood sample) is your only genotype (see some last-week posts on MT if you missed them), that tissue-specific gene expression is what counts, and that non-genetic (environmental, whatever that may include) variables are not adequately measured or, in many instances, measurable or predictable.

Given these issues, I personally think that exhaustive data collection is not the optimum way to invest science resources, especially in regard to public health.

For your point #3, issues about Mendelian disease are also much more subtle than widely realized, but genetic counselors already can do that, and competently and not-for-profit (i.e., no ulterior motive).

We don't really need the continued GWASing of everything that moves to address your point #2.

Whether this is cumulative knowledge or basically vaporware is debatable, and I think it is far from obvious that the knowledge is very valuable or, after a few years, would even be looked at except by historians. Time will tell, on all of these fronts.

But not to worry: for those, like you and many others, who like this trend in events, it is a reality that isn't going to be slowed. After all, '23' has begun a major advertising campaign, apparently playing on fear (of disease). So this business is booming, regardless of critics or skeptics.

Anne Buchanan said...

Thank you. Clearly, these are compelling reasons for many people. Of all your reasons, to me, curiosity is the most appealing. But the answers to any questions I'd wonder about, including ancestry, would be so ephemeral that, to me, it's not a good enough reason.

An important thing we forgot to mention in the post is that not only are future environments unpredictable, but the fact that everyone's genome is unique makes predicting the effect of a given mutation difficult, given that genomic background has an effect. And then if you factor in the unpredictability of future environmental exposures, it becomes, let's call it, a crapshoot to predict disease (some variants are more predictive, it's true, but these tend to be for rare diseases). And that's true even when everyone doing the predicting is using the same protocol, which is currently not the case.

Holly Dunsworth said...

And how does anyone know whether their risk estimate of 37% for heart disease is accurate? And that, therefore, they spent their money wisely? This is why paying for 23andMe to get their "health reports" is compared to paying for a fortune from a fortune-teller with a crystal ball.

Ken Weiss said...

There are (at least) two important issues. First, is the risk estimate accurate and is there a way to know?

Second, how does my individual risk relate to the group risk which is what predictions are essentially about, despite what they say (we have an earlier series of posts about the nature of such statistical estimates and their assumptions about causality).

In some cases, public health risks (group risks) are very useful. Fluoridated water or iodized salt, perhaps vitamin D in milk are examples. Everyone, or at least most by far, benefit in similar ways. Risk of disease in the absence of these measures can at least be estimated with some reliability.

But we have little way to know how these kinds of risk, based on collections of specific measured factors (alleles across the genome) estimated from cases and controls or similar data, apply to individuals.

One way to think about it is to ask: what is the variance around the mean risk estimate? Is it narrow, in which case you're close to the group mean risk, or is it wide, in which case the risk estimate is rather meaningless? For many reasons, even this is only a heuristic way to think, because the methods of obtaining data and estimating risk introduce this 'variance' in ways that have to do with the analysis as much as with the underlying causal reality (we tried to deal with these issues in those earlier posts).

Anne Buchanan said...

Indeed. Even more perplexing, your risk might be 37% today, but 27% or 47% next week, because new studies have reported new estimates.

But, risk is elusive and ephemeral for other reasons. As Muin Khoury points out here, "below average" and "above average" risk are only relative concepts.

"The meaning of “above average” and “below average” life time risks of disease is still based on incomplete scientific data and needs to be interpreted in the context of how common the disease is and the person’s other risk factors. In other words, a “below average” risk of heart disease will still mean a high risk of heart disease since heart disease is very common, whereas an “above average” risk of multiple sclerosis will still mean a low risk since the disease is much less common than heart disease."

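The arithmetic behind the quoted point can be sketched in a few lines of Python. The baseline lifetime risks and relative risks below are illustrative assumptions, not figures from Khoury or from any study, and multiplying a baseline by a relative risk is only the simplest possible model.

```python
def absolute_risk(baseline_lifetime_risk, relative_risk):
    """Simplest model: absolute risk = baseline risk x relative risk, capped at 100%."""
    return min(1.0, baseline_lifetime_risk * relative_risk)


# Hypothetical numbers, chosen only to show the shape of the argument.
heart_disease = absolute_risk(baseline_lifetime_risk=0.40, relative_risk=0.8)   # "below average"
multiple_sclerosis = absolute_risk(baseline_lifetime_risk=0.002, relative_risk=2.0)  # "above average"

print(f"'Below average' heart disease risk: {heart_disease:.1%}")
print(f"'Above average' multiple sclerosis risk: {multiple_sclerosis:.1%}")
```

Under these assumed numbers, the "below average" label sits on a risk roughly eighty times larger than the "above average" one, which is why the labels mean little without the baseline.
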
Ken Weiss said...

If anybody really wants truth, it's worse than what you say, Anne. That's because if your actual risk (largely unknown if not unknowable) for disease X is changed--either because the estimate changes or because the biological facts change due to circumstances, like diet or taking statins--then your risk of something else actually RISES! You have to get something, after all!

But there is little way at present to know what that something will be. And, if it is delayed you may have longer to live, but will those added months or years be worth it, or will it just mean a more debilitated end-of-life?

Nobody wants to talk about these hard realities.

Mark Wanner said...

Holly's comment made me laugh. I give tours/talks about genomics and genomic medicine, which personally I find very compelling. Nonetheless I say that I would support anyone's decision to get genetic information from a DTC company only if they take the view that it's likely going to yield entertainment for the most part, not useful understanding. The predictive power is about as good as you'll get from a phone-in psychic at this point.

I am about to get WGS for myself for a couple of reasons though. First is simple curiosity, as mentioned by the first commenter. Second is to add my data to the pool (I'm in PGP) in hopes that if we can sequence millions and one day figure out how to share and manage the data, not to mention all the ELSI stuff, we'll find out some really useful things along the way. Still probabilistic and not predictive most likely, but useful nonetheless.

Ken Weiss said...

To me the question is rather a political or ethical one. Is this going to lead to abuse of our notions of confidentiality and equity, or to discrimination and profiteering at our expense?

And, is there a better way to spend limited public (or even private) funds? Personally, I believe that we know of many problems that are clear, focused, and should be amenable to proper science. There are all sorts of life-spoiling diseases with a clear genetic basis, for example. People do work on them, but rather than all the flailing about by the omics world, I think we should stop that little-sense and focus intensely on the known, hopefully tractable problems.

This also would serve as proof of principle for extending the same kinds of effort to less clear problems. We have a few successes, but nowhere near what we should if current investment policies were about general well-being rather than well-being for professors, university administrations, and genome tech companies.

Plus, we already know very well that non-technical adjustments for most of the diseases in question would have far, far greater impact on actual people's health than the technical stuff we're up to.

Engineering society is probably harder to do than engineering genes, which we're already hard-pressed to do very well. But at least lifestyle changes should cost less and do more. If only we had enough cogent research to know how....

Anne Buchanan said...

Thanks, Mark! Genomic medicine is a huge field, and I totally accept that there are aspects that are very useful (genotyping tumors is a big one, e.g., genotyping to determine useful therapies for a number of diseases, etc.). People often say what you do, essentially that massive amounts of sequence data will lead us to understand what it all means. But it seems that, to date, the more we know the *more* complex everything is rather than less. I suspect we already know the answer -- life is complex.

Mark Wanner said...

I'm in complete agreement that, as a society, we could reap more and faster benefit from behavior change. Just look at tobacco. But large-scale changes in the population don't come easy, and disease prevalence can't be eliminated, just reduced, so we need better medicine regardless.

It's good to question how we spend our funds, and I'm sure it's easy to find misapplied funding in the biomedical field. Nonetheless, the life-spoiling diseases with a clear genetic basis, even single-gene diseases, have largely proven non-amenable to clear, focused research using our prior methods and technologies. We need more power to better understand the biology, and genomics gives us a valid starting point IMO, although it's the downstream research--RNA and proteins--that may well give us more of the medical solutions we seek.

I'll admit, my perspective is influenced by a persistent optimism that we can make hay from the massive data haystacks we obtain from omics research. That said, I don't think we'll understand it all, just enough to substantively improve the delivery of medicine to individual patients over time. YMMV, but to me that's worth the investment.

Ken Weiss said...

I basically agree, but we have an honest disagreement about the relative worth of different directions science should take.

If the clear-cut diseases haven't yielded, then either we should focus on doing something to change that, or decide that we need some very different approach. Chasing down the multitude of individually unique minor contributing factors doesn't seem to me to be right, and I personally think it mainly reflects inertia, and vested interests, that militate against more measured consideration of what should and can be done.

So, I guess I just come down more on the pessimistic side of the Big Data issues. On the other hand, I think technology is very powerful, so I'm more optimistic about the eventual ability to engineer some solutions to the things that really are meaningfully genetic.

Anonymous said...

A couple of points in response. I won't focus on prediction much, as its issues have been well covered in your posts and comments, and I don't feel qualified to opine on science policy/funding, as I don't know the current structure well enough.

Private vs. public money. I answered why I would (and did) get myself genotyped. I wholeheartedly agree that genomic prediction does not currently give medically useful information for the majority of traits, and do not argue (here) that a lot more genotyping should be done with public money. But for my private hobby, it has been worth it for the fun, and for finding some long lost second cousins in far countries.

"GWASing everything that moves". I think that if the genotype data have been collected with appropriate consent, it would be silly _not_ to map all the measured traits. GWAS do have the potential of uncovering actionable alleles, and furthering our understanding of the underlying biology.

Privacy issues. I agree these are real. I only understood the real implications of 23andMe terms and conditions after receiving the data. In short, they are free to sell my anonymised data on to drug companies. And given recent internet surveillance revelations, I have no doubt the US government could also obtain them. I am not happy about either.

I don't think somatic mosaicism and tissue-specific expression would be a major factor in decisions to genotype, and do believe genetic counselors would be able to give better advice if they had all the information (all the sequence data) available.

All in all - time will tell, as has been a conclusion above and below. By the way, while it's my first time commenting, I've been enjoying your thorough, thoughtful posts for a while now; thank you for the content!

Ken Weiss said...

There are just differences of view at play here. If data were free to collect and made openly available (but confidentiality somehow preserved!), then clearly there will be things to find there, and some--the clear-cut risks--would be identified (as we've long had methods to do, actually).

Things like somatic mutation and gene-environmental interactions would not, nor would the changing landscape of how we phenotype individuals, and so on. So the data would be of limited, and over time greatly diminishing value.

Right now, to me, a major issue is the spending of public funds on diminishing returns, when we have many clear, even clearly genetic, traits we should be investing a full fusillade of resources to do something about, rather than toying around with countless trivially weak, ephemeral factors.

But, of course, it's just my view!
And thanks for the final compliment. Trying to be thoughtful and thought-provoking is our objective!

Anne Buchanan said...

Thanks very much for your thoughtful replies (and your kind comments re. the blog!). I respect and appreciate your reasons for doing 23andMe. Of the reasons people have offered for doing sequencing, the one I most get is curiosity. And, yes, time will tell.

Eric Turkheimer said...

Hi, I wanted to let you know how happy I am to have discovered this blog, and this seems to be as good an entry point as any. Although I have spent a lifetime thinking about the role of genetics in the genesis of complex human behavior and am currently the past-President of the Behavior Genetics Association, I often find myself in disagreement with my colleagues about reductionistic causal models in gene-behavior relations.

If I can be forgiven for plugging one of my own papers here, you might be interested in:

http://people.virginia.edu/~ent3c/papers2/Turkheimer%20GWAS%20EWAS%20Final.pdf

Some other things I have written will be cited there.

Here is my standard argument about the limitations of GWAS. Suppose you are given a stack of DVDs with movies on them, and a microscope. You are told to examine the pattern of dots or whatever through the microscope. Your task is to figure out on this basis whether the movie is a drama or a comedy.

My conclusions:

1) No one is denying that one way or another all the information about the movie is encoded on the DVD.

2) Nevertheless, it won't work.

3) Because the microscope does not encompass the developmental model via which the data on the disc gets turned into a movie.

4) Sample size is not the issue.

5) Despite everything, you will still get some hits. That is, if you have enough DVDs, sooner or later you would find some location on the disc whose state was correlated with drama v. comedy at some level of statistical significance.
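Point 5 can be illustrated with a deliberately extreme sketch in Python, in which the "disc" bits are pure random noise and so carry no information about genre at all. The numbers of discs and locations, the 0.05 threshold, and the choice of a two-proportion z-test are all assumptions made for this illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable
n_discs, n_locations, alpha = 200, 2000, 0.05

# Half the discs are labelled "drama" (True), the other half "comedy" (False).
is_drama = [i < n_discs // 2 for i in range(n_discs)]


def two_sided_p(bits, labels):
    """Two-proportion z-test (normal approximation) for a difference in bit frequency."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    x1 = sum(b for b, lab in zip(bits, labels) if lab)
    x0 = sum(b for b, lab in zip(bits, labels) if not lab)
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0  # every disc has the same bit here, so nothing to compare
    z = (x1 / n1 - x0 / n0) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Test every disc location separately, as a scan across the disc would.
hits = 0
for _ in range(n_locations):
    bits = [random.randint(0, 1) for _ in range(n_discs)]  # noise, unrelated to genre
    if two_sided_p(bits, is_drama) < alpha:
        hits += 1

print(f"{hits} of {n_locations} locations 'associated' with genre at p < {alpha}")
```

With a threshold of 0.05, roughly one location in twenty should come up as a hit by chance alone, even though nothing on these simulated discs has anything to do with drama or comedy.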

Anyway, thanks again for the blog.

Eric Turkheimer

Anne Buchanan said...

Thanks very much, Eric, and you are certainly forgiven for plugging your own paper, especially since it begins with a section titled "GWAS and Its Discontents"! I look forward to reading the whole thing.

And, yep, concur with your model and conclusions.

Eric Turkheimer said...

Thanks. Feel free to post comments and criticisms here when you have them.