Thursday, March 12, 2015

Simulating complexity and predicting the future

Predicting complex disease is the latest genomics flavor of the day. Or rather, it's the old flavor with a new name -- precision medicine.  So, we were pleased to be alerted to a new paper (H/T Peter Tennant and Mel Bartley; "The mathematical limits of genetic prediction for complex chronic disease," Keyes et al., Journal of Epidemiology and Community Health) that addresses the prediction question by simulating a large amount of data to ask how plausible it is to predict complex disease, given the wealth of potentially interacting risk factors that would need to be taken into account.  This question is of course particularly timely, given the new million-genome precision medicine effort proposed by President Obama and endorsed by the head of the NIH and many others.

The Crystal Ball, by John William Waterhouse: scrying in crystal; Wikipedia

A few weeks ago, Ken blogged about the advantages of using computer simulation to probe causal connections in genetics and epidemiology (here and here).  Simulations can be valuable because they allow exploration of complexity with known assumptions built in explicitly and hence testably, and because there are no data or measurement errors (unless introduced intentionally and then they're still identifiable).  If the results resemble real data, then one has confidence in the assumptions.  If not, conditions can be changed to explore why not.  Also, things far too complex to be affordably tested in the real world can be simulated, and simulation is fast and inexpensive, as a way to explore the nature of causation in a given context.

Keyes et al. simulate one million populations, with 10,000 individuals per population, to explore how feasible it will be to predict complex diseases, given epistasis (gene-gene interaction) and gene x environment interaction. They point out that while genetic and epidemiological studies have been useful for finding correlations between risk factors and disease, they've been less useful for predicting which individuals will develop a given disease.

They also point out that genome-wide association studies (GWAS) have been invaluable for demonstrating that complex diseases are by and large polygenic, and that different subsets of many genes are apparently interacting in individuals who share a disease.  But GWAS haven't been useful for predicting complex traits.

Identifying the interacting genes, and the gene-by-gene interactions themselves, has proven difficult.  Thus, Keyes et al. write, "[i]n this paper, we use simulated data of one million separate populations to demonstrate the drivers of the association between a germline genetic risk factor and a disease outcome, drawing observations that have implications for personalised medicine and genetic risk prediction."

They first create a hypothetical disease, one that is caused by a germline genetic variant and environmental exposure to one or more risk factors.  Risk of disease in those exposed to both is higher than the sum of the risks in individuals exposed to the genetic or the environmental risk factor alone.  And, importantly, the disease can also be caused in many other ways.  Keyes et al. varied the rate of genetic exposure, environmental exposure, and background prevalence of disease in each of their simulated populations.
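To make that setup concrete, here is a minimal sketch of a single simulated population in Python. It is not the authors' code, and the causal model is our own simplifying assumption: a person becomes diseased if they carry the variant and are also exposed to the environmental factor, or if an unrelated background cause strikes. The function name `simulate_population`, its parameters, and the example prevalences are all hypothetical.

```python
import random


def simulate_population(p_gene, p_env, p_background, n=10_000, seed=0):
    """Simulate one population under an assumed toy model.

    Assumption (ours, not the paper's): disease occurs if
    (variant AND environmental exposure) OR an unrelated background cause.
    Returns (risk_ratio, risk_difference) for carriers vs non-carriers.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}   # diseased counts, keyed by carrier status
    totals = {True: 0, False: 0}  # group sizes, keyed by carrier status

    for _ in range(n):
        gene = rng.random() < p_gene
        env = rng.random() < p_env
        background = rng.random() < p_background
        diseased = (gene and env) or background
        totals[gene] += 1
        cases[gene] += diseased

    if totals[True] == 0 or totals[False] == 0 or cases[False] == 0:
        raise ValueError("risk ratio is undefined for this population")

    risk_carriers = cases[True] / totals[True]
    risk_noncarriers = cases[False] / totals[False]
    return risk_carriers / risk_noncarriers, risk_carriers - risk_noncarriers


if __name__ == "__main__":
    # Same variant frequency and background rate; only the prevalence
    # of the interacting environmental factor changes.
    for p_env in (0.05, 0.5):
        rr, rd = simulate_population(p_gene=0.3, p_env=p_env, p_background=0.05)
        print(f"p_env={p_env}: risk ratio={rr:.2f}, risk difference={rd:.3f}")
```

Even in this stripped-down version, the apparent effect of the same variant changes a great deal when only the prevalence of the environmental factor is altered.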

They simulated such an enormous number of populations in order to cover every possible combination of risk factor prevalences, from 1% to 100%.  They then compared nine scenarios of genetic and environmental risk exposure (combinations of low, moderate and high), estimating the risk of disease for those with the risk allele compared with those without it.  They write:

"Using simulations that span the range of potential possible prevalences of genes, environmental factor and unrelated factors, we show that the magnitude of both the risk ratio and risk difference [risk of disease to those exposed to the genetic risk factor vs those not exposed] association between a genetic factor and health outcome depends entirely on the prevalence of two factors: (1) the factors that interact with the genetic variant of interest; and (2) the background rate of disease in the population. These results indicate that genetic risk factors can only adequately predict disease in the presence of common interacting factors, suggesting natural limits on the predictive ability of individual common germline genetic factors in preventative medicine."

The authors draw four conclusions.  First, predicting complex disease from genes will continue to be largely unsuccessful unless the environmental context and gene interactions are understood.  Second, predicting disease from genes will be most reliable when background disease rates are low and environmental risk factors are common.  Third, environmental context is important in predicting the effect of genes.  Fourth, the non-replicability of many genotype/phenotype studies is likely to be due to differing prevalences of genetic and environmental risk factors in the different study populations.  Trait 'heritability' is context dependent, not an inherent characteristic of the trait itself.
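The dependence on just two prevalences can be seen with a little arithmetic. The sketch below assumes a deliberately simple model of our own (not necessarily the one Keyes et al. used): independent risk factors, with disease arising if a person has both the variant and the environmental exposure, or from an unrelated background cause. Under that assumption, non-carriers have risk p_background, while carriers have risk p_background + p_env * (1 - p_background). The function names and the low/moderate/high values are hypothetical.

```python
def carrier_risks(p_env, p_background):
    """Disease risk for (carriers, non-carriers) under the assumed toy model."""
    risk_noncarriers = p_background
    risk_carriers = p_background + p_env * (1 - p_background)
    return risk_carriers, risk_noncarriers


def risk_ratio(p_env, p_background):
    carriers, noncarriers = carrier_risks(p_env, p_background)
    return carriers / noncarriers


def risk_difference(p_env, p_background):
    carriers, noncarriers = carrier_risks(p_env, p_background)
    return carriers - noncarriers


if __name__ == "__main__":
    # Hypothetical prevalences standing in for low, moderate and high.
    levels = {"low": 0.05, "moderate": 0.25, "high": 0.75}
    for env_label, p_env in levels.items():
        for bg_label, p_bg in levels.items():
            rr = risk_ratio(p_env, p_bg)
            rd = risk_difference(p_env, p_bg)
            print(f"environment {env_label:8s} background {bg_label:8s} "
                  f"RR={rr:6.2f} RD={rd:.3f}")
```

In this toy version the frequency of the variant itself never enters either formula, and the risk ratio is largest when the environmental factor is common and the background rate is low, which is the pattern described in the second conclusion.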

Our simulation program, ForSim, referred to earlier, is a more complex and evolutionarily sophisticated approach that specifies things less explicitly and that can apply to multiple populations and other things.  But if anything, in its simulated results, with causation and variation more realistic, causation will be even less precisely estimable or predictable than in the current paper, whose results are already quite convincing.

We'd just add that while this study is certainly a cautionary tale, the authors don't, in our view, acknowledge the full import of their conclusions.  Every genome is unique and unpredictable, and future environments are unpredictable, even in principle, so that if predicting complex diseases depends on knowing environmental and genomic context, it's not going to be possible.  It may be possible to retrodict complex disease based on understanding past environment and observed genes and genomes, but solving the prediction problem is another question.


James Goetz said...

Hi, This computer simulation project sounds interesting to me. My interest is mostly long-term evolution. For example, assuming the origin of a prokaryotic cell, on average, what could we expect to develop within 4 billion years, 5 billion years, 6 billion years...? I suppose no current computer system could work on that, but perhaps this could shed some light on the past debates of the late Stephen Jay Gould and Simon Conway Morris on the likeliness for the evolutionary emergence of intelligent life on any given planet that forms a prokaryotic cell. I tend to side with Gould while I know of no way to calculate or simulate this.

Ken Weiss said...

One can't simulate this because one would need to know the environments and the 'traits' caused by the simulated genes in relation to the environments, and also the competitors (individuals and species). One could predict the evolution of ecosystems, maybe of predators and prey, of adaptation to available resources and so on. But simulating the specifics might require building in what we already know happened.

James Goetz said...

I find this fascinating, but how could one predict the evolution of a predator and prey?

I feel stumped at what I suppose is the first step. I think that the first step is predicting possible new phenotypes from possible outcomes of independent assortment and mutations (point substitutions, indels, chromosome aberrations). I find this mind boggling to consider for one or two sexually reproductive species. (Of course, I do not know nearly as much about this as you two.)

I suppose that the next step is simulating the eventual fixation or extinction of these phenotypes while considering the factors that are mainly the environment and the frequency of repetition if any for the respective mutations.

Moreover, these two steps are just one degree of adaptation in evolutionary development that might eventually result in complexity.

Jeff Walker said...

James: The simulation discussed in this post really has nothing to do with the kinds of simulation that you seem to be interested in. I would suggest that if you are truly interested in simulation models applied to different aspects of evolutionary biology, you start with a *big* cup of coffee and settle in with Google Scholar. You will have many years of reading ahead of you. But it is really hard to understand these simulations without getting your hands dirty, so I would suggest that you also become intimately familiar with programming and try creating simulations yourself. Python is free and there is lots of online help.

Ken Weiss said...

My own experience is that I have tried to simulate things from the most generic point of view that seemed practicable, being as little prescriptive as I could, making assumptions as generic as seemed feasible.

I have typically learned the most when things did not look as I expected. On a few occasions this reflected a bug in the program, but most of the time it was a 'bug' in my thinking.

Simulation that raises surprises may be the most informative of all. If it just shows what you expected, in some ways it means you have built it in to the approach.

Secondly, simulation should suggest what to look at empirically or how to look in ways not typically already done.

Jeff Walker said...

Yes! Simulations are excellent tools to discover bugs in one's thinking! I have started to teach simple simulation strategies to students in biology to test their understanding of simple statistical (correlations, causal effects, etc.), evolutionary (random drift, selection, etc.), and physiological (diffusion) concepts.

As for James: I would also suggest reading Eric Kandel's memoir. It has nothing to do with computer simulations, but his entire (Nobel Prize-winning) research program started when he approached a mentor early on and wanted to design experiments to study the neurophysiology of psychotherapy! His mentor replied with something like "huh, I think you need to start with something much much simpler". And Kandel did - starting out with the very simplest experiments on the neurophysiology of conditioning in the response to a startle stimulus in Aplysia. This was an exceptionally simple experimental model of memory!

Anonymous said...

Great post! We need more of these kinds of studies.

James Goetz said...

Hi Jeff Walker, Sorry for the delay in my reply. I'm currently working on some philosophic research about future contingents and counterfactuals other than evolutionary development. I probably will at some point do more analysis of evolution, but I doubt that computer simulations could handle the theories that I am developing. But nonetheless Ken's and Anne's work on simulations does tie into the philosophy of future contingents. Peace, Jim