The Crystal Ball, by John William Waterhouse: scrying in crystal; Wikipedia
A few weeks ago, Ken blogged about the advantages of using computer simulation to probe causal connections in genetics and epidemiology (here and here). Simulations can be valuable because they allow exploration of complexity with known assumptions built in explicitly, and hence testably, and because there are no data or measurement errors (unless they are introduced intentionally, in which case they're still identifiable). If the results resemble real data, then one gains confidence in the assumptions. If not, conditions can be changed to explore why not. Also, things far too complex to be affordably tested in the real world can be simulated, and simulation is a fast and inexpensive way to explore the nature of causation in a given context.
Keyes et al. simulate one million populations, with 10,000 individuals per population, to explore how feasible it will be to predict complex diseases, given epistasis (gene interaction) and gene x environment interaction. They point out that while genetic and epidemiological studies have been useful for finding correlations between risk factors and disease, they've been less useful for predicting which individuals will develop a given disease.
They also point out that genome-wide association studies (GWAS) have been invaluable for demonstrating that complex diseases are by and large polygenic, and that different subsets of many genes are apparently interacting in individuals who share a disease. But GWAS haven't been useful for predicting complex traits.
Moreover, identifying interacting genes, and gene-by-gene interactions, has proven to be difficult. Thus, Keyes et al. write, "[i]n this paper, we use simulated data of one million separate populations to demonstrate the drivers of the association between a germline genetic risk factor and a disease outcome, drawing observations that have implications for personalised medicine and genetic risk prediction."
They first create a hypothetical disease, one that is caused by a germline genetic variant together with environmental exposure to one or more risk factors. Risk of disease in those exposed to both is higher than the sum of the risks in individuals exposed to the genetic risk or the environmental risk alone. And, importantly, the disease can also be caused in many other ways. Keyes et al. varied the rate of genetic exposure, the rate of environmental exposure, and the background prevalence of disease in each of their simulated populations.
They simulated such an enormous number of populations in order to cover every possible prevalence of each combination of risk factors, from 1 to 100%. They then compared nine different scenarios of genetic and environmental risk exposure (low, moderate and high), estimating the risk of disease for those with the risk allele compared with those without it.
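To make the setup concrete, here is a minimal sketch of one such simulated population. It is not Keyes et al.'s code or their exact model. It assumes a deliberately simplified causal structure in which disease occurs if a person carries the variant and is also environmentally exposed, or if some unrelated background cause is present, with all three factors independent. The function names, parameter names and prevalence values are ours, chosen only for illustration.

```python
import random


def simulate_population(p_gene, p_env, p_background, n=10_000, seed=0):
    """Return (risk in carriers, risk in non-carriers) for one population.

    Assumed toy model, not the paper's specification: disease occurs if
    (variant AND environmental exposure) OR an unrelated background cause.
    The three factors are drawn independently for each individual.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}    # disease counts, keyed by carrier status
    totals = {True: 0, False: 0}   # group sizes, keyed by carrier status
    for _ in range(n):
        gene = rng.random() < p_gene
        env = rng.random() < p_env
        background = rng.random() < p_background
        diseased = (gene and env) or background
        totals[gene] += 1
        cases[gene] += diseased
    if not totals[True] or not totals[False]:
        raise ValueError("one genotype group is empty; choose 0 < p_gene < 1")
    return cases[True] / totals[True], cases[False] / totals[False]


def risk_ratio_and_difference(p_gene, p_env, p_background, **kwargs):
    """Compare carriers with non-carriers via risk ratio and risk difference."""
    risk_carriers, risk_noncarriers = simulate_population(
        p_gene, p_env, p_background, **kwargs
    )
    if risk_noncarriers == 0:
        raise ValueError("no cases among non-carriers; risk ratio is undefined")
    return risk_carriers / risk_noncarriers, risk_carriers - risk_noncarriers


# Same variant frequency throughout; only the context changes.
for label, p_env, p_background in [
    ("common exposure, low background", 0.80, 0.05),
    ("rare exposure, low background", 0.05, 0.05),
    ("common exposure, high background", 0.80, 0.50),
]:
    rr, rd = risk_ratio_and_difference(0.30, p_env, p_background)
    print(f"{label}: risk ratio = {rr:.2f}, risk difference = {rd:.2f}")
```

Even in this toy version, the same variant at the same frequency looks like a strong or a weak risk factor depending only on how common the interacting exposure and the background causes are.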
"Using simulations that span the range of potential possible prevalences of genes, environmental factor and unrelated factors, we show that the magnitude of both the risk ratio and risk difference [risk of disease to those exposed to the genetic risk factor vs those not exposed] association between a genetic factor and health outcome depends entirely on the prevalence of two factors: (1) the factors that interact with the genetic variant of interest; and (2) the background rate of disease in the population. These results indicate that genetic risk factors can only adequately predict disease in the presence of common interacting factors, suggesting natural limits on the predictive ability of individual common germline genetic factors in preventative medicine."
And there are four conclusions. First, predicting complex disease from genes will continue to be largely unsuccessful unless the environmental context and gene interactions are understood. Second, predicting disease from genes is going to be most reliable when background disease rates are low and environmental risk factors are common. Third, environmental context is important in predicting the effect of genes. And, fourth, the non-replicability of many genotype/phenotype studies is likely to be due to differing prevalences of genetic and environmental risk factors in the different study populations. Trait 'heritability' is context dependent, not an inherent characteristic of the trait itself.
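The dependence on those two prevalences can be seen in closed form under a much simpler model than the one in the paper. As an illustrative assumption of ours, suppose disease occurs if a carrier of the variant (G = 1) is also environmentally exposed, or if an unrelated background cause is present. Let e be the prevalence of the environmental exposure and b the prevalence of background causes, and take all factors to be independent. Then:

```latex
\begin{align*}
P(D \mid G=1) &= 1 - (1-e)(1-b) = e + b - eb \\
P(D \mid G=0) &= b \\
\text{RD} &= P(D \mid G=1) - P(D \mid G=0) = e\,(1-b) \\
\text{RR} &= \frac{P(D \mid G=1)}{P(D \mid G=0)} = 1 + \frac{e\,(1-b)}{b}
\end{align*}
```

In this sketch the frequency of the variant itself drops out entirely. With e = 0.8 and b = 0.05 the risk ratio is 16.2, while with e = 0.05 and the same b it is 1.95, which is in line with the second conclusion: the gene predicts best when the interacting exposure is common and background disease is rare.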
Our simulation program, ForSim, referred to earlier, is a more complex and evolutionarily sophisticated approach that specifies things less explicitly and that can be applied to multiple populations, among other things. But if anything, because causation and variation are more realistic in its simulated results, causation will be even less precisely estimable or predictable than in the current paper, whose results are already quite convincing.
We'd just add that while this study is certainly a cautionary tale, the authors don't, in our view, acknowledge the full import of their conclusions. Every genome is unique and unpredictable, and future environments are unpredictable, even in principle, so that if predicting complex diseases depends on knowing environmental and genomic context, it's not going to be possible. It may be possible to retrodict complex disease based on understanding past environment and observed genes and genomes, but solving the prediction problem is another question.