There's a thought-provoking
article, "Epidemiology, epigenetics and the 'Gloomy Prospect': embracing randomness in population health research and practice," in the August issue of the
International Journal of Epidemiology (IJE) by George Davey Smith, one of the smartest, most thoughtful -- not to mention prolific -- people in the field of epidemiology these days. He's in the School of Social and Community Medicine at the University of Bristol in the UK, and the paper is the published version of the 2011 IEA John Snow Lecture which he gave at the World Conference of Epidemiology in Edinburgh this past summer. In the paper he addresses some of the same issues of causation that we often blog about (most recently
here) and publish elsewhere (e.g.
here, in
Genetics, and
here, in the IJE), and in doing so he touches on what we think is one of the most overlooked actors in much of life: randomness, or chance.
George specifically addresses epidemiology, the field of public health that has to do with the understanding of patterns of disease, ideally so that public health measures can be instituted to prevent disease outbreaks. But his points are equally applicable to many other areas, certainly including genetics.
Epidemiology is a field that uses population-level data to understand disease in aggregate. This is how risk factors like smoking are discovered, and how events such as food poisoning epidemics or outbreaks of cholera are explained. The field has a long history of success in explaining many disease outbreaks and in identifying many significant risk factors for which public health measures (e.g., clean water or anti-smoking campaigns) have been implemented.
This is all well and good, and perhaps useful to policy makers. But, as with genetic studies, the amount of variation in risk that's explained by epidemiological studies is often small, and the population-level approach is of limited use when it comes to predicting outcomes for
individuals, whether for them or for their doctors, so that they can optimize their chances of avoiding nasty diseases. This is the subject of Davey Smith's paper. He uses
Winnie Langley as his example; she smoked for 95 years -- why didn't she get lung cancer? (The actual provenance of these photos isn't clear because they're all over the web, but we got this one
here, and the one below
here.)
The purpose of epidemiology, as a branch of public health, is to identify causes of disease that can be eliminated or attenuated, to prevent disease. This is a lot easier when the causes have major effects. Indeed, epidemiology, like genetics, is most successful at dealing with causes with large effects such as infectious agents, cigarettes, or obesity, the equivalent of genes for diseases such as cystic fibrosis or Tay-Sachs or the periodic paralyses. A major difference, though, is that clearly genetic diseases are much rarer than diseases with widespread environmental causes. But the point is the same -- current methods in both fields are much better at finding causes that pack a wallop. Even some of those, such as dietary salt or cholesterol, are not as straightforward as their public image suggests.
Can the risk factors that epidemiologists or geneticists do identify be translated into predicting who will or will not get sick? Not definitively in either case, although some rare alleles, such as those for Huntington's or PKU, come close. In general, however, the answer is no -- despite what direct-to-consumer genetic testing companies would like to sell you. At the least, the probabilities are usually low, and the estimates of those probabilities are not very stable or precise, since many factors, including changeable environmental exposures, affect what a given genotype may do. We've written a lot about why what we know about evolution means this must be true, and after much discussion in his paper of why this is so in epidemiology, Davey Smith makes the same point.
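To make the arithmetic behind this concrete, here is a minimal sketch in Python. The numbers are purely hypothetical (a 1% baseline risk and a tenfold relative risk), chosen for illustration and not taken from Davey Smith's paper or from any study. The function name `absolute_risks` is ours.

```python
def absolute_risks(baseline_risk, relative_risk):
    """Return (risk if unexposed, risk if exposed) for a single risk factor.

    Both inputs are hypothetical illustration values, not estimates
    from any real study.
    """
    exposed_risk = baseline_risk * relative_risk
    if not 0.0 <= exposed_risk <= 1.0:
        raise ValueError("baseline_risk * relative_risk must be a probability")
    return baseline_risk, exposed_risk


# Hypothetical factor: 1% baseline risk, tenfold relative risk.
unexposed, exposed = absolute_risks(baseline_risk=0.01, relative_risk=10)

# Share of exposed people who never get the disease.
exposed_but_healthy = 1.0 - exposed

print(f"risk if unexposed:   {unexposed:.0%}")
print(f"risk if exposed:     {exposed:.0%}")
print(f"exposed but healthy: {exposed_but_healthy:.0%}")
```

Under these assumed numbers, a factor that multiplies risk tenfold, which is a very large effect by the standards of either field, still leaves the great majority of exposed people disease-free. It matters a great deal for the population and tells any one person rather little.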
Most epidemiological research, like genetic research, is nonetheless based on the belief that if we just identify more risk factors/genes, we'll be able to account for enough of the variance in risk of our favorite disease that we
will be able to predict who will get it. Genetic epidemiology, 'life course epidemiology', social epidemiology, and so on, are all attempts to expand the universe of risk factors such that eventually the field captures them all, from the uterine environment to old age.
But, as Davey Smith points out -- and we think it's fair to say, as we've pointed out numerous times over many years ourselves -- there is much too much randomness in life to ever reach this goal, even assuming all those replicable risk factors people are now looking for could be found.
The chance events that contribute to disease aetiology can be analysed at many levels, from the social to the molecular. Consider Winnie; why has she managed to smoke for 93 years without developing lung cancer? Perhaps her genotype is particularly resilient in this regard? Or perhaps many years ago the postman called at one particular minute rather than another, and when she opened the door a blast of wind caused Winnie to cough, and through this dislodge a metaplastic cell from her alveoli? Individual biographies would involve a multitude of such events, and even the most enthusiastic lifecourse epidemiologist could not hope to capture them. Perhaps chance is an under-appreciated contributor to the epidemiology of disease.
He nicely dismantles the idea that siblings' shared environments will be a major clue to risk of most diseases, because, for one thing, it turns out that we share about as much with our siblings as we do with people who grow up in other households. In large part this is because chance or stochastic events are much larger components of what happens to us than generally assumed. Current methods tend to allow for statistical noise, but not for the essential role that chance plays in our lives, from the cellular level on up. This has long been known, but scant attention has been paid to it by the reductionist sciences that epidemiology and genetics are.
Davey Smith points out that epigenetics is the current fad, based on the hope that by finding epigenetic mechanisms we'll soon be able to explain what now just looks like chance, but that this is a false hope. He makes further points in this long paper, including offering an evolutionary explanation for the centrality of chance in life (it's advantageous to have a variable genotype given that environments are changeable), and so on.
Davey Smith concludes that the purpose of epidemiology after all is not to predict the fate of individuals but to provide population-level statistics.
For our purposes, it is immaterial whether there is true ontological indeterminacy—that events occur for which there is no immediate cause—or whether there is merely epistemological indeterminacy: that each and every aspect of life (from every single one of Winnie’s coughs down to each apparently stochastic subcellular molecular event) cannot be documented and known in an epidemiological context. Luckily, epidemiology is a group rather than individual level discipline, and it is at this level that knowledge is sought; thus averages are what we collect and estimate, even when using apparently individual-level data.
The point of the discipline is to "provide simple, understandable and statistically tractable higher-order regularities".
We're with George up to this point. Indeed, when epidemiology can point to causes that public health measures can deal with (clean water, window screens, vaccination campaigns) -- that is, population-level causes that are amenable to population-level controls -- it has done its job, and done it well. But why hasn't environmental epidemiology explained the asthma epidemic satisfactorily, even with population-level data? And why don't the large population-level studies of hormone replacement therapy, or of calcium and vitamin D, yield consistent results? Again, this is equivalent to the failings of GWAS (genome wide association studies). And who can predict heart disease in the future when so many cultural changes are under way, involving the dynamics of lifetime exposures to risk factors known and unknown?
Part of the problem is that main effects can differ among populations -- even assuming we know what a 'population' is and how to define and sample it, and that the population-specific effect is not due to changeable population-specific environments. The ApoE 4 gene variant is associated with Alzheimer's disease in European-derived populations, but much less so in African Americans, for example. And the same risk variant, which is relatively infrequent in humans, is the
standard in our close primate relatives. Causation is relative, even when strong. So even the population-based view of epidemiology is often problematic.
There is another point about randomness. Sometimes, what we mean is that there is a distribution of probabilities of outcomes, as in 1's or 6's in rolls of dice. There, we know that one has a 1/6 chance of a specific result: the probabilities (risks, in this context) are known and predictable, even if each individual's outcome isn't specifically knowable in advance. But many chance ('random') factors have no underlying theoretical distribution of this kind -- the probability you'll be struck by lightning, for example, or that some part of some artery will be clogged by cholesterol plaque. Dealing with that kind of randomness is far more problematic, yet it is likely to be the kind that matters most. In that case, all we can do is estimate risk from
past experience and hope the same applies to the future... but we know, in changeable environments, that it won't.
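The contrast between the two kinds of randomness can be sketched with a small simulation. This is only an illustration under stated assumptions: the die is fair by construction, and the "past" and "future" risks (2% and 5%) are invented numbers standing in for an environment that has changed. The function names are ours.

```python
import random


def roll_frequency(n_rolls, face=6, seed=0):
    """Fraction of n_rolls fair-die rolls that show `face`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == face)
    return hits / n_rolls


def event_rate(n_people, risk, seed=0):
    """Observed fraction of n_people who experience an event with probability `risk`."""
    rng = random.Random(seed)
    events = sum(1 for _ in range(n_people) if rng.random() < risk)
    return events / n_people


# Case 1: a known theoretical distribution. No single roll is predictable,
# but the long-run frequency settles near 1/6.
six_freq = roll_frequency(100_000)

# Case 2: no theoretical distribution, only past experience. The two risks
# below are hypothetical; the second stands for a changed environment.
past_rate = event_rate(100_000, risk=0.02, seed=1)
future_rate = event_rate(100_000, risk=0.05, seed=2)

print(f"frequency of sixes:       {six_freq:.4f} (theory: {1/6:.4f})")
print(f"risk estimated from past: {past_rate:.4f}")
print(f"rate actually seen later: {future_rate:.4f}")
```

In the first case the estimate can be checked against theory. In the second, the estimate from past experience is a perfectly good description of the past, and it is simply wrong about the future once the underlying risk has moved.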
The same kinds of statements apply with even more force when we're trying to infer evolutionary history and how today's genes and their effects got here. It is a humbling lesson that is difficult to accept, even if the evidence for it is very strong.
As for Winnie, she may not be that much of an outlier, after all, perhaps in fact confirming that epidemiological methods can work when it comes to risk factors with large effects. She may have smoked all her life, but she said she was too poor to smoke more than 5 cigarettes a day, and after 100, only smoked 1.