The Mermaid's Tale: The exactitude of -omics

On Exactitude in Science

Jorge Luis Borges, Collected Fictions, translated by Andrew Hurley.

…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.

Suarez Miranda, Viajes de varones prudentes, Libro IV, Cap. XLV, Lerida, 1658

We'd like to suggest that Borges' short story can be aptly applied to the current state of disease prediction. Fifteen years ago or so we were being told that once we had the human genome (HG) sequenced we'd be able to predict the diseases people were going to get, prevent them, and everyone would live to older ages than we'd ever attained before. Aside from the questionable ethics of enabling such a demographic catastrophe, not to mention the idea that "everyone" would surely be an exclusive club, this promise is not much closer to realization now than in pre-HG days.

The first HG sequence, such as it was, was published in 2001. Since then the promises have been honed a bit--ok, so the sequence itself wasn't going to bring us as close to immortality as we'd hoped, but the Common Disease Common Variant hypothesis would. That theory was used to justify the HapMap project, which was to provide the resources for case-control comparisons to find causal variants; then we'd have the data in hand for disease prediction and prevention. That approach was itself fine-tuned and scaled up over the years, eventually bringing us genomewide association studies (GWAS), which, depending on who you ask, are either justifiably dead because they mainly find genes with very small effects, or alive and well because there have been some successes (the macular degeneration studies are always cited) and if we just fine-tune the method some more it will really work. And think what we'll be able to do with more whole genomes!

The -omics boom was being born: the era of 'hypothesis-free' approaches. When we don't know the cause or can't develop useful actual hypotheses, our 'hypothesis' is just that some element in the realm we're searching has causal effects. The genome was the first such realm, and the idea was that the trait had to have some genetic cause, so if we blindly searched the entire genome it must be there and we would find it (or, instead of 'it', some tractably small number of such causal sites).

Genomics was driven by advancing technology and was addictive because, it is not too cynical to say, it was thought-free, meat-grinder, factory science. It was lucrative, did indeed teach us a lot about what genes and genomes do, and found a modest number of important causal genes. Its success, at least in the fashion and funding senses, understandably spawned other hypothesis-free, blind technological approaches, cashing in on the cachet of the 'omics' label and its rejection of the need for actual prior hypotheses to design studies: nutriomics, connectomics, metabolomics, microbiomics, immunomics, epigenomics, and more. How much of this happened because the same people who were promising us that successful disease prediction with genetics was right around the corner realized that this just wasn't true, and needed to figure out ways to keep their labs running, we can't say. But we certainly are a fad-following, money-following research culture, and we know this is part of the story. To be fair, when other approaches hadn't solved any of the problems, a thought-free, safely factory-like turn had a natural appeal. In any case, many of the same people who were gung-ho about genetics are now equally gung-ho about the promise of the -omics boom to bring us disease prediction and prevention that will really work this time.

The current interest in the -omics of supercentenarians, in order to figure out how they lived to their ripe old ages and thus how we can live to 120, is, we think, an example of this misguided fad. One basic assumption of this work is that every cause is individually identifiable, predictable and replicable. This is in fact true for causes with large effects--Mendelian diseases, e.g., or point source infections like cholera or malaria and so on--but there are many paths to heart disease or stroke. When everyone's genome is unique and causes are many and variable, however, each combination of environmental and genetic factors will too often be extremely rare if not singular, and such rarely replicated events are next to impossible to identify with current methods based on statistical sampling. The idea that every cause can be identified is a reductionist approach to disease, akin to the reductionist approach to evolution that requires every trait to have an adaptive reason for having evolved, when in fact sometimes it's just chance.

But once we venture into the quest to find environmental factors that influence longevity, we're necessarily identifying these factors retroactively, if they are even identifiable, and yet none of us is going to live in the past. Future environments are unpredictable. So, again, unless a factor has large effects--heavy radiation exposure, infectious agents, toxins, e.g.--it's unlikely to be useful in predicting individual cases of disease.

We can see the issues in the proliferation of ever more 'omics' approaches. Each omics community advocates its realm as if it were the one, or at least the critical one. Essentially, we always add to, but rarely reduce, the number of potential causes of traits in the lives of individuals. This expands the combinatorial realm--the number of possible combinations of factors (and their intensities)--through which we must search. More causes, each inevitably rare, mean that to show that a combination is causal it has to be seen enough times. That means ever larger samples, because 'enough times' means often enough to rule out chance as the explanation for the association between the combination of risks and the outcome. But when there are more reasonably plausible combinations than grains of sand on the earth's beaches (this is no exaggeration--it's if anything an understatement), there aren't enough people to get such results. And subsequent generations will have different people with different combinations of risk factors.
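To put some rough numbers on the combinatorial point, here is a minimal Python sketch. The factor counts, the three intensity levels per factor, and the population figure are hypothetical illustrations, not estimates of any real set of -omic risk factors. The sketch also generously assumes every combination is equally common; in reality risk profiles are skewed, so most combinations would be rarer than the average shown.

```python
from math import comb


def n_profiles(n_factors: int, levels: int) -> int:
    """Distinct risk profiles if each factor can take one of `levels` intensities."""
    return levels ** n_factors


def n_interactions(n_factors: int, order: int) -> int:
    """Distinct `order`-way combinations of factors, ignoring their intensities."""
    return comb(n_factors, order)


def people_per_profile(population: int, n_factors: int, levels: int) -> float:
    """Average number of people sharing one profile, under the (generous)
    assumption that every profile is equally common."""
    return population / n_profiles(n_factors, levels)


if __name__ == "__main__":
    # Hypothetical ceiling: roughly everyone alive around 2012, used only for scale.
    POPULATION = 7_000_000_000
    for n in (10, 20, 40):
        # Three hypothetical intensity levels per factor (e.g. none/low/high).
        print(n, n_profiles(n, 3), people_per_profile(POPULATION, n, 3))
    # Three-way combinations among a hypothetical 1,000 candidate factors.
    print(n_interactions(1000, 3))
```

With just 20 three-level factors there are already about 3.5 billion profiles, so even sampling the entire population would give only about two people per profile; at 40 factors, nearly every profile would be seen by no one at all, let alone 'enough times' to rule out chance.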

We certainly wouldn't argue with the idea that what we eventually succumb to is likely to be the result of multiple -omics, that is, a combination of factors. But we do question whether those factors will be identifiable, or useful in prediction, which is presumably the point of all this work. The current interest in documenting every possible factor that might have an effect on health and longevity is bringing us closer and closer to Borges' map of the Empire.

8 comments:

John R. Vokey, August 23, 2012 at 1:29 AM
You continue to out-do yourselves. Brilliant piece: the lede as the Jorge Luis Borges' paragraph story (he could write whole worlds in a paragraph!) could not be more apropos. You might also want to consider (re)reading C.S. Peirce on the same issues of causality.
Javier, August 23, 2012 at 3:02 PM
Beautiful. An entire universe in one paragraph. An Aleph.

Wednesday, August 22, 2012
