Monday, August 1, 2016

FAS - Fishy Association Studies

                                  On Saturday, July 19, 1879, the brilliant opera 
                                  composer, Richard Wagner, "had a bad night; 
                                  he thinks that...he ate too much trout."  
                                             Quoted from Cosima Wagner's Diary, Vol. II, 1878-83.

As I was reading Cosima Wagner's doting diary of life with her famous husband, I chanced across the above quote that seemed an appropriate, if snarky, way to frame today's post. The incident she related exemplifies how we routinely assign causation even to one-off events in daily life. Science, on the other hand, purports to be about causation of a deeper sort, with some sufficient form of regularity or replicability.

Cause and effect can be elusive concepts, especially difficult to winnow out from observations in the complex living world.  We've hammered on about this on MT over the years.  The best science at least tries to collect adequate evidence in order to infer causation in credible rather than casual ways. There are, for example, likely to be lots of reasons, other than eating trout, that could explain why a cranky genius like Wagner had a bad night.  It is all too easy to over-interpret associations in causal terms.

By such thinking, the above figures (from Wikimedia Commons) might be interpreted as having the following predictive power:
     One fish = bad night
     Two fish = total insomnia
     Many fish = hours of nightmarish dissonance called Tristan und Isolde!

Too often, we salivate over GWAS (genomewide association studies) results as if they justify ever-bigger and longer studies.  But equally too often, these are FAS, fishy association studies.  That is what we get when the science community doesn't pay heed to the serious and often fundamental difficulties in determining causation that may well undermine their findings and the advice so blithely proffered to the public.

We are not the only ones who have been writing that the current enumerative, 'Big Data' approach to biomedical and even behavioral-genetic causation leaves, to say the least, much to be desired.  Among other issues, there's too much assertion of conclusions on inadequate evidence, and not enough recognition of when those assertions are effectively not much more robust than saying one 'ate too much trout'.  Weak statistical associations, which are so typically the result of these association studies, are not the same as demonstrations of causation.
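To make the weak-association point concrete, here is a toy simulation (my own illustration, with made-up numbers, not anything from the papers discussed): thousands of genetic variants that have nothing whatsoever to do with the trait are tested for case-control association, and by chance alone a sizeable crop of nominally 'significant' hits appears.

```python
import math
import random

random.seed(1)

N_PER_GROUP = 500    # cases and controls
N_VARIANTS = 2000    # variants with NO causal connection to the trait

def two_sided_p(p1, p2, n1, n2):
    """Normal-approximation two-proportion z-test, two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

hits = 0
for _ in range(N_VARIANTS):
    freq = random.uniform(0.1, 0.9)   # this variant's frequency, same in both groups
    in_cases = sum(random.random() < freq for _ in range(N_PER_GROUP))
    in_controls = sum(random.random() < freq for _ in range(N_PER_GROUP))
    if two_sided_p(in_cases / N_PER_GROUP, in_controls / N_PER_GROUP,
                   N_PER_GROUP, N_PER_GROUP) < 0.05:
        hits += 1

print(hits)   # roughly 5% of 2000: on the order of 100 'associations' from pure noise
```

None of these 'hits' means anything, of course; without stringent multiple-testing correction and independent replication, weak statistical associations of exactly this kind are what pad out the literature.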

The idea of mapping complex traits by huge genomewide case-control or population-sample studies is a captivating one for biomedical researchers.  It's mechanical, perfectly designed to be done by huge computerized database analysis, by people who may never have seen the inside of a wet lab (e.g., programmers and 'informatics' or statistical specialists with little serious critical understanding of the underlying biology).  It's often largely thought-free, because that makes the results safe to publish, safe for getting more grants, and so on; but more than being 'captivating' it is 'capturing'... a hog-trough's share of research resources.

The promise, not always even carefully hedged with escape-words lest it be shown to be wrong, is that from your genome your future biomedical (and behavioral) traits can be known.  A recent article by Joyner et al. in the July 28 issue of the Journal of the American Medical Association (JAMA) describes the stubborn persistence of under-performing but costly research that becomes entrenched, a perpetuation that NIH's misnamed 'precision-based genomic medicine' continues or even expands upon. Below is our riff on the article, but it's open-access, so you can read the points the authors make and judge for yourself whether we have the right 'take' on what they say.  It is one of many articles that have been making a similar case, if anyone is listening.

The problem is complex causation
The underlying basic problem is the complex nature of causation of 'complex' traits, such as many if not most behavioral, chronic, or late-onset diseases. The word 'complex', long used for such traits, refers not to any identified causes but to the fact that the outcomes clearly did not arise from simple, identifiable ones.  It seemed clear that their causation was due mainly to countless combinations of many individually small causal factors, some of which were inherited; but the specifics were usually unknown. Computers and various DNA technologies made it possible, in principle, to identify and sort through huge numbers of possible causes, or at least statistically associated factors, including DNA sequence variants.  But underlying this approach has been the idea, always a myth really, that identifying some enumerated set of causes in a statistical sample would allow accurate prediction of outcomes.  This has proven not to be the case nearly as generally as has been promised.
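A toy sketch of why 'many individually small causal factors' defeat prediction (again my own illustration, with an assumed genetic fraction of 30% that is purely hypothetical): even if every causal variant for a trait were known perfectly, predictability is capped by the fraction of the trait's variation that is genetic at all.

```python
import random
import statistics

random.seed(2)

N_VARIANTS = 100   # causal variants, each with the same tiny effect
N_PEOPLE = 2000
H2 = 0.3           # assumed fraction of trait variance that is genetic

freqs = [random.uniform(0.1, 0.9) for _ in range(N_VARIANTS)]

# each person's true genetic score: 0/1/2 copies summed across all variants
scores = []
for _ in range(N_PEOPLE):
    scores.append(sum(sum(random.random() < f for _ in range(2)) for f in freqs))

# scale the non-genetic ('environmental') noise so genetics explains ~H2 of the trait
g_var = statistics.pvariance(scores)
e_sd = (g_var * (1 - H2) / H2) ** 0.5
trait = [g + random.gauss(0, e_sd) for g in scores]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r2 = corr(scores, trait) ** 2
print(round(r2, 2))   # close to H2: a hard ceiling on 'precision', even with perfect data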

To me, the push to do large-scale, huge-sample, survey-based genomewide risk analysis was at least partly justified, in principle, years ago, when there might have been some doubt about the nature of the causal biology underlying complex traits, including the increasingly common chronic diseases that our aging population faces.  But the results are in, and in fact have been in for quite a long time.  Moreover, and to the credit of the validity of the science, the results support what we have had good reason to know for a long time.  They show that this approach is not, or at least is clearly no longer, the optimal way to do science in this area or to contribute to improving public health (and much the same applies to evolutionary biology as well).

I think it fair to say that I was making these points, in print and in prominent places, starting nearly 30 years ago, in books and journal articles (and more recently here on MT); that is, ever since the relevant data began to appear.  But neither I nor my collaborators were the original discoverers of this insight: the basic truth has been known, in principle and in many empirical experimental (such as agricultural breeding) and observational contexts, for nearly a century! Struggling with the inheritance of causal elements ('genes', as they were generically known), the 1930s' 'modern synthesis' of evolutionary biology reconciled (1) Darwin's idea of gradual evolution, mainly of quantitative traits, along with the experimental evidence for the quantitative nature of their inheritance, with (2) the discrete inheritance of individual causal elements, first systematically demonstrated by Mendel for selected two-state traits.  That was a powerful understanding, but in too many ways it has thoughtlessly been taken to imply that all traits, and not just individual genes, are usefully 'Mendelian', that is, due to substantial, enumerable, strongly causal genetic agents.  That has always been the exception, not the rule.

A view is possible that is not wholly cynical 
We have been outspoken about the sociocultural aspect of modern research, which can be understood by what one might call the FTM (Follow the Money) approach, in some ways a better way to understand where we are than looking at the science itself.  Who has what to gain by the current approaches?  Our understanding is aided by realizing that the science is presented to us by scientists and journalists, supplier industries and bureaucrats, who have vested interests that are served by promoting that way of doing business.

FTM isn't the only useful perspective, however.  A less cynical, yet still apt, way to look at this is in terms of diminishing returns.  Investment in the current way of doing science in this (and other) areas is part of our culture.  From a scientific point of view, the first forays into a new approach or theoretical idea yield quick and, by definition, new results.  Eventually the work becomes routine and the per-study yield diminishes; we asymptotically approach what we can glean from the approach.  At some point, some chance insight will yield better and more powerful approaches, whatever they may be.

If current approaches were just yielding low-cost incremental gains, or were being done in well-off investigators' basement labs, it would be the normal course of scientific history, and nobody would have reason to complain.  But that isn't how it works these days.  These days, understanding via FTM is important: the science establishment's hands are in all our pockets, and we should expect more in return than the satisfaction that the trough has been feeding many very nice careers (including mine) in universities, journalism, and so on.  How, when, and where a properly raised expectation of societal benefit from science will be fulfilled is not predictable, because facts are elusive and Nature is often opaque.  However, simply more of the same, at its current cost and with its entrenched justifications, isn't the best way for public resources to be used.

There will always be a place for 'big data' resources.  A unified system of online biomedical records would save a lot of excess repeat-testing and other clinical costs, if every doctor you consult could access those records.  The records could potentially be used for research purposes, to the (limited) extent that they could be informative.  For a variety of conditions that would be very useful and cost-effective indeed; but most of those would be relatively rare.

Continuing to pour research funds into the idea that ever more 'data' will lead to dramatic improvements in 'precision' medicine is far more about the health of entrenched university labs and investigators than that of the general citizenry. Focused laboratory work that is more rigorously supported by theory or definitive experiment, with some accountability (but no expectation or promise of miracles), is in order, given what the GWAS era, plus a century of evolutionary genetics, has shown. There are countless areas, especially many serious early-onset diseases, for which we have a focused, persuasive, meaningful understanding of causation, and where resources should now be invested more heavily.

Intentionally open-ended beetle-collecting ventures, joined at the hip to promises of 'precision' by those who don't even know what that word means (but who hint that it means 'perfection'), or the glorifying of occasional genuinely good findings as if they were typical, or as though more focused, less open-ended research wouldn't be a better investment, is not a legitimate approach.  Yet that is largely what is going on today.  The scientists, at least the smart ones, know this very well and say so (in confidence, of course).

Understanding complex causation is complex, and we have to face up to that.  We can't demand inexpensive, instant, or even predictable answers.  These are inconvenient facts that few want to face.  But we and others have said this ad nauseam before, so here we wanted to point out the current JAMA paper as yet another formal, prominently published recognition, by highly capable authors, of the costly inertia in which we are embedded. In any aspect of society, not just science, prying resources loose from the hands of a small elite is never easy, even when there are other ways to use those resources that might have a better payoff for all of us.

Usually, such resource reallocation seems to require some major and imminent external threat, or some unpredicted discovery, which I think is far more likely to come from some smaller operation where thinking is more important than cranking out yet another mass-scale statistical survey of Big Data sausage.  Still, every push against wasteful inertia, like the Joyner et al. JAMA paper, helps.  Indeed, the many whose careers are entrapped by that part of the System have the skills and neuronal power to do something better, if circumstances enabled that to happen more readily.  To encourage it, perhaps we should stop paying so much attention to Fishy stories.
