Monday, October 13, 2014

Morgan's insight--and Morgan's restraint

Last week, we stirred the pot by asserting that it was at best misleading for the authors of the latest human stature mega-study to say, as if reassuringly, that the number of genome locations contributing to stature was in the thousands, but that at least it was finite. We questioned that 'finite' both figuratively and literally, because it has to do with the realities and manageability of this sort of causal landscape.  And this is for what appears to be a highly genetic and easily measured trait.

Defenders of the faith tweeted sneeringly at these points.  Our view is that current practice is largely chasing rainbows, and we know it, and we had solid century-old theoretical reasons to expect the kind of complexity that's been found (countless contributing factors to complex traits).  The essential nature of the findings was clearly predicted, before and during the large-scale mapping era. Initially, one could argue that the theory of 'polygenic' inheritance was non-specific and the growth of whole genome studies confirmed it.  That, in itself, was a major success, not a failure, though it showed that using genomes to predict complex traits is problematic.

We have said that by now we have enough actual explicit genomic evidence to show the lay of the land--predicting phenotypes from genotypes is, and will continue to be, problematic.  It's long past time to stop chasing these rainbows and to stop making exaggerated promises of pots of medical gold to come. Some funding groups have said as much, but the push for ever bigger is not abating.

In our post, we used a quote from TH Morgan's 1926 book, The Theory of the Gene. Morgan was a major figure who laid the linkage and mapping framework for today's finds.  He made statements about stature and its complex causal basis that have stood the test of nearly a century, and the quote we used made our point.

Of course, selective retro-quoting is as dicey as using retro-fitted data to allege predictive power.  We can mine our forebears for quotes that seem prescient....because they support our own point of view. But exegesis is a game anyone can play.  You can usually find that the same author, or his contemporaries, said things that don't support our view.  Industries of professors have made their careers by mining history for antecedents whose quotes presaged major discoveries such as relativity, evolution, and so on, and/or helped stimulate Einstein or Darwin.  So, quoting Morgan was a rhetorical device for making a point, and in itself the quote has no scientific heft.

In fact, however, the quote reflects Morgan's views about what he was doing--and about what science at the time was not yet equipped to do.  Wisely, perhaps more than in today's environment, he simply refrained from doing what was not yet seriously feasible.

Morgan's contributions in the famous fly room are well documented (an interesting account is in Lords of the Fly, by RE Kohler, 1994, U Chicago Press).  He had a major, clearly important agenda.  Mendel had shown evidence that (carefully selected) traits could be inherited in what appeared to be a kind of "point" causation--single transmissible causal factors.  Mendel worked in the context of the newly developing atomic theory of chemistry, in which substances came in quantal packets (molecules composed of whole numbers of atoms), and the discovery of point causation of infectious disease (bacteria) by Pasteur and Snow and others.  I think this general scientific environment led Mendel to think in terms of 'integral' causation, that is, causation by discrete units.

The work of Morgan and his students and colleagues was designed to explain how such point causes of inheritance worked, whatever they were at the molecular level.  In his fly experiments, also working with carefully selected traits as Mendel did (and aware that not all traits behaved this way), he used controlled, replicable experimental crosses to show that these sorts of point causes could be located to specific relative physical places in chromosomes.  This follow-up to what Mendel first clearly revealed was of course fundamental and extremely valuable.

Morgan did not, however, think of genes (the causal 'beads' on the chromosomal string, whatever their actual nature) as having a fixed functional effect. In the absence of direct knowledge of their chemical nature he, like Mendel and everyone else up to his time, had to use phenotypic markers to reveal the presence of a given allele (genetic variant).  He recognized that 'genes' could have multiple or complex effects, but in controlled crosses he scored flies for traits that had some discrete, enumerable state at some specific life stage, such as newly hatched larva, and that could be used to identify the presence of the causal element.  He didn't care, and was explicit about this, whether that was all the gene did, or whether the trait was even present at some other life-history stage.  One might say that he was interested in the causal layout, or shape, to use a word we used in a post last week, of inheritance.  That is, his approach was a tactic to understand the nature of genetic inheritance, not, essentially, to explain the traits themselves.

Morgan also explicitly eschewed working in areas like developmental genetics--or stature--because he rightly said there was simply not enough known to do that at the time--more fundamental understanding was needed first.  By avoiding what was hopeless to understand at the time, and using his restricted, focused approach to get to a deeper question (genes as causal locations on chromosomes, recombination, etc.), he made some of the most important contributions to our understanding of life.

In that sense, it is fair to quote him as we did in our post because he had both the insight and the restraint to stay within what was known.

How did we get here?
The formal theories of genetics that developed in the first third of the 20th century included ways to reconcile discrete Mendelian heritable 'causation' with the causation of the more obviously continuous traits--like stature.  The reconciliation was the concept of 'polygenic' rather than point causation.  The idea, in its theoretical expression, was that an infinite number of individual genes, each with an infinitesimally small effect, generated the continuous population distribution of complex traits.

Like points in geometry, genes could be point causes, but were infinitesimally small in the limit, and their joint effects could have useful distributional properties (like the bell-shaped distribution of stature).  But quantitative geneticists properly refrained from trying to identify individual genes 'for' such traits (or, as time progressed, claimed that sometimes one or a few 'major' genes might be identifiable but in a polygenic background).  Whatever they were, genes behaved in individuals and were transmitted in aggregate ways that clearly fit the polygenic model, whether or not the number of causes was literally infinite.  That's like saying a line can be understood and analyzed as if made of countless infinitely small, non-enumerable points. From an aggregate point of view, it makes complete sense.
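To make the aggregate argument concrete, here is a minimal simulation sketch. It is only an illustration, not anyone's published model: the function names, the equal effect size at every locus, the allele frequency of 0.5, and the absence of any environmental contribution are all simplifying assumptions of mine. It suggests that a single locus yields a few discrete trait classes of the kind Mendel and Morgan scored, while a couple of hundred small-effect loci yield a spread of values that looks nearly continuous, with about the same overall variance.

```python
import random
import statistics


def simulate_trait(n_loci, n_individuals, allele_freq=0.5, seed=0):
    """Additive trait values under a toy equal-effects polygenic model.

    Each individual carries two alleles at each of n_loci loci; every
    trait-increasing allele adds the same small amount (an assumption).
    """
    rng = random.Random(seed)
    # Shrink the per-allele effect as loci are added, so that the total
    # trait variance stays near 1 regardless of the number of loci.
    effect = 1.0 / (2 * n_loci * allele_freq * (1 - allele_freq)) ** 0.5
    values = []
    for _ in range(n_individuals):
        copies = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        values.append(copies * effect)
    return values


def distinct_values(values):
    """Number of different trait values observed in the sample."""
    return len({round(v, 9) for v in values})


few = simulate_trait(n_loci=1, n_individuals=2000)
many = simulate_trait(n_loci=200, n_individuals=2000)

print("1 locus:   classes =", distinct_values(few),
      " variance =", round(statistics.pvariance(few), 2))
print("200 loci:  classes =", distinct_values(many),
      " variance =", round(statistics.pvariance(many), 2))
```

With one locus there are just three genotype classes; with many loci the classes multiply while each locus's share of the variance shrinks toward nothing, which is the practical sense of 'infinitesimal' here. The sketch says nothing about whether any individual locus could be reliably detected or used for prediction.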

By roughly the 1990s, while the human genome reference sequence did not yet exist, it nonetheless became technically possible to scan the whole genome for specific sites that contributed to complex traits.  The genome was viewed much more as a string of discrete beads than it is now.  Enthusiasm was high because the method (called 'linkage' analysis, applied in very large families where detection power is greatest) worked, and genes for breast cancer susceptibility, cystic fibrosis, and other conditions were mapped by various related approaches.

Without going into the historical details, what was mappable were genes in which there were sufficiently common variants with sufficiently strong effects to appear in families in a pattern much like that which Mendel had introduced, in which the trait was an efficient marker of the presence of the causal allele.  The predictive power was strong in those families, but it was just as obvious that this was not the general case for occurrences of the same traits.  Even today, most breast cancer cases are not due to the BRCA genes, nor does the disease usually segregate in families in Mendelian fashion.

Still, the mapping-drug had been taken, and geneticists on a high saw a limitless landscape of possible ways to identify--to enumerate--the genomic regions that contributed causally to a host of complex, largely continuously distributed (quantitative) traits.  Just collect more data!  As technology improved, the addiction was fed because endlessly finer resolution seemed in the offing.  The 'hits' that were made were naturally trumpeted with great enthusiasm.  We could turn complex traits into Mendel's peas!

This began in earnest around 15 years ago, and money poured into genetics: the omics era had dawned.  For legitimate reasons as well as fashion and imitation, every problem was turned into a big-data 'omics' problem driven, rather than just enabled, by advancing technology.  Nutrigenomics, diseaseomics, microbiomics, epigenomics, proteomics, and so on.  In a sense, science has become industry, and has jumped on the 'Big Data' bandwagon.

Where are we now?
The problem as I see it is that we have reached what seems clearly to be a kind of ceiling in cost-benefit or signal-to-noise terms.  The findings over the past 20 years, both widespread and consistent, from natural as well as experimental approaches, and from all the kingdoms of life, have confirmed the century-old theory that the traits in question really are 'polygenic' in the practical sense of the term.  This is an elegant success story--but it's more like Morgan's kind of success than the transformation so vocally being asserted, which amounts to the promise of imminent medical miracles.

There will of course always be some important findings when such a huge enterprise is undertaken. But we seem not yet willing to acknowledge, much less accept, the limits of the new knowledge, in particular that predicting complex traits from genomes at birth is not going to go as promised.

People aren't saying, with restraint, that we're just showing that discrete spots on the genome have causal effects.  Instead, we have blurred the effect being measured (e.g., stature, but only after problematic 'regression' on age, sex, etc., as if that were equivalent to Morgan's marker traits), and we have assumed that random sampling gives the same sort of information as controlled crosses, and so on.  We have also assumed that estimating these things retrospectively gives us estimable predictive ability.

At this stage, the view we've expressed is that we now have countless big-scale mapping studies, generating similar results, and it's time to think about what we've been shown, rather than to continue along the same basic path.  Some are doing that, advocating whole-genome sequencing and whole-population data bases of sequence--partly to avoid the problem that association studies can't find rare causes, and hoping instead to find them in families within population data bases.  This, too, will work sometimes.  But this is asking a lot for the occasional success. It is not asking whether what we've done has shown us that genomes work in ways far more complex than our enumerative approach is aiming to document--and that view is what we assert.

It is of course possible that the kind of data we are collecting is, in the end, appropriate and that there isn't anything profound yet to be discovered by more careful, focused, less industry-first methods. Time will tell.

A standard wagon-circling criticism of those who say we've done more than enough of the recent mapping approach is to say that if you don't have the answer you should shut up and go home.  But that is somewhat like saying that if you see that the theater is on fire, you shouldn't say anything about it unless you have a hose in your hand.  That's a totally bogus argument.  If there is a problem, and many do now think so, and nobody is seriously rebutting the points, then there is a problem! There are resources, fiscal and intellectual, at stake that could be used more productively.  Denial and aggressive promotion of current practice certainly keep the motor running, but they won't solve the problem.
