Lots of people commented or tweeted about our post yesterday on the latest stature genomics paper. It is true that we think that tons of money are being thrown into chasing rainbows such as the genomic basis of stature. We said what we felt about that yesterday.
But people often defend current practice reflexively, including on Twitter, and so avoid having to face up to the serious need for new and actually creative thinking.
I'm pretty clumsy at Twitter, and too verbose, but I think the issues demand more than just 140-character exchanges anyway. Such exchanges can make the issues seem dismissible or trivial, when from a biological point of view they are anything but.
We did one good deed for the genetics community: we provided some great sport, as defenders of the faith tweeted, re-tweeted and re-re-tweeted dismissive and often derisive comments. Circling the wagons is reinforcing, and heresy is never well-received. Still, the issue remains.
The sneering aside, there are a number of real issues
It is strange that not a single objector provided even a single reason why the comments we made were in any sort of substantial error, or why continuing the vain attempt to enumerate the thousands of trivial causes of stature is a good way to keep investing resources. A characteristic of the objecting tweets is that the tweeters reacted defensively or didn't actually read the post carefully. We were, for example, accused of thinking that the huge number of hits only involved genes in the traditional sense (protein-coding regions). We were actually clear that the word 'gene' is losing its traditional, restricted meaning because we now know there are so many other types of DNA function. Many now use the word quite vaguely, sometimes to refer to a single nucleotide, or more generally to refer to specific spots in the genome--some use the word only when claiming that the variant site has relevant function. To criticize us over such a trivial matter is to miss the point: if there are figuratively or even literally countless causal contributors, not all of them digital or enumerable, then such semantic details don't matter. What does matter is that we continue to churn out results of this kind as if we're making progress, when these sorts of findings were predicted long ago.
The fact that we mentioned unborn or dead people was criticized, but our point was that the sample used in this study is literally not replicable, and replicability is a fundamental aspect of the kind of science being applied to the data. The sample consists of people some of whom are likely no longer alive (or soon won't be), or whose stature will actually change (the age of individuals in the study ranged from 14 to 103, the young ones still growing, the old ones surely losing height). The human population is constantly turning over, so that the genotypes in one study don't represent what some next study would sample (and here we can forget the trivial detail that this stature study basically only included Europeans).
Quibbling about the details won't change the overall bottom line, and is only a careless or intentional distraction from the real message that even the sneerers know very well is the truth: when even easily measurable traits like stature show this level of causal (or, mainly, statistical-associational) complexity, and we see similar results for traits in many different kinds of species, including plants, animals and even yeast, then we have a problem! This is not irrelevant complexity.
Calling something complex and arguing that therefore we just need to enumerate larger samples is a way of stalling, circling wagons rather than realizing that now we know the lay of the land and it's time to think differently. One might even dare to ask what kind of sense it makes to say that 9,500 sites in the genome contribute to stature (in this current, restricted sort of sample). But that is perhaps too profound a question to raise in polite company.
Even saying, as one tweeter implied, that the study was essentially closed in terms of its causal elements (that is, everything was there, with no new in- or outputs, and hence enumerable) really means only that what can be done is to assess risk in closed sets of data. But the implication of the paper, and of the authors' reassuring suggestion that the causes were 'finite', is that in general, and in relation to diagnosis or prediction, the causal elements are closed in number, enumerable, and discrete (categorical). These are simply fictions.
So here's an analogy I used in a brief tweet. If we sample some trees and enumerate their branches, we simply cannot say we know how many branches trees in general have. But perhaps we can find ways to understand how branches form, relate to each other, change over the lifetime of a tree, vary among tree species, and so on. Enumeration is not the objective in developing an adequate theory of branching.
Similarly, we should be asking how it is that so many functional spots in the genome, not the same from sample to sample, could be involved in a seemingly simple trait like stature--with qualitatively similar findings even in much better-controlled, even experimental studies. What does it actually mean to say that thousands of parts of the genome affect the trait? It probably means something we do not yet understand about the organization of organisms that goes beyond enumerability (since clearly open-ended numbers of combinations yield similar results--normal height, blood pressure, glucose levels, brain functions, etc.).
One might argue that what we can get from such studies as mega-sequencing extravaganzas is some sense of the 'shape' of causation in various ways, and that's good. Yes, true enough, but it is not even clear what one would mean by 'shape', and there comes a time when we have enough post hoc enumerative data and need to go beyond that, because such data are generic rather than specific when it comes either to understanding causation or, more to the point, to prediction. If this work has little predictive value, then it is being misrepresented.
In the case of stature, since we can't predict future environments, we cannot, in principle, predict stature conditional on genotype, except perhaps in the real, often pathologic extremes. Every tweeter who knows his/her genetics knows this.
Since in truth the same thousands-of-sites genotype never recurs (and here we refer only to the constitutive genotype, not the entire somatic genotype as we discussed in our post), there is a major limit to our predictive power. We know this even from the many nearly single-locus diseases that have been studied in detail.
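To give a rough sense of scale for why such a genotype never recurs, here is a back-of-the-envelope sketch in Python. It is an illustration under stated assumptions, not a calculation from the paper: we take the figure of 9,500 sites mentioned above, assume each site is biallelic (so a person carries 0, 1 or 2 copies of a variant), and use a round order-of-magnitude guess of about 10^11 humans ever born. The variable names are ours.

```python
import math

N_SITES = 9500           # number of contributing sites quoted in this post
GENOTYPES_PER_SITE = 3   # assumption: biallelic sites, so 0, 1 or 2 copies
HUMANS_EVER = 1e11       # assumption: rough order-of-magnitude guess only

# Work in log10 because the count itself is astronomically large.
log10_combinations = N_SITES * math.log10(GENOTYPES_PER_SITE)
log10_people = math.log10(HUMANS_EVER)

print(f"possible multi-site genotypes: about 10^{log10_combinations:.0f}")
print(f"people who could carry them:   about 10^{log10_people:.0f}")
```

Not all combinations are equally likely, of course, since allele frequencies differ and nearby sites are correlated, but the gap between the two exponents is so large that those refinements do not change the qualitative point.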
The fact that the authors claimed substantial (if far from complete) accounting of the high heritability of stature is potentially very misleading. They retrofitted a model to a set of data, which describes correlations but doesn't really address causation, and hence prediction, and which is conditional on some mix of environmental exposures of the sampled individuals. Heritability is a population-specific ratio, as everyone knows, so what is actually being accounted for by such a percentage is unclear, and that needs to be said up front. A proper understanding of 'heritability' of course also must recognize the fundamentally relative nature of that measure, even if all elements at work were measured and samples were large enough to resolve everything sampled, and even if 'the' environment is in some sense a unitary population measure--which it isn't.
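The point that heritability is a population-specific ratio can be made concrete with a toy simulation. This is a minimal sketch under our own simplifying assumptions, not the model used in the paper: a purely additive trait, 50 hypothetical sites with allele frequency 0.5, and a single normally distributed 'environment' term. The function name and all parameter values are invented for illustration. The same genotypes and the same genetic effects are used in both simulated populations; only the spread of environments differs.

```python
import random
import statistics


def toy_heritability(env_sd, n_people=5000, n_sites=50, seed=1):
    """Var(genetic value) / Var(phenotype) in one simulated population."""
    gen_rng = random.Random(seed)      # same seed -> same effects and genotypes
    env_rng = random.Random(seed + 1)  # environmental noise is drawn separately
    effects = [gen_rng.gauss(0, 1) for _ in range(n_sites)]

    genetic, phenotype = [], []
    for _ in range(n_people):
        # Allele count (0, 1 or 2) at each site, assuming allele frequency 0.5.
        g = sum(e * ((gen_rng.random() < 0.5) + (gen_rng.random() < 0.5))
                for e in effects)
        genetic.append(g)
        phenotype.append(g + env_rng.gauss(0, env_sd))

    return statistics.variance(genetic) / statistics.variance(phenotype)


# Identical genetics, different environmental variability.
h2_stable = toy_heritability(env_sd=2.0)
h2_noisy = toy_heritability(env_sd=10.0)
print(f"heritability, uniform environments: {h2_stable:.2f}")
print(f"heritability, varied environments:  {h2_noisy:.2f}")
```

Nothing about the genetic 'causes' differs between the two runs, yet the ratio does, which is why a percentage of heritability 'accounted for' has to be read as a statement about one sample in one set of environments.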
Retrofitting is one thing, but prediction is another. Prediction is further limited because only a fraction of the presumed contributing genomic elements will be seen again or even have assessable effects (even in the study, because they're too rare), not to mention lifestyle issues.
Shape is a different thing. If we can get a proper definition of causal 'shape', with some serious-level theoretical background, we might get somewhere. That is elusive at present in both evolutionary and present-day genomics, or more properly, in causal biology. Maybe it is in order to take this (and other strange things we know about genomics but sweep under the rug) seriously, and to stop laughing and start thinking beyond enumeration.
Various aggregate approaches to GWAS results have been suggested, but they're mainly additive summaries of retro-fitted causal associations, plus hopeful post hoc mentions of purportedly relevant pathways.
Better science is needed.