Thursday, October 9, 2014

Hiding behind technicalities to avoid having to change

Lots of people commented or tweeted about our post yesterday on the latest stature genomics paper.  It is true that we think that tons of money are being thrown into chasing rainbows such as the genomic basis of stature.  We said what we felt about that yesterday.

But people often defend current practice reflexively, including on Twitter, and so avoid having to face up to the serious need for new and actually creative thinking.

I'm pretty clumsy at Twitter, and too verbose, but I think the issues demand more than just 140-character exchanges anyway.  Such exchanges can make the issues seem dismissible or trivial, when from a biological point of view they are anything but.

We did one good deed for the genetics community:  we provided some great sport, as defenders of the faith tweeted, re-tweeted and re-re-tweeted dismissive and often derisive comments.  Circling the wagons is reinforcing, and heresy is never well-received.  Still, the issue remains.

The sneering aside, there are a number of real issues
Isn't it strange that not a single objector provided even a single reason why the comments we made were in any sort of substantial error, or why continuing the vain attempt at enumerating the thousands of trivial causes of stature is a good way to keep investing resources?  A characteristic of the objecting tweets is that the tweeters reacted defensively or didn't actually read the post carefully. We were, for example, accused of thinking that the huge number of hits only involved genes in the traditional sense (protein-coding regions).  We were actually clear that the word 'gene' is losing its traditional, restricted meaning because we now know there are so many other types of DNA function. Many now use the word quite vaguely, sometimes to refer to a single nucleotide, or more generally to refer to specific spots in the genome--some use the word only when claiming that the variant site has relevant function.  To criticize us for such a trivial point is to miss the point:  If there are figuratively or even literally countless causal contributors, not all of them digital or enumerable, then such semantic details don't matter.  What does matter is that we continue to churn out such kinds of results as if we're making progress, when these sorts of findings were predicted long ago.

The fact that we mentioned unborn or dead people was criticized, but our point was that the sample used in this study is literally not replicable, and replicability is a fundamental aspect of the kind of science being applied to the data.  The sample consists of people, not all of whom are likely still to be alive (or soon will not be), and some of whose stature will actually change (the age of individuals in the study ranged from 14 to 103, the young ones still growing, the old ones surely losing height).  The human population is churning over in numbers so that genotypes in one study don't represent what some next study would sample (and here we can forget the trivial detail that this stature study basically only included Europeans).

Quibbling about the details won't change the overall bottom line, and is only a careless or intentional distraction from the real message that even the sneerers know very well is the truth:  When even easily measurable traits like stature show this level of causal (or, mainly, statistical-associational) complexity, and we see similarly complex traits in many different kinds of species, including plants, animals and even yeast, then we have a problem!  This is not irrelevant complexity.

Calling something complex and arguing that therefore we just need to enumerate larger samples is a way of stalling, circling wagons rather than realizing that now we know the lay of the land and it's time to think differently. One might even dare to ask what kind of sense it makes to say that 9,500 sites in the genome contribute to stature (in this current, restricted sort of sample). But that is perhaps too profound a question to raise in polite company.

Even saying, as one tweeter implied, that the study was essentially closed in terms of its causal elements (that is, everything was there, with no new in- or outputs, and hence enumerable) really means that what can be done is to assess risk in closed sets of data.  But the implication of the paper, and of the authors' reassuring suggestion that the causes were 'finite', is that in general, and in relation to diagnosis or prediction, the causal elements are closed in number, enumerable, and discrete (categorical).  These are simply fictions.

So here's an analogy I used in a brief tweet.  If we sample some trees and enumerate their branches, we simply cannot say we know how many branches trees in general have.  But perhaps we can find ways to understand how branches form, relate to each other, change over the lifetime of a tree, vary among tree species, and so on.  Enumerability is not the objective of the development of an adequate theory of branching.

Similarly, we should be asking how it is that so many functional spots in the genome, not the same from sample to sample, could be involved in a seemingly simple trait like stature--with qualitatively similar findings even in much more well-controlled, even experimental studies.  What does it actually mean to say that thousands of parts of the genome affect the trait?  It probably means something we do not yet understand about the organization of organisms that goes beyond enumerability (since clearly open-ended numbers of combinations yield similar results--normal height, blood pressure, glucose levels, brain functions, etc.).

One might argue that what we can get from such studies as mega-sequencing extravaganzas is some sense of the 'shape' of causation in various ways, and that's good.  Yes, true enough, but it is not even clear what one would mean by 'shape', and there comes a time when we have enough post hoc enumerative data and need to go beyond that, because that's generic rather than specific when it comes either to understanding causation or, more to the point, to prediction.  If this work has little predictive value, then it is being misrepresented.

In the case of stature, since we can't predict future environments, we cannot, in principle, predict stature conditional on genotype, except perhaps in the real, often pathologic extremes.  Every tweeter who knows his/her genetics knows this.

Since in truth the same thousands-of-sites genotype doesn't ever recur (and here, we only refer to the constitutive genotype, not the entire somatic genotype as we discussed in our post), there is a major limit to our predictive power.  We know this even from the many nearly single-locus diseases that have been studied in detail.

The fact that the authors claimed substantial (if far from complete) accounting of the high heritability of stature is potentially very misleading.  They retrofitted to a set of data, which describes correlations but doesn't really address causation and hence prediction, and conditional on some mix of environmental exposures of the sampled individuals.  Heritability is a population-specific ratio, as everyone knows, so what is actually being accounted for by such a percentage is unclear and that needs to be said up front.  A proper understanding of 'heritability' of course also must recognize the fundamental relative nature of that measure, even if all elements at work were measured and samples large enough to resolve everything sampled, and even if 'the' environment is in some sense a unitary population measure--which it isn't.
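To make the 'population-specific ratio' point concrete, here is a minimal sketch. It assumes the simplest textbook decomposition, in which phenotypic variance is just the sum of a genetic and an environmental component, with no covariance or interaction between them (an assumption that is itself part of the problem discussed above). The function name and the variance values are hypothetical, chosen only for illustration, and are not taken from the stature paper.

```python
def heritability_ratio(var_genetic: float, var_environment: float) -> float:
    """Return V_G / (V_G + V_E) under a purely additive variance split.

    Assumption: no gene-environment covariance or interaction terms.
    """
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_genetic / total


# Hypothetical variances in arbitrary units (not real data).
# Same genetic variance, two different environmental mixes.
uniform_env = heritability_ratio(var_genetic=40.0, var_environment=10.0)
varied_env = heritability_ratio(var_genetic=40.0, var_environment=40.0)

print(uniform_env)  # 0.8
print(varied_env)   # 0.5
```

The genetic variance is identical in both calls, yet the ratio drops from 0.8 to 0.5 simply because the environments differ more. That is the sense in which the measure is relative to the population and exposures sampled.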

Retrofitting is one thing, but prediction is another.  It is further limited because only a fraction of the presumed contributing genomic elements will be seen again or even have assessable effects (even in the study, because they're too rare), not to mention lifestyle issues.

Shape is a different thing.  If we can get a proper definition of causal 'shape', with some serious-level theoretical background, we might get somewhere.  That is elusive at present in both evolutionary and present-day genomics, or more properly, in causal biology.  Maybe it is in order to take this (and other strange things we know about genomics but sweep under the rug) seriously, and to stop laughing and start thinking beyond enumeration.

Various such aggregate approaches to GWAS results have been suggested, but they're mainly additive summaries of retro-fitted causal associations, and post hoc hopeful mention of purported possibly relevant pathways.

Better science is needed.


incurable scientist said...

As an applied plant geneticist who lost faith in all of the textbook science, after some 32 years of research, I read all your texts on genetics with genuine delight. By 1998, we began to devise a quite complex systemic approach, waving goodbye to the textbook methods and theories. Discussion of my loss of trust in genetic theories with the late Dr Don Wallace (Cornell) is worth a quote; I asked, how many genes control plant height? He answered, nearly all of them. Or all of them. Then he went on to discuss how genes interact with each other and with the environments; this is reality, whatever a gene is. To be practical, we relied more on multivariate analyses and broad exploration of genetic diversity, and it worked. Methods must change over time, because the genetic basis does. Our new and gradually evolving approaches give better results, in part because we are not after simple explanations of every trait.
The only general rule (well, near general) is that modifying one trait modifies others unwillingly. Too often, improving one trait makes another worse. Markers can help sometimes, but more generally, they are not worth the cost. Applied genetics still has a long way to go before anyone can claim understanding. But it can be fun.

Ken Weiss said...

Thanks for the comments and your experience! My personal view is that we can now clearly see the issues (and you raise a number of them, too). The people riding high in the system naturally claim that it's working fine if we just let it keep going (and growing).

I don't think so. Indeed, I think that 'statistics', as generally applied, assumes regularities that do not adequately model the way life has evolved through diversity rather than replication.

That's a broad, almost philosophical statement about life, but I think it's true nonetheless. We are using models developed for replicable events, with mathematical regularity, when that's far from the nature of life.

Anonymous said...

What say you?
Krapohl, E. et al. The high heritability of educational achievement reflects many genetically influenced traits, not just intelligence. PNAS, October 2014 DOI: 10.1073/pnas.1408777111


Ken Weiss said...

Without reading more than the abstract, but knowing the senior author, I say this is yet another paper in the gene-fad age. Everything has to be seen as, or forced to be seen as, genetic. Of course various personal traits contribute to success or failure or mediocrity in any endeavor in a population context.

But many of these traits are affected by all sorts of confounders, not least being more democratic society, parental jobs and neighborhoods etc.

A century or two ago, only the wealthy went to school, and basically only they contributed to 'intelligent' activities that are given status (philosophy, political or military leadership, arts, etc.). Did that mean all these other traits, in the great majority of the people, didn't then have a genomic basis? Clearly and obviously genes act only in context. The fervor and rewards for seeing everything as genetic are part of our time. It will pass. Some day we'll realize that genomic variation affects everything, but we'll respect and address context more.

The two axioms of life are:
1. Everything is genetic (inherited) at least in some sense!
2. Everything is environmental (contextual) in some sense!

said...

WRT the commenter's PNAS reference, the study's “genetic reasons” term didn’t mean that the researchers actually took genetic samples. From one news article: “Identical twins share 100 percent of their genes while non-identical twins share just 50 percent of their genes. Because these sets of twins share the same environment, the scientists were able to compare identical and non-identical twins to estimate the relative contributions of genetic and environmental factors.”
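The twin comparison described in that quote can be sketched with Falconer's classic approximation, which splits trait variance into additive genetic (h2), shared-environment (c2) and non-shared (e2) parts from the identical (MZ) and non-identical (DZ) twin correlations. This is only a minimal illustration: it is an assumption that this simple formula resembles what the study did (twin studies usually fit more elaborate models), and the correlations below are hypothetical, not the study's values.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer's approximation from twin-pair trait correlations.

    Assumes MZ twins share 100% and DZ twins 50% of segregating genes,
    and that both twin types share their environments equally.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared-environment share
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2


# Hypothetical correlations, for illustration only.
h2, c2, e2 = falconer_estimates(r_mz=0.75, r_dz=0.5)
print(h2, c2, e2)  # 0.5 0.25 0.25
```

Note that the genetic share is obtained purely by subtraction of correlations. No DNA is examined, and the result depends entirely on the sharing assumptions stated in the docstring.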

This estimating method produced an artificial divide between genetic and environmental factors. Identical twins start out sharing 100% of their genes, but then their genes become expressed differently – often because of environmental factors – to produce unique individuals even before birth. At age 16, the twins in each identical pair definitely no longer had the 100% identical genetic makeup they had at conception, yet that assumption was the foundation of the estimating method.

I feel that the researchers didn’t prove their case that “genetic reasons” were a causal factor to the stated extent. Although their estimating method’s numbers may have indicated that the above exercise was valid, that didn’t necessarily mean that the method’s results reflected the reality of genetic and epigenetic influences on the subjects.

Better methods of estimating “the relative contributions of genetic and environmental factors” are available with genetic sampling. One way is to measure the degree of DNA methylation of genes.

The funniest thing I saw in the study’s news coverage was one where someone argued that the researchers were wrong and that they needed educational psychologists on their staff to interpret the data. Guess the profession of the arguer!

Ken Weiss said...

Thanks for this interesting comment, which makes valid additional points. Stature clearly has a high relative genetic component, but as you say ....

The current Am J Hum Genet has a paper that dissects stature into some subcomponents, but the basic picture does not really change.