I recently read Laurence Sterne's Tristram Shandy, and this led me to begin re-reading one of the books that was a precursor to the jumbled, chaotic, but often hilarious adventures of Tristram, namely, the 16th-century Gargantua and Pantagruel by Francois Rabelais. In the Preface to Book I, I noticed that Rabelais spoke of one Friar Lubin, who went to great lengths "to find a lid to fit his kettle."
The context was Rabelais' argument that a lot of retrospective meanings were often assigned, by presumed sages, to the works of classical authors such as Homer and Ovid. Rabelais' idea was that these authors wrote wonderful stuff but afterwards scholars combed through it to find subtle meanings that were never really there. The relevance to science, as I interpret this, is to the widespread, perhaps quite natural tendency for investigators convinced something is true to force results into an interpretation consistent with that conviction. Could this lead geneticists to make too much of functions assigned to specific mapped genome locations, functions that are not really there, and thus to be distracted from functions that might be more important?
A century of work has found many normal and disease traits that are tractably genetic in the sense that one or at most a few, or a choice of one among a few, identifiable genetic loci are responsible. But simple genetic causation is far from what is being routinely promised as the more general case, especially for the common, important complex traits that are the major public health problems, and a main target of genomics today. For those traits, rather than single genes, tens to even thousands of different genome regions are being found to have statistically detectable association with the traits, usually detectable only in huge samples and/or with individually very small effects. Yet such findings are commonly claimed as triumphs, and the investigators go to great lengths to find in them a lid to fit their kettle.
One possible example of this kind of Procrustean approach is a paper in the current issue of Cell ("Lessons from a Failed γ-Secretase Alzheimer Trial," De Strooper, Cell, Nov 6, 2014). Protein complexes called γ-secretases have been thought, by various criteria related to the amyloid plaques associated with Alzheimer disease, to be likely targets for inhibitors that would have therapeutic effects, but a directly relevant drug trial found some negative consequences and failed to find the positive effect. The author of the Cell paper argues that this 'No' actually means 'Yes' if you just let us continue the research: "This pessimism is unwarranted: analysis of available information presented here demonstrates significant confounds for interpreting the outcome of the trial and argues that the major lessons pertain to broad knowledge gaps that are imperative to fill."
De Strooper presents a vigorous and technically specific set of arguments, and he may be right, of course, but even if so in this case, No-means-Yes arguments are seen rather more often than any actual beef. Such arguments are easy to make fun of, but if there is good plausibility evidence, a single study, especially one with statistically based inference, may not be a definitive refutation of an idea. If one has what seems like a good idea it is natural and right not to give up on it too easily. But the frequency of this persistence and the rather typical lack of strong follow-up confirmation at least raise serious questions about our criteria for inference and for giving up on an idea that isn't panning out.
If we choose in advance some significance level, say p = 0.05, as a cutoff for finding a signal, and we design a sample that according to our model should be able to detect an effect of the size we expect, but the study arrives at a p-value of, say, 0.06, then in technical terms we should abandon our hypothesis. But of course we usually don't. We call 0.06 'suggestive' and press ahead with our hypothesis. This seems like cheating, and in a sense it is. But in a deeper sense, if we realize the arbitrariness of all our inferential criteria (parsimony, falsifiability, significance...) then we realize that inference is a subjective kind of collective sense of acceptance (or not) of hypotheses. In that light, the γ-secretase Carry On Regardless attitude may not be so wrong--even if it shows that belief, not just objectivity, is important in sciences like genomics that would fancy themselves rigorously objective.
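To see how thin the line around such a cutoff can be, here is a minimal sketch in Python. It assumes a two-sided test on a standard normal statistic, which is only one of many possible test setups, and the function name two_sided_p and the particular z values are illustrative choices, not anything taken from the Cell paper or the trial.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z (assumed setup)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative z values straddling the conventional 0.05 cutoff.
for z in (2.00, 1.96, 1.88, 1.81):
    p = two_sided_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}  p = {p:.3f}  {verdict}")
```

Under these assumptions a statistic of about 1.96 lands at the cutoff and one of about 1.88 gives p near 0.06, so nearly identical evidence ends up on opposite sides of the line.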
Another example is the search that some investigators are making to find rare rather than common variants causing disease. In principle this makes sense since most variants in the human genome are rare and this may be especially true of harmful variants because evolution (natural selection) will on average work against them. There are various techniques for a rare-variant approach, such as finding a given gene in which different sequence variants are seen in different cases of the disease. This is persuasive, not in the sense that the nature of the specific variants themselves shows why they are pathogenic, but because multiple observations of the same gene at least suggest it might be causal. Historically, once relatively common variants were used to map causation of some pediatric traits, like PKU or Cystic Fibrosis, subsequent sequencing of the gene in patients has found a large variety of different variants--typically hundreds!--that are themselves too rare to generate statistical association on their own. If we can now assume that the gene is the cause, then we can infer that the newly found mutations are causal. That is an assumption that can be questioned, because under it many strange, seemingly innocuous variants (e.g., in noncoding or intronic or synonymous sites) are blamed as being causal.
Another tactic for attributing cause to rare variants is to find the same variant in affected relatives, especially a parent and offspring. This plausibly appears as high-penetrance (Mendelian dominant) inheritance. Of course, roughly half of all sequence variants found in any parent will be found in any given offspring, but if there is functional, experimental, or other substantial reason to suspect a particular gene, or the inherited rare variant seems culpable (e.g., a premature stop codon), then such transmission would seem to be at least plausibly convincing. This seems to be widely accepted logic, but is it right?
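The "roughly half" point can be made concrete with a little arithmetic. The sketch below assumes each heterozygous parental variant is passed to each child independently with probability one half (so it ignores linkage between nearby variants), and the count of 100 rare variants in the parent is a purely hypothetical number chosen for illustration.

```python
def expected_shared_by_chance(n_parent_variants: int, n_offspring: int) -> float:
    """Expected number of a parent's heterozygous variants carried by ALL of
    n_offspring children, assuming independent 50% transmission per child."""
    if n_parent_variants < 0 or n_offspring < 0:
        raise ValueError("counts must be non-negative")
    return n_parent_variants * 0.5 ** n_offspring


# Hypothetical parent carrying 100 rare heterozygous variants.
for k in (1, 2, 3):
    shared = expected_shared_by_chance(100, k)
    print(f"{k} affected offspring: about {shared:g} variants shared by chance")
```

Under these assumptions dozens of rare variants are shared between a parent and a single affected child by chance alone, so transmission by itself singles out a particular variant only weakly.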
Fitting data to prior ideas: Procrustean beds or lidless kettles?
The answer is, undoubtedly sometimes, but probably in most cases not really. How can that be? The reason, if not the trait, is simple: genetic variants have their effects only in their environmental and genomic context. If a given variant is not always seen in association with a trait, or there is no particular known functional reason to 'blame' a given genome location for the trait, then one has to ask why one can make a causal assumption. We know from many mapping studies by now that variant-specific risks are usually very small, often detectable only in huge samples. In other words, by far most people carrying the variant don't get the disease, so it's a tad strange to think of it as a 'causal' finding.
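A small worked example shows what a modest relative risk implies for carriers. The numbers here are hypothetical (a 1% risk among non-carriers and a few illustrative relative risks), and the function carrier_risk is just a convenience name for this sketch.

```python
def carrier_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk among carriers = risk among non-carriers * relative risk."""
    risk = baseline_risk * relative_risk
    if not 0.0 <= risk <= 1.0:
        raise ValueError("implied carrier risk must lie between 0 and 1")
    return risk


baseline = 0.01  # hypothetical risk among non-carriers
for rr in (1.1, 1.5, 3.0):  # hypothetical relative risks
    risk = carrier_risk(baseline, rr)
    print(f"RR = {rr:.1f}: {risk:.1%} of carriers affected, {1 - risk:.1%} unaffected")
```

Even at a relative risk of 3, with this assumed baseline the large majority of carriers remain unaffected.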
There are even genes in which variants with very strong, unquestioned effects are widely accepted (e.g., major mutations in the BRCA genes in relation to breast cancer). But even then the risk estimated from samples is neither 100% nor similar across cohorts. Something differs among affected carriers of these variants, and that is context. The context is either environmental or genomic. In the case of BRCA, the genes are thought to function to detect mutations in the cell and stimulate their correction or to kill the cell. Their role in cancer is that in a meaningful sense the tumor is caused by other variants in the genome, not BRCA itself. But lifestyle factors somehow seem clearly also to be involved--how would that be if BRCA is a mutation-repair-related gene?
Parent-offspring transmission of rare variants certainly may indicate that they play some role in the outcome, but it's possibly (perhaps likely?) because of other genetic (or environmental) co-conditions in the individuals. Offspring inherit much besides a single variant from their parents, after all.
Various studies, based on DNA sequence analysis, have by now shown that we each typically carry around tens or more defunct or seriously damaged genes. The variants may be pathogenic in some individuals but not in others. The reason again must be context, that is, something other than the gene itself. If not, it is some probabilistic aspect of causation about which we can usually only speculate (or assume without even a guess about mechanism)--or simply use 'probability' as a fudge factor to make our story seem scientifically convincing.
Weak signals may not be low fruit, but pointers to elsewhere
Ironically and oddly, finding rare variants in various individuals or finding variants common enough to generate a statistically significant association test but with only low relative risk may mainly mean that the bearers also carry other risk factor(s) that made the target variant 'causal' in the few observed cases. Most of the time--in most contexts--there seems to be no excess risk, or else one would expect the gene to be easily identified even in modest samples, as CF and PKU and many other traits were. Small effect is what small relative risks mean, and small relative risks are by far the rule in mapping studies.
Indeed, claiming success by forcing the conclusion that the identified gene is 'the' cause in these individuals, even parent-offspring pairs, may be another way of finding lids to fit investigators' kettles. Again, if there is a conclusion, it might better be that when small-effect variants are found, it is the context of the rest of the genome (plus life experience of the cases) that is as key to understanding the trait as the target 'hit' site itself. The discovered hit may be involved, but mainly acting as a pointer to some other factor(s) that really account for the effect. If the identified gene itself is so important, why do we only identify a few rare variants in that gene associated with risk, even if transmitted in families? That is, why don't we see some higher-frequency mutations in the same gene as we do with many of the other largely single-allele traits?
These questions apply even to those who argue that finding these cases, the 'low hanging fruit' as such things are often called, is a worthy objective that we can attain, even in the face of complexity. Of course, there are population genetic (evolutionary history) reasons why this may be so, since variant frequencies are affected by chance among other things. And finding a cherry is not evidence against its involvement. When an inactivating variant is found to be transmitted, this is certainly plausibility evidence worth following and, after all, many single-gene disorders have been identified once there is a clear-enough trail to follow. Still, even knockout mouse confirmations are not always definitive support by any means, and as we noted above, healthy people may harbor as many 'bad' genetic variants as those affected. But if the finding is confirmed, then of course therapeutic approaches can be contemplated.
However, the great lack of clear therapeutic consequences of the vast majority of GWAS-like findings is consistent with the idea that the target site is in truth mainly pointing us to other things that are what we need to know about. That is, thinking of the picked cherry as really causal may be a mistaken way to interpret genomic data, even if the cherry is a small part of the story. If this is being too critical, then it is only to match the predominant view, which is being too promotional.
An upside of these ideas could be to lead investigators to take the context-dependent aspect of such findings more seriously and see what else may be accompanying the rare variant in question, or what it may interact with. There are, of course, efforts to do this, but it is not an easy problem, because such follow-ups lead back into the web of complexity; but perhaps using these situations as entry points we can find some order there.
To make more of it than that may suggest that oftentimes we've decided ahead of time what sort of kettle we have, we will fit what lids we find