Yesterday we wrote about the state of things in genomics. The idea of genetics as an essentially reductionistic one-gene, one-trait approach to understanding causation and prediction is still a live one, despite decades of evidence to the contrary and, indeed, despite the fact that we've known for 100 years that life is far more complex than that. Yet still today the prevailing paradigm is that if we collect more data, enumerate more genes and gene variants associated with disease, and gather other sorts of 'omics' Big Data, we'll finally understand causation and be able to predict disease. It is largely raw induction--the data will speak for themselves by the patterns computers can find in them. But in many ways, the closer we look, the stranger things seem, not clearer.
In his book The Structure of Scientific Revolutions, published in 1962, Thomas Kuhn described what scientists do as 'normal science' interrupted by rare, transformative changes of fundamental viewpoint. He called these moments 'paradigm shifts', now a terribly over-worked phrase. People are very reluctant to give up a worldview they know and have worked with, and are oblivious either to contrary facts or to problems that seem insoluble. That lasts until someone comes along with a fundamentally better idea that accounts for those contrary facts, and then people wonder, as Huxley did about Darwin's theory, why they hadn't seen it all along.
We have seen over the last few years that there are important areas in which the proverbial emperor of genomics has been shown to have less than adequate clothes, or more accurately, that there is not very much emperor in the huge cloak of modern 'omics'. We're awash in data, with new sorts appearing regularly (e.g., ever-growing lists of SNPs, copy number variation, microbiomes, epigenetic modifications, genes in pathways, etc.). This has added potentially causal elements to efforts to relate genomic data to the traits of organisms, like disease or adaptations, and it has added a withering amount of complexity, with much angst about how that complexity can be parsed. Some findings have been quite important, but most have been minor at best, and often totally ephemeral.
But what we get and what most are seeking are just lists, in some ways that only a computer can love (or have the patience to look through), and lists don't account for the many, many spatial and temporal entanglements, of diverse form, between the multitude of factors we know are involved in making organisms what they are, in 4-dimensional space and time.
It is tempting to think that some revolutionary theory is just around the corner if only someone makes the profound discovery--the next Newton or Einstein (or Darwin). Darwin's insight was as profound as these others, but what he saw was that life, unlike atoms, seems imprecise by nature--based as it is not on replication but on divergence by random variation weakly screened by experience. And despite widespread but uncritical views to the contrary, Darwin's very Newtonian simple causal determinism was patently imprecise or incomplete. Is there something fundamental about causation in life and genomes that is yet to be discovered?
In a sense, the evolutionary and functional genomics professions are clinging to conventional notions much the way early 20th century physicists clung to 'ether' in the face of relativity theory: if we just have better technology, bigger samples and enumerate more and more things, and build statistical models to infer patterns we attribute to causation, we'll understand everything and answer riddles like 'hidden heritability' or enable 'personalized genomic medicine' ... finally! So, defenders of the faith say to skeptics: patience, please--let us carry on!
But is this right? What if we ask whether there might be something more involved in life than relentless 'omic'-scale beetle-collecting?
Do strange things about life require new concepts?
Here is another list, this time of a few discoveries or realizations that don't easily fit into the prevailing view, suggesting that simple ramping up of enumeration may not be our salvation:
1. The linear view of genetic causation (cis effects of gene function, for the cognoscenti) is clearly inaccurate. Gene regulation and usage are largely, if not mainly, not just local to a given chromosome region (they are trans);
2. Chromosomal usage is 4-dimensional within the nucleus, not even 3-dimensional, because arrangements are changing with circumstances, that is, with time;
3. There is a large amount of inter-genic and inter-chromosomal communication leading to selective expression and non-expression at individual locations and across the genome (e.g., monoallelic expression). Thousands of local areas of chromosomes wrap and unwrap dynamically depending on species, cell type, environmental conditions, and the state of other parts of the genome at a given time;
4. There are all sorts of post-transcriptional modifications (e.g., RNA editing, chaperoning) that are a further part of 4-D causation;
5. There is environmental feedback in terms of gene usage, some of which (epigenetic marking) can be inherited and borders on being 'Lamarckian';
6. There are dynamic symbioses as a fundamental and pervasive rather than just incidental and occasional part of life (e.g., microbes in humans);
7. There is no such thing as 'the' human genome from which deviations are measured. Likewise, there is no evolution of 'the' human and chimpanzee genome from 'the' genome of a common ancestor. Instead, perhaps conceptually like event cones in physics, where the speed of light constrains what has happened or can happen, there are descent cones of genomic variation descending from individual sequences--time-dependent spreading of variation, with time-dependent limitations. They intertwine among individuals though each individual's is unique. There is a past cone of ancestry leading to each current instance of a genome sequence, from an ever-widening set of ancestors (as one goes back in time), and a future cone of descendants and their variation that's affected by mutations. There are descent cones in the genomes among organisms, and among organisms in a species, and between species. This is of course just a heuristic, not an attempt at a literal simile or to steal ideas from physics!
[Figure: light cone. Source: Wikipedia]
8. Descent cones exist among the cells and tissues within each organism, because of somatic mutation, but the metaphor breaks down because they have a strange singular rather than complex ancestry: within individuals they go back to a point, a single fertilized egg, and those of individuals go back to life's Big Bang;
9. For the previous reasons, all genomes represent 'point' variations (instances) around a non-existent core that we conceptually refer to as 'species' or 'organs', etc. ('the' human genome, 'the' giraffe, etc.);
10. Enumerating causation by statistical sampling methods is often impossible (literally) because rare variants don't have enough copies to generate 'significance', significance criteria are subjective, and/or because many variants have effects too small to generate significance;
11. Natural selection, which along with chance (drift) generates current variation, is usually so weak that it cannot be demonstrated, often in principle, for similar statistical reasons: if the cause of a trait is too weak to show, the cause of fitness is too weak to show; there is not just one way to be 'adapted'.
12. Alleles and genotypes have effects that are inherently relativistic. They depend upon context, and each organism's context is different;
13. Perhaps analogously with the ideal gas law and its like, phenotypes seem to have coherence. We each have a height or blood pressure, despite all the variation noted above. In populations of people, or organs, we find ordinary (e.g., 'bell-shaped') distributions that may be the result of a 'law' of large numbers: just as human genomes are variation around a 'platonic' core, so blood pressure is the net result of the individual action of many cells. And biological traits are almost always changing;
14. 'Environment' (itself a vague catch-all term) has very unclear effects on traits. Genomic-based risks are retrospectively assessed but future environments cannot, in principle, be known, so that genomic-based prediction is an illusion of unclear precision;
15. The typical picture is of many-to-many genomic (and other) causation for which many causes can lead to the same result (polygenic equivalence), and many results can be due to the same cause (pleiotropy);
16. Our reductionist models, even those that deal with networks, badly under-include interactions and complementarity. We are prisoners of single-cause thinking, which is only reinforced by strongly adaptationist Darwinism that, to this day, makes us think deterministically and in terms of competition, even though life is manifestly a phenomenon of molecular cooperation (interaction). We have no theory for the form of these interactions (simple multiplicative? geometric?).
17. In a sense all molecular reactions are about entropy, energy, and interaction among different molecules or whatever. But while ordinary nonliving molecular reactions converge on some result, life is generally about increasing difference, because life is an evolutionary phenomenon.
18. DNA is itself a quasi-random, inert sequence. Its properties come entirely from spatial, temporal, combinatorial ('Boolean'-like) relationships. This context works only because of what else is in (and on the immediate outside of) the cell at the given time, a regress back to the origin of life.
... you can probably add other facts, curious things about life that are not simply list-like and at the very least challenge the idea that we can understand genomic causation with current approaches.
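The appeal in item 13 to a 'law' of large numbers can be made concrete with a toy simulation. This is only a sketch under assumptions that are ours for illustration and that much of the list above calls into question: every factor acts independently and additively, each person carries each factor with probability 0.5, and the effect sizes are made up. The function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_population(n_people, n_factors, seed=0):
    """Toy model: a trait is the sum of many small, independent contributions."""
    rng = random.Random(seed)
    # Made-up effect sizes: each factor nudges the trait up or down a little.
    effects = [rng.uniform(-1.0, 1.0) for _ in range(n_factors)]
    population = []
    for _ in range(n_people):
        # Each person carries each factor independently with probability 0.5.
        value = sum(e for e in effects if rng.random() < 0.5)
        population.append(value)
    return population


def fraction_within_one_sd(values):
    """Share of individuals whose trait lies within one SD of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= sd)
    return inside / len(values)


traits = simulate_population(n_people=5000, n_factors=200)
print(round(fraction_within_one_sd(traits), 3))
```

A normal distribution puts roughly 68% of values within one standard deviation of the mean, so a printed fraction near that figure is what 'bell-shaped' looks like here. The coherence comes from the aggregation alone; nothing in the output tells you which factors any one individual carries.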
Is there an analog of 'complementarity' or something equivalently important missing?
These facts are, to paraphrase Einstein about strange phenomena in quantum physics, 'spooky' if you think about them in terms of normal ideas about life or even just about genes. They are far from the idea of DNA as a linear code or 'the' blueprint for life, or even as a source of 'information' read off like one reads a sentence in an email message. Yet, generally, we explain biological causation with statistical descriptions of the above sorts of phenomena, based on sampling and enumeration studies. But even huge studies of hundreds of thousands of people and millions of genomic loci aren't getting us very far.
We do, of course, have a huge array of experimental ways of using reductionist approaches to understand many sorts of processes--transcription, physiological reactions, translation, countless others. We use animal and cell culture models with many fine results where reductionist approaches are in order or are suited to our objectives. Each gives us something of a view of biological causation. But often if not usually without asymptotic precision--more than just measurement error.
Even in most of these instances, and especially at higher levels of observation, we currently have no theory that is remotely comparable to fundamental theories in chemistry and physics. There are general evolutionary and population genomic patterns that may even be widely observed, but the patterns are basically empirical rather than being predicted by some sort of 'laws' that compare to those of physics. It's not even clear how deeply we understand how things work. We can observe statistical patterns, but they are not of the rigorous kind of probabilistic processes found in physics or chemistry. As with, say, relativity, you can ignore it unless you approach a critical point (the speed of light, say). Then you must have a better theory of what's happening and a better way to assess it. Perhaps we have reached such a point in our desire to make precise predictions about genomes, a kind of limit of utility of enumeration-based thinking.
If countless, ephemeral variants have individually minor but overall substantial contributions to traits, enumeration and statistical significance criteria simply won't work even if the effects are real. Similarly, the number of known interactions even in biologically simple reactions is typically (and obviously) vastly more than can be characterized or identified by sampling and standard statistical analysis. Everyone knows this. But we don't yet have anything comparable to 'entropy' or 'statistical mechanics' to deal with this adequately--for example, to make precise predictions. Yet the standard view, basically not based on any profound creativity, is that what we need is bigger enumerative studies--'Big Data'.
Is there something missing at a basic level? The list above suggests that this may be so. In many ways we may not even be asking well-posed questions. Would a true conceptual change of some kind lead us to the kinds of predictive uses of genomic data that are being promised? Will it lead to a serious new theory of genetics? Is such a change even in the offing or will we just plow ahead with very expensive, sausage-grinding normal science indefinitely?
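The point about individually minor but collectively substantial effects can be put into rough numbers. The sketch below uses a standard normal approximation to the power of a two-sided, one-degree-of-freedom association test. The scenario is invented for illustration: 10,000 variants each explaining 0.005% of trait variance (50% in total), a sample of 100,000 people, and the conventional genome-wide threshold of 5e-8. None of these figures comes from a particular study, and the function name is hypothetical.

```python
from statistics import NormalDist


def detection_power(n_samples, var_explained, alpha=5e-8):
    """Approximate chance that one variant reaches significance.

    Normal approximation: the test's z-score is centered on
    sqrt(n * q / (1 - q)), where q is the fraction of trait variance
    the variant explains.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (n_samples * var_explained / (1 - var_explained)) ** 0.5
    upper = 1 - std_normal.cdf(z_crit - shift)  # significant, right direction
    lower = std_normal.cdf(-z_crit - shift)     # significant, wrong direction
    return upper + lower


# Invented scenario: 10,000 real effects that together explain half the variance.
n_variants = 10_000
per_variant = 0.5 / n_variants
power = detection_power(n_samples=100_000, var_explained=per_variant)
print(f"power per variant: {power:.5f}")
print(f"expected number detected: {n_variants * power:.1f} of {n_variants}")
```

Under these assumptions the power for any single variant is well under 1%, so nearly all of the real effects go undetected even though together they account for half of the variation. Making the sample larger helps, but the arithmetic shows how quickly the required sample size grows as individual effects shrink.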
It is easy to think of the strange facts and argue that a transformative insight will put them all in place. And of course it is always possible that the majority view is simply correct, that we understand life well enough already, basically just needing more data, more computational power, and revised statistical tweaks.
Our personal feeling is that we are ripe for something radically new that will make many facts that don't now fit the current paradigm fall into place. In reality, right now, most bets, almost all scientific momentum, and the way science works and careers are built are in the business-as-usual, normal-science mode. But can deeper thinking change that? People aren't yet asking well-posed questions, and we think not enough even recognize that there's a problem.
What we've tried to do here is suggest reasons why we think a change in how we view the role of genes in biology may be overdue, and to trigger readers to think seriously about what that might be.
1 comment:
Here are a few notes from an email our friend Popcorn Charlie sent us this morning, commenting on this and yesterday's posts. We've learned a lot from Charlie, and continue to do so.
Your 18 statements that reflect a very high level of ignorance should be the starting point for any genetics course built around Firestein's approach to how science should be conducted. I.e. given these starting points the best science is searching for a black cat in a dark room, not knowing whether there is really a cat in there. Most will continue to describe the room. Not much we can do about this because we are not in Lake Wobegon where everyone is above average.
I suggest you guys add one more to your list of facts:
19) Genetic effects that replicate, i.e. are not context dependent, are rare and inconsequential in explaining phenotypic variations.