Thursday, October 31, 2013

Time to re-think? In search of biology's "Schroedinger equation"

 In the history of 20th century physics, there were some profoundly unsettling conundrums.  Among the most important were observations of what seemed to be inconsistency between the idea that the world was made up of discrete objects (e.g., particles and other 'stuff') moved by continuous, Newtonian, forces (e.g., gravity, energy)--both of which were in a sense absolute.  Separate, but fixed, laws applied to discreteness and continuity.  So long, as Einstein showed, as you were within a particular frame of reference.

But various critical observations showed that such a duality of views just didn't work (nor did the idea of absoluteness).  There were exceptions, all sorts of strangeness.  The anomalies were precise (unlike those in biology), so debate could be focused.  Adherents to traditional views clung fast to them.  But then the leaders of transformational change like Einstein and others began to eat away at conventional wisdom.  Was light a wave or a particle?  Are electrons located in a specific place...or are they even 'particles'?  Symbolically representing the transformation that resulted is Schroedinger's equation, which accounted for the quantum-based dynamics of the universe on its atomic scale.  It's oversimplifying, but there, in one go, many things fell into place.

Do we need a 'Schroedinger equation' for biology?  In the past couple of posts, we've noted reasons why we think we're in a period of biology in which we have amazing technology for identifying aspects of causal complexity, but are long on data and short on theoretical explanations.  We described an unease rather similar to what physicists went through, about whether our current understanding of life is facing fundamental paradoxes, and we wondered if our own analog to the famous equation is in order.

What we need is Bell Labs, not Bell Telephone
Our posts generated a decent bit of Twitter-ink, including, apparently, a lot of agreement, but among the comments were two objections:

1.  "If you don't have a solution, you shouldn't raise the issues."  The objection is understandable, but unreasonable: nobody likes a critic, much less a heretic.  But this common reaction is circling the wagons.  It's like saying that if I see your house on fire, I shouldn't tell you unless I have a fire-hose to put it out with.

2.  "Until you tell us what to do, I'm going to continue doing what I'm doing."  The objection is understandable, and it's how humans are--especially in our current careerist environment.  But how costly in brain-cells and resources is that?

One major problem is that we have steadily been structuring universities as money-makers, venal institutions that imitate the rest of society rather than protected environments where intellectual activities can occur without needing to satisfy an immediate bottom line.   Again, it's understandable, but society should have such oases if it wants real innovation.

By reputation, at least, Bell Telephone created Bell Labs as an isolated, protected arena for unconstrained thought.  High quality scientists were given a building, a coffee pot, and blackboards, and the door was locked and they were told to do whatever they wanted.  Every now and then they had to report at least something of what they were doing, in case it might make for something useful for the phone company.  The incredible results, over decades, showed that the model works.

Universities, too, were like that--it's why they're often called 'ivory towers'. Unfortunately, these venal days, universities are now much more like Bell Telephone than Bell Labs.  That, we think, is tragic for society.  But we, at least, are senior enough that we don't have to give a damn what the organization thinks, so, whether or not we have the appropriate level of talent, we do have the ability to think as freely as we want.  We're effectively in Bell Labs!  But really, so is everyone on the internet.

So, some thoughts from "Bell Labs" 
Here are a few "What if?" issues that we think are worth musing over, in regard to the normal view of life in this arena, and how one might revisit or revise.

1. Identical causes are not identical
If we have a 'gene' with two states, A and a (following Mendel's 1866 notation as if nothing's changed since then), then we assume all A's are alike, and all a's are alike.  And extending this, that all AA's are alike.  Risks are assigned to alleles (or, in some cases, genotypes like AA or Aa), and are assessed by counting (regression, chi-square tests) based on inherently subjective statistical decision-making criteria (whose results don't force us to make the putative decision if we don't like them!).

But what if things that are alike aren't alike, after all?   What if each 'A' allele is functionally different?  Even if we just refer to a single SNP (DNA location), so that all 'A' alleles are chemically identical (unlike some that may be methylated), there will be some span around the site beyond which each instance is different.  Not all AA genotypes are alike.  At some point as the span extends, at least, each 'allele' is unique, or becomes increasingly more so.  Concepts like Hardy-Weinberg are misleading in various ways because they treat labeled identity as if it were functional identity and aggregate categories (like "AA"s).  Monoallelic expression, which may be far more widespread than is accepted in standard thinking (which, again, goes uncritically back to Mendel in 1866), is then clearly important: each cell in an AA individual expresses only one of its 'A's.  Likewise with dominance deviations for quantitative traits: we may be treating salad as if it's all carrots.  The identity span becomes something to test or understand.  There are obvious experimental ways to look at such things.
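To make the 'identity span' idea concrete, here is a minimal sketch in Python.  The haplotype strings are invented toy data, and the function name and window scheme are our own assumptions for illustration, not an established method.  It simply counts how many distinct sequences exist among chromosome copies that all carry the same 'A' at a focal SNP, as the window around that site widens.

```python
from collections import Counter

# Hypothetical toy data: each string is one chromosome copy.  The focal SNP
# sits at index 4, and every copy carries 'A' there, so by the usual labeling
# all six copies are "the same allele".
haplotypes = [
    "GCTTAGGCA",
    "GCTTAGGCA",
    "GCTTAGGTA",
    "GCATAGGCA",
    "TCTTAGGCA",
    "GCTTAGGCC",
]
FOCAL = 4


def distinct_alleles(haps, focal, span):
    """Count distinct sequences in a window of +/- span bases around focal."""
    windows = [h[max(0, focal - span): focal + span + 1] for h in haps]
    return len(Counter(windows))


# As the span grows, copies that share the same label stop being identical.
for span in range(5):
    print(span, distinct_alleles(haplotypes, FOCAL, span))
```

With real data the interesting quantity would be the span at which labeled identity stops tracking functional identity, which is an empirical question this toy cannot answer.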

2. Life is entropy generating, but negative entropy is its engine
Life is an evolutionary molecular reaction.  We evolved from nested common origins, so species are built of similar things and hence serve as convenient nutrition for each other, as life 'tries' to disperse concentrated energy or materials in an entropy-generating way.  The simplest causes (like classical 'dominance') are wholly negentropic: they concentrate function, because if you have the genetic variant, you always have its causal result.  But we tend to enumerate functional effects of localized variation in genomes.  We do a lot of hand-waving about interaction, ROC-based risk estimates, systems biology, and the like.  But that's not yet much more than very poorly organized hand-waving--it's another form of enumerative Big Data rather than something more systematic.

But what if variation operates differently?  What if there are patterns of entropy, or some measure of variation, as one moves along or around or 'over' a genome--patterns among cells in individuals that can be related to the genome in the single cell that they arose from, and patterns among individuals within a species--that one can relate to functions?  Since genomic function, and hence evolutionary trajectories, are not just linearly organized, perhaps we need a better way to characterize a proper entropy measure.  Since not all things that seem the same are the same (point 1 above), the usual p log p conception of 'entropy' may not be the right way to think of this because 'p' is a frequency measure that aggregates things as if they were identical.  By a conceptual analogy, the less entropy by some appropriate genomewide measure(s), the greater the functional effects; variation scattered more or less uniformly all over the place can't do 'work'.
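For readers who want to see the aggregation problem in the usual p log p measure, here is a small Python sketch.  It computes the standard Shannon entropy, H = -sum(p * log2(p)), from category frequencies.  The two labelings of the same eight chromosome copies are hypothetical: one lumps copies by SNP allele, the other treats each extended haplotype as unique.  This is only the conventional measure, not the 'proper' entropy measure we are musing about.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Standard Shannon entropy in bits: H = -sum(p * log2(p))."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical labels for the same eight chromosome copies.
by_snp_allele = ["A", "A", "A", "A", "a", "a", "a", "a"]
by_haplotype = ["A1", "A2", "A3", "A4", "a1", "a2", "a3", "a4"]

print(shannon_entropy(by_snp_allele))  # two equally frequent classes
print(shannon_entropy(by_haplotype))   # eight unique classes
```

The same copies give different entropies depending entirely on what we agree to call 'identical', which is the sense in which 'p' presupposes the aggregation.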

What is meant here by 'entropy' is very vague and thoroughly exploratory.  But carrying the thought further, DNA, and hence genetic causation, is by itself inert.  Biological function relates fundamentally to context (and it's layered, as things like RNA editing show).  If one can characterize its (properly conceptualized) entropy--some measure of relevant variation--one might view its functional effects relative to its dynamically changing 'environment' (itself properly defined), in a way somewhat analogous to electromagnetism:  much as passing through a magnetic field induces current in a wire, genomes passing through environments induce functional outcomes.  But this must be assessed on a genomic scale since there will be wavelike interference among different parts of the genome, effects reinforcing or cancelling each other, so the overall entropic features may dictate the net result.  If you think in terms of mathematical functions as a metaphor, perhaps we can construct a wave-like description of variation along genomes or per 3D or 4D genome configuration, analogous to the quantum descriptions of, say, electrons.

3. Life is more exploratory than conservative
Evolution is often viewed in a mindlessly Darwinian way, of intensive force-like competition driving genomes to ever-changing, ever-refining goals.  Yet when we know of functional parts of genomes, mainly what we see among individuals or species is conservation.  This we attribute to purifying selection.  Adaptive selection does occur, but it, too, generally and quickly leads to reduced variation.  So it has come to pass that sequence conservation is assumed to be a key, or even defining, criterion for biological function.  Yet---we now know of all sorts of aspects of genomes, like various non-coding RNAs or chromatin-packaging or spacing, that represent functions that may be independent of sequence itself.

But what if sequence conservation is not the only, or not the main, criterion for biological function? What if the widespread mapping results showing 'polygenic' control of traits is correct, and each contributor is making so small a contribution that it evolves mainly by drift?  And many of the elements are short and hence easily created or erased by mutation, and their locations or binding-affinities are variable.  Many promoter and other replicable transcription or other sites in DNA clearly seem organized (they have low entropy) but are not associated with surrounding conserved sequence.  What if such multiplicity of effects makes them ephemeral?  Then our approach to causal inference needs revision, pointing to the desirability of some other criterion--like, perhaps, something related to aggregate or distributional effects such as 'entropy'.

These are just night-thoughts in the midst of an ongoing attempt to see not just whether really new thinking might be called for, but what that could possibly entail.  Right now, we're squandering both brainpower and large amounts of public resources chasing rainbows, because we've institutionalized and industrialized thinking.  That stifles originality.

In the history of science, it is usually puzzling facts like the above, and those we listed yesterday, that provoked hard thinking by persons who were free to do it.   The successes came from just such people.  So if we personally can do anything in a forum like a blog, perhaps it is just to raise issues for readers to think about.....

Wednesday, October 30, 2013

Are we there yet or do strange things about life require new thinking?

Yesterday we wrote about the state of things in genomics. The idea of genetics as essentially a reductionistic one gene, one trait approach to understanding causation and prediction is still a live one, despite decades of evidence to the contrary.  Indeed, despite the fact that we've known for 100 years that life is far more complex than that.  Yet still today the prevailing paradigm is to collect more data, enumerate more genes and gene variants associated with disease, and other sorts of 'omics' Big Data, and we'll finally understand causation and be able to predict disease.  It is largely raw induction--the data will speak for themselves by the patterns computers can find in them.  But in many ways, the closer we look, the stranger things seem, not clearer.

In his book The Structure of Scientific Revolutions, published in 1962, Thomas Kuhn described what scientists do as 'normal science' interrupted by rare, transformative changes of fundamental viewpoint.  He called these moments 'paradigm shifts', now a terribly over-worked phrase.  People are very reluctant to give up a worldview they know and have worked with, and are oblivious either to contrary facts or to problems that seem insoluble.  Until someone comes along with a fundamentally better idea that accounts for those contrary facts, and then people wonder, as Huxley did about Darwin's theory, why they hadn't seen it all along.

We have seen over the last few years that there are important areas in which the proverbial emperor of genomics has been shown to have less than adequate clothes, or more accurately, that there is not very much emperor in the huge cloak of modern 'omics'. We're awash in data, with new sorts appearing regularly (e.g., ever-growing lists of SNPs, copy number variation, microbiomes, epigenetic modifications, genes in pathways, etc.).  This has added potentially causal elements to efforts to relate genomic data to the traits of organisms, like disease or adaptations, and to a withering amount of complexity about which there is much angst about how it can be parsed.  Some findings have been quite important, but most have been minor at best, and often totally ephemeral.

But what we get and what most are seeking are just lists, in some ways that only a computer can love (or have the patience to look through), and lists don't account for the many, many spatial and temporal entanglements, of diverse form, between the multitude of factors we know are involved in making organisms what they are, in 4-dimensional space and time.

It is tempting to think that some  revolutionary theory is just around the corner if only someone makes the profound discovery--the next Newton or Einstein (or Darwin).  Darwin's insight was as profound as these others, but what he saw was that life, unlike atoms, seems imprecise by nature--based as it is not on replication but on divergence by random variation weakly screened by experience.  And despite widespread but uncritical views to the contrary, Darwin's very Newtonian simple causal determinism was patently imprecise or incomplete.  Is there something fundamental about causation in life and genomes that is yet to be discovered?

In a sense, the evolutionary and functional genomics professions are clinging to conventional notions much the way early 20th century physicists clung to 'ether' in the face of relativity theory:  if we just have better technology, bigger samples and enumerate more and more things, and build statistical models to infer patterns we attribute to causation, we'll understand everything and answer riddles like 'hidden heritability' or enable 'personalized genomic medicine' ... finally!  So, defenders of the faith say to skeptics: patience, please--let us carry on!

But is this right?  What if we ask whether there might be something more involved in life than relentless 'omic'-scale beetle-collecting?

Do strange things about life require new concepts?
Here is another list, this time of a few discoveries or realizations that don't easily fit into the prevailing view, suggesting that simple ramping up of enumeration may not be our salvation:
1.  The linear view of genetic causation (cis effects of gene function, for the cognoscenti) is clearly inaccurate.  Gene regulation and usage are largely, if not mainly, not just local to a given chromosome region (they are trans);
2.  Chromosomal usage is 4-dimensional within the nucleus, not even 3-dimensional, because arrangements are changing with circumstances, that is, with time;
3.  There is a large amount of inter-genic and inter-chromosomal communication leading to selective expression and non-expression at individual locations and across the genome (e.g., monoallelic expression).  Thousands of local areas of chromosomes wrap and unwrap dynamically depending on species, cell type,  environmental conditions, and the state of other parts of the genome at a given time;
4.  There is all sorts of post-transcription modification (e.g., RNA editing, chaperoning) that is a further part of 4-D causation;
5.  There is environmental feedback in terms of gene usage, some of which (epigenetic marking) can be inherited and borders on being 'Lamarckian';
6.  There are dynamic symbioses as a fundamental and pervasive rather than just incidental and occasional part of life (e.g., microbes in humans);
7.  There is no such thing as 'the' human genome from which deviations are measured.  Likewise, there is no evolution of 'the' human and chimpanzee genome from 'the' genome of a common ancestor.  Instead, perhaps conceptually like event cones in physics, where the speed of light constrains what has happened or can happen, there are descent cones of genomic variation descending from individual sequences--time-dependent spreading of variation, with time-dependent limitations.  They intertwine among individuals though each individual's is unique.  There is a past cone of ancestry leading to each current instance of a genome sequence, from an ever-widening set of ancestors (as one goes back in time), and a future cone of descendants and their variation that's affected by mutations.  There are descent cones in the genomes among organisms, and among organisms in a species, and between species. This is of course just a heuristic, not an attempt at a literal simile or to steal ideas from physics!
Light cone: Wikipedia

8.  Descent cones exist among the cells and tissues within each organism, because of somatic mutation, but the metaphor breaks down because they have strange singular rather than complex ancestry: within individuals they go back to a point, a single fertilized egg, and among individuals to life's Big Bang;
9.  For the previous reasons, all genomes represent 'point' variations (instances) around a non-existent core that we conceptually refer to as 'species' or 'organs', etc. ('the' human genome, 'the' giraffe, etc.);
10.  Enumerating causation by statistical sampling methods is often impossible (literally) because rare variants don't have enough copies to generate 'significance', significance criteria are subjective, and/or because many variants have effects too small to generate significance;
11.  Natural selection, that generates current variation along with chance (drift) is usually so weak that it cannot be demonstrated, often in principle, for similar statistical reasons:  if cause of a trait is too weak to show, cause of fitness is too weak to show; there is not just one way to be 'adapted'.
12.  Alleles and genotypes have effects that are inherently relativistic.  They depend upon context, and each organism's context is different;
13.  Perhaps analogously with the ideal gas law and its like, phenotypes seem to have coherence.  We each have a height or blood pressure, despite all the variation noted above.  In populations of people, or organs, we find ordinary (e.g., 'bell-shaped') distributions, that may be the result of a 'law' of large numbers: just as human genomes are variation around a 'platonic' core, so blood pressure is the net result of individual action of many cells.  And biological traits are typically always changing;
14. 'Environment' (itself a vague catch-all term) has very unclear effects on traits.  Genomic-based risks are retrospectively assessed but future environments cannot, in principle, be known, so that genomic-based prediction is an illusion of unclear precision;
15.  The typical picture is of many-to-many genomic (and other) causation for which many causes can lead to the same result (polygenic equivalence), and many results can be due to the same cause (pleiotropy);
16. Our reductionist models, even those that deal with networks, badly under-include interactions and complementarity.  We are prisoners of single-cause thinking, which is only reinforced by strongly adaptationist Darwinism that, to this day, makes us think deterministically and in terms of competition, even though life is manifestly a phenomenon of molecular cooperation (interaction).  We have no theory for the form of these interactions (simple multiplicative? geometric?).
17.  In a sense all molecular reactions are about entropy, energy, and interaction among different molecules or whatever.  But while ordinary nonliving molecular reactions converge on some result, life is generally about increasing difference, because life is an evolutionary phenomenon.
18. DNA is itself a quasi-random, inert sequence. Its properties come entirely from spatial, temporal, combinatorial ('Boolean'-like) relationships. This context works only because of what else is in (and on the immediate outside of) the cell at the given time, a regress back to the origin of life.
. . .  . you can probably add other facts, curious things about life that are not simply list-like and at the very least challenge the idea that we can understand genomic causation with current approaches.
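Points 13 and 15 in the list above can be illustrated with a toy simulation.  The sketch below is Python using only the standard library; the number of loci, the allele frequency, the sample size, and above all the assumption of equal, purely additive effects are ours for illustration, and are exactly the kind of simplification the rest of the list questions.  Each simulated person gets 0, 1, or 2 copies of a trait-raising allele at each locus, and the phenotype is the total count.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Illustrative assumptions, not estimates from any real trait.
N_LOCI, N_PEOPLE, FREQ = 100, 2000, 0.5


def genotype():
    """One person's genotype: 0, 1, or 2 trait-raising alleles per locus."""
    return tuple(
        (random.random() < FREQ) + (random.random() < FREQ)
        for _ in range(N_LOCI)
    )


genotypes = [genotype() for _ in range(N_PEOPLE)]
phenotypes = [sum(g) for g in genotypes]  # equal additive effects (assumed)

n_geno = len(set(genotypes))   # how many distinct genotypes were sampled
n_phen = len(set(phenotypes))  # how many distinct phenotype values resulted

print("distinct genotypes: ", n_geno)
print("distinct phenotypes:", n_phen)
print("mean, sd:", statistics.mean(phenotypes), statistics.stdev(phenotypes))
```

Very many different genotypes collapse onto a small number of phenotype values clustered around a central value, which is the many-to-one 'polygenic equivalence' of point 15 and the coherence of point 13 in miniature.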

Is there an analog of 'complementarity' or something equivalently important missing?
These facts are, to paraphrase Einstein about strange phenomena in quantum physics, 'spooky' if you think about them in terms of normal ideas about life or even just about genes.  They are far from the idea of DNA as a linear code or 'the' blueprint for life, or even as a source of 'information' read off like one reads a sentence in an email message.  Yet, generally, we explain biological causation with statistical descriptions of the above sorts of phenomena, based on sampling and enumeration studies, but even huge studies of hundreds of thousands of people, and millions of genomic loci aren't getting us very far.

We do, of course, have a huge array of experimental ways of using reductionist approaches to understand  many sorts of  processes--transcription, physiological reactions, translation, countless others.  We use animal and cell culture models with many fine results where reductionist approaches are in order or are suited to our objectives.  Each gives us something of a view of biological causation.  But often if not usually without asymptotic precision--more than just measurement error. 

Even in most of these instances, and especially at higher levels of observation, we currently have no theory  that is remotely comparable to fundamental theories in chemistry and physics.  There are general evolutionary and population genomic patterns that may even be widely observed, but the patterns are basically empirical rather than being predicted by some sort of 'laws' that compare to those of physics.  It's not even clear how deeply we understand how things work.  We can observe statistical patterns, but they are not of the rigorous kind of probabilistic processes found in physics or chemistry.  As with, say, relativity, you can ignore it unless you approach a critical point (the speed of light, say).  Then you must have a better theory of what's happening and a better way to assess it.  Perhaps we have reached such a point in our desire to make precise predictions about genomes, a kind of limit of utility of enumeration-based thinking.

If countless, ephemeral variants have individually minor, but overall substantial contribution to traits, enumeration and statistical significance criteria simply won't work even if the effects are real.  Similarly, the number of known interactions even in biologically simple reactions is typically (and obviously) vastly more than can be characterized or identified by sampling and standard statistical analysis.  Everyone knows this.  But we don't yet have anything comparable to 'entropy' or 'statistical mechanics' to deal with this adequately--for example, to make precise predictions. Yet the standard view, basically not based on any profound creativity, is that what we need is bigger enumerative studies--'Big Data'.
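A rough back-of-envelope sketch, in Python, of why tiny real effects escape significance testing.  It uses a textbook normal approximation for a one-degree-of-freedom test of a variant that explains a fraction q of trait variance: the required sample size is about (z_alpha + z_power)^2 * (1 - q) / q.  The function name, the 5e-8 significance threshold (a common genome-wide convention), the 80% power target, and the example values of q are all our assumptions for illustration, not figures from any particular study.

```python
from statistics import NormalDist


def approx_n_required(var_explained, alpha=5e-8, power=0.8):
    """Approximate sample size to detect one variant at the given power.

    Normal approximation for a two-sided, 1-df test, ignoring the far tail.
    var_explained is the fraction of trait variance due to the variant.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # significance threshold as a z-score
    z_power = z.inv_cdf(power)          # extra margin needed for the power
    return (z_alpha + z_power) ** 2 * (1 - var_explained) / var_explained


# Hypothetical effect sizes: 1%, 0.1%, and 0.01% of trait variance.
for q in (0.01, 0.001, 0.0001):
    print(q, round(approx_n_required(q)))
```

The required sample grows roughly as 1/q, so each tenfold drop in effect size demands a roughly tenfold larger study, and a variant carried by only a handful of people can never accumulate enough copies no matter how the test is tuned.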

Is there something missing at a basic level?  The list above suggests that this may be so.  In many ways we may not even be asking well-posed questions.  Would a true conceptual change of some kind lead us to the kinds of predictive uses of genomic data that are being promised?  Will it lead to a serious new theory of genetics?  Is such a change even in the offing or will we just plow ahead with very expensive, sausage-grinding normal science indefinitely?

It is easy to think of the strange facts and argue that a transformative insight will put them all in place. And of course it is always possible that the majority view is simply correct, that we understand life well enough already, basically just needing more data, more computational power, and revised statistical tweaks.

Our personal feeling is that we are ripe for something radically new, that will make many facts that don't now fit the current paradigm fall into place. In reality, right now, most bets and almost all scientific momentum and the way science works, and careers are built, are in the business-as-usual, normal-science mode, but can deeper thinking change that? People aren't yet asking well-posed questions, and we think not enough even recognize that there's a problem.

What we've tried to do here is suggest reasons why we think a change in how we view the role of genes in biology may be overdue, and to trigger readers to think seriously about what that might be.

Tuesday, October 29, 2013

This just in: genes 'for' disease might be an outdated concept!

By Anne Buchanan and Ken Weiss

Science writer David Dobbs has a nice piece over at Slate this week, wrapping up the American Society of Human Genetics meetings that just took place in Boston.  Yes, he writes, the field of Human Genetics can report successes finding genes for rare diseases, and sequencing genomes has gotten fast and (relatively) cheap.  But the overall story is one of "struggle and confusion."  Reports of genes for [fill-in-the-blank] can't be replicated, some sequencing methods are missing causal mutations, and indeed, "[t]here's a real possibility that "the majority of cancer predisposition genes in databases are wrong.""  This is a time of growing humility, Dobbs writes, as geneticists recognize that the promise of the human genome as portrayed by Francis Collins and Craig Venter et al. in 2000 has not panned out.

At the same time, a new paper in Trends in Genetics ("A century after Fisher: time for a new paradigm in quantitative genetics", Nelson, Pettersson and Carlborg, 23 Oct, 2013), tweeted and retweeted on Oct 28, argues that, well, biology is complex. 
Despite easy access to commercial personal genetics services, our knowledge of the genetic architecture of common diseases is still very limited and has not yet fulfilled the promise of accurately predicting most people at risk. This is partly because of the complexity of the mapping relationship between genotype and phenotype that is a consequence of epistasis (gene-gene interaction) and other phenomena such as gene-environment interaction and locus heterogeneity. Unfortunately, these aspects of genetic architecture have not been addressed in most of the genetic association studies that provide the knowledge base for interpreting large-scale genetic association results. 
This is not new news!
But, we ask, why is any of this news??  In response to Dobbs's tweets on Oct 28 about his Slate piece, Jason Moore reminded Dobbs about a paper he and Scott Williams published in 2009 in which they argued, "based on the emerging data and analyses, that elucidating the genetic architecture of breast cancer and comparable diseases must focus on underlying complexity." Another paper about epistasis.

It's perhaps self-serving of us to point this out, but there have been plenty more, and much earlier warnings about complexity.  If it seems strictly like vanity, we have pointed out that these publications noted that the basic theoretical understanding, and data to support it, had been known for close on a century.

For example, in the days when sequencing a single gene was still news, we were involved in one of the first projects to document variation, the extent of which was surprising at the time -- but shouldn't be any longer.  That was a cardiovascular disease project and here's just one of the papers that resulted, documenting variation in the human lipoprotein lipase (LPL) gene.  This was one of the first publications to alert the human genetics community that the genetics of disease was going to be more complex than people were expecting.  Published in Nature Genetics, the news was not hidden.  

In 2000, Weiss and Terwilliger argued in a commentary, again in Nature Genetics ("How many diseases does it take to map a gene with SNPs?"), that the common disease/common variant model then current was based on faulty understanding of population genetics and biology. It was, they said,
...fuelled by a faith that the genetic determinants of complex traits are tractable, and that knowledge of genetic variation will materially improve the diagnosis, treatment or prevention of a substantial fraction of cases of the diseases that constitute the major public health burden of industrialized nations. Much of the enthusiasm is based on the hope that the marginal effects of common allelic variants account for a substantial proportion of the population risk for such diseases in a usefully predictive way. A main area of effort has been to develop better molecular and statistical technologies, often evaluated by the question: how many SNPs (or other markers) do we need to map genes for complex diseases? We think the question is inappropriately posed, as the problem may be one primarily of biology rather than technology.
Note that this paper was published 13 years ago.  Terwilliger has said numerous times that it could be published again today, without changing a word.  In fact, we'll post the conclusion from the paper here, as it's still entirely apropos:
Resistance to genetic reductionism is not new, and we know that, by expressing these views (some might describe them as heresies), we risk being seen as stereotypic nay-sayers. However, ours is not an argument against genetics, but for a revised genetics that interfaces more intimately with biology. Biological traits have evolved by noise-tolerant evolutionary mechanisms, and a trait that doesn't manifest until long after the reproductive lifespan of most humans throughout history is unlikely to be genetic in the traditional, deterministic sense of the term. Most genetic studies that focus on humans are designed, in effect, to mimic Mendel’s choice of experimental system, with only two or three frequent states with strongly different effects. That certainly enables us to characterize some of the high-penetrance tail of distribution of the allelic effects, but as noted above these may usually be rather rare. But inflated claims based on this approach can divert attention from the critical issue of how to deal with complexity on its own terms, and fuel false hopes for simple answers to complex questions. The problems faced in treating complex diseases as if they were Mendel's peas show, without invoking the term in its faddish sense, that 'complexity' is a subject that needs its own operating framework, a new twenty-first rather than nineteenth or even twentieth century genetics.
Here's a figure from that paper, showing the logic of causation, and gene mapping strategy.

From Weiss and Terwilliger, Nat Genet, 2000
Schematic model of trait aetiology. The phenotype under study, Ph, is influenced by diverse genetic, environmental and cultural factors (with interactions indicated in simplified form). Genetic factors may include many loci of small or large effect, GPi, and polygenic background. Marker genotypes, Gx, are near to (and hopefully correlated with) genetic factor, Gp, that affects the phenotype. Genetic epidemiology tries to correlate Gx with Ph to localize Gp. Above the diagram, the horizontal lines represent different copies of a chromosome; vertical hash marks show marker loci in and around the gene, Gp, affecting the trait. The red Pi are the chromosomal locations of aetiologically relevant variants, relative to Ph.
That commentary wasn't hidden either, published as it was in Nature Genetics.   And a paper by Weiss and Clark a few years later ("Linkage disequilibrium and the mapping of complex human traits", Trends in Genetics, 1 Jan, 2002), argued that the then-current idea of common variants/common disease was not based on "sound population genetic principles".  They showed that there were good, long-known reasons why the idea was simplistic and wouldn't pan out as a way to find genes for many common diseases, as it proved not to.

Indeed, in Ken's own Cambridge Press book published in 1993 Genetic Variation and Human Disease, the situation was also rather clearly laid out, even at the beginning of the mapping era.  Of course, we weren't the only people writing about this, but the important point is that the issue is not new, and the reasons for expecting things to be more or less as we have found them were stated for the right reasons---reasons that go back to the early 1900s and are often attributed to RA Fisher's famous 1918 paper on polygenic inheritance that in many ways laid the foundation for the modern evolutionary synthesis that united Mendelian inheritance and Darwinian evolution.  Others, less routinely credited, were just as clearly pointing out the multilocus causation of complex traits.

We've published in print and on this blog a number of commentaries on complexity, the perils of genetic reductionism, and the need to move beyond Mendel.  So why is it taking so long for the profession to learn, or to recognize, the situation?

Why the deaf ear?
The answer is complex and we've described the influence of Mendelian thinking (of single gene causation, among other things), the hope for simple, easy success with dramatic effects, the way this played into the grant system, the lobbying for funding, and (importantly) the evolving tractability of experimental approaches.  Humans, even purportedly objective scientists, hear what they are attuned to hear, and reject or dismiss or mention only in passing things that are inconvenient.

If you do sequencing or genotyping and you know how to collect piles of genetic data and analyze it, and you get your grants by doing that (and, directly or subtly, promising cures), and have a staff whose jobs you want to protect (including your own medical school salary), and the media reward bold announcements, or you're an NIH project officer with a grant portfolio to protect, then you won't hear the music of the spheres.  In many ways, the tail (state-of-the-art technology) is wagging the dog (questions and interpretation).

Also, and to be at least somewhat fair, you will retrospectively cite the successes that clearly do arise from this approach, even if they may mainly be only partial.  You will justify business as usual by the argument that serendipity has led to many major scientific advances, so if we keep pouring resources into this particular approach, until the next generation of fancy equipment is available etc., some lucky discoveries will be made (you don't point out that if we instead focused resources on what really is 'genetic', we would also make serendipitous discoveries).

So you manufacture exciting! news.  We have this or that mystery!  It's the hidden heritability, for example, and the solution is that it's all copy number variation!, or rare not common variants!, or will be found in whole genome sequencing!, or it's epigenomics!, or proteomics, or the microbiome!, or you-name-it. 

More of the same can be predicted
Science usually works in this way, especially in our industrial-scale business-model world. The hunger for the secret key is quite understandable. No one will, or perhaps no one can, slow down and think first, and lobby only afterwards.
As is usually the case in human society, our profession will move, like a school of fish, towards some new set of claims when something comes along that is where the funding will lie.  Meanwhile, we have a long list of known causes that could have major public health benefits if they were as aggressively pursued (e.g., the fact that most diseases being mapped are more environmentally than genomically caused, the fact that we have hundreds of single-locus traits for which real concentrated effort might lead to cures or prevention, etc.).

But can there be something better? 
From the meeting: 

Dobbs writes, "If the field is currently a bit lost in the fog, whoever clears the air could become to Watson and Crick as Watson and Crick were to Darwin."  But to a great extent, the air has been cleared!  While genes 'for' most common complex diseases aren't being found, and we still can't predict who'll get what sometime down the road, we've learned a tremendous amount in the last several decades -- some, mostly rare, mostly pediatric diseases are caused by single genes, but most traits are complex, polygenic, caused by no one pathway, and most likely involve gene by environment causation.

Is rethinking just a word, in an area that really will simply yield to an increase in the scale of operations?  Or is some truly new conceptualization in order, and if it is, what might it be?

We'll comment on this and make some suggestions next time.

Monday, October 28, 2013

The genetics of olfaction: In evolution, where there's one, or one thousand....there's more. But how much more? Part II.

Last Friday and today we're discussing the means by which you are able to detect and discriminate among infecting pathogens to which you may be exposed.  We reviewed the immune system and the idea of monoallelic expression, by which only one of the two copies of a gene you inherit is chosen for use in a given immune cell.  In fact, in the immune system, it's really only part of a cluster of gene segments that is used, and only one cluster of the two in your inherited genome.  It's called 'adaptive' immunity (you also have a second, very different, 'innate' immune defense unrelated to our subject here).

We went on to say that odor detection is a somewhat similar challenge:  of the countless odor molecules that you might want or need to be able to detect, your body can't know all in advance.  Since odorants are molecules, perhaps a molecular--a genetic--method of open-ended variation would serve as well as it does in our 'adaptive' immune system. In fact, we have something similar in our odor-detection sense.  But it works very differently from the immune system, using unrelated genes and unrelated (as far as is known) gene-selection mechanisms.

We have about 1,000 (yes, that's thousand, about 5% of our genes) different Olfactory Receptor (OR) genes in our genomes. (We've posted on aspects of this story in past years, e.g., here and here.)  Since we have two copies of all our genes (except for XY genes in males), that means we have a total of nearly 2,000 OR genes.  An OR gene codes for a protein that sits in the surface of olfactory neurons, hanging out in the nasal passage, in contact with the air we breathe, and the odorant molecules it carries.  The OR genes are in clusters, large and small arrays of adjacent, related OR coding genes, scattered across most of our chromosomes.  Each OR gene is the result of a past duplication event, so is closely related to some other ORs, often right nearby on the same chromosome; 'closely' means a few mutations different from the 'parent' gene that have arisen as part of the duplication or since that occurred.

OR locations in the genome (red); obtained from
In some mammals that rely heavily on smell, most of the OR genes, in most clusters, are working.  In primates and humans, we've still got as many identifiable OR genes but more of them are dysfunctional, having been mutated out of usability; the protein code just doesn't generate a functional OR protein.  But for many or most, that means that ORs that may be dysfunctional in one of my copies may be functional in the other, or in other people.

Olfactory neurons develop as part of the nasal cell lining.  They express OR genes on their surface, that dangle into the nasal airway.  But as a future olfactory neuron develops, it picks one OR gene, from one cluster, and of that only from one of the two copies of that cluster, to express.  The other copy is inactivated as are all the other 1,999 OR genes!  But this is not so easy to explain as the comparable monoallelic expression of the antibody genes that we described on Friday.


The two copies of the antibody genes, at least, are in the corresponding place on the two copies of the respective chromosomes (one inherited from each parent).  That makes it easier, in principle, for them to be put into correspondence so that one is chosen and the other inactivated.

In the case of the OR genes, the developing olfactory neuron must first pick one of the many clusters to use, and then use only one OR gene from that cluster, and then inactivate the other genes in all other clusters, both copies, on all other chromosomes!  Pick one gene, and then go find all the others and silence them.

This is a remarkable feat and the very explicable outcome is that each olfactory neuron uses only one OR protein, and can detect only odorants that, wafting by, are recognized by that OR protein and trigger the neuron to send an "Aha!"  message to the brain. The brain collates the messages sent at any given time by all the neurons in both nostrils, and assembles and remembers a catalogue: "These neurons respond together and we'll call that 'lemon';  next time this same collection of neurons fires, I'll know to get out the glass, water and ice cubes--lemonade!"
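
To make the one-neuron-one-receptor logic concrete, here is a toy sketch in Python. It is only an illustration under our own assumptions, not a model of real olfaction: the number of neurons, the receptor sets assigned to 'lemon' and 'rat', and all the function names are hypothetical. Each simulated neuron picks a single OR gene from a single parental copy, and the 'brain' simply remembers which set of neurons fired together.

```python
import random

N_OR_GENES = 1000   # "about 1,000" OR genes per inherited copy, as in the post
N_NEURONS = 5000    # hypothetical number of olfactory neurons in this toy model


def build_neurons(n_neurons, n_genes, rng):
    """Each neuron expresses exactly one OR gene from exactly one parental copy."""
    return [
        (rng.randrange(n_genes), rng.choice(("maternal", "paternal")))
        for _ in range(n_neurons)
    ]


def respond(neurons, odorant_receptors):
    """Return the set of neurons whose single chosen OR recognizes the odorant."""
    return frozenset(
        index
        for index, (gene, _copy) in enumerate(neurons)
        if gene in odorant_receptors
    )


rng = random.Random(0)
neurons = build_neurons(N_NEURONS, N_OR_GENES, rng)

# Hypothetical odorants: each is recognized by a small set of OR genes,
# and the two sets partly overlap.
lemon = set(range(0, 20))
rat = set(range(10, 30))

# The "brain" catalogues the pattern of neurons that fired together.
catalogue = {
    respond(neurons, lemon): "lemon",
    respond(neurons, rat): "rat",
}

# Next time the same collection of neurons fires, the odor is recognized.
smell = catalogue.get(respond(neurons, lemon), "unknown")
print(smell)
```

Even though the two made-up odorants share some receptors, the overall firing patterns differ, which is the point of a combinatorial code: the identity of an odor lies in the collection of responders, not in any single one.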

Because they are being duplicated rapidly and they mutate rapidly, the 2,000 OR genes, and their combinatorial signals, provide a huge repertoire of potential responders, which allows the kind of open-ended cataloging of odorants--including mates, predators, food and prey.  This is an extensive form of monoallelic expression that seems obviously to have been useful to our ancestors' survival.

Current status
In 2005 I spent a sabbatical leave at the Sanger Genome Center near Cambridge, England, trying to find DNA sequence motifs that might help account for this very selective monoallelic expression.  I didn't succeed....but neither has anyone else since.  A recent review by Ivan Rodriguez in Cell describes what is known about this system, and an article in the same journal describes one facet of that mechanism.

Rodriguez notes that some cluster-specific regulatory regions have been found on some chromosomes, and there are instances of those having similar sequences; but they are far down the chromosome from the OR genes themselves, they only affect their local cluster, which makes such regulatory sequence elements hard to find on other clusters.   Previous ideas of global all-cluster control sequence signals have not panned out.  But a mechanism has been found that, in part at least, explains how a cell that happens at first to pick a defunct OR gene for use fails to develop further but instead is returned to the OR-picking stage.  It's not known what happens if a cell mistakenly picks more than one OR gene, but such cells are rarely seen so they must degrade somehow, or fail to reach the relevant part of the brain (the olfactory bulb).  The part of the DNA including a chosen cluster region is open for expression, but how the rest of the OR-cluster areas on the other chromosomes are silenced isn't known.

Other related systems have similar one-gene selection. These include the V1R and V2R genes that code for pheromone receptors and two other groups, the FPR and TAAR genes, that are each expressed in a one-only way but in a different part of the nose, called the vomeronasal area. 

An older idea that the cluster-containing parts of chromosomes were located near the outside of the neuron cell's nuclear membrane was shown to be wrong; instead these areas may be clustered toward the center of the nucleus.  This may bring various OR regions closer together so they can be activated or shut down.....but how are these areas shepherded there?  And since an olfactory neuron also expresses many hundreds of other genes, how are they kept open for expression?

This is considerable advancement in our understanding of how the chosen OR gene leads to a functional neuron or how a dysfunctional OR choice leads to a re-selection. It explains some aspects of how an OR cluster's genes are opened for expression, and how the coded protein is processed outside the cell's nucleus. But it doesn't answer the $64 question:  how is the choice made in the first place and the unlucky 1,999 lottery losers shut down?  How is this partially related to the neighboring cells' choice, presumably because they are recent cellular descendants of each other in the growing nasal epithelium?  And how is the choice totally independent in other cells?

The OR genes are members of a large, ancient family of genes that code for all sorts of cell-surface receptors that are used in all sorts of other functions, but as far as is known (or at least as far as we're aware) their expression is not monoallelic and they are not used in a choose-one/exclude-the-others mode.  So mystery upon mystery still attends the monoallelic expression, the very nice bookkeeping strategy, by which we can tell when we smell a rat.

What is curious is that the same mechanism does not seem to apply to the various other known forms of monoallelic expression, described on Friday, each of which is very deep in evolutionary terms.  And more curious is the question: how widespread is monoallelic expression, if it exists, in other systems where choosing among many genes is not the 'trick' but in which there may be reasons for only one copy of a gene to be used in a given cell in a given context?  If recent evidence is any guide, we will find many more examples.

This is yet another way in which our rather stereotypical, and one may say superannuated, Mendelian concepts are in need of some serious rethinking.

Saturday, October 26, 2013

A glimpse at the difficulty with early Homo

The new Dmanisi skull's got me thinking about early Homo more than normal, especially since it's what's on the horizon for my paleoanthropology course this semester.

This morning I unearthed a  project that never got published as a paper and it's worth sharing here.

A couple years ago I tried to address just one problem with distinguishing different species among early Homo fossils: The usefulness (or not) of molar outline shape. 

With great contribution from, and collaboration with, a student who went on to do graduate work in biological anthropology, together we presented a poster at the 2010 Paleoanthropology Society meetings in St. Louis called,

"From rhomboid to rectangle: Virtual wear of early Homo molars"

Here's a jpeg of that poster.

I'm posting it here in case it's useful for anyone concerned with these issues.

Email me (holly_dunsworth at or tweet (@hollydunsworth) if you'd like a nice pdf or more information.

Friday, October 25, 2013

The genetics of olfaction: In evolution, where there's one, or one thousand....there's more. But how much more? Part I.

The way we're taught genetics in school, we have two copies of every gene and both are expressed in their appropriate tissues.  That expression is based on regulatory sequences near to the gene, and when specific proteins (call them Transcription Factors, or TFs) stick to those sequences, the nearby gene is transcribed into messenger RNA which is then used to make the protein the gene codes for.

Thus, like the pair-by-pair march onto Noah's Ark, your two sets of globin genes march on--are expressed--in red blood cells, your digestive enzyme genes in the gut, neurotransmitter genes in brain cells, and so on.  So if you have an 'aa' genotype both 'a' alleles are expressed, but if you're 'Aa' your cells get a comparable dose of each....just as Mendel told us.

Well, not exactly!  Each copy is expressed at a level determined by its nearby regulatory DNA sequences.  These could be very different (due to inherited variation in their sequence details, among other things), so even if both copies are being used, they may not be used equally.

But that's not all.  Sometimes we find monoallelic expression: only one of the two copies is being used!  Our knowledge of this once-strange exception to the rule has been growing rapidly in recent years.  Now we know there's nothing exceptional about it.....but how 'nothing' is that?

X marks the un-used: monoallelic expression and sex determination
Female humans and other mammals are specified by having two X chromosomes.  Since males have only one X and a very different Y chromosome, one might expect the delicate cell environment to be impaired in males,  with only a single dose of all the X-linked proteins, when for the rest of their genes on all the 22 other chromosomes they had a double dose.  But instead, it has been found that in females, for about 85% of the genes on the X-chromosomes, only one of the copies that she carries is actually used.  The other is silenced.  Early in a female embryo's development each cell picks one of its X's to use, and inactivates the other (except for about 15% of genes, when both are used).  Thereafter, descendent cells express only the chosen X.  Since this is random, about half a female's cells use a gene on one X chromosome, the other half genes on the other X.  The use of only one of two available genes in a given cell is known as monoallelic expression (the two copies being 'alleles' of the same gene).

The patchy distribution of color on tortoiseshell cats results from the random inactivation of one X chromosome in females. Scitable, by NatureEducation,
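
The random-then-inherited logic of X-inactivation can be sketched in a few lines of Python. This is a deliberately crude toy, with assumptions of our own: the number of founder cells and of cell divisions are hypothetical, every founder chooses with equal probability, and the roughly 15% of genes that escape inactivation are ignored.

```python
import random


def inactivate(n_founders, rng):
    """Each early embryonic cell independently picks which X stays active."""
    return [rng.choice(("maternal", "paternal")) for _ in range(n_founders)]


def grow(founders, divisions):
    """Descendant cells inherit their founder's choice, forming clonal patches."""
    tissue = []
    for choice in founders:
        tissue.extend([choice] * (2 ** divisions))
    return tissue


rng = random.Random(1)
founders = inactivate(1000, rng)     # hypothetical number of founder cells
tissue = grow(founders, divisions=4)  # hypothetical: 16 descendants per founder

fraction_maternal = tissue.count("maternal") / len(tissue)
print(f"{fraction_maternal:.2f} of cells use the maternal X")
```

Across the whole tissue the two X's are each used by about half the cells, but within any one patch of descendants the choice is uniform, which is the pattern the tortoiseshell cat makes visible.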

Monoallelic expression, beyond the X
A specific mechanism for this was worked out, starting decades ago, and today much is known about how it works.  It was long thought to be a unique random-selection-inactivation phenomenon. But things in evolution seem to pop up again and again, sometimes in very different guises.  For example, some decades ago it was found that the antibody genes that are expressed in white blood cells and work against microbial infections have something similar to X-inactivation....but also quite different.  Each white cell selects from a large array of possible gene sections, chains them together to form the messenger RNA for an antibody protein, then cuts the un-used sections from the DNA and discards them from the cell.  Not only that, but the other of the two copies of this entire gene is inactivated.

The protein coded by the activated gene copy is shepherded to the surface of the white blood cells, and as the cell circulates in the blood or lymph, it bumps into whatever is around.  If it happens to bind to something it recognizes, like a nasty virus particle, a cascade of events is triggered.  Among other things, the excited white cell divides rapidly, making more of its kind (that remember and continue to express the chosen antibody gene), so they can detect other copies of the infecting villain that are in the body.  When successful, this triggers a cascade of hunt-and-destroy activities that rid the body of the infection.
B Cell development; Weiss and Buchanan; Genetics and the Logic of Evolution, 2004

I've omitted many details, but it is in this basic way that you fight the kinds of infection your inheritance can't have directly predicted--that is, that you didn't evolve specifically to be able to ward off. It works because basically every white blood cell lineage is using a somewhat different antibody molecule, and this  diversity makes it possible to molecularly 'recognize' unpredictable foreign molecules.  There are several gene complexes, located on different chromosomes, that are used in essentially the same way in the formation of various white blood cell types. How these choices are made and the antibody gene regions identified is not fully known.

A given pathogen or alien substance, pollen, virus, bacterium etc., has its molecular surface characteristics.  If one or (usually) more than one part of its surface can be recognized by at least one lineage of immune system cells because of the particular variant receptor they have on their surface, then the pathogen can be grabbed and surrounded by such cells or antibody molecules, and destroyed.

Is that all there is to monoallelic expression?  By no means!

Olfactory receptors: monoallelic expression big-time
How do you smell?  We're not talking about your BO or your Chanel No. 5 sex appeal.  No, we mean how can you smell?  How can you tell it's bananas, lemons, or a dead rat you smell when you walk into a dark room?

The world is drenched in molecules wafting through the air, all around us, and of an essential infinity of different structures.  If detecting food or predators has always been important, one would expect the system to be quite specific, or at least quite sensitive.  In fact, while there may be some specific smell senses (see pheromones, in Monday's post), in animal evolution it has apparently been generally safer to generate a large array of more or less random odorant detector molecules, so that chances are good that at least one of them will be sensitive to any odor molecule you may come across, and that you will be able to remember and discriminate among such molecules.  Your ancestors needed to be able to tell if they smelled a rat or a ratatouille.

As with the unpredictability of potential sources of infection, you can't know what odors you may be exposed to or want to identify--and remember.   In fact, this has been achieved in mammals and even in insects by a kind of combinatorial strategy very similar to that used in immune defenses.....but with an entirely different mechanism!  We'll talk about that next week.

Thursday, October 24, 2013

Who will remember, Big Boy?

Old guys taking testosterone to help stiffen up (their muscles) is a new fad, reports the NY Times.  And even not such old guys.  And why not, if it can fix their "Low T", make them young and full of energy again, play sports like they're 30, return their sex drive to their studly days?
Well, not so fast.  The effectiveness and use and advertising and all that are now being called into question.  Testosterone products "are increasingly being sold as lifestyle products, to raise dipping levels of the male sex hormone as men age. . . . .The market for testosterone gels evolved because there is an appetite among men and because there is advertising" and so on, as one might guess.  That is, manufacturers have once again created a disease for which they can sell a high-priced cure.  Has this led to overuse?  Needless cost on something that doesn't work?

Virile Jupiter, statue at the Louvre (Wikipedia)

These are questions that are being asked, to resistance by the manufacturers and promoters of testosterone-for-everyone, of course.  But they are separate from the issue we'd like to raise here.

Who will remember?
We are not particularly qualified to judge the questions of current use or abuse patterns, nor the effectiveness of T-therapy, nor the personal or vested interests involved in all the great claims about these substances.  There's a lot one could say about those topics.  But instead, our point for mentioning the story here on MT is a different one.  It is one that applies to epidemiological inference, and hence our understanding of genetic and general public health causation.

Testosterone therapy is a current fad, that according to the story is taking many forms.  Some are through the usual channels of the medical system and may be documented, though how thoroughly is not clear.  But apparently much of the T-therapy usage is of the informal, over-the-counter type. Regardless of the hormone's actual properties or uses, legitimate or otherwise, much of that usage will largely be undocumented or under-documented.  Yet it can have substantial effects on the health and physiology of the users--if the claims are true.

However, lifestyle factors other than motorcycling without a helmet do not cause disease right away. If they did, hardly anyone would expose themselves to those factors.  Instead, these behaviors and exposures cause no or few immediate symptoms (or may give good immediate outcomes as the story suggests many believe), or -- and mainly -- they cause symptoms or disease decades later, combining with the accumulation of all sorts of behaviors and exposures.  Even smoking takes decades for its major health effects to arise.

Years from now, given the epidemiological research fads of that time, how well will exposure to testosterone be remembered or documented?  If its effects are dramatic, we will know about them--though even then it is hard to disentangle cause and effect.  But if they are subtle, as the effects of most lifestyle factors are, will testosterone use even be included in survey studies of exposure to risk-factors?  Even if it proves to be a substantial risk factor, how accurately will the timing, duration, and level of doses be remembered?  Particularly if users were on the young end when they partook. 

Big Data resources, regardless of their privacy-invading potential, will document what's known, and Big Data miners with future NIH grants to find whatever they can without having to think much about it in advance, will generally only be able to examine the ore that's in the mine.   Unreported, unremembered, transitory, low-level, or inadvertent exposures will be statistical 'noise' in the system.  Or, worse, exposure will be correlated with all sorts of other behaviors and lifestyles--confounders, the bane of epidemiology--and supposed causal associations will be spurious.
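
A small simulation shows how a confounder manufactures a spurious association. This is a sketch with invented numbers, not real data: the probabilities, the sample size, and the function names are all hypothetical. In this toy population the exposure has no effect on disease at all; a third lifestyle factor raises both the chance of being exposed and the chance of getting sick.

```python
import random


def simulate(n, rng):
    """Return (exposed, diseased, confounder) for n people.

    The exposure has NO causal effect on disease in this toy model.
    """
    people = []
    for _ in range(n):
        confounder = rng.random() < 0.5                        # hypothetical lifestyle factor
        exposed = rng.random() < (0.6 if confounder else 0.1)   # confounder makes exposure likelier
        diseased = rng.random() < (0.30 if confounder else 0.05)  # disease depends only on confounder
        people.append((exposed, diseased, confounder))
    return people


def risk_ratio(people):
    """Disease risk among the exposed divided by risk among the unexposed."""
    exposed = [d for e, d, _ in people if e]
    unexposed = [d for e, d, _ in people if not e]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


rng = random.Random(2013)
people = simulate(200_000, rng)

crude = risk_ratio(people)
with_factor = risk_ratio([p for p in people if p[2]])
without_factor = risk_ratio([p for p in people if not p[2]])

print(f"crude risk ratio: {crude:.2f}")
print(f"risk ratio within confounder strata: {with_factor:.2f}, {without_factor:.2f}")
```

The crude comparison makes the exposure look like it roughly doubles risk, while within each stratum of the confounder the risk ratio sits near 1. The catch, of course, is that stratifying is only possible if the confounder was measured, or remembered, in the first place.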

Epidemiology often relies on interviews or what was measured (and how it was measured) in widespread clinics and offices over decades.  Asking about your current diet may tell us something, but dietary risks are usually manifest much later, and only if your current diet reflects what you ate long ago, even as a child, will diet questionnaires be very informative.  This is widespread or even typical experience in epidemiology, generating serious data reliability issues that lead to the common ephemeral pronouncements about risk that we see in the news media almost every day.

In the 1950s, hula hoops made their debut and were very popular for a few years, far more than they are now.  The fad pretty much died out, though it may be resurging temporarily these days.  The kind of hip and spine rotation that hula hooping involved could, in principle, have later impact on arthritis or other bone disease (we're just being hypothetical, here).  But who would include hula-hoop use patterns in a lifestyle survey?  Instead, we'd look for genes!

Those of us who have been around for a considerable time have seen fads like testosterone salves come and go, often very quickly.  But even if physiological set-points, mutations, or other sorts of lasting effects result from the exposure, the exposure itself will to a great extent disappear from memory, often leaving the field to other, even lesser, confounding but measured factors that happen to be current at the time of the study, decades after the relevant exposures.

This is another way to illustrate the rather deep issues we face when it comes to important aspects of scientific knowledge, in a wide array of areas. 

Wednesday, October 23, 2013

Health effects of vitamin D - another epidemiological conundrum

Forget an apple a day keeping the doctor away; current wisdom suggests that a dose of vitamin D a day will keep every imaginable specialist away.  Vitamin D deficiency has been associated with cancer, cardiovascular disease, immune problems like asthma, flu, tuberculosis, autoimmune diseases like multiple sclerosis, low bone density and osteoporosis, and many other diseases.  As with so many epidemiological studies, however, the evidence for most of these is not unequivocal.  Indeed, it's not even clear what "normal" vitamin D levels should be.  Even so, vitamin D supplementation is big, with almost half of adults over 50 in the US currently taking supplements.

However, as reported in the NYTimes and elsewhere, a new paper in The Lancet ("Effects of vitamin D supplements on bone mineral density: a systematic review and meta-analysis", Reid et al., Oct 11) concludes that there's essentially no evidence that vitamin D supplementation improves bone density or prevents osteoporosis.

The paper is a report of a meta-analysis of twenty-three studies, with an average duration of about 2 years.  A total of 4082 participants, 92% women, with an average age of 59, were included in the studies in the meta-analysis.  Nineteen of the studies were of predominantly white women.  Baseline vitamin D was generally within normal levels, women were healthy by and large, and were given vitamin D doses between 500 and 800 units per day.  Some were also given calcium.  Bone mineral density was measured at one to five sites (lumbar spine, femoral neck, total hip, trochanter, total body, or forearm).  The analysis showed a small benefit only at one site (femoral neck), though not in all the studies, and no effect was found at any other site.  The authors conclude,
Continuing widespread use of vitamin D for osteoporosis prevention in community-dwelling adults without specific risk factors for vitamin D deficiency seems to be inappropriate.
Vitamin D is available in small amounts in some foods, but most mammals, including humans, make their own, induced by exposure to the sun.  Without a doubt, severe vitamin D deficiency can cause osteomalacia (softening of the bones) or rickets (softening of the bones in children), but other health effects aren't so clear.  And, optimal vitamin D levels haven't been agreed upon, and they may well vary by age, sex and ethnicity.

2-yr old rickets patient; Wikipedia

Indeed, people with darker skin don't synthesize as much vitamin D from sun exposure as lighter-skinned people do, but African American women are less susceptible to osteoporosis and fractures than are European women (much has been written about this paradox, including this).  So, even after decades of intense study, there's a lot that's not yet understood about vitamin D and its contributions to health.

Whether or how vitamin D boosts the immune system, or protects against the many diseases it is said to, are still open questions.  Multiple studies have shown that TB patients do have lower serum vitamin D levels than healthy controls, but it's not clear that low vitamin D preceded disease, and vitamin D supplementation doesn't seem to speed recovery.  Thus there is the question of cause vs correlation, statistical association vs confounding by unmeasured correlated causal factors.  The same kinds of equivocal findings are true of other diseases.  Further, clinicians don't agree on optimal levels, and excess vitamin D, too, may be associated with risk of some cancers or atrial fibrillation. Each person is likely to react differently to high or low doses, based on genetic variation or other lifestyle factors.

There's much still to be sorted out about the association between vitamin D and health.  Because results have not been definitive, it's likely that the effects of vitamin D are modest, at best, except for long known effects of severe deficiency.  Because vitamin D doesn't act alone, and the complex diseases it has been associated with are themselves associated with multiple risk factors, teasing out the role of vitamin D won't be straightforward.  Another epidemiological conundrum.

Tuesday, October 22, 2013

The mad farmer and the World Food Prize

The World Food Prize, also known as the 'Nobel Prize' for Food, was given to its 2013 recipients on October 18.  This year's winners were Mary-Dell Chilton, founder and researcher at Syngenta Biotechnology, Robert Fraley, chief technology officer at Monsanto, and Marc Van Montagu, founder and chairman of the Institute of Plant Biotechnology Outreach at Ghent University in Belgium.  All of these institutions have been involved in developing genetically modified crops.

The World Food Prize Foundation described the accomplishments of this year's winners this way:
Building upon the scientific discovery of the Double Helix structure of DNA by Watson and Crick in the 1950s, Van Montagu, Chilton, and Fraley each conducted groundbreaking molecular research on how a plant bacterium could be adapted as a tool to insert genes from another organism into plant cells, which could produce new genetic lines with highly favorable traits. 
The revolutionary biotechnology discoveries of these three individuals —each working in separate facilities on two continents—unlocked the key to plant cell transformation using recombinant DNA. Their work led to the development of a host of genetically enhanced crops, which, by 2012, were grown on more than 170 million hectares around the globe by 17.3 million farmers, over 90 percent of whom were small resource-poor farmers in developing countries. 
The combined achievements of the 2013 World Food Prize Laureates, from their work in the laboratory to applying biotechnology innovations in farmers’ fields, have contributed significantly to increasing the quantity and availability of food, and can play a critical role as we face the global challenges of the 21st century of producing more food, in a sustainable way, while confronting an increasingly volatile climate.
Biotech won big this year.

The World Food Prize Foundation was founded by Norman Borlaug, who himself received the Nobel Peace Prize in 1970 for his work fighting hunger around the world, by using biotechnology of the time to increase agricultural productivity.  Called the father of the Green Revolution, Borlaug led efforts to improve agriculture in non-industrialized countries in the 1940's, 50's and 60's with higher yield grains, hybridized seeds, increased use of synthetic fertilizers and pesticides, and improved irrigation techniques among other biotechnological innovations.  Borlaug is credited with saving a billion lives through his efforts.

Land area used for genetically modified crops by country (1996–2009), in millions of hectares. In 2011, the land area used was 160 million hectares, or 1.6 million square kilometers; Wikipedia

Food production did increase in countries that adopted Green Revolution techniques, particularly India, but, as agricultural journalist and economist Alan Guebert reminds us, there's a difference between productivity and efficiency.  He reports on his recent trip through the farmlands of California in his  Oct 20 Farm and Food File column.
If demographic California now looks like what experts say America will resemble in a generation or two—multi-cultural, multi-lingual, more crowded—then California’s agriculture may soon be America’s farming past.

The reason becomes clearer with every mile you travel in this beautiful, incredibly productive valley: It’s very hard to see any future to any food system that devours so many intensively concentrated resources—water, fuel, artificial fertilizer, chemicals—so America can eat strawberries in February. 
A few days later, a young environmental engineer in Berkeley disagrees when I offer that thesis. “California’s agriculture is too efficient to ever change,” he says.
Now I disagree. You’re confusing efficiency with production, I say. California is highly productive, no argument. But it is inefficient and without water it’s neither. 
This just about sums up the criticism many have of the Green Revolution: productive but not efficient.  Yes, productivity rose but at great cost -- the required energy input to produce food in this highly mechanized, biotech, fossil fuel-reliant way increased rapidly, and the crop output/energy input ratio has decreased over time.  Farmers around the world have become dependent on inputs -- fertilizers, pesticides and herbicides, many developed from fossil fuels, and hybrid genetically modified seeds -- which can be prohibitively expensive, and ultimately unsustainable.

The work of this year's Food Prize winners is entirely in keeping with Borlaug's view of how to feed the world.  Indeed, he knew the three of them, and hoped that their efforts would one day be recognized with this prize.  Genetically modified foods remain controversial, of course, but the winners hope that the prize will help quell the opposition.  Chilton was quoted in a USA Today story, saying, "My hope is this will put to rest the misguided opposition" to the crops...  She called genetically modified organisms a "wonderful tool" in the fight against hunger.

But, despite what protesters so often say, genetically modified foods per se are not the problem.  The problem is the technologies required to produce them, which are neither efficient nor sustainable, nor progress toward long-term food security.  Much of agribusiness depends on monocropping, which, as we wrote last week (here and here), can be less productive than rotating crops, and the cause of more soil erosion, and increased dependence on toxic chemicals and fossil fuels.  And the grains produced are annuals rather than perennials and, because they are patented, farmers must buy seed every year, and so become dependent on suppliers and their herbicides and pesticides, and on conventional agricultural practices.
The three nested systems of sustainability; the economy wholly contained by society, wholly contained by the biophysical environment: Wikipedia
Ecological economists, like Herman Daly or Josh Farley (co-authors of "Ecological Economics: Principles and Applications") or Rob Dietz (co-author of "Enough is Enough: Building a Sustainable Economy in a World of Finite Resources") would say that the problem is even more fundamental than dependence on biotechnology.  The problem extends to how we think about economies and ecosystems in general.  As Richard Heinberg says in his 2011 book "The End of Growth: Adapting to Our New Economic Reality", 18th and 19th century economic philosophers, like Malthus, for example, considered land (or natural resources), labor and capital to be the three essentials of the economy. In this equation, because natural resources are finite, growth must necessarily come to an end at some point.

Then Adam Smith came along, and with him, the idea that the economy could continue to grow and grow.  Heinberg says this is because of the "gradual deletion by economists of land from the theoretical primary ingredients of the economy (increasingly, only labor and capital really mattered, land having been demoted to a subcategory of capital)."  Ecological economics, however, returns to the view that natural capital is finite and in light of this, argues in favor of steady-state no-growth, or even degrowth economics, rather than the prevailing model, the growth economy.

In this view of the world, it could be said that this year's World Food Prize rewards the right people doing the wrong things.  Biotechnology harms the environment, leads to food insecurity, creates the farmer's and thus the consumer's dependence on industries that must attend first to their bottom line, and perpetuates economic practices based on the idea that natural resources are unlimited.  We know this, and we know what should be done instead.  Will it happen?

We end with a poem by farmer and poet of the land, Wendell Berry. 

The Mad Farmer Revolution

Being a Fragment
of the Natural History of New Eden,
in Homage
To Mr. Ed McClanahan, One of the Locals

The mad farmer, the thirsty one,
went dry. When he had time
he threw a visionary high
lonesome on the holy communion wine.
"It is an awesome event
when an earthen man has drunk
his fill of the blood of a god,"
people said, and got out of his way.
He plowed the churchyard, the
minister's wife, three graveyards
and a golf course. In a parking lot
he planted a forest of little pines.
He sanctified the groves,
dancing at night in the oak shades
with goddesses. He led
a field of corn to creep up
and tassel like an Indian tribe
on the courthouse lawn. Pumpkins
ran out to the ends of their vines
to follow him. Ripe plums
and peaches reached into his pockets.
Flowers sprang up in his tracks
everywhere he stepped. And then
his planter's eye fell on
that parson's fair fine lady
again. "O holy plowman," cried she,
"I am all grown up in weeds.
Pray, bring me back into good tilth."
He tilled her carefully
and laid her by, and she
did bring forth others of her kind,
and others, and some more.
They sowed and reaped till all
the countryside was filled
with farmers and their brides sowing
and reaping. When they died
they became two spirits of the woods.

On their graves were written
these words without sound:
"Here lies Saint Plowman.
Here lies Saint Fertile Ground."

                       Wendell Berry

Monday, October 21, 2013

How many 'human' species are there? Is it even a real question? Why does anybody care? The Dmanisi skulls

I was a graduate student in Anthropology at the University of Michigan in the 1970s.  It was one of the leading departments at the time, and the home of an ecological and evolutionary genetic framework for viewing human evolution. The view was based on what is known as the 'modern evolutionary synthesis' from the 1930s, in which Mendelian inheritance of genetic variants, and Darwinian evolutionary dynamics were united.  This union allowed one to interpret the fossil record in terms of species diversity, adaptation, and evolution within a single genetic framework--at least in theory.  The evolutionary population genetic approach differed from the more stereotypically categorical approaches of morphological schools of thought that, to oversimplify, took type specimens to represent species.

Michigan's leading lights at the time were the late (and much missed) Frank Livingstone who modeled genetic population dynamics, and Loring Brace who analyzed fossils.  Milford Wolpoff showed up towards the end of my time there.  Many distinguished future paleontologists were there as students.

Genetic and morphological variation and species formation: a problem!
The modern synthesis allowed one to relate intra- and inter-population variation, rather than 'type', to evolutionary principles and dealt among other things with how much variation is found within a species and how much between species.  Variation represented genetic diversity (even if the actual genes were not known), and the idea was that after a certain amount of time had elapsed and diversity had accumulated, new species formed.  Darwin himself just hand-waved about this (Chapter 4 of Origin of Species), but the implication was that when you had enough variation you could infer new species.

This idea has had widespread acceptance but if you think about it, it doesn't really hold water because there is no definition of what 'enough' is--and only a somewhat artificial definition even of 'species'.  As human beings make very clear, morphological diversity is not the same as species diversity: people from the ends of the world can mate successfully even though we differ in adaptive morphological ways (e.g., skin color) and humans from the ends of the world have been separated by around 100,000 years.

A challenging example of morphological species assessment is sexual dimorphism.  Males and females from the same species can be quite different and while we can try to explain sexual dimorphism in terms of the behavior of living species, there's no simple rule.  Humans, chimps, and gorillas are all very closely related, but hugely different in their sexual dimorphism.  This means we're on shakier ground when it comes to fossils. 

Behavioral variation and species formation:  also a problem!
In the atmosphere of the 1970s, we had to use more abstract genetic and ecological theory to try to make sense of the scattered fossil record known at the time and when we had little genetic data.  One of the major theoretical arguments in the Michigan school was what is called the Competitive Exclusion Principle (CEP).  This asserted that two species could not occupy the same ecological niche: they must differ in important ways in the places, foods, times, and so on that they lived.  If they overlapped too much, they would in a true Darwinian sense compete too severely, and one would die out.  Or, perhaps, the two could never have evolved in the first place to be so similar.

This idea has since been revised here and there, but if you think about it, it's rather vague if not tautological just as is the problem of defining species.  One can always find some difference in diet, location, habits, or behavior, and proclaim that the CEP was being confirmed--each species has its own 'niche'!  That's rather vacuous, no matter how neatly Darwinian it sounds.  After all, not even two members of the same population in the same species eat or behave exactly alike.

In any case, how would the CEP work in the case of humans or hominids (our ancestral species since separating from common ancestry with chimpanzees)?  As Loring Brace used to argue in papers and texts, culture is the human ecological niche.  Culture includes technology, language, ritual, symbolism, and so on, and Brace's argument was that a species with culture would simply drive out of existence any other species that also had culture.  The idea was that only one species could have culture.  There's only one species of humans, right?  As with the CEP in general, this may sound OK unless you think too carefully about it.  It's a kind of argument by definition. We have different cultures around the world, or even nearby, and they tolerate each other.  Or, if it is culture per se, rather than cultural difference, how can that be defined, and how can one in any serious way 'prove' that no two hominid species could have culture in the same place (not to mention different places)?  Does using stone tools count as culture, or not?  What level of verbal activity is culture, and what is just, well, grunts?  Single Species is about as non-definitive or even tautological as CEP.

The Single Species hypothesis
Regardless of these epistemological problems, which generally weren't raised at the time, the bottom line at Michigan was the Single Species assertion.  This was a hyper 'lumper' view of human paleontology, as opposed to the 'splitter' view in which many different contemporary species were asserted based on the fossil record (investigators often naturally wanting to be able to give species names, and hence confer special importance, to the bones they found--something that continues to this day).

Most anthropologists  at the time accepted the existence of a diversity of contemporary species often if not typically during the time since we and chimps shared a common ancestral species. This hyper-splitter view was held even to the point (as Brace often satirically pointed out) that many leading anthropologists seemed unwilling to accept that any of the known fossils, morphologically crude relative to us moderns as they seemed, were actually our noble species' ancestors:  all the fossil specimens were from side branches that became extinct!

One aspect of this had to do with whether modern-looking fossils represented humans who expanded out of Africa around 100,000 years ago and exterminated the other existing hominids for whom we have fossils dating back even well before a million years ago in Africa and Eurasia.  The replacement hypothesis was opposed by the hypothesis, in some ways based on Single Species thinking, that humans had always evolved globally as a single species, evolving continuously as new genetic abilities arose here or there and spread by diffusion through mating between adjacent populations.

In the last several decades, a steady march of new finds has led to a diminution of the Single Species hypothesis, especially in the earliest hominid finds around 2 million years ago.  Even some of the Michigan types (not to be named) claimed they'd never really argued for a single species.  There is at present little or no argument against the existence of multiple species, at least at our hominid origins (Australopithecines and other forms).

Single vs multiple species issues recently have pertained mainly to the more recent period.  However, a new find in Dmanisi, Georgia, that has just been described (see below) is viewed as answering that question.  But does it?

Modern genetics to the rescue?
Earlier, I referred to the viewpoint back in my graduate student days as a genetically based one.  It was that, but only in principle.  Other than a few blood group markers, like Rh and ABO, we had very little actual data on genetic diversity even among humans and even less between species (that is, few variants--'alleles'--in humans even were known also to exist in chimpanzees).  So genetic arguments were largely theoretical.

But from the 70s onward, genetic data (on protein variants) from global samples of modern humans began to accumulate.  This led to controversy because the date of separation of all of us moderns from our common ancestor that was estimated from this genetic variation corresponded to the finding of modern-looking fossils first in Africa and then expanding elsewhere at around 100,000 years ago.  But the existence of million year old fossils from Eurasia tended to undermine the Single Species hypothesis and support the view that two contemporary hominid species, one old and one modern, competed, and our ancestors won out completely.

People were arguing about this, and in one of my very first papers, in 1976 (Am J Phys Anth 44:31-49), the population geneticist Takeo Maruyama and I looked at the allele-frequency data, sampled from different continental populations (Europe, Asia, and Africa), from which the 100,000 year global human separation time had been estimated.  We did computer simulations to show that gene flow between neighboring populations--the kind of mate exchange that human hunter-gatherers as well as our primate relatives engage in, and that hence likely applied at the time--would be able to spread genetic variation fast enough, and so reduce inter-group differences enough, that an actual 1,000,000 year separation could look like a 100,000 year separation.  That is, populations would look too closely related if you assumed no gene flow among them, as the typical genetic tree-making models of the time did.  Indeed, we argued that
"It seems futile to employ population genetic models and present protein data to attempt to reveal about the past what the actual, ancestral specimens themselves cannot."
This was in essence a single-species argument that suggested that no literal replacement of one hominid species by another need have taken place.  Where does this argument stand now?  Since that time, we have vastly more direct genetic data from fossils themselves, and not just from frequencies of various alleles but from whole DNA sequences.  This enables one to compare sequences among Neandertals and other fossil groups as well as chimpanzees and modern humans and look at their differences and similarities.
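To give a feel for the kind of gene-flow argument described above, here is a minimal sketch in Python.  It is not the 1976 simulation, whose details are not given here; it is a toy model with made-up assumptions: a row of small populations ('demes') that start with identical allele frequencies, drift at random each generation, and optionally exchange a fixed fraction of genes with their immediate neighbors.  The function name and every parameter value (deme size, migration rate, number of generations and loci) are hypothetical, chosen only for illustration.

```python
import random


def simulate_divergence(n_demes=5, n_diploid=50, generations=100,
                        migration=0.0, n_loci=40, seed=1):
    """Toy stepping-stone model (illustrative assumptions throughout).

    Returns the variance in allele frequency among demes,
    averaged over independent loci, after the given number of generations.
    """
    rng = random.Random(seed)
    n_genes = 2 * n_diploid  # gene copies per deme
    total = 0.0
    for _locus in range(n_loci):
        freqs = [0.5] * n_demes  # every deme starts out identical
        for _gen in range(generations):
            # Gene flow: each deme takes a fraction `migration` of its
            # genes from the average of its immediate neighbors.
            mixed = []
            for i, p in enumerate(freqs):
                neighbors = [freqs[j] for j in (i - 1, i + 1)
                             if 0 <= j < n_demes]
                neighbor_mean = sum(neighbors) / len(neighbors)
                mixed.append((1 - migration) * p + migration * neighbor_mean)
            # Drift: binomial sampling of gene copies within each deme.
            freqs = [sum(rng.random() < p for _ in range(n_genes)) / n_genes
                     for p in mixed]
        mean = sum(freqs) / n_demes
        total += sum((p - mean) ** 2 for p in freqs) / n_demes
    return total / n_loci


# Same elapsed time, with and without exchange between neighbors.
isolated = simulate_divergence(migration=0.0)
connected = simulate_divergence(migration=0.1)
print("among-deme variance, isolated: ", isolated)
print("among-deme variance, connected:", connected)
```

In this sketch the connected demes end up less differentiated than the isolated ones after the same number of generations.  Anyone reading the smaller figure through a model that assumes no gene flow would infer a shorter separation than actually occurred, which is the sense in which gene flow can make a long separation look like a short one.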

What has been seen suggests that there was in Eurasia some inter-group admixture.  For example, one can see close sequence similarity across a given chromosome among the fossils, but this is interrupted here and there by runs of rather different, and seemingly more distant sequence that seem to come not from the same lineage but from more distantly related ones.  Such 'introgression' is seen, as are sequence elements in these same genomes that may have quite different apparent times of common ancestry.  The general interpretation of such evidence is that after some thousands of years of separation, individuals came together (so to speak) and produced offspring.

That of course implies that, by the classical species definition of reproductive incompatibility, not only did these guys not exterminate each other or manifest the CEP (at least, not immediately), but they were reproductively compatible with each other.  Nor do these data preclude the gene flow time-diminution issue.  Hello, single species hypothesis again?

The new Dmanisi find
Now we see the news, published in the Oct 18, 2013 issue of Science, and reported all over the media, of a remarkably well-preserved new fossil from Dmanisi in Georgia (the European one).  Here is one view of this skull:

Dmanisi specimen, Georgian National Museum

This is about 1.8 million years old, and deposited remarkably close in time to this one, archeologists say, were several other skulls that look 'very' different from it.  The variation is said to be similar to that of "patterns and ranges of variation in chimpanzee and bonobo demes....and in a global sample of H. sapiens," that is, of worldwide human variation.  Based on these interpretations, the conclusion is that these are of the same species.  Hence the splitters have been wrong.

But to compare this very local variation to our species and its 'racial' variation, is rather curious.  How, given their very local means of support and movement, could ancestral groups as diverse as, say, Amerindians, Koreans, Nigerians, and Finns be found in one local site?   This perspective would argue strongly against the inferences from the site as to times of deposition, or their population assertions.  Local demes that a single carnivore could prey on would intermix, if basically all other primates are a model.  They couldn't have stayed so diverse--unless they were, in fact, different species.

But if the interpretation of the coming together of human-like variation to a single site stretches the imagination, so does the alternative: how could such different groups be so locally different in the first place?  We're not talking about Columbus and some African slaves suddenly arriving in the Americas from long-distance sailing voyages!  How could they have ever become so diverse--unless they were, in fact, different species?  How much variation is too much variation for a single species in a single location?  The point here is to ask, rather than answer these questions.

 The instant and breathless pronouncements about the new Dmanisi specimen seem to replay the old Single vs Multiple Species hypotheses. But, perhaps as usual in the raucously public arena of human paleontology, there seems to be more noise than signal here, in the sense that all elements of the population ecology, as I've tried to suggest above, are vague and verge on tautological.  For example, the authors argue for single species using the arguments summarized above, invoking the idea that this is the most 'parsimonious' hypothesis--that is, the simplest explanation consistent with the facts.

But in the context of the above considerations among others, how is one to determine what parsimony means here---and, more importantly, what does one imagine justifies the 'parsimony' principle in such a situation?  Is the explanation 'consistent' with the facts?  Evolution is usually polyphyletic (branches of contemporary related species), so is 'single species' really simpler?  One might say that Loring Brace had to essentially coin the cultural-niche hypothesis to assert theoretical support for a single-species view.  From a process point of view, does evolution necessarily follow the simplest path, even if we could define what that is or might have been?  By what reasoning would one assert that?

Parsimony is a nice organizing idea, and usefully tends to reduce explanatory clutter and ad hoc or post hoc invention of reasons to fit an author's predispositions.  But in fact it has no authority in science (one could argue not even in physics), and in cases as vague as paleontological reconstruction is entirely too flexible, mainly serving as an assertion.  Single-species may be right or it may be wrong, but conclusions are being jumped to with rather minimal theoretical underpinning.  At 1.8 million years old, there will be no DNA to ride to the rescue.

Does the question that Takeo and I asked, nearly 40 years ago, still stand, or has it been answered?  If the admixture interpretation of the Neanderthal data is correct, then at least to some extent, admixture did occur.  By the usual species definition, these creatures at that time were a single species, no matter how genetically variable.  But the separation times of modern humans, based now on huge amounts of global DNA sequence data, still are about 100,000 years, matching what some, at least, argue on the basis of known fossil morphologies.

Could gene flow have artificially reduced this apparent time?  Evidence for introgression--admixture--between what had been long-separated groups seems convincing at present, and in a technical sense settles the argument in favor of a single species.  But how far back?  My thinking is that while direct sequence rather than indirect allele-frequency data are better, the issues are not closed--neither by much better genetic data, nor by many more and better fossils. Perhaps new simulation studies are in order (I've developed a program that could do that, but haven't done it yet).

None of this makes the Dmanisi specimen unimportant, though whether it's the Find of The Century (for today) or just a very interesting find is as debatable as one could ever want an issue to be if one wants to spill ink over it.  The species question is at least fascinating to many people.  Whatever practical evolutionary difference it would make in the case of such very similar peoples, the question will keep many anthropologists gainfully employed for years to come, whether or not it settles any arguments!

The Dmanisi find is being discussed by many others, including very knowledgeably in blog posts by John Hawks and Adam van Arsdale.