Tuesday, February 14, 2012

Ptolemaic genetics: epicycles of lobbying

That was then...
[Image: Ibn al-Shatir's model for the appearances of Mercury, showing the multiplication of epicycles in a Ptolemaic enterprise. 14th century CE (Wikimedia Commons).]
Way back then, in the dark ol' days of science, the Greco-Roman astronomer Claudius Ptolemy (90-168 AD) tried to explain the position of the planets in terms of divinely perfect circles of orbit around God's home (the Earth).  The idea that we were at the center of perfect celestial spheres was a standard 'scientific' explanation of the cosmos and our place in it.

But the cantankerous planets refused to play by the rules, and their paths deviated from perfect circles.  Indeed, occasionally they seemed to move backward through the skies!  Still, perfect circular orbits around Earth simply had to be true based on the fundamental belief system of the time, so astronomers invented numerous little deviations, called epicycles, to make the (we now know) elliptical orbital pegs fit the round holes of theory.

And then along came Nicolaus Copernicus (1473-1543 AD).  And the cosmos was turned inside out: the Earth was not the center of things after all!

Thomas Kuhn famously described in The Structure of Scientific Revolutions how the best and the brightest scientists struggle valiantly to fit pegs into holes they don't really fit, until some bright person comes along and shows the benighted herd a better way to account for the same things.  Copernicus, Galileo, Newton, Einstein, and others were the knights in shining armor who inaugurated some of the most noteworthy of these occasional 'scientific revolutions.'  Darwin's evolutionary ideas are also a classic example.

The same kind of struggle is just what is happening now in genetics and evolutionary biology--indeed in many other fields in which statistical evidence runs headlong into causal complexity.  Whether, when, or what knightly change will occur is anyone's guess.

And this is now
Everyone remembers the hoopla that greeted the sequencing of the human genome when it was announced (or rather, each time it was announced) -- we were promised that by now we would not only know why people were sick, but would also be able to predict what we'd get sick with in future.  None other than Francis Collins promised that this would be a silver-bullet reality by the early 21st century.  Others were promising lifespans in the centuries: all of us would be Methuselahs!

So, all those illnesses would now be treatable or preventable in the first place. How?  Well, the genome would allow us to identify druggable pathways, and common diseases must be due to common genetic variants (an idea that came to be known as common disease common variant, or CDCV), and if we could just identify them, we'd be in business.  After all, didn't Darwin show us that everything about everything alive was due to genetic causation and natural selection?  If that's the case, we should be able to find it, and our wizardry at engineering would take the ball and run with it.  Big Pharma jumped on the 'druggable' genome bandwagon and people running big sequencing labs jumped on the CDCV idea, and genomewide association studies (GWAS) were born.  And then the 1000 Genomes project, and all the -omics projects....  Big is better, of course!  Not that these efforts weren't questioned at the time, based on what everyone should have known about evolution and population genetics, but the powers-that-be plowed ahead anyway.

Well, we're no longer in a minority of naysayers.  It's widely recognized that GWAS haven't been very successful, relative to the loud promises being trumpeted only a few years ago.  And even the successes they have had -- and numerous genes associated with traits have been identified, it must be said -- typically explain only a small amount of the variation in disease, or any trait, in fact.  So now researchers are working on automating the prediction of disease from gene variants based on protein structure and other DNA-based clues.  But the assumption--the belief system, really--is still that the answer is in the DNA, and disease prediction is still going to be possible.

A piece in the Feb 9 issue of Nature describes a number of state-of-the-art approaches to predicting the effects of DNA variants, in part based on what amino acid changes do to proteins.  The idea now is that diseases are going to be found to be due to rare variants, and the challenge is to figure out what these variants do.  In part, evolution will help us to do this.
"Sequencing data from an increasing number of species and larger human populations are revealing which variants can be tolerated by evolution and exist in healthy individuals."
But are we trying to explain a current disease, or predict the diseases someone will eventually get? These are different endeavors, though it may often be inconvenient to acknowledge that.  Rare pediatric diseases that are due to single genetic mutations, or genetic diseases that cluster in families (and, again, are usually rare and have a young age of onset) are easier to parse than the complex chronic diseases that most of us will eventually get.  But, based on the comparison of the genomes that have already been sequenced, we now know that we all seem to differ from each other at something like 3 million bases.  That is, we all have a genome that has never existed before and never will again. Assigning function to all that variation ranges from daunting to impossible -- not least because a lot of it might not even have a function.  And the idea that we'll eventually be able to make predictions from those variants is based on questionable assumptions.

It's true in one sense that every disease we get is genetic -- everything that happens in our body is affected by genes -- but in another sense, much of what happens is a response to the environment, and so is environmentally determined--that is, it is not due to genetic variation in susceptibility.  Predicting a disease from genes when it's due to the combined action of genes and environment, therefore, is a very challenging problem.

Here is just one example of why: Native Americans throughout the Americas are about 65 years into a widespread epidemic of obesity, type 2 diabetes and gallbladder disease, diseases that were quite rare in these people before World War II.  There are a number of reasons to suspect that their high prevalence is due to a fairly simple genetic susceptibility.  But, if gene variants (still not identified) are responsible, they have been at high frequency in the descendants of those who crossed the Bering Strait from Siberia for at least 10,000 years -- which means that variants that are now detrimental were "tolerated by evolution and exist[ed] in healthy individuals" for a very long time.

If geneticists had wanted to predict 70 years ago what diseases Native Americans were susceptible to, these variants would have been completely overlooked, because they weren't yet causing disease.  And indeed these 'risk' genes, whatever they be, were benign -- until the environment changed.  We're all walking around with variants that would kill us in some environment or other, and since we can't predict the environments we'll be living in even 20 years from now, never mind 50 or 100, the idea that we'll be able to predict which of our variants will be detrimental when we're old is just wrong. In fact, we're each walking around with substantial numbers of mutant or even 'dead' genes, with apparently no ill effect at all -- but who knows what the effect might be in a different environment.
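The point can be made concrete with a toy calculation. What follows is only a sketch under loudly stated assumptions: the two risk numbers, the environment labels, and the function names are all invented for illustration and come from no study. It simply shows that a variant whose effect depends on the environment contributes no detectable excess risk until the environment changes.

```python
# Toy gene-by-environment model. Every number and name here is hypothetical,
# chosen only to illustrate the argument; none comes from real data.
BASELINE_RISK = 0.02      # assumed risk for anyone in the old environment
INTERACTION_RISK = 0.30   # assumed risk for carriers in the new environment


def disease_risk(carries_variant: bool, environment: str) -> float:
    """Return the made-up disease risk for a genotype in an environment."""
    if carries_variant and environment == "modern":
        return INTERACTION_RISK
    return BASELINE_RISK


def excess_risk(environment: str) -> float:
    """Risk difference between carriers and non-carriers in one environment."""
    return disease_risk(True, environment) - disease_risk(False, environment)


for env in ("traditional", "modern"):
    print(env, round(excess_risk(env), 2))
```

In the "traditional" environment, carriers and non-carriers are indistinguishable, so a study done then would find nothing to predict from; the same variant looks like a strong risk factor only after the environment shifts.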

But, ok, some of us do have single gene variants that make us sick now.  Many of these have been identified, most readily when a family of affected individuals is examined (though knowing the gene is rarely of use therapeutically), but many more remain to be found.  The current idea is that this can be done by looking for mutations in chromosome regions that are conserved among species, and figuring out which of these change amino acids (and thus the protein coded for by the gene).  The idea is that unvarying regions are unvarying because natural selection has tested the variants that arose and found them wanting, thus eliminating them from the population.  They must, therefore, be functionally important!
"A host of increasingly sophisticated algorithms predict whether a mutation is likely to change the function of a protein, or alter its expression. Sequencing data from an increasing number of species and larger human populations are revealing which variants can be tolerated by evolution and exist in healthy individuals. Huge research projects are assigning putative functions to sequences throughout the genome and allowing researchers to improve their hypotheses about variants. And for regions with known function, new techniques can use yeast and bacteria to assess the effects of hundreds of potential mammalian variants in a single experiment."
This is potentially useful, because for those with single gene mutations that cause disease -- 1 variant among 3 million other ways in which each person differs from everyone else -- homing in on the causative mutation is, again, difficult to impossible if you don't have a large family with similarly affected individuals in which to confirm the association of mutation and disease.

Well, if we can do with or without a protein (or other functional DNA element), depending on the variation we have across the genome, then even when the element is important its variation in a given individual may not be causal: there are many examples where that is clearly true.  Further, the same kind of evolutionary reasoning would say that centrally important -- and hence highly conserved -- parts of the genome probably cannot vary much without being lethal, largely to the embryo.  So, from that equally sound Darwinian reasoning, we would expect that disease-associated variation will be in the minor genes with only little effect!  So the 'evolutionary conservation' argument cuts both ways, and it's not at all clear which way it cuts more sharply.  It's a great idea, but in some ways the hope that searching for conservation will bail us out is just more wishful thinking to save business as usual.

[Image: Methuselah (Della Francesca ca. 1550)]
To complicate things even more, not all amino acid changes cause disease, or even do much of anything.  And again, sometimes they will only be harmful in a given environment.  And, of course, not all diseases are caused by protein-changing mutations -- sometimes they are caused by disturbances to gene regulation.

In fairness, the multitude of researchers trying to make sense of the limitless genetic variation that is pouring out of DNA sequencers recognize that it's complicated.  But then, why are they still saying things like this, as quoted in the Nature piece: “The marriage of human genetics and functional genomics can deliver what the original plan of the human genome promised to medicine.”

What's to the rescue?  Do we need another 'scientific revolution'?
We have no idea when or if our current model of living Nature will be shown to be naive, or whether our understanding is OK but we haven't cottoned on to a seriously better way to think about the problems, or indeed whether the hubris of computer and molecular scientists' love of technology will, in fact, be victorious.  If it comes, it could be.  But we are certainly in the midst of a struggle to fit the square truths about genetics and evolution into the round holes of Mendelian and Darwinian orthodoxy.

Perhaps the problem to be solved is how to back away from enumerative, probabilistic, reductionistic treatment of complex, multiple causation, and to make inferences in other ways.  We need to understand causation by numerous, small or even ephemeral statistical effects, without our current enumerative statistical methods of inference. In terms of the philosophy of science, doing that would require some replacement of the 400-year-old foundations of modern science, based on reductionistic, inductive methods that enabled science to get to the point today where we realize that we need something different.

The situation here is complicated relative to scientific revolutions in Copernicus', Newton's, Darwin's or even Einstein's time by the large, institutionalized, bureaucratized, fiscal juggernaut that science has become. This makes the rivalries for truth, for explanations that this time will finally, really, truly solve the complexity problem, even more frenzied, hubristic, grasping, and lobby-driven than before.  That adds to the normal amount of ego all of us in science have, the desire to be right, to have insight, and so on.  Whether it will hasten the inspiration for a transforming better idea, or will just force momentum along incremental paths and make real insight even harder to come by, is a matter of opinion.

Sadly, the science funding system, including the role of lobbying via the media, is so entrenched in our careers that dishonesty about what is claimed to the media or even said in grants is widespread and quietly acknowledged even by the most prominent people in the field: "It's what you have to say to get funded!", they say.  But where does dissembling end and dishonesty begin when it comes to the design and reporting of studies (and, here, we're not referring to fraud, but to misleading results and overpromising the importance of the work)?  The commitment to the ideology and the promises restrains freedom of thought, and certainly dampens innovative science.  But it's a trap for those who have to have grants and credit to make their living in research institutions and the science media.
[Image: Zip-line over rainforest canopy, Costa Rica (Wikimedia)]

But right now, scientists are like tropical trees, struggling mightily to be the one that reaches the sunlight, putting the others in their shade. What we need is a conceptual zip-line over the canopy.
