Thursday, December 12, 2013

What domestication can and can't tell us about evolution

Domestication is the harnessing of one species by and for the benefit of another, usually via selective breeding. Humans are master domesticators, but we're not the only ones.  There are ants that farm fungi and milk aphids, and it has also been suggested that the Melissotarsus ant of continental Africa and Madagascar has domesticated a scale insect for its meat or, if not for meat, for some other nutritional benefit such as its waxy secretions. Whether these ant/domesticate relationships depend on genetic changes is not clear, at least to us.

Two key differences between our domestications and theirs are, first, that humans work teleologically, or so we assume!  That is, in 'artificial selection' the domesticators had some end in mind--more yield, easier harvestability, drought resistance, and so on.  Secondly, we presume that in the past, as now, this process involved not just purposive breeding, but strong selection--much faster and more directed than natural selection.  If domestication was slow or inadvertent, the resulting genetic picture may differ.

Humans appear to have begun to domesticate plants and animals ~12,000 years ago, and a new review in Nature Reviews Genetics ("Evolution of crop species: genetics of domestication and diversification," Meyer and Purugganan, online 18 Nov 2013) reports that recent work has identified genetic signatures of that artificial selection in plants, and of subsequent diversification of these crops.  These studies "reveal the functions of genes that are involved in the evolution of crops that are under domestication, the types of mutations that occur during this process and the parallelism of mutations that occur in the same pathways and proteins, as well as the selective forces that are acting on these mutations and that are associated with geographical adaptation of crop species."

Charles Darwin, of course, based much of his argument about evolution and natural selection in The Origin of Species on observations about artificial selection from the breeding of plants and animals for food.  He writes of this explicitly in these frequently quoted words from his autobiography:
..After my return to England it appeared to me that by following the example of Lyell in Geology, and by collecting all facts which bore in any way on the variation of animals and plants under domestication and nature, some light might perhaps be thrown on the whole subject. My first note-book was opened in July 1837. I worked on true Baconian principles, and without any theory collected facts on a wholesale scale, more especially with respect to domesticated productions, by printed enquiries, by conversation with skilful breeders and gardeners, and by extensive reading. When I see the list of books of all kinds which I read and abstracted, including whole series of Journals and Transactions, I am surprised at my industry. I soon perceived that selection was the keystone of man's success in making useful races of animals and plants. But how selection could be applied to organisms living in a state of nature remained for some time a mystery to me.
Fifteen months after I had begun my systematic enquiry, I happened to read for amusement Malthus on Population, and being well prepared to appreciate the struggle for existence which everywhere goes on from long-continued observation of the habits of animals and plants, it at once struck me that under these circumstances favourable variations would tend to be preserved, and unfavourable ones to be destroyed. The result of this would be the formation of a new species.
Here, then, I had at last got a theory by which to work; but I was so anxious to avoid prejudice, that I determined not for some time to write even the briefest sketch of it.   
Indeed, after the Origin, Darwin published two hefty volumes on the effects of artificial selection on plants and animals.

Now, in a modern parallel, Meyer and Purugganan argue that understanding the genetic underpinnings of domestication can shed light on evolutionary processes in general, specifically because domesticated crops are recent, selection was strong and (presumably) consistently directional, and there is good archaeological and historical evidence of the origin, spread and diversification of domesticated crops.

Humans first began to domesticate plants about 12,000 years ago in the Middle East and Fertile Crescent, but plants were also domesticated elsewhere--in China, Mesoamerica, South America, sub-Saharan Africa, and North America--between 10,000 and 6,000 years ago.  Many of these events were independent of each other, and involved totally different species, providing, in principle at least, multiple independent views of the genomic aspects of the process.  Artificial selection often involves genetic changes that reduce a plant's fitness in the wild, and species that are completely domesticated cannot survive without human intervention in their reproduction and growth.  The process may be rapid, or may take thousands of years.

After domestication, in the "improvement phase", the species can diversify and spread, involving genetic and phenotypic changes that allow adaptation to different ecosystems and climates, generally as a response to selection pressure on chosen traits.  Many such traits have been selected for, but generally they have to do with increasing quality, yield and ease of farming: milk yield in dairy animals, and tameness or ability to reproduce under domestication, for example.  Or, in grasses, the evolution of larger seeds than in wild relatives and, crucially, a non-shattering rachis.

Commonly observed traits accompanying domestication and diversification; Table 1, Meyer and Purugganan, 2013

When wild wheat is ripe, for example, the rachis (the stem on which the spikelets grow) easily shatters, allowing seeds to disperse in the wind or when otherwise disturbed.  This isn't desirable in a crop plant, which the farmer wants to be able to harvest at his or her chosen time, and domesticated wheat has a history of selection for a less brittle rachis, so that the seed remains in situ until ready for harvest.
 
Hulled wheat vs free-threshing wheat (wild vs domesticated); Wikimedia




Genes associated with domestication and diversification have been identified with fine-mapping or GWAS, primarily in maize and rice, although, say Meyer and Purugganan, identifying causal mutations has been difficult, even where functional studies have been done.  In addition, they point out, it can be difficult to distinguish mere correlation with domestication and diversification from causation.

The first "domestication gene" identified was teosinte branched1 (tb1), which is responsible for differences in shoot architecture between wild and domesticated maize.  Not all changes can be traced to a single gene, however.  Hundreds of domestication genes and loci have been identified in other plants as well, largely in cereal crops, although the specific genes responsible and their functions are often difficult to narrow down--for many of the same reasons that we have trouble finding 'the' gene or genes 'for' human disease and other traits.

Architecture of domesticated maize vs wild teosinte; Doebley, 2003
Even with these complications, however, Meyer and Purugganan evaluate the role in domestication or diversification of 60 specific genes that have been functionally validated and/or included in population genetic studies.  They note that many of these are regulatory genes that control the timing, amount, or cell-context of a gene's usage.  And many of the genetic changes correlated with domestication and diversification are nonsense mutations or frameshift indels that alter the protein coded for by a gene, and result in the "large phenotypic effects that are observed during crop evolution."

Among the genes the authors identify with domestication are those involved in regulation of inflorescence development (an inflorescence is the cluster of flowers on a stem that will become seeds; see image above), vegetative growth habit and height, seed pigment, size, casing, nitrogen access and efficiency, and fruit flavor in strawberry, and so forth. Diversification genes include those involved in fruit shape and size, inflorescence architecture, color, starch composition, dwarfism, flowering time, and more. 

It is difficult to know whether a mutation is a precursor to domestication or diversification, or simply happened to arise at around the same time.  People might have noticed a precursor and chosen to breed it, for example.  Some variants are present in the wild plant but at much lower frequency than in the cultivated plant, which suggests that they, in conjunction with other genetic changes, may be associated with domestication.

It's possible that, because domestication is selection on a trait, not a gene, the underlying genetic architecture of a trait is different in different species.  This is certainly true for many phenotypes not related to domestication, so it wouldn't be unexpected.  Meyer and Purugganan suggest that finding such parallelisms can explain the genetic basis for Darwin's idea of "analogous variations" and for the early 20th-century Russian botanist and geneticist Nikolay Vavilov's Law of Homologous Series (I can't resist noting here that Gary Nabhan writes beautifully about Vavilov's life and work in his 2009 book, "Where Our Food Comes From: Retracing Nikolay Vavilov's Quest to End Famine").  This can happen by phenogenetic drift, or by parallel evolution.

This paper is an excellent reprise of the state of knowledge of crop domestication -- with one quibble.  Meyer and Purugganan write that "Domestication provides a fascinating model for the study of evolution..."  Darwin thought so, too, but artificial selection, the basis for domestication, is directed, strong, and often fast.  Not only is natural selection generally much weaker, but it is not directed, and evolution by natural selection is typically slow.  With complex causation, strong selection should often have very different genomic consequences from weak selection -- the former being more single-gene, or at least simpler, in nature.  But even with relatively simple causation that could be picked out at a specific gene level by artificial selection, slow natural selection may not be so gene-specific.
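
To make that contrast concrete, here is a minimal sketch of our own (purely illustrative, not from the paper): a toy simulation of an initially uncommon favored allele in a finite population, under a strong 'artificial' selection coefficient and a weak 'natural' one.  The strong case sweeps to near-fixation in a few dozen generations; the weak case takes a thousand or more, and is often simply lost to drift along the way.

```python
import random

def generations_to_spread(s, p0=0.05, n=500, threshold=0.95, max_gen=100_000):
    """Toy Wright-Fisher simulation: track a favored allele's frequency
    under selection coefficient s in a population of n diploids.
    Returns generations until the allele reaches the threshold frequency,
    or None if drift loses it first."""
    p = p0
    for gen in range(1, max_gen + 1):
        p = p * (1 + s) / (1 + s * p)             # deterministic gain from selection
        p = sum(random.random() < p               # binomial sampling of 2n gametes
                for _ in range(2 * n)) / (2 * n)  # adds genetic drift
        if p >= threshold:
            return gen
        if p == 0.0:
            return None                           # allele lost despite being favored
    return None

random.seed(1)
for s in (0.2, 0.005):  # strong 'artificial' vs weak 'natural' selection
    runs = [generations_to_spread(s) for _ in range(20)]
    done = sorted(g for g in runs if g is not None)
    median = done[len(done) // 2] if done else None
    print(f"s={s}: spread in {len(done)}/20 runs, median ~{median} generations")
```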

In addition, much of evolution seems not to happen by natural selection but instead by genetic drift or other forms of selection (organismal selection, niche selection, and so forth), while domestication is due to strong, directed artificial selection.  Thus, the lessons of domestication aren't always a good model for evolution in general.

Indeed, while Darwin thought of domestication as a good model, Alfred Wallace, the co-discoverer of evolution, did not.  He noted as much in his paper ("On the Tendency of Varieties to depart indefinitely from the Original Type"), read to the Linnean Society in 1858 along with Darwin's, announcing their co-discovery:
     One of the strongest arguments which have been adduced to prove the original and permanent distinctness of species is, that varieties produced in a state of domesticity are more or less unstable, and often have a tendency, if left to themselves, to return to the normal form of the parent species; and this instability is considered to be a distinctive peculiarity of all varieties, even of those occurring among wild animals in a state of nature, and to constitute a provision for preserving unchanged the originally created distinct species....
     It will be observed that this argument rests entirely on the assumption, that varieties occurring in a state of nature are in all respects analogous to or even identical with those of domestic animals, and are governed by the same laws as regards their permanence or further variation.  But it is the object of the present paper to show that this assumption is altogether false, that there is a general principle in nature which will cause many varieties to survive the parent species, and to give rise to successive variations departing further and further from the original type, and which also produces, in domesticated animals, the tendency of varieties to return to the parent form.
Selection is a far more curious phenomenon than it is typically given credit for being.  It is too easy for us to compress time in our minds and think of natural selection as if it were artificial, strong and directed.   But the beasts and foliage of nature may, like a Rousseau painting, be a thicket of meandering change, sometimes for inscrutable reasons.

Wednesday, December 11, 2013

Sometime geneticist Joe Terwilliger on genetics

We may recently have given the false impression that geneticist Joe Terwilliger gives less priority to science, or at least good science, than to other perhaps more frivolous pursuits (he is Abe Lincoln every February, for example, and a tuba player the rest of the time -- unless he's cleaning up bean debacles as a diplomat, or being a basketball and language coach to Dennis Rodman), so we wanted to help correct any such misconceptions here.  Perhaps to that end, Joe (now known in South Korea, we're afraid, as "sometime geneticist Joe Terwilliger") suggested we republish a blog post he first posted on his own short-lived blog in 2008.  He recently dug it up again and says that few could disagree with it, even 5 years later.


Joe as Abe on the balcony (but not of Ford's Theater)

The point is that sometimes there is a lot of convenient deafness, even in science, which fancies itself an objective search for truth.  Some of the details in Joe's post are out of date, but we, and he, think that the basic thrust is not.  In a sense that makes the conclusion all the more cogent, because the same modes of thinking about genomic causation are still predominant, despite the vastly costly but essentially consistent results in the five years since 2008.  And, as Joe points out, he and Ken had much the same message in 2000.

One not-so-subtle change, we will note, is that the promises by NIH Director Dr Francis Collins, and many others in presumably responsible positions, have steadily altered their due date, which recedes into the distance like, say, an oasis as you grope for water, the outfield fences if you want your pitchers to have a better earned run average, or a preacher's promises of ultimate salvation if you weekly plunk coins into the basket.

So perhaps the lesson is that under these circumstances, rather than just dismiss critics, science--actual science as it's supposed to be--should feel a need to take stock of what it's doing.  But we leave it to you to judge.

And if you hear about Joe in other contexts in weeks to come, remember that he was a sometime geneticist here first:


The Rise and Fall of Human Genetics and the Common Variant - Common Disease Hypothesis
By Joe Terwilliger
Nov 2008

There is an enormous amount of positive press coverage for the Human Genome Project and its successor, the HapMap Project, even though within the field the initial euphoric party when the first results came out has already done a full 180, replaced by the hangover that inevitably follows such excesses.

For those of you not familiar with the history of this field and the controversies about its prognosis which were present from the outset, I refer you to a review paper a colleague and I wrote back in 2000 at the height of the controversy (Nature Genetics 26, 151-157).  The basic gist of the argument put forward for the HapMap project was the so-called common variant/common disease hypothesis (CV/CD), which proposed that "most of the genetic risk for common, complex diseases is due to disease loci where there is one common variant (or a small number of them)" [Hum Molec Genet 11:2417-23].  Under those circumstances it was widely argued that, using the technologies being developed for the HapMap project, one would be able to identify these genes using "genome-wide association studies" (GWAS), basically by scoring the genotype of each individual in a cross-sectional study at each of 500,000 to 1,000,000 individual marker loci -- the argument being that if common variants explained a large fraction of the attributable risk for a given disease, one could identify them by comparing allele frequencies at nearby common variants in affected vs unaffected individuals.  This point was contested by researchers only with regard to how many markers you might have to study for this to work if that model of the true state of nature applied.  Many overly optimistic scientists initially proposed that 30,000 such loci would be sufficient, and when Kruglyak suggested it might take 500,000 such markers people attacked his models; yet today the current technological platforms use 1,000,000 and more markers, with products in the pipeline to increase this even more, because it quickly became clear that the earlier models of regular and predictable levels of linkage disequilibrium were not realistic -- something that should have been clear from even the most basic understanding of population genetics, or even from empirical data on lower organisms.
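
[MT note: the per-marker logic of a GWAS is simpler than its scale suggests.  Here is a minimal sketch, with made-up counts, of the basic test--comparing allele counts at one marker in cases vs controls with a 2x2 chi-square--and of why effects of the size actually found demand enormous samples.]

```python
from scipy.stats import chi2_contingency

def marker_test(case_alt, case_ref, control_alt, control_ref):
    """One GWAS marker: 2x2 chi-square comparing allele counts
    in cases vs controls; returns the p-value."""
    chi2, p, dof, expected = chi2_contingency([[case_alt, case_ref],
                                               [control_alt, control_ref]])
    return p

# Hypothetical marker in 2,000 cases and 2,000 controls (4,000 alleles each):
# risk-allele frequency 42% in cases vs 40% in controls -- the size of
# effect (relative risk ~1.1 or less) that GWAS typically turns up.
p = marker_test(case_alt=1680, case_ref=2320, control_alt=1600, control_ref=2400)
print(f"p = {p:.3f}")  # ~0.07: not even nominally significant, and with
# 1,000,000 markers the genome-wide threshold is ~5e-8, so detecting
# effects this small takes tens of thousands of subjects
```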

Today such studies are widespread, having been conducted for virtually every disease under the sun, and yet the number of common variants with appreciable attributable fractions that have been identified is minuscule.  Scientists have trumpeted such results as have been found for Crohn's disease, in which 32 genes were detected using panels of thousands of individuals genotyped at hundreds of thousands of markers -- this sounds great until you start looking at the fine print, in which it is pointed out that all of these loci put together explain less than 10% of the attributable risk of disease, and for various well-known statistical reasons, this is a gross overestimate of the actual percentage of the variance explained.  Most of these loci individually explain far less than half a percent of the risk, meaning that while this may be biologically interesting, it has no impact at all on public health, as most of the risk remains unexplained.  This is completely opposite to the CV/CD theory as defined above.  In fact, this is about the best case for any complex trait studied; in virtually every example dataset I have personally looked at there is absolutely nothing discovered at all.

At the beginning of the euphoria for such association studies, the example "poster child" used to justify the proposal was the relationship between variation at the ApoE gene and risk of Alzheimer disease.  In an impressively gutsy recent paper, a GWAS was performed in Alzheimer disease and published as an important result, with a title that sent me rolling on the floor in tears laughing: "A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease" [J Clin Psychiatry. 2007 Apr;68(4):613-8] -- in an amazingly negative study they did not even have the expected number of false positive findings -- just ApoE and absolutely nothing else...  And the authors went on to describe how important this result was and claimed this means they need more money to do bigger studies to find the rest of the genes.  Has anyone ever heard of stopping rules -- that maybe there aren't any common variants of high attributable fraction???  This is a claim that Ken Weiss and I put forward many times over the past 15 years, and Ken had been making the point for a decade before that even, in his book "Genetic Variation and Human Disease", which anyone working in this field should read if they are not familiar with the basic evolutionary theory and empirical data which show why no one should ever have expected the CV/CD hypothesis to hold...

In many other fields, the studies that have been done at enormous expense have found absolutely nothing, and in what Ken Weiss calls a form of Western Zen (in which no means yes), the failure of one's research to find anything means they should get more money to do bigger studies, since obviously there are things to find but they did not have big enough studies with enough patients or enough markers - it could not possibly be that their hypotheses are wrong, and should be rejected... It is a truly bizarre world where failure is rewarded with more money - but when it comes to promising upper-middle-aged men (i.e. Congress) that they might not die if they fund our projects, they are happy to invest in things that have pretty much now been proven not to work...

Meanwhile, in a truly bizarre propaganda piece, Francis Collins, in a parting sycophantic commentary (J Clin Invest. 2008 May;118(5):1590-605), claimed that the controversy about the CV/CD hypothesis was "... ultimately resolved by the remarkable success of the genetic association studies enabled by the HapMap project."  He went on to list a massive table of "successful" studies, including loci for such traits as bipolar disorder, Parkinson disease and schizophrenia, and of course the laughable success of ApoE and Alzheimer disease.  To be objective about these claims, let me quote what researchers studying those diseases had to say.

Parkinson disease: "Taken together, studies appear to provide substantial evidence that none of the SNPs originally featured as PD loci (sic from GWAS studies) are convincingly replicated and that all may be false positives...it is worth examining the implications for GWAS in general." Am J Hum Genet 78:1081-82

Schizophrenia: "...data do not provide evidence for involvement of any genomic region with schizophrenia detectable with moderate [sic 1500 people!] sample size" Mol Psych 13:570-84

Bipolar AND Schizophrenia: "There has been great anticipation in the world of psychiatric research over the past year, with the community awaiting the results of a number of GWAS's... Similar pictures emerged for both disorders - no strong replications across studies, no candidates with strong effect on disease risk, and no clear replications of genes implicated by candidate gene studies." - Report of the World Congress of Psychiatric Genetics.

Ischaemic stroke: "We produced more than 200 million genotypes...Preliminary analysis of these data did not reveal any single locus conferring a large effect on risk for ischaemic stroke." Lancet Neurol. 2007 May;6(5):383-4.

And the list goes on and on of traits for which nothing was found, with the authors concluding they need more money for bigger studies with more markers. It is really scary that people are never willing to let go of hypotheses that did not pan out. Clearly CV/CD is not a reasonable model for complex traits. Even the diseases where they claim enormous success are not fitting with the model - they get very small p-values for associations that confer relative risks of 1.03 or so - not "the majority of the risk" as the CV/CD hypothesis proposed.

One must recall that in the initial paper proposing GWAS, by Risch and Merikangas (Science 1996 Sep 13;273(5281):1516-7) -- a paper which, incidentally, pointed out that one always has more power for such studies when collecting families rather than unrelated individuals -- the authors stated that "despite the small magnitude of such (sic: common variants in) genes, the magnitude of their attributable risk (the proportion of people affected due to them) may be large because they are quite frequent in the population (sic: meaning >>10% in their models), making them of public health significance."  The obvious corollary of this is that if they are not quite frequent, they do NOT have high attributable fraction and are therefore NOT of public health significance.
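
[MT note: the arithmetic behind that corollary is worth making explicit.  The population attributable fraction for a risk factor with frequency p and relative risk RR is p(RR-1)/(1 + p(RR-1)).  A sketch with hypothetical numbers shows how a common-but-weak variant and a rare-but-strong one can both end up explaining under 1% of cases.]

```python
def attributable_fraction(p, rr):
    """Population attributable fraction: the share of cases in the population
    that is due to the risk factor.  PAF = p(RR-1) / (1 + p(RR-1))."""
    return p * (rr - 1) / (1 + p * (rr - 1))

# Hypothetical contrasts (illustrative numbers, not from any study):
for label, p, rr in [("common AND strong (the CV/CD promise)", 0.30, 2.0),
                     ("common but weak (a typical GWAS hit)",  0.30, 1.03),
                     ("rare but strong (BRCA-like)",           0.001, 10.0)]:
    print(f"{label}: PAF = {attributable_fraction(p, rr):.1%}")
# -> ~23%, ~0.9%, ~0.9%: only the first would matter for public health
```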

And yet, you still have scientists claiming that the results of these studies will lead to a scenario in which "we will say to you, 'suppose you have a 65% chance of getting prostate cancer when you're 65.  If you start taking these pills when you're 45, that percent will change to 2'" (Leroy Hood, quoted in the Seattle Post-Intelligencer).  Amazing claims, when the empirical evidence is clear that the majority of the risk of the majority of complex diseases is not explained by anything common across ethnicities, or common in populations...  Francis Collins recently claimed that by 2020, "new gene-based designer drugs will be developed for ... Alzheimer disease, schizophrenia and many other conditions", and by 2010, "predictive genetic tests will be available for as many as a dozen common conditions".  This does not jibe with the empirical evidence...  In breast cancer, for example, researchers claimed that knowledge of the BRCA1 and BRCA2 genes (which confer enormously high risk of breast cancer on carriers) was uninteresting as it had such a small attributable fraction in the population.  Of course now they have performed GWAS studies and examined tens of thousands of individuals and have identified several additional loci which, put together, have a much smaller attributable fraction than BRCA1 and BRCA2, yet they claim this proves how important GWAS is.  Interesting how the arguments change to fit the data, and everything is made to sound as if it were consistent with the theory.

I suggest that people go back and read "How many diseases does it take to map a gene with SNPs?" (Nature Genetics (2000) 26, 151-157).  There are virtually no arguments we made in that controversial commentary 8 years ago which we could not make even stronger today, as the empirical data that have come up since then support our theory almost perfectly, and refute conclusively the CV/CD hypothesis, despite Francis Collins' rather odd claims to the contrary...

In the end, these projects will likely continue to be funded for another 5 or 10 years before people start realizing the boy has been crying wolf for a damned long time...  This is a real problem for science in America, however, as NIH is spending big money on these rather non-scientific, technologically-driven, hypothesis-free projects at the expense of investigator-initiated, hypothesis-driven science.  Even more tragically, training grants are enormously plentiful, meaning that we are training an enormous number of students and postdocs in a field for which there will never be job opportunities for them, even if things are successful.  Hypothesis-free science should never be allowed to result in Ph.D. degrees if one believes that science is about questioning what truth is and asking questions about nature, while engineering is about how to accomplish a definable task (like sequencing the genome quickly and cheaply).

The mythological "financial crisis" at NIH is really more a function of the enormous amounts of money going into projects that are predetermined to be funded by political appointees and government bureaucrats rather than through the marketplace of ideas in investigator-initiated proposals.  Enormous amounts of government funding for small numbers of projects is a bad idea -- one which began with Eric Lander's group at MIT proposing to build large factories for the sequencing of the genome rather than spreading it across sites, with the goal of getting it done faster (an engineering goal) instead of getting more sites involved so that perhaps better scientific research could have come along the way.  This has led to a scenario, years later, in which the factories now want to do science and not just engineering, which is totally contrary to their raison d'etre, and leads to further concentrations of funding in small numbers of hands when science is better served, perhaps, by a larger number of groups receiving smaller amounts of money, so that more brains are working in different directions thinking of novel and innovative ideas not reliant on pure throughput.

Human genetics has transformed from a field with low funding, driven by creative thinking, into a field driven by big money and sheep following whatever shepherd du jour is telling them what they should do (i.e. 'innovative' means doing whatever the current trend is rather than something truly original and creative).  This is bad for science, and also is bad science.  GWAS has been successful technologically, and it has resoundingly rejected the CV/CD hypothesis through empirical data.  If we accept this and move on, we can put the HapMap and HGP where they belong, in the same scientific fate as the Supercollider, and get back to thinking instead of throwing money at problems that are fundamentally biological and not technological!



Tuesday, December 10, 2013

Are we living in Flatland? Does it matter?

Flatland--the book
In 1884, E. A. Abbott wrote a book titled Flatland: A Romance of Many Dimensions.  It was about a world that existed only in two dimensions--only as a surface in which everyone lived (they did not live on it, because that would imply a third, depth dimension).  It was a satire on Victorian England's society, but it has often been cited in the context of our appreciation of the dimensionality of existence.


We live in a 4-dimensional world: 3 space dimensions (which we can label x, y, and z, or north-south, east-west, and up-down), and one time dimension.  As Einstein showed, all 4 are inextricably linked physical realities.  We're quite familiar with this system, because we are of it, evolved in it, and live in it.  These are often called 'orthogonal' dimensions, because you can move along one of them without affecting where you are on the others; you can go east without affecting your north or up position (time is curiouser, but you can be in Cincinnati yesterday or tomorrow without affecting where Cincinnati is).  Life is part and parcel of this, as is inanimate energy and matter.

Mathematicians can deal with many more dimensions in abstract ways, and there are even some theoretical reasons to think there may be more physical dimensions that are simply out of our ken.  At least, that's what 'string theory' asserts.

But we can think of 'dimensions' in other than ordinary physical ways.  We can think of causation by different factors acting in different ways, each factor a separate measurement.  If they are orthogonal (independent of each other) they can be treated as independent causes--data or statistical 'dimensions' with the same sort of mathematics used in physics and geometry.  On the other hand it may be that, theoretically or empirically, multiple measured factors interact--are not independent.

Multivariate statistics is a common approach to such data, in which each independent risk factor, or independent set of correlated factors, is treated as a dimension, and every observed person has values in each dimension, the way you, now, have an east, north, up, and time position.  You also have your age and gender, which are essentially independent, and your income and diet, which are probably not.  The same ideas apply to different parts of your genomes.
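
To make this less abstract, here is a minimal sketch (with made-up 'dimensions' and a made-up risk score) of the standard multivariate approach: multiple regression treats each factor as an axis and estimates the effect of moving along one axis with the others held fixed.  Note what happens to the correlated axes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Made-up 'dimensions' for each person: age and exercise are generated
# independently (orthogonal); diet partly tracks income, so those two
# axes are correlated, not independent.
age = rng.uniform(20, 70, n)
exercise = rng.normal(0, 1, n)
income = rng.normal(0, 1, n)
diet = 0.6 * income + rng.normal(0, 0.8, n)

# A made-up risk score built from some of these factors, plus noise
risk = 0.03 * age - 0.5 * exercise + 0.4 * diet + rng.normal(0, 1, n)

# Regression treats each factor as an axis and estimates the effect of
# moving along one axis while holding the others fixed
X = np.column_stack([np.ones(n), age, exercise, income, diet])
coef, *_ = np.linalg.lstsq(X, risk, rcond=None)
print(np.round(coef, 2))  # recovers ~[0, 0.03, -0.5, 0, 0.4]; the
# correlated axes (income, diet) share credit, so their estimates
# are the least stable ones
```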

Multivariate statistical approaches categorize each observation's (say, each sampled person's) position and, based on aggregate data, his/her possible future position.  But as applied in real life, statistical estimates of these values are based on some relative criteria, comparing a person's position to the mean, or judging whether observations in this space are unusual in some way or other (statistical significance) that may be interpreted as indicating which dimensions are actually 'important'.  Is this gene 'important' in heart disease?  It's a subjective decision based on subjective evaluation of objective data.

Statistics is based on underlying probabilities, a subtle concept that can be interpreted in various ways.  All science is based on statistical assessment in one way or another, because we never know all possible factors or observe anything with perfection.  If we're lucky, the statistical aspects of things are about measurement error, not a poor understanding of reality.

Calculus is a mathematical way of expressing position, and especially changing position (movement) in any specified number of dimensions.  Calculus and related mathematical tools are somewhat different from statistics because these are tools for studying the behavior of things whose laws of behavior we understand or at least specify.  We make predictions or interpret observations based on a theory of how the world is.  That theory is completely precise (even if it includes, as quantum mechanics does, fundamental probabilism).  In a sense, a real scientific theory predicts data on the basis of some causative mechanism.  That is what the 'law' is about.

By contrast, as we noted recently, statistical studies often do not have an underlying theory or law that is being tested or reflected in the data.  The hope is, when the investigator even thinks about it explicitly, that observed correlations reveal something about the underlying causal processes that focused studies can reveal.

The Flatland issue comes about here because the mechanisms we can suggest must be confined to specific statements about the four dimensions of the physical world, or the essentially made-up 'dimensions' of statistical comparison, that we know about or can think of.

In statistical approaches, especially in areas of genomics, biomedicine and health, and evolution, we generally do not have an adequately precise theory.  Regression analysis, for example, says that each increased unit of exposure to some risk-factor dimension increases risk by such-and-such an amount.  But the dimensions are purely empirical, derived retrospectively from analyzing our collected data, which we may use to make predictions, but without those predictions being based on a theory of mechanism.  Often the included variables are arbitrarily chosen, and we have no way of knowing what might lie outside our chosen 'dimensions'--what our presence in Flatland prevents us from seeing.  And again, what we include in much of these areas of science are generally not truly different dimensions, but instead are ways the data are arranged (for whatever reason) along the same dimensions.  So if we measure something on dimension X and on dimension Y in each individual and find the data look like this...


....then we can (mathematically) make a 'new' dimension--the characteristics of the line shown--and say that the data are mainly all being measured on this one dimension.  But it's really just a rearrangement of X and Y.  The pattern may tell us something about the causal relationship between whatever X and Y are, and that may help us understand or simplify, but it doesn't get beyond the original dimensions.  It is still commonsense, and a way of simplifying.  It is a fundamental way of thinking in much of science and, relevant here, in genomics and evolutionary science.
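
That 'new' dimension is exactly what a principal component analysis extracts.  A minimal sketch, with simulated X and Y:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 500)
y = 0.9 * x + rng.normal(0, 0.3, 500)   # Y mostly tracks X

data = np.column_stack([x, y])
data -= data.mean(axis=0)

# The 'new' dimension is the leading eigenvector of the covariance
# matrix: the direction along which the cloud of points mostly lies
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
print("direction of the line:", np.round(eigvecs[:, -1], 2))
print("share of variance on it:", round(eigvals[-1] / eigvals.sum(), 2))  # ~0.97
```

The leading eigenvector is just a rotation of X and Y--a tidier coordinate system for the same data, not an escape from Flatland.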

But if this only gets us so far, is it because we are still viewing the world from Flatland, and seeing things only in terms of what, to us, is commonsense?

Are we living in Flatland?
These are generalizations that one may quibble about, but they generally characterize how business is done these days.  So it is fair to ask whether there is reason to believe that, in any important ways, we are living in Flatland. 

A few weeks ago, we enumerated 19 strange or paradoxical facts (well-known and not discovered by us!) that should give people pause, and make us ask whether we might, with our current statistical and other standard approaches, be living in a kind of Flatland, oblivious to dimensions that may be important but that we don't even know of--and whether we should at least be looking.

Does it matter?
For strong causal factors, Flatland is a fine place to be.  We can identify those factors, and anticipate their effect with reasonable accuracy.  Strong causal factors are typically relatively simple: they are always detectable, often act alone, and rarely fail to have their effect.  These are the factors that, once identified, present juicy and suitable targets for engineering to avoid or modify them or their effects.

Humans are terrific engineers, so that when biomedical, agricultural, or other biologically related factors are at issue we can do something about them.  If they are undesirable, we can detect and remove them.  If they have desirable properties, we can use those properties to make changes in a positive direction (vaccines, bacteria that eat oil, enzymes and genes used to rearrange genomes experimentally or in public health).  If they are genetic, we can use them to reconstruct history.  Engineering may take time and skill, but it works eventually if we are determined enough.

When faced with such factors, there is no real detriment to living in Flatland.  What we can't see, or don't even know is there to see, doesn't really matter.

But in much of biomedical and evolutionary genetics, there is reason to think that we are indeed living in Flatland, and that it does matter a whole lot.  To see our world farther, we need to learn how to reach into its other causal dimensions.

Monday, December 9, 2013

From scroll to screen: the 500 year academic speed-up

How do academics deal with the chaos of online publishing these days, when it comes to evaluations for professional performance?  Deans, chairs and grant reviewers must make hiring, tenure, promotion, and funding decisions based on academic track records.  But what should be in that record?

We have been advocating the view that social media, including things like blogs and Twitter, as well as online publishing and open reviewing, should count.  Indeed, perhaps the successful professor should be expected to work in this rapid, open mode of spreading his/her ideas, results, and influence.

How that could be done is an important question.  Chairs and deans tend to be conservative.  They want to be fair, and not to be bowled over by chaff and resume-padding.  We know some of the issues relating to 'peer' review and so on that lead to just that.  But the online world is chaotic, even as it's exciting, breathtaking, and vibrant.  Many are, and more will be, considering ways to get away from simple bean-counting (publications, citations, etc.) and move towards more substantial criteria (we've posted on this before, e.g. here).


Galen. De pulsibus. (Manuscript; Venice, ca. 1550). This Greek manuscript of Galen’s treatise on the pulse is interleaved with a Latin translation.  Wikimedia Commons

Nothing New
Today we wanted just to put this in a different perspective.  I am reading a very fine new book by Susan Mattern, called The Prince of Medicine, a life of the classical Greek physician Galen.  This sentence, in its context, struck me:  "Almost all significant medical writers had, apparently, commented on Hippocrates..."

Hippocrates of Cos; Rubens, 1638, Wikipedia

Why this struck me is this:  Galen's time (130-200 AD) was roughly 500 years after Hippocrates (and here we skip over who Hippocrates actually was, or wasn't).  By then, all scholars worth their stethoscopes had commented on the works of the great master of Cos.  His ideas had been lauded, picked on, updated, and debated.  Various schools of thought, often as vitriolic as differences about selectionism or genetic determinism, chimed in to advocate their views, critique Hippocrates, and, over those five centuries, critique each other.

It was a vigorous field of play, and must have included a lot of chaff along with the ideas that stuck around.  One would have to know what's what to be able to evaluate who should be listened to, and who was just ranting or hacking away gratuitously at the great founder of medicine.

Sound familiar?

We're lucky.  The same is going on now, but we get to enjoy its variegated flavors every day.  We don't have to wait five centuries and only judge retrospectively over the misty generations.

Actually, this is nothing new and it's not specific to the sciences.  The humanities are going online as well, if more slowly.  And over the last thousands of years the idea of descending trees of commentary was certainly part of western culture.  This, as I understand it at least, was the nature of Talmudic, Scholastic, and Koranic scholarship through the millennia since the separation of these religions.  It is likely also characteristic of Asian thought, but that's not something I know about.

Again, this took place over centuries, with the usual back and forth, sometimes polite, sometimes vitriolic. Just as science develops its schools of thought, so have religion and philosophy, based on their respective commentariats. But as with science even back in the classical days, things took generations.

We are lucky to live in our faster age, even if it can be too fast sometimes.  Our professional world will have its shakeout.  Ideas will come and go, win or lose in this arena.  But we may all live to see it.  If we can learn to use it creatively.

Friday, December 6, 2013

Gene therapy -- a technological challenge, but we're good at meeting technological challenges

A friend told me recently that gene therapy for diseases of the skin was making a lot of progress.  Given the fits-and-starts history of the field of gene therapy, the fact that the US Institute of Medicine has recommended that most research in gene therapy no longer require the special review it has long been subject to (discussed in Nature this week here), as well as recent announcements of progress and hope, I was interested in looking into it.

Gene therapy, or more generally genetic engineering, is the replacement of a disease-causing allele with a working one, or the silencing of a causal allele.  Gene therapy has a long history, and not always a successful one.  It first looked possible in the early 1970s, when Friedmann and Roblin proposed in Science that "gene therapy may ameliorate some human genetic diseases in the future", before much at all was understood about the association between genes and disease.

There has been much progress since the 70s in understanding the essentials of gene function, of course; the genetic underpinnings of many diseases, primarily rare ones, have been identified and, while gene therapy has seen some tragic failings, progress is being made.  While past promises were too grandiose, medicine is likely to see many more successes in treating single-gene diseases.  After all, when faced with a clear problem, humans are very good at engineering solutions.

The theory is straightforward: find the gene, figure out what's going wrong, and stop or replace it.  Once a faulty gene has been identified, though, the fundamental problem has been delivering the therapeutic DNA into the nucleus of cells in the affected organ, and then into the right site in the genome so that it can either halt coding for the defective protein or begin telling the cell how to make a working copy of the protein.

Gene therapy using an adenovirus vector: Wikipedia
One way to do the delivery is with a viral vector.  Viruses are naturals at delivering their own genes into host cells, so, in theory, harnessing that ability for therapeutic purposes is ideal.  Target the right type of cell, and the introduced gene can be expressed directly or integrated into the host cell's genome, where it will be expressed.  However, most genetic diseases would require repeated delivery of the therapeutic DNA, which would mean repeated exposure to the virus.  Our immune systems are very good at identifying and targeting invading viruses; while this is usually a good thing, it's not so good for virus-based gene therapy that needs to be repeated.  And if the patient was exposed to the virus before its use in therapy, the therapy is less likely to be effective, again because of the immune response.

Gene therapy makes a comeback
Indeed, it was a severe inflammatory response to an adenovirus-delivered therapy that was responsible for the death of Jesse Gelsinger, a 19-year-old subject in a clinical trial, in 1999.  His death greatly reduced enthusiasm for gene therapy, and rightly so.  But it did eventually give researchers insight into safer and more effective DNA delivery systems, adeno-associated virus (AAV) being one, although the length of DNA that AAV can carry is limited, and thus so are the diseases it can eventually be used to treat.  Even so, as reported in a paper in ABBS last year ("Phoenix rising: gene therapy makes a comeback", ABBS (2012) 44 (8): 632-640, Lamberis), AAV-based therapy has now been successful for a genetic cause of blindness, Leber congenital amaurosis (LCA), and there were 80 on-going clinical trials testing the efficacy of AAV-based gene therapy in 2012.

Retrovirus-based vectors are another approach.  An advantage over AAV is that these can incorporate much larger transgenes, but they have been associated with leukemia-like T lymphoproliferative disorder due to insertional mutagenesis, or random insertion of the virus and its engineered load into the target DNA.  Lentivirus-based vectors are another possibility, but they, too, have their downsides. 

Non-viral vectors consisting of "lipids, peptides, carbohydrates or nanoparticles that fuse with the cell membrane and release the therapeutic DNA in the cell cytoplasm" (Lamberis) depend on natural mechanisms of the target cell to take them in and transport them within the cell.  They aren't as efficient as viral systems, but are safer and less likely to trigger immune responses than virus-based vectors.  

A prime candidate for successful genetic engineering is skin disease.  The skin is an eminently accessible organ, so that if the problem of getting the therapeutic agent through the epidermal barrier is solvable, topical treatment of many skin diseases could be envisioned.  Amy Paller and colleagues reported success with this last year in PNAS.  They introduced "spherical nucleic acid nanoparticle conjugates" into the keratinocytes of mice, and into human skin grafted onto a mouse, with the ultimate goal of delivering RNA silencing systems that can inhibit the expression of faulty genes in a variety of keratin-associated skin diseases.

FIRST, the Foundation for Ichthyosis and Related Skin Types, awarded Paller funding to continue this work.  The foundation's announcement describes Paller's work this way:
The blistering and thickening of skin seen in EI [epidermolytic ichthyosis] usually results from a change in a single letter of the DNA code (a mutation) in one copy of the gene that provides the codes for manufacture of a keratin protein in the upper layers of skin. Small interfering RNAs (siRNAs) are small pieces of genetic material that can identify DNA pieces and bind to them, preventing the gene from being translated into protein. siRNAs are able to distinguish the mutated DNA from the normal DNA, and thus are able to prevent only the abnormal keratin protein from being formed. The problem with siRNA has been getting it through the skin barrier to where it needs to go. Dr. Paller and her team have found a way to get the siRNAs through the skin, through nanotechnology. By putting about 30 copies of the siRNA all around a central gold nanoparticle (leading to what her group calls “spherical nucleic acids”), the siRNAs are able to be rubbed into skin in a simple moisturizer.
This indeed sounds as though it has potential.

CRISPR -- the next big thing
We can't end this post without mentioning another technological advance getting a lot of attention these days.  It exploits a portion of some prokaryotic immune systems: CRISPR, Clustered Regularly Interspaced Short Palindromic Repeats.  In bacterial and archaeal genomes, short DNA sequences from invading structures called phage or plasmids get incorporated into the prokaryote's genome between CRISPR repeats, as 'search' sequences.  The resulting CRISPR structure is then the basis for recognition of exogenous invading genomes: the system moves along an incomer's DNA until it detects a match to an incorporated 'search' sequence.  Then, aided by genes called Cas genes, it cuts the detected DNA.  Normally that destroys it, but other enzymes can be targeted to the cut, and repair it.  Here's where genetic engineering comes in -- the repair process can insert a user-designed sequence into the targeted break.
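
In purely computational terms, the recognition step is a string search, though the biology is anything but.  A toy sketch, with made-up sequences:

```python
def find_cut_sites(invading_dna, spacer):
    """Toy model of CRISPR target recognition: scan the invading DNA for
    matches to a stored 'search' (spacer) sequence and report cut sites.
    Real systems also require an adjacent PAM motif and tolerate some
    mismatches; this sketch ignores both."""
    sites = []
    start = invading_dna.find(spacer)
    while start != -1:
        sites.append(start + len(spacer) - 3)         # model the cut near
        start = invading_dna.find(spacer, start + 1)  # the end of the match
    return sites

phage = "ATGCCGTACGGATTACACGGATTACTTAGC"   # made-up invading sequence
spacer = "CGGATTAC"                        # made-up stored 'search' sequence
print(find_cut_sites(phage, spacer))       # -> [13, 22]
```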

As Elizabeth Pennisi described in her piece "The CRISPR Craze" (Science 23 August 2013, 341(6148), 833-836), CRISPR systems are also showing potential for use in gene therapy, if they can be used to "delete, add, activate or suppress targeted genes in human cells".  The general idea is that a harmful sequence could be detected, the DNA cut at that point, and a 'good' sequence inserted to replace the harmful part.  Here's a video that shows the idea.  Mark Wanner at Jackson Labs also nicely describes the potential uses of CRISPR on his blog.

There is much excitement about CRISPR's potential.  A story in The Independent quoted several scientists the other day, among them George Church, a geneticist at Harvard, who was one of the first to use CRISPR to actually edit nucleotides in human sequence: "The efficiency and ease of use is completely unprecedented.  I'm jumping out of my skin with excitement," said Church.  Ok.  But there is still a lot of work to be done.  Getting the edited gene to the right place in the target genome without undesirable effects elsewhere will be a challenge, as with conventional gene therapy.

But, if humans are good at anything, it's technology, and gene therapy and genetic engineering are primarily technological challenges.  We often criticize the science of genetics as it portrays itself and its successes, but if these technologies live up to their potential, they can change a lot of lives for the better.

Thursday, December 5, 2013

Hang him! Get it over with and worry about 'justice' later!

Buckminster Fuller famously said, “You never change things by fighting the existing reality.  To change something, build a new model that makes the existing model obsolete.”

We hear this all the time in the context of genetics.  Over and over and over, we've written about issues that we think are clear, obvious, certainly not secret; problems with genetics that we think explain why we pretty much can't expect to predict phenotype from genotype, or vice versa.  Many of the same issues apply to identifying environmental causes of disease.  This is the amassed evidence that some defendant (some 'theory' in the case of science) is being wrongly accused (wrongly attributed or applied).

It's more complicated than that, of course.  Mendelian genetics had its day, and the cause of many single gene diseases has been identified, but these are largely rare, often congenital conditions, and while these successes are priceless for families with these diseases, the same kind of success hasn't panned out for common, complex diseases.  And there are even more seriously troubled waters,  if not dangerous rapids, for Mendelian ideas up ahead--as we'll describe in upcoming posts!

The same problems apply to epidemiology -- when infectious diseases were more prevalent in the West, when the field was just coming into its own, infectious agents were readily identified, leading to prevention and cures.  Similarly, tobacco was identified long ago as a cause of disease, as was asbestos in buildings, and other chemical toxins.

Our methods work well when the causal agent has an effect so strong that one could see it in a dark room while wearing dark glasses.  Extensive experimental and observational data show that mutations in the CFTR gene seem to cause cystic fibrosis, and that inhaling coal dust causes black lung disease.  But when the risk factors are multiple genes with small effects, or an interacting network of genes and environmental factors, or a complex diet rather than one component of a single food, or a combination of diet and exercise, identifying cause is harder.  Or impossible.

Courtroom inside Tombstone Courthouse State Historic Park, Photo by Matthew A. Lynn; Wikimedia

We've blogged a number of times in the last month or so about these issues.  About why we think genetics is bogged down with, generally, diminishing returns on ever-larger studies, why its promises of personalized genomic medicine, and the benefits of whole genome sequencing and so on have turned out to be over-promises and won't be attained nearly to their advertised extent. 

We're often told that we're too negative.  And we're told that if we don't have an alternate model, we should not criticize the status quo.  But we believe we've got a positive message, and that is that we've learned a tremendous amount in the last 20 or 30 years about what genes do.  The science wouldn't be where it is if we didn't have that knowledge.  There's still a lot we don't understand, but we think that at this stage, much of that is because we're constrained in our thinking by a prevailing model that doesn't accommodate all observations.  We think new thinking is in order, but, no, we can't personally offer it up.  We think about these things all the time, and work with others to explore the issues, but the lesson of history is that the field, or some new young thinker, will have to do that.

The positive message is that we have tons of what appears to be reliable knowledge, but knowledge that points to the possibility (or, depending on how you view things, high likelihood) that some very fundamental facts of life are missing, and that current methods are not suited to detect them.  And the fact that the same sorts of things apply to modern genetics and to evolutionary reconstructions and interpretations reinforces this:  what the current methods can do, and have done, is to show where the confusing, contrary, perplexing, and sometimes paradoxical issues are.

So, it's odd to hear that if we can't produce a new model we should shut up.  Or, worse, that by constantly saying these things somebody, like a congressperson, might hear them and wonder if we really need as much money or as many university jobs, as we've been fortunate to have.  But maybe such a threat should exist, and clearly!  What a stimulus to serious thought!

Imagine a 12-person jury evaluating the evidence in a murder case.  Eleven vote to convict, but one juror doesn't, saying that the evidence doesn't add up.  She points out the contradictions, and the bits of evidence the others have ignored, and so on.  No, she doesn't know who did commit the murder, but she's convinced the defendant didn't.  But the others don't budge.  They tell her that if she can't tell them who did do it, they won't change their vote.  Hang the accused!

There are problems in genetics....and to press this point is to do a service, not a disservice, to the elusive truth that confronts us.  Whether or not anybody cares to listen, we will continue to express our message, fallible as we are, as we see it.

But we're also trying, within our poor powers to add or detract (to paraphrase the Gettysburg Address), to find better ideas.

[This post contributed to equally by both Ken and Anne.]

Wednesday, December 4, 2013

We should be more circumspect about animal models in research

Mice can be cute and cuddly, or if they come into view unexpectedly they can trigger the stereotypical leap to a chair and the "Eeeek!!!" cry for help.  But many of us come across mice in a very different, less cute setting--in our research labs.  There, our mice surprise us in other ways, that ought to be just as scary.  The general issues and their ethical and scientific import are discussed in a nice article by Jennifer Couzin-Frankel in Science.  She makes the point that there are far fewer rules when it comes to running, analyzing and interpreting studies using animal models than for human clinical trials, and thus it's often difficult to evaluate their importance.

From: http://cdn.toonvectors.com/images/35/16985/toonvectors-16985-940.jpg

Animal studies generally are based on far fewer individuals than clinical trials.  If somehow we justify requiring hundreds of thousands of cases (say, diabetics), and at least as many unaffected controls, from multiple global studies to claim to identify the genes 'for' a trait (GWAS and other similar mapping efforts), and even then don't find much, how is it that we expect to learn about the same traits from mapping studies in small samples of mice?

There are many examples in which serious and clear high-risk mutational effects found in humans do not arise in mice with similar mutations induced transgenically.  Altered BRCA1 and breast cancer is one of legions of examples.  Much is learned even in those cases about aspects of the gene's biology, but not necessarily about why the trait differs between mice and humans.  Mouse skulls develop differently in many ways from those of humans, in particular involving the sutures (joints) between the bones in the skull vault that protects the brain.  Yet mutations causing abnormal suture closure in humans have quite similar effects in mice--that is, the mouse model seems to work, but why that is so is somewhat unclear.  We study dental or limb development in mice, but their teeth and limbs are quite different from ours--though some of the same genes and gene interactions apply in both cases.

When one gene on its own doesn't usually account for human traits like disease (not even those with the strongest effects, such as BRCA1 and breast cancer), why do we think we'll understand diabetes by making single-gene knockouts?  The rationale, and it's true to some extent, is that we are trying to work out, with animal models, how the gene works and why a mutant version can lead to disease.  But this is so very incomplete that it's curious how much we invest in the approach (which, we must immediately acknowledge, we have been doing in our own lab for many years).  The challenge is to know when and why the differences exist--and not to over-extrapolate from mouse to human.

Couzin-Frankel has identified many issues and reports on NIH efforts to look at them.  They are serious, and perhaps should threaten the funding of such studies, diverting funds to better approaches--and to making the environment suitable for people to find them.  There are elements of scandal and misrepresentation in the drug area that she reports.  Perhaps at least more transparency can result, but even setting that aside, the scientific issues are serious in their own right.

Of mice and not-men in many more ways
This article and those cited in it don't begin to touch many of the serious issues that go way beyond small sample sizes, non-randomization, and so on.  Here are some others that we have seen in our own personal experience with our own work, or that of colleagues:

1.  Doing very unpleasant things to mice and getting them approved by your local research ethical review board ("IRB"), torment if not torture often justified on the grounds of preventing disease, an argument often stretched to the limit to justify things with only the remotest connection to health.

2.  Reporting the most extensive transgenic effect because that one (the author says, either privately or even in the paper) is the 'most representative' of the engineered effect.

3.  Using statistical techniques to squeeze more results from small samples than there is blood in a turnip, with all sorts of unreported but potentially relevant nuances.

4.  Assuming that an experiment done on one mouse strain represents 'the mouse' and, worse, 'the human'.  In fact, many if not most transgenic manipulations yield different results in different test strains, and, even worse, often the strain is chosen because it has relevant characteristics--more likely to get cancer, more responsive to genetic manipulation, and so on.

5.  Assuming that inbred mice are genetically homogeneous--that is, they have no variation within their own genomes (that is, both copies of their chromosomes, inherited one from each parent, are of identical sequence) and, just as bad, no variation among individuals of the same inbred strain.  Those who pay attention know that this is not accurate.  Getting the whole genome sequence of one mouse from a strain and using that to represent all mice of the strain is an example.

In our own lab we have used our computer program ForSim to do simulations of the inbreeding and inter-breeding process that is involved in work to identify transgenic effects or to identify ('map') genes whose variation affects some trait of interest.  It is easy to show that mouse strains, crosses, and representative sequences are not reliable indicators of all the potentially relevant variation that may exist in a particular study design.
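
ForSim handles far more realism than we can show here, but even a back-of-the-envelope sketch (ours, with illustrative numbers--emphatically not ForSim) shows why 'fully inbred' is an idealization: heterozygosity decays geometrically under full-sib mating, but new mutation keeps adding variation back, and different individuals of the 'same' strain carry different variants.

```python
# Heterozygosity under repeated full-sib mating decays by a factor of
# ~0.809 per generation, while new mutation keeps adding variation back.
# Illustrative numbers only:

H0 = 0.001       # starting per-site heterozygosity, roughly wild-mouse level
decay = 0.809    # per-generation retention factor under full-sib mating
mu = 5e-9        # rough per-site, per-generation mutation rate
genome = 2.5e9   # approximate mouse genome size, in base pairs

H = H0
for gen in range(1, 61):
    H = H * decay + 2 * mu   # lose variation to inbreeding, regain by mutation
    if gen % 20 == 0:
        print(f"gen {gen}: expected heterozygous sites ~ {H * genome:,.0f}")
# Twenty generations of sib-mating is the usual 'fully inbred' criterion,
# yet even at generation 60 mutation maintains on the order of a hundred
# heterozygous sites -- and different mice of the 'same' strain carry
# different ones.
```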

There is no easy answer.  First, despite reservations and some public opposition, the rationale that we'll save children from horrible diseases if we give those diseases to mice is compelling.  We are, after all, in charge. We eat pigs and cows, make chickens live shoulder to shoulder indoors for their entire lives, and so on, so what we do to mice isn't all that different.

Second, despite knowing that mice are different from humans (we usually don't yell "Eeeeek!!" when we see another human), we have a very understandable tendency to assume they're the same.  That leads us to make dramatic discovery announcements to the press (and granting agencies).  Sometimes these discoveries do, in fact, pan out as advertised.  Again, there is no obvious way to know, in advance of or even after an experimental result is in, how or whether it will apply to humans.

But at least we should be vastly more circumspect about this sort of animal research. That's only fair to the voting public that pays for it based on what we promise them....and for the poor mice who have no vote in the matter.