Monday, January 13, 2014

The rise and fall and rise again of Mendelism

One hundred and fifty years ago, Moravian monk Gregor Mendel brought the world Mendelian inheritance: the Law of Segregation and the Law of Independent Assortment, among other related principles.  We took from his work the idea that we (and our very distant pea-plant relatives) each carry two copies of every gene, and that each parent randomly passes a single copy to his or her offspring.  Genes are also transmitted roughly independently of each other in the formation of sperm and eggs (this isn't strictly true for genes that are close to each other on a chromosome, and the sex chromosomes are a bit different -- but these exceptions are irrelevant for this post).

These principles were derived from Mendel's work with pea plants, a study of carefully chosen traits that behaved in a particular way that he felt would be useful for forming better agricultural hybrid plants -- when he crossed plants with different flower color or seed characteristics etc., the traits did not 'blend' in subsequent generations but instead remained discrete: the offspring had one flower color and one type of seed coat; green peas crossed with yellow ones didn't produce in-between colored peas.  But, Mendel knew very well that not all traits behaved in this way; he just felt they weren't of interest for his purposes.  And without actually saying so, he looked for simple patterns of inheritance that fit other 'hot' science of his time, the development of the atomic theory of matter.

In 1854, in preparation for his later work, Mendel acquired 34 varieties of pea seed, Pisum sativum, from local nurseries.  He spent the next 2 years growing them to determine which characters bred true; that is, which parental traits would appear, unblended, in the offspring. Not all did. He chose 7 that did to follow through multiple generations. He made 7 crosses, each done twice: once with a particular trait coming from the seed plant, and once with that same trait coming from the pollen plant. In this way he determined that the male and female contribute equally to the offspring.

He bred his plants for the next 8 years, counting the occurrence of the 7 traits in nearly 30,000 peas. Because he was astute and chose his traits well, he was able to document what became known as recessive and dominant inheritance, as well as the random segregation of alleles in offspring and the independent inheritance of genes for traits, using a sample size large enough to produce convincing results.  In essence he developed statistical rules for the apparent inheritance of his chosen traits.  But he knew he wasn't explaining all of inheritance.  And later work on plants that didn't behave according to his laws further confirmed that he hadn't explained everything (sometimes to his frustration, or even despair, though he had moved on to monastery administration by then).
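To make the "statistical rules" concrete, here is a minimal sketch of a single-trait cross, assuming complete dominance and purely random transmission of one allele from each parent. The allele labels (Y for yellow, y for green), the function names and the sample size are illustrative choices for this sketch, not Mendel's notation or his data.

```python
import random

def cross(parent1, parent2, rng):
    """Each parent passes on one of its two alleles, chosen at random."""
    return rng.choice(parent1) + rng.choice(parent2)

def phenotype(genotype, dominant="Y"):
    """Assumes complete dominance: one dominant copy is enough to show the trait."""
    return "yellow" if dominant in genotype else "green"

def simulate_f2(n_offspring, seed=0):
    """Cross two hybrid (Yy) plants many times and count offspring colors."""
    rng = random.Random(seed)
    f1 = "Yy"  # hybrid from a true-breeding yellow (YY) x green (yy) cross
    counts = {"yellow": 0, "green": 0}
    for _ in range(n_offspring):
        counts[phenotype(cross(f1, f1, rng))] += 1
    return counts

if __name__ == "__main__":
    counts = simulate_f2(8000)
    print(counts, "yellow:green ratio =", round(counts["yellow"] / counts["green"], 2))
```

With a few thousand offspring the yellow-to-green ratio should settle close to 3:1, and with a few dozen it can wander well away from it, which is one reason the large counts mattered.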

The story of how Mendel's work was largely ignored for 40 years and rediscovered in 1900 is well known (we blogged about it some time ago, too, here and here, e.g.).  As genetics got under way in earnest over the next few decades, Mendelian rules did seem, given the technology of the time, rather like 'laws' of nature, although it was clear they didn't explain everything.

Thomas Morgan in the fly room

One of the most influential and productive geneticists of the early era was Thomas Morgan.  In the 1910s, he and his students at Columbia created the first linkage maps in fruit flies, identifying chromosomal loci associated with traits. Even as they did so, Morgan was aware that most traits were due to many genes, and that it was likely that most genes do more than one thing. He wrote in 1917, e.g.:
A man may be tall because he has long legs, or because he has a long body, or both. Some of the genes may affect all parts, but other genes may affect one region more than another. The result is that the genetic situation is complex and, as yet, not unraveled. (The Theory of the Gene, p 294).
And, Morgan ended the paragraph above with this sentence: "Added to this is the probability that the environment may also to some extent affect the end-product."

Morgan also knew that the genetic effect -- the 'trait' -- that he was studying could be restricted to just some developmental stage of his flies, but his point was not the trait itself; it was to trace the trait's behavior relative to the idea that genes had causal effects localized to particular places on chromosomes.  The trait-complexity was something he explicitly said needed to wait for a later time when more was known.

A student of Morgan's, A.H. Sturtevant, in his A History of Genetics (1965), wrote:
With Johannsen [who introduced the words "gene", "genotype" and "phenotype" in the early 1900's] it became evident that inherited variations could be slight and environmentally produced ones could be large, and that only experiments could distinguish them.
In 1902 Bateson pointed out that it should be expected that many genes would influence such a character as stature, since it is so obviously dependent on many diverse and separately varying elements. This point of view was implied by Morgan in 1903 (Evolution and Adaptation, p. 277), and by Pearson in 1904.
In the early 1900s, extensive studies of even Mendel's traits showed variation not consistent with simple 2-factor control (Ken wrote an Evolutionary Anthropology piece discussing some of this).  So, chromosomal loci associated with traits were being found, but it was also recognized that traits were complex.

We bring all this up because an MT reader, André Comeau, plant geneticist and pathologist, recently wrote to Ken with some thoughts on Canadian plant breeder (and flutist) Charles Saunders, developer of Marquis wheat, the most important variety of wheat of the last century in Canada. Saunders developed this wheat after a lot of cross-breeding of different wheats from around the world. Marquis wheat became commercially available in 1909 and, because it had a shorter growing season than other available wheat varieties, accounted for 90% of the wheat grown in Canada by 1920.

In 1902, Saunders attended the lecture at which William Bateson, one of Mendel's earliest and fiercest proponents, "introduced the terms ‘genetics’, ‘homozygote’, and ‘heterozygote’ and emphasized that ‘fixity’ of parental characteristics was of utmost importance in breeding. Bateson concluded that presentation by stating that ‘The period of confusion is passing away, and we have at length a basis from which to attack that mystery [heredity] such as we could scarcely have hoped two years ago would be discovered in our time.’" (This quote is from a 2008 tribute to Saunders.)  (However, we should note that Bateson and many others did not think that these Mendelian traits, which were by definition faithfully transmitted and hence invariable -- as they then thought -- could have anything to do with evolution, which has to do with things that do vary.)

Apparently Saunders quickly adopted Mendel's ideas, but Dr Comeau notes that he may have believed there was something Mendel had missed.  Saunders wrote in 1910, "...I cannot help believing that the discovery of the supposed Mendelian units of inheritance may sometimes be due to the unfortunate combination of a lot of enthusiasm and of a very small number of factual observations."  So, even early adopters of the idea of Mendelian inheritance were aware that it didn't explain everything.

But skip forward 100 years.  Emphasis is still on Mendelian inheritance.  The idea of "genes for" traits and diseases is still firmly held -- indeed, direct-to-consumer gene testing companies bank on it.  Yes, there are Mendelian diseases, and thousands of causal genes have been identified, but in the intervening century it seems that Mendelian traits have gotten more complex, while complex traits are still thought of as simple.  That is, multiple (sometimes thousands of) alleles have been identified for many Mendelian diseases (cystic fibrosis is the most obvious example), while the hope lives on that single genes are still to be found for diseases that are difficult to define and pretty clearly polygenic, and/or have a large environmental component (heart disease, type 2 diabetes and autism are examples of these).

The history in broad brush
During the past century, there were basically two branches of genetics.  In one, Mendelian inheritance was used to find out much about genetic transmission and even to identify the nature of genes.  Much of genetics was Mendelian, built on segregation analysis, the statistical attempt to show whether a given trait (and its presumed causal gene) was inherited in a dominant, recessive, sex-linked, etc. manner.  This formed the foundation of the very useful 'personalized' medicine called genetic counseling.

Fudge factors were introduced to accommodate the imperfect nature of the results -- uninherited instances were called 'phenocopies', and further terms were introduced for results that didn't fit the expected probabilities associated with diploid transmission (of one factor from a parent who carried two, as we do), and so on.  These allowed the models to be fitted to the theory, without forcing adherents to the theory to take the issues very seriously.  Partly, we didn't have the technical means to take them too seriously.  But the nuances were noted!

The other branch of genetics was called quantitative genetics, relating to traits caused by many genes.  Generally, these were quantitatively measured traits, like size, enzyme levels and so on.  Polygenic traits were understood even in the early 1900s to be caused by many genes, even if we had almost no way to find them.  Formal statistical theory, association and regression analysis basically, was developed by leading geneticists.  They knew actual genes were involved, but didn't have the technical means to find them.

We were also aware of traits that seemed to cluster in families but not to segregate with Mendel's probabilities.  So we compromised:  they were threshold traits with a quantitative polygenic basis.  Your genotype affected your blood pressure in a polygenic way, but if your blood pressure exceeded some limit, or threshold, you had a stroke.  So exceeding the threshold was treated as a kind of quasi-Mendelian trait.
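The threshold idea can be sketched in a few lines of simulation. This is a toy model under stated assumptions, not a reconstruction of any published analysis: the number of loci, the allele frequency, the environmental noise and the cutoff (N_LOCI, freq, env_sd, THRESHOLD) are all made-up values chosen only to make the pattern visible.

```python
import random

N_LOCI = 50      # hypothetical number of contributing loci
THRESHOLD = 58   # hypothetical cutoff on the liability scale

def random_genotype(rng, n_loci=N_LOCI, freq=0.5):
    """One person: at each locus, a pair of alleles (1 = liability-raising allele)."""
    return [(int(rng.random() < freq), int(rng.random() < freq)) for _ in range(n_loci)]

def child_of(mother, father, rng):
    """Mendelian transmission at every locus: one random allele from each parent."""
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]

def liability(genotype, rng, env_sd=3.0):
    """Additive polygenic score plus environmental noise (both assumed)."""
    return sum(a + b for a, b in genotype) + rng.gauss(0, env_sd)

def affected(genotype, rng):
    """The yes/no trait: liability above the threshold."""
    return liability(genotype, rng) > THRESHOLD

def simulate(n_families=5000, seed=1):
    """Return (population prevalence, risk to a child of an affected mother)."""
    rng = random.Random(seed)
    pop_cases = 0
    affected_mothers = affected_children = 0
    for _ in range(n_families):
        mother, father = random_genotype(rng), random_genotype(rng)
        mother_affected = affected(mother, rng)
        pop_cases += mother_affected
        if mother_affected:
            affected_mothers += 1
            affected_children += affected(child_of(mother, father, rng), rng)
    return pop_cases / n_families, affected_children / max(affected_mothers, 1)

if __name__ == "__main__":
    prevalence, child_risk = simulate()
    print(f"population prevalence: {prevalence:.3f}")
    print(f"risk to child of an affected mother: {child_risk:.3f}")
```

Every locus in this sketch is transmitted in a strictly Mendelian way, yet the yes/no trait is not: children of affected mothers are at higher risk than the population at large, but well short of the one-half expected for a simple dominant trait.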

Gene mapping, from its earliest methods about 25-30 years ago, brought everything full circle.  The idea was that we could find the genes that 'caused' any old trait we were interested in.  Big early 'hits' included cystic fibrosis, and some cancer-related genes such as the BRCA1 and BRCA2 genes.  This led people to expect that everything could be predicted by identifying genes.

In turn, even when Mendelians themselves knew they had to introduce fudge factors, and quantitative geneticists knew that multiple genes contributed small effects, the public was sold on the idea that with Big Technology every trait would turn out to be enumerably 'genetic'.

Now in fact, these methods have shown that the 'simple' Mendelian traits' fudge factors reflected underlying complexity even at a single major causative gene.  People who have mutations that are associated in others with severe diseases can show no sign of disease, for example.  And polygenic traits turned out to be just that.  But it turns out that the confusion between inheritance of genetic material, and the inheritance of traits, persists even when (unlike in Mendel's garden) we know better.

Thus, it is the full recognition of these realities that is being resisted by the Big Data community, who cling to the idea of simple answers with high predictive power.  And that means, though Mendel should in a sense be consigned to a very honorable place in the history books of a science to which his work contributed so mightily, the tendency to exhume his ideas, to generate false promises, seems irresistible.


Holly Dunsworth said...

Thanks for writing this Anne! Raising the blogging bar, as ever...

Bradly Alicea said...

Excellent review. So is the better option to look for some sort of "deep association" or "deep causality"? By deep, I mean relationships that are generally indirect or non-deterministic (like so-called deep learning). Agreed, you don't see that consideration much in high-throughput studies. Considering that big genomic data yields many more loose associations than causal phenomena, it is quite surprising.

BTW, wasn't Marquis Wheat what they fed the tribbles in Star Trek?

Anne Buchanan said...

Thanks, Bradly. I don't know whether Marquis wheat was fed to the tribbles in Star Trek but it would certainly add to the story.

If I knew how to understand causality, I'd tell you. Heterogeneity and complexity seem to be huge stumbling blocks.

Anne Buchanan said...