Friday, July 29, 2011

Mendelian Inheritance: Basic Genetics or Basic Mistake? Part V.

This series of posts has been about the illusion of simple Mendelian inheritance, that has been an enormously powerful tool in understanding how genes are inherited, working through carefully chosen experimental situations in which traits were so closely tied to specific alleles (variant states of a single gene) that it seemed as if the trait itself were being inherited.  But it's only genes that are inherited (except for the goop that's in the fertilized egg or other cell that starts a new organism on its way).

We've been saying that the effect of  alleles on a trait varies with the alleles' contexts.  We can call this variation a 'spectrum' of effects, just as a rainbow is a spectrum of color: most variants have very small average effects, but a few, usually rare ones, can have such large effects that it seems that whenever you inherit one of those alleles you're nearly sure to have the trait.  Just as in the classical pattern of occurrence in families represented in this figure that we've grabbed from the web--almost.  We used the original in earlier posts in this series, because it is that which is in all the textbooks.  But this is more like reality and illustrates a main point.  The Aa's in the middle are not exactly as 'dominant' as the true-red Aa individuals.  And the light orange aa in the grandchild could easily be classified as unaffected, or affected....depending on what theory you were trying to confirm:

But we've said that the more clear-cut (or argued to be clear-cut) version is what one sees for a selection of variants in a selection of genes, and it grabbed scientific attention because it fit our expectations of  how inheritance works, based on Mendel's results with carefully chosen traits in peas, and similar trait-selection for the following century.  Even  then candid acknowledgment will be typically made that there is variation in the actual traits that people in the same family have: not all the red dots are equally solid or red, as in the figure.  And we know that as a rule, since most effects of individual variants are very small, and many variants contribute to most traits, even the basic idea of Mendelian inheritance makes little sense as a rigorous theory of inheritance--and we should realize that, accept that, even if we wish things were simpler.  The real world doesn't have to fulfill our wishes!

In fact, fine classical papers from around 1960 showed that the appearance--the illusion--of Mendelism can arise in another way unrelated to the extreme, usually rare ends of the allelic effects distributions.  If a presence/absence trait, like hypertension vs normotension, stroke vs non-stroke, cancer vs non-cancer, arises when a threshold is exceeded on some underlying quantitative trait (e.g., if your blood pressure rises above some agreed-on cutoff level for calling it hypertension), and if many different genes contribute to it, the trait can occur in families in a way that appears clearly 'Mendelian', that is, as if only a single allele were responsible.

We know that even with complex inheritance, children will resemble their parents, and we know the average extent to which that should happen.  Generally, you're half-way between your parents' trait levels, such as stature.  But the usual idea is that this doesn't apply to discrete (yes/no) traits that follow Mendel's rules: you're either this or that, but not a half-way blend of your parents for such traits.  As Darwin would say, the traits do not 'blend'.

Nonetheless, as was shown in the '60s, the probability of a qualitative trait (yes/no, like hypertension or diabetes) can be similar enough between close relatives that it appears to follow Mendel's rules.  This is because if you have a combination of genetic variants, across your genome, that makes your blood pressure high, your children will inherit half of those (on average) plus whatever similar risk effects your spouse may have.  For a generation or two the net result can be that around half of the children of an affected parent are also affected with hypertension, which is what you'd expect if there were just one Hypertension gene with two states, normal and hypertensive.  The illusion can arise under a broad set of circumstances, and can fuel hopes of simple situations--or hopes that GWAS will, after all, really work.
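To make the threshold idea concrete, here is a minimal simulation sketch in Python.  Everything in it is an assumption chosen for illustration: the number of loci, the allele frequency, the cutoff, and the function names are ours and are not taken from the classical papers.  It also leaves out environmental effects altogether.  It counts 'risk' alleles across many genes, calls a person affected when the count exceeds a cutoff, and compares the risk in children of an affected parent with the risk in the population at large.

```python
import random

N_LOCI = 50        # assumed number of contributing genes
RISK_FREQ = 0.5    # assumed frequency of the trait-raising allele at each gene
THRESHOLD = 55     # assumed cutoff: more risk alleles than this means "affected"

def random_person(rng):
    """Genotype as a list of risk-allele counts (0, 1 or 2), one per gene."""
    return [(rng.random() < RISK_FREQ) + (rng.random() < RISK_FREQ)
            for _ in range(N_LOCI)]

def child_of(mother, father, rng):
    """At each gene, each parent transmits one of its two alleles at random."""
    return [(rng.random() < m / 2) + (rng.random() < f / 2)
            for m, f in zip(mother, father)]

def affected(person):
    """Yes/no trait: the underlying quantitative score exceeds the cutoff."""
    return sum(person) > THRESHOLD

def simulate(n_families=1000, seed=1):
    """Return (population prevalence, risk in children of one affected parent)."""
    rng = random.Random(seed)
    population = [random_person(rng) for _ in range(n_families * 10)]
    prevalence = sum(affected(p) for p in population) / len(population)
    # Keep up to n_families affected people and mate each with a random spouse.
    affected_parents = [p for p in population if affected(p)][:n_families]
    kids = [child_of(p, random_person(rng), rng) for p in affected_parents]
    risk_in_children = sum(affected(k) for k in kids) / len(kids)
    return prevalence, risk_in_children

if __name__ == "__main__":
    prevalence, risk = simulate()
    print("population prevalence:", prevalence)
    print("risk in children of an affected parent:", risk)
```

No single gene here has a large effect, yet the yes/no trait clusters in families.  How close the children's risk comes to the 'Mendelian' one half depends on the assumed cutoff and number of genes, so the settings above should be treated as knobs to explore rather than as estimates of anything real.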

This means that, in addition to all the other things discussed in this series, and not even considering shared environmental effects which can often be by far the most important, multi-gene causation can reinforce the ideas of someone assuming that Mendelian inheritance of the trait is true.   

We can relate all of this to GWAS findings in another way as well.  If a given allele is common enough in cases for its effects to be found and reach statistical significance, relative to its frequency in controls, then it has a chance to be detected in a GWAS study.  But this depends on its penetrance, that is, on the strength of its effects, on their own, on a trait measure like stature or blood pressure, or on the presence of a disease.  Highly penetrant alleles will be found more frequently in cases than controls because they have a higher chance of 'causing' the trait.  The greater the effect, the more likely if you have the allele you have the disease.

This is a kind of 'dominance', because the allele is being detected against the other allele at that gene in individuals, plus whatever other relevant variants they have in their genome. That typically only a few genes are identified in this way shows how relatively rare real dominance is--how far it is from being the baseline, basic nature of inheritance!  In fact, most variants that contribute contribute so little--have so little 'dominance' in this context--that we simply cannot detect their individual effects (or those effects are not enough to generate a statistically 'significant' association with the trait).

Put another way, these various considerations show how, if we assume a theory, we can make the data fit the theory and also assume we understand the data.  But if that theory is wrong or very inaccurate, it can lead us far astray, as Mendelian theory has indeed been doing for more than 150 years.

Thursday, July 28, 2011

Mendelian Inheritance: Basic Genetics or Basic Mistake? Part IV

So, if we are right that 'Mendelian' inheritance is fundamentally mistaken--or, at best, generally inaccurate and misleading--then what kinds of conclusions can we draw and how can some of the basic attributes of life be accounted for?  We have to assume that life evolved and that to a great extent means genes, broadly defined.  Indeed, we may be worse off than we think if, as we tried to show in an earlier post, even what a gene is, is elusive with current knowledge.

Waterhouse, A Mermaid
In our book after which this blog is eponymously named, we argued that there has been too much attention placed on evolution (and a competition-centered view of life at that), relative to the more ubiquitous properties found at life's other time scales--of development and maintenance of an organism, and of the interaction of factors on the ecological scale.

The idea of Mendelian inheritance, which is widely extended to the vast majority of gene-trait relationships that clearly are not following the monk's principles, is of discrete states one of which dominates in their various combinations.  This was (and is) extended to evolution, with our grossly inadequate 'winner take all', 'survival of the fittest' notion of one best -- fitness-wise dominant -- variant that natural selection favored into success just as surely as a dominant allele was favored ineluctably into manifestation in the organism.

But if you think of the other properties of life, which we center our book around and will briefly name here, you might ask how Mendelian thinking, which only by deep contortions can be related to those principles, could ever have taken hold, unless it's by what amounts to an ideology, a takeover of a certain highly deterministic, simplistic view of the living world--a view that simply, for decades, wrote off into alleged irrelevance the actual way in which organisms work.

Sequestration and modularity
From DNA on up, life is organized as hierarchically nested partially sequestered units.  DNA has functional sequence elements arranged together along chromosomes, but partially isolated in that they can serve their individual functions.  The units (such as amino acid codons) are repeated many times.  Proteins have partly separated functional units, too.  Cells are packaged units that have many different partially isolated subunits within them, such as organelles like mitochondria, isolated areas like the nucleus, and local differences in what is present in the cell membrane (e.g., a cell may have a front and back end, so to speak).

An organism (or even collections of organisms as in bacterial biofilms) is made of large numbers of cells.  These are repeated units that communicate with each other via combinations of signaling and other molecules, and this is what leads them to express particular, context-dependent sets of genes.  So that they are repeated, but different.  This process occurs hierarchically during development, and in response to environmental changes during life.  An organism is divided into organs and organ systems, like brain, heart and vessels, digestive organs, and so on.

Organs are made of nested, repeated units.  Intestines are segmented along their length, and their surface is littered with repeated structures called 'villi'.  Skeletons are made of repeated, partially different but interacting bones.  Trees are made of leaves and so on.  Plants and animals alike are constructed by repetition and branching.

Yet, importantly, each organism has only the one genome that it inherited from its parents!  So the same genome makes brains and braincases, that are as different from each other as any two things in all of life.

These processes are both qualitative: each leaf or bone is a separate structure; and quantitative: each such structure is somewhat different.  This is the natural variation that is the material on which evolution can work.

If you just think about this, you would have to wonder how it could be brought about by Mendelian inheritance.  How could just two states at a single gene be responsible for such complexity and quantitative internal organization?

It is perhaps easier to see how breaking a gene could cause a major state change, and thus a normal and dead alternative at a gene could be manifest in Mendelian inheritance terms.  Or if the trait is very close to a protein coded by a single gene, two major alleles (variants) at that gene could have big differences (yellow vs green peas, for example).  But as a rule, Mendelian inheritance makes little sense.  Partly that's because, as mentioned in earlier parts of this series, we confuse inheritance of traits with inheritance of genes.  Genes--specific stretches of DNA--are clearly inherited in a Mendelian way (with some exceptions that don't matter in this context here).  But traits generally are not.

The reason for all of this is that the basic principles of life, that include the above descriptions (see our book for detailed discussion in this context), involve cooperation--that is, co-operation or contemporary interaction--among many different elements, each of them variable in a population.  What an individual inherits are sets of genomic variants from its parents.  The traits an individual manifests are the net results of these variants acting in the particular environments in which they find themselves.

Wednesday, July 27, 2011

Mendelian Inheritance: Basic Genetics or Basic Mistake? Part III.

In his experiments, published in 1866, Gregor Mendel crossed two strains of domestic peas.  'Crossing' means that one parent was from each strain, and because the strains were inbred, this meant that there was little variation in the genomic backgrounds within each strain.  Mendel used different pairs of strains for the different traits he studied.  Further, he found that his chosen traits (seven of them, like round vs wrinkled or green vs yellow) exhibited what he called 'dominance' (as we translate the term today).  That means that the effect of the allele (genetic variant) from one of the two strains was always manifest as its corresponding trait.  To revisit what we said earlier in this series, in the first generation of a cross, every parental pea plant was either GG or YY genotype (green or yellow) at a given test gene so every offspring plant inherited a G allele from one parent and a Y from the other, meaning they had the GY genotype and the peas and pods were yellow (in which case we say Yellow is dominant over Green).

But things are more complicated in the generation produced by crossing these plants, because then a specified fraction of each offspring type would be expected (as we noted in Part I, for GY x GY, the famous 1GG, 2GY, 1YY ratios were expected).  Mendel went a few generations beyond that, and the ratios became more subtle, but the point is the same.  However, each generation allowed some scrambling of the genomic background of the strains, that is, at the rest of the genome, due to what is known as 'recombination' among the two parental strains' chromosomes.

It was in the context of those backgrounds with their limited variation, that the plants seem to breed 'true'.  Of course, there was statistical variation in the frequency of the relative offspring types (and Mendel was accused of fudging some figures to make his story come out closer to what he expected).  He did not get exactly the expected proportions.   But this variation, which is observed in every 'Mendelian' situation in any species, has always been attributed solely to chance allele transmission from parent to offspring (or to data fudging by tossing plants that didn't fit).  The fudging accusation is highly debated (see my paper, Goings on in Mendel's Garden, Evol. Anthropol., 11:40-44, 2002). But there may be a more serious issue, hidden in the statistics.

That issue is the assumption, from Mendel's day to today, of the expected proportions of plant types (green vs yellow, smooth vs wrinkled).  That expectation was based on pure, 100% dominance, that is on the inherently dominant physiological effect of the two alleles in any one of his test crosses.  The assumption is that there is no genomic background variation that causes deviation from these 'pure' expected proportions.  We don't know to what extent there were 'greenish' or 'yellowish' peas, or peas with nondescript or mottled nature in Mendel's experiments, that would have been tossed out on the grounds of foreign pollination or whatever.  We know however that the variation was small enough that the assumption of dominance was good enough--for Mendel's purposes.

At least one of his traits, plant height, was largely quantitative and less clear to judge (this was written up in the early 1900s, especially by OE White in a series of thorough papers).  We know that genes in the wild have many different variant states, not just two. And these have variable effects.  And this is true for Mendel's traits.  This is the same story, consistently, with variation at alleles related to human diseases (like cystic fibrosis, or PKU, etc.), and variation and genetic control in essentially any species carefully studied.  For these reasons, Mendel probably could not have done what he did with random samples of wild plants--any more than we can do it with GWAS and complex diseases today.  Indeed, we must say that the above-cited paper raised all of these points, before the mountain of confirming data that subsequent studies generated and that we have today was available.  Of course, these are inconvenient facts if you hunger, naturally perhaps, for simple answers to fond dreams of perfect crops, and immortality through genetics.

In that sense, Mendelian traits are an artifact or illusion of his simple experimental set up, one he intentionally chose because the traits 'worked' the way he wanted them to. But there are further reasons than natural variation in the test genes themselves, for thinking that Mendelian inheritance has been, from the beginning, a very misleading notion.

This is the fundamentally mistaken notion that a gene is the same as the trait it contributes to.  It is the assumption of causal inherency.  Instead, what we know very, very clearly is that with a few kinds of exceptions (such as many lethal dysfunctional mutations in genes), the effect of an allele is contextual:  it depends on the environment and, in this case more importantly, on the genomic background.  That is, the variants at the many other genes in the same plant or animal affect how the allele of interest is manifest.  This is because no gene is an island, despite our clinging to Mendelian concepts of inherent causation for the last 150 years.

The degree of this contextual dependence varies from gene to gene, trait to trait, population to population, and species to species.  There is no single biological theory (other than, perhaps, this generalization) that predicts what we will find.  Some alleles in some situations act in a way that would make Mendel smile.  But few traits are, overall, like that.  10% or so of known devastating mutations in humans, that typically are called 'Mendelian', are the normal allele in other species!

To a great extent Mendelian inheritance of traits, that has become the sacred icon of modern genetics, is simply wrong!  Certainly, in any species one can find traits that segregate in the expected fashion to a degree satisfactory for the purposes at hand.  There is a spectrum of effects, and some are of this simple-enough causation.  But from a point of view of an actual theory of biology, it is the spectrum, not its extreme, that is important.

In this sense, genetic effects are as relative to each other as motion of objects is to Einstein.  This leads to a subtle but centrally important point: there is really little if any difference between 'physiological' and 'statistical' dominance!  These terms were introduced earlier in this series of posts.  The idea of inherent dominance has been a misleading oversimplification from the beginning.  Dominance is just a sometimes-observed approximate correlation between alleles and traits that applies to a particular population.  Inherent biological dominance is an experimental illusion.

Traits in Nature, like diseases, that appear to be Mendelian, are those that in current circumstances, chosen among countless traits that have been studied, seem to appear in families in the classical way.  We know this very well, but it's not in most people's perceived self-interest to face up to it.  Even the more devastating mutations in 'Mendelian' single-gene diseases typically have variable effects.  The same mutations in mice are very often strain-specific.  And many alleles at the gene have less, and even more, variable effects.

For example, most individuals with 'recessive' diseases are not homozygotes for 'the bad' allele: they are heterozygotes for various alleles that compromise the trait to various degrees away from 'normalcy'.  
To account for this, but still to cling to our Mendelian paradigm, we introduce fudge factors to account for incomplete dominance and the like, that must be introduced in genetic (family) counseling risk estimates.

To the extent these statements are true, the idea of Mendelian inheritance (of traits) has been a stunningly misperceived, mistaken theory that continues to cost huge amounts of money for chasing genes 'for' particular traits, as if such genes have inherent causal, and hence inherent predictive properties.

Tuesday, July 26, 2011

Mendelian Inheritance: Basic Genetics or Basic Mistake? Part II.

The idea of Mendelian inheritance, revered for more than a century, was initially about the inherently causal nature of dominance and recessiveness.  The A allele always and,  presumably, inherently, made its effects manifest, as in the accompanying figure.  Even as late as 1995, an influential paper termed this 'physiological' dominance.

But why so long after Mendel's rules were deeply established, was there a need for such a term?

In the genetics of quantitative traits, one can look at the average trait value (say, stature or body fat content of pigs or oil content of corn) in AA's and aa's.  If the 'A' were physiologically dominant, then the Aa's should have the same mean value as the AA's.  These are clearly statistical statements because unlike green vs yellow peas, there is variation and there are sampling issues (in fact, even Mendel's classical traits had some variation, but little overlap between dominant and recessive plants).

What transpires as a rule, however, is that the Aa's are not exactly like the AA's.  But a common alternative to Mendelian ideas was that rather than being dominant and recessive, the contributions of the alleles at a gene were additive, like doses of medicine:  Each copy of, say, an A allele you have, is a jump in your trait value.  In such situations, Aa's would be exactly intermediate between AA's and aa's.  If instead they shift towards one of the two homozygote (AA or aa) individuals, this is called statistical dominance.
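The comparison described above can be written down in a few lines.  This is a sketch using the usual quantitative-genetics bookkeeping (an additive effect and a dominance deviation); the function name and the stature figures in the usage lines are hypothetical illustrations, not data from any study.

```python
def dominance_summary(mean_AA, mean_Aa, mean_aa):
    """Summarize where the Aa mean sits relative to the two homozygote means.

    Returns (additive_effect, dominance_deviation, degree), where degree is
    0 for purely additive alleles, +1 if Aa's look exactly like AA's, and
    -1 if Aa's look exactly like aa's.
    """
    midpoint = (mean_AA + mean_aa) / 2
    additive_effect = (mean_AA - mean_aa) / 2   # half the gap between homozygotes
    dominance_deviation = mean_Aa - midpoint    # how far Aa's sit from the midpoint
    if additive_effect == 0:
        degree = None                           # homozygotes do not differ; ratio undefined
    else:
        degree = dominance_deviation / additive_effect
    return additive_effect, dominance_deviation, degree

if __name__ == "__main__":
    # Hypothetical mean statures in cm for AA, Aa and aa individuals.
    print(dominance_summary(180, 176, 170))
    # The same homozygotes with 'physiologically dominant' A: Aa's match AA's.
    print(dominance_summary(180, 180, 170))
```

Because the three means are sample estimates, the degree computed this way is a statistical statement about the sampled population, not a fixed property of the allele.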

Unlike physiological dominance, statistical dominance has to be evaluated by sampling from a population, and that means in turn that the idea of estimating the net or 'actual' effects of the A and a alleles, depends on the frequencies of these in the population, and more importantly, that also means that environmental and other genomic effects will affect the mean values of the carriers of the 3 genotypes.  Statistical geneticists now treat these two kinds of dominance differently, often ignoring the physiological because it is hard to identify in real-world situations of quantitatively variable traits, as opposed to experimental settings with only two clear states, that have been part of genetics ever since Mendel himself.

We can relate this to GWAS findings in a simple way.  If a given allele is common enough in cases for its effects to be found and reach statistical significance, relative to its frequency in controls, then it has a chance to be detected in a GWAS study.  But this depends on its penetrance, that is, on the strength of its effects, on their own, on a trait measure like stature or blood pressure, or on the presence of a disease.  Highly penetrant alleles will be found more frequently in cases than controls because they have a higher chance of 'causing' the trait.  The greater the effect, the more likely if you have the allele you have the disease.
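A rough sketch of that dependence, under assumptions that are ours and not part of any particular GWAS: a single gene with two alleles, genotype frequencies in Hardy-Weinberg proportions, and made-up penetrance values for people carrying 0, 1 or 2 copies of the allele.  The function name is hypothetical.

```python
def case_control_frequencies(p, penetrance):
    """Expected allele frequency among cases and among controls.

    p          -- population frequency of the allele of interest
    penetrance -- (f0, f1, f2): chance of disease with 0, 1 or 2 copies
    """
    q = 1.0 - p
    genotype_freq = (q * q, 2 * p * q, p * p)   # Hardy-Weinberg proportions (assumed)
    prevalence = sum(g * f for g, f in zip(genotype_freq, penetrance))
    # Allele copies carried by affected and by unaffected people, per person.
    copies_in_cases = sum(
        k * g * f for k, (g, f) in enumerate(zip(genotype_freq, penetrance)))
    copies_in_controls = sum(
        k * g * (1 - f) for k, (g, f) in enumerate(zip(genotype_freq, penetrance)))
    # Each person carries two alleles at the gene, hence the factor of 2.
    return (copies_in_cases / (2 * prevalence),
            copies_in_controls / (2 * (1 - prevalence)))

if __name__ == "__main__":
    # Made-up penetrances: a weak-effect allele, then a highly penetrant one.
    print(case_control_frequencies(0.1, (0.05, 0.06, 0.07)))
    print(case_control_frequencies(0.1, (0.05, 0.50, 0.90)))
```

With a weak effect the two frequencies sit close together, so a very large sample would be needed to tell them apart; with high penetrance the gap is wide.  Whether a real study would detect the difference also depends on sample size and on the significance threshold used, which this sketch does not model.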

This is a kind of 'dominance', because the allele is being detected against the other allele at that gene in individuals, plus whatever other relevant variants they have in their genome. That typically only a few genes are identified in this way shows how relatively rare real dominance is--how far it is from being the baseline, basic nature of inheritance!  In fact, most variants that contribute contribute so little--have so little 'dominance' in this context--that we simply cannot detect their individual effects (or those effects are not enough to generate a statistically 'significant' association with the trait).  So, dominance is far from the general rule.  We will see in the next installment why this should not be any kind of surprise.

Monday, July 25, 2011

Mendelian Inheritance: Basic Genetics or Basic Mistake? Part I.

Gregor Mendel
Gregor Mendel wanted to improve horticulture in his native Moravia.  The idea was to use hybridization to bring desired traits from different pea strains together by controlled breeding.  To do this effectively depended on an understanding of inheritance.  General selective breeding worked for some traits--and had been carried out for thousands of years.  But some traits in the available strains of plants seemed to be dichotomous (two-states, like round and wrinkled), and perhaps there would be effective ways to take one state, say, round peas, and breed that into some other strain that had attributes the round-strain didn't currently have.

Mendel carefully chose strains of basically inbred  pea plants that had simple patterns of inheritance: the dichotomous traits 'bred true' from  parent to offspring. He knew there were many traits that did not just have two states, or that did not breed true in this simple way.  But to understand the effects of hybridization--or we now wrongly say, inheritance--he picked appropriate traits. His seven carefully chosen traits all had an additional attribute that was important in this sense.  One of the states was 'dominant' to the other state, which was therefore 'recessive'.   These are essentially the terms Mendel himself used.

The 7 traits of interest to Mendel
This meant that a plant produced by crossing two parents with different variants of the trait would have only one of the traits.  If a plant received an 'A' allele (variant) from one parent, that was enough to give it the 'A' trait, even if it had received the other, the 'a' allele from the other parent.  So AA's and Aa's had the dominant trait, and only aa's the recessive.  From these, he observed the famous Mendelian ratios that are in all the standard genetics texts and that every geneticist 'knows' to be true.  So if an Aa plant were crossed with another Aa plant, since each parent has a 50% chance of transmitting each of its alleles, 1/4 of the offspring of this cross would be AA's, 1/4 aa's, and 1/2 of the offspring would be Aa's.  This, and other similar conclusions depending on the particular breeding scheme, were what Mendel showed, and what revolutionized the understanding of inheritance and opened the door to powerful experiments in genetics that are our legacy and working basis to this day.
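The arithmetic behind those ratios is just an enumeration of the four equally likely allele combinations.  A minimal sketch in Python follows; the function names are our own illustration, and the second function assumes pure, 100% dominance, which is exactly the assumption this series questions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Expected genotype fractions among offspring of a one-gene cross.

    Each parent is a two-letter genotype such as "Aa", and each of its two
    alleles is transmitted with probability 1/2.
    """
    counts = Counter(
        "".join(sorted(a + b))   # sort so that "aA" and "Aa" count as one genotype
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

def dominant_fraction(genotype_fractions, dominant="A"):
    """Fraction of offspring showing the dominant trait, assuming pure dominance."""
    return sum(f for g, f in genotype_fractions.items() if dominant in g)

if __name__ == "__main__":
    offspring = cross("Aa", "Aa")
    print(offspring)                     # genotype fractions: the 1:2:1 ratio
    print(dominant_fraction(offspring))  # trait fraction under pure dominance
```

Note that the first function describes the inheritance of alleles, while the second converts genotypes into traits only by assuming a 100% correspondence between the two.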

Note right off the bat that there was a very big mistaken conclusion that followed:  Mendel was showing the nature of trait inheritance, but it was interpreted to mean the laws of genetic inheritance.  It worked only because of the 100% correspondence  due strictly to the careful choice of 2-state, highly determinative traits in his experiments.  Although even that isn't strictly true (see my 2002 paper in Evolutionary Anthropology), it was close enough that even now we confuse inheritance that strictly applies only to genes, with the appearance of traits in offspring compared to their parents.

Classically, many 'Mendelian' diseases were identified, because they approximately followed Mendel's laws.  Modern biomedical genetics began, around 1900, with Archibald Garrod's studies of 'recessive' traits that arose in inbred marriages, that raised the chance that a child would inherit the recessive allele from both parents (because the genetically related parents shared the allele from their common ancestor).  Step by step, we built in the illusion that genes were just waiting their transubstantiation into traits.

Until the nature of DNA was understood and we could examine DNA directly we had to work through  traits rather than genes, even though we called the field 'genetics'.  For decades it was observable traits in experimental species, such as fly eye color, or 'Mendelian' disease, or similar traits in plants.  Genetics split into two parts, one  dealing with this kind of 'clear cut' particulate inheritance, which eventually led to understanding of how protein-coding areas were arranged along chromosomes, the nature of chromosomes, the nature of genes coding for proteins, and the transmission of DNA from parent to offspring.

The other segment of geneticists dealt with the majority of traits that clearly did not 'segregate' from parent to offspring, the quantitative traits like stature, milk-yield, grain nutrient properties and the like, from which the field of quantitative genetics developed.  It was more pragmatic and said that a quantitative or complex trait was due to the inheritance of many genes: we might not be able to identify them, but jointly they were responsible for traits in organisms.  The similarities between parents and offspring for complex traits were consistent with this view as well.

Cajanus Cajan; Wikimedia Commons
Thus, sight-unseen, inheritance of traits was the  underlying theory, equated to the inheritance of genes, even when we couldn't identify the genes.  When the DNA sequencing age gradually developed, we got the idea that by tracking down genetic variance we are tracking down inheritance variance--the extension of the Mendelian illusion that genes were the same as traits!  This has led to the problems that we so widely see (but are so widely waved away) in studies like GWAS: we are not really finding the genes that 'cause' our favorite traits (including most diseases).

Of course all of this is manifestly a Grand Illusion!  Even a fertilized pea ovule does not have peas, wrinkled, green, or otherwise!  Once the connection between a gene and a trait becomes less than 100%, or once many genes contribute information about a trait, we see how obviously Mendelian ideas were a badly misleading mistake.  They were great for providing ways to set up experiments that isolated genetic effects and led to an understanding of genetic inheritance.  But they were, from the beginning, very misleading about trait inheritance.

In the next installments of this series, we'll examine the idea of dominance and genetic effects further, and will eventually ask whether, surprisingly, there really is such a thing as dominance in the first place!

Sunday, July 24, 2011

Genetics, Research, Health....and ethics (if there are any)

Well, the NYTimes today has a story by Mark Bittman on the toll that junk food is taking on our national health.  Bittman says that a 20% tax on sugared beverages would raise billions in revenue that could be put towards health improvement, and would prevent 400,000 people from being diabetic, would greatly reduce our national obesity problem, and would save $30,000,000,000 (that's billions) in health care costs.

These are facts so well known that it's only the specific estimates that justify a major story.  Why we haven't long ago done something about it is more complex, and not a tribute to human nature--unless it's a tribute to selfishness.  That's another subject.

For us what is relevant is that in a time when we're trying (at least some are) to make health care available to all in the US, and to curtail health care costs, the debate is about profits, HMO efficiency (that is, reducing care quality), what insurance won't have to pay for, and so on.  Meanwhile, right beneath our noses are major answers.  Of course, lots of industries, such as testing equipment makers, test labs, and Pharma don't want to see a reduction in their customers (the obese, diabetic, hypertensives, and the like).

Removing the overwhelming burden of clearly environmentally caused diseases is simple, but doesn't require lots of research grants to keep the glucose flowing through the veins of universities, so it will naturally be resisted.  But if that were done, the diseases that would remain would be those more truly genetic cases of the same diseases, cases that occur without environmental triggers.  The mask of all the non-genetic cases--most cases--would not blow away the power of GWAS and other kinds of studies as it does today.

Genetics has already under-delivered on the promise of the use of genetic data to actually do something about these diseases.  Most of the decades-known genetic diseases are still here with no gene-based therapy available.  We don't think that geneticists should be faulted in this regard, except for their self-aggrandizing hype, because the problems are difficult. Still, once a gene is known, preventing or treating the disease is in a sense an engineering problem (getting an improved gene to replace a defective one or its effects), and one wouldn't want to bet against our ingenuity and technology when it comes to engineering.

So we should implement environmental measures to reduce the disease burdens, pull the plug on research that is going nowhere based on over-geneticizing disease, and intensify research on those traits that really are genetic to show that genetic knowledge and technology really can make a difference other than to the careers of geneticists (like us).

Of course, this won't solve two problems.  First, if fewer people get or die from these common environmental diseases, people will last much longer, decaying gradually, suffering more years of increasingly helpless debility and demanding resources, energy, food, and care-taking that are already in short supply.

And, second, universities, not being any more willing to be good citizens than businesses are, will shift their demands for research funds to 'meta' studies: studies of social aspects of lifestyle changes, of aspects of living longer, of surveys of how people feel about all of this, and so on.  Because we'll find something to keep ourselves in business, to keep the bureaucrats' portfolios full, and the like.  Only cuts from the funding source will force universities to cut back on go-nowhere programs that exist mainly because they bring in funds.  These comments may sound rather misanthropic, but if you've lived and worked in the university setting as long as we have, you'll know that there is truth in what we say.

There are many traps in human life, real existential traps.  Curing one disease makes room for another, and worsens overpopulation and resource burdens.  Closing down useless research programs costs people jobs. But when we know how to ameliorate major problems, but spend our resources on small vested interests (like the research industry) rather than on the major problems, criticism is justified.  That there are no escapes from the existential traps is another, genuine and serious thing to think about.

Friday, July 22, 2011

More sad commentary

Today we received a mass email from an outfit called ACCDON.  They are searching for editors to help non-native English speakers to write their papers so they can be published.  The service, like ones we've discussed before, includes advice from 'experts' on study design and other aspects of the paper the client has sent in to the company for help (unfortunately, the ad itself isn't even in good English).

It is not unreasonable for good scientists who aren't native speakers to want writing help.  For the last decades, this was routinely offered by journal Editors if the paper was being accepted, or by friends in the author's department.  It was part of mutual help that faculty provided.

But we're in different times.  First, a much higher fraction of researchers are from Asia (seemingly the main target of the message sent around) or other countries.  But English is the current lingua franca of science.  That can be a real challenge.  No problem there, but it is sad that help from colleagues is no longer sufficient.  That may be a sign of the competitive times. 

More importantly, the same desperate competition for limited but (if you're lucky) very bountiful resources has led to parasite companies who will sell an implied promise to get your work in premier journals, or even help with your study design  (presuming the paid readers really are sufficient experts--a doubtful premise, since legitimate researchers are already overwhelmed by reviewing requests, and reviews rarely actually provide the kind of assistance being promised).  It is like test-prep companies,  taking advantage of people's desperate need to get their kids into Yale. And of course they're also selling promised income to those who sign up to be their reviewers (again capitalizing on under-employed scientists?).

There is no magic to getting research published.  Indeed, there are so many journals that almost anything can be published (and there's a very rapid explosion of for-fee online journals, new ones nearly every day). But if the needy customers who hope that this service will lead to grants or publication even do get published, unless it's in a major journal, it won't help their careers much anyhow.  And of course most papers are in the proliferating minor journals.  Money spent, little gain.

Do your taxes that pay for research also pay for publication and for these crib-services?  If so, should they?  If not, is it right or just anti-social to bleed young researchers of their private funds to feed their hopes for success?

Of course, all careerism aside, it is demonstrably true that most published research is hardly cited by anybody, and that because of the haste and proliferation and hyperbolization, much of it contains errors and is not replicable.

This hypercompetitive system that leads to the commercialization of fear should be cooled down.  But that's not likely.

Thursday, July 21, 2011

Changing partners in life: recombination biology

Two papers have appeared this week about the nature of recombination in humans, one in Nature Genetics, by Wegmann et al. ("Recombination rates in admixed individuals identified by ancestry-based inference"), and the other in Nature by Hinch et al. ("The landscape of recombination in African Americans").  Do they add anything new to what we know about recombination or human evolution?

When sperm or egg cells are about to be produced, all our 23 pairs of chromosomes line up against their corresponding partners in the center of the cells, before being pulled apart into the separate daughter cells that will eventually be the sperm or egg cell, in which one copy of each will be present.

One might expect that a sperm or egg cell would have either the copy of, say, chromosome 3 that the parent got from his or her mother or the copy that the parent got from his or her father, and that this copy is thus transmitted intact to the future grandchild.  But this is not what happens.  Instead, the lined-up 'homologous' chromosomes stick to each other here and there, and swap pieces.  This is called recombination and means that the grandchild receives a mix of pieces of each grandparent's chromosomes.

Recombination is treated generally as a random process: the chromosomes stick to each other in random places along their length.  But this is inaccurate.  The sticking points happen more frequently in some places than in others.  This differs from chromosome to chromosome and even to some extent between males and females.  Why is that so?

One reason is that recombination is effected by active mechanisms that involve proteins (that is, that are coded for by genes somewhere in the genome) and the DNA sequences that these proteins stick to in bringing recombination about.  Variation in those binding sequences can affect the frequency with which recombination will occur.  The results show up statistically as relative hot-spots of recombination.
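To make the idea of hot-spots concrete, here is a toy sketch in Python.  It is not how either of the papers discussed here modeled recombination; the function name make_gamete, the per-gap crossover probabilities, and the eight-letter 'chromosomes' are all made up for illustration.  Each gap between adjacent positions gets its own chance of a crossover, so a gap with a high chance behaves like a hot-spot.

```python
import random
from collections import Counter


def make_gamete(grandma_copy, grandpa_copy, crossover_prob, rng):
    """Build one gamete chromosome from a parent's two homologous copies.

    crossover_prob[i - 1] is the (hypothetical) chance of a crossover in the
    gap just before position i. Returns the gamete and the list of gaps
    where a crossover actually happened.
    """
    assert len(grandma_copy) == len(grandpa_copy) == len(crossover_prob) + 1
    copies = (grandma_copy, grandpa_copy)
    current = rng.randrange(2)          # which copy we start reading from
    gamete = [copies[current][0]]
    breakpoints = []
    for i, p in enumerate(crossover_prob, start=1):
        if rng.random() < p:            # a crossover: switch to the other copy
            current = 1 - current
            breakpoints.append(i)
        gamete.append(copies[current][i])
    return "".join(gamete), breakpoints


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed numbers: gap 4 is a hot-spot, every other gap is 'cold'.
    probs = [0.01, 0.01, 0.01, 0.30, 0.01, 0.01, 0.01]
    tally = Counter()
    for _ in range(10_000):
        _, breaks = make_gamete("M" * 8, "P" * 8, probs, rng)
        tally.update(breaks)
    for gap in range(1, 8):
        print(gap, tally[gap])
```

Counting where the switches land over many simulated gametes gives the kind of statistical picture described above: the crossovers pile up at the gap with the high probability, even though any single gamete looks like a haphazard mix of M and P pieces.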

Recombination generates new variation (new combinations of genetic variants) as a resource with effects on evolution, and this is because these combinations can have effects on the organism's traits.

These two papers are companion pieces.  As Wegmann et al. report,

By comparing the AfAdm [African admixture] map to existing maps, we were able to make several observations: (i) there is evidence for subtle population differences in recombination rates between African and European populations, (ii) African-European admixed individuals appear to have recombination rates that are, on average, intermediate between the African and European rates, and (iii) the degree to which the rates are intermediate is predictable from the average ancestry coefficient (~80% African and ~20% European) in our sample. Further, in admixed individuals, recombinations appear to be concentrated at hotspots in a manner correlated with ancestry: individuals with more African ancestry have recombinations at hotspots found in the HapMapYRI map, and individuals with more European ancestry have recombinations at hotspots found in the HapMapCEU map. These observations are consistent with the differentiation between populations for fine-scale recombination rates1–5 and with the European-African differentiation at PRDM9, the only known major locus affecting fine-scale recombination rates.

Whether this is important, or worth whatever its costs were to discover, is open to question.  Could the same not already be, or have been, known from work on other species where it would be much less costly?  And if it has effects on disease--always the self-interested rationale of investigators by which such stories are touted in the news--such effects would be detectable by the current methods (such as GWAS) that hunt for them.

So these papers provide what might be useful and important information about a basic aspect of cell biology and the generation of variation.  The value will prove out in the future, but certainly our repertoire of basic knowledge about variation, and in this case human variation, is increased thereby.

Wednesday, July 20, 2011

Epistemology and genetics: does pervasive transcription happen?

Pervasive transcription
RNA basepairing
It has been known for some time that only 1-5% or so of the mammalian genome actually codes for proteins. It does that by 'transcribing' a copy of a DNA region traditionally called a 'gene' into messenger RNA (mRNA) that is in turn 'translated' into an amino acid sequence (protein, in common parlance).  According to this established theory known as the Central Dogma of Biology (the problems with which we blogged about here), a gene has transcription start and stop sites, specific sequence elements in DNA from which this canonical (regular) structure of a 'gene' can be identified from DNA sequence (but see below for problems with the definition of a gene).  From that, we got the estimate that our genome contains roughly 25,000 genes.
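For readers who like to see the textbook version spelled out, here is a minimal Python sketch of the canonical transcribe-then-translate picture.  The function names and the short example sequences are ours, for illustration only; the sketch assumes the input is the coding (sense) strand, uses the standard genetic code, and ignores introns, regulation, and everything else that complicates the real process.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (same letters, T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG to the first stop codon, one-letter amino acids."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":                  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

The point of the sketch is how rule-bound this picture is: given a start and a stop, the protein follows mechanically from the sequence.  That is what made it possible to count 'genes' from DNA sequence in the first place.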

Not so long ago, the remainder of the genome, the non-coding DNA, was called 'junk DNA' because it wasn't known what function, if any, it had.  Then, some of it, generally short regions near genes, was discovered to be sequence elements that, when bound by various proteins in a cell, cause the nearby gene to be transcribed.  So some of that DNA had a function after all.

But then, with various projects such as the ENCODE project in 2007, a multi-institutional exploration of the function of all aspects of the genome in great detail, it was found that the majority of all the DNA in the genome is in fact transcribed into RNA in a process called 'pervasive transcription'.  As the ENCODE project reported,
First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another.  
What all this RNA did wasn't yet known, because it didn't have the structure that would be translated into protein, but that pervasive transcription happened seemed clear.  Some functions were subsequently discovered, such as 'microRNA' that codes for sequences complementary to mRNA that are used to inhibit the mRNA's translation into protein.  There were types of RNA that had their own functions within the classical idea of a gene, even if not translated into protein (these included ribosomal RNA and transfer RNA).

Or maybe not...
However, the pervasive transcription idea was challenged by van Bakel et al. in PLoS Biology, who said that most of the low-level transcription described by ENCODE was in fact experimental artifact or just meaningless noise, error without a function.

Oh, but it does!
Now, in a tit for tat, just last week a refutation of this refutation, a paper called "The Reality of Pervasive Transcription," appeared in PLoS Biology (on July 12).  Clark et al. confirm that pervasive transcription does in fact happen, and take issue with the van Bakel et al. results.
...we present an evaluation of the analysis and conclusions of van Bakel et al. compared to those of others and show that (1) the existence of pervasive transcription is supported by multiple independent techniques; (2) re-analysis of the van Bakel et al. tiling arrays shows that their results are atypical compared to those of ENCODE and lack independent validation; and (3) the RNA sequencing dataset used by van Bakel et al. suffered from insufficient sequencing depth and poor transcript assembly, compromising their ability to detect the less abundant transcripts outside of protein-coding genes. We conclude that the totality of the evidence strongly supports pervasive transcription of mammalian genomes, although the biological significance of many novel coding and noncoding transcripts remains to be explored.
Clark et al. question van Bakel et al.'s molecular technique as well as their 'logic and analysis'.
These may be summarized as (1) insufficient sequencing depth and breadth and poor transcript assembly, together with the sampling problems that arise as a consequence of the domination of sequence data by highly expressed transcripts; compounded by (2) the dismissal of transcripts derived from introns; (3) a lack of consideration of non-polyadenylated transcripts; (4) an inability to discriminate antisense transcripts; and (5) the questionable assertion that rarer RNAs are not genuine and/or functional transcripts.
They go into detail in the paper about how and why these are serious problems, and conclude that van Bakel et al.'s results are 'atypical' for tiling array data, that their tissue samples were not sufficiently extensive, and that pervasive transcription is being detected by a variety of experimental methods.  (A tiling array is a microarray whose many short probes are spaced, often overlapping, along a stretch of the genome so that together they 'tile' it; RNA that sticks to a given probe indicates that the stretch of chromosome the probe matches was transcribed.)
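The sequencing-depth part of this argument is easy to illustrate with a toy calculation.  The Python sketch below is ours, not anything taken from Clark et al. or van Bakel et al.; it assumes each read is an independent draw from the pool of transcripts in proportion to abundance, and the abundance figures in the demo are invented.  Under that assumption, a transcript making up a fraction f of the pool is missed entirely in N reads with probability (1 - f)^N.

```python
import random
from collections import Counter


def prob_missed(fraction, n_reads):
    """Chance that a transcript at this fraction of the pool gets zero reads,
    assuming reads are independent draws."""
    return (1.0 - fraction) ** n_reads


def detected(abundance, n_reads, rng, min_reads=1):
    """Simulate n_reads and return the transcripts seen at least min_reads times."""
    names = list(abundance)
    weights = [abundance[name] for name in names]
    counts = Counter(rng.choices(names, weights=weights, k=n_reads))
    return {name for name in names if counts[name] >= min_reads}


if __name__ == "__main__":
    # Hypothetical pool: one dominant mRNA and one rare non-coding transcript.
    pool = {"abundant_mRNA": 0.999, "rare_ncRNA": 0.001}
    rng = random.Random(0)
    for depth in (100, 1_000, 100_000):
        print(depth, prob_missed(pool["rare_ncRNA"], depth), detected(pool, depth, rng))
```

At shallow depth nearly all the reads come from the abundant transcript and the rare one is often not seen at all, which is the sampling problem Clark et al. raise.  Whether a rarely seen RNA is a 'genuine and/or functional' transcript is, of course, exactly what the two groups disagree about, and no amount of simulation settles that.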

No, no, no! 
And finally (to date), van Bakel et al. respond, also in the July 12 PLoS Biology.
Clark et al. criticize several aspects of our study, and specifically challenge our assertion that the degree of pervasive transcription has previously been overstated. We disagree with much of their reasoning and their interpretation of our work. For example, many of our conclusions are based on overall sequence read distributions, while Clark et al. focus on transcript units and seqfrags (sets of overlapping reads). A key point is that one can derive a robust estimate of the relative amounts of different transcript types without having a complete reconstruction of every single transcript.
So, they defend their methods and interpretation and conclude that "a compelling wealth of evidence now supports our statement that 'the genome is not as pervasively transcribed as previously reported.'"

An epistemological challenge -- what do we know and how do we know it?
What is going on here?  Is most of the genome transcribed or isn't it?  We are intrigued not so much by the details of the argument but by the epistemology, how these researchers know what they think they know.  In the past, of course, a scientist had an hypothesis and set about testing it, and drew conclusions about the hypothesis based on his or her experimental results.  The results were then replicated, or not, by other scientists and the hypothesis accepted or not.  The theory of gravity allowed many kinds of predictions to be made because of its specificity and universality, for example, as do the theories of chemistry in relation to how atoms interact to form molecules.  This is a simplified description of course, but it's more accurate than not.

Molecular genetics these days is by and large not hypothesis driven, but technology driven.  There is no theory of what we should or must find in DNA.  Indeed, there is hardly a rule that, when we look closely, is not routinely violated.  Evolution assembles things in a haphazard, largely chance-driven way.  Rather than testing a hypothesis, masses of data are collected in blanket coverage fashion, mined in the hopes that a meaning will somehow arise.  That this is how much of genetics is now done is evident in this debate.  As first reported by ENCODE, complete genome sequencing seemed to be yielding a lot of DNA that was transcribed from other than protein coding regions, and so they speculated as to what that could mean.  Their speculation wasn't based on anything then known about DNA, or theory, but on results, results produced by then-current technology. That is the reason--and the only reason--that they were surprising.

And van Bakel et al. disagreed, again based on results they were getting and interpreting from their use of the technology.  Then Clark et al. disagreed with van Bakel et al.'s use and interpretation of molecular methods, and described their results as 'atypical'.  And both 'sides' of this debate attempt to strengthen their claim by stating that many others are confirming their findings.  And this is surely not the end of this debate.

We blogged last week about 'technology-driven science', suggesting that often the technology isn't ready for prime time, and thus that many errors that will be hard to ferret out are laced throughout these huge genetic databases that everyone is mining for meaning.  When interpretation of the findings is based on nothing more than whose sequencing methods are better, or whether or not a tissue or organism was sequenced enough times to be credible, or which sequencing platforms seem 'best' for a particular usage (by some criterion) -- meaning that the errors  of the platform are least disturbing to its objective -- rather than on any basic biological theory or prior knowledge, we're left with the curious problem of having no real way to know who's right.  If everyone's using the same methods, it can't be based on whose results are replicated most. If we haven't a clue what is there, even knowing what it means to be 'right' is a challenge!

But we don't even know what a gene is anymore!
These days, even the definition of a gene, our supposedly fundamental inherited functional unit, is in shreds and the suggested definitions in recent years have been almost laughably vague and nondescript.  Try this, by G. Pesole in a 2008 paper in the journal Gene, out for size, if you think we're kidding:
Gene: A discrete genomic region whose transcription is regulated by one or more promoters and distal regulatory elements and which contains the information for the synthesis of functional proteins or non-coding RNAs, related by the sharing of a portion of genetic information at the level of the ultimate products (proteins or RNAs).
Or how about this one, from 2007 (Gerstein et al., Genome Research):
Gene: A union of genomic sequences encoding a coherent set of potentially overlapping functional products.
Not very helpful!!

These definitions show how much we're in trouble.  Ken attended a whole conference on the 'concept of the gene', at the Santa Fe Institute, in 2009, but there was no simple consensus except that, well, there is no consensus and essentially no definition! 

The good old days
When microscopes and telescopes were invented, they opened up whole new previously unknown worlds to scientists.  If anyone doubted the existence of paramecia they had only to peer through the eyepieces of this strange new instrument to be convinced.  When cold fusion was announced by scientists in Utah some years ago, the reaction of disbelief was based on what was known about how atoms work.  That is, there were criteria, and theoretical frameworks, for evaluating the evidence.  Yes, there has always been a learning curve; when Galileo looked at the moon or planets, various optical aberrations actually were misleading, until they were worked out.

And now....
But they were building in an era when fitting theory was the objective.  In biology, we're in new territory here.

Tuesday, July 19, 2011

The ethics of studying genetic causation

A comment in the current issue of Nature, "Genomics for the world", calls for including non-Europeans in the genomic revolution. 
In the past decade, researchers have dramatically improved our understanding of the genetic basis of complex chronic diseases, such as Alzheimer's disease and type 2 diabetes, through more than 1,000 genome-wide association studies (GWAS). These scan the genomes of thousands of people for known genetic variants, to find out which are associated with a particular condition.
Yet the findings from such studies are likely to have less relevance than was previously thought for the world's population as a whole. Ninety-six per cent of subjects included in the GWAS conducted so far are people of European descent. And a recent Nature survey suggests that this bias is likely to persist in the upcoming efforts to sequence people's entire genomes.
Geneticists worldwide must investigate a much broader ensemble of populations, including racial and ethnic minorities. If we do not, a biased picture will emerge of which variants are important, and genomic medicine will largely benefit a privileged few.
And success is always just out of reach -- if only.  If only we have bigger samples, or more heterogeneous samples, or less heterogeneous samples, or more markers, or whole genomes.
The 'missing heritability problem' has led many to become dismissive of GWAS. A danger of this GWAS fatigue is that it deters others from applying the approach to populations where it is likely to yield excellent results. GWAS has proved most successful in relatively small homogeneous populations — in Finland, Iceland and Costa Rica, say, where people generally stay put. Large families and limited migration are common among populations in Latin America, Africa and South Asia — suggesting that new and important associations between diseases and regionally common genetic variants may be found easily in these groups.
Well, yes, the ultimate in relatively homogeneous studies, family studies, have been quite successful in finding genes -- in cases of clearly genetic diseases.  And it has been known for decades that different alleles or even different genes can lead to a similar phenotype.  So it's no surprise that studies in isolated populations might yield results, but if they aren't usually generalizable beyond a single family or a small population, that isn't going to be widely useful.  Nor have the genetic underpinnings of common chronic diseases been reliably or very usefully demonstrated, even in small homogeneous populations, so optimism about finding genes for diseases like heart disease or type 2 diabetes or asthma is pretty much unwarranted.

That said, including non-Europeans in medical studies is a laudable goal.  And yes, genetic variation by geographic ancestry is to be expected.  This was a point made in more than one journal paper by Ken many years ago when one-size-fits-all markers were touted as justification for the HapMap project....but it was an inconvenient, if obvious, truth that was ignored.  The underlying reasons were not very savory and beyond this post.

So who is pushing this now?  Geneticists who want to confirm their belief that GWAS work, that rare variants will explain the 'missing heritability' that isn't being captured by these studies now, and that all we need is bigger or more varied samples to prove it. And, pharmaceuticals pursuing the dream of the 'druggable' genome.  And everyone who wants the largesse to continue pouring into this kind of science (which, not incidentally, will deprive other areas of funds, in the zero-sum game of research funding).  One perhaps cannot blame the researchers for wanting research funds, or biotech firms wanting business, but nobody seems to be watching the priority store.

But even if the world is someday representatively sampled and included in genetic studies, "genomic medicine is going to largely benefit a privileged few" anyway.  First to benefit will be the sequencers and the makers of the sequencers, and they will benefit handsomely, and direct to consumer genome-risk selling firms, followed by the analyzers of the resulting bioinformatics who will be able to mine the data, and apply for grants for follow-up studies when they don't get definitive answers, and then maybe (and, yes, hopefully!) a few people with clearly genetic diseases.  Even in rich countries genomic medicine is going to remain unaffordable for most people for the foreseeable future -- trickle-down doesn't work here either.  And that's assuming that there are benefits to be had!  But, as we write about frequently here, widespread benefits of this kind of research haven't yet been demonstrated, and there are many reasons to believe they will be few and far between anyway, even at best. 

Meanwhile, when it comes to public health, other avoidable disorders go under-attended.   And once again we have to ask whether the largesse being showered on medical genomics could be better spent on prevention.  As a colleague once said to us, it would be a lot cheaper to give everyone at risk of type 2 diabetes a personal trainer than to do all these genetic studies of the disease.  And it would do people a lot more good!  Studies of the genetics of T2D have been ongoing for 40 years -- including in non-European populations -- and have yet to yield significant results.  Or prevent a single case of diabetes.  Even sickle cell and ApoE related diseases are not yet very, if at all, solved in a way that is based on genotype data.  Sickle cell was discovered more than a century ago.  The promises of genetically based genomics may have been sincere, but they have proven to be hollow even in a medium term sense, despite various exceptions that one might cite.  The thousands of GWAS hits are being misrepresented as if such exceptions were the rule.

Pushing, exaggerating, and hyperbolizing expensive genetics research, even if it will get investigators interesting things to study and subsidized visits to exotic places, is a highly, and usually knowingly, cynical way to lobby for funds (because this is often acknowledged in private).  If one is as smart as a researcher at a privileged university, one could perhaps fairly be asked to use one's intelligence to solve important problems on modest budgets and save the megabucks for real, proven public health improvements.

We suggest that we are at a point in our understanding of disease causation where lobbying for increased funding for genetic studies is, to a great extent, simply unethical.

Monday, July 18, 2011

One gene one .... what? The problem with the 'Central Dogma'...or any dogmas in science

One thing that has been found in recent years, and about which knowledge has been rapidly expanding, is that the grand old 'Central Dogma of Biology', that one gene codes for one protein, is, well, substantially wrong (to be kind to it).

Gene transcription, from The Mermaid's Tale, Weiss & Buchanan, 2009

The Central Dogma held that DNA is a string of codes that specifies messenger RNA (mRNA) that is translated in the cell into protein.  One gene, one protein.  That was how it looked when the nature of DNA was first being discovered.  But it's been decades since we knew that was not accurate.

A few of the reasons are
  1. genes are interrupted coding sequences (they have non-coding 'introns');  
  2. introns are spliced out of mRNA at places marked by particular sequence motifs;  
  3. genomes evolve by duplication of whole segments; 
  4. much more DNA is transcribed into RNA than was thought; 
  5. some of this RNA, which is never translated into protein, has sequences complementary to protein codes, and there is an elaborate mechanism by which it inhibits 'real' genes (this is known as microRNA);  
  6. some non-protein RNA, copied from genes without the usual gene processing sequence elements, is nonetheless found attached to ribosomes in the cell as if it were being translated anyway; 
  7. gene usage is determined in part by the way DNA in the gene's part of a chromosome is packaged and chemically modified ('epigenetics'); 
  8. important aspects of variation are due to mutations that happen during the lifetime of the organism; 
  9. each cell uses only some of its genes, and sometimes only one of its two copies of a gene that it is using; 
  10. genes can be assembled via effects of genes from other chromosomes or even make mRNA that is a composite of pieces from two different chromosomes; and (so we don't have to keep on and on), 
  11. some mRNA is 'edited' after being transcribed, in replicable ways sometimes conserved among distantly related species, in which one nucleotide copied from the DNA template is replaced by a specific other nucleotide, thus changing the function, including the protein code, of the mRNA.
These are among many other things that we now know to be parts of DNA function.  They don't change the basic idea that DNA specifies protein structure, but there are so many details, of so many sorts, that it is clear that the idea of the Central Dogma is essentially wrong.  Yet, why do we still have 'exome' sequencing in so many expensive studies, or gene 'for' this or that trait, in so many expensive studies, when we know how simplistic this is?

There are many answers, and we regularly harp on them.  Here, the main point is that these discoveries are real, their importance highly variable and mainly unknown, and that they always add to, but rarely if ever reduce, the complexity between DNA and the traits that it affects.  Promises of simple prediction have been aided and abetted by the addictive discoveries that really work like the genes Mendel studied in peas.  We do a lot of hand-waving to dismiss complexity, but we cling like drowning sailors to life-rafts to the simple, Central Dogma, in the fervent hope that we'll find the Big Gene Story.   Yet those who are thinking about science itself, rather than about what they have to do to maintain their careers, know very well that we know very little about the nature of genetic function.

Dogma should not be part of science.  But historians and philosophers of science have shown that it certainly is.  A new book, for example, shows how this has affected immunology for decades, as  investigators clung to a 'fictive' theory, the idiotype network theory, even though it never had much basis (the book, The network collective: Rise and fall of a scientific paradigm, edited by Klaus Eichmann, is reviewed in the July issue of Bioessays). There's no place for dogma in science, nor for the tribalism that accompanies it.  But, well, we're only human so purging the motivations that drive dogma, and the careers that are made on its basis, may not be in the cards.

Friday, July 15, 2011

The invasion of the poison parsley

Polymeadows Farm

I spent most of the week visiting my sister and brother-in-law in Vermont.  They are dairy goat farmers, owners of Polymeadows Farm.  They sell their products all over New England; Polymeadows milk and yogurt even make it down to New York City.

I love visiting the farm, even if it means I use muscles that have lain dormant since the last time I was there.  Carrying 60 pound bales of hay and 5 gallon buckets of water takes some getting used to every time, and the 10 gallon milk cans just about defeated me this week.

And this year I was using muscles that I haven't used since my kids were small, because my sister's one year old grandson is there now and I carried him around a lot.  He loves doing chores with his grandmother.  They go out together every morning.  He sometimes gets a little frustrated when she won't let him carry the eggs he finds in the chicken coop, but that quickly passes and he delights in feeding the goats, filling the water buckets and the baby goats' milk bottles in the milk house, and sprinkling grain for the roosters that wander around the yard.  One day he's hoping he'll actually touch one.  He's learning to do chores in Spanish, so he points at the goats, and each morning smiles as though it's the first time he's seeing them, and says, "Cabras, cabras!"

The three of us wandered up the lane one morning after breakfast to pick black raspberries.  They were at the peak of ripeness and there were a lot of them, and that made a one-year-old very happy.  We all picked for half an hour or so, some of us eating a lot more than the others of us and turning blue.  When our helper got tired I brought him back to the house, and went back up the lane to do some more picking.  The berries are all in the hedgerows between hay fields, where the mower doesn't reach, richly biodiverse strips of land.  There are grape vines there too, climbing over the aging, gnarled trees that haven't yet died and been split for fuel for the outdoor furnace that heats the houses in the winter.  My sister makes jam from the grapes and black raspberries every year, and it's the best.

Giant Hogweed
We walked over the hill and down through two or three fields looking for berries where she's found them in years past.  The berries and grapes all happily co-habit, along with the cherry, spruce, ash, oak, maple, and poplar trees there.  Or they did until two invasive species came along, taking root up and down the hedgerows and slowly suffocating much that has been living there for centuries.  These are oriental bittersweet and poison parsley (also known as poison parsnip).  Many of the black raspberry bushes my sister once knew to be fruitful are no longer bearing, or aren't even there any more, and more and more of the grapevines are being crowded out.

Invasives are species that are particularly aggressive when introduced to habitats outside of their usual range.  They often out-compete local species because there are no predators in the adopted habitat or because they grow faster, or in a more diverse habitat.  Ultimately, invasives can choke out native species, leading to reduced biodiversity. 

There is apparently a fledgling movement afoot to beat invasives by eating them -- invading fish, animals, plants beware.  Here's the Top 10 edible invasives list.  Note that neither poison parsley nor oriental bittersweet is on the list.  Poison parsley is in the carrot/parsnip family, related to Queen Anne's lace (also called wild carrot) and the very toxic Giant Hogweed.  Apparently, some parts of the plant can be lethal if eaten -- there are stories about cows and people killed by this stuff, but the roots are edible.  They're parsnips, much like the parsnips you buy at the store -- this is one invasive that escaped from horticulture.  The sap, though, can cause severe burning to the skin, but it's phototoxic, so it only burns where it touches you on a sunny day.

Poison parsnip, wild edible.

My brother-in-law says that years ago when his father was growing corn in the same fields where he now mows hay, and dosing it with herbicides, it was the 'superweeds' that survived.  Then it was velvet weed, and when they stopped growing corn and planted things like alfalfa and orchard grass instead, and stopped using herbicides, the velvet weed disappeared.  The poison parsley has been up in those hedgerows for a long time, and he thinks it probably took hold in the old corn days.  It's not in the fields because they get mown and it needs to go to seed to propagate, but it sure is happy in the hedgerows. He knows about superweeds from experience, but he's not the only one who says this.  Herbicide-resistant weeds are an ever increasing problem, one we blogged about here.   

Invasives and evolution
In competitive Darwinian terms, in the short run, invasives can win and win big, at least those introduced plants that do get a foothold.  Most incomers might be driven to local extinction. But if an invasive has some advantage, it garners the most resources, reproduces best, spreads the fastest, and out-competes many of the native plants in a given habitat.  Perhaps it has ways to extract resources, or avoid competitors or predators, that plants in the local stable ecosystem didn't have and hence were not equipped to resist.

The resulting reduction in biodiversity that can happen when a highly successful invasive takes hold explains why there is such intense interest in getting rid of them, or preventing their spread in the first place. Their success isn't necessarily because they adapt to their new environment with genetic changes, but rather it can be a combination of not being very picky about where they live and having no predators in the new habitat.

Rather like humans, as we elbow our way into every corner of the globe.  We can live in very different environments because we are able to exploit many different food sources, and because we have tools and culture and can find ways to survive in many different climates.

With successful invasives there is no 'adaptation' of the usual kind needed: the incomers' nature simply allows them their advantage, without the need for new mutations or variation to be selected for over generations.  Adaptation by new mutation may apply to viruses or bacteria, where selection and spread are very fast.  But if classical 'Darwinian' evolutionary adaptation were required, most incomers would be driven back out long before they could adapt, or their adaptation would take much longer than documented recent invasions have taken.  Perhaps that recent history is very misleading about the dynamics of invasions that were not made possible by quick human transport and so on.

But perspective is important here.  In the long run, the picture can be quite different; the invaded habitat might itself adapt to the invader.  This has been true, e.g., after invading fire ants have decimated an area of other insect life.  In the short term, it looks like the fire ant has won, but years later the local insect populations that had been reduced in number may well bounce back, to co-exist with the invaders.  Or co-exist because the invader was a selective force that molded changes in the host-land species.

But how long does a species have to be in a place before it's considered 'native'?  Many species that we think of as native were once invaders. Again, including humans.  So, many 'native' habitats are only so by our short-term definition.

I hope the grapevines up the lane behind the barn at Polymeadows, native or not, aren't squeezed out entirely by the invaders.  Wild concord grapes make much better jam than do parsnips.

Thursday, July 14, 2011

Bioinformatics: the more we look, the more we find

I'm at a bioinformatics summer course in Poznan, Poland this week.  There are lecturers and students from many places, and Poznan is a very fine setting and this is a very fine program organized by Woitech Makalowski and Elizabeta Makalowska.  The topics cover many areas of the information sciences that try to deal with the huge amount of information being revealed about many different species of animal, plant and bacteria (among others) as a result of high-throughput, automated DNA sequencing and other techniques of similar power.

The old models are falling rapidly.  We now know very clearly that genomes are more than 20-some thousand protein coding units strung together along an otherwise inert DNA sequence.  Instead, many more functional elements are being discovered, even if their functions are only partly known.  DNA is copied into RNA, and the RNA has many different uses, and is even processed in many different ways.

The bottom line is that, beyond the relatively few and straightforward causal functions in DNA, there is now an expanding array of newly found functions.  Some of these are clear, major, and easy to characterize.  But much of the evidence is for things that seem to have some function (for example, the same elements are found in similar regions of the genome in multiple species, suggesting that they have been conserved by natural selection).  For most of this, bioinformatics provides statistical evidence from reams of data, and some confirmatory experiments support the database analysis findings, but what the function is, or how important it is, or how variable it is, are still quite unknown.

Part of the problem is that this ever-expanding amount of complexity, of types we had not at all anticipated, makes general evolutionary sense but makes the idea of prediction from genes to traits much more problematic than we had hoped.  The complexity makes evolutionary sense if you are willing to abandon the hyper-simplistic idea that gene makes protein makes trait and selection definitively removes the bad and advances the good versions of the trait.  Instead, selection is clearly very tolerant of variation, there is redundancy all over the place, and each element of a genome has a function that depends on what's in the rest of the organism's genome, and on its living environment.

This suggests a much less causally definitive view of life than is the general theory, and also reveals how very little most people--even most biologists--are aware of when it comes to the complexity of genomes and their relation to the traits we care about.

There are no solid answers beyond simply noting how complex and internally variable organisms are, and it is not clear even what kinds of answers will be needed for better explanations.  It is clear that most people will propose purely technological answers: generate more data and more sophisticated computer programs to process it....and hope some deeper truths, if there are any, will emerge from the results.

Maybe it will work that way.  But except for a subset of things that follow the simple theories we thought applied to all of life, promises of quick or simple answers don't seem in the offing, no matter how much we may hunger for them.  So, too much to say in any detail in a blog post, but very stimulating and humbling to hear about even more sources of complexity than I was aware of!

Wednesday, July 13, 2011

Technology driven science

Ever since around Galileo's time, which was also the origin of modern science, empiricism has replaced deductive reasoning as the core of basic knowledge.  This changed worldview was stimulated by many things, but largely by instrumentation.  In particular optics drove fundamental realizations about the world that could not have been made previously.  One thinks of the vast new worlds of facts revealed by telescopes and microscopes, but other things came along as well, such as the discovery of vacuums, basic findings in chemistry, anatomy, and geology among others.  Improved astrolabes and, especially, reliable clocks opened the world to faster, safer navigation.  Steam power led to trains and improved mining and factories.

Instrumentation led to a wealth of new data, and that stimulated thinking in theoretical science (e.g., gravitation, the sun-centered planetary system, and so on).  Partly this was driven by the desire to control commerce and global political power.  Engineering and 'science' became more part of each other.  This led to understanding the law-like nature of Nature, and stimulated evolutionary and even social science thinking.

Today, we are driven even faster--much faster--by technology.  Technology leads people to dream of having an advantage over research competitors, and companies push their gear to feed their own interests but also those of their academic and industry customers.  Genetics exemplifies these trends.  But are we paying a fair price for the gains?

I'm at a meeting in bioinformatics, in Poland right now, and the new DNA sequencing and other technologies feature prominently.  The technologies are mentioned almost as much as the results, and talk after talk is about what 'can now be done' with the latest computer or sequencing powers.

Clearly, as we've said countless times here on MT, we are not getting the results we dreamed of when it comes to genetic causation and the prevention of all known human ills.  We are just as clearly learning new things--such as the discovery of all sorts of RNA that wasn't supposed to be there according to previous genetic 'theory' (one gene, one protein, for example).  Were it not for the misleading hype by which the system is driven, one might feel less like complaining.  But even advocates realize that these tools have neither revealed a new theory of life nor reaped their promised miracles.

One problem, widely recognized, is that the new technologies are pushed onto the market before they are really ready and battle-tested.  The drive, perhaps especially in the US but spreading around the world, to have the newest machinery, is largely responsible.  Don't wait for the stuff to work really well--get it now and get a jump on your competitors!  You can publish results and shamelessly acknowledge then, or at least later, that you know or knew there were plenty of problems, but it was, after all, only exploratory or pilot or tentative data.

One thing this impatience does, besides cost a lot, is fund the companies for years while they make their gear actually work.  New types of DNA sequencers are widely acknowledged to produce many errors in the sequence they report.  Extensive and expensive efforts, paid for by grants to a great extent, are made to use the stuff and document the errors and attempt to correct for them.  But the errors are not removed, and genomic and other data bases are now loaded with sequence and other types of data that have too many errors, and those errors are often inscrutable.  How do we decide what data to use, and what data we have to ignore (or ask the public to pay for again, now with updated technology)?

We keep the tech companies afloat by buying products that are flashy but not yet ready for prime time.  It's one thing to acknowledge that anything highly sophisticated will improve over time, but it's another to go in too far, too fast which is what we're doing.  In part, I think (though I'm no business expert of any sort!) that the companies are often start-ups that don't have huge amounts of up-front investment: they have to sell sooner rather than be more patient, because they're not building on top of other mainstream things that they sell.

It's a problem in this field, and it has many consequences:  not only the cost, but also the loading up of massive data bases with suspect data, and tentative but erroneous conclusions that lead to large numbers of follow-up studies by investigators eager to jump on bandwagons, follow the latest trends, or get better information on some important problem such as a serious disease.

Better and more restricted focus, less 'omics' and more theoretical understanding and basic science, and a slowed down pace would help.  But these seem impossible in our current heated-up system, unless the funds dry up.  If that happens, it will be bad in some ways because some of these new technologies really are important, but it may be good if it forces us to think more before we act.

Tuesday, July 12, 2011

Sorting out the vitamin D story needs better data, not religious fervor

If even a tenth of what's being said about vitamin D these days is true, those of us astute enough to supplement what we get from the sun are destined for a ripe old age.  Or, if we're already old, and female, taking vitamin D might extend our lives even further.

The idea that vitamin D promotes bone density is based on solid evidence, but beyond that its effectiveness is not entirely clear.  Advocates say that vitamin D prevents prostate, breast and other cancers, multiple sclerosis, diabetes, heart disease, osteoporosis, allergies, inflammation, fights the common cold and the flu, and boosts fertility, just to name a few of the benefits being touted.  And, limited sun exposure for mothers during pregnancy could be condemning children to a lifetime of illnesses.  According to a new study,
We have known for some time that mums-to-be with low vitamin D levels during pregnancy run the risk of having a child that may develop diabetes.
This latest study concluded that other conditions like asthma, autism, multiple sclerosis (MS) and Alzheimer’s could be related to low levels of maternal vitamin D.
Yes, Vitamin D is "Summer's Superhero"!
“Vitamin D deficiencies are rampant amidst our nation and could possibly lead to an increase in the most troubling diseases of our time,” said Steven Hotze, M.D., founder and CEO of PPVS [Physicians Preference, vitamins and supplements].
Indeed, Mozart may have died of vitamin D deficiency!  Think what music the world is missing because he slept during the day when he should have been sitting in the sun, and composed and caroused all night.

Well, how much of this hype is actually true?   And how would we know?  A Nature News Feature published online on July 6 details an ongoing debate on this question.  Recent recommendations have been that we are all vitamin D deficient, and should be boosting our vitamin D with sometimes megadoses of D3 supplements.  Naturally, many scientists who've been researching this subject now have vested interests in plugging supplements, and, as always, this makes it harder to separate the wheat from the chaff.

In response to all the hype, an 'expert panel' was convened by the Institute of Medicine (a non-profit affiliated with the US National Academy of Sciences) to soberly assess the evidence and make recommendations about healthy vitamin D levels, and who should be taking supplements.  They issued their report last November saying that vitamin D levels recommended by current conventional wisdom were too high, and in fact could even be harmful.  But the report has not gone down easily; panel members have been sent abusive and threatening emails, and so on.  As Nature says,
Much is at stake. By 2009, the amount spent on vitamin-D supplements in the United States had risen tenfold in ten years (see 'Raising the stakes'). Medical practitioners and public-health officials worldwide look to the IOM for guidance on how to interpret the conflicting claims about vitamin D. Yet several vitamin-D proponents say that the IOM's methods, which involved a systematic review of the literature, were flawed. They have accused the panel of misinterpreting data and over-emphasizing the danger of heavy supplementation. Just last month, the Endocrine Society, a professional association of 14,000 researchers and clinicians based in Chevy Chase, Maryland, released guidelines that recommend higher doses than the IOM did.
Why, instead of clearing confusion as was the IOM's goal, has the report sown division and unrest? "The IOM was too definitive in its recommendations," says Michael Holick, an endocrinologist at Boston University School of Medicine in Massachusetts, and an outspoken critic of the IOM panel's conclusions. "Basically, the vitamin-D recommendations are based on low-quality evidence," says Gordon Guyatt, a clinician researcher at McMaster University in Hamilton, Ontario, who has been a consultant on various guidelines. "I think admitting that would have made some of the angst disappear."
(For the record, Holick is not only an outspoken critic of the IOM recommendations, but he has been one of the leading drivers of the vitamin-D-cures-all train for some time, and has a lot to lose if it turns out he has been wrong.)

Indeed, many observers, interested and disinterested alike, recognize that there's a lot of questionable science in the vitamin D field.  It's hard to figure out how recommended levels have been set, there have been few if any prospective studies starting with a cohort of healthy people and following them forward, and many that look at vitamin D in people who are already ill.  Such studies are confounded by the fact that people who are ill tend to stay indoors, out of the sun and thus not synthesizing vitamin D, so it's not possible to know whether the vitamin D deficiency (deficiency according to current standards, however they were determined) preceded and thus led to the illness, or was a result.  Many studies have too few subjects for results to be robust, and so on.
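The reverse-causation problem with studying people who are already ill can be made concrete with a toy simulation.  This is only a sketch under made-up assumptions: the baseline level, its spread, the illness rate and the size of the 'stays indoors' drop are all hypothetical numbers in arbitrary units, not estimates from any study.  In this toy world illness is assigned completely independently of vitamin D, and vitamin D falls only after people become ill.

```python
import random
import statistics


def simulate_cross_section(n_people=10_000, illness_rate=0.2,
                           indoor_drop=8.0, seed=0):
    """Toy cross-sectional survey; every number here is hypothetical."""
    rng = random.Random(seed)
    healthy_levels, ill_levels = [], []
    for _ in range(n_people):
        # Baseline vitamin D level before anyone gets sick (arbitrary units).
        baseline = rng.gauss(30.0, 5.0)
        # Illness is drawn independently of the baseline level, so in this
        # toy world low vitamin D never causes illness.
        is_ill = rng.random() < illness_rate
        if is_ill:
            # Ill people stay indoors, so their measured level falls
            # after the illness has already started.
            ill_levels.append(baseline - indoor_drop)
        else:
            healthy_levels.append(baseline)
    return statistics.mean(healthy_levels), statistics.mean(ill_levels)


healthy_mean, ill_mean = simulate_cross_section()
print(f"mean level, healthy: {healthy_mean:.1f}")
print(f"mean level, ill:     {ill_mean:.1f}")
```

A survey of this population would find that ill people have clearly lower vitamin D, even though by construction the low levels caused none of the illness.  Only a prospective design, measuring levels before anyone falls ill, could tell the two stories apart.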

Members of the now disbanded IOM panel are calling for large, multi-year prospective studies, in recognition of the fact that much of the data are from studies that are less credible than they should be.  But this too has generated heated dissent.  As the Nature piece says,
Perhaps IOM panel members underestimated the passion present in the vitamin-D field. Physicians who recommend high doses of vitamin D might not want to believe that the evidence they have trusted isn't quite up to par. "One thing I wasn't aware of before, is the tremendous pressure from industry and investigators who are tied to their religious belief in vitamin D," says Rosen.
So given all of this, why would one ever think of pouring more research funding down this sink-hole, to identify the obviously minor if not trivial effects over which these debates are centered?  Clear vitamin D deficiencies are not at issue.

Several years ago, we did an extensive review of the vitamin D literature and we, too, were firmly convinced that most conclusions, from recommended blood levels to the diseases caused by deficiencies, were based on questionable to poor data.  As far as we could tell, basic questions are still unanswered, including almost everything about mechanisms of action.  This is another instance of correlations being assumed to be causation without biological justification.

To put it bluntly, the idea that we are all vitamin D deficient is manifest, blatant biological and evolutionary clap-trap.  It is at the very least extremely naive and superficial thinking.  We live hugely longer and in better health than our ancestors did, when natural selection--to the extent that it cared--established the required levels for successful survival and reproduction.  So the assertion of pandemic deficiency really can mean no more than that we might be somewhat better off, or live even longer, if we doped up on the advocates' dietary supplements.  Such panacea talk is not new to human society, but in an age of science it should be roundly stamped out, because it is misleading.