Monday, April 4, 2011

The lessons of the Land: part IV

Our 3-part series last week on lessons learned in modern genomically based plant breeding was intended to address its conceptual relation to evolution and genetic causation generally, and to problems in human genetics specifically.  We hadn't intended a 4th part, but thanks to conversations with Ed Buckler, the Cornell plant geneticist whose ideas were featured in the Land Report that motivated this series, we want to add some further comments.

The idea of genome-based selection (GS) is that you take a sample from a population, of maize or goats, say, and measure phenotypes of interest in each individual, then genotype each individual for a large number of genome-spanning variable sites (markers), just as in genomewide association studies (GWAS).  You use these data to evaluate the contribution that every marker site makes to your trait, thus optimizing a phenotype-predicting score from the genotype.  Then, you use this score to select individuals for breeding.  After a number of generations, you expect an improved stock.
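To make those steps concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the method any particular breeder uses: the population is simulated, markers are coded 0/1/2 and treated as independent, and marker effects are estimated jointly by ridge regression, which is one common choice among many. The function names and the penalty value are hypothetical.

```python
import numpy as np


def simulate_population(n_individuals, n_markers, n_causal, rng):
    """Simulate 0/1/2 genotypes and a trait that is half genetic, half noise."""
    genotypes = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    causal = rng.choice(n_markers, size=n_causal, replace=False)
    effects[causal] = rng.normal(0.0, 1.0, size=n_causal)
    genetic_value = genotypes @ effects
    # Noise with the same spread as the genetic values (heritability near 0.5).
    noise = rng.normal(0.0, genetic_value.std(), size=n_individuals)
    return genotypes, genetic_value + noise, genetic_value


def fit_marker_effects(genotypes, phenotypes, ridge_penalty):
    """Estimate an effect for every marker at once with ridge regression."""
    marker_means = genotypes.mean(axis=0)
    X = genotypes - marker_means
    trait_mean = phenotypes.mean()
    y = phenotypes - trait_mean
    # Solve (X'X + penalty * I) beta = X'y
    lhs = X.T @ X + ridge_penalty * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, X.T @ y)
    return beta, marker_means, trait_mean


def genomic_score(genotypes, beta, marker_means, trait_mean):
    """Phenotype-predicting score computed from genotype alone."""
    return trait_mean + (genotypes - marker_means) @ beta


def select_for_breeding(scores, fraction):
    """Indices of the top-scoring fraction of individuals, best first."""
    n_keep = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:n_keep]


rng = np.random.default_rng(0)
G, trait, true_genetic = simulate_population(600, 200, 20, rng)

# Individuals with both phenotype and genotype train the score;
# the rest are selection candidates with genotype only.
train, candidates = slice(0, 400), slice(400, 600)
beta, marker_means, trait_mean = fit_marker_effects(
    G[train], trait[train], ridge_penalty=100.0  # assumed, untuned value
)
scores = genomic_score(G[candidates], beta, marker_means, trait_mean)
chosen = select_for_breeding(scores, fraction=0.2)

accuracy = np.corrcoef(scores, true_genetic[candidates])[0, 1]
gain = true_genetic[candidates][chosen].mean() - true_genetic[candidates].mean()
print("correlation of score with true genetic value:", accuracy)
print("mean genetic advantage of the selected group:", gain)
```

In a real program the selected individuals would be crossed and the cycle repeated over generations; the sketch stops after one round of scoring and selection.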

This is very similar in nature to what is done with human populations in some recently advocated methods of using genomewide data to make individualized predictions.  Peter Visscher is probably the author most prominently recognized as developing these methods, though many others are now also involved.

In both human genetics and agriculture, we use a current data set of achieved traits--kernel yield, muscle mass, human stature, blood pressure or disease.  But this is a retrospective assessment of genetic associations, and it may only partially reflect genetic causation.  For example, environmental factors may be unmeasured.  Also, much of the variation in a natural population will be captured in one sample but not in the next, or will not exist at all in other populations.  These facts place some limits on the predictive power of genomewide data.

Nonetheless, as with using parents to predict traits in offspring, if the genetic component is substantial (as judged, for example, by measures like heritability or trait correlations among relatives), there must be regions of the genome that are responsible, and that is what this approach finds.  How advantageous it is over measurements of phenotypes in relatives can be debated, as can the contribution to a trait, like disease risk, of large numbers of very rare, never-to-be-seen-again variable sites.  And the prediction is of a net result, which need not be (and often will not be) due to a tractable set of genome sites.  So the biomedical vision of 'personalized genomic medicine' may or may not fulfill the dreams of its advocates.

The idea should work much better in agricultural breeding, because the population is closed and the desired genetic variation is systematically and strongly favored.  Thus, over a few generations the genetic variation can be shifted strongly in the desired direction--at least under the controlled environmental conditions of the selective breeding.

The discrepancy between breeding experience and the observational setting of human biomedicine--and of evolutionary biology--may, if carefully considered, provide ways in which the former can inform the latter.  There are reasons to think that important changes in view may result.


Javier G. said...

Where is the missing heritability? Is there in fact missing heritability?

Ken Weiss said...

I don't remember if it was said, but with exhaustive genome-spanning markers, the idea is that much if not all of the heritability can be accounted for by one marker scoring method or another. This will be easier in ag breeding because the population variation is restricted and experiments can be done.

Generally, 'hidden' heritability is basically the result of many genetic contributions too small to be detected given the marker set used, the sample size, and so on.

A good paper about the methods is Heffner et al., Crop Science 49:1-12, 2009. Papers by Peter Visscher, in the human context, and by Janssens--which you can find by searching PubMed (my references are at work and I'm at home at the moment)--deal with these issues.

Javier G. said...

Thanks. I'm reading some of the papers by Visscher et al.

Ken Weiss said...

The idea is that if you have the whole genome (or enough markers) then to the extent that there is genetic contribution, the DNA sequence must account for it all. How much of this can actually be found given measurement issues, statistical noise, and so on is a separate question.

Likewise, since different samples will have different contributing sites, how much of what is found in a given sample can be applied to additional samples is a different kind of question.

And it should be easier in the much more controlled experimental setting.