Showing posts with label population genetics. Show all posts

Monday, May 9, 2016

Darwin the Newtonian. Part III. In what sense does genetic drift 'exist'?

It has been about 50 years since Motoo Kimura, and King and Jukes, proposed that a substantial fraction of genetic variation can be selectively neutral, meaning that the frequency of such an allele (sequence variant) in a population or among species changes by chance--genetic drift--and, furthermore, that selectively 'neutral' variation and its dynamics are a widespread characteristic of evolution (see Wikipedia: Neutral theory of molecular evolution). Because Darwin had been so influential with his Newtonian-like deterministic theory of natural selection, neutral evolution was and still is referred to as 'non-Darwinian' evolution. That's somewhat misleading, if convenient as a catch-phrase, and often used to denigrate the idea of neutral evolution, because even Darwin knew there were changes in life that were not due to selection (e.g., gradual loss of traits no longer useful, chance events affecting fitness).

First, of course, is the 'blind watchmaker' argument.  How else can one explain the highly organized functionally intricate traits of organisms, from the smallest microbe to the largest animals and plants?  No one can argue that such traits could plausibly just arise 'by chance'!

But beyond that, the reasoning basically coincides with what Darwin asserted.  It takes a basically thermodynamic belief and applies it to life.  Mother Nature can detect even the smallest difference between bearers of alternative genotypes, and in her Newtonian force-like way will confer better success on the better genotype.  If we're material scientists, not religious or other mystics, then it is almost axiomatic that a mutation changes the nature of the molecule, and hence must have some effect, if for no other reason than that it requires the use of a different nucleotide, and hence the use and/or production of at least slightly different molecules and at least slightly different amounts of energy.

The difference might be very tiny in a given cell, but an organism has countless cells--many many billions in a human, and what about a whale or tree! Every nonessential nucleotide has to be provided for each of the billions of cells, renewed each time any cell divides.  A mutation that deleted something with no important function would make the bearer more economical in terms of its need for food and energy. The difference might be small, but those who then don't waste energy on something nonessential must on average do better: they'll have to find less food, for example, meaning spend less time out scouting and hence exposed to predators, etc.  In short, even such a trivial change will confer at least a tiny advantage, and as Darwin said many times to describe natural selection, nature detects the smallest grain in the balance (scale) of the struggle for life.  So even if there is no direct 'function,' every nucleotide functions in the sense of needing to be maintained in every cell, creating a thermodynamic or energy demand.  In this Newtonian view, which some evolutionary biologists hold or invoke quite strongly, there simply cannot be true selective neutrality--no genetic drift!


The relative success of any two genotypes in a population sample will almost never be exactly the same, and how could one ever claim that there is no functional reason for this difference?  Just because a statistical test doesn't find 'significant' differences--in the probabilistic sense that the result would not be particularly unusual if nothing were going on--tiny differences nonetheless obviously can be real.  For example, a die that's biased in favor of 6 can, by chance, come up 3 or some other number more often in an experiment of just a few rolls (the reason for dice being biased this way is interesting, but beyond our point here).  Significance cutoff values are, after all, nothing more than subjective criteria that we have chosen as conventions for making pragmatic decisions.
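The dice point is easy to make concrete with a small simulation; the bias value (P(6) = 0.25 instead of 1/6) and the sample sizes below are invented for illustration, not from any real experiment:

```python
import random

# A biased die illustrates how small samples can hide a real bias.
def roll_biased_die(n_rolls, rng, bias=0.25):
    faces = [1, 2, 3, 4, 5, 6]
    weights = [(1 - bias) / 5] * 5 + [bias]   # 6 is favored; others share the rest
    return rng.choices(faces, weights=weights, k=n_rolls)

rng = random.Random(42)
short_run = roll_biased_die(12, rng)       # a few rolls: the bias is easily masked
long_run = roll_biased_die(60000, rng)     # many rolls: the bias is unmistakable

short_sixes = short_run.count(6) / len(short_run)
long_sixes = long_run.count(6) / len(long_run)
print(f"12 rolls:     fraction of 6s = {short_sixes:.3f}")
print(f"60,000 rolls: fraction of 6s = {long_sixes:.3f}")  # near 0.25, well above 1/6
```

In a dozen rolls, any face can come out ahead; only the long run reveals that the die is loaded--which is exactly the selectionist's appeal to 'in the end'.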

But what about the lightning strikes?  They are fortuitous events that, obviously, work randomly against individuals in a population in a way unrelated to their genotypes, thus adding some 'noise' to their relative reproductive success and hence to allele (genetic variant) frequencies in the population over time.  That noise would also be a form of true genetic drift, because it would be due to a cause unrelated to any function of the affected variants, whose frequencies would change, at least to some extent, by chance alone. A common, and not unreasonable, selectionist response to that is to acknowledge that, OK! there's a minor role for chance, but nonetheless, on average, over time, the more efficient version must still win out in the end: 'must', for purely physical/chemical energetic reasons if no others.  That is, there can be no such thing as genetic drift on average, over the long haul.  Of course, 'overall' and 'in the end' hide many unstated assumptions.  Among the most problematic is that sample sizes will eventually be sufficiently great for the underlying physical, deterministic truth to win out over the functionally unrelated lightning-strike types of factors.

On the other hand, the neutralists argue in essence that such minuscule energetic and many other differences are simply too weak to be detected by natural selection--that is, to affect the fitness of their bearers.  Our survival and reproduction are so heavily affected by those genotypes that really do affect them, that the remaining variants simply are not detectable by selection in life's real, finite daily hurly-burly competition. Their frequencies will evolve just by chance, even if the physical and energetic facts are real in molecular terms.

But to say that variants that are chemically or physically different do not affect fitness is actually a rather strong assertion! It is at best a very vague 'theory', and a very strong assumption of Newtonian (classical physics) deterministic principles. It is by no means obvious how one could ever prove that two variants have no effect.


So we have two contending viewpoints.  Everyone accepts that there is a chance component in survival and reproduction; the selectionist view sees that component as trivial in the face of the basic physical fact that two things that are different really are different, and hence must be detectable by selection, while the neutralist view holds that true functional equivalence is not only possible but widespread in life.

When you think about it, both views are so vague and dogmatic that they become largely philosophical rather than actual scientific views.  That's not good, if we fancy that we are actually trying to understand the real world.  What is the problem with these assertions?

Can drift be proved?
Maybe the simplest thing in an empirical setting would just be to rule out genetic drift, and show that even if the differences between two genotypes are small in terms of fitness there is always at least some difference.  But it might be easier to take the opposite approach, and prove that genetic drift exists.  To do that, one must compare carriers of the different genotypes and show that in a real population context (because that's where evolution occurs) there is no--that is, zero--difference in their fitness. But to prove that something has a value of exactly zero is essentially impossible!


Is each outcome equally likely?  How to tell?


To return to the dice-rolling analogy: a truly unbiased die can still come up 6 a different number of times than 1/6th of the number of rolls--try any number of rolls not divisible by 6!  In the absence of any true theory of causation, or perhaps to contravene the pure thermodynamic consideration that different things really are different, we have to rely on statistical comparisons among samples of individuals with the different competing genotypes.  Since there is the lightning-strike source of at least some irrelevant chance effects, and no way to know all the possible ways the genotypes' effects might differ truly but only slightly, we are stuck making comparisons of the realized fitness (e.g., number of surviving offspring) of the two groups.  That is what evolution does, after all.  But for us to make inferences we must apply some sort of statistical criterion, like a significance cut-off value ('p-value'), to decide. We may judge the result to be 'not different from chance', but that is an arbitrary and subjective criterion.  Indeed, in the context of these contending views, it is also an emotional criterion.  Really proving that a fitness difference is exactly zero, without any real external theory to guide us, is essentially impossible.
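How large do samples have to be before a tiny fitness difference becomes statistically detectable?  A back-of-the-envelope answer comes from the standard two-sample power calculation; the offspring-count variance and the fitness differences below are illustrative numbers, not estimates from any real population:

```python
# Approximate sample size per genotype needed for a two-sample z-test
# (alpha = 0.05 two-sided, 80% power) to detect a difference 'delta' in
# mean offspring number, assuming per-individual variance of about 2
# (roughly Poisson with mean 2): n = (z_a + z_b)^2 * 2 * var / delta^2.
def required_n(delta, var=2.0, z_alpha=1.96, z_beta=0.84):
    return (z_alpha + z_beta) ** 2 * 2 * var / delta ** 2

for delta in (0.1, 0.01, 0.001):
    print(f"fitness difference {delta}: need about {required_n(delta):,.0f} per genotype")
```

A 5% fitness advantage (0.1 extra offspring on a mean of 2) already demands thousands of individuals per genotype; a 0.05% advantage demands tens of millions.  This is why 'in the end' does so much unacknowledged work in the selectionist argument.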

All we can really hope to do without better biological theory (if such were to exist) is to show that the fitness difference is very small.  But if there is even a small difference, if it is systematic it is the very definition of natural selection!  Showing that the difference is 'systematic' is easier to say than do, because there is no limit to the causal ideas we might hypothesize.  We cannot repeat the study exactly, and statistical tests relate to repeatable events.

There's another element making a test of real neutrality almost impossible.  We cannot sample groups of individuals who have this or that variant and who do not differ in anything else.  Every organism is different, and so are the details of their environment and lifestyle experiences.  So we really cannot ever prove that specific variants have no selective effect, except by this sort of weak statistical test averaging over non-replicable other effects that we assume are randomly distributed in our sample.  There are so many ways that selection might operate, that one cannot itemize them in a study and rule out all such things.  Again, selectionists can simply smile and be happy that their view is in a sense irrefutable.

A neutralist riposte to this smugness would be to say that, while it's literally true that we can't prove a variant to confer exactly zero effect, we can say that it has a trivially small effect--that it is effectively neutral.  But there is trouble with that argument beyond its subjectivity: the variant in question may, in other times and genomic or environmental contexts, have some stronger effect, and not be effectively neutral.


A related problem comes from the neutralists' own idea that by far most sequence variants seem to have no statistically discernible function or effect.  That is not the same as no effect.  Genomes are loaded with variants judged nearly or essentially neutral by the usual criteria used in bioinformatic computing, such as the observation that neutral sites show greater variation in populations or between species than is found in clearly functional elements.  But this in no way rules out the possibility that combinations of these do-almost-nothings might together have a substantial or even predominant effect on a trait and the carriers' fitness.


After all, is that not just what countless very large-scale GWAS (genomewide association) studies have shown? Such studies repeatedly, and with great fanfare, report that there are tens, hundreds, or even thousands of genome sites that have very small but statistically identifiable individual effects, but that even these together still account for only a minority of the heritability, the estimate of the overall amount of contribution that genetic variation makes to the trait's variation.  That is, it is likely that many variants that individually are not detectably different from being neutral may contribute to the trait, and thus potentially to its fitness value, in a functional sense.


This is one of the serious and, I think, deeply misperceived implications of the very high levels of complexity that are clearly and consistently observed.  It raises questions about whether the concept of neutrality makes any empirical sense, or remains rather a metaphysical or philosophical idea.  This is related to the concept of phenogenetic drift that we discussed in Part II of this series, in which the same phenotype, with its particular fitness, can be produced by a multitude of different genotypes--the underlying alleles being exchangeable.  So are they neutral or not?

In the end, we must acknowledge that selective neutrality cannot be proved, and that there can always be some, even if slight, selective difference at work.  Drift is apparently a mythical or even mystical, or at least metaphoric concept.  We live in a selection-driven world, just as Darwin said more than a century ago.  Or do we?  Tune in tomorrow.

Thursday, September 10, 2015

What population genetic diversity can and can't tell us

By Anne Buchanan and Ken Weiss

Genetic diversity is indisputably a marker of geographic origin and human migration.  The reason is very simple: new mutations arise independently and, to a great extent uniquely, and they arise in some local area with only a single copy of the newly arisen variant.  Over time, that variant will either disappear (not be passed down to any offspring) or may increase in frequency.  Because humans traditionally had but few surviving children per parent, and mated locally, only slow increases and spread of descendant copies of a variant would occur.  Local areas had a unique pattern of genomic variants and, depending on their population size and structure, different amounts of variation.  Because all humans originated from a smallish emigration from a source population in Africa, there is more, and more complex, genomic variation there than in Eurasia.

Beyond these clear facts about the amount and distribution of human genomic diversity, interpretations of what it means, implies, or involves get fuzzy, political, emotional, and controversial.  Race is seen as either a genetic construct or a social one, and it is correlated in some ways with geographic location or origin, so it is not obvious how genetic variation per se can be interpreted in terms of traits like societal diversity in wealth, achievements, and the like.

The danger of course is to assume that geographic correlation of some societal trait with genomic variation is caused by that variation, that is, that societal variation is 'genetic'.  It is natural for some in the developed world to want to see their achievements as being due to inherent genetic traits (read: superiority), and there is a very long history, going all the way back to the Greeks in the western tradition, of holding such views of inherency.  But this is hard to demonstrate.

An interesting new paper in the September issue of Genetics tries to make some sense of the meaning of genetic diversity ("Genetic Diversity and Societally Important Disparities," Rosenberg and Kang, 2015) by examining "the ways in which population differences in genetic diversity might contribute to consequential societal differences across populations." Rosenberg and Kang assess the importance of genetic diversity in forensics, organ transplants, and genome wide association studies, as well as its contribution to societal disparities.  They conclude that genetic diversity must be taken into account for biological purposes, but they find no association with societal diversity.  Here's why.

Their paper was at least in part occasioned by a controversy over a 2013 report concluding that population genetic variation can be used as a proxy for economic diversity, and success ("The 'Out of Africa' Hypothesis, Human Genetic Diversity, and Comparative Economic Development," American Economic Review, Ashraf and Galor, 2013).  Ashraf and Galor (A and G) write:
This research advances and empirically establishes the hypothesis that, in the course of the prehistoric exodus of Homo sapiens out of Africa, variation in migratory distance to various settlements across the globe affected genetic diversity and has had a persistent hump-shaped effect on comparative economic development, reflecting the trade-off between the beneficial and the detrimental effects of diversity on productivity. While the low diversity of Native American populations and the high diversity of African populations have been detrimental for the development of these regions, the intermediate levels of diversity associated with European and Asian populations have been conducive for development.
And, this was all determined at "the dawn of humankind."  Naturally, and conveniently, a hump-shaped pattern rather than a simple linear one was needed if one was to denigrate both Native Americans and Africans.  None of that sort of argument for inherency is qualitatively new, but the attempt to make it genetic and hence inherently true had a juicy appeal.  Rosenberg and Kang (R and K), however, apply the same methods to an even larger data set and find no association with economic success.

R and K make it clear that, in their attempt to replicate A and G's study, they are considering within-population diversity, not between-population diversity.  This is important, because internal diversity is calculated from the population itself, not from a larger collection of populations, which raises various issues of sample selection, sample size, and the like. Within a single population, when one can assume approximately random mating, one can estimate heterozygosity in ways that are far less clear-cut when analyzing multiple populations at one go.  So, R and K are calculating expected heterozygosity, "the probability that two draws from a population at a specific site in the genome will produce different genetic types."

Expected heterozygosity follows a consistent geographic pattern,
...occurring as a function of increasing distance from East Africa, measured over land-based routes. The highest heterozygosities appear in populations from Africa, followed by populations from the Middle East, Europe, and Central and South Asia. Populations of East Asia have still lower heterozygosities, and Pacific Islander and Native American populations, at the greatest geographic distance from Africa over migration paths traversed in human evolution, are the least heterozygous. The linear decrease in heterozygosity with increasing distance from Africa is a strong and replicable relationship, achieving correlation coefficients near −0.9 in a variety of studies of different genetic markers and sets of populations.
The explanation for the decreasing diversity out of Africa is that each new founding population is a subset of the original group, and thus carries with it less genetic diversity than the non-migrants.



The serial founder model in human evolution. (A) A schematic of the model. Each color represents a distinct allele. Migration events outward from Africa tend to carry with them only a subset of the genetic diversity from the source population, and some alleles are lost during migration events. (B) An example of the model at a particular genetic locus, TGA012. Each set of vertical bars depicts the allele frequencies in a population, with different colors representing distinct alleles. Within continental regions, populations are plotted from left to right in decreasing order of expected heterozygosity at the locus [equation (3)]. This figure illustrates the loss of alleles across geographic regions; Native Americans all possess the same allele. The allele frequencies are taken from Rosenberg et al. (2005). Source: Rosenberg and Kang, 2015

Other factors influence diversity as well, such as admixture between different groups, but distance from the original source is replicably the primary determining factor.  There are of course geographic irregularities, such as bodies of water or mountain ranges, but the general pattern is clear, consistent with archeology, linguistic patterns, and so on.
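The declining-diversity logic of the serial founder model can be sketched with the textbook bottleneck expectation, under which each founding event of N diploid individuals shrinks expected heterozygosity by a factor of (1 - 1/2N); the starting heterozygosity and founder sizes below are invented for illustration:

```python
# Expected heterozygosity after a chain of founder events, each of which
# multiplies H by (1 - 1/(2*N)) for N diploid founders.
def heterozygosity_after_founders(h0, founder_sizes):
    h = h0
    for n in founder_sizes:
        h *= 1 - 1 / (2 * n)
    return h

h0 = 0.75            # heterozygosity in the source population
route = [100] * 10   # ten successive founder events of 100 individuals each
print(round(heterozygosity_after_founders(h0, route), 3))  # ~0.713
```

Each founding event shaves off a little diversity, so populations at the end of a long chain of migrations end up measurably less heterozygous than the source, just as the out-of-Africa pattern shows.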

Tests of the interaction between genetic diversity and social factors
Forensics
Genetic diversity is used in forensics to identify a suspect with high probability if the DNA from the crime scene is a perfect match to an individual in the database.  If an exact match isn't found, the DNA profile may be used to identify relatives, which can be done because relatives will differ by theoretically predictable amounts.  The underlying genetic heterozygosity in a population, however, determines the likelihood that a partial match to a sample is from a genetic relative.  In a low diversity population, the risk of a false positive is higher than in a high diversity population, because in the former a higher fraction of individuals will share each allele, making each allele less informative.
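The forensic logic can be sketched as a random-match probability across independent marker loci.  This is a deliberately simplified single-allele-per-locus sketch with invented frequencies; real forensic calculations work with diploid genotype frequencies and database corrections:

```python
# Random-match probability: the chance two unrelated individuals share an
# allele at every marker.  The per-locus match probability is sum(p_i^2),
# i.e. 1 minus the expected heterozygosity at that locus.
def random_match_probability(loci_freqs):
    prob = 1.0
    for freqs in loci_freqs:
        prob *= sum(p * p for p in freqs)
    return prob

high_div = [[0.25, 0.25, 0.25, 0.25]] * 10  # 10 loci, 4 equally common alleles
low_div = [[0.7, 0.3]] * 10                 # 10 loci, 2 skewed alleles
print(random_match_probability(high_div))   # ~1e-06: a match is very informative
print(random_match_probability(low_div))    # ~0.004: chance matches far likelier
```

With the same number of markers, the low-diversity population yields a match probability thousands of times higher, which is exactly the false-positive risk the post describes.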

The different levels of genetic diversity in different populations means that the usefulness of DNA for identification purposes varies between populations.  And, populations are unequally represented in forensic databases.  That is a social issue, not a biological one, and doesn't obviate the relationship between genetic diversity and identification of social relationships.

Transplants
Genetic diversity is important in determining matches for the purpose of organ transplantation, particularly bone marrow.  Here, higher diversity populations will have lower match probabilities -- that is, it's most difficult to find a match when diversity in the population is highest, and the difficulty descends with decreasing diversity.  These are rather clear issues.

The difficulty is greater when populations are less likely to be well represented in match databases, which is, again, a social issue.
...the chance that no donor match is found is greatest for African Americans, followed by the Asian-American, Hispanic, Native American, and white groups. As in the forensic case, the population genetics of genetic diversity, together with societal factors that vary across populations, contributes to the quantity of ultimate interest. Both genetic diversity and its interaction with factors that affect participation in transplantation are important in increasing the probability that any given recipient can find a successful match.
GWAS
Genome wide association studies searching for alleles associated with disease rely on the relative proximity of SNPs, or DNA markers, to disease alleles.  In populations with high genetic diversity, such as African populations or African Americans, the longer history of genomic recombination events that scramble nearby nucleotide variants over the generations results in lower linkage disequilibrium (LD), so that the proximity of markers to causal alleles can't be relied upon with the same likelihoods as in more recently founded populations.  One needs more marker test sites to find the LD one needs to make associations with traits, for example.  R and K report that it has been estimated that 96% of subjects in GWAS are of European ancestry. The social implication of this is that disease alleles are even less likely to be identified in high diversity populations than in others.  The vast majority of GWAS and similar findings can be extrapolated only with great and unknown uncertainty at present (though many still attempt it, in what can be called expeditions of wishful thinking).
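The LD point follows from the standard result that disequilibrium D between two loci decays by a factor of (1 - r) each generation of random mating; the starting D, recombination fraction, and generation counts below are illustrative, not estimates for any real pair of loci:

```python
# LD between a marker and a causal allele after t generations:
# D_t = D_0 * (1 - r)**t, where r is the per-generation recombination fraction.
def ld_after(d0, r, generations):
    return d0 * (1 - r) ** generations

d0, r = 0.25, 0.01
print(ld_after(d0, r, 100))   # shorter population history: substantial LD remains
print(ld_after(d0, r, 1000))  # longer history (e.g., African populations): LD nearly gone
```

After ten times as many generations of recombination, almost no LD survives at this marker distance, so markers must sit much closer to causal sites--hence the need for denser genotyping in older, more diverse populations.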

So, these are three examples of situations in which differences in genetic diversity between populations, interacting with social diversity, can have important social implications -- false positives in forensics, low probabilities of transplant matches, and low likelihood of inclusion in genetic research.
Each of these settings involves a problem that is fundamentally biological—DNA-based identification, transplantation, and genetics of disease. In each setting, principles from population-genetic theory in which aspects of genetic diversity feature prominently underlie the contribution of genetic diversity: theories of forensic and transplantation matching explicitly produce an inverse relationship between match probabilities and genetic diversity, and GWA statistics rely on models of the decay of genetic diversity and production of LD during migrations.  
Back to economics
R and K then return to the societal economics question, to re-examine whether population-level biological determinants are relevant to economic development, asking whether population genetic diversity is as useful when applied to a discipline in which population genetics theory is not relevant. Among other things, there are dangers of being statistically misled by phenomena such as Simpson's paradox and the ecological fallacy.

A and G used a small amount of genetic data to calculate genetic heterozygosity for a small number of populations, and imputed heterozygosity for many more based on geographic distance from Africa. Imputation generally takes sites typed in one study and fills in the states of untyped sites based on studies of other populations where those sites were typed.  This is a common, if iffy, practice in GWAS, but it at least works reasonably well when the samples are from the same geographic area, such as Europe. It is sometimes needed because different GWA studies of a given trait use different marker sites (because they use different genotyping platforms).

R and K recalculated the results using actual genetic data for more populations, but retaining the same analytic methods used in the original study.  So, rather than actual data for 53 populations in 21 countries, R and K used genetic data from 237 populations in 39 countries.  And they found no effect of genetic diversity on economic success.

Further, they chose multiple different samples of 21 countries, and found a significant effect in at most 27% of them.  Thus, roughly three quarters of the time, had A and G chosen a different sample subset, they would have found no effect.  And, conclude R and K, even if the assumption that population genetic diversity affects economic development were valid, the effect didn't persist for an expanded set of populations and countries.  While genetic diversity underlies differences between populations in a variety of other ways--when the effect is biological and population genetics theory applies--economic success is not one of them.  "[P]rinciples of population genetics produce no theory of the economic development of nations..."

It is of course plausible that overall variation patterns include variation that leads one population, overall, to have more, or less, of some societal attribute.  One can always construct post hoc stories that fit social prejudices, for example.  But plausibility is not the same as truth, and one can -- and should -- ask why the investigators are making their societal assertions in the first place.  Generally, we know the answer, and it isn't very savory.

Monday, December 15, 2014

Are we still doing 'beanbag' eu(genetics)? Part I. Some history

Way back in 1964, a famous paper was published in Perspectives in Biology and Medicine (vol 7: 343-359, and reprinted in the International Journal of Epidemiology in 2008): 'A defense of beanbag genetics'.  The author was one John Burdon Sanderson Haldane, better known as JBS Haldane.  Along with RA Fisher and Sewall Wright (and, later, Motoo Kimura, James Crow, and an expanding array of others), Haldane helped found and then develop the field of population genetics.

JBS Haldane (1892-1964), from www.Britannica.com (on Google images)
Population genetics is the theory of change in genetic variation ('gene frequencies') in populations over time.  Essentially, one main thread of population genetics follows the fate of new mutations in DNA over time, and in that sense is centered around a variant--called an 'allele'--that arises by mutation in a single 'gene'.  It can model the change in that variant's frequency because of chance, population dynamics, natural selection and so on.  It can also model what happens with several such variants.  However, this theory is mute about what the variant actually does in the organism.
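The kind of single-variant frequency dynamics the theory models can be illustrated with the classic Wright-Fisher simulation of drift; the population size, starting frequency, and random seed below are arbitrary choices for illustration:

```python
import random

# Wright-Fisher drift: each generation, 2N gene copies are drawn at random
# according to the previous generation's allele frequency.  With no
# selection, a neutral allele's frequency wanders until it is lost (0)
# or fixed (1), purely by chance.
def wright_fisher(p0, n_diploid, generations, rng):
    p = p0
    trajectory = [p]
    for _ in range(generations):
        copies = sum(1 for _ in range(2 * n_diploid) if rng.random() < p)
        p = copies / (2 * n_diploid)
        trajectory.append(p)
        if p in (0.0, 1.0):  # absorbed: drift's endpoint
            break
    return trajectory

rng = random.Random(7)
traj = wright_fisher(p0=0.5, n_diploid=50, generations=500, rng=rng)
print(f"after {len(traj) - 1} generations, allele frequency = {traj[-1]}")
```

In a population of only 50 diploid individuals, the allele usually drifts to loss or fixation within a few hundred generations; note that nothing in the model says anything about what the allele does in the organism, which is exactly the point.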

In a sense, population genetics is a particulate, molecular theory of change over time--that is, of evolution--that is largely divorced from real biology.  If biological traits are caused by genes, their variation must also be affected by genetic variants, so the theory is a valid way to follow frequency dynamics; but it seems to have judged it irrelevant to consider the organisms themselves.  Nonetheless, in the 1930's, the panache of its mathematical rigor and, one might say, its fashionability as a molecular (and hence 'real' science) focus led to population genetics being proclaimed 'the' formal genetic theory of evolution--and, really, more than that: the basic underlying assumption was, and has remained, that evolution is fundamentally a genetic phenomenon.  The impression was essentially given that the rest was incidental window-dressing.

Naturally, some biologists objected to this palace coup by a few mathematically skilled theoreticians; this is a natural resentment, perhaps, since many biologists chose that field because they were innumerate, as was Darwin. But as importantly, because of the very particulate nature of the theory relative to the real world, a leading spokesperson for evolutionary biology, Ernst Mayr, denigrated the theory as 'beanbag' genetics: the reduction of real organisms to a set of independent causal particles, the individual genes.  Instead, Mayr insisted, organisms and their evolution were more integrative, interaction-based phenomena.

In his 1964 paper, Haldane objected to this negative caricature of population genetics.  Essentially, he said that the theory allowed many ideas about evolution to be tested at least approximately, and could account in principle for a broad range of evolutionary phenomena.  He discussed the relationships between more nuanced aspects of genetics--interactions among genes, for example, that Mayr stressed--and the theory.

Nonetheless, while population genetics is very useful for putting some plausibility brackets around interpretations of genetic data from populations, it is still largely a one-gene-(or one linkage group of genes)-at-a-time theory; that is, it doesn't concern itself with actual traits or how they are manifest, and so on.  Indeed, leading developmental geneticists have, rightfully in our view, complained about the self-proclaimed theory of evolution's omission of the way that actual organisms are assembled, and evolve, and the role genes play in that.  The evolution of development (or 'EvoDevo') has become a major field of research, which, thanks to many advances in genetic experimental technology and model systems, has been able to relate developmental genetics to the evolution of the genes and the systems they're part of.

The other half of the 'bicameral' brain
There has long been a second thread of the theory, often called 'quantitative' genetics, that deals with the behavior of quantitative traits affected by large numbers of genes not specifically identified, that can predict aspects of traits in populations over time, but does not attempt to enumerate the individual genetic contributors.  These are called 'polygenic' traits (other similar terms are sometimes used), and are the target of many genomewide mapping efforts, about which we have written many times.

In fact, both these strains of thought go all the way back to around 1900, when Mendel's work was rediscovered and then competed with Darwinian gradualistic ideas about evolution and genetics.  The competition involved squabbles between the Mendelians and what were called the 'biometricians', or quantitative geneticists.  Since that time, what these combined areas of theory and investigation have shown is that there is a spectrum of genetic causal effects.  Traits that are generally very rare in their population are often due to variation in single genes--any number of diseases, usually severe and with very early onset, are in this category.  These behave in a classical 'Mendelian' way, just like Mendel's pea-traits did.  But most traits, and most common, later-onset disorders, are in the complex polygenic category.  Human thinking often tends either to focus on qualitative 'things' or on quantitative 'measures', and the difference between particulate and quantitative evolutionary genetics reflects that.
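The polygenic idea is easy to see in a toy additive model; the number of loci, allele frequencies, and equal effect sizes below are invented for illustration:

```python
import random

# A polygenic trait as the sum of many tiny additive effects: each of 100
# loci contributes 0 or 1 per allele copy (allele frequency 0.5).  Individual
# trait values form the familiar bell curve even though no single locus has
# a detectable effect on its own.
def polygenic_trait(n_loci, rng):
    return sum(rng.random() < 0.5 for _ in range(2 * n_loci))

rng = random.Random(3)
population = [polygenic_trait(100, rng) for _ in range(5000)]
trait_mean = sum(population) / len(population)
trait_var = sum((x - trait_mean) ** 2 for x in population) / len(population)
print(f"mean = {trait_mean:.1f} (expected 100), variance = {trait_var:.1f} (expected 50)")
```

This is the quantitative geneticists' world: a smooth, heritable trait distribution predictable in the aggregate, with no need to enumerate the individual contributing 'beans'.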

You may not be old enough to remember the phrase 'beanbag genetics', but it symbolized the naturalist's view that whole organisms or even ecosystems need to be studied as interacting entities, rather than trying to understand evolution by particularizing things down to individual genetic variants, even if the latter are an essential part of the story.  That sort of reductionism was missing the point.  But have we long since learned that lesson?

Where are we today?
In fact, today's Big Data GWAS-y world is conceptually still largely wedded to beanbag genetics.  It is still driven by a reductionistic approach that essentially believes that by enumerating the individual beans in each person's genome, that person's entire nature can be understood or even predicted from the moment of conception.  Is this too much of a simplification or overstatement?  Is there a reason other than molecule-worship that the stress is so heavily on individual, particulate entities like 'genes', even though we know the genome is far from so clearly discretized in function?  Look past the caveats and denials offered by the Big Data empire to what they are mainly doing, look at how they bury or pass over their caveats, and judge for yourself.

Efforts are being made to study 'systems', such as molecular interaction networks.  This is a recognition of the problem posed by hyper-reductionism.  It is a step in a good direction, but even the systems approach largely seems beanbag in nature, approaching complex traits as if they were a beanbag of internally interacting systems that can be enumerated and treated as units.  Network interactions are obviously relevant and involved in biological organisms, but it is not so clear, to us at least, that that path will be the best one to understand complex traits sufficiently well.  At least systems approaches force us to consider interactions among components as fundamental to life.

There is an important sociocultural problem associated with beanbag genetics, beyond the fact that we're still thinking in essentially the same way as 50 or even 100 years ago despite vastly more knowledge.  The problem goes beyond the promises to use the enumerative causal approach to develop 'personalized genomic medicine', which sounds so laudable.  Based on what we know today, those promises are highly exaggerated and misleading, even if they will work for clearly causal genomic 'beans' and even if every lesser finding will be trumpeted as a justification for the effort.  But one consequence, if not the most immediate, is that these efforts eat up lots and lots of funds that could be spent in other, already known ways that could yield vastly more improvements to public health (health is, after all, the promise being made).

However, beanbag thinking casts another, far more ominous shadow that also goes back to the early days of genetics, and that will be the subject of Part II of this series.

Thursday, October 16, 2014

What if Rev Jenyns had agreed? Part III. 'Group' selection in individuals, too.

We have been using Darwin's and Wallace's somewhat different views of evolution to address some questions of evolutionary genetics and their consequences for today's attempts to understand the biological, especially genomic, basis of traits of interest.  Darwin had a more particularistic, individual focus on the dynamics of evolutionary change, and Wallace a more group-focused, ecological one.

HMS Beagle in the Straits of Magellan

As a foil, we noted that a friend of Darwin's, Leonard Jenyns, was offered the naturalist's job on the Beagle first, but turned it down, opening the way for Darwin.  We mused about how we might think today had Wallace's view of evolution, announced in the same year as Darwin's, been the first view of the new theory.  Where we'd be now if we'd had a more group- than individual-focused perspective is of course not knowable, but we feel Wallace's viewpoint, at least in some senses, has been wrongly neglected.

Population genetic theory traces what happens to genetic variants in a population over time. Almost without exception the theory treats each individual as representing a single genotype. We take individual blood samples or cheek swabs, and let our "Next-Gen" sequencer grind out the nucleotide sequences as though on a proverbial assembly line. In this sense, each individual--or, rather, the individual's genotype--is taken to be the unit of evolution.

Populations were, and generally still are, seen as a mix of these individual, internally non-varying, homogeneous units, each having a genotype.  But that's an obviously inaccurate way to view life, another reflection of the difference in viewpoints about variation that we've been characterizing symbolically through Darwin's and Wallace's respective emphases in their views of evolution.

There is a strong tendency to equate genotypes with the traits they cause. This derives from the tendency to reduce natural selection to screening of single genes, because if single genes cannot be detected effectively by selection, they generally won't have high predictive value for biomedicine either. It is easy to see the issue.

But individuals are populations too
Let's ask something very simple: What is your 'genotype'? You began life as a single fertilized egg with two instances of human genomes, one inherited from each parent (here, we’ll ignore the slight complication of mitochondrial DNA). Two sets of chromosomes. But that was you then, not as you are now. Now, you’re a mix of countless billions of cells. They’re countless in several ways. First, cells in most of your tissues divide and produce two daughter cells, in processes that continue from fertilization to death. Second, cells die. Third, mutations occur so that each cell division introduces numerous new DNA changes in the daughter cells. These somatic (body cell) mutations don’t pass to the next generation (unless they occur in the germline) but they do affect the cells in which they are found.

But how do we determine your genotype? This is usually done from thousands or millions of cells—say, by sequencing DNA extracted from a blood sample or cheek swab. So what is usually sequenced is an aggregate of millions of instances of each genome segment, among which there is variation. The resulting analysis picks up, essentially, the most common nucleotides at each position. This is what is then called your genotype and the assumption is that it represents your nature, that is, all your cells that in aggregate make you what you are.
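This majority-call idea can be sketched in a few lines of code (a toy illustration, not any real variant-calling pipeline; the function name and data are invented):

```python
from collections import Counter

def call_genotype(reads_per_position):
    """Call the consensus base at each position from pooled reads.

    reads_per_position: one list of observed bases per genomic position,
    aggregated across the many cells in a blood or cheek-swab sample.
    Returns the most common base at each position; rarer somatic
    variants present in a minority of cells simply disappear.
    """
    return [Counter(reads).most_common(1)[0][0] for reads in reads_per_position]

# A position where most sampled cells carry 'A' but a somatic minority carries 'G':
reads = [['A', 'A', 'A', 'G', 'A'],   # somatic 'G' in 1 of 5 reads
         ['C', 'C', 'C', 'C', 'C']]
print(call_genotype(reads))  # ['A', 'C'] -- the minority 'G' is invisible
```

The point of the sketch is simply that whatever is rare among your cells never makes it into what gets reported as 'your' genotype.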

In fact, however, you are not just a member of a population of different competing individuals, each with their inherited genotypes.  In every meaningful sense of the word, each person, too, is a population of genomes.  A person's cells live and/or compete with each other in a Darwinian sense, and his/her body and organs and physiology are the net result of this internal variation, in the same sense that there is an average stature or blood pressure among individuals in a population.

If we were to clone a population of individuals, each from a single identical starting cell, and house them in entirely identical environments, there would still be variation among them (we see this, imperfectly, in colonies of inbred laboratory strains such as of mice). They are mostly the same, but not entirely. That’s because they are aggregates of cells, with genomes varying around their starting genome.

Yesterday we tried to describe why the traits in individuals in populations have a central tendency: most people have pretty similar stature or glucose levels or blood pressure. The reason is a group-evolutionary phenomenon. In a population, many different genomic elements contribute to the trait, and because the population is here and hence has evolved successfully in its competitive environment, the mix of elements and their individual frequencies is such that random draws of these elements mainly generate rather similar results.

It is this distribution of random draws of all the genetic variants in the population that determines the context and hence the success of a given variant. But the process is a relativistic one, rather than absolute effects of individual variants. Gene A's success depends on B's presence and vice versa, across the genome. There is always a small number of outliers, having drawn unusual combinations, and evolution screens these in a way that results in a central tendency that may shift over time, etc.
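A toy simulation can show why random draws of many small-effect variants produce a central tendency with rare outliers (the locus count, frequencies, and effect sizes here are all invented for illustration):

```python
import random

random.seed(1)

N_LOCI = 1000      # number of contributing genomic elements (invented)
N_PEOPLE = 2000

def draw_trait():
    """One 'individual': sum tiny +1/-1 contributions from many loci,
    each drawn at its population frequency (0.5 here, for simplicity)."""
    return sum(1 if random.random() < 0.5 else -1 for _ in range(N_LOCI))

traits = [draw_trait() for _ in range(N_PEOPLE)]
mean = sum(traits) / N_PEOPLE
sd = (sum((t - mean) ** 2 for t in traits) / N_PEOPLE) ** 0.5

# Most random draws land near the population mean; extreme combinations
# of variants are rare outliers.
outliers = sum(1 for t in traits if abs(t - mean) > 3 * sd)
print(round(mean, 1), round(sd, 1), outliers / N_PEOPLE)
```

Almost every 'individual' comes out near the middle, which is the group-level phenomenon described above: similar traits despite no two draws of variants being the same.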

The same explanation accounts for the traits in individuals. There would be a central tendency in our hypothetical cloned mice. That’s because the somatic mutations generate many different cells, but most are not too different from each other. As in evolution in populations, if they are dysfunctional the cell dies (or, in some instances, they doom the whole cell-population to death, as when somatic mutations cause cancer in the individual). Otherwise, they usually comprise a population near the norm.

Is somatic variation important?
An individual is a group, or population, of differing cells.  In terms of the contribution of genetic variation among those cells, our knowledge is incomplete, to say the least.  From a given variant's point of view (and here we ignore the very challenging aspect of environmental effects), there may be some average risk--that is, average phenotype among all sampled individuals with that variant in their sequenced genome.  But somatically acquired variation will affect that variant's effects, and generally we don't yet know how to take that into account, so it represents a source of statistical noise, or variance, around our predictions.  If the variant's risk is 5%, does that mean that 5% of carriers are at 100% risk and the rest at zero?  Or that all are at 5% risk?  How can we tell?  Currently we have little way to tell, and manifestly even less interest in the problem.

Cancer is a good, long-studied example of the potentially devastating nature of somatic variation, because there is what I've called 'phenotype amplification': a cell that has inherited (from the person's parents or the cell's somatic ancestors) a carcinogenic genotype will not in itself be harmful, but it will divide unconstrained so that it becomes noticeable at the level of the organism. Most somatic mutations don't lead to uncontrolled cell proliferation, but they can be important in more subtle ways that are very hard to assess at present. But we do know something about them.

Evolution is a process of accumulation of variation over time.  Sequences acquire new variants by mutation in a way that generates a hierarchical relationship: a tree of sequence variation that reflects the time order in which each variant first arose.  Older variants that are still around are typically more common than newer ones.  This is how the individual genomes inherited by members of a population came to be, and it is part of the reason that a group perspective can be an important but neglected aspect of our desire to relate genotypes to traits, as discussed yesterday.  Older variants are more common and easier to find, but are unlikely to be too harmful, or they would not still be here.  Rarer variants are very numerous in our huge, recently expanded human population.  They can have strong effects, but their rarity makes them hard to analyze by our current statistical methods.
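A minimal neutral-drift simulation (population size, generations, and mutation scheme are toy values, not a full population-genetic model) illustrates why the variants that arose earlier and survived tend to be the common ones:

```python
import random

random.seed(0)

POP = 500     # haploid population size (toy value)
GENS = 300

variants = []   # list of (birth_generation, current_copy_count)

for gen in range(GENS):
    surviving = []
    for birth, count in variants:
        # Binomial resampling: each of the POP slots in the next generation
        # inherits this variant with probability count/POP (neutral drift).
        count = sum(1 for _ in range(POP) if random.random() < count / POP)
        if count:
            surviving.append((birth, count))
    surviving.append((gen, 1))   # one brand-new mutation, at a single copy
    variants = surviving

# Among variants still segregating, surviving old ones tend to sit at
# higher copy numbers than young ones; most old ones were lost entirely.
ages = [(GENS - birth, count) for birth, count in variants]
for age, count in sorted(ages, reverse=True)[:5]:
    print(f"age {age:3d} generations: {count} copies")
```

Most new mutations drift out quickly; the few old survivors have had time to reach appreciable frequency, while recent mutations are numerous but each rare.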

However, the same sort of hierarchy occurs during life as somatic mutations arise in different cells at different times in individual people. Mutations arising early in embryonic development are going to be represented in more descendant cells, perhaps even all the cells in some descendant organ system, than recent variants. But because recent variants arise when there are many cells in each organ, the organ may contain a large number of very rare, but collectively important, variants.
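The somatic hierarchy can be made concrete with a toy lineage tree in which, for simplicity, every daughter cell acquires exactly one new mutation (the division count and one-mutation-per-daughter rule are invented to keep the arithmetic exact):

```python
from collections import defaultdict
from itertools import count

DIVISIONS = 10          # 2**10 = 1024 final cells (toy value)
uid = count()           # unique label for each new mutation

def grow(mutations, depth, carriers):
    """Divide a cell recursively; each daughter gains one unique mutation.
    At the bottom of the tree, tally which final cells carry which mutations."""
    if depth == DIVISIONS:
        for m in mutations:
            carriers[m] += 1
        return
    for _ in range(2):  # two daughters per division
        grow(mutations + [(depth, next(uid))], depth + 1, carriers)

carriers = defaultdict(int)
grow([], 0, carriers)

# A mutation arising at division d ends up in exactly 2**(DIVISIONS - d - 1)
# of the final cells: early mutations are common, late ones each very rare.
for d in (0, 5, DIVISIONS - 1):
    n = next(n for (birth, _), n in carriers.items() if birth == d)
    print(f"mutation from division {d}: carried by {n} of {2**DIVISIONS} cells")
```

An embryonic (division 0) mutation sits in half the final cells, while the last round of divisions contributes as many mutations as there are cells, each carried by just one, which is exactly the rare-but-collectively-numerous pattern described above.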

The mix of variants, their relative frequencies, and their distribution of resulting effects are thus a population rather than individual phenomenon, both in populations and individuals. Reductionist approaches done well are not ‘wrong’, and tell us what can be told by treating individuals as single genotypes, and enumerating them to find associations. But the reductionist approach is only one way to consider the causal nature of life.

Our society likes to enumerate things and characterize their individual effects. Group selection is controversial in the sense of explaining altruism, and some versions of group selection as an evolutionary theory have well-demonstrated failings. But properly considered, groups are real entities that are important in evolution, and that helps account for the complexity we encounter when we force hyper-reductionistic, individual thinking to the exclusion of group perspectives. The same is true of the group nature of individuals' genotypes.

We have taken Darwin and Wallace as representatives of these differing perspectives. Had Jenyns taken the boat ride he was offered, we'd have been more strongly influenced by Wallace's population perspective because we wouldn't have had Darwin's. Instead, Darwin's view won, largely because of his social position and being in the London hub of science, as has been well-documented. A consequence is that the ridicule to which group-based evolutionary arguments have been subjected is a reflection of the resulting constricted theoretical ideology of many scientists—but not of the facts that science is trying to explain.

What needs to be worked on is not, or certainly not just, increased sample size to somehow make enumerative individual prediction accurate.  For reasons we've tried to suggest, retrospective fitting to a particular agglomerate of genotypes does not yield accurate individual prediction--and here we've not even been considering non-genomic aspects of each genome-site's environment.  Instead, we should try to develop a better population-based understanding of the mix of variants and their frequencies, and a better sense of what a given allele's 'effect' is, when we know each allele's effect is neither singular nor absolute, but strictly relative to its context in both its individual and its population occurrences.  It's not obvious (to us, at least) how to do that, or how such an understanding might relate to whether accurate individualized prediction is likely to be possible in general.

Tuesday, October 14, 2014

What if Rev Jenyns had agreed? Part I. Would evolutionary theory be different?

In 2006 I wrote an article about the long-term impact that historical quirks can have on science, based on the fact that in 1831 an Anglican cleric named Leonard Jenyns said "no, thanks" to an offer.  It so happened that the offer was to be the naturalist on a surveying voyage to be undertaken by the Royal Navy.  But Jenyns was interested in natural history as a hobby rather than as a career, and he said he had to spend time with his parishioners and couldn't be away for the long years of such a voyage.  He might also have used that as an excuse to avoid the known dangers of such trips at the time.

Leonard Jenyns, the reluctant reverend
Too bad, said John Henslow at nearby Cambridge University, who had recommended Jenyns. So he recommended another of his students, a fellow named Charles Darwin. Darwin was interested in natural history, too, but spent most of his time riding and shooting, as did most members of his social class, and it wasn't clear that he'd make a serious enough candidate for the position. But, after agonizing and consulting family, Charles said "Yes!" The ship was, of course, the Beagle, and the voyage was to shake the world.

I've written about this incident before (Evol. Anth., 15:47-51, 2006) because it is interesting to surmise about how biology, in particular evolutionary and genetic theory and approaches, might be today if Jenyns had agreed, and Darwin had gone fox-hunting during those important years. What might have been different? Wouldn't we have eventually ended up where we are today, celebrating Jenyns rather than Darwin? I think definitely not.

Jenyns was basically a biblical fundamentalist, which meant a creationist.  He would have gotten along famously with Captain FitzRoy, also a strong believer.  Debates (after grace) over wine and meals would not have been about the origin and distribution of variation in plants and animals.  But can we doubt that we’d have learned about evolution anyway?  No, not at all.

At roughly the same time period, another not-so-wealthy naturalist was doing his natural history in remote parts of the world (first Amazonia, then Indonesia), and he developed a clear idea of the ‘transmutation’ of species on his own.  In 1858 he sent a brief manuscript explaining his idea to a correspondent, one who had become well-known among British naturalists, the same Charles Darwin. 

This stunned Darwin who had been working ploddingly on his own theory of evolution.  But with very good grace, he hastily assembled some bits and pieces to show his ideas (and, perhaps not so incidentally, his priority) which along with Wallace’s manuscript were read to the Linnaean Society.  The world had been told, but hardly anyone was listening until the following year when Darwin published his lengthy assertion of the idea that the diversity of life arose through a gradual historical process—his Origin of Species.

Both Darwin and Wallace were famously influenced by economist Thomas Malthus’ book arguing the inevitable pressure of growing population on available resources, and that idea led to the idea that it was competition for such resources in Nature that inevitably favored (selected) those better competitors in terms of their future reproductive success.  Adaptation by natural selection was the process that they argued explained the diversity and functional traits of species.

But the two ideas were rather different
Darwin and Wallace placed very different stress on how this process worked.  Darwin stressed competition among individuals for survival or mates, so that in a given location the better-endowed individuals would have all the fun at the expense of their less-suited contemporaries.  Traits of organisms were at that time viewed as caused by the deterministic effects of some causal elements (which, in his way, the Moravian monk Gregor Mendel was studying, unbeknownst to Darwin and Wallace).  The most successful competitors would transmit these elements to their offspring, and the elements would thus proliferate over time to replace less-successful elements.

Differential success was also important to Wallace.  He recognized that, of course, individuals proliferate well or not, but his stress was more on competition among groups or species, and/or of groups against the limits of their environment.  Some groups would do well and be modified into successfully adapted species while others would wane.  It was the group characteristic, even though of course comprised of individual members, that told the tale.

Now, if Darwin had stuck with his initial hesitation and stayed home, we would be talking today of Wallacian, not Darwinian, evolution.  Whatever we would have discovered about the nature of inheritance, whether or not we had by now discovered DNA and its functions in the cell, we may very well not have developed our ferocious obsession with individual competition, an obsession that often drives us to view genes as if they themselves, rather than whole individuals or whole populations or whole species, were the central competitors in the evolutionary race.

I think things today might be very different, and we might not be trying to enumerate individual genes in individuals’ genotypes when it came to accounting for genetic causation, genomic and even adaptive evolution.  The reason isn’t that individuals and their genotypes are unimportant, nor that some mysterious function unrelated to individual genes reifies the concept of population to give one population an edge over another.  The reason would simply be a different way to understand that the dynamics of both individuals and their genes are fundamentally aggregate phenomena.  And we’d have very different ideas on the role of populations and context.

In Part II, I’ll consider the collective nature of genomes in populations and how that affects their evolution in group-contextual ways.  Then in Part III, I'll try to show that individuals are themselves similarly context-driven populations of genotypes.

Monday, June 2, 2014

The visible colors: and the falseness of human races as natural categories

There is a constant tension between the tendencies to view the world in continuous vs discrete terms.  Even in science, this can be a problem, because a continuous view can lead to different interpretations than a discrete view.  Disputes about reality can arise, perhaps, over the distinction.  Is something a particle, or is it a wave?  Are the categories of a discrete view natural realities, or are they being imposed on Nature for some human reason?

The argument currently afoot has to do with how culpable it is to use genomic variation data to claim that there are a small number (usually stated as 5) major or primary human races, that blur at the intersections between them.  And, as commonly used software has it, those 'blurred' individuals are considered to be admixed between parents from the 'pure' races.

This is very misleading scientifically and, worse, unnecessarily so.  No analogy is perfect, but we can see the major issues using the example of color, which is often cited as comparable and as showing the validity of the 'race' assertion (here, e.g.).  Color is the word we use for our sensory perceptions--the qualia, or psychological experience, by which we perceive light.  In physical terms, a given color is produced by light photons at a given energy level, with a particular wavelength or frequency (since light has a fixed speed, higher frequency means more waves pass by per second, and hence each wave is shorter, so that together they add up to the distance traveled in a second).  From that point of view, here is the range of colors to which the 'standard' human eye (that is, genotype) can respond--that is, a graphic portrayal of the wavelengths we detect:

The spectrum of visible light (wavelength in nanometers).  Wikimedia commons
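The parenthetical relation between wavelength and frequency is just the familiar nu = c / lambda; a quick numerical check (wavelength values here are only nominal end-points of the visible range):

```python
C = 299_792_458.0   # speed of light, in m/s

def frequency_hz(wavelength_nm):
    """nu = c / lambda: at a fixed speed, shorter waves pass by more often."""
    return C / (wavelength_nm * 1e-9)

# Violet (~400 nm) oscillates about 1.75x faster than red (~700 nm):
print(round(frequency_hz(400) / frequency_hz(700), 2))  # 1.75
```

So the visible spectrum is one continuous range of frequencies, with nothing physically privileged anywhere along it.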

The word 'color' refers to the qualia of perception, but we assign names to particular wavelengths, a cultural phenomenon based on our particular detection system.  In those terms, visible light is a continuum of detectable wavelengths.  But traditionally, given that we are trichromat beings (with three distinct opsin genes, whose three coded proteins each respond most efficiently to a different wavelength--see diagram below), we name three so-called 'primary' colors.  Each retinal 'cone' cell normally produces one of these opsin pigment proteins.  Each color of light that enters the eye triggers an appropriately weighted mix of red, green, and blue signals.  So, for example, pure 'blue'-frequency light basically triggers a response only from retinal cone cells that express the blue opsin gene product.

Basically, our ability to perceive any wavelength across the visible range is due to our brain's ability to mix the signal strengths received from the retinal cells reporting their respective color activations.  We often think of colors as being a mix of these primary colors, but there is nothing physically primary about them.  They are artificial mark-points chosen by us because of our particular opsin repertoire.  One could choose other mark-points--and there need not be three (some species have fewer or more)--and still perceive light in the entire visible (or even broader) wavelength range.  Various activities, such as printing, have used different 'primary' colors (e.g., Google 'primary colors').  When we receive a mix of frequencies, our brain can sort out that mix and identify it.
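A rough sketch of that weighted-mix idea; the Gaussian curves, shared width, and peak wavelengths below are crude stand-ins for the real, asymmetric opsin response curves shown in the accompanying figure:

```python
import math

# Approximate peak sensitivities (nm) of the 'blue' (S), 'green' (M) and
# 'red' (L) cone opsins; a single shared width is an invented simplification.
PEAKS = {'blue': 420.0, 'green': 534.0, 'red': 564.0}
WIDTH = 40.0

def cone_responses(wavelength_nm):
    """Relative activation of each cone type by a pure wavelength."""
    return {name: math.exp(-((wavelength_nm - peak) / WIDTH) ** 2)
            for name, peak in PEAKS.items()}

# A pure 470 nm ('blue-ish') light mainly drives the blue-opsin cones;
# the brain reads the whole (blue, green, red) triple to name the color.
r = cone_responses(470.0)
print(max(r, key=r.get), {k: round(v, 2) for k, v in r.items()})
```

Any wavelength in the visible range yields some triple of activations; the 'primary' labels are just the three peaks of our particular detectors, not special points in the physics.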

What 'typical' human cone cells respond to.  Source: http://www.unm.edu/~toolson/human_cone_response.htm


In a sense, so long as you realize what is being done, there is no problem.  But if you think of the light-world as being inherently made of truly primary color categories, with other colors as blurs at the edges of these categorical realities, then you are seriously misunderstanding the physical reality.  First, the color spectrum reflects the color, as we perceive it, of single-wavelength radiation.  No individual wavelength is 'primary'.  Second, other colors are a mix of wavelengths that trigger responses by red, green, and blue opsins, and are synthesized (such as to be interpreted as 'pink') by the brain.

The figure is also a stereotype, for two other reasons.  First, there is considerable variation among humans in the response characteristics of our opsins--the figure shows a typical response pattern for a reference blue, green, and red opsin protein.  And of course a substantial fraction of people can't see some colors because they are missing one or more normally functioning opsin genes.  Second, the qualia--what makes a given wavelength be experienced as 'blue'--are beyond current understanding, nor do we know that what you see as blue is the same as what I see as blue, even if we have both learned to call it 'blue'.  At present this is in the realm of philosophers, and causes discussion--but no harm is done.

But that is not always the case.  Sometimes, when a phenomenon is falsely divided into categories assumed to be true units rather than arbitrary reference points, with some supposedly unimportant blurs at the boundaries between the categories, the results of the error can be, literally, lethal.  This has been one consequence of the misrepresentation of continuous quantities as categories in human affairs.

Races are not like primary colors
We are writing this because there has been a recent resurrection of science that knowingly misrepresents the global distribution of human biological variation.  People are not photons, and we do not exist in 'primary' groups with blurred boundaries between them--any more than blue, red, and green are sacred and special points in the color spectrum.

We hear a lot of innocent-sounding talk about how one can argue for the existence of human 'races' as genetic, not just sociocultural, entities--but not be a 'racist'.  Yes, the argument goes, there is blurring at the edges, but the categories are real and they exist.

Human populations have long lived on different continents and some of our recent evolution as a species has taken place across great spans of distance, with geographic effects on the rates of gene flow over distance. Time and local geography, climate, culture, food sources, prey and predators and the like vary over space as well, and have in various ways led to adaptive differences among people, differently in different places.  Both cultural and genomic variation has accumulated around the globe.  But with few exceptions, such as truly isolated islands, genomic differences are correlated with geographic distance. 

Europe and Africa are not wholly discrete parts of the world.  The Americas may have been close to that, but only for about 10,000 or so years.  To assert that Europeans are genomically different from Africans, you must define what you mean by these categories.  Do you mean Italians are different from Egyptians?  Or do you mean Bantu speakers from South Africa are not the same as Norwegians?  This is important because, with the same statistical methods of analysis, the same sorts of variation, if proportionately less in quantity, occur within these areas.  And had the analysis been done 1000 years ago, the major reference population of the world might have been considered to be the Middle East, not Europe, because the decision about what the major races are, and what the admixed blurs, would have been made by Islamic scholars, perhaps with some complaints from the high culture in India.  Choosing other populations as reference points ('parental' populations, or actual 'races')--Tahitians, Mongolians and South Indians, say, rather than the usual Africans, Europeans and Native Americans--would yield very different admixture statistics, because admixture programs are based on assumptions about history, not some inherent 'truth'.

So even those who want to stress differences, for whatever reasons, and who want to make assertions based on the several 'continents', themselves somewhat arbitrarily defined, have to be clear about what they are asserting--what they define as 'race', in particular.  This, of course, is made far more complicated by the 'admixture' that has occurred throughout known history of mass migration. Indeed, even the concept of 'admixture' itself requires specifying who is mixing with who--which in turn determines the outcome of admixture studies.

This sort of analysis has another aspect that is not properly understood.  The user chooses which and/or how many populations are considered parentals, of which other sampled individuals are treated as admixed products.  These are statistical rather than history-based assumptions, using various sorts of significance criteria (which are subjective choices).  And, importantly, this type of analysis is based on alleles that were chosen for study because they are global--that is, the same variants are shared by the different 'races', just at different frequencies.  Truly local variation is just that, local, so groups can't be compared in the same way.  Any sample you might choose to take will have lots of rare variants found nowhere else.  So races, in much if not most of the modern discussion, are groups defined in part because their frequencies of the same variants differ.  The genotypes in one 'race' can appear in others as well, but with lower probability.  If you want group-specific variants, you will usually find that they depend essentially on how you define the groups, and very rarely will everyone in a group that is more than very local have the purportedly characteristic variant.  A given genotype may be more likely in one pre-defined sample or group, but these are quantitative rather than qualitative differences, largely based on local proximity.  Locally restricted variants can be important in adaptive traits, depending on the dynamics of history, and they can be exceedingly important, but they generally come far from characterizing everyone in a group or defining groups.  People come into this world as discrete entities, but that is not how populations are generally constructed or how they evolve.
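A toy calculation (allele frequencies invented) shows concretely what 'the same variants, just at different frequencies' means, using the standard Hardy-Weinberg genotype proportions:

```python
def genotype_probs(p):
    """Hardy-Weinberg proportions for allele frequency p: p^2, 2pq, q^2."""
    q = 1.0 - p
    return {'AA': p * p, 'Aa': 2 * p * q, 'aa': q * q}

group1 = genotype_probs(0.7)   # allele 'A' is common in this (invented) group
group2 = genotype_probs(0.3)   # the very same allele, rarer elsewhere

# Every genotype occurs in both groups; only its probability differs,
# so no genotype is diagnostic of group membership.
for g in ('AA', 'Aa', 'aa'):
    print(g, round(group1[g], 2), round(group2[g], 2))
```

Every genotype has nonzero probability in both groups; the difference between them is quantitative, not categorical, which is the point being made about frequency-based 'races'.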

If we were talking about turtles or ostriches or oaks, nobody would care about these distinctions, and there is absolutely no need to use such categories.  There are ways to represent human biological variation over space in more continuous terms, avoiding the manifest problems with false, vague, or leaky categories of people, or making excuses for the 'blurring' at the edges, as if those blurred individuals were just no-accounts staggering around polluting the purity of our species!  Asserting the supposed reality of 'race'--that is, of true categories on the ground rather than just in the mind--leads to all sorts of scientific problems and, of course, historically to the worst of human problems.

Does it make sense to ask whether members of 'the' European race are taller than those in the African 'race'?  What part of Europe, and what part of Africa, do you mean?  Ethiopia?  Nigeria?  Botswana?  Norway?  Greece?  And does the person have to be living there now, or just have had all his/her ancestry from there?  And what about that 'his/her'?  Do we have to consider only living 'Africans' and 'Europeans', or can we use, say, skeletons from these 'races' from any time in the past (that should be OK, if the trait is really 'genetic', since gene pools change slowly)?  Or can we use Kazakhs or Saamis or Mbutis in our 'race' comparison?  Clearly we have to start refining our statements, and when that is the case even for societally rather neutral traits like stature, how much more careful need we be when we raise topics--as those who like to assert the reality of 'race' can't resist focusing on--with sociocultural or policy relevance (criminality, intelligence, addictability, reckless behavior, genes for ping-pong skill or running speed or being a violinist)?  Why do we need the categories, unless there really is a subterranean desire to focus on such traits to make a political point...or to affect policy?

At the same time, scientists who think carefully and avoid this sort of categorical thinking, who deny the reality of the categories or dismiss them as 'just' social constructs, are denying an even greater reality: that for many people, 'race' is an entirely real category, one they experience on a daily basis.  If in the US you are 'black' or 'white' or 'Hispanic' or 'Asian', you are treated in a group-based way culturally.  If you have any phenotypically discernible African ancestry, for example, you may very well be treated as, and feel yourself to be, 'black', regardless of your ancestry fraction.  You may have certain legal rights if you have at least 1/8 Native American ancestry, and for that and other reasons you may know very well that 'race' does exist as a reality in your life.  This is inherently a sociocultural construct, and hence a reality.  In that very real sense, the existence of 'race' is a scientific fact.

Scientists who acknowledge this but then continue to assert the genomic reality of race, essentially because it is a convenient shorthand and because the bulk of the data come from widely dispersed people, play into the hands of the ugliest aspects of human history; given that history, which they know very well, they do so willingly.  Some even do it with great glee, knowing how it angers 'liberals'.  One can speak of genetic (and cultural) variation as having a geographic-historic origin that is (recent long-distance admixture aside) proportional to distance, and can think about local adaptations, without using categorical race concepts.  Some may argue about what is genomic, what is the result of natural selection, and what is basically cultural.  But there is no need to wallow in categories, and then no need to try to define the 'fuzzy boundaries' between them.

Evolutionary genetic models as conventionally constructed contribute to the problem, because they are built on the frequencies of genetic variants, and a frequency is inherently a sample statistic.  That is, frequencies are computed over a population of inference, specified by the user.  A population is defined as if it had specific boundaries.  Natural selection, too, is modeled as if 'environments' were packaged in population-delimited ways.  For many reasons it would be better to develop less boxed-in evolutionary concepts and analyses, but that's not convenient if it takes time, or means your book or grant can't just be dashed off without confronting serious underlying issues like these, or if the hurried press likes to take whatever you say and make hay with it.
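The dependence of 'frequency' on a user-delimited population is easy to see in the standard Wright-Fisher picture of genetic drift, the subject of this series.  Here is a toy sketch (population size, starting frequency, and generation count are all arbitrary choices): once you declare a bounded population of 2N gene copies, the frequency of a neutral variant wanders by chance alone from one generation to the next:

```python
import random

# Toy Wright-Fisher drift: each generation, the 2N gene copies are
# drawn at random according to the current allele frequency.

def drift(p0, pop_size, generations, seed=1):
    random.seed(seed)          # fixed seed for a repeatable illustration
    p = p0
    trajectory = [p]
    for _ in range(generations):
        copies = sum(random.random() < p for _ in range(2 * pop_size))
        p = copies / (2 * pop_size)
        trajectory.append(p)
    return trajectory

traj = drift(p0=0.5, pop_size=50, generations=100)
print(traj[0], traj[-1])  # starts at 0.5; where it ends is a matter of chance
```

Note that nothing in the model tells you where the population's boundary lies; 'pop_size' is a modeling decision, exactly the kind of user-specified box the text describes.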

The use of 'primary' color categories is arbitrary relative to the actual color spectrum, but at least it is grounded in our retinal genes, which naturally provide a convenient set of otherwise arbitrary physical reference points.  Nobody is disadvantaged by the use of those categories in human affairs.  Human populations, by contrast, do not come in natural categories; the categories are not needed, and they are anything but neutral in human affairs.

Like the light spectrum, there are not, and never have been 'primary' colors of humans.  What is true, however, is that when it comes to that topic, a lot of people cannot see the light.