Saturday, May 23, 2009

Genetic leaf-litter

There are many ways in which everyone is a conceptual prisoner, encaged in culturally based limits. We are born into, trained in, and entrained by our circumstances, and these in turn are a legacy of history. We can try to escape from this, but probably the most we can hope for is to keep subtle assumptions and constraints at bay. In genetics, there is a pervasive concept of the 'wild type', which goes back into the history of genetic research. It refers to the natural allele at a gene, the one favored by a history of selection, relative to which other alleles (mutational variants) were viewed as generally rare and harmful (and soon to be removed by natural selection).

There is a tacit extension of this gene-specific concept to the whole genome (or even organism) as when 'normal' inbred laboratory mice are referred to as the 'wild type' relative to an experimental modification such as a transgenic gene knockout mouse of the same strain.
Sometimes this is clear shorthand, but beware of conceptual shorthand! An implication of this kind of genetic thinking is that, in regard to human traits and especially disease, there is the normal human genome, as represented by 'the' human genome sequence available in genome databases, and then there are the disease-causing mutants. But in fact genomes are very large sequences of DNA that serve as targets for mutation in every cell, every individual, every generation.

We know that biological traits are the result of developmental processes that include countless genes (of the classical protein-coding type as well as many other functional DNA sequence elements). Species contain large numbers of members: there are about 7 billion of us humans stalking the Earth. What this means is that there is a potentially huge amount of variation at most if not all viable spots in our genome. After a mutation occurs, it may proliferate if its bearer successfully reproduces. Over time, some of these alleles grow in frequency to become quite common.

When genomic DNA is sequenced in a number of individuals, this variation is easily detected. But whether affected by natural selection or just by the chance aspects of reproductive success or failure, most allelic variation that is present in genomes at any given time is rare. Relative to the more common variants, this genetic variation is a kind of leaf-litter of variation. With hundreds of thousands or, indeed, hundreds of millions of very rare variants present in our species, any sample, however small, will pick up some of them by chance.

In a small sample, those will seem to be more common than they are. If we sequenced 5 people (10 copies of the genome), the lucky variants whose true population frequency is only a few in a billion, but which by chance are in the 5 people we sample, will seem to have a frequency of at least 10% (one copy of the 10 we sampled being the variant). The tip-off that this genomic leaf-litter exists is that most of the variants are not seen in other samples or, if common enough to be sampled more than once, are usually seen only in samples from the same geographic region (because that's where they arose as new mutations and were transmitted to descendants who remained living on the same continent). In developed countries, variants that cause disease will show up in specialty clinics at major medical centers.
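The arithmetic behind that 10% figure can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, copies of the genome are assumed to be sampled independently, and the figure of 100 million variants each at a frequency of one in a billion is a hypothetical round number taken from the ranges mentioned above, not a measurement.

```python
import math


def min_observed_frequency(n_people, ploidy=2):
    """Smallest nonzero frequency a variant can show in the sample:
    one copy out of all the genome copies sequenced."""
    return 1 / (n_people * ploidy)


def prob_sampled(true_freq, n_people, ploidy=2):
    """Chance that at least one sampled copy carries a given variant,
    assuming (for illustration) that copies are drawn independently."""
    copies = n_people * ploidy
    # 1 - (1 - p)^copies, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(copies * math.log1p(-true_freq))


def expected_variants_seen(n_variants, true_freq, n_people, ploidy=2):
    """Expected number of distinct rare variants that land in the sample."""
    return n_variants * prob_sampled(true_freq, n_people, ploidy)


# Five people means ten genome copies, so anything we see looks >= 10% common.
print(min_observed_frequency(5))
# Hypothetical: 100 million variants, each at one in a billion.
print(expected_variants_seen(100_000_000, 1e-9, 5))
```

Under these made-up numbers, any single variant has only about a one in a hundred million chance of turning up in the five people, yet the sheer number of such variants means we expect roughly one of them to appear, and it will look a hundred million times more common than it is.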

In trying to find variants by mapping, as in genomewide association studies (GWAS) that compare sequences between cases and controls, we may feel that we have so far detected the common, but not all the rare causal variants that exist. But we may also feel that if we can just enlarge our samples, we'll get a much better handle on the nature of the effects of these variants, or we'll detect the remaining variants that haven't yet been detected.


This is likely to be an illusion, as a growing number of us argue: very large GWAS will not bring a big payoff of the kind envisioned and promised by those who advocate this kind of project. There are several reasons for this skepticism.
First, it is hard to detect rare things with statistical significance, much less to get a good idea of their effects and action. One needs huge samples to get enough instances to show that the variant is meaningfully more common in cases than controls.
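To get a rough feel for how quickly the required sample grows as a variant gets rarer, here is a minimal sketch using the standard normal-approximation sample-size formula for comparing two proportions. Everything specific in it is an assumption for illustration: the function name, the allele frequencies, the 80% power target, and the conventional genome-wide significance threshold of 5e-8. The normal approximation is itself crude for very rare alleles, so treat the output as an order of magnitude at best.

```python
import math
from statistics import NormalDist


def alleles_needed_per_group(p_controls, p_cases, alpha=5e-8, power=0.8):
    """Approximate number of genome copies needed in each group (cases,
    controls) to detect a difference in allele frequency, using the
    two-proportion normal approximation with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    variance = p_controls * (1 - p_controls) + p_cases * (1 - p_cases)
    effect = p_cases - p_controls
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical variants, each 1.5 times as frequent in cases as in controls.
for p in (0.1, 0.01, 0.001, 0.0001):
    print(p, alleles_needed_per_group(p, 1.5 * p))
```

With the ratio between cases and controls held fixed, each tenfold drop in allele frequency raises the required sample roughly tenfold, which is the practical meaning of "huge samples" here.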

But second, the leaf-litter phenomenon means that as sample sizes increase, ever more and ever rarer variants will be picked up. It will be difficult to show clearly that they are causally involved with our trait, but even if they are, they will have less and less effect on public health. They will vary from population to population, and from sample to sample within the same population. Environments may affect whether carriers of the variant manifest the disease, and most such variation will have at most a minor effect on risk of disease (if the effect were stronger, the allele would have been removed by selection, or we would have been able to detect it in family studies).
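One textbook way to see why larger samples keep turning up new rare variants is the simplest neutral model of population genetics: constant population size, no selection, and every mutation hitting a new site. That model is an assumption chosen here for illustration, not a description of real human history. In it, the expected number of variable sites in a sample of n genome copies is theta times the sum 1 + 1/2 + ... + 1/(n-1), and the expected number of sites whose variant appears exactly i times is theta / i. The value of theta below is a hypothetical placeholder, and the function names are invented for this sketch.

```python
def harmonic(n_copies):
    """Sum of 1/i for i = 1 .. n_copies - 1."""
    return sum(1 / i for i in range(1, n_copies))


def expected_variable_sites(theta, n_copies):
    """Expected number of variable sites in the sample under the
    standard neutral, constant-size model (an illustrative assumption)."""
    return theta * harmonic(n_copies)


def expected_sites_with_count(theta, count):
    """Expected number of sites whose new variant is seen exactly
    `count` times in the sample, under the same model."""
    return theta / count


theta = 1000.0  # hypothetical scaled mutation rate for some stretch of DNA
for n_copies in (10, 100, 1000, 10000):
    total = expected_variable_sites(theta, n_copies)
    singletons = expected_sites_with_count(theta, 1)
    print(n_copies, round(total), round(singletons / total, 3))
```

The total keeps climbing as the sample grows, and every added variant is necessarily one that was too rare to be caught in the smaller sample. Real human populations have grown rapidly, which pushes the balance even further toward rare variants than this simple model does.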

And if it requires more than one such variant, or even many of them, to combine to produce disease, the detection and evaluation situation will be that much more challenging, if not pointless.
There will always be exceptions, as is true about the nature of life. But the leaf-litter phenomenon is real and there is plenty of evidence for it. It is predicted by population genetics theory. And it is consistent with results of mapping studies that have been done to date. Ironically, perhaps, while the individual rare alleles have little detectable effect, their aggregate effects in the population may account for the observed heritability (familial aggregation of risk, or similarity of trait values) of most traits, including disease. That heritability, which is clearly there, is what has been considered mysterious given the failure of linkage studies or GWAS to find the genes that are responsible.

We are presented with a kind of epistemological paradox: the genetic variation exists, but we may face insurmountable challenges in finding most of it. Indeed, it is somewhat mystical even to argue that it exists as individual effects, if they cannot be found or replicated by current statistical genetic methods.
Evolution 'cares' about reproductive success, not about simplicity in genetic causation. From a population perspective, evolution occurs because mutations occur, generating variation that selection and chance can affect from one generation to the next.

Genetic leaf-litter is thus the fuel for evolution. We may care to know the cause of each instance of a trait or disease, but Nature has only cared about viability and success, and tolerates the leaf-litter. As massive amounts of human DNA sequence are produced, we will see this. It will be an incredible playground for population and evolutionary geneticists. But what we do with it, in terms of identifying disease causation, is not clear.

1 comment:

Regis said...

At any rate, I liked some of the vadlo mouse cartoons!