Thursday, February 18, 2010

African diversity

The latest news from the GenoSphere is the paper in Nature reporting the 'whole genome' sequence of several Africans (described here, among many other places). We take a particular interest in this paper because some of the lead investigators are friends of ours here at Penn State, where much of the sequencing and, in particular, annotation (analysis) work on whole genome and ancient DNA genome analysis is being done.

One of the subjects is Archbishop Desmond Tutu; the others are San ("Bushmen"). The rationale for this choice, besides its manifest value as a publicity stunt, is that Archbishop Tutu legitimately represents the culturally Bantu part of southern Africa, while the San represent another language group. This set of sequences thus spans, at least in a crude way, the two major and most different populations of the African continent. We know very well that these populations differ in their genetic variation (see Sarah Tishkoff's recent paper in Science, 2009, PubMed ID: 19407144). So this study fleshes the picture out in much greater detail (to the extent that can be done with many fewer sampled subjects).

What this study shows is, as every other human whole genome sequence has shown, a multitude of new sequence variants--new in 3 ways: some will be sequencing errors, some will be rare variants in the population that simply haven't yet been seen, and some will really be new: unique mutations between the subjects' parents and themselves.

No major 'disease' variants were seen. Of course, in Tutu's case we already knew he had no major genetic disease, without needing DNA, from the fact that he's elderly (all 5 subjects were around 80 years old) and in good health. He's had some diseases in the past, but apparently nothing diagnosable genetically yet. His reaction to this was reportedly 'immense relief', in spite of the fact that he's 80 and has had cancer and TB, albeit treatable so far. That is a reminder to the rest of us that non-genetic diseases are probably more likely to fell us than anything lurking in our genomes.

The San individuals, too, are adults in good health. In their harsh environment, if you have a real genetic disorder you're not around to be sequenced. Variants for adult milk drinking ability, lighter skin color, or malarial resistance were not found, but that shows mainly what's expected. Skin color is genetically complex, they haven't been fighting malaria, and they're not in the northeast African populations that have adapted to life-long milk drinking.

Whether major disease or other simple trait variants are found in any given person (especially an older healthy adult) will depend entirely on the major variants circulating in their population, their frequency and strength of effect, and the luck of their genetic draw in sampling them from their respective population. Some of the first people sequenced do carry such variants, but others don't. Interesting, informative, and expected.

The great sequence differences among the San individuals were also to be expected. For historical reasons, they have accumulated more divergence than other African populations. Probably, they are today a relict population that survived being shoved into the Kalahari desert by the expansion of the more technically powerful Bantu speakers, which is known to have occurred not too many centuries past. Prior to that, they may have been living in small, widely scattered bands across much of sub-Saharan Africa, with little gene flow (marriage exchange) between them. Thus, new variation would arise (as it does everywhere) but stay very local. That, plus chance (genetic drift), would allow differences to accumulate between San populations. Why so many variants were found is curious and may or may not be due to sequencing errors (time will tell), but the deep split seems robust.
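For readers who like to see the drift argument in numbers, here is a minimal toy sketch. It is not an analysis from the paper: it is a textbook-style Wright-Fisher simulation of two small bands, and the function names, band size, number of generations and migration rates are all our own assumptions, chosen only for illustration. It also tracks a single pre-existing variant rather than new mutations. The point is simply that, with no marriage exchange, the frequency of the same variant wanders apart in the two bands, while even modest gene flow keeps the bands similar.

```python
import random


def drift_divergence(copies, generations, migration, rng):
    """Return |p1 - p2| for one variant after drift in two bands.

    copies     -- gene copies per band (hypothetical band size)
    migration  -- fraction of each band's gene pool drawn from the other band
    """
    p1 = p2 = 0.5  # both bands start with the variant at 50%
    for _ in range(generations):
        # Gene flow: each band's gene pool is a mix of both bands.
        m1 = (1 - migration) * p1 + migration * p2
        m2 = (1 - migration) * p2 + migration * p1
        # Drift: the next generation is a random sample of gene copies.
        p1 = sum(rng.random() < m1 for _ in range(copies)) / copies
        p2 = sum(rng.random() < m2 for _ in range(copies)) / copies
    return abs(p1 - p2)


def mean_divergence(migration, copies=100, generations=100,
                    replicates=100, seed=1):
    """Average divergence over many independent variants (replicates)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = sum(drift_divergence(copies, generations, migration, rng)
                for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    print("isolated bands :", mean_divergence(migration=0.0))
    print("with gene flow :", mean_divergence(migration=0.1))
```

Under these assumed settings the isolated bands end up far more different from each other than the bands that exchange mates, which is the qualitative pattern described above. Real populations are of course far messier than this.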

The one rather loose statement being made about these samples, and one that is potentially dangerous, is that the San have the most 'ancient' human DNA lineages. That's patently false. We have all had the same time since our common human ancestors. There have likely not even been more generations in the San than in other humans (this would depend on the average age at which San, and others, have borne half their children).

The San are not in any way more differently human than the rest of us; their lineage is not older but has simply been more isolated from the rest. It does represent an interesting subject of study, and one long known, for which we now have good data.

These new data go into the bookshelf of whole human genome sequences.

Genome sequencing now can, or at least should, be moved from the Melodrama Dept to real, routine science. The technology is racing ahead, and soon around $1500 or less will buy you such a sequence, indeed one at 40x coverage. That means each part of the genome will have been sequenced independently 40 times (on average), greatly increasing the accuracy of the billions of basepair 'calls' from the sequencing device.
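To make the coverage idea concrete, here is a rough back-of-the-envelope sketch. None of it comes from the paper: the read count, read length, genome size and 1% per-read error rate are round numbers we assume for illustration, and the error model (independent reads, a simple majority vote, all wrong reads agreeing with each other) is deliberately crude. Real variant callers are far more sophisticated, and a diploid genome complicates things further.

```python
import math


def mean_coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_length


def prob_depth_at_most(k, mean_cov):
    """Chance a site is covered by k or fewer reads, assuming reads
    land at random so depth is roughly Poisson with mean mean_cov."""
    return sum(math.exp(-mean_cov) * mean_cov ** i / math.factorial(i)
               for i in range(k + 1))


def prob_majority_wrong(depth, per_read_error):
    """Chance that more than half the reads at a site are wrong,
    assuming independent errors (a toy model of a bad base 'call')."""
    return sum(math.comb(depth, k)
               * per_read_error ** k
               * (1 - per_read_error) ** (depth - k)
               for k in range(depth // 2 + 1, depth + 1))


if __name__ == "__main__":
    # Assumed round numbers: 1.2 billion reads of 100 bases each
    # over a 3-billion-base genome.
    cov = mean_coverage(1_200_000_000, 100, 3_000_000_000)
    print("mean coverage:", cov)
    print("P(site covered 10 times or fewer):", prob_depth_at_most(10, cov))
    for depth in (10, 40):
        print("depth", depth, "P(majority wrong):",
              prob_majority_wrong(depth, 0.01))
```

Under these assumptions, deeper coverage drives the chance of a wrong majority call down very sharply, and very few sites are left thinly covered, which is why coverage matters so much for accuracy.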

(There are some technical issues with the current paper that we can just mention. It is at a lower coverage level and hence perhaps slightly more error prone than 40x coverage would be. And not all the sequences reported were 'whole genome'; some were just the protein coding 'exome' parts. These are minor points relative to this post.)

We're soon to be whelmed, if not overwhelmed, by such sequences, most of which will go into growing databases where geneticists can analyze the pattern of variation in all sorts of ways. Some disease-related information will surely result, but the hype, hype, hype about how each sequence reveals important disease information will stop. Or it should stop, at least. We can get on with our work, without the TV crews and material-hungry journalists.

This paper is interesting and is a first stage in getting better data on variation in Africa. So far it shows what we knew to be the case about Bantus, San, and about admixture between them in southern Africa. The sequence shows much variation between the groups, and especially among the San, but we've known that for more than 20 years (PNAS, 1989, vol 86, pages 9350-54, PubMed ID 2594772). That work was done by old friends of ours, Henry Harpending and Linda Vigilant, who at the time, like the current genome sequence authors, were also here at Penn State (and in our own Anthropology Department).

The lack of major new findings is what would be expected, but demonstrates that we really are learning things from the past generation of studies of genetic variation. The details of future sequences will be worth waiting for, but when they come they will not be worth shouting about.


Ken Weiss said...

And here's a comment on our post. The stories all over the web and media proclaim that investigators have 'decoded' the genomes of several people. This sounds so sexy! But it's very misleading. Scientists have sequenced, not 'decoded', these genomes. To decode would mean to work out the functions of the genomes, but they have not 'decoded' a genome (nor can anyone at present).

We know of many functions of the genome, and a sequence shows the genes that we know undertake them. The decoding was done in labs all over the world. Inferring functional variants might be called 'decoding', but even then it's generally a stretch, since predictance (predicting traits from genotypes) is very poor as a rule (and not done in this case).

Too bad we can't have media that report what are actually good and useful results in a responsible way. Science should not be a circus.

And too bad we scientists have such incentives to play up to (or is it down to?) sensationalism in the media.

Anonymous said...

Ken, you are obviously jealous that Stephan and Webb have published so many articles in Nature these past few years.

Ken Weiss said...

Sorry, but that's not so. They're friends, and they do very good work and they get recognized for it. They did not sensationalize the story. We talked about it beforehand, in just that regard. It was the journal that did that, as far as I know.