The latest news from the GenoSphere is the paper in Nature reporting the 'whole genome' sequences of several Africans (described here, among many other places). We take a particular interest in this paper because some of the lead investigators are friends of ours here at Penn State, where much of the sequencing, and in particular the annotation (analysis), of whole genomes and ancient DNA genomes is being done.
One is Archbishop Desmond Tutu; the others are San ("Bushmen"). The rationale for this choice, besides its manifest value as a publicity stunt, is that Bishop Tutu legitimately represents the culturally Bantu part of southern Africa, while the San represent another language group. This set of sequences thus, at least in a crude way, spans two of the major and most different populations of the African continent. We know very well that these populations differ in their genetic variation (see Sarah Tishkoff's recent paper in Science, 2009; PubMed ID: 19407144). This study fleshes out that picture in much greater detail (to the extent that can be done with so few sampled subjects).
What this study shows, as every other human whole genome sequence has shown, is a multitude of new sequence variants. They are 'new' for three different reasons: some will be sequencing errors, some will be rare variants in the population that simply haven't been seen yet, and some will really be new: mutations that arose between the subjects' parents and the subjects themselves.
No major 'disease' variants were seen. That is hardly surprising. Tutu is elderly (all 5 subjects were around 80 years old) and in good health, so we knew without needing any DNA that he doesn't have any major genetic disease. He has had some diseases in the past, but apparently nothing diagnosable genetically yet. His reaction to the results was reportedly 'immense relief', even though he is 80 and has had cancer and TB (albeit treatable so far). That is a reminder to the rest of us that non-genetic diseases are probably more likely to fell us than anything lurking in our genomes.
The San individuals, too, are adults in good health. In their harsh environment, anyone with a serious genetic disorder would not have survived to be sequenced. Variants for adult milk-drinking ability, lighter skin color, or malarial resistance were not found, but that mainly shows what was expected. Skin color is genetically complex, the San haven't been fighting malaria, and they are not part of the northeast African populations that have adapted to lifelong milk drinking.
Whether major disease or other simple trait variants are found in any given person (especially an older healthy adult) will depend entirely on the major variants circulating in their population, their frequency and strength of effect, and the luck of their genetic draw in sampling them from their respective population. Some of the first people sequenced do carry such variants, but others don't. Interesting, informative, and expected.
The great sequence differences among the San individuals were also to be expected. For historical reasons, the San have accumulated more divergence than other African populations. They are probably a relict population today, having survived being shoved into the Kalahari desert by the expanding, technologically more powerful Bantu speakers, a displacement known to have occurred not too many centuries ago. Before that, they may have lived in small, widely scattered bands across much of sub-Saharan Africa, with little gene flow (marriage exchange) between them. New variation would arise (as it does everywhere) but stay very local. That, plus chance (genetic drift), would allow differences to accumulate between San populations. Why so many variants were found is curious and may or may not be due to sequencing errors (time will tell), but the deep split seems robust.
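To make the drift argument concrete, here is a minimal sketch using the textbook neutral Wright-Fisher model. This is our illustration, not anything from the paper. It assumes two populations that split from a common ancestor with an allele at frequency 0.5, then exchange no migrants and get no new mutations. The function names and the population sizes in the usage note are hypothetical, chosen only for illustration. The point it shows is that small, isolated bands drift apart faster than large ones over the same number of generations, with no need for either lineage to be 'older'.

```python
import random

def wright_fisher(p0: float, pop_size: int, generations: int, rng: random.Random) -> float:
    """Final allele frequency after neutral Wright-Fisher drift.

    Assumes a diploid population of constant size `pop_size` (2N gene copies),
    with no mutation, selection, or migration.
    """
    p = p0
    copies_total = 2 * pop_size
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool.
        copies = sum(1 for _ in range(copies_total) if rng.random() < p)
        p = copies / copies_total
    return p

def mean_divergence(pop_size: int, generations: int, loci: int, seed: int = 0) -> float:
    """Average absolute allele-frequency difference between two isolated populations.

    Both populations start at frequency 0.5 at every locus (a hypothetical shared ancestor)
    and then drift independently.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(loci):
        a = wright_fisher(0.5, pop_size, generations, rng)
        b = wright_fisher(0.5, pop_size, generations, rng)
        total += abs(a - b)
    return total / loci

if __name__ == "__main__":
    # Same number of generations since the split; only the population size differs.
    print("small bands (N=10): ", mean_divergence(10, 50, 60))
    print("large groups (N=250):", mean_divergence(250, 50, 60))
```

Running it, the small-band pair typically ends up far more divergent than the large-population pair, even though both have had exactly the same time since the split. Isolation and small numbers, not age, drive the difference.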
The one rather loose statement being made about these samples, and one that is potentially dangerous, is that the San have the most 'ancient' human DNA lineages. That's patently false. We have all had the same amount of time since our common human ancestors. There probably haven't even been more generations in the San lineage than in other humans (that would depend on the average age at which the San, and others, have borne half their children).
The San are not in any way more 'differently human' than the rest of us. Their lineage is not older; it has simply been more isolated from the rest. That isolation is an interesting subject of study, one long recognized, and one for which we now have good data.
These new data go onto the growing bookshelf of whole human genome sequences.
Genome sequencing now can, or at least should, be moved from the Melodrama Dept to real, routine science. The technology is racing ahead, and soon around $1500 or less will buy you such a sequence, and at 40x coverage at that. That means each part of the genome will have been sequenced independently 40 times on average, greatly increasing the accuracy of the billions of base-pair 'calls' made by the sequencing device.
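Why does '40x on average' buy so much accuracy? A rough sketch uses the standard Poisson (Lander-Waterman) approximation. It assumes that reads start at uniformly random positions, so the depth at any one base is Poisson-distributed with mean equal to the coverage. Real data are lumpier than this idealization. The read length, read count, and the roughly 3.1-billion-base genome size below are illustrative assumptions, not figures from the paper.

```python
import math

def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size

def prob_depth_below(k: int, coverage: float) -> float:
    """P(depth < k) at a given base, assuming depth ~ Poisson(coverage)."""
    return sum(math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k))

if __name__ == "__main__":
    # Hypothetical run: 1.24 billion reads of 100 bases over a ~3.1 Gb genome.
    c = mean_coverage(1_240_000_000, 100, 3_100_000_000)
    print("mean coverage:", c)  # 40.0
    # Chance a base is never read at all: about 0.018 at 4x, about 4e-18 at 40x.
    print("P(depth 0) at 4x: ", prob_depth_below(1, 4.0))
    print("P(depth 0) at 40x:", prob_depth_below(1, c))
    # Chance a base is read fewer than 10 times, too few for a confident call.
    print("P(depth < 10) at 40x:", prob_depth_below(10, c))
```

Under these assumptions, at low coverage a noticeable fraction of bases is seen only once or not at all, so a single read error can become a false 'new variant'. At 40x almost every base is read many times over, and a lone error is outvoted.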
(There are some technical issues with the current paper that we can just mention. It is at a lower coverage level and hence perhaps slightly more error-prone than 40x coverage would be. And not all the sequences reported were 'whole genome'; some were just the protein-coding 'exome' parts. These points are minor relative to the subject of this post.)
We're soon to be whelmed, if not overwhelmed, by such sequences, most of which will go into growing databases where geneticists can analyze the pattern of variation in all sorts of ways. Some disease-related information will surely result, but the hype, hype, hype about how each sequence reveals important disease information will stop. Or at least it should stop. We can get on with our work, without the TV crews and material-hungry journalists.
This paper is interesting and is a first stage in getting better data on variation in Africa. So far it shows what we knew to be the case about Bantus, San, and the admixture between them in southern Africa. The sequences show much variation between the groups, and especially among the San, but we've known that for more than 20 years (PNAS, 1989, vol 86, pages 9350-54, PubMed ID 2594772). That work was done by old friends of ours, Henry Harpending and Linda Vigilant, who at the time, like the current genome sequence authors, were also here at Penn State (and in our own Anthropology Department).
The lack of major new findings is just what would be expected, and it demonstrates that we really have learned things from the past generation of studies of genetic variation. The details of future sequences will be worth waiting for, but when they come they will not be worth shouting about.