Monday, March 25, 2019

Human Genome Diversity: important to recognize, but not a new issue

A couple of decades ago, several of us, led by Luca Cavalli-Sforza, Marc Feldman, Ken Kidd, and others (including yours truly), got together to propose a worldwide sampling of human genetic diversity that would specifically include the diverse 'anthropological' populations (traditional tribal groups who still existed but were being surrounded or incorporated, or worse, by the growing, large agricultural/industrial civilizations).  The idea, called the Human Genome Diversity Project (HGDP), was to collect DNA samples from hundreds of populations worldwide who would otherwise be un- or under-represented in the available data on human genomic variation.  The large agricultural/industrial populations are swamping (if not literally exterminating) these more ethnically aboriginal peoples.  Yet their pattern of genomic diversity is the one from which the dense populations derived, and that variation may tell us about the origins and nature, and perhaps the adaptive fitness, interactions, and so on, of the larger pan-human population into which 'we' grew.

The idea of a global HGDP was stifled by two things.  One was attacks by political opportunists (played up, culpably, by the media) who felt that this global sampling was demeaning to the aboriginal populations, or would be designed imperialistically to profit from those peoples by patenting findings.  The other was the hungry economic maw of the human genome sequencing project, then in progress and preemptive.

The upshot was that the HGDP was never funded.  Luca donated the set of global samples then available to him to the France-based CEPH (Centre d'Etude du Polymorphisme Humain), where they were given the HGDP name.  That is still the case, though I think it wrong: the data set was not a systematically global, design-first-then-sample project, so it rather co-opted the HGDP name.  Nonetheless, and to the good, the DNA along with analytic results from those samples are freely available to qualified researchers.

Another HGDP organizer, Ken Kidd at Yale (along with his wife, Judy, and other collaborators), has produced an excellent, publicly accessible website called ALFRED, which provides allele frequency data from populations around the world, plus documentation of the sampled populations and a variety of other user-friendly features.  Among other things, this is a fine tool for teaching about global human diversity.

Now, a new paper by Sarah Tishkoff and others (Sirugo et al., "The Missing Diversity in Human Genetic Studies", Cell 177, March 21, 2019) makes the case for sampling human genomic diversity, of a sort, pointing out various reasons why it would be good to correct the current bias in genetics towards Europeans by sampling human variation globally.  Obviously, I agree with that, although many technical points could be raised about whether the inevitably smaller samples from scattered small populations could be analyzed as effectively as the very large samples required to identify the risk variants that are being over-peddled to us by the various 'omics and Big Data advocates.

What are the 'populations' and what does 'diversity' properly include?
The value, potential, and humane importance of properly sampling humans beyond the major large populations in Europe and North America are obvious, but the new paper makes the case mainly for the larger 'mainline' populations other than Europeans.  Unfortunately, though these may be numerous in the census sense, they are heterogeneous, and it is unclear exactly whom current data represent, and how.  Can we just blithely say we need to include 'Africans' to address the representativeness problem?  Are, for example, all African-Americans, not to mention all 'Hispanic-Americans', equivalent as possible samples?  The same goes for Asians.  The current paper deals with these issues at least to some extent.  But then what about, say, New Zealand natives, or Cherokees, or other small populations?  And from which castes, and which parts of India, must we collect data?  How exhaustively should we sample, and how can complex genomes effectively be parsed in this way (not to mention environments, a topic at least acknowledged by Sirugo et al.)?

Francis Collins' current 'All of Us' sloganeering is, to me, a culpable misrepresentation to the public: a strategy to pry huge funds out of Congress in open-ended ways, too big to terminate, a welfare project for university research and its various supporting industries and interests.  The implicit, though unjustified, idea seems to be that any sort of open-ended Big Data 'omical project can be fair to small subgroups (indeed, I would argue from various aspects of what we already know that it can't be fair even to the major ethnic groups).  So what does the promise that this is for 'All of Us' actually mean, beyond a transparent strategy to pry open-ended funding from Congress?

Problems with the promise in the first place
Now, while I agree that increasing the sampling of human diversity is important for many reasons, not least fairness, the paper promises that it will increase or improve 'precision' medicine.  To me, that is sloganeering, and it avoids facing up to what Big Data 'omics have already shown us about the causal complexity of the important non-Mendelian traits: complexity not only in the genomic but also in the environmental sense.

There are several obvious, but conveniently ignored, reasons for this.  First, 'genetic' causation involves more than inherited genomic variation.  Important variation arises during life, when cells divide.  This somatic variation is genetic, but it is not sampled by the usual genome-sequencing approach.  Yet somatic variation clearly has important consequences, because a cell doesn't 'know' whether its genome sequences were inherited from the individual's parents or arose during the individual's life.

Second, the whole enterprise assumes that induction can lead to deduction; that is, that what we've observed in the past lets us predict the future.  But it is not just inherited and somatic mutations whose future is literally unpredictable; the same is true of lifestyle exposures.  Yet lifestyle exposures are vital components of complex disease risks, and they cannot be predicted, even in principle.  That means past exposures (to environments or mutations) do not predict future ones.  This is no dark secret, however inconvenient it is for the 'omics prediction industries.  Unlike in many areas of chemistry and physics, in life induction does not lead to deduction.

What we need is a deep rethinking of the problem of genomic effects on disease and other traits.  But that is not easy to arrange when careers and institutions depend on very large, very predictable, basically permanent funding.  To improve these aspects of our science, we need a different way to support it, new economics, not bigger data or more sequencing.  A side benefit of such reform, were it ever possible, would be to shift investigators' minds from surviving to surmising: to new ideas.

Our "I'm first!!" era in science
I do have to note that the tendency to ignore, or be ignorant of, prior work is manifest in this paper, which does not mention the HGDP.  We are in an "I'm first!" era in science.  I think Shakespeare understood the clearer truth: 'What's past is prologue.'

Good ideas need to be followed up, and properly sampling the world is one such good idea.  But this paper doesn't really deal with the small, traditional aboriginal populations.  In the case of the HGDP, there was simply a lack of support for sampling small, relatively isolated populations to build a picture of human genomic diversity from the context in which it actually arose.  But it was an effort that explicitly recognized the issues, as they stood at that time.  So it is inexcusable that the new paper fails to acknowledge that precedent for advocating worldwide population sampling.  The senior author was very familiar with that effort.

A good idea, one that should not seem novel, would be for scientists to read, and cite, their predecessors, who recognized an issue or problem earlier and who inevitably, even if indirectly, helped stimulate subsequent work.  But crediting others doesn't help one's career score-counting, and it takes at least a tad of effort to find out what an idea's ancestors may have thought, not to mention crediting them.  In this case, the senior author had every reason to know this history directly.  Indeed, she did her doctoral and postdoctoral work in places deeply involved in the HGDP!

Anyway, this griping aside, it is at least worth discussing seriously whether and how a global sampling of worldwide populations, beyond the main 'racial' groups, would be a good thing to do.  I think it would.  We are, after all, throwing away countless millions (or is it billions?) on proudly hypothesis-free Big Data 'omical enumerations, projects too big to stop (no matter how largely pointless they are by now).  We now know the basic landscape, and it is not nearly as encouraging as its self-interested press regularly blares.  Its valuable results should stimulate hard new thinking, but as long as business as usual pays and absorbs careers, who knows when that will happen?

Even if reform is difficult because of vested interests that we've allowed to develop, it is proper to acknowledge one's intellectual ancestors.


Sarah Tishkoff said...

Ken, glad to hear that you support the need to include ethnically diverse humans in human genomics research. I am, indeed, well aware of the important contributions of the HGDP towards the study of human genetic diversity. In fact, we had originally cited the study, but the editor made us remove it (along with about 50 other citations) because she said we couldn't have more than 15 (in the end she let us get away with 18). Further, the focus of this commentary was on clinical applications and GWAS, which are the examples we cite. I think discussion of studies of population genomic variation in indigenous populations, and the bioethical issues associated with doing that type of research, would be a great topic for a future review. Regards, Sarah

Ken Weiss said...

I guess we just disagree. No journal can prevent acknowledging the legacy of the past, which in this case was rather clear, explicit and well-known. Recognition need not involve specific citations. There were similar ideas and databases prior to the HGDP proposal. Without acknowledgment, knowledgeable readers may reasonably think the authors were trying to claim more credit than is due.

And there is something scientifically important at stake, too: we all work in the context of, and are framed by, the history of our field (its 'paradigms', if you will). So that context should be acknowledged. Anyway, that's my view.

Giorgio Sirugo said...


I will take full responsibility for having sliced off a lot of what is missing in our commentary, including referencing the HGDP. Any claim of originality that might be inferred from our article is unintentional. That said, below is the link to a good piece written by Ricki Lewis after a conversation with Sarah, in which the seminal importance of the HGDP is clearly stated.

All the best,


Ken Weiss said...

Thanks Giorgio! I think it is important to recognize and credit the past for reasons I've tried to state. We all make mistakes, and I think hardly a paper is published in which someone could find citation omissions. Space is limited, and memories and awareness fallible. Indeed, I think the usual pattern is the opposite, and is intentional: to over-cite, often in a transparent ploy to flatter potential reviewers (when I detect that as a reviewer, it has the opposite effect!).

I guess I reacted in part because I was involved in the work, largely led by Luca Cavalli-Sforza and Ken Kidd, who had been at this for years before that and were giants in the field. But as I tried to say in an earlier message, I think that from the point of view of the history and 'flow' of science, what is past is prologue and shapes what we think today (and perhaps stimulates the innovator to think differently).

Actually, of course, there were many attempts during the 20th century to sample genetic variation as it was then knowable. If you're interested, William Boyd wrote a good book about this, Genetics and the Races of Man, in the 1950s. And Cavalli-Sforza et al.'s massive tome, The History and Geography of Human Genes, amazingly now 25 years old, covered what was known (including the immunological genotypes that, themselves, had been given global treatments).

Of course we all want to be identified for what we do, and hope it's influential and will, itself, be cited! It's our nature and how our careers are (too) often evaluated. So in this case I guess we are seeing eye to eye, and that's good.