The Mermaid's Tale: Ooops! The human genome does not exist! Part I. The notion of a type specimen

It's sultry summer, and of course the science hype machine motors on unabated, but we want to ignore it briefly and relax at least a little bit before Penn State resumes its serious academic activities--that is, football season. Still, while you're basking in the sun getting your tan (and skin cancer), along with your mint juleps (and undoubtedly some other ugly diseases), here's something to think about:

The human genome doesn't exist

René Magritte: This is not a pipe. It's a painting. But a painting of 'a' pipe--but not 'the' pipe, which doesn't actually exist in the real world. Or does it?

Despite many claims that the Human Genome Project sequenced the human genome and thus set in motion the most exciting era of fundamental new scientific discovery since Galileo, it has turned out that the HG doesn't exist, after all. It no more exists than does 'the' chair or 'the' dog of Plato, who said that the dogs and chairs we have are only imperfect instances of the real, true dog and chair.

Now what everybody raves on about, the HG, is like that. It is not from one person, not even one copy from one person's two. It's not clear whether this is even from one person or from several donors, or if from one, who that person is in terms of his origins (his, because there's a Y chromosome in the sequence).

In any case, it's not a 'normal' genome, because while we believe the sequenced individual(s) was/were healthy at the time they bled for the cause, they won't be healthy forever. Since we are told every day by the press that every disease without exception must be genetic, and hence we should GWAS it endlessly, the donors will eventually become genetically abnormal.

Worse, 'the' genome keeps changing! We are now in version 19, released in 2009. It either adds more detail from the same donors, or now includes bits that couldn't be sequenced easily from those donors' DNA and so were sequenced from additional donors (we don't know which is the case). There will be future revisions. Not only that, but it is a 'haploid' sequence, with only a single nucleotide per position in the reference, whereas any human has two copies (except for X and Y chromosomes). Yet in any one person 1% or more of sites actually vary.

This is not Van Gogh's kitchen chair

Now, this clearly means that nobody actually has 'the' currently posted human genome sequence any more than your kitchen chair is 'the' chair. Of course, biologists who are more savvy than most people who talk about it know that the HG is a reference sequence. Nobody has, and nobody has ever had, this incomplete reference HG. All of us have some variants of that sequence. It is wrong, but standard, to refer to what we each carry as 'copies' of the HG. They are not copies! There is no copy of the HG! No one person ever had that sequence, so no person could copy (replicate) it to transmit to its offspring.

Probably, we should use a term like 'instance' rather than copy, in the same sense that each chair is an instance of the concept of chair, even if no Platonic actual ideal chair exists, except as a reference concept in our minds.

What is good for the goose is good for the gander. If the HG doesn't actually exist, neither does a given gene, say 'the' beta-globin gene, which is expressed in red blood cells and of which some variants are involved in anemia (like sickle cell anemia). Here is hg19's version of a very small part of that sequence:

Any one instance of this in an actual person, like you, may or may not have this exact sequence. But unless it's been deleted from your genome, you'll have something very similar, not because it is a copy of some Platonic idea, but because your instance and the reference are both descendant copies handed down through the generations since we shared a common ancestor, and that--evolution--is the crucial difference. It's why we should be careful about treating a reference as being real, and about how we relate the existing instances of the 'same' (homologous) sequence.

We use 'the' HG sequence as a way of referring to a standardized set of nucleotide locations along the chromosomes in the individuals who were sequenced. Each of us varies from that sequence in millions of places (in each copy that we carry). To find similar functional elements--genes and so on--in my instances of the human genome, we use 'the' HG as a kind of map. If I had your DNA sequence, I could identify which is 'the' beta-globin gene by its similarity to that in the figure, even if you didn't have the exact same sequence.
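To make the 'map' idea concrete, here is a minimal sketch of finding a reference element in someone's sequence by similarity rather than by exact match. The sequences are toy strings made up for illustration (they are not real beta-globin data), and the function names are hypothetical. The sketch only counts substitutions; real alignment tools also handle insertions and deletions, which this does not attempt.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(reference_fragment: str, personal_sequence: str) -> tuple[int, int]:
    """Return (offset, mismatches) of the window most similar to the fragment."""
    k = len(reference_fragment)
    if k == 0 or k > len(personal_sequence):
        raise ValueError("fragment must be non-empty and no longer than the sequence")
    # Score every window of length k; ties go to the leftmost window.
    scores = [
        (hamming(reference_fragment, personal_sequence[i:i + k]), i)
        for i in range(len(personal_sequence) - k + 1)
    ]
    mismatches, offset = min(scores)
    return offset, mismatches


# Toy, made-up sequences (assumption: NOT real genomic data).
REFERENCE_FRAGMENT = "ACGTTGCAAGGC"
PERSONAL_SEQUENCE = "TTTTTACGTTGCTAGGCTTTTT"

OFFSET, MISMATCHES = best_match(REFERENCE_FRAGMENT, PERSONAL_SEQUENCE)
print(f"best window starts at offset {OFFSET} with {MISMATCHES} mismatch(es)")
```

The personal 'instance' is located even though it differs from the reference fragment at one site, which is the sense in which the reference serves as a map rather than as something anyone carries.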

This is very useful, but it's important to be aware that not only does nobody have the same sequence, but even the structural details--the locations of the 'same' elements in you and us and the donors of 'the' HG--differ among people. So the coordinates (the number at the top line) will not be the same. For example, the beta-globin gene starts at chromosome 11 position 5,246,696 in draft hg19, but at 5,203,400 in draft hg18, a difference of around 43,000 nucleotides! Where did they come from? Which draft should we believe--if any? What will hg20 say? What is 'the' HG?
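The size of that coordinate shift is easy to check. The short sketch below uses only the two start positions quoted above; the dictionary and function names are ours, for illustration.

```python
# Start coordinates of the beta-globin gene on chromosome 11, as quoted in this post.
BETA_GLOBIN_START = {"hg18": 5_203_400, "hg19": 5_246_696}


def coordinate_shift(starts: dict[str, int], old: str, new: str) -> int:
    """Return how far a feature's start moved between two reference builds."""
    return starts[new] - starts[old]


SHIFT = coordinate_shift(BETA_GLOBIN_START, "hg18", "hg19")
print(f"hg18 -> hg19 shift at beta-globin: {SHIFT:,} nucleotides")
```

Note that this offset is local to this one position. Sequence is added, removed, or rearranged at different places between builds, so no single number converts coordinates genome-wide; in practice people rely on dedicated conversion tools (such as UCSC's liftOver) rather than simple subtraction.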

Type specimens

American Robin

Think about this in another context. For centuries we have had the concept of a type specimen. There is 'the' robin, 'the' monarch butterfly (or even, perhaps, 'the' Loch Ness monster??). 'The' Neanderthal hominid is an arbitrary individual whose luck it was to die and not rot, so we could discover him around 30,000 years later in Germany. The lucky old sod gets to represent all of his contemporaries and ancestors and descendants for hundreds of thousands of years. And 'the' Neanderthal DNA sequence is partial, a composite of several individuals who lived hundreds of kilometers and thousands of years apart. Think about that when you hear people pronouncing on how 'the' Neanderthal lived, or what color hair 'it' had, or whether it had religion!

Robin you'll see in the UK (image: RSPB)

The international organization of systematists--biologists who are concerned with cataloging, characterizing, and naming species and their relationships--has decided what counts as a reference specimen of an organism, in the same way we decide what counts as a reference specimen of the genome of a species. Indeed, and interestingly, the genome of 'the' mouse or 'the' horse is likely rarely if ever to be from the same official physical type specimen on display at some authorized museum. Thus some poor (former) robin sits rigidly on some display branch for all to see, representing all robin-hood individuals it never saw or dreamt of.

This leads to interesting questions about how we treat our subjects, biology and evolution. Are there alternatives to type specimens? So, while you nurse your mint julep in suspense, think about this--the whole concept of a reference specimen, be it genetic or physical, because tomorrow we'll give at least a bit of thought to this question.

11 comments:

Deanna, August 1, 2012 at 12:36 PM
You can learn more about how the reference assembly is changing here:
http://genomereference.org

Henk Poley, August 1, 2012 at 1:46 PM
Minor nitpick on the intro. Though UV causes suntan and cancer, UV-B also causes the skin to create vitamin D, which is used to keep a lot of cancers at bay. 50-75% reduction for a lot of cancers when looking at the highest serum level cohorts.

Anonymous, August 4, 2012 at 3:46 PM
Future project: synthesise and clone the reference human. To be stored under an airtight belljar in Sèvres, next to the reference kilogram.

AD Johnson, August 6, 2012 at 9:42 AM
Thanks for a nice blog entry. I've worked in genetics and bioinformatics for about a decade and collaborated with many from a variety of backgrounds. Anecdotally, these are issues that a significant fraction of geneticists, biologists and clinicians are relatively unaware of until they are explained to them (i.e., it is not covered in most courses and textbooks).

A few years ago I proposed a modified IUPAC code to try to provide single reference sequences more reflective of populations/organisms:

http://bioinformatics.oxfordjournals.org/content/26/10/1386.short

More recently Euan Ashley and colleagues explored the construction of synthetic reference genomes from different populations. This article provides some nice examples of the benefits of an improved reference or synthetic sequence:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002280

P.S. - go Lions (proud Alum)!

Ken Weiss, August 6, 2012 at 9:51 AM
We deal with other aspects, probably similar in spirit to your thoughts, in the later parts of this series...but also we raise some problems.

At this point, as far as "go Lions" is concerned, it's too bad what happened to the football team, since only one player, named Jerry Sandusky, has ever been implicated in wrongdoing, and of course the coach did the positive things he's been credited with, even if he had his human failings.

But that's what happened to 85 Lions. What about the other 44,915 Lions who are here to play with pen and computer, not footballs? I think we need to focus on academic reform, to take some leadership in addressing national problems in a long creep of weakening standards. Hopefully, it'll happen, but I'm not holding my breath.

Wednesday, August 1, 2012
