The human genome doesn't exist
René Magritte: This is not a pipe. It's a painting. But a painting of 'a' pipe--but not 'the' pipe, which doesn't actually exist in the real world. Or does it?
Now what everybody raves on about, the HG, is like that. It is not from one person, not even one copy from one person's two. It's not clear whether this is even from one person or from several donors, or if from one, who that person is in terms of his origins (his, because there's a Y chromosome in the sequence).
In any case, it's not a 'normal' genome, because while we believe the sequenced individual(s) was/were healthy at the time they bled for the cause, they won't be healthy forever. Since we are told every day by the press that every disease without exception must be genetic, and hence we should GWAS it endlessly, the donors will eventually become genetically abnormal.
Worse, 'the' genome keeps changing! We are now in version 19, released in 2009. It either adds more detail from the same donors, or now includes bits that couldn't be sequenced easily from those donors' DNA and so were sequenced from additional donors (we don't know which is the case). There will be future revisions. Not only that, but it is a 'haploid' sequence, a reference with only a single nucleotide per position, whereas any human has two copies (except for the X and Y chromosomes in males). Yet in any one person 1% or more of sites actually vary.
This is not Van Gogh's kitchen chair
Probably, we should use a term like 'instance' rather than copy, in the same sense that each chair is an instance of the concept of chair, even if no Platonic actual ideal chair exists, except as a reference concept in our minds.
What is good for the goose is good for the gander. If the HG doesn't actually exist, neither does a given gene, say 'the' beta-globin gene, which is expressed in red blood cells and of which some variants are involved in anemia (like sickle cell anemia). Here is hg19's version of a very small part of that sequence:
Any one instance of this in an actual person, like you, may or may not have this exact sequence. But unless it's been deleted from your genome, you'll have something very similar, not because it is a copy of some Platonic idea, but because they are descendant copies handed down through the generations since we shared a common ancestor, and that--evolution--is the crucial difference. It's why we should be careful about treating a reference as being real, or how we relate the existing instances of the 'same' (homologous) sequence.
We use 'the' HG sequence as a way of referring to a standardized set of nucleotide locations along the chromosomes in the individuals who were sequenced. Each of us varies from that sequence in millions of places (in each copy that we carry). To find similar functional elements--genes and so on--in my instances of the human genome, we use 'the' HG as a kind of map. If I had your DNA sequence, I could identify which is 'the' beta-globin gene by its similarity to that in the figure, even if you didn't have the exact same sequence.
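The idea of using the reference as a map, finding 'the' gene in your instance by similarity rather than by exact identity, can be sketched in a few lines. This is only a toy illustration under stated assumptions: the sequences are made up (not real beta-globin data), the function names `hamming` and `best_match` are hypothetical, and the search allows substitutions only, whereas real alignment tools also handle insertions and deletions.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(reference_fragment: str, sample: str) -> tuple[int, int]:
    """Return (offset, mismatches) of the sample window closest to the fragment."""
    k = len(reference_fragment)
    if k == 0 or k > len(sample):
        raise ValueError("fragment must be non-empty and no longer than the sample")
    # Score every window of length k; ties go to the leftmost window.
    scores = (
        (hamming(reference_fragment, sample[i:i + k]), i)
        for i in range(len(sample) - k + 1)
    )
    mismatches, offset = min(scores)
    return offset, mismatches


# Made-up toy sequences (assumption, for illustration only).
reference_fragment = "ACGTTGCA"
sample = "TTTTACGTAGCATTTT"  # carries the fragment with one substitution

offset, mismatches = best_match(reference_fragment, sample)
print(offset, mismatches)  # 4 1
```

The sample never contains the reference fragment exactly, yet the closest window is unambiguous, which is the sense in which the reference serves as a map rather than as a template everyone shares.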
This is very useful, but it's important to be aware that not only does nobody have the same sequence, but even the structural details--the locations of the 'same' elements in you and us and the donors of 'the' HG--differ among people. So the coordinates (the number at the top line) will not be the same. For example, the beta-globin gene starts at chromosome 11 position 5,246,696 in draft hg19, but at 5,203,400 in draft hg18, a difference of around 43,000 nucleotides! Where did they come from? Which draft should we believe--if any? What will hg20 say? What is 'the' HG?
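The arithmetic behind that difference is simple to check. The sketch below uses only the two start positions quoted in this post; the names `HBB_START` and `coordinate_shift` are hypothetical, and treating the change as a single offset is an assumption that holds for one locus at best, since sequence added or removed elsewhere on the chromosome shifts other regions by different amounts.

```python
# Start of 'the' beta-globin gene on chromosome 11, as quoted in the post.
HBB_START = {"hg18": 5_203_400, "hg19": 5_246_696}


def coordinate_shift(starts: dict, old: str, new: str) -> int:
    """Return how far a locus moved between two reference builds."""
    return starts[new] - starts[old]


shift = coordinate_shift(HBB_START, "hg18", "hg19")
print(f"{shift:,} nucleotides")  # 43,296 nucleotides
```

The practical point is that a coordinate means nothing without the name of the build it was measured against.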
Type specimens
American Robin
Robin you'll see in the UK (image: RSPB)
This leads to interesting questions about how we treat our subjects, biology and evolution. Are there alternatives to type specimens? So, while you nurse your mint julep in suspense, think about this--the whole concept of a reference specimen, be it genetic or physical--because tomorrow we'll give at least a bit of thought to this question.
You can learn more about how the reference assembly is changing here:
http://genomereference.org
Thanks for this link, which I had not known of. I'll look at it. Meanwhile, I was just talking with people here at Penn State who are involved, and learned something of what's new. This led me to modify the first post, somewhat qualified from the original version posted this morning.
Also, I'm adding a Part IV to further update the concepts.
But I think the conceptual issues are not entirely changed, even with improvements, and part of the purpose of the posts is that some or even many biologists or even human geneticists aren't fully aware of the nature of a reference sequence etc. and what I would call its epistemological meaning, relative to the thorny problem of population vs stereotypical thinking.
We've just looked at this blog and it does provide information on many of the issues discussed in this series. It shows, among other things, that there are issues we don't mention, and that there are a number of people working on how best to represent our genome (and yours!).
There is also a publication (should have included it the first time): http://www.ncbi.nlm.nih.gov/pubmed/21750661
Many people aren't aware of how the genome was constructed, despite this being documented in the literature. We are working towards evolving the reference assembly into more of a pan-genome, but we clearly have to continue fixing errors as well. Additionally, there is a lot of room for additional reference sequences from individuals from different populations. The nature of the reference assembly you need depends largely on the questions you are trying to ask.
Thanks for this reference, another we did not know about....it's impossible to keep up with everything. I think nothing we say is inconsistent with this.
I do believe that many if not most people are unaware of these things, even though they use 'the' genome regularly.
In future episodes of this brief series I think we will address some of these questions, at least in our own way.
Minor nitpick on the intro. Though UV causes suntan and cancer, UV-B also causes the skin to create vitamin D, which is used to keep a lot of cancers at bay. 50-75% reduction for a lot of cancers when looking at the highest serum level cohorts.
Well, maybe; the vit D story is far from clear, at least in relation to sunlight exposure, but the skin cancer story is very clear. Whether Aussies have a net gain or loss of cancer overall, as a result of their infatuation with the sun (which, historically, gave them lots of skin cancer, though maybe they're being more protective these days) isn't clear (to us, but maybe to someone it is).
Anyway, that is a nit in the intro, which was just a glib intro, so we don't mind being picked!
Future project: synthesise and clone the reference human. To be stored under an airtight belljar in Sèvres, next to the reference kilogram.
Well, there is a problem. His/her/their genomes will have experienced mutations during the cloning. Even just en-jarring the original person(s) won't work, as it/they too will have experienced mutations since the DNA was collected.
We deal with some of the complications in later parts of this series.
It seems that having 'the' meter stick, or 'the' cesium clock is an easier task.
Thanks for a nice blog entry. I've worked in genetics and bioinformatics for about a decade and collaborated with many from a variety of backgrounds. Anecdotally, these are issues that a significant fraction of geneticists, biologists and clinicians are relatively unaware of until they are explained to them (i.e., it is not covered in most courses and textbooks).
A few years ago I proposed a modified IUPAC code to try to provide single reference sequences more reflective of populations/organisms:
http://bioinformatics.oxfordjournals.org/content/26/10/1386.short
More recently Euan Ashley and colleagues explored the construction of synthetic reference genomes from different populations. This article provides some nice examples of the benefits of an improved reference or synthetic sequence:
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002280
P.S. - go Lions (proud Alum)!
We deal with other aspects, probably similar in spirit to your thoughts, in the later parts of this series...but also we raise some problems.
At this point, as far as "go Lions" is concerned, it's too bad what happened to the football team, since only one player, named Jerry Sandusky, has ever been implicated in wrongdoing, and of course the coach did the positive things he's been credited with, even if he had his human failings.
But that's what happened to 85 Lions. What about the other 44,915 Lions who are here to play with pen and computer, not footballs? I think we need to focus on academic reform, to take some leadership in addressing national problems in a long creep of weakening standards. Hopefully, it'll happen, but I'm not holding my breath.