Thursday, May 28, 2009

'The' mouse at last!

A new paper in PLoS Biology reports the release of a better and more thoroughly documented genomic DNA sequence for a major strain of laboratory mice, the C57BL/6J strain. This strain is the one used in countless experimental settings and into which many transgenic modifications have been introduced. C57's are an old, venerable experimental model for mammalian development and also for biomedical research. So, the publication of the new results is important and worthy.

It will show us many things about the evolution of this type of mouse. However, conclusions about 'the' mouse have to be taken with some caution. First, like most laboratory strains, C57's are inbred. That means that from a starting cross between two 'parental' strains (derived from 19th century mouse-fanciers' breeds), brother-sister mating over many generations has led to mice that are not only essentially identical to each other, but also have identical sequences on both copies of their genome (the copies they inherited from their father and mother, respectively). In the production of inbred mice, animals whose genotypes are harmful (because inbreeding exposes recessive variants) don't reproduce. So what we have today are combinations of the alleles (sequence variants) at all genes that happened to have been present in the original parental animals and that can also form a viable, compatible combination when there is no variation. They must be compatible with embryonic development, with daily life after birth, and with successful mating.
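To give a rough sense of how quickly brother-sister mating removes variation, here is a minimal sketch using the standard textbook recurrence for expected heterozygosity under full-sib mating, H(t) = H(t-1)/2 + H(t-2)/4. The function name and the convention that generation 0 is the founders and generation 1 their first offspring are assumptions made for illustration. The recurrence also ignores selection and new mutation, so it is an idealization, not a description of how C57's were actually produced.

```python
def sib_mating_heterozygosity(generations, h0=1.0):
    """Expected heterozygosity, relative to the founders, under full-sib mating.

    Hypothetical helper for illustration only. Index t of the returned list
    is generation t; generation 0 is the unrelated founders and generation 1
    their offspring, so inbreeding first shows up in generation 2.
    """
    h = [h0, h0]
    for _ in range(2, generations + 1):
        # Textbook recurrence: H(t) = H(t-1)/2 + H(t-2)/4
        h.append(0.5 * h[-1] + 0.25 * h[-2])
    return h[: generations + 1]


if __name__ == "__main__":
    for t, value in enumerate(sib_mating_heterozygosity(20)):
        print(f"generation {t:2d}: {value:.4f}")
```

Under these assumptions the expected heterozygosity falls by roughly a fifth per generation, which is why a couple of dozen generations of sib mating are enough to make a strain nearly homozygous throughout the genome.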

These mice are in this sense not 'real' mice. Similarly, they're not real humans either! The new paper estimates around 1000 genes not shared with humans, and roughly 25% sequence difference between 'the' mouse (C57's) and 'the' human genome (that is, the reference sequence, itself a composite of DNA from various human donors).

For these reasons, we have to be properly circumspect about the relevance of the new, higher quality C57 sequence to both mouse evolution and human disease-related research. By the same token, the now-typical hyping of the news release to the media needs to be judged as excessive and self-serving for the scientists involved (and the journals, news media, etc.). The new data need to be kept in perspective.

That said, the better we understand the mice that we work with on a daily basis, and whose experimental results appear in thousands of papers, the better our basic understanding of mammal biology and evolution will be.

What is new is more thorough and reliable DNA sequence data, more complete coverage of the genome than we had before (some parts are difficult to sequence for chemical and other reasons), and documentation of recent ad hoc events such as the duplication or loss of chunks of DNA that happened on the way from our common ancestor with mice to the lab mice we have today. This kind of copy number variation (CNV) was not known until a few years ago, but could explain some trait differences (including disease). Also, some kinds of RNA that are copied from the genome have potentially important functions yet to be understood, and finding these in the mouse--and showing which are also conserved in humans or other species--is a step towards understanding the wealth of DNA function that does not code for protein but does something else instead.

But one mouse is not the same as mice! There will, for example, be at least some variation among C57's even in any given lab, and between labs, depending on when they obtained their mice from a supplier. This is because mutations accumulate over the generations. In real mammals, CNVs are often polymorphic: each of us may vary in the CNVs we have between our two copies of the genome, and there is variation among people within populations and among populations. The same is true, of course, at every functional DNA unit: there is lots of variation.

Using an animal model like C57's essentially sweeps these issues under the rug. We hope what we learn is robust enough that we are not being misled. But when the same transgenic experiment is done on different mouse strains, around 30% of the time the results are quite different. Similarly, different humans with the same known disease mutation can have very different traits (e.g., Craig Venter and Jim Watson, whose sequences have been published, carry several 'disease' variants yet do not have the corresponding diseases).

We need to keep in mind that a collection of mice from the same inbred strain, like those we have in cages in our lab, is like copies of a snapshot of one (artificial) mouse rather than a natural population. But if we do keep that in mind, at least it is best to have a well-focused snapshot at high pixel resolution! That's what the new data help us with.

So this is a good bit of new data that will make life more interesting, and results more reliable, for all sorts of scientists, not just those working on disease. Whether the public, at whom the exaggerated publicity releases were aimed, will actually reap the suggested benefits is a separate, though important, question.


  1. I've only just begun to read the paper, but interestingly, they are reporting on Build 36 which has been publicly available since February of 2006. A newer version, Build 37, has been the standard since July 2007.

  2. Yes, Kazz in our lab pointed that out. In a way it further undermines the hyperbole given to the story. On the other hand, as I understand it, the new PLoS paper provides various kinds of analysis of the build. So, while it's not new raw data, it's helpful. Now, of course, someone will have to check whether Build 37 is inconsistent with the Build 36 analysis in any substantial way.