Sparing you the gory details, we'll just say that the chromosomal intervals to which one or more of the traits mapped span 30% of the genome and contain 10% of all coding genes. That's 2400 genes or so that could potentially be of interest in affecting head shape in just these particular mice, with the restricted genetic variation they have (because they are descendants of only two inbred parental mouse strains). That's a lot of genes to wade through to figure out which are most likely to be involved in the traits we're looking at.
One way to prioritize candidate genes from such a study is to look for the genes in every interval that you know from prior work to be involved in your trait of interest. Or to identify genes in families that include genes involved in your trait of interest -- these would then be considered guilty by association.
But this means most genes don't have a fighting chance of being considered, because you don't happen to know anything about them, or because nobody knows anything about them, or because what's known about them only partially represents what they do.
To try to minimize this, many people automate the search, with programs that cull the genes that the literature indicates might be of interest, or that seem to be expressed where you want them to be. So, this might solve the problem of no one knowing everything about all genes, but it doesn't solve the problem of nothing being known about so many genes, or that there's only partial knowledge. And it doesn't solve the problem of having to tell the program what to look for, which means you're constraining it in the same way you would if you were doing the search by hand, looking for specific families of genes. Nor, of course, does it solve the problem of what's happening in all the non-coding DNA that flanks all those genes.
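To make the constraint concrete, here is a minimal sketch of that kind of automated filter. It is not any particular program, and the gene names, annotations and search terms are all hypothetical. It simply keeps a gene if any of its known annotations matches a term you supply.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    symbol: str
    # Terms attached to the gene from prior work; empty if nothing is known.
    annotations: set = field(default_factory=set)


def prioritize(genes, search_terms):
    """Split genes into those matching a search term and those that don't."""
    terms = {t.lower() for t in search_terms}
    hits, misses = [], []
    for gene in genes:
        known = {a.lower() for a in gene.annotations}
        if known & terms:
            hits.append(gene.symbol)
        else:
            misses.append(gene.symbol)
    return hits, misses


# Hypothetical genes in one interval (names and annotations are made up).
genes = [
    Gene("GeneA", {"craniofacial development"}),
    Gene("GeneB", {"odorant detection"}),  # known only for one role
    Gene("GeneC"),                         # nothing known at all
]

hits, misses = prioritize(genes, ["craniofacial development", "skeletal development"])
print(hits)    # ['GeneA']
print(misses)  # ['GeneB', 'GeneC']
```

The gene with no annotations and the gene annotated for only one role both land in the discard pile, whatever they actually do. The program can only find what the search terms and the existing literature allow it to find.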
Thus, we decided that the least biased way to comb the data was to go through all the genes in all the intervals by hand. We're still making sense of all that, not least because we are hoping not to be constrained by the usual ideas about statistical significance, but we've learned some interesting things along the way.
For example, one of the intervals of interest is loaded with olfactory receptor (OR) genes. Olfactory receptors reside on the cell surface of olfactory receptor neurons, and are involved in odorant detection. ORs form the largest family of genes in many genomes -- about 1000 different genes -- and they cluster in sets of genes in various locations on a number of chromosomes. ORs have a distinctive expression pattern, with only one expressed per neuron in the tissue lining the nose, where they each are sensitive to particular aspects of molecules the animal inhales, and hopes to smell. How expression of the remaining 999 genes in each cell is blocked is still not known.
ORs are an interesting example of something we've blogged about before, but that continually surprises us. One of the ways we're evaluating the possible role of all these genes in development of the traits we're looking at is to look at where they are expressed in the developing embryo. We initially thought this would help narrow the search, but it turns out that 95% or more of genes (for which there are expression data) are expressed in the head (80% in the brain alone), so expression isn't all that helpful for that purpose. But it does mean we've looked at images of gene expression for around 2000 genes.
[Image: Olfr66 expression, GenePaint, E14.5]
What's it doing there? These are olfactory receptors! You don't smell with your backbone! In fact, a lot of ORs are known to be expressed outside the olfactory region, particularly in the testes, but also in the spleen, the thyroid, salivary glands, the uterus, the skin, and other tissues. A 2006 paper is of interest in this regard, not only because it documents non-olfactory expression, but because of its title -- "Widespread ectopic expression of olfactory receptor genes". Ectopic expression, meaning expression where it's not supposed to be.
But it's only not supposed to be expressed in the axial skeleton because that's not where its name says it will be, not because Nature says so! People named these genes! And, there is some discussion in the paper about how ORs might be involved in chemotaxis of sperm as they try to reach and penetrate the egg -- how they direct their movement, based on chemicals in their environment. Which is equivalent to assuming they are essentially carrying out their olfactory function in the testes, where a different form of molecular reaction than odorant-detection is going on. But, what about in cartilage, in the image above? It's hard to imagine chemotaxis has anything to do with OR function here.
Well then, maybe it's an experimental artifact -- maybe the experiment picked up expression of a gene sort of like Olfr66, but not quite, along with Olfr66? Maybe. But then we'd have to explain away all the expression studies showing non-olfactory expression of many ORs, and it's rather unlikely that it's all due to experimental artifact. This is how our own assumptions constrain what we know or even want to know about the function of so many genes. Maybe Olfr66 has a function we don't yet understand. As do other ORs. And, by extension, so many other genes.
But calling unexpected expression 'ectopic', or naming genes based on only a single role, or on their involvement in disease, when they have other perfectly normal functions, are ways of building in assumptions that, once accepted, can keep us from recognizing that there's a lot we don't yet understand about genes.