Friday, August 20, 2010

Genes playing possum!

A story by Gina Kolata in the NY Times reports on a muscular disorder that is due to what was considered a 'dead' gene, or 'junk' DNA.  A dead gene, usually called a pseudogene, is a DNA sequence derived from an incomplete copy of a functionally active gene, or a gene that was once actively used but whose transcription regulatory sequence mutated away.  Or a gene could suffer a mutation in its coding sequence that makes the resulting protein not work.

The smugness with which DNA sequence in between regular protein-coding genes was called 'junk' DNA is rapidly fading.  Much of our DNA has no known function, but even there we have evidence that there may be function--for example, some regions of such DNA have sequence that is conserved (basically the same) among species that haven't shared a common ancestor for many millions of years or more.  If that sequence had no function maintained by natural selection, why hasn't mutation simply erased its similarity among species?

To find that a pseudogene region that was known to be transcribed into RNA can actually interfere with normal processes is interesting and biomedically important.  It's worthy of a story in the Times, and Kolata is a worthy person to write it (see, we don't just criticize the popular science media!).

We can't make generalizations from this about 'junk' DNA, however.  The degree to which a bit of non-coding DNA has some function, of some sort, is difficult to know simply because proving 'no' function is virtually impossible.  That's why evolutionary conservation is a persuasive indicator that something's still usefully active.

Francis Collins is quoted as saying that this is interesting and complex in its mechanism, which is not yet understood.  He rightly says that in genetics, whatever can go wrong will go wrong--a principle Ken called the Rusty Rule of life in his 1990 book on disease genes, because evolutionarily we know this had to be so, since mutation can strike anywhere in the genome.

There may be DNA with no function, not even spacing-function to keep other functional elements some proper distance apart.  Perhaps it could be called 'junk'.  But right now the problem we face in non-coding DNA is the opposite: there is so much of it, it's hard to understand how natural selection could be maintaining it.

When a sequence variant has very little function--and most of this DNA seems clearly in that category--then we expect genetic drift (chance) to determine how the frequency of the variant will change over time.  In that case, deep evolutionary conservation is not to be expected or at least should be less than that of really functional DNA.  But even saying 'less' is problematic, because we need a baseline for the rate at which variation will accumulate in truly nonfunctional DNA.
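For readers who like to see drift in action, here is a minimal sketch of the standard Wright-Fisher model, in which a variant with no effect on fitness changes frequency by chance alone.  The function name, the population size, and the starting frequency are all illustrative assumptions, not estimates for any real population.

```python
import random

def wright_fisher(pop_size, start_freq, generations, seed=None):
    """Follow a neutral variant's frequency under pure drift (no selection).

    pop_size is the number of diploid individuals, so there are
    2 * pop_size gene copies. All values here are hypothetical.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    count = round(start_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        freq = count / n_copies
        # Each copy in the next generation is drawn at random from the
        # current generation, so chance alone moves the frequency.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        trajectory.append(count / n_copies)
        if count == 0 or count == n_copies:
            break  # the variant has been lost or has become fixed
    return trajectory

if __name__ == "__main__":
    for run in range(5):
        path = wright_fisher(pop_size=50, start_freq=0.5, generations=500, seed=run)
        print(f"run {run}: ended at frequency {path[-1]:.2f} after {len(path) - 1} generations")
```

Repeating the run with different seeds shows the point: with nothing but chance at work, the same starting frequency ends in loss in some runs and fixation in others, and smaller populations get there faster.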

But if we can't be sure of what's really nonfunctional, where is our baseline?  We can try theory, try some experimental things (like watching bacteria over thousands of generations in a lab), but it's not easy to know.
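On the theory side, one standard result of neutral theory is that truly neutral DNA accumulates substitutions at the mutation rate, so two lineages separated for t generations are expected to differ by about 2μt substitutions per site.  The sketch below turns that into a rough baseline, using the simple Jukes-Cantor correction for sites hit more than once.  The mutation rate and split time are made-up inputs, and Jukes-Cantor assumes all base changes are equally likely, which real genomes do not obey.

```python
import math

def expected_neutral_divergence(mu_per_gen, generations_since_split):
    """Baseline divergence for neutral DNA between two lineages.

    Returns (d, p):
      d = expected substitutions per site, 2 * mu * t (both branches)
      p = expected proportion of sites that look different, after the
          Jukes-Cantor correction for multiple hits at the same site.
    Inputs are hypothetical; this is a sketch, not an estimator.
    """
    d = 2.0 * mu_per_gen * generations_since_split
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return d, p

if __name__ == "__main__":
    # Assumed values for illustration only.
    d, p = expected_neutral_divergence(mu_per_gen=1e-8, generations_since_split=1_000_000)
    print(f"expected substitutions per site: {d:.4f}")
    print(f"expected proportion of differing sites: {p:.4f}")
```

A region that differs between species by much less than this baseline is a candidate for being conserved by selection, but the comparison is only as good as the assumed mutation rate and split time.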

Ironically, a common bit of DNA to use for that baseline is--you may have guessed it--pseudogenes!  Because a dead gene has no function!  Well, the Times example shows that some, at least, do have a function, and it is very possible that this disorder, though not lethal, could affect the reproductive success of those unfortunate enough to carry it.

Life is always playing tricks on us.  What you see may not be what you get.  Something may look pseudo, but only be playing possum.


James Goetz said...

I suppose that the best way to develop a baseline would be the direct analysis of mutation rates. For example, mutation rates estimated from the direct analysis of mutations would crosscheck mutation rates estimated from the divergence of, yes, "junk DNA." Also, if mutation rates estimated from direct analysis match mutation rates estimated from divergence, then science has good estimations of the amount of non-conserved DNA, which I suppose could be called "junk DNA." Additionally, if there are large amounts of non-conserved DNA in mammals, then I suppose that would be explained by the nearly neutral insertion rate exceeding the deletion rate. And please forgive me if I botched genetics terminology in this paragraph.:)

Ken Weiss said...

Mutation rates will be estimated with more accuracy as whole-genome sequences from related individuals come on the scene. But that's just humans or perhaps a few other lab model species.

However, mutation rates don't tell you if the new allele has any function or not, and that's what we need to know.

There are lots of indirect ways, such as comparing variation in non-coding or redundant coding parts of DNA in different types of organism. A good paper has just appeared on that subject by Michael Lynch. But his object is to estimate the mutation rate, not the amount of nonfunctional DNA. Indeed, one of his arguments is that nearly-nonfunctional DNA (DNA with minimal effect on fitness) evolves basically by chance (genetic drift).

So the problem is a challenging one.

occamseraser said...

Lynch on mutation: this one?
or this one?
or yet another one?

Ken Weiss said...

He has written much on this and I'm not familiar with all of it. I would say for recent summaries, his book with Sinauer, The Origins of Genome Architecture, and a PNAS paper summarizing the main points, about 4-5 years ago.

Above all, a paper in the current Trends in Genetics. That would be the place to start. Then his book.

Ken Weiss said...

Another of his points is that a lot of 'deleterious' mutations end up becoming fixed. That's related to the impact of drift when selection is weak and populations are small. It's a subject I discuss in an upcoming installment of my column in Evolutionary Anthropology.