Wednesday, February 1, 2012

Triumph of the Darwinian Method, continued: genetics as history

Is a DNA sequence 'random'?  How would one know?  There are several tests for randomness that one can do on a DNA sequence, however it was derived -- there's a discussion of this issue here. For example, one can go down the sequence one nucleotide at a time and ask whether that nucleotide predicts the next one in line.  One way is to ask whether, whatever the current nucleotide is, the next one is A, C, G, or T each about 25% of the time.  If so, knowing the current nucleotide tells you nothing about the next one.  Then one can ask the same about the nucleotide 2, 4, ..., or 12,287 positions down the row.  With some exceptions, such tests would fail to find any predictability or periodicity.  By these tests, the sequence is random!
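
The next-nucleotide test described above can be sketched in a few lines of Python. This is only an illustration, not one of the formal tests mentioned: the function name and the choice of lags are ours, and the flat 25% expectation assumes the four bases are equally common overall, which is often not true of real genomes (where one would compare against the sequence's own base composition instead).

```python
from collections import Counter

BASES = "ACGT"

def next_base_frequencies(seq, lag=1):
    """For each base, return the frequencies of the base found `lag` positions later."""
    counts = {b: Counter() for b in BASES}
    for i in range(len(seq) - lag):
        cur, nxt = seq[i], seq[i + lag]
        if cur in counts and nxt in counts:  # skip anything that is not A, C, G or T
            counts[cur][nxt] += 1
    freqs = {}
    for cur in BASES:
        total = sum(counts[cur].values())
        freqs[cur] = {b: (counts[cur][b] / total if total else 0.0) for b in BASES}
    return freqs

# A perfectly periodic toy sequence: the current base fully predicts the next one.
toy = "ACGT" * 50
for lag in (1, 2, 4):
    table = next_base_frequencies(toy, lag)
    print("lag", lag)
    for cur in BASES:
        row = "  ".join(f"{b}:{table[cur][b]:.2f}" for b in BASES)
        print("  after", cur, "->", row)
```

For a sequence with no such structure, every row would hover near 0.25 at every lag, given enough nucleotides; with a very short sequence the counts are too small for the frequencies to mean much.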

That is very weird, since DNA is responsible for organisms, and organisms seem to be anything but random!  Or, could it be that at some higher level, organisms on this earth are, in some profound ways, 'random' in structure or behavior etc.?

If I give you a DNA sequence, and you go into some program on the web (of which there are many) that searches all known DNA sequences to see which are closest to the test sequence, the expectation is that it will be an exact or near match to something known.  We have DNA sequence data from most branches of life, so we'd 'hit' a known species, or something similar in sequence.  That would pin our test sequence on the tree of relationships, which to a Darwinian is a tree of life, and the key aspect is that the tree is the result of a history.

If this seems obvious, it is at the same time a profound reflection of the only convincing hypothesis about life.  Our method assumes a 'tree' and fits a sequence on it, and the only reason we can do this is that life is history.  If the sequence were from an individual already sequenced, there would be a complete match.  If it were from that individual's sibling, there would be a very close match.  If from the same population within a species, the similarity would be less but still very strong.  If from a different but 'related' species (say, two different forms of cat), again similarities would be clear--way more than with a bird or lizard or maple sequence.  This is only because of history, and it is only that fact that allows us to make sense of DNA similarities.

Suppose you were given the following sequence:
ACGTCCAATCTGGGGTAAACCCGAGATCTGAGGCCTACCTGCAATTTCGGCCACACACAGGGTGTTACCCCGACTTCAGGGCA

Now, go search the databases for it, to see what known sequence it's closest to.  Here is what you'll find if you use a common tool, called BLAST, for comparing sequences: "No significant similarity found."

Now, since we have sequence from basically every branch of life (not whole genome sequence at present, to be sure, but that is a practical rather than a conceptual problem in our current context), how can our test sequence not fit the tree?  If it really were unrelated to anything known, and fell nowhere within the tree of known sequence, it would have to be either a sequence totally made up (which this one was!) or one from a wholly unknown branch of life.  We have so much data at present that such a result would be very spooky and unlikely.  Mutations arise randomly in DNA, relative to their effect on the organism, but even this kind of randomness is inherited, which is why even sequences that, by various statistical tests, seem to be random assemblages of nucleotides fall into historical relationships with each other.  They may be 'random' on their own, but not to each other.
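
The point that sequences can be 'random' on their own but not to each other can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of real evolution: the ancestor is generated at random, descendants differ from it only by single-base substitutions (no insertions or deletions), and the function names, the 5% substitution rate, and the sequence length are all made up for illustration.

```python
import random

BASES = "ACGT"

def random_sequence(length, rng):
    """Draw each base independently and uniformly from A, C, G, T."""
    return "".join(rng.choice(BASES) for _ in range(length))

def mutate(seq, rate, rng):
    """Copy seq, replacing each base by a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)                      # fixed seed so the run is repeatable
ancestor = random_sequence(2000, rng)
sister_a = mutate(ancestor, 0.05, rng)      # two descendants of the same ancestor
sister_b = mutate(ancestor, 0.05, rng)
stranger = random_sequence(2000, rng)       # made up, with no shared history

print("sister vs sister:  ", identity(sister_a, sister_b))
print("sister vs stranger:", identity(sister_a, stranger))
```

Each of the four sequences looks like a random string by itself, yet the two descendants agree at roughly nine positions in ten, while the made-up sequence agrees with either of them at only about one position in four, which is what chance alone gives with four bases. Shared history is the only thing that separates the two comparisons.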

A sequence truly unrelated to any other could deeply threaten our very Darwinian foundations!  That is how strong and well-supported Darwin's hypothesis about life is.  If that is a failure of induction, then so be it.  We would go to very great lengths to find other explanations for our mysterious sequence, before we would even begin to question the hypothesis of evolution!  Would it be less weird than current allegations of meteorite structures, to suggest that such a sequence came from Mars?  Would it resuscitate arguments about spontaneous generation (see earlier post on this)?

And in another way such a sequence would be at least as profound as, or perhaps much more profound than, just a missing relationship.  That is because, on its own, a DNA sequence can seem, statistically, to be a random string of nucleotides.  Once we know about history, and have experimental data (which we do, in profusion), we can see how utterly nonrandom DNA sequences are when it comes to what they do.  That cannot be 'read' off from the sequence alone, without this information.  And this information is essentially connected to Darwinian ideas.

This gets us to consider not just the fact of the tree of life's history, but the functional roles of DNA, and Darwin's other idea, that the tree is built by natural selection.  Comparative DNA sequences, viewed through the Darwinian method, also say something about that.  That is for next time in this series....

9 comments:

Holly Dunsworth said...

Ooo, I can't wait!

Holly Dunsworth said...

What were the terms you chose in BLAST? At least, some highlights? There are so many boxes to check (or not)... so many!

Ken Weiss said...

Tomorrow there will be another use. The main 'tracks' to show are RefSeq (the gene itself), Conservation (upper left box in that set of options), Encode regulation, and Repeat masker. Then if you want to explore more or in more detail, you can add other tracks. 'Dense' is the usual way I select to show things.

You can scroll along the chromosome, search by specified regions, species, or gene names to see what you would like.

Ken Weiss said...

Sorry, Holly, maybe I misunderstood your question, as my answer was about the genome browser. Anne did the BLAST search and she can tell you what options she picked....

Anne Buchanan said...

I went for simple. From the BLAST home page (http://blast.ncbi.nlm.nih.gov/Blast.cgi), under "Basic Blast", I chose "nucleotide blast", and then just pasted in my (made-up) sequence in the first box where it says "Enter Query Sequence." Then hit "BLAST", wait a bit, and see what you get.

If you're not used to using BLAST, you could paste in an actual sequence, to see what happens when you get a match. One way to do this is, at the NCBI website, go to the nucleotide database and search for any gene of your choice -- here's one, not exactly at random, as it's relevant to Friday's scheduled post: http://www.ncbi.nlm.nih.gov/nuccore/NM_013618.3. Scroll to the bottom of the page, select some of the sequence and paste it into the BLAST search box.

Hope that's at least sort of clear!

Anne Buchanan said...

P.S. I really wish we could include links in comments! Anyone know how to do that in Blogger?

Anonymous said...

I think you can use standard html "a href" tags:

BLAST

NM_013618.3

Anne Buchanan said...

Thank you!

Anne Buchanan said...

Just because I can, I'm posting a link to another gene sequence.