Thursday, February 2, 2012

Triumph of the Darwinian Method, continued: genetics as function, function as history

So a powerful reason that the universal approach to biological questions is the one inspired by Darwin's theory of evolution (descent with modification) is that history leaves a trace in gene sequences, and gene sequences reveal history.  Even a sequence that, taken by itself, is statistically random is very non-random when compared to other sequences.  That's because all DNA sequences are, in the history-of-life sense, related.  They may be statistically random along the chain of nucleotides, but they are very much not random when compared to each other.
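
To make that contrast concrete, here is a minimal sketch in Python. Everything in it is our own invention for illustration: the function names, the 5% per-site substitution rate, and the sequences themselves, which are simulated rather than real. It builds a random 'ancestor', lets two 'descendant' copies mutate independently, and then compares them with each other and with an unrelated random sequence.

```python
import random

def random_dna(length, rng):
    """Draw each nucleotide independently and uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def mutate(seq, rate, rng):
    """Copy seq, substituting each site with probability `rate` (an assumed toy model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)                   # fixed seed so the run is repeatable
ancestor = random_dna(1000, rng)         # statistically random by itself
species_a = mutate(ancestor, 0.05, rng)  # two lineages diverge independently
species_b = mutate(ancestor, 0.05, rng)
unrelated = random_dna(1000, rng)        # shares no history with the others

related_identity = identity(species_a, species_b)
unrelated_identity = identity(species_a, unrelated)
print(related_identity, unrelated_identity)
```

Each sequence here is random along its own length, yet the two descendants agree at most sites, while the unrelated sequence agrees only at about the one-in-four rate expected by chance. Shared ancestry is the only thing that distinguishes the two comparisons.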

The same is true when we look into the structure of a DNA sequence, where once again history shows why what may seem random is anything but.  Again, it is the Darwinian method, and the assumption of common ancestry, that makes it possible to understand this.

A century of work has shown us that DNA is related to the functions that go on in cells, in ways that are essentially common to all of life.  This has to do with how those functions are encoded in DNA, and once we know how to read the code, we can see both persuasive evidence for natural and other forms of selection and features that make sense strictly in light of life as history.

Here is the sequence of part of a human gene (gastrin):


We picked this as a nice figure showing what we want, from a human genetics textbook by G. Moroni, 2001.  The nucleotide sequence is in black, and the amino acid sequence, or protein code, is in brown, written under the nucleotide code.  By itself, the DNA sequence appears to be just random nucleotides scrambled in a row.  But labeled here are the various parts that make it a gene: where messenger RNA is transcribed and the amino acids it codes for (brown), where the regulatory proteins bind to make this happen (the TTATA in color), and a signal for where a string of A's will be attached (AAATAAA).  These types of features, and others not shown, can be identified nowadays just by analyzing a naked DNA sequence.
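
As a rough illustration of what 'analyzing a naked DNA sequence' can mean, here is a small Python sketch. The sequence `toy` is invented for this example and is not the gastrin sequence from the figure. The motifs searched for are the textbook consensus forms of the TATA box (TATAAA) and the poly-A signal (AATAAA), which may not match the exact strings labeled in the figure, and the helper names are our own. Real gene-finding software is far more elaborate than this.

```python
# Standard genetic code, enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def find_all(seq, motif):
    """Return every 0-based position at which motif occurs in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def translate_from_first_atg(seq):
    """Translate codon by codon from the first ATG until a stop codon or the end."""
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

# Invented toy sequence: a TATA-like motif, a short reading frame, a poly-A signal.
toy = "GGCTATAAAGGCACCATGCAGCGACTATGAGGGCAATAAAGC"
tata_sites = find_all(toy, "TATAAA")
polya_sites = find_all(toy, "AATAAA")
protein = translate_from_first_atg(toy)
print(tata_sites, polya_sites, protein)
```

The point is only that the 'labels' in a figure like the one above correspond to patterns that a program can look for, because the same kinds of signals recur across genes and species.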

You can go to what is called a genome 'browser' (the link is to the UCSC genome browser) and see the many structural and functional elements of DNA for any gene you can name in any species for which we have the DNA sequence.  Here it is for the gastrin gene.  The top lines are the sequence location.  Next comes the gene, with its protein-coding parts (dark boxes) connected by a thin blue line for the noncoding parts (called introns), showing where the gene code begins and ends.  Below that is a grey bar whose darkness reflects the degree of sequence conservation, or similarity with other species.  Below that are black boxes showing the location of various short sequence elements, 'motifs' that are found exactly or nearly repeated in many places in the genomes of humans and related species:


A browser can show many other features, but a snapshot showing them all would utterly clog this post (but you can see the results for the gastrin gene here).  Note for example that the coding regions are areas where the conservation bars are darkest.  That means that these areas are also very similar in sequence in other species for the 'same' gene. "Conservation", "same", "coding regions", etc., are all terms that refer to aspects of DNA we identify by comparison or that are similar because of history--shared ancestry.  And the reason some areas in and around the gene have varied or changed less during that history, that is, are conserved in sequence, is, as we know from all sorts of data, that they are functional parts of the DNA.  As Darwin's theory would hold, they are important enough that mutational changes in them probably did not work as well, and so were not reproduced as well, as changes in less important areas: what Darwin called natural selection.
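
A crude version of a conservation track can be sketched in a few lines of Python. The four aligned 'species' sequences below are made up for illustration, as are the function name and the choice of score (the share of sequences carrying the most common base at each position). Genome browsers use more sophisticated, phylogeny-aware measures, so treat this only as a cartoon of the idea.

```python
from collections import Counter

def column_conservation(alignment):
    """For each aligned column, the fraction of sequences sharing the most common base."""
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Invented alignment: the first 9 columns play the role of a coding region,
# the last 5 the role of less constrained flanking sequence.
alignment = [
    "ATGCAGCGATTACG",
    "ATGCAGCGACTTCA",
    "ATGCAGCGAGCAGT",
    "ATGCAACGATAGCC",
]
scores = column_conservation(alignment)
coding_mean = sum(scores[:9]) / 9
flanking_mean = sum(scores[9:]) / 5
print(scores)
print(coding_mean, flanking_mean)
```

In this toy case the 'coding' columns score close to 1 and the flanking columns much lower, which is the pattern the darker and lighter parts of the grey bar represent. Note that the score has no meaning for a single sequence: it exists only by comparison.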

We know these things because, after many decades of research, we have learned that the general features, like the code for amino acids that make up the protein (here, gastrin), are essentially universal.  We can also find or identify these functional elements by various kinds of experiment, but the methods themselves derive from what's important: our ability to compare sequences of genes within any species or among species, and to compare sequences of the same gene in different species.  Genes come and go, so we find the 'same' gene in sets of species that have diverged since the gene's origin.

Because of this, because of shared history and common ancestry, what by itself may seem to be an entirely random sequence of nucleotides becomes understandable as an entirely non-random sequence when it comes to explaining what it does and why it exists.  In fact, non-functional parts of the sequence may accumulate mutations randomly without selective constraint, but even they are transmitted faithfully (with occasional mutational change), and hence bear a trace of their history.

From the point of view of the Darwinian method, that is, the aspects of the scientific method that Darwin's insights set rolling, these tools that are related to life's nature as shared history are fundamental to our modern understanding of life.  That is the triumph of the method.

Thus, a truly random sequence (such as in our previous post) would not only be statistically unpatterned itself (which is what 'random' means), but it would have none of the functional structures that we know the evolution of life has produced in all its creatures, large and small.  That's why we'd be truly spooked by an actual sequence that not only didn't fit anywhere on the known tree of life's creatures, but also didn't show any of the elements that we know evolution has made fundamental to the nature of living organisms.

There are still many things to be debated, about how to interpret various of these DNA-sequence factors, but not their nature as products of history.  One can debate the nature and role of natural selection, or the strength of effect of individual sequence variants on the traits (like presence of disease) of the organism carrying them.  We write all the time here on MT about over-stated claims about genetic causation and how easy it is to concoct adaptive Just-So stories.  The Darwinian method is so powerful that it lures scientists to excess, to uncritical acceptance of scenarios and claims that go beyond what really is scientifically legitimate--in some sense, just as Adam Sedgwick accused Darwin of doing:  assuming a theory which no facts could erode.  That's ideology, not science.  Scientists may indulge in such story-invention more, if anything, than Darwin himself did, so strong has a simplistic selectionist belief become in many quarters, either because a notion of Darwinian theory has been bought uncritically, or because as in many public arenas, education in biology has been a kind of Darwinian indoctrination.

Iron-clad theories are self-fulfilling, and too many in science and the media buy into facile stories.  There are many ways for differential proliferation of genetic variation and the traits it affects to occur, or for variation to be distributed around the earth, of which natural selection is only one.  Temptation to invent stories notwithstanding, however, the aspects of the Darwinian method that we've tried to explain are so pervasive that there is no serious doubt about the historical nature of life on earth, which applies to all the competing explanations, or about the fact that to persist or proliferate systematically, DNA sequence elements must have tolerable, or advantageous, function.  It should not have to be pointed out that this does not include creationist explanations, which neither require nor predict the kind of trace of history that we clearly observe.  Biologists are not arguing about the fact of life as history, nor about that being the reason DNA sequences are nonrandom in the ways they are.

Because evolution happened in the past, we must triangulate our approaches to understand it.  This is why repeated observation--induction--has been fundamental from Darwin to us today, and why deduction from a theory, worked out over the years by observation of DNA, leads us to predict what we'd find in some newly discovered sequence.  And it is why, for the same reasons, neither induction nor deduction would help us explain a truly novel DNA sequence.  Scientific reasoning, especially when most things can't be proven by experiment and result from past events, must be a kind of social mix of various people taking various approaches, and combining their findings.

At the core of this mix is what is known as the Darwinian method.  In a profound sense, whatever we in biology argue about, it isn't that!
