The HypeMachine is out in force behind the 30 new publications, accessible via Nature's "ENCODE explorer", featuring the ENCODE show. It's been 10 years, at least, since Nature, Science, and other journals started press-release, Big Splash, orchestrated displays of Surprise! at their latest magnificent special issues. Well, it's how business works--and this is business, after all, despite the attempt to make it appear simply the quest for knowledge.
The media pick up on it of course--you can't miss it today--and it's no different from the political convention gala celebration. Some journalists understand the issues better than others, but few of them are doing other than reporting the hype. American life has become one big Potemkin village: all show, but substance a lot harder to find. So what is this ENCODE report?
ENCODE is a mega-project designed to use various data and means to identify the functional usages (or not) of every part of selected segments of the genome. Data from sequencing, gene expression, RNA detection, experimental function, and so on are coalesced into a set of digested reports. But you can also play with various aspects on the ENCODE site.
So, besides the clear money-seeking hoopla, what is actually being found? First, we are told that the term 'junk DNA' is a relic of the past. That was when we 'knew' that DNA was a string of nucleotides, like a necklace, along which some protein-coding sections were spaced. The history of research was driven largely by exploiting Mendel's finding of discrete functional heritable units that were later called 'genes'.
Then we learned that some of the proteins coded by those units, by genes, served to control the use of other genes, depending on their cell context. They did that by recognizing short DNA sequences located on the chromosome roughly near the gene to be regulated. These proteins are called transcription factors (TFs). That showed how cells differentiate, but also that some DNA was 'regulatory': it didn't code for protein, but instead directly coded for recognition by these TFs. So it was functional, but not coding.
Step by step many other kinds of function have been found, over a period of 20 or so years. Read the report and summaries if you want to know what these functions are. But essentially, they further relate to gene usage, repression, or modification; or to chromosome copying and packaging, or to other such functions.
Not an idle nucleotide?
They confirm our idea that, at its base, DNA controls protein production and perpetuates itself, but through many subtle, complex functional roles. Many are still to be discovered, but a much larger fraction of the genome is found to have some function, and the protein coding parts have been shown to comprise only a few percent of the whole of our DNA. Whether ENCODE has actually determined the function of 80% of the genome, as they now claim, is debatable.
For example, introns are non-coding stretches of DNA that interrupt the coding parts of genes. A gene codes for a protein essentially in sections--stretches of DNA with amino acid codes, separated by stretches that don't code. All of this is transcribed to RNA but the non-coding parts are spliced out, and the coding parts joined together. ENCODE descriptions, in stretching for their 80% figure, include introns because there are some clearly shown functions of some of the nucleotides found in some introns. The important part is that introns are not all just inert DNA, but the excess is to say that all of it matters.
Around 40% or more of the genome is made of various types of repeat elements--short segments of DNA of which copies or near-copies are found adjacent to each other or splattered all over our chromosomes. They get inserted from time to time by various molecular means. Some few of these repeat elements have been shown to have function, and the location and size of various clusters of them may have function as well. But most have no plausible function in the proper sense of the term. The cell may have to deal with them, but that's not generally clear and that doesn't mean that they actually do anything.
So there is a bit, or a lot, of exaggeration. Still, the point is valid that more is going on in genomes than had been thought. However, whatever the percent of functional DNA actually is, documenting that function is fine, but much of it has been gradually discovered over 20 years or so. So the hoopla is very, very misplaced. The ENCODE database is going to be a useful, convenient web tool and so on. It's great to have new knowledge, and new websites comprising and summarizing such knowledge, but it's not all newly discovered. And, equally important, some of the functions are not very well understood and much guessing is still going on. And 'function' is being conveniently redefined to self-justify and puff up the importance. All this, too, would be fine if fully acknowledged and, especially, if it weren't for the self-importance being proclaimed. And the constant dropping of hints that this will lead to disease cures, like claims that Mars adventures are because there might be life out there, has to be viewed mainly as just advertisement--a sales pitch.
A deeper issue: evolutionary considerations
Nobody wants to think about some of the deeper issues. One of them is that biological function basically arises because it is advantageous to your reproductive success (evolutionary fitness in the face of natural selection), or is removed if it's harmful. So if this all has function, it must be screened by natural selection. That is easy to say but very hard to understand. Why?
First, the more that has function that is screened by selection, the more excess reproduction is needed to overcome that selection so that a species--ours, or any other species--doesn't just go extinct because everybody's got bad bits of DNA somewhere in their genome! This is called genetic load. Obviously we don't hatch out thousands of babies so that each human couple on average produces two or more surviving children.
In turn this means that even if 80% of the nucleotides in our DNA (that's 2.5 billion nucleotides, by the way) are being screened by selection, the selection pressure is, on average, trivially small. For the vast majority of sites, only in very occasional, rare, bad-luck combinations with other sites will selection screen you up or out. And that in turn means that genetic drift--chance--will dominate in determining the variants each of us has, and hence the general nature of genomic DNA sequences. And again in turn, this means that in essence most of this DNA doesn't, for any practical purposes, have function after all. At least not very specific or important function. Most species, certainly most large slow-reproducing species, simply couldn't support so much selective screening.
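To put rough numbers on the load argument, here is a back-of-envelope sketch in Python. It is only an illustration, not anything from the ENCODE papers: it uses the standard textbook approximation that, if harmful mutations act independently, mean fitness relative to a mutation-free genome is about exp(-U), where U is the number of new harmful mutations per offspring. The per-nucleotide mutation rate and the fraction of functional-site mutations assumed to be harmful are assumptions chosen for illustration, and the function name is made up.

```python
import math

def offspring_needed_per_couple(genome_size, functional_fraction,
                                mutation_rate, harmful_fraction):
    """Rough mutational-load estimate. All inputs are illustrative assumptions."""
    # New mutations per offspring that land in 'functional' DNA
    # (factor 2 because each offspring inherits two genome copies).
    functional_hits = 2 * genome_size * mutation_rate * functional_fraction
    # U: new harmful mutations per offspring per generation.
    U = functional_hits * harmful_fraction
    # Assumed independent effects: mean fitness is about exp(-U).
    mean_fitness = math.exp(-U)
    # A couple must leave 2 survivors, so births needed scale as 2 / mean fitness.
    return U, 2 / mean_fitness

GENOME_SIZE = 3.1e9      # nucleotides; 80% of this is the ~2.5 billion cited above
MUTATION_RATE = 1.2e-8   # ASSUMED per-nucleotide, per-generation rate
HARMFUL_FRACTION = 0.1   # ASSUMED share of functional-site mutations that are harmful

for fraction in (0.8, 0.1):
    U, births = offspring_needed_per_couple(
        GENOME_SIZE, fraction, MUTATION_RATE, HARMFUL_FRACTION)
    print(f"functional fraction {fraction:.0%}: U = {U:.2f}, "
          f"births needed per couple = {births:.1f}")
```

With these assumed inputs, the 80% case calls for births per couple in the hundreds, while the 10% case stays in single digits. Change the assumptions and the numbers move a lot, but the point stands either way: the more of the genome is truly screened, the weaker the average screening has to be.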
There are ways for DNA over the eons to have evolved a lot of distributed function, even if in any individual like you or me, the vast majority has essentially no function (or, that is, its variation has essentially no important effect on the variation between you and me). But this is far from the strong assertions of function that are being made.
Reporters and bloggers should take off their rose-colored hype-believing glasses, and understand this.
The important bottom line
Despite all these reservations, a deeper understanding of genomic content is important both for basic biology and its practical application. What it all really means is actually something that was not wanted and not really predicted, even though we had good reason to predict it. It's that each new discovery shows that genetic control of what an organism is and how an organism works (and, hence, how organisms evolve) is more complex and less about single, deterministic causation than Mendel's work had led us to think. He showed us the extremes of simplicity, by carefully chosen experiments, and that set out a path we could follow for discovery.
But complexity always grows, never diminishes, by this work. It shows why promises of simple cures for disease--always the hope dangled before the drooling public to keep the till open--may be more elusive, at least elusive if approached from a genetic-causal viewpoint. Part of the problem is that all of these functions are not just additive: you can't get to net function--how you actually are as an organism--just by adding up all the little variant contributions in 80% of your genome.
Genes and the various genome functions interact. They form networks and complexes of contribution to final traits that are not just additive. Indeed, identifying how they are not additive is a major challenge. Some aspects of genome function are obviously strong and clearly understandable by usual scientific approaches. Some are close enough to being additive that we can assume that. But we know that much, probably most, is not so easy.
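A toy sketch of what 'not just additive' means, in Python. Everything here is hypothetical: two made-up sites, made-up effect sizes, and a made-up interaction term. It only illustrates the idea and is not a model of any real trait.

```python
def trait_value(site1, site2, effect1=1.0, effect2=1.0, interaction=0.0):
    """Toy trait from two sites (0 = no variant, 1 = variant). Hypothetical numbers."""
    # Additive part plus an optional term that exists only when both variants occur.
    return effect1 * site1 + effect2 * site2 + interaction * site1 * site2

def effect_of_site1_variant(site2, **model):
    """How much the site-1 variant changes the trait, given the state of site 2."""
    return trait_value(1, site2, **model) - trait_value(0, site2, **model)

# Purely additive model: the site-1 variant has the same effect in any background.
print(effect_of_site1_variant(0), effect_of_site1_variant(1))

# Interacting model: the same variant's effect depends on what is at site 2.
print(effect_of_site1_variant(0, interaction=2.0),
      effect_of_site1_variant(1, interaction=2.0))
```

In the additive case you could learn each variant's contribution once and add them up. In the interacting case there is no single 'effect of the variant' to add, and that is with only two sites, never mind billions.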
Secondly, because of the specific history of discovery, we have come to think of genome function as being linear--things along a line of nucleotides that make up a single chromosome. Action occurs in what is called cis--meaning along the chromosomes. But we are learning--and ENCODE presents but did not discover this--that there are trans interactions, which means direct interactions between bits of DNA on different chromosomes, as they arrange themselves in the nucleus of cells--and, somehow, do this differently depending on the cell type and its context at any given time. This must be important, but there are deep questions: genomes have rearranged their segments in many ways during evolution, and yet related species with very similar functions (like, say, all mammals) still manage to function even though their genomes are rather differently arranged (and they have different numbers of chromosomes). So how critical is all of this trans activity and organization?
And there are many other curious things going on that are not mentioned in the current ENCODE papers. All very interesting and at least some of it undoubtedly important.
There are many implications of this growing knowledge, and they all point to a need to come to grips with this complexity in some better, or even some fundamentally newer, way than we have been doing. This is an important challenge, and the ball to keep your eye on. As at political conventions, simple stories and promises are spun to win your votes, but big issues are at stake.