Thursday, March 7, 2013

What is genetic function? The ENCODE non-questions

The human genome is 3.1 billion nucleotides long.  If only 1-2% of it codes for proteins, what does the rest do?  And how do we figure that out?  One way is to commit multi-millions of dollars to a project dedicated to doing just that, and that's exactly what's been done by the ENCODE project, which recently published, with much fanfare, the results of years of work by a consortium of over 400 people and numerous labs. The comment by the ENCODE PR spokespeople that got the most attention during all the hoopla when the papers were published was the idea that while  98-99% of the genome was once called 'junk DNA', now it looks like 80% of the genome is in fact functional.

There is a heated, indeed vitriolic, debate about how misrepresentative, and even how wasteful, the ENCODE megaproject was.  ENCODE is a cute acronym (we need that in science, after all, since much of what we're about is marketing, so we need a brand or trademark) for ENCyclopedia Of DNA Elements.  In a nutshell, the project was a large consortium whose objective was to identify as many of the functional elements in genomes as possible.

[Figure: The interaction of tRNA and mRNA in protein synthesis. Source: Wikipedia]
We know that DNA codes for protein, but that is only about 1-3% of the genome.  Another small fraction is transcribed into a variety of RNA molecules that do things on their own (that is, they don't just get translated into protein).  Examples are transferRNA, ribosomalRNA, and various others. Some of this RNA, called microRNA, is used to affect gene usage, by interfering with messenger RNA and hence protein production.  Protein coding and these other RNA processes and interactions are what in our Mermaid's Tale book we called 'correspondence' codes in the genome, because DNA contains the code for -- corresponds to -- the RNA which is then used elsewhere in the cell.

Then there are bits of DNA that contain what we called 'recognition' codes.  The DNA sequence is directly recognized by other molecules, such as proteins called transcription factors, that physically bind to the DNA sequence elements and, among other things, cause nearby protein-coding genes to be transcribed into messengerRNA.  These are codes, but they act locally on the DNA itself.

The middle and ends of chromosomes (centromeres and telomeres) contain DNA sequences used for protecting the integrity of the DNA molecule in the potentially hostile chemical environment of the cell, or in the process by which chromosomes are copied when the cell divides.  There are other codes of various kinds in DNA that affect how it is wrapped around proteins so it can fit into the nucleus and so on.

But various studies had shown that much or even most DNA is actually transcribed into RNA molecules of unknown (if any) function.  It is replicable--so not just chance or experimental trash.  Since the function isn't known, it is debatable whether this is truly 'functional' or not.

Meanwhile, 40% or even more of our DNA consists of repeat elements, short sequences that are found scattered all over the genome and that (among other ways) are copied from one location and inserted more or less randomly in some other location.  These arise from various processes, including errors in DNA replication (e.g., microsatellites) and viral-related copying mechanisms that act only on rare occasions, but often enough over evolutionary time for the elements to proliferate into the hundreds of thousands.

By some accounts, especially since it seems to be transcribed into RNA, much or even most of the genome is 'functional'.  Such claims challenge well-established ideas that most of the genome has very little function--what was called 'junk' DNA--and that therefore only a small fraction really matters.

But what is function?
A lively, funny, but quite sharp--some would say vicious--attack on the excited reports of ENCODE by Dan Graur and colleagues was published recently.  First, even though the ENCODE authors, being good scientists, put lots of caveats in the original papers, they were not averse to the super-hyping given the report by the media.  Instead of saying that ENCODE had provided a very useful and accessible  data resource and some thought-provoking data, the usual hype about transformative new findings, mysteries uncovered, etc. was all over the media last year.  Graur et al. blasted such reportage as culpable, or even scientifically naive hype (or, perhaps, bovine droppings).  Indeed, the aspects of genome structure and use that were reported by the project were all to some extent or other already well-known, even if ENCODE provides a more systematic data resource and coverage of them than had been available before.

The controversy involves many different issues, some of them quite technical and methodological, but the core centered around ideas of 'function'.  The project investigators used various methods to find biochemical activity of different kinds to identify aspects of the genome that were functional by that standard.  Thus, for example, if a transcription factor protein stuck to a particular bit of DNA, that was activity and classified as function; it didn't have to be shown to affect a protein-coding gene's expression level.

From an evolutionary point of view, function only matters if it affects reproductive success--or 'fitness' in the Darwinian sense related to natural selection.  Why is this?  It's because if it doesn't affect fitness, then mutations will eventually disrupt the activity but with no loss to the organism's reproduction.  The bit of DNA will, over time, accumulate variation among individuals and between species.  By contrast, a bit of DNA that does have a fitness effect will have much less variation in the population, because mutational disruption will harm the individual, who won't reproduce, taking the variation out with it.  We say that relatively limited variation, or sequence conservation among or within species, indicates evolutionarily important function.  Indeed, even if the bit of DNA did have some function that affected a trait, say body shape, but not in a way that would be screened by natural selection--that is, not in a way that affected fitness--that function would sooner or later be erased by mutation.

In that sense the function might be real but evolutionarily unimportant or irrelevant. The idea that a bit of DNA could have function and yet not be affected by mutation in this way is tantamount, the critics argued, to saying that organized structures could arise just by chance, without being molded by natural selection.  That is hard to justify (actually, there may be such reasons, but they're too much to go into here).  But it's worth noting that Graur et al. do point out that such function could, under some circumstances, become relevant to natural selection, so that even highly variable bits of DNA may not be unrelated to evolutionary potential.  But looking at such DNA at any given time can't tell you that, and doesn't warrant assigning it function in the evolutionary sense.
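As a rough illustration of the conservation argument in this section, here is a minimal Python sketch of a deterministic mutation-selection toy model. It is a textbook-style simplification, not anything taken from ENCODE or from Graur et al. It ignores chance (genetic drift) and back-mutation, and the function name and the values of `mutation_rate`, `selection_cost`, and `generations` are assumptions chosen only to make the contrast visible.

```python
def disrupted_frequency(mutation_rate, selection_cost, generations):
    """Track the population frequency of disrupted copies of one DNA element.

    Toy deterministic haploid model (an assumption for illustration):
    each generation, intact copies are disrupted at `mutation_rate`, and
    carriers of a disrupted copy reproduce at relative rate
    1 - `selection_cost`.
    """
    q = 0.0  # start with every copy in the population intact
    for _ in range(generations):
        # Mutation: a fraction of the intact copies becomes disrupted.
        q = q + mutation_rate * (1.0 - q)
        # Selection: disrupted copies leave proportionally fewer descendants.
        q = q * (1.0 - selection_cost) / (1.0 - selection_cost * q)
    return q


# Hypothetical parameter values, not estimates for any real element.
neutral = disrupted_frequency(mutation_rate=1e-4, selection_cost=0.0, generations=100_000)
selected = disrupted_frequency(mutation_rate=1e-4, selection_cost=0.01, generations=100_000)

print(f"no fitness effect: {neutral:.4f} of copies disrupted")
print(f"1% fitness cost:   {selected:.4f} of copies disrupted")
```

Under these assumed values the element with no fitness effect ends up disrupted in nearly every copy, while the element with even a 1% fitness cost stays intact in about 99% of copies (the disrupted fraction settles near the mutation rate divided by the cost). That is the sense in which conservation is read as a footprint of selection.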

Wasted electrons--debates over angels on pin-heads
This is a debate about many things, but in part centers around orthodoxy.   The discussion is over the question  "What fraction of the human genome is actually 'functional' in these latter senses?"  10%? 80%?  17.654382234887%? 

This is a thoroughly electron-wasting debate (using up electrons via the internet and airwaves), because we know very well, beyond any serious doubt, that the usefully used parts of the genome vary from person to person and, indeed, from cell to cell within each of us!  And if we take a broader evolutionary view, different parts and regions and fractions of genomes will be used over time. 

Among individuals in a species at any given time there are hundreds of dead or partly dead genes, regulatory regions with variable strength transcription factor binding, and so on, all across the genome.  These vary from person to person, as we have clear evidence to prove.  And have you forgotten the hoopla over copy number variation, the hot, recently new finding that our numbers of genes and other parts of our genomes vary by the thousands among us and between the two copies of the genome each of us carries?

Since everyone differs, it is almost impossible in principle to ask this question of any single individual, because there just isn't enough information, and unique observations can't be tested with the statistical approaches needed to document function (needed for reasons that are not controversial).  Alternatively, we might come to some sort of average functional fraction for a species, but that is rather vague and perhaps misleading--misleading about how DNA functions.  For example, it's been estimated that a high fraction of our 'real' genes (protein-coding or regulatory regions) are individually dispensable if other well-working genes cover the same function.  In that sense, a high fraction even of those 'real' genes are dispensable.

Whether something has a fitness effect is also a statistical question, since there is always a probabilistic aspect to reproductive success.  The amount of conservation in a DNA region is in principle an indicator of past history of natural selection, but it also involves other factors (population size, mutation rate, and so on), and it is inherently a relative measure.  Assessing what varies enough to be judged not to have a fitness-related function is also a statistical issue, not one with precise criteria.  If the relatively limited variation in protein-coding regions reflects real function, how much more variation reflects no, or 'less', function?  One might say that the evolutionary definition of function, based on relative variation, is not entirely free of the subjectivity issues that plague the ENCODE definition of function based on having some biochemical activity.
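To make the point about population size concrete, here is a small Python sketch using a standard textbook expectation for variation with no selection at all (the infinite-alleles model), H = θ/(1 + θ) with θ = 4Nμ, where N is the effective population size and μ the mutation rate. The function name and the numbers are illustrative assumptions, not estimates for any real species or genomic region.

```python
def expected_neutral_heterozygosity(effective_size, mutation_rate):
    """Expected heterozygosity for DNA with no fitness effect.

    Uses the infinite-alleles expectation H = theta / (1 + theta),
    with theta = 4 * N * mu for a diploid population.
    """
    theta = 4.0 * effective_size * mutation_rate
    return theta / (1.0 + theta)


# Same hypothetical mutation rate, three hypothetical population sizes.
for effective_size in (1_000, 10_000, 1_000_000):
    h = expected_neutral_heterozygosity(effective_size, mutation_rate=1e-8)
    print(f"N = {effective_size:>9,}: expected heterozygosity = {h:.6f}")
```

The same unselected stretch of DNA is expected to look far less variable in a small population than in a large one, which is one reason any threshold for 'conserved enough to be functional' is a relative judgment.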

The whole flap wastes trillions of electrons because it feeds the media's streams of hot air, even though the issues themselves are, as we have tried to suggest here, debates over non-questions or over subjective issues that are not clear-cut.  And of course the investigators have a strong vested interest in vigorously defending the over-selling of yet another over-priced mega-project, so its funding won't be cut.  As usual, this electron stream misses much of the interesting and actually scientific aspect of the findings and their ambiguities.

For example, the issues rest on the tacit idea that genome functions can be enumerated at the nucleotide sequence level, essentially by assuming that each function is independent of the others, which is purely a fiction.  Interdependence makes these kinds of issues, which are based on differing criteria of 'function' and relative variation, very tricky.  And the evolutionary argument essentially assumes that non-conservation means no function, which is also a misperception of the dynamic control and complexity of genetic mechanisms and evolutionary adaptation.  This is not the place for me to outline my view on that, but I do think that a proper understanding of genomes and their evolution can reconcile the perceived differences in point of view in the current food fight.

Of course, thinking seriously about evolution is harder, won't please Big Story seeking journalists, and makes less dramatic material for grant applications.  So the food fight is not at all surprising.

7 comments:

Anonymous said...

On a related note of function, I find it fascinating that a variety of RNA transcripts derived from even extinct transposable elements perform inhibitory functions (RNAi) by basically folding back over the DNA sequence they were transcribed from and silencing it from further transcription. Considering that mobile elements, extinct and active, make up ~50% of the human genome, it's feasible that a large portion of transcripts are these types of RNAi molecules, helping to maintain stability.

Ken Weiss said...

I had not known about that finding in RNAi. Graur et al. would probably insist that it be shown to do what you describe in proper cellular contexts--that is, that it actually does affect transcription in cells that use the gene and that the effect makes a difference to the cell.

Just thinking about it shows the complexity of the issues generally, and about 'function' in particular.

One thing that seems certain: there is much about what DNA does that we don't yet understand.

Anonymous said...

Am trying to comb through my papers. One article from 2003 by Sijen & Plasterk, which has gotten a lot of press, focused on still-active Tc1 DNA transposons in the C. elegans germline and found that this "snap-back" dsRNA would silence the transposon from further transcription. I believe they also postulated that this was one important mechanism for regulating transposition events in the germline, whereas the mechanisms may be less stringent in mature cells.

Being interested in DNA topology, I would be interested in what sort of structure (some variant of a tetrahelix?) this dsDNA/dsRNA complex might take.

http://www.ncbi.nlm.nih.gov/pubmed/14628056

Josh Nicholson said...

Prediction: A decade from now there will be another Encode project only much larger. Because THAT was the problem with ENCODE not all this other stuff you talk about. ;)

http://news.sciencemag.org/scienceinsider/2013/03/ready-for-more-10000-cancer-geno.html

Ken Weiss said...

If bigger means better you'd be right about the science. But since bigger means at least more money, it is a safe prediction, regardless of the science. Scientists are not fools, and know how to run a business.

Ignacio Gallo said...

Great post thanks!

I don't think that the debate about what could be the "evolutionary significant" amount of function is just a waste of electrons, though...

I understand when you say that it won't lead very far as such, but it's a good reminder of the fact that the biochemical version of fitness is just a rough indicator that scientists use for something much deeper.

I actually wouldn't oppose considering big projects like ENCODE worthwhile just in view of the increase in public awareness on the fundamental aspects of biology which they cause, big exercises in PR. Though of course I wouldn't mind if extreme poverty and hunger were dealt with first.

But then, if one sees the gargantuan project at least in part as a social exercise, it's actually quite healthy that some electrons are spent in playing devil's advocate.

Maybe if a few more electrons had been sacrificed, Angelina Jolie would've spared her twins... (just joking there, I really have no clue how warranted that was :) )

Ken Weiss said...

I guess I'd just say that money spent on huge generic Big Science studies is money unavailable to individual investigators who might have innovative ideas. And when many people can't afford a home or health care, it's not, in my mind, justifiable to spend so many electrons on 'public education'.

If the data, despite the political issues of Big Science, lead people to think more carefully about causal concepts in genetics, then that might be good.

But most of what happens is that such studies superficially invoke some 'theory' (in the Encode accusation, that they used naive concepts of function), draw some conclusion, trumpet it to the media.....and the public you would like to see educated (even the public of scientists) get propagandized as much as educated.

Sometimes the controversy does lead to new thinking, and the Encode controversy may be such a case. But we'll see. The gravitational pull of business as usual is very strong.