Thursday, September 6, 2012

More advertising, yet a lesson to learn

The HypeMachine is out in force behind the 30 new publications, accessible via Nature's "ENCODE explorer", that make up the ENCODE show.  It's been 10 years, at least, since Nature, Science, and other journals started staging press-release, Big Splash, orchestrated displays of Surprise! over their latest magnificent special issues.  Well, it's how business works--and this is business, after all, despite the attempt to make it appear simply the quest for knowledge.

The media pick up on it, of course--you can't miss it today--and it's no different from the gala celebrations at the political conventions.  Some journalists understand the issues better than others, but few of them are doing more than reporting the hype.  American life has become one big Potemkin village: all show, with substance a lot harder to find.  So what is this ENCODE report?

ENCODE is a mega-project designed to use various kinds of data and methods to identify which parts of selected segments of the genome have a function, and which do not.  Data from sequencing, gene expression, RNA detection, functional experiments, and so on are combined into a set of digested reports.  You can also explore various aspects of the data yourself on the ENCODE site.

So, besides the clear money-seeking hoopla, what is actually being found?  First, we are told that the term 'junk DNA' is a relic of the past.  That term dates from when we 'knew' that DNA was a string of nucleotides, like a necklace, along which some protein-coding sections were spaced.  The history of research was driven largely by exploiting Mendel's finding of discrete, functional heritable units, which were later called 'genes'.

Then we learned that some of the proteins coded by those units, the genes, served to control the use of other genes, depending on their cell context.  They did this by recognizing short DNA sequences on the chromosome, roughly near the gene to be regulated.  These proteins are called transcription factors (TFs).  That showed how cells differentiate, but also that some DNA was 'regulatory': it didn't code for protein, but instead served as a recognition site for these TFs.  So it was functional, but not coding.

Step by step, over a period of 20 or so years, many other kinds of function have been found.  Read the report and summaries if you want to know what these functions are.  But essentially, they relate to gene usage, repression, or modification; to chromosome copying and packaging; or to other such functions.

Not an idle nucleotide?
These findings confirm our idea that, at its base, DNA controls protein production and perpetuates itself, but through many subtle, complex functional roles.  Many roles are still to be discovered, but a much larger fraction of the genome is now found to have some function, and the protein-coding parts have been shown to make up only a few percent of our DNA.  Whether ENCODE has actually determined the function of 80% of the genome, as it now claims, is debatable.

For example, introns are non-coding stretches of DNA that interrupt the coding parts of genes.  A gene codes for a protein essentially in sections: stretches of DNA with amino acid codes, separated by stretches that don't code.  All of this is transcribed to RNA, but the non-coding parts are spliced out and the coding parts joined together.  ENCODE descriptions, in stretching for their 80% figure, include introns because some nucleotides in some introns have clearly been shown to have functions.  The important point is that introns are not all just inert DNA, but the excess lies in implying that all of it matters.

Around 40% or more of the genome is made of various types of repeat elements: short segments of DNA whose copies or near-copies are found adjacent to each other or splattered all over our chromosomes.  They get inserted from time to time by various molecular means.  A few of these repeat elements have been shown to have function, and the location and size of various clusters of them may have function as well.  But most have no plausible function in the proper sense of the term.  The cell may have to deal with them, though even that is not generally clear, and having to deal with them doesn't mean that they actually do anything.

So there is a bit, or a lot, of exaggeration.  Still, the point is valid that more is going on in genomes than had been thought.  Whatever the percentage of functional DNA actually is, documenting that function is fine, but much of it has been discovered gradually over the past 20 years or so.  So the hoopla is very, very misplaced.  The ENCODE database is going to be a useful, convenient web tool and so on.  It's great to have new knowledge, and new websites compiling and summarizing such knowledge, but it's not all newly discovered.  And, equally important, some of the functions are not very well understood and much guessing is still going on.  And 'function' is being conveniently redefined to self-justify and puff up the importance of the work.  All this, too, would be fine if fully acknowledged and, especially, if it weren't for the self-importance being proclaimed.  And the constant dropping of hints that this will lead to disease cures, like claims that Mars missions are justified because there might be life out there, has to be viewed mainly as advertisement: a sales pitch.

A deeper issue: evolutionary considerations
Nobody wants to think about some of the deeper issues.  One of them is that biological function basically arises because it is advantageous to reproductive success (evolutionary fitness in the face of natural selection), or is removed if it's harmful.  So if all of this DNA has function, it must be screened by natural selection.  That is easy to say but very hard to understand.  Why?

First, the more DNA that has function screened by selection, the more excess reproduction is needed to overcome that selection, so that a species--ours or any other--doesn't just go extinct because everybody has bad bits of DNA somewhere in their genome.  This is called genetic load.  Obviously we humans don't hatch out thousands of babies per couple just so that, on average, each couple leaves two or more surviving children.
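To put rough numbers on the load argument, here is a back-of-the-envelope sketch. The figures are assumptions for illustration, not ENCODE results: a haploid genome of about 3.1 billion bases, a mutation rate of roughly 1.2 x 10^-8 per site per generation, and the deliberately pessimistic simplification that every new mutation landing in 'functional' DNA is at least mildly harmful. It uses the classic Haldane-Muller approximation that, with multiplicative fitness, mean fitness at mutation-selection balance is about e^-U, where U is the number of new deleterious mutations per zygote, regardless of how harmful each one is.

```python
import math

# Illustrative assumptions, not figures from the ENCODE papers:
HAPLOID_GENOME_BP = 3.1e9   # approximate size of the haploid human genome
MUTATION_RATE = 1.2e-8      # assumed new mutations per site per generation


def deleterious_mutations_per_zygote(functional_fraction):
    """Expected new mutations per diploid zygote that land in 'functional' DNA.

    Pessimistically treats every such mutation as at least slightly harmful.
    """
    return 2 * HAPLOID_GENOME_BP * MUTATION_RATE * functional_fraction


def mean_fitness(u):
    """Haldane-Muller approximation: mean fitness ~ exp(-U) at mutation-selection
    balance with multiplicative fitness, whatever the effect size of each mutation."""
    return math.exp(-u)


def births_per_couple(u):
    """Births each couple needs so that, on average, two offspring survive selection."""
    return 2 / mean_fitness(u)


for fraction in (0.05, 0.10, 0.80):
    u = deleterious_mutations_per_zygote(fraction)
    print(f"{fraction:.0%} functional: U = {u:.1f}, births per couple ~ {births_per_couple(u):.3g}")
```

The exact figures matter less than the shape of the curve: because the required births grow exponentially with U, moving the functional fraction from a few percent toward 80% quickly takes the answer from large to absurd.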

In turn, this means that even if 80% of the nucleotides in our DNA (that's about 2.5 billion nucleotides, by the way) are being screened by selection, the selection pressure on each one is, on average, trivially small.  For the vast majority of sites, only in very occasional, rare, bad-luck combinations with other sites will selection screen you up or out.  That in turn means that genetic drift (chance) will dominate in determining the variants each of us carries, and hence the general nature of genomic DNA sequences.  And again in turn, this means that in essence most of this DNA doesn't, for any practical purposes, have function after all, or at least not very specific or important function.  Most species, certainly most large, slow-reproducing species, simply couldn't support so much selective screening.
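The drift point can be illustrated with Kimura's diffusion approximation for the probability that a single new mutation eventually becomes fixed in a population. A minimal sketch, assuming additive (genic) selection with coefficient s and an effective population size of 10,000, a figure often quoted for long-term human history and used here only for illustration: when 4 x Ne x |s| is much smaller than 1, the fixation probability is essentially the neutral value 1/(2N), so chance rather than selection governs the variant's fate.

```python
import math


def fixation_probability(s, n_e, n=None):
    """Kimura's diffusion approximation for the chance that one new mutation fixes.

    s   -- selection coefficient (additive/genic selection; negative = harmful)
    n_e -- effective population size
    n   -- census population size (defaults to n_e); the mutation starts at 1/(2n)
    """
    if n is None:
        n = n_e
    p = 1 / (2 * n)
    if s == 0:
        return p  # neutral case: fixation by drift alone
    x = -4 * n_e * s
    if x > 700:
        # Strongly deleterious: exp(x) would overflow a float; probability is effectively 0.
        return 0.0
    # expm1 keeps precision when 4 * n_e * s is tiny
    return math.expm1(x * p) / math.expm1(x)


N_E = 10_000  # assumed long-term human effective population size
neutral = fixation_probability(0, N_E)
for s in (1e-7, 1e-5, 1e-3, 1e-2, -1e-3):
    ratio = fixation_probability(s, N_E) / neutral
    print(f"s = {s:g}: 4*Ne*s = {4 * N_E * s:g}, fixation relative to neutral = {ratio:.3g}")
```

Spreading a fixed amount of selection across billions of sites pushes the per-site s toward the top of that list, where its fate is almost indistinguishable from a neutral variant's.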

There are ways for DNA to have evolved a lot of distributed function over the eons, even if, in any individual like you or me, the vast majority has essentially no function (or rather, its variation has essentially no important effect on the differences between you and me).  But this is far from the strong assertions of function that are being made.

Reporters and bloggers should take off their rose-colored hype-believing glasses, and understand this.

The important bottom line
Despite all these reservations, a deeper understanding of genomic content is important both for basic biology and for its practical application.  What it all really means is something that was not wanted and not really predicted, even though we had good reason to predict it.  It's that each new discovery shows that genetic control of what an organism is and how it works (and, hence, how organisms evolve) is more complex, and less about single, deterministic causes, than Mendel's work had led us to think.  He showed us the extremes of simplicity through carefully chosen experiments, and that set out a path we could follow for discovery.

But with this work, complexity always grows; it never diminishes.  It shows why the promised simple cures for disease--always the hope dangled before the drooling public to keep the till open--may be more elusive than advertised, at least if approached from a genetic-causal viewpoint.  Part of the problem is that all of these functions are not simply additive: you can't get to net function (how you actually are as an organism) just by adding up all the little variant contributions in 80% of your genome.

Genes and the various genome functions interact.  They form networks and complexes of contribution to final traits that are not just additive.  Indeed, identifying how they are not additive is a major challenge. Some aspects of genome function are obviously strong and clearly understandable by usual scientific approaches.  Some are close enough to being additive that we can assume that.  But we know that much, probably most, is not so easy.
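A toy example of why adding up per-site contributions can miss what genotypes actually do. It is a deliberately extreme, hypothetical case (two loci, a trait expressed only when exactly one of them carries the variant, every genotype equally common), not a model of any real trait: each locus matters completely, yet on average neither has any effect by itself.

```python
from itertools import product
from statistics import mean


def trait(a, b):
    # Hypothetical trait value: 1.0 when exactly one locus carries the variant (XOR-type epistasis)
    return 1.0 if a != b else 0.0


# Balanced toy design: every two-locus genotype appears once (0 = reference, 1 = variant)
GENOTYPES = list(product((0, 1), repeat=2))
VALUES = [trait(a, b) for a, b in GENOTYPES]


def marginal_effect(locus):
    """Average change in the trait when one locus carries the variant, averaged over the other."""
    with_variant = mean(v for g, v in zip(GENOTYPES, VALUES) if g[locus] == 1)
    without_variant = mean(v for g, v in zip(GENOTYPES, VALUES) if g[locus] == 0)
    return with_variant - without_variant


def additive_r_squared():
    """Share of trait variance explained by simply adding up per-locus effects.

    In this balanced design the least-squares additive fit is the grand mean plus
    each marginal effect times the centred genotype code.
    """
    grand = mean(VALUES)
    effects = [marginal_effect(0), marginal_effect(1)]
    predicted = [grand + sum(e * (x - 0.5) for e, x in zip(effects, g)) for g in GENOTYPES]
    ss_total = sum((v - grand) ** 2 for v in VALUES)
    ss_resid = sum((v - p) ** 2 for v, p in zip(VALUES, predicted))
    return 1 - ss_resid / ss_total


print("marginal effects:", marginal_effect(0), marginal_effect(1))
print("variance explained by the additive model:", additive_r_squared())
```

Both marginal effects come out as zero and the additive model explains none of the variance, even though genotype fully determines the trait. Real interactions are rarely this stark, but the same arithmetic is why effects estimated one site at a time can understate, or misplace, what the genome is doing.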

Second, because of the specific history of discovery, we have come to think of genome function as linear: things arranged along the line of nucleotides that makes up a single chromosome.  Action occurs in what is called cis, meaning along the chromosome.  But we are learning (and ENCODE presents, but did not discover, this) that there are trans interactions, meaning direct interactions between bits of DNA on different chromosomes as they arrange themselves in the nucleus of cells; and, somehow, they do this differently depending on the cell type and its context at any given time.  This must be important, but there are deep questions.  Genomes have rearranged their segments in many ways during evolution, and yet related species with very similar functions (like, say, all mammals) still manage to function even though their genomes are rather differently arranged and they have different numbers of chromosomes.  So how critical is all of this trans activity and organization?

And there are many other curious things going on that are not mentioned in the current ENCODE papers.  All of it is very interesting, and at least some of it is undoubtedly important.

There are many implications of this growing knowledge, and they all point to a need to come to grips with it in some better, or even fundamentally new, way than we have so far.  This is an important challenge, and the ball to keep your eye on.  As at political conventions, simple stories and promises are spun to win your votes, but big issues are at stake.


rich lawler said...

The genetic load argument is interesting. I also recall that genetic load is greatly diminished if soft selection prevails, which was likely the case in the evolution of humans (since population regulation was likely local and we roamed in smallish groups). Hence, the cost of selection would be weakened even further, even if 80% of the genome was under the scrutiny of selection.

Ken Weiss said...

Yes, and to the degree that traits are controlled by many genes, which is clearly more the rule than the exception, the selection intensity is distributed even more widely.

Humans clearly are viable (till we blow ourselves all up or pave over all arable land), yet we are very slow reproducers, a long-term legacy of our primate ancestry.  And most mammals are slow reproducers too, relative to all of this.

So all of this DNA function may have been put in place gradually, our own genomes reflecting that very, very ancient legacy.  But there is so much variation, and so much genome rearrangement among species, that clearly these many functions are of minimal importance (they tolerate variation).

It is not seriously conceivable that the hundreds of thousands of distributed or repeat elements across the genome, comprising half or more of it, would have 'function' in the sense of all this hyperbole in the media.

Amit said...

Did you see this piece by Mike Eisen? Reading that press release he links to, you would think that NHGRI spent $200M to conclude that junk DNA doesn't exist!

Ken Weiss said...

It is a sad commentary on our advertising-based times. Young people need to raise grass-roots objections, or they (you!) are trapped in decades of playing this Red Queen rat-race just to stay employed.

It is worse than that.  Many scientists know that we stopped talking about junk DNA years and years ago, but many others, especially those in the health and medical sciences, don't know that and have a very cartoonishly naive idea about evolution and genetic architecture.

The real work of science should be done more quietly, slowly, and deliberately, with fewer mega-projects, so that more people, working under the constraints of limited budgets and hence forced to think harder, raise the chance of somebody actually coming up with new ideas.

Holly Dunsworth said...

That so many ENCODE reporters thought that junk DNA needed to be overturned indicates that there may still be a great need to overturn the notion of junk DNA in the popular mind.  Repetition is important with science news and education.  And not all news stories reach all people; non-scientists don't get the exposure that scientists do.  It takes time to overturn old ideas in the popular mind.  I think we should be more patient about that.

Ken Weiss said...

I think reporters, even Gina Kolata who is usually very good, should not be so gullible, and if they're science reporters they ought to learn more before writing about it. But also, this is clearly part of the hype-machinery. The scientists themselves, when talking to the reporters, are probably saying they're dramatically over-turning the idea of 'junk' DNA, because that's a way to promote their work...and they see that reporters eat it up because it makes for a good story line.

So, you are probably right in many ways, but I am far less charitable to the System, which I think is, more or less knowingly, a phenomenon of mutual reinforcement and exaggeration.

As I recently heard in another context, only bad news really gets reported.  In the past, even science reporters for major media clearly said that only 'controversy' sells.  That is what is really going on here: 'overturning', 'revolutionary', 'transformative'... are the kinds of concepts they can write about.

And the journals know this and play the same game. At least, that's my view. You're just too nice a person compared to me!

Holly Dunsworth said...

I don't think it's niceness so much as weighing which of two evils is worse... unseemly antics by egomaniacal ladder climbers don't actually matter compared to the real crisis in public science education.  Creationists LOVE to talk about junk DNA.  Every day another human who has been raised on lies gets closer to voting age.

Ken Weiss said...

A scary prospect! You're right.

Daniel M Parker said...

A question I have about these ‘new’ findings is: Would this hype even work if we weren’t talking about humans?
I think that most of these findings won’t be at all surprising for people who work at the molecular level. I don’t think I’ve heard someone mention ‘junk DNA,’ at least in a serious manner, in quite a while. However, I sometimes think people act a little silly when we start thinking and talking about humans. In the medical literature people get hyped up about those cures that are just around the corner and have a tendency to exaggerate findings too (e.g. reduced drug sensitivity becomes drug resistance in certain journals, etc.) But would this ‘discovery’ have a mass of articles (~30) and front page news if we were talking about a parasite? Or what if it was an organism that has very little or nothing to do with humans?

Ken Weiss said...

Today, no, and it's worse.  It has to be sold as having to do with disease.  Likewise, these days we could no longer spend what we do on space without the promise of finding life out there.

Too bad.  It didn't use to be nearly so bad in this respect.  Science journals, even the big ones, were black and white, with boring covers and no hype to speak of.  It was better that way, for science at least.

But this also has to do with public support for science. Whether that is more or less than decades ago, or whether scientists just always want more, I can't say.

Amit said...

Regarding genetic load, there are so many computational methods for predicting the deleteriousness (for lack of a better term) of amino-acid-changing polymorphisms that I've lost track.  But extending this to non-coding sites seems to be a very open question, especially if deleteriousness depends on genetic background.

Ken Weiss said...

First, the amino acid predictions are based on chemical properties and the like, and suffer from the problem of predicting effects in isolation rather than, as you say, in context.

For non-coding variation, there is currently little way to predict harmfulness.  Even regulatory binding sites tolerate variation.

What it boils down to is that empirically we have to _estimate_ effects from data--that is, retrospectively and in the contexts of our particular sample, because we have no good theory for most of the non-coding sites.

Certainly deleteriousness depends on contexts--including ecology, climate, other members of same or other species, other cells in the body, and so on.

And with so many sites contributing to each trait, and many if not most sites contributing simultaneously to many traits ('pleiotropism'), the problems we face with current methods are clear.

Proper science in the classical sense predicts from theory, but in the absence of adequate theory we are constrained to make inferences of a general statistical kind from limited past data, and to hope they will predict future data.

This is far too vast a topic for this reply, but we've dealt with some of it in past blog posts.

You are sensing, I think, the nature and vastness of the challenge!

John R. Vokey said...

A propos of the larger (largest?) point, at least Science admits it is but a magazine, not a true scientific journal (anymore).  Since it now relegates methods to ephemeral web sources, it can no longer be considered a scientific journal of record.  Nature has yet to admit the same, but we all know it is but a matter of time.

But, I want to raise a smaller (smallest?) point: Science (the magazine) assumes/almost demands that submissions be in MS Word format. Despite the fact that many (most?) real scientists prepare their manuscripts in LaTeX, Science provides at most pitiful/useless TeX/LaTeX/BibTeX submission support. Why do we, as the academic science community, tolerate this? We should shun Science magazine for that reason alone, never mind the fact that it publishes inadequate reports, and has lately become the mag for professional science shills.

Ken Weiss said...

Where did they admit that?  I would like to see it!

I have never used LaTeX, and wonder if there isn't a Word-to-LaTeX translator program available.  In any case, I assume this has to do with the practical aspects of physical publication, that is, how compositors (whatever they're called these days) actually set up the printing machines.  It's not just to treat scientists like secretaries (who use Word because you don't have to know much to type things*).

*Well, actually, I've insulted secretaries: they know a hell of a lot more about how to use Word and all its excessive features than I do!

Anne Buchanan said...

And John, of course your point about LaTeX reflects the larger issues about publication that are yet to be settled, as reflected in the Elsevier boycott and increasing use of arXiv and the push for open access and so on. As long as scientists feel the benefits of publishing in Science are greater than the benefits of having control over the process, they'll keep submitting to Science in Word. And of course Science counts on just that.

John R. Vokey said...

I meant that Science explicitly cites itself as "Science magazine", not a journal.

There are Word-to-LaTeX translators (as well as the reverse), but few of them work all that well.  My point was just how bizarre it is that the leading science publication uses a proprietary (and clumsy) word-processing program to typeset its articles, when the scientists submitting the articles are already using a brilliant, open-source typesetting language/system (i.e., TeX/LaTeX).  For those who are convinced that a Word-like system is the only way to write, there is LyX, an open-source TeX/LaTeX system with a WYSIWYG interface.

Ken Weiss said...

They would not consider themselves to be a 'journal'?  That's a surprise to me, though the truth is that they are a mix of journal and magazine.  And while they publish for a membership organization, and in that sense should not be out pushing credit cards and so on, with all their hyping, they might argue that this is the way to pursue the "Advancement of Science".

I have never used LaTeX, so I may have to give it a look, or more likely explore LyX... except that journals want Word.

It must be that old-style typesetting isn't done any more, and the new machine-editor interfaces are Word based, for better or worse.