Monday, February 17, 2014

In search of scientific blunders

The current (March 6, 2014) issue of The New York Review of Books includes a review (paywall) by physicist Freeman Dyson of a new book by Mario Livio called Brilliant Blunders: Colossal Mistakes by Great Scientists that Changed Our Understanding of Life and the Universe. The book tells the story of five famous mistakes by giants of science: Charles Darwin, William Thomson (Lord Kelvin), Linus Pauling, Fred Hoyle, and Albert Einstein. Dyson writes that after reading the book, he now sees mistakes by great scientists everywhere: Isaac Newton, James Clerk Maxwell, Gregor Mendel, even himself. The thesis is that these great and greatly influential thinkers made major mistakes, but because they were thoughtful about them and had good reasons relative to then-current theory, the price in reputation that they paid, or that society paid, was slight (unlike the price of military or other real-world blunders).

Darwin ignored Mendel and Mendel ignored Darwin.  But who cares?  They did so despite vastly important work that, well, that worked.  In a sense, their mistakes reflect the boldness of their contributions.  Dyson argues that "The greatest scientists are the best losers."

In contrast to these bold thinkers, most modern scientists, caught in the current system, are, we think, pressured to play safe, and too often that's just what they do. We have major problems facing us in terms of understanding biological evolution and causation and, as we've recently written, we personally think the name of the game today is generally just that: play safe, don't make blunders, because if you do, your continued success in the system is at risk. No tenure, no grants, no promotions. (Note, we say 'blunders', that is, mistakes or wrong guesses even if for good reasons, but not fraud or other culpable misdeeds.)

We've tried to take the statistical basis of modern evolutionary and genomic inference to task for clinging to conservative, established modes of thought rather than addressing what we believe are genuine, deep problems that are not (we feel) amenable to multivariate sampling statistics and their kin.

The proof isn't in the pudding!
The current peer review system is supposed to be a quality control system for grants and publications, so that work is judged fairly on merits alone.  Of course like all large institutional systems it has its flaws.  Some are easy trades for fairness rather than Old Boy-ism, but the system also very strongly and clearly leads to safe conventionalism.  It can accommodate real innovation, but it makes that hard.  But we have at least lived under the illusion that citation counts are an accurate recognition of quality.

Now, a new commentary in Science ("Peering into Peer Review," Jeffrey Mervis) reports that this doesn't seem to be so. It describes two studies of the citation records of papers from nearly 1500 projects funded by the National Heart, Lung, and Blood Institute of the NIH between 2001 and 2008. At least among those that were funded, there is no sign that a better priority score meant higher impact; the funded projects with the poorest review scores were cited as many times as those with the highest scores. Michael Lauer, head of the Division of Cardiovascular Sciences at the NHLBI, who carried out the studies, was surprised.
"Peer review should be able to tell us what research projects will have the biggest impacts," Lauer contends.  "In fact, we explicitly tell scientists it's one of the main criteria for review.  But what we found is quite remarkable.  Peer review is not predicting outcomes at all.  And that's quite disconcerting."
In recent years, 8-10% or so of grant applications have been funded, so even grants with the lowest priority scores are in the top tier, and it's no surprise if they are all being cited equally. But, as Mervis points out, the fraction of grants funded was much higher in the early years of these studies, and presumably, if peer review ranking of applications is at all meaningful, there should have been quality differences between the highest and lowest ranked grants. Another factor that probably should have affected the quality of results is that the lowest ranked grants generally received less funding.

If citation counts actually reflect on quality, this indicates that a whole lot of good research is not being funded these days -- which of course is no surprise.  However, this study highlights something else about citations, too, if in an oblique way: we're not talking about a lot of citations.  According to the commentary, each publication, whether from a grant that had a top, middle or lower tier priority score, was cited 13 or 14 times.  To us, that's not high impact.  Perhaps the equal paucity of citations should be of more concern than that high and low quality grants are equally cited.  Perhaps this shows the unsurprising arbitrary aspect of grant ratings and reviews, but it also shows something that not many people talk about, which is that most publications are mostly ignored.  This isn't a surprise either, given the number of papers published every year, but it should probably be more sobering than it seems to be.

Or could something beyond these issues be going on?

There is more than bad luck here--much more!
Perhaps if we were really funding scientific quality--innovation, creative thinking, and really bellying up to core, serious problems--we should expect a strong negative correlation between priority ranking and citations. The top scores should go to deep, creative proposals that, almost by nature, are likely to fail. These would not be crank or superficial proposals, but seriously good probing of the most important (and hence truly highest priority) research. Most of us know very well that being really creative is a sure way to not be funded, and that is widely (if privately) acknowledged: write a safe, incremental proposal, promise major results even if you know that's unlikely, and then use the money to do something you really think is new.

If not a strong negative correlation overall, at least there should be a few really new findings that got lots of citations and recognition, while most top-scoring projects went nowhere. That, at least, could be an indicator that we're funding the best real science.

One wonders how, or if, the Large Hadron Collider would have reported their very large experiment if they had not found evidence for the Higgs boson--or, are they forcing a positive interpretation out of their statistical data? We can't judge, and the power of wishful thinking is always there; we think a clear negative would have been reported as such.

One can say that much of today's science is going nowhere or littlewhere (GWAS and much else), though the 'negative' results are often safe, highly publishable and lauded results that a big-data study is guaranteed to find. The first few times no real map results were achieved, and we saw that polygenic traits really turned out to be polygenic, we might have had a major positive finding: a theory, even if a very generic one, was confirmed. Of course, a GWAS may make a big strike, but more typically, instead of the authors saying 'our GWAS basically found nothing, showing that our idea was wrong and some other idea is needed', even a big strike-out is made into a big story as if there is a real lesson learned; minor hits are lauded. Indeed, empty big data results are usually not viewed as the anathema of a failed theory, but are highly published and publicized to justify even bigger studies of the same type.

The kind of busts that regularly result in the safe market that predominates today are not really tests of any theory, and so do not come up with definitively empty results. They are not high quality blunders. And even among these top-ranked, generally safe studies, priority scores show no relationship to citations.

If what we want is safe science, in areas where we just need routine work on some new project, even if it's important such as a universal flu vaccine or data on obesity and disease, then we could establish an Engineering Science funding pool and build a firewall between that highly relevant but specific goal-oriented work, and basic science. Nominally, NSF should be the place for the latter and NIH for the former, but that's not how things are today, because both have become the social welfare net for universities and their faculty. Good work does get done, but a lot of trivial work is supported even if it's of good technical quality, and the pressure is to do that, while actual innovation is in a sense punished by the conservative nature of the system.

It would be terrific if the very best science tested real ideas that went nowhere.  Then we could encourage development that would lead to more blunders, and the correlated emergence of more Darwins and Einsteins.


  1. "Mendel ignored Darwin"

    I remember reading somewhere that Darwin never heard of Mendel (too insignificant pea-grower), but Mendel was keenly aware of Darwin's theory.

    1. Darwin had a book that referred to Mendel's work but either didn't cut the pages or did not seem to have been aware of it. Origin was quickly (and, I've read, not well) translated, and Mendel heavily annotated it but didn't seem to think it relevant. Mendel was studying inherited elements that didn't change, but of course Darwin's ideas were all about change.

  2. We have a blog post today on the citation, grant and other issues related to one sub-field and one specific ENCODE researcher. It is built on top of Lior Pachter's trio of commentaries (you can find them from our sidebar).