...but geneticists make them more often. A Comment in this week's Nature, "Methods: Face up to false positives" by Daniel MacArthur, and an accompanying editorial, are getting a lot of notice around the web. MacArthur's point is that biologists are too often too quick to submit surprising results for publication, and scientific journals too eager to get them into print -- much more eager than they are about studies reporting results everyone expected.
This is all encouraged by a lay press that trumpets these kinds of results often without understanding them and certainly without vetting them. Often results are simply wrong, either for technical reasons or because statistical tests were inappropriate, wrongly done, incorrectly interpreted or poorly understood. The evidence of this is that journals are now issuing many more retractions than ever before.
Peer review catches some of this before it's published, but not nearly enough; reviewers are often overwhelmed with requests and don't give a manuscript enough attention or sometimes aren't in fact qualified to do so adequately. And journal editors are clearly not doing a good enough job.
But, as MacArthur says, "Few principles are more depressingly familiar to the veteran scientist: the more surprising a result seems to be, the less likely it is to be true." And, he says, "it has never been easier to generate high-impact false positives than in the genomic era." And this is a problem because
Flawed papers cause harm beyond their authors: they trigger futile projects, stalling the careers of graduate students and postdocs, and they degrade the reputation of genomic research. To minimize the damage, researchers, reviewers and editors need to raise the standard of evidence required to establish a finding as fact.

It's, as the saying goes, a perfect storm. The unrelenting pressure to get results that will be published in high-impact journals, and then The New York Times -- which can make a career, i.e., get a post-doc a job, or any researcher more grants, tenure, and further rewards -- combined with journals' drive to be 'high-impact' and newspapers' need to sell newspapers, all discourage time-consuming attention to detail. And, as a commenter on the Nature piece said, in this atmosphere "any researcher who [is] more self-critical than average would be at a major competitive disadvantage."
That time-consuming attention to detail would include checking and rechecking data coming off the sequencer, questioning surprising results and redoing them, driven by the recognition that even the sophisticated technology biologists now rely on for the masses of data they are analyzing can and does make mistakes. Which is why sequencing is often done 30 or more times before it's deemed good enough to believe. But doing it right takes money as well as time.
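To see why redundant sequencing helps, here is a toy calculation, assuming independent reads and a simple majority vote per base. Real variant callers use more sophisticated probabilistic models, and sequencing errors are not fully independent, so this is only an illustration of the principle that repetition drives the chance of a consensus error down dramatically:

```python
from math import comb

def miscall_prob(error_rate, coverage):
    """Probability that a simple majority vote over independent reads
    calls the wrong base, given a per-read error rate.
    (Toy model: real base-calling is more sophisticated than this.)"""
    # The consensus is wrong only if more than half the reads are erroneous.
    k_min = coverage // 2 + 1
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate)**(coverage - k)
        for k in range(k_min, coverage + 1)
    )

print(miscall_prob(0.01, 1))    # a single read: 1% chance of a wrong call
print(miscall_prob(0.01, 30))   # 30x coverage: the chance becomes vanishingly small
```

The numbers are illustrative, but the shape of the result is not: even a mediocre per-read error rate becomes negligible after 30-fold redundancy, which is why the extra money and time are worth it.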
And a healthy skepticism (which we blogged about here), or, as the commenter said, some self-criticism. You don't have to work with online genomic databases very long before it becomes obvious -- at least to the healthy skeptic -- that you have to check and recheck the data. Long experience in our lab with these databases has taught us that they are full of unretracted sequence errors, annotation errors, incorrect sequence assemblies and so on. And results based on incorrect data are published and not retracted, though the errors are plain to, again, the healthy skeptic who checks the data. MacArthur cautions researchers to be stringent with quality control in their own labs, which is essential, but they also need to be aware that publicly available data are not error-free, so results from comparative genomics must be approached with caution as well.
We've blogged before about a gene mapping study we're involved in. We've approached it as skeptics, and, we hope, avoided many common errors that way. This of course doesn't mean that we've avoided all errors, or that we'll reach important conclusions, but at least our eyes are open.
But just yesterday we ran into another instance of why that's important, and of how insidious database errors can be. We are currently characterizing the SNPs (variants) in genes that differ between the strains of mice we're studying, to try to identify which ones are responsible for the morphological differences between the strains.
The UCSC genome browser, an invaluable tool for bioinformatics, can show in one screen the structure of a gene of choice for numerous mammals. One of the ways a gene is identified is by someone having found a messenger RNA 'transcript' (copy) of the DNA sequence; that shows that the stretch of DNA that looks as if it might be a gene actually is one. We were looking at a gene that our mapping has identified as a possible candidate of interest and noticed that it was much, much shorter in mice than in any of the other mammals shown. If we had just asked the database for mouse genes in this chromosome region, we'd have retrieved only this short transcript. We might have accepted this without thinking and moved on, but this is a very unlikely result given how closely related the listed organisms are, so we knew enough to question the data.
But we checked the mouse DNA sequence and other data and, sure enough, longer transcripts corresponding more closely to what's been reported in other mammals have been reported in mice. And additional parts of the possible gene, corresponding to what is known to be in other mammal transcripts, also exist in the mouse DNA. This strongly suggests that although nobody has yet reported the full-length transcript in mice, it most likely exists and is used. Thus, variation in the unreported parts of the mouse gene might be contributing to the evidence we found for an effect on head shape. But it took knowledge of comparative genomics and a healthy skepticism to figure out that there was something wrong with the original data as presented.
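This kind of cross-species sanity check can even be roughed out in code. The sketch below uses invented transcript lengths (not real numbers from any database) and simply flags any species whose annotated transcript is far shorter than the median across species -- the kind of cue that should send a skeptic back to the raw sequence rather than straight into an analysis:

```python
# Hypothetical annotated transcript lengths (bp) for one gene across species.
# These values are invented for illustration, not taken from any real database.
transcript_lengths = {
    "human": 4200,
    "chimp": 4150,
    "dog": 4000,
    "rat": 3900,
    "mouse": 800,   # suspiciously short annotation
}

def flag_outliers(lengths, ratio=0.5):
    """Flag species whose annotated transcript is much shorter than the
    median across species -- a cue to recheck the annotation by hand."""
    values = sorted(lengths.values())
    median = values[len(values) // 2]
    return [species for species, n in lengths.items() if n < ratio * median]

print(flag_outliers(transcript_lengths))  # ['mouse']
```

The threshold and the median are crude choices, deliberately so: the point is not automated curation but a cheap filter that tells you where to apply the expensive, manual skepticism.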
Not a new realization
There is a wealth of literature showing many reasons why first reports of a new finding are likely to be misleading -- either wrong or exaggerated. This is not a matter of dishonest investigators! But it is a matter of too-hasty ones. The reason is that if you search for many things, those that pop out by pure statistical fluke are the ones that are going to be noticed. If you're not sufficiently critical of the possibility that they are artifacts of your study design, and you take the results seriously, you will report them to the major journals. And your career takes off!
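This selection effect -- often called the 'winner's curse' -- is easy to demonstrate with a toy simulation: even when every study measures the same modest true effect, the subset of studies that happens to clear a significance threshold, and so gets noticed and published, systematically overstates that effect:

```python
import random
import statistics

random.seed(1)

true_effect = 0.2      # the same modest real effect, in standard-error units
n_studies = 10_000
threshold = 1.96       # the conventional 5% significance cutoff

# Each simulated "study" observes the true effect plus standard-normal noise.
estimates = [true_effect + random.gauss(0, 1) for _ in range(n_studies)]

# Only the studies whose estimate clears the threshold get "published".
significant = [z for z in estimates if z > threshold]

print(statistics.mean(estimates))    # close to the true effect of 0.2
print(statistics.mean(significant))  # far larger: the noticed results are inflated
```

Nothing here involves dishonesty; every simulated investigator reported exactly what they saw. The inflation comes entirely from which results get noticed.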
A traditional football coach once said of forward passes that there are three things that can happen (incomplete, complete, intercepted) and only one of them is good... so he didn't like to pass. Something similar applies here: if you are circumspect, you may
1. Later have the let-down experience of realizing that there was some error -- not carelessness, just bad luck, or things like limits on the DNA sequencer's ability to find variants in a sample, and so on.
2. Then not get your first Big Story paper, much less the later ones that refine the finding (that is, that acknowledge it was wrong without actually saying so).
3. Worse, if it's actually right but you wait till you've appropriately dotted your i's and crossed your t's, somebody else might find the same thing and report it, and they get all the credit! Whereas if you rush and turn out to be wrong, later data may dampen your results, but nobody remembers the exaggeration: Nature and the NY Times don't retract the story, your paper still gets all the citations (nobody 'vacates' them the way Penn State's football victories were vacated by the NCAA), and you already got your merit raise based on the paper... you win even when you lose!
So the pressures are on everyone to rush to judgment, and the penalties are mild (here, of course, we're not talking about any sort of fraud or dishonesty). Again, many papers and examples exist pointing these issues out, and the subject has been written about time and again. But in whose interest is it to change operating procedures?
Even so, it's refreshing to see this cautionary piece in a major journal. Will it make a difference? Not unless students are taught to be skeptical about results from the very start. And the journals' confessions aren't sincere: Tomorrow, you can safely bet that the same journals will be back to business as usual.