Friday, July 27, 2012

Genomic scientists wanted: Healthy skepticism required

Everyone makes mistakes
...but geneticists make them more often.  A Comment in this week's Nature, "Methods: Face up to false positives" by Daniel MacArthur, and an accompanying editorial are getting a lot of notice around the web.  MacArthur's point is that biologists are too often too quick to submit surprising results for publication, and scientific journals too eager to get them into print, much more eager than they are to print studies reporting results that everyone expected.

This is all encouraged by a lay press that trumpets these kinds of results often without understanding them and certainly without vetting them.  Often results are simply wrong, either for technical reasons or because statistical tests were inappropriate, wrongly done, incorrectly interpreted or poorly understood.  The evidence of this is that journals are now issuing many more retractions than ever before.

Peer review catches some of this before it's published, but not nearly enough; reviewers are often overwhelmed with requests and don't give a manuscript enough attention or sometimes aren't in fact qualified to do so adequately.  And journal editors are clearly not doing a good enough job.

But, as MacArthur says, "Few principles are more depressingly familiar to the veteran scientist: the more surprising a result seems to be, the less likely it is to be true."  And, he says, "it has never been easier to generate high-impact false positives than in the genomic era." And this is a problem because
"Flawed papers cause harm beyond their authors: they trigger futile projects, stalling the careers of graduate students and postdocs, and they degrade the reputation of genomic research. To minimize the damage, researchers, reviewers and editors need to raise the standard of evidence required to establish a finding as fact."
It's, as the saying goes, a perfect storm.  The unrelenting pressure to get results that will be published in high-impact journals, and then The New York Times, which can make a career -- i.e., get a post-doc a job or any researcher more grants, tenure, and further rewards -- combined with journals' drive to be 'high-impact' and newspapers' need to sell newspapers, all discourage time-consuming attention to detail.  And, as a commenter on the Nature piece said, in this atmosphere "any researcher who [is] more self-critical than average would be at a major competitive disadvantage."

That time-consuming attention to detail would include checking and rechecking data coming off the sequencer, questioning surprising results and redoing them, driven by the recognition that even the sophisticated technology biologists now rely on for the masses of data they are analyzing can and does make mistakes.  Which is why sequencing is often done 30 or more times before it's deemed good enough to believe.  But doing it right takes money as well as time.
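To get a rough feel for why repeated reads help, here is a minimal sketch. It assumes, purely for illustration, that each read miscalls a given base independently with the same probability, that every wrong read agrees on the same wrong base (a worst case), and that the consensus is a simple majority vote. The function name and the 1% error rate are our own placeholders, not a description of any real sequencing pipeline.

```python
from math import comb


def consensus_error_probability(depth: int, per_read_error: float) -> float:
    """Chance that a majority vote over `depth` reads calls one site wrong.

    Assumes independent reads with identical error rates; ties are
    counted as failures, so this is a deliberately pessimistic estimate.
    """
    min_wrong = (depth + 1) // 2  # smallest number of wrong reads that sinks the vote
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


if __name__ == "__main__":
    # 0.01 is an illustrative per-read error rate, not a measured one.
    for depth in (1, 3, 10, 30):
        print(depth, consensus_error_probability(depth, 0.01))
```

Under these assumptions the chance of a wrong consensus call drops very quickly with depth. Real errors are not independent, though (systematic machine or alignment problems repeat across reads), which is one more reason depth alone is no substitute for skepticism.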

Skepticism required
And a healthy skepticism (which we blogged about here), or, as the commenter said, some self-criticism.  You don't have to work with online genomic databases very long before it becomes obvious -- at least to the healthy skeptic -- that you have to check and recheck the data.  Long experience in our lab with these databases has taught us that they are full of sequence errors that aren't retracted, annotation errors, incorrect sequence assemblies and so on.  And results based on incorrect data are published and not retracted, even though the errors are very obvious to, again, the healthy skeptic who checks the data. MacArthur cautions researchers to be stringent with quality control in their own labs, which is essential, but they also need to be aware that publicly available data are not error-free, so that results from comparative genomics must be approached with caution as well.

We've blogged before about a gene mapping study we're involved in.  We've approached it as skeptics, and, we hope, avoided many common errors that way.  This of course doesn't mean that we've avoided all errors, or that we'll reach important conclusions, but at least our eyes are open.

But just yesterday we ran into another instance of why that's important, and how insidious database errors can be.  We are currently characterizing the SNPs (variants) in genes that differ between the strains of mice we're looking at to try to identify which are responsible for morphological differences between them.

The UCSC genome browser, an invaluable tool for bioinformatics, can show in one screen the structure of a gene of choice for numerous mammals.  One of the ways a gene is identified is by someone having found a messenger RNA 'transcript' (copy) of the DNA sequence.  That shows that the stretch of DNA that looks as if it might be a gene actually is one.  We were looking at a gene that our mapping has identified as a possible candidate of interest and noticed that it was much, much shorter in mice than in any of the other mammals shown.  If we had just asked the database for mouse genes in this chromosome region, we'd have retrieved just this short transcript.  We might have accepted this without thinking and moved on, but this is a very unlikely result given how closely related the listed organisms are, so we knew enough to question the data.

So we checked the mouse DNA sequence and other data and, sure enough, longer transcripts corresponding more closely to what's been reported in other mammals have been reported in mice.  And additional parts of the possible gene, which correspond to what is known to be in other mammal transcripts, also exist in the mouse DNA.  This strongly suggests that the full-length transcript simply hasn't been reported for the mouse, but that it most likely exists and is used by mice.  Thus, variation in the unreported parts of the mouse gene might be contributing to the evidence we found for an effect on head shape. But it took knowledge of comparative genomics and a healthy skepticism to figure out that there was something wrong with the original data as presented.
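The kind of sanity check described here can be roughed out in a few lines. This is only a sketch under our own assumptions: the function name, the 50% cutoff, and the transcript lengths are all invented placeholders for illustration, not real annotation data and not the procedure we actually followed.

```python
from statistics import median


def flag_short_transcripts(lengths_bp: dict, min_fraction: float = 0.5) -> list:
    """Return species whose transcript looks suspiciously short.

    A species is flagged when its transcript length is below
    `min_fraction` of the median length across the *other* species.
    """
    flagged = []
    for species, length in lengths_bp.items():
        others = [v for s, v in lengths_bp.items() if s != species]
        if others and length < min_fraction * median(others):
            flagged.append(species)
    return flagged


if __name__ == "__main__":
    # Placeholder lengths in base pairs, made up for this example.
    example = {"mouse": 900, "rat": 3100, "human": 3300, "dog": 3200, "cow": 3000}
    print(flag_short_transcripts(example))  # prints ['mouse']
```

A flag like this proves nothing by itself; genes really do differ between species. It only tells you where to go back and look at the underlying sequence before trusting the annotation.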

Not a new realization
There is a wealth of literature showing many reasons why first-reports of a new finding are likely to be misleading--either wrong or exaggerated. This is not a matter of dishonest investigators! But it is a matter of too-hasty ones. The reason is that if you search for things, those that by pure statistical fluke pop out are the ones that are going to be noticed. If you're not sufficiently critical of the possibility that they are artifacts of your study design, and you take the results seriously, you will report them to the major journals. And your career takes off!
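The arithmetic behind the "statistical fluke" problem is simple enough to sketch. The example below assumes, for illustration only, that every test is independent and that there is no true effect anywhere; the function names and the figure of one million tests are our own placeholders.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when no effect is real."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests: int, alpha: float) -> float:
    """Probability that at least one independent null test passes by luck."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-test cutoff that keeps the overall false-positive chance near alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    num_tests = 1_000_000  # hypothetical number of variants tested
    print(expected_false_positives(num_tests, 0.05))
    print(chance_of_any_false_positive(20, 0.05))
    print(bonferroni_threshold(num_tests, 0.05))
```

Even with just 20 tests at the usual 0.05 cutoff, the odds favor at least one fluke, and with a million tests you expect tens of thousands. Stricter thresholds such as the Bonferroni correction address this part of the problem, but not artifacts of study design.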

A traditional football coach once said of forward passes that three things can happen (incomplete, complete, intercepted) and only one of them is good, so he didn't like to pass. Something similar applies here: If you are circumspect, you may

1. later have the let-down experience of realizing that there was some error--not carelessness, just aspects of luck or things like problems with the DNA sequencer's ability to find variants in a sample, and so on.

2. Then you don't get your first Big Story paper, much less the later ones that refine the finding (that is, acknowledge it was wrong without actually saying so).

3. Worse, if it's actually right but you wait until you've appropriately dotted your i's and crossed your t's, somebody else might find the same thing and report it, and they get all the credit!

If you rush instead, you may be wrong, and later data may dampen your results, but nobody remembers the exaggeration, Nature and the NY Times don't retract the story, your paper still gets all the citations (nobody 'vacates' them the way Penn State's football victories were vacated by the NCAA), and you already got your merit raise. You keep the rewards of the win even when you lose!

So the pressures are on everyone to rush to judgment, and the penalties are mild (here, of course, we're not talking about any sort of fraud or dishonesty). Again, many papers and examples exist pointing the issues out, and the subject has been written about time and again. But in whose interest is it to change operating procedures?

Even so, it's refreshing to see this cautionary piece in a major journal. Will it make a difference? Not unless students are taught to be skeptical about results from the very start. And the journals' confessions aren't sincere: Tomorrow, you can safely bet that the same journals will be back to business as usual.


Holly Dunsworth said...

Is there a good, go-to repository on-line for posting confirming (but not new) results that people don't send to journals but that establish and reestablish fact over and over with maybe differing labs, machines, human subjects, human researchers? Is that what these genetic databases are? Or do their data have to go through journal peer-review first too?

Ken Weiss said...

NIH has tried to have _negative_ results reported by mandating reporting of results of all clinical trials. I heard directly not too long ago that Science (that august science magazine soon to be at a checkout counter near you) doesn't print rebuttals of its papers. They sometimes bury them in some online site. I can't remember if the arsenic evolution paper was so ludicrous that they published refutations, but I think those were published in Science.

But confirmatory results rarely would be published by themselves (too boring!), though relevant types of papers get to appear in the minor journals that libraries have to subscribe to but that nobody reads.

There need to be stronger criteria for major findings to account for the various biases that lead to their over-interpretation, and less hyping of the results.

The problem is that in the frenetic world today, the journals have no real incentive to be tempered, and they (and the 24/7 media) are so hungry for stories that we'll not see this being addressed.

Anne Buchanan said...

Good question, Holly. There are or have been various databases and even journals for publishing negative results, which most journals are rarely willing to publish but which are obviously important. The paper last year in the reputable psychology journal showing that ESP is a real phenomenon is a case in point -- at least 2 studies unable to confirm those results were rejected by the same journal.

But negative results aren't the same as replicating or confirmatory results. These are often published in 'lesser' journals. But as for a repository, I don't know of any -- which doesn't mean there aren't any. The genomics databases are open-ended and people can publish anything, including duplicate data, in those. But this doesn't address the problem of errors in the data.

Submissions to the genetics databases are reviewed before they are published, but sequences aren't replicated then and catching sequencing errors is up to the scientist submitting the data.

The question of replicating results in genetics or epidemiology is not always straightforward though. Study populations can never be completely replicated, so non-replication can mean the data are wrong, due to technical errors or poor study design, say, or the data might be correct but the failure to replicate might be due to the inclusion of different subjects. So, 'gene for' studies very commonly don't replicate previous ones. All the studies might accurately represent their study populations but the 'truth' is different in each one. So failure to replicate can be as informative as replication. Or not.

Ken Weiss said...

Anne makes another, if much deeper, point. We want to use Enlightenment criteria of replicability, which is the underlying rationale for study designs and statistical analysis. Yet we have theory in both evolution and genetic mechanism that clearly shows why replicability is a problematic concept here.

There is no real incentive to absorb that awareness. Eventually, we'll have to come to grips with the fact that causes can be real but not statistically documentable by the usual criteria.

This aspect of causal complexity is also a, if not the, major implication of evolution.

Some clever or lucky person will one day provide a better way to think. Today, we know the problem but choose to ignore it, preferring to chase rainbows. A fraction of biological causation is due to strong effects--that, too, is expected from evolutionary principles and what we know of genetic mechanisms.

So, whenever we have a substantial success, we proclaim it from the Journal-tops as if it's revolutionary, ignoring the majority of effort that isn't very successful. We do that in part for understandable venal reasons, but never ask whether the same investment of funds, talent, and effort might have had greater achievements.

Daniel said...

Hi Anne,

Thanks for the write-up of the piece - glad it's causing such constructive discussion!

I have just one minor correction:

"...but geneticists make them more often."

I don't actually think that's true. Genomics makes for easy pickings because (1) nearly all raw data is made public in the field, and (2) there's an active community of people willing to reanalyze and critique published findings. However, I suspect the underlying rates of error are similar or substantially higher in other fields - for instance, cell biology is a boiling mess of unreplicated, small-scale, hypothesis-driven studies, many or most of which are based on experimental and statistical noise.

Indeed, part of the reason I focused on genomics is because I'm optimistic that we can actually go some way towards fixing the problems in our field. For cell biology, on the other hand, I have less hope. :-)

Anne Buchanan said...

Thanks for your comment, Daniel. But ack, caught out on my flip lead-in! I agree with you, and certainly hope I didn't imply that this was what you'd said in your piece. Public databases are indeed a plus in genomics, being open to correction and so on, but as our post said, they should be approached with healthy skepticism.

Yes, epidemiology is another field with a high error rate, much of it due to statistical noise but also a fundamental difficulty dealing with complexity. Among other things. Public databases would be a plus!