Monday, August 14, 2017

The state of play in science

I've just read a new book that MT readers would benefit from reading as well.  It's Rigor Mortis, by Richard Harris (2017: Basic Books).  His subtitle is How sloppy science creates worthless cures, crushes hope, and wastes billions.  One might suspect that this title is stridently overstated, but while it is quite forthright--and its argument well-supported--I think the case is actually understated, for reasons I'll explain below.

Harris, science reporter for National Public Radio, goes over many different problems that plague biomedical research. At the core is the reproducibility problem, that is, the number of claims in research papers that cannot be reproduced by subsequent studies.  This particular problem made the news within the last couple of years, both in regard to the use of statistical criteria like p-values (significance cutoffs) and because of the major effort in psychology to replicate published studies, much of which failed.  But there are other issues.
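The arithmetic behind this is worth seeing concretely. Here is a toy simulation (not from the book; the effect sizes, sample sizes, and the 90%-null assumption are all invented for illustration) of a field where most tested effects are null and real effects are modest, studied with small samples. Filtering on p < 0.05 and then attempting a replication shows how a literature full of "discoveries" can have a dismal replication rate:

```python
import math
import random

random.seed(1)

def p_value(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def run_study(true_effect, n=20):
    """Simulate one two-arm study with n subjects per arm and
    unit-variance outcomes; return its two-sided p-value."""
    se = math.sqrt(2.0 / n)                 # standard error of the mean difference
    estimate = random.gauss(true_effect, se)  # true effect plus sampling noise
    return p_value(estimate / se)

# Hypothetical field: 90% of tested effects are truly null,
# 10% are real but modest (standardized effect 0.3).
effects = [0.0 if random.random() < 0.9 else 0.3 for _ in range(10000)]

# Original studies: keep only the 'significant' findings.
significant = [e for e in effects if run_study(e) < 0.05]
# Independent replication attempt for each 'discovery'.
replicated = [e for e in significant if run_study(e) < 0.05]

print(f"'Discoveries': {len(significant)}")
print(f"Replication rate: {len(replicated) / len(significant):.2f}")
```

Under these assumptions most "discoveries" are false positives, and even the true ones are underpowered, so only a small fraction replicate -- no fraud required, just selection on significance.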

The typical scientific method assumes that there is a truth out there, and that a good study should detect its features.  But if it is a truth, then some other study should get similar results.  Yet time and again in biomedical research, despite huge ballyhoo and cheerleading by investigators and media alike, studies' 'breakthrough' findings can't be supported by further examination.

As Harris extensively documents, this phenomenon appears in claims of treatments or cures, in the use of animal models (e.g., lab mice), antibodies, cell lines, and statistical 'significance' values.  It isn't a long book, so you can quickly see the examples for yourself.  Harris also accounts for the problems, quite properly I think, by documenting not only sloppy science but also the careerist pressures on investigators to find things they can publish in 'major' journals, so they can get jobs, promotions, high 'impact factor' publications, and grants. In our obviously over-crowded market, it can be no surprise to anyone that there is shading of the truth, a tad of downright dishonesty, conveniently imprecise work, and so on.

Since scientists feed at the public trough (or depend on profits from sales of biomedical products to grant-funded investigators), they naturally have to compete and don't want to be shown up, and they have to work fast to keep the funds flowing in.  Rigor Mortis properly homes in on an important fact: if our jobs depend on 'productivity' and bringing in grants, we will do what it takes, shading the truth or whatever else (even the occasional outright cheating) to stay in the game.

Why share data with your potential competitors, who might, after all, find fault with your work or use it to get the jump on you for the next stage?  For that matter, why describe what you did in enough detail that someone (a rival or enemy!) might attempt to replicate your work...or fail to do so? Why wait to publish until you've got a really adequate explanation of what you suggest is going on, with all the i's dotted and t's crossed?  Haste makes credit!  Harris very clearly shows these issues in the all-too-human arena of today's science research establishment.  He calls what we have now, appropriately enough, a "broken culture" of science.

Part of that, I think, is a 'Malthusian' problem.  We are credited, in score-counting ways, by chairs and deans for how many graduate students we turn (or churn) out.  Is our lab 'productive' in that way?  Of course, we need that army of what often are treated as drones, because real faculty members are too busy writing grants or traveling to present their (students') latest research to waste--er, spend--much time in their labs themselves.  The result is a cruel excess of PhDs who can't find good jobs, wandering from post-doc to post-doc (another form of labor pool), or to instructorships rather than tenure-track jobs, or who simply drop out of the system after their PhD and post-docs.  We know of many who are in that boat; don't you?  A recent report showed that the mean age at first NIH grant was about 45: enough said.

A reproducibility mirage
If there were one central technical problem that Harris stresses, it is the number of results that fail to be reproducible in other studies.  Irreproducible results leave us in limbo-land: how are we to interpret them?   What are we supposed to believe?  Which study--if any of them--is correct?  Why are so many studies proudly claiming dramatic findings that can't be reproduced, and/or why are the news media and university PR offices so loudly proclaiming these reported results?  What's wrong with our practices and standards?

Rigor Mortis goes through many of these issues, forthrightly and convincingly--showing that there is a problem.  But a solution is not so easy to come by, because it would require major shifting of and reform in research funding.  Naturally, that would be greatly resisted by hungry universities and those they employ to set up a shopping mall on their campuses (i.e., faculty).

One purpose of this post is to draw attention to the wealth of reasons Harris presents for why we should be concerned about the state of play in biomedical research (and, indeed, in science more generally).  I do have some caveats, which I'll discuss below, but they are in no way intended to diminish the points Harris makes in his book.  What I want to add is a reason why I think that, if anything, Harris' presentation, strong and clear as it is, understates the problem.  I say this because to me there is a deeper issue, beyond the many Harris enumerates: a deeper scientific problem.

Reproducibility is only the tip of the iceberg!
Harris stresses or even focuses on the problem of irreproducible results.  He suggests that if we were to hold far higher evidentiary standards, our work would be reproducible, and the next study down the line wouldn't routinely disagree with its predecessors.  From the point of view of careful science and proper inferential methods and the like, this is clearly true.  Many kinds of studies in biomedical and psychological sciences should have a standard of reporting that leads to at least some level of reproducibility.

However, I think that the situation is far more problematic than sloppy and hasty standards or questionable statistics, even though those are clearly prominent problems.  My view is that no matter how high our methodological standards are, the expectation of reproducibility flies in the face of what we know about life.  That is because life is not a reproducible phenomenon in the way physics and chemistry are!

Life is the product of evolution.  Nobody with open eyes can fail to understand that, and this applies to biological, biomedical, psychological and social scientists.  Evolution is at its very core a phenomenon that rests essentially on variation--on not being reproducible.  Each organism, indeed each cell, is different. Not even 'identical' twins are identical.

One reason for this is that genetic mutations are always occurring, even among the cells within our bodies. Another reason is that no two organisms are experiencing the same environment, and environmental factors affect and interact with the genomes of each individual organism of any species.  Organisms affect their environments in turn. These are dynamic phenomena and are not replicable!

This means that, in general, we should not expect strict reproducibility of results.  But one shouldn't overstate this: the fact that two humans are different doesn't mean they are entirely different.  Similarity is correlated with kinship, from first-degree relatives to members of populations, species, and even different species.  The problem is not that there is similarity; it is that we have no formal theory about how much similarity to expect.  We know that two samples of people will differ, both among the individuals within each sample and between the samples.  And even the same people sampled at different times will differ, due to aging, exposure to different environments, and so on. Proper statistical criteria can address whether observed differences seem due only to sampling from underlying variation or to real causal differences.  But that is a traditional assumption from the origins of statistics and probability, and it isn't entirely apt for biology: since we cannot assume identity of individuals, much less of samples or populations (or species, as when using mouse models for human disease), our work requires some understanding of how much difference, or what sort of difference, we should expect--and should build into our models and tests.
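The classical machinery for the sampling-variation question is worth illustrating, because it also shows what it does and doesn't assume. A permutation test asks only: how often would random relabeling of the pooled observations produce a difference as large as the one observed? The sketch below uses hypothetical blood-pressure readings (the numbers are invented for illustration); note that the test presupposes the two groups are exchangeable samples from comparable populations -- exactly the identity assumption the paragraph above questions for biology:

```python
import random

random.seed(0)

def permutation_test(a, b, n_perm=10000):
    """Estimate how often randomly relabeling the pooled data yields
    a mean difference at least as large as the observed one."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)              # random relabeling of all subjects
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical systolic blood pressure readings from two small samples.
group1 = [118, 125, 131, 122, 128, 135, 120, 127]
group2 = [126, 133, 138, 129, 141, 135, 130, 137]

print(f"p \u2248 {permutation_test(group1, group2):.3f}")
```

The test can tell you the observed difference is unlikely under relabeling; it cannot tell you how much difference biology itself should have led you to expect.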

Evolution is by its very nature an ad hoc phenomenon in both time and place, meaning that there are no fixed rules about this, as there are laws of gravity or of chemical reactions. That means that reproducibility is not, in itself, even a valid criterion for judging scientific results.  Some reproducibility should be expected, but we have no rule for how much and, indeed, evolution tells us that there is no real rule for that.

One obvious and not at all speculative exemplar of the problem is the redundancy in our systems. Genomewide mapping has documented this exquisitely well: if variation at tens, hundreds, or sometimes even thousands of genome sites affects a trait like blood pressure, stature, or 'intelligence', and no two people have the same genotype, then no two people, even with the same trait measure, have that measure for the same reason.  And as is very well known, mapping accounts for only a fraction of the estimated heritability of the studied traits, meaning that much or usually most of the contributing genetic variation remains unidentified.  And then there's the environment...
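The redundancy point can be made concrete with a deliberately crude toy model (my illustration, not anything from the mapping literature): assume 100 sites, each contributing one unit per '+' allele, purely additively. Drawing random genotypes quickly turns up two individuals with the identical trait score built from substantially different genotypes:

```python
import random

random.seed(42)

N_LOCI = 100  # hypothetical trait influenced by 100 additive sites

def genotype():
    """Random genotype: 0, 1, or 2 copies of the '+' allele at each site."""
    return [random.choice([0, 1, 2]) for _ in range(N_LOCI)]

def trait_score(g):
    """Purely additive toy model: each '+' allele adds one unit."""
    return sum(g)

# Draw individuals until two share a trait score, then compare genotypes.
seen = {}
while True:
    g = genotype()
    s = trait_score(g)
    if s in seen and seen[s] != g:
        break
    seen[s] = g

n_diff = sum(x != y for x, y in zip(seen[s], g))
print(f"Same score ({s}) from genotypes differing at {n_diff} of {N_LOCI} sites")
```

Same phenotype, different genetic reasons -- and this sketch omits the dominance, interaction, and environmental effects that make the real situation far messier.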

It's a major problem. It's an inconvenient truth.  The sausage-grinder system of science 'productivity' cannot deal with it.  We need reform.  Where can that come from?
