I've just read a new book that MT readers would benefit from reading as well. It's Rigor Mortis, by Richard Harris (2017: Basic Books). His subtitle is How sloppy science creates worthless cures, crushes hope, and wastes billions. One might suspect that this title is stridently overstated, but while it is quite forthright--and its argument well-supported--I think the case is actually understated, for reasons I'll explain below.
Harris, a science reporter for National Public Radio, surveys the many problems that plague biomedical research. At the core is the reproducibility problem: the large number of claims in published papers that cannot be reproduced by subsequent studies. This problem made the news in the last couple of years over the use of statistical criteria such as p-values (significance cutoffs), and because of the major effort in psychology to replicate published studies, much of which failed. But there are other issues.
The typical scientific method assumes that there is a truth out there, and that a good study should detect its features. But if it is a truth, then some other study should get similar results. Yet time and again in biomedical research, despite huge ballyhoo and cheerleading by investigators and media alike, a study's 'breakthrough!' findings can't be supported by further examination.
As Harris extensively documents, this phenomenon is seen in claims of treatments or cures, or use of animal models (e.g., lab mice), or antibodies, or cell lines, or statistical 'significance' values. It isn't a long book, so you can quickly see the examples for yourself. Harris also accounts for the problems, quite properly I think, by documenting sloppy science but also the careerist pressures on investigators to find things they can publish in 'major' journals, so they can get jobs, promotions, high 'impact factor' pubs, and grants. In our obviously over-crowded market, it can be no surprise to anyone that there is shading of the truth, a tad of downright dishonesty, conveniently imprecise work, and so on.
Since scientists feed at the public trough (or, on the industry side, depend on profits from selling biomedical products to grant-funded investigators), they naturally have to compete, don't want to be shown up, and have to work fast to keep the funds flowing in. Rigor Mortis properly homes in on an important fact: if our jobs depend on 'productivity' and bringing in grants, we will do what it takes, shading the truth or whatever else (even the occasional outright cheating), to stay in the game.
Why share data with your potential competitors who might, after all, find fault with your work or use it to get the jump on you for the next stage? For that matter, why describe what you did in enough actual detail that someone (a rival or enemy!) might attempt to replicate your work.....or fail to do so? Why wait to publish until you've got a really adequate explanation of what you suggest is going on, with all the i's dotted and t's crossed? Haste makes credit! Harris very clearly shows these issues in the all-too human arena of our science research establishment today. He calls what we have now, appropriately enough, a "broken culture" of science.
Part of that I think is a 'Malthusian' problem. We are credited, in score-counting ways, by chairs and deans, for how many graduate students we turn (or churn) out. Is our lab 'productive' in that way? Of course, we need that army of what often are treated as drones because real faculty members are too busy writing grants or traveling to present their (students') latest research to waste--er, spend--much time in their labs themselves. The result is the cruel excess of PhDs who can't find good jobs, wandering from post-doc to post-doc (another form of labor pool), or to instructorships rather than tenure-track jobs, or who simply drop out of the system after their PhD and post-docs. We know of many who are in that boat; don't you? A recent report showed that the mean age of first grant from NIH was about 45: enough said.
A reproducibility mirage
If there were one central technical problem that Harris stresses, it is the number of results that fail to be reproducible in other studies. Irreproducible results leave us in limbo-land: how are we to interpret them? What are we supposed to believe? Which study--if any of them--is correct? Why are so many studies proudly claiming dramatic findings that can't be reproduced, and/or why are the news media and university PR offices so loudly proclaiming these reported results? What's wrong with our practices and standards?
Rigor Mortis goes through many of these issues, forthrightly and convincingly--showing that there is a problem. But a solution is not so easy to come by, because it would require a major shift in, and reform of, research funding. Naturally, that would be greatly resisted by hungry universities and those whom they employ to set up a shopping mall on their campus (i.e., faculty).
One purpose of this post is to draw attention to the wealth of reasons Harris presents for why we should be concerned about the state of play in biomedical research (and, indeed, in science more generally). I do have some caveats, which I'll discuss below, but they are in no way intended to diminish the points Harris makes in his book. What I want to add is a reason why I think that, if anything, Harris' presentation, strong and clear as it is, understates the problem. I say this because to me there is an issue beyond the many Harris enumerates: a deeper scientific problem.
Reproducibility is only the tip of the iceberg!
Harris stresses, indeed focuses on, the problem of irreproducible results. He suggests that if we held far higher evidentiary standards, our work would be reproducible, and the next study down the line wouldn't routinely disagree with its predecessors. From the point of view of careful science and proper inferential methods, this is clearly true. Many kinds of studies in the biomedical and psychological sciences should meet a standard of reporting that leads to at least some level of reproducibility.
However, I think the situation is far more problematic than sloppy and hasty standards or questionable statistics, even though these are clearly prominent. My view is that no matter how high our methodological standards are, the expectation of reproducibility flies in the face of what we know about life. That is because life is not a reproducible phenomenon in the way physics and chemistry are!
Life is the product of evolution. Nobody with open eyes can fail to understand that, and this applies to biological, biomedical, psychological and social scientists. Evolution is at its very core a phenomenon that rests essentially on variation--on not being reproducible. Each organism, indeed each cell, is different. Not even 'identical' twins are identical.
One reason for this is that genetic mutations are always occurring, even among the cells within our bodies. Another reason is that no two organisms are experiencing the same environment, and environmental factors affect and interact with the genomes of each individual organism of any species. Organisms affect their environments in turn. These are dynamic phenomena and are not replicable!
This means that, in general, we should not expect reproducibility of results. But one shouldn't overstate this: the fact that two humans are different doesn't mean they are entirely different. Similarity is correlated with kinship, from first-degree relatives to members of populations, species, and different species. The problem is not that there is similarity; it is that we have no formal theory of how much similarity to expect. We know that two samples of people will differ, both among the individuals within each sample and between the samples. Even the same people sampled at separate times will differ, because of aging, exposure to different environments, and so on. Proper statistical criteria can ask whether observed differences seem due only to sampling variation or to causal differences. But that framework rests on assumptions dating from the origins of statistics and probability, and it isn't entirely apt for biology: since we cannot assume identity of individuals, much less of samples or populations (or species, as when mouse models stand in for human disease), our work requires some understanding of how much difference, or what sort of difference, we should expect--and build that into our models and tests.
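To make that concrete, here is a minimal sketch (mine, not Harris's, and with entirely invented numbers) of the kind of statistical criterion at issue: a permutation test asking how often a difference at least as large as the one observed between two samples would arise from sampling variation alone.

```python
# A minimal sketch (mine, not Harris's) of a permutation test: how often would a
# difference at least as large as the observed one arise if group labels carried
# no causal information? Sample sizes, means, and spreads are invented.
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical samples of a trait (say, blood pressure) that differ slightly
group_a = rng.normal(loc=120.0, scale=15.0, size=50)
group_b = rng.normal(loc=124.0, scale=15.0, size=50)
observed = group_b.mean() - group_a.mean()

# Reshuffle the labels many times to see how big a difference chance alone produces
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
null_diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    null_diffs.append(pooled[n_a:].mean() - pooled[:n_a].mean())

p_value = np.mean(np.abs(null_diffs) >= abs(observed))
print(f"observed difference: {observed:.2f}  permutation p-value: {p_value:.3f}")
```

Note what the test does and does not assume: it asks whether a difference is bigger than sampling noise, but it has nothing to say about how much biological difference between two groups we ought to expect in the first place.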
Evolution is by its very nature an ad hoc phenomenon in both time and place, meaning that there are no fixed rules about this, as there are laws of gravity or of chemical reactions. That means that reproducibility is not, in itself, even a valid criterion for judging scientific results. Some reproducibility should be expected, but we have no rule for how much and, indeed, evolution tells us that there is no real rule for that.
One obvious, and not at all speculative, exemplar of the problem is the redundancy in our systems. Genomewide mapping has documented this exquisitely well: if variation at tens, hundreds, or sometimes even thousands of genome sites affects a trait like blood pressure, stature, or 'intelligence', and no two people have the same genotype, then no two people, even with the same trait measure, have that measure for the same reason. And as is very well known, mapping accounts for only a fraction of the estimated heritability of the studied traits, meaning that much or usually most of the contributing genetic variation remains unidentified. And then there's the environment...
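A toy simulation (with invented effect sizes and allele frequencies, purely to illustrate the point) shows the redundancy: when a trait is built from hundreds of small-effect sites plus environment, two people can land at nearly the same trait value while sharing identical genotypes at only a fraction of the contributing sites.

```python
# A toy simulation (invented effect sizes and allele frequencies) of a polygenic
# trait: hundreds of small-effect sites plus environment. Two people can end up
# with nearly the same trait value while sharing identical genotypes at only a
# fraction of the contributing sites.
import numpy as np

rng = np.random.default_rng(7)
n_people, n_sites = 1_000, 500

freqs = rng.uniform(0.05, 0.5, size=n_sites)                     # allele frequencies
genotypes = rng.binomial(2, freqs, size=(n_people, n_sites))     # 0/1/2 copies per site
effects = rng.normal(0.0, 0.05, size=n_sites)                    # many tiny effects

genetic_value = genotypes @ effects
environment = rng.normal(0.0, genetic_value.std(), size=n_people)  # ~50% 'heritability'
trait = genetic_value + environment

# Pick two people adjacent in the trait ranking, i.e. with nearly the same value
order = np.argsort(trait)
i, j = order[n_people // 2], order[n_people // 2 + 1]
shared = np.mean(genotypes[i] == genotypes[j])
print(f"trait: {trait[i]:.2f} vs {trait[j]:.2f}; "
      f"identical genotypes at only {shared:.0%} of sites")
```

Even in this cartoon version, the measured sites leave much of the variation to the unmodeled environment, which echoes the missing-heritability point above.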
It's a major problem. It's an inconvenient truth. The sausage-grinder system of science 'productivity' cannot deal with it. We need reform. Where can that come from?
Wednesday, May 30, 2012
Magical science: now you see it, now you don't. Part II: How real is 'risk'?
By Ken Weiss
Why is it that, after countless studies, we don't know whether to believe the latest hot-off-the-press pronouncement of risk factors, genetic or environmental, for disease, or assertions about the fitness history of a given genotype? Or, in social and behavioral science, almost anything! Why are scientific studies, if they really are science, so often not replicated, when the core tenet of science is that causes determine outcomes? Why should we need so many studies of the same thing, decade after decade? Why do we still fund more studies of the same thing? Is there ever a time when we say Enough!?
That time hasn't come yet, partly because we professors have to have new studies to keep our grants and our jobs, and we do what we know how to do. But there are deeper reasons, without obvious answers, and they're important to you if you care about what science is, or what it should be--or what you should be paying for.
Last Thursday, we discussed some aspects of the problem that arises when the causes we suspect work only by affecting the probability of an outcome we're interested in. The cause may truly be deterministic, but we don't understand it well enough, so we must view its effect in probabilistic terms. That means we have to study a sample of many individuals exposed to the risk factor of interest, in the same way you have to flip a coin many times to see whether it's really fair--whether its probability of coming up Heads is really 50%. You can't just look at the coin, or flip it once.
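A small, purely illustrative sketch of the coin analogy (all numbers invented) shows how slowly the uncertainty around an estimated probability shrinks as the number of trials grows:

```python
# A small sketch of the coin analogy (all numbers invented): one flip tells you
# nothing about the probability of Heads, and even many flips leave a margin of
# uncertainty that shrinks only slowly with sample size.
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.5  # the fair coin we are trying to detect

for n_flips in (10, 100, 1_000, 10_000):
    heads = rng.binomial(n_flips, true_p)
    p_hat = heads / n_flips
    half_width = 1.96 * (p_hat * (1 - p_hat) / n_flips) ** 0.5   # ~95% CI half-width
    print(f"{n_flips:>6} flips: estimate {p_hat:.3f} +/- {half_width:.3f}")
```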
Nowadays, reports are often of meta-analyses, in which, because no single study is believed to be definitive (i.e., reliable), many studies are pooled and analyzed together to achieve sample sizes adequate to see what risk really is associated with the risk factor. It should be a warning in itself that the samples of many individual studies (funded because they claimed, and reviewers expected, that they would be adequate to the task) are now viewed as hopelessly inadequate. Maybe it's a warning that the supposed causes are weak to begin with--too weak for this kind of approach to be very meaningful?
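For readers who haven't seen the mechanics, here is a bare-bones sketch of the usual fixed-effect, inverse-variance-weighted pooling behind a meta-analysis; the study estimates and standard errors are invented solely to show the arithmetic, not drawn from any real studies.

```python
# A bare-bones fixed-effect meta-analysis: weight each study's estimate by the
# inverse of its variance and pool. The estimates and standard errors below are
# invented solely to show the arithmetic, not drawn from any real studies.
import numpy as np

# (effect estimate, standard error) from hypothetical individual studies
studies = [(0.30, 0.20), (0.05, 0.15), (0.22, 0.25), (-0.10, 0.18), (0.15, 0.12)]

estimates = np.array([est for est, _ in studies])
weights = np.array([1.0 / se**2 for _, se in studies])

pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")
```

The pooling itself is just weighted averaging; everything interesting (and everything that can go wrong) lies in whether the individual estimates deserve to be averaged at all.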
Why, after so many studies, don't we know whether HDL cholesterol protects against heart disease, or antioxidants against cancer, whether coffee or obesity is a risk factor, how to teach language or math or prevent student misbehavior, or whether criminality is genetic (or a 'disease')? There are countless such examples in the daily news, and you are paying for this: study after study without conclusive results, every day!
There are several reasons. These are serious issues, worthy of the attention of anyone who actually cares about understanding truth and the world we live in, and its evolution. The results are important to our society as well as to our basic understanding of the world.
So, then, why are so many results not replicable?
Here are at least some reasons to consider:
1. If no one study is trustworthy, why on earth would pooling them be? Overall, when this is the situation, the risk factor is simply not a major one!
2. We are not defining the trait of interest accurately
3. We are always changing the definition of the trait or how we determine its presence or absence
4. We are not measuring the trait accurately
5. We have not identified the relevant causal risk factors
6. We have not measured the relevant risk factors accurately
7. The definition of the risk factors is changing or vague
8. The individual studies are each accurate, and our understanding of risk is in error
9. Some of the studies being pooled are inaccurate
10. The first study or two that indicated risk were biased (see our post on replication), and should be removed from meta-analysis....and if that were done the supposed risk factor would have little or no risk.
11. The risk factor's effects depend on its context: it is not a risk all by itself
12. The risk factor just doesn't have an inherent causal effect: our model or ideas are simply wrong
13. The context is always changing, so the idea of a stable risk is simply wrong
14. We have not really collected samples that are adequate for assessing risk (they may not be representative of the population at-risk)
15. We have not collected large enough samples to see the risk through the fog of measurement error and multiple contributing factors
16. Our statistical models of probability and sampling are not adequate or are inappropriate for the task at hand (usually, the models are far too simplified, so that at best they can be expected only to generate an approximate assessment of things)
17. Our statistical criteria ('significance level') are subjective but we are trying to understand an objective world
18. Some causes that are really operating are beyond what we know or are able to measure or observe (e.g., past natural selection events)
19. Negative results are rarely published, and so meta-analyses cannot include them, so a true measure of risk is unattainable (see the sketch after this list)
20. The outcome has numerous possible causes; each study picks up a unique, real one (familial genetic diseases, say), but it won't be replicable in another population (or family) with a different cause that is just as real
21. Population-based studies can never in fact be replicated because you can never study the same population--same people, same age, same environmental exposures--at the same time, again
22. The effect of risk factors can be so small--but real--that it is swamped by confounding, unmeasured variables.
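To illustrate item 19 with a toy simulation (all numbers invented): if the true effect is exactly zero but only statistically 'significant' beneficial findings get published, then the published record--and any meta-analysis of it--reports a real-looking effect.

```python
# Illustrating item 19 (all numbers invented): the true effect is exactly zero,
# but only statistically significant 'beneficial' results get published, so the
# published record -- and any meta-analysis of it -- reports a real-looking effect.
import numpy as np

rng = np.random.default_rng(42)
true_effect, n, n_studies = 0.0, 30, 200

published = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    if diff / se > 1.96:   # only significant 'positive' findings reach print
        published.append(diff)

mean_published = np.mean(published) if published else float("nan")
print(f"{len(published)} of {n_studies} studies 'published'; "
      f"mean published effect: {mean_published:.2f} (true effect: {true_effect:.2f})")
```

Larger samples (item 15) would shrink the spurious effects that clear the significance bar, but the selective filter itself remains.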
This situation--and our list is surely not exhaustive--is typical and pervasive in observational rather than experimental science. (And for the same kinds of problems, lists just as long could be drawn up to explain why some areas even of experimental science don't do much better!)
A recent Times commentary, and a post of ours, discussed these issues. The commentary argues that we need to make social science more like experimental physical science, with better replication, study designs, and the like. But that may be the wrong advice. It may simply lead us down an endless, expensive path that fails to recognize the problem. The social sciences already consider themselves to be real science, and by presenting peer-reviewed work that way, they have their fingers as deeply entrenched in the funding pot as, say, genetics does.
Whether coffee is a risk factor for disease, whether certain behaviors or diseases are genetically determined, why some trait evolved in our ancestry...these are all legitimate questions whose non-answers show that there may be something deeply wrong with our current methods and ideas about science. We regularly comment on the problem. But there seems to be little real recognition that there is an issue, against the forces that pressure scientists to continue business as usual--which means that we continue to do more and more, and more expensive, studies of the same things.
One highly defensible solution would be to cut support for such non-productive science until people figure out a better way to view the world, and/or until we require scientists to be accountable for their results. No more "I write the significance section of my grants with my fingers crossed behind my back," because I know I'm not telling the truth (and the reviewers, who do the same themselves, know that I'm doing it).
As it is, resources go to more and more studies of the same things that yield basically little; students flock to the large university departments that teach them how to do it too; journals and funders build their careers on reporting the results; and policy makers follow the advice. Every day, on almost any topic, you will see in the news that "studies show that...."
This is no secret: we all know the areas in which the advice gets us little if anywhere. But politically we haven't got the nerve to make such cuts, and in a sense we would be lost if we had nobody assessing these issues. What to do is not an easy call, even if there were the societal will to act.