We found that primary studies whose outcome included behavioral parameters were generally more likely to report extreme effects, and those with a corresponding author based in the US were more likely to deviate in the direction predicted by their experimental hypotheses, particularly when their outcome did not include additional biological parameters. Nonbehavioral studies showed no such “US effect” and were subject mainly to sampling variance and small-study effects, which were stronger for non-US countries.

That is, in the behavioral sciences, where, Fanelli and Ioannidis say, methodology is less rigorous than in hard sciences like genetics, investigators tend to report extreme effects in favor of their hypothesis more often than in the hard sciences, particularly if at least one of the investigators is in the US. The effect is small, it must be said; results from authors in the US are on average about 5% larger than those of others. The authors' preferred explanation is that pressure to publish is greater in the US, though they did not observe any US effect in publications of genetics research, where pressure is surely as great, or even greater, since many practitioners are in medical schools and need a steady grant and publication stream to receive promotion, or even their salaries.
Further, they write, results tend to be less replicable in the behavioral sciences, where methodology is softer and less standardized than in the medical sciences. Indeed, the non-replicability of studies in psychology has gotten a lot of press in the last year or so; the ongoing Replicability Project, for example, is attempting to reproduce all the studies published in three psychology journals in 2008. But genetics is hardly immune: GWAS (genomewide association studies) are notoriously irreproducible.
Numerous biases affecting the kinds of data that get published are well known; the preference for publishing positive results, as reported in this paper, is a big one, but there are others. Fanelli and Ioannidis note two more: the 'decline effect', whereby the strength of a reported effect declines over time (indeed, the first reports of an effect are generally the strongest), and the 'early extremes effect', the tendency of early reports of a given effect to vacillate between opposite extremes.
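These patterns are easy to illustrate with a toy simulation of our own (it is a sketch, not anything from the paper): many studies estimate the same true effect, small studies scatter widely between extremes, and if only "significant" results get published, the published record overstates the truth -- most dramatically for the small, early studies.

```python
import random
import statistics

# Hypothetical illustration (our construction, not the paper's analysis):
# simulate many studies of the same true effect at two sample sizes,
# then keep only the "significant" ones, as publication bias would.
random.seed(1)

TRUE_EFFECT = 0.2  # assumed true standardized effect size

def run_study(n):
    """One study: the mean of n noisy observations around the true effect."""
    return statistics.fmean(random.gauss(TRUE_EFFECT, 1.0) for _ in range(n))

def published(estimates, n):
    """Crude significance filter: keep estimates beyond ~1.96 standard errors."""
    se = 1.0 / n ** 0.5
    return [e for e in estimates if abs(e) > 1.96 * se]

small = [run_study(20) for _ in range(2000)]   # early, small studies
large = [run_study(500) for _ in range(2000)]  # later, larger studies

# Small studies scatter far more widely (the "early extremes" pattern)...
assert statistics.stdev(small) > statistics.stdev(large)

# ...and once filtered for significance, their published average
# overstates the true effect much more than the large studies' does.
pub_small = statistics.fmean(published(small, 20))
pub_large = statistics.fmean(published(large, 500))
print(round(pub_small, 2), round(pub_large, 2))
```

Nothing here requires misconduct: the exaggeration falls out of sampling variance plus a selective filter on what gets reported.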
There are, of course, other issues that hinder science, including malfeasance and incompetence, but also thin or inappropriate methodology, ignorance of important factors that should be accounted for but aren't, the fact that samples of people can never be truly replicated, and so on -- issues that philosophers of science have discussed for a long time.
Well, ok, but John Ioannidis was interviewed on the BBC program Inside Science on August 29. He described the work reported in the paper, as we have above, but what he had to say next was fairly revealing of his own biases. He went on to explain that it's easier to manipulate the data, however innocently, in the behavioral sciences than in genetics, because genetics "is something that is very firm and hard and can be measured with very high precision so we can measure genes very accurately." And, he said, it's straightforward to find the cause of a genetic disease.
In the behavioral sciences, however, the outcomes are mushier -- "anything that relates to some psychological outcome or some behavioral trait, or what people do or don't do in their lives" -- and so harder to measure or analyze with accuracy. "Behavioral outcomes are softer; we have to struggle on how exactly we can measure them and we can have more leeway on how we will analyze the associations with behavioral traits or personality or what people do or how they behave."
Perhaps it's true that behavioral outcomes are harder to standardize, though there is generally a lot of variability even in traits caused by a single gene, but it's manifestly not so that unequivocal answers are routine in genetics, or that non-replicability of results is not a problem, or that methodology is always firm and certain, or that confounding isn't a potential issue. These problems have been a running theme on this blog for years. It is only by a genetics-leaning bias that one could really suggest that genetic studies (like genomewide mapping) yield replicable, firm, meaningful results.
So, we'd argue that the differences Fanelli and Ioannidis found in the kinds of biases they uncovered in genetics vs the behavioral sciences aren't due nearly as much as they believe to the kinds of scientific problems each field addresses. Instead, if real, we'd suggest they are more likely to be due to the sociology of each field. And perhaps more.
Some things to think about
The paper presents some interesting points. John Ioannidis is a leader in advocating and applying meta-analysis, the pooled analysis of multiple studies, and he has written numerous papers on the quality and reliability of published studies. But he is far more sanguine about genetics than we think is accurate, and we don't think his idea that we can precisely measure genes and their causation of disease is reliable. In fact, we often cannot measure disease phenotypes with precision or objectivity, so outcomes can be subtly adjusted in genetic studies. Genetics, being more like chemistry, may seem more rigorous, but we adjust after the fact all the time: if we don't like a result, we redefine the disease. Autism is one of many examples. When genetic studies don't find what we were looking for, we (believing genes must be responsible) cut the disease definition into subsets and then find genes whose variation causes those. This isn't necessarily cheating, and may indeed be a form of improved knowledge; but the point is that geneticists do routinely play fast and loose with their outcome definitions and with the way they measure or model the effects of genetic variation.
As we have noted in earlier posts (here, e.g.), you do not have 'a' genotype: each of your cells has a somewhat different genotype. The effect of a given target gene often, if not usually, depends on its genomic context. And it is cellularly local expression levels that affect traits, and these are affected by DNA modification as well as by the DNA sequence itself. For evolutionary reasons, pooling separate studies therefore raises all sorts of serious questions -- yet the profession seems seduced by those cases, and they certainly do exist, in which things work out reliably and replicably. Even failure to replicate elsewhere can be consistent with a given study's findings.
It seems also true, as the PNAS paper suggests, that both careerist pressures in the US and vague notions of measurement, trait, outcome, and causation can be expected to plague behavioral studies. Complex behaviors simply can't be reduced to simple cause and effect!! ...or might they?
If complex behaviors are the result of many genes acting together, the behavior could, at least in principle, be stable and predictable even if no single gene or genotype had predictive power for that same behavior. Also, perhaps experimental-science concepts about evaluating samples and so on don't really apply. For example, behaviorists, and indeed all of us, observe behaviors all day every day. We come to studies with much more information about behavior, even if that information is not conscious or as well defined as, say, a genome sequence. So perhaps the statistical results reflect the fact that the answer was already known before the study was done -- that is, the study's 'null hypothesis' is bogus and the alternative hypothesis already quite likely before the study is ever even proposed.
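The first point above, that an aggregate of many tiny genetic effects can be predictable even when no single gene is, can be sketched with a toy simulation (again our own construction, with arbitrary made-up numbers, not anything from the paper):

```python
import random

# Toy sketch (assumed setup): a trait built from many genes of equal,
# tiny effect, plus environmental noise. No single gene predicts the
# trait usefully, but the summed "polygenic score" does.
random.seed(2)

N_GENES = 1000
N_PEOPLE = 2000

# Each person carries 0, 1, or 2 copies of the trait-increasing allele
# at each gene; the trait is the simple sum plus Gaussian noise.
genotypes = [[random.randint(0, 2) for _ in range(N_GENES)]
             for _ in range(N_PEOPLE)]
trait = [sum(g) + random.gauss(0, 10) for g in genotypes]

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Any single gene barely correlates with the trait...
single = corr([g[0] for g in genotypes], trait)
# ...but the aggregate score predicts it well.
aggregate = corr([sum(g) for g in genotypes], trait)
print(round(single, 2), round(aggregate, 2))
```

The design choice is deliberately extreme -- all effects equal and additive -- but that is the point: aggregate stability requires no single strong cause.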
On the other hand, it is very easy to read such biases, soft-headed opinionation, and so on as evidence that there is plenty wrong with behavioral research, and that its results must reflect some problematic bias. The US effect and the lower level of replicability lend credence to the Fanelli and Ioannidis argument.
At least, there are serious things to think about here, beyond what someone embedded in standard statistical thinking would consider from that perspective.