Statistics is a form of mathematics, a way devised by humans for representing abstract relationships. Mathematics comprises axiomatic systems, which make assumptions about basic units such as numbers; basic relationships like adding and subtracting; and rules of inference (deductive logic); and then elaborates these to draw conclusions that are typically too intricate to reason out in other less formal ways. Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships. When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature. But when the correspondence is not good, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.
You can't drive without a license, but anyone with R or SAS can be a push-button scientist. Anybody with a keyboard and some survey generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose who to subject to the survey--your 'sample'. You can grace the results with the term 'data', implying true representation of the world, and be off and running. Sample and survey designers may be intelligent, skilled, well-trained in survey design, and of wholly noble intent. There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).
Population sciences, including biomedical, evolutionary, social, and political fields, are experiencing an increasingly widely recognized crisis of credibility. The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to the assumptions--with the emphasis these days on the 'or not', and a frequent dismissal of the underlying issues in favor of a patina of technical, formalized results. Every capable statistician knows this, but of course might be out of business if openly paying it enough attention. And many statisticians may be rather uninterested in, or too foggy about, the philosophy of science to understand what goes beyond the methodological technicalities. Jobs and journals depend on not being too self-critical. And therein lie rather serious problems.
There is the problem of the problems--the problems we want to solve, such as in understanding the cause of disease so that we can do something about it. When causal factors fit the assumptions, statistical or survey study methods work very well. But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself. There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods and beliefs, and the need for research continuity. It may be very understandable, because one can't just quickly uproot everything or order up deep new insights. But it may be viewed as abuse of public trust as well as of the science itself.
The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them. Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys. And here is a FiveThirtyEight link to what was the basis of the podcast.
The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'. We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills. It's a strategy, a tactic, and at least the top investigators are very well aware of it. Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future. The funding agencies, wanting to seem prudent and responsible to taxpayers with their resources, demand some 'societal impact' section on grant applications. But there is in fact little if any accountability in this regard, so one can say they are essentially bureaucratic window-dressing exercises.
Promissory science is an old game, practiced since time immemorial by preachers. It boils down to promising future bliss if you'll just pay up now. We needn't be (totally) cynical about this. When we set up a system that depends on public decisions about resources, we will get what we've got. But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.
A growing, at least partial, awareness
The problem of results that are announced by the media, journals, universities, and so on, but that don't deliver on the advertised promises, is complex but widespread. In part because research has become so costly, some warning sirens are sounding as it becomes clear that the promised goods are not being delivered.
One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting. It's too bad, because a negative result on a well-designed test is legitimately valuable and informative. A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or shareholders may tire of funding yet more negative studies. Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem at least to be thriving and growing, if incomplete and not enforceable. But this non-reporting problem has been written about so much that we won't deal with it here.
Instead, there is a different sort of problem. The American Statistical Association has recently noted an important issue, which is the use and (often) misuse of p-values to support claims of identified causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent). FiveThirtyEight has a good discussion of the p-value statement.
The usual interpretation is that p represents the probability that, if there is in fact no causation by the test variable, its apparent effect arose just by chance. So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance you'd see an association at least this strong anyway is no greater than 5%; that is, there is some evidence for a causal connection.
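To make that interpretation concrete, here is a minimal sketch of a permutation test, one simple way (not the only one) to estimate such a p. The function name, the toy numbers, and the choice of a difference in group means as the 'effect' are our own illustrative assumptions, not taken from any study mentioned here. The idea is to shuffle the group labels many times, which enforces 'no causation' by construction, and count how often chance alone produces a difference at least as large as the one observed.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Hypothetical helper for illustration: it assumes the two groups are
    exchangeable if the test variable has no effect at all.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the labels simulates a world with no causal effect.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements, purely illustrative.
exposed = [10, 11, 12, 13, 14, 15, 16, 17]
unexposed = [1, 2, 3, 4, 5, 6, 7, 8]
print(permutation_p_value(exposed, unexposed))
```

Note what the number does and does not say: it is the chance of seeing a difference this large if the shuffling assumption describes the world, not the chance that the causal claim is true. If the sample doesn't fit the assumptions, the p is precise-looking fiction.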
Trashing p-values is becoming a new cottage industry! Now JAMA is on the bandwagon, with an article showing that, in a survey of the biomedical literature from the past 25 years, including well over a million papers, a far disproportionate and increasing number of studies reported statistically significant results. Here is the study on the JAMA web page, though it is not public domain yet.
Besides the apparent reporting bias, the JAMA study found that those papers generally failed to flesh out that result adequately. Where are all the negative studies that statistical principles might expect to be found? We don't see them, especially in the 'major' journals, as has been noted many times in recent years. Just as importantly, authors often did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value. In a sense that means authors didn't say what range of effects is consistent with the data. They reported a non-random effect, but often didn't give the effect size, that is, say how large the effect was even assuming that effect was unusual enough to support a causal explanation. So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
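The 1% versus 1.01% example can be worked through in a rough sketch. We assume a standard pooled two-proportion z-test with a normal approximation; the function name and the sample sizes are hypothetical choices of ours, picked only to show how the same tiny risk difference goes from 'nothing' to 'significant' as the study gets bigger.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Return (risk difference, two-sided p) from a pooled two-proportion z-test.

    Illustrative helper: it assumes independent simple random samples
    that are large enough for the normal approximation to hold.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_b - risk_a) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return risk_b - risk_a, p_value


# Same risks (1% vs 1.01%) at two made-up study sizes.
small_diff, small_p = two_proportion_p_value(100, 10_000, 101, 10_000)
large_diff, large_p = two_proportion_p_value(100_000, 10_000_000, 101_000, 10_000_000)

print(f"10 thousand per group: difference={small_diff:.4%}, p={small_p:.3f}")
print(f"10 million per group:  difference={large_diff:.4%}, p={large_p:.3f}")
```

With ten thousand people per group the difference is nowhere near the 0.05 cutoff; with ten million per group it falls below it, yet the effect is the same one extra case per ten thousand people. That is why a bare p-value, without the effect size and a confidence interval, tells us so little.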
Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis to gain sufficient statistical power to detect effects and test for their consistency). This paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked' as he puts it. The same must be said of 'precision genomic' or 'personalized' medicine, or 'Big Data', and other sorts of imitative sloganeering going on from many quarters who obviously see this sort of promissory science as what you have to do to get major funding. We have set ourselves a professional trap, and it's hard to escape. For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have in a sense the least reliable results in terms of their replicability. But he's been raising these points in the same journals that he shows are culpable in the problem, rather than boycotting those journals. We're in a trap!
These critiques of current statistical practice are the points getting most of the ink and e-ink. There may be a lot of cover-ups of known issues, and even hypocrisy, in all of this, and perhaps more open or understandable tacit avoidance. The industry (e.g., drug, statistics, and research equipment) has a vested interest in keeping the motor running. Authors need to keep their careers on track. And, in the fairest and non-political sense, the problems are severe.
But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science. We'll discuss this tomorrow.