There is a valuable discussion in Nature about the problems that have arisen from the (mis)use of statistics for decision-making. To simplify, the issue is that a rather subjectively chosen cutoff, or p-value, leads to dichotomizing our inferences, when the underlying phenomena may or may not be dichotomous. In a simplistic way of explaining things: if a study's results pass such a cutoff test, it means that the chance the observed result would arise if nothing is going on (as opposed to the hypothesized effect) is so small--less than the cutoff proportion, p, of the time--that we accept the data as showing that our suggested something is going on. In other words, rare results (using our cutoff criterion for what 'rare' means) are considered to support our idea of what's afoot. But the chosen cutoff level is arbitrary and used by convention, and its use doesn't reflect the various aspects of uncertainty or alternative interpretations that may abound in the actual data.
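To make the dichotomization concrete, here is a minimal sketch of my own (not from the commentaries themselves); the effect sizes, sample size, and the conventional 0.05 cutoff are illustrative assumptions:

```python
# A toy illustration of how a fixed cutoff dichotomizes a continuous
# measure of evidence. All numbers here are made up for display.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05                      # the conventional, arbitrary cutoff

for true_effect in [0.0, 0.1, 0.2, 0.3]:
    sample = rng.normal(true_effect, 1.0, size=100)
    # One-sample t-test of 'nothing is going on' (true mean == 0)
    t, p = stats.ttest_1samp(sample, 0.0)
    verdict = "significant" if p < alpha else "not significant"
    print(f"true effect {true_effect:.1f}: p = {p:.3f} -> {verdict}")
# Nearly identical p-values on either side of 0.05 get opposite labels,
# even though the underlying phenomenon varies continuously.
```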
The Nature commentaries address these issues in various ways, and suggestions are made. These are helpful and thoughtful in themselves, but they miss what I think is a very important, indeed often critical, point when it comes to their application in many areas of biology and social science.
Instrumentation errors
In these (as in other) sciences, various measurements and technologies are used to collect data. These are mechanical, so to speak, and are always imperfect. Sometimes it may be reasonable to assume that the errors are unrelated to what is being measured (for example, that their distribution does not depend on the value of a given instance) and do not affect what is being measured (as quantum measurements can). In that case, correcting for them in some reasonably systematic way, such as by assuming normally distributed errors, clearly helps adjust findings for the inadvertent but causally unconnected errors.
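A sketch of what such a correction assumes, with a hypothetical true value and noise scale: when errors are normally distributed and independent of the quantity itself, repeated measurement averages them away.

```python
# A sketch, assuming instrument errors that are normally distributed and
# uncorrelated with the true value. The numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
true_value = 10.0                 # the quantity 'out there' (assumed fixed)
noise_sd = 0.5                    # instrument error, unrelated to the value

measurements = true_value + rng.normal(0.0, noise_sd, size=1000)
estimate = measurements.mean()
stderr = measurements.std(ddof=1) / np.sqrt(len(measurements))
print(f"estimate = {estimate:.3f} +/- {stderr:.3f}")
# Because the errors are uncorrelated with the value itself, the mean
# converges on the true value, and the standard error quantifies only
# the instrument's imperfection, not Nature's variability.
```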
Such corrections seem to apply quite validly to social and biological, including evolutionary and genetic, sciences. We'll never have perfect instrumentation or measurement, and often don't know the nature of our imperfections. Assuming errors uncorrelated with what is being sought seems reasonable even if approximate to some unknown degree. It's worked so well in the past that this sort of probabilistic treatment of results seems wholly appropriate.
But instrumentation errors are not the only possible errors in some sciences.
Conceptual errors: you can't 'correct' for them in inappropriate studies
Statistics is, properly, a branch of mathematics. That means it is an axiomatic system, an if-then way to make deductions or inductions. When and if the 'if' conditions are met, the 'then' consequences must follow. Statistics rests on probabilism rather than determinism, in the sense that it relates to, and is developed around, the idea that some phenomena only occur with a given probability, say p, and that such a value somehow exists in Nature.
It may have to do with the practicalities of sampling by us, or with some natural screening phenomenon (as in, say, mutation, Mendelian transmission, or natural selection). But it basically always rests on some version of the assumption that the sampling is parametric, that is, that our 'p' value somehow exists 'out there' in Nature. If we are, say, sampling 10% of a population (and the population is actually well-defined!), then each draw has the same properties. For example, if it is a 'random' sample, then no property of a potential samplee affects whether or not it is actually sampled.
But note there is a big 'if' here: sampling, or whatever process is treated as probabilistic, needs to have a parameter value! It is that value which is used to compute significance measures and so on, from which we draw conclusions based on the results of our sample.
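Here is a sketch of what that 'if' amounts to, with hypothetical numbers: a well-defined population with a true parameter, and draws whose inclusion is unrelated to any property of the sampled units.

```python
# A sketch of parametric sampling. The population and its parameter are
# assumed to exist and be well-defined; all values are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(50.0, 10.0, size=10_000)   # assumed well-defined
true_mean = population.mean()                      # the parameter 'out there'

# A 10% simple random sample: every unit has the same inclusion chance,
# independent of its own value.
sample = rng.choice(population, size=1_000, replace=False)
print(f"parameter: {true_mean:.2f}, estimate: {sample.mean():.2f}")
# Significance measures computed from this sample are justified exactly
# because such a fixed parameter is assumed to exist.
```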
Is the universe parametric? Is life?
In physics, for example, the universe is assumed to be parametric. It is universally assumed to have some fixed properties, like the gravitational constant, Planck's constant, the speed of light, and so on. We can estimate these parameters here on earth (as, for example, Newton himself suggested), but we assume they're the same elsewhere. If observation challenges that, we assume the cosmos is regular enough that there are at least some regularities, even if we've not figured them all out yet.
A key feature of a parametric universe is replicability. When things are replicable because they are parametric--that is, have fixed universal properties--then statistical estimates, their standard deviations, and so on make sense, and should reflect the human-introduced (e.g., measurement) sources of variation, not Nature's. Statistics is a field largely developed for this sort of context, or for others in which sampling could reasonably be assumed to represent the major source of error.
In my view it is more than incidental, indeed profound, that 'science' as we know it was an enterprise developed to study the 'laws' of Nature. Maybe this was the product of the theological beliefs that had preceded the Enlightenment or, as I think Newton at least said, 'science' was trying to understand God's laws.
In this spirit, in his Principia (his most famous book), Newton stated the idea that if you understand how Nature works in some local example, what you learn applies to the entire cosmos. This is how science, usually implicitly, works today. Chemistry here is assumed to be the same as chemistry in any distant galaxy, even ones we cannot see. Consistency is the foundation upon which our idea of the cosmos, and in that sense classical science itself, has been built.
Darwin was, in this sense, very clearly a Newtonian. Natural selection was a 'force' he likened to gravity, and his idea of 'chance' was not the formal one we use today. But what he did observe, though implicitly, was that evolution was about competing differences. For that reason, evolution is inherently not parametric.
Not only does evolution rest heavily on probability--chance aspects of reproductive success, which Darwin only minimally acknowledged--but it also rests on each individual's reproductive success being unique. Without variation, and that means variation in the traits that affect success, not just 'neutral' ones, there would be no evolution.
In this sense, the application of statistics and statistical inference in the life sciences is legitimate for dealing with measurement and sampling issues, but not for the underlying parametric assumptions on which its inferences rest. Study subjects are not identical except for randomly distributed 'noise', whether in our measurements or in their fates.
Life has properties we can measure and assign average values to, like the average reproductive success of AA, Aa, and aa genotypes at a given gene. But that is a retrospective average, and it is contrary to what we know about evolution to assume that, say, all AA's have the same fitness parameter and their reproductive variation is only due to chance sampling from that parameter.
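A sketch of that genotype example, with made-up numbers: if each individual carries its own expected success (unique background and environment), the genotype 'fitness' we report is only a retrospective average, not a fixed parameter shared by all carriers.

```python
# Hypothetical illustration: per-individual fitness varies around a
# genotype's central value before any sampling noise at all, so the
# reported average is retrospective, not a shared parameter.
import numpy as np

rng = np.random.default_rng(4)
genotypes = {"AA": 0.9, "Aa": 1.0, "aa": 0.8}   # assumed central tendencies

for genotype, central in genotypes.items():
    # Each individual's expected success is unique (background and
    # environment), not a single value all carriers share.
    individual_fitness = rng.normal(central, 0.3, size=500).clip(min=0)
    offspring = rng.poisson(individual_fitness)
    print(f"{genotype}: retrospective mean fitness = {offspring.mean():.2f}")
# Treating these printed averages as parameters that all future AA's,
# Aa's, and aa's will share is exactly the assumption in question.
```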
Thinking of life in parametric terms is a convenience, but it is an approximation of unknown, and often unknowable, inaccuracy. Evolution occurs over countless millennia, in which the non-parametric aspects can dominate. We can estimate, say, recombination or mutation rates or fitness values from retrospective data, but these are not parameters that we can rigorously apply to the future, and they are typically averages among sampled individuals.
Genetic effects are unique to each background and environmental experience, and we should honor that uniqueness as such! The statistical crisis that many are trying valiantly to explain away, so they can return to business as usual (even if no longer reporting p-values), is a crisis of convenience, because it makes us think that a bit of different reportage (confidence limits rather than p-values, for example) will cure all ills. That band-aid is a convenient port in a storm, but an illusory fix. It does not recognize the important, even central, degree to which life is not a parametric phenomenon.