Tuesday, June 14, 2011

Cold chills and statistical mischief

In a Beeb story, Dr Phil Jones, a climatologist, suggests that we now have evidence that global warming is real.  This is appropriate for MT because it reflects some strange, yet almost universally accepted, criteria for deciding what is really 'real'.  It is an issue that biologists, evolutionary or genomic or otherwise, including anthropologists and others dealing with human variation, should be more than just aware of; they should integrate it into their thought processes.

Previously, says Dr Jones (as quoted by the reporter), we didn't have enough information to be sure,
"but another year of data has pushed the trend past the threshold usually used to assess whether trends are 'real'".  Dr Jones says this shows the importance of using longer records for analysis.
Now what could this mean?  How can something become 'real' just because we have another set of data?  Dr Jones explains:
"The trend over the period 1995-2009 was significant at the 90% level, but wasn't significant at the standard 95% level that people use," Professor Jones told BBC News.
If this doesn't strike you as strange, it should!  How can another year make something 'real'?  How can a 95% 'level' make something real?  Or if it makes it 'significant', does that make it real?  Or is there a difference?  Or if it's 'significant', is it important?

This is uncritical speaking that makes science seem like a kind of boardwalk shell game.  Find the pea and win a teddy bear!

In fact, what we mean is that by convention (that is, by a totally subjective and voluntary agreement), if something would happen by chance alone 'only' once in 20 times, and it actually happens, we judge that it's due to factors other than chance.  One in 20 means a 5% chance.  We call that probability the p value of a significance test, and 5% is the conventional cutoff (this is the complement of, and means the same thing as, the 95% level referred to in the story).  And here 'significance' is a very poor, if standard, word choice.  We would be better off using a more neutrally descriptive term like 'unusuality' or 'rareness'.
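
To make the 'another year' puzzle concrete, here is a minimal sketch with made-up numbers: a hypothetical warming rate of 0.011 degrees per year and year-to-year noise of 0.1 degrees, neither taken from the actual temperature record. It assumes independent noise of known size and a simple normal-approximation test of the slope, which is not necessarily how the analysis in the story was done. The function name and parameters are ours, for illustration only. The underlying trend is identical in both runs; only the length of the record changes.

```python
from math import sqrt
from statistics import NormalDist


def trend_p_value(n_years, slope_per_year, noise_sd):
    """Two-sided p value for a linear trend, assuming the estimated slope
    equals the true slope and the yearly noise is independent with known SD."""
    # Sum of squared deviations of the years 1..n from their mean.
    sxx = n_years * (n_years ** 2 - 1) / 12.0
    # Slope divided by its standard error (noise_sd / sqrt(sxx)).
    z = slope_per_year * sqrt(sxx) / noise_sd
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical values, chosen only to illustrate the threshold effect.
for n_years in (15, 16):
    print(n_years, "years:", round(trend_p_value(n_years, 0.011, 0.1), 3))
```

With these assumed values, 15 years gives a p value of roughly 0.07 (past the 90% level but not the 95% level) and 16 years gives roughly 0.04. Nothing about the world changed between the two runs; only the amount of data did.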

In fact, global warming is either real or it's not (assuming we can define 'global warming').  Regardless of the cause--or the real significance for worldly affairs--the global temperature is always changing, so the questions really are something like: 'on average, is the global mean temperature rising more than usual, or in a way reflecting a long-term trend?'

Further, for those who 'believe' in global warming, who are convinced on various diverse grounds that it's happening, the 'mere' 90% level of previous years' data did not convince them that global warming wasn't taking place.  Indeed, if before now we didn't have data showing the trend at the 5% level, how on earth (so to speak) did anyone ever think to argue that this was happening?

There is absolutely no reason to think that very weak effects that can never be detected by standard western statistical criteria are not 'real'.  They could even be 'significant': a unique mutation can kill you!

Perhaps a better way for this story to be told is that another year of data reinforced previous evidence that the trend was continuing, or accelerating, that its unusuality got greater, and that this is consistent with evidence from countless diverse sources (glacial melting, climate changes, biotic changes, and so on).

Suppose that no single factor was responsible for climate change, but instead that thousands of tiny factors were, and suppose further that climate change was too incremental to pass the kind of statistical significance test we use.  Global warming and its effects could still be as real as rain but not subject to this kind of cutoff-criterion thinking.
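
A rough way to see how a perfectly real effect can routinely fail the test: the sketch below computes the chance that a study passes the 0.05 cutoff when the effect truly exists, under the simplifying assumption of a two-sided normal (z) test. The name `expected_z` is our own shorthand for the true effect divided by its standard error; nothing here comes from climate or genomic data.

```python
from statistics import NormalDist


def chance_of_passing(expected_z, alpha=0.05):
    """Probability that a two-sided z-test rejects at level alpha when the
    test statistic is normally distributed around expected_z with SD 1."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    below = std_normal.cdf(-cutoff - expected_z)     # statistic lands under -cutoff
    above = 1.0 - std_normal.cdf(cutoff - expected_z)  # statistic lands over +cutoff
    return below + above


# No effect at all, a weak real effect, and a strong real effect.
for expected_z in (0.0, 1.0, 3.0):
    print(expected_z, round(chance_of_passing(expected_z), 3))
```

Under these assumptions a weak but genuine effect (`expected_z` of 1) passes the cutoff less than one time in five, so failing the test says little about whether the effect is there.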

This is precisely (so to speak) the problem facing GWAS and other aspects of genomic inference, and of reconstructing evolutionary histories.  p-value thinking is a rigid, century-old criterion of convenience, with no bearing on real-world causality--on whether something is real or not real.  We might say that if an effect is so weak that its p-value is more than 0.05, it's not important enough to ask for a grant to follow up that finding.  Hah!  We have yet to see anyone act that way!  If you believe there's an effect you'll argue your way out of unimpressive p-values.

And, again, even if the test-criterion is not passed, the effect could be genuine.  On the other side, we've got countless (sad) GWAS-like examples of trivially weak effects that did pass some p-value test, where that fact is used to argue that the effect is 'significant' (hinting that that means 'important').
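
The flip side can be sketched the same way. Assuming a simple one-sample z-test with known standard deviation, and a hypothetical effect of just one hundredth of a standard deviation (our invented number, not from any GWAS), the p value depends almost entirely on how many samples we can afford:

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_mean_shift(shift_in_sd_units, sample_size):
    """Two-sided p value for a mean shift of the given size, assuming the
    observed shift equals the true shift and the SD is known."""
    z = shift_in_sd_units * sqrt(sample_size)
    return 2.0 * NormalDist().cdf(-abs(z))


# The same trivially small effect, tested with two different sample sizes.
for sample_size in (1_000, 100_000):
    print(sample_size, p_value_for_mean_shift(0.01, sample_size))
```

In this sketch the effect is equally trivial in both runs, yet it is nowhere near the cutoff with a thousand samples and sails past it with a hundred thousand. Passing the test tells us about the sample size as much as about the effect.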

Statistical inference has been a powerful way that institutionalized science can progress in an orderly way.  But it is a human, cultural approach, not the only possible approach, and it has serious weaknesses that, because they are inconvenient, are usually honored in the breach.
