Showing posts with label correlation. Show all posts

Thursday, June 19, 2014

Correlation, cause and how can you tell?

A very good BBC radio program called More or Less tries to point out uses and, mainly, misuses of statistics and data analysis, especially where policy or political claims and the like are involved.  A recent program (June 9) addressed the problem of confusing correlation with causation.  This is something everyone should know about, even if you're not a scientist.  Just because two things can be seen together does not mean that one causes the other.

The program mentioned an entertaining web site, Spurious Correlations, that you can find here.  It's run by Tyler Vigen, a Harvard law student, and it makes the correlation/causation problem glaringly obvious. There is even a feature for finding your own spurious correlation.

Figure: Number of people who died by becoming tangled in their bedsheets correlates with total revenue generated by skiing facilities (US). Source: Spurious Correlations

How to tell if a correlation is spurious is no easy matter.  If many things are woven together in nature, or society, they can change together because of some over-arching shared factor.  But just because they are found together does not mean that in any practical sense they are causally related. Calling a correlation statistically significant, meaning that it is unlikely to arise by chance, rests on a subjective judgment about what you count as 'unlikely.' After all, very unlikely events do occur!

Causation can be indirect, and it is not always easy to understand just what it means for one thing to 'cause' another.  If wealth leads to buying racy cars and racy cars are less safe, is it the driving, the car, or the wealth that 'causes' accidents?  If AIDS can be a result of HIV infection, but you can get some symptoms of AIDS without HIV, or have HIV without the symptoms, does HIV cause AIDS? If the virus is indeed responsible, but only drug users or hookers get or transmit it, is the virus, the drug-use, or prostitution, or using prostitutes the 'cause'?

Another problem, besides just the way of thinking of causation, and the judgment about when a correlation is 'significant', relates to what you measure and how you search for shared patterns.  If you look through enough randomly generated patterns, eventually you'll find ones that are similar, with absolutely no causal connection between them.
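To make the point about searching through random patterns concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions: the number of series (200), their length (10 'years'), the random-walk model, and the function names are all made up for the example, and are not taken from the Spurious Correlations site. Every series is generated independently, so any strong correlation the search turns up is spurious by construction.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(rng, length):
    """A series whose step-to-step changes are independent random noise."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def best_spurious_pair(n_series=200, length=10, seed=0):
    """Search all pairs of unrelated series for the strongest correlation."""
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    series = [random_walk(rng, length) for _ in range(n_series)]
    best = max(
        combinations(range(n_series), 2),
        key=lambda p: abs(pearson(series[p[0]], series[p[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    (i, j), r = best_spurious_pair()
    print(f"series {i} and {j}: r = {r:.3f}")
```

With 200 series there are 19,900 pairs to compare, which is why a near-perfect match turns up even though nothing connects any of them.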

Looking at the examples on the above web site should be a sobering lesson in how to recognize bad, or overstated, or over-reported science.  It won't by itself answer the question about how to determine when a correlation means causation.  Nobody has really solved that one, if indeed it has any sort of single answer.  And there are some curious things to think about.

Just what is causation?
In a purely Newtonian, deterministic universe, in a sense everything is causally connected to everything else, and was determined by the Big Bang.  For example, with universal gravity everything literally affects everything else and through a totally deterministic causal process.

In that sense nothing at all is truly probabilistic.  But quantum mechanics and various related principles of physical science hold that some things may really be truly probabilistic rather than deterministic. If that is right, then the idea of a 'cause' becomes rather unclear.  How can some outcome truly occur only with some probability?  It verges on an effect without an actual cause.  For example, if the probability of something happening is, say, 15%, what establishes that value--what causes it?  A process whose randomness is real, and not just a reflection of our inability to measure it precisely, in a sense redefines the very notion of cause.  Such things, some of them seemingly true of the quantum world, really do violate common sense.  So the whole idea of correlation vs causation takes on many different, subtle colors.

Monday, May 20, 2013

Retirement harmful to health or... an uncertainty principle?

Years and years:  but who's counting?
Breaking news!  As reported by the BBC ("Retirement Harmful to Health"): "...the chances of becoming ill appear to increase with the length of time spent in retirement."  Even more astonishing, the effect is the same for men and women. 
The study, published by the Institute of Economic Affairs (IEA), a think tank, found that retirement results in a "drastic decline in health" in the medium and long term.
The IEA said the study suggests people should work for longer for health as well as economic reasons.
This is of course just as astonishing as the fact that having more birthdays increases your lifespan (someone must have won a Nobel prize for that discovery! or at least got a headline story in the NY Times Science supplement).

Retirement is, of course, highly correlated with aging, which is, obviously, highly correlated with length of retirement and, of course, aging is highly correlated with ill health.  Further, people still working but already in ill health are more likely to retire than people healthy and still able to work well into old age.  And, since the report considers mental as well as physical health, it's also relevant that people with an ill spouse may be more likely to retire, which may increase their chances of becoming depressed.  So if this study had reached any conclusion other than that retirement is correlated with ill health, that would have been worthy of headlines.

It turns out that the background to the report treats the question in a relatively nuanced way, even if the conclusions are much less nuanced. E.g., from the report:
...evidence suggests that poorer health increases the likelihood of retirement. When looking at health and retirement it is therefore very difficult to separate cause from effect. In addition, a plethora of variables that cannot be observed are likely to bias results in any empirical studies -- and it is difficult to predict the direction of the bias.
Further, "Theoretically, the impact of retirement on health is far from certain."  "Other mechanisms by which retirement can affect health appear equally ambiguous."  "...an observed correlation between retirement and health says nothing about causation."  "Overall, the most methodologically convincing research on the health effects of retirement is rather mixed. This is likely to be due to researchers employing different research strategies and data."

But they do report, from interview data with 7,000 to 9,000 people after varying numbers of years of retirement, more self-reported mental illness, more prescription drug usage, more diagnosed physical problems, and so on among retired people than among those still working, and finally conclude that retirement is harmful to health.  Indeed, the report is titled "Work Longer, Live Healthier", so there's no missing their point.

It's true that other studies have found positive effects of retirement, but the authors write, "The results have been cross-checked against the methodologies used in earlier research studies, and it has been found that the positive impact of retirement on health found in earlier studies is, at the very least, partly due to shortcomings in that research."

Well, in fact it's hard to know how to do a study that would properly answer the retirement/health question.  When poor health can 'cause' retirement and retirement can (says this report) 'cause' poor health, how do cause and effect get teased out? 

A study comparing cases and controls would be problematic; can 'controls' who are still working be assumed to match retired 'cases' if the variable being measured (health) can affect whether someone works or retires? And, aging and ill health are already highly correlated, as are retirement and aging.  So disentangling cause from effect is inherently difficult.  
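A small simulation shows how hard the disentangling is. The sketch below is ours, not the IEA's method, and every number in it is invented for illustration: the age range, the illness probabilities, and the extra chance of retiring when ill are all assumptions. The key design choice is that retirement has no effect on health at all in this toy world; illness depends only on age, while both age and illness make retirement more likely.

```python
import random


def simulate_retirement(n_people=20000, seed=1):
    """Return the illness rate among retired and among working people.

    Assumed toy model: illness depends only on age; retirement depends on
    age and on already being ill. Retirement never changes anyone's health.
    """
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    counts = {"retired": [0, 0], "working": [0, 0]}  # [ill, total]
    for _ in range(n_people):
        years_past_55 = rng.uniform(0, 20)  # ages 55 to 75
        # Chance of being ill rises with age (made-up rates).
        ill = rng.random() < 0.10 + 0.02 * years_past_55
        # Chance of being retired rises with age, and is higher if ill.
        p_retire = 0.10 + 0.035 * years_past_55 + (0.20 if ill else 0.0)
        retired = rng.random() < p_retire
        group = counts["retired" if retired else "working"]
        group[0] += ill
        group[1] += 1
    return {name: ill / total for name, (ill, total) in counts.items()}


if __name__ == "__main__":
    for name, rate in simulate_retirement().items():
        print(f"{name}: {rate:.1%} ill")
```

The retired group comes out noticeably sicker than the working group, which is just the pattern the report describes, even though in this model retirement is causally harmless.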

And several other things.
The risks and histories clearly involve cultural and lifestyle factors.  These change all the time, and indeed are affected by stories like the current one, which might in itself lead readers not to retire because they'll think that retiring amounts to committing suicide.  And what about all sorts of other factors like smoking history, involvement in wars and economic crashes and their harmful effects, and who knows what else that affects our health and our attitudes on a daily basis?  One thing is certain: our exposure to those things in the future, which would affect health after retirement, is uncertain in principle.  As with any study that purports to project results into an unknown future environment, this study cannot have any easily knowable implications for years beyond the immediate future at best.

And, if you have a relative who died early from, say, cancer or a coronary, you may be driven to retire early to have a chance at enjoying life before your number comes up, if you think your relative's experience reflects your own vulnerabilities.  Or conversely, if your father lived to 110, you might think you will too, so why not sock away a few extra years' pension funds, publish some more astonishing research papers, or whatever.  That is, what may be irrelevant or at least unmeasured factors can confound this type of study.  Even knowing that you don't have to retire may affect what you decide.  So may seeing what happens to your peers as they drop out of the office and/or off their perch.

The analogy with quantum mechanics: the Heisenberg principle
In a sense, what we see here, at least potentially, is something like the phenomenon in quantum mechanics in which an electron or photon exists as a wave until you measure it.  Then it collapses to a point, but because you've measured, say, its location, you can no longer know its momentum precisely.  The reason is that the very act of observing and measuring it changes its behavior.  This, loosely speaking, is the Heisenberg uncertainty principle.

Here, too, there is a quite similar-seeming uncertainty principle:  the very act of doing the study and publishing its results will affect the future course of the very people whose future you're trying to predict with your data.  The relevant behavior of the people you studied, and others who read the research, is affected by the fact that you did the study.  How that alters behavior is uncertain and basically not knowable.

But the bad news that retirement is harmful to people's health is good news for governments looking to save money. Raising the age at which people can begin to draw their pensions is one way to save a lot of money, because people will contribute to retirement funds for more years and draw on them for fewer. We just wish the evidence were sturdier.

Thursday, December 30, 2010

The problem of correlation and causation is solved.....or not

The BBC Radio 4 program, More or Less, is a show about statistics and how they are used and abused in reporting the news. Among other regular messages, the presenters spend a lot of time explaining that correlation is not causation, which of course is something we like to hear, since we say it a lot on MT, too (e.g., here).

For the 12/17 show, they decided to test science journalists in Britain, to see whether they'd bite on a correlation/causation story they cooked up, or whether they were by now savvy enough not to.  The numbers were true, but the mathematician on the show tried to sell the idea that one caused the other, hoping it would warrant a spot on the news.

This guy's story was that there's an extremely strong correlation between the number of mobile phone towers in a given location and the number of births.  In fact, each tower is correlated with 17.4 births, to be precise.  A small village with only 1 tower will have very few births, and a city with a lot of towers will have many more.  Well, no one bit.  Or rather, only one media outlet, Radio Wales, bit on the story, and they wanted to talk with him about the problem of confusing correlation and causation.  Apparently it was pretty obvious.

At first look, though, the mathematician assumed it would appear that the number of towers causes an increase in births.  But in fact, of course, both the number of towers and the number of births are a consequence of population size.  They are confounded by population size, an unmeasured variable that affects both observed variables.  And, regular readers know that the issue of confounding is another frequent feature of MT.
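The confounding described above can be sketched with made-up towns. In this Python example all the numbers (500 towns, one tower per 2,000 people, 12 births per 1,000 people, the noise levels) are our own assumptions and not the figures from the show, and unlike the cooked-up story we pretend population was measured. Towers and births are each generated from population alone, so neither influences the other. The partial correlation uses the standard formula for the correlation of two variables after removing the linear effect of a third.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate_towns(n_towns=500, seed=2):
    """Hypothetical towns: towers and births both depend only on population."""
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    population, towers, births = [], [], []
    for _ in range(n_towns):
        pop = rng.uniform(1_000, 100_000)
        population.append(pop)
        towers.append(pop / 2_000 + rng.gauss(0, 3))   # assumed tower density
        births.append(pop * 0.012 + rng.gauss(0, 50))  # assumed birth rate
    return population, towers, births


if __name__ == "__main__":
    population, towers, births = simulate_towns()
    print(f"towers vs births:         r = {pearson(towers, births):.3f}")
    print(f"controlling for population: r = "
          f"{partial_correlation(towers, births, population):.3f}")
```

The raw correlation between towers and births is very strong, but once population is held fixed it falls to roughly zero, which is what confounding by a shared cause looks like in data.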

The More or Less presenter is hoping that the fact that this story had basically no takers means that British science journalists are beginning to get the correlation-doesn't-equal-causation message, though as the mathematician pointed out, a recent story about mobile phone use causing bad behavior at school suggests otherwise.  And, a glance at the BBC science or health pages is an almost daily confirmation that the problem persists, something we also point out on an as-needed basis.

But that's not really what interested us about this story.  What interested us was what happened next, when the presenter asked the mathematician why making causal links was so appealing to humans, given that they are so often false.

The mathematician answered that it's just our instinct: our brains have developed to recognize patterns and respond to them.  He said we think of patterns as causal links because 'we survive better that way.' Our ancestors thought that the movement of the stars causes seasons to change, for example -- and... somehow that allowed them to live longer. Thus, he said, it's hard to overcome our instinct to assign causality.

Translated, what he meant was that we evolved to make sense of patterns by finding causal links between two things.  (If true, this certainly isn't unique to humans -- we used to have a dog who was terrified when the wind closed a door.  But if a human closed it, that was perfectly fine.  She actually did understand causation!)

But isn't the mathematician making the very same error he cautions against?  Because we evolved, and because we can see patterns, one caused the other?  This is also something we write a lot about, the idea that because a trait exists, it has an adaptive purpose -- the Just-So approach to anthropology, or genetics.  Many things come before many other things, but that doesn't help identify causal principles that connect particular sets of things.  And correlations between variables can arise in many different ways.  Most are not known to us, or at least we're usually just guessing about what the truth is.