Wednesday, March 12, 2014

What's the point? Causation strong, weak, and undetectable

We often write about, and, yes, complain about, the kinds of sample-based statistical survey studies of health risk, and this week's news about the cause of obesity is another good reason to revisit the subject.  (In case you missed it, we've got two options this week: it's either sugar or antibiotics.)  Whether the risk factor is genetic or environmental, the fields involved use similar conceptual approaches.  Essentially, these are 19th century concepts, largely originating in physics but applied to society at the time, and now dressed in 20th century computational and ready-made statistical methods.

To drink, or not to drink. Is that a question?

Here's a wonderful send-up of the problem on the BBC Radio 4 program, The News Quiz.  (Start at 13:05.)  "Remember the Mediterranean Diet?  The last time I was in the Mediterranean, we had kebabs and San Miguel!"  "One week we are told to drink red wine, and the next week, white."  And so on. When the inability of science to determine risk factors becomes the butt of jokes, it's no laughing matter. (H/T Tom Farsides.)

Or is this all white to drink?
When there is what is referred to as point causation, a single major factor with very high predictive power, the approach works well.  Somewhat different methods have been, and are, used for genetic causation (Mendelian inheritance methods) and for infectious or some environmental disease causes, when exposure to a single 'point' cause is responsible.   If the cause has strong effects, even probabilistic aspects of its action or appearance are well analyzed by fundamental statistical ideas and methods. These depend on replication (basically, that each observation is a repeat of the others) and a high level of what essentially amounts to causal determinism by the risk factor, even if there are probabilistic aspects involved (such as the chance of being exposed to a pathogen, or of inheriting a Mendelian disease variant).

A good and interesting survey of the way that probabilistic and statistical thinking entered western science is given by Ian Hacking's book The Taming of Chance (Cambridge University Press, 1990). Hacking retraces debates over whether causation was just apparently probabilistic or was fundamentally so, and how 19th century's versions of 'big data', in the form of national statistical survey data used for things like insurance rate determination, legitimized what was essentially the borrowing of methods from physics to apply them to society.

Because of the success of epidemiology in the infectious disease (pre-antibiotic) era, and of Mendelian segregation analysis in genetics, the idea grew that with sufficient samples we could apply these methods to essentially all forms of causation.  That has led to today's obsession with Big Data and elaborate (often off-the-shelf) statistical methods, basically the same cookie-cutter approach to any question.

But there is a problem with that.  It's what we often write about here, because we think it is a serious one.  For reasons of cultural and scientific inertia, and a lack of sufficiently imaginative training, we cling to these methods even though we have good theoretical as well as experiential reasons to know they are inappropriate--they basically don't work, but we do them anyway.

When there are many different factors at play, varying among samples or populations, and most of the factors are either of low prevalence (rare in the population) or weak effect, the gussied-up 19th century mode of thinking is just not very effective at analysis or prediction.  That is why, no matter how one may defend the continuation of big-investment, big-data, long-term, high-throughput research, we are daily seeing the uncertain, changeable, non-definitive results reported in the news and in the journals themselves.  Questions like the risk per gram of sugar, or from our dietary intake of antibiotics in food, or about PSA testing or regular mammography, or GWAS and other claims of genes 'for' this or that trait, change daily, with each study being reported by the authors and the media as if, somehow, it is now, finally, definitive.

To increase sample size and get results, without having to actually think about the problem or frame well-posed questions, researchers propose exhaustive national or international studies, high-throughput data analysis, meta-analysis, and the like.  Of course, if you look for associations and use statistical tests of various kinds, you will always get a 'result'.  But among other things, these approaches basically assume a well-behaved signal-to-noise ratio: that the bigger the sample, the closer the estimates of risk will be to the (assumed) underlying truth, with less 'noise' due to measurement and sampling errors. That assumes that as we add new samples, the heterogeneity they bring will matter less than the refinement they add to the risk estimates--less error in those estimates--assuming the estimates have true values at all. We know very well, and from very extensive experience, that there are reasons why this assumption is unlikely to be true, reasons borne out in the data, but the expediency of proposing just to collect more, more, more seems too convenient to abandon.
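The signal-to-noise point above can be sketched with a toy simulation (my own illustration, not from any study discussed here; the function names and the heterogeneity value are invented for the example). When every study samples the same underlying risk, bigger samples shrink the spread of estimates toward zero; but when the 'true' risk itself varies between study populations, the spread of study results hits a floor that no sample size can cross:

```python
import random
import statistics

random.seed(1)

def study_estimate(true_risk, n, noise_sd=1.0):
    # One study's estimate of a risk parameter:
    # the true value plus sampling noise that shrinks as 1/sqrt(n).
    return true_risk + random.gauss(0, noise_sd / n ** 0.5)

def spread_of_estimates(n_per_study, het_sd, n_studies=2000):
    # het_sd is between-study heterogeneity: how much the 'true' risk
    # itself differs from one study population to the next.
    estimates = [study_estimate(random.gauss(0.5, het_sd), n_per_study)
                 for _ in range(n_studies)]
    return statistics.stdev(estimates)

for n in (10, 100, 1000, 10000):
    homog = spread_of_estimates(n, het_sd=0.0)   # identical true risk everywhere
    heter = spread_of_estimates(n, het_sd=0.1)   # true risk varies between studies
    print(f"n per study = {n:>6}   "
          f"homogeneous spread = {homog:.3f}   "
          f"heterogeneous spread = {heter:.3f}")
```

Under these assumptions the spread of study results is roughly sqrt(het_sd² + noise_sd²/n): with no heterogeneity it keeps falling as samples grow, but with even modest heterogeneity it plateaus near het_sd, which is why successive large studies can keep disagreeing no matter how big they get.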

The same goes for the idea that risk factors and their relative frequencies will be the same in the future as in our samples from the past.  We know that that is bunkum even from a genetic point of view (variants' frequencies depend heavily on the samples chosen; new variants are introduced and others lost), as well as from an environmental one (lifestyles change in ways that are inherently unpredictable, and predicting environmental effects on gene expression, perhaps of genes never before found to be involved in disease, is equally impossible).

In the face of daily now-you-see-it, now-you-don't stories, always hyped up by the professors, the funders, and the media, one might expect that science would stop and take stock, realizing that we are driving a Model T on a modern highway and need to do things differently.  But that seems just too difficult.  Like turning a tanker around in the ocean, it's a slow, cumbersome process.  And, to a great extent, we have not yet collectively agreed that we even have a problem (even though we know very well that we do).

Of course, it's at least possible that what we see is what there is: that there simply isn't a better way besides fickle statistical association analysis to understand the risks we want to understand.  At least, then, we should show that that is the case with some definitive evidence demonstrating that it's impossible to do better.  But even that would require more careful thought than is being given to these problems.

In truth, can anyone not realize that a significant factor in the current way of doing business is that investigators know that fickle results are good business--that one never need say a study is over or a question answered, and can keep asking for more data, more computers, longer, larger studies?

Strong point causation is like a drug: its initial pleasures addict us to hoping for the high of simple answers to every question.  But the truth, weak causation, has us hooked all the same.


Holly Dunsworth said...

Have you ever thought about convening a thinktank session (small conference) about this problem? It seems to be one too big for graduate students to take on, so instead they take the mainstream, momentum-fueled way, and nothing changes as they progress through their careers. We need people who aren't entering the field and trying to make their way to take this on, at full brain ahead. Who would fund it? MacArthur?

Holly Dunsworth said...

Not that a small conference would be enough, but...the general idea of something like this.

Ken Weiss said...

A few months ago in an MT series (search on 'Solvay') we started suggesting this sort of thing, noting some very strange genomic phenomena that aren't being adequately addressed by current methods.

Any sorts of discussions or workshops that frankly acknowledge the problem and address what to do about it (other than asking SAS or SPSS to add some new analysis modules, or asking NIH to fund even larger data resources) would have the potential to do some good.

Holly Dunsworth said...

Has anyone expressed interest?

Ken Weiss said...

At the time, yes, and a tad since then. One old-time colleague at Michigan seems to have begun thinking about how we could do this.

But everyone's busy and it could be quite a threatening thing even to broach, so nothing really.

Holly Dunsworth said...

threatening how?

Ken Weiss said...

Nobody likes to admit it, but in science as in any area of human affairs, anyone who suggests change threatens vested interests. Here the science-industrial-health complex is a huge vested interest.

These interests are venal and selfish, but they are also about the real world. If I discover a new theory, or make a new machine, that may upset the apple cart _but_ that shows everyone how to do new projects, get new grants, identify new questions, then they will seize it.

So the resistance and feeling of threat is the inertia of business as usual (or, one may say, Kuhn's idea of 'normal science'). That's why people criticize critics by saying that their criticism is inapt unless they suggest something better.

That's an understandable human reaction, but rather like a juror saying s/he'll vote 'guilty' because, while there is no evidence that the accused did it, the prosecution hasn't identified who actually did!

Holly Dunsworth said...

Aren't there enough retired brilliant minds who don't give a bleep and want to do something monumental?

Ken Weiss said...

They are too few, too often rutted in their decades of old ways, rarely brilliant, without resources....and maybe too old for the task all round!

Real change has to come from the smart young ones (like you) who see a widespread problem and don't want to have to spend their next 40 years trying to walk in the same quicksand.

But some of us lesser lights at least try!

Holly Dunsworth said...

There are too few established and retired scientists who are willing to change science?

Ken Weiss said...

Well, most of us have been too programmed by decades of work and success to be able to re-think in a profound way. We can try. But again, it is the young who have to do it. You might enjoy reading Kuhn's book if you never have.

Or, come to that, and since it relates to anatomy in its latter parts, Ludwik Fleck's classic (which anticipated Kuhn's book on scientific revolutions), "Genesis and Development of a Scientific Fact".

Anonymous said...

"Borrowing from physics".

Wasn't most of statistics developed by economists?

Holly Dunsworth said...

Thanks Ken, but I'm not an epidemiologist or geneticist or statistics whiz or anything remotely right for what this issue needs. I'm not the one to carry this out. I thought maybe I could help stoke a fire in those who are better poised, like you.

Ken Weiss said...

My knowledge is very limited. Much of the theory was developed by physicists and mathematicians, but certainly sociopolitical interests were also involved, largely because of the eventual recognition, in the late 1700s and/or early 1800s (I can't remember in detail), of regularities and consistency in many aspects of society.

Depending on what you mean by 'economists', much of the application was by them. As to what one means by 'development' as opposed to 'usage', that's another question.

But the very interesting aspect is that societies, including economies, do have regularities despite somehow being composed of independent agents (people...but how independent are they, is one question).

Malthus and his ideas led to some developments. Insurance entities private and governmental were ways to raise money, so used mortality and morbidity data, and so on.

Clearly things go way back, if only informally. The Romans, for instance, had tax collectors and the like, and worried about their budgets, who was paying up, number of possible soldiers, and so on.

Ken Weiss said...

Yes, you're doing the right thing. I mentioned Fleck because in the '30s much of his work (the main topic was how syphilis was viewed) was about medical science, including anatomy, and his point was how culture defines things, which then take on an assumed sense of objectivity. How the skeleton was viewed and presented through time is something I think you'd find quite interesting, and even useful to how this is done in skeletal biology today.

Holly Dunsworth said...

Can't wait to check it out. Hope to see lots about males-as-ideal-humans!

Manoj Samanta said...

Ronald Ross, who received the Nobel Prize in 1902 (the second year of the prize) for saving the Brits from mosquito bites, developed a lot of the early statistical methods used in biology.

"Nobel Prize controversy

Ronald Ross was awarded Nobel Prize basically for his discovery of the life cycle of malarial parasite, although he himself considered his epidemiological mathematics as a much more valuable contribution."

Manoj Samanta said...

Also check this link - "Statistical methods in epidemiology: Karl Pearson, Ronald Ross, Major Greenwood and Austin Bradford Hill, 1900–1945"

JayMan said...

Much of this was the focus of my blog posts:

Nuts Over Nuts | JayMan's Blog


Trans Fat Hysteria and the Mystery of Heart Disease | JayMan's Blog

I won't touch the GWAS and "genes for X" strawman.

I agree with the basic gist of your argument. But I think you may be going way too far. The key problem is that we can't (ethically) conduct tightly controlled experiments on human beings, so we try to use various tricks to make an end run around this limitation. Of course, none of them truly do this. Biggest among these are uncontrolled observational studies. Reliance on these (mostly due to convenience and set-in dogma) is responsible for many of the great sins of medical misinformation.

I think, however, that the solution is actually to rely more on genes. The proven success of behavioral genetics speaks to this being an area that requires more attention. But, let's not get into that argument again. :)