Tuesday, December 3, 2013

The 'Oz' of medicine: look behind the curtain or caveat emptor!

There has been quite a stir over recent attempts to provide a general calculator of heart disease risk and associated recommendations.  In particular, how reliable and how independent are the offered recommendations relative to the prescription of lifelong use of medicines like statins?  How skeptical should the general public be about the reliability of the risks, and about the disinterestedness--the lack of potential gain or self-interest--behind recommendations that a high fraction of the population go on life-long meds?

We've commented on aspects of this general issue before, and in particular about the shaky aspects of the GetOnStatins push.  Here's another article in the NYTimes about this, raising issues about the accuracy of the risk estimates--the predictions--based on the chosen risk factors (cholesterol levels, age, sex, and numerous others).  These calculators give wide-ranging results and are now widely viewed as inaccurate.  And the criticisms we've seen are not simply the accusation that pharma is pushing the recommendations to increase sales--though conflicts of interest seem likely to be a factor; the main problems run deeper.


There are a couple of issues that are deeply problematic across a wide spectrum of the kinds of biomedical (and, indeed, evolutionary) research that is getting so much funding and public play.

First, the research and interpretations are based on what amounts to a deep belief system--yes, that's an apt word for it--an assumption that raw data collection, handed to high-speed computers, can find the patterns that will properly estimate risk, given various observed traits of the sampled person. That in turn amounts to assuming that such data do, in fact, contain and will reveal the nature of causal truth.  That is essentially the rationale for the glamorous touting of what is now catch-phrased as 'Big Data' (we need our branding, apparently).  It sounds impressive, and it generates attention and large, long-term studies that are the boon of professors who need to fund their own salaries and their research empires.

Whether the belief in computer analysis of uncritically collected data is scientifically appropriate or not is hard to separate from the thicket of vested career interests and the glamour of technology in our current society.  But there are a few clear-cut problems that a few write about but that the system as a whole dare not think too hard about because it might slow down the train.

Problem 1:  Risks are estimated retrospectively--from the past experience of sampled individuals, whether in a properly focused study or in a Big Data extravaganza.  But risks are only useful prospectively: that is, about what will happen to you in your future, not about what already happened to somebody else (which, of course, we already know).

We frequently mention this issue because it's a fundamental problem.  In the case of the heart disease calculator, the people from whom the risks were estimated lived importantly different lifestyles from the people now using the calculator (and that's apart from those people's unknown future risk exposures).  Smoking is now lower and exercise higher.  Legal drug exposures are different, as are diagnoses and treatment options, and so on.  Not to mention changes in the prevalence of unknown risk factors.

This reveals something more that lies at the inescapable heart of the problem:  We respond to 'news' about risks by altering our behavior, companies market different things to us as a result, and we take up other behaviors whose relation to the disease is not known or whose exposure levels can't be predicted....not even in principle.  Thus we are essentially saying "if you live like your grandparents did, this is your risk."  But in truth we are not seers, nor is there any Oz behind the curtain who knows, or can know, how you will live, or who controls all the outcomes--if only we could discover him.

Problem 2:  The idea is that by doing statistical association studies (correlation or regression analysis, for example) on what happened in the past we are revealing the causal understructure of the results we care about.  Symbolically, we write  

DiseaseProb = RiskExposureAmount x DoseResponseEffect + OtherStuffIncludingErrors.

Such equations are routinely referred to as constituting a causal 'model', but in truth they're nothing of the kind.  Instead, the model is generic rather than being developed from, or tied in any serious way to, the actual causation--the mechanism--that may apply to the risk factor.  And 'may' is decided by a subjective statistical test of some kind that we choose to apply to whatever associations we find in the kind of data we choose to collect (our study or sample design).
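To make the point concrete, here is a minimal sketch, using made-up data and hypothetical variable names, of what such a generic 'model' amounts to in practice: a line fitted to associations, with everything unmeasured lumped into the error term and no mechanism anywhere in sight.

```python
# A minimal sketch (hypothetical, simulated data) of the generic risk 'model' above:
# DiseaseProb = RiskExposureAmount x DoseResponseEffect + OtherStuffIncludingErrors.
# Nothing here encodes any mechanism; it is just a line fitted to associations.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

exposure = rng.uniform(0, 10, size=n)            # some measured risk factor (invented)
true_effect = 0.02                               # slope the study hopes to recover
baseline = 0.05
# 'other stuff': everything unmeasured, lumped into noise
disease_prob = baseline + true_effect * exposure + rng.normal(0, 0.05, size=n)
disease = rng.random(n) < np.clip(disease_prob, 0, 1)

# Fit the generic linear 'model' by least squares
X = np.column_stack([np.ones(n), exposure])
coef, *_ = np.linalg.lstsq(X, disease.astype(float), rcond=None)
print(f"estimated baseline = {coef[0]:.3f}")
print(f"estimated 'dose-response effect' = {coef[1]:.3f} (value used in simulation: {true_effect})")
```

The fitted slope says nothing about why the exposure tracks the disease; an unmeasured confounder correlated with the exposure would produce the same kind of 'effect' estimate.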

We are usually not actually applying any serious form of 'theory' to the model or to the results.  We are just searching for non-random associations (correlations) that may be just chance, may be due to the measured factor, or may be due to other confounding but unmeasured factors.  And even this depends on how 'randomness' plays into our sampling and into causation, and on how important the rare events are that cannot be captured by most samples (e.g., very rare genetic variants, or somatically arising variants that are not transmitted).

It is by such reasoning that we feel we can assume that the same association we observe in our sample will apply to other people whom we haven't observed.  Since we don't know how different your life will be from the lives of those from whom the estimates were derived, we can't know how different your risks will be, even if we've identified the right factors.

Problem 3:  Statistical analysis is based on probability concepts, which in turn assume (a) repeatability, as in coin flipping, and (b) that the probabilities can be accurately estimated.  But people, not to mention their environments, are not replicable entities (not even 'identical' twins).  We are not all totally unlike each other, but we are never exactly alike.  No two populations or two samples are identical (a fact that, ironically, rests on a real theory, that of evolution).  Even exhaustive national databases will always have such problems; the extent of them is essentially unknown, and we have no theory for it, largely because of Problem 2: we have no real theory for the disease causation problem.

Problem 4:  Competing causes inevitably gum up the works.  Your risk of a heart attack depends on your risk from completely unrelated causes, like car crashes, drug overdoses, gun violence, cancer, diabetes, and so on.  If those change, your risk of heart disease changes too.  If you're killed in a crash today you can't have a stroke tomorrow.  But we cannot know whether any such exposure factors will change, or by how much.  The car-crash illustration is perhaps trivial, but environmental changes on a large scale certainly are not: war and pestilence are extreme examples that we know are regular kinds of occurrence.
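A toy calculation with invented, constant hazard rates illustrates the point: hold the heart-disease hazard fixed, change only the hazard of everything else, and the probability of ever having the heart event changes with it.

```python
# Toy competing-risks calculation with made-up constant hazards (per year).
# The chance of ever having the heart event depends on the hazard of the
# *other* causes, even though the heart-disease hazard itself never changes.
import math

def cumulative_incidence(h_heart, h_other, years):
    """Probability of a heart event by 'years', in the presence of a competing cause."""
    h_total = h_heart + h_other
    return (h_heart / h_total) * (1 - math.exp(-h_total * years))

h_heart = 0.010                           # hypothetical heart-event hazard
for h_other in (0.005, 0.020, 0.050):     # hypothetical hazard of all other causes
    risk = cumulative_incidence(h_heart, h_other, years=30)
    print(f"other-cause hazard {h_other:.3f}: 30-year heart-event risk = {risk:.1%}")
```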

Problem 5:  Theory in physics is in many ways the historic precedent on which we base our thinking; it arose in the time of Galileo and Newton.  One advantage physics has is that its objects are highly replicable.  It is believed that every electron, everywhere in the universe, is identical.  If you want to know whether certain factors can reveal, say, the Higgs boson, you collide beams of gazillions of identical protons into each other, and their splatter pattern tells you something about their makeup.  Things scatter in a predictable pattern, and though it's probabilistic, you know the expected distribution from your theory and can then estimate, as closely as possible, whether the result fits.

But life is not replicable in that way.  Life is the product of an evolutionary process that involves both imperfect replication (of DNA from one generation to the next) and differential proliferation depending on local circumstances.  Life is about difference, not replicability.

Problem 6:  Big Data is proposed as the appropriate approach, not a focused hypothesis test.  Big Data are uncritical data--by policy!  This raises all sorts of issues, such as the nature of the sample and the accuracy of the measurements (of genotypes and of phenotypes).  Somatic mutation, cell-specific interactions, and the like are not measured, nor is the microbiome, and they can't really be retro-collected.  If the Big Data, Computers Are Everything approach works, then we will have undergone a change in the basic nature of science--that could happen, of course, but for now it's more a belief, held for various reasons, than anything with sound theory behind it (other than theories of computing and statistics, etc., of course).

There has been far too little evaluation of how well whole-population databases, including CDC and cancer registries, have actually identified things not knowable in other ways--or what kinds of things (e.g., other than major factors) they have found.  Even poster children, like the Framingham Heart Study of heart disease risks, or tests involving the anti-clotting drug warfarin, have not been unambiguous successes.  Had this work been done modestly, without the promises of magic bullets that go back even to the '90s, on a modest scale and followed by proof of principle in terms of actual cures and the like, the situation might be different.

Based on the fundamental belief that the ultimate control in the cosmos is at the level of basic physics, science holds that the world must be predictable and that everything, from gravity on up to life, must follow universal laws.  That drives the belief that if we observe enough things we can find out how those laws apply to life, which is more complex than a hydrogen atom.  But there could be, say, uncomputably many arrangements of components underlying much of biological causation--arrangements that statistical models can't explain even when statistical studies can reveal them, or causal interactions too indirect for our current kinds of non-theory-based statistical study designs to reveal.

This is old news that nobody wants to acknowledge
We're disclosing no secrets here, except the secret that the profession simply won't stop foisting this sort of approach onto the funding agencies and media, wrapped in promises of major positive social impact, rather than reforming the way science is done.  The profession is not confessing that, basically, one writes regression equations because computers can do them, Big Data can be fed into them, and....really....we usually haven't much of a clue how the causation actually works.  Even if, as is likely often the case, a measured risk factor really is a risk factor, its connection with the outcomes is typically not even mildly understood, at present.  Yet the daily proclamations to the media continue.

Even major environmental or genetic risk factors often take extensive, highly focused studies to work out in any quantitative way that is useful in telling people what their risks are or helping them decide what to do about them.  If a particular gene gives you, say, a 50% risk of a nasty disease, you might want to be screened, or you'll stop some behavior that confers that level of risk.  You don't care whether it's 55% or 71%.

But most risk factors change your risk by only a few percent, or less, and such effects are hardly measurable with accuracy, except by invoking the belief we discussed in Problem 3.

None of this is new to science or to this particular era in science.  But in an age of impatience, where so many vested interests drive the system, communication is so rapid, and self-promotion is so prevalent among professionals and the media alike, the issues are not given much attention.  It doesn't pay to face up to them.  They are too challenging.

The drumbeat of critiques of research results that emblazon the media is over-matched by the blare of excited promises and proclamations.  It's as if there's an Oz behind the curtain who pulls various levers and makes things happen (and the scientist seems to claim that s/he is that Oz, or has secret communication with him).  It's an understandable human tendency; it's our way of doing business, even if it's only somewhat connected to the problems we say we're doing our best to solve.  Often, our best would be to slow down, scale down, focus, and think harder.  That's no guarantee, and it doesn't mean we shouldn't use technology and even large-scale approaches.  But we should stop using them mainly as a way of keeping the tap open.

Now, yesterday there was another story, reflecting a long-known finding that obesity is a risk factor for breast cancer.  This isn't far-fetched, if the idea is correct that obesity relates to cholesterol, a molecule in pathways related to various steroid hormones, which could stimulate growth in breast cancer cells.  The idea is that if one lowers cholesterol, this risk might be lowered.  And how does one lower cholesterol?  Taking statins is one way.  So, like so many things, this is a complicated story.

It could be that the way things are being done, including Big Data 'omics' approaches, is the best approach to genomic causation there is, and that we just won't get to the kinds of rigor that physics and chemistry enjoy.  Or it could be that something better is due, and will come along when somebody has the right insight.  But that will most likely happen to the person who is banging his/her head against problems like those we've outlined here, and in other posts.

10 comments:

Jim Wood said...

A thoughtful essay, Ken, one that raises some fundamentally important issues in biomedical/epidemiological research and any other form of science that relies heavily on conventional statistics. Damn you, I wish I had written it! There is an old (and now thoroughly unfashionable) idea in sampling theory that, once you had identified your target population and developed a good sample frame for it, you had in principle defined the universe of inference for your study. Therefore, any use of your results to make predictions or projections, to estimate risks, or whatever, for any other grouping of individuals was, strictly speaking, inadmissible or at least inadvisable without extreme caution -- unless you could show that the precise mechanisms that generated the data, including both the real causal processes and the sampling procedures, were the same in both cases. As you note, that's never going to be true in biological research, never mind social or psychological research. That "extreme caution" bit seems to have been lost in the shuffle somehow.

I also like your comments about regression models (which I admit that I use). Most regression analyses are nothing more than gussied-up correlational studies.

Finally, I highly recommend a cartoon in the latest "New Yorker", which shows a seedy-looking drug dealer hanging out on a city street corner, yelling: "Statins. I got statins. Who wants some statins?"

Ken Weiss said...

Another 'Problem' has been pointed out to me by my friend Charlie Sing, at the University of Michigan, who has been dealing with these sorts of frustrating issues for many years. The problem is one we've probably written about here in the past (I can't recall).

It's this: We estimate things like risk using statistical methods that assume repeatability and so on, from groups identified by having this or that characteristic, like a genotype, and we apply the result prospectively to similar groups for their future. But this assumes that each person in the 'group' we define in that way is identical--that is, that the same risk, which we've estimated as an average for the group, applies to each member of the group.

It isn't difficult to see why this is inaccurate and generally to an unknown extent.
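A back-of-the-envelope illustration, with invented numbers: a 'group' defined by a shared characteristic can have a perfectly well-estimated average risk that is nonetheless wrong for every individual in it.

```python
# Invented numbers: a 'group' that is really a mix of two unmeasured subgroups.
# The group-average risk can be estimated correctly, yet it applies to nobody in the group.
low_risk, high_risk = 0.05, 0.35          # true individual risks (unknown to the study)
frac_low = 0.5                            # half the group is of each kind

group_average = frac_low * low_risk + (1 - frac_low) * high_risk
print(f"estimated 'group risk': {group_average:.2f}")    # 0.20
print(f"true risk for a low-risk member:  {low_risk:.2f}")
print(f"true risk for a high-risk member: {high_risk:.2f}")
```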

Jim Wood said...

There are various hi-tech ways to "correct" for so-called unobserved heterogeneity (i.e. all the stuff we forgot to measure) in estimating risks or incidence functions. How well they work is another story....

Jim Wood said...

You can also stratify your sample.

Ken Weiss said...

Yes, and geneticists nobly do that, for a few key variables (like sex). But we don't know all the things to stratify by, and even then we make the uniformity assumption within strata. That is essentially the basis of genomic risk assessment these days. Of course the more the strata are refined presumably the more homogeneous the risks within a stratum. But we are currently far from showing, or even knowing, how refined that has to be.

A case can be made (and a lot of money is now being made) for saying that your genotype is unique and with various methods now in play we can predict your risk. That's Francis Collins' 'personalized genomic medicine'.

But this still applies based on integrating various subgroup risks in many ways--your risk at GeneA, plus your risk at GeneB, etc. is your overall risk.

And this won't get around Problem 1 (using retrospective data prospectively) or aspects of the other problems.

At least, this is why I think that re-thinking is called for.

Jim Wood said...

You are, of course, quite right about stratification. And there's another problem: if you stratify on more than a tiny handful of variables, your per-stratum sample size gets too small.
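A quick count, with purely illustrative numbers, shows how fast that happens: every additional two-level stratifying variable doubles the number of strata, so per-stratum counts collapse long before you have stratified on everything that might matter.

```python
# Strata multiply exponentially: k two-level variables give 2**k strata.
# With a fixed sample, the average count per stratum quickly becomes useless.
sample_size = 10_000
for k in (2, 5, 10, 15, 20):
    strata = 2 ** k
    print(f"{k:2d} binary variables -> {strata:9,d} strata, "
          f"~{sample_size / strata:,.1f} people per stratum on average")
```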

John R. Vokey said...

Two related issues: The first is the indiscriminate use of the word "risk" as a label for any epidemiological correlation (especially ecological correlations)---with its immediate (intended?) interpretation of causality. We would be better off avoiding the term entirely---leaving the bivariate, partial, and semi-partial correlations as "correlations". Second, even if there is ultimately a more or less direct causal path, it doesn't follow that shifting oneself from a "higher risk" to a "lower risk" condition modifies the underlying aetiology (to take the most obvious example, becoming a non-smoker now does not undo the damage from 20 years ago that is or will become the source of some ultimate diagnosis of lung cancer, as detection of lung cancer has a roughly 20-year lag). And, of course, there is the chronic problem that any or all of these "risk" factors may simply be markers (or proxies) for some other disease process (as all the *experimental*---not epidemiological---evidence suggests does appear to be the case for circulating levels of cholesterol and heart disease).

Ken Weiss said...

I agree, of course, but unfortunately I think that whether or not they've reasoned it out explicitly, 'cause' is exactly what people in biomedicine (and much of evolutionary reconstruction, such as of behavior) have in mind. That is what sells, in all sorts of ways, including the literal one. Even statistical types who clearly know better (such as those at 23andMe) are doing it. We need to reestablish sanctions for these sorts of promotion or coloring if we want to tame the beast of overstatement (and even that might not work if the problems are, as I think, much deeper than semantic).

Manoj Samanta said...

I finished rereading 'Black Swan' by Taleb. He highlights one issue that is often ignored by the 'more data is better' crowd, and I wrote a blog post discussing it.

http://www.homolog.us/blogs/blog/2013/12/03/big-data-increasing-sample-size-adds-errors/

Briefly, when you have a 'winner-take-all' bias in assessment, a small amount of noise gets more and more attention and is amplified with more data. With a large amount of data, you enter the tail of the distribution and believe that noise is information.
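A small simulation of that point, using nothing but random noise (no real signal anywhere): as the number of variables scanned grows, the single 'best' association looks steadily more impressive.

```python
# Pure-noise simulation: every 'association' is random, but the winner-take-all
# maximum grows steadily as more variables are scanned.
import numpy as np

rng = np.random.default_rng(1)
for n_variables in (100, 10_000, 1_000_000):
    z_scores = rng.standard_normal(n_variables)   # null results, no signal at all
    print(f"{n_variables:>9,d} variables scanned: "
          f"top |z| = {np.max(np.abs(z_scores)):.2f}")
```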

Ken Weiss said...

Thanks for pointing this out. I read the book as well, but had forgotten that point.