Comments on The Mermaid's Tale: The 'Oz' of medicine: look behind the curtain or caveat emptor!

2013-12-03T15:55:05.617-05:00

Thanks for pointing this out. I read the book as well, but had forgotten that point.

2013-12-03T15:51:13.896-05:00

I finished rereading 'Black Swan' by Taleb. He often highlights an issue that the 'more data is better' crowd tends to ignore, and I wrote a blog post discussing it.

http://www.homolog.us/blogs/blog/2013/12/03/big-data-increasing-sample-size-adds-errors/

Briefly, when assessment has a 'winner-take-all' bias, a small amount of noise gets more and more attention and is amplified as more data come in. With a large amount of data, you enter the tail of the distribution and mistake noise for information.
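A minimal sketch of that tail effect, under the simplest possible assumption: every candidate 'signal' is pure noise, its effect estimate a standard normal z-score, and winner-take-all assessment means only the largest one gets attention. The function name `max_null_effect` is hypothetical, and this is an illustration of the idea, not the analysis from the linked post.

```python
import random

def max_null_effect(n_features, seed=0):
    """Largest |z| among n_features effect estimates whose true effect is zero."""
    # A fresh generator with the same seed yields the same sequence, so a
    # smaller run is always a prefix of a larger one.
    rng = random.Random(seed)
    return max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_features))

for m in (10, 1_000, 100_000):
    print(m, round(max_null_effect(m), 2))
```

The 'winner' keeps growing with the number of things examined (roughly like the square root of 2 ln m), even though nothing real is being measured.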

2013-12-03T12:56:26.677-05:00

I agree, of course, but unfortunately I think that, whether or not they've reasoned it out explicitly, 'cause' is exactly what people in biomedicine (and in much of evolutionary reconstruction, such as of behavior) have in mind. That is what sells, in all sorts of ways, including the literal one. Even statistical types who clearly know better (such as those at 23andMe) are doing it. We need to reestablish sanctions for these sorts of promotion or coloring if we want to tame the beast of overstatement (and that might not work if the problems are, as I think, much deeper than semantic).

2013-12-03T12:50:12.831-05:00

Two related issues. The first is the indiscriminate use of the word "risk" as a label for any epidemiological correlation (especially ecological correlations), with its immediate (intended?) interpretation of causality. We would be better off avoiding the term entirely and leaving bivariate, partial, and semi-partial correlations as "correlations". Second, even if there is ultimately a more or less direct causal path, it doesn't follow that shifting oneself from a "higher risk" to a "lower risk" condition modifies the underlying aetiology. To take the most obvious example, becoming a non-smoker now does not undo the damage from 20 years ago that is, or will become, the source of some ultimate diagnosis of lung cancer, as detection of lung cancer has a roughly 20-year lag. And, of course, there is the chronic problem that any or all of these "risk" factors may simply be markers (or proxies) for some other disease process, as all the *experimental* (not epidemiological) evidence suggests is the case for circulating levels of cholesterol and heart disease.
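The 'marker, not cause' problem can be sketched with a toy simulation. It assumes, purely hypothetically, a hidden disease process that raises both a measurable marker and the chance of disease, while the marker itself has no effect at all. Observationally the marker looks like a strong 'risk factor', yet forcing everyone's marker into the 'low risk' condition leaves the number of cases unchanged. All probabilities and names here are made up for illustration.

```python
import random

def simulate(n=100_000, force_marker_low=False, seed=1):
    """Toy model: a hidden process drives both a marker and disease; the marker is not causal."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # marker_high -> [cases, people]
    for _ in range(n):
        hidden = rng.random() < 0.3                            # hypothetical underlying process
        marker_high = rng.random() < (0.8 if hidden else 0.2)  # marker tracks the hidden process
        if force_marker_low:
            # "Move to the lower-risk condition". The marker draw above is still made,
            # so both runs consume identical random numbers.
            marker_high = False
        disease = rng.random() < (0.25 if hidden else 0.05)    # depends on the hidden process only
        counts[marker_high][0] += disease
        counts[marker_high][1] += 1
    return counts

def rate(cases_people):
    cases, people = cases_people
    return cases / people if people else float("nan")

observed = simulate()
forced = simulate(force_marker_low=True)
print("disease rate, marker high:", rate(observed[True]))
print("disease rate, marker low: ", rate(observed[False]))
print("total cases, observed vs forced:",
      sum(c for c, _ in observed.values()), sum(c for c, _ in forced.values()))
```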

2013-12-03T12:49:12.757-05:00

You are, of course, quite right about stratification. And there's another problem: if you stratify on more than a tiny handful of variables, your per-stratum sample size gets too small.
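The arithmetic behind that point, as a sketch that assumes equal-sized strata (in practice strata are unequal, so many are smaller still). The helper name `per_stratum_size` is hypothetical.

```python
from math import prod

def per_stratum_size(n, levels_per_variable):
    """Average number of people per stratum when n people are stratified on the given variables."""
    n_strata = prod(levels_per_variable)  # every combination of levels is its own stratum
    return n / n_strata, n_strata

print(per_stratum_size(10_000, [2] * 3))   # three yes/no variables -> (1250.0, 8)
print(per_stratum_size(10_000, [2] * 10))  # ten yes/no variables   -> (9.765625, 1024)
```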

2013-12-03T12:24:18.295-05:00

Yes, and geneticists nobly do that for a few key variables (like sex). But we don't know all the things to stratify by, and even then we make the uniformity assumption within strata. That is essentially the basis of genomic risk assessment these days. Of course, the more finely the strata are refined, presumably the more homogeneous the risks within each stratum. But we are currently far from showing, or even knowing, how fine that refinement has to be.

A case can be made (and a lot of money is now being made) for saying that your genotype is unique and with various methods now in play we can predict your risk. That's Francis Collins' 'personalized genomic medicine'.

But this is still based on integrating various subgroup risks in many ways: your risk at GeneA plus your risk at GeneB, etc., is your overall risk.

And this won't get around Problem 1 (using retrospective data prospectively) or aspects of the other problems.

At least, this is why I think that re-thinking is called for.

2013-12-03T10:22:41.797-05:00

You can also stratify your sample.

2013-12-03T10:20:22.345-05:00

There are various hi-tech ways to "correct" for so-called unobserved heterogeneity (i.e. all the stuff we forgot to measure) in estimating risks or incidence functions. How well they work is another story....

2013-12-03T09:08:47.407-05:00

Another 'Problem' has been pointed out to me by my friend Charlie Sing, at the University of Michigan, who has been dealing with these sorts of frustrating issues for many years. It's one we've probably written about here in the past (I can't recall).

It's this: we estimate things like risk using statistical methods that assume repeatability and so on, from groups identified by having this or that characteristic, like a genotype, and apply the result prospectively to similar groups for their future. But this assumes that each person in the 'group' we define in that way is identical. That is, it assumes that the same risk, which we've estimated as an average for the group, applies to each member of the group.

It isn't difficult to see why this is inaccurate, generally to an unknown extent.
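A minimal sketch of why the group average can mislead, using two hypothetical groups of 100 people chosen as an extreme case. In one, everyone's true risk is 10%; in the other, 90 people have no risk and 10 are certain to be affected. Both groups have the same average risk, which is all a group-level estimate sees, but applying that average to each individual is badly wrong in the second group. The function name `group_risk_and_error` is hypothetical.

```python
from statistics import mean

def group_risk_and_error(individual_risks):
    """Return the group-average risk and the mean gap between it and each person's true risk."""
    group_risk = mean(individual_risks)
    gap = mean(abs(r - group_risk) for r in individual_risks)
    return group_risk, gap

homogeneous = [0.10] * 100               # everyone truly at 10% risk
heterogeneous = [0.0] * 90 + [1.0] * 10  # same average, very different individuals

for name, group in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    avg, gap = group_risk_and_error(group)
    print(f"{name}: group risk {avg:.2f}, mean individual error {gap:.2f}")
```

Nothing in the group-level estimate distinguishes the two cases; only knowledge of the individual risks would.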

2013-12-03T07:23:00.131-05:00

A thoughtful essay, Ken, one that raises some fundamentally important issues in biomedical/epidemiological research and any other form of science that relies heavily on conventional statistics. Damn you, I wish I had written it! There is an old (and now thoroughly unfashionable) idea in sampling theory that, once you had identified your target population and developed a good sample frame for it, you had in principle defined the universe of inference for your study. Therefore, any use of your results to make predictions or projections, to estimate risks, or whatever, for any other grouping of individuals was, strictly speaking, inadmissible or at least inadvisable without extreme caution -- unless you could show that the precise mechanisms that generated the data, including both the real causal processes and the sampling procedures, were the same in both cases. As you note, that's never going to be true in biological research, never mind social or psychological research. That "extreme caution" bit seems to have been lost in the shuffle somehow.

I also like your comments about regression models (which, I admit, I use). Most regression analyses are nothing more than gussied-up correlational studies.

Finally, I highly recommend a cartoon in the latest "New Yorker", which shows a seedy-looking drug dealer hanging out on a city street corner, yelling: "Statins. I got statins. Who wants some statins?"