A major issue is that many important and costly drugs are now known to be effective in only a small fraction of the patients who take them. That is shown in this figure from Schork's commentary: for each of 10 important drugs, the blue icons represent patients who respond to the drug, and the red icons the relative number of patients who do not.
Schork calls this 'imprecision medicine', and asks how we might improve our precision. The argument is that large-scale sampling is too vague or generic to provide focused results, so he advocates samples of size N=1! This seems rather weird, since you can hardly find interpretable associations from a single observation: did the drug actually work, or would the person's health have improved without it? But the idea is somewhat more sensible than it sounds: measure every possible little thing on one's chosen guinea pig and observe the outcome of treatment.
"N-of-1" sounds great and, like Big Data, is sure to be exploited by countless investigators to glamorize their research, make their grant applications sound deeply insightful and innovative, and draw attention to their profound scientific insights. There are profound issues here, even if it's too much yet another PR-spinning way to promote one's research. As Schork points out, major epidemiological research, like drug trials, uses huge samples with only very incomplete data on each subject. His plea is for far more individually intense measurements on the subjects. This will lead to more data on those who did or didn't respond. But wait.....what does it mean to say 'those'?
In fact, it means that we have to pool these sorts of data to get what will amount to population samples. Schork writes that "if done properly, claims about a person's response to an intervention could be just as well supported by a statistical analysis" as standard population-based studies. However, it boils down to replication-based methods in the end, and that means basically standard statistical assumptions. You can check the cited reference yourself if you don't agree with our assessment.
That is, even while advocating N-of-1 approaches, Schork concludes that patterns will arise when a collection of such person-trials is looked at jointly. In a sense, this really boils down to collecting more intense information on individuals rather than just collecting rather generic aggregates. It makes sense in that way, but it does not get around the problem of population sampling and the statistical gerrymandering typically needed to find signals that are strong or reliable enough to be important and generalizable.
Better and more focused data may be an entirely laudable goal, if quality control and so on can somehow be ensured. But beyond this, N-of-1 seems more like a shell game or an illusion in important ways. It's a sloganized way to get around the real truth, of causal complexity, that the scientific community (including us, of course) simply has not found adequate ways of understanding--or, if we have, then we've been dishonorably ignoring what we know in making false promises to the public who support our work and who seem to believe what scientists say.
We often don't have such understanding, and whether there is a conceptually better way, rather than a kind of 'trick' to work around the problem, is the relevant question. There will always be successes, some lucky and some owed to appropriately focused data. The plea for more detailed knowledge, and treatment adjustments, for individual patients goes back to Hippocrates and should not be promoted as a new idea. Medicine is still largely an art and still involves intuition (ask any thoughtful physician if you doubt that).
However, retrospective claims usually stress the successes, even if they are one-off rather than general, while neglecting the approach's lack of overall effectiveness--an excuse to avoid facing up fully to the problem of causal complexity. What we need is not more slogans, but better ideas, better questions, more realistic expectations, and really new thinking. The best way of generating the latter is to stop kidding ourselves and stop encouraging investigators, especially young investigators, to dive into the very crowded reductionist pool.
Hmm. So you have a drug (say, daily low-dose aspirin) that improves outcomes in, say, 10 out of 1000 cases, and causes harm in 7 out of 1000 cases. (I don't know the real numbers offhand, but that's the idea.)
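Just to make the arithmetic concrete (using these made-up rates, not real aspirin data), the standard number-needed-to-treat (NNT) and number-needed-to-harm (NNH) would work out as

$$\mathrm{NNT} = \frac{1}{10/1000} = 100, \qquad \mathrm{NNH} = \frac{1}{7/1000} \approx 143,$$

i.e., on these hypothetical numbers you'd have to treat roughly 100 people for one to benefit, and about 143 for one to be harmed.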
How do you figure out who was actually helped? It's got to be more than just "has 4 or more characteristics (e.g. BMI, age) or measurements (e.g. blood chemistry) that are predictive of clotting-related disease", because that was what got you to the 2000 folks in your test and control groups (doh!).
After the fact, you've got (1000 minus the number who had nasty strokes) people who might have been helped, but you don't know which of them were helped. If you could identify even one of them, you could do all sorts of kewl N=1 reductionist science on that person. But you can't.
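A toy simulation makes the point. The rates below are the made-up ones above, and the "truly helped" label exists only inside the simulation; in a real trial nobody ever observes it:

```python
import random

random.seed(0)

N = 1000           # treated people (one hypothetical trial arm)
BASE_RISK = 0.03   # made-up baseline risk of the bad outcome
P_HELPED = 0.010   # drug averts the outcome for ~10 in 1000 treated people
P_HARMED = 0.007   # drug causes the outcome for ~7 in 1000 treated people

observed_event = []   # all the investigator ever sees
truly_helped = []     # the counterfactual truth nobody sees
for _ in range(N):
    r = random.random()
    if r < P_HELPED:                       # event averted by the drug
        observed_event.append(False)
        truly_helped.append(True)
    elif r < P_HELPED + P_HARMED:          # event caused by the drug
        observed_event.append(True)
        truly_helped.append(False)
    else:                                  # drug made no difference
        observed_event.append(random.random() < BASE_RISK)
        truly_helped.append(False)

event_free = sum(1 for e in observed_event if not e)
helped = sum(truly_helped)
print(f"{event_free} people had no event, but only {helped} were actually helped")
# Nothing in the observed data marks which event-free people are the helped
# ones; their records are indistinguishable from people who would have been
# fine without the drug. The counterfactual outcome simply isn't observable.
```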
So, to argue with you: it's not reductionism's fault that this doesn't work, it's our lack of information with which to play reductionist games. The problem isn't reductionism itself, it's trying to apply reductionism where it can't work.
(Which is to say, I'm a big believer in personalized medicine: I just don't think it's going to happen anywhere near as soon or as easily as the blokes hyping it claim.)
By the way, do the authors of the paper even address this???
I don't think it's really lack of information, unless there are key variables we're oblivious to. I think it's that causation is complex without a single major cause that is easily identifiable on its own. So I personally don't think this is going to solve the general problem, even if for one reason or another it works in some instances.
Personalized is how medicine has always been. What's happening now de-personalizes it by making doctors use computer databases to treat their individual patients. Unfortunately, I think there is still so much art and intuition in medicine that the drive for uncritical, exhaustive 'data' is largely illusory. We need better ideas, in my opinion.
David,
You _can't_ know who was actually helped, because all 1000 shared the same putative risk factors, based on past studies of population risk and on averages calculated from those studies. You in fact have no idea what puts some 'at-risk' people over the threshold.
The idea of N of 1 studies is presumably to eliminate probability in favor of certainty. But N of 1 studies will also be based on the same probabilistic calculations of risk, right? And you'll still never know whose heart attack or stroke you prevented. How can you? It's an outcome that didn't happen. Will you say that the drug worked on everyone who didn't have a stroke, but didn't work on those who did?
I meant to add that one of the aims of many clinical trials is to learn enough to be able to predict who will benefit from a drug or procedure. That means population data, not N=1 data, must be collected and analyzed. So we're right back where we started.
The arguments laid out apply to events with relatively low probability and to drugs with small effect sizes. However, consider diseases like non-resectable NSCLC: very low 5-year survival rates, multiple potentially useful therapies (some with big effect sizes if they are targeted appropriately), extensive genomic diversity and therefore a need for combination therapies, and single-drug clinical trials that only start once a patient has proven refractory to SOC (as most are). In that setting, N=1 trials with extensive biomarker analysis and aggressive use of unapproved combinations against historical controls are absolutely justified, because they are the only rational way forward. Any other way forward will leave us with thousands more dead who have died for the cause of "robust statistical methods".
When there are dire traits and problematic knowledge, and the trait is relatively rare, then of course people have to try to turn up whatever useful information they can. Without knowledge of mechanism, or the kinds of statistical inferential tools we now rely on, ad hoc approaches and even just open-ended information gathering may be the only option. But how one can judge whether the N-of-1-based decisions worked, or didn't, in any given case may mainly be a guessing game, and may lead to hordes of false-positive reports. Time will tell.
Our point was more general. One can debate the inferences one can make even in cases such as you cite, but the point is that even the N-of-1 'manual' essentially moves to group statistics, doesn't really deal well with the overload of observations, and may not be very good at distinguishing what is truly 'low probability' from what merely looks good in ad hoc cases. Worst of all, in my personal view, is that the fashionable turn of phrase (N of 1) will be regularly used--you can bet on it!--to justify mega-projects that collect every possible sort of data on this rationale, a rationale that de facto relieves the pressure that should be felt to think in new ways, ways other than just more blind data collection. The idea that non-robust methods can do the trick seems to me more like wishful thinking than a cogent argument, but in really grim situations one will need to try anything that could work. If inference really boils down to intuition and luck etc., then one hopes one's own physician is the one with that intuition.