Wednesday, August 15, 2018

On the 'probability' of rain (or disease): does it make sense?

We typically bandy the word probability around as if we actually understand it.  The term, or a variant of it like probably, can be used in all sorts of contexts that, on the surface, seem quite obvious and related to some sense of uncertainty; e.g., "That's probably true," or "Probably not."  But is it so obvious?  Are the concepts clear at all?  When, if ever, are they meaningful in more than an informal, subjective way?

Will it rain today?  Might it?  What is the chance of rain?
One of the typical uses of probabilistic terms in daily life has to do with weather predictions.  As a former meteorologist myself, I find this a cogent context in which to muse about these terms, but with extensions that have much deeper relevance.

Here is an episode of a generally very fine BBC Radio 4 program called More or Less, whose mission is to educate listeners on the proper use and understanding of numbers, statistics, probabilities and the like.  This episode deals, somewhat unclearly and, to me, quite vaguely, unsatisfactorily, and even somewhat defensively, with the use and interpretation of weather forecasts.

So what does a forecast calling for an x% chance of rain mean?  Let's think of an imaginary chessboard laid over a particular location.  It is raining under the black squares, but not under the white ones.  There is nothing probabilistic about this: 50% of people in the area will experience rain.  If I don't know exactly where you live, I'd have to say that you have a 50% chance of rain, but that has nothing to do with the weather itself, only with my uncertainty about where you live.  Even then it's misleadingly vague, since people don't live randomly across a region (they are, for example, usually clustered in some sub-regions).
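To see how purely epistemic that 50% is, here is a minimal sketch (in Python; the 8x8 region and the square size are arbitrary choices of mine) in which the rain itself is entirely deterministic, and the probability comes only from not knowing where you are:

```python
import random

# Deterministic "chessboard" rain: it rains on the black squares and
# stays dry on the white ones. Nothing about the weather here is random.
def raining_at(x, y):
    return (int(x) + int(y)) % 2 == 0

# I don't know where you are, so I place you uniformly at random
# over the 8x8 region and ask how often you get wet.
trials = 100_000
wet = sum(raining_at(random.uniform(0, 8), random.uniform(0, 8))
          for _ in range(trials))
print(f"Chance you get rained on: {wet / trials:.2f}")  # ~0.50
```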

Another interpretation is that I don't know where the black and white squares will be, exactly, at any given time, but my weather models predict that rain will fall in about half of the region.  This could be because my computer models, necessarily based on imperfect measurement and imperfect theory, are themselves imperfect--but I run them many times, making small random changes in various values to account for that imperfection, and I find that among these model runs, any given spot experiences rain 50% of the time, or 50% of the entire area under consideration experiences rain.
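That "ensemble" reading can be sketched in a few lines, too.  This is only a toy stand-in for real numerical weather prediction--the one-variable "model", the saturation threshold, and the noise scale are all invented for illustration:

```python
import random

# Toy ensemble forecast: one deterministic "model" run many times with
# small random perturbations standing in for measurement and theory error.
def toy_model(humidity, perturbation):
    # "Rain" if the perturbed humidity crosses a saturation threshold.
    return humidity + perturbation > 0.70

runs = 1000
observed_humidity = 0.70  # an imperfect measurement, right at the threshold
rainy = sum(toy_model(observed_humidity, random.gauss(0, 0.05))
            for _ in range(runs))
print(f"Chance of rain: {100 * rainy / runs:.0f}%")  # ~50%
```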

Or, is it that there is an imaginary chessboard moving overhead, so that 50% of the land will be under the black squares, and hence getting rain, at any given time?  Then any given area will only get rain 50% of the time, but every area will certainly get rain at some point during the forecast period; indeed, every area will be getting rain for half of the period.  Then the best forecast is that you will get wet if you stay outside all day, but if you only run out to get the mail you might not?  Might??

Or is it that my models are imperfect, but theory or experience tells me that there is a 50% chance of any rain in the area--that is, my knowledge can tell me no more than that?  In that case, any given place will have this guesstimated chance of rain.  But does that mean at any given time during the forecast period, or at every time during it?  Or is it that my knowledge is very good, but the meteorological factors--the nature of atmospheric motion and so on--only probabilistically form droplets that are large enough not just to make clouds but to fall to earth?  That is, is it the atmospheric process itself that is probabilistic--at least according to the theory, since I can't observe every droplet?

If a rain-generating front is passing through the area, it could rain everywhere along the front, but only until the front has moved past.  Thus, it may rain with 100% certainty, but during only 50% of the specified time, if the front takes half the forecast period to pass through.

I've undoubtedly mentioned only some of the many ways that weather forecast probabilities can be intended or interpreted.  It is not clear--and the BBC program shows this--that everyone, or perhaps even anyone, making these forecasts actually understands, or is thinking clearly about, what the probabilities mean.  Even meteorologists themselves, especially when dumbing things down for the average Joe who only wants to know whether he should carry his brolly, are likely ('probably'?!) unclear about these values.  Probably they mean a bit of this and a bit of that.  I wonder if anyone can know which of the meanings is being used in any given forecast.

Well, fine, everyone knows that nobody really knows everything about the weather.  Anyway, it's not that big of a deal if you get an unexpected drenching now and then, or more often haul your raincoat to work but never need it.

But what about things that really matter, like your future health?  My doc takes my blood pressure and looks at my weight, and may warn me that I am at 'risk' of a heart attack or stroke--that without taking some preventive measures I may (or probably will) suffer such a fate.  That's a lot more important than a soaked shirt.  But what does it mean?  Isn't everybody at some risk of these diseases?  Does my doc actually know?  Does anybody?  Who is thinking clearly about these kinds of risk pronouncements?

OK, caveats, caveats: but will I get diabetes?
'Precision' genomic medicine is one of the marketing slogans of the day in genomics: the very vague (I would say culpably false) promise that from your genotype we can predict your future--that's what 'precision' implies.  The same applies even if, weaseling a bit, we now include environmental factors as well as genomic ones.  And the idea suggests knowledge not just of some vague probability; by implication it means perfection--prediction with certainty.  But to what extent--if any at all--is the promise true, or can it be true?  What would it mean for it to be 'true'?  After all, anyone might get, say, type 2 diabetes, mightn't they?  Or, more specifically, what does such a sentence itself even mean, if anything?

We know that, today at least, some people get diabetes sometime in their lives, and even if we don't know why, or which ones, that seems like a safe assertion.  But to say that any person, not specifically identified, might become diabetic is rather useless.  We want a reason--a cause--and if we have that, we assume it will enable us to identify specifically vulnerable individuals.  Even then, however, we don't know more than to say, in some sense that we may not understand as well as we think we do, that not all of the vulnerable will get the disease; but we seem to think that they share some probability of getting it.  But what does that mean, and how do we get such figures?

Does it mean that, among all those with a given GWAS genotype:
(1) a fraction f will get diabetes?
(2) a fraction f will get diabetes if they live beyond some specified age?
(3) a fraction f will get diabetes before they die, if they live the same lifestyle and diet as those from whom the risk was estimated?
(4) a net fraction f will get diabetes, pro-rated year by year as they age?
(5) a net fraction related to f, adjusted for current age, sex, race, etc., will get diabetes?
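These readings genuinely come apart numerically.  As a toy illustration--the 1% annual hazard and the ages are invented purely for the arithmetic, not taken from any real study--suppose carriers of some genotype face a constant yearly risk:

```python
# Toy arithmetic showing that "the" risk depends on the interpretation:
# annual risk, lifetime risk, and age-conditional risk all differ, even
# with a single constant hazard. The 1% figure is invented.
annual_hazard = 0.01

def cumulative_risk(years, hazard=annual_hazard):
    # Chance of disease at some point over 'years', assuming an
    # independent, constant hazard each year.
    return 1 - (1 - hazard) ** years

print(f"risk in any one year:        {annual_hazard:.2f}")        # 0.01
print(f"risk by age 50:              {cumulative_risk(50):.2f}")  # ~0.39
print(f"risk by age 80:              {cumulative_risk(80):.2f}")  # ~0.55
print(f"risk from 50 to 80, if well: {cumulative_risk(30):.2f}")  # ~0.26
```

The same underlying process yields 1%, 39%, 55%, or 26% depending on which of the interpretations above is meant.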

What about each individual consulting their Big Data genomic counselor?  Are these fractions f related to each individual as a probability p=f that s/he will get diabetes (conditional on things like items 1-5 above)?  That is, is every person at the same risk?

Only if we can equate our past sample, from which we estimated f by induction, with the probability p that we then use, by deduction, to make assertions about each new individual, might this, even in principle, lead to 'precision genomic medicine'.  It is prediction, not just description, that we are being promised.  Even if we were thinking in public health terms, this is essentially the same problem, because it would relate to the fraction of individuals who will be affected in the future, on the assumption that each person is exposed to the same probability.

Of course, we might believe that each person has some unique probability of getting diabetes (related, again, to the items above), and that f reflects the mix (e.g., the average) of these probabilities.  But then we have to assume that the genotypes and lifestyles and so on in the current group, whose future we're offering 'precision' predictions about, are exactly like those in the sample from which the predictions were derived--that this mix of risks is, somehow, conserved.  How can such an assumption ever be justified?

Of course, we know very well that no current sample whose future we want to be precise about will be exactly the same as the past sample from which the probabilities (or fractions) were derived.  Obviously, much will differ, but we also know that we simply have no way to assess by how much.  For example, future diets, sociopolitical conditions, and other factors that affect risk will not be the same as those in the past, and are inherently unpredictable.  So, on what meaningful basis can 'precision' prediction be promised?

Just for fun, let's take the promise of precision genomic medicine at its face value.  I go to the doc, who tells me
"Based on your genome sequence, I must advise you of your fate in regard to diabetes."
"Thanks, doc.  Fire away!"
"You have a 23.5% chance of getting the disease."
"Wow!  That sounds high!  That means I have a 23.5% chance that I won't die in a car or plane crash, right?  That's very comforting.  And if about 10% of people get cancer, then of my 76.5% chance of not getting diabetes, it means only a 7.65% chance of cancer!  Again, wow!"
"But wait, Doc!  Hold on a minute.  I might get diabetes and cancer, right?  About a 7.65% percent chance of that, right?"
"Um, well, um, it doesn't work quite that way [to himself, sotto voce: "at least I think so..."].....that's because you might die of diabetes, so you wouldn't get cancer.  Of course, the cancer could come first, but it would linger, because you have to live long enough to experience your 23.5% risk of diabetes.  That would not be good news.  And, of course, you could get diabetes and then get in a crash.  I said get diabetes, not die of it, after all!"
I gather you, too, can imagine how to construct many different sorts of fantasy conversations like this, even rashly assuming that your doctor understood probability, had read his New England Journal regularly when not too sleepy after a day's work at the clinic--and that the article in the NEJM was actually accurate.  And that NIH sincerely knew what it was promising in the way of genomic predictability.  But wait!  The medical journals, and even the online genotyping scam companies--you can probably name one or two of them--change your estimated risks from time to time as new 'data' come in.  So when can I assume the case is closed, and that I (well, the Doc) really know the true probabilities?

I mean, what if there are no such true probabilities, because even if there were, not just knowledge, but also circumstances (cultural, not to mention mutations) continually change, and what if we have no way whatever to know how they're gonna change?  Then what is the use of these 'precision' predictions?  They, at best, only apply to a single, current instance.  So what (if anything at all) does 'precision' mean?

It only takes a tad of thinking to see how precisely imprecise these promises all are--must be--except as very short-term extrapolations of what past data showed, and extrapolations of unknown (and unknowable) 'precision' at that.  Except, of course, for the very precise truth that you, as a taxpayer, are going to foot the bill for a whole lot more of this sort of promise.

Unlike meteorologists with the weather, we don't have anything close to as rigorous an understanding of human biology and culture as we do of the behavior of gases and fluids (the atmosphere).  We might want to say, self-protectively and with more honest modesty, that our use of 'probability' is very subjective and really just means an extrapolated rough average of some unspecifiable sort.  But then that doesn't sound like the glowing promise of 'precision', does it?  One has to wonder what sort of advice would make scientifically proper, and honorable, use of the kind of probabilistic, vague, ephemeral evidence we have when we rely on 'omics approaches--even if that is the best we can do at present.

In meteorology, it used to be (when I was playing that game) that we'd joke "persistence is the best forecast".  This was, of course, for short range, but short range was all we could do with any sort of 'precision'.  We are pretty much in that situation now, in regard to genomics and health.

The difference is, weather forecasters are honest, and admit what they don't know.
