Wednesday, January 18, 2012

Probability does not exist. Part III. Making the call...

We continue a discussion of randomness, probability, and scientific inference.  We made some points in the first two installments about the elusive and subjective aspects of probability.  Here we'll ask a few similar kinds of questions, and then (finally!) get to the relevance for genetics and evolution and other areas like epidemiology.

What does it mean to say that a phenomenon is 'random'?  This is a rather subjective term, but intuitively it means that nothing in a test or experiment affects the particular outcome.  If dice are thrown randomly, it means that nothing about the throw affects whether a 1 or 6 will come up.  More generally, dice throws would be said to be random events because each toss leaves each face equally likely to occur.  'Equally likely' is a rather circular term, but it implies again that each side comes up the same fraction of the time.

Many events are said to be random or probabilistic with the unstated assumption that the event is inherently random: that there is no process that makes the outcome specifically predictable.  Quantum mechanics, to many people, is like that: the position of an electron around an atom is inherently probabilistic.  No matter how much information or perfect measurement we might have, the electron's position is only predictable in a probabilistic sense (forgive us, any physicist readers, if we're not well-enough informed to have described this properly!).

Other things are said to be 'random' in that while they might be deterministic, we simply can't tell the difference between that and a truly probabilistic process.  Dice rolling is generally viewed that way--as effectively random.  We saw in the previous installments that this can be modified in a way.  A six and a one may not have exactly the same probability of coming up on a given roll, but once we know their side-specific probabilities, the process is random relative to those.  If a 6 has probability 0.168 and a 1 has probability 0.170 of coming up, those will be the fractions we'd observe, but we cannot predict any more accurately than that.

Coin-flipping is a classic example of a supposedly truly probabilistic event.  But is it?  Flipping a coin lots of times almost never generates exactly 50% heads and 50% tails, the same kind of discrepancy we saw with dice.  But is the discrepancy we observe just experimental error of random processes, or is there a true bias--does one side 'really' have a higher chance of coming up roses?  Is coin-flipping a truly random process, or do we just not know enough to predict the outcome of a flip?
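
To put a rough number on how rarely an exact 50/50 split turns up, here is a minimal sketch in Python.  It assumes an idealized fair coin with independent flips, which is precisely the assumption this post is questioning, and the function name prob_exact_half is ours, for illustration only.

```python
from math import comb

def prob_exact_half(n_flips: int) -> float:
    """Chance that a fair coin gives exactly n_flips/2 heads (0 if n_flips is odd)."""
    if n_flips % 2 == 1:
        return 0.0
    # Number of ways to get exactly half heads, divided by all possible sequences.
    return comb(n_flips, n_flips // 2) / 2 ** n_flips

for n in (10, 100, 1000, 10000):
    print(n, round(prob_exact_half(n), 4))
```

Under these assumptions the chance of an exact tie shrinks as the number of flips grows, so a discrepancy from 50% by itself tells us nothing about bias.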


Here is a device developed a few years ago for a Stanford statistics professor named Persi Diaconis (who has special interests in the mathematics of gambling, magic, and things like that).  He has studied coin-flipping in practical as well as theoretical terms.  He has shown that if you set up the flip in the same way every time, you will get the same outcome every time -- that is, the outcome is entirely predictable.  Put in other terms, as he has done in his paper on the subject, coin flipping is basically a standard physics phenomenon, one that obeys the essentially deterministic laws of physics.  If you know all the relevant values about the coin, the flipping force and direction, the landing surface, and so on, the outcome is entirely predictable.

The reason outcomes seem 'random' is that there are so many things we don't know about a given flip, and so many differences from flip to flip, that we generally don't know enough to predict the outcome.  That is, they seem truly probabilistic.  But in a sense they are instead truly deterministic.
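
The sense in which a flip can be deterministic yet look random can be sketched with a toy model.  This is not the analysis in Diaconis's paper; it is a deliberately simplified assumption of ours in which the coin goes straight up with speed v, spins at a constant rate omega about a horizontal axis, and is caught at the height it was launched from.  All names and numbers are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def toy_flip(v: float, omega: float) -> str:
    """Return 'same' if the starting face ends up on top, else 'opposite'.

    v     -- upward launch speed in m/s (assumed, no air resistance)
    omega -- constant spin rate in radians per second (assumed)
    """
    time_in_air = 2 * v / G      # up and back down to launch height
    angle = omega * time_in_air  # total rotation during the flight
    # The starting face is on top when its normal still points upward.
    return "same" if math.cos(angle) > 0 else "opposite"

# Identical inputs always give the identical outcome...
print(toy_flip(2.0, 3.8), toy_flip(2.0, 3.8))
# ...but a small change in spin rate is enough to change the result.
print(toy_flip(2.0, 3.8), toy_flip(2.0, 3.9))
```

Nothing in this sketch is random.  The outcome only looks unpredictable when we cannot measure or control v and omega finely enough from one flip to the next.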

Diaconis and his co-authors analyzed the various factors in classical physics terms and tried to control for them.  As we understand their result, they concluded that, at least for their test coin, there was a 51% probability that the side that was up when the coin was flipped would be up at the end.  Flipping is somehow, and subtly, biased.

We make the call on coin-flipping at the beginning of a football game or to see who pays for the next round of drinks, or who draws the short straw and has to do an unpleasant job.  We think of these as random.  But if we're skeptical, how do we make the call on that question itself?  Here, despite all of the above, and relevant to the entire nature of probabilistic-seeming events and understanding them, belief and subjectivity inevitably enter the room...whether or not their entrance is announced.

That's because at some point we have to decide whether the results (51% that the starting upside will be the ending upside) really do mean something in themselves, or are just the fluke results of a finite number of observations whose outcomes could be the way we see them 'just by chance'.  Even the latter phrase is circular and vague.  In the end, we decide what it takes for us to 'believe' one interpretation over another.  And there is no objective way to decide what we should believe: everyone has to make that call for himself or herself!
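
One way to see why a 51% result forces a judgment call is to ask how many flips it takes before a one-point excess stands out from chance.  The sketch below assumes independent flips and the usual normal approximation to the binomial; the function name and the cutoff z are our own illustrative choices, and picking z is exactly the kind of subjective threshold discussed above.

```python
import math

def flips_needed(p_observed: float = 0.51, p_null: float = 0.50, z: float = 2.0) -> int:
    """Flips needed for the gap p_observed - p_null to equal z standard errors.

    The standard error of the observed fraction under the null is
    sqrt(p_null * (1 - p_null) / n); this solves that relation for n.
    """
    gap = abs(p_observed - p_null)
    sd_single_flip = math.sqrt(p_null * (1 - p_null))
    return math.ceil((z * sd_single_flip / gap) ** 2)

# A stricter cutoff (larger z) demands more flips.
for z in (2.0, 3.0):
    print(z, flips_needed(z=z))
```

With these assumptions the answer is on the order of ten thousand flips, and a stricter cutoff demands more.  This counts only how far the observed fraction sits from 50%; it says nothing about whether the flips really were independent or set up identically.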

If we see a situation in which different possible outcomes have very different probabilities--that is, they arise very different fractions of the time--these issues may not arise.  We'd all agree on the general interpretation.  We share our beliefs.  Even with unique events, treating probability as the fraction of outcomes of each sort that we would see if the event could be repeated is not a serious issue: results more or less bear out what we think we would see in repeated tests.  Or if we have seen a few repetitions of an event, we can be confident that we understand the relative probabilities of the outcomes.

But we've given a few examples of experiments to try to show how subtle and elusive concepts like randomness and probability are, even for the simplest and most controllable kinds of situations.  These are ones in which the probability differences among outcomes (heads vs tails, faces of dice) are very small (nearly 50% heads, 50% tails, 1/6 for each die face).

The reason for this is that in many situations in biology, including attempts to understand the relationship between genes and traits (e.g., GWAS, personalized medicine) or attempts to detect evidence of natural selection from gene sequences, the situation is more like dice:  things seem to be probabilistic--perhaps inherently so--and the probability differences between different outcomes, even according to our genetic and evolutionary theories, are very small.  Similar situations arise in epidemiology, as we've written often in MT, such as whether PSA testing improves prostate cancer outcomes, or vitamin supplements improve health, and so on.

That is, we're trying to detect small probabilistic needles in haystacks.  And, to a considerable extent, even according to the theory of those doing the studies and claiming to have found the evidence, the events are not repeatable.  In Part IV of this series, we'll discuss these in more specific terms, in relation to the issues in the first three parts.
