Tuesday, January 18, 2011

When your guess is (literally) as good as mine

Randomness is one of the most important concepts in modern science, and yet one of the most difficult to understand. It seems that no one gave it much thought until the 17th and 18th centuries, when the founders of probability theory began to study probabilities and their meaning. Mathematicians cared. Gamblers cared. Mathematicians who gambled cared. But most people accepted that Nature was governed by wholly deterministic laws. That idea in its modern form was largely the legacy of Galileo (but see NOTE) and Newton.

This is still debated. Is randomness just a mathematical concept that, in the real world, is only an illusion produced by our lack of sufficient data? Or could there be things that are really, truly, no-kidding random? Quantum physics is held by many to be the only area in which the latter may be the case. In this view, rolling dice and flipping coins only appear random. And even if quantum effects are truly random, many argue that they are so small and numerous that they even out, have no bearing on things at the monstrous scale of animals and plants, and can be ignored.

Randomness is usually viewed as just a matter of sampling and experimental measurement error.  In the long run, with big enough samples, things will 'even out'. That is, this kind of randomness is just statistical 'noise' that we have to deal with in sampling the real world to understand it.

But in evolution and genetics, since the next generation emerges only from the current one, and what is dead and gone can't come back, randomness (if it exists) can have a permanent effect.
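To make the point concrete, here is a minimal sketch that assumes a simple Wright-Fisher model of genetic drift (a textbook idealization, not something this post specifies). Each generation's gene copies are drawn at random from the previous generation's allele frequency, with no selection at all. The function name, population size, and seeds are illustrative assumptions.

```python
import random

def drift_until_absorbed(n_copies=100, start_count=50, seed=1,
                         max_generations=100_000):
    """Follow a selectively neutral allele until it is lost or fixed.

    Returns the allele count in every generation. n_copies is the number
    of gene copies in the population (an illustrative value).
    """
    rng = random.Random(seed)
    count = start_count
    history = [count]
    for _ in range(max_generations):
        if count == 0 or count == n_copies:
            break  # lost or fixed: with no mutation, there is no way back
        freq = count / n_copies
        # Each copy in the next generation is drawn from the current frequency.
        count = sum(rng.random() < freq for _ in range(n_copies))
        history.append(count)
    return history

if __name__ == "__main__":
    for seed in range(5):
        history = drift_until_absorbed(seed=seed)
        if history[-1] == 0:
            outcome = "lost"
        elif history[-1] == 100:
            outcome = "fixed"
        else:
            outcome = "still segregating"
        print(f"seed {seed}: {outcome} after {len(history) - 1} generations")
```

Different seeds start from the same frequency and end in different, irreversible outcomes. In this sketch chance does not 'even out' over time, because once an allele is lost, later generations cannot sample it again.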

Just the other day we heard an otherwise sophisticated evolutionary geneticist say, of genome mapping kinds of searches for evidence of adaptation, that the problems we are facing these days are just a matter of using the right statistics (i.e., to detect what's significant). Amazingly, we also heard that at least one statistics instructor here at Penn State (where the statistics program has long been a leading one) told a student that the .05 significance cutoff has some fundamental meaning, rather than being a wholly conventional, subjective, arbitrarily chosen criterion for making decisions about evidence.

Using and properly interpreting the right statistics is certainly important, and failure to do that is responsible for a lot of problems in genetics and evolutionary interpretation, as it is in many other areas of life and of science.

But in areas like GWAS and genomic-scale analysis, the problem is not all in the statistics. It's in the phenomena themselves. Even when nothing causally genetic is going on, genome-scale analysis with open-ended, unconstrained data mining is almost guaranteed to find something that appears to be 'significant'. And we know theoretically and empirically that important evolutionary things can be going on but not be detectable. In fact, we can't actually prove that 'nothing causally genetic is going on', either!
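The arithmetic behind 'almost guaranteed' can be sketched in a few lines. This assumes the tests are independent and that no real effect exists anywhere, which is a simplification because nearby genome markers are correlated. The function names are illustrative, and the Bonferroni correction is shown only as one common convention.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests

if __name__ == "__main__":
    for n_tests in (1, 20, 100, 1_000_000):
        print(n_tests,
              expected_false_positives(n_tests),
              prob_at_least_one_false_positive(n_tests),
              bonferroni_threshold(n_tests))
```

With a million markers and the conventional .05 cutoff, tens of thousands of 'hits' are expected by chance alone. That is why genome-wide studies use far stricter per-test thresholds, at the cost of overlooking real but weak effects.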

[Figure: GWAS Manhattan plot]

Randomness is studied intensely but remains a fundamentally very elusive concept, and everyone should have at least some grasp of why this is true.  Policy that affects daily life depends on the use (or misuse) of statistics and concepts of randomness.  We are not alone in saying this: You might enjoy and benefit from listening to a very clear discussion of the issues on BBC Radio 4's In Our Time from January 13.

This program points out what we said yesterday about astrology: if the world is truly deterministic (as is often said, the universe is just a clockwork phenomenon), then everything is so connected to everything else that everything can, in a sense, predict everything else. If that is actually true, then Darwinian evolution, premised on the idea that some genetic variation has a greater chance of success (that's how Darwin himself phrased it around 40 times in the Origin of Species), is a sham, because it's all predictable. The poor rabbit is simply destined to be eaten; it is not that its slowness reduces its chance of escaping the wolf.

Think how arrogant we so often are in making strong assertions despite the great limitations of our data, some of which we cannot completely overcome. This is especially important, in a subtle way, because modern science largely rests on concepts of statistical significance and formal 'hypothesis testing', applied to processes that are either probabilistic as far as we can tell, or that we are forced to study by sampling, which we hope is appropriately structured (random?). It is also sobering to realize that much of Nature could be entirely non-random, even controlled by very simple processes, and yet appear random by every known test. For example, as the BBC discussion mentions, the sequence of digits in the value of pi (circumference divided by diameter of any circle in the universe) is produced by a totally non-random process, yet is indistinguishable from randomness.
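The pi example is easy to try. The sketch below generates the digits with Gibbons' unbounded spigot algorithm, a fully deterministic, integer-only recipe, and then tallies how often each digit appears. A frequency tally is only one very weak check for randomness, and the helper names are illustrative assumptions.

```python
from collections import Counter
from itertools import islice

def pi_digit_stream():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time,
    using Gibbons' unbounded spigot algorithm (integer arithmetic only)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: absorb another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def first_pi_digits(count):
    """Return the first `count` digits of pi as a list of ints."""
    return list(islice(pi_digit_stream(), count))

def digit_counts(digits):
    """Count how many times each digit 0-9 occurs."""
    tally = Counter(digits)
    return [tally[d] for d in range(10)]

if __name__ == "__main__":
    digits = first_pi_digits(1000)
    for digit, count in enumerate(digit_counts(digits)):
        print(digit, count)
```

Each digit turns up roughly a tenth of the time, much as it would if the digits were drawn from a ten-sided die, even though nothing in the procedure involves chance.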

Indeed, our belief in deterministic laws of Nature may often lead us to assume determinism and to treat the probabilistic aspect of data as just a practical impediment. We then use 'significance tests' (with arbitrarily agreed-on cutoff values) as if they proved the point, when the process itself could be inherently probabilistic.

It is very sobering.

NOTE: Actually, Galileo did many experiments. As described in his Dialogues Concerning Two New Sciences (1638), he demonstrated laws of motion and gravity by rolling a ball down inclined planes of various steepness. He had to do that because time measurement, done by a water-clock, was not precise enough for dropping things off the Leaning Tower. But he also recognized that he needed replication, and repeated these experiments 100 times: an acknowledgment of random measurement error.


James Goetz said...

I suppose if the appearance of randomness is merely an illusion of inscrutable causal determinism, then all scientific experiments with a mathematical analysis have only illusionary validity. In other words, there would be no validity to anything discovered by the scientific method. Does anybody agree or disagree with me on this?

Ken Weiss said...

That's basically the question. If something seems random after close scrutiny (or in the case of the results of random sampling to estimate properties of the real world), then for all practical purposes it is random. Whether it's illusory or not is a philosophical question unless or until science gets to the scale at which it makes a difference.

But the idea of 'validity' is elusive. The description of something is valid within its range of applicability, meaning that we're claiming to understand and predict how it appears, but not what its ultimate properties might be.

So something that appears random and is analyzed probabilistically will provide a valid distributional understanding (such as the expected number of heads in 100 coin flips, and the expected variation among such trials).

But when and whether something is ultimately valid may be unknowable.

Anonymous said...

This is a nice posting

John R. Vokey said...

Two points:

1. There are two meanings of random here that are being equivocated: A is random with respect to B (the usual meaning in statistics), and A is fundamentally random (i.e., there is no B such that A is NOT random with respect to it---the meaning in quantum physics). The first meaning allows for A to be completely determined.

2. The .05 alpha level may be less arbitrary than is usually assumed (see: Cowles, Michael; Davis, Caroline. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement, Volume 14 (3): 248, July 1982).

Ken Weiss said...

Yes, fair enough. I don't think we have any substantial difference here.

There is perhaps a third sense, which is the effective predictability of B from A, whatever its ultimate underlying reason. The latter can then be viewed as a matter of philosophy.

I can't comment on the second point without access to the actual article. But the abstract talks about a small sample of 36, who in a gambling test found between 10% and 1% chance of being wrong to be meaningful to them, what one may call an emotional sense of 'meaningful'.

Assuming that kind of study itself to be very meaningful (which I'm not sure I would), then in the human emotional sense 5% may not be arbitrary, and I'm sure there are many who would cook up a post hoc adaptive reason that a 95% safety tolerance led to greater net fitness than being too, or not enough, gullible.

But I think that historically Fisher introduced it strictly as a kind of suggested hunch, with no formal justification being offered. And in our post, 'arbitrary' would have to be understood relative to whether 5% is a measure of actual truth, which it isn't, since it's a measure of our judgment of whether we want to accept the hypothesis given the evidence. If the hypothesis is properly framed and everything, it's either true or not (assuming the principle of the excluded middle). Any chosen cutoff convention is in that sense of the word 'arbitrary'.

That 5% is arbitrary is also shown by the way it's honored in the breach. Investigators hungry for the answer they want often (sometimes surreptitiously) lower the bar in various ways. Investigators (such as geneticists doing thousands of tests) raise the bar to avoid having to track down countless expensive false trails, knowing they may be overlooking true causal elements as a consequence of being able to focus on the most plausible ones.

So if we differ on your second point, it may be on what one means by 'arbitrary'.

John R. Vokey said...

Fair enough, but Fisher didn't introduce it, he was codifying pre-existing practise. See Cowles and Davis (1982). On the Origins of the .05 Level of Statistical Significance. _American Psychologist_, 37, 553-558.

Ken Weiss said...

Now that you mention this, I think I knew it. I had thought until recently that this arose out of his putting the tea-taster to the test, but on reading about it I realized otherwise, perhaps by browsing Fisher's book. Anyway, it is interesting how elusive probability and randomness actually are.

James Goetz said...

I can refine my critique of causal determinism:

If the appearance of probability in events is merely an illusion within inscrutable causal determinism, then all scientific experiments with a probabilistic analysis have only illusionary validity. Likewise, causal determinism is a so-called baby of rationalistic thought that invalidates most of science. Also, regardless of the worthlessness of causal determinism in science, nobody can disprove it, especially with a significance test.

Ken Weiss said...

I guess I can only say that we know of very simple deterministic processes that yield results indistinguishable from randomness, in the sense that a present state does not predict future states (e.g., the consecutive digits of pi, or the results of computer 'cellular automata'). We know of processes that, as far as anyone can tell so far, are truly random (quantum mechanics).

We know of sampling issues that, for given sample sizes and underlying probabilities, cannot be resolved, whether the underlying process is perfectly deterministic or is truly random (e.g., whether Heads-Heads-Tails came from a fair coin).

We know of processes that, even if perfectly deterministic, cannot be predicted any better than random from measurements that are not 100% accurate, which they never are (so-called 'chaotic' processes, the predictions of economists).

So we really cannot know, with current concepts and methods, what the ultimate truths actually are.

And even then the thinking almost always assumes the excluded middle (something can be true or false but not both), the rules of mathematics and deductive logic, and the universality of 'laws' of nature.

Given all of this, it's not clear to me what illusory or validity actually mean.

James Goetz said...

"Given all of this, it's not clear to me what illusory or validity actually mean."

Well, it's all a conjecture to me. And unless all of the perceptions of my senses are completely wrong, then making conjectures of validity is helpful, which of course is a conjecture. :)

Ken Weiss said...

Well, conjecture is interesting to do, and at least some people would like to understand the true nature of existence, hoping that it's not the existentialist's nothingness and, if it isn't, hoping to understand what it is.

But it is probably a vain hope, or at least we don't seem to be very close to a demonstrable understanding of it all.