But there are energetic, and sometimes fierce, discussions about just how we should go about doing our work. These discussions often involve the basic statistical methods on which our inferences rest. We've talked about the statistical aspects of life science in numerous past posts. Today, I want to write about an aspect that relates to notions of 'revolution' in science, or what Kuhn called paradigm shifts. What follows is my own view, not necessarily that of anybody else (including the late Kuhn).

(Image: xkcd)

For many if not most aspects of modern science, we express basic truths mathematically in terms of some parameters. These include values such as Newton's gravitational constant and any number of basically fixed values for atomic properties and interactions. Such parameters of Nature are not known with perfect precision, but they are assumed to have some universally fixed value, which is *estimated* by various methods. The better the method or data, the closer the estimate is held to be to the true value. Good science is assumed to approach such values asymptotically, even if we can never reach them without any error or misestimation.

This is not the same as showing that the value is 'true', or that the underlying theory that asserts there is such a value is true. Most statistical tests evaluate data relative to some assumed truth or property of truth, or some optimizing criterion given our assumptions about what's going on, but many scientists think that viewing results this way is a conceptual mistake. They argue that our knowledge leads only to some degree of confidence, a subjective feeling, about an interpretation of Nature. Using approaches generally referred to as 'Bayesian', it's argued that all we can really do is refine our choice of properties of nature that we have most confidence in. They rarely use the terms 'belief' or 'faith' in the preferred explanation, because 'confidence' carries a stronger sense of an acceptance that can be systematically changed. The difference between Bayesian approaches and purely subjective hunches about Nature is that Bayesian approaches have a rigorous and in that sense highly objective format.

This comes from a famous rearrangement of a basic fact of probabilities, credited to Thomas Bayes. It is a rearrangement of basic laws of probability, and it goes like this:

p(Hypothesis|Evidence) = p(Evidence|Hypothesis) p(Hypothesis) / p(Evidence)

This says that the probability of some Hypothesis we may be interested in, given the Evidence we observe, equals the probability of that Evidence if the Hypothesis were true, times the prior probability we have in mind for the Hypothesis, all divided by the overall probability of the Evidence. That denominator matters because there may be many ways the Evidence could arise (or we'd already know our Hypothesis was true!): you sum the probability of the data if H is true, weighted by your prior probability that it is, plus the probability of the data under each alternative to H, weighted by its prior probability. It's somewhat elusive, but here's an oversimplified example:

Suppose we believe that a coin is fair. But there's a chance that it isn't. In advance of doing any coin-flipping, we might express our lack of knowledge by saying the chance that the coin is fair is, say, 50%, since we have no way to actually know if it is or isn't. But now we flip the coin some number of times. If it's fair, the probability of it coming up Heads equals 50%, or p(H) = 0.5 per flip. But suppose we observe 60% Heads. A fair coin could yield such results and we can calculate the probability of that happening. But an unfair coin could also generate such a result.

For simplicity, let's say we observe HHHTT. For a fair coin, with p(H) = 1/2, the probability of this particular sequence is (1/2)(1/2)(1/2)(1/2)(1/2) = 0.0312, but if the coin is unfair in a way that yields 60% Heads, the probability of this result is (0.6)(0.6)(0.6)(0.4)(0.4) = 0.0346. Using the formula above, with these two hypotheses given equal prior weight, the probability that the coin is fair drops from 50% to about 47%: we're less confident about the coin's fairness. If we kept flipping and getting such results, that value would continue dropping, as we became less confident that it's fair and increasingly confident that its true probability of Heads was 0.6 instead of 0.5. We might also ask if the probability of it being fair is, say, zero, or 1/8, or 0.122467--that is, we can test any value between zero (no chance it's fair) and 1.0 (completely sure it's fair).
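The coin-flip arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of my own (the function and variable names are not from the post), assuming just the two hypotheses in the example, fair (p(H) = 0.5) versus biased (p(H) = 0.6), each with 50% prior weight:

```python
# Posterior probability that the coin is fair after observing HHHTT,
# given two candidate hypotheses with equal prior probability.

def likelihood(p_heads, heads, tails):
    """Probability of one specific flip sequence with the given per-flip p(H)."""
    return p_heads ** heads * (1 - p_heads) ** tails

prior_fair = 0.5                     # p(Hypothesis): coin is fair
lik_fair = likelihood(0.5, 3, 2)     # (1/2)^5 = 0.0312
lik_biased = likelihood(0.6, 3, 2)   # (0.6)^3 (0.4)^2 = 0.0346

# Bayes' theorem: p(H|E) = p(E|H) p(H) / p(E),
# where p(E) sums over both ways the evidence could arise.
evidence = lik_fair * prior_fair + lik_biased * (1 - prior_fair)
posterior_fair = lik_fair * prior_fair / evidence

print(round(posterior_fair, 3))      # 0.475
```

The posterior, about 47%, is the updated confidence that the coin is fair; repeating the update with more HHHTT-like data drives it lower and lower.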

The basic idea is that we have some *prior* reason, or probability p(H), that the Hypothesis is true; we gather some new Evidence to evaluate that probability, and we adjust it in light of the new Evidence. The adjusted value is called the *posterior* (to the new data) probability of the Hypothesis, and Bayes' theorem provides a way to make that adjustment. Since we assume that *something* must be true, Bayes' formula provides a systematic way to change what we believe about competing explanations. That is, our prior probability is less than 1.0 (certainty of our Hypothesis), which implies that there are other hypotheses that might be true instead. The use of Bayes' theorem adjusts our confidence in our specified Hypothesis, but doesn't say or show that it is true. Advocates of a Bayesian approach argue that this is the reality we must accept, and that Bayesian approaches tell us how to get a best estimate based on current knowledge. It's always possible that we're not approaching truth in any absolute sense.

A key aspect of the Bayesian view of knowledge is that the explanation is about the *probability* of the data arising if our preferred explanation is true, accepting that it might or might not be. It assigns quantitative criteria to alternative explanations whose relative probability can be expressed--that is, each possible hypothesis has a probability (a value between zero and 1), and their sum exhausts all possibilities (just as Heads and Tails exhaust the possible flip outcomes, or a coin must be either fair or not fair).
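A small sketch of that "exhaustive set of hypotheses" idea, again with illustrative names of my own: put a grid of candidate values for p(Heads) on [0, 1], give them equal prior weight summing to 1, and update all of them at once on the HHHTT data.

```python
# Discrete grid of hypotheses about the coin's true p(Heads),
# updated together so the posterior probabilities still sum to 1.

candidates = [i / 10 for i in range(11)]         # p(H) = 0.0, 0.1, ..., 1.0
prior = [1 / len(candidates)] * len(candidates)  # uniform prior, sums to 1

def likelihood(p, heads=3, tails=2):
    """Probability of the observed HHHTT sequence for a given p(Heads)."""
    return p ** heads * (1 - p) ** tails

unnormalized = [likelihood(p) * w for p, w in zip(candidates, prior)]
evidence = sum(unnormalized)                     # p(Evidence), the denominator
posterior = [u / evidence for u in unnormalized]

best = candidates[posterior.index(max(posterior))]
print(best)                                      # 0.6 gets the most weight
```

No hypothesis is declared 'true'; each just gains or loses weight relative to the others, which is the point of the whole approach.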

**OK, OK so what does this have to do with scientific 'revolutions'?**

The basic idea of Bayesian analysis is that it provides a technically rigorous way to express subjective confidence in a scientific context. It provides a means to use increasing amounts of data to adjust the level of confidence we assign to competing hypotheses, and identify the Hypothesis that we prefer.

This is a good way to express confidence rather than a yes-no illusion of ultimate truth, and it has found widespread use. However, its use does depend on whether the various aspects of the experiments and hypotheses can adequately be expressed in probabilistic terms that accurately reflect how the real world is--for example, important causal components may be missing, or the range of possibilities may not be expressible in terms of probability distributions.

I am by no means an expert, but a leading proponent of Bayesian approaches, the late E. T. Jaynes, said this in his classic text on the subject (*Probability Theory: The Logic of Science*, Cambridge University Press, 2003):

Before Bayesian methods can be used, a problem must be developed beyond the 'exploratory phase' to the point where it has enough structure to determine all the needed apparatus (a model, sample space, hypothesis space, prior probabilities, sampling distribution).

This captures the relevant point for me here, in the context of the idea of scientific revolutions or paradigm shifts. I acknowledge that in my personal view, and this is about philosophy of inference, such terms should be used only for what is perhaps their original reference, the major and stunning changes like the Darwinian revolution, and not the more pedestrian applications of everyday scientific life that are nonetheless casually referred to as revolutions.

These issues are (hotly) debated, but I feel we should make a distinction between scientific refinement and scientific revolutions. To me, Bayesian analysis is a systematic way to refine a numerical estimate of the relative probability of an idea about Nature compared to other ideas that could be correct. The probability assigned to the best of these alternatives should rise asymptotically toward certainty with increased amounts of data (as schematically shown in the figure below), unless something's wrong with the conceptualization of the problem. I think this is conceptually very different from having a given scientific 'paradigm' replace another *with which it is incommensurable*.
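That asymptotic refinement can be simulated. The sketch below is my own illustration (the candidate values, sample size, and names are all assumptions): flip a coin whose true p(Heads) is 0.6 a couple of thousand times, and watch the posterior weight pile onto the best candidate within a fixed set of hypotheses.

```python
# Within a fixed hypothesis set, accumulating data concentrates the
# posterior on the best-supported candidate -- refinement, not revolution.
import math
import random

random.seed(0)
true_p = 0.6
candidates = [0.4, 0.5, 0.6, 0.7]
log_post = {p: 0.0 for p in candidates}   # uniform prior, tracked in log space

for _ in range(2000):
    heads = random.random() < true_p
    for p in candidates:
        log_post[p] += math.log(p if heads else 1 - p)

# Normalize back to probabilities that sum to 1.
m = max(log_post.values())
weights = {p: math.exp(v - m) for p, v in log_post.items()}
total = sum(weights.values())
posterior = {p: w / total for p, w in weights.items()}

print(max(posterior, key=posterior.get))  # the best-supported candidate
```

Note that the simulation can only redistribute confidence among the candidates it was given; a hypothesis outside that set, the analogue of an incommensurable paradigm, never gets a probability at all.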

Where it's useful, Bayesian analysis is about adjusting ideas among what are clearly commensurable hypotheses--ones based on different values of the same parameters. Usually the alternative hypotheses are not very different, in fact: for example, a coin has some bias in its probability of Heads, ranging from no chance through fair (50%) to inevitable; but we assume such things as that the flips are all done the same way and that the results generated by flipping are probabilistic by nature.

In my view, Bayesian analysis is a good way to work through issues *within* a given theoretical framework, or paradigm, and it has many strong and persuasive advocates. But it is not a way to achieve a scientific revolution, nor does it reflect one. Sometimes the idea is used rather casually, as if formal Bayesian analysis could adjudicate between truly incomparable ideas; there, to me, we simply must rely on our subjective evaluations. One can't, of course, predict when or even whether a truly revolutionary change--a paradigm shift, if you will--will occur, or even whether one is needed.

Ptolemaic epicycles added accuracy to the predictions of planetary motion, at the price of being cumbersome. One could have applied Bayesian analysis to the problem at the time, had the method been available. The Copernican revolution changed the basic structure of the underlying notion of what was going on. One might perhaps construct a Bayesian analysis that would evaluate the differences by somehow expressing planetary positions in probabilistic terms in both systems and allow one to pick a preference, but I think this would be rather forced--and, most importantly, a *post hoc* way to evaluate things (that is, only *after* we have both models to compare). In fact, in this case one wouldn't really say one view was true and the other not--they are different ways of describing the same motions of bodies moving around in space relative to each other, and the decision of how to model that is essentially one of mathematical convenience.

I think the situation is much clearer in biology. Creationist ideas about when and where species were created, or how they related to each other in terms of what was called the Great Chain of Being, could have been adjusted by Bayesian approaches: for example, the dates of newly discovered fossils could refine estimates of when God created the species involved. But Bayesian analysis is inappropriate for deciding whether creationism or evolution is the better hypothesis for accounting for life's diversity in the first place. The choice between the two approaches would be a subjective one, because without artificial contortions the two hypotheses are not probabilistic alternatives in any very meaningful sense. That's what incommensurability, which I think applies in this case, implies. You can't very meaningfully assign a 'probability' to whether creationism or evolution is true, even if the evidence is overwhelmingly in favor of the latter.

**Current approaches**

These posts express my view of the subject of scientific theory, after decades of working in science during periods of huge changes in knowledge and technology. I don't think that scientific revolutions are changes in prior probabilities; even if they may reflect such changes, they are more than, and different from, that. From this viewpoint, advocates of Bayesian analysis in genomics are refining, but not challenging, the basic explanatory framework. One often hears talk of paradigm shifts and similar 'revolution' rhetoric, but basically what is happening is just a scaling up of our current "normal science", because we know how to do that, not necessarily because it's a satisfactory paradigm for life. And there are many reasons why that is what people normally do, more or less as Kuhn described. I don't think our basic understanding of the logic of evolution or genetics has changed since I was a graduate student decades ago, even if our definition of a gene, our models of gene frequency change, and our understanding of mechanisms have been augmented in major ways.

It is of course possible that our current theory of, say, genomic causes of disease *is* truly true, and that what we need to do is refine its precision. This is, after all, what the great Big Data advocacy asserts: we are on the right track, and if you just give us more and more DNA sequence data we'll get there, at least asymptotically. Some advocate Bayesian approaches to this task, while others use a variety of different statistical criteria for making inferences.

Is this attitude right for what we know of genomics and evolution? Or is there reason to think that current "normal science" is pushing up against a limit, and that only a true conceptual revolution--one essentially incommensurate with, or not expressible in the terms of, our current models--will move us forward? In past posts, we've suggested numerous reasons why we think current modes of thought are inadequate.

It's all too easy to speak of scientific revolutions (or to claim, with excitement, that one is in the midst of creating one if only s/he can have bigger grants, which is in fact how this is usually expressed). It's much harder to find the path to a real conceptual revolution.

## 6 comments:

Off topic - I am reading your 'Genetics and the Logic of Evolution', and find it thought-provoking. This is exactly the book I have been looking for!

Manoj

Reply to Manoj

Thanks for the compliment. In The Mermaid's Tale published a few years later we tried to be more explicit about principles of life that we think are every bit as important to understanding the nature of life and its evolution, building (we hope) on Genetics and the Logic of Evolution. And in a sense, to assert such things was a motive for starting our blog.

Of course, we may be wrong in much of what we say, but we think things that aren't being discussed ought to be....

That ain't a blog, that's a thesis!

Great stuff. You might find this paper to be interesting:

http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf

Reply to Steve

Thanks! I'll take a look.

Reply to Onuralp

Thanks for this; the link is to a commentary that shows the breadth of scope that these thoughts can have. Most of human life and society is, I guess, more about engineering what we want based on what we think we know, rather than on deeper creativity.
