Comments on The Mermaid's Tale: Post-truth science?

This will take a short route to what deserves a lo...

2017-01-03T08:46:53.062-05:00

This will take a short route to what deserves a long answer. I can't respond to some issues you raise, because I am not a formal statistician. My point is not a complaint about statistics, but about their use when the assumptions do not fit, or fit only to an unknown or even unknowable degree. We are investing heavily in strong promises and expensive but very weak mega-projects, in my view. Of course, strong patterns can be found, and even under those conditions, truly useful causative agents. But in general, with all-purpose mega-studies, and especially with weak causation (which is by far the rule), it is less clear whether a pattern reflects causation or mere correlation. That matters because it bears on whether retrospective fitting implies predictive power, which is what is being promised.

Competing risks happen to be something I have worked with, off and on, for decades. It is as you say, but that is a double-edged sword because the assumptions underlying retrospective fitting (again) don't allow automatic projection into the future. And there is a plethora of outcomes involved, and nearly countless alternative pathways to similar outcomes. Secular trends in causes of death and disease make that point very clearly, in my view, because while they (may) allow retrospective fitting, and may allow crude (to some, usually unknowable extent) prediction, they do not automatically allow clear, known, or often even knowable personal prediction. Also, as to your final point, 'pervasive' is a key word if, in using it, you essentially mean a cause (of the 'pervasive effects') that is always present. But in genetics and evolutionary biology this is not always the case (if it were, in a sense there would be no evolution).

These are the issues we try to raise. They also entail a difference between causal replicability, and the statistical aspects of measurement and sampling (for which, in a sense, statistical testing was developed). To me, Pearl's approach is far too formalized to be of much reliability since we don't know when or to what extent it applies: how would we know? Typically, if not necessarily, it is by replication or some other subsequent test, but replication is not required of evolution, which is a process of differentiation.

Anyway, these are to some extent at least matters of one's outlook. To me, the cogent point is to make clear to those (even in biology and genetics) who don't think carefully, that promises of genomic 'precision' and 'personalized' genomic medicine are far more about marketing, in ways that would make PT Barnum proud, than they are about serious science. Biology and genetics work well in those causal situations where their (often implicit) assumptions are more or less met, since we may not care about the details. But much of this area is about very weak, very numerous, contemporary contributory factors, and there the methods do not work in ways that, one might say, can be proved or often even tested cogently.

Given the differences in our background, it is qui...

2017-01-02T17:39:58.895-05:00

Given the differences in our background, it is quite possible that I was not clear.

Standard statistical sampling is “wholesale” and is not designed to explain singular/rare events. Case-based sampling (e.g. case-control), however, is useful for studying the causes of rare outcomes. In my very limited experience using proteomics the point was to generate candidates for interventions or other amelioratory measures. The biologists and engineers took over from there. Since we were looking for population level effects, rare events weren’t as interesting.

Premises are just that, premises, and can be changed at will. Many of the "standard" introductory statistical methods were chosen for mathematical tractability, not necessarily because they are required or appropriate for a specific application (Fisher himself noted that). Distributional approximations often fall into that class. They can be useful but are not required. Modern computing makes that less of a burden.

As you note, significant difference tests are not particularly conducive to model (theory) building, but they are not supposed to be. The push for additivity (e.g., in Two-way ANOVA and Randomized Complete Blocks) is a hunt for regularities (“significant sameness”). Part of statistical modeling involves choosing transformations or changing the measurement to get additivity. Effects that aren’t separable on some metric scale can be quite separable and regular on an ordinal scale. (This happens in marketing when you switch from, say, 7-point liking scales to discrete choice scales. It is a classic example where the attempt to measure non-numeric states confounds the model.)

Statistical tests focus on the deductive part of the process. You find out where the theory doesn’t fit the data and get more data or generate new hypotheses. Repeatable variation/differences is the target. “Statistical significance” for one-off studies is only part of that.

My experience suggests Pearl's methods are quite useful for predicting outcomes and for hypothesis generation in a marketing/product development context, and appear to be quite useful in epidemiology, too. They are built upon non-additivity and non-parametric models. Pearl is quite explicit that statistics is not meant for causal inference. A reference to Rothman’s and Pearl’s work in evolutionary biology is in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063485/

WRT Anne’s comment, multiple causal pathways in epidemiology are also known as Competing Risks when the outcome is death/failure. Different potential causands can be on each path, as in Bernoulli's smallpox model. In product design, you have bundles of benefits, and the trick is to balance the benefits to beat the competition’s bundles. My original training and interest in predator-prey and competition models transferred quite nicely to epidemiology and from there to marketing models.

To your closing comment, I think we both agree that statistical tools are not about individual level effects. They are quite good at picking up smaller pervasive effects.

I'll basically second Anne's reply. Stat...

2016-12-31T15:52:25.864-05:00

I'll basically second Anne's reply. Statistical sampling in an evolutionary context leads to analysis that violates--to a generally unknown or unknowable extent--various premises that relate to interpretation, such as about error distributions, additivity and so on. And, as Anne says, the complex of factors is not the same, nor are their relative proportions, even among similar elements. Large or overwhelming numbers of causal contributors, even assuming the truth of an essentially genetic causal model (i.e., forgetting about environmental factors, somatic mutation, and so on), are too weak or rare individually to be included or to generate statistical significance. That is why multiple-causal models, while almost certainly true in principle, are problematic. The causes are not a fixed enumerable set, and it is their interaction (often too subtle to show up in additive models) that may be key to their net effect.
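The remark about interactions that do not show up in additive models can be illustrated with a deliberately extreme toy case. This is only a sketch under invented assumptions: two hypothetical binary factors `a` and `b`, a balanced made-up population, and an outcome that occurs exactly when one factor is present without the other. Nothing here comes from real data.

```python
from itertools import product

# Hypothetical, balanced toy population: 250 individuals for each
# combination of two binary factors (a, b). The outcome is assumed to be
# purely interactive: it occurs when exactly one factor is present.
population = [(a, b, a ^ b)
              for a, b in product([0, 1], repeat=2)
              for _ in range(250)]

def risk(rows):
    """Fraction of individuals in `rows` who show the outcome."""
    return sum(y for _, _, y in rows) / len(rows)

def marginal_risk_difference(rows, index):
    """Risk among carriers of one factor minus risk among non-carriers."""
    exposed = [r for r in rows if r[index] == 1]
    unexposed = [r for r in rows if r[index] == 0]
    return risk(exposed) - risk(unexposed)

# Looked at one factor at a time, neither appears to do anything.
effect_a = marginal_risk_difference(population, 0)
effect_b = marginal_risk_difference(population, 1)

# Looked at jointly, the two factors determine the outcome completely.
joint_risk = {
    (a, b): risk([r for r in population if r[0] == a and r[1] == b])
    for a, b in product([0, 1], repeat=2)
}

print("marginal effect of a:", effect_a)
print("marginal effect of b:", effect_b)
print("risk by (a, b):", joint_risk)
```

In this contrived case each factor has no marginal effect at all, so a one-factor-at-a-time or purely additive analysis would have nothing to detect, even though the pair fully determines the outcome. Real interactions are rarely this clean, which is part of the difficulty.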

To me, as a rather 'philosophical' point, we do not have a very useful, much less precise, causal theory in most cases, for the statistical approach usually taken to be as apt as it's treated to be. We make what I call 'internal' comparison--like between obese and normal, case and control, and so on--and test for 'difference', rather than testing for fit to a serious-level causal theory. That is, our statistics are not just about measurement and sampling errors, which may follow reasonable distributional assumptions. If there is some underlying theory, we don't know how to separate that from the measurement sorts of issues. And if the theory rests on the fact that the essence of evolution is about the dynamics of difference, rather than the repetition of similarity, then to me statistical models are just off the mark. For strong causation, it doesn't always matter, but this may not be so for weak causation, where we have little way of knowing when, whether, why or how much it matters. And even for strong causative factors things are problematic. This is the difference between physics and biology, in a sense, in my view.

I am far from any sort of expert in the modeling things you mention (though I think my comments are pertinent relative to Pearl's ideas about causation), and we might have to debate how effective things are even in the epidemiological context. The latter would depend on the objectives, which often don't require individual risk or formal understanding of causation, etc., and which work effectively when causation is rather singular and strong (e.g., a particular virus; smoking; airborne lead or asbestos).

Or, perhaps, I have failed to understand your comment!

Bill, thanks for your comment. I'll answer br...

2016-12-31T15:38:17.473-05:00

Bill, thanks for your comment. I'll answer briefly, but I think Ken is going to weigh in as well, it being his list.

The problem in #10 is less about the problem with induction than about the problem of using statistical methods that depend on a large number of occurrences of a risk factor to determine its role in causation. Too rare and it's either not going to occur in your sample, or it won't reach statistical significance, even if it's 100% the cause in one or a few families.
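A rough calculation shows how this plays out. The sketch below assumes a hypothetical case-control study of 1000 cases and 1000 controls, a variant that is fully causal in every carrier (so no control carries it), a one-sided Fisher exact test, and the conventional genome-wide threshold of 5e-8. All of these numbers are assumptions chosen for illustration.

```python
from math import comb

def fisher_one_sided_p(carrier_cases, carrier_controls, n_cases, n_controls):
    """One-sided Fisher exact p-value: the probability, with the table
    margins held fixed, of seeing at least this many carriers among cases."""
    total_carriers = carrier_cases + carrier_controls
    denom = comb(n_cases + n_controls, total_carriers)
    p = 0.0
    for k in range(carrier_cases, min(total_carriers, n_cases) + 1):
        p += comb(n_cases, k) * comb(n_controls, total_carriers - k) / denom
    return p

def min_carriers_for_significance(n_cases, n_controls, alpha):
    """Smallest number of carrier cases (with zero carrier controls)
    whose one-sided p-value falls below alpha, or None if never."""
    for k in range(1, n_cases + 1):
        if fisher_one_sided_p(k, 0, n_cases, n_controls) < alpha:
            return k
    return None

# Assumed study design and threshold (illustrative only).
N_CASES, N_CONTROLS = 1000, 1000
GENOME_WIDE_ALPHA = 5e-8

# A variant that is 100% causal but seen in only three cases (one family, say).
p_three = fisher_one_sided_p(3, 0, N_CASES, N_CONTROLS)
needed = min_carriers_for_significance(N_CASES, N_CONTROLS, GENOME_WIDE_ALPHA)

print("p-value with 3 carrier cases, 0 carrier controls:", p_three)
print("carrier cases needed to pass the threshold:", needed)
```

Under these assumptions, three carriers fall short of even a nominal 0.05 level, and passing the genome-wide threshold takes well over twenty carrier cases, despite the variant being a complete cause in everyone who has it.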

I'm not quite sure of the point you're making with respect to #16. That it's possible to build statistical models that take multiple causes into account? The problem with that is that these kinds of models are built on the assumption that every case is the result of the same set of interacting causes, and we know in genetics that this isn't true. There are often many pathways to the 'same' phenotype.

But, again, Ken can better clarify his points.

Hello Anne, Could you expand on two of your endn...

2016-12-31T13:15:10.783-05:00

Hello Anne,

Could you expand on two of your endnotes?
#10 - "enumerating causation by statistical sampling is often impossible because rare variants ..." Are you referring to the fallibilism of statistical (or any) induction? You, the scientist, get to set the sampling size and criteria, and as with any instrument the capabilities are finite.

#16 - "Our reductionist models, even those that deal with networks, badly under-include interactions and complementarity. We are prisoners of single-cause thinking ..." I am not that familiar with biological models, but epidemiological and marketing networks are quite capable of representing multiple-component (INUS) causation (e.g., Rothman's causal pies/sufficient components and Pearl's networks). Complementarity (as I understand it) is also included, where a single "causand" can both inhibit and disinhibit an outcome, depending on the context in which it occurs. If you require "statistical significance" to include an arc, your ability to detect interactions will be limited by your sample.

In practice, I personally had no problem ordering nodes based on theoretical or other considerations. The proof of a network was in its (out-of-sample) predictions.

Thanks