Wednesday, November 28, 2018

Induction-deduction, and replicability: is there any difference?

In what sense--what scientific sense--does the future resemble the past?  Or perhaps, to what extent does it?  Can we know?  If we can't, then how much credence can we give, for predicting the future, to the results of today's studies, which necessarily reflect the past experience of current samples?  Similarly, in what sense can we extrapolate findings from one sample to some other sample or population?  If these questions are not easily answerable (indeed, if they are answerable at all!), then much of current science, which is very widespread and expensive, is at best of unclear and questionable value.

We can look at these issues in terms of two standard aspects of science: the relationship between induction and deduction, and the idea of replicability.  In their formal scientific sense, induction and deduction basically date from the Enlightenment period in Western history, when it was found, in a formal sense, that the world of Western science (which at that time meant physical science) followed universal 'laws' of Nature.  At that time, life itself was generally excluded from this view, not least because it was believed to be the result of ad hoc creation events by God.

The induction--deduction problem
Some terminology:  I will make an important distinction between two terms.  By induction I mean drawing a conclusion from specific observed data (e.g., estimating the value of some presumed causal parameter).  Essentially, this means inferring a conclusion from the past, from events that have already occurred.  But often what we want to do is predict the future.  We do that, often implicitly, by treating observed past values as estimates of causal parameters that apply generally, and therefore to the future; I refer to that predictive process, derived from observed data, as deduction.  So, for example, if I flip a coin 10 times and get 5 Heads, I assume that this is somehow built into the very nature of coin-flipping, so that the probability of Heads on any future flip is 0.5 (50%).
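The coin example can be made concrete with a small sketch.  It separates the inductive step (estimating P(Heads) from 10 past flips) from the uncertainty that estimate carries even if we grant a perfectly law-like coin.  The function names are hypothetical, and the interval uses the simple normal (Wald) approximation, which is my choice here and is crude for a sample as small as 10; it is only meant to show how wide the range remains.

```python
import math

def estimate_heads_probability(heads: int, flips: int) -> float:
    """Induction: estimate P(Heads) from flips that have already happened."""
    return heads / flips

def wald_interval(heads: int, flips: int, z: float = 1.96) -> tuple[float, float]:
    """Rough 95% interval for P(Heads) via the normal approximation.

    Assumes flips are independent draws from one fixed, unchanging process,
    which is exactly the law-like assumption that lets us 'deduce' the future.
    """
    p_hat = heads / flips
    se = math.sqrt(p_hat * (1 - p_hat) / flips)  # standard error of a proportion
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

p_hat = estimate_heads_probability(5, 10)
low, high = wald_interval(5, 10)
print(f"estimate = {p_hat:.2f}, rough 95% interval = ({low:.2f}, {high:.2f})")
# prints: estimate = 0.50, rough 95% interval = (0.19, 0.81)
```

Even under the most favorable assumption, 5 Heads in 10 flips is compatible with a wide range of underlying probabilities; applying the estimate to future flips adds the further assumption that the process itself will not change.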

If we can assume that induction implies deduction, then what we see in our present or past observations will persist, so that we can predict it in the future.  In a law-like universe, if we sample properly, this will hold, and we generally assume it would hold with complete precision if we had perfect measurement (here I speculate, but I think that quantum phenomena at the appropriate scale have the same universally parametric properties).

Promises like 'precision genomic medicine', which I think amount to culpable public deceptions, effectively equate induction with deduction: we observe some genomic elements associated in some statistical way with some outcome, and assume that the same genome scores will similarly predict the future of people decades from now.  There is no serious justification for this assumption at present, nor any quantification of how large the errors might be when we assume the predictive power of past observations, in part because mutations and lifestyle clearly have major effects, but especially because these are unpredictable--even in principle.  Indeed, there is another, much deeper problem of a similar kind that has recently received attention, though to me often quite naive attention: replicability.

The replicability problem
Studies, perhaps especially in the social and behavioral fields, report findings that others cannot replicate.  This has been interpreted as suggesting that, setting aside the rare outright fraud, there is some problem with our decision-making criteria, other forms of bias, or poor study designs.  Otherwise, shouldn't studies of the same question agree?  There have been calls for the investigators involved to improve their statistical analysis (i.e., keep buying the same software!! but use it better), to report negative results, and so on.

But this is potentially, and I think fundamentally, naive.  It assumes that such study results should be replicable.  It assumes, as I would put it, that at the level of interest, life = physics.  This is, I believe, not just wrong but fundamentally wrong.

The assumption of replicability is not really different from equating induction with deduction, except that it is applied, in subtle ways, to a more diverse set of conditions.  Genome-based disease risk is induced from a sample such as, say, a case-control study, and then applied to the same population in terms of its current members' future disease risks.  But we know very well that different genotypes are found in different populations, so it is not clear what degree of predictability we should, or can, assume.

Replicability is similar, except that in general a result is assumed to apply across populations or samples, not just to the same sample's future.  That is, I think, an even broader assumption than the precision-genomics promise, which at least nominally now recognizes population differences.

The real, deeper problem is that we have absolutely no reason to expect any particular degree of replicability between samples for these kinds of things.  Evolution is about variation, locally responsive and temporary, and that applies to social behavior as well.  We know that 'distance', or difference, generally accumulates gradually with time and separation, in cultural as well as biological evolution.  The same obviously applies even more to psychological and sociological samples and to inferences drawn from them.
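To illustrate how two studies can disagree without either being flawed, here is a toy simulation.  It assumes two hypothetical populations in which the same exposure has different true effects on risk; the risk values, the sample sizes, the seed, and the `simulate_study` helper are all invented for illustration, and the risk difference is simply one convenient effect measure.

```python
import random

def simulate_study(rng: random.Random, n_per_group: int,
                   risk_unexposed: float, risk_exposed: float) -> float:
    """Hypothetical study: estimate the risk difference (exposed minus unexposed)
    from one sample drawn from a population with the given true risks."""
    cases_unexposed = sum(rng.random() < risk_unexposed for _ in range(n_per_group))
    cases_exposed = sum(rng.random() < risk_exposed for _ in range(n_per_group))
    return (cases_exposed - cases_unexposed) / n_per_group

rng = random.Random(2018)  # fixed seed so the sketch is reproducible

# Hypothetical population A: the exposure raises risk from 0.20 to 0.30.
effect_a = simulate_study(rng, 5000, risk_unexposed=0.20, risk_exposed=0.30)
# Hypothetical population B: the same exposure has no effect.
effect_b = simulate_study(rng, 5000, risk_unexposed=0.20, risk_exposed=0.20)

print(f"study A: {effect_a:+.3f}, study B: {effect_b:+.3f}")
```

Study B 'fails to replicate' study A, yet each is an accurate estimate for its own population; the disagreement reflects a real difference between the populations, not a defect in either study's methods.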

It is silly to think that samples of, say, this year's college seniors at X University will respond to questionnaires in the same way as samples from some other class, some other university, or beyond.  Of course, college students come cheap to researchers, and they're convenient.  But they are not 'representative' in the replicability sense except by some sort of rather profound assumption.  This is obvious, yet it is a tacit assumption of a great deal of research (biological, psychological, and sociological).

Even social scientists acknowledge the local and temporary nature of many of the things they investigate, because these are affected by cultural and historical patterns, fads, fashions, and so much more.  Indeed, the idea of replicability is to me curious to begin with.  A study that fails to replicate some other study may not reflect failings in either, and the idea that we should replicate in this kind of way is a carryover of physics envy.  Perhaps in many situations, a successful replication is what should be examined most closely!  The social and even biological realms are simply not as 'Newtonian', or law-like, as the real physical realm in which our notions of science, especially the very idea of law-like replicability, arose.  Not only is failure to replicate not necessarily suspect at all, but replicability should not generally be assumed.  Or, put another way, a claim that replicability is to be expected is a strong claim about Nature that requires very strong evidence!

This raises the very deep problem that, in the absence of replicability assumptions, we don't know what to expect of the next study after we've done the first.  Or is this a justification for just keeping the same studies going (and funded) indefinitely?  That's of course the very rewarding game being played in genomics.
