Tuesday, January 14, 2014

In search of a comb; Heraclitus on genetics

I was recently reading "The Collected Wisdom of Heraclitus" (c. 535-475 BC), one of the earliest Greek philosophers known.  A couple of his thoughts, called fragments, seemed apt for the work and ideas that we have and try to pursue on MT.

Fragment #8 is translated in the version I have (B. Haxton, 2001) as

    "Men dig tons of earth
      to find an ounce of gold."

This calm bit of wise observation is relevant to the activity of any vibrant science.  In the first place, we do have to dig through tons of studies and results to find the gem that leads us forward.  In a sense, the huge effort to map genomic effects for everything, physical or behavioral, in every living creature, is the work of a huge army of pick-and-shovel miners.  They may often be of comparable skill, and their sweat and toil may seem (to those who have even a modicum of reflection) rather worthless drudgery.  Or they may fancy that each flake that glimmers is their nugget, the one that gives importance to their lives or at least their careers.

Mineshaft; Wikipedia

But as the mineshaft gets longer and deeper, and the day carries on past the lunch hour, and as day after day passes, someday, somewhere, in some mine the golden moment will befall the lucky miner.  An important question is where, how, or even whether the many shafts that do not turn up a nugget are worth the effort to dig them.  When society is paying for the mining operation, those investors (i.e., you and I) should ask that wholly legitimate question.  Of course, it is rarely clear at what point the digging has become a fool's errand, probing where no vein of importance may lie.  Those who criticize the blind pursuit of pyrite, as we often do of the rather factory-like scaling up of incremental data mining, are essentially making the judgment that other areas ought to be tapped instead, ones where we have more indication that real ore is to be found.

This, of course, is as much a gamble as is the view that if we keep digging sooner or later we'll find something (besides eventually exiting somewhere in China).  There is certainly a lot of self-interest in the view that serendipity will eventually yield either gold or, perhaps, something else like, say, shale gas.  Those who oppose this view note that serendipitous findings can arise from any sort of activity that mines the unknown to find truth, and argue that many areas are known where a more concentrated, focused effort is likely to find a larger vein.

But this then brings another of Heraclitus' thoughts to mind.

Combing through the tangle
Aphorism #50:

    "Under the comb
      the tangle and the straight path
      are the same."

The message here is a bit more abstract, but I think relevant.  Today, Big Data is a catch(-a-long-term-grant) phrase that is based on what is essentially no theory except the notion that more means better and that massive induction (omnibus data collection) will lead to fundamental insights.  Such a belief has some historical support, but history can also be cited against it as an approach, depending on which aspect of the history of science you would like to cite.

The idea of major data collection is that it provides clearly incremental contributions as each terabyte of new data is added.  It's not too much of an oversimplification to say that what we see is a tangle of relationships among the data.  The current idea is that we will apply a statistical 'comb', basically a linear model, and that will untangle the data and show the straight path of Truth.

A linear model is basically a cause-effect view that

                                          Effect = a*Measure1 + b*Measure2 + ....

where a, b, and so on are numbers (mathematical coefficients) and the Measures are genotypes, environmental exposures, and the like.  Not all models are totally linear, but the differences are usually very slight and, more importantly, the models have little or no actual causal theory underlying them.
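To make the 'comb' a little more concrete, here is a minimal sketch, in Python, of fitting the two coefficients of the weighted-sum form above by ordinary least squares.  Everything in it is invented for illustration: the data are simulated, the function name `fit_two_measure_model` is hypothetical, the "true" coefficients 2.0 and -1.0 are arbitrary, and the model has no intercept so that it matches the simple form written above.

```python
import random


def fit_two_measure_model(measure1, measure2, outcome):
    """Least-squares estimates of a and b in outcome = a*measure1 + b*measure2.

    Solves the 2x2 normal equations directly (no intercept term).
    """
    s11 = sum(x * x for x in measure1)
    s22 = sum(x * x for x in measure2)
    s12 = sum(x * y for x, y in zip(measure1, measure2))
    s1y = sum(x * y for x, y in zip(measure1, outcome))
    s2y = sum(x * y for x, y in zip(measure2, outcome))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("measures are collinear; a and b cannot be separated")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Simulated data (an assumption, not real measurements).
rng = random.Random(2014)
n = 5000
measure1 = [rng.choice([0, 1, 2]) for _ in range(n)]  # e.g. copies of an allele
measure2 = [rng.uniform(0, 1) for _ in range(n)]      # e.g. an exposure score
outcome = [2.0 * m1 - 1.0 * m2 + rng.gauss(0, 1)      # additive effect plus noise
           for m1, m2 in zip(measure1, measure2)]

a_hat, b_hat = fit_two_measure_model(measure1, measure2, outcome)
print(f"estimated a = {a_hat:.2f}, estimated b = {b_hat:.2f}")
```

The fit recovers the coefficients here only because the simulated world really is additive.  The same procedure will return numbers for any data handed to it, whether or not an additive story is true.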

When a model has little or no causal theory behind it, if there is any connection between the putative causes we decide to measure (for whatever good or bad or no reason) and the outcome, a statistical association may result that can reach some significance cutoff, or its conceptual equivalent, if samples are large enough and properly chosen.  This is not false evidence, though it can be caused by confounding factors.  For example, drivers of BMWs are surely more likely to have a subset of diseases than the general population, or perhaps statistically less likely, just because they are wealthier and all that goes with that--but not because of their BMW.
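The BMW example can be sketched as a small simulation.  This is a toy under loudly stated assumptions: all of the probabilities are made up, wealth is reduced to a yes/no variable, and the disease risk is set to depend on wealth alone, so the car has no causal role by construction.  The function names are hypothetical.

```python
import random


def simulate_population(n, seed=0):
    """Return a list of (wealthy, drives_bmw, diseased) tuples."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        wealthy = rng.random() < 0.20
        # Wealth makes BMW ownership far more likely (invented numbers).
        drives_bmw = rng.random() < (0.30 if wealthy else 0.02)
        # Disease risk depends on wealth only; the car plays no causal role.
        diseased = rng.random() < (0.05 if wealthy else 0.10)
        people.append((wealthy, drives_bmw, diseased))
    return people


def disease_rate(people):
    return sum(diseased for _, _, diseased in people) / len(people)


def bmw_risk_difference(people):
    """Disease rate among BMW drivers minus the rate among everyone else."""
    drivers = [p for p in people if p[1]]
    others = [p for p in people if not p[1]]
    return disease_rate(drivers) - disease_rate(others)


people = simulate_population(400_000, seed=2014)

# Crude comparison: ignores wealth, so BMW drivers look protected.
crude = bmw_risk_difference(people)

# Stratified comparison: within each wealth group the difference shrinks
# toward zero, because wealth was the only real cause in this toy world.
within_wealthy = bmw_risk_difference([p for p in people if p[0]])
within_others = bmw_risk_difference([p for p in people if not p[0]])

print(f"crude difference:        {crude:+.3f}")
print(f"among the wealthy:       {within_wealthy:+.3f}")
print(f"among everyone else:     {within_others:+.3f}")
```

The crude association is real in the sense that it is in the data, and it would pass a significance test at this sample size.  It only dissolves because we happened to measure the confounder and knew to stratify on it, which is exactly what cannot be counted on when countless factors are in play.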

Heraclitus; Capitoline Museum in Rome; Wikimedia

In this sense, while the linear 'comb' may reveal associations, its straightening effect may distort, mislead, or divert attention.  Confounding is confoundedly difficult to untangle, especially if countless factors are interacting and these may change over time (so that today's Big Data become irrelevant tomorrow).

More interesting, to me at any rate, is that some tangles really are not linear.  There are a multitude of genomic effects that are not cis (that is, not due to the direct effects of nucleotides along the DNA strand of a given chromosome), but are trans (due to interactions between chromosomes).  We identified a set of paradoxes, conundrums, and other curiosities in a series of posts a couple of months ago (three posts, starting here).
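A deliberately extreme toy case shows how a tangle can be invisible to a one-factor-at-a-time comb.  It is only a sketch and not a model of any real trans effect: the rule below (the trait appears when exactly one of two factors is present) is invented, and the four combinations of factors are assumed to be equally common.

```python
from itertools import product


def trait(factor_a, factor_b):
    # Hypothetical rule: the trait appears only when exactly one factor is present.
    return int(factor_a != factor_b)


# All four combinations of two yes/no factors, assumed equally frequent.
combos = list(product([0, 1], repeat=2))


def marginal_effect(index):
    """Mean trait value with the factor present minus the mean with it absent."""
    with_factor = [trait(*c) for c in combos if c[index] == 1]
    without_factor = [trait(*c) for c in combos if c[index] == 0]
    return (sum(with_factor) / len(with_factor)
            - sum(without_factor) / len(without_factor))


effect_a = marginal_effect(0)
effect_b = marginal_effect(1)

print("marginal effect of factor A:", effect_a)
print("marginal effect of factor B:", effect_b)
for a, b in combos:
    print(f"A={a} B={b} -> trait={trait(a, b)}")
```

Each factor on its own shows a marginal effect of exactly zero, yet the pair together determines the trait completely.  An additive model fitted to such data would assign both coefficients a value of zero and report that nothing is there.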

The question here is whether, to satisfy Heraclitus' notion, a comb is even the right metaphor.  It may be that what a 'comb' of the right sort does is show that the tangle cannot just be made into a straight path.

The idea that Nature's path is a straight one is long-standing.  It goes back, essentially, to the classical geometers and others, through Newton's shortest-distance theorem.  The path of light rays.  Parallel lines never crossing.  Forces on objects viewed in isolation from all other forces or objects.  In fact, a lot of modern science has shown that this is not entirely correct.  Interactions are not always linear, and there are many phenomena that will impede the passage of a comb through the tangle.

In our view, coming to terms with this likely structure of Nature is the major challenge of our time.  Massive induction by recent 'omics (genomics, microbiomics, proteomics... you name it!) has shown us that this is true, and so hasn't been entirely wasted.  But science has momentum just as other cultural components do, and there is huge momentum today to continue the linear approach.  It's how we keep our jobs.  What we need--we think, at least--is the lucky insight by the 'miner' who sees something the rest of us haven't seen, and lights a light that guides us in a new direction.  This is our thought for the coming year.


Anonymous said...

The giants of philosophy/science of yore had no massive database to mine yet produced some of the most insightful perceptions and truths ... what are we missing?

Ken Weiss said...

In part, we're substituting endless digging for a moment's reflection. Or, we're digging the gold of our careers, where we find the vein. That is a profound truth about human society, perhaps. Details such as specific disease cause and cure are peripheral and ephemeral relative to our need to bring home the proverbial bacon.