Wednesday, April 2, 2014

Entropy and context-dependency: an epidemiological dilemma

Yesterday we discussed the problems facing genetic and environmental epidemiology (and, though we did not say it, many other fields of science as well).  The discussion had to do with complexity and the difficulty of isolating causal variables and showing their effects.  The issue is not just that our current big-data statistical approaches are not performing adequately at present, but whether we face inevitable limits to what we can know--now or ever!  As we said yesterday, one should never say "never" in science, but, as we also said, it is perhaps possible that there are some 'nevers' in epidemiology.  To see why, we appeal to an analogy from physics--one that may even apply literally to our field--which we hope will at least make clear the ideas we were trying to express.

An entropic universe
An important concept in physics is 'entropy'.  Entropy refers to the evenness of the distribution of energy or matter.  As formally defined on its Wikipedia page, entropy is "a measure of the number of specific ways in which a thermodynamic system may be arranged, often taken to be a measure of disorder, or a measure of progressing towards thermodynamic equilibrium."  As we understand what cosmologists say, at the time of the Big Bang everything in the universe was concentrated into a pinhead-sized volume, as shown in the figure.  Everything was highly orderly, and in a degenerate sense everything could be arranged in only one or a small number of ways.  But as the universe expanded, in a statistical sense 'everything' started to get splattered out.  There is an ever-increasing number of ways the same elements could be arranged in this ever-growing space.

[Figure: the expanding universe; arrows show the order of cosmic time]
Initially there were higher- and lower-density regions, and still today there are concentrations, like galaxies and planets (and us!).  But by now, 14 billion years later, matter and energy in the cosmos are, overall, distributed very uniformly in all directions, to 999 parts in a thousand.  Only 0.1% is non-uniform--and that is the clustered stuff we see, the stars and planets and so on here and there in space.  It is this evenness, and yet the existence of some relatively minuscule lumps of stuff like the solar system and us, that in part led to the concept of cosmic 'inflation', the claimed confirmation of which made all the news recently.  And these arrangements are coming into and going out of existence all the time.
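The 'number of ways things can be arranged' idea can be made concrete with a little counting.  Here is a minimal sketch (a toy 'stars and bars' combinatorial calculation of our own, not anything cosmologists actually compute this way) of how many distinct ways a fixed amount of stuff can be spread over a growing number of regions:

    from math import comb

    # Toy model: count the distinct ways to distribute 100 identical units of
    # matter/energy among k regions of space ('stars and bars' combinatorics).
    def arrangements(units, regions):
        return comb(units + regions - 1, regions - 1)

    print(arrangements(100, 1))   # 1: everything packed into a single region
    print(arrangements(100, 10))  # ~4.3 trillion ways once space has 'expanded'

The more room there is, the astronomically more arrangements there are, and almost all of them look evenly spread out; that is the statistical heart of entropy.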

Entropy is in a sense also used as the very definition of the directional arrow of time itself.  Things went from very highly concentrated, and in that sense organized, to very widely and evenly distributed, and in that sense disorganized.  The Second 'Law' of Thermodynamics formalizes this phenomenon.  Probabilistically, things could get re-organized, but the overwhelming probability is that they'll just spread out further and further as the universe expands.  In other words, grey paint could separate out into the black and white paint from which it was made, just by chance, but that probability is vanishingly small compared to the probability that mixed paint will stay ever more evenly mixed (except for very local, fleeting re-separations).  That overwhelming probability gives the cosmos its time direction: time goes from more to less organized.  Eventually, the universe will be essentially entropic: cold and dead, at least as we understand what cosmologists think.
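Just how lopsided those odds are is easy to illustrate.  As a back-of-the-envelope sketch (a one-dimensional toy of our own devising, not a real fluid model): if n black and n white 'paint particles' are arranged at random in a row, only 2 of the equally likely color patterns are fully separated:

    from math import comb

    # Probability that a random row of n black and n white particles is fully
    # separated (all black on one side, all white on the other).
    for n in (4, 16, 64):
        print(n, 2 / comb(2 * n, n))
    # n=4:  ~0.03   -- unmixing by chance is plausible for tiny systems
    # n=16: ~3e-9
    # n=64: ~8e-38  -- effectively never; real paint has ~10^23 particles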

Doing work, or concentrating matter or energy, requires negative entropy--uneven concentrations that can be leveraged to make the changes you want.  We concentrate an explosion in the cylinders of our car, using the potential energy in our fuel, to drive the pistons.  But the resulting heat is then dissipated through work and out the exhaust, and can do no more work.

There are lots of debates in physics about entropy and whether there can be any escape from its limits, and, naturally, there are some who claim, by invoking various kinds of arguments, that this might be possible.  But the strong consensus seems to be that this is not so, at least in the universe as we live in it.

But what (if anything) has this to do with genomics or epidemiology?

An 'arrow' of limits for epidemiological sciences, too?
Statistical methods such as those that we almost inevitably, or even necessarily, have to use in the genetic, epidemiological, evolutionary and social sciences are based on finding associations between measured variables.  The fundamental underlying assumption is replication: that cause and effect will be associated with each other in the samples we examine--that the presence of a cause (say, some genotype or dietary factor) will be observed in the presence of the effect (a disease, or a given level of blood pressure) more often than would occur by chance.  'Just by chance' means as if the exposures and outcomes were totally scrambled out there in the real world.

If a true cause is concentrated in a subset of a population, its effect will also be concentrated, and we can observe the difference in frequencies if we know how to look for it.  Our statistical models test whether an observed association could have happened just by chance; if what we see is 'unusual' according to some evidentiary cutoff criterion we choose to define, we assume we have detected a true causal link.  If the cause is there and is strong enough, it should be easy to find, at least in principle, if we design the right kind of sample, measurement, and analysis--and an appropriate statistical cutoff criterion.
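That notion of 'scrambled' can be made literal in code.  Below is a minimal sketch of a permutation test (a standard technique, though the data and names here are hypothetical): shuffle the exposure labels to simulate a world with no real cause-effect link, and ask how often chance alone produces an association as strong as the one observed:

    import random

    def permutation_test(exposed, outcome, n_perm=10_000, seed=1):
        """How often does a 'scrambled world' match the observed association?"""
        def assoc(exp, out):
            # difference in outcome frequency between exposed and unexposed
            cases_e = [o for x, o in zip(exp, out) if x]
            cases_u = [o for x, o in zip(exp, out) if not x]
            return abs(sum(cases_e) / len(cases_e) - sum(cases_u) / len(cases_u))

        observed = assoc(exposed, outcome)
        rng = random.Random(seed)
        shuffled = list(exposed)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)  # break any real exposure-outcome link
            if assoc(shuffled, outcome) >= observed:
                hits += 1
        return hits / n_perm  # a small value means 'unlikely by chance alone'

    # Hypothetical toy data: exposure (True/False) and outcome (1 = affected)
    exposed = [True] * 50 + [False] * 50
    outcome = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
    print(permutation_test(exposed, outcome))  # well below a 0.05-style cutoff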

But suppose some outcome, as we define it, is the result of many different causes.  Even if each cause is individually concentrated (e.g., not all people smoke, or eat McFood, or have a given genotype), if the causes are scrambled up among people to too great an extent, we might say, by analogy with cosmology, that the 'universe' of interest is essentially in an entropic state.  There simply isn't enough concentration of 'cause' to be a useful source of the 'work' of causation.  'Useful' here means that we can find the cause, predict its results, or do something about it.

Another common way to describe this, in terms of our computer age, is to say that what the system holds a lot of is not concentrated mass or energy but 'information'.  Entropy has had wide use in computer science, and various applications in evolution (e.g., measuring the evenness of variation at a gene in a population).  It has even been used, in false desperation, by creationists to argue that life violates the Second Law and hence must have a divine origin.  Here we're applying the concept somewhat differently, though still rooted in ideas about information, because what we want in our studies is information about cause and effect.  Information in this sense is a logical relationship among measurements that reflects a physical relationship among the things measured.
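The evolutionary usage mentioned above is simple to show.  As a minimal sketch (Shannon's information entropy, applied to hypothetical allele frequencies), a locus where one allele dominates carries little entropy, while an evenly varying locus is maximally 'entropic':

    import math

    def allele_entropy(freqs):
        """Shannon entropy (bits) of allele frequencies at one locus:
        a standard measure of the evenness of genetic variation."""
        return -sum(f * math.log2(f) for f in freqs if f > 0)

    print(allele_entropy([0.99, 0.01]))  # ~0.08 bits: nearly fixed, very uneven
    print(allele_entropy([0.5, 0.5]))    # 1.0 bit: maximally even for 2 alleles
    print(allele_entropy([0.25] * 4))    # 2.0 bits: four equally common alleles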

Epidemiologic entropy may or may not map cleanly onto the same concept in physics, but it may be a relevant and helpful heuristic for understanding the enigmatic nature of complex traits, in the face of the major assault being waged on them these days.

Could 'never' really mean never here?
It is dangerous if not foolhardy to declare that science can never achieve something, because the thing declared impossible often happens shortly after the declaration is made.  But we may be on a bit safer ground here because, like the limiting speed of light, the idea of cosmic entropy seems very well grounded in physics.  It is held to be universally and literally true, not just a guess or an idea shoehorned into the data.

By extension, if the causal elements related to some effect we see--or define as real, or as something we care about--are highly entropic, there simply may be no way for us, as outside observers, to concentrate causation in order to identify its organization.  As in the Second Law, there is a vast number of different ways the causal components could be arranged among people (those of us studying this are somewhat like epidemiological Maxwell's demons, for readers who know that thought experiment in physics).  We use statistical approaches to leverage an answer, identifying or sorting concentrations of causation (by regression analysis, for example) and quantifying their association with effects.

Here the concept of context-specificity is key: the effect of one measured factor depends on its context, that is, on the person's values for other relevant factors (measured or un-measured).  If every person has a unique mix of causal factors and exposures related to some outcome of interest, everyone will be unique relative to that outcome, and this causally 'entropic' state of affairs might, perhaps even in principle, defy statistical approaches that identify causation in the concentration-based way we currently employ.  If each person were unique in terms of relevant causal factors, our means of testing assertions about those factors would be relatively futile, and prediction might even be literally impossible.  Moreover, like stars and galaxies in space, both genotypes and environmental states are always changing, coming into and going out of existence.  A toy simulation of this context-dependency is sketched below.
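As a minimal sketch (an artificial model of our own devising, with made-up parameters, not real epidemiological data or anyone's published method), suppose each person's outcome depends on the joint combination of many factors, with each factor's effect flipping sign depending on a randomly assigned 'context' partner.  Every factor is then genuinely causal in every person, yet no single factor shows any marginal association for a study to detect:

    import random

    rng = random.Random(42)
    n_people, n_factors = 2000, 100

    # Each person carries a random mix of binary causal factors.
    people = [[rng.random() < 0.5 for _ in range(n_factors)]
              for _ in range(n_people)]

    # Each factor's effect depends on a randomly assigned 'context' partner:
    # it pushes toward disease only when its partner is in a matching state.
    partner = [rng.randrange(n_factors) for _ in range(n_factors)]

    def affected(person):
        score = sum((1 if person[j] else -1) * (1 if person[partner[j]] else -1)
                    for j in range(n_factors))
        return score > 0

    cases = [affected(p) for p in people]

    # Marginal association of factor 0 with the outcome: essentially none,
    # even though factor 0 helps determine every person's outcome.
    risk_with = (sum(c for p, c in zip(people, cases) if p[0])
                 / sum(p[0] for p in people))
    risk_without = (sum(c for p, c in zip(people, cases) if not p[0])
                    / sum(not p[0] for p in people))
    print(round(risk_with, 2), round(risk_without, 2))
    # prints two nearly identical risks; any gap is sampling noise

A concentration-based method looking at one factor at a time sees nothing here, even with every causal factor measured; the causation is real but scrambled across contexts.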

Science has a deep belief that this is not the state of affairs, and indeed, things are not entirely entropic even for vague traits like the ones whose causal elusiveness fills the press.  There do seem to be some causal 'galaxies'--subsets of identified causal concentration sufficient for detection.  But to a great extent, causation does seem to be entropic.

This, like all analogies, is an imperfect way to think of life.  Unlike the cosmos, it is not clear that the entropic state of our epidemiologic analogy is always increasing, or that it produces anything analogous to entropy's defining the direction of time itself.  So we have to be careful using the analogy. Still, it may help us think about what we are trying to understand.

If these ideas are useful, they may help us think of ways to approach the problem, or even to redefine our questions and objectives.  And even if we were to find that causal entropy really is what we face, we'd have to hope that some clever people could re-think those questions and objectives for us.
