
Friday, August 26, 2016

Is life itself a simulation of life?

It often happens in science that our theory of some area of reality is very precise, but the reality is too complex to work out exactly, or 'analytically'.  This is when we may turn to computer simulation of that reality, to get at least a close approximation to the truth.  When a phenomenon is determined by a precise process, then as we increase the detail of our simulation--and if the simulation really is simulating the underlying reality--the more computer power we apply, the closer we get to the truth; that is, our results approach that truth asymptotically.

For example, if you want to predict the rotation of galaxies in space relative to each other, and of the stars within the galaxies, the theories of physics will do the job, in principle.  But solving the equations directly, the way one does in algebra or calculus, is not possible with so many variables.  However, you can use a computer to simulate the motion and get a very good approximation (we've discussed this here, among other places).  Thus, at each time interval, you take the position and motion of each object you want to follow, along with those of nearby objects, and use Newton's law of gravity to predict each object's position one time interval later.
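To make that concrete, here is a minimal sketch of the idea in Python (purely illustrative, not any actual astronomy code): at each small time step, compute the gravitational pull on each object from every other object, then nudge each object's velocity and position accordingly.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def step(pos, vel, mass, dt):
    """Advance all bodies one time interval under Newtonian gravity."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i != j:
                r = pos[j] - pos[i]               # vector from body i to body j
                d = np.linalg.norm(r)
                acc[i] += G * mass[j] * r / d**3  # magnitude G*m_j/d^2, directed along r
    vel = vel + acc * dt  # gravity changes each body's motion...
    pos = pos + vel * dt  # ...and motion changes each body's position
    return pos, vel

# Toy case: Earth orbiting the Sun, one hour per step, for a year.
mass = np.array([1.989e30, 5.972e24])          # kg
pos = np.array([[0.0, 0.0], [1.496e11, 0.0]])  # m
vel = np.array([[0.0, 0.0], [0.0, 2.978e4]])   # m/s
for _ in range(24 * 365):
    pos, vel = step(pos, vel, mass, dt=3600.0)
```

Shrinking the time step (and, in real codes, using cleverer integrators) brings the simulated orbits asymptotically closer to what the equations imply.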

If the motion you simulate doesn't match what you can observe, you suspect you've got something wrong with the theory you are using. In the case of cosmology, one such factor is known as 'dark matter'.  That can be built into models of galactic motion, to get better predictions.  In this way, simulation can tell you something you didn't already know, and because the equations can't be directly solved, simulation is an approach of choice.

In many situations, even if you think that the underlying causal process is deterministic, measurements are imperfect, and you may need to add a random 'noise' factor to each iteration of your simulation.  Each simulation will be slightly 'off' because of this, but you run the same simulation thousands of times, so the effect of the noise evens out, and the average result represents what you are trying to model.
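A toy sketch of that idea (made-up dynamics and noise level, just to show the averaging): each run is perturbed, but the mean over many runs recovers the deterministic answer.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_run(n_steps, noise_sd):
    """A deterministic growth process, with random 'noise' added each iteration."""
    x = 1.0
    for _ in range(n_steps):
        x = 1.01 * x + rng.normal(0.0, noise_sd)  # true dynamics plus noise
    return x

runs = [noisy_run(100, noise_sd=0.05) for _ in range(10_000)]
print(np.mean(runs))  # close to the noise-free result, 1.01**100, about 2.70
```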

Is life a simulation of life?
Just like other processes that we attempt to simulate, life is a complex reality.  We try to explain it with the very general theory of evolution, and we use genetics to try to explain how complex traits evolve, but there are far too many variables to predict future directions and the like analytically.  This is more than just a matter of biological complexity, however: the fundamental processes of life seem, as far as we can tell, to be inherently probabilistic (not just imperfectly measured).  This adds an additional twist that makes life itself seem to be a simulation of its underlying processes.

Life evolves by parents transmitting genes to offspring.  For those genes to be transmitted to the next generation, the offspring have to live long enough, must be able to acquire mates, and must be able to reproduce.  Genes vary because mutations arise.  For simplicity's sake, let's say that successful mating requires not falling victim to natural selection before offspring are produced, that this depends on an organism's traits, and that genes are causally responsible for those traits.  In reality, there are other processes to be considered, but these will illustrate our point.

Mutation and surviving natural selection seem to be probabilistic processes.  If we want to simulate life, we have to specify the probability of a mutation along some simulated genome, and the probability that a bearer of the mutation survives and reproduces.  Populations contain thousands of individuals, genomes incur thousands of mutations each generation, and reproductive success varies among those same individuals.  This is far too hard to write tractable equations for in most interesting situations, unless we make almost uselessly simplifying assumptions.  So we simulate these phenomena.

How, basically, do we do this?  Here, generically and simplified, but illustrating the issues, is the typical way (and the way taken by my own elaborate simulation program, called ForSim, which is freely available):

For each individual in a simulated population, each generation, we draw a random number based on an assumed mutation rate, and add the resulting number and location of mutations to the genotype of the individual.  Then for each resulting simulated genotype, we draw a random number from the probability that such a genotype reproduces, and either remove or keep the individual depending on the result.  We keep doing this for thousands of generations, and see what happens.  As an example, the box lists some of the parameter values one specifies for a program like ForSim.
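Here is a stripped-down sketch of that recipe (in Python rather than ForSim's own code, with invented parameter values, and with each individual reduced to a bare count of mutations carried):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative values of the kind such a program asks for; not ForSim's defaults.
POP_SIZE   = 1000    # individuals per generation
GENOME_LEN = 10_000  # simulated sites
MU         = 1e-5    # per-site, per-generation mutation probability
SEL_COST   = 0.001   # survival cost per mutation carried

pop = np.zeros(POP_SIZE, dtype=int)  # mutations carried by each individual

for generation in range(1000):
    # Mutation: each individual draws new mutations at the assumed rate.
    pop = pop + rng.binomial(GENOME_LEN, MU, size=POP_SIZE)
    # Survival: the probability of reproducing falls with mutation count.
    survives = rng.random(POP_SIZE) < (1.0 - SEL_COST) ** pop
    # Reproduction: the next generation is drawn from the survivors.
    pop = rng.choice(pop[survives], size=POP_SIZE)

print("mean mutations per individual:", pop.mean())
```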



Sometimes, if the simulation is accurate enough, the probability and other values we assume look like what ecologists or geneticists believe is going on in their field site or laboratory.  In the case of humans, however, we have little such data, so we make a guess at what we think might have been the case during our evolution.  Often these things are empirically estimated one at a time, even though their real values affect each other in many ways.  This is, of course, very far from the situation in physics, described above!  Still, we at least have a computer-based way to approximate our idea of evolutionary and genetic processes.

We run this for many, usually many thousands of, generations, and see the trait and genomic causal pattern that results (we've blogged about some of these issues here, among other posts).  This is a simulation, since it seems to follow the principles we think are responsible for evolution and genetic function.  However, there is a major difference.

Unlike simulations in astronomy, life really does seem to involve random draws for probabilistic processes.  In that sense, life looks like it is, itself, a simulation of these processes.  The random draws it makes are not just practical stand-ins for some underlying phenomenon, but manifestations of the actual probabilistic nature of the phenomenon.

This is important, because when we simulate a process, we know that its probabilistic component can lead to different results each time through.  And yet, life itself is a one-time run of those processes.  In that sense, life is a simulation, but we can only guess at the underlying causal values (like mutation and survival rates) from the single set of data: what actually happened in its one time through.  Of course, we can test various examples, like looking at mutation rates in bacteria or in some samples of people, but these involve many problems and are at best general estimates from samples, often artificial or simplified samples.

But wait!  Is life a simulation after all?  If not, what is life?
I don't want us to be bogged down in pure semantics here, but I think the answer is that in a very profound way, life is not a simulation in the sense we're discussing.  For the relevant variables, life is not based on an underlying theoretical process in the usual sense, whose parameters we approximate with random draws in simulations.

For example, we evaluate biological data in terms of 'the' mutation rate in genomes from parent to offspring.  But in fact, we know there is no such thing as 'the' mutation rate, one that applies to each nucleotide as it is replicated from one generation to the next, and from which each actual mutation is a random draw.  The observed rate of mutation at a given location in a given sample of a given species' genomes depends, among other things, on the sex, the particular nucleotides surrounding the site in question (and hence all sites along the DNA string), the nature of the mutation-detection proteins coded by that individual's genome, and mutagen levels in the environment.  In our theory, and in our simulations, we assume an average rate, and assume that the variation from that average will, so to speak, 'average out' in our simulations.

But I think that is fundamentally wrong.  In life, every condition today is a branch-point for the future.  The functional implications of a mutation here and now depend on the local circumstances, and that is built into the production of the future local generations.  Averaging over the genome and over individuals does not in fact capture what life does; in a sense it is the opposite.  Each event has its own local dynamics and contingencies, and the effect of those conditions alters the rates of events in the future.  Everywhere it's different, and we have no theory about how different, especially over evolutionary time.

Indeed, one might say that the most fundamental single characteristic of life is that the variation generated here today is screened here today and not anyplace else or any time else.  In that sense, each mutation is not drawn from the same distribution.  The underlying causal properties vary everywhere and all the time.  Sometimes the difference may be slight, but we can't count on that being true and, importantly, we have no way of knowing when and to what extent it's true.

The same applies to foxes and rabbits.  Every time a fox chases a rabbit, the conditions (including the genotypes of the fox and rabbit) differ.  The chance aspect of whether the rabbit is caught or not is not the same each time; the success 'rate' is not drawn from a single, fixed distribution.  In reality, each chase is unique.

After the fact, we can look back at net results, and it's all too tempting to think of what we see as a steady, deterministic process with a bit of random noise thrown in.  But that's not an accurate way to think, because we don't know how inaccurate it is, when each event is to some (un-prespecified) extent unique.  Overall, life is not, in fact, drawing from an underlying distribution.  It is ad hoc by its very nature and that's what makes life different from other physical phenomena.

Life, and we who partake of it, are unique. The fact of local, contingent uniqueness is an important reason that the study of life eludes much of what makes modern physical science work.  The latter's methods and concepts assume replicable law-like underlying regularity. That's the kind of thing we attempt to model, or simulate, by treating phenomena like mutation as if they are draws from some basic underlying causal distribution. But life's underlying regularity is its irregularity.

This means that one of the best ways we have of dealing with complex phenomena of life, simulating them by computer, smoothes over the very underlying process that we want to understand.  In that sense, strangely, life appears to be a simulation but is even more elusive than that.  To a great extent, except by some very broad generalities that are often too broad to be very useful, life isn't the way we simulate it, and doesn't even simulate itself in that way.

What would be a better approach to understanding life?  The next generation will have to discover that.

Monday, November 30, 2015

Quantum spookiness is nothing compared to biology's mysteries!

The news is properly filled these days with reports of studies documenting various very mysterious aspects of the cosmos, on scales large and small.  News media feed on stories of outer space's inner secrets.  We have dark matter and dark energy that, if models of gravitational effects and other phenomena are correct, comprise the majority of the cosmos's contents.  We have relativity, which shows that space and even time itself are curved.  We have ideas that there may be infinitely many universes (there are various versions of this, some called the multiverse).  We have quantum uncertainty, by which a particle or wave or whatever can be everywhere at once and have multiple superposed states that are characterized in part only when we observe it.  We have space itself inflating (maybe faster than the speed of light).  And then there's entanglement, by which there seem to be instant correlated actions at unlimited distances.  And there is some idea that everything is just a manifestation of many-dimensional vibrations ('strings').

The general explanation is that these things make no 'sense' in terms of normal human experience, using just our built-in sensory systems (eyes, ears, touch, smell, etc.), but that mathematically the observable data fit the above sorts of explanations to a huge degree of accuracy.  You cannot understand these phenomena in any natural way, but only by accustoming yourself to accept the mathematical results, the read-outs of instrumentation, and their interpretation.  Even the most thoughtful physicists routinely tell us this.

These kinds of ideas rightfully make the news, and biologists (perhaps not wanting to be left out, especially those in human-related areas) are thus led to concocting other-worldly ideas of their own, making promises of miraculous precision and more or less immortal health, based on genes and the like.  There is a difference, however: unlike physicists, biologists reduce things to concepts like individual genes and their enumerable effects, treating them as basically simple, primary, and independent causes.

In physics, if we could enumerate the properties of all the molecules in an object, like a baseball, comet, or a specified set of such objects, we (physicists, that is!) could write formal equations to describe their interactions with great precision.  Some of the factors might be probabilistic if we wanted to go beyond gravity and momentum and so on, to describe quantum-scale properties, but everything would follow the same set of rules for contributing to every interaction.  Physics is to a great, and perhaps ultimate extent, about replicable complexity.  A region of space or an object may be made of countless individual bits, but each bit is the same (in terms of things like gravity per unit mass and so on).  Each pair, say, of interactions of similar particles etc. follows the same rules. Every electron is alike as far as is known.  That is why physics can be expressed confidently as a manifestation of laws of nature, laws that seem to hold true everywhere in our detectable cosmos.

Of cats and Schroedinger's cat
Biology is very different.  We're clearly made of molecules and use energy just as inanimate objects do, and the laws of chemistry and physics apply 100% of the time at the molecular level.  But the nature of life is essentially the product of non-replicable complexity, of uniquely interacting components.  Life is composed strictly of identifiable elements and forces etc. at the molecular level.  Yet the essence of life is descent with modification from a common origin, Darwin's key phrase, and this is all about differences.  Differences are essential when it comes to the adaptation of organisms, whether by natural selection, genetic drift, or whatever, because adaptation means change.  Without life's constituent units being different, there would be no evolution beyond purely mechanical changes like the formation of crystals.  Even if life is, in a sense, the assembling of molecular structures, it is the difference in their makeups that makes us different from crystals.

Evolution and its genetic basis are often described in assertively simple terms, as if we understood them in a profound ultimate sense.  But that is a great exaggeration: the fact that some simple molecules interacted 4 billion years ago, in ways that captured energy and enabled the accretion of molecular complexity to generate today's magnificent biosphere, is every bit as mysterious, in the subjective sense of the term at least, as anything quantum mechanics or relativity can throw at us. Indeed, the essential nature of life itself is equally as non-intuitive. And that's just a start.

The evolution of complex organisms, like cats--built through developmental interactions of awe-inspiring complexity, leading to units made up of associated organ systems that communicate internally in molecular ways (physiology) and externally in basically different (sensory) ways--is as easy to sum up as "it's genetic!", but is again as mysterious as quantum entanglement.  Organisms are the self-assembly of an assemblage of traits with interlocking function, which can be achieved in countless ways (because the genomes and environments of every individual are at least slightly different).  An important difference is that quantum entanglement may simply happen, but we--evolved bags of molecular reactions--can discover that it happens!

The poor cat in the box.  Source: "Schrödinger cat" by File:Kamee01.jpg: Martin Bahmann, Wikimedia Commons

This self-assembly is wondrous, even more so than the dual existence of Schroedinger's famous cat in a box.  That cat is alive and dead at the same time depending on whether a probabilistic event has happened inside the box (see this interesting discussion)--until you open the box, in which case the cat is alive or dead.  This humorous illustration of quantum superposition garnered a lot of attention, though not so much from Schroedinger himself, for whom it was just a whimsical way to make the point about quantum strangeness.

But nobody seems to give a thought beyond sympathy for the poor cat!  That's too bad, because what's really amazing is the cat itself.  That feline construct makes most of physics pale by comparison.  A cat is not just a thing, but a massively well-organized entity, a phenomenon of interactions, thanks to the incredible dance of embryonic development.  Yet even development, and the lives that plants and animals (and, indeed, single-celled organisms) live, impressively elaborate as they are, pale by comparison with the awareness, self-awareness, and consciousness that these organisms variously have.

This is worth thinking about (so to speak) when inundated by the fully justified media blitz that weird physics evokes.  But then you should ask whether anything in the incomprehensibly grand worlds of physics and cosmology is even close to the elusiveness and amazing reality of these properties of life: how these properties could possibly come about, how they evolved, and how they develop in each individual--as particular traits, not just the result of some generic evolutionary process.

And there's even more:  If flies or cats are not 'conscious' in the way that we are, then it is perhaps as amazing that their behavior, which so seems to have aspects of those traits, could be achieved without conscious awareness.  But if that be so, then the mystery of the nature of consciousness having evolved, and the nature of its nature, are only augmented many-fold, and even farther from our intuition than quantum entanglement.

Caveat emptor
Of course, we may have evolved to perceive the world just the way the world really is (extending our native senses with sensitive instruments to do so).  Maybe what seems strange or weird is just our own misunderstanding or willingness to jump on strangeness bandwagons.  Here from Aeon Magazine is a recent and thoughtful expression of reservations about such concepts as dark matter and energy.

If quantum entanglement and superposition, or relativity's time dilation and length contraction, are inscrutable, and stump our intuition, then surely consciousness trumps those stumps.  Will anyone reading this blog live to see even a comparable level of understanding in biology to what we have in physics?

Tuesday, October 20, 2015

Unknowns, yes, but are there unknowables in biology?

The old Rumsfeld jokes about the knowns and unknowns are pretty stale by now, so we won't really indulge in beating that dead horse.  But in fact his statement made a lot of sense.  There are things we think we know (like our age), things we think we don't know but might know (like whether there will be a new message in our inbox when we sign onto email), and things we don't know but don't know we don't know (such as how many undiscovered marine species there are). Rumsfeld is the subject of ridicule not for this pronouncement per se (at least to those who think about it), because it is actually reasonable, but for other things that he is said to have done or said (or failed to say) in regard to American politics.

Explaining what we don't know is a problem!  Source: Google images

The unknowns may be problems, but they are not Big problems.  What we don't know but might know are at least within the realm of learning.  We may eventually stumble across facts we don't know but don't yet even know are there.  The job of science is to learn what we know we don't know and even to discover what we don't yet know that we don't know.  We think there is nothing 'inside' an electron or photon, but there may be if we some day realize that possibility.  Then the guts of a photon will become a known unknown.

However, there's another, even more problematic--one may say truly problematic--kind of mystery: things that are actually unknowable.  They present a Really Big problem.  For example, based on the current understanding of cosmology, there are parts of the universe that are so far away that energy (light etc.) from them simply has not, and can never, reach us.  We know that the details of this part of space are literally unknowable, but because we have reasonably rigorous physical theory, we think we can at least reliably extrapolate from what we can see to the general contents (density of matter and galaxies etc.) of what we know must exist but cannot see.  That is, it's literally unknowable but theoretically known.

However, things like whether life exists out there are in principle unknowable.  But at least we know very specifically why that is so.  In the future, most of what we can see in the sky today is, according to current cosmological theories, going to become invisible as the universe expands, so that the light from these visible but distant parts will no longer be able to reach us.  Any living descendants of ours will know from today's records what was there to see and its dynamics, and will at least be able to make reasonable extrapolations of what it's like out there even though it can no longer be seen.

There are also 'multiverse' theories of various sorts (a book discussing these ideas is Our Mathematical Universe, by Max Tegmark).  At present, the various sorts of parallel universes are simply inaccessible, even in principle, so we can't really know anything about them (or, perhaps, even whether they exist).  Not only is electromagnetic radiation from them unable to reach us, so that we can't observe, even indirectly, what was going on when that light was emitted, but our universe is self-contained relative to these other universes (if they exist).

Again, all of this is because of the kind of rigorous theory that we have, and the belief that if that theory is wrong, there is at least a correct theory to be discovered--Nature does work by fixed 'laws', and while our current understanding may have flaws, the regularities we are finding are not imaginary, even if they are approximations to something deeper (but comparably regular).  In that sense, the theory we have tells us quite a lot about what seems likely to be the case even if unobserved.  It was on such a basis that the Higgs boson was discovered (assuming the inferences from the LHC experiments are correct).

What about biology?
Biology has been rather incredibly successful in the last century and more.  The discoveries of evolution and genetics are as great as those in any other science.  But there remain plenty of unknowns about biological evolution and its genomic basis that are far deeper than questions about undiscovered species.  We know that these things are unknown, but we presume they are knowable and will be understood some day.

One example is the way that homologous chromosomes (one inherited from each of a person's parents) line up with each other in the first stage of meiosis (the formation of sperm and egg cells).  How do they find each other?  We know they do line up when sex cells are produced, and there are some hypotheses and bits of relevant information about the process, but we're aware of the fact that we don't yet really know how it works.

Homologous chromosomes pair up...somehow.  Wikimedia, public domain.

Chromosomes also are arranged in a very different 3-dimensional way during the normal life of every cell.  They form a spaghetti-like ball in the nucleus, with different parts of our 23 pairs of chromosomes very near to each other.  This 'chromosome conformation', the specific spaghetti ball, shown schematically in the figure, varies among cell types, and even within a cell as it does different things.  The reason seems to be at least in part that the juxtaposed bits of chromosomes contain DNA that is being transcribed (such as into messenger RNA to be translated into protein) in that particular cell under its particular circumstances.
Chromosomes arrange themselves systematically in the nucleus.  Source: image by Cutkosky, Tarazi, and Lieberman-Aiden from Manoharan, BioTechniques, 2011
It is easy to discuss what we don't know in evolution and genetics and we do that a lot here on MT. Often we critique current practice for claiming to know far more than is actually known, or, equally seriously, making promises to the supporting public that suggest we know things that in truth (and in private) we know very well that we don't know.  In fact, we even know why some things that we promise are either unknown or known not to be correct (for example, causation of biological and behavioral traits is far more complex than is widely claimed).

There are pragmatic reasons why our current system of science does this, which we and many others have often discussed, but here we want to ask a different sort of question:  Are there things in biology that are unknowable, even in principle, and if so how do we know that?  The answer at least in part is 'yes', though that fact is routinely conveniently ignored.

Biological causation involves genetic and environmental factors.  That is clearly known, in part because DNA is largely an inert molecule, so any given bit of DNA 'does' something only in a particular context in the cell, and in relation to whatever external factors affect the cell.  But we know that future environmental exposures are unknown, and we know that they are unknowable.  What we will eat or do cannot be predicted even in principle, and indeed will be affected by what science learns but hasn't yet learned (if we find that some dietary factor is harmful, we will stop eating it and eat something else).  There is no way to predict such knowledge or the response to it.

What else may there be of this sort?
A human has hundreds of billions of cells, a number which changes and varies among and within each of us.  Each cell has a slightly different genotype and is exposed to slightly different aspects of the physical environment as well.  One thing we know that we cannot now know is the genotype and environment of every cell at every time.  We can make some statistical approximations, based on guessing about the countless unknowns of these details, but the number of variables will exceed the number of stars in the universe, and even in theory cannot be known with knowable precision.

Unlike in much of physics, the use of statistical analytic techniques here is inapt, also to an unknowable degree.  We know that not all cells are identical observational units, for example, so the aggregate statistics that are used for decision-making (e.g., significance tests) are simply guesses or gross assumptions whose accuracy is unknowable.  This is so in principle, because each cell, each individual, is always changing.  We might call these 'numerical unknowables', because they are a matter of practicality rather than of theoretical limits on the phenomena themselves.

So are there theoretical aspects of biology that in some way we know are unknowable, and not just unknown?  We have no reason, based on current biological theory, to suspect truly unknowable things analogous to cosmology's parallel universes.  One can speculate about all sorts of things, such as parallel yous, and we can make up stories about how quantum uncertainty may affect us.  But these are far from having the kind of cogency found in current physics.

Our lack of theory comparably rigorous to what physics and chemistry enjoy leaves open the possibility that life has its own knowably unknowables.  If so, we would like at least to know what those limits may be, because much of biology relates to practical prediction (e.g., causes of disease).  The state of knowledge in biology, no matter how advanced it has become, is still far from adequate to address what may eventually be knowable, or what the limits to knowability are.  In a sense, unlike physics and cosmology, in biology we have no theory that tells us what we cannot know.

And unlike physics and cosmology, where some of these sorts of issues really are philosophical rather than of any practical relevance to daily life, we in biology have very strong reasons to want to know what we can know, and what we can promise....but perhaps, also unlike physics, because people expect benefits from biological research, we have strong incentives not to acknowledge limits to our knowledge.

Thursday, May 28, 2015

Discovering the "laws of life": where are we?

For several years, I've been in an off-and-on discussion about topics relevant to MT readers, with a thoughtful undergraduate student, named Arjun Plakett.  He has now graduated, but we're still in at least occasional contact.  He is a chemical engineering graduate, but is interested in the wider issues of science.

I mention this because Arjun recently brought to my attention an old, but thought-provoking and, I think, relevant book.  It is by the late Milton Rothman, and called Discovering the Natural Laws (1972, reprinted by Dover, 1989).  This book is a very interesting one, and I wish I had known of it long ago. It concerns the history by which some of the basic 'laws' of physics were shown to be just that: laws. And, while its specific topic is physics, it is very relevant to what is going on in genetics at present.

Source: my copy's cover

Rothman deals with many presumed laws of Nature, including Newton's laws of motion, conservation, relativity, and electromagnetism.  For example, we all know that Newton's universal law of gravitation says that the gravitational attraction between two objects, of masses M and N, a distance r apart, is

F = G MN/r^2

(the denominator is r-squared).  G is the universal gravitational constant.  But how do we know this?  And how do we know it is a 'universal law'?  For example, as Rothman discusses, why do we say the effect is due to the exact square (power 2) of the distance between the objects?  Why not, for example, the 2.00005 power?  And what makes G a, much less the, constant? Why constant? Why does the same relationship apply regardless of the masses of the objects, or what they're made of? And what makes it universal--how could we possibly test that?
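One way to see what testing such a law even involves: estimate the exponent from measurements, and ask how close to 2 it comes.  A hypothetical sketch, with simulated 'measurements' standing in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated measurements of force at various distances: true exponent 2,
# plus a few percent of measurement error.
r = np.linspace(1.0, 10.0, 50)
F = (1.0 / r**2) * (1.0 + rng.normal(0.0, 0.02, size=r.size))

# Fit log F = c - p*log r; the slope estimates the exponent p.
slope, intercept = np.polyfit(np.log(r), np.log(F), 1)
print("estimated exponent:", -slope)  # near 2, never exactly 2
```

No finite set of imperfect measurements can pin the exponent to exactly 2; what the history Rothman recounts shows is the estimate closing in on 2 as measurement and experimental design improve.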

The fact that these laws are laws, as Rothman details for this and other laws of physics, was not easy to prove.  For example, what two objects would be used to show the gravitational law, where measurement errors and so on would not confound the result?  The Earth and Moon won't do, despite Newton's use of them, because they are affected by the pull of the Sun and other planets etc.  Those effects may be relatively small, but they do confound attempts to prove that the law is in fact a true, exact law.

A key to this demonstration of truth is that, as various factors are accounted for, and measurement becomes more accurate, and different approaches triangulate, the data become asymptotically close to the predictions of the theory: predictions based on theory get ever closer to what is observed.

In fact, and very satisfyingly, to a great approximation these various principles do appear to be universal (in our universe, at least) and without exception.  Only when we get to the level of resolution of individual fundamental particles, and quantum effects, do these principles break down or seem to vary.  But even then the data approach a specifiable kind of result, and the belief is that this is a problem of our incomplete understanding, not a sign of the fickleness of Nature!

Actually, if a value, like G or the power 2, were not universal but instead were context-specific, yet replicable in any given context, we could probably show that, and characterize those situations in which a given value of the parameter held, or we could define with increasing accuracy some functional relationship between the parameter's value and its circumstances.

But Mermaid's Tale is about evolution and genetics, so what is the relevance of these issues?  Of course, we're made of molecules and have to follow the principles--yes, the laws--of physics.  But at the scale of genes or organisms or evolution or disease prediction, what is its relevance?  The answer is epistemological, that is, about the nature of knowledge.

Theory and inference in genetics and evolution: where are the 'laws'?  Are there laws?
When you have a formal, analytic, or even statistical theory, you start with that, a priori one might say, and apply it to data either to test the theory itself, or to estimate some parameter (such as the force F in some setting, say a planet's orbital motion).  As I think it can be put, you use or test an externally derived theory.

Rothman quotes Karl Popper's view that "observation is always observation in the light of theories" which we test by experiments in what Rothman calls a 'guess and test' method, otherwise known as 'the scientific method'.  This is a hypothesis-driven science world-view.  It's what we were all taught in school.  Both the method and its inferential application depend on the assumption of law-like replicability.

Popper is best known for his idea that replication of observations never proves a hypothesis, because the next observation might falsify it; it only takes one negative observation to do that.  There may be some unusual circumstance that is valid but that you hadn't yet encountered.  In general, but perhaps particularly in the biomedical sciences, it is faddish and often self-serving to cite 'falsifiability' as our noble criterion, as if one had a deep knowledge of epistemology.  Rothman, like almost everyone, quotes Popper favorably.  But in fact, his idea doesn't work.  Not even in physics!  Why?

Any hypothesis can be 'falsified' if the experiment encounters design or measurement error or bad luck (in the case of statistical decision-making and sampling).  You don't necessarily know that's what happened, only that you didn't get your expected answer.  But what about falsifiability in biology and genetics?  We'll see below that even by our own best theory it's a very poor criterion in ways fundamentally beyond issues of lab mistakes or bad sampling luck.

We have only the most general theories of the physics-like sort for phenomena in biology.  Natural selection and genetic drift, genes' protein-coding mechanisms, and so on are in a sense easy and straightforward examples.  But our over-arching theory, that of evolution, says that situations are always different.  Evolution is about descent with modification, to use Darwin's phrase, and this generates individuals that are non-replicates of each other.  If you, very properly, include chance in the form of genetic drift, we don't really have a formal theory to test against data, because chance enters through sampling, measurement, and the very processes themselves.  Instead, in the absence of a precisely predictive formal theory, we use internal comparisons--cases vs controls, traits in individuals with, and without, a given genotype, and so on.  What falsifiability criterion should you apply and, more importantly, what justifies your choice of that criterion?

Statistical comparison may be an understandable way to do business under current circumstances, which have not provided a precise enough theory, but it means that our judgments are highly dependent on subjective inferential criteria like p-value significance cutoffs (we've blogged about this in the past).  They may suggest that 'something is going on' rather than 'nothing is going on', but even then only by the chosen cutoff standard.  Unlike the basic laws of physics, they do not generate asymptotically closer and closer fits to formally constructed expectations.

What is causal 'risk'?
We may be interested in estimating 'the' effect of a particular genetic variant on some trait.  We collect samples of carriers of one or two copies of the variant, and determine the risk in them, perhaps comparing this to the risk in samples of individuals without it.  But this is not 'the' risk of the variant!  It is, at best, an estimate of the fraction of outcomes in our particular data.  By contrast, a probability in the prediction sense is the result of a causal process with specific properties, not an empirical observation.  We usually don't have the evidence to assert that our observed fraction is an estimate of that underlying causal process.

Collecting larger samples will not, in the appropriate sense, lead asymptotically to a specific risk value.  That's because we know enough about genes to know that an effect depends on the individual's genomic background and life-experience.  Each individual has his/her own, unique 'risk'.  In fact, the term 'risk' itself, applied to an individual, is rather meaningless, because it is simply inappropriate to use such individual concepts when the value is based on a group sample.  Or, put another way, the ideas about variance in estimates and so on just don't apply, because individuals are not replicable observations.  The variance of a group estimate is different from what we might want to know about individuals.
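A toy illustration of the point, with entirely made-up numbers: give every carrier a different underlying risk, and the sample estimate converges nicely--but to a group average that need not describe any particular individual.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical: each carrier has his/her own underlying risk, drawn here
# from a wide distribution (mean 0.2) purely for illustration.
risks = rng.beta(2, 8, size=1_000_000)

outcomes = rng.random(risks.size) < risks
print("sample estimate of 'the' risk:", outcomes.mean())  # converges to ~0.2
print("carriers whose own risk is within 0.05 of that:",
      (np.abs(risks - 0.2) < 0.05).mean())  # only a minority
```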

These are the realities of the data, not a fault of the investigators.  If anything, it is a great triumph, one we owe to Darwin (whatever his overstatement of selection as a 'law' of nature), that we know these realities!  But it means, if we pay attention to what a century of fine life science has clearly shown, that fundamental context-dependence deprives us of the sort of replicability found in the physical sciences.  That undermines many of the unstated assumptions in our statistical methods (the existence of actual probability values, proper distributions from which we're sampling, replicability, etc.).

This provides a generic explanation for why we have the kinds of results we have, and why the methods being used have the problems they have.  In a sense, our methods, including statistical inference, are borrowed from the physical sciences, and may be inappropriate for many of the questions we're asking in evolution and contemporary genetics, because our science has shown that life doesn't obey the same 'laws' at the levels of our observation.

Darwin and Mendel had Newton envy, perhaps, but gave us our own field--if we pay attention
Rothman's book provides, to me, a clear way to see these issues.  It may feed our physics envy, a widespread phenomenon in biology.  I think Darwin and Mendel were both influenced, indirectly, by Newton and the idea of rigorous 'laws' of nature.  But they didn't heed what Newton himself said in his Principia in 1687: "Those qualities of bodies that . . . belong to all bodies on which experiments can be made should be taken as qualities of all bodies universally." (Book 3, Rule 3, from the Cohen and Whitman translation, 1999)  That is, for laws of nature, if you understand the behavior of what you can study, those same laws must apply universally, even to things you cannot study or have not yet studied.

This is a cogent statement of the concept of laws, which for Newton was "the foundation of all natural philosophy."  It is what we mean by 'laws' in this context, and why better designed studies asymptotically approach the theoretical predictions.  But this principle is one that biomedical and evolutionary scientists, and their students, should have tattooed somewhere prominent, so they don't forget it.  The reason is that this principle of physics is not what we find in biology, and we should be forced to keep that in mind and take seriously the differences between the sciences, and the problems we face in biology.  We do not have the same sort of replicability or universality in life--unless non-replicability is our universal law.

If we cannot use this knowledge to ferret out comparable laws of biology, then what was the purpose of 150 years of post-Darwinian science?  It was to gain understanding, not to increase denial.  At least we should recognize the situation we face, which includes an end to making misleading or false promises about our predictive, retrodictive, or curative powers--promises that essentially assume as laws what our own work has clearly shown are not laws.

Thursday, March 19, 2015

My complexity is more complex than your complexity!

Scientists often talk about how complex their field is, and of course they are often right.  But a word like 'complexity' may be used to confer a sense of importance and gravitas on the subject, and often the description--even if true--seems used in an advertising sort of way.  After all, who wants to be working in an area that's 'simple'?  If it's simple, why haven't we solved its problems, unless we're simpletons!

Describing our field as 'complex' usually doesn't just mean there are things in our field that we can't measure or don't know about.  That's always true in any science.  Instead, the term usually means that our phenomena of interest involve a host of causal factors that make the relationship between those factors and the outcomes we're interested in imprecise.  If science is about understanding cause and effect, then what we mean is that the effects we observe aren't easily predictable from the purported or known causes that we assess.

So, in chemistry the folding of proteins is complex.  The structure of galaxies in the cosmos is complex.  And the genetic and other factors causing our traits, like disease, are usually complex.

When we defend our inability to explain everything in our field by saying it’s complex and we’re working hard on it, we are in some senses seeking justification for lots more funding, and exculpating ourselves from being guilty of being too dense to see.  But to a great extent the reason we can’t see the forest for the trees is that we are embedded in the trees, or, there are so many trees that we simply can’t yet figure out the forest.  It is a perfectly legitimate state to be in, because, again, once a problem is solved it’s no longer a science problem—it may be an engineering problem to figure out how to use it and so on, of course.

But your complex isn't the same as my complex!
Every field is different of course, but to me there is a major, I think basically qualitative, difference between the enormous complexity of fields like physics and that of biology.  I think this is not yet well recognized by biologists (especially, perhaps, in biomedical areas), who, as has been widely suggested, often live a life of physics envy and try to present their work with the flavor, and as if it had the rigor, of physical science.

To me, the difference is not that biology should be free of physical laws, nor that biological phenomena are not, in deeply profound ways, constrained by those laws.  Living organisms are bags of interacting molecules that so far as we know entirely obey the normal laws of chemistry and those are in essence the laws of physics.  Unless we're into the mind/body duality debate (about, say, the nature of consciousness)--which we're not--our bodies are molecular phenomena.

The difference is the degree to which those kinds of laws are useful in predicting our kind of phenomena.  I think we can see the point, whether or not you'll agree with it, by taking an example from cosmology.

The numbers vary, but there are said to be something on the order of hundreds of billions of stars in a galaxy and hundreds of billions of galaxies in the observable universe.  The number of atoms or their components within each star is essentially countless.  Yet, stars move within galaxies in regular patterns, and galaxies move around each other in regular patterns.  These patterns are complex by anybody's standards, but it is important to try to understand them if we want to understand the cosmos.

NGC 4414, a typical spiral galaxy in the constellation Coma Berenices, is about 55,000 light-years in diameter and approximately 60 million light-years away from Earth; Wikipedia

Cosmologists are faced with what is called a multibody problem.  To predict the velocity and position of even a small number of bodies in space is beyond a formal or 'closed' (or 'analytic') solution.  In a sense, this is because every instant every object is changing and since every object affects every other object via gravity, they're all changing all the time.  One can simulate this, and an interesting recent discussion by Brian Hayes of how to do that is in the Feb-Mar 2015 issue of American Scientist, if you're interested, and our presentation here uses that to illustrate our point.

The gist of this approach takes advantage of the assumption that Newton's law of gravitation is perfectly true everywhere (if general relativity or other things change this, it's irrelevant to our point here).  The gravitational attraction of an object can be modeled as if all its mass were concentrated at a point located in space.  Between two objects, of mass M1 and M2, that are some distance r12 apart, the force of gravitational attraction is given by F12 = G*M1*M2/r12^2, where G is a universal gravitational constant that is known and simply a part of the nature of matter.  There's a separate Fxy for any two objects x and y, so for four bodies, 1, 2, x, and y, there must be an F1x, F2x, F1y, F2y, and Fxy as well, each with its own 'F' equation.  But also, since gravity is a force that causes motion, all the bodies are always moving.  So the equations are always changing (the distances, the r's, are changing).

The point here is that every object interacts with every other object all the time, so that any change in the location of any one object affects the motion of every other object.  The trick of simulating this for a great many bodies, like the billions of stars in galaxies, and galaxies among each other, is to iterate one tiny time interval at a time: compute these many forces, apply them to each object to alter its motion, and then do the same for the next small time interval.  With supercomputers, cosmologists can achieve a lot by simulating even whole galaxies (see the above reference on how they do it).
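As a sketch of what one such step must do (illustrative Python, not the code cosmologists actually use), here is the all-pairs force computation; for n bodies it is roughly n-squared interaction terms, recomputed every time step because everything has moved:

```python
import numpy as np

def pairwise_accelerations(pos, mass, G=6.674e-11):
    """Acceleration on each body from every other body: all pairs, every step."""
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]  # r_ij vectors, shape (n, n, dim)
    dist = np.linalg.norm(diff, axis=-1)                  # pairwise distances, shape (n, n)
    np.fill_diagonal(dist, np.inf)                        # a body doesn't attract itself
    # a_i = sum over j of G * m_j * r_ij / |r_ij|^3
    return (G * mass[np.newaxis, :, np.newaxis] * diff
            / dist[:, :, np.newaxis]**3).sum(axis=1)
```

That quadratic cost per step is why simulating billions of stars takes supercomputers and clever approximations rather than brute force.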

Surely the physicist is justified in calling this complex!

Genetics and evolution are complex in an additional way
Genomes work strictly by interacting with other things, because DNA is essentially inert by itself.  There are billions of nucleotides in genomes, and each has its own electromagnetic effects in the nucleus.  Generally these effects are very small, but the genome is organized into modules consisting of multiple adjacent nucleotides (or, depending on how you count, segments separated by some nucleotides that are not part of a given module); these modules may overlap, in that the same nucleotide may be involved in more than one module.  The modules can only be identified by their function, because nothing about the sequence itself marks them out a priori.

Genomes do their business by holding codes for molecules that are copied from the DNA (e.g., functional RNA, or proteins translated from mRNA), or by being recognized by proteins and other molecules for gene-regulatory, packaging, and other functions.  These interactions take place because of electromagnetic charges and similar properties of each interacting molecule and the local DNA.  Many functional entities involve large networks of these kinds of interactions to produce an effect.  Andreas Wagner describes this complexity, showing it is of hyper-astronomical scale, in his recent book Arrival of the Fittest.

If you think about genomes in this way, you might think of each interaction as, say, the relationship between two stars, and the resulting collaborations as forming physiological functions ('galaxies') and the whole you (galaxy clusters).  Complex, yes, but can it be broken down piece by piece and simulated or understood that way, as cosmological simulations do?  I think the answer is a heavily qualified, 'partly'.  The reason is that there is a big, or huge, difference between galaxies of biological function, and galaxies made of mere stars.

The pairwise, and hence multi-way interactions in biological systems do not follow a uniform law of chemical attraction, in the sense that each molecule has its own unique charge.  Further, interactions between two molecules depend on the presence of other molecules (e.g., cofactors) and the conditions (e.g., pH) of the cell at the time.  There is no comparable uniform law of chemical attraction, even if the laws of chemical attraction are uniform for particular cases (e.g., specific ions or isomers of elements).  Since I'm not a chemist, I'm doubtlessly not expressing this properly, but hopefully I have the basic point correct.

This means that parsing the interactions down one by one, and iterating over some short time interval, as can be done in cosmology, is far less possible in biology.  And here we have to consider millions of interactions between proteins, proteins and DNA, proteins and RNA, RNA and RNA, RNA and DNA, other types of molecules and those just listed (e.g., sugars or other molecules that modify protein molecules).

Cosmology creates stars and galaxies by the same principles, with essentially the same ingredients, and has done so almost since (literally) the beginning of time (conditions at the Big Bang itself seem to have been somewhat different).  Stars and galaxies come and go, each with different specific details, but each is produced by the same few principles--or so it seems at present.

Evolution also began at a biological 'big bang', somewhere on earth.  But its consequences are different, specifically because evolution works by generating differences.  Mutation, chance, and selection, within individuals and among individuals and species, have led to the biosphere's ad hoc diversity.  The same basic physical and chemical laws apply, but at the level of interactions, we don't have the tools to generalize a priori and simulate complex organisms.

Systems biologists certainly do this for metabolic networks of various sorts, but they only touch the surface of what is possible, and this is true of simulations as well (again, see Wagner's book for discussion both of the networks of biology and efforts to simulate them).

In that sense, physicists, our complexity is bigger than your complexity!  So there!

Monday, November 10, 2014

Dragonflies and innate understanding of physics

"The human mind possesses a basic probabilistic knowledge."  So say Fontanari et al. in a newly published paper in PNAS ("Probabilistic cognition in two indigenous Mayan groups").  They asked whether formal schooling was a necessary foundation for a sense of chance by comparing two unschooled Mayan groups with Mayan schoolchildren and a control, and determined that no formal education is required for making "correct probabilistic evaluations."

This paper hit the popular news media.  "We are all natural bookmakers," said New Scientist.  And the senior author, Vittorio Girotto, said,
"We wanted to show that this sense of chance exists, that it is universal, and that you do not need to be trained to evaluate uncertainty," says Girotto. "We have good evidence now that the human mind does possess this ability."
Researchers have also reported that infants have a sense of "intuitive physics," seemingly being born with the ability to understand gravity (that is, by 2 months of age, they expect an object to fall -- really, who understands gravity?), and to expect that an object doesn't cease to exist when hidden from view.

And, studies (e.g., here and here) suggest that by 5 or 6 months, infants have a sense of numbers.  But then, so, apparently, do non-human primates, such as tamarins.  When two objects were hidden behind a screen, tamarins expected to see two objects when the screen was lifted; when there were three, the animals looked at the objects longer than when they were presented with the expected number, suggesting surprise or confusion.  But then, dogs are good at playing Frisbee because they understand physics, too -- what goes up there, comes down here.

And, even crows understand water displacement, knowing that if they raise the water level in a small beaker, they'll be able to pluck out a piece of floating food.



And look at how bats, and even dragonflies track their in-flight prey.



There seem to be several things going on when things like the probability study in unschooled Mayans make the news.  To those of us who are schooled, probability, mathematics, physics -- or even grammar -- can seem like rather esoteric subjects that take years of training to master, or to even vaguely understand (though really, who understands probability?).  Traditional schooling has divided the world we know into disciplines that have names and bodies of knowledge that must be mastered.

But, in large part, formal education is giving names to things we already knew.  We have already internalized grammar as infants, we have a grasp of essential physical or mathematical principles, and it seems some basic understanding of chance as well. Essentially we're formalizing our description of the world we know from experience, but clearly we -- and dogs and tamarins, and crows and dragonflies and many other animals -- know that world before we know words or equations or models or principles that describe it.  And indeed, most animals never get to that stage.  I think we all can do this not because we have an innate sense of physics, or grammar, but because our brains have evolved to be able to recognize some kind of order, and to make generalizations from what we experience.  It's apparently important to survival, because so many organisms have evolved the same ability.

In this context, we should keep in mind that mathematics is just an elegant way of describing relationships, and really exists only because over the millennia humans did in fact realize that relationships had regularity.  The fact, long known to western science, that the Mayans had very sophisticated calendars shows that the recent news story is no surprise at all -- indeed, it would be very surprising were it not so.  How things work in the brain is, however, a different order of question.

Holly elegantly suggests it simply comes down to pattern recognition.  Frisbees follow predictable arcs, objects don't disappear inexplicably, if there are 5 yellow tokens and only 1 red, the chance of choosing a yellow one is higher than the chance of choosing a red one.  I'm happy with that.

Monday, April 21, 2014

Earths galore: we're getting closer...but to what?

Well, NASA's done it again.  They've found another exciting planet lurking in the depths of near space.  This time, the BBC proclaims, we have the Kepler find 186f (illustrated, even!), the best one yet, and (maybe) it (could) be watery!  It seems that the news cycle isn't just 24/7, but longer: every time NASA releases the story of some newly found somewhat-earthlike rock, the news outlets pick it up as if it were the first time, and nobody can remember that we've seen almost the same many times before.  But if they can get their sales with re-runs, we can't be blamed for at least returning to this topic (e.g., we blogged about this when NASA reported the news of an exoplanet circling the star Gliese 581, as well as others), though hopefully with a little bit more that's different compared with NASA's releases!

Just like Earth! [in an artist's ebullient imagination]  Credit: NASA Ames/SETI Institute/JPL-Caltech
A planetary plenitude
This discovery is called by the ever-sober news media an 'earth twin' or as the knowledgeable NY Times puts it, 'perhaps a cousin' (whatever that means).  Sssh!  If you keep very quiet, you might be able to hear your Keplerian kin-folk talking!

Well, we can overlook such verbiage, since ours attempts to be a science blog.

Actually, the discovery of a plenitude of possible planets, or 'habitable' ones as they often seem to be referred to, is interesting and continues apace.  They now number in the hundreds, and only a trivial fraction of the universe has been scanned, or is even scannable with available technologies.

These truly are interesting findings, though they are, surprisingly, not at all surprising.  After all, space is massively large and filled with a chaos of objects hot and cold, large and small.  If, as seems likely, Newton was right and gravitation is universal, then the small stuff will often be captured by the gravitational attraction of the big stuff. Big hot stuff (stars) can capture smaller wandering rocks and they'll end up in orbit.  Some even smaller rocks are captured by the pull of, and orbit around, bigger rocks (like moons around planets). Lots of other rocks and stars will be in all sorts of relationships as well.  But some of these will be special.

If we care about our sort of life, then we want what is being called a Goldilocks planet: like her porridge, the rock will be not too hot and not too cold, not too wet and not too dry, but just right!  That is, there will be water, and warmth enough to keep it liquid but not turn it all to steam, and other things of that sort.  That is where, we're told, we'll find the ETs.  Some day.

Now this is genuinely thought-provoking, but it needs none of the circus hype of the news media.  That's because it basically tells us what we already knew.  In fact, the actual facts are to us a lot more interesting than the Disneyfication.

We've previously discussed, in general terms, the idea that if there are an infinity of stars, galaxies, planets, or universes, there would just as likely be all sorts of life on them.  Here, we can be a tad more specific.  There are hundreds of planet-like things somewhat like 186f just here in our own local galaxy (the Milky Way); we've really just begun looking, and technically can only see some of what might be out there; and what we know is largely within our own galaxy, which holds in the range of 100 billion stars.  So thousands and thousands of those stars must have orbiting rocks.  And there are around 100 billion other galaxies (give or take a few), so we can assume that each of them, too, has thousands upon thousands of the same sorts of rocks orbiting stars, and rocks orbiting around those rocks.

That is, even on the back of the proverbial envelope, one would estimate at least 100 thousand billion habitable planets -- 100 trillion (100,000,000,000,000), as a minimal estimate.  And once we knew that 'habitable' rocks do orbit stars -- Earth, and perhaps one or two more even just around our own sun -- there are likely around 100 billion or more earth-maybes in the Milky Way alone!  Of course, if you hold to Genesis, our Earth could be God's only watering hole, but once we had clear evidence of other possibles, a reasoning person must accept that these larger numbers become plausible.
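For the curious, here is that envelope arithmetic in a few lines of Python; every input is a round guess from the text, not a measurement:

    # Back-of-the-envelope planet arithmetic; all inputs are round guesses.
    galaxies = 100e9              # ~100 billion galaxies, give or take
    habitable_per_galaxy = 1_000  # 'thousands' of habitable rocks per galaxy
    total = galaxies * habitable_per_galaxy
    print(f"{total:.0e}")         # 1e+14 -- that is, 100 trillion planets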

The point is that even without the Kepler and other telescopes scanning the heavens for these things, a totally convincing plausibility argument says that the universe must be awash in 'habitable' planets.

But, ETs?
Now, the fact that there are lots of warm, wet rocks out there is one thing, but it doesn't imply that there is anybody living on them. However, life -- even just our sort of life -- is clearly possible, because we're here living it as proof.  Given that, even a modest kind of belief in natural science would lead one to believe that with 100 trillion tries there really has to be some sort of life out there, and probably lots of it, even if it's only on a trivially teeny fraction of the habitable planets.

This of course does not address whether it's our sort of life in the 'intelligent' sense, or life based on DNA. The fact that we are here is not quite so persuasive about that, because the numbers get astronomical (so to speak, but in the other direction -- of smallness).  The number of nucleotides in earth-life's genetic history, from primal RNA to global DNA today, likely dwarfs even 100 trillion.  Each has arisen and/or later been changed by largely independent individual probabilities that are very, very small, and the net result is, in essence, the product of these probabilities (this and this and this...and this -- the result -- had to happen).  So the probability of going from primal soup to any given form of complex 'intelligence' over 3.5 billion years -- that is, to our form of it -- would be minuscule relative even to the number of potentially habitable planets.  This could mean that intelligent life arising more than once, even with so many trials, would be very unlikely, and thus that we are lonely in our uniqueness.
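To see how brutally such products shrink, here is a toy calculation; the per-event probability and the event count are invented purely for scale:

    # Long chains of probabilities shrink brutally fast (illustration only).
    import math
    p_event = 0.999999       # hypothetical, generously high per-event chance
    n_events = 100e12        # a made-up 100 trillion events, just for scale
    log10_product = n_events * math.log10(p_event)
    print(log10_product)     # about -4.3e7: odds near 1 in 10**43,000,000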

But even if others just like us may not happen more than once, there are also countlessly many other pathways to intelligence: after all, each human has a different genotype, and there have been billions upon billions of us.  So it really is impossible to do more than muse about what the net resulting probabilities are.  To a great extent it depends on what we count as intelligent life.  To a greater extent, "Are we alone?" is hardly even a scientific question to ask.

Worse for NASA (and Disney) is that even here on Earth, where we know intelligent life has arisen, we've only been at it for, say, 1,000,000 years, being generous and depending on what 'intelligent' means. But if it means having language and communicating by electromagnetic radiation (like radio), so that we could communicate with ETs, that's only been about 100 years, and probably we won't last much longer, either. So the probability that at any given time smart life is present both here and in any other such place is a minuscule fraction of the time that life has been around on any of these lucky 100 trillion planets.

In that sense, large numbers don't nearly guarantee that there are smart anythings anywhere else.  The chance that us-like life is out there now, where 'now' means we can communicate with it, may be rather minuscule.
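A minimal sketch of that overlap-in-time arithmetic, using the rough numbers above:

    # What slice of a planet's life-bearing history is 'radio-capable'?
    radio_years = 100           # how long we've been broadcasting, roughly
    life_years = 3.5e9          # how long life has existed here
    print(radio_years / life_years)   # ~2.9e-08: the chance that a random
                                      # moment catches a planet's brief 'now'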

Forget about chatting!
In one of our previous posts, about Gliese 667C, we noted the problems with thinking that we could communicate with, much less actually go to, such places (assuming we understand physics, like the limiting speed of light, correctly).

Kepler 186f is said to be about 500 light years away.  That means that a signal we can pick up from there was sent when Da Vinci was painting the Mona Lisa.  If there were intelligent life there, and they're at all like us, they may well have obliterated themselves long ago.  But suppose they're peaceful (having evolved way beyond us): then just to send a friendly radio wave of "Hi!" to them, and get a wave back, would take until the year 3014. By then most everything would have changed about human life here, with lots of world wars (though, of course, Republicans would still be trying to keep ordinary people from being able to afford a doctor).  Forget about chatting with the ETs!  Even Google will be out of business by that time.
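The timing arithmetic is easy enough to check, taking the quoted distance at face value:

    # Light-travel timing for Kepler-186f, at the quoted ~500 light years.
    distance_ly, this_year = 500, 2014
    print(this_year - distance_ly)      # 1514: roughly Mona Lisa vintage
    print(this_year + 2 * distance_ly)  # 3014: earliest reply to a 'Hi!' sent now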

And as we said about Gliese 667C, which is a mere 22 light years away -- more than 20 times closer than 186f -- physically getting there would not be half, or any, of the fun.  It would be impossible in any practical sense, and even if we could actually do it, it would take millennia of space travel to reach 186f, and when we got there, there might be nobody still around to drop in on.

So, what is the purpose of the space probe?
Without being too much of a spoilsport -- because up to a point this kind of exploration really is interesting, and now and then may tell us important things about our universe (not likely to be comforting to dogmatic religion) -- we have to ask about the purpose of this kind of probing.  In a sense, for reasons we suggest above, the numbers tell us nothing that we didn't already have almost as strong a reason to believe anyway.  It would take something like a Genesis literalist to think that there would be no other planets with life on them, or even very few.  Either these findings suggest the plausibility that forms of life must exist out there, or else the burden of proof is on a denier to show how, in the face of these overwhelming numbers (and not counting theories about multiple independent universes), there could fail to be some such 'life' on lots and lots of planets.

Of course, this is really just science fiction, almost literally.  The vast majority of any such planets are, were, or will be millions or even billions of light-years away. That means what we see today isn't there now, but was there eons ago.  Much of that light has been streaming here since before there was life on Earth -- or even before there was an Earth!  Indeed, if a typical star's lifetime is around 10 billion years, much of what we see no longer exists as such; likewise, much or most of what actually is out there came into existence too recently (even if millions or billions of years ago) for any evidence of it to have reached us.

So, it takes either a television sci-fi producer, a NASA PR rep, or a real dreamer to think we could ever go there, or really communicate with much or even any of what must be out there.  If we really thought anything like that, we should be intensely doing very down-to-earth studies to see whether the speed of light and relativity truly are limiting factors, or whether transformative new aspects of space itself remain to be discovered.

At what point is the research cost** not worth the number of people who could be fed by the same funds, and so on?  When does asking such questions make one just a killjoy, and when does it make one concerned for the problems, and the actual unknowns, on the one planet we actually can do something about?



**Or, as we've suggested before, if this really is mainly just entertainment, why not let the video or other entertainment industries pay for it?

Wednesday, April 2, 2014

Entropy and context-dependency: an epidemiological dilemma

Yesterday we discussed the problems facing genetic and environmental epidemiology (and, though we did not say it, many other fields of science as well).  They have to do with complexity and the difficulty of isolating causal variables and showing their effects.  One issue is not just that our current big-data statistical approaches are not doing adequately well at present, but whether we face inevitable limits to what we can know -- now or ever!  As we said yesterday, one should never say "never" in science, but, as we also said, it is perhaps possible that there are some 'nevers' in epidemiology.  To see this, we think an appeal to an analogy from physics -- one that may even apply literally to our field -- will at least, we hope, make clear the ideas we were trying to express.

An entropic universe
An important concept in physics is called 'entropy'.  Entropy refers to the evenness of the distribution of energy or matter. As formally defined on its Wikipedia page, entropy is "a measure of the number of specific ways in which a thermodynamic system may be arranged, often taken to be a measure of disorder, or a measure of progressing towards thermodynamic equilibrium."  As we understand what cosmologists say, at the time of the Big Bang everything in the universe was concentrated into a pin-head size volume, as shown in the figure.  Everything was highly orderly, and in a degenerate sense everything could be arranged in only one or a small number of ways.  But as the universe expanded, in a statistical sense 'everything' started to get splattered out.  There are an ever-increasing number of ways the same elements could be arranged in this ever-growing space.
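That counting of arrangements is literal in statistical mechanics, where Boltzmann's famous formula defines entropy in one line (in LaTeX notation):

    S = k_B \ln \Omega

Here Omega is the number of microscopic arrangements compatible with the large-scale state we observe, and k_B is Boltzmann's constant.  A pin-head universe allows very few arrangements (low entropy); a splattered-out one allows astronomically many (high entropy).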

Arrows show the order of cosmic time
Initially there were higher and lower density regions, and still today there are concentrations, like galaxies and planets (and us!).  But by now, 14 billion years later, the overall distribution of matter and energy in the cosmos is very uniform in all directions -- to 999 parts in a thousand.  Only 0.1% is non-uniform, and that is the clustered stuff we see, the stars and planets and so on here and there in space.  It is this evenness, together with the existence of some relatively minuscule lumps of stuff like the solar system and us, that in part led to the concept of cosmic 'inflation', the claimed confirmation of which made all the news recently.  And these arrangements are coming into and going out of existence all the time.

Entropy is in a sense also used as the very definition of the directional arrow of time itself.  Things went from very highly concentrated, and in that sense organized, to very widely and evenly distributed, and in that sense disorganized.  The Second 'Law' of Thermodynamics asserts this phenomenon.  Probabilistically, things could get re-organized, but the overwhelming probability is that they'll just spread out further and further as the universe expands. In other words, grey paint could separate back into the black and white paints from which it was made, just by chance, but that probability is vanishingly small compared to the probability that mixed paint will become ever more evenly mixed (except for very local, fleeting re-separations).  That overwhelming probability gives the cosmos its time direction: time goes from more to less organized.  Eventually the universe will be essentially entropic: cold and dead, at least as we understand what cosmologists think.
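To put a number on 'overwhelming probability': for N mixed particles, the chance that random motion returns every one to its original side is (1/2)^N, a standard textbook illustration.  A minimal sketch of how hopeless that gets:

    # log10 of the chance that N mixed particles spontaneously un-mix,
    # i.e., that each lands back on its original side: (1/2)**N.
    from math import log10
    for n in (10, 100, 6e23):        # 6e23 ~ the particles in a few grams
        print(n, -n * log10(2))      # ~3, ~30, ~1.8e23 zeros of improbability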

Doing anything like work -- concentrating matter or energy -- requires negative entropy: the uneven concentrations that are fundamentally required to leverage any change you might want to make.  We concentrate an explosion in the cylinders of our car, using the potential energy in our fuel, to drive the pistons.  But the resulting heat is then dissipated through work and out the exhaust, and can do no more work.

There are lots of debates in physics about entropy and whether there can be any escape from its limits, and, naturally, there are some who claim, by invoking various kinds of arguments, that this might be possible.  But the strong consensus seems to be that this is not so, at least in the universe as we live in it.

But what (if anything) has this to do with genomics or epidemiology?

An 'arrow' of limits for epidemiological sciences, too?
Statistical methods such as those that we almost inevitably, or even necessarily, have to use in the genetic, epidemiological, evolutionary, and social sciences are based on finding associations between measured variables.  The fundamental underlying assumption is replication: that a cause and its effect will be associated with each other in the samples we examine -- that the presence of the cause will be observed along with the presence of the effect (say, a disease or a given level of blood pressure with some genotype or dietary factor) more often than would occur by chance.  'Just by chance' means as if the exposures and outcomes were totally scrambled out there in the real world.

If a true cause is concentrated in a subset of a population, its effect will also be concentrated, and we can observe the difference in frequencies if we know how to look for it.  Our statistical models test whether an observed association could have happened just by chance; if what we see is 'unusual' according to some evidentiary cutoff criterion we choose to define, then we assume we have detected a true causal link.   If the cause is there and is strong enough, it should be easy to find, at least in principle, if we design the right kind of sample, measurement, analysis -- and appropriate statistical cutoff criterion.
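To make that 'scrambled world' null concrete, here is a minimal permutation-test sketch in Python; the people, exposures, and outcomes are entirely invented for illustration:

    # A minimal permutation test: is exposure associated with disease more
    # often than in a world where outcomes are randomly scrambled?
    import random
    random.seed(1)

    exposure = [1]*50 + [0]*50                      # 100 hypothetical people
    disease  = [1]*30 + [0]*20 + [1]*15 + [0]*35    # their (made-up) outcomes

    def freq_diff(exp, dis):
        # disease frequency among the exposed minus among the unexposed
        exposed   = [d for e, d in zip(exp, dis) if e]
        unexposed = [d for e, d in zip(exp, dis) if not e]
        return sum(exposed)/len(exposed) - sum(unexposed)/len(unexposed)

    observed = freq_diff(exposure, disease)
    scrambled, null = disease[:], []
    for _ in range(10_000):            # the 'just by chance' world
        random.shuffle(scrambled)
        null.append(freq_diff(exposure, scrambled))

    p = sum(abs(x) >= abs(observed) for x in null) / len(null)
    print(observed, p)                 # diff of 0.3, with p well under 0.05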

But suppose some outcome, as we define it, is the result of many different causes.  Even if each cause is individually concentrated (e.g., not all people smoke, or eat McFood, or have a given genotype), if the causes are scrambled up among people to too great an extent, we might say -- by analogy with cosmology -- that the 'universe' of interest is essentially in an entropic state.  There simply isn't enough concentration of 'cause' to be a useful source of the 'work' of causation.  'Useful' here means that we can find the cause, predict its results, or do something about it.

Another common way to describe this, in terms of our computer age, is to say that there is a lot in the system not of concentrated mass or energy, but of 'information'.  Entropy has had wide uses in computer science, and various applications in evolution (e.g., measures of the evenness of variation at a gene in a population).  It has even been used, in false desperation, by creationists to say that life violates the Second Law and hence must have a divine origin.  But here we're applying the concept somewhat differently, though still rooted in ideas about information, because what we want from our studies is information about cause and effect.  Information in this sense is a logical relationship among measurements that reflects a physical relationship among the things measured.
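That evenness-of-variation use is easy to show; here is a tiny sketch, with allele frequencies invented for the example:

    # Shannon entropy as a measure of evenness of variation at a gene.
    import math

    def entropy_bits(freqs):
        return -sum(p * math.log2(p) for p in freqs if p > 0)

    print(entropy_bits([0.5, 0.5]))    # 1.0 bit: two alleles, maximally even
    print(entropy_bits([0.99, 0.01]))  # ~0.08 bits: one allele dominates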

Epidemiologic entropy may or may not map cleanly onto the same concept in physics, but it may be a relevant and helpful heuristic for understanding the enigmatic nature of complex traits, in the face of the major assault being waged on them these days.

Could 'never' really mean never here?
It is dangerous if not foolhardy to declare that science can never achieve something, because that is often just what happens shortly after the declaration is made.  But we may be on a bit safer ground here because, like the limiting speed of light, the idea of cosmic entropy seems very well grounded in physics.  It is held to be universally and literally true, not just a guess or idea shoehorned into the data.

By extension, if the causal elements related to some effect we see, or define as real or as something we care about, are highly entropic, there simply may be no way for us, as outside observers, to concentrate causation in order to identify its organization.  As in the Second Law, there are an uncountable number of ways the causal components could be arranged among people (those of us studying this are somewhat like epidemiological Maxwell's demons, for readers who know that thought experiment in physics).  We use statistical approaches to leverage an answer, by identifying or sorting concentrations of causation (regression analysis, for example) and quantifying their association with effects.

Here the concept of context-specificity is key:  the effect of one measured factor depends on its context, that is, on the values of other relevant factors (measured or un-measured).  If every person has a unique mix of causal factors and exposures related to some outcome of interest, everyone will be unique relative to that outcome, and this causally 'entropic' state of affairs might, perhaps even in principle, defy statistical approaches that identify causation in the concentration-based way we currently employ.  If each person were unique in terms of relevant causal factors, our means of testing assertions about those factors would be relatively futile, and prediction might even be literally impossible. And, like stars and galaxies in space, genotypes and environmental states are always changing, coming into and going out of existence.
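As a toy illustration of how fully real causation can still be statistically elusive, here is a hypothetical simulation, loosely in the spirit of epidemiology's 'sufficient cause' pies; every number in it is invented:

    # Causation here is complete and deterministic, but spread over many
    # rare combinations, so any single factor shows a diluted signal.
    import random
    random.seed(1)

    n_people, n_factors, n_causes = 10_000, 200, 1_000
    # each sufficient cause is a random trio of co-acting factors:
    causes = [random.sample(range(n_factors), 3) for _ in range(n_causes)]

    def simulate_person():
        exposed = {f for f in range(n_factors) if random.random() < 0.05}
        affected = any(all(f in exposed for f in trio) for trio in causes)
        return exposed, affected

    population = [simulate_person() for _ in range(n_people)]

    # marginal association of one factor (factor 0) with the outcome:
    with_f0    = [a for e, a in population if 0 in e]
    without_f0 = [a for e, a in population if 0 not in e]
    print(sum(with_f0)/len(with_f0), sum(without_f0)/len(without_f0))
    # the two disease frequencies differ only modestly (~0.15 vs ~0.12),
    # despite deterministic causation: the signal is scattered, 'entropic'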

Science has a deep belief that this is not the state of affairs, and indeed, things are not entirely entropic even for vague traits like the ones whose causal elusiveness fills the press.  There do seem to be some causal 'galaxies'--subsets of identified causal concentration sufficient for detection.  But to a great extent, causation does seem to be entropic.

This, like all analogies, is an imperfect way to think about life.  Unlike in the cosmos, it is not clear that the entropic state in our epidemiologic analogy is always increasing, or that it produces anything analogous to entropy's defining the direction of time itself.  So we have to be careful using the analogy. Still, it may help us think about what we are trying to understand.

If these ideas are useful, they may help us think of ways to approach the problem.  And even if we were to find that causal entropy is what we face, we'd have to hope that some clever people would re-think our questions, or our objectives.

Friday, February 24, 2012

Claiming more? Show it! Faster than a speeding neutrino!

Marcello Truzzi, sociologist and founder or co-founder of a number of organizations investigating extraordinary claims, is said to have coined the phrase, "Extraordinary claims require extraordinary proof."  Carl Sagan popularized this phrase as "Extraordinary claims require extraordinary evidence."  And so it is with the recent claim that neutrinos can exceed the speed of light, the proverbial c in E=mc^2.

We posted about this remarkable claim when it first came out (here), and then when the same group claimed to have replicated their own results (here), and of course the story was all over the web.  As it should have been, because if true, it would have overturned one of the most robust theories in physics.  The finding was not only extraordinary, it was revolutionary.  (Ethan Siegel's blog, Starts With a Bang, has a bunch of very fine, accessible and detailed explanations of the whole story as it has unfolded.)

Flaws in the experiment are now being reported.  Here's the piece in Nature, but it's also everywhere, like on the BBC.  Science Insider broke the news, and Siegel explains the possible alternative scenarios here.

The same group, OPERA, that made the original finding is now identifying their likely errors.  As the Nature piece explains it:
...according to a statement OPERA began circulating today, two possible problems have now been found with its set-up. As many physicists had speculated might be the case, both are related to the experiment’s pioneering use of Global Positioning System (GPS) signals to synchronize atomic clocks at each end of its neutrino beam. First, the passage of time on the clocks between the arrival of the synchronizing signal has to be interpolated and OPERA now says this may not have been done correctly. Second, there was a possible faulty connection between the GPS signal and the OPERA master clock.
The Science Insider story says that if the group does the equivalent of rebooting their computer -- simpler, actually: just tightening the connection of the fiber-optic cable that runs to their GPS receiver -- that one fix would add the missing 60 nanoseconds back to the neutrino travel time.

Oops.


The BBC, though, tells the story differently.  As they tell it, OPERA says there's another possible explanation, which has to do with "the oscillator used to produce the events time-stamps in between the GPS synchronizations.  These two issues can modify the neutrino time of flight in opposite directions."   The BBC says that tightening the connection would increase the apparent, already ultra-fast speed, while fixing the oscillator would slow it down.  That is, either they are more right, or they know why they're wrong.

So, apparently, within the group there's still hope of a revolution.  They'll keep us posted.  And while they work on tightening up experimental conditions, a group at Fermilab in Illinois and a group in Japan are hoping to test this themselves.

While we leave this to the physicists to sort out, there are still some lessons to be learned for the rest of us. The speed of light is a given in physics -- it has been tested without serious challenge for a century, and any suggestion that it can be exceeded must be met with skepticism. There are simply too many direct experiments, and zillions more indirect ones, that seem consistent with the theory. Were he alive today, Marcello Truzzi would surely have written the neutrino results up in his journal, The Zetetic (The Skeptic).

We don't have the same kinds of laws in biology, as a rule, but evolution and the nature of genes seem to come about as close to fundamental theory as we currently can get. And there are important lessons to learn about life at large by comparison with what we learn from the ultra-tiny neutrino.

Whether the speed of light is actually 100.000000000000% constant in every 'vacuum' and every part of the universe apparently has to do with theoretical issues and explanatory frameworks beyond what we know anything about. However, the finding is close enough, and robust enough, that we can argue about whether a neutrino can violate this law at all. Any deviation, no matter how tiny, will grab major headlines and be good for the physics-professor business!

We are not qualified to say whether a quadzillionth of a percent deviation from the proverbial c would change much of even theoretical importance. Does every single last photon always stream along at exactly the same speed, all the time? That kind of constancy would be basically unprecedented in the world of even science's everyday life. What if it simply showed that c is not an eternally, totally fixed value, but that photon-travelers, like neutrinos, sometimes hustle and sometimes dawdle a tiny tad?

Be that as it may, we have little if anything that is anywhere near so precise, exact, and universal about life or evolution. The proof of this is how easily--routinely, even--professors and their reporter-acolytes proclaim essentially revolutionary, major, dramatic, or transformative new findings.

A new fossil often is claimed to entirely overturn everything we said we knew about human evolution, or so the media and the discoverer will have you believe (as we have commented recently in MT). A fossil found sucking its thumb would be argued to completely revolutionize our understanding of the evolution of thumbs (and depending on its age at death, perhaps also about the length of childhood in our ancestors!).

In contemporary genetics, which is Gee!Wash in GWAS, first the idea that would revolutionize everything was that common variants cause common disease; then it was that, to a great extent, the same gene variants cause the same disease in all populations; then that rare variants are the culprits, to be discovered with whole-genome sequencing; then epigenetics; then copy-number variation; then gene regulatory networks. The chain of 'omics' revolutions is, so far, endless. Medicine will be revolutionized by being genomically personalized.

We are truly learning a lot about life, but the major changes in view claimed for each new finding or paper show clearly that we simply do not have our theoretical 'neutrinos'. Our knowledge is too easily 'revolutionized' by the next technology that comes down the pike to be considered theoretically very sound.

Now, physicists will be melodramatic about whatever is found in those little hyper-travelers, just as biologists are about how every new genetic variant they discover will guarantee immortality. And not least of the physicists' worries in all this is whether it will affect their funding. They have their sick side just as we do: what truth we find out will determine whether we can keep our jobs -- even though our jobs are, supposedly, to find out the truth!

The educated public, and we scientists ourselves, need to realize and acknowledge how very far we are from a physics-like understanding of life. And given that, and the topsy-turvy, claim-laden recent history of genetics, medical genetics, and evolutionary biology, there should be some slowing down, taking stock, and tempering of our claims. When we are this far from absolute truths, and have no really sound underlying theory, we have no business over-promising, much less racing along in such a frenzied (fund-seeking-based) way.

Perhaps it's time for some medicine for our ailment: some sanctions for claiming too much and not acknowledging the depth of our own loose connections, or some accountability for real progress in (say) curing disease, rather than the moving target of promises not met. If we had some accountability, and didn't rush for headlines and snow-jobs at every turn, we might temper our thoughts as well as our claims, and spend more time and effort understanding how multiple, variable, hard-to-measure causal elements work together, and vary, in relation to biological traits -- normal and diseased -- and their evolution.

We face very challenging and legitimate issues in biology, both in understanding evolution and in bringing about major biomedical advances. We should be able to make much better progress if we knuckle down more intensely to understand life's complexity, rather than slicing and dicing it into this or that one-size-fits-all, large-scale, comprehensively enumerative ('omic') approach that essentially promises to turn complexity into simplicity. We know the professional pressures that push us in the latter direction -- we all feel them, and we inculcate new members of the guild into that environment. But nobody seems to be resisting these pressures.

Basically, like physicists, we should own up to our rather large array of loose connections. Until we do, there is something well-known that's Faster than a Speeding Neutrino. It is the speed with which biologists rush to call a press conference to announce their latest Discovery.