Thursday, March 19, 2015

My complexity is more complex than your complexity!

Scientists often talk about how complex their field is, and of course often they are right.  But a word like 'complexity' may be used to confer a sense of importance and gravitas on the subject, and often the description--even if true--seems used in an advertising sort of way.  After all, who wants to be working in an area that's 'simple'?  If it's simple, why haven't we solved its problems, unless we're simpletons!

Describing our field as 'complex' usually doesn't just mean there are things in our field that we can't measure or don't know about.  That's always true in any science.  Instead, the term usually means that our phenomena of interest involve a host of causal factors that make the relationship between those factors and the outcomes we're interested in imprecise.  If science is about understanding cause and effect, then what we mean is that the effects we observe aren't easily predictable from the purported or known causes that we assess.

So, in chemistry the folding of proteins is complex.  The structure of galaxies in the cosmos is complex.  And the genetic and other factors causing our traits, like disease, are usually complex.

When we defend our inability to explain everything in our field by saying it’s complex and we’re working hard on it, we are in some sense seeking justification for lots more funding, and excusing ourselves from the charge of being too dense to see.  But to a great extent the reason we can’t see the forest for the trees is that we are embedded in the trees, or that there are so many trees that we simply can’t yet figure out the forest.  It is a perfectly legitimate state to be in, because, again, once a problem is solved it’s no longer a science problem (though it may of course remain an engineering problem to figure out how to use the solution).

But your complex isn't the same as my complex!
Every field is different of course, but to me there is a major and, I think, basically qualitative difference between the enormous complexity of fields like physics and that of biology.  I think this is not yet well recognized by biologists (especially perhaps in biomedical areas), who, as has been widely suggested, often live a life of physics envy and try to present their work with the flavor of physical science, as if it had the same rigor.

To me, the difference is not that biology should be free of physical laws, nor that biological phenomena are not, in deeply profound ways, constrained by those laws.  Living organisms are bags of interacting molecules that so far as we know entirely obey the normal laws of chemistry and those are in essence the laws of physics.  Unless we're into the mind/body duality debate (about, say, the nature of consciousness)--which we're not--our bodies are molecular phenomena.

The difference is the degree to which those kinds of laws are useful in predicting our kind of phenomena.  I think we can see the point, whether or not you'll agree with it, by taking an example from cosmology.

The numbers vary, but there are said to be something on the order of hundreds of billions of stars in a galaxy and hundreds of billions of galaxies in the observable universe.  The number of atoms or their components within each star is essentially countless.  Yet, stars move within galaxies in regular patterns, and galaxies move around each other in regular patterns.  These patterns are complex by anybody's standards, but it is important to try to understand them if we want to understand the cosmos.

NGC 4414, a typical spiral galaxy in the constellation Coma Berenices, is about 55,000 light-years in diameter and approximately 60 million light-years away from Earth; Wikipedia

Cosmologists are faced with what is called a multibody problem.  To predict the velocity and position of even a small number of interacting bodies in space (three is already enough) is, in general, beyond a formal or 'closed' (or 'analytic') solution.  In a sense, this is because at every instant every object is changing, and since every object affects every other object via gravity, they're all changing all the time.  One can simulate this, and an interesting recent discussion by Brian Hayes of how to do that is in the Feb-Mar 2015 issue of American Scientist, if you're interested; our presentation here uses that to illustrate our point.

The gist of this approach takes advantage of the assumption that Newton's law of gravitation is perfectly true everywhere (if general relativity or other things change this, it's irrelevant to our point here).  Gravitational attraction of an object can be modeled as if all its mass were concentrated at a point located in space.  Between two objects, of mass M1 and M2, that are some distance r12 apart, the force of gravitational attraction is given by F12 = G*M1*M2/(r12)^2, where G is a universal gravitational constant that is known and simply a part of the nature of matter.  There's a separate Fxy for any two objects x and y, and the multibody problem is that for four bodies, 1, 2, x, and y, there must also be an F1x, F2x, F1y, and F2y (six pairwise forces in all), each with its own 'F' equation.  And since gravity is a force that causes motion, all the bodies are always moving, so the equations are always changing (the distances, or r's, are changing).
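
To make the bookkeeping concrete, here is a minimal Python sketch of the pairwise force law just described.  The helper names (gravitational_force, pair_count) and the body labels are ours, chosen only for illustration, and nothing here is taken from Hayes's article.

```python
from itertools import combinations

G = 6.674e-11  # gravitational constant in N m^2 / kg^2 (approximate value)

def gravitational_force(m1, m2, r):
    """Magnitude of the Newtonian attraction between two point masses."""
    return G * m1 * m2 / r**2

def pair_count(n):
    """Number of distinct pairs among n bodies: n(n-1)/2."""
    return n * (n - 1) // 2

# The four bodies from the text; every unordered pair needs its own F.
bodies = ["1", "2", "x", "y"]
pairs = list(combinations(bodies, 2))
```

Here pairs holds six entries, matching pair_count(4).  The same formula gives on the order of 5 x 10^21 pairs for a galaxy of 100 billion stars, which is why brute-force bookkeeping gets out of hand so quickly.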

The point here is that every object interacts with every other object all the time, so that any change in the location of any one object affects the motion of every other object. The trick of simulating this for a great many bodies, like the billions of stars in galaxies and of galaxies among each other, is to take one tiny time interval at a time: compute these many forces, apply them to each object to alter its motion, and then do the same for the next small time interval.  With supercomputers cosmologists can achieve a lot by simulating even whole galaxies (see the above reference on how they do it).
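
The time-stepping idea can be sketched in a few lines of Python.  This is only a toy under stated assumptions: two dimensions, arbitrary units with G = 1, a small 'softening' term (our addition) to avoid dividing by zero when two bodies nearly coincide, and the simplest possible update rule.  Real galaxy codes are far more sophisticated, and the function names here are hypothetical.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)

def accelerations(positions, masses, softening=1e-3):
    """For each body, sum the Newtonian pull of every other body (2-D)."""
    n = len(positions)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            # Softened squared distance (assumption, avoids a zero divisor).
            r2 = dx * dx + dy * dy + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
    return acc

def step(positions, velocities, masses, dt):
    """One small time interval: forces, then velocities, then positions."""
    acc = accelerations(positions, masses)
    for i in range(len(positions)):
        velocities[i][0] += acc[i][0] * dt
        velocities[i][1] += acc[i][1] * dt
        positions[i][0] += velocities[i][0] * dt
        positions[i][1] += velocities[i][1] * dt

def simulate(positions, velocities, masses, dt, n_steps):
    """Repeat the small step n_steps times, updating the lists in place."""
    for _ in range(n_steps):
        step(positions, velocities, masses, dt)
    return positions, velocities
```

Note that the inner double loop visits every pair of bodies at every step, so the cost grows as the square of the number of bodies.  That is exactly the burden the text describes, and it is why whole-galaxy simulations need supercomputers and cleverer approximations.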

Surely the physicist is justified in calling this complex!

Genetics and evolution are complex in an additional way
Genomes work strictly by interacting with other things, because DNA is essentially inert by itself. There are billions of nucleotides in genomes, and each has its own electromagnetic effects in the nucleus.  Generally this is very small but the genome is organized into modules consisting of multiple adjacent nucleotides (or depending on how you count, these segments can be separated by some nucleotides not part of a given module); these modules may overlap in that the same nucleotide may be involved in more than one module.  The modules are identified by their function, because they have no a priori function.

Genomes do their business by holding codes for molecules that are copied from the DNA (e.g., functional RNA or proteins decoded from mRNA), or are recognized by proteins and other molecules for gene regulatory, packaging, and other functions.  These interactions take place because of electromagnetic charges and similar properties of each interacting molecule and the local DNA. Many functional entities involve large networks of these kinds of interactions to produce an effect. Andreas Wagner describes this complexity, showing it is of hyper-astronomical scale, in his recent book Arrival of the Fittest.

If you think about genomes in this way, you might think of each interaction as, say, the relationship between two stars, and the resulting collaborations as forming physiological functions ('galaxies') and the whole you (galaxy clusters).  Complex, yes, but can it be broken down piece by piece and simulated or understood that way, as cosmological simulations do?  I think the answer is a heavily qualified, 'partly'.  The reason is that there is a big, or huge, difference between galaxies of biological function, and galaxies made of mere stars.

The pairwise, and hence multi-way, interactions in biological systems do not follow a uniform law of chemical attraction, in the sense that each molecule has its own unique charge.  Further, interactions between two molecules depend on the presence of other molecules (e.g., cofactors) and the conditions (e.g., pH) of the cell at the time.  There is no comparable uniform law of chemical attraction, even if the laws of chemical attraction are uniform for particular cases (e.g., specific ions or isomers of elements).  Since I'm not a chemist, I'm doubtless not expressing this properly, but hopefully I have the basic point correct.

This means that parsing the interactions down one by one, and iterating over some short time interval, as can be done in cosmology, is far less possible in biology.  And here we have to consider millions of interactions between proteins, proteins and DNA, proteins and RNA, RNA and RNA, RNA and DNA, other types of molecules and those just listed (e.g., sugars or other molecules that modify protein molecules).

Cosmology creates stars and galaxies by the same principles with essentially the same ingredients, and has done so almost since (literally) the beginning of time (conditions at the Big Bang itself seem to have been somewhat different).  Stars and galaxies come and go, each with different specific details, but each produced by the same few principles--or so it seems at present.

Evolution also began at a biological 'big bang', somewhere on Earth.  But its consequences are different specifically because evolution works by generating differences.  Mutation and chance and selection, within individuals and among individuals and species, have led to the biosphere's ad hoc diversity. The same basic physical and chemical laws apply, but at the level of interaction, we don't have the tools to generalize a priori and simulate complex organisms.

Systems biologists certainly do this for metabolic networks of various sorts, but they only touch the surface of what is possible, and this is true of simulations as well (again, see Wagner's book for discussion both of the networks of biology and efforts to simulate them).

In that sense, physicists, our complexity is bigger than your complexity!  So there!


James Goetz said...

I completely agree. For example, astrophysicists clearly know the indeterministic variables of galaxy and stellar evolution. But in the case of biology, we know the different types of mutations while we cannot always make a priori predictions for the effects of new mutations. That would be much more complicated.

Anonymous said...

Of course, this means that sociology is even more complex than biology since it is constrained by both physical laws and biological phenomena. Then again, complexity is not the same thing as difficulty. In addition, analysis at any level can only proceed by ignoring phenomena below or above a particular scale and simplifying those at the scale in question.

Ken Weiss said...

Yes, in essence every science faces complexities of its own kind and, if the questions aren't properly posed or are unrealistic, complexities of its own making!

Anonymous said...

Great post! As someone working in genomics, a lot of discoveries and analytical innovations are being made in DESCRIBING genomes and their various features and COMPARING genomes among species but we are still far from understanding how a genome makes a dog, a human or an amoeba. SNPs, CNVs, TEs, all manner of RNA, along with a myriad of mechanisms associated with pre- and post-transcriptional and translational regulation and beyond, work together in some combinatorial way to produce the diversity of life. It seems like only a few researchers, such as Andreas Wagner and his genotypic networks, are trying to move biology forward beyond simple differences in genome patterns. Systems biologists are attempting to put all the pieces together into some coherent picture of understanding, but many of them are working at the level of cells/tissues only. I work in evolutionary and conservation genetics, so SNPs are the main focus. Often they are anonymous (i.e., not located in annotated genes) but when they are known, there is an assumption of "functional importance" when a SNP reaches a high frequency in a particular population. This change may indeed be significant, but the often resulting assumption is strange: researchers will extrapolate from the SNP and its gene context to the entire organism as if that SNP has no other genomic relationships. Complexity is evaded or simply not comprehended. How did we as scientists get like this? Like the Modern Synthesis for evolutionary biology, we're still waiting for a Grand Synthesis of Life. To get there, we will need to ask different/new questions. One of my favorite papers last year pointing towards a new direction was the perspective by Michael Lynch et al. 2014 in PNAS about evolutionary cell biology. As those authors state in their abstract: "All aspects of biological diversification ultimately trace to evolutionary modifications at the cellular level."
That's the level where genotypic networks act and have an effect, so that's where the paradigm-shifting discoveries might be found. So once you have the genome of a kangaroo, an elephant and a mouse, it might be interesting to start comparing liver cells (e.g.) in these organisms to see how genomic changes, expressions and regulations (the "interactome" - ugh!) differ (or not).

Ken Weiss said...

To Anonymous:
I wish I could reply at the length your very thoughtful comment merits. We face some daunting issues. One has to do with probability and (vs?) determinism. Another is the nested effects of genotypes that evolution generates: somatic tree of descent among cells, population tree of descent among genotypes, and ecosystem tree of descent among species, and probably more than that.

I try to read on the history of physics to see how the 'revolutionary' findings of the early 20th century were arrived at--how did they struggle with their version of complexity?

There are a lot of people, including Wagner, who are borrowing from physics the notion that the universe 'is' mathematics. But we have a different sort of regularity, we're not trained to deal with it, and there are precious few places where people are even being driven to think about it.

We've posted before on our view that statistical approaches make assumptions suitable for physics and chemistry (essentially, about the nature of replicability), that are not suitable for the types of questions we want to ask in biology.

We don't even have a useful 'inverse square' law for the interactions among coding and regulatory regions of genomes.

Do we have a long way to go, or is it really some singular insight? Or are we asking questions that may be of the sort we humans want to ask, but are inapt relative to the evolutionary process that made us?