Showing posts with label ForSim. Show all posts

Friday, August 26, 2016

Is life itself a simulation of life?

It often happens in science that our theory of some area of reality is very precise, but the reality is too complex to work out precisely, or analytically.  This can be when we decide to use computer simulation of that reality to get at least a close approximation to the truth.  When a phenomenon is determined by a precise process, then if we increase the complexity of our simulation, and if the simulation really is simulating the underlying reality, then the more computer power we apply, the closer we get to the truth--that is, our results approach that truth asymptotically.

For example, if you want to predict the rotation of galaxies in space relative to each other, and of the stars within the galaxies, the theories of physics will do the job, in principle. But solving the equations directly the way one does in algebra or calculus is not possible with so many variables.  However, you can use a computer to simulate the movement and get a very good approximation (we've discussed this here, among other places).  Thus, at each time interval, you take the position and motion of each object you want to follow, and those measures of nearby objects, and use Newton's law of gravity to predict the position of the objects one time interval later.
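
The time-stepping idea just described can be sketched in a few lines. This is only a minimal illustration, assuming point masses in two dimensions, arbitrary units with G = 1, and a simple semi-implicit Euler update; the function and variable names are made up for this sketch, and real galactic simulations use far more careful numerical methods.

```python
import math

G = 1.0  # gravitational constant in arbitrary simulation units (assumption)

def step(bodies, dt):
    """Advance every body by one time interval dt using Newton's law of gravity.

    bodies: list of dicts with keys 'm', 'pos' (x, y) and 'vel' (vx, vy).
    Velocities are updated first, then positions (semi-implicit Euler).
    Bodies are assumed never to occupy exactly the same position.
    """
    accels = []
    for i, a in enumerate(bodies):
        ax = ay = 0.0
        for j, b in enumerate(bodies):
            if i == j:
                continue
            dx = b["pos"][0] - a["pos"][0]
            dy = b["pos"][1] - a["pos"][1]
            r = math.hypot(dx, dy)
            # Acceleration of a due to b: G * m_b / r^2, directed toward b.
            ax += G * b["m"] * dx / r**3
            ay += G * b["m"] * dy / r**3
        accels.append((ax, ay))
    for body, (ax, ay) in zip(bodies, accels):
        vx = body["vel"][0] + ax * dt
        vy = body["vel"][1] + ay * dt
        body["vel"] = (vx, vy)
        body["pos"] = (body["pos"][0] + vx * dt, body["pos"][1] + vy * dt)
    return bodies

# Two equal masses starting at rest, one unit apart, followed for 10 intervals.
bodies = [
    {"m": 1.0, "pos": (0.0, 0.0), "vel": (0.0, 0.0)},
    {"m": 1.0, "pos": (1.0, 0.0), "vel": (0.0, 0.0)},
]
for _ in range(10):
    step(bodies, dt=0.01)
```

Shrinking the time interval (and spending more computer time) brings the simulated paths closer to what the theory implies, which is the asymptotic approach described earlier.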

If the motion you simulate doesn't match what you can observe, you suspect you've got something wrong with the theory you are using. In the case of cosmology, one such factor is known as 'dark matter'.  That can be built into models of galactic motion, to get better predictions.  In this way, simulation can tell you something you didn't already know, and because the equations can't be directly solved, simulation is an approach of choice.

In many situations, even if you think that the underlying causal process is deterministic, measurements are imperfect, and you may need to add a random 'noise' factor to each iteration of your simulation.  Each simulation will be slightly 'off' because of this, but you run the same simulation thousands of times, so the effect of the noise evens out, and the average result represents what you are trying to model.
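
As a rough sketch of that averaging idea, consider a deterministic process (here just steady motion, chosen for illustration) with a random noise term added at each iteration. The parameter values and function names are assumptions made for this example only.

```python
import random
import statistics

def noisy_run(rng, steps=50, velocity=1.0, dt=1.0, noise_sd=0.1):
    """One simulated run: deterministic motion plus Gaussian noise at each step."""
    x = 0.0
    for _ in range(steps):
        x += velocity * dt + rng.gauss(0.0, noise_sd)
    return x

def average_over_runs(n_runs=2000, seed=1, **kwargs):
    """Repeat the same noisy simulation many times.

    Returns the mean and standard deviation of the final positions.
    """
    rng = random.Random(seed)
    finals = [noisy_run(rng, **kwargs) for _ in range(n_runs)]
    return statistics.mean(finals), statistics.stdev(finals)

mean_final, sd_final = average_over_runs()
```

Each single run ends up slightly off the noise-free answer of 50, but the mean over thousands of runs sits very close to it, because here the noise really is drawn from one fixed distribution.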

Is life a simulation of life?
Just like other processes that we attempt to simulate, life is a complex reality.  We try to explain it with the very general theory of evolution, and we use genetics to try to explain how complex traits evolve, but there are far too many variables to predict future directions and the like analytically.  This is not just because of biological complexity, however, but also because the fundamental processes of life seem, as far as we can tell, inherently probabilistic (not just a matter of measurement error).  This adds an additional twist that makes life itself seem to be a simulation of its underlying processes.

Life evolves by parents transmitting genes to offspring.  For those genes to be transmitted to the next generation, the offspring have to live long enough, must be able to acquire mates, and must be able to reproduce. Genes vary because mutations arise.  For simplicity's sake, let's say that successful mating requires not falling victim to natural selection before offspring are produced, and that that depends on an organism's traits, and that genes are causally responsible for those traits.  In reality, there are other processes to be considered, but these will illustrate our point.

Mutation and surviving natural selection seem to be probabilistic processes.  If we want to simulate life, we have to specify the probability of a mutation along some simulated genome, and the probability that a bearer of the mutation survives and reproduces.  Populations contain thousands of individuals, genomes incur thousands of mutations each generation, and reproductive success involves those same individuals.  This is far too hard to write tractable equations for in most interesting situations, unless we make almost uselessly simplifying assumptions.  So we simulate these phenomena.

How, basically, do we do this?  Here, generically and simplified, but illustrating the issues, is the typical way (and the way taken by my own elaborate simulation program, called ForSim, which is freely available):

For each individual in a simulated population, each generation, we draw a random number based on an assumed mutation rate, and add the resulting number and location of mutations to the genotype of the individual.  Then for each resulting simulated genotype, we draw a random number from the probability that such a genotype reproduces, and either remove or keep the individual depending on the result.  We keep doing this for thousands of generations, and see what happens.  As an example, the box lists some of the parameter values one specifies for a program like ForSim.
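
To make the generic recipe concrete, here is a toy sketch of such a forward simulation. It is not ForSim's code or interface; the asexual reproduction step, the fitness rule (each mutation multiplies survival probability by 1 - s) and all parameter names and values are assumptions chosen only to keep the example short.

```python
import random

def simulate(pop_size=200, genome_len=100, mu=1e-3, s=0.05,
             generations=100, seed=42):
    """Toy forward simulation: mutate, select, reproduce, repeat.

    Each individual is a frozenset of mutated site indices.
    mu is the assumed per-site mutation probability per generation;
    s is the assumed survival cost of each mutation carried.
    """
    rng = random.Random(seed)
    population = [frozenset() for _ in range(pop_size)]
    mean_load = []  # mean number of mutations per individual, by generation
    for _ in range(generations):
        # Mutation: one random draw per site per individual.
        mutated = []
        for genome in population:
            new_sites = {site for site in range(genome_len)
                         if rng.random() < mu}
            mutated.append(genome | new_sites)
        # Selection: keep or remove each individual by a random draw
        # against the survival probability of its genotype.
        survivors = [g for g in mutated
                     if rng.random() < (1.0 - s) ** len(g)]
        if not survivors:
            break  # the simulated population went extinct
        # Reproduction (assumed asexual): resample survivors with
        # replacement to restore the population size.
        population = [rng.choice(survivors) for _ in range(pop_size)]
        mean_load.append(sum(len(g) for g in population) / pop_size)
    return population, mean_load

final_population, load_by_generation = simulate()
```

Changing the seed while holding every parameter fixed gives a different history each time, which is worth keeping in mind for what follows.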



Sometimes, if the simulation is accurate enough, the probability and other values we assume look like what ecologists or geneticists believe is going on in their field site or laboratory.  In the case of humans, however, we have little such data, so we make a guess at what we think might have been the case during our evolution.  Often these things are empirically estimated one at a time, but their real values affect each other in many ways.  This is, of course, very far from the situation in physics, described above!  Still, we at least have a computer-based way to approximate our idea of evolutionary and genetic processes.

We run this for many, usually many thousand generations, and see the trait and genomic causal pattern that results (we've blogged about some of these issues here, among other posts).  This is a simulation since it seems to follow the principles we think are responsible for evolution and genetic function.  However, there is a major difference.

Unlike simulations in astronomy, life really does seem to involve random draws for probabilistic processes.  In that sense, life looks like it is, itself, a simulation of these processes.  The random draws it makes are not just practical estimates of some underlying phenomenon, but manifestations of the actual probabilistic nature of the phenomenon.

This is important, because when we simulate a process, we know that its probabilistic component can lead to different results each time through.  And yet, life itself is a one-time run of those processes. In that sense, life is a simulation but we can only guess at the underlying causal values (like mutation and survival rates) from the single set of data: what actually happened in its one time through.  Of course, we can test various examples, like looking at mutation rates in bacteria or in some samples of people, but these involve many problems and are at best general estimates from samples, often artificial or simplified samples.
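
The run-to-run variation is easy to see in a small sketch. The example below assumes a very simple model of chance change in the frequency of one genetic variant (a Wright-Fisher style resampling each generation, with no selection or mutation); the model choice and all names are assumptions for illustration.

```python
import random

def drift(pop_size=100, start_freq=0.5, generations=200, seed=None):
    """Follow one variant's frequency under pure chance resampling.

    Each generation, every one of pop_size gene copies is drawn at random
    from the previous generation's frequency. Returns the final frequency.
    """
    rng = random.Random(seed)
    count = round(start_freq * pop_size)
    for _ in range(generations):
        freq = count / pop_size
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        if count in (0, pop_size):
            break  # the variant has been lost or has become fixed
    return count / pop_size

# Twenty runs with identical parameters; only the random draws differ.
outcomes = [drift(seed=s) for s in range(20)]
```

In a simulation we can look at all twenty outcomes and estimate the process behind them. Life hands us just one such run.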

But wait!  Is life a simulation after all?  If not, what is life?
I don't want us to be bogged down in pure semantics here, but I think the answer is that in a very profound way, life is not a simulation in the sense we're discussing.  For the relevant variables, life is not based on an underlying theoretical process in the usual sense, one whose parameters we approximate with random numbers in simulations.

For example, we evaluate biological data in terms of 'the' mutation rate in genomes from parent to offspring.  But in fact, we know there is no such thing as 'the' mutation rate, one that applies to each nucleotide as it is replicated from one generation to the next, and from which each actual mutation is a random draw.  The observed rate of mutation at a given location in a given sample of a given species' genomes depends, among other things, on the sex, the particular nucleotides surrounding the site in question (and hence all sites along the DNA string), the nature of the mutation-detection proteins coded by that individual's genome, and mutagen levels in the environment.  In our theory, and in our simulations, we assume an average rate, and that the variation from that average will, so to speak, 'average out' in our simulations.

But I think that is fundamentally wrong. In life, every condition today is a branch-point for the future. The functional implications of a mutation here and now depend on the local circumstances, and that is built into the production of the future local generations.  Averaging over the genome and over individuals does not in fact generate what life does, but in a sense the opposite.  Each event has its own local dynamics and contingencies, and the effect of those conditions alters the rates of events in the future.  Everywhere it's different, and we have no theory about how different, especially over evolutionary time.

Indeed, one might say that the most fundamental single characteristic of life is that the variation generated here today is screened here today and not anyplace else or any time else.  In that sense, each mutation is not drawn from the same distribution.  The underlying causal properties vary everywhere and all the time.  Sometimes the difference may be slight, but we can't count on that being true and, importantly, we have no way of knowing when and to what extent it's true.
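
A toy contrast may help show why this matters. The sketch below compares events drawn from one fixed rate with events whose rate depends on what has already happened. The feedback rule used (a Pólya-urn style scheme in which each outcome makes the same outcome more likely next time) is purely an assumption for illustration, not a model of any real biological process.

```python
import random
import statistics

def fixed_rate_run(rng, n_events=500, p=0.5):
    """Fraction of 'successes' when every event uses the same probability."""
    return sum(rng.random() < p for _ in range(n_events)) / n_events

def contingent_run(rng, n_events=500):
    """Fraction of 'successes' when each outcome shifts the future rate.

    The run starts at p = 0.5, like the fixed-rate version, but each
    success or failure is added to a tally that sets the next probability.
    """
    successes, failures = 1, 1
    for _ in range(n_events):
        p = successes / (successes + failures)
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return (successes - 1) / n_events

def spread(run, n_runs=300, seed=0):
    """Standard deviation of the outcome across repeated runs."""
    rng = random.Random(seed)
    return statistics.stdev([run(rng) for _ in range(n_runs)])

fixed_spread = spread(fixed_rate_run)
contingent_spread = spread(contingent_run)
```

With a fixed rate, repeated runs cluster tightly around one value. With history-dependent rates, the runs scatter widely, and no single 'underlying rate' can be read back from any one of them.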

The same applies to foxes and rabbits. Every time a fox chases a rabbit, the conditions (including the genotypes of the fox and rabbit) differ. The chance that the rabbit is caught is not the same each time; the success 'rate' is not drawn from a single, fixed distribution.  In reality, each chase is unique.

After the fact, we can look back at net results, and it's all too tempting to think of what we see as a steady, deterministic process with a bit of random noise thrown in.  But that's not an accurate way to think, because we don't know how inaccurate it is, when each event is to some (un-prespecified) extent unique.  Overall, life is not, in fact, drawing from an underlying distribution.  It is ad hoc by its very nature and that's what makes life different from other physical phenomena.

Life, and we who partake of it, are unique. The fact of local, contingent uniqueness is an important reason that the study of life eludes much of what makes modern physical science work.  The latter's methods and concepts assume replicable law-like underlying regularity. That's the kind of thing we attempt to model, or simulate, by treating phenomena like mutation as if they are draws from some basic underlying causal distribution. But life's underlying regularity is its irregularity.

This means that one of the best ways we have of dealing with complex phenomena of life, simulating them by computer, smooths over the very underlying process that we want to understand.  In that sense, strangely, life appears to be a simulation but is even more elusive than that.  To a great extent, except by some very broad generalities that are often too broad to be very useful, life isn't the way we simulate it, and doesn't even simulate itself in that way.

What would be a better approach to understanding life?  The next generation will have to discover that.

Tuesday, June 5, 2012

Steal This Book! Computer simulation and scientific theory

[Image: Abbie Hoffman]

In the riotous protest times of the '70s, leading protester Abbie Hoffman published Steal This Book, "a manual of survival in the prison that is Amerika."  Of course, one must assume that Hoffman wasn't so opposed to the system as to decline any royalties, and that he didn't really mean for copies to be stolen: presumably the idea was to read the book and understand the realities of society at the time.  Then you could choose to accept or fight the system, or at least understand it and what it's doing to you.

We devote a lot of effort in MT to commenting on and, yes, criticizing what we believe are deserving targets in contemporary science, especially as relates to genetics and evolution. We premised MT on ideas in our book of the same name, because we think that an overstress on simplified evolutionary and genetic causal thinking diverts attention from many aspects of biology that we feel are at least as important.

People resist learning some lessons we think they should learn, perhaps largely out of ignorance (though in many ways intentionally not facing what might dampen various vested interests). 

With Brian Lambert, I have developed a highly general and flexible computer simulation program called ForSim, for simulating genetic causation and its evolution.  It has two major purposes. The first is to generate simulated, but realistic, data to test various theories and detection methods for complex phenotypes--such as those so intensely being pursued by GWAS and other methods.  Users can simulate the data and then sample it in various ways (families, case-control studies, etc.) to see how much of the truth one can find, and how; the truth is known because the simulation generates all the data required.  The second is to simulate the evolution of that genetic architecture within and between populations, to understand how genetic effects change.
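
The first of these purposes can be illustrated with a toy sketch: simulate a population in which the truth is known, then sample it the way a case-control study would and see how close the estimate comes. This is not ForSim's interface; the single causal variant, the risk values and every name below are assumptions made for the example.

```python
import random

def simulate_population(n=20000, carrier_freq=0.3, base_risk=0.1,
                        relative_risk=2.0, seed=0):
    """Generate (carrier, affected) pairs from a known causal model."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        risk = base_risk * (relative_risk if carrier else 1.0)
        affected = rng.random() < risk
        people.append((carrier, affected))
    return people

def true_odds_ratio(base_risk=0.1, relative_risk=2.0):
    """The odds ratio implied by the parameters we put into the simulation."""
    carrier_risk = base_risk * relative_risk
    return (carrier_risk / (1 - carrier_risk)) / (base_risk / (1 - base_risk))

def case_control_odds_ratio(people, n_per_group=500, seed=1):
    """Sample equal numbers of cases and controls and estimate the odds ratio.

    Assumes the population contains at least n_per_group cases and controls,
    and that the sample has both carriers and non-carriers in each group.
    """
    rng = random.Random(seed)
    cases = rng.sample([p for p in people if p[1]], n_per_group)
    controls = rng.sample([p for p in people if not p[1]], n_per_group)
    case_carriers = sum(carrier for carrier, _ in cases)
    control_carriers = sum(carrier for carrier, _ in controls)
    case_odds = case_carriers / (n_per_group - case_carriers)
    control_odds = control_carriers / (n_per_group - control_carriers)
    return case_odds / control_odds

people = simulate_population()
estimate = case_control_odds_ratio(people)
truth = true_odds_ratio()
```

Here a single variant with a sizable effect is easy to recover. The sobering cases are the ones where many variants each contribute a little, and the same sampling exercise finds only a fraction of what we know is there.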

ForSim is a net-effects program that omits many important aspects of genetic + environmental causation, such as those that make up the bulk of the book MT.  Thus, it greatly oversimplifies reality. But it tries to be natural in many ways (future additions will explicitly allow simulation of gene networks, developmental biology, and episodic traits like some diseases).

ForSim is a complex, intricate program and most readers of this blog would not be interested in or attempt to use it.  Fine, that's not our point. Our point in mentioning it is that just to see what is involved in complex traits and their evolution is a sobering lesson in why we object to simplistic ideas and rosy promises. 

[Figure: ForSim execution flow]

If one absorbs the message, one should be less sanguine or naive about what is being promised and found (or not) in the real world.  And one can get a sense of why we say what we do!  We did not invent biological complexity or the reasons why gene mapping (GWAS and similar approaches) is struggling as it is (as reflected rather clearly in the flood of papers aggressively praising its dramatic success).

We don't expect you to use ForSim, but if you're interested in seeing just what is involved in even a restricted evolutionary simulation, read the ForSim book!  You don't have to steal it, because the Manual can be downloaded here.  It's free (as is the program for any MT reader who might want to try it).

Again, we're not advertising anything from which we make any monetary or other gain.  We use the program, but just reading the Manual can be very instructive.  We wrote the program, and use it, and talk about it, because one way or another we think everyone, scientists and public alike, should be made aware of the realities of the causal complexity that so often is an inherent part of life.

But... wait!  What exactly is computer simulation?  Can't you simulate anything you want, the way a video game simulates Dungeons and Space Fighters?  Isn't what we really need an improved actual theory, some laws of life that really work well in terms of relating your genes to your traits?  Surprisingly, the answer is yes, you can simulate anything you want, but no, simulation isn't inferior to other kinds of theory even in this same respect.  We'll explain that next time.

Monday, May 4, 2009

How complex is 'complex'?

The word 'complex' is frequently used, though not always as clearly as it might be. In today's genetics arena it means a trait that is the result of multiple genetic elements as well as environmental factors that are usually unknown or not specified, but can include the genetic element's genomic background. Can we get a clearer understanding of this interaction in some way that has not yet been well-explored?

Most complex traits, whose genetic contributors GWAS and related mapping methods are designed to find (see earlier posts), show substantial evidence of being 'genetic' in some sense: there is correlation of the trait among relatives or an association of risk of the trait--like a disease--among family members.

The problem is that despite evidence for genetic involvement, GWAS and other methods have only been able to identify a small fraction of the contributing elements. One response is that we need larger studies. Another is that the objective is not to account for the disease in terms of genes, but to find genetic pathways that are involved.

Most common diseases have increased substantially, if not dramatically, within living memory and more importantly within the time since trustworthy epidemiological data on incidence (rate of new cases per year) or prevalence (fraction of persons affected) have been available.

This would suggest to reasonable people, even including some geneticists, that at least for preventive purposes the major responsible (and avoidable) factors for the disease are environmental, such as exposures to risk factors like toxins, lifestyle changes such as in diet, etc.

A few years ago, the molecular technology infrastructure for mapping studies was laid down, and paid for on the rationale that common genetic variants were likely responsible for these common diseases--and hence that genetics was a right way to approach them. Common variants for common disease (CVCD) became a mantra.

In response to the environmental and other arguments raised even at the time, proponents of CVCD and the investment in the gene-mapping infrastructure (e.g., the HapMap project) said that, yes, environmental factors clearly were involved, but the increase in prevalence was due to their interaction with common genetic susceptibility variants.

Subsequent mapping, including numerous, often huge genomewide association studies, has generally failed to find such variants. The meaning of 'common' can of course be adjusted to fit results, but the bulk of the heritability of these many studied traits remains unexplained. It's a fair question whether these traits are truly complex and largely unmappable, or whether we just haven't studied them enough.

A kind of widespread relevant evidence may be the following. The substantial heritability of common disease as well as normal traits suggests that many genes contribute; the traits are often called 'polygenic' for that reason. But these many genes might individually vary in their effects. For many theoretical and empirical reasons, one would expect some alleles (genetic variants) at one or a few genes to interact with, or respond more strongly to, changing environmental factors.

If that is the case, then the more important genes that were not identifiable in case-control or family samples before the environmental change should be mappable afterward. That's because those variants would be the main responders to the environmental change, whatever it was. Their individual effects, modest before the change, should be major after it.
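
The arithmetic behind that expectation can be written out in a few lines. The sketch assumes a simple additive risk model with a gene-by-environment interaction term; the model form and the numerical values are invented for illustration only.

```python
def allele_risk_effect(environment, main_effect=0.01, interaction=0.04):
    """Risk difference between carriers and non-carriers at a given exposure.

    Assumed model (G = 1 for carriers, E = exposure level):
        risk = baseline + main_effect*G + env_effect*E + interaction*G*E
    so the carrier versus non-carrier difference is
        main_effect + interaction*E.
    """
    return main_effect + interaction * environment

before = allele_risk_effect(environment=0.0)  # exposure absent
after = allele_risk_effect(environment=1.0)   # exposure present
```

Under these made-up numbers the variant's effect on risk is several times larger after the environmental change than before, which is the kind of signal that mapping studies done after the change would be expected to pick up.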

Yet, today, after a long list of diseases have had large, rapid increases in prevalence, the GWAS findings are as we have seen: they are not identifying much that is of population-scale importance. On the surface, this suggests that the argument about complexity really is correct: there are, indeed, many genes involved, but each makes only a very small net contribution to risk. A few are detected whose effects are greater, but they are few and even their effects are only modest.

From this perspective, which is based on data, not theory, secular trends in risk and the failure of GWAS to find CVCD variants are relevant data, and they suggest that complex traits really are basically homogeneous in terms of genetic causation.

Now, if this is true it constitutes material evidence that should change our understanding of the nature of these traits: why would it be that there are generally no major alleles waiting for environmental changes to give them a chance to be expressed? Indeed, isn't that just how natural selection is supposed to work, with environmental change favoring 'good' genetic variants in the population and raising them to high frequency? Those variants should have substantial effect on the trait so the organism carrying the variants would reproduce more successfully.

If our thinking is correct, then this tells us something. Perhaps the networks of which biological traits are built are internally adjusting--strong changes in one part of a pathway network lead to slowing down of others. Yet, secular trends show that the net result can involve major change. It is indeed somewhat difficult to believe that the genetic responses to environmental changes are so internally homogeneous that even after major stimulus none really stands out even when studied in large samples. There must be a message there--if we can but figure it out!

These are just superficial ideas at this point, but they could help direct changes in what we look for, or how we look. We are starting to use an evolutionary simulation program that Brian Lambert in our group has written (see the description of ForSim on Ken's web page for details) to see if this point is correct as we think, or if there is some aspect of genetic control that we are overlooking. Stay tuned for results.