Showing posts with label chance. Show all posts

Friday, January 27, 2017

Evolution as a pachinko history: what is 'random'?

In an earlier post we discussed the Japanese pachinko machine, a kind of vertical pinball machine, as an example of the difference between randomness and determinism in an evolutionary context.  Here we want to use pachinko imagery in a different way.

The prevailing assumption, often unstated but just under the surface, is that every trait in life is here because of natural selection.  Of course, for a trait to be here at all, bearers of its ancestral states up to the present (or, at least, the recent past) must have been successful enough to reproduce.  It would not be here otherwise, unless, for example, it is itself harmful, or without function, but connected to a much more beneficial related trait, since genes are usually used in many different bodily contexts and may be associated with both beneficial and harmful traits.  Most sensible evolutionary geneticists know that many or even most sites in genomes tolerate variation that has either no effect, or effects so small that in realistic population sizes they change in frequency essentially by chance.





However, the widespread default assumption that there must be an adaptive explanation for every trait usually also tacitly assumes that probabilism doesn't make much difference.  Some alert evolutionary biologists will acknowledge that one version among contemporary but equivalent versions of a trait can evolve by chance relative to other versions.  But the insistence, tacit or expressed, is that natural selection, treated essentially as a force, is responsible.  The very typical view is that the trait arose because of selection 'for' it, and that's why it's here.  And speaking of 'here', here's where a pachinko analogy may be informative.

If a bevy of metal balls tumbles through the machine, each bouncing off the many pins, they will end up scattered across the bottom ledge of the machine (the gambling idea is to have them end up in a particular place, but that's not our point here).  So let's take a given ball and ask 'Why did it end up where it did?'





The obvious and clearly true answer is 'Gravity is responsible'.  That is the analogue of 'selection is responsible'.   But it is rather an empty answer.  One can always say that what's here must be here because it was favored (that is, not excluded) by fitness considerations: its ancestral bearers obviously reproduced!  We can define that as 'adaptation' and indeed in a sense that is what is done every day, almost thoughtlessly.

Gravity is, like the typical if tacit assumption about natural selection, a deterministic force for all practical purposes here.  But why did this ball end up in this particular place?  One obvious answer is that each ball starts out in a slightly different place at the top, and no two balls are absolutely identical. As a result, each ball takes a different path from the top to the bottom of the obstacle course it faces. Yes, it is gravity that determines that they go down (adapt), but not how they go down.

In fact, each ball takes a different path, zigging and zagging at each point based on what happens, essentially by chance, at that point.   One might think of these collisions as the local ecosystems on the evolutionary path of any organism, beyond its control.  So, in the end, even if the entire journey is deterministic, in the sense that every collision is, the result is not one that can, in practice, be understood except by following the path of each ball (each trait, in the biological analogy).  And this means that the trajectory cannot be predicted ahead of time. And in turn, this means that our interpretation that a trait we see today was selected 'for' is often if not usually either basically just a guess or, more often, an equation of what the trait does today with what it was supposedly selected to be, expressed as if there were an express train from then to now.
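To make the image concrete, here is a toy Galton-board-style simulation (our own illustrative sketch; the pin and ball counts are arbitrary): every ball is subject to the same deterministic 'force', yet each path is a chance sequence of left/right deflections, and the landing spot of any single ball cannot be predicted, only the overall pattern.

```python
import random

def pachinko_paths(n_balls=10000, n_pins=12, seed=1):
    """Drop balls through rows of pins; each pin deflects the ball
    left (0) or right (1) with equal probability, purely by chance."""
    rng = random.Random(seed)
    bins = [0] * (n_pins + 1)
    for _ in range(n_balls):
        # The final position is just the number of rightward bounces.
        position = sum(rng.choice((0, 1)) for _ in range(n_pins))
        bins[position] += 1
    return bins

bins = pachinko_paths()
# Gravity acted identically on every ball, but the balls spread out;
# the middle bins collect the most because far more distinct paths
# lead there than to the edges.
print(bins)
```

The 'force' explains why every ball reaches the bottom, but only the path-by-path history explains where any given ball landed.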

And this doesn't consider another aspect of the chaotic and chance-affected nature of evolutionary adaptation: the interaction with the other balls bouncing around in the same obstacle course at the same time.  In the game of life, if not in pachinko, collisions are in every meaningful sense chance events that affect selective outcomes, even were we to assume that selection itself is simple, straightforward, and deterministic.

Gould and Lewontin famously argued that features useful for one purpose, such as the 'spandrels' of cathedral roofs, can be incidental traits that provide the options for future adaptations--life exploits today what yesterday produced for whatever reason, even if just by chance.  The analogy or metaphor has been questioned, but that is not important here.  What is important is that contingencies of this nature are chance events, relative to what builds on them.  Selectionism as a riposte to creationism is fine, but hyper-selectionism becomes just another often thought-free dogma.  Darwin gave us inspiration and insight, but we should think for ourselves, not in 19th century terms.

A far humbler, and far less 'Darwinian' (but not anti-Darwinian!), explanation of life is called for if we really want to understand evolution as a subtle, often noisy process, rather than as a faith.  Instead, even serious biologists freely invent--and that's an apt word for it--selective accounts, as if they were true explanations, for almost any trait one might mention. Such an account is invented because some reason is imagined without any direct evidence other than present-day function, but it is then treated as if directly observed, which is rarely possible. Here is an interview that I just came across that in a different way makes some of the same points we are trying to make here.

Everything here today is 'adaptive' in the sense that it has worked up to now.  Everything here today is also a 4 billion year successful lineage, that all made its way through the pachinko pins.  But these are almost vacuous tautologies.  Understanding life requires understanding one's biases in trying to force simple solutions on complicated reality.

Monday, January 5, 2015

Is cancer just bad luck? Part I. Known risk factors are poor predictors

Cancers are a highly unpredictable set of diseases, representing a fundamental problem in understanding causation that Tomasetti and Vogelstein address in a recent paper in Science ("Variation in cancer risk among tissues can be explained by the number of stem cell divisions", 2 Jan 2015, Vol. 347, Issue 6217).  This paper has gotten a lot of notice, both approving and not.  A bit of history might be helpful.

Cancers are due to cell proliferation gone wrong, that is, not obeying the constraints on division and differentiation of their particular tissue.  The idea has been that this is due either to exposure to some environmental risk factor, including something to do with lifestyle, or to genetic predisposition.  Both seem to be true at the population level, with, for example, breast cancer associated with age at first birth, whether women breast feed or not, number of children, alcohol consumption, and so forth, and with clear genetic risk factors, like some BRCA1 and BRCA2 risk alleles.  In populations, people who smoke are more likely to get lung and other cancers, people with HPV infection are more likely to get cervical cancer, and so on.  But this doesn't mean that everyone who smokes, or carries a particular genetic risk allele, will get cancer, and that's the issue.

Even if a risk factor is known, that doesn't explain the immediate cause of a tumor, at the cell level.  That cause is a gene or genes misbehaving, causing the cell to divide at an inappropriate time.  So the idea for decades had been that environmental agents that stimulate cell division put cells at risk of incurring a mutation, and that environmental mutagens caused those changes, which were the ultimate or final causes of cancer.  Hence the search for 'cancer genes'.


In the old days (the 1990s!), direct searches for genes were generally not possible, with a few exceptions where viruses seemed to change genes in a cancer-causing way.  But some cancers seemed to be clearly familial, that is, inherited in a Mendelian way in families.  They were statistically predictable, but with the problem that the risk depended on whether you inherited a risk gene, and we could only make a probabilistic statement about that.  A few lucky breaks showed that finding such genetic mutations was possible.  Specific inherited cancer-risk genes were first and most clearly demonstrated for a couple of childhood tumors, most notably, perhaps, the eye cancer retinoblastoma.  A fortuitous chromosomal deletion allowed the responsible gene to be identified, which was rare at that time for biomedical genetics, a field largely confined to predicting risk with no understanding of, nor ability to test, the actual causal gene.  There were a few other similarly lucky discoveries.

However, when genotyping on a genome-wide scale became possible, the idea was that we could search the entire genome for locations that were co-transmitted or associated with a given type of cancer.  There have been many different methods, and a few clear successes.  The hallmark, and indeed one of the first genome-wide screens to yield a major risk factor, was the finding that the BRCA1 and BRCA2 genes could, when carrying one of several particular mutations, lead to a very high lifetime risk of cancer.  This work was done in large, multi-generational families, but the success spurred methods to search more generally in populations (what we now call GWAS, among other types of searches).  The BRCA discovery led to the rampant genome-wide approach we have seen in the past 15-20 years.  The idea underlying this work has been to find risk variants strong enough, if not to be transmitted clearly in families, at least to affect risk consistently, and this has been extended to basically every trait someone could get a grant to study.

But even when BRCA causation was found, there were important questions.  Those who inherited a high-risk BRCA mutation and did in fact get breast (or ovarian) cancer did not get those diseases until mid- to late life.  The lifetime risk was very high indeed, and some women unfortunately got separate cancers in each breast.  Yet this was not the rule.  So, if the gene 'caused' the cancer, why did it take so long to do it?  An obvious answer is environmental factors.  Also, by far most cancers do not segregate in families in Mendelian fashion the way BRCA mutation effects can, and indeed relatives share only slightly elevated risk.  Even cases are at only slightly elevated risk relative to controls for most cancer-related gene-mapping results.  One would think that the final risk might be due to the additional contribution of environmental factors.

Epidemiological studies of environmental risk factors for cancer have identified the major ones -- smoking, asbestos, exposure to UV light and X-rays, exposure to some chemicals used in agriculture, and so on.  So, many (especially environmental epidemiologists who don't have a stake in the competition for genomic funding) have argued that if genomic variation isn't a good predictor, environmental variation must be!  But after extensive work, environmental factors don't explain all causes of any given cancer either, nor can exposure history reliably predict cancers -- only a small minority even of smokers goes on to develop lung cancer, for example.  And, indeed, unlike smoking and a few others, most environmental associations and candidate factors aren't clear mutagens or promoters.  So what's going on??

Why don't environmental or genetic risk factors explain all the risk?
This is the problem that Cristian Tomasetti, a mathematician, and Bert Vogelstein addressed.  Vogelstein was one of the pioneers of the search for somatic mutations; that is, the mutational change that makes a cell misbehave need not have been inherited, but may have been generated during the person's life. Vogelstein years ago applied a particular technique to show that tumor cells contained a particular kind of mutation (called 'loss of heterozygosity') that was not found in non-cancer cells from the same individual, but often was found in particular genome regions for a given type of cancer (colorectal cancer, in particular).  That was rather clear evidence (and there was evidence from a growing number of other researchers, too) that cancer was indeed a 'genetic' disease, but not one due just to inherited variants.

Tomasetti and Vogelstein point out that current data suggest that only 5-10% of cancers are caused by heritable factors, and environmental factors can't explain the wide disparities in risk of cancer in different tissues.  They wondered how much cancer is caused by chance and how much by environmental factors.  By "chance" they mean things that just happen to go wrong during the DNA copying that occurs during cell division, which is when a tumor gets started.  Their analysis suggests that these changes are just inherent molecular copying errors, that don't have to be induced by environmental factors.

Writing in the same issue of Science in which the paper appears, Jennifer Couzin-Frankel describes the work:
In a paper published...this week in Science, Vogelstein and Cristian Tomasetti, who joined the biostatistics department at Hopkins in 2013, put forth a mathematical formula to explain the genesis of cancer. Here's how it works: Take the number of cells in an organ, identify what percentage of them are long-lived stem cells, and determine how many times the stem cells divide. With every division, there's a risk of a cancer-causing mutation in a daughter cell. Thus, Tomasetti and Vogelstein reasoned, the tissues that host the greatest number of stem cell divisions are those most vulnerable to cancer. When Tomasetti crunched the numbers and compared them with actual cancer statistics, he concluded that this theory explained two-thirds of all cancers.
Tomasetti and Vogelstein estimate the stochastic, or chance, effects "associated with the lifetime number of stem cell divisions within each tissue."  These effects can be mathematically distinguished from environmental risk factors.  They predicted "that there should be a strong, quantitative correlation between the lifetime number of divisions among a particular class of cells within each organ (stem cells) and the lifetime risk of cancer arising in that organ."  And this is what they found, and how they determined that two-thirds of all cancers are due to chance: changes that occur just by bad luck during DNA replication.
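The intuition behind the division-count argument can be sketched in a few lines (this is our own back-of-the-envelope illustration, not Tomasetti and Vogelstein's actual model, and the error rate and division counts below are made-up numbers): if each stem cell division carries some tiny probability u of a cancer-initiating copying error, the chance that at least one such error occurs over d lifetime divisions is 1 - (1 - u)^d, which grows with d.

```python
def p_at_least_one_error(u, d):
    """Probability of at least one copying error in d divisions,
    each carrying independent error probability u."""
    return 1 - (1 - u) ** d

# Hypothetical per-division probability of an initiating error:
u = 1e-9
# Hypothetical lifetime stem cell division counts for two tissues:
for tissue, divisions in [("low-turnover tissue", 1e8),
                          ("high-turnover tissue", 1e12)]:
    print(tissue, p_at_least_one_error(u, divisions))
```

With these illustrative numbers, the low-turnover tissue accumulates risk of a few percent while the high-turnover tissue is nearly certain to suffer at least one error, with no environmental mutagen required; this is the sense in which tissues hosting more stem cell divisions are more vulnerable.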

There are also life-history aspects of cell division that are generally consistent with this.  For example, neurons stop or at least slow down their division rates as the brain matures, while glial (supporting) cells keep dividing, and most brain cancers in adults are gliomas.  Retinoblastoma (eye cancer) risk is mainly at birth or early childhood, and retinal cells have stopped dividing after that.  But radiation treatment (an environmental mutagen) for RB has been found in the past, at least, to lead to later bone cancer, when bones are rapidly growing.

This has generated some attempts at rebuttal, which is not surprising, because many hopes as well as vested interests among geneticists and environmental epidemiologists are threatened by the finding.  But in fact, based on work and then-current ideas that we ourselves were involved in back in the 1970s and '80s, the current kerfuffle reflects culpable misunderstanding, ignoring of long-standing evidence, wishful thinking, and looking away from some facts that raise challenges even for the 'new' explanation of cancer causation.  We'll discuss that tomorrow.

Thursday, July 24, 2014

On the mythology of natural selection. Part VIII: Complex evolution without selection?

The default, and sometimes only, explanation for the origin of complex biological traits is an often barely-altered invocation of Darwin's notion of an all-seeing, force-like natural selection.  As we have said earlier in this series, when conditions are suitable, natural selection will follow essentially by definition of 'suitable'.  Natural selection is about the proliferation of better traits at the expense of lesser traits in a given environment.  But we have tried to suggest that there are many ways in which adaptive evolution can occur without that sort of selection.

In the processes we've discussed so far, we have largely treated chance as nothing more than a blurring factor overlaid on purely deterministic notions.  A persistent genotype-based advantage will, depending on how strong the difference is, often proliferate even in the face of chance (genetic drift).  Theoretically, an allele's fate depends (statistically) on the relative strength of its selective advantage over its competitors, and on the size of the population (which affects the chance aspects of reproduction).  In this situation, the success of the favored variant is, to some extent if not completely, like that of a steady force.  In fact, the models that show this largely assume a steady state (e.g., a similar selective advantage of the allele in question over long time periods).  It is important in this theory, which is mathematical and not in question, that selectively neutral or even somewhat harmful variants can also, if with lower probability, proliferate at the expense of competing variants.  Overall, however, one can fiddle with the details of such a model to make of drift a fly, but not a fatal flaw, in the classical selectionist ointment.
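The interplay of advantage and chance is easy to watch in a Wright-Fisher-style simulation (a standard textbook model of drift; the population size, advantage, and replicate count below are arbitrary choices of ours): a single new allele with a 1% advantage, strong by natural standards, is nonetheless lost by drift in the overwhelming majority of runs.

```python
import random

def fate_of_new_allele(N=100, s=0.01, reps=1000, max_gen=5000, seed=2):
    """Haploid Wright-Fisher sketch: one new copy of an allele with
    relative fitness 1+s in a population of N; each generation the next
    N individuals are drawn with selection-weighted probability.
    Returns the fraction of replicates in which the allele fixed."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(reps):
        count = 1
        for _ in range(max_gen):
            # Selection-weighted sampling probability of the new allele.
            p = count * (1 + s) / (N + count * s)
            count = sum(rng.random() < p for _ in range(N))
            if count == 0 or count == N:  # lost or fixed
                break
        if count == N:
            fixed += 1
    return fixed / reps

rate = fate_of_new_allele()
# Despite the steady 1% advantage, fixation is rare -- close to the
# classical ~2s result for a new mutant -- because chance dominates
# while the allele is still present in only a few copies.
print(rate)
```

This is the quantitative sense in which favored variants behave only 'to some extent' like a steady force, and in which neutral or even mildly harmful variants can also proliferate.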

But let us see how far we can go by considering that context changes all the time, sometimes more, sometimes less.  And if context changes all the time, selective value will change.  That makes things much less predictable over very long time periods, if these fitness differences are small.  We have to note this because, as entrenched as deterministic selection is in the theory of evolution, the idea that evolution creeps along at a usually literally imperceptibly slow pace is equally entrenched.

What if we assume that there is no natural selection relative to some trait we wish to follow? To what extent could chance (drift) alone lead to adaptive traits?  We illustrate this notion with an example of what we called receptor-mediated evolution in Chapter 10 of our book The Mermaid's Tale.   We schematically illustrate the evolution of a complex trait without any natural selection.  The legend is below the figure:

A.  Free-floating cells with environment-sensing surface proteins.  B, C. These experience mutation that make them able to adhere to copies of the same receptor on other cells (mutation-bearing cells are differentially shaded). This leads to aggregations of cells.  We can call it an ‘organism’ at some stage.  D, E.  Receptor-based aggregates sequester cells of specific mutational types or ‘species.’  The cells within a cluster can differentiate by gene expression, depending on whether they detect contact with the outer environment or not, forming specialized subsets (eventually leading to organs).  F.  A cluster can shed individual cells that will then divide to form new clusters of the same kind as their parents.  G. Mutations leading to the release of just the extracellular part of the receptor can bind to related cells elsewhere, triggering them to differentiate into new clusters—an early form of signaling.

Starting small and in some local area, there need be no serious competition for resources and hence no natural selection against (or for) the modified receptors that evolve by mutation in this little story.  Cells with like properties encounter each other randomly or locally because that's where they were formed.

This is of course schematic and hypothetical. But it shows, we think, that complexity can in principle arise slowly, element by element, without the need for competition for resources or overpopulation and so on.  What is required is that over time in some location, a variety of mutations arise among countless individuals (here, starting with cells).  Unless or until they do arise, evolution doesn't of course occur! This doesn't preclude the new evolving forms experiencing selection in some way or at some time, but the point is that it need not be a necessary part of the dynamics.  If chance combinations of non-harmful genotypes arise, and environments change, or a randomly arisen combination happens now to offer a viable function, can that continue to improve (very slowly) over time?

When, whether, where, or how often this sort of phenomenon accounts for adaptive change, very slowly and locally, is a matter to think about and perhaps there would be ways to test its credibility.  The slower evolution works, the greater is the plausibility that such phenomena can be a part of adaptive evolution--by drift and without the need for natural selection.

Drift?  Maybe--but is it, too, a mythological concept?
We have argued that chance in the form of what is called genetic drift must play a role in evolution. The course of evolution involves elements of competition but inevitably also of chance.  Chance has at least two relevant meanings here.

First, we might say that two foxes have somewhat different bodies, but are the same when it comes to catching rabbits.  The chance of a successful chase is the same.

Second,  genetic mutations in DNA sequence certainly happen sometimes by what is essentially chance: a cosmic ray from the sun zaps your DNA somewhere in a way totally unpredictable and, most important to evolution, that has no relationship to a trait that may affect the fitness of the victim.  When it comes to reproductive success, there is no selective difference between the new and competing existing genotypes.  For each genotype, the chance of reproducing is the same.

Now, how can we tell if the two foxes, or the two genotypes, have the 'same chance' of success?  What does 'same' mean here and how on earth could we possibly tell?

In this sense, one can never prove, even in principle, that two functional states are identical--that the difference is exactly zero.  By this criterion even drift becomes a mythological if not mystical notion.  Or you can take the position of a physicist who believes in deterministic laws of nature: then certainly at some level, even if you can't see it, there is a fitness difference.  But as we have seen repeatedly in this series, that is then an assumption, and it defines all evolutionary change as being due to natural selection: if it survived, it was selected for, end of story.

That is, as we have repeatedly said, a definition not a scientific statement.  But there's more than that.

Too small to detect, yet treated as if so important?
Let us suppose for the sake of argument that selection of a purely deterministic sort (steady, fixed selective difference between alternatives, no chance element, etc.) is taking place.   Let's say one state has a 1% advantage over its competitor.  Does this sound small?  Well, for evolutionarily relevant natural selection that would be considered quite unusually strong (remember, we're not discussing artificial selection here, or selection such as for antibiotic resistance, which can be extremely strong).  Most selection in real-life Nature is probably at least ten times weaker--differences on the order of one part in a thousand or less.

But let's stick with the strong 1% advantage.  That means that you have 101 offspring to my mere 100.  Here again we're letting it be deterministic, not just a long-term average over a species' populations and countless generations.  Such a difference would be exceedingly difficult, or sometimes statistically impossible, to document from actual samples of completed fitness in a natural population.  Even if such a difference persisted for the thousands or more generations required for a major trait adaptation, it could not reliably be estimated at any given time, and that means at every given time (because even in generations when you did detect it, by some statistical criterion, you could not reliably know that this wasn't a fluke of sampling).
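The difficulty can be put in numbers with a standard normal-approximation power calculation (our own back-of-the-envelope framing: treating the 1% advantage as reproductive 'success' probabilities of 0.5025 versus 0.5, and asking for a conventional 5% two-sided test with 80% power):

```python
from math import ceil

def required_sample_size(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Crude per-group sample size to detect a difference between two
    proportions: n = (z_a + z_b)^2 * 2 * p(1-p) / delta^2, using the
    pooled proportion (normal approximation)."""
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = ((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)) / delta ** 2
    return ceil(n)

# A 1% fitness advantage framed as p = 0.5025 vs p = 0.5:
print(required_sample_size(0.5025, 0.5))
```

The answer comes out in the hundreds of thousands of measured individuals per group, per generation; for the one-in-a-thousand differences that are probably more typical in Nature, the required samples are a hundred times larger still.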

This in no way implies that slow, even steady and deterministic selection, the usual image, does not occur.  But it does mean that the image of great advantage in the raw competition of Nature is an exaggeration of large proportions.  It portrays the carnage in the backyard as mainly about adaptive selection, rather than mainly just plain carnage of everyone seeking its dinner.

This has serious implications for humans, and for those who seek Darwinian explanations for every little human trait, physical or especially behavioral.  The fact is that adaptive differences are generally so trivial that they have no real import at any given time.   That is, they should not be used as tools to justify discrimination or inequality and so on.  Whether and how policies based on different traits, talents, and the like should be implemented cannot usually be justified on evolutionary grounds.  Evolutionary grounds are about net reproductive success, not human cultural values (in another post we noted that scenarios contrary to the usual ones are easy to construct!).

Tempering excessive invocation of natural selection is an important reason that we decided to do this long series about the nature of evolutionary adaptations and change.  That is because we see a sometimes rather fervent eagerness to revive evolutionary value judgments about people, as individuals or as labeled groups.

Of course, many genetic variants lead to serious disease and clearly may impair reproductive success in a huge way.  But we treat disease for its own sake, not because of its evolutionary import.  Mixing sociological judgments with evolutionary theory is to dabble in the Devil's game, as history has shown.

If a trait is so adaptively important, why is so much variation still around?
If a trait were being refined or fine-tuned by selection, with selective differences strong enough to make a contemporary mountain rather than a molehill of it, why is there still so much variation?  Why hasn't selection made everyone almost alike?

For example, if intelligence (as in IQ scores, say) were so vital to the human place in Nature, why is there such a range between the very smart and the very not-so?  Here, we are not referring to pathological mental impairment.

The answer is either that trait differences aren't actually that relevant to evolution--that is, they make little difference to net reproductive success--or that there is a balance between lowered reproduction due to selection, the blurring effects of chance, and the input of new variation by mutation and recombination. The latter would probably be the preferred explanation for most theoretical population geneticists.  But if true, it implies that selection really is not that strong after all, possibly because so many genes contribute to the trait that individual differences simply cannot be tightly purged by selection.  Maybe this is in part because the many genes each have other roles to play as well, and can't be purged too tightly without affecting those other traits.

Further, even if selection of a classical deterministic kind is at work, it could be that only the very fastest relative to the slowest fox has an advantage.  That will move the average chasing speed of foxes towards being faster, but other than the rare outliers, there need be no fitness-related differences.  This again is a very different idea from that of eagle-eyed, ever-vigilant, fine-tuned selection.

One should also realize that it isn't that foxes today are somehow more 'fit' than their distant ancestors were--that they struggled through eons of doing poorly to evolve into being OK today.  At every age they were, as a population, perfectly fit for that particular time, as one would have said observing them then.

There are many issues here, but the bottom line is that at the level of individual genes, and probably at the trait level itself, selection is just not very precise and/or that the species does perfectly well with its broad trait variation.  Again, too big of a deal should not be made, on evolutionary grounds, for the range of differences we see.

In sum
We risk reductio ad absurdum by taking too strong a stand in any direction when it comes to the evolution of complex traits.  In any discussion of evolutionary factors, call them what you will, we face a major challenge in determining the reason for evolutionary success--or even, one might say, the meaning of 'reason' in this context.  We are stuck in a profound way with statistical statements based on empirical and inherently limited samples, imperfect measurements, unobservable past events, and essentially subjective testing and decision-making criteria (a subject we've discussed before).

Tiny differences, be they 'due' to chance or some very weak force, can be imperceptible by such criteria but can accumulate.  They can lead over eons to something useful, and even if now and then nudged by other forms of selection, differential proliferation can occur essentially by what is reasonable to call chance.

Gene duplication is a form of drift that in principle can create redundancy, which can buffer the organism against future mutations in one of the copies, or allow new function to arise in one of them.  Most of our genomes have arisen, from early days, via duplication and rearrangement of existing bits of DNA (exon shuffling, inexact recombination, translocations, transpositions, and the like).  Even standard genome evolutionary theory and explanations recognize this.  Relative to future function, gene duplication is a random event, like point mutation.  But duplication of existing functional elements provides a potential source of new function, usually related to current function--some of which can serve as fortunate 'pre-adaptations' for the organism's niche at that or later times.

Even more than that, as I have recently discussed in one of my regular column installments in Evolutionary Anthropology*** (with references to others' work), the chance that a random DNA sequence contains an open reading frame long enough to code for a protein of a respectable 50 or more amino acids is not trivial, if mutation or translocation or duplication generates a promoter sequence.  All nucleotide triplets can be used as codons, only 3 of the 64 possible codons are STOPs, and there are 6 possible reading frames.  Other elements (polyA site, ATG, etc.) may also be needed, but genomes are big, organism numbers are huge, and earth history is very, very long.  What is transcribed need not be translated to be functional, as in the plethora of noncoding RNAs.  And the protein need not have a function right away, so long as it doesn't get in the way of what a cell is doing.
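The arithmetic behind this claim is easy to check (our own rough calculation, treating the six reading frames as independent, which is only an approximation since the frames overlap): with 3 STOPs among 64 codons, a run of 50 random codons avoids a premature STOP in one given frame with probability (61/64)^50.

```python
# Chance that 50 consecutive random codons contain no STOP codon,
# in one fixed reading frame (61 of the 64 codons are non-STOP):
p_no_stop_one_frame = (61 / 64) ** 50

# With 6 possible reading frames (3 per strand), the chance that at
# least one frame stays open is higher still (frames treated as
# independent here, which overstates it somewhat).
p_some_open_frame = 1 - (1 - p_no_stop_one_frame) ** 6

print(p_no_stop_one_frame, p_some_open_frame)
```

A single frame stays open roughly 9% of the time, so over genome-sized stretches of DNA, vast numbers of organisms, and deep time, random open reading frames of respectable length are anything but rare.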

A function can arise later.  Or not.  In the long history of life and the diverse functions, choices, and opportunities of species, the various forms of adaptive response discussed in this series may apply even to essentially randomly arisen genes.  When we dismiss anything but classical natural selection as our explanation, we close off other possible accounts for traits that we see here today.

We hope, at least, to have provided some food for thought on this fundamental aspect of causation in life and its genomes.




For discussions of ways chance and selection can mold what we see in genomes, work by Michael Lynch makes good reading (e.g., "The frailty of adaptive hypotheses for the origins of organismal complexity", PNAS, 2007, and his book The Origin of Genome Architecture); of course, he may not agree with what we say here.



***Weiss, K.  Little Orphan's Nanny: Where do genes come from and who takes care of them?  Evol. Anthropol. 22: 4-8, 2013 (paywalled--email me for pdf)

Thursday, September 27, 2012

I am the Particle Man: Observer effect on family probability? (Part 3 of 3)

Since Monday and Tuesday we've been trying to answer what seems like a very simple question: What are the odds of having different sex ratios in a five-kid family? Like, what are the odds that Ann and Mitt Romney had those five boys?
We started investigating this question because I was viscerally annoyed with the simple calculation that 1/32 is the probability that a family of five will be all girls or all boys. Those odds make it sound rare, when my gut said it should be just as likely as any other family of five.

If you haven't read them yet, please see Monday's and Tuesday's posts before starting here. They're the start of the journey I've been chronicling, which wraps up today.

We stopped on Tuesday with a change of strategy in estimating the odds of different family compositions: take my long list of all 32 possible series of boy/girl births in a five-kid family and add up the ways to achieve the six different family compositions. Here are our results:

What are the odds that you'll get...
5 girls, 0 boys? 1/32
5 boys, 0 girls? 1/32
4 girls, 1 boy?  5/32 (there are 5 possible series out of 32 that make up this boy/girl ratio in a family)
4 boys, 1 girl? 5/32
3 girls, 2 boys? 10/32 (there are 10 possible series out of 32 that make up this boy/girl ratio in a family)
3 boys, 2 girls? 10/32
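If you'd rather have a computer do Tuesday's list-making, a few lines of Python (my sketch, not part of the original posts) enumerate all 32 series and tally them:

```python
from itertools import product
from collections import Counter

# Enumerate all 2**5 = 32 possible birth orders and tally girls per family.
counts = Counter(seq.count('g') for seq in product('gb', repeat=5))

for girls in range(5, -1, -1):
    print(f"{girls} girls, {5 - girls} boys: {counts[girls]}/32")
```

The tallies come out exactly as above: 1/32 for all girls, 5/32 for four girls, 10/32 for three girls, and the mirror images for boys.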

(Psst. I googled how to calculate probabilities and found this website and DINGALING! they're actually using my example. And here's a nice site showing how to work with a binomial equation rather than list all the possible 32 outcomes like I did Tuesday.)

This sort of thinking about probabilities should remind you of how the odds of the outcomes of rolling the dice are not uniform across all numbers. Your best bet is a 6, 7, or 8 because there are more ways to get those three numbers than the others.


(The following list was edited thanks to a very nice comment, February 5, 2015) 

to roll a ...
2 ... there is 1 way: 1 + 1
3 ... there are 2 ways: 2 + 1; 1 + 2
4 ... there are 3 ways: 3 + 1; 1 + 3; 2 + 2
5 ... there are 4 ways: 3 + 2; 2 + 3; 4 + 1; 1 + 4
6 ... there are 5 ways: 3 + 3; 2 + 4; 4 + 2; 5 + 1; 1 + 5
7 ... there are 6 ways: 6 + 1; 1 + 6; 5 + 2; 2 + 5; 4 + 3; 3 + 4
8 ... there are 5 ways: 4 + 4; 5 + 3; 3 + 5; 6 + 2; 2 + 6
9 ... there are 4 ways: 3 + 6; 6 + 3; 5 + 4; 4 + 5
10 ... there are 3 ways: 5 + 5; 6 + 4; 4 + 6
11 ... there are 2 ways: 5 + 6; 6 + 5
12 ... there is 1 way: 6 + 6

(Psst. If you still think 7 is lucky for rolling the dice, then you should have more of a think about probability.)
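The same enumeration trick verifies the dice table (again, my sketch):

```python
from itertools import product
from collections import Counter

# Count the ways two dice can land on each total, 2 through 12.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total}: {ways[total]} way(s) out of 36")
```

Six ways to roll a 7, five each for 6 and 8, and only one way each for 2 and 12, out of 36 equally likely rolls.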

And just like 6, 7, and 8 when rolling the dice, having three boys and two girls (or three girls and two boys) is the "luckiest"--that is, the most probable--sex ratio in a family of five children.

How do we know which of the two sets of probabilities that I calculated--Tuesday's or today's--is correct?
All girls, no boys:         1/6 or 1/32?     (17% or 3%)
All boys, no girls:        1/6 or 1/32?     (17% or 3%)
Four girls, one boy:     1/6 or 5/32?     (17% or 16%)
Four boys, one girl:     1/6 or 5/32?     (17% or 16%)
Three girls, two boys:  1/6 or 10/32?   (17% or 31%)
Three boys, two girls:  1/6 or 10/32?   (17% or 31%)

I see very clearly why our second method (counting out of 32) is superior to our first, which was to incorrectly divvy the odds up into sixths. That is, I can see clearly why the odds of having five girls are still 1/32 and not 1/6. There are so many more ways to make a family of five with four girls, or three girls, or two girls than to make one with five girls, so you can't possibly have evenly distributed 1/6 odds for all those types of five-child families.

Initiate mind-blowing sequence.
But when you take the long view, 1/6 (or at least higher odds than 1/32) for a streak of five girls still seems not so crazy.

After all, the odds of having five children of all the same sex are only the lowest, the rarest, because we've arbitrarily decided that our family in question maxes out at five!

Would we find those same low odds of 1/32 for five girls in a row if the family had six kids--having more opportunities to have streaks of five girls during that span?

That's (a + b)^6, and if you scratch it out on a piece of paper you don't even need to expand the binomial. The odds of having six straight girls are 1/64.  Same for any particular series of six births (all of which add up to a total of 64 different boy/girl series for six kids).

And then by just sketching or scribbling (but if you're fancy, you can also just use the binomial) you can see how you can get only three series (gggggg; gggggb; bggggg) to have five girls in a row to occur in a family with six births.

That means the odds of having a streak of five girls in a six child family is 3/64 which is 4.6875% (compared to 1/32 or 3.125% in a five child family).

So the odds are slightly larger in a bigger family.

Wait. Did I just do that right?

Let's try a family of seven to make sure I did.

Here are all possible streaks of five girls in a family of seven...
ggggggg
ggggggb
bgggggg
bbggggg
bgggggb
gggggbb
gggggbg
gbggggg

The odds of having five girls in a row in a family of seven =  8/128 = 6.25%

Okay, with a bigger family, the odds are even larger.

What about a family of eight? The odds of having five girls in a row in a family of eight =   20/256 = 7.8%
(Trust me... I scratched it out, though my contacts nearly fogged up before I found all 20.)

Okay, yes. The odds of having a streak of five girls increase as the size of the family increases.
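Brute force beats fogged-up contacts here. This little sketch (mine, not from the post) enumerates every birth order for families of five through eight and counts the ones containing a five-girl streak:

```python
from itertools import product

def streak_count(family_size, run_len=5):
    """Count birth orders of a given size containing run_len girls in a row."""
    hits = sum('g' * run_len in ''.join(seq)
               for seq in product('gb', repeat=family_size))
    return hits, 2 ** family_size

for n in range(5, 9):
    hits, total = streak_count(n)
    print(f"family of {n}: {hits}/{total} = {hits / total:.2%}")
```

It confirms 1/32, 3/64, and 8/128, and for a family of eight it turns up 20 qualifying series out of 256.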

Wait. What?! How do odds change? Odds are odds?

Instead of going up in scale again to check, making calculations even harder, let's go down in scale to check our math. We already know from Tuesday that the odds of a five-kid family having a streak of four girls is 3/32 (ggggg; ggggb; bgggg)  = 9%.

Okay, now what about in a four-kid family? The odds of having four girls in a four-kid family are 1/16 = 6%.

WHAT?! Just by making a fifth baby, you've just seriously upped your chances of having a streak of  four girls. Your odds go from 6% if you max out at four kids to 9% if you max out at five kids. That sounds reasonable, but...

This means your odds of having a streak of four girls or five girls (or anything!) depend on what DIDN'T YET HAPPEN IN THE FUTURE.


I'm sorry. Hold on. Time out for a sec. My brain is literally inside out right now.


Am I seriously figuring out now--Today. This minute.--that probability is vulnerable to what hasn't yet happened in the future? And that the present can change past probabilities?

That sounds so familiar. That idea. But never do I think I've ever come to it by myself.

Until now I think it was always just a sentiment that Deepak Chopra hugged into Oprah, who gifted it to Martha Stewart, who baked it into a lemon zest fortune cookie.*

So predicting or estimating frequencies can change by the very nature of the present? Very interesting.

Doesn't it sound like we're crossing streams with the whole quantum mechanics pickle about changing a particle's state the moment it's observed? (and here)

Are people just particles?!?!

(yes)

Am I on psychedelic drugs and where can you get some too? 


This shouldn't be so bleeping mind-blowing should it?

Unless... unless... As my repulsed reaction to a snappy "1/32" indicated at the outset back on Monday: small-scale probabilities are different from large-scale ones. Probabilities look different the bigger the frame you consider.

And it's no secret that people who think evolutionarily think big. We're transcending space and time constantly. What? We are. You're welcome to join us. It's fun here. No vomit comets necessary either.

So if we approach 100, 1,000, or say... um... just to pull a random number from the air... SEVEN BILLION births, we should expect to have a much higher than 1/32 chance in finding a streak of five girls.

True. Nobody's making a family of seven billion children. So the question is, do we treat each family as defined by a finite probability or do we see births and families in our species as part of one big series with vastly different probabilities at that level than at the level of the family?

If it's the latter, we should expect what, exactly? Greater odds than 1/32 for having five girls in a row that's for sure ... Greater than 19/256 that's for sure ... The odds are x (where x = ways to make 5+ girls in a row out of 7 billion) divided by 7 billion and so they're going to be greater than 1/32 by a long shot! It may even be close to our earlier totally gauche calculation on Tuesday of 1/6 or it could be even higher!**

So why do we even calculate odds at the family unit level? Just to practice our algebra? Are they really as meaningless as my gut was screaming out in Monday's post?

No no no. I know why we calculate them in our math workbooks and our homework. It's not just algebra practice; these are hypotheses we can test. We can use these expectations to see whether some factor is skewing the outcomes of some families--perhaps there is something biochemical in the babymaking process that results in one kind of offspring for some parents. We'd have to look at families (to account for genes, etc.) or at clusters of people living in the same environment (to account for bio-enviro interactions). If we find that within those sorts of sample populations people are having an unexpectedly high number of all-girl families (i.e., significantly more than 1/32 of five-child families are girl-only), then we might suspect that these folks don't face 50/50 boy/girl odds each time they make a baby, and that might entice us to investigate further into their genes or into their ground water, etc.
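The hypothesis-testing idea can be made concrete with an exact binomial tail (my sketch; the survey numbers below are made up for illustration):

```python
from math import comb

def binom_tail(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p): the chance of seeing k or more
    all-girl families among m five-kid families if p is the true rate."""
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k, m + 1))

# Suppose we survey 200 five-kid families and find 15 that are all girls;
# under the 1/32 expectation we'd expect only about 6.
p_value = binom_tail(15, 200, 1 / 32)
print(f"P(15 or more all-girl families by chance) ~ {p_value:.4f}")
```

A tail probability that small would be a real hint that something (genes, ground water, whatever) is tilting the 50/50 odds for those families.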

But back to these issues about small versus large perspectives that we've uncovered here...

In general, we might find that five-girl families make up 1/32 of all five-kid families in our species, but if you look at a hospital register, for example, we'll find streaks of five girls much, much more frequently than 1/32 (3%) of the time.

There is something misleading about the way we calculate probability in a closed and narrow view of the world. And there is something subtly different about thinking probabilistically about a series of independent events and thinking probabilistically about their outcomes, instead, especially when many separate series can have the same outcomes (e.g.  rolling a 7 with the dice or having 3 boys and 2 girls).
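Here's a quick simulation of that contrast (my toy setup: registers of 100 births rather than seven billion, and 50/50 odds per birth):

```python
import random

random.seed(42)

def has_streak(births, run_len=5):
    """True if the sequence of births contains run_len girls in a row."""
    run = 0
    for is_girl in births:
        run = run + 1 if is_girl else 0
        if run >= run_len:
            return True
    return False

# How often does a register of 100 consecutive births contain a 5-girl streak?
trials = 20_000
hits = sum(has_streak([random.random() < 0.5 for _ in range(100)])
           for _ in range(trials))
print(f"P(a 100-birth register contains 5 girls in a row) ~ {hits / trials:.2f}")
# ...versus 1/32, about 0.03, for one five-kid family.
</n```

The long series contains far more overlapping chances for a streak, so most registers of even modest length hold at least one.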

I think I've located my trouble with probability. It's just a small one with having a large denominator. You know, something pretty easily surmounted--it's just grasping silly little old infinity's all.

O! Maybe later I'll see if I can dig up what the demographic data say. I can ask: How do the frequencies of sex-ratios in human families fit these tight little closed and narrow probabilities/hypotheses? And do birth registers in hospitals show something much different, probably larger? I shall hope to find out. Not because it's a mystery; I already believe I know the answer. But because I can't simply believe to know an answer if there is a real way to see one, and there is a way in this case, so I should go and see in order to believe.

Thanks for reading. Hope our little journey back in time to the fundamentals of statistics blew your mind even a fraction of the way it blew mine!

Further humbling questions and related thoughts are increasingly probable to appear in future posts....



***

*Which reminds me to share this video of the furry little Buddha who lives in our house: https://www.facebook.com/photo.php?v=10152120845975584

**Anybody know how to calculate this? Is a super computer necessary? Are there shortcuts for working with such large numbers--something like Pascal's Triangle perhaps?

Monday, September 24, 2012

Designer Babies, Probabilities, and Problem Abilities. (Part 1 of 3)

As is tradition in the blogging world, I'd like to note that yesterday marked the start of my fourth year writing here at The Mermaid's Tale and I couldn't be more grateful for all of the wonderful things this experience has brought to me over the past three years. Thank you Ken and Anne and thank you readers.

After that preamble, today's post requires its own preamble as well.

I'm writing a popular science book about subjects that are so far afield that I will most certainly embarrass myself. But it's for a good cause. It's subject matter that anyone who's interested in human evolution (and sex) needs to cover, and no one has. Enter my dumbass. Enter my concept of "reproductive consciousness."

Anyway, in further preparation (as if 35 years hasn't been enough) for the droplets of flopsweat sure to pour when my book is published, here's a peek at Yours Truly wrapping my head around a fundamental concept that I take for granted every day and thought I'd gotten a grip on back in high school, if not before:

Very simple probabilities.

It's as simple as flipping a coin.
Oh really?! Please enlighten me, world. 
 
Intro, context, yadda
I had just asked if anyone at our lunch table knew of a decent TV news program to watch while exercising in the mornings, since I was--hours later--still traumatized from watching the hosts of Good Morning America read Hannah Montana's existential tweets with their journo-sylLAHbles.

Needless to say, we all decided that the newspaper and NPR are the only morning news options.

Naturally this led me to remember a quality story and discussion I'd heard from the BBC on NPR about the latest in "designer babies," which is to say the latest methods for controlling the genotypes of offspring and the ethical issues and debates over it all. This particular story was about taking healthy mitochondria from one female to replace harmfully mutated mitochondria in a mother who wants to have healthy babies but with her own nuclear DNA. Or, alternatively, taking a donor embryo from a woman with healthy mitochondria and replacing the nuclear DNA with that of the parents who are trying to conceive a healthy baby.

And the controversy about that led me to share what I'd learned in another story recently on Slate--a practice that seems more deserving of the criticisms described in the BBC story. In this instance, the harmful mutation is an entire chromosome: the unwanted Y or X, depending on parental desire. And biotech has advanced far past separating the heavy slow swimmers (X sperm) from the fast nimble ones (Y) and only allowing the one type to enter the race to the egg. For those who are desperate to control the outcome, you can now test the embryos directly and implant only the ones with the desired pair of sex chromosomes.


Now if you're one of those people like in that Slate story who's just dying to have a girl so you can do makeup and hair with her, I hope it's obvious that although you may be closer to your dreams with this sort of genetic control, you're still not guaranteed to hit the baby jackpot simply by getting one with an XX.  There are lots of XX humans who would rather curl up and die than curl their eyelashes or dye their hair.  And, on the other hand, there are lots of XY who fancy it. There is nothing on the X or Y chromosome that anyone has linked definitively to the proclivity for behaving in these sorts of gendered and cultured ways. [Except indirectly of course, by priming an individual to be gendered and cultured according to whether they have boy or girl anatomy.]

Further, even if there is a genetic component involved in affinity for doing hair and makeup, this method for choosing baby sex isn't typing or sequencing any genes! It's just dealing with the whole chromosomes, and even if it was typing or sequencing the genes on the chromosomes for such behaviors (if there are any such genes), those genes would still only indicate a probabilistic phenotypic outcome, not a determined, certain one. Other parts of the genome as well as the epigenome, microbiome, environment in utero and beyond, including interactions with humans and their behaviors... these factors and more contribute to these sorts of complex behavioral phenotypes.

So I hope that parents who are putting $18,000 towards making sure that their child has XX chromosomes know that the genotype is only probabilistically linked to certain parentally-desired phenotypes. That's not just a large financial gamble (well, for most Americans), but it's also a huge gamble with someone's life, someone who's sitting completely vulnerably and literally in your hands for the earliest years. That gamble piles on top of the risks (to parents and the new human) already inherent in making a new life in the first place. No matter how much money you throw at biology, you can never contain all the probability. But I guess if you have 18,000 bucks you can get the probability to lean towards your dreams.

If you sensed a bit of prickliness here, it's for two reasons: The biological complexities to be sure, but also my lack of empathy for these sex-obsessed parents. It's hard for me to imagine creating and clinging to such specific dreams about uncertain biological and cultural outcomes rather than simply being hopeful that you'll be pleasantly surprised and then mostly happy with whatever uncertainly happens and unfolds in life.

But that's not even what got me to write this post, if you can believe it
While on the topic of choosing your baby's sex, a colleague who's one of five girls shared a good question: Within couples, is there a biological basis for a sex bias? She wondered if there could be some reason that a child from a particular couple might not have a 50/50 shot at being a boy or a girl. She wondered if something about a couple’s chemistry skewed the odds towards one sex and that this could explain why some families have a biased sex ratio, like hers with five girls.

That’s a good question, and the first thing you'd have to do before investigating is calculate the probability of having five kids who are all girls. That way you could test your estimated, expected frequency against the frequency that you later go out and observe in nature.

Okay, easy.

Easy... Ha. Ha. Ha. If you could listen inside my skull, you'd hear me telling my younger math-savvier self two little words that rhyme with Chekhov.

As we began down this thought-experiment rabbit hole--mentioning first, of course, how there’s 50/50 odds at each birth--a colleague quickly mentally calculated that the odds are 1 in 32 for having five girls and no boys.

Yes: (1/2)^5  (i.e. 50% times itself five times) gives you a 1 in 32 chance (roughly 3%) of having five girls.

But this sounds way too rare. If odds are 50/50 each time you make a baby and if events are independent, that is the odds do not change based on prior events, then how could the odds of having five girls be so small? How could it be any smaller than having any other kind of family? The odds for all families should be the same.*

I was simultaneously reminded of so many correct answers I’d rotely written in math and stats courses, while also feeling completely repulsed by the theory. What kind of meaning does 1/32 contain? It seems like nothing! Each birth is a discrete event. No outcome of prior births has anything to do with the 50/50 odds that each new birth will be a girl. So how does multiplying their odds together in a string tell you anything meaningful, except that we're so clever we can multiply fractions together and come up with a smaller chance for a series of events than for a single event? Isn’t this basically biologically meaningless information? Whenever I feel this vehemently frustrated I should probably figure out why, and I may as well drag you all along for the ride in case you can empathize, or in case anything I uncover helps you too.

To be continued tomorrow...(linked here)

And it's already written, so don't nobody go and post any answers below in the comments, okay?

(And as we'll see, they are. Sort of.)

Thursday, March 22, 2012

Random events result in order -- how?

Development is ultimately very organized and predictable -- children look like their parents, legs are generally where they belong, and a lion never gives birth to a whale -- but yet another paper describes the randomness of the processes at the cellular level.  How can this be?

The paper is in the April BioEssays: "Genes at work in random bouts", by Alexey Golubev.  Golubev says that things that go on inside cells are generally thought to be determined by the interaction of different molecules, which is itself determined by the concentration of those molecules in the cell.  Ordinary differential equations (ODEs) describing all this can be written, and, Golubev says, "ODE solutions may be consistent with oscillatory and/or switch-like changes in molecule levels and, by inference, in cell conditions."  This begins to make intracellular processes sound determined and law-like.

But, the article is basically about the stochastic (random, or probabilistic) events occurring in cells that affect their gene expression patterns, and hence the cycle between cell divisions or the time it takes the cell to express the genes related to its particular tissue.  This variation, the author notes, makes stem cells--cells not committed to just one cell type--plastic and flexible. 

But, as he points out, the idea of molecular concentrations is only true at the level of populations of cells, not in single cells themselves.  There's a lot of randomness in terms of what's going on in single cells, in cell differentiation and cell proliferation, particularly with respect to when genes are turned on or off, and thus which proteins are available, and what happens when. 

The question becomes, then, given all this stochasticity in cellular activity, how development is so organized.  The apparent problem is that once one reaction has taken place, it affects the next reaction, and this includes hierarchical changes such as changes in gene expression in the cell.  Thus, the cell is not just a mix of things, each in large numbers, that will 'even out' over time.  Differences that can be occasioned by chance in a cell can add up. Of course, if the cell continues to detect the same external conditions, its response may adjust so that things do even out.  But it doesn't need to happen. 

On the other hand, most tissues in most organisms are comprised of many cells of the same type.  Each may be experiencing stochastic changes, but their tissue-specific behavior may usually 'even out' because the variation will be slight and in different directions among the cells, so that on average they are doing the same, appropriate, thing.  In unusual circumstances, if this doesn't happen, the organism may be very different from its peers....or  it may not survive.

This perhaps reflects a fundamental property of populations, known as the 'law of large numbers'.  The theory behind this (as Ken was just realizing from reading Ian Hacking's The Taming of Chance, Cambridge University Press, 1990) comes from the study of populations of differing individuals, whose aggregate behaviors have regular distributions: the 'normal' or bell-shaped--or at least orderly--distributions of stature, incomes, and so many other things.  In another common phrase, they have 'central tendencies'.  'Normal' meant that most were near the norm.  This statistical idea was worked out over the 18th and 19th centuries, and raises interesting questions about causation.  Hacking's book shows how people had to learn that causation was not about precisely fore-ordained laws, but about probabilities, and that this applied to society.
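A tiny simulation illustrates the law of large numbers (my illustrative setup: "statures" drawn from a made-up normal distribution with mean 170 cm):

```python
import random

random.seed(0)

# Any single draw is unpredictable, but the average over a large aggregate
# settles near a stable central tendency -- the 'norm'.
def sample_mean(n):
    return sum(random.gauss(170, 10) for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(f"mean of {n:>6} draws: {sample_mean(n):.2f}")
```

The larger the sample, the closer its mean hugs 170, even though no individual draw is any more predictable than before.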

The classic cases had to do with things like suicide.  One can't predict who will commit suicide in a given year, or by what means.  But the numbers, and the number who do it by each method, are very similar from year to year in a given population.  Likewise, the life expectancy is an average, and nobody lives exactly that long: some die younger, some older.  So are many social facts like political affiliations and so on.

Why is this?  It is the net result of many different individuals, each with slightly varying characteristics.  There have been many explanations for this, beyond the scope of this post, but in essence there are many contributing factors of diverse kinds, mostly unknown, such that a few individuals are exposed to many of them, others to only a few, and most of us to some 'average' amount.  The fraction exposed to many such factors is the fraction of individuals who are taller, more intelligent, ...., or who commit suicide.

The law of large numbers is a statistical fact that can be proven mathematically under rather general conditions.  This leads to central tendencies.  That is why population statistics took on a central role in social sciences, where often the underlying causal factors and their specific effects are unknown or hard to estimate accurately.  Social sciences can 'understand' society--at least predict some things about it--without understanding causation in the strict sense.  And in some situations, these things don't work very well--economics is one, in which stability of population outcomes occasionally, at least, takes a quick left-turn.

Probably the same applies to the populations of cells that make up a tissue.  If so, the abundance of probabilistic events that makes each cell different can still yield a central tendency, so that the kidney filters blood in similar ways from person to person, and so on.  Because of local differences among cells, different genotypes, and different life-experiences, kidney function differs among people.  Some are at the extremes, and we call that 'disease', but most are roughly near the norm.

Biologists routinely speak of chance, but often act as if they believe that genes 'determine' the organisms the way a program determines what a computer does.  They know about variation in populations, and how, for example, polygenic traits like stature or blood pressure (the darlings of the GWAS world) vary, even if they are driven to enumerate all the underlying causes that vary among individuals in the population.  In a sense, the population concept applied to tissue is of the same sort, and provides another source of variation between genotype and trait.

Wednesday, September 14, 2011

The Individual and the Group

Doctors treat patients one by one, but public health is about the whole population, or at least subsets of it treated as aggregates.  This is something we've touched on before, including here this week.   Generally, the latter is a higher-level abstraction of risk relative to individuals.  But doctors are not particularly trained in how to apply aggregate data to individuals, or at least there are important, often subtle differences between the two perspectives.  An article at the Huffington Post nicely discusses this from a physician's point of view.

But the same issues may apply to evolution.  A given trait, such as presence of some condition, like brilliance of feather color, or some level of a quantitative variable like blood pressure or stature, may have a net or average reproductive success, and such success rates may vary by the value of the trait.  The success rate can be a matter of chance or may be due to systematic functional effects of the trait value on reproductive success; the latter case is what we mean by 'natural selection'.  We view species today as aggregates, but each individual has its own trait value, so the distinction between population and individual is important on both contemporary and evolutionary time scales, and in similar ways.

In evolution, our models generally assess the relative fitness of a given trait against the variation in the rest of the population.  This is because the frequency of trait values in the future depends on what is transferred from this generation to the next.  But in public health, things are somewhat different.  Judgments, treatment decisions, and so on are made on individuals and for individuals, without regard to their effect on the future or on the whole population.  Your doctor treats your tuberculosis, not the population's.

However, in both cases there are risks, or probabilities, involved.  In evolution, what is the probability of having a particular number of children for someone with a given trait?  Note that we refer to traits, not genes -- the effect on contributing genes is an indirect result of what happens to individuals who bear them.  In medicine it is the risk of getting a particular disease for someone with a given level of exposure to some risk factor, or of a given response to therapy.

But while we need individual predictions, and evolution selects on individuals, risks are estimated from populations.  So things are a bit circular, or at least not straightforward.  This provides much to think about.  Risks that are large are easy and behave just as what you were taught in Statistics 101 said they would.  But risks that are small are not different from chance, or from other small risks, and that is not so easy to deal with. Unfortunately, small risks are often what we most have to deal with both in medicine and evolution.

What about risks that are hard to detect, or perhaps even impossible, such as the risk of cancer from dental x-rays?  Should you avoid such exposures because radiation is clearly proven, at the cell level, to cause mutations and mutations can cause cancer?  Who decides what an acceptable risk is, or the statistical criteria for saying that there is, in fact, a risk?  We usually use significance tests for this, but they're subjective judgments.

In evolutionary terms, change due to selection accumulates over generations, but so do chance changes.  If selective differences between contributing genotypes are very small, chance (genetic drift) can be a major force for change.

And what about even large risks?  We've mused on this before.  If the risk that someone with your cholesterol level will have a stroke is, say 15%, does that mean a die is going to be rolled with 6 sides, and if it comes up 6 you're a goner?  Or does it mean that 15% of people like you are certain to have a stroke, and the others just as certain not to (but that we don't know how to tell who's who)?  Public health, or aggregate perspectives in evolutionary biology, don't concern themselves about this, even though of course they know that everything happens to individuals.
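A small simulation (the 15% figure from the paragraph above; everything else is my illustrative setup) shows why aggregate data alone can't distinguish the two readings:

```python
import random

random.seed(7)
n = 100_000

# Model 1: every person carries the same 15% individual risk of stroke.
strokes_uniform = sum(random.random() < 0.15 for _ in range(n))

# Model 2: a hidden 15% of people are certain to have a stroke; the rest never will.
doomed = set(random.sample(range(n), k=round(0.15 * n)))
strokes_hidden = len(doomed)

print(f"uniform-risk model:    {strokes_uniform / n:.3f}")
print(f"hidden-subgroup model: {strokes_hidden / n:.3f}")
# Both sit near 0.15 at the population level.
```

Both models produce the same population-level rate, which is exactly why the aggregate perspective can sidestep the question even though everything happens to individuals.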

But doctors and those who are arguing that some particular genetic variation is important in evolution have to think about the individual level.

Monday, September 12, 2011

Other perspectives on causation--genetic or otherwise

There's a thought-provoking article, "Epidemiology, epigenetics and the 'Gloomy Prospect': embracing randomness in population health research and practice," in the August issue of the International Journal of Epidemiology (IJE) by George Davey Smith, one of the smartest, most thoughtful -- not to mention prolific -- people in the field of epidemiology these days.  He's in the School of Social and Community Medicine at the University of Bristol in the UK, and the paper is the published version of the 2011 IEA John Snow Lecture which he gave at the World Conference of Epidemiology in Edinburgh this past summer.  In the paper he addresses some of the same issues of causation that we often blog about (most recently here) and publish elsewhere (e.g. here, in Genetics, and here, in the IJE), and in doing so he touches on what we think is one of the most overlooked actors in much of life; randomness, or chance.

George specifically addresses epidemiology, the field of public health that has to do with the understanding of patterns of disease, ideally so that public health measures can be instituted to prevent disease outbreaks.  But, his points are equally applicable to many other areas, certainly including genetics.

Epidemiology is a field that uses population-level data to understand disease in aggregate.  This is how risk factors like smoking are discovered, and how events such as food poisoning epidemics or cholera outbreaks are explained.  And the field has a long history of success in explaining many disease outbreaks, and in identifying many significant risk factors for which public health measures (e.g., clean water, or anti-smoking campaigns) have been implemented.

This is all well and good, and perhaps useful to policy makers.  But, as with genetic studies, the amount of variation in risk explained by epidemiological studies is often small, and the population-level approach is of limited use when it comes to predicting outcomes for individuals, so that they or their doctors can optimize their chances of avoiding nasty diseases.  This is the subject of Davey Smith's paper.  He uses Winnie Langley as his example; she smoked for 95 years -- why didn't she get lung cancer?

The purpose of epidemiology, as a branch of public health, is to identify causes of disease that can be eliminated or attenuated, to prevent disease.  This is a lot easier when the causes have major effects.  Indeed, epidemiology, like genetics, is most successful at dealing with causes of large effect, such as infectious agents, cigarettes, or obesity -- the equivalent of genes for diseases such as cystic fibrosis, Tay-Sachs, or the periodic paralyses.  A major difference, though, is that clearly genetic diseases are much rarer than diseases with widespread environmental causes.  But the point is the same -- current methods in both fields are much better at finding causes that pack a wallop.  Even those, such as dietary salt or cholesterol, are not as straightforward as their public image suggests.

Can the risk factors that epidemiologists or geneticists do identify be translated into predicting who will or will not get sick?  Not definitively in either case, although some rare alleles, such as those for Huntington's disease or PKU, come close.  In general, however, the answer is no -- despite what direct-to-consumer genetic testing companies would like to sell you.  At the least, the probabilities are usually low, and the estimates of those probabilities not very stable or precise, since many factors, including changeable environmental exposures, affect what a given genotype may do.  We've written a lot about why what we know about evolution means this must be true, and after much discussion in his paper of why this is so in epidemiology, Davey Smith makes the same point.

Most epidemiological research, like genetic research, is based on the belief that if we just identify more risk factors/genes, we'll be able to account for enough of the variance in risk of our favorite disease that we will be able to predict who will get it.  Genetic epidemiology, 'life course epidemiology', social epidemiology, and so on, are all attempts to expand the universe of risk factors such that eventually the field captures them all, from the uterine environment to old age.

But, as Davey Smith points out -- and we think it's fair to say, as we've pointed out numerous times over many years ourselves -- there is much too much randomness in life to ever reach this goal, even assuming all those replicable risk factors people are now looking for could be found.
The chance events that contribute to disease aetiology can be analysed at many levels, from the social to the molecular. Consider Winnie; why has she managed to smoke for 93 years without developing lung cancer? Perhaps her genotype is particularly resilient in this regard? Or perhaps many years ago the postman called at one particular minute rather than another, and when she opened the door a blast of wind caused Winnie to cough, and through this dislodge a metaplastic cell from her alveoli? Individual biographies would involve a multitude of such events, and even the most enthusiastic lifecourse epidemiologist could not hope to capture them.  Perhaps chance is an under-appreciated contributor to the epidemiology of disease.
He nicely dismantles the idea that siblings' shared environments will be a major clue to risk of most diseases, because, for one thing, it turns out that we share about as much with our siblings as we do with people who grow up in other households.  In large part this is because chance, or stochastic, events are much larger components of what happens to us than generally assumed.  Current methods tend to allow for statistical noise, but not for the essential role that chance plays in our lives, from the cellular level on up.  This has long been known, but scant attention has been paid to it by reductionist sciences like epidemiology and genetics.

Davey Smith points out that epigenetics is the current fad, based on the hope that by finding epigenetic mechanisms we'll soon be able to explain what now just looks like chance, but that this is a false hope.  He makes further points in this long paper, including offering an evolutionary explanation for the centrality of chance in life (it's advantageous to have a variable genotype given that environments are changeable), and so on.

Davey Smith concludes that the purpose of epidemiology after all is not to predict the fate of individuals but to provide population-level statistics.
For our purposes, it is immaterial whether there is true ontological indeterminacy—that events occur for which there is no immediate cause—or whether there is merely epistemological indeterminacy: that each and every aspect of life (from every single one of Winnie’s coughs down to each apparently stochastic subcellular molecular event) cannot be documented and known in an epidemiological context. Luckily, epidemiology is a group rather than individual level discipline, and it is at this level that knowledge is sought; thus averages are what we collect and estimate, even when using apparently individual-level data.
The point of the discipline is to "provide simple, understandable and statistically tractable higher-order regularities".

We're with George up to this point.  Indeed, when epidemiology can point to causes that public health measures can deal with (clean water, window screens, vaccination campaigns) -- that is, population-level causes that are amenable to population-level controls -- it has done its job, and done it well.  But why hasn't environmental epidemiology explained the asthma epidemic satisfactorily, even with population-level data?  And why don't the large population-level studies of hormone replacement therapy, or of calcium and vitamin D, yield consistent results?  Again, this is equivalent to the failings of GWAS (genome-wide association studies).  And who can predict heart disease decades in advance, when so many cultural changes alter the dynamics of lifetime exposure to risk factors known and unknown?

Part of the problem is that main effects can differ among populations -- even assuming we can agree on what a 'population' is and how to define and sample it, and that a population-specific effect is not due to changeable population-specific environments.  The ApoE4 variant is associated with Alzheimer's disease in European-derived populations, but much less so in African Americans, for example.  And the same risk variant, relatively infrequent in humans, is the standard in our close primate relatives.  Causation is relative, even when strong.  So even the population-based view of epidemiology is often problematic.

There is another point about randomness.  Sometimes what we mean is that there is a known distribution of probabilities of outcomes, as with 1's or 6's in rolls of dice.  There, we know that one has a 1/6 chance of a specific result; the probabilities (risks, in this context) are known and predictable, even if each individual's outcome isn't specifically knowable in advance.  But many chance ('random') factors have no such underlying theoretical distribution -- the probability that you'll be struck by lightning, or that some part of some artery will be clogged by cholesterol plaque.  Dealing with that kind of randomness is far more problematic, yet that kind is likely where probabilism plays its major role.  In that case, all we can do is estimate risk from past experience and hope the same applies to the future... but we know, in changeable environments, that it won't.
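The contrast between the two kinds of randomness can be sketched in a few lines (a toy illustration of our own; the sample size and the underlying rate are arbitrary assumptions). A die comes with a theory that hands us its probabilities exactly; a lightning-strike-style risk does not, so any number we attach to it is only a frequency counted from past experience:

```python
import random

random.seed(2)

# A die has a known theoretical distribution: P(rolling a 6) = 1/6, exactly.
theoretical = 1 / 6

# A lightning-strike-style risk has no such theory behind it; all we can
# do is count past events.  Simulate 500 "person-years" of history whose
# true underlying rate (unknown to the observer) happens to equal 1/6 too.
history = [random.random() < theoretical for _ in range(500)]
empirical = sum(history) / len(history)

# The empirical estimate wobbles around the truth, and if the environment
# changes, even this estimate stops applying to the future.
print(round(theoretical, 3), round(empirical, 3))
```

Even with the true rates identical, the counted frequency only approximates the theoretical one, and unlike the die's 1/6 it carries no guarantee that tomorrow's conditions will resemble yesterday's.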

The same kinds of statements apply with even more force when we're trying to infer evolutionary history and how today's genes and their effects got here.  It is a humbling lesson that is difficult to accept, even if the evidence for it is very strong.

As for Winnie, she may not be that much of an outlier after all, perhaps in fact confirming that epidemiological methods can work when it comes to risk factors with large effects.  She may have smoked all her life, but she said she was too poor to smoke more than 5 cigarettes a day, and after turning 100, smoked only one.