Thursday, December 18, 2014

(Other) lessons of the Broad Street pump: understanding causation isn't so easy

The iconic John Snow, often referred to as the "father of epidemiology," is commonly credited with discovering the cause of cholera after his careful, empirical examination of the 1854 outbreak of the devastating disease in the Soho neighborhood of London.  But I think it's only with hindsight that we can say this, and I think it's not quite right.

Snow was nothing if not a detail man.  A physician, he was very much an empiricist, experimenting and observing to test his ideas about health and disease like no one else of his time.  He had developed his waterborne theory of cholera some time before the 1854 epidemic, writing about it in detail in 1849.  The 1854 outbreak, very near his home, was an ideal circumstance for him to try to confirm his theory.

Modified from Snow's map in The Ghost Map; Johnson, 2006

Soon after the outbreak began, Snow began interviewing anyone with a family or household member who had died of the disease to determine the source of their drinking water.  Every case had drunk water from the Broad Street pump.  And, he confirmed that the worst symptoms were intestinal, not respiratory, which meant to him that the cause was something people had ingested rather than inhaled.  He found that there had been no cases among the 70 workers in the Broad Street brewery, because they were all given free beer, and never drank water at all.  From the information he collected, he drew his famous map of the neighborhood which showed that cases clustered around the Broad Street pump.  He concluded the pump was the source of the contaminated water that was making people ill.

He then enlisted the aid of a previously skeptical ally, and eventually convinced an even more skeptical local council to remove the handle from the pump -- to the disgust of many local residents who thought this was a cockamamie idea.  Not long after the removal of the handle, the epidemic was over.  But even Snow recognized that the epidemic had already begun to abate by the time the handle was removed.  That piece of the story is often lost, however; perhaps from the vantage point of 160 years on, when we know that Snow was right, the removal makes a nice tidy ending.

But did Snow identify the cause of cholera?  No, not in the way we would accept today.  We would say he had strong circumstantial evidence, but today we'd require the causal organism.  There were multiple competing theories for the cause at the time.  An excellent history of the epidemic, The Ghost Map: the Story of London's Most Terrifying Epidemic, and How it Changed Science, Cities and the Modern World by Steven Johnson (2006), tells the story in detail.  Johnson writes that an editorial in the Times of London in 1849 considered the possible causes of cholera:
• “A … theory that supposes the poison to be an emanation from the earth”
• An “electric theory” based on atmospheric conditions
• The ozonic theory -- a deficiency of ozone in the air
• “Putrescent yeast, emanations of sewers, graveyards, etc.”
• Cholera was spread by microscopic animalcules or fungi, though
   this theory “failed to include all the observed phenomena.”
Source: The Ghost Map, Steven Johnson, 2006, Riverhead Books
Note that the idea that cholera was spread by "microscopic animalcules or fungi" was deemed empirically deficient by the editors of the Times, and it certainly was, as no organism associated with the disease had yet been identified.  In 1854 Snow himself looked at water from the Broad Street pump under his microscope, and had seen nothing of note.

And, Snow wasn't the only one with empirical, observed evidence for the cause of cholera.  Indeed, each of the alternatives put forth by the Times was entirely plausible, given the state of knowledge at the time.  Miasmatists were empiricists too: epidemics were localized in poor areas, where the air smelled bad and the water was filthy and smelled bad; there were more cases in cities and fewer in the hills; and no living organism had been found to suggest they were wrong.  What both Snow and the miasmatists had was circumstantial evidence, correlations, and belief in their preferred theory.  And, at the time, there was no definitive way to choose between them.

My point here is not to doubt Snow's theory, of course, but to suggest that although we now know that he was right, that was much less obvious at the time.  Indeed, it wasn't really until the organism that causes cholera, Vibrio cholerae, was discovered by Robert Koch in 1883 that Snow's story could be considered conclusive.  (Actually, the organism was first seen in 1854 by Italian anatomist Filippo Pacini, but this was not well known at the time.  If it had been, would Snow have had an easier time convincing people that he was right?  I think the germ theory of disease had to get going in earnest before that could have happened, so I think probably not.)

What killed the miasma theory?  One blow was the rise of the germ theory, and the discovery of organisms that caused disease, one after another.  (Though, is the miasma theory in fact dead?  Still today there is some thought that dirty air causes asthma!)

But determining the cause of infectious diseases has its own problems.  It wasn't, and isn't, as simple as seeing live organisms under a microscope.  Robert Koch was a German physician and microbiologist who discovered a number of causal microbes.  He won the Nobel Prize in Physiology or Medicine in 1905 for his work on tuberculosis.  He proposed a set of postulates, first published in 1890, that were meant to be useful in confirming microbial causes of infectious disease.

The Koch Postulates
1. The microorganism must be found in abundance in all organisms suffering from the disease, but should not be found in healthy organisms.

2. The microorganism must be isolated from a diseased organism and grown in pure culture.

3. The cultured microorganism should cause disease when introduced into a healthy organism.

4. The microorganism must be re-isolated from the inoculated, diseased experimental host and identified as being identical to the original specific causative agent.

Unfortunately, and Koch knew this too, many microbes don't meet these criteria.  There can be asymptomatic carriers of cholera and other diseases, which violates the first postulate; many microbes can't be grown in culture; and so on.  So when a microbe behaves properly and follows the postulates, all is good, but when it doesn't, as with, say, HIV, controversy can ensue (see Duesberg).

Another blow to the miasma theory was the birth of a statistical basis for establishing causation.  The American philosopher, logician, and mathematician C.S. Peirce formulated the idea of randomized experiments in the late 1800s, after which they began to be used in psychology and education.

Randomized experiments were popularized in other fields by R.A. Fisher, notably in his 1925 book Statistical Methods for Research Workers.  This book also introduced additional elements of experimental design, which were adopted by epidemiology.

Epidemiologist and statistician Austin Bradford Hill published Principles of Medical Statistics in 1937, for use in epidemiology.  And the development of population genetics, which Ken has been writing about this week, the Modern Evolutionary Synthesis (which showed that Mendelian genetics is consistent with gradual evolution), and further discoveries in genetics laid the foundation for approaches to looking for the genetic basis of traits and diseases.

Recognizing that attributing a cause to a disease needed a more formal approach, Bradford Hill suggested a set of criteria that he thought were at least useful to consider.  The "Hill Criteria," which he published in 1965, are still in use today.
• Strength: The larger the association, the more likely it is to be causal.
• Consistency: Findings should be consistent between observers in different places.
• Specificity: The more specific the association between a factor and an effect, the greater the probability of a causal relationship.
• Temporality: The effect has to occur after the cause.
• Biological gradient: Greater exposure should generally lead to greater incidence of the effect.
• Plausibility: The association must make sense.
• Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect.
• Experiment: “Occasionally it is possible to appeal to experimental evidence.”
• Analogy: The effect of similar factors may be considered.
Source: A.B. Hill, “The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine, 58 (1965), 295-300.
Again, as with Koch, the author himself knew, and discussed in the paper proposing the criteria, that only one of these is actually a requirement for causation: the cause has to precede the effect.  The others are either vague or just 'would be nice', and in many ways are highly or even purely subjective.  So when they work, great, and we attribute our conclusions to their application; but when they don't, it's not clear whether a possible factor isn't a cause, or whether the criteria aren't adequate for determining it, or our sample is inadequate, or there is some other, perhaps unknowable, problem.
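Hill's first criterion, strength, is usually expressed as a ratio such as the relative risk: the rate of disease among the exposed divided by the rate among the unexposed.  The short Python sketch below computes one from a two-by-two table.  The function name and the counts are hypothetical, made up for illustration, and are not Snow's figures.  A large ratio satisfies 'strength', but, as argued above, it cannot by itself distinguish a causal factor from something merely correlated with one, such as poverty or foul air in a crowded neighborhood.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return the risk among the exposed divided by the risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        # No cases at all among the unexposed: the ratio is unbounded.
        return float("inf")
    return risk_exposed / risk_unexposed


# Hypothetical counts for illustration only (not from Snow's investigation):
# 30 of 100 people who drank from a given pump fell ill, versus 5 of 100 who did not.
rr = relative_risk(exposed_cases=30, exposed_total=100,
                   unexposed_cases=5, unexposed_total=100)
print(f"Relative risk: {rr:.1f}")  # prints "Relative risk: 6.0"
```

An unexposed group with no cases at all, like the 70 brewery workers, makes the ratio infinite; returning float("inf") is simply one choice made in this sketch, and a common alternative is to add a small continuity correction to each cell of the table.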

A set of "molecular Koch postulates" was devised in the 1980s to determine the role of a gene in the virulence of a microbe, but they, too, have their failings for similar reasons.

And statistical criteria have become the standard for determining causation, but we know that the p-value thresholds for significance are arbitrary (see Jim Wood's MT post, "Let's abandon significance tests", on this), that statistics are only as good as the studies that generate them, that studies are prone to biases, missing data and the like, and that results can be difficult to replicate even when studies are state-of-the-art.  David Colquhoun has written a lot on this, including here and here.

Why go on about this?
We write frequently here on MT about how important it is to think about how we know what we know.  If we don't, we can get very close to religious territory, where knowledge is based on belief, not observation.  Indeed, even in science, some of us think every trait is genetically determined, or have a favorite cause of obesity, or autism, or diabetes.  The ease with which we accept explanations of cause and effect without questioning how we know them reflects two things: first, belief is alive and well as a way to determine cause, and second, we often don't have demonstrably better ways to do it.

So, we don't know whether sugar, fat, or just overeating is the cause of the obesity epidemic; we don't know whether breastfeeding or bottle feeding is the cause of the asthma epidemic; we don't know whether genes or environmental risk factors are the most important cause of type 2 diabetes, or which ones, and so on.  A lot of work in genetics is still based on the assumption that traits are simple, even though we know the kinds of traits that are likely to have simple explanations (the low-hanging fruit), and we know that they are rare.  We know the kinds of traits that are complex and that aren't going to have easy explanations of the kind often suggested, and yet 'gene for' thinking is still prevalent in the popular press, and even among scientists.

In 1935 Ludwik Fleck, a Polish physician and biologist, published a book, Genesis and Development of a Scientific Fact, that is now rightly recognized as a precursor to Thomas Kuhn's The Structure of Scientific Revolutions.  Fleck wrote about "thought collectives" in science, his idea that scientific facts are shaped by the context in which they arise.  We follow the herd, until in fact the thought collective becomes a thought constraint.

Fleck writes of the development of the Wassermann test for syphilis, which was meant to determine who had the disease; instead, the thought collective of the time led the test result to define the disease.  It's an excellent short book and well worth reading, but Ken wrote an even shorter column on Fleck, also worth reading if you're interested in Fleck and in the sociology that is an important part of the way science actually operates.

A modern equivalent would be the common de facto practice of defining a genetic disease by genotype: if a patient has one of the known genetic variants associated with the disease in other patients, he or she has the disease, and if not, he or she doesn't, even though we know that there can be many pathways to a given phenotype (our post last week on phenogenetic drift describes one reason for this).  Such a definition, if everyone is aware of its nature, can guide therapy in useful ways; that is, some genotype-defined subset of a broader disease category may respond to a particular kind of drug.  But the shifting landscape of definitions based on assumed causal processes is an important part of the elusiveness of many conditions, like autism and many others.  Too often the assumption that the outcome is 'genetic' defines, steers, or determines the concept of the trait itself.  That can distract, and we think regularly does distract, from more realistic approaches to what is currently the very elusive nature of many traits, normal and otherwise, in animals and plants.

Understanding causation is a fundamental issue in science, but the difficulties are often overlooked in the rush to publish, to the detriment of the science.


DG said...

"We write frequently here on MT about how important it is to think about how we know what we know."

This is one of the reasons this is one of the better blogs out there.

Ken's article on Fleck references a next installment on the anatomy of the human brain. Is that article easily accessible?

Anne Buchanan said...

It might be this one?

DG said...

Thanks, but I couldn't get to it.

It wants a login to display the page.


Arlin said...

Hello. I enjoyed reading the pump story. The counter-example of the brewery workers was very striking, critical to the inference that drinking water was the source.

But I'm commenting because I'm on a crusade to set the record straight about the Modern Synthesis. In this blog it is invoked as "the Modern Evolutionary Synthesis (which showed that Mendelian genetics is consistent with gradual evolution)".

Actually, early geneticists such as Bateson, Punnett, Johannsen, et al. understood that Mendelian genetics is consistent with gradual evolution, and stated this clearly as early as 1902. Their skepticism about gradual evolution was based on two lines of argument that we would accept today: they rejected non-Mendelian mechanisms of smooth change advocated by Darwinians, and they rejected the doctrine that evolutionary change is inherently gradual.

More generally, when evolutionary biologists describe the Modern Synthesis as a kind of generic framework combining mutation, genetics and selection, they are referring to a framework developed by early geneticists.

These points are documented thoroughly in a recent article in J. Hist. Biol. (

Ken Weiss said...

Reply to Arlin
I will be interested to read your paper. But I think that while the thinkers in the past were not naive or unaware (Darwin knew of traits that 'did not blend'), the general 'modern synthesis' interpretation was and is widely understood more or less as we've described it, and this went deep into the 20th century, indeed, is still the prevalent view.

Whether or not this view is historically accurate or, perhaps putting it more accurately, historically complete, I think it is the 'canonical' view, one might say, today.

I think that one can play the historical exegesis game in lots of ways, and interpret what our forebears said in retrospect, but then that risks selective choosing of what can be read as prescient and ignoring things that were wrong.

What counts today, to non-historians, is the way in which the current story is used in current research. In fact, I think, most biologists have little knowledge and less care about what was said in the past.

I'd say that clearly the people formulating the modern synthesis were assembling their views based on what they knew of ideas of their fairly recent predecessors, perhaps claiming more novelty in thinking on their own part than was warranted.

In any case, I look forward to reading your paper.