Tuesday, July 23, 2013

Why is the cause of X (or Y or Z) so hard to find?

£1.1m to find the answer -- will it?
Acute Oak Decline (AOD) is killing England's oak trees, and no one knows exactly why.  The story is in the news (in The Guardian, for example, and on the BBC radio program "Farming Today" on July 15) because a government-funded research project into the cause and the cure, to the tune of £1.1m, has just been announced. AOD came on fast, kills quickly, and seems like it should be easy to understand.  But it isn't.

'Decline' is a generic term for a category of tree diseases with multifactorial causation.  There are several forms of oak decline, acute and chronic, and their causes can differ.  There was an outbreak of Acute Oak Decline in the 1920s in Britain, attributed to "over-riding effects of successive first flush defoliations by the caterpillar of the oak roller moth (Tortrix viridana), followed by damage to summer leaves by powdery mildew (Erysiphe alphitoides)," according to a 2010 paper describing the current epidemic ("Description of Gibbsiella quercinecans gen. nov., sp. nov., associated with Acute Oak Decline," Brady et al.).  That is, the cause was completely different from whatever is driving the current epidemic.

The new wave of the disease, which first appeared about 20 years ago, is characterized by attack on oak stems, with small cracks appearing in the bark, from which a gooey exudate bleeds heavily.  The disease hits mature trees, and they can die within 3 years, a quick death for a large tree that can otherwise take 200 years to die.  Some trees do seem to recover, though why they do is another unknown.

Thousands of trees have been affected, predominantly in the English Midlands, but the disease is spreading and is now in the south and south east of England and in Wales, and the number of trees with the disease in Britain could rise significantly in years to come. 

Stem bleeding in an AOD affected tree; Wikipedia
Although the cause of this epidemic is not yet understood with certainty, as Brady et al. report, several bacteria are consistently found in the exudate from affected trees, and indeed in trees with AOD in Spain.  Based on genetic analyses of various housekeeping genes and of 16S rRNA (which is the stretch of bacterial ribosomal RNA that is conventionally used for constructing phylogenies, the historical ancestral connections between organisms) of nine different Gram-negative strains of bacteria from affected trees, Brady et al. found the bugs to belong to the family Enterobacteriaceae, but to match no previously known strains. They thus concluded that the bacteria affecting oak trees were novel species. (I say this in one short sentence; web searching shows that this was a long and involved process, with extensive sequencing and statistical analysis, culminating in petitions to classify and reclassify these bacteria based on genetic distance from other known bacteria.)

So, why aren't investigators satisfied that the cause of AOD has been found? Because, for one thing, the same bacteria aren't found in all affected trees. And larvae of the jewel beetle (a buprestid beetle) have been found in the infected cracks of over 90% of affected trees. The beetle lays its eggs in the tree, which creates tiny holes and cracks, and it's here that the disease is found. So, do the beetle and the bacteria together cause the disease? Maybe, but if so, why aren't beetles and the same bacteria always found in affected trees? Is it the interaction between the adult beetles, their larvae and bacteria that is causal? Or does the disease come first, and the beetle or the bacteria (or both) follow?

Haven't we heard this before?
This all sounds hauntingly reminiscent of Colony Collapse Disorder (CCD) in honey bees.  Bees started dying about a decade ago, and it seemed that it should be a quick puzzle to solve: find the infectious agent, or the pesticide, and then just fix it.  But the pieces are still lying all over the table.  One study makes it look like it's obviously a virus -- or a mite, or insecticides, or pesticides, or a combination of factors -- but then colonies affected by none of these things will die off and we're back to the beginning.  And then there's white-nose syndrome, which is killing bats. What causes that?

Little brown bat with white nose syndrome; Wikipedia
But then, this is hauntingly reminiscent of the hunt for the cause of the asthma epidemic that began in the 1980s in the US and much of Europe, or the heart disease epidemic that waxed and waned (well, ebbed) in the last century still without an explanation, or the type 2 diabetes epidemic in Native Americans, or the obesity epidemic throughout the world, or the rise in autism, or ADHD, or.... We're dismally bad at figuring out causation when it isn't simple and sharp.

But no need to stop at disease.  What caused the financial crisis of the last 5 years?  Why such exacerbated disorder in the fiscal houses of Spain or Italy or Greece?  Why is Syria in flames?  These are all questions that legions of experts -- epidemiologists, geneticists, sociologists, economists, psychologists, etc. -- have been trained to answer, but no amount of training makes the answering easy.  Or often, even possible. For example, these are things due to human behavior, but are they in any serious sense biologically related phenomena?  If not, how can we treat 'society' or 'culture' as phenomena?  These are questions that were long seriously debated, and sometimes still are.

And what about predicting the next AOD, or influenza epidemic, or financial crisis?  Or even who'll be unlucky enough to become demented in old age, or die of heart disease?  We're even worse at that.  We aren't stupid.  But perhaps our methods are.

But it used to work!
Epidemiology and genetics had noble beginnings.  Both were good at finding single causes with strong effects.  Smoking, asbestos, the genes that code for wrinkled or yellow peas in Mendel's garden, or for cystic fibrosis or Tay-Sachs, the bacteria that cause cholera or Legionnaires' disease -- these were found with good robust methods.  These are sharp, single or 'point' causes: one cause, and if you're exposed to it you manifest the effects.  These successes went to their heads and the fields got cocky.

But then the questions got harder.  With infectious diseases knocked out (or so we thought), the landscape changed.  When cause is multifaceted, as with the complex chronic diseases that will eventually get most of us, or when there are multiple causes for what looks like the same disease, our methods fail us.  So, looking for 'the genes for' asthma, or heart disease, or autism, or for major explanatory environmental risk factors, has not panned out.  These, and many others, are diseases that would surely have been easily cracked if they had a single major cause.  We treated them as though they did -- we still treat them as though they do, looking for genes for everything that afflicts us -- because those methods worked so well before.  But that's now the wrong model.

It's possible to identify many different components, as in the oak and colony collapse stories.  But with very rapid change, is it more likely that there is a single major cause, and some minor passengers along for the ride?  Or should we search for numerous highly correlated causal elements?  There is basically no theory for this!

The fact that incidence rises quickly should take the hunting dog off the genetic trail and alert him to an environmental scent, because genes don't change quickly but environments can.  Quickly rising incidence would seem to suggest some single environmental change.  So why aren't the causes of CCD, or AOD, or asthma or autism, each of which might be characterized as epidemics with fairly quick onset and rapid rise, simple and easy to identify?

Well, even that question deserves a multifaceted answer.  It could be that what we're calling a single disease -- heart disease, asthma, autism, schizophrenia etc. -- is in fact a collection of diseases, with different causes.  This kind of phenotypic heterogeneity can wreak havoc with the best of study designs.  This is true whether we're looking for genes or for environmental factors.

And cause may in fact be (fairly) simple, but there are so many potentially causal environmental factors that it's exquisitely hard to find the needle in the haystack.  Or, there are multiple ways to kill off a bee colony, or to get asthma -- that kind of genotypic or environmental heterogeneity can easily do in a study.  You think you've identified the cause -- neonicotinoids, for example -- but the next dead colony you look at was never exposed.
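To put rough numbers on how heterogeneity can do in a study, here is a back-of-the-envelope sketch.  It is a toy model, not anyone's real data: it assumes that what we call one disease is really k independent subtypes, each with its own exposure, and every figure in it (a 20% exposure prevalence, a 10% risk if exposed, a 1% risk otherwise) is made up for illustration.  The function name `lumped_relative_risk` is likewise just a label for this sketch.

```python
def lumped_relative_risk(k, p=0.2, r_exposed=0.10, r_base=0.01):
    """Relative risk of the lumped 'disease' for exposure 1 when there are
    k independent subtypes, each with its own cause.

    All default values are illustrative assumptions:
      p         - prevalence of each exposure in the population
      r_exposed - risk of a subtype for people exposed to its cause
      r_base    - risk of a subtype for people not exposed to its cause
    """
    # Chance that a random person develops any one of the *other* subtypes.
    q = p * r_exposed + (1 - p) * r_base
    # Chance of escaping all k - 1 other subtypes.
    none_of_the_others = (1 - q) ** (k - 1)
    # 'Diseased' means having at least one subtype, whatever its cause.
    risk_if_exposed = 1 - (1 - r_exposed) * none_of_the_others
    risk_if_unexposed = 1 - (1 - r_base) * none_of_the_others
    return risk_if_exposed / risk_if_unexposed


for k in (1, 2, 5, 10):
    rr = lumped_relative_risk(k)
    print(f"{k:2d} subtype(s): relative risk for exposure 1 = {rr:.2f}")
```

Under these assumptions the exposure is a tenfold risk factor when the disease really is one disease, but once ten subtypes are lumped together under the same name its apparent relative risk falls to about 1.3, even though nothing about the exposure itself has changed.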

And if it's so hard to characterize causation, and causation is complex anyway, and we don't know how to capture all the relevant factors in the environment, how can we possibly predict disease (or the next economic downturn, or whether this way of teaching kids will work) in a completely unpredictable environment?

We don't have the answers.  But the questions keep on coming.


Anne Buchanan said...

A fascinating episode of the BBC radio program Discovery highlights another example of the difficulty of determining cause, in this case of earthquakes. Substitute "disease" for "earthquake" and we're right back in familiar territory.

Predicting earthquakes with any degree of precision as to timing and severity is essentially impossible, even though the general cause has been known since plate tectonics was understood. Some quotes from the program: "There are things that are unknowable." "What should we tell the public?" "We admit that we cannot predict earthquakes with any high accuracy but we can monitor and see if any significant changes occur." "Should we evacuate a city 200 times, because one time might be 'the one'?"

Finally, one of the guests told a famous story by Kenneth Arrow, Nobel Prize winner in economics. He was a weather forecaster in the Army, and one of his tasks was to prepare forecasts for the upcoming month. After some time he ran the statistics and realized these forecasts were no better than random. He told this to the general requesting them, and the general replied, "I know they're no good, but I want them for my planning purposes."

Everyone wants to know more than meteorology, or geology, or genetics, or epidemiology is capable of telling. Luckily for scientists, a prediction of 30% risk is always right -- they're right if it happens and right if it doesn't.

Ken Weiss said...

Hey, as a former weather forecaster, I take exception to your remarks! Actually, what makes a forecast 'right' depends on the need. What proper forecasting does these days is mainly, or only, probabilistic and based on a mix of theory plus the recorded experience of similar conditions.

Long-term forecasting is accurate if you say that there will be less rain in the fall, sub-zero temperatures in winter, etc. But, for the reasons you discuss, it's useless for details.

In other areas, like personalized genomics, there's also a mix. Some genotypes are far more highly predictive than most. So, as you say, the problem is the misleading generalized promises that give the impression things are far more precise than they are--and, also as you concluded, a probabilistic forecast for an individual is unfalsifiable!