Showing posts with label complex causation. Show all posts

Thursday, October 13, 2016

Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always (the link is to the People magazine-like cover article in that issue.)  This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn.  The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, a mis-call by a given piece of software, or due to the assumption that the identification of a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparative purposes.  Again we'll mention below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion that we still need to focus on single genes as causal of most disease, that is, do MuchMoreOfTheSame, which is an implication of the discussion, is not so obviously justified.   We'll begin with our usual contrarian statement that the idea here is being overhyped as if it were new, but we know that except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still focusing on single genes as being truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, and now including data from approximately 60,000 individuals (in itself, rather small compared to what the purpose requires). The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not from whole genome sequences, again a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We've been and continue to be misled by examples of gene causation in which context and interactions don't really matter much, but that leads us still to cling to these as though they are the rule.  This reinforces the yearning for causal simplicity and tractability.  Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity because it is critiquing some simplistic single-gene inferences, and assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture, that are not new and we're not making them up, but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood-samples and that may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2.  Life-long exposure to local somatic mutation is not considered nor measured, 
3.  Epigenetic changes, especially local tissue-specific ones, are not included, 
4.  Environmental factors are not considered, and indeed would be hard to consider,
5.  Non-Europeans, and even many Europeans are barely included, if at all, though this is  beginning to be addressed, 
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included. Exome data have been treated naively by many investigators as if that is what is important, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important, 
7.  Non-coding regions and non-regulatory RNA regions are not included in exome-only data,
8.  A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it's a false discovery difficult,
9.  Single gene analysis is still the basis of the new 'revelations', that is, the idea being hinted at that the 'causal' gene isn't really causal....but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case, because if the regulatory regions near the mutated gene have no or little activity, the 'bad' gene may simply not be being expressed.  Its coding sequence could falsely be assumed to be harmless, 
11. Many aspects of this kind of work are dependent on statistical assumptions and subjective cutoff values, a problem recently being openly recognized, 
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken its actual apparent cause.  Phenotypes can be measured in many ways, but we know very well that this can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database), 
13.  Early reports of strong genetic findings have well known upward bias in effect size, the finder's curse that later work fails to confirm.
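Item 13 can be made concrete with a small simulation. The sketch below is only an illustration under assumed values: the true effect, standard error, number of studies, significance cutoff, and the function name are all hypothetical and are not taken from ExAC or any real study. It draws many noisy estimates of one modest true effect, keeps only those that pass the cutoff, and compares the average 'reported' estimate with the truth.

```python
import random
import statistics

TRUE_EFFECT = 0.1   # hypothetical true effect size
STD_ERROR = 0.1     # hypothetical standard error of each study's estimate


def simulate_finders_curse(true_effect=TRUE_EFFECT, std_error=STD_ERROR,
                           n_studies=10_000, z_cutoff=1.96, seed=0):
    """Return (mean of all estimates, mean of 'significant' estimates).

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each study estimates the same true effect with independent Gaussian noise.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Only estimates whose z-score clears the cutoff count as reported 'findings'.
    significant = [e for e in estimates if e / std_error > z_cutoff]
    return statistics.mean(estimates), statistics.mean(significant)


mean_all, mean_significant = simulate_finders_curse()
print(f"true effect:                {TRUE_EFFECT:.3f}")
print(f"mean of all estimates:      {mean_all:.3f}")
print(f"mean of significant 'hits': {mean_significant:.3f}")
```

Because every estimate that survives the filter must exceed `z_cutoff * std_error`, the average among the survivors necessarily sits above the true effect here, and later studies not subject to that filter will tend to find less.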

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.

Gregor Mendel studied fully penetrant (deterministic) causation.  That is what we now know to be 'genes', in which the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as of dominant traits, given the appropriate genotype). But this is generally wrong, save at best for the exceptions such as those that Mendel himself knowingly and carefully chose to study.  But even this was not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results. This may have been data fudging, but it is at least as likely to have been reacting to what we have known for a century as 'incomplete penetrance'.  (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' gene or 'recessive' homozygosity at a 'causal' gene doesn't always lead to the trait.

In most of the 20th century the probabilistic nature of real-world as opposed to textbook Mendelism has been completely known and accepted.  The reasons for incomplete penetrance were not known and indeed we had no way to know them as a rule.  Various explanations were offered, but the statistical nature of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledge incomplete penetrance, but this essentially shows that what the ExAC consortium is reporting are details but nothing fundamentally new nor surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS) in various forms have used stringent statistical criteria to minimize false discovery.  This has led to mapped 'hits' that satisfied those criteria only accounting for a fraction of estimated overall genomic causation.  This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false positive genome locations.  But even the acceptable, statistically safest genome sites showed typically small individual effects and risks far below 1.0. They were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like presence of diabetes, heart attack-related events, psychiatric disorders and essentially all traits studied by GWAS. It is not exactly what the ExAC data were looking at, but it is highly relevant and is the relevant basic biological principle.

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamental new finding, but it is nothing of that sort.  Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people under some conditions at some age and in some environments.  The ExAC results don't really address this because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample, but there are multiple explanations for incomplete penetrance, including the list of 1 - 13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in the other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper astronomical" combinations of effects using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with their own variant genotypes including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
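The scale of that combinatorial problem can be checked with a line of arithmetic. The sketch below assumes, purely for illustration, that each of the 100 background genes has just two alleles and therefore three genotypes; real genes, with regulatory variants included, would have many more. The world-population figure is a rough order of magnitude, also an assumption.

```python
GENOTYPES_PER_GENE = 3             # assumption: two alleles give AA, Aa, aa
N_BACKGROUND_GENES = 100           # the "say, 100" figure from the text
WORLD_POPULATION = 8_000_000_000   # rough order of magnitude (assumption)


def background_combinations(n_genes, genotypes_per_gene=GENOTYPES_PER_GENE):
    """Number of distinct multi-gene genotype backgrounds."""
    return genotypes_per_gene ** n_genes


total = background_combinations(N_BACKGROUND_GENES)
print(f"distinct genomic backgrounds: {total:.3e}")
print(f"backgrounds per living person: {total / WORLD_POPULATION:.3e}")
```

Even under this deliberately minimal assumption the number of backgrounds exceeds the number of living people by dozens of orders of magnitude, so almost every possible background is observed in nobody at all.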

Even the most clearly causal genes such as variants of BRCA1 and breast cancer have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental and other mainly unknown factors.  Nobody doubts the role of BRCA1 but it is not in itself causal.  For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells in a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that even if strongly and truly associated with a test trait have very far from complete penetrance.  A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present, but if the gene itself were tested in a general data base, rather than just in affected people, it had little or no causal effect.  This seems to be the sort of thing the ExAC report is finding.
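The gap between 'nearly always present in patients' and 'little effect in a general database' is the gap between P(carrier | affected) and P(affected | carrier), which Bayes' rule connects. The numbers below are hypothetical, chosen only to show the arithmetic; they are not estimates for HFE or any real variant, and the function name is ours.

```python
def penetrance(p_carrier_given_affected, disease_prevalence, carrier_frequency):
    """P(affected | carrier) via Bayes' rule."""
    return p_carrier_given_affected * disease_prevalence / carrier_frequency


# Hypothetical inputs for illustration only.
p_carrier_given_affected = 0.90   # genotype seen in 90% of patients
disease_prevalence = 0.001        # 1 in 1,000 people affected
carrier_frequency = 0.02          # 1 in 50 people carry the genotype

p_affected_given_carrier = penetrance(
    p_carrier_given_affected, disease_prevalence, carrier_frequency
)
print(f"P(carrier | affected) = {p_carrier_given_affected:.3f}")
print(f"P(affected | carrier) = {p_affected_given_carrier:.3f}")
```

With these made-up inputs a genotype found in 90% of patients still leaves more than 95% of its carriers unaffected, which is why sampling only affected people can make a variant look far more deterministic than it is.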

The generic reason is again that genes, essentially all genes, work only in their context. That context includes 'environment', which refers to all the other genes and cells in the body and the external or 'lifestyle' factors, and also age and sex as well.  There is no obvious way to identify, evaluate or measure the effects of all possibly relevant lifestyle effects, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale. Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little or no, or only strain-specific effects.  This is true in other experimental organisms. The lesson, and it's by far not a new or very recent one, is that genomic context is vitally important, that is, it is person-specific genomic backgrounds of a target gene that affect the latter's effect strength--and vice versa: that is, the same is true for each of these other genes. That is why to such an extent we have long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So if someone reports some cases of a trait that seem too often to involve a given gene, such as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered), then we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the intentional industry standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations that are being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings but show that for many very good reasons this is what we knew was generally the case, that 'Mendelian' traits were the exception that led to a century of genetic discovery but only because it focused attention on what was then doable (while, not widely recognized by human geneticists, in parallel, agricultural genetics of polygenic traits showed what was more typical).

But now, if things are being recognized as being contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't really require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly have the same problem.  It is not just a strange finding of exome mapping studies for disease. Likewise, 'normal' study subjects now being asked for in huge numbers may get the target trait later on in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial and historically frustrating needle in the haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we know, that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems.  And may apply only to that family.

The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early onset diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from their consideration! So they're intentionally showing what is well known, that we're in needle in haystacks territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  This doesn't even consider the deep problems about statistical inference that are being widely noted and the deeply entrenched nature of that approach's conceptual and even material invested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

Wednesday, June 18, 2014

Republican presidents are bad for our health?

Infant mortality in the US fluctuates with the political party of the President.  A paper published in the June 4 issue of the widely respected International Journal of Epidemiology ("US Infant Mortality and the President's Party," Rodriguez et al.) reports that between 1965 and 2010, infant mortality rates were 3% higher when a Republican was president than when the president was a Democrat.

Rodriguez et al. write that previous "political epidemiology" has been cross-national, and has attempted to determine the effect of policy on public health: welfare states, national health systems vs not, higher social expenditures or medical expenditures per capita vs lower, etc. Income inequality was found to be correlated with public health until the data were re-analyzed and additional variables controlled (Avendano, 2012), leading Avendano to question whether income inequality was in fact causal, as opposed to either spurious or real but only a correlation with some unmeasured variable(s).  Further studies have attempted to determine which social factors are actually causal; some have suggested social expenditure and the generosity of family policies may be. In this way, political policies may affect public health, but actual causality, rather than just correlation, is difficult to determine.

Rodriguez et al. posed the question, ‘Is the political party of the president of the USA associated with an important, objective and sensitive measure of population health, infant mortality?’ The idea is that the party in power drives macroeconomic policy, and macroeconomic policy influences the socioeconomic milieu, affecting variables that affect health and mortality.

Infant mortality has fallen dramatically since 1965, from a total of 24.7 per 1000 births to 6.1 in 2010.  In the graph below, the authors have removed the trend, and show total infant mortality, neonatal and post-neonatal mortality, by president, for blacks and whites.  During Democratic administrations, all rates are lower, across the board, on average 3% lower.



Logged IMR, NMR, and PMR residual trends and presidential partisan regimes, 1965–2010; Source, Rodriguez et al., 2014
The statistical effects are essentially the same for Blacks and Whites, but infant mortality among Black infants is about two times higher than it is among Whites, so the absolute effect is larger.  The percentages may be small -- very small, in fact -- but their consistency does lend them credibility.
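To put the 3% figure in context, the arithmetic below compares it with the long-run decline quoted above (24.7 to 6.1 deaths per 1000 births, 1965 to 2010) and shows why an equal relative effect means a larger absolute effect at a higher baseline. Treating the decline as a constant annual rate is our simplifying assumption, not the authors' detrending method, and the baseline rate used for the absolute comparison is hypothetical.

```python
RATE_1965 = 24.7   # infant deaths per 1000 births, from the text
RATE_2010 = 6.1    # infant deaths per 1000 births, from the text
YEARS = 2010 - 1965
PARTY_GAP = 0.03   # reported relative difference between parties

# Assumption: a constant proportional change every year over the period.
annual_change = (RATE_2010 / RATE_1965) ** (1 / YEARS) - 1
print(f"average annual change in IMR: {annual_change:.1%}")


def absolute_gap(baseline_rate, relative_gap=PARTY_GAP):
    """Deaths per 1000 births implied by a relative gap at a given baseline."""
    return baseline_rate * relative_gap


baseline = 5.0  # hypothetical rate, for illustration only
for label, rate in (("baseline", baseline), ("twice the baseline", 2 * baseline)):
    print(f"{label}: {absolute_gap(rate):.2f} extra deaths per 1000 births")
```

Under the constant-rate assumption the secular decline works out to roughly 3% per year, so the reported party effect is about the size of one year's worth of the underlying trend.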

Several things stand out about these results.  First, as the authors point out, the implementation of policies that might have had a direct effect on infant mortality -- Johnson's Great Society and Medicaid in the 1960s, or expansion of Medicaid eligibility between 1979 and 1992 -- doesn't correlate with these periodic dips in IMRs.  That would be the easy explanation.  But this means that the correlation with political party may have little or nothing to do with policy differences.

Or, Rodriguez et al. suggest, the correlation could reflect real, cyclical changes in socioeconomic conditions for mothers and infants, depending on national policy.  Or, differential availability of abortion, since high risk fetuses may be more likely to be aborted than fetuses at lower risk.  Or, it might reflect differing attitudes toward health disparities, with Democrats more likely than Republicans to use government to address them -- but what actual governmental policy is implemented, or eliminated, and thus responsible for the fluctuations is anybody's guess.

But there's something curious about these findings.  Neonatal mortality, death before 28 days of life, is generally considered to be due to conditions of pregnancy or congenital abnormalities, while post-neonatal infant mortality, death between 29 days and 1 year, includes sudden infant death syndrome, which isn't correlated with socioeconomic status, but PMR is also considered to be a reflection of socioeconomic conditions.  If that's so, then neonatal mortality should look quite different from post-neonatal mortality in this study, but it doesn't.  It shouldn't be fluctuating with policy differences or income inequality or whatever political or economic factors, if any, might be responsible for the trend reported here.  And, one would expect there to be a more marked difference between Black neonatal and post-neonatal mortality, since health disparities are most reflected in Black infant deaths.

Equally problematic is that one might expect that presidential terms are short relative to the lag time between implementing a new policy and its effects.  The study did allow for a one year lag time, but still, most health policies don't have immediate impact. So the incumbent's party may be irrelevant to what happens during his term, or it would at least be the successor's (sometimes the same sometimes different) party.  Do people's expectations, based on the current President's outlook, change their behavior in subtle ways? Sounds plausible, and would have nothing to do with the policy change itself, but so many people are uninvolved, uninterested in, or skeptical of the political system that this might not be much of an explanation. And the pattern goes back before CNN and FOX imitation news organizations had much intentionally motivating influence on what people thought or were aware of.

Still, social and political epidemiology are interesting approaches to understanding the underlying causes of ill health and mortality.  The fields look at risk factors several steps removed from those generally considered as causes of disease, so that AIDS, or malaria, e.g., might be attributed to poverty rather than HIV infection or being bitten by a parasite-carrying mosquito, and legitimately so.  That is, the idea is that poverty increases one's risk of exposure to diseases, and if you eliminate poverty you eliminate risk.  The difficulty, of course, is that enacting public health policy that calls for eliminating poverty is a lot more difficult than distributing bed nets or clean needles.

And, the problem of identifying cause from correlation is huge with such metadata.  It can be pretty much pure guess work to pull causal factors from the social or political hat, as this paper suggests -- in fact, if something like, to make something up, differences in completeness of registration of vital statistics in Republican and Democratic years were responsible for this cyclical dip, it would look just the same as if the cause were changing policy. The point is that we just don't know.  In addition, it's hard to avoid interpreting results from one's particular political point of view -- maybe there's something interesting in this paper, maybe there's not, but it's very hard to know.

Thursday, June 5, 2014

Autism -- back to blaming the mother?

Two recent reports of the cause of autism reach different conclusions, though they are alike in that neither implicates genetics, at least not directly.  The first, published in the International Journal of Epidemiology ("Maternal lifestyle and environmental risk factors for autism spectrum disorders," Lyall et al.), reviews the evidence for environmental risk factors and finds that diet can influence risk, and that folic acid supplements taken around conception are associated with reduced risk.  Further,
Although many investigations have suggested no impact of maternal smoking and alcohol use on ASD [autism spectrum disorder], more rigorous exposure assessment is needed. A number of studies have demonstrated significant increases in ASD risk with estimated exposure to air pollution during the prenatal period, particularly for heavy metals and particulate matter. Little research has assessed other persistent and non-persistent organic pollutants in association with ASD specifically.
Lyall et al. call for larger epidemiological studies of maternal exposure to vitamins, fats and other nutrients, as well as pesticides and endocrine-disrupting chemicals, even though environmental epidemiological studies of autism have been done for decades.

The second paper, in  Molecular Psychiatry ("Elevated fetal steroidogenic activity in autism," Baron-Cohen et al.), reports the results of looking at hormone levels in amniotic fluid samples collected at between 15 and 16 weeks gestation from a sample taken from a registry of nearly 20,000 male infants in Denmark, born between 1993 and 1999.  The final sample was fairly small, including 128 male infants with autism and 217 controls; the 24 females in the registry who were later diagnosed with autism were excluded from the study because they were atypical for a variety of reasons.  Prevalence of autism is generally higher in males.
We find that amniotic fluid steroid hormones are elevated in those who later received diagnoses on the autism spectrum. Rather than the abnormality being restricted to a specific steroid hormone, a latent steroidogenic factor is elevated, which includes all hormones in the Δ4 pathway, as well as cortisol.
The effect on the developing brain, Baron-Cohen et al. suggest, may be epigenetic.  That is, steroids modify DNA in ways that affect gene expression without changing coding sequence.
Steroids and their receptors act as epigenetic fetal programming influences on early brain development. Through their nuclear hormone receptors, steroids can alter gene expression via direct or indirect influence on multiple epigenetic processes such as histone acetylation, DNA methylation and have transcriptional and post-transcriptional effects on noncoding mRNAs such as microRNAs. Furthermore, during early sensitive periods of brain development, there are sex differences in DNA methylation, methyl-binding proteins, chromatin modifications and microRNA expression, and these effects are mediated in part by early steroid hormone effects.
What is the source of the excess steroid?  "The fetus, the mother, the placenta or other external factors" -- that is to say, it could be anything and this study couldn't answer that question.  Indeed, it is also impossible to know, if the excess hormone really is involved, whether it's the cause of the disorder or the result.  Perhaps maternal stress is the source, the authors suggest, and perhaps, the authors note, steroids such as testosterone and cortisol are also elevated in other disorders with a skewed sex ratio.  In any case, they write, "Each of these sources require further investigation to determine how such influences might affect fetal development in autism."

Cortisol molecule

A story on the BBC website about this work quotes an autism "expert" saying that this is "an important first step" on the path to discovering what causes autism.  First step!? This is a curious way to describe things, since probably billions of dollars have been spent in the last 30 or 40 years on efforts to identify the cause of this disorder, much of it on genetic studies, with no robust results.  Given this track record, what criteria should we use to decide whether this study is worth paying any attention to?

As with many complex diseases and disorders, many genes with small effect have been identified, but none of these explains the high rates of autism now reported around the world.  It is interesting to see these two reports of possible environmental risk factors after a sea of genetic studies, though.  Decades ago autism was believed to be the result of "refrigerator mothering," but then blame swung toward genes and away from environment, and now it seems autism is epigenetic. The gene switch never could have been exactly right given the dramatic, rapid increase in prevalence of autism.  Other than because genes are techy and faddish, why would one ever expect genes to be a main cause in the first place, except as a rationale to do genetics (which we knew how to do), given a paucity of other ideas and the difficulty of replicating and confirming environmental causes?

But many epidemiological studies looking for environmental causes have been done.  A 2010 paper in Current Opinion in Pediatrics reports, with respect to environmental risk factors, e.g.:
...the most powerful proof-of-concept evidence derives from studies specifically linking autism to exposures in early pregnancy – thalidomide, misoprostol, and valproic acid; maternal rubella infection; and the organophosphate insecticide, chlorpyrifos. There is no credible evidence that vaccines cause autism.
Older mothers and fathers, birth order, toxic chemicals, vaccines and thimerosal, and so forth have all been associated with autism, though none reliably so.  And those factors that have been replicated can't explain all cases.

Autism is a difficult trait to study.  The trait itself is hard to define, varies enormously, there are no biomarkers with which to make a definitive diagnosis, diagnostic criteria have changed over the years, and so forth.  But many traits are similarly complex -- asthma, schizophrenia, heart disease, etc. -- and similarly resistant to current methods for determining cause.  So it seems fair to assert that many attempts to determine causes of complex traits are fad-following approaches to understanding complexity with reductionist science.

Tuesday, September 17, 2013

The US Health Disadvantage

Mobilization of an unprecedented kind is now necessary in the United States. It requires a campaign to remove the public veil of ignorance about the evidence.
So states the public health Policy Forum in the Aug 30 issue of Science ("Confronting the Sorry State of U.S. Health," Bayer et al.*), which raises some important questions about health and sickness in the United States.  The authors are commenting on a recent report published by the U.S. National Research Council and Institute of Medicine, "US Health in International Perspective: Shorter Lives, Poorer Health," (Jan, 2013) which asks why the US is among the richest nations in the world, and yet the health of its people is far down the list.  The report is the outcome of 18 months of work by a panel charged with exploring the problem and identifying causes and solutions.

The panel compared health outcomes of Americans with those of 16 other wealthy countries.  They found that Americans have had a shorter life expectancy than people in the comparable countries for many years, and that the differential is growing, especially for women.  The health disadvantage affects everyone up to age 75, it's worse among poorer Americans but exists even in the wealthy, and includes multiple diseases, risk factors and injuries. 



It's worth quoting the panel's findings in detail.
1. Adverse birth outcomes: For decades, the United States has experienced the highest infant mortality rate of high-income countries and also ranks poorly on other birth outcomes, such as low birth weight. American children are less likely to live to age 5 than children in other high-income countries.
2. Injuries and homicides: Deaths from motor vehicle crashes, nontransportation-related injuries, and violence occur at much higher rates in the United States than in other countries and are a leading cause of death in children, adolescents, and young adults. Since the 1950s, U.S. adolescents and young adults have died at higher rates from traffic accidents and homicide than their counterparts in other countries.
3. Adolescent pregnancy and sexually transmitted infections: Since the 1990s, among high-income countries, U.S. adolescents have had the highest rate of pregnancies and are more likely to acquire sexually transmitted infections.
4. HIV and AIDS: The United States has the second highest prevalence of HIV infection among the 17 peer countries and the highest incidence of AIDS.
5. Drug-related mortality: Americans lose more years of life to alcohol and other drugs than people in peer countries, even when deaths from drunk driving are excluded.

6. Obesity and diabetes: For decades, the United States has had the highest obesity rate among high-income countries. High prevalence rates for obesity are seen in U.S. children and in every age group thereafter. From age 20 onward, U.S. adults have among the highest prevalence rates of diabetes (and high plasma glucose levels) among peer countries.
7. Heart disease: The U.S. death rate from ischemic heart disease is the second highest among the 17 peer countries. Americans reach age 50 with a less favorable cardiovascular risk profile than their peers in Europe, and adults over age 50 are more likely to develop and die from cardiovascular disease than are older adults in other high-income countries.
8. Chronic lung disease: Lung disease is more prevalent and associated with higher mortality in the United States than in the United Kingdom and other European countries.
9. Disability: Older U.S. adults report a higher prevalence of arthritis and activity limitations than their counterparts in the United Kingdom, other European countries, and Japan.
It's not all bad -- if an American reaches 75, s/he has a higher survival rate thereafter; the US has higher cancer screening and survival rates, blood pressure and cholesterol are better controlled, we're more likely to survive a stroke, we smoke less and our average household income is higher, suicide rates aren't higher than comparison countries (faint praise, that), and the health of recent immigrants is better than that of people born here. Otherwise, and even though health care spending per capita is much higher in the US than the comparison countries, health outcomes here are significantly worse. Though, of course, we're ahead of the curve in some respects, obesity rates e.g., with other countries fast catching up.  

So, why the dismal picture in the US?  The panel considered this at great length (it's a 400 page document).  You'd think it might be because we have more people without access to health care than other countries, but the disadvantage holds even for those with access to care.  We smoke and drink less, but eat more.  We have more accidents and have more guns.  Our educational attainment is lower than other countries, our poverty rates and income inequality are higher, and our social mobility is lower.  And, the panel also points out, we have a less effective social safety net.  But, even those of us with "healthy behaviors" are more likely to get sick, and have accidents, than our counterparts in other wealthy countries.

So, understanding what's behind the sorry state of health in this country is not straightforward.  Indeed, the panel seemed sorely tempted to describe unhealthy social and environmental conditions in the US, and ascribe our health conditions to the whole sorry mess.
Potential explanations for the U.S. health disadvantage range from those factors that are commonly understood to influence health (e.g., such health behaviors as diet, physical inactivity, and smoking, or inadequate access to physicians and high-quality medical care) to more “upstream” social and environmental influences on health (e.g., income, education, and the conditions in which people live and work). All of these factors, in turn, may be shaped by broader national contexts and public policies that might affect health and the determinants of health, and therefore might explain why one advanced country enjoys better health than another.
That's of course not very helpful in policy terms because public health measures must be directed at something specific, like cleaning dirty water or vaccinating against disease. The situation reminds us of too many attempts to explain complex disease with simple, enumerable factors -- for example, we dream of simple genetic causes, but in fact it's multiple gene and environment interactions.  Here, the Affordable Care Act won't be the answer, nor would gun control be, nor enforcing seat belt laws, nor banning supersize drinks or increasing the availability of fresh fruits and vegetables in poor neighborhoods.  It's complicated.  And surely a combination of many factors, social and environmental.
 
The panel recommends, though, more data collection, more refined analytic methods and study design, and more research.  They recommend focusing on children and adolescents, because early life experiences and habits can affect the whole life span. They also recommend that research should be on the entire life course rather than more localized cause and effect.    But the study urges that the situation is so critical that action must be taken while research is ongoing, and they provide a long list of actions they believe should be taken, from increasing the use of motorcycle helmets to increasing the availability of public transport to improving air and water quality and increasing the proportion of adolescents who don't use illegal drugs.  More generally, they recommend:
(1) intensify efforts to pursue existing national health objectives that already target the specific areas in which the United States is lagging behind other high-income countries, (2) alert the public about the problem and stimulate a national discussion about inherent tradeoffs in a range of actions to begin to match the achievements of other high-income nations, and (3) undertake analyses of policy options by studying the policies used by other high-income countries with better health outcomes and their adaptability to the United States.
But what kind of issue is this?  A public health issue?  Public policy?  Economic, educational?  Here we come to a fundamental question of causation. What, we might ask, causes AIDS? Is it HIV?  Needle sharing?  Poverty?  A confluence of factors at all levels?  Epidemiology has long struggled to take multi-level causation into account, acknowledging the role of many different kinds of factors including biological and social determinants (see Nancy Krieger's old but seminal and still good 1994 paper on this, "Epidemiology and the web of causation: has anyone seen the spider?"), but once the web extends into social causes, the field of public health is pretty much stymied when it comes to fixing things.  And throwing this into the political arena is a sure recipe for a lot of grandstanding but not much else.

Is more research really needed into why Americans are sicker than our counterparts in other wealthy countries?  No doubt it is a serious problem, and very costly in both human and monetary terms.  But of course the request will be for more mega-scale, long-duration highly technological studies--more grant money.  You'd expect us to say that.  But is the plea for more funding a reflex or is it really the answer? 

It does not seem obviously so, except for the many small factors that would be found.  We know enough to know that the answer is going to be complicated, and causal factors changeable.  Indeed, we surely will be found to be leading the pack in some measures, and other countries will catch up.  And, whether the fix is deemed to be personal behavior or political, or a mix of many approaches, once we go beyond requiring vaccines or seat belts, we are the master of none of them. And they're always changing.  Perhaps research money should be going into things like how to improve health education (that is, how to get people to do things they'd rather not do, like exercise or eat less fat). 

If history is any guide, we're betting that when another such study is done in the future, we'll be better than we are now in some measures and worse in others.  And we won't know why.  And we'll say that 'more research is needed'.  Cardiovascular disease rates have risen and fallen over the past 60 years or so, and we still don't know why -- and that's just one disease.  A serious question is how to deal with phenomena that are so changing, and so subtly complex, that we have to keep surveying to understand them.  Could there be some better way, a different approach?
 

---------------------
*Thanks to Bob Ferrell for bringing this to our attention.

Tuesday, February 12, 2013

Concentric circles of causation

Public health has always been a broad discipline, encompassing the study of diseases in populations in all its forms, applied and academic both, and it's getting broader.  The story of the first epidemiologist, John Snow, is well-known -- he suspected that an 1854 cholera epidemic on Broad Street in London was due to a waterborne infectious agent, at a time when it was generally thought that cholera was an airborne disease.  He convinced authorities to remove the handle from the pump on the well that he suspected was the source of the epidemic.  It turned out, of course, that he was right, although the spread of cholera had already slowed by the time the handle was taken off the pump.  So, his contribution was to future public health, not his ill contemporaries.

Epidemiology is the study of patterns of disease in populations -- who's at risk, what's the cause, how to prevent it.  For more than 100 years after John Snow epidemiologists concentrated on proximate single causes or specific exposures -- cigarettes and lung cancer, Legionnaire's disease and Legionella bacteria, HIV and AIDS -- but recently the field of "social epidemiology" has gained some traction. This is the study of the social determinants of health and disease.  The proximate cause of AIDS is HIV, but often, HIV is contracted through drug use and shared needles, or patterns of multiple sex partners.  And, in most places, those at highest risk are poor, so that it's perfectly legitimate to say that poverty causes AIDS.  So, what's the actual 'cause' of the disease?  And, where does Public Health intervene to control it?

There has long been debate over whether epidemiology has a theoretical framework, or whether it is just a set of established statistical methods.  Social epidemiologist Nancy Krieger, in her seminal paper in 1994 called "Epidemiology and the web of causation: has anyone seen the spider?" discussed just this, writing that epidemiology at the time, yes, was interested in the 'web of disease causation' -- what causes disease in populations? -- but was neglecting the search for the spider, the maker of the web.  This paper kick-started the field of social epidemiology.

Krieger wrote:
[This paper] emphasizes why epidemiologists must look first and foremost to the link between social divisions and disease to understand etiology and to improve the public’s health, and in doing so exposes the incomplete and biased slant of epidemiologic theories reliant upon a biomedical and individualistic world-view. 
She's just published a new paper in the American Journal of Public Health, "History, Biology and Health Inequities: Emergent Embodied Phenotypes and the Illustrative Case of the Breast Cancer Estrogen Receptor" in which she argues that health inequities can only be reduced if diseases are considered not just as static biological entities, but within their social and evolutionary contexts, and as individual histories (she calls this broad view of, in this case the estrogen receptor, the "emergent embodied phenotype" -- we can ignore the jargonizing of what is a rather obvious idea).  The paper was brought to our attention by Susan Oyama, who has done some very good work on developmental systems, broadening the understanding of genes in context, among other things.

There's a new field in biology, too; 'systems biology'. This is the study of interconnected biological systems -- metabolic pathways, gene interactions,  cell signaling networks.  This is intended to be a more holistic approach to biology rather than a reductionist one.  The idea is that describing these interactions will help us to understand the 'emergent' traits that are complex diseases.  It is hoped that there will be practical applications -- understanding networks will, e.g., elucidate 'druggable' pathways that pharmaceutical companies can then intervene on, to prevent or cure disease.

Ken and I have often criticized the idea that the new field of 'systems biology' is going to be all it promises.  Like social epidemiology, systems biology is a somewhat self-congratulatory jargonized term whose  proponents suggest the very reasonable idea of taking a broader view of causation, in an attempt to take into account as many factors as possible that might explain how genes function or how traits are made or the causes of a disease.  It's basically a way of enumerating identifiable interactions among components.

So, rather than being satisfied with knowing that mutations in a specific gene on their own cause a particular disease, the idea is to take a larger view:  Enumerate the regulatory pathways or gene interactions (that is, between proteins) so that there are more targets for pharmaceutical intervention.

Seoul from space
Systems biology sounds like a good idea.  A standard, purely additive model of what genes do treats each gene like an independent dose of some effect, and the overall result is just the sum of these effects.  But this is clearly oversimplifying at best, because gene products interact in various molecular ways that need not just be additive.  Instead, sets of genes (their coded proteins, or the proteins that cause a given gene to be expressed) interact with each other -- in 'systems'.

Metabolism is an example in which chains or cycles or hierarchies of interactions pass molecules from one stage to the next.  Even if the interaction networks are the mechanism of action, there is no reason to expect that the effects of variation in the components will lead to simple or additive variation in the result.
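To make the contrast concrete, here is a minimal sketch of an additive model next to a toy two-step pathway.  Everything in it is an assumption made for illustration: the gene names A and B, the effect sizes, and the "limiting step" rule are invented, not taken from any real system.

```python
from itertools import product

# Hypothetical per-copy effects in arbitrary units (invented for illustration).
EFFECTS = {"A": 1.0, "B": 2.0}


def additive_trait(genotype):
    """Additive model: each gene copy adds an independent 'dose' to the trait."""
    return sum(EFFECTS[gene] * copies for gene, copies in genotype.items())


def pathway_trait(genotype):
    """Toy pathway: step B needs the product of step A, so the output is
    limited by whichever step has fewer working copies."""
    return 3.0 * min(genotype["A"], genotype["B"])


# Tabulate both models over 0, 1 or 2 working copies of each gene.
for a, b in product((0, 1, 2), repeat=2):
    g = {"A": a, "B": b}
    print(g, additive_trait(g), pathway_trait(g))
```

In the additive model, adding one copy of B always changes the trait by the same amount.  In the toy pathway, the same change does nothing when A is absent and something when A is present, which is the sense in which variation in the components need not produce additive variation in the result.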

However, the temptation (whether explicit or implicit) is to try to rescue complexity by treating networks as self-contained causal units. Take sets of tens or hundreds of genes and treat each as a single causal unit, and magically you've reduced the causal dimensions you need to consider by an order of magnitude!  But this is wishful thinking, and doesn't really simplify causation or the hunt for understanding, because we know that genes contribute to multiple networks that vary, overlap, and have alternative pathways that are used under different circumstances.

Social epidemiology faces similar complexity.
Broad causal networks may seem to smooth out causal relationships until they look generalizable, but they aren't necessarily any smoother in social than molecular life.  There's a common issue in population sciences called the "ecological fallacy."  This is when an association that is true on the population level is inferred to be true on the individual level.  The county may always vote Republican, or this neighborhood may be a wealthy one, but it can't be assumed that everyone in the county votes Republican, or that everyone in the neighborhood has an income above the average. Or that the neighborhood causes political preference or wealth in a given individual.
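The ecological fallacy can be shown with a small sketch.  The three neighborhoods and their (income, health score) pairs are invented numbers, chosen only so that the pattern across neighborhoods points one way while the pattern among individuals inside each neighborhood points the other way.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical (income, health score) pairs for individuals; invented data.
neighborhoods = {
    "N1": [(10, 5), (20, 4), (30, 3)],
    "N2": [(40, 8), (50, 7), (60, 6)],
    "N3": [(70, 11), (80, 10), (90, 9)],
}

# Population-level view: one point per neighborhood, the mean of each variable.
mean_income = [sum(i for i, _ in p) / len(p) for p in neighborhoods.values()]
mean_health = [sum(h for _, h in p) / len(p) for p in neighborhoods.values()]
group_r = pearson(mean_income, mean_health)

# Individual-level view: correlation among the people inside each neighborhood.
within_r = {
    name: pearson([i for i, _ in p], [h for _, h in p])
    for name, p in neighborhoods.items()
}

print("across neighborhoods:", group_r)
print("within neighborhoods:", within_r)
```

Across neighborhoods, higher average income goes with higher average health; inside every neighborhood, the relationship among individuals runs the opposite way.  Reading the first pattern as a statement about any given person is the fallacy.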

Social epidemiology by its nature and objectives searches for population level associations and attempts to infer from those associations causal relationships that apply at the level of the individual. Poverty causes AIDS.  Racism causes stress which causes high blood pressure.  In a sense, as reasonable and plausible as these ideas are, and as much as they must reflect causal processes in some way, the ecological fallacy is always lurking, because not everyone with AIDS is poor, and not everyone who is poor gets AIDS, not everyone who has experienced racism has high blood pressure, and so on. And, even eliminating poverty won't prevent further cases of AIDS.

Unlike systems biology, where the idea is to intervene in the networks with targeted pharmacological agents, it's hard to know what to do with the information that social epidemiology is producing.  Public Health is, ultimately, an applied science.  It has very little that could be called real theory, beyond the use of statistical methods to design and evaluate studies -- that is, assuming the kinds of repeatability that any statistical sampling requires, similar to saying that every roll of dice has the same probability of coming up 6.  Dice rolls may be repeatable events, but to a great extent, humans aren't.  And as with systems biology, the idea is not just to enumerate interactions but to identify targets of intervention.  But, when the field is identifying "poverty" or "racism" as causal factors, what can be done to intervene? 

Indeed, one might say that at least simple biological systems like the Krebs energy cycle or photosynthesis require networks of interactions among a step-wise hierarchy of truly enumerable molecular components in quantitative relationships, and a very high degree of repeatability so they can be studied experimentally and evaluated with standard statistical methods.  But are 'poverty' and 'racism' even serious concepts of similar type?  If not, then current statistical or other study-design methods may simply be inappropriate, too vague, or unable to provide the type of answers we would like to get, especially if we desire to understand causation at the individual level.

Public Health, and epidemiology in particular, has a legacy of successful research and intervention in infectious diseases -- these are 'point source' diseases, with largely replicable causality, and if you cut the problem off at the source, you prevent the disease.  This has its parallels with human genetics, where clearly Mendelian, single-gene disorders are relatively easy to explain.  It's when causation gets complex that genetics -- and Public Health -- get into difficulties.


But, ok, granted, Public Health is a population-level field, and eliminating all cases of a disease has never been asked of it.  But it's curious to see the field seeming to homogenize causation at a time when complexity is a buzzword in other fields.  Though, again, this is quite in line with much of genetics, which has its own legacy of successes with point causation (Mendelian disease), just not so much with complex diseases. And which also mixes population-level effects of particular variants with the ability of those variants to 'personalize' medical care.

Have you seen the photos that Col. Chris Hadfield is posting on his Facebook page?  He's in the Space Station, taking pictures of Earth in his free time and sharing them with the world.  They are stunning.  He photographs cities, regions, large chunks of continents; they are lit up at night or green, or snow-covered, or cloud-covered, or dry-as-a-bone desert, or stretches of ocean during the day.  When he wonders what something is in one of his pictures, he crowdsources the answer.  The other day it was a spot in Tehran that looked curious to him.  It turned out it had once been an airport, and is now a playground, which Iranians told him.

If a region is experiencing drought, that's easy to see from these photos.  But, if there are cracks in the macadam on those long stretches of highway, or a bridge is unstable, or the rivers are polluted, you can't see it from space.  At least, not from these pictures.  You do get the big picture, but you have to zoom in to discover where to intervene to prevent a catastrophe due to crumbling infrastructure.

Social epidemiology may be a lot like Col. Hadfield's photographs -- beautiful descriptions, but so far removed from the individual that it's impossible to actually predict who's going to get sick and why, never mind useful for figuring out where to intervene.  Even identifying the risk factors themselves is notoriously difficult.  And one need not give it a special name, which is often a way we academics have of making our ideas seem new or particularly insightful.  Just 'epidemiology' will do very well--and the point is to include social as well as other potentially causal factors (and, as we hope is clear, we are not just picking on public health rhetoric, because genetics is just as much affected by unnecessary jargonizing).

Public Health is by goals and design a population-based field, and it works well when the message is that cholera is waterborne, so we need to keep our water sources clean, or that vaccination can protect entire populations against infectious disease.  Its methods have saved countless lives.  But, like trying to make risk predictions from  genetic data from whole populations, it's hard to know what to do with these descriptions of the causes of complex diseases from such a distance.

And ultimately,  of course, if we go out enough concentric circles of causation, it's life that is the cause of disease, and the only prevention is death.

Thursday, September 1, 2011

Killing malaria

Malaria kills more children per year than any other human disease.  Indeed, it has been surmised that no other single cause has killed more people, in all of human history.   And killer malaria makes people very sick long before it kills them.  Decades of studies have documented many different genetically based forms of resistance, a measure of the toll that malaria-related natural selection has taken, but that hasn't knocked the disease off the pedestal as the gold medal killer.

But, rates of malarial morbidity and mortality have been declining rapidly in parts of sub-Saharan Africa, including Eritrea, Rwanda, Zanzibar, Pemba, Tanzania mainland, Kenya and Zambia (as reported here), and in some cases it's not at all clear why.  Some of the decline is due to widespread use of bed netting to prevent mosquito bites, some to improved medical treatment, and some to use of pesticides, but the decline is also being seen in areas where none of this is happening.  A paper in the current issue of Malaria Journal by Meyrowitsch et al. suggests:
...other factors not related to intervention could potentially have an impact on mosquito vectors, and thereby reduce transmission, which subsequently will result in reductions in number of infected cases. Among these factors are urbanization, changes in agricultural practices and land use, and economic development resulting in e.g. improved housing construction.
Or, the decline might also be attributable to a decrease in the mosquito population due to changing rainfall patterns caused by climate change, an hypothesis tested by Meyrowitsch et al. They collected mosquitoes weekly in light traps in 50 households in northeast Tanzania, an area with no organized mosquito control, in two separate study periods (1998 - 2001 and 2003 - 2009).  It's a rural area; the study communities have around 1000 inhabitants, and people live in "mud-walled houses thatched with dried coconut leaves."  There are generally 2 rainy periods per year here, a long one in March-June and a shorter one in October-November.

Insect counts showed a marked decrease in the mosquito population over the 11-year study period (the primary mosquito vectors for malaria in sub-Saharan Africa are Anopheles gambiae and Anopheles funestus).
The average number of Anopheles gambiae and Anopheles funestus per trap decreased by 76.8% and 55.3%, respectively over the 1st period, and by 99.7% and 99.8% over the 2nd period. During the last year of sampling (2009), the use of 2368 traps produced a total of only 14 Anopheline mosquitoes. With the exception of the decline in An. gambiae during the 1st period, the results did not reveal any statistical association between mean trend in monthly rainfall and declining malaria vector populations.
Below are the tables of results for the two sampling periods.  (If you click on a table, it will actually be readable.)  If you look at rows 5 and 6, the total mosquito counts by year, you'll notice that the decline is not linear.  Instead, it seems something dramatic happened between 1998/9 and 1999/2000, the year with the most significant decline in mosquito numbers, and then again between 2004 and 2005.  The number of traps used in the first period was less than in the second, so the two periods aren't totally comparable; let's just stick with the second.  After 2006, the number of An. gambiae rose again, and then something happened between 2008 and 2009 to drastically reduce insect numbers by 2009.  And it doesn't seem to be differences in total rainfall.  The authors don't discuss this apparent flux, but instead treat the decline as a general trend.  But, it might be that something distinct happened in 1999 and 2004 that explains the sharp decline, which could be overlooked by treating the decline as linear.
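How a single sharp drop can hide inside a "general trend" is easy to sketch.  The yearly counts below are invented for illustration and are not the study's numbers; the sketch fits an ordinary least-squares line to them and compares that with the year-to-year changes.

```python
# Hypothetical yearly mosquito counts, invented for illustration only.
counts = [1000, 980, 990, 200, 210, 190]


def linear_slope(ys):
    """Least-squares slope of ys against year index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def yearly_changes(ys):
    """Change from each year to the next."""
    return [b - a for a, b in zip(ys, ys[1:])]


slope = linear_slope(counts)
changes = yearly_changes(counts)
biggest_drop = min(changes)
drop_index = changes.index(biggest_drop)       # gap between year i and year i+1
share = biggest_drop / (counts[-1] - counts[0])  # fraction of the total decline

print("fitted slope per year:", round(slope, 1))
print("year-to-year changes:", changes)
print("largest single drop:", biggest_drop, "after year index", drop_index)
print("share of total decline:", round(share, 3))
```

With these made-up numbers the fitted line suggests a steady loss of about 200 mosquitoes a year, while the year-to-year changes show that nearly all of the decline happened in one step.  A trend summary alone would not reveal that.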





In any case, clearly the mosquito population has dropped precipitously.  The authors don't have morbidity and mortality statistics for the study period, but they assume they both fell. And these same kinds of results have been reported for various other parts of sub-Saharan Africa.  Indeed, they point out that a study on the island of Pemba, Tanzania, found that malaria transmission began to fall before the start of the malaria control program there. 

They conclude that the unpredictability of the rainfall resulting, presumably, from climate change could be the cause of these declining mosquito counts, rather than absolute differences in monthly rainfall.  And/or the decline may be due to:
...changes in socio-ecological conditions in the study area (e.g. changes in temperature, ability for water to pool, deforestation or land-use, change in the use of agricultural pesticides or insecticide-like compounds not directly applied for targeting malaria vectors, improved house constructions or changes related to agricultural activities). An increase in predatorily pressure on the mosquito population (e.g. birds or invertebrates) or an insect pathogen that specifically targeted mosquitoes, e.g. a bacterial, viral or fungi infection, could also potentially have induced the observed declines.
This decline in mosquito numbers, and thus in malarial infection, is very interesting in its own right, at least to us, not to mention very important if it signals the beginning of the decline in malarial infection.  But it's also interesting that the reason for the decline is so elusive -- another instance of the difficulty of determining causation.  Indeed, the explanation for the elimination of malaria from the United States early in the 20th century is still debated.  If the decline is in fact a trend, leading to elimination of the mosquito in areas where malaria has been endemic, it doesn't matter so much why it happened -- again, as in the US.  If, however, the population numbers are going to continue to jump around, potentially rising again, it's very important to figure out why.  And, as the authors state, if malaria is going to stick around, even if at lower levels, and children aren't going to be exposed as frequently, they won't develop immunity so that the few infections they do get will make them sicker.

One can say that the rate of malaria is not just about physical ecology or human biological susceptibility, but also about human culture.  If that's the explanation, it's very curious and interesting.  That's because in the first place, endemic malaria may have been due to the spread of settled agriculture, exposing land to water pooling where nearby mosquitoes could breed.  So culture enabled malaria to rise.  And now, if the suggestions are true, culture is leading to its decline.  In both cases much, at least, of this was unintended (such as global warming).  And if mass scale agriculture or global human population eventually decline, fewer people sharing the world's resources may ironically mean higher risk of malaria, as in the bad old days.