Wednesday, September 10, 2014

The Turner Oak effect: unexpected explanations

We are just catching up on a backlog of reading after three weeks away, which means that while the immediate subject of this post may be a bit dated, the topic certainly is not. The August 15 special issue of Science, which we're just now reading, is so fascinating that we can't let it go unremarked.  The issue, called "Parenting: A legacy that transcends genes," provides example after example of the effects of environmental factors on development, taste preferences, the way the brain works, disease risk, and many other aspects of life.  We can't of course evaluate the reliability of all of these results, but the evidence does seem to be pointing strongly in the direction of a mix of genes and environment in explaining the effects of parenting on growth and development.

We don't know whether mounting such a strong challenge to the idea that genes are predominantly what make us who we are was the editors' intention, but the subtitle suggests it, and in our view that's certainly what they have done.  Indeed, we can't help noting that this is an unintended but eloquent counterpoint to Nicholas Wade's view of life, in which everything including the kitchen sink is genetic (or at least we assume he'd say this, since sinks are designed by Eurasians who, thanks to natural selection, are genetically of superior inventiveness).

Cover of Science, Aug 15, 2014
Given the papers in this special issue, it's clear that more and more is being learned about how extra-genetic factors affect growth and development. What the mother eats in the days around conception, uterine conditions before conception, conditions during development, components of breast milk, ways of parenting and so forth all apparently affect the growth, development and health of a child.  In vitro fertilization may have life-long effects including risk of disease, starvation during pregnancy may affect risk of disease in offspring, what a mother eats while she's pregnant can influence her child's taste for specific foods, lack of parental care during infancy and early childhood can have lifelong effects, maternal mental illness may affect the development of the fetal brain, and so on.

Lane et al. write about "Parenting from before conception".  Infant health, they write, seems to be particularly influenced by conditions during 'fertilization and the first zygotic divisions, [when] the embryo is sensitive to signals from the mother's reproductive tract.'
The oviductal fluid surrounding the embryo varies according to maternal nutritional, metabolic, and inflammatory parameters, providing a microcosm that reflects the outside world. In responding to these environmental cues, the embryo exerts a high degree of developmental plasticity and can, within a discrete range, modulate its metabolism, gene expression, and rate of cell division. In this way, the maternal tract and the embryo collaborate to generate a developmental trajectory adapted to suit the anticipated external environment, to maximize survival and fitness of the organism. But if the resulting phenotype is a poor match for conditions after birth, or if adaptation constrains capacity to withstand later challenges, offspring are at risk.
Further,
Maternal diet at conception has a major impact on the developmental program. Reduced protein content for just the first 3 days of embryogenesis retards cell proliferation and skews the balance of cell lineage differentiation in the blastocyst.  The effect of nutritional disturbance at conception persists through implantation and influences placental development and nutrient transfer capacity, then after birth, the neonate gains weight more rapidly, developing higher systolic blood pressure and elevated anxiety.
Some of the effect is epigenetic, that is, due to chemical modifications of DNA and its associated proteins that affect gene expression without changing the DNA sequence itself.  And some of the effect is, Lane et al. write, on oocyte mitochondria.  These organelles, "powerhouses of the cell", support blastocyst formation.  Their location and activity levels are known to respond to the mother's nutritional status, and ultimately affect the health of the child, as well as affecting gene expression in the brain, among other things.  Epigenetic effects on sperm, influenced by environmental conditions, also can affect the developing embryo.  But it's the "epi" in epigenetic that tells the tale: it's not genetic (DNA sequence) variants that cause the trait difference, but variation in the use of the same sequence.

Many of the essays in this issue use the word 'plasticity', meaning that developing embryos are able to respond to various and varying environmental conditions.  If conditions are too extreme, of course, the embryo can't survive, but in general, how an embryo responds to immediate conditions may have lifelong effects.  From the review by Rilling and Young ("The biology of mammalian parenting and its effect on offspring social development"):
Parenting... shapes the neural development of the infant social brain. Recent work suggests that many of the principles governing parental behavior and its effect on infant development are conserved from rodent to humans.
That parenting has a strong effect on the infant's physiology, and that the effects of parent/child interactions have evolved to be strong, is no surprise, of course, given that parenting in mammals is essential for the survival of the offspring.  And plasticity, or adaptability, is a fundamental principle of life.  We have referred to this as 'facultativeness' in the past.  Organisms that are able to adapt to changing environments -- within survivable limits -- are much better equipped to survive and evolve.  Indeed, the final piece in this special section on parenting is titled "The evolution of flexible parenting."  Parenting behaviors among many species are well-documented to respond to environmental changes.  Put another way, it is flexibility, not genomic hard-wiring, that is most adaptive in these ways.

So, with all these examples of the interdigitation of nature and nurture, can we declare the death of genetic determinism?  Well, no.  Genetic determinism is alive and well, thanks in large part to Mendel and the resulting expectation that there are genes for traits that are out there to be found.  But in many ways, we've become prisoners of Mendel -- while many genes have been found to be associated with disease, we know very well that most traits are polygenic, and/or due to gene-environment interaction, and we've known this for a century.  So the idea that the effects of parenting might transcend genes shouldn't be surprising.  And the idea that there might be factors that we haven't predicted that affect traits such as diseases or how brains work shouldn't be surprising, either.

The BBC recently aired an excellent 25-part program called "Plants: From Roots to Riches" about the history of Kew Gardens and, because the gardens have been so central to botany for so long, about the history of botany in general.  The series is still accessible online, and well worth a listen.  I bring this up because a story told in one of the episodes struck me as a very apropos lesson about causation.  A "Great Storm" hit the UK in 1987.  This storm, with hurricane-force winds, did tremendous damage, including killing millions of trees, 700 at Kew alone.

Before the storm, arborists had been concerned about a 200-year-old tree at the Gardens, the Turner Oak.  It was clearly not well; its leaves were stunted and its growth was slow, but it wasn't clear what was wrong with it.  During the storm, the tree was uprooted completely and tossed into the air, but as luck would have it, it came back to earth right in the hole its exodus had created.  The arborists decided it didn't need as much attention as many other trees in the gardens after the storm, though, so they left it until they were finished tending to the others.  That was three years later, at which time they discovered that the tree was thriving, growing again, and looking healthier than it had in decades.

Quercus x turneri at Kew Gardens; Royal Botanic Gardens

Why?  The arborists eventually realized that all the foot traffic at the Gardens had compacted the soil to the extent that the roots, and thus the tree, were suffering.  The storm, in wrenching the tree out of the ground, had inadvertently loosened and aerated that compacted soil.  It turns out that the soil around a tree must be aerated if the tree is to thrive.

I love this serendipitous discovery.  A tree was ailing, no one knew why, until an unexpected event uncovered the explanation, and it turned out to be something that no one had thought to consider.  Many of the discoveries reported in the August 15 issue of Science strike me as of the same ilk.  Scientists have been looking for genes 'for' diabetes, taste, mental illness, obesity, and so on for decades now, and the explanation for these conditions may be instead events that happen even before conception, where it never occurred to anyone to look before.

There are numerous other examples; a few years ago it was reported that age at death (for late-life, not infant mortality) is affected by the month in which someone is born.  The authors, for some reason, did not follow up this potentially very important finding.  Maybe the effect is due to seasonal foods consumed by the mother during what turn out to be the riskier months of conception -- if so, there should be lifelong evidence, if we but looked for it, of accelerated disease prodromes such as obesity, hypertension, and the like.

Perhaps the Turner Oak effect should be a thing -- it might encourage investigators to explicitly look for the unexpected.  What causes asthma? Could it be disposable diapers?  Who knows?  Broccoli has never been blamed for anything -- maybe it's time for broccoli to be implicated in some disease.  The problem is that we don't think to look because we all 'know' that broccoli is good for us.

Some ideas are kooky, but when it turns out that some kooky ideas really do seem to explain cause and effect, it means we shouldn't always be looking in the same place for our answers (the drunk under the lamppost phenomenon).  The cause and effect relationships described in the parenting issue of Science involve some unexpected environmental effects on gene expression -- epigenetic effects of various kinds -- and plasticity, meaning that cross-talk between genes and environment creates a give-and-take that can't be called genes or environment alone.  We don't know that these are final answers, but we know that we should expand our range of expected possibilities.

Perhaps the Turner Oak effect should guide more of our thinking in science.

Tuesday, September 9, 2014

Sloppy, over-sold research: is it a new problem? Is there a solution?

In our previous posts on epistemology (e.g., here)--the question of how we know or infer things about Nature--we listed several criteria that are widely used: induction, deduction, falsifiability, and so on.  Sometimes they are invoked explicitly, other times they are just used implicitly.

A regular MT commenter pointed out a paper on which he himself is an author, showing serious flaws in an earlier paper (Fredrickson et al.) published in PNAS, a prominent journal.  Fredrickson et al. is a report of a study of the genetics of well-being.  The critique, also published in PNAS, points out fundamental flaws in the original paper. ("We show that not only is Fredrickson et al.’s article conceptually deficient, but more crucially, that their statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.")  We can't judge the issues ourselves, as the paper is out of our area, but the critique seems to be broad, comprehensive, and cogent.  So, how could such a flawed paper make it into such a journal?

Our answer is that journals have always had their good and less-good papers, and there have always been scientists (and those who claim to be scientists) who trumpeted their knowledge and/or wares.  When there are credit, jobs, fame and so on to be had, one cannot be surprised at this.

Science has become a market, with industry and university welfare systems, and a way for the middle class to get societal recognition (an important middle-class bauble).  Journals proliferate, avenues for profit blossom, and university administrators stop thinking and become bean-counters.  Solid science isn't always the first priority.

Science was never a pure quest for knowledge, but it is now, we think, more than ever before a business, with these various forms of material and symbolic 'profit' as coins of the realm, and the faux aspect can be expected to grow.  There isn't any easy fix, because raising standards to become better policed usually leads to becoming more elite, closed, and exclusive, and that is itself a form of opportunity-abuse.

Our commenter did add that he can no longer trust research sponsored by the US government, and here we would differ.  Much good work is done under government sponsorship, as well as industry sponsorship (which can have its own problems).  The government is a loaded, inertial bureaucracy with its armada of career-builders, and that is predictably stifling.  But the general idea is to do things right, to benefit society (not just professors, or funders, or university administrators).  The problem is how to improve the standard.

The issue is not epistemological
Actually we think the comment was misplaced in a sense, because our post was about epistemological criteria--how do we know how to design studies and make inferences?  The comment was about the way the results are reported, accepted, exaggerated, and the like.  This is certainly related to inference, but rather indirectly we'd say.  Reviewers and editors are too lax, have too many pages to fill, too many submissions to read and the like, so that judgment is not always exercised (or, often, authors bury their weak points in a dense tangle of 'supplementary information').

That is, one can do the most careful study, following the rules, but use bad judgment in its design, be too accepting of test results (such as statistical tests), or use inappropriate measures or tests.  And then, often in haste or desperation to get something published from one's work (an understandable pressure!), submit a paper that's not even half-baked.

What is needed is to tighten up standards, education, and training; reduce the pressure for continual grant funding and publication streams to please Deans or reviewers; give scientists time to think; make them accountable for their promises; and slow down.  In a phrase, reward and recognize quality more than quantity.

This is very hard to do.  Our commenter's points are very well taken, in that the journals (and news media) are now heavily populated by low- or sub-standard work whose importance is routinely and systematically exaggerated to feed the insatiable institutional maw that is contemporary science.

Friday, September 5, 2014

When do you believe research?

At the end of the mini-course we taught in Helsinki, after a week of discussion of many essentially philosophy-of-science issues including how to make decisions about cause and effect, or how to determine whether a trait is 'genetic', or if it can be predicted from genes, a student asked how we decide which studies to believe.  That is, responding to our questioning nature, he wanted to know how we decide which research reports to be skeptical about and which to believe.  I've been thinking a lot about that.  I don't really have answers because it's a fundamental question, but here are a few thoughts.

The class, called Logical Reasoning in Human Genetics, is meant to get students thinking about how they know what they think they know.  Ken gave a lecture on the first day in which he talked about epistemological issues, including the scientific method. We're all taught from childhood that knowledge advances by the scientific method. There are multiple definitions, but let's just go with what's cited in the Wikipedia entry on the subject, in turn taken from the Oxford English Dictionary.  It's "a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses." But most definitions go further, to say one adjusts the hypothesis until there is no discrepancy between it and the latest results.  This is how many web pages and books portray the process.

But this is awfully vague and not terribly helpful (if it's even true: for example, when is there no discrepancy between hypothesis and actual data?).  Who decides what is systematic observation, how and what we measure, how we conduct experiments, and formulate, test and modify hypotheses?  And even if we do agree on all this, it wouldn't give any hint as to which results should be believed.  Any that follow the method?  Any that lend evidence to our hypotheses?  There was plenty of evidence for the sun revolving around the Earth, and spontaneous generation, and the miasma theory of disease, all based on systematic observation and hypotheses, after all.  Clearly, empiricism isn't enough.

In his first lecture, Ken showed this slide:

The essential tenets of the scientific method.  Most of us would include at least some of these criteria in a list of essentials, right?  Ken discussed them all, and then showed why each of them in turn may be useful but cannot in fact be a solid basis for inferring causation.  One may hypothesize that all swans are white, and it may seem to stand up to observation -- but observing a single black swan does that theory in.  And when can we ever be sure that we'll never see a non-white swan?  So induction is not a perfectly reliable criterion for forming general inferences.  Prediction is an oft-cited criterion for scientific validity, but in areas of biology it depends on knowing future environments, which is impossible in principle.  Scientists claim that theories may never be provable but can always be falsified, which leads to better theory.  But scientists rarely, if ever, actually work to falsify their own theories.  And one can falsify an idea with a bad experiment even if the idea is correct.  P-values for statistical significance are subjective choices: P = 0.05 was not decreed by God.  And so on.
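To see how arbitrary the P = 0.05 convention is, consider a minimal simulation, ours rather than anything from Ken's slides: both groups are drawn from the same distribution, so every 'significant' result is a false positive by construction, and the conventional cutoff delivers them at a steady, predictable rate.

```python
# A minimal sketch (ours, not from the lecture): under a true null,
# a p < 0.05 cutoff still declares 'significance' about 1 time in 20.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group = 1000, 30
false_positives = 0
for _ in range(n_tests):
    # Both groups come from the same distribution: no real effect exists.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} null comparisons were 'significant'")
# Expect roughly 50 of 1000, i.e. ~5% -- the cutoff is a convention about
# tolerable error, not a criterion for truth.
```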

So, then Ken added the following criterion:


This is probably a better description of how scientists actually do science.  And I'm writing this in Austria, so I'll mention that if you've read the Austrian philosopher of science Paul Feyerabend's "Against Method", this will sound familiar.  Feyerabend believed that strict adherence to the scientific method would inhibit progress, and that a bit of anarchy is essential to good science.  Further, the usual criteria, e.g. consistency and falsification, are antithetical to progress.  Indeed, as a philosopher who took a long, hard look at the history of scientific advances, Feyerabend concluded that the best description of good science is "anything goes," a phrase for which he is famous, and often condemned.  But he didn't mean it as a principle; rather, it was a description of how science is actually done.  It is a social and even political process.

However, even an anarchic bent doesn't help us decide which results to believe, even if it does mean that sticklers for method have no particular advantage.

How do we decide?
A few weeks ago we wrote about a paper that claimed that tick bites are causing an epidemic of red meat allergies in the US and Europe.  Curious.  Curious enough to lead me to read 3 or 4 papers on the subject, all of which suggested a pattern of exposure and symptoms consistent with the habitat of the tick, as well as a mechanism that explained how the tick bite could cause this often severe allergy.  Seemed convincing to us.

But someone on Twitter wasn't convinced, pointing to a Lancet article.  That article, though, restricts its discussion to the anti-science claims of those who believe that Lyme disease is not what 'evidence-based' medicine says it is.
Similar to other antiscience groups, these advocates have created a pseudoscientific and alternative selection of practitioners, research, and publications and have coordinated public protests, accused opponents of both corruption and conspiracy, and spurred legislative efforts to subvert evidence-based medicine and peer-reviewed science. The relations and actions of some activists, medical practitioners, and commercial bodies involved in Lyme disease advocacy pose a threat to public health.
But should we be skeptical about all tick-borne diseases?  The CDC still lists a number of them.  I don't know enough about this subject to comment further, but it's interesting indeed that antiscience claims can themselves be couched in a semblance of the scientific method.  Or at least a parallel track, with its own 'experts', publications, peer reviewers, and so on.  In fact this makes the question of how one decides what to believe almost mystical, or dare we say religious.  Surprisingly, while it is often said that science, unlike other areas of human affairs, isn't decided by a vote, in reality group consensus about what is true is a kind of vote among competing scientists; the majority, or those in the most prominent positions, do tend to set established practice and criteria.

Or what about this piece, posted last week by the New York Times, on the effects of bisphenol A on ovarian health?  Evidence seems to be mounting, but even people in the field are cautioning that it's hard to tell cause and effect.  Or, what about the causes of asthma?  Environmental epidemiology has variously implicated breast feeding, but also bottle feeding, excessive hygiene, and pollution.  The same methods -- "systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses" -- yield contradictory results.

Or, what about climate change?  How do we decide what to believe?  Few of us are expert enough in meteorology, geology, or climate history to make a decision based on the data, so essentially we must decide based on whether we believe -- yes, believe -- that the science is being rigorously conducted.  But how would we know?  Do we count the number of peer-reviewed papers reporting that the climate is changing?  If so, that's just a belief that peer review adds weight to findings, rather than simply being evidence of a current fad in thinking about climate, or a circling of wagons, or some other sociological quirk of science.  Do we count the number of papers or op/ed pieces written by US National Academy members, or Nobel prize winners?  In that case, we're even further from actual scientific evidence.

We can list one criterion that, today, must be met: the results must be evolutionarily sound.  Evolution is probably as close as biology comes to 'theory': descent with modification from a common ancestor.  If results don't fit within that theory, they are probably wrong.  But not definitively -- we should always be testing theory.

Here's another one that must be true when considering causation: the cause must precede the effect.  (This is one of a list of nine criteria sometimes relied upon in epidemiology, the rest of which aren't necessarily true, as even Bradford Hill, who devised the list, recognized.)  But this isn't terribly helpful.  Many things can precede an effect, not just one, and many things that precede an event are unrelated to it.  Which such antecedent do we accept?

Several criteria that might help are replication and consistency, but for many reasons they can't be considered sufficient or necessary.  They might confirm what we think we know -- but consistent and replicated findings of disease due to bad air, prior to the germ theory of disease, confirmed miasma as a cause.  And life is about diversity, which is how it evolves, so replication is not a necessary criterion: a claim about, say, genetic causation can be true under some circumstances but not all.

Science is done by scientists in (and these days supported by) society.  We need jobs and we try to seek truth.  But one proverbial truth is that science should always be based on doubt and skepticism: rarely do we know everything perfectly.  Once we stop questioning -- and the hardest person to question is oneself -- then we become dogmatists, and our science is not that different from received truth in religion.

Scientists may rarely think seriously or critically about their criteria for truth.  We believe that there is truth, but it's elusive much of the time, especially in complex areas like evolutionary biology, genetics, and biomedical causation.  A major frustration is that we have no formal criteria for inference that always work.  Inference is a kind of collective, social decision process, based on faith, yes, faith in whatever a given scientist believes or is pressured by his/her peers to believe.  The history of science shows that this year's 'facts' are next year's discards.  So which study do we believe when there are important implications for that decision?  If it's not true that you can "use whatever criteria you want", for various pragmatic reasons, then what is true about scientific inference in these areas of knowledge?

Wednesday, September 3, 2014

Genomic cold fusion? Part III. Gene mapping: when minnows are whales

In the first two parts of this series we tried to outline the actual logic underlying the search for genes that affect a trait, disease or otherwise, that we might be interested in.  We titled this series ‘genomic cold fusion’ in response to a comment on a tweet about our course Logical Reasoning in Human Genetics, whose most recent offering we gave a week or so ago in Helsinki, Finland.  The characterization referred to the use of ‘linkage analysis’—that is, analysis in known pedigrees—to find genetic causal factors for complex traits.

We tried to explain that evolutionary (population) history lies behind the logic of both family-based linkage and population-sample-based association approaches to genomewide mapping (such as in GWAS).  When causes are strong and not too numerous, mapping works in large families.  That’s because if something is genetic it must be familial—that in a sense is what ‘genetic’ means in this context—and one can trace transmission, following Mendelian principles, explicitly.

If causes are individually rare and there are many of them, mapping is hard: collecting families large enough to map individually is difficult and costly, and pooling families doesn’t work very well either.  Yet unconstrained, implicit pooling of different families, whose connections aren’t even known, is just what GWAS does!

In the end, however, we concluded that if there were too many different causes, and they are weak or rare, and environmental factors are important, then the trait is basically the result of a mix of contributors, differing among individuals both within and between families.  Individually, we suggested, the causes are minnows, and fishing in a pond of minnows, no matter how it’s done, will only find minnows.  But there is more to the issues than this, and it deserves to be recognized.

When a minnow is a whale
There are tons of results in which a genetic mutation identified as having a major effect is found to have lesser effects in some people.  Even family members sharing the variant may show different effects (more or less severe disease, for example).  Some may have an essentially lethal phenotype, while others are only mildly affected.

The reason is that a variant’s causal effects depend fundamentally on its context.  This is true for environmental risk factors as much as genetic ones.  A causal minnow—a minor causal effect—can be major in some contexts.  Any approach to genetics that fails to take serious account of this basic fact is, in a sense, amateurish.
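A toy calculation, with invented numbers rather than data from any real study, makes the point concrete: a variant whose effect is large only in a rare background looks, averaged over the whole population, like a minor risk factor.

```python
# A hedged illustration of the minnow/whale point; all values invented.
risk = {
    # (carries_variant, background): disease risk
    (True,  "rare_background"):   0.60,  # a whale in this context
    (False, "rare_background"):   0.05,
    (True,  "common_background"): 0.06,  # a minnow in this one
    (False, "common_background"): 0.05,
}
freq_rare = 0.02  # 2% of the population carries the rare background

def marginal_risk(carries):
    """Population-average risk, weighted by background frequency."""
    return (risk[(carries, "rare_background")] * freq_rare +
            risk[(carries, "common_background")] * (1 - freq_rare))

print(f"carriers:     {marginal_risk(True):.4f}")   # ~0.071
print(f"non-carriers: {marginal_risk(False):.4f}")  # 0.050
# Population-wide the variant looks weak, yet in the rare background it
# multiplies risk twelve-fold.  Mapping on the whole population sees the
# average, not the whale.
```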

A good illustration of this is that when a disease-causing genetic change is engineered into a laboratory mouse, it may or may not mimic the human trait.  Sometimes, perhaps most of the time, it will have a roughly similar effect in one strain of lab mice, but very different, or even no effect, in other strains.  Indeed, while this is very well known to mouse workers (including ourselves when we were doing that sort of experiment), it is rarely taken seriously into account.  A transgenic effect is reported, but not checked in other strains of lab mice, or in other animal models, such as rats or dogs.  The reason, especially for species other than mice, is that such testing is quite costly.  The bottom line is that we learn about the biology of the effect in one of its contexts, but extrapolate to other contexts, even humans, at our peril.  This, too, is well known.

This is why, among humans within or between populations, a mapping minnow can be a causal whale in some people, and vice versa.  It’s something that needs to be recognized more widely, but for which there really is no generic explanation.  It’s why risk estimates given for a genetic variant—such as by companies essentially practicing shell-game medicine without a license by advising customers about their ‘risk’ based on DNA analysis—are often not worth the electrons needed to send them.  Some risk factors are very strong (one thinks of BRCA variation and breast cancer), but most are not, and some are very weak to start with and only strong in rare contexts.  Again, conscientious geneticists know this very well, or should.  It’s no secret.

Indeed, the fact that minnows can grow up to be whales, or whales can shrink to minnows, depending on the genomic and environmental pond they’re swimming in is one of the important things genomicists should be directly addressing, rather than making the rather bold and expensive promises that they are making.

This isn’t an argument against doing genetics, but it is a reason to think differently, or at least carefully, before making very expensive promises that often are not very different from what preachers promise you if you’ll put some coin in the plate being passed.

Genetics is fundamental biology, and its challenges are great from the ground up.  At present, those challenges are typically whales, but are just as typically, and expediently, treated as if they are minnows. 

Monday, September 1, 2014

Lunch with the Captain

These days I’m having trouble finding time to write, especially to blog.

My colleagues and I are busy building a team and a large network of collaborators for a series of related malaria elimination projects.  Our initial goal in this project is to wipe malaria out in very specific populations.  If this works, and from our initial work at a smaller scale it appears as though it can, it will be vastly scaled up – reaching throughout Southeast Asia.

The impetus for this work is the so-called evolutionary arms race.  This part of the world has a very long history of popping out drug- and multi-drug-resistant strains of falciparum malaria (Wongsrichanalai et al., 2001; Wongsrichanalai, Pickard, Wernsdorfer, & Meshnick, 2002).  We (malaria workers) roll out a new line of defense (antimalarials) against our chosen adversary, and our adversary quickly develops a defense strategy against us.  These strains can subsequently move from this part of the world to others, parts of sub-Saharan Africa for example, where the malaria burden is much heavier and the results would be much more devastating (Payne, 1987).

Occasionally there are deaths from malaria infections here along the Thailand-Myanmar border (though usually the major toll the illness takes here is in time spent ill and therefore unable to work).  Not that long ago, a 15-year-old boy died from malaria.  He was not far from health care clinics that would have treated him.  The story I hear is that he was without close family members; he lived alone and worked in the agricultural fields, and he essentially lay in those fields dying from the disease, having fallen through an apparent gap in his and his community’s social network.  Everyone was devastated.  If complete drug resistance were to reach Africa, this story would be magnified in both space and time.  Even where the social networks were strong, the health clinics wouldn’t be able to adequately treat people with malaria.  The geographic reach would be huge and the number of deaths would likely increase dramatically.  This can’t happen.

Today our last, best tool against malaria is artemisinin and its derivatives.  But already throughout Southeast Asia researchers and health care workers are seeing parasites survive much longer in human hosts after treatment with artemisinin (Ashley et al., 2014).  How much longer will it work at all?  And should we really wait to find out?  It often feels as though everyone around here has been playing the same “malaria control” game for a very long time, even though the outcome is always the same: our drugs stop working and we have to start over again.  Sometimes this problem is exacerbated by a lack of information and/or of dissemination of scientific knowledge.  Many of my Thai colleagues who actually work in direct malaria care in this area learned only last year (2013) that resistance to artemisinins might be occurring, or even growing and spreading, in their region.  A major scientific paper on this (that I’m aware of) came out 5 years ago (Dondorp, Nosten, & Yi, 2009), with rumors of it almost 10 years ago (Noedl et al., 2008)!  Shouldn’t the people who live in the war zone know that a war is happening!?  What a failure of science – and of our strategy over the long term.  It is time for a change.

So what we’re working on is a tool that we’re calling “targeted chemo-elimination.”  Essentially this is a form of mass drug administration.  That is, everyone in a targeted community would take drugs (antimalarials) regardless of whether or not they felt sick (some recent thoughts on this here, here, and here).  It is more complicated than that, though: it isn’t a single strong dose of antimalarials; we’ll be using a cocktail of drugs, in the hope of avoiding driving further resistance, and since the administration will occur over time, in several stages, we’ll be able to vary the cocktail if necessary.

Logistically this is extremely difficult to pull off.  It is hard enough to get people in easy-to-reach populations in places like the U.S. to take medicine when they feel sick, let alone to take a vaccine that would prevent them from getting sick.  How do we go about convincing people in extremely remote populations, frequently in the middle of old or continuing conflict zones, to take medicine, over a long period of time, regardless of whether or not they are currently feeling sick?  It isn’t easy.

But it can be done and the way to do it is through community engagement – drawing on notions and principles well-known in anthropology and other social sciences.  It can happen when there is understanding, trust, and social cohesion.  Sometimes these things are lacking in our target communities between members of the community, and/or between us and members of the community, and it is therefore important to build them up.  Sometimes we need to plant a seed, water it, foster it, and help it to grow.

This is exhausting work, physically, psychologically, and emotionally.

A little while back I made a trip to one of the communities in our target area, to visit local people and share some of what our project is about.  I wound up eating lunch at a table full of “freedom fighters”, some dressed in fatigues and drinking whisky out of small coffee cups.  A captain who was sitting at the table gave me a history lesson, translated to English through one of my colleagues who speaks both my tongue and the local language.  I heard stories about being betrayed by colonialists who promised these people their own land but never followed through, and of people who were willing to die for that land, many of whom did in fact pay that price.

Among the things he said to me was that he admired two major things about Americans.  One is that their time is their money (time is extremely valuable).  And the other is that they realize they have a burden, to help others, that is bigger than a mountain (we were sitting at the base of a relatively large one).

I don’t know if this generalization is true of all Americans and I don’t care to go into that.  But I do know that time is of the essence and that I feel a burden.  There is a lot of work to do, and not so much time in which to do it.

*** As always, my opinions are my own.  This post and my opinions do not necessarily reflect those of Shoklo Malaria Research Unit, Mahidol Oxford Tropical Medicine Research Unit, or the Wellcome Trust. 

Ashley, E. A., Dhorda, M., Fairhurst, R. M., Amaratunga, C., Lim, P., Suon, S., … White, N. J. (2014). Spread of artemisinin resistance in Plasmodium falciparum malaria. New England Journal of Medicine, 371(5), 411–423. doi:10.1056/NEJMoa1314981

Dondorp, A., Nosten, F., & Yi, P. (2009). Artemisinin resistance in Plasmodium falciparum malaria. New England Journal of Medicine, 361(5), 455–467. Retrieved from http://www.nejm.org/doi/full/10.1056/nejmoa0808859

Noedl, H., Se, Y., Schaecher, K., Smith, B., Socheat, D., & Fukuda, M. (2008). Evidence of artemisinin-resistant malaria in western Cambodia. New England Journal of Medicine, 359(24), 2619–2620.

Payne, D. (1987). Spread of chloroquine resistance in Plasmodium falciparum. Parasitology Today, 3(8), 241–246. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15463062

Wongsrichanalai, C., Pickard, A. L., Wernsdorfer, W. H., & Meshnick, S. R. (2002). Epidemiology of drug-resistant malaria. Lancet Infectious Diseases, 2, 209–218.

Wongsrichanalai, C., Sirichaisinthop, J., Karwacki, J. J., Congpuong, K., Miller, R. S., Pang, L., & Thimasarn, K. (2001). Drug-resistant malaria on the Thai-Myanmar and Thai-Cambodian borders. Southeast Asian Journal of Tropical Medicine and Public Health, 32(1), 41–49.

Friday, August 29, 2014

Genomic cold fusion? Part II. Realities of mapping

Mapping to find genomic causes of a trait of interest, like a disease, is done when the basic physiology is not known—maybe we have zero ideas, or the physiology we think is involved doesn’t show obvious differences between cases and controls.  If you know the biology, you don’t have to use mapping methods, because you can explore the relevant genes directly.  Otherwise—and today that is often the case—we have to go fishing in the genome for places that vary in association, that is, with statistical regularity, with the trait.

The classical way to do this is called linkage analysis.  That term generally refers to tracing cases and marker variants in known families.  If parents transmit a causal allele (variant at some place in the genome) to their children, then we can find clusters of cases in those families, but no cases in other families (assuming one cause only).  We have Mendel’s classical rules for the transmission pattern and can attempt to fit that pattern to the data—for example, to exclude some non-genetic trait sharing.  After all, family members might share many things just because they have similar interests or habits.  Even disease can be due to shared environmental exposures. Mendelian principles allow us, with enough data, to discriminate.
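As a concrete sketch of what fitting Mendelian expectations to data can mean, consider autosomal dominant transmission from one affected heterozygous parent: each child has a 1/2 chance of being affected, and a simple binomial test asks whether pooled sibship counts are compatible with that ratio.  (This is our illustration, with invented counts, not a method from any particular study.)

```python
# A minimal sketch, ours: test observed sibship counts against the 1:1
# segregation ratio expected under autosomal dominant transmission.
from scipy.stats import binomtest

affected, total = 27, 44  # hypothetical pooled counts across sibships
result = binomtest(affected, total, p=0.5)
print(f"observed {affected}/{total} affected, p = {result.pvalue:.3f}")
# A large p-value means the data are compatible with Mendelian 1:1
# segregation; a small one suggests shared environment, another mode of
# transmission, or a non-genetic explanation.
```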

“Enough data” is the catch.  Linkage analysis works well if there is a strong genetic signal.  If there is only one cause, we can collect multiple families and analyze their transmission patterns jointly.  Or, in some circumstances, we can collect very large, multi-generational families (often called pedigrees) and try to track a marker allele with the trait across the generations.  This has worked very well for some very strong-effect variants conferring very high risk for very specific, even quite rare, disorders.  That is because linkage disequilibrium—the association between a marker allele and a causal variant due to their shared evolutionary history (as described in Part I)—ties the two together in these families.

But it is often very costly or impractical to collect actual large pedigrees that include many children each generation, and multiple generations.  Family members who have died cannot be studied, medical records may be untrustworthy, and family members may have moved, refuse to participate in a study, or be inaccessible for many reasons.  So a generation or so ago the idea arose that if we collect cases from a population we may also collect copies of nearby marker alleles in linkage disequilibrium—shared evolutionary history in the population—so that, as described in Part I, a marker allele will have been transmitted through many generations of an unknown but assumed pedigree along with the causal variant.  This is implicit linkage analysis, called genomewide association analysis (GWAS), about which we’ve commented many times in the past.  GWAS looks for association between marker and causal site in implicit but assumed pedigrees, and is another form of linkage analysis.
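For the statistically minded, here is a minimal sketch, with invented allele counts, of the single-marker test at the heart of a GWAS: a chi-square comparison of allele counts in cases versus controls.  A real GWAS repeats this at up to millions of markers and must correct for that multiplicity, and for population structure.

```python
# A minimal GWAS-style association test at one marker; counts invented.
from scipy.stats import chi2_contingency

#              G alleles  T alleles
table = [[420, 580],   # cases    (1000 chromosomes)
         [350, 650]]   # controls (1000 chromosomes)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
# The statistic measures association with G; whether G itself is causal,
# or merely in linkage disequilibrium with the causal site, it cannot say.
```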

When genetic causation is simple enough, this will work.  Indeed, it is far easier and less costly to collect many cases and controls than many deep pedigrees, so a carefully designed GWAS can identify causes that are reasonably strong.  But this may not work when a trait is ‘complex’ and has many different genetic and/or environmental contributing causes.

If causation is complex, families provide a more powerful kind of sample to use in searching for genetic factors.  The reason is simple: in general a single family will be transmitting fewer causal variants than a collection of separate families.  Related to this is the reason that isolate populations, like Finland or Iceland, can in principle be good places to search, because they represent very large, even if implicit, pedigrees.  Sometimes the pedigree can actually be documented in such populations.

If causation is complex, then linkage analysis in families should be better than big population samples for finding causal contributors, simply because a family will be segregating (transmitting) fewer different causal variants than a big population.  We might find a variant by linkage analysis in a big family, or in an isolate population, but of course if there are many different variants, a given family may point us only to one or two of them.  For this reason, many argue that family analysis is useless for complex traits—one commenter on a previous tweet we made from our course likened linkage analysis for complex traits to ‘cold fusion’.  In fact, this characterization is a mistake, as we explain below.

Association analysis, the main alternative to linkage analysis, is just a combining of many different implicit families, for the population-history reasons we’ve described here and in Part I.  The more families you combine, whether explicit or implicit, the more variation, including statistical ‘noise’, you incorporate.  The rather paltry findings of many GWAS are a testament to this fact, the method having explained only a small fraction of the variation in most traits to which it has been applied.  Worse, the greater the sample of this type, like cases vs controls, the more environmental variation you may be grouping together, further watering down the already weak signals of many, probably most, genetic causal factors, as the sketch below illustrates.
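Here is a toy model of that watering-down, with purely illustrative numbers of our own: a variant that fully explains disease in one subset of families becomes nearly invisible once cases due to other causes are pooled in.

```python
# A hedged toy model of signal dilution in pooled case-control samples.
def carrier_freq_in_cases(fraction_due_to_variant, background_freq=0.10):
    """Cases caused by the variant all carry it; other cases carry it
    only at the population background rate.  All numbers illustrative."""
    f = fraction_due_to_variant
    return f * 1.0 + (1 - f) * background_freq

for fraction in (1.0, 0.5, 0.1, 0.01):
    print(f"variant explains {fraction:4.0%} of cases -> "
          f"carrier freq in cases: {carrier_freq_in_cases(fraction):.3f} "
          f"(controls: 0.100)")
# As the variant's share of causation falls, case and control frequencies
# converge, and ever larger samples are needed to tell them apart.
```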

In fact, if you are forced to go fishing for genetic causes, you may well be fishing in dreamland, simply in denial of the implications of causal complexity.  All mapping is a form of linkage analysis; rather than dismissing one form or another, one should tailor one’s approach to the realities of the data and the trait.  Some complex-trait genes have been found by linkage analysis (e.g., the BRCA breast-cancer associated genes), though of course here we might quibble about the definition of 'complexity'.

Sneering at linkage analysis because it is difficult to get big families, or because even single deep families may themselves be transmitting multiple causes (as is often found in isolate studies, in fact), is often simply a circle-the-wagons defense of Big Data studies, which capture huge amounts of funding with relatively little payoff to date.

A biological approach?
Many linkage and association analyses are done because we don’t understand the basic biology of a trait well enough to go straight to ‘candidate’ genes to detect, prevent, or develop treatment for it.  Even though this approach has been the rule for nearly 20 years now, with little payoff, the defense is often still that more, more, and even more data will solve the problem.  But if causation is too complex, this is a costly, self-interested, weak defense.

The hope is that if we have whole genome sequences on huge numbers of people, or even everyone in a population, or in many populations so we can pool data, we will find the pot of gold (or is it cold fusion?) at the end of the rainbow.

One argument for this is to search population-wide, genome-sequenced biomedical data bases for variants that may be transmitted from parents to offspring, but that are so rare that they cannot generate a useful signal in huge, pooled GWAS studies.  This is usually still a form of linkage analysis: a marker in a given causal gene is transmitted with the trait in occasional families, and the same gene is identified, even if via different families.  That is, if variation in the same gene is found to be involved in different individuals, but with different specific alleles, then one can take that gene seriously as a causal candidate.

This sometimes works, but usually only when the gene’s biology is known enough to have a reason to suspect it.  Otherwise, the problem is that so much is shared between close family members (whether implicitly or explicitly in known pedigrees) that if you don’t know the biology there will be too much to search through, too much co-transmitted variation.  Causal variation need not be in regular ‘genes’, but can be, and for complex traits seems typically to be, in regulatory or other regions of the genome, whose functional sites may not be known.  Also, we all harbor variation in genes that is not harmful, and we all carry ‘dead’ genes without problems, as many studies have now shown.

If one knows enough biology to suspect a set of genes, and finds variants of known effect (such as truncating a gene’s coding region so a normal protein isn’t made) in different affected individuals, then one has strong evidence s/he has found a target gene.  There are many examples of this for single-gene traits.  But for complex traits, even most genes that have been identified have only weak effects—the same variant is most of the time also found in healthy, unaffected individuals.  In this case, which seems often to be the biological truth, there is no big-cause gene to be found, or a gene has a big effect only in some unusual genotypic backgrounds.

Even knowing the biology doesn't tell us whether a given gene's protein code is involved rather than its regulation or other related factors (like making the chromosomal region available in the right cells, downregulating its messenger RNA, and other genome functions).  And even across multiple instances of a gene region, there may be many nucleotide variants observed among cases and controls.  The hunt is usually not easy even when the biology is known--and this is, of course, especially true if the trait isn't well defined, as is often the case, or if it is complex or has many different contributors.

Big Data, like any other method, works when it works.  The question is when and whether it is worth its cost, regardless of how advantageous it is for investigators who like playing with (or having and managing) huge resources.  Whether it is any less ‘cold fusion’ than classical linkage analysis in big families is debatable.

Again, most searches for causal variation in the genome rest on statistical linkage between marker sites and causal sites due to shared evolutionary history.  Good study design is always important.  Dismissal of one method over another is too often little more than advocacy of a scientist’s personal intellectual or vested interests.

The problem is that complex traits are properly named: they are complex.  Better ideas are needed than those being proposed these days.  We know that Big Data is ‘in’ and the money will pour in that direction.  From such data bases all sorts of samples, family or otherwise, can be drawn.  Simulation of strategies (such as with programs like our ForSim, which we discussed in our recent Logical Reasoning course in Finland) can be done to try to optimize studies.

In the end, however, fishing in a pond of minnows, no matter how it’s done, will only find minnows. But these days they are very expensive minnows.

Thursday, August 28, 2014

Genomic cold fusion? Part I. Rational and irrational aspects of mapping

I’m sitting here on a smooth, quiet train from Zurich to Innsbruck, a few days after the mini-course that we taught in Helsinki. In this post I want to make a few reflections on things said by people reacting to Facebook or Twitter messages about the course, comments that were too short to do justice to what we actually said.

In particular, the issues have to do with the nature of genome mapping strategies and what they are or mean.  There seems to be a good bit of confusion in this area, perhaps because of a lack of proper explanation of what these methods do, and why and how they work.

First, nobody should be doing mapping, looking for genes causally responsible for traits, unless they have some legitimate reason for believing that a trait is substantially affected by genes—that is, that variation in the trait, or in risk of a trait like a disease, is causally associated with variation in a particular spot in the genome.  Such a reason, at best, would be that the trait seems to segregate in families as if caused by a single Mendelian factor.  If the evidence is weaker than that—as it so often is—then mapping becomes all the more problematic.

If we don’t know the part of the genome that affects the trait, then we use many measured variable sites, called markers, that span the genome, with the idea that wherever the causal site is, it will be near one of our markers.  Essentially, that is, we are searching for statistically significant associations between marker and trait, based on some essentially subjective criterion, like a p-value cutoff, in samples that we believe are appropriate for detecting causal effects.

What is perhaps not widely appreciated is the nearly essential way that such searches rely on evolutionary assumptions.  We say ‘nearly’ because if one happens by huge luck to genotype the causal site itself, the test for association may be a bit more direct, as we’ll try to explain.

Mapping is based on evolutionary history
Evolution, or population history, generates the variation that causes the trait effect, and the variation we use as markers.  Mutational events generating these variants occur when they occur, and we choose markers based on the idea that they vary in our chosen type of sample, and that the instances of a given marker allele (variant) are descendant copies of some original mutation.  These instances of the same allele are said to be identical by descent (IBD) from that common ancestral copy.  Sets of instances of the marker also mark nearby chromosomal regions that have been passed down the same chain of descent.  That shared region is called a haplotype, and it gradually shortens over the post-mutation generations by a process called recombination.

If at some later time in the history of the haplotype ‘tagged’ by the marker variant another mutation occurs in a gene and alters that gene’s effects to generate the trait we are interested in, then the marker variant will be present in subsequent descendant copies of that twice-hit haplotype, and the causal signal will be associated with the presence of the marker variant.  This is called linkage disequilibrium (LD), and is the reason that mapping works.  That is, mapping works because of shared evolutionary (population) history of the marker and causal variants.
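For readers who like to see the arithmetic, here is a minimal sketch of how LD is quantified, using invented haplotype counts rather than data: D measures how far the observed haplotype frequency departs from what independence of the two sites would predict, and r² is its standardized square.

```python
# A minimal LD calculation; haplotype counts are invented for illustration.
# Marker alleles are G/T; the causal site has alleles D (disease) and d.
counts = {("G", "D"): 90, ("G", "d"): 310, ("T", "D"): 10, ("T", "d"): 590}
n = sum(counts.values())

p_G  = (counts[("G", "D")] + counts[("G", "d")]) / n  # marker allele freq
p_D  = (counts[("G", "D")] + counts[("T", "D")]) / n  # causal allele freq
p_GD = counts[("G", "D")] / n                         # haplotype freq

D_ld = p_GD - p_G * p_D  # disequilibrium coefficient
r2 = D_ld**2 / (p_G * (1 - p_G) * p_D * (1 - p_D))

print(f"D = {D_ld:.4f}, r^2 = {r2:.3f}")
# r^2 near 1: the marker is an excellent proxy for the causal site.
# r^2 near 0: shared history has been eroded, and mapping will fail.
```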

An hypothetical, simple example
[I’m continuing this post a couple of days from when I started it on the train to Innsbruck, and now finishing it in a nice hotel in Old Town, overlooking the Inn river.  Beautiful!]

Let’s say that we have a marker at which some people have a G nucleotide and others a T.  And let’s say the disease causal site, D, is near the G/T site, and that the D mutation, wherever it is on the chromosome, arose on a copy of the chromosome that has the G at the marker site.  Then, what we hope is that the disease will be associated with the G—that enough more people with the disease will have the G than people without the disease.  This is the kind of association between trait-cause and marker that mapping is looking for.  But what can make it happen?

If we’re lucky everyone with the D allele at the causal site will have the trait (the ‘D’ mutation is fully penetrant, as we’d say).  And if there has been no recombination, and no other way to get the trait, then nobody with a T at the marker will also have the D variant—none of the T-bearers will have the disease.  Cases will have the G, controls the T.

This sort of perfect association depends on when the D-mutation, wherever it is on the chromosome, occurred relative to the mutation that produced the T at the marker.  We usually pick marker sites because we know that the variation (here, G vs T) is common in the population, and that means that the mutation is rather old.  Enough generations have passed for there to be a substantial fraction of T-bearing, and G-bearing people in the population.

If the ‘D’ mutation occurred right after this G/T marker’s mutation, then all copies of the G variant at the marker will also carry the trait-causing variant.  But if the trait-mutation occurred much later, then only a few of the G-bearing chromosomes will carry it, and the association, even if true, will be weak.  And if the D site is far from the G/T marker site, there’s a trap: even if the D-causing mutation occurred long enough ago for most G-bearers also to have the trait, there will also have been enough time for recombination to switch the D site onto a T-bearing marker chromosome.  The G-D association will no longer be perfect.

Likewise, if there are many different causes of the trait, then some cases will not be due to the D variant (tagged by the G allele at the nearby marker), even if the latter really is also a cause.  We’ll have cases with the T marker variant, and this time not because of recombination.  The more causes of the trait, the weaker its association with any specific marker, like the G/T one.
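A small calculation also shows how recombination erodes such an association over time.  Under random mating, disequilibrium decays as D_t = D_0(1 - c)^t, where c is the recombination fraction between marker and causal site, a standard population-genetic result; the sketch below, with illustrative numbers of our own, shows why a distant marker loses the signal quickly.

```python
# A minimal sketch of LD decay: D_t = D_0 * (1 - c)**t over t generations.
def ld_after(generations, c, d0=0.05):
    """Expected disequilibrium after `generations` of random mating,
    given recombination fraction c; d0 is an illustrative starting value."""
    return d0 * (1 - c) ** generations

for c in (0.001, 0.01, 0.05):  # tight to loose linkage
    remaining = ld_after(100, c) / 0.05
    print(f"c = {c:5.3f}: {remaining:5.1%} of initial LD left "
          f"after 100 generations")
# A close marker (small c) keeps the G-D association for many generations;
# a distant one loses it quickly -- which is why an old causal mutation far
# from the marker yields only a weak mapping signal, or none.
```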

Science or cold fusion?
So mapping is a multiple-edged sword.  Now, there are several ways to try to find trait-associated parts of the genome.  One is called linkage mapping, the other association mapping (genomewide association, or GWAS).  And one might also think that causal sites can be found not by relying on linkage disequilibrium, but simply by looking for causal variants directly.

These various strategies have their strong and weak points, and there is equally strong disagreement as to which to apply when.  That’s why someone can, sometimes sneeringly, claim that this or that approach is ‘cold fusion’—that it’s imaginary, and won’t or can’t work.  But mapping for complex traits is not doing very well—as we’ve posted many times, and many others have repeatedly observed, we are usually explaining only a rather small if not trivial fraction of causation by mapping—so the issues are serious, regardless of the vested interests of those contending with them.

In our next post we’ll discuss some of these issues about methods.