Thursday, April 23, 2015

Should we eat eggs?

Ken and I are in Seattle this week, Ken meeting with people at the Institute for Systems Biology, and having many interesting conversations about different approaches to complex problems.  We had dinner with Ken's host Sui Huang and others from his lab the other night.  It was good to talk with people who think about many of the issues we often blog about, such as why it's so hard to identify the cause of many complex diseases.

Indeed, we don't know the answers to what seem to be simple questions, such as whether eggs, or sugar, or fats, or carbs are good or bad for us.  Why don't we know, people asked at dinner.  I tried to quickly sum up my thinking on this, but did so very inadequately. Wish I could blame it on the wine.  But I thought I'd expand here.

The gold standard for determining cause and effect is, of course, the randomized controlled trial (RCT).  Randomly assign half your study population to a treatment and the other half to a placebo, and see whether the treatment has a statistically identifiable effect.  The groups should differ systematically only on whether they receive the treatment or not, so in theory the only effect you will see is that of the treatment. The best studies are double blind, that is, neither the participants nor those administering the treatment know which individuals are getting which.
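
To make the logic of randomization concrete, here is a minimal sketch in Python, not a real trial analysis.  Everything in it is an assumption made for illustration: the function name `simulate_rct`, the sample size, and a made-up treatment that adds exactly 1.0 to each treated person's outcome.  Because assignment is random, the unmeasured baseline differences between people average out across the two groups, and the difference in group means should land near the true effect.

```python
import random
import statistics

def simulate_rct(n=2000, true_effect=1.0, seed=0):
    """Return the estimated treatment effect from one simulated trial."""
    rng = random.Random(seed)
    # Hypothetical baseline outcome for each person, varying for
    # reasons the researcher never measures.
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Random assignment: shuffle the participants and treat the first half.
    ids = list(range(n))
    rng.shuffle(ids)
    treated = set(ids[: n // 2])
    treated_outcomes = [baseline[i] + true_effect for i in ids if i in treated]
    control_outcomes = [baseline[i] for i in ids if i not in treated]
    # The difference in group means is the estimate of the effect.
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)

estimate = simulate_rct()
print(f"true effect: 1.0, estimated effect: {estimate:.2f}")
```

Note what the sketch takes for granted: the effect is immediate, identical for everyone, and independent of anything else the person does.  Those are exactly the assumptions that get shaky with food.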

This can work well when you're testing something like a quick-acting drug, something with strong, immediate effects, but when what you want to know is the effect of, say, eating eggs, or chocolate, it's more complicated.  Yes, you could give one group a daily dose of chocolate for 6 weeks, say, or even 6 months, but placebo chocolate -- or eggs, or acupuncture, or marijuana -- has to be so disguised, or reduced to a pill containing what you decide is the single component of interest, that it no longer mimics the context in which it's actually eaten, and RCTs may well become much less informative.

And there are other issues.  A daily dose for how long?  And, how do you decide?  And, does the effect depend on what else you eat with your chocolate, whether, say, you take it with coffee or wine? Or, could it be that the age you start eating chocolate determines its effect?  That is, assumptions are necessarily built into your choice of study duration, age of subjects, how more or less reductionist your study is, and the contextual effects of the foods we eat, and so on.

[Image: Red licorice; Wikipedia]

Ok, maybe you decide that you don't know how long it would take for chocolate consumption to have an effect, and so shouldn't build that into your study but instead you want to figure out how long it takes to raise HDL cholesterol/protect against cavities/prevent cancer/whatever you think its effect is.  An RCT isn't going to be useful for answering such questions because it takes too long, maybe decades, for the effect to be observed in a practicable sense; so you need a different study design.  Let's say a retrospective study, in which you ask people about their past diets.

Unfortunately, dietary recall is notoriously unreliable.  How often did you eat broccoli last year?  Or even last month?  Or in concoctions with many ingredients, like, say, soup, where you may not know what the ingredients were?  And, how much?  Maybe you tell me your serving was three spears -- but how much of the stem did you cut off, and does that matter?  And do you add butter?  Salt?  Do you stir-fry it?

And, if you're being asked about something to which there might be a certain amount of guilt attached (damned Puritans!), how honest are you going to be when you're asked how much chocolate you eat, or how much alcohol you drink (physicians routinely round up to another beer, drink or glass of wine when their patients answer that question)?

And, of course, the further back you go, the less reliable the answers will be.  But, what if it's the chocolate we ate before we were 10 that predisposes to an effect?  Sure, that's unlikely, but it does make the point that we're still building assumptions into our study design.  Necessarily.

And, no one eats chocolate, and only chocolate.  Or hardly anyone.  Or, only whatever component of a given food that's thought to have whatever effect we're after -- such as resveratrol in red wine, which may, or may not, protect against heart disease.  Does it matter what else we eat with the offending food?  Which of course could be anything.  Or how the nutrient is packaged; whether in grapes or wine?  How do we account for that?  It's the same problem geneticists have with respect to genomic context; a gene variant may be deleterious in one context, and not in another.

Further, it's possible that people who eat chocolate are healthier, or less healthy, in general than people who don't.  That is, there may be other factors that interfere with, or even explain, the direct effect of eating chocolate that you're trying to identify.  Or, people who eat a lot of eggs also eat a lot of, I don't know, red licorice and it's the dye in the licorice that's really causing whatever effect you're measuring.  You might take, say, (control for) weight or exercise or smoking into account, but it's unlikely you'll think to ask about red licorice consumption.
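
The red licorice problem can be sketched with a small simulation in Python.  All the numbers here are invented for illustration (half the population eats licorice, licorice eaters are four times as likely to eat eggs, and licorice raises the outcome by 1.0), and the helper names `simulate_diets` and `egg_difference` are hypothetical.  In this made-up world eggs do nothing at all, yet a naive comparison of egg eaters with everyone else shows a clear "egg effect" that disappears once you compare within licorice eaters and within non-eaters.

```python
import random
import statistics

def simulate_diets(n=20000, seed=1):
    """Simulate people as (eats_eggs, eats_licorice, outcome) tuples."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        licorice = rng.random() < 0.5
        # Assumption: licorice eaters are much more likely to eat eggs.
        eggs = rng.random() < (0.8 if licorice else 0.2)
        # Assumption: the outcome depends on licorice only, never on eggs.
        outcome = (1.0 if licorice else 0.0) + rng.gauss(0.0, 1.0)
        people.append((eggs, licorice, outcome))
    return people

def egg_difference(people):
    """Mean outcome of egg eaters minus mean outcome of everyone else."""
    eaters = [o for eggs, _, o in people if eggs]
    others = [o for eggs, _, o in people if not eggs]
    return statistics.mean(eaters) - statistics.mean(others)

people = simulate_diets()
naive = egg_difference(people)
# "Controlling for" licorice: compare egg eaters with non-eaters
# separately among licorice eaters and among non-eaters, then average.
within_strata = [
    egg_difference([p for p in people if p[1] == licorice])
    for licorice in (True, False)
]
adjusted = statistics.mean(within_strata)
print(f"naive egg effect: {naive:.2f}, adjusted for licorice: {adjusted:.2f}")
```

The adjustment only works because the simulation recorded licorice consumption.  In a real study nobody asked about it, so the naive number is the one that would get published.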

And so forth.  There are many similarities between these kinds of epidemiological questions and genetics, including assumptions about the kinds of factors we'd like to be identifying.  Ideally, we'd all love to find single genes or single nutrients, with large effects, large enough to drown out the inevitable noise from the rest of the diet, genome, exercise routine, etc. that is likely to influence the action of the single factor we're looking for.  But these are rare, and aren't going to explain the bulk of the complex diseases most of us will eventually get.

In addition, epidemiologists and geneticists share similar confounders -- everyone's dietary history (or genome, if we're doing genetics) is unique, as is their genome (or, again for geneticists, history of environmental exposures), and thus everyone gets to their disease in their own way, with unique interactions and pathways.  But we're collecting and making sense of data on a population of people.  Indeed, we can't hope to understand the effect, or association, at all without looking at large groups of people, but then we're stuck trying to figure out how to apply what we think we learned from the group to individuals who are probably as unalike as they are alike.

In reality, given the number of possible interacting factors, and the general weakness of their effects, we would quickly get to a point where we've got to take so much into account that our analysis becomes untenable.  There may be so many combinations to test that you can't replicate things enough to get a signal, or can't get nearly adequate samples.  The larger the sample, the more heterogeneity you may have in your data (the signal-to-noise ratio may not increase with sample size).  You have to correct for doing huge numbers of tests in order to evaluate the statistical significance of your findings, and that may make strong findings impracticable unless some individual causes in the data are common enough to be seen.  These and many other statistical issues are involved in multifactorial causal studies.  Even Big Data can't solve these problems, because everyone is still unique, effects are still small, and we still can't collect all the relevant variables, many of which we don't even know to look for.
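
The multiple-testing point can be illustrated with another rough sketch in Python.  The setup is hypothetical: 500 foods, none of which has any effect, each tested by comparing two groups of 100 people with a simple two-sided z-test (known variance of 1).  The Bonferroni correction used here is one common choice among several, picked for simplicity; the helper name `null_p_value` is made up.

```python
import math
import random

def null_p_value(rng, n=100):
    """Two-sided z-test p-value for two groups with no true difference."""
    mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    # Standard error of a difference of two means, each with variance 1/n.
    z = (mean_a - mean_b) / math.sqrt(2.0 / n)
    # Two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = random.Random(2)
n_tests = 500   # hypothetical number of foods tested
alpha = 0.05

p_values = [null_p_value(rng) for _ in range(n_tests)]
# Uncorrected: count every test that clears the usual 0.05 threshold.
naive_hits = sum(p < alpha for p in p_values)
# Bonferroni: divide the threshold by the number of tests performed.
bonferroni_hits = sum(p < alpha / n_tests for p in p_values)
print(f"'significant' foods, uncorrected: {naive_hits}")
print(f"'significant' foods, Bonferroni:  {bonferroni_hits}")
```

With no real effects anywhere, roughly 5% of the uncorrected tests should still come up "significant" by chance.  The corrected threshold removes nearly all of those, but it is so strict (0.05 / 500 = 0.0001) that a genuinely small effect would have little hope of clearing it either.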

Researchers design the best studies they can, and peer review presumably assures that most of the studies that are funded are state-of-the-art.  But state-of-the-art epidemiology still can't assuredly overcome the many potential biases, confounders, unknown risk factors, and so forth, that it needs to in order to reliably identify the causes of complex chronic diseases.  These aren't faults unless investigators or reporters of results don't fully acknowledge them and accept the limitations they place on the findings.  They are just realities.  When assumptions are built into the study that force a particular kind of answer, that's a problem.

What causes disease X?  It seems to be a simple, well-posed question.  When the answer is a virus, or a bacterium, or a toxin, or smoking, or a gene with large effect, it is a well-posed question, with a single answer or a small number of identifiable answers.  But, when the process takes years or decades, each factor has a small, hardly statistically identifiable effect, environments change, and everyone has a unique history of exposures anyway, the question is no longer well-posed, there's unlikely to be a single answer, and epidemiologists have a problem.  'Cause' may not even be an appropriate concept in this context.

On Exactitude in Science
Jorge Luis Borges, Collected Fictions, translated by Andrew Hurley.
...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
—Suarez Miranda, Viajes de varones prudentes, Libro IV, Cap. XLV, Lerida, 1658

Borges' short story is a parable, and not completely applicable to the Arts of Epidemiology and Genetics, not least because they have not yet attained Perfection with respect to complex disease.  Still, I think it nicely frames the problem: in our Empire, the Arts of Epidemiology and Genetics are trying to greatly reduce the size of the causal, predictive map, but the problem is that the 1:1 map of the Province is what's useful for explaining each individual's disease.  Yet, such exactitude does exactly nothing to allow us to generalize about causation, or predict disease.

So, should we eat eggs?  If we like them, eat them.


Dr. Mahmud al-Lunduni said...

Dear EcoDevoEvo writers, there is a new paper published by Lars Penke and Ruben Arslan that I think you will find interesting. You have written before on the topic of GWAS and GCTA and in general the genetics of IQ. I think you might agree with many of the reasons they give about the results of these studies so far and how the heritability of the IQ studies (twin studies) can be genetically explained. Link is here:

I would love to hear your opinion on this and your thoughts on whether it's a good review of recent population genetic advances.

Have a good one,

Anonymous said...

I have a question for the EcoDevoEvo bloggers, and it's about a recent online publication that attempts to fluff sociobiology and deride its opponents within evolutionary biology and biological sciences.

The reason I ask you is that your writers have more experience with these figures than I and can tell me if what is being said here is true or not. Many people who've tried to use sociobiology and such to promote racist and sexist views have made the same arguments about people like Lewontin and Gould, and as a consequence I am taking it very seriously as per whether I should take what is said above at face value or not...

Anne Buchanan said...

Dr M al-L,

Thanks for alerting us to this. To me, the following seems a reasonable way to think about this, foregoing all the arguments about what IQ is, whether it's measurable, whether it's changeable, and so on.

Let's say we're trying to figure out the cause of some disease. Actually, let's say diabetes, which is common in Native Americans, e.g., and Mexican Americans, who have a history of admixture with Native Americans, and it looks as though there's a genetic component. We wouldn't have thought so 60 years ago, because diabetes was uncommon in these groups, and it would never have occurred to people that a major pandemic, with perhaps genetic underpinnings, was about to strike. There was no way to know this. (And, even if they all had had their genome on a chip back then, it couldn't have been predicted.)

Anyway, strike it did. Diabetes rates, as everyone knows, began to rise sharply after World War II, and lifetime risk for someone with high Amerindian admixture is currently very high, if the person lives in an environment that provokes the disease.

So, there was essentially no disease before whatever environmental change happened that 'caused' the epidemic (dietary changes, activity level changes, whatever it is), but it does run in families, and in populations, so there seems to be some genetic predisposition. Is this a genetic disease, or an environmental one? The same question can be asked of many diseases.

And, I think, of intelligence. We know that the brain shows great plasticity as it grows, responding to and to some extent being molded by environmental input. But that response, as everything, has a genetic basis.

People usually study either the genetic or the environmental epidemiology of diabetes. People who study the genetics do so for some reason -- that's what they know how to do, that's what interests them, etc. Likewise for those who choose to study environmental causation. But that choice doesn't make the other aspect less important, or less essential. Diabetes is still 100% genetic and 100% environmental.

People who decide to study intelligence are making the same kind of choice, but in addition, they make a political choice. Why intelligence? And then, why the genetics of intelligence? Or associated environmental factors? This is political, not scientific. Neither side is going to come out on top of this one and explain intelligence completely with genes or environment. Intelligence will always be environmental and genetic, just as diabetes will.

But, as with diabetes, because there are environmental risk factors, that's where intervention can be done. But again, that's a sociopolitical choice, not a scientific one.

As for the evolutionary history, without going into why we would think modern IQ tests have anything to do with what traits might have been useful millennia ago, suffice it to say that adaptability is more likely to be what was selected, if anything was: the very plasticity of the brain that is being increasingly documented and understood. That's not just true of how the brain works; adaptability is so evolutionarily useful and so ubiquitous that it is likely to have been among the earliest traits that arose, and is why species can adapt to changing environments, which is a key to survival.

Ken may still weigh in on the GWAS/GCTA aspects of this, and whatever else, but that's my 2 cents.

Anne Buchanan said...


We'll get back to your question. Been traveling, came home sick!

Anne Buchanan said...

Oh, forgot to add that Penke and Arslan do seem to have made their choice. Intelligence is going to be genetic. To me, that's ideology, not science.

Ken Weiss said...

The trait itself is culturally defined and may not have evolved for what is being measured per se. That adds some aspect of inferential complexity.

Clearly we know genes are involved in IQ, regardless of what the trait actually is. Tens or more genes are clearly able to damage intelligence severely when badly mutated. So the question is about 'normal' range intelligence.

But if a large number of genes are involved, each person has a unique genotype, and prediction from that genotype is essentially useless, given the environmental and other factors. This seems to be clearly the case.

Then why are so many so obsessed with this? Is it a form of lascivious peering into people's inherent worth? When we know life experience, nutrition, etc. etc. have great, if not preponderant, effects on societally relevant performance, what is to gain by estimating the fraction of variation due to large numbers of small-effect genes? Why not just measure achieved abilities and have society allocate resources accordingly?

One reason to resist this obsession is the lesson of history. Racism and societal discrimination lie just beneath the surface, either in the minds of investigators themselves, or in the minds of those seeing these studies. There are, in our view, far more important problems to spend resources on. For example, doing something directly to help those with clear, truly genetic, intelligence problems.

Jim G said...

Back to the original post on "Should we eat eggs?"
Thank you, Anne, for a coherent stab at explaining in relatively simple terms the fundamental problem with establishing cause, even plausible association, whether it be genetics or environmental factors.
For me your key statement is "then we're stuck trying to figure out how to apply what we think we learned from the group to individuals who are probably as unalike as they are alike." Based on this statement and other musings, do you think that causal (epi') studies are best limited to studying potential risk factors that can be addressed at the population level (eg, improving air quality), rather than attempt to get from pop. level to individual level "prediction"? And if this is (largely) the case, does this (largely) rule out the value of trying to identify causal genetic variants because, of course, they can't be changed?

Ken Weiss said...

The problem is that it's so hard to know what works or not. The current issue of Nature has a commentary on drug and other kinds of trials, pointing out that many expensive, approved drugs are known only to work on a small fraction of patients... but the reason isn't known. The commentary is about the sorts of studies that might, miraculously, give answers. Epidemiology, even with huge samples, can't figure risks out.

Until somebody tries to rethink the question and/or the methods, we'll keep on with this expensive, ineffective wheel-spinning, I think.

Anne Buchanan said...


In part yes, what you say. But it's not always true. Epidemiology and genetics both do fine when the risk factor has a strong effect. So, it's not that it's never possible to predict at the individual level -- it's that when it takes a bunch of genes with minor effects, and then you throw in environment x gene interaction, and everyone's genome and environmental exposures are unique, it's a lot harder to retrodict (credibly figure out cause retrospectively), never mind predict. Especially given that future environments can never be predicted.