Indeed, we don't know the answers to what seem to be simple questions, such as whether eggs, or sugar, or fats, or carbs are good or bad for us. Why don't we know, people asked at dinner. I tried to quickly sum up my thinking on this, but did so very inadequately. Wish I could blame it on the wine. But I thought I'd expand here.
The gold standard for determining cause and effect is, of course, the randomized controlled trial (RCT). Randomly assign half your study population a treatment and the other half a placebo, and see whether the treatment has a statistically identifiable effect. The groups should differ systematically only in whether they receive the treatment, so in theory the only effect you will see is that of the treatment. The best studies are double-blind; that is, even those administering the treatment don't know which individuals are getting which.
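The logic of randomization can be seen in a toy simulation. Everything here is hypothetical: an unmeasured "lifestyle" factor influences some continuous outcome, and a fixed treatment effect is layered on top. Because random assignment spreads the lifestyle factor evenly across the two arms, the simple difference in group means recovers the treatment effect, even though the confounder is never measured.

```python
import random
import statistics

random.seed(0)

# Toy RCT sketch (hypothetical numbers throughout): each subject carries an
# unmeasured "lifestyle" factor that influences the outcome. Randomization
# spreads that factor evenly across arms, so the difference in group means
# estimates the treatment effect alone.
TRUE_EFFECT = 2.0
subjects = [random.gauss(0, 3) for _ in range(1000)]  # unmeasured lifestyle factor

random.shuffle(subjects)                              # random assignment
treated, placebo = subjects[:500], subjects[500:]

treated_outcomes = [x + TRUE_EFFECT + random.gauss(0, 2) for x in treated]
placebo_outcomes = [x + random.gauss(0, 2) for x in placebo]

diff = statistics.mean(treated_outcomes) - statistics.mean(placebo_outcomes)
print(f"estimated treatment effect: {diff:.2f}  (true: {TRUE_EFFECT})")
```

Note that nothing about the lifestyle factor had to be known, measured, or modeled; that is exactly what randomization buys you, and exactly what observational studies of diet lack.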
This can work well when you're testing something like a quick-acting drug, something with strong, immediate effects, but when what you want to know is the effect of, say, eating eggs, or chocolate, it's more complicated. Yes, you could give one group a daily dose of chocolate for 6 weeks, say, or even 6 months, but placebo chocolate -- or eggs, or acupuncture, or marijuana -- has to be so disguised, or reduced to a pill containing what you decide is the single component of interest, that it no longer mimics the context in which it's actually eaten, and RCTs may well become much less informative.
And there are other issues. A daily dose for how long? How do you decide? Does the effect depend on what else you eat with your chocolate -- whether, say, you take it with coffee or wine? Or could it be that the age at which you start eating chocolate determines its effect? That is, assumptions are necessarily built into your choice of study duration, the age of your subjects, how reductionist your study is, how you handle the contextual effects of the foods we eat, and so on.
[Image: Red licorice; Wikipedia]
So researchers often turn to observational studies instead, which typically rely on people reporting what they've eaten. Unfortunately, dietary recall is notoriously unreliable. How often did you eat broccoli last year? Or even last month? Or in concoctions with many ingredients, like, say, soup, where you may not even know what the ingredients were? And how much? Maybe you tell me your serving was three spears -- but how much of the stem did you cut off, and does that matter? Do you add butter? Salt? Do you stir-fry it?
And, if you're being asked about something to which a certain amount of guilt might be attached (damned Puritans!), how honest are you going to be when asked how much chocolate you eat, or how much alcohol you drink? (Physicians routinely add another beer, drink, or glass of wine to their patients' answers to that question.)
And, of course, the further back you go, the less reliable the answers will be. But, what if it's the chocolate we ate before we were 10 that predisposes to an effect? Sure, that's unlikely, but it does make the point that we're still building assumptions into our study design. Necessarily.
And, no one eats chocolate, and only chocolate. Or hardly anyone. Or, only whatever component of a given food that's thought to have whatever effect we're after -- such as resveratrol in red wine, which may, or may not, protect against heart disease. Does it matter what else we eat with the offending food? Which of course could be anything. Or how the nutrient is packaged; whether in grapes or wine? How do we account for that? It's the same problem geneticists have with respect to genomic context; a gene variant may be deleterious in one context, and not in another.
Further, it's possible that people who eat chocolate are healthier, or less healthy, in general than people who don't. That is, there may be other factors that interfere with, or even explain, the direct effect of eating chocolate that you're trying to identify. Or people who eat a lot of eggs also eat a lot of, I don't know, red licorice, and it's the dye in the licorice that's really causing whatever effect you're measuring. You might take into account (that is, control for) weight or exercise or smoking, but it's unlikely you'll think to ask about red licorice consumption.
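The licorice problem is classic confounding, and it's easy to make concrete with a simulation. All the numbers below are invented: suppose egg eaters also tend to eat red licorice, and only the licorice affects the outcome. The naive comparison makes eggs look harmful; stratifying on licorice (which works only if you thought to measure it) makes the spurious "effect" vanish.

```python
import random
import statistics

random.seed(1)

# Confounding sketch (all numbers hypothetical): egg eaters also tend to eat
# red licorice, and it's the licorice dye -- not the eggs -- that affects the
# outcome. Eggs then look "causal" unless licorice is measured and adjusted for.
n = 20000
eggs = [random.random() < 0.5 for _ in range(n)]
licorice = [random.random() < (0.7 if e else 0.2) for e in eggs]   # correlated with eggs
outcome = [random.gauss(0, 1) + (1.0 if l else 0.0) for l in licorice]  # only licorice acts

def diff_by_eggs(rows):
    """Mean outcome difference, egg eaters minus non-eaters, over given rows."""
    with_eggs = [o for e, _, o in rows if e]
    without = [o for e, _, o in rows if not e]
    return statistics.mean(with_eggs) - statistics.mean(without)

rows = list(zip(eggs, licorice, outcome))
naive = diff_by_eggs(rows)                                   # spurious association
adjusted = {s: diff_by_eggs([r for r in rows if r[1] == s])  # near zero within strata
            for s in (True, False)}
print(f"naive egg 'effect': {naive:.2f}")
print(f"within licorice strata: {adjusted[True]:.2f}, {adjusted[False]:.2f}")
```

The catch, of course, is that stratification only rescues you for confounders you knew to record; the ones you never imagined stay baked into the naive estimate.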
And so forth. There are many similarities between these kinds of epidemiological questions and genetics, including assumptions about the kinds of factors we'd like to be identifying. Ideally, we'd all love to find single genes or single nutrients, with large effects, large enough to drown out the inevitable noise from the rest of the diet, genome, exercise routine, etc. that is likely to influence the action of the single factor we're looking for. But these are rare, and aren't going to explain the bulk of the complex diseases most of us will eventually get.
In addition, epidemiologists and geneticists share similar confounders -- everyone's dietary history (or genome, if we're doing genetics) is unique, as is their genome (or, again for geneticists, history of environmental exposures), and thus everyone gets to their disease in their own way, with unique interactions and pathways. But we're collecting and making sense of data on a population of people. Indeed, we can't hope to understand the effect, or association, at all without looking at large groups of people, but then we're stuck trying to figure out how to apply what we think we learned from the group to individuals who are probably as unalike as they are alike.
In reality, given the number of possible interacting factors, and the general weakness of their effects, we quickly reach a point where we have to take so much into account that our analysis becomes untenable. There may be so many combinations to test that you can't replicate things enough to get a signal, or can't collect adequately large samples. And the larger the sample, the more heterogeneity you may have in your data (the signal-to-noise ratio may not increase with sample size). You have to correct for doing huge numbers of tests in order to evaluate the statistical significance of your findings, and that correction can wash out all but the strongest signals, unless some individual causes in the data are common enough to detect. These and many other statistical issues dog multifactorial causal studies. Even Big Data can't solve these problems, because everyone is still unique, effects are still small, and we still can't collect all the relevant variables, many of which are unknown or unknowable to us.
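The multiple-testing problem in particular can be sketched in a few lines. Hypothetical setup: test 1,000 foods against a disease when in truth none has any effect. At the conventional p < 0.05 threshold you still expect around 50 "discoveries" by chance alone; a Bonferroni correction (dividing alpha by the number of tests, one standard remedy among several) suppresses them, at the cost of demanding a far stronger signal than weak dietary effects are likely to produce.

```python
import random
from statistics import NormalDist

random.seed(7)

# Multiple-testing sketch (hypothetical numbers): 1,000 food-disease tests,
# all truly null. Under the null, each test statistic is standard normal,
# so ~5% clear the conventional p < 0.05 bar by chance alone.
n_tests = 1000
z = NormalDist()

stats = [random.gauss(0, 1) for _ in range(n_tests)]
pvals = [2 * (1 - z.cdf(abs(s))) for s in stats]         # two-sided p-values

naive_hits = sum(p < 0.05 for p in pvals)                # dozens of false positives
bonferroni_hits = sum(p < 0.05 / n_tests for p in pvals) # corrected threshold
print(f"'significant' at p<0.05:      {naive_hits}")
print(f"significant after Bonferroni: {bonferroni_hits}")
```

This is the epidemiologist's bind in miniature: uncorrected, the false positives swamp you; corrected, only effects far larger than most real dietary effects can survive.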
Researchers design the best studies they can, and peer review presumably assures that most of the studies that are funded are state-of-the-art. But state-of-the-art epidemiology still can't assuredly overcome the many potential biases, confounders, unknown risk factors, and so forth, that it needs to in order to reliably identify the causes of complex chronic diseases. These aren't faults so long as investigators, and those who report the results, fully acknowledge them and the limitations they impose on the findings. They are just realities. When assumptions built into a study force a particular kind of answer, though, that is a problem.
What causes disease X? It seems to be a simple, well-posed question. When the answer is a virus, or a bacterium, or a toxin, or smoking, or a gene with large effect, it is a well-posed question, with a single, or small number of identifiable answers. But, when the process takes years or decades, and each factor has a small, hardly statistically identifiable effect, environments change, and everyone has a unique history of exposures anyway, the question is no longer well-posed, there's unlikely to be a single answer, and epidemiologists have a problem. 'Cause' may not even be an appropriate concept in this context.
On Exactitude in Science
Jorge Luis Borges, Collected Fictions, translated by Andrew Hurley.
...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
—Suárez Miranda, Viajes de varones prudentes, Libro IV, Cap. XLV, Lérida, 1658
Borges' short story is a parable, and not completely applicable to the Arts of Epidemiology and Genetics, not least because they have not yet attained Perfection with respect to complex disease. Still, I think it nicely frames the problem: in our Empire, the Arts of Epidemiology and Genetics are trying to greatly reduce the size of the causal, predictive map, but the problem is that the 1:1 map of the Empire is what's needed to explain each individual's disease. Yet such exactitude does exactly nothing to allow us to generalize about causation, or predict disease.
So, should we eat eggs? If we like them, eat them.