Tuesday, March 3, 2015

Hot under the (epidemiological) collar

Blogs like this are venues for expressing views on the current scene, in our case, related to genetics, evolution and a few other things we throw in.  If you express a view, unless it's just plain vanilla, you will irritate some readers.  In a sense, if you don't then there's no point in writing the blogpost.  In this case, we heavily criticized the recent NYTimes article reporting that the government has now backed off its claim that dietary cholesterol is a heart disease risk factor.  We try to be responsible, but that doesn't mean we have to expect agreement or to mince words!

We argued that the herky-jerky yes/no results from huge, long-term megastudies are so common that the studies themselves have become rather useless. We think they should be phased out, the results to date archived for anyone who wants to mine them, and the funds put to something that actually generates more trustworthy and stable results (if risks are stable enough to be estimated in these ways).

Well, this generated a very heated message from an old friend, a prominent genetic epidemiologist, who said that if we were listened to, it would lead to throwing the baby out with the bathwater.  He was upset because, he said, it is not the data but the analysis of these big epidemiological (environmental or genetic) studies that is at fault.  The studies are based essentially on correlation or regression models that assume everyone starts out as an equal blank slate, and that each individual's risk is basically the additive total of the various risk-factor exposures.  Your sex gives you a 'dose' of risk, which age adds to, then smoking history, diet, and so on.  Once all your risk-factor measures are toted up, your net risk can be estimated.  It is this general approach that, regardless of the sophistication of the statistical details, is not good at finding what we really should be looking for.
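The additive 'dose' structure described above can be sketched in a few lines of Python. The coefficient values below are invented purely to illustrate the form of such a model; they come from no real study:

```python
import math

# Hypothetical log-odds 'doses' for each risk factor (illustration only).
BETA = {
    "intercept": -4.0,          # baseline log-odds for the reference person
    "male": 0.5,                # sex adds a fixed amount...
    "age_per_decade": 0.4,      # ...age adds more, linearly...
    "smoker": 0.7,              # ...then smoking, diet, and so on
    "high_cholesterol_diet": 0.2,
}

def additive_risk(male, age_decades, smoker, high_chol_diet):
    """Tote up the risk-factor 'doses' and convert log-odds to probability."""
    logit = (BETA["intercept"]
             + BETA["male"] * male
             + BETA["age_per_decade"] * age_decades
             + BETA["smoker"] * smoker
             + BETA["high_cholesterol_diet"] * high_chol_diet)
    return 1.0 / (1.0 + math.exp(-logit))

# The model assumes each exposure adds the same dose for everyone:
# a smoker's extra log-odds of risk is identical at age 25 and at age 75.
print(round(additive_risk(1, 3, 1, 1), 3))
```

The key assumption to notice is that each factor's contribution is fixed and independent of the others, which is exactly what the critique below takes issue with.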

Our friend's idea is that, for example, dietary cholesterol may on average not be harmful, but there are likely some subsets of the population for which it is a risk.  Standard models may be convenient to apply, and everyone knows how to do that...but they miss the boat.  The key problem is basically that risk factors interact, and searching for complex interactions is rarely done because it is very demanding in terms of sample size, sample structure, and analytic tools.  But there is no reason, for example, to think that males and females respond to a given risk factor equally per exposure dose.  So interactions ('epistasis' in relation to genome elements) are given very light treatment and basically wished away.
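The point about interactions, and about why searching for them is so demanding, can be sketched concretely. The coefficients below are hypothetical, invented for illustration only, and the factor count is an arbitrary example:

```python
from math import comb, exp

# An interaction term lets the 'dose' of one factor depend on another.
def risk_with_interaction(male, smoker):
    logit = (-4.0
             + 0.5 * male
             + 0.7 * smoker
             + 0.6 * male * smoker)  # extra risk only when BOTH are present
    return 1.0 / (1.0 + exp(-logit))

# Smoking's effect now differs by sex, which a purely additive model forbids.

# The catch: with p measured factors there are comb(p, k) candidate k-way
# interaction terms to search through, and that count explodes quickly.
p = 100  # e.g., 100 measured exposures or genetic variants
print(comb(p, 2))  # 4950 pairwise terms
print(comb(p, 3))  # 161700 three-way terms
```

Each of those candidate terms needs enough individuals carrying that particular combination of exposures to estimate its effect, which is why sample-size demands grow so fast.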

My irate friend basically argued that what needs to change is not the data but the methods.
One recent paper I was referred to applies a method for finding high-risk subsets, and includes references to earlier descriptions of the methodology: Frikke-Schmidt R et al., "Subgroups at high risk for ischaemic heart disease: identification and validation in 67 000 individuals from the general population", Int J Epidemiol. 2015 Feb;44(1):117-28 (unfortunately not freely available).

How effectively this will find really different subgroups is an open question.  We know, for example, that males and females are not at equal risk and respond differently to other factors, as mentioned above.  We know genomic components interact.  We know that as you get older you get closer to various risks, such as heart disease or cancer, and that the same exposure has different impacts with age, and so on.

The idea, both in public health and in medicine (and in evolutionary inference), of identifying causation as effectively as possible, including identifying high-risk individuals as early as possible, is of course absolutely right.  There are many instances of genetic risk factors, like some variants in the gene responsible for cystic fibrosis, or in the BRCA1 gene related to breast cancer, where the Who Cares? principle applies: the single factor's effect is so predictably strong that one intervenes regardless of the details of how much risk is associated or what other factors might slightly modify the outcome.  We know that the nominal risk factor (e.g., a mutation) doesn't always lead to the same degree of severity, but the variation isn't enough to cause doubt: Who Cares about the details?

But whether in general this sort of search for statistical associations can identify risk early enough, when preventive measures might be more helpful, is unclear.  We know about age and sex and smoking and so on, and maybe we don't really gain much from adjusting the exact values.  Or maybe we would.  Likewise for the complex interactions among hundreds of contributing genomic factors.  But there the number of factors, the assumption of independence, and so on need to be recognized as being as daunting as they are for statistical risk analysis.

In my view, this is still walking in dreamland.  The major factors will be identified, perhaps with more precision, but we face a huge, open-ended kind of 'multibody' problem.  It's just not possible to analyze all the possible combinations of factors and their interactions to get combination-specific risk estimates.  First, risks are contingent, one factor's effect depending on what else is present, as discussed above.  Second, not all combinations will show up in the data, even in huge samples, so if interactions must be accounted for, risk estimates will simply come up short or, perhaps better put, will come with unknown or even unknowable precision.
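The second point can be made concrete with a toy calculation. The cohort size below is an arbitrary, optimistic assumption, not taken from any study:

```python
def profiles(p):
    """Number of distinct exposure profiles for p binary risk factors."""
    return 2 ** p

SAMPLE = 10_000_000  # an optimistically huge cohort

# With even a few dozen yes/no risk factors, the number of distinct factor
# combinations dwarfs any feasible sample, so most combination-specific
# risks would be estimated from zero or one person.
for p in (10, 20, 30, 40):
    print(f"p={p}: {profiles(p):,} profiles, "
          f"~{SAMPLE / profiles(p):.6f} people per profile on average")
```

At p = 40 there are roughly a trillion possible profiles, about a hundred thousand times more than the cohort itself; this is the sense in which combination-specific estimates must come up short.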

Third, we know very well, even from just the past few decades, that the incidence of outcomes changes hugely while genomes basically don't.  We may estimate the effects of the particular combinations of environments our sampled individuals were exposed to, but we simply cannot, even in principle, know what environmental factors the current individuals for whom we are being promised precise predictions will be exposed to.  Yet heritabilities, with all their problems, clearly show that genomes typically contribute far less than half of all risk.  Environmentally and genomically, specific factors or variants come and go, and no two people are identical or even close to it.

A major issue is not just that there is no way, even in principle, to know what risks are associated with most risk factors, much less with interactions among them; we also have no way of knowing the degree of precision of our predictions.  This is why, among other things, even if increasing our understanding is a very noble pursuit, promising 'precision' in prediction based on genomes or, really, almost any other risk factors, is irresponsible.


James Goetz said...

I agree with your friend. I think it was Mark Twain who said "There are three kinds of lies: lies, damned lies, and [lame interpretations of] statistics." :-)

Curtis Yarvin said...

Hello, EcoDevoEvo, somewhat related to one of your recent Twitter discussions with Kevin Mitchell on biological races, do you have any position on whether biological races can be argued to "exist" and be meaningful and useful categories? I think most geneticists and population geneticists would agree on that point. People like Razib Khan and Dienekes Pontikos have postulated that races within Homo sapiens exist and can be delineated (fuzzily) using PCA data.

Ken Weiss said...

This is at best a semantic question and often a political one. What does 'race' mean, or 'exist', or 'categories'? Whether most geneticists agree is also irrelevant in the sense that science is not an election. And I think the exact opposite is true: most do not think such rigid categorization is particularly valid.

However, 'race' means very much in a sociocultural sense to many people and has in our particular society some--I say, some--value and correlation with genetic variants or with various risks.

As to categories, population genetic processes are basically continuous, gene flow is always occurring with the occasional exception of major geographic barriers, most populations had mandatory exogamy in traditional times, and so on. So deciding if there are 'boundaries' is a statistical and hence subjective decision even given the samples one has--and choice of samples is a truly notorious issue and often an excuse for categorizing. Fuzzy categories are not categories! Why have such labels? Why not choose samples properly for what you want to know and simply analyze populations comparatively?

One major invocation of the same old Big 5 categories includes Polynesia and leaves out India. If that makes sense to you, go right ahead. But often this is categorization for analytic convenience, when it's legitimate, or for racist reasons when it isn't. I don't see the need for categories.

I've written recent articles on this, in Cold Spring Harbor Perspectives in Biology [Cold Spring Harb Perspect Biol. 2014 Jan 1;6(1). pii: a021238. doi: 10.1101/cshperspect.a021238] and in Genome Research [Genome Res. 2009 May;19(5):703-10. doi: 10.1101/gr.076539.108].

But this is a very contentious issue because of the use of evolution and genetics in racist and discriminatory ways, so that it's hardly possible to discuss it in a measured way. I'll just say that categories should not be created and treated as if they were natural as such, unless there is a reason and justification. That is not the case with human global variation.

Anne may respond on Twitter, and you can search MT here for 'race' for earlier comments we've made.

Anne Buchanan said...


The point I was trying to make on Twitter was that genetics can be used to cluster humans however you choose to cluster them, from the entire human race (as compared with any other species, say) to continents to countries, or villages, or extended families or families, down to single individuals. So, yes, genetics can certainly be used to confirm 'races,' but these are socially defined, and the clusters are a subjective decision on the part of the researcher.

James Goetz said...

Per population genetics clusters, I agree with Anne that any levels of categorization are discretionary and subject to debate. Also, the discretionary categories will have vague boundaries because of statistical clines.

Giancarlo said...

In the meantime, between our discussions of whether or not GWAS hits for IQ constitute or represent selection, this paper has been published: https://thewinnower.com/papers/intelligence-gwas-hits-selection-signal-or-population-structure-a-test-of-the-null-hypothesis

I think the application of a new method reveals that some amount of genuine selection is probably involved in the differences we see. Thoughts?

Anonymous said...

All the education/IQ association won't matter, because there is now evidence of substantial gaps in education closing.


If you look at GCSEs in the UK, for example, Africans, Bangladeshis and Caribbeans have substantially caught up, erasing previous gaps in education. There is also no reversion whatsoever. Black Sub-Saharan Africans, who should have reverted the most since they would have had the lowest number of "educational alleles", don't; none of the groups do. They technically do a bit better than whites now.

Two of the populations from the study posted above, UK Telugus (Indian) and UK Tamils (Sri Lankan), are among the very high-performing groups in these GCSEs.

On top of all of that, the method in the paper above is untested and unproven, and those alleles barely replicate. They don't even show up for cognition in the same sample. There are also another 5-6 found using the same and very similar samples, and they don't follow the pattern so well; 2 of them are said not to follow a population-structure or selection pattern at all. You can check the openpsych.net forums, where the author himself discusses what he finds.

All in all, the evidence is not trustworthy on the GWAS front, and the evidence in the real world is not in the hereditarian favor (and is decreasing as we speak).

Ken Weiss said...

Thanks Anon for your comments. Focus on genes 'for' IQ clearly goes beyond the data and often reflects a hunger for differences to exist based on racial or other categories.

Anonymous said...

Thanks for accepting my comment, and sorry for the bad grammar.

Oh, and I want to add that 2 of the 9 alleles so far are more common among Africans than among Europeans and/or East Asians, and about 3-4 are close in frequency. In the 2nd correlation study some of the African groups had pretty much the same average polygenic frequency as Han Chinese.

Sorry if I am getting too technical/aggressive with info. I don't mind if you do not publish the comment; I just wanted to let you know that it's even worse than what my previous comment mentioned. Much worse.

Also, a lot of commentators are most likely laymen who barely understand or even look into what they are using as evidence for racialism. I wouldn't give them as much respect as you do. Then again, that's me.

So yeah, in case you didn't know what's actually been happening.

Good luck.

Ken Weiss said...

Thanks for the update. It would be one thing if the effort were 'simply' to identify individuals who could benefit from or needed more remedial education and those who were more able to deal with more advanced material. That would be like, say, tryouts for a basketball or baseball team: only the qualified get to play on the team.

But what the hunger is for is group differences, and this is manifestly not being used to target extra resources to the group purportedly deficient. This is basically a sociopolitical arena, and that is why there is so much interest and so much debate about the results.

It's not clear that this is going to change, despite the facts as they are known--and they include a large realm of unknowns that are too often conveniently ignored.

PharmacoGenomic is The Future said...

Ridiculous. That more people don't understand the massive benefit of indexing, sequencing, and incorporating genetic testing and genetics into medicine is absurd and tragic.
Are there risks or other issues that are going to arise? Sure. But when the Wright brothers invented the airplane, people didn't stop them and say, "Hey, someone is gonna use your invention to drop bombs on people and we can't let that happen..."

Is that an absurd argument to make? Of course, but so is saying that geneticists and genetic research are going to cause egregious harm to people.

Simply, it's a technology, and just as technology can be used for good, it can be used for bad. It's a tool.

I for one, think the benefits of all this far outstrip the bad.

Ken Weiss said...

It's clear from your alias that we come at this from very different perspectives. There is a difference between being a Luddite and owning up to the problems and not promising beyond what can be delivered. At least, we have no vested interest in stating our position (do you?).

We said nothing about genetics causing 'egregious harm', so we don't know where your accusation is coming from. But people should not be misled, and funds should be spent on targets (in this case, genetic ones) where causation is clear and focused and the research might actually achieve some good.