As I've posted previously, I have been recovering from multiple-bypass heart surgery. I had some angina, the vague, superficial chest pain that is a symptom of an impending heart attack (or, more properly, of coronary artery blockage). Fortunately, I am educated enough to recognize that this chest pain wasn't from my doing a new kind of exercise, so I went to the doc and--to make a long story short--was sent right off to the hospital for heart bypass surgery (grafting around the clogged coronary arteries).
The angiography showed that at least some of my heart's arteries were clogged--with whatever goop, presumably including cholesterol, and by whatever clogging mechanism. These causal facts are, as I understand things, complex and not completely understood, but the upshot was clear: surgery....or else!
Now, the doctors would say that, given this evidence, I was at high risk of potentially lethal heart disease. I'm sure that, had the opportunity been there (and it may yet come at some future doctor's appointment), I would have been chided--or scolded--for my bad diet, too much cholesterol, and so on. It will be assumed that my voluntary lifestyle choices caused my blockage and my consequent need for 'preventive' surgery. Bad boy! Bad diet! Tsk, tsk, tsk....
But is that right, or might it be the opposite of a more serious truth?
What is bad behavior, health-wise?
I am 77. This is beyond the usual 76-ish life expectancy for US males (searching the ad-laden web for the actual data has become mainly a challenge of wading through relentless commercialism). So my lifestyle cannot be viewed as bad behavior in this respect. Indeed, I have already lived longer than half my birth cohort! Perhaps, then, my diet and whatever else can, or should, be viewed as having been protective. After all, I was symptom-free until after my expected lifespan.
It is very difficult to understand what 'risk' means in such regards. If my lifestyle led to my artery becoming clogged, but it didn't happen until after I'd outlived my average peer, can I legitimately think of that lifestyle as having been protective rather than risky? We all have to get some final disorder at some point, so is the absolute cause the relevant fact, or the relative one? How can we decide such questions, if indeed they are meaningful ones that can even have meaningful answers?
If my behavior (for whatever reason, including just plain luck) led to my surviving in very good health except for one weakest link, does that link suggest I've behaved badly, or does my overall great state of health suggest the opposite? More to the point, how can such questions even be answered in a meaningful sense? They seem meaningful....until you think a bit more carefully about them.
The philosophical quicksand doesn't stop there. If my arterial clog would have led to a relatively quick death--not a 'premature' death at my age!--but thereby saved me from some worse, more prolonged or debilitating fate, can we seriously view my surgery as preventive or protective, now that it leaves me facing those more dreadful fates?
When we have competing causes and inevitable mortality, we have to view the causes, and what causes them, in a rather different light. That doesn't mean there are consensus answers, much less easy ones. But it may mean that the rules for 'healthy' behavior are not as obvious as they seem to be.
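To make the competing-causes point concrete, here is a minimal simulation--in Python, with invented hazard numbers purely for illustration, not real epidemiology. Two independent potential causes of death compete; a hypothetical 'protective' lifestyle halves the non-cardiac hazard but leaves heart risk untouched:

```python
import random

def age_at_event(base_hazard, max_age=110):
    """Age at first occurrence, with a Gompertz-like hazard rising ~9% per year."""
    for age in range(max_age):
        if random.random() < min(1.0, base_hazard * 1.09 ** age):
            return age
    return max_age

def simulate(n=50_000, heart_base=1e-4, other_base=1e-4):
    """Each simulated person dies of whichever competing cause strikes first."""
    total_lifespan, heart_deaths = 0, 0
    for _ in range(n):
        heart, other = age_at_event(heart_base), age_at_event(other_base)
        total_lifespan += min(heart, other)
        heart_deaths += heart < other
    return total_lifespan / n, heart_deaths / n

baseline = simulate()
protected = simulate(other_base=5e-5)  # lifestyle halves only the competing hazard
print("baseline:  mean lifespan %.1f, heart deaths %.0f%%" % (baseline[0], 100 * baseline[1]))
print("protected: mean lifespan %.1f, heart deaths %.0f%%" % (protected[0], 100 * protected[1]))
```

The 'protected' group lives longer on average, yet a larger fraction of them die of heart disease--because something must eventually strike first. A clogged artery showing up late in a long life can be the signature of an overall protective lifestyle, not a risky one.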
Sunday, August 11, 2019
Who, me?? Why did I clog my 'widow maker'? [on medical cause and effect...and how we know, if we know]
By Ken Weiss
So, having just returned from the hospital, and now recuperating from coronary bypass surgery, I have to ask the 'complexity' question--a very personal one in this case: why me? I've lived a physically and physiologically vigorous life. My diet may not have always been the very best for cardio health (though, for reasons we've discussed here many times over the years, it's not completely clear what that diet should actually be), but it wasn't particularly bad, given what's thought these days to be a "healthy" diet.
The surgeon who remodeled me at Penn State's fine medical complex in Hershey said he knows the risk factors in a population, but couldn't know why any given individual developed clogged coronary arteries, nor which artery would be affected. His job was to bypass them, not explain them, one might say. So he didn't even attempt to tell me why I was now in need of bypass surgery.
As he said, there are five known major risk factors: obesity, unhealthy diet, high cholesterol, genetic predisposition, and smoking. Yes, diabetes and high blood pressure are risk factors as well, but they are correlated enough with obesity that perhaps he considers those two conditions to be its side effects. In any case, these risk factors have been determined by looking at associations between possible causal variables and heart disease in populations. The resulting statistics describe the population; they do not identify the specific high-risk individuals within it. Indeed, some people with heart disease have all the risk factors, some have a combination of a few, and some have none. And even then, it's not possible to say which was the cause of the disease in most individual cases.
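A toy simulation makes the population-versus-individual point vivid. All the numbers below are invented for illustration; the only claim is the logical one, that factors which genuinely raise risk in the aggregate still leave many cases carrying none of them:

```python
import random
from collections import Counter

random.seed(1)
N_FACTORS, BASE_RISK, PER_FACTOR = 5, 0.03, 0.03  # invented illustrative values

cases = Counter()
for _ in range(200_000):
    # Each person independently carries each risk factor with probability 0.3.
    carried = sum(random.random() < 0.3 for _ in range(N_FACTORS))
    if random.random() < BASE_RISK + PER_FACTOR * carried:
        cases[carried] += 1  # this person develops the disease

total = sum(cases.values())
for k in range(N_FACTORS + 1):
    print(f"cases carrying {k} of {N_FACTORS} risk factors: {100 * cases[k] / total:4.1f}%")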
I have none of these risk factors--though I could make up a story: I smoked when I was young, and my father had a pacemaker when he was old (but he lived to 99). Still, I have done vigorous exercise my whole life, thinking of that as my "get cancer program," since it meant, I thought, that I would not go out with a coronary. So what caused my artery to clog? Indeed, why in my case was the clog in an unstentable location, so that major surgery was required?
This brings up, again, the question of whether one's individual risk can even be known with any sort of 'precision'. Or is that an illusion? Is it a culpably false promise made by the calculating Dr Collins at NIH, to get NIH funding, rather than to give the public a realistic understanding of what we know and what we can hope to know based on research investment of the type he favors?
How, based on current methods of science, can risk really be individualized? What kind of information would that require, just considering actual, i.e., past, effects, and assuming they could really be ascertained to any reasonable measurement standard? What would you need to consider? Diet, exercise, personality (temperament, for example)? Climate? Profession? The effects of war, drought, epidemic? Genes, even?
Of course, the gross and inexcusable BS of promising 'precision genomic medicine' based on very costly, open-ended genomic (and other 'omic) data-collection enterprises is culpable. It is an often openly acknowledged way of getting, and keeping, mega-funding without having any real ideas (understandable, perhaps, since medical schools culpably don't pay faculty salaries or basic research costs as part of their jobs). Focused science has a chance of finding things out; blind data enumeration, far less so--and what we've done of the latter so far shows this quite clearly.
We often say 'family history', and clinically this may be the most useful piece of predictive information, but what does it actually explain? Did Dad or Uncle Jane have the same trait because of genes, or because of shared family habits and lifestyles? How could you really tell? A surgeon need not care, as their job is to fix the clogged pipes; and if heart disease runs in a family, the physician will treat the patient as high-risk. Still, to prevent this sort of thing, we need to know what causes it.
This is a central biomedical question! It is hard enough to know, much less accurately measure, all the factors in life that might in this or that way be a 'risk' factor for a given disease, like clogged coronary plumbing. Is it a delusion to think we could identify, much less measure, all the factors? If, as seems obvious, there isn't just a single factor, and probably everyone's exposure set is different (and their effects need not be 'additive'), how on earth can we even know how well we are measuring, or ascertaining, such factors?
And even if we could do this, it would apply directly only to current cases and their past lifestyle exposures. What we would like to do, for individuals and for public health, is predict the future so as to lower risks. But there is no way, even in principle, to know what future exposures will be, not even for populations. Diets and lifestyles change in ways we cannot predict, nor can we predict major future events--climate, war, pestilence, food types and availability, and so on--that would be highly relevant.
So what should we do with our understanding of these unpredictable factors? Perhaps just level with patients and the public, and stop using the public to endow a particular, and particularly costly, part of the university research empire. Maybe a return to focused, hypothesis-based research--actual science--is in order, in my view.
Friday, May 10, 2019
The music of life--more than a collection of notes
By Ken Weiss
My composer friend wants to be quite modern about creating beautiful music. He doesn't like to use computer programs for composing, but he has devised another 'modern' way to compose, given that, in writing a piece, he often changes his mind. Scratching out notes on paper to replace them with 'better' ones makes a real mess of the working pages, and he'd then have to transcribe his work onto new pages, which in itself introduces room for mistakes. So he had an idea.
He purchased a set of notes and musical symbols, printed individually on a kind of flexible plastic. Copies of each possible note and notation element were in boxes in a little tray. As he composed, he merely took each required note from its place in the tray, and used its static electricity to place it on a page with printed staff-lines. If he changed his mind, it was easy to remove or replace a given note, and put it back in its box in the tray without generating an inky mess on the page and having to keep starting over to make his work-in-progress legible.
But there turned out to be a serious, indeed even tragic, problem. He liked working in his studio, right in front of a window giving him an inspiring view of his garden. But, after days of work composing a comparably ethereal and beautiful piece, a gust blew through the window, riffled the pages, and shook all the notes off the page and onto the table! What a scattered mess! And what a heartbreaking loss of all that work!
Of course, you could say that the composition with all its beauty was in some sense still there, right before him: all the required notes were indeed still there--every one. But they were in a pile, no longer in any order from which he could reconstruct the composition just by picking the notes up and placing them back on the page. So, it was literally all there--but none of what mattered was!
As my composer friend told me this story, it occurred to me that this was analogous to the 'pile' of DNA letters (As, Cs, Gs, and Ts) that is found by sequencing people with and without some trait, like a disease. The letters differ greatly among individuals with the 'same' trait, because they don't have the trait for the same genetic reason. And the sampled individuals' genomes vary in literally countless ways that have nothing to do with the disease. Unlike the score, the 'letters' are still in their original order, but genes don't make a score as far as we are concerned because, unlike an orchestra, we don't know how to 'play' them!
In a sense, each person we see who is playing the same tune, so to speak, is doing so from a different score. Some shared notes may be involved, but they are all jumbled up with shared, and not-shared, notes that have nothing to do with the tune.
And yet we are widely promised, and widely being trephined to pay for, the idea that looking through the jumble of genetic 'notes' we can predict just about anything you can name about each individual's traits.
Indeed, unlike the composer's problem, there are all sorts of notes that are not even visible to us (they are called 'somatic mutations'). We yearn for a health-giving genomic 'tune', which is a very natural way to feel, but we are unable (or, at least, unwilling) to face the music of genomic reality.
And, of course, this mega-scale 'omics 'research' is all justified with great vigor by NIH, as if it were on the very verge of fundamental discoveries that will lead to miraculous cures--indeed, cures for 'All of Us'. At what point is it justified to refer to this as a kind of culpable fraud, a public con job?
By our bigger, bigger, bigger approach, we have entrenched 'composers' trying to read scores that are largely unreadable in the way being attempted. We are so intent on this, like rows of monks transcribing sacred manuscripts in a remote monastery, that we stay committed to something we have every legitimate reason to know isn't the way things are.
Friday, October 19, 2018
Nyah, nyah! My study's bigger than your study!!
By Ken Weiss
It looks like a food fight at the Precision Corral! Maybe the Big Data era is over! That's because what we really seem to need (of course) is even bigger GWAS, or other sorts of enumerative (or EnumerOmic) studies, because then (and only then) will we really learn how complex traits are caused, so that we can produce 'precision' genomic medicine to cure all that ails us. After all, there is no such thing as enough 'data', or a big (and open-ended) enough study. Of course, because so much knowledge....er, money....is at stake, such a food fight is not just children in a sandbox, but purported adults, scientists even, wanting more money from you, the taxpayer (what else?). The contest will never end on its own. It will have to be ended from the outside, in one way or another, because it is predatory: it takes resources away from what might be focused, limited, but actually successful problem-solving research.
The idea that we need larger and larger GWAS studies, not to mention almost any other kind of 'omics enumerative study, reflects the deeper truth that we have no idea what to do with what we've got. The easiest word to say is "more", because that keeps the fiscal floodgates open. Just as preachers keep the plate full by promising redemption in a future that, like an oasis to desert trekkers, can be a mirage never reached, scientists are modern preachers who've learned the tricks of the trade. And, of course, since each group wants its floodgates to stay wide open, it must resist even the faintest suggestion that somebody else's gates might open wider.
There is a kind of desperate defense, as well as food fight, over the situation. This, at least, is one way to view a recent exchange. Boyle et al. (Cell 169(7):1177-86, 2017**) assert that a few key genes, perhaps with rare alleles, are the 'core' genes responsible for complex diseases, while lesser genes scattered across the genome, often acting indirectly or incidentally, provide other pathways that affect the trait--and these are what GWAS detect. If this model were to take hold, it might threaten the gravy train of more traditional, more mindless Big Data chasing. Wray et al.'s falsely polite spitball in return (Cell 173:1573-80, 2018**) can be read as a plea to avoid that, urging that causal variants really are spread all over the genome, differently so in everyone--so that, of course, the really true answer is some statistical prediction method, after we have more and even larger studies.
Could it be, possibly, that this is at root merely a defense of large statistical databases and Big Data per se, expressed as if it were a legitimate debate about biological causation? Could it be that, for vested interests, if you have a well-funded hammer everything can be presented as if it were a nail (or, rather, a bucket's worth of nails, scattered all over the place)?
Am I being snide here?
Yes, of course. I'm not the Ultimate Authority to adjudicate about who's right, or what metric to use, or how many genome sites, in which individuals, can dance on the head of the same 'omics trait. But I'm not just being snide. One reason is that both the Boyle and Wray papers are right, as I'll explain.
The arguments seem in essence to assert that complex traits are due either to many genetic variants strewn across the genome, or to a few rare, larger-effect alleles here and there, complemented by other variants involving indirect pathways to the 'main' genes, themselves scattered across the genome ('omnigenic'). Or, alternatively, that we can tinker with GWAS results, and various technical measurements derived from them, to get at the real truth.
We are chasing our tails these days in an endless-seeming circle to see who can do the biggest and most detailed enumerative study, finding the most and tiniest of effects with the most open-ended largesse, while Rome burns. 'Rome', here, is the victims of the many diseases that might be studied, with actual positive therapeutic results, by more focused, if smaller, studies. Or, in many cases, by a real effort at revealing and ameliorating the lifestyle exposures that typically, one might say overwhelmingly, are responsible for common diseases.
If, sadly, it were to turn out that there is no more integrative way than add-'em-up by which genetic variants cause or predispose to disease, then at least we should know that, and spend our research resources elsewhere, where they might do good for someone other than universities. I actually happen to think that life is more integratively orderly than typically enumeratively additive in its effects, and that more thoughtful approaches, reflecting the findings of decades of GWAS data, might lead to better understanding of complex traits. But this seemingly can't be achieved just by sampling extensively enough to estimate 'interactions'. The interactions may--and, I think, probably do--have higher-level structure that must be addressed in other ways.
But if not--if these traits are as they seem, and there is no such simplifying understanding to be had--then let's come clean with the public and invest our resources in other ways to improve our lives, before these additive trivia add up to nothing and those supporting the work tire of the exaggerated promises.
Our scientific system, which we collectively let grow like mushrooms because it was good for our self-interest, puts us in a situation where we must sing for our supper (often literally, if investigators' salaries depend on grants). No one can be surprised at the cacophony of top-of-the-voice arias ("Me-me-meeeee!"). Human systems can't be perfect, but they can be perfected. At some point, perhaps we'll start doing that. If it happens, it will only partly reflect the particular scientific questions at issue, because it's mainly about the underlying system itself.
**NOTE: We provide links to sources, but, yep, they are paywalled--unless you just want to see the abstract or have access to an academic library. If you have the loony idea that, as a taxpayer, you have already paid for this research, so private selling of its results should be illegal--sorry!--that's not our society.
Tuesday, October 16, 2018
Where has all the thinking gone....long time passing?
By Ken Weiss
Where did we get the idea that our entire nature--not just our embryological development, but everything else--was pre-programmed by our genome? After all, the very essence of Homo sapiens, compared to all other species, is that we use culture--language, tools, etc.--to do our business, rather than just our physical biology. In a serious sense, we evolved to be free of our bodies: our genes made us freer from our genes than most if not all other species! And we evolved to live long enough to learn--language, technology, etc.--and so to live our thus-long lives.
Yet isn't an assumption of pre-programming the only assumption under which anyone could legitimately promise 'precision' genomic medicine? Of course, Mendel's work, adopted by human geneticists over a century ago, allowed great progress in understanding how genes lead to at least the simpler of our traits, those with discrete (yes/no) manifestations--traits that do include many diseases that really, perhaps surprisingly, behave in Mendelian fashion, and for which concepts like dominance and recessiveness have been applied and, sometimes, at least approximately hold up to closer scrutiny.
Even 100 years ago, agricultural and other geneticists who could do experiments largely confirmed the extension of Mendel to continuously varying traits, like blood pressure or height. They reasoned that many genes (whatever genes were, which was unknown at the time) each contributed individually small effects. If each gene had two states, in the usual Aa/AA/aa classroom sense, but there were countless such genes, their joint action could approximate a continuously varying trait whose measure was, say, the number of A alleles an individual carries. This view was also consistent with the observed correlation of trait measures with degree of kinship among relatives. This history has been thoroughly documented. But there are some bits, important bits, missing, especially when it comes to the fervor for Big Data 'omics analysis of human diseases and other traits. In essence, we are still, a century later, conceptual prisoners of Mendel.
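That classical argument is easy to restate in code. A minimal sketch (the allele frequency and locus count below are arbitrary choices, not estimates for any real trait): count the 'A' alleles across many two-allele loci, and the resulting trait measure is, by the central limit theorem, approximately the smooth bell curve the early quantitative geneticists observed.

```python
import random

def trait_value(n_loci=1000, freq_a=0.5):
    """Sum of 'A' alleles over n_loci diploid loci (0, 1, or 2 per locus)."""
    return sum((random.random() < freq_a) + (random.random() < freq_a)
               for _ in range(n_loci))

people = [trait_value() for _ in range(10_000)]
mean = sum(people) / len(people)
sd = (sum((v - mean) ** 2 for v in people) / len(people)) ** 0.5
print(f"mean {mean:.0f}, sd {sd:.1f}")  # ~binomial(2000, 0.5): continuous-looking variation
```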
'Omics over the top: key questions generally ignored
Let us take GWAS (genome-wide association studies) at face value. GWAS find countless 'hits': sites of whatever sort across the genome whose variation affects variation in WhateverTrait you choose to map (everything simply must be 'genomic', or some other 'omic, no?). WhateverTrait varies because every subject in your study has a different combination of contributing alleles. Somewhat resembling classical Mendelian recessiveness, contributing alleles are found in cases as well as controls (or across the measured range of quantitative traits like stature or blood pressure), and the measured trait reflects how many A's one has: WhateverTrait is essentially the sum of A's in 'cases', which may be interpreted as a risk--some sort of 'probability' rather than certainty--of having been affected, or of having the measured trait value.
We usually treat risk as a 'probability': a single value, p, that applies to everyone with the same genotype. Here, of course, no two subjects have exactly the same genotype, so some sort of aggregate risk score, adding up each person's 'hits', is assigned a p. This, however, tacitly assumes something like a fixed risk, or 'probability' of affection, contributed by each site. But that treats these values as if they were essential to the site, each thus acting as a parameter of risk. That is, sites are treated as carrying a kind of fixed value or, one might say, 'force' relative to the trait measure in question.
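Here is a sketch of what that tacit model amounts to. The per-site weights and the logistic mapping below are hypothetical conventions of my own choosing, not any particular study's method; the point is only the structure: fixed per-site contributions, summed, then pushed through a fixed function to yield each person's p.

```python
import math

def risk_p(genotype, weights, intercept=-3.0):
    """Additive score -> a single 'probability', treating each site's
    effect as a fixed parameter belonging to the site itself."""
    score = sum(w * g for w, g in zip(weights, genotype))  # g = 0, 1, or 2 alleles
    return 1 / (1 + math.exp(-(intercept + score)))        # logistic link

weights = [0.05, 0.12, 0.02, 0.30, 0.08]   # hypothetical effect sizes
print(round(risk_p([2, 0, 1, 1, 2], weights), 3))
```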
One obvious and serious issue is that these values are necessarily estimated from past data, that is, by induction from samples. Not only is there sampling variation, usually only crudely estimated by some standard statistical measure, but we know that the picture will be at least somewhat different in any other sample we might have chosen, not to mention in other populations; and those who are actually candid about what they are doing know very well that the same people, living in a different place or time, would have different risks for the same trait.
No study is perfect, so we use some conveniently assumed, well-behaved regression or correction adjustments to account for the statistical 'noise' due to factors like age, sex, and unmeasured environmental effects. Much worse than these issues, there are clear sources of imprecision--and the obvious major one, taboo even to think about, much less to mention, is that relevant future factors (mutations, environments, lifestyles) are unknowable, even in principle. So what we really do, are forced to do, is extend what the past was like onto the assumed future. And besides this, we don't count somatic changes (mutations arising in body tissues during life, which were not inherited), because they'd mess up our assertions of 'precision', and we can't measure them well in any case (so just shut one's eyes and pretend the ghost isn't in the house!).
All of this together means that we are estimating risks from imperfect existing samples and past life-experience, but treating them as underlying parameters so that we can extend them to future samples. That equates induction with deduction, assuming the past is rigorously parametric and the future will be the same; but this is simply scientifically and epistemologically wrong, no matter how inconvenient it is to acknowledge. Mutations, genotypes, and environments of the future are simply unpredictable, even in principle.
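The induction-as-deduction problem can be shown in a few lines. In this invented toy world, the 'true' risk depends jointly on genotype and on an environmental exposure; we 'estimate' per-genotype risks under yesterday's exposure and then, as the precision rhetoric implicitly does, carry them forward unchanged:

```python
def true_risk(genetic_factor, exposure):
    """In this toy world, risk is jointly set by genotype and environment."""
    return min(1.0, genetic_factor * exposure)

# 'Estimate' risk parameters from past data, where exposure happened to be 1.0...
estimated = {g: true_risk(g, exposure=1.0) for g in (0.05, 0.10, 0.20)}

# ...then apply them to a future in which diets/lifestyles have shifted.
for g, predicted in estimated.items():
    actual = true_risk(g, exposure=2.0)
    print(f"genetic factor {g:.2f}: predicted risk {predicted:.2f}, actual {actual:.2f}")
```

Nothing about the genotypes changed, yet every 'parameter' is now wrong by a factor of two. The estimates were descriptions of a past sample, not constants of nature.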
None of this is a secret, or a new discovery, in any way. What it is, is inconvenient truth. These things should have been enough, by themselves--without even badgering investigators about the environmental factors that, we know very well, typically predominate--to prevent all the NIH's precision promises from being accurate ('precise'), even to a knowable degree. Yet this 'precision' sloganeering is being sheepishly aped all over the country by all sorts of groups who don't think for themselves and/or who go along lest they be left off the funding gravy train. This is the 'omics fad. If you think I am being too cynical, just look at what's being said, done, published, and claimed.
These are, to me, deep flaws in the way the GWAS and other 'omics industries, very well-heeled, are operating these days, picking the public's pocket (pharma may, slowly, be awakening--see the editorial "UK life science research: time to burst the biomedical bubble," Lancet 392:187, 2018). But scientists need jobs and salaries, and if we put people in a position where they have to sing in this way for their supper, what else can you expect of them?
Unfortunately, there are much more serious problems with the science, and they have to do with the point-cause thinking on which all of this is based.
Even a point-cause must act through some process
By far most of the traits, disease or otherwise, that are being GWAS'ed and 'omicked these days, at substantial public expense, are treated as if the mapped 'causes' were point causes. If there are n possible causes, and a person has an unlucky subset m of them, one adds 'em up and predicts that the person will have the target trait. And there is much that is ignored, assumed, or wishfully hidden in this 'will'. It is not clear how many authors treat it, tacitly, as a probability versus a certainty, because no two people in a sample have the same genotype, and all we know is whether they are 'affected' or 'unaffected'.
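How literally should we take 'no two people have the same genotype'? A back-of-envelope calculation--assuming, purely for illustration, a set of independent biallelic sites each with allele frequency 0.5--shows that genotype-sharing vanishes long before we reach the thousands of hits a big GWAS reports:

```python
# Under Hardy-Weinberg with allele frequency 0.5, genotype frequencies at one
# site are 0.25 (AA), 0.5 (Aa), 0.25 (aa), so two random people match there
# with probability 0.25**2 + 0.5**2 + 0.25**2 = 0.375.
P_MATCH_ONE_SITE = 0.375

for n_sites in (10, 50, 200):
    p_all = P_MATCH_ONE_SITE ** n_sites  # independence assumed across sites
    print(f"{n_sites:3d} independent sites: P(identical genotypes) = {p_all:.2e}")
```

With even 200 relevant sites, the chance that any two subjects share a genotype is effectively zero, so a per-genotype p can never be observed directly; it can only be imposed by the additive model.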
The genomics industry promises, essentially, that from conception onward your DNA sequence will predict your diseases, even if only in the form of some 'risk'; the latter is usually a probability and, despite the guise of 'precision', it can of course be adjusted as we learn more. For example, it must be adjusted for age, and usually for other variables. Thus we need ever larger, more numerous, and longer-lasting samples. This alone should steer people away from the profiteering of DNA-testing companies. But that snipe aside, what does this risk or 'probability' actually mean?
Among other things, those candid enough to admit it know that environmental and lifestyle factors have a role, interacting with the genotype if not, usually, overwhelming it. That means, for example, that the genotype only confers some, often modest, risk probability, with the actual risk much more affected by lifestyle factors--most of which are not measured, or not measured with accuracy, or not even yet identified. And usually there is some aspect that relates to age, or some assumption about what 'lifetime' risk means. Whose lifetime?
Aspects of such a 'probability'
There are interesting, longstanding issues with these probabilities, even if we assume they have some kind of meaning. Why do so many important diseases, like cancers, only arise at some advanced age? How can a genomic 'risk' be so delayed, and so different among people? Why do mice, with genotypes very similar to ours (which is why we do experiments on them to learn about human disease), live only to about 3, while we live to our 70s and beyond?
Richard Peto raised some of these questions many decades ago. But they were never really addressed, even in an era when NIH et al. were spending much money on 'aging' research, including studies of lifespan. There were generic theories suggesting, on evolutionary grounds, why some diseases were deferred to later ages (the idea is called 'antagonistic pleiotropy'), but nobody tried seriously to explain why, from a molecular or genetic point of view. Why do mice live only 3 years, anyway? And so on.
These are old questions, and very deep ones, but they have not been answered and, generally, are conveniently forgotten--because, one might argue, they are inconvenient.
If a GWAS score increases the risk of a disease that has a long-delayed onset pattern, often striking late in life and highly variable among individuals and over time, what sort of 'cause' is that genotype? What is it that takes decades for the genes to affect the person? There are a number of plausible answers, but they get very little attention, at least in part because they stand in the way of the vested interests of entrenched, too-big-to-kill, faddish Big Data 'research' that demands instant promises to the public it is trephining for support. If the major reason is lifestyle factors, then the very delayed onset should be taken as persuasive evidence that the genotype is, in fact, by itself not a very powerful predictor.
Why would the additive effects of some combination of GWAS hits lead to disease risk? That is, given our complex nature, why would each gene's effects be independent of every other contributor's? In fact, mapping studies usually show evidence that other things, such as interactions, are important--but these are at present almost impossibly complex to understand.
Does each combination of genome-wide variants have a separate age-onset pattern, and if not, why not? And if so, how does the age effect work (especially if it is not due to person-years of exposure to the truly determining lifestyle factors)? If such factors are at play, how can we really know, since we never see the same genotype twice? How can we assume that the time-relationship with each suspect genetic variant will be similar among samples, or in the future? Is the disease due to post-natal somatic mutation--in which case, why make predictions based on the purported constitutive genotypes of GWAS samples?
Obviously, if long-delayed onset patterns are due not to genes alone but to lifestyle exposures interacting with genotypes, then perhaps lifestyle exposures should be the health-related target, not exotic genomic interventions. In any case, the value of genome-based prediction clearly depends on environmental and lifestyle exposures, and the future of those exposures is unknowable (as we clearly do know, from seeing how unpredictably past exposures have shaped today's disease patterns).
The point here is that our reliance on genotypes is a very convenient way of keeping busy and bringing in the salaries, while not facing up to the much more challenging issues that the easy approach (run lots of samples through DNA sequencers) can't address. I did not invent these points, and it is hard to believe that at least the more capable and less me-too scientists don't know them clearly, if quietly. Indeed, I know this from direct experience. Yes, scientists are fallible and vain; we're only human. But of all human endeavors, science should be based on honesty, because we have to rely on trust in each other's work.
The scientific problems are profound and not easily solved, and not soluble in a hurry. But much of the problem comes from the funding and careerist system that shackles us. This is the deeper explanation in many ways. The paint on the House of Science is the science itself, but it is the House that supports that paint that is the real problem.
A civically responsible science community, and its governmental supporters, should be freed from the iron chains of relentless Big Data for their survival, and start thinking, seriously, about the questions that their very efforts over the past 20 years, on trait after trait, in population after population, and yes, with Big Data, have clearly revealed.
Yet isn't an assumption of pre-programming the only assumption by which anyone could legitimately promise 'precision' genomic medicine? Of course, Mendel's work, adopted by human geneticists over a century ago, allowed great progress in understanding how genes lead at least to the simpler of our traits, with discrete (yes/no) manifestations, traits that do include many diseases that really, perhaps surprisingly, do behave in Mendelian fashion, and for which concepts like dominance and recessiveness been applied and that, sometimes, at least approximately hold up to closer scrutiny.
Even 100 years ago, agricultural and other geneticists who could do experiments, largely confirmed the extension of Mendel to continuously varying traits, like blood pressure or height. They reasoned that many genes (whatever they were, which was unknown at the time) contributed individually small effects. If each gene had two states in the usual Aa/AA/aa classroom example sense, but there were countless such genes, their joint action could approximate continuously varying traits whose measure was, say, the number of A alleles in an individual. This view was also consistent with the observed correlation of trait measure with kinship-degree among relatives. This history has been thoroughly documented. But there are some bits, important bits, missing, especially when it comes to the fervor for Big Data 'omics analysis of human diseases and other traits. In essence, we are still, a century later, conceptual prisoners of Mendel.
'Omics over the top: key questions generally ignored
Let us take GWAS (genomewide association studies) on their face value. GWAS find countless 'hits', sites of whatever sort across the genome whose variation affects variation in WhateverTrait you choose to map (everything simply must be 'genomic' or some other 'omic, no?). WhateverTrait varies because every subject in your study has a different combination of contributing alleles. Somewhat resembling classical Mendelian recessiveness, contributing alleles are found in cases as well as controls (or across the measured range of quantitative traits like stature or blood pressure), where the measured trait reflects how many A's one has: WhateverTrait is essentially the sum of A's in 'cases', which may be interpreted as a risk--some sort of 'probability' rather than certainty--of having been affected or of having the measured trait value.
We usually treat risk as a 'probability,' a single value, p, that applies to everyone with the same genotype. Here, of course, no two subjects have exactly the same genotype so some sort of aggregate risk score, adding up each person's 'hits', is assigned a p. This, however, tacitly assumes something like that each site contributes some fixed risk or 'probability' of affection. But this treats these values as if they were essential to the site, each thus acting as a parameter of risk. That is, sites are treated as a kind of fixed value or, one might say 'force', relative to the trait measure in question.
One obvious and serious issue is that these are necessarily estimated from past data, that is, by induction from samples. Not only is there sampling variation that usually is only crudely estimated by some standard statistical variation-related measure, but we know that the picture will be at least somewhat different in any other sample we might have chosen, not to mention other populations; and those who are actually candid about what they are doing know very well that the same people living in a different place or time would have different risks for the same trait.
No study is perfect, so we use some conveniently assumed well-behaved regression/correction adjustments to account for the statistical 'noise' due to factors like age, sex, and unmeasured environmental effects. Much worse than these issues, there are clearly factors of imprecision, and the obvious major one, taboo even to think about much less to mention, that relevant future factors (mutations, environments, lifestyles) are unknowable, even in principle. So what we really do, are forced to do, is extend what the past was like to the assumed future. But besides this, we don't count somatic changes (mutation arising in body tissues during life, that were not inherited), because they'd mess up our assertions of 'precision', and we can't measure them well in any case (so just shut one's eyes and pretend the ghost isn't in the house!).
All of these together mean that we are estimating risks from imperfect existing samples and past life-experience, but treating them as underlying parameters so that we can extend them to future samples. What that does is equate induction with deduction, assuming the past is rigorously parametric and will be the same in the future; but this is simply scientifically and epistemologically wrong, no matter how inconvenient it is to acknowledge this. Mutations, genotypes, and environments of the future are simply unpredictable, even in principle.
None of this is a secret, or new discovery, in any way. What it is, is inconvenient truth. These things should have been enough, by themselves and without badgering investigators about environmental factors that (we know very well, typically predominate) prevent all the NIH's precision promises from being accurate ('precise'), or even to a knowable degree. Yet this 'precision' sloganeering is being, sheepishly, aped all over the country by all sorts of groups who don't think for themselves and/or who go along lest they get left off the funding gravy train. This is the 'omics fad. If you think I am being too cynical, just look at what's being said, done, published, and claimed.
These are, to me, deep flaws in the way the GWAS and other 'omics industries, very well-heeled, are operating these days, to pick the public's pocket (pharma may, slowly, be awakening-- Lancet editorial, "UK life science research: time to burst the biomedical bubble," Lancet 392:187, 2018). But scientists need jobs and salaries, and if we put people in a position where they have to sing in this way for their supper, what else can you expect of them?
Unfortunately, there are much more serious problems with the science, and they have to do with the point-cause thinking on which all of this is based.
Even a point-cause must act through some process
By far most of the traits, disease or otherwise, that are being GWAS'ed and 'omicked these days, at substantial public expense, are treated as if the mapped 'causes' are point causes. If there are n causes, and a person has an unlucky set m out of many possible sets, one adds 'em up and predicts that person will have the target trait. And there is much that is ignored, assumed, or wishfully hidden in this 'will'. It is not clear how many authors treat it, tacitly, as a probability vs a certainty, because no two people in a sample have the same genotype and all we know is that they are 'affected' or 'unaffected'.
The genomics industry promises, essentially, that from conception onward, your DNA sequence will predict your diseases, even if only in the form of some 'risk'; the latter is usually a probability and despite the guise of 'precision' it can, of course, be adjusted as we learn more. For example, it must be adjusted for age, and usually other variables. Thus, we need ever larger and more and longer-lasting samples. This alone should steer people away from being profiteered by DNA testing companies. But that snipe aside, what does this risk or 'probability' actually mean?
Among other things, those candid enough to admit it know that environmental and lifestyle factors have a role, interacting with the genotype if not, usually, overwhelming it, meaning, for example, that the genotype only confers some, often modest, risk probability, the actual risk much more affected by lifestyle factors, most of which are not measured or not measured with accuracy, or not even yet identified. And usually there is some aspect that relates to age, or some assumption about what 'lifetime' risk means. Whose lifetime?
Aspects of such a 'probability'
There are interesting issues, longstanding issues, about these probabilities, even if we assume they have some kind of meaning. Why do so many important diseases, like cancers, only arise at some advanced age? How can a genomic 'risk' be so delayed and so different among people? Why are mice, with very similar genotypes to humans (which is why we do experiments on them to learn about human disease) only live to 3 while we live to our 70s and beyond?
Richard Peto, raised some of these questions many decades ago. But they were never really addressed, even in an era when NIH et al were spending much money on 'aging' research including studies of lifespan. There were generic theories that suggested from an evolutionary theory why some diseases were deferred to later ages (it is called 'negative pleiotropy'), but nobody tried seriously to explain why that was from a molecular/genetic point of view. Why do mice only live only 3 years, anyway? And so on.
These are old questions and very deep ones but they have not been answered and, generally, are conveniently forgotten--because, one might argue, they are inconvenient.
If a GWAS score increases the risk of a disease, that has a long delayed onset pattern, often striking late in life, and highly variable among individuals or over time, what sort of 'cause' is that genotype? What is it that takes decades for the genes to affect the person? There are a number of plausible answers, but they get very little attention at least in part because that stands in the way of the vested interests of entrenched too-big-to-kill Big Data faddish 'research' that demands instant promises to the public it is trephining for support. If the major reason is lifestyle factors, then the very delayed onset should be taken as persuasive evidence that the genotype is, in fact, by itself not a very powerful predictor.
Why would the additive effects of some combination of GWAS hits lead to disease risk? That is, in our complex nature why would each gene's effects be independent of each other contributor? In fact, mapping studies usually show evidence that other things, such as interactions are important--but they are at present almost impossibly complex to be understood.
Does each combination of genome-wide variants have a separate age-onset pattern, and if not, why not? And if so, how does the age effect work (especially if not due to person-years of exposure to the truly determining factors of lifestyle)? If such factors are at play, how can we really know, since we never see the same genotype twice? How can we assume that the time-relationship with each suspect genetic variant will be similar among samples or in the future? Is the disease due to post-natal somatic mutation, in which case why make predictions based on the purported constitutive genotypes of GWAS samples?
Obviously, if long-delayed onset patterns are due not to genotypes alone but to lifestyle exposures interacting with them, then perhaps lifestyle exposures, not exotic genomic interventions, should be the health-related target. Of course, the value of genome-based prediction clearly depends on environmental and lifestyle exposures, and the future of those exposures is unknowable (as we can see from how unpredictably past exposures have shaped today's disease patterns).
The point here is that our reliance on genotypes is a very convenient way of keeping busy and bringing in the salaries, while not facing up to the much more challenging issues that the easy approach (running lots of samples through DNA sequencers) can't address. I did not invent these points, and it is hard to believe that at least the more capable, less me-too scientists don't know them clearly, if quietly; indeed, I know from direct experience that some do. Yes, scientists are fallible and vain; we're only human. But of all human endeavors, science should be based on honesty, because we have to rely on trust in each other's work.
The scientific problems are profound, not easily solved, and certainly not soluble in a hurry. But much of the problem comes from the funding and careerist system that shackles us; in many ways, that is the deeper explanation. The paint on the House of Science is the science itself, but it is the House supporting that paint that is the real problem.
A civically responsible science community, and its governmental supporters, should be freed from the iron chains of relentless Big Data as the condition of survival, and should start thinking, seriously, about the questions that their very efforts over the past 20 years--on trait after trait, in population after population, and yes, with Big Data--have clearly revealed.
Saturday, October 6, 2018
And yet it moves....our GWAScopes and Galileo's lesson on reality
By
Ken Weiss
In 1633, Galileo Galilei was forced to recant his ideas about the movement of the Earth before the Pope, or else face the most awful penalty. As I understand the story, he did recant....but on leaving the cathedral, he stomped his foot on the ground and declared, "And yet it moves!" For various reasons, usually reflecting their own selfish vested interests, the powers that be in human society frequently stifle unwelcome truths--truths that would threaten their privileged well-being. That was nothing new in Galileo's time, and it's still prevalent today.
All human endeavors are in some ways captives of current modes of thinking--world-views, beliefs, power and economic structures, levels of knowledge, and explanatory frameworks. Religions and social systems often, or perhaps typically, constrain thinking. They provide comforting answers and explanations, and people feel threatened by those who do not adhere, who are not like us in their views. The rejection of heresy applies far beyond formal religion. Dissenters or non-believers are part of 'them' rather than 'us', a potential threat, and it is thus common, if not natural, to distrust, exclude, or even persecute them.
At the same time, the world is as the world really is, especially when it comes to physical Nature. And that is the subject of science and scientific knowledge. We are always limited by current knowledge, of course, and history has shown how deeply that can depend on technology, as Galileo's experience with the telescope exemplifies.
When you look through a telescope . . . .
In Galileo's time, it was generally thought--or perhaps 'believed' is the better word--that the cosmos was God's creation, as known by biblical authority. It was created in the proverbial Genesis way, and the earth--with us humans on it--was the special center of that creation. The crystal spheres bearing the stars and planets circled around us and ennobled us with their divine light. In the West, at least, this was not just the view; it was what had (with few exceptions) seemed right since the ancients.
But knowledge is often, if not always, limited by our senses, and they in turn are limited by our sensory technology. Here the classic example is the invention of the telescope and, eventually, what that cranky thinker Galileo saw through it. Before his time, we had only our naked eyes to see the sun move, and the stars seemed quite plausibly to be crystal spheres bearing twinkles of light, rotating around us.
If you don't know the story, Wikipedia or many other sources can be consulted. But it was dramatic! Galileo's experience taught science a revolutionary lesson about reality vs myth and, very directly, about the importance of technology in our understanding of the world we live in.
The lesson from Galileo was that when you look through a telescope you are supposed to change your mind about what is out there in Nature. The telescope lets you see what's really there--even if it's not what you wanted to see, or thought you'd see, or would be most convenient for you to see.
From Mendel's eyes to ours
Since antiquity, plant and animal breeders have known empirically about inheritance, that is, about the physical similarities between parents and offspring. Choose parents with the most desirable traits, and their offspring will have those traits--at least, so to speak, on average. But how does that work?
Mendel heard lectures in Vienna that gave him some notion of the particulate nature of matter. When, in trying to improve agricultural yields, he noticed discrete differences, he decided to test their nature in pea plants, which he knew well and which were manageable subjects for experiments to understand the Molecular Laws of Life (my phrase, not his).
Analogies are never perfect, but we might say that Mendel's picking discrete, manageable traits was like pre-Newtonians looking at stars but not at what controlled their motion. Mendel got an idea of how parents and offspring could resemble each other in distinct traits. Just as the telescope was the instrument that allowed Galileo to see the cosmos better, and to do more observing than guessing, geneticists got their Galilean equivalent in genomewide mapping (GWAS), which allowed us to guess less about inheritance and to see it better. We got our GWAScope!
But what have we done with our new toy? We have been mesmerized by gene-gazing. Like Galileo's contemporaries who, finally accepting that what he saw really was there and not just an artifact of the new instrument, gazed through their telescopes and listed off this and that finding, we are on a grand scale just enumerating, enumerating, and enumerating. We even boast about it. We build our careers on it.
That me-too effort is neither surprising nor unprecedented. But it has also become what Kuhn called 'normal science'. It is butting our heads against a wall: doing more and more of the same without realizing that what we see is what's there, but that we're not explaining it. From early in the 20th century we had quantitative genetics theory--the theory that agricultural breeders have used in formal ways for that century, making traditional breeding, which had been around since the discovery of agriculture, more formalized and empirically rigorous. But we didn't have the direct genetic 'proof' that the theory was correct. Now we do, and we have it in spades.
We are spinning our wheels and spending wealth on simple gene-gazing. It's time--high time--for some new insight to take us beyond what our GWAScopes can see, digesting and understanding what our gene-gazing has clearly shown.
Unfortunately, at present we have an 'omics Establishment that is as entrenched, for reasons we've often discussed here on MT, as the Church was for explanations of Truth in Galileo's time. It is now time for us to go beyond gene-gazing. GWAScopes have given us the insight--but who will have the insight to lead the way?
Thursday, October 4, 2018
Processed meat? Really? How to process epidemiological news
By
Ken Weiss
So this week's Big Story in health is that processed meat is a risk factor for breast cancer. A study has been published that finds it so.....so it must be true, right? After all, it's on CNN and in some research report. Well, read even CNN's headline story and you'll see the caveats--the admissions, softened of course--that the excess risk isn't that great, and that past studies have been 'inconsistent'.
Of course, with this sort of 'research' the weak associations with some named risk factor can easily be confounded with who knows how many other behavioral or other factors, and even if researchers try to winnow them out, it is obviously a guessing game. Too many aspects of our lives are unreported, unknown, or correlated. This is why, week after week it seems, do-this or don't-do-that stories hit the headlines. If you believe them, well, I guess you should stop eating bacon.....until next week, when some story will say that bacon prevents some disease or other.
Why breast cancer, by the way? Why not intestinal or many other cancers? Why, if even the current story refers to past results as 'inconsistent', do we assume this one's right and the others, or some of them, were wrong? Could it be because investigators want attention, journalists need news stories, and so on?
Why, by the way, is it always things that are actually pleasurable to eat that end up in these stories? Why is it never cauliflower, or rhubarb, or squash? Why coffee and not hibiscus tea? Could western notions of sin have anything to do with the design of the studies themselves?
But what about, say, protective effects?
Of course, the headlines are always about the nasty diseases to which anything fun, like a juicy bacon sandwich, not to mention alcohol, coffee, cookies, and so on seems to condemn us. This makes for 'news', even if the past studies have been 'inconsistent' and therefore (it seems) we can believe this new one.
However, maybe eating bacon sandwiches has beneficial effects that don't make the headlines. Maybe they protect us from hives or from antisocial, even criminal, behavior; raise our IQ; or spare us toothaches. Who could look for all those things while busy trying to find the bad things that bacon sandwiches cause? Have investigators of this sort of behavioral exposure asked whether bacon and, say, beer raise job performance, add to longevity, or (heavens!) improve one's sex life? Are these studies, essentially, about bad outcomes from things we enjoy? Is that, in fact, a subtle, indirect effect of the Protestant ethic or something like it? Of the urge to find bad things, because these studies are paid for by NIH and done by people in medical schools?
The serious question
There are pragmatic, self-interested aspects to these stories, and indeed even to the publication of the papers in proper journals. If new results disagree with previous work on the purportedly same subject, they get new headlines, when perhaps they should not be published at all without explicitly addressing the disagreement in real detail, as the main point of the work--rather than with the subtle implication that now, finally, these new authors have got it right. Or, at the least, the authors should not headline their findings. Or something!
Instead, news sells, and thus we build a legacy of yes/yes/no/maybe/no/yes! studies. These may generally be ignored by our baconophilic society, or they could make lots of people switch to spinach sandwiches, or have many other kinds of effects. The latter is somewhat akin to the quantum-mechanical notion that measurement gives only incomplete information yet affects what is being measured.
Epidemiological studies of this sort have been funded, at great expense, for decades now, and if there is anything consistent about them, it's that they are not consistent. There must be a reason! Is it really that the previous studies weren't well done? Is it that if you fish for enough items you'll catch something--big questionnaire studies looking at too many things? Is it behaviors changing in ways the studies don't identify?
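The 'fish for enough items' possibility is easy to make concrete. Below is a minimal simulation, assuming no true effects whatsoever, showing how a questionnaire study probing many exposures will still 'catch' a handful of nominally significant associations by chance alone; the exposure count, sample sizes, and threshold are illustrative choices, not drawn from any real study.

```python
# Minimal sketch: pure-chance 'findings' from a many-exposure fishing expedition.
# Assumes NO true effects: every exposure is unrelated to disease status.
import random

random.seed(1)
n_exposures = 200            # illustrative number of questionnaire items tested
n_cases = n_controls = 500
significant = 0

for _ in range(n_exposures):
    # Each subject is 'exposed' with probability 0.3, independent of disease.
    case_exp = sum(random.random() < 0.3 for _ in range(n_cases))
    ctrl_exp = sum(random.random() < 0.3 for _ in range(n_controls))
    # Crude two-proportion z-test for a case-control difference.
    p1, p2 = case_exp / n_cases, ctrl_exp / n_controls
    p = (case_exp + ctrl_exp) / (n_cases + n_controls)
    se = (p * (1 - p) * (1 / n_cases + 1 / n_controls)) ** 0.5
    if abs(p1 - p2) / se > 1.96:   # roughly p < 0.05, two-sided
        significant += 1

# With 200 null exposures, ~10 (5%) come out 'significant' by chance alone.
print(f"Nominally significant exposures: {significant} of {n_exposures}")
```

Each such chance hit is a publishable 'bacon causes X' headline waiting to happen, and a different one will surface in the next study--which is one mundane route to the yes/no/maybe pattern.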
Or, perchance, is it that these investigators need projects to get funded? This sort of yo-yo result is very, very common, and there must be some explanation; the inconsistency itself is likely as fundamental and important as any given study's findings. Maybe bacon-burgers are only bad for you in some cultural environments, and these change in unmeasured ways, so that varying results are not 'inconsistent' at all. Maybe the problem is the expectation that there is one relevant truth, so that inconsistency seems to signal flaws in study design. Maybe the problem is simplistic thinking about risks.
Where do cynical possibilities meet serious epistemological ones, and how do we tell?
Monday, August 13, 2018
Big Data: the new Waiting for Godot
By
Ken Weiss
In Samuel Beckett's cryptic play Waiting for Godot, two men spend the entire play anticipating the arrival of someone named Godot, at which point, presumably, something will happen--one can say, perhaps, that the wait will have been for some achieved objective. But what? Could it simply mean that they can then go somewhere else? Or perhaps there will be no end, because Godot will never, in fact, arrive.
A good discussion of all of this is on the BBC Radio 4 The Forum podcast. Apparently, Beckett insisted that any such answers were in the play itself--he didn't imply that there was some external meaning, such as that Godot was God, or that the play was an allegory for the Cold War--which is one reason the play is so enigmatic.
Was the play written intentionally as a joke, or a hoax? Since the author refused to answer, or perhaps even to recognize the legitimacy of the question, we'll never know. Or perhaps that, in itself, is the tipoff that it really is a hoax. Or maybe (I think more likely), because it was written in France in 1949, it's an existentialist-era statement of the angst that comes from recognizing that the important questions in life don't have answers.
Waiting for the biomedical Promised Land
That was then, but today we are witnessing real-life versions of the play: things just as cleverly open-ended, with the 'What happens then?' question only having a vague, deferred answer, as in Beckett's title. And, as in the play, it is not clear how self-aware even some of the perpetrators are of what they are about.
I refer to the possibility that we are witnessing various Big Data endeavors, unknowingly imitative but as cleverly and cryptically open-ended as the implied resolution that will happen when Godot arrives. Big Data 'omics is a current, perhaps all too convenient, scientific version of the play, that we might call Waiting for God'omics. The arrival of the objective--indeed, not really stated, but just generically promised as, for example, 'precision genomic medicine' for 'All of Us'--is absolutely as slyly vague as what Vladimir and Estragon were presumably waiting for. The genomic Godot will never arrive!
This view is largely but not entirely cynical, for reasons that are at least a bit subtle themselves.
Reaching the oasis, the end of the rainbow, or the Promised Land is bad for business
One might note that if the 'omics Godot were ever to arrive, it would be the end of the Big Data (or should one say Big Gravy?) train, so obviously our Dr Vladimirs and Estragons must ensure that such a tragedy--arrival at the promised land, the elimination of all diseases in everyone, or whatever--never happens in real life. Does anyone seriously think we would reach a resolution of the causes of disease, with precision for all of us, say, and be able (that is, willing) to close down the Big Budget nature of our proliferating 'omictical me-too world?
We have entrenched the search for Godot, a goal so vague as to be unattainable. Even the proper use of the term 'precision' implies an asymptote, a truth that one never reaches but can get ever closer to. If we could get there, as is implied, we should have been promised 'exact' genomic medicine. And wouldn't this imply that then, finally, we'll divert the resources towards cures and prevention?
However, even if the perpetrators of the Big Promises never think or aren't aware of it, we must note that the goal cannot be reached even with the best and most honorable of intentions. Because of births and deaths, and environmental changes, and mutations and recombination, there truly never is the palm-draped oasis at which our venture could cease. There will never be an 'all' of us, and genetic causation is ever-changing (in part because of the similarly dynamic environment), meaning that there are no such things as risks to be approached with 'precision'. Risks are changeable and not stable, and indeed not fixed numerical values. At best, they are collective population (or sample) averages. So there is never a 'there' there, anywhere. There is only a different one everywhere.
But awareness of these facts doesn't seem to be part of the 'omicsalyptic promises with which we are inundated. They seem, by contrast, rote promises that are little if any different from political, economic, or religious promises--if only we do this, we'd get to a Promised Land. But such a land does not exist.
If we had, say, a real national health system, it would be properly and avowedly open-ended without anyone honorable objecting (if it were done well). And epidemiologically, of course, there will always be new mutations, recombinations, environments and the like to try to understand--disease with, or without strong genotype-phenotype causation. There will always be a need for health research (and basic science). But science, of all fields of human endeavor, should be honest. It should not hold out the promise that Godot will arrive, but in a sense, openly acknowledge that that can never happen.
But this doesn't let off the guilty hook those who are hawking today's implicit Big Data, big open-ended budget promise that by goosing up research now we'll soon eliminate genetic disease. (I recall that Francis Collins did indeed, not all that long ago, promise that this Paradise would come soon--um, I think his date was something like 2010!) It's irresponsible, self-interested promising, of course. And those in genomics who are intelligent enough to deserve to be in genomics do, or should, know that very well.
Like Vladimir and Estragon, we'll always be told that we're waiting for Godot, and that he'll be coming soon.
NOTE: One might observe that Godotism is a firmly entrenched strategy elsewhere in our society--for example, in theoretical physics, where there will never be a collider big enough to answer the questions about fundamental particles: coming to closure would be as fiscally threatening to physics as it is to the life sciences. Science is not alone in this, but our society does not pay it nearly enough skeptical heed.
Thursday, June 14, 2018
A new biomedical insight?
By
Ken Weiss
Here is a thoughtful and timely quote:
". . . . as no single disease can be fully understood in a living person; for every living person has his individual peculiarities and always has his own peculiar, new, complex complaints unknown to medicine—not a disease of the lungs, of the kidneys, of the skin, of the heart, and so on, as described in medical books, but a disease that consists of one out of the innumerable combinations of ailments of those organs. This simple reflection can never occur to doctors . . . . because it is the work of their life to undertake the cure of disease, because it is for that that they are paid, and on that they have wasted the best years of their life. And what is more, that reflection could not occur to the doctors because they saw that they unquestionably were of use . . . not because they made the patient swallow drugs, mostly injurious (the injury done by them was hardly perceptible because they were given in such small doses). They were of use, were needed, were indispensable in fact (for the same reason that there have always been, and always will be, reputed healers, witches, homœopaths and allopaths), because they satisfied the moral cravings of the patient . . . . They satisfied that eternal human need of hope for relief, that need for sympathetic action that is felt in the presence of suffering, that need that is shown in its simplest form in the little child, who must have the place rubbed when it has hurt itself. The child . . . . feels better for the kissing and rubbing. The child cannot believe that these stronger, cleverer creatures have not the power to relieve its pain. . . ."

The language seems a bit arcane, and this is a translation, but its cogency as an explanation of today's Big Data feeding frenzy is clear. People who are ill, or facing death, will naturally grasp at whatever straws may be offered them. In one way or another, this has been written about even back to Hippocrates.
Of course, palliation or cure of those disorders that can be eased or cured should be the first order and obligation of medicine. Where nothing of the sort is clearly known, trials of possible treatments are surely in order, provided the patient understands at least the basic nature of the research--for example, that some participants are given placebos while others get the treatment under investigation. Science doesn't know everything, and we often must learn the hard way, by trial and error.
Given that, perhaps the most important job of responsible science is to temper its claims, and to offer doses of the reality that life is a temporary arrangement and that we need to get the most out of the bit of it to which we are privileged to have access. So research investment should be focused on tractable, definable problems, not grandiose open-ended schemes. But promises of the latter are nothing new to society, in medicine or other realms of life.
The problem with false promises, by preachers of any type, is that they mislead the gullible, and in many cases this is known by those making the promises--or could and should be known. The role of false promise in religion is perhaps debatable, but its role in science, while understandable given human ego and the struggle for attention, careers, and funding, is toxic. People suffering from poverty, hardship, or disease seek and deserve solace. But science needs to be protected from the temptations of huckstering, so that it can do its very important business as objectively as is humanly possible.
By the way, the quote is from about 150 years ago, from War and Peace, Tolstoy's 1869 masterpiece about the nature of causation in human affairs.
Sunday, May 6, 2018
"All of us" Who are 'us'?
By
Ken Weiss
So the slogan du jour, All Of Us, is the name of a 1.4-billion-dollar initiative being launched today by NIH Director Francis Collins. The plan is to enroll one million volunteers in this mega-effort, whose goal is, well, it depends. It is either to learn how to prevent and treat "several common diseases" or, according to Dr Collins, who talked about the initiative here, "It's gonna give us the information we currently lack" to "allow us to understand all of those things we don't know that will lead to better health care." He's very enthusiastic about All of Us (aka Precision Medicine), calling it a "national adventure that's going to transform medical care." This might be viewed in the context of promises in the late 1990s that by now we'd basically have solved these problems--rather than needing ever-bigger, longer-term 'data'.
And one can ask how the data quality can possibly be maintained if medical records of whoever volunteers vary in their quality, verifiability, and so on. But that is a technical issue. There are sociological and ontological issues as well.
All of Us?
Serving 'all of us' sounds very noble and representative. But let's see how sincere this publicly hyped promise really is. Using very rough figures, which will serve the point: there are about 320 million Americans, so 1 million volunteers would be about 0.3% of 'all' of us. So first we might ask: what about achieving some semblance of real inclusive fairness in our society by making a special effort to oversample African Americans, Hispanics, and Native Americans before the privileged, mainly white, middle class get their names on the rolls? That might make up for past abuses affecting their health and well-being.
So, OK, let's stop dreaming, but at least make the sample representative of the country, white and otherwise. Does that imply fairness? There are, for example, about 300,000 Navajo Native Americans in the country. If All Of Us means what it promises, there would be about 950 Navajos in the sample. And about 56 Hopi tribespeople. And there are, of course, many other ethnic groups that would have to be included. Random (proportionate) sampling would also put about 600,000 'white' people in the sample.
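The arithmetic behind those counts is simple proportionate allocation, sketched below; the subpopulation sizes are the same rough, Google-derived figures quoted here, not authoritative census numbers.

```python
# Proportionate (random) sampling: expected subgroup counts in a
# 1-million-volunteer sample drawn from a 320-million-person population.
# Subgroup sizes are rough illustrative figures, as in the text.
US_POPULATION = 320_000_000
SAMPLE_SIZE = 1_000_000

subgroups = {
    "Navajo": 300_000,
    "Hopi": 18_000,                        # rough figure implied by ~56 in-sample
    "white (non-Hispanic)": 192_000_000,   # roughly 60% of the US
}

for name, size in subgroups.items():
    expected = size / US_POPULATION * SAMPLE_SIZE
    print(f"{name}: ~{expected:,.0f} expected volunteers")
# Navajo: ~938; Hopi: ~56; white: ~600,000 -- the rough counts quoted above.
```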
These are just crude subpopulation counts from superficial Google searching, but the point is that in no sense is the proposed self-selected sample of volunteers going to represent All Of Us in anything resembling a fair distribution of medical benefits. You can't get as much detailed genomewide (not to mention environmental) information from a few hundred sampled individuals as from hundreds of thousands. To be fair and representative in that sense, the sample would have to be stratified in some way rather than volunteer-based. It seems very unlikely that the volunteers will in any real sense be representative of the US, rather than, say, of university and other privileged communities, major cities, and so on--even if not because of intentional bias, but simply because such people are more likely to learn of All Of Us and to participate.
Of course, defining what is fair and just is not easy. For example, there are far more Anglo Americans than Navajo or Hopi, so the Anglos might expect to get most of the benefits. But that isn't what All Of Us seems to be promising. To get adequate information from a small group, given the causal complexity we are trying to understand, that group should probably be heavily oversampled. Even doing that would leave room for samples from the larger Anglo and African-American populations adequate for the kind of discovery we could anticipate from this sort of Big Data study of the causes of common disease.
More problems than sociology
That is the sociological problem of claiming representativeness of 'all' of us. But of course there is a deeper problem that we've discussed many times, and that is the false implied promise of essentially blanket (miracle?) cures for common diseases. In fact, we know very well that complex causation, of the common diseases that are the purported target of this initiative, involves tens to thousands of variable genome locations, not to mention the environmental ones that are beyond simple counting. Further, and this is a serious, nontrivial point, we know that these sorts of contributing causes include genetic and environmental exposures in the sampled individuals' futures, and these cannot be predicted, even in principle. These are the realities.
And even if the project were truly representative of the US population demographically, as a sample of self-selected volunteers there remains the problem of representing diseases within the population subsets. Presumably this is why they are focusing on "common diseases", but the sample will still have to be stratified by possible causal exposures (lifestyles, diets, etc.) and by ethnicity, and then there will have to be enough controls to make case-control comparisons meaningful. So, how many common diseases, and how will they be represented (males/females, early/late onset, related to which environmental lifestyles, etc.)? One million volunteers isn't going to be representative, nor a large enough sample once it has to be stratified for statistical analysis, especially if it must also include the ethnic diversity the project promises.
And there's the epistemological problem of causation being too individualistic for this kind of hypothesis-free data fishing to solve--indeed, it is just this kind of research that has clearly shown us it is not what we need now. We need research focused on problems that really are 'genetic', and some movement of resources toward new thinking, rather than perpetuation of the same kind of open-ended 'Big Data' investment.
And more
In this context, the PR seems mostly to be spin for more money for NIH and its welfare clients (euphemistically called 'universities'). Every lock on Big Money for the Big Data lobby, or perhaps belief-system, excludes funding for focused research, for example, on diseases that would seem to be tractably understood by real science rather than a massive hypothesis-free fishing expedition.
How could the 1.4 billion dollars be better spent? A legitimate goal might be a trial run of a linked electronic-records system as part of an explicit move towards what we really need, and what really would include all of us: a real national healthcare system. This could be openly explained--we're going to learn how to run such a comprehensive system so that we don't get overwhelmed by mistakes. But for the very same reason, a properly representative project is what should be done. That would involve stratified sampling and a more properly thought-out design. And that would require new thinking about the actual biology.
Thursday, April 26, 2018
Gene mapping: More Monty Python than Monty Python
By
Ken Weiss
| The gene for ...... (Monty Python) |
Why we keep spending money on papers that keep showing how Monty Pythonish genomewide association with complex traits is, is itself a valid question. To say with a straight face that we now know of hundreds, much less thousands, of genomewide sites that affect some trait--in some particular sample of humans, with much or most of the estimated heritability still unaccounted for--without saying that enough is enough, is almost in itself a comedy routine.
We have absolutely no reason--or, at least, no need--to criticize anything about individual mapping papers. Surely there are false findings, misused statistical tests, and so on, but that is part of normal life in science, because we don't know everything and have to make assumptions. Some of the findings will be ephemeral, sample-specific, and so on; that doesn't make them wrong. Instead, the critique should be aimed at authors who present such work with a straight face as if it is (1) important, (2) novel in any really novel way, and (3) not a demonstration of why, with so many qualitatively similar results by now, we should stop public funding of this sort of work. We should move on to more cogent science that reflects, but doesn't just repeat, the discovery of genomic causal (or, at least, associational) complexity.
The bottom line
What these studies show, and there is no reason to challenge the results per se, is that complex traits are not to be explained by simple, much less additive, genetic models. There is massive causal redundancy, with similar traits due to dissimilar genotypes. But this shouldn't be a surprise; we can easily account for it in terms of evolutionary phenomena, both processes like gene duplication and the survival protection that alternative pathways provide.
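To see how easily such redundancy arises, here is a toy simulation under a simple liability-threshold assumption (an illustration of the general idea, not any published model): when a trait appears whenever enough risk alleles accumulate anywhere across many loci, essentially every affected individual carries a different genotype.

```python
# Toy illustration of causal redundancy under a liability-threshold model.
# Illustrative assumptions: 20 loci, equal tiny effects, risk-allele
# frequency 0.2, and the trait appears once 10 or more risk alleles
# are present anywhere in the genome.
import random

random.seed(42)
N_LOCI, FREQ, THRESHOLD = 20, 0.2, 10

affected_genotypes = set()
n_affected = 0
for _ in range(10_000):
    # Genotype: risk-allele count (0, 1, or 2) at each locus.
    g = tuple((random.random() < FREQ) + (random.random() < FREQ)
              for _ in range(N_LOCI))
    if sum(g) >= THRESHOLD:          # crosses the liability threshold
        n_affected += 1
        affected_genotypes.add(g)

# Nearly every affected individual has a distinct genotype: the 'same'
# trait, from dissimilar genotypes.
print(f"Affected: {n_affected}; distinct affected genotypes: {len(affected_genotypes)}")
```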
Even if each GWAS 'hit' is correct and not some sort of artifact, it is unclear what the message is. To us, who have no vested interest in continuing, open-ended GWAS efforts with ever-larger samples, the bottom line is that this is not the way to understand biological causation.
We reach that view on genomic considerations alone, without even considering the environmental and somatic-mutation components of phenotype generation, though these are often obviously determinative (as secular trends in risk clearly show). And we reach it without worrying about the likelihood that many, or perhaps even most, of these 'hits' are some sort of statistical, sampling, or analytic artifact, or are so indirectly related to the measured trait, or so environment-dependent, as to be virtually worthless in any practical sense.
What GWAS ignore
There are also three clear facts that are swept under the rug, or simply ignored, in this sort of work. One is somatic mutation, which is not detected in constitutive genomewide studies but can be very important (e.g., in cancer). The second is that DNA is inert and does something only in interaction with other molecules; many of those interactions relate to environmental and lifestyle exposures, which candid investigators know are usually dreadfully inaccurately measured. The third is that future mutations, not to mention future environments, are unpredictable even in principle. Yet the repeatedly stressed objective of GWAS is 'precision' predictive medicine. It sounds like a noble objective, but it's not so noble given the known and knowable reasons these promises can't be met.
So, if biological causation is complex, as these studies and diverse other sorts of direct and indirect evidence clearly show, then why can't we pull the plug on these sorts of studies, and instead, invest in some other mode of thinking, some way to do focused studies where genetic causation is clear and real, rather than continuing to feed the welfare state of GWAS?
We're held back by inertia, and the lack of better ideas, but another important if not defining constraint is that investigator careers depend on external funding and that leads to safe me-too proposals. We should stop imitating Monty Python, and recognize that if the gene-causation question even makes sense, some new way of thinking about it is needed.
Thursday, June 22, 2017
Everything is genetic, isn't it?
By
Ken Weiss
There is hardly a trait, physical or behavioral, for which there is not at least some familial resemblance, especially among close relatives. That is what is meant when someone scolds you, "You're just like your mother!" The more distant the relatives, in terms of generations of separation, the less the similarity--so you really can resist when told, "You're just like your great-grandmother!" Genetic effects decline in a systematic way with more distant kinship.
The 'heritability' of a trait refers to the relative degree to which its variation is the result of variation in genes, the rest being due to variation in non-genetic factors we call 'environment'. Heritability is a ratio that ranges from zero when genes have nothing to do with the trait, to 1.0 when all the variation is genetic. The measure applies to a sample or population and cannot automatically be extended to other samples or populations, where both genetic and environmental variation will be different, often to an unknown extent.
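As a concrete illustration of that ratio, here is a minimal simulation with toy numbers (not estimates from any real study): a trait is built from an additive genetic value plus independent environmental noise, and heritability is just the genetic share of the total variance.

```python
# Toy heritability: h^2 = Var(genetic) / Var(trait), with
# trait = genetic value + independent environmental noise.
# The variances are illustrative choices, not data-derived estimates.
import random
import statistics

random.seed(0)
N = 100_000
genetic = [random.gauss(0, 1.0) for _ in range(N)]   # genetic values, Var ~ 1.0
environ = [random.gauss(0, 0.5) for _ in range(N)]   # environmental noise, Var ~ 0.25
trait = [g + e for g, e in zip(genetic, environ)]

h2 = statistics.variance(genetic) / statistics.variance(trait)
print(f"h^2 ~ {h2:.2f}")   # expected ~ 1.0 / 1.25 = 0.80
```

Double the environmental spread and the very same genes yield a much lower heritability, which is exactly the sample- and population-dependence stressed above.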
Most quantitative traits, like stature or blood pressure or IQ scores, show some amount, often quite substantial, of genetic influence. It often happens that we are interested in some trait that we think must be produced or affected by genes, but for which no relevant factor, like a protein, is known. The idea arose decades ago that if we could scan the genome, and compare those with different manifestations of the trait, using mapping techniques like GWAS (genomewide association studies), we could identify those sites, genomewide, whose variation in our chosen sample may affect the trait's variation. Qualitative traits, like the presence or absence of a disease (say, diabetes or hypertension), may often be due to the presence of some set of genetic variants whose joint impact exceeds some diagnostic threshold, and mapping studies can compare genotypes in affected cases to unaffected controls to identify those sites.
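To make the case-control logic concrete, here is a minimal sketch of the kind of single-site test such studies run; all counts and frequencies are invented for illustration:

```python
# A minimal sketch of the single-site test behind case-control mapping:
# compare allele counts in affected cases vs. unaffected controls, one
# variant at a time. All counts here are invented for illustration.
from scipy.stats import chi2_contingency

# Rows: cases, controls; columns: risk allele, other allele, counted
# over 2N chromosomes per group of N people.
observed = [[420, 1580],   # cases:    risk-allele frequency 0.21
            [360, 1640]]   # controls: risk-allele frequency 0.18

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# A real GWAS repeats this at on the order of a million sites, so any
# single p-value must clear a far more stringent threshold than 0.05.
```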
Genes are involved in everything. . . . .
Many things can affect the amount of similarity among relatives, so one has to try to think carefully about attributing ideas of similarity and cause. Some traits, like stature (height), have very high heritability, sometimes estimated at about 0.9, that is, 90% of the variation being due to the effects of genetic variation. Other traits have much lower heritability, but there's generally familial similarity. And that's because we each develop from a single fertilized egg cell, which includes transmission of each of our parents' genomes, plus ingredients provided by the egg (and perhaps to a tiny degree the sperm), much of which was the result of gene action in our parents when they produced that sperm or egg (e.g., RNA, proteins). This is why traits can usually be found to have some heritability--some contribution due to genetic variation among the sampled individuals. In that sense, we can say that genes are involved in everything.
Understanding the genetic factors involved in disease can be important and laudable, even if tracking them down is a frustrating challenge. But because genes are involved in everything, our society also seems to have an unending appetite for investigators who overstate the value of their findings or, in particular, who estimate or declaim on the heritability, and hence genetic determination, of the most societally sensitive traits, like sexuality, criminality, race, intelligence, physical abuse and the like.
. . . . . but not everything is 'genetic'!
If the estimated heritability for a trait we care about is substantial, then this does suggest the obvious: genes are contributing to the mechanisms of the trait, and so it is reasonable to acknowledge that genetic variation contributes to variation in the trait. However, the mapping industry implies a somewhat different claim: that genes are a major factor in the sense that individual variants can be identified that are useful predictors of the trait of interest (NIH's lobbying machine has been saying we'll be able to predict future disease with 'precision'). There has been little constraint on the types of trait to which this approach is applied, even when the application rests on little more than belief or wishful thinking.
It is important to understand that our standard measures of genes' relative effect are affected both by genetic variation and by environmental and lifestyle factors. That means that if environments were to change, the relative genetic effects, even in the very same individuals, would also change. But it isn't just environments that change; genotypes change, too, when mutations occur, and as with environmental factors, these change in ways that we cannot predict even in principle. That means that we cannot legitimately extrapolate, to any knowable extent, the genetic or environmental effects we observe in a given sample or population to other, much less to future, samples or populations. This is not a secret problem, but it doesn't seem to temper claims of dramatic discoveries, in regard to disease or, perhaps even more, to societally sensitive traits.
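A toy numeric example (invented numbers) makes the point: hold the genetic variance fixed and change only the environment, and the 'heritability' changes out from under you:

```python
# Toy numbers, invented for illustration: the same genetic variance in
# two settings that differ only in environmental variance.
V_G = 40.0                                   # genetic variance (fixed)
for label, V_E in [("uniform environment", 10.0),
                   ("heterogeneous environment", 160.0)]:
    H2 = V_G / (V_G + V_E)                   # heritability as a variance ratio
    print(f"{label}: H^2 = {H2:.2f}")
# uniform environment:       H^2 = 0.80
# heterogeneous environment: H^2 = 0.20
# Same genes, very different 'heritability': the measure belongs to the
# sample and its setting, not to the trait itself.
```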
But let's assume, correctly, that genetic variation affects a trait. How does it work? The usual finding is that tens or even hundreds of genome locations affect variation in the test trait. Yet most of the effects of individual genes are very small or rare in the sample. At least as important is that the bulk of the estimated heritability remains unaccounted for, and unless we're far off base somehow, the unaccounted fraction is due to the leaf-litter of variants individually too weak or too rare to reach significance.
Often it's also asserted that all the effects are additive, which makes things tractable: for every new person, not part of the study, just identify their variants and add up their estimated individual effects to get the total effect on the new person for whatever publishable trait you're interested in. That's the predictive objective of the mapping studies. However, I think that for many reasons one cannot accept that these variable sites' actions are truly additive. The reasons have to do with actual biology, not the statistical convenience of using the results to diagnose or predict traits. Cells and their compounds vary in concentrations per volume (3D), binding properties (multiple dimensions), surface areas (2D), and in various other ways that affect how proteins are assembled and work, and so on. In aggregate, additivity may come out in the wash, but the usual goal of applied measures is to extrapolate these average results to prediction in individuals. There are many reasons to wish that were true, but few to believe it very strongly.
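For what it's worth, the additive prediction machinery is trivially simple. A minimal sketch, with hypothetical SNP names and made-up effect sizes, of how such a score is computed:

```python
# A sketch of the additive prediction recipe: per-variant effect sizes
# (as a mapping study would estimate them) times allele counts, summed.
# SNP names and effect sizes below are hypothetical.
effect_sizes = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.08}
genotype     = {"rs0001": 2,    "rs0002": 1,     "rs0003": 0}  # risk-allele counts

score = sum(effect_sizes[snp] * genotype[snp] for snp in effect_sizes)
print(f"additive score = {score:.2f}")   # 0.12*2 - 0.05*1 + 0.08*0 = 0.19
# The additivity assumption does all the work here; nothing in the
# arithmetic checks whether the variants actually combine this way.
```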
Even if they were really additive, the clearly very different leaf-litter backgrounds that together account for the bulk of the heritability can obscure the numerical amount of that additivity from sample to sample and person to person. That is, what you estimated from this sample may not apply, to an unknowable extent, to the next sample. If and when it does work, we're lucky that our assumptions weren't too far off.
Of course, the focus and promises from the genetics interests assume that environment has nothing serious to do with the genetic effects. But it's a major, often by far the major, factor, and it may even in principle be far more changeable than genetic variation. One would have to say that environmental rather than genetic measures are likely to be, by far, the most important things to change in society's interest.
We regularly write these things here not just to be nay-sayers, but to try to stress what the issues are, hoping that someone, by luck or insight, finds better solutions or different ways to approach a problem that a century of genetics, despite its enormous progress, has not yet solved. What it has done, in exquisite detail, is show us what the problems are.
A friend, Michael Joyner, himself a good scientist in relevant areas, has passed on to me a rather apt suggestion that he says he saw in work by Denis Noble. We might be better off if we thought of the genome as a keyboard rather than as a code or program. That is a good way to think about the subtle point that, in the end, yes, Virginia, there really are genomic effects: genes affect every trait....but not every trait is 'genetic'!
Thursday, April 27, 2017
The Law of No Restraint
By Ken Weiss
There's a new law of science reporting or, perhaps more accurately put, of the science jungle. The law: feed any story, no matter how fantastic, to science journalists (including your university's PR spinners), and they will pick up whatever can be spun into a Big Story and feed it to the eager mainstream media. Caveats may appear somewhere in the stories, but not in the headlines, so that, however weak, tentative, or incredible, the story gets its exposure anyway. Then on to tomorrow's over-sell.
One rationale for this is that unexpected findings--typically presented breathlessly as 'discoveries'--sell: they rate the headline. The caveats and doubts that might un-headline the story may be reported as well, but often buried in minimal terms late in the report. Even if the report balances skeptics and claimants, simply publishing the story is enough to give at least some credence to the discovery.
The science journalism industry is heavily inflated in our commercial, 24/7 news environment. It would be better for science, if not for sales, if all these hyped papers, rather than being publicized at the time the paper is published, first appeared in musty journals for specialists to argue over, and in the pop-sci news only after some mature judgments are made about them. Of course, that's not good for commercial or academic business.
We have just seen a piece reporting that humans were in California something like 135,000 years ago, rather than the well-established continental dates of about 12,000 years. The report, which I won't grace by citing here (and you've probably seen it anyway), then went on to speculate about what 'species' of our ancestors these early guys might have been.
Why is this so questionable? If it were a finding on its own, it might seem credible, but given the plethora of skeletal and cultural archeological findings, up and down the Americas, such an ancient habitation seems a stretch. There is no comparable trail of earlier settlements in northeast Asia or Alaska that might suggest it, and there are lots of animal and human archeological remains--all basically consistent with each other, so why has no earlier finding yet been made? It is of course possible that this is the first and is a correct one, but it is far too soon for this to merit a headline story, even with caveats.
Another piece we saw today reported that a new analysis casts doubt on whether diets high in saturated fat are bad for you. This was a meta-analysis of various other studies that have been done, and got some headline treatment because the authors report that, contrary to many findings over many years, saturated fats don't clog arteries. Instead, they say, coronary heart disease is a chronic inflammatory condition. Naturally, the study's basic data are being challenged, as reflected in this story's discussion, by critiques of its data and method. These get into details we're not qualified to judge, and we can't comment on the relative merits of the case.
However, one thing we can note is that with respect to coronary heart disease, study after study has reported more or less the same, or at least consistent findings about the correlation between saturated fats and risk. Still, despite so very much careful science, including physiological studies as well as statistical analysis of population samples, can we still apparently not be sure about a dietary component that we've been told for years should play a much reduced role in what we eat? How on earth could we possibly still not know about saturated fat diets and disease risk?
If this very basic issue is unresolved after so long, and the story is similar for risk factors for many complex diseases, then what is all this promise of 'precise' medicine about? Causal explanations are still fundamentally unclear for many cancers, dementias, psychiatric disorders, heart disease, and so on. So why isn't the most serious conclusion that our methods and approaches themselves are for some reason simply not adequate to answer such seemingly simple questions as 'is saturated fat bad for you?' Were the plethora of previous studies all flawed in some way? Is the current study? Does the publicizing of the studies itself change behaviors in ways that affect future studies?
There may be no better explanation than that diets and physiology are hard to measure and are complex, and that no simple answer is true. We may all differ for genetic and other reasons to such an extent that population averages are untrustworthy, or our habits may change enough that studies don't get consistent answers. Or asking about one such risk factor when diets and lifestyles are complex is a science modus operandi that developed for studying simpler things (like exposure to toxins or bacteria, the basis of classical epidemiology), and we simply need a better gestalt from which to work.
Clearly a contributory sociological factor is that the science industry has simply been cruising down the same rails, despite constant popping of promise bubbles, for decades now. It's always more money for more and bigger studies. It's rarely: let's stop, take a deep breath, and think of some better way to understand (in this case) dietary relationships to physical traits. In times past, at least, most stories like the ancient Californian one didn't get ink so widely and rapidly. But if I'm running a journal, or a media network, or am a journalist needing to earn my living, and I need to turn a buck, naturally I need to write about things that aren't yet understood.
Unfortunately, as we've noted before, the science industry is a hungry beast that needs its continual feeding, and (like our 3 cats) always demands more, more, and more. There are ways we could reform things, at least up to a point. We'll never end the fact that some scientists will claim almost anything to get attention, and we'll always be faced with data that suggest one thing that doesn't turn out that way. But we should be able to temper the level of BS and get back more to sober science rather than sausage factory 'productivity'. And educate the public that some questions can't be answered the way we'd like, or aren't being asked in the right way. But that is something science might address effectively, if it weren't so rushed and pressured to 'produce'.
Thursday, October 13, 2016
Genomic causation....or not
By Ken Weiss and Anne Buchanan
The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always so (the link is to the People-magazine-like cover article in that issue). This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn. The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, a mis-call by a given piece of software, or of the assumption that identifying a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky for reasons we'll discuss. Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparative purposes. Again, we'll mention below what 'insufficient' might mean, if anything.
People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion that we still need to focus on single genes as causal of most disease, that is, do MuchMoreOfTheSame, which is an implication of the discussion, is not so obviously justified. We'll begin with our usual contrarian statement that the idea here is being overhyped as if it were new, but we know that except for its details it clearly is not, for reasons we'll also explain. That is important because presenting it as a major finding, and still focusing on single genes as being truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.
The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, now including data from approximately 60,000 individuals (in itself rather small relative to the stated purpose). The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not whole genome sequences--again, a major issue. We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.
Some of the obvious complicating issues
We know that a gene generally does not act alone. DNA in itself is basically inert. We've been and continue to be misled by examples of gene causation in which context and interactions don't really matter much, but that leads us still to cling to these as though they are the rule. This reinforces the yearning for causal simplicity and tractability. Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity because it is critiquing some simplistic single-gene inferences, and assuming that the problems are methodological rather than conceptual.
There are many aspects of causal context that complicate the picture, that are not new and we're not making them up, but which the Bigger-than-Ever Data pleas don't address:
1. Current data are from blood-samples and that may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2. Life-long exposure to local somatic mutation is not considered nor measured,
3. Epigenetic changes, especially local tissue-specific ones, are not included,
4. Environmental factors are not considered, and indeed would be hard to consider,
5. Non-Europeans, and even many Europeans are barely included, if at all, though this is beginning to be addressed,
6. Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included. Exome data have been treated naively by many investigators as if that is what is important, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important,
7. Non-coding regions, non-regulatory RNA regions are not included in exome-only data,
8. A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it's a false discovery difficult,
9. Single-gene analysis is still the basis of the new 'revelations', that is, of the idea being hinted at that the 'causal' gene isn't really causal....though one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10. The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects. But that may not be the case, because if the regulatory regions near the mutated gene have no or little activity, the 'bad' gene may simply not be being expressed. Its coding sequence could falsely be assumed to be harmless,
11. Many aspects of this kind of work are dependent on statistical assumptions and subjective cutoff values, a problem recently being openly recognized,
12. Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken its actual apparent cause. Phenotypes can be measured in many ways, but we know very well that this can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database),
13. Early reports of strong genetic findings have a well-known upward bias in effect size--the 'winner's curse'--that later work fails to confirm.
Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature. But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.
Gregor Mendel studied fully penetrant (deterministic) causation. That is, in what we now know to be 'genes', the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as of dominant traits, given the appropriate genotype). But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study. And even this was not so clear! Mendel has been accused of 'cheating' by ignoring inconsistent results. This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'. (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.) For whatever reason--and see below--the presence of a 'dominant' allele or 'recessive' homozygosity at a 'causal' gene doesn't always lead to the trait.
For most of the 20th century, the probabilistic nature of real-world, as opposed to textbook, Mendelism has been well known and accepted. The reasons for incomplete penetrance were not known, and indeed we had no way to know them as a rule. Various explanations were offered, but the statistical nature of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard. Even the original authors acknowledge incomplete penetrance, but this essentially shows that what the ExAC consortium is reporting are details, nothing fundamentally new or surprising. Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.
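As a concrete reminder of what 'penetrance' is operationally, here is a sketch with hypothetical counts; estimation really is this bare:

```python
# Penetrance, operationally: the fraction of carriers of a 'causal'
# genotype who actually show the trait. Counts are hypothetical.
carriers_with_trait = 140
carriers_total = 1000

penetrance = carriers_with_trait / carriers_total
print(f"estimated penetrance = {penetrance:.2f}")   # 0.14
# Textbook Mendelism would demand 1.0 for a dominant causal allele;
# anything less is 'incomplete penetrance'.
```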
Recent advances such as genomewide association studies (GWAS) in various forms have used stringent statistical criteria to minimize false discovery. The result is that mapped 'hits' satisfying those criteria account for only a fraction of estimated overall genomic causation. This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false-positive genome locations. But even the acceptable, statistically safest genome sites showed typically small individual effects and risks far below 1.0. They were not 'dominant' in the usual sense. That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait. This has been the finding for quantitative traits like stature and qualitative ones like the presence of diabetes, heart attack-related events, psychiatric disorders, and essentially all traits studied by GWAS. It is not exactly what the ExAC data were looking at, but it is highly relevant, and it is the relevant basic biological principle.
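The stringency itself is no mystery; it is essentially a Bonferroni-style correction for the number of sites tested. A sketch with the conventional round figure, not a number taken from any particular study:

```python
# The stringency in question is essentially a Bonferroni-style correction
# for the number of sites tested; one million is the conventional round
# figure, not a number from any particular study.
alpha_overall = 0.05
n_tests = 1_000_000
per_test_threshold = alpha_overall / n_tests
print(per_test_threshold)   # 5e-08, the usual 'genome-wide significance' line
```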
Such incomplete penetrance does not necessarily mean that the target gene is unimportant for the disease trait, which seems to be one of the inferences headlined in the news splashes. This is treated as a striking or even fundamentally new finding, but it is nothing of the sort. Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people, under some conditions, at some age, and in some environments. The ExAC results don't really address this because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample, and there are multiple explanations for incomplete penetrance, including the list of 1-13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.
In addition, there may be 'protective' variants in the other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper astronomical" combinations of effects using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate risk effects of almost uncountable numbers of sequence variants. If there were, say, 100 other contributing genes, each with their own variant genotypes including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
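The arithmetic behind the "hyper astronomical" remark is easy to check; assuming, purely for illustration, the 100 diallelic contributing genes mentioned above:

```python
# Back-of-envelope check on the "hyper astronomical" remark: with 100
# contributing diallelic genes (an assumption for illustration), each
# person carries one of 3 genotypes (AA, Aa, aa) per gene.
n_genes = 100
backgrounds = 3 ** n_genes
print(f"{backgrounds:.3e} distinct multi-locus backgrounds")   # ~5.154e+47
# No feasible sample could estimate a target gene's effect separately in
# each such background.
```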
Even the most clearly causal genes such as variants of BRCA1 and breast cancer have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental and other mainly unknown factors. Nobody doubts the role of BRCA1 but it is not in itself causal. For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells in a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.
There are many other examples of mapping that identified genes that, even if strongly and truly associated with a test trait, have very far from complete penetrance. A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present in affected individuals, but when the gene was tested in a general database, rather than just in affected people, it showed little or no causal effect. This seems to be the sort of thing the ExAC report is finding.
The generic reason is again that genes, essentially all genes, work only in their context. That context includes 'environment', which refers to all the other genes and cells in the body and the external or 'lifestyle' factors, and also age and sex as well. There is no obvious way to identify, evaluate or measure the effects of all possibly relevant lifestyle effects, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants for the same reason). How could these even be sampled adequately?
Likewise, volumes of long-existing experimental and highly focused results tell the same tale. Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little, no, or only strain-specific effects. This is true in other experimental organisms as well. The lesson, and it is far from a new or recent one, is that genomic context is vitally important: it is the person-specific genomic background of a target gene that affects the latter's effect strength--and vice versa; the same is true for each of these other genes. That is why we have so long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing. Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.
So if someone reports some cases of a trait that seem too often to involve a given gene, such as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered), then we are told that this is a surprise. It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms. It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.
Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations. That is simply a misleading assertion, or an attempted justification, though it has become the industry's standard closing argument.
It is of course very possible that we're missing some aspects of the studies and interpretations being touted, but we don't think that changes the basic points being made here. They're consistent with the new findings, but they show that for many very good reasons this is what we knew was generally the case: 'Mendelian' traits were the exception, and they led to a century of genetic discovery only because they focused attention on what was then doable (while, in parallel and not widely recognized by human geneticists, the agricultural genetics of polygenic traits showed what was more typical).
But now, if things are being recognized as contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't require this deeper knowledge. In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.
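That closing point can be put in standard power-calculation terms. A rough sketch using the usual two-group normal approximation, with illustrative effect sizes and a GWAS-style alpha (not any particular study's design):

```python
# Rough two-group power calculation (normal approximation) to illustrate
# the point above: the smaller the effect, the larger the sample needed
# to detect it at all. Effect sizes and alpha are illustrative only.
from scipy.stats import norm

alpha, power = 5e-8, 0.80                      # GWAS-style threshold, 80% power
z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)

for d in (0.5, 0.1, 0.02):                     # standardized mean difference
    n_per_group = 2 * ((z_a + z_b) / d) ** 2
    print(f"effect d = {d}: ~{n_per_group:,.0f} per group")
# d = 0.5 needs ~300 per group; d = 0.02 needs ~200,000. An effect that
# only surfaces in samples that large is, as argued above, weak or rare.
```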
And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution. Responses to environment (diet etc.) manifestly have the same problem. It is not just a strange finding of exome mapping studies for disease. Likewise, 'normal' study subjects now being recruited in huge numbers may get the target trait later in their lives, except for traits basically present early in life. One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial, historically frustrating needle-in-the-haystack search. So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we already know: that there's a big problem to be solved). Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems. And they may apply only to that family.
The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex. It is plausible that severe, especially early onset diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge. And, ironically, the ExAC study has removed just such diseases from their consideration! So they're intentionally showing what is well known, that we're in needle in haystacks territory, even when someone has reported big needles.
Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show. Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation. This doesn't even consider the deep problems about statistical inference that are being widely noted and the deeply entrenched nature of that approach's conceptual and even material invested interests (see this week's Aeon essay, e.g.). It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science. That is, it's as much about feeding the science industry as it is about medicine and public health. And that is why it's mainly about business as usual rather than real reform.