As I've posted previously, I have been recovering from multiple-bypass heart surgery. I had experienced some angina--the vague, superficial chest pain that is a symptom of impending heart attack or, more properly, of coronary artery blockage. Fortunately, I was educated enough to recognize that this chest pain wasn't just soreness from some new kind of exercise. I went to the doc and--to make a long story short--was sent right off to the hospital for heart bypass surgery (grafting vessels to route blood around the clogged coronary arteries).
The angiography showed that at least some of my coronary arteries were clogged--with whatever goop, presumably including cholesterol, and by whatever clogging mechanism. These causal facts are, as I understand things, complex and not completely understood, but the upshot was clear: surgery ... or else!
Now, the doctors would say that, given this evidence, I was at high risk of potentially lethal heart disease. I'm sure that, had the opportunity been there (and it may yet be, at some future doctor's appointment), I would have been chided--or scolded--for my bad diet, too much cholesterol, and so on. It will be assumed that my voluntary lifestyle choices caused my blockage and my need for preventive bypass surgery. Bad boy! Bad diet! Tsk, tsk, tsk....
But is that right, or might it be the opposite of a more serious truth?
What is bad behavior, health-wise?
I am 77. This is beyond the usual 76-ish life expectancy for US males (searching the ad-laden web for such data has become mainly a challenge of wading through the relentless commercialism). So my lifestyle cannot be viewed as bad behavior in this respect. Indeed, I have already outlived half my birth cohort! So perhaps my diet and whatever else can, or should, be viewed as having been protective. After all, I was symptom-free until after my expected lifespan.
It is very difficult to understand what 'risk' means in this regard. If my lifestyle led to my arteries becoming clogged, but it didn't happen until after I'd outlived my average peer, can I legitimately think of that lifestyle as having been protective rather than risky? We all have to get some final disorder at some point, so is the absolute cause the relevant fact, or the relative one? How can we decide such questions--if indeed they are meaningful ones that can even have meaningful answers?
If my behavior (for whatever reason, including just plain luck) led to my surviving in very good health, except for one weakest link, then does that link suggest I've behaved badly, or does my overall great state of health suggest the opposite? More to the point, how can such questions even be answered in a meaningful sense? They seem meaningful ... until you think a bit more carefully about them.
The philosophical quicksand doesn't stop there. If my arterial clog would have led to a relatively quick death--not a 'premature' one at my age!--and thereby spared me some worse, more prolonged or debilitating fate, can we seriously view the surgery as preventive or protective, with me now facing those dreadful fates?
When we have competing causes and inevitable mortality, we have to view the causes, and what causes them, in a rather different light. That doesn't mean there are consensus answers, much less easy ones. But it may mean that the rules for 'healthy' behavior are not so obvious as they seem to be.
Sunday, August 11, 2019
Who, me?? Why did I clog my 'widow maker'? [on medical cause and effect...and how we know, if we know]
By Ken Weiss
So, having just returned home and now recuperating from coronary bypass surgery, I have to ask the 'complexity' question--a very personal one in this case: why me? I've lived a physically and physiologically vigorous life. My diet may not have always been the very best for cardio health (though, for reasons we've discussed here many times over the years, it's not completely clear what that diet should actually be), but it wasn't particularly bad, given what's thought these days to be a "healthy" diet.
The surgeon who remodeled me at Penn State's fine medical complex in Hershey said he knows the risk factors in a population but couldn't know why any given individual developed clogged coronary arteries, nor which artery would be affected. His job, one might say, was to bypass the clogs, not explain them. So he didn't even attempt to tell me why I was now in need of bypass surgery.
As he said, there are five known major risk factors: obesity, unhealthy diet, high cholesterol, genetic predisposition, and smoking. Yes, diabetes and high blood pressure are risk factors as well, but they are correlated enough with obesity that perhaps he considers these two conditions to be side effects of it. In any case, these risk factors have been determined by looking at associations between possible causal variables and heart disease in populations. The resulting statistics describe the population; they do not identify specific high-risk individuals within it. Indeed, some people with heart disease have all the risk factors, some have a few, and some have none. And even then, it's not possible to say which was the cause of the disease in most individual cases.
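To make that concrete, here is a toy calculation--with numbers I've invented purely for illustration, not taken from any study--of how a risk factor can 'double' risk at the population level while saying almost nothing about a given individual:

```python
# Toy numbers, invented for illustration -- not from any real study.
exposed, unexposed = 10_000, 10_000        # people with / without a risk factor
cases_exposed, cases_unexposed = 200, 100  # of whom these develop heart disease

risk_exposed = cases_exposed / exposed          # 0.02
risk_unexposed = cases_unexposed / unexposed    # 0.01
relative_risk = risk_exposed / risk_unexposed   # 2.0: "doubles your risk!"

print(f"Relative risk in the population: {relative_risk:.1f}")
print(f"'High-risk' individuals who never get the disease: {1 - risk_exposed:.0%}")
```

Even with a relative risk of 2, the large majority of 'high-risk' individuals never develop the disease, and nothing in the population statistic says which ones will.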
I have none of these risk factors--though I could make up a story: I smoked when I was young, and my father had a pacemaker when he was old (but he lived to 99). Still, I have done vigorous exercise my whole life, thinking of that as my "get cancer program", since it meant, I thought, that I would not go out with a coronary. So what caused my artery to clog? Indeed, why in my case was the clog in an unstentable location, requiring major surgery?
This brings up, again, the question of whether one's individual risk can even be known with any sort of 'precision'. Or is that an illusion? Is it a culpably false promise made by the calculating Dr Collins at NIH to secure NIH funding, rather than to give the public a realistic understanding of what we know and what we can hope to know from the kind of research investment he favors?
How, based on current methods of science, can risk really be individual? What kind of information would that require, even just considering actual, i.e., past, effects--assuming they could be ascertained to any reasonable measurement standard? What would you need to consider? Diet, exercise, personality (temperament, for example)? Climate? Profession? The effects of war, drought, epidemic? Genes, even?
Of course, the gross and inexcusable BS of promising 'precision genomic medicine' based on very costly, open-ended genomic (and other 'omic) data-collection enterprises is culpable. It is an often openly acknowledged way of getting, and keeping, mega-funding without having any real ideas (understandable, perhaps, since medical schools culpably don't pay faculty salaries or basic research costs as part of their jobs). Focused science has a chance of finding things out; blind data enumeration, far less so--and what we've done of it so far shows this quite clearly.
We often say 'family history', and clinically this may be the most useful piece of predictive information, but what does it actually explain? Did Dad or Aunt Jane have the same trait because of genes, or because of shared family habits and lifestyles? How could you really tell? A surgeon need not care, as the job is to fix the clogged pipes, and if heart disease runs in a family the physician will treat his or her patient as high-risk. Still, to prevent this sort of thing, we need to know what causes it.
This is a central biomedical question! It is hard enough to know, much less accurately measure, all the factors in life that might in one way or another be a 'risk' factor for a given disease, like clogged coronary plumbing. Is it a delusion to think we could identify, much less measure, them all? If, as seems obvious, there isn't just a single factor, and everyone's exposure set is probably different (and the effects need not be 'additive'), how on earth can we even know how well we are measuring, or ascertaining, such factors?
And even if we could do this, it would apply directly only to current cases and their past lifestyle exposures. What we would like to do, for individuals and for public health, is predict the future in order to lower risks. But there is no way, not even in principle, to know what future exposures will be--not even for populations. Diets and lifestyles change in ways we cannot predict, nor can we predict major future events--climate, war, pestilence, food types and availability, and so on--that would be highly relevant.
So what should we do with our understanding of these unpredictable factors? Perhaps just level with patients and the public, and stop using them to endow a particular, and particularly costly, part of the university research empire. Maybe a return to focused, hypothesis-based research--actual science--is in order, in my view.
Tuesday, January 8, 2019
Susumu Ohno: Accounting for Why Gene Counting Doesn't Account for Things
By Ken Weiss
The promise that for nearly two decades has been the main course on the 'omicists' menus is that by counting--adding up the contributions of a list of enumerated genome locations--all our woes will be gone! The idea is simple: genes are fundamental to life because they code for proteins and the like, which are the basis of life. This, in a nutshell, is the justification for much of the Big Data endeavors being sponsored by the NIH these days, long driven for historical reasons by an obsession with genes.
But, at least in part, this obsession has revealed to us what we should--and could--already have known. Genes are clearly fundamental to life, coding for proteins and other functions. But the reason we're seeing increasing weariness with GWAS and other fiscally high-yield but scientifically low-yield approaches is not new. It's no secret. And it is not a surprise. All we needed to do was ask: where do genes come from? It is not a new question, the genome has been intensively studied, and indeed the answer has been known for nearly 50 (that is, fifty) years.
Susumu Ohno (1928-2000), from Google images
So, what did Ohno say?
Where do 'genes' come from?
In his time, we didn't have much in the way of DNA sequencing. We knew that genes coded for proteins and were located on chromosomes. We had learned a lot about how the code works, much of it from experiments, such as those with bacteria. We knew proteins were fundamental building blocks of life and were strings of amino acids. Watson and Crick and others had shown how DNA carries the relevant code, and so on.
But that did not answer the question: where do all these genes come from? I'm not a historian, and cannot claim to know the many threads leading to the answer. But in essence, the point Ohno is credited with noting, and whose importance he stressed, is that new genes largely arise from duplication events affecting existing genes. He had noticed amino acid similarities among some known proteins (the hemoglobins); this and other evidence suggested that chromosomal or individual gene duplication was a mechanism, if not the mechanism, for the origin of new genes. Expecting random mutations, in stretches of DNA not already coding for RNA or protein, to generate from scratch the sequence for a new protein that would actually have some use was too far-fetched. Indeed, nowadays one can be skeptical when an 'orphan' gene is claimed--that is, a gene with no relatives in any gene family elsewhere in the genome.
Instead, if occasionally a stretch of DNA or even a whole chromosome duplicates, the individual inheriting that expanded genome gains two potentially important attributes. First, s/he has a redundant code: mutational errors in one gene that lead to a non-functional protein can be compensated for by the duplicate copy, which codes for the same protein.
Secondly, duplication is the basis of a much deeper, indeed fundamental aspect of life, going even farther than genes: redundancy.
Evolution depends on redundancy: genomes are family affairs
By having redundant genes--the initial result of duplication--an individual is more likely to survive mutations. And over the long haul, with lots of duplication, the extra copies of a needed gene can mutate and, over time, take on new functions without threat to the individual, who still has one or more healthy versions of the gene.
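A toy simulation can illustrate the buffering idea. The mutation rate, time scale, and survival rule here are cartoon assumptions of my own, not real genetics, but the qualitative point stands: lineages carrying a duplicate copy of an essential gene weather disabling mutations far better than single-copy lineages.

```python
import random

MUTATION_RATE = 0.05   # assumed chance, per copy per round, of a disabling mutation

def lineage_survives(n_copies: int, rounds: int = 20) -> bool:
    """True if at least one functional copy remains after all rounds."""
    functional = n_copies
    for _ in range(rounds):
        # each still-functional copy may be knocked out this round
        functional -= sum(random.random() < MUTATION_RATE for _ in range(functional))
        if functional == 0:
            return False
    return True

random.seed(1)
for copies in (1, 2):
    survivors = sum(lineage_survives(copies) for _ in range(10_000))
    print(f"{copies} copy/copies: {survivors / 10_000:.0%} of lineages retain a working gene")
```

With one copy, a lineage must dodge every mutation; with two, one copy can absorb a hit while the other carries on--and, in real genomes, the spare is then free to drift toward new function.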
Indeed, perhaps one of the most under-appreciated yet fundamental axioms of life is that it is built on redundancy: not only do genomes consist almost exclusively of members of gene families, whose individual genes arose by duplication events, but our tissues themselves are constructed of repeating fundamental units: multicellular organization generally; bilateral or radial symmetry; blood cells, intestinal villi, lobes and alveoli in lungs, nephrons in kidneys, and so on.
I think it is not easy to imagine a different evolutionary way for our very simple biochemical beginnings to have generated the kinds of complex organisms that populate the Earth. And this has deep consequences for those with dreams of omical sugar plums dancing in their heads.
Why the 'omics' promises were always doomed to fail, or at least to pale
From the cell theory, to Ohno, to the very data our 'omical dreams have yielded in such extensive amounts, we have found that life relies on the protection of redundancy. From genes on up, if one thing goes wrong, there's an ally to pick up the slack. Redundancy means back-ups and alternatives. It also provides individual uniqueness, which is likewise fundamental to the dynamics of evolution.
Together, these facts (and they're facts, not just wild speculations) show that, and why, we can't expect to predict everything from individual genes or even gene scores. There are many roads to the Promised Land.
It is important, I think, and entirely fair to assert that nothing I've said here has ever been secret, known only to some small Masonic Lodge of biologists exchanging secret handshakes. Indeed, these basic facts have been at the heart of our science since the advent of cell theory nearly two centuries ago. Genomics has largely just added detail to what was already known as a generalization about life.
The implicit lesson, of Ohno not Homer, is to Beware of Geneticists Bearing Gifts.
(updated to correct a spelling error in Prof. Ohno's name)
Thursday, December 20, 2018
Mr Darwin's new science
By Ken Weiss
In his recent, marvelous must-read book, Naturalists in Paradise, about the major explorers of the Amazon (Wallace, Bates, and Spruce), John Hemming quotes Alfred Wallace lauding Darwin's Origin of Species in a letter to fellow explorer Henry Bates, saying that "Mr. Darwin has created a new science and a new philosophy."
I am no historian of science by any means, but I think there is substance in this grand characterization. Prior to Darwin--taking him as representative, and perhaps the most explicit spokesperson, for the new view--western science had viewed life as the result of one or more divine creation events. The inanimate, physical universe was also created 'In the beginning', but as a law-like place. At least since Galileo and Newton and others, it had formally come to be seen as following universal mathematical principles. The key, in my view, is the 'universal' aspect of this view of existence.
Later, with various scientists leading up to them, Darwin, Wallace and a few others saw life itself as also having arisen at some 'beginning', but a natural one, and as a process, diversifying thereafter into what is here today. That process has come to be called evolution. The idea, from Darwin's time to now and with no evidence of serious challenge, is that, as he said, all life today has descended from some beginning in 'a few forms or into one', following universal 'laws acting all around us'. In asserting his view in terms of laws, Darwin reflected his essentially explicit Newtonian viewpoint.
These 'laws' were, as specified by Darwin in the elegant last paragraph in his Origin of Species:
- Growth with Reproduction
- Inheritance
- Variability 'from the indirect and direct action of the external conditions of life, and from use and disuse'
- Resource-limited rates of increase leading to a struggle for life
- Natural Selection, which leads to Divergence of Character and the Extinction of less-improved forms.
I often note my view that Darwin was a product of his times--believing in 'laws' of Nature and a kind of determinism in evolution, with only a poor sense of probabilism. As I've noted in some recent posts, I think he held this view, expressed clearly in terms of barnacles, but I wonder what he would have said about primates, and humans in particular, as recently mused about here.
We're now in a world of probabilism in science, with fundamental probabilistic notions (mutation, drift, quantum mechanics, and so on). Darwin would certainly have understood the ideas, but I wonder what his view of the probabilistic aspects would have been. That they were just nuisance noise on the 'real' selective signal? That they challenged the idea of precise adaptation? How would they have affected his analysis of barnacles, which we've discussed recently here?
But is evolution law-like the way physics is? Physics' 'laws' are universals. Yet evolution (including selection) happens probabilistically, in the context of specific local circumstances. This seems at odds with Newtonian universality and its consequent determinism. Is something missing from our 'theory' of life and evolution that vitiates promises of 'precision' genomic medicine? Something that could be used to derive such predictability? Or is it just that such promises are Newtonian, and don't fit the evolving living world in which we live?
Mr Darwin (and some contemporaries, in particular Alfred Wallace) founded a new science, but we have to go beyond his times to understand where that science will, or should, take us.
Wednesday, November 28, 2018
Induction-deduction, and replicability: is there any difference?
By Ken Weiss
In what sense--what scientific sense--does the future resemble the past? Or perhaps, to what extent does it? Can we know? If we can't, then what credence for future prediction can we give to the results of studies done today, which are necessarily based on the past experience of current samples? Similarly, in what sense can we extrapolate findings from this sample to some other sample or population? If these questions are not easily answerable (indeed, if they are answerable at all!), then much of current--and very widespread and expensive--science is at best of unclear, questionable value.
We can look at these issues in terms of a couple of standard aspects of science: the relationship between induction and deduction, and the idea of replicability. The modern framing of induction and deduction basically comes from the Enlightenment era in western history, when it was found, in a formal sense, that the world of western science--which at that time meant physical science--followed universal 'laws' of Nature. At that time, life itself was generally excluded from this view, not least because it was believed to be the result of ad hoc creation events by God.
The induction--deduction problem
-----------------
Some terminology: I will make an important distinction between two terms. By induction I mean drawing a conclusion from specific observed data (e.g., estimating some presumed causal parameter's value). Essentially, this means inferring a conclusion from the past, from events that have already occurred. But often what we want to do is predict the future. We do that, often implicitly, by treating observed past values as estimates of causal parameters that apply generally, and therefore to the future; I refer to that predictive process, derived from observed data, as deduction. So, for example, if I flip a coin 10 times and get 5 Heads, I assume that this is somehow built into the very nature of coin-flipping, so that the probability of Heads on any future flip is 0.5 (50%). (The small sketch after this note makes the two steps explicit.)
-----------------
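Here is a minimal sketch of those two steps, in the coin-flipping terms used above; the numbers are arbitrary:

```python
import random

random.seed(42)

# Induction: estimate a parameter from past observations.
past_flips = [random.random() < 0.5 for _ in range(10)]   # ten observed flips
p_heads = sum(past_flips) / len(past_flips)               # e.g., 5/10 -> 0.5

# 'Deduction', in the sense used above: treat that estimate as a stable
# property of coin-flipping and project it onto flips not yet made.
print(f"Estimated P(heads): {p_heads}")
print(f"Predicted heads in the next 100 flips: {p_heads * 100:.0f}")
# The hidden assumption: the process generating the future flips is the very
# same parametric process that generated the past ones.
```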
If we can assume that induction implies deduction, then what we observe in our present or past observations will persist, so that we can predict it in the future. In a law-like universe, if we are sampling properly, this will occur, and we generally assume the prediction would be completely precise if we had perfect measurement (here I speculate, but I think that quantum phenomena, at the appropriate scale, have the same universally parametric properties).
Promises like 'precision genomic medicine', which I think amount to culpable public deceptions, effectively equate induction with deduction: we observe some genomic elements associated in some statistical way with some outcome, and assume that the same genome scores will similarly predict the futures of people decades from now. There is no serious justification for this assumption at present, nor any quantification of how large the errors of assuming the predictive power of past observations might be--in part because mutations and lifestyle clearly have major effects, but especially because these are unpredictable, even in principle. Indeed, there is another, much deeper problem of a similar kind, one that has gotten recent--but, to me, often quite naive--attention: replicability.
The replicability problem
Studies, perhaps especially in the social and behavioral fields, report findings that others cannot replicate. This is being interpreted as suggesting that (ignoring rare outright fraud) there is some problem with our decision-making criteria, other forms of bias, or poor study designs. Otherwise, shouldn't studies of the same question agree? There have been calls for the investigators involved to improve their statistical analysis (i.e., keep buying the same software!!--but use it better), to report negative results, and so on.
But this is potentially, and I think fundamentally, naive. It assumes that such study results should be replicable. It assumes, as I would put it, that at the level of interest, life = physics. This is, I believe, not just wrong but fundamentally so.
The assumption of replicability is not really different from equating induction with deduction, except as applied, in some subtle way, to a more diverse set of conditions. Induction of genomic-based disease risk is done on a population sample--say, a case-control study--and then applied to the same population in terms of its current members' future disease risks. But we know very well that different genotypes are found in different populations, so it is not clear what degree of predictability we should, or can, assume.
Replicability is similar, except that in general a result is assumed to apply across populations or samples, not just to the same sample's future. That is, I think, an even broader assumption than the genomics-precision promise, which does, at least nominally, now recognize population differences.
The real, deeper problem is that we have absolutely no reason to expect any particular degree of replicability between samples for these kinds of things. Evolution is about variation, locally responsive and temporary, and that applies to social behavior as well. We know that 'distance', or difference, accumulates (generally) gradually over time and separation, as a property of cultural as well as biological evolution. The same obviously applies, even more, to psychological and sociological samples and to inferences from them.
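As a toy illustration of that accumulating difference (with the population size, time span, and starting frequency invented purely for the example), here are two populations that begin identical and experience nothing but random drift--no selection, no bias--yet can end up measurably different, so that an association estimated in one sample need not hold in the other:

```python
import random

def drift(freq: float, pop_size: int, generations: int) -> float:
    """Pure random drift: binomial resampling of 2N allele copies per generation."""
    for _ in range(generations):
        copies = sum(random.random() < freq for _ in range(2 * pop_size))
        freq = copies / (2 * pop_size)
    return freq

random.seed(7)
# Two populations that start identical and never differ in any systematic way:
pop_a = drift(0.5, pop_size=500, generations=200)
pop_b = drift(0.5, pop_size=500, generations=200)
print(f"Allele frequency after 200 generations: A = {pop_a:.2f}, B = {pop_b:.2f}")
```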
It is simply silly to think that samples of, say, this year's college seniors at X University will respond to questionnaires the same way as samples from some other class, or university, or beyond. Of course, college students come cheap to researchers, and they're convenient. But they are not 'representative' in the replicability sense, except by some rather profound assumption. This is obvious, yet it is a tacit premise of very much research (biological, psychological, and sociological).
Even social scientists acknowledge the local and temporary nature of many of the things they investigate, because those things are affected by cultural and historical patterns, fads, fashions, and so much more. Indeed, the idea of replicability is, to me, curious to begin with. A study that fails to replicate some other study may not reflect failings in either, and the idea that we should replicate in this kind of way is a carryover of physics envy. Perhaps in many situations, a replicated result is what should be examined most closely! The social, and even biological, realms are simply not as 'Newtonian', or law-like, as the physical realm in which our notions of science--especially the very idea of law-like replicability--arose. Not only is failure to replicate not necessarily suspect, but replicability should not generally be assumed. Or, put another way, a claim that replicability is to be expected is a strong claim about Nature, one that requires very strong evidence!
This raises the very deep problem that, in the absence of replicability assumptions, we don't know what to expect of the next study after we've done the first ... or is this a justification for just keeping the same studies going (and funded) indefinitely? That's, of course, the very rewarding game being played in genomics.
Thursday, November 8, 2018
The horseshoe crab and the barnacle: induction vs deduction in evolution
By Ken Weiss
Charles Darwin had incredible patience. After his multi-year global voyage on HMS Beagle, he nestled in at Down House, where he was somehow able to stay calm and study mere barnacles to an endless extent (and to write 4--four--books on these little creatures). Who else would have had the obsessive patience (or the independent wealth and time on his hands) to do such a thing?
Darwin's meticulous work, and its context in his life and thinking, are very well described in Rebecca Stott's compelling 2003 book, Darwin and the Barnacle, which I highly recommend, as is the discussion of these topics in Desmond and Moore's 1991 Darwin biography, The Life of a Tormented Evolutionist. These are easier going, for seeing the points I will describe here, than plowing through Darwin's own detailed tomes (which, I openly confess, I have only browsed). His years of meticulous barnacle study raised many questions in his mind about how species acquire their variation, and his pondering of these eventually led to his recognition of 'evolution' as the answer, which he published only years later, in 1859, in his Origin of Species.
Darwin was, if anything, a careful and cautious person, and not much given to self-promotion. His works are laden with appropriate caveats including, one might surmise, careful defenses lest he be found to have made interpretive or theoretical mistakes. Yet he dared make generalizations of the broadest kind. It was his genius to see, in the overwhelming variation in nature, the material for understanding how natural processes, rather than creation events, led to the formation of new species. This was implicitly true of his struggle to understand the wide variation within and among species of barnacles, variation that enabled evolution, as he later came to see. Yet the same variation provided a subtle trap: it allowed escape from accusations of undocumented theorizing, but was so generic that in a sense it made his version of a theory of evolution almost unfalsifiable in principle.
But, in a subtle way, Mr Darwin, like all geniuses, was also a product of his time. I think he took an implicitly Newtonian, deterministic view of natural selection. As he said, selection could detect the 'smallest grain in the balance' [scale] of differences among organisms--that is, could evaluate and screen the tiniest amount of variation. He had, I think, only a rudimentary sense of probability; while he often used the word 'chance' in the Origin, it was in a very casual sense, and I think he did not really regard chance or luck (what we call genetic drift) as important in evolution. That attitude, I would assert, persists widely, if largely implicitly, today.
One important aspect of barnacles to which Darwin paid extensive attention was their sexual diversity. In particular, many species were hermaphroditic. Indeed, in some species he found small, rudimentary males literally embedded for life within the body of the female. Other species were more sexually dichotomous. These patterns caught Darwin's attention. In particular, he viewed this transect in evolutionary time (our present day) not just as a catalog of today, but as a cross-section of tomorrow. He clearly thought that what we see today among barnacle species represents the path other species had taken toward the fully sexually dichotomous state (independent males and females) found in some species today: the intermediates were on their way to these subsequent stages.
This is a deterministic view of selection and evolution: "an hermaphrodite species must pass into a bisexual species by insensibly small stages" from single organisms having both male and female sex organs to the dichotomous state of separate males and females (Desmond and Moore: 356-7).
But what does 'must pass' mean here? Yes, Darwin could array his specimens to show these various types of sexual dimorphism, but what would justify thinking of them as progressive 'stages'? What latent assumption is being made? It is to think of the different lifestyles as stages along a path leading to some final inevitable endpoint.
If this doesn't raise all sorts of questions in your mind, why not? Why, for example, are there any intermediate barnacle species here today? Over the eons of evolutionary time why haven't all of them long ago reached their final, presumably ideal and stable state? What justifies the idea that the species with 'intermediate' sexuality in Darwin's collections are not just doing fine, on their way to no other particular end? Is something wrong with their reproduction? If so, how did they get here in the first place? Why are there so many barnacle species today with their various reproductive strategies (states)?
Darwin's view was of an implicitly deterministic selection--heading toward a goal, with today's species displaying the various progressive stages along the way. His implicit view can be related to another, current controversy about evolution.
Rewinding the tape
There has for many recent decades been an argument about the degree of directedness--one might say predictability--in evolution. If evolution is the selection, among randomly generated mutational variants, of those whose survival and reproduction are locally favored at a given time, then wouldn't each such favored path be unique, none really replicable or predictable?
Not so, some biologists have argued! Their view is essentially that environments are what they are, and will systematically--and thus predictably--favor certain kinds of adaptation. There is, one might quip, only one way to make a cake in a particular environment. Different mutations may arise, but only those that lead to cake-making will persist. Thus, if we could 'rewind the tape' of evolution and go back to way back when, and start again, we would end up with the same sorts of adaptations that we see with the single play of the tape of life that we actually have. There would, so to speak, always be horseshoe crabs, even if we started over. Yes, yes, some details might differ, but nothing important (depending, of course, on how carefully you look--see my 'Plus ça ne change pas', Evol. Anthropol, 2013, a point others have made, too).
Others argue that evolution is so rooted in local chance and contingency that there would be no way to predict the details of what would evolve, could we start over at some point. Yes, there would be creatures in each local niche, and there would be similarities, to the extent that what we would see today would have to have been built from whatever genetic options existed yesterday; but there the similarity would end.
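One can put the disagreement in miniature with a toy 'replay' simulation (the landscape, mutation size, and selection rule here are my own arbitrary choices, not anyone's model of real evolution): every replay uses the same starting point and the same selection rule, yet early chance events decide which of two equally good 'peaks' a lineage settles on.

```python
import random

def fitness(x: float) -> float:
    """A landscape with two equally good peaks, at -1 and +1."""
    return -min(abs(x - 1.0), abs(x + 1.0))

def replay(seed: int, steps: int = 500) -> float:
    """Same starting point, same selection rule; only the mutations differ."""
    rng = random.Random(seed)
    trait = 0.0
    for _ in range(steps):
        mutant = trait + rng.gauss(0, 0.05)     # a random small mutation
        if fitness(mutant) >= fitness(trait):   # selection keeps improvements
            trait = mutant
    return trait

for seed in range(5):
    print(f"Replay {seed}: trait settles near {replay(seed):+.2f}")
```

Every replay ends near some well-adapted peak (the 'there would always be horseshoe crabs' side of the argument), but which peak is a matter of early luck (the contingency side).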
Induction, deduction, and the subtle implications of the notion of 'intermediate' forms
Stott's book, Darwin and the Barnacle, discusses Darwin's work in terms of the presumed intermediate barnacle stages he found. But the very use of such terms carries subtle implications. It conflates induction with deduction; it assumes that what is past will be repeated. It makes of evolution what Darwin also made of it: a deterministic, force-like phenomenon. Indeed, it's not so different from a form of creationism.
This has deeper implications. Among them is the repeatability of environments and genomes, at least to the extent that their combination in local areas--life, after all, operates strictly in local areas--will be repeated in other places and at other times. Only by assuming the repeatability not only of environments but also of genomic variation can one see, in the current states of barnacle species, stages in a predictable evolutionary parade. The inductive argument is the observation of what happened in the past; the deductive argument is that what we see now is intermediate, on its way to becoming what some present-day, more 'advanced' stage is like.
This kind of view, which is implicitly and (as with Darwin) sometimes explicitly invoked, is that we can use the past to predict the future. And yet we routinely teach that evolution is by its essential nature locally ad hoc and contingent, based on random mutations and genetic drift--and not driven by any outside God or other built-in specific creative force.
And 'force' seems to be an apt word here.
The idea that a trait found in fossils, intermediate between some more primitive state and something seen today, implies that a similar trait today could be an 'intermediate stage' toward a knowable tomorrow conflates inductive observation with deductive prediction. The trait may indeed be such a stage, but we have no way to prove it and usually scant reason to believe it. Instead, equating induction with deduction tacitly assumes, usually without any rigorous justification, that life is a deductive phenomenon like gravity or chemical reactions.
The problem is serious: the routine equating of induction with deduction gives a false idea of how life works, even in the short term. Does a given genotype predict a particular disease in someone who carries it, because we find that genotype associated with affected patients today? It may indeed, especially if a true causal mechanism is known; but this cannot simply be assumed. We know this from well-observed recent history: secular trends in environmental factors with disease consequences have been documented, meaning that the same genotype is not always associated with the same risk. There is no guarantee of future repetition, not even in principle.
Darwin's worldview
Darwin was, in my view, a Newtonian. That was the prevailing ethos of science in his time. He accepted 'laws' of Nature and their infinitesimally precise action. That Nature was law-like was a prevailing, one may even say fashionable, view at the time. It was also applied to social evolution--for example, in Marx's and Engels' view of the political inevitability of socialism. That barnacles can evolve various kinds of sexual identities and arrangements doesn't mean that anything Darwin observed in them was on the way to full hermaphroditism, or later to fully distinct sexes ... or, indeed, to any particular state of sexuality. But if you hold a view like his, seeing the 'intermediate' stages contemporaneously would reinforce the inevitabilist aspect of a Newtonian perspective, and seemingly justify using induction to make deductions.
Even giants like Darwin are products of their times, as we peons all are. We gain comfort from equating induction with deduction--from the idea that the past we can observe allows us to predict the future. That makes it comfortingly safe to make assertions, and gives us the feeling that we understand the complex environment through which we must wend our way in life. But in science, at least, we should know the emptiness of equating the past with the future. Too bad we can't seem to see further.
From Darwin's books on barnacles (web image capture)
Darwin was, if anything, a careful and cautious person, and not much given to self-promotion. His works are laden with appropriate caveats including, one might surmise, careful defenses lest he be found to have made interpretive or theoretical mistakes. Yet he dared make generalizations of the broadest kind. It was his genius to see, in the overwhelming variation in nature, the material for understanding how natural processes, rather than creation events, led to the formation of new species. This was implicitly true of his struggle to understand the wide variation within and among species of barnacles, variation that enabled evolution, as he later came to see. Yet the same variation provided a subtle trap: it allowed escape from accusations of undocumented theorizing, but was so generic that in a sense it made his version of a theory of evolution almost unfalsifiable in principle.
But, in a subtle way, Mr Darwin, like all geniuses, was also a product of his time. I think he took an implicitly Newtonian, deterministic view of natural selection. As he said, selection could detect the 'smallest grain in the balance' [scale] of differences among organisms, that is, could evaluate and screen the tiniest amount of variation. He had, I think, only a rudimentary sense of probability; while he often used the word 'chance' in the Origin, it was in a very casual sense, and I think that he did not really think of chance or luck (what we call genetic drift) as important in evolution. This I would assert is widely persistent, if largely implicit, today.
One important aspect of barnacles to which Darwin paid extensive attention was their sexual diversity. In particular, many species were hermaphroditic. Indeed, in some species he found small, rudimentary males literally embedded for life within the body of the female. Other species were more sexually dichotomous. These patterns caught Darwin's attention. In particular, he viewed this transect in evolutionary time (our present day) as more than just a catalog of today, but also as a cross-section of tomorrow. He clearly thought that what we saw today among barnacle species represented the path that other species had taken towards becoming the fully sexually dichotomous (independent males and females) in some species today: the intermediates were on their way to these subsequent stages.
This is a deterministic view of selection and evolution: "an hermaphrodite species must pass into a bisexual species by insensibly small stages" from single organisms having both male and female sex organs to the dichotomous state of separate males and females (Desmond and Moore: 356-7).
But what does 'must pass' mean here? Yes, Darwin could array his specimens to show these various types of sexual dimorphism, but what would justify thinking of them as progressive 'stages'? What latent assumption is being made? It is to think of the different lifestyles as stages along a path leading to some final inevitable endpoint.
If this doesn't raise all sorts of questions in your mind, why not? Why, for example, are there any intermediate barnacle species here today? Over the eons of evolutionary time why haven't all of them long ago reached their final, presumably ideal and stable state? What justifies the idea that the species with 'intermediate' sexuality in Darwin's collections are not just doing fine, on their way to no other particular end? Is something wrong with their reproduction? If so, how did they get here in the first place? Why are there so many barnacle species today with their various reproductive strategies (states)?
Darwin's view was implicitly of the deterministic nature of selection--heading towards a goal which today's species show in their various progressive stages. His implicit view can be related to another, current controversy about evolution.
Rewinding the tape
There has for many recent decades been an argument about the degree of directedness or, one might say, predictability in evolution. If evolution is the selection among randomly generated mutational variants for those whose survival and reproduction are locally, at a given time favored, then wouldn't each such favored path be unique, none really replicable or predictable?
Not so, some biologists have argued! Their view is essentially that environments are what they are, and will systematically--and thus predictably--favor certain kinds of adaptation. There is, one might quip, only one way to make a cake in a particular environment. Different mutations may arise, but only those that lead to cake-making will persist. Thus, if we could 'rewind the tape' of evolution and go back to way back when, and start again, we would end up with the same sorts of adaptations that we see with the single play of the tape of life that we actually have. There would, so to speak, always be horseshoe crabs, even if we started over. Yes, yes, some details might differ, but nothing important (depending, of course, on how carefully you look--see my 'Plus ça ne change pas', Evol. Anthropol, 2013, a point others have made, too).
Others argue that evolution is so rooted in local chance and contingency, that there would be no way to predict the details of what would evolve, could we start over at some point. Yes, there would be creatures in each local niche, and there would be similarities to the extent that what we would see today would have to have been built from what genetic options were there yesterday, but there the similarity would end.
Induction, deduction, and the subtle implications of the notion of 'intermediate' forms
Stott's book, Darwin and the Barnacle, discusses Darwin's work in terms of the presumed intermediate barnacle stages he found. But the very use of such terms carries subtle implications. It conflates induction with deduction, it assumes what is past will be repeated. It makes of evolution what Darwin also made of it: a deterministic, force-like phenomenon. Indeed, it's not so different from a form of creationism.
This has deeper implications. Among them are repeatability of environments and genomes, at least to the extent that their combination in local areas--life, after all, operates strictly on local areas--will be repeated elsewhere and else-times. Only by assuming not only the repeatability of environments but also of genomic variation, can one see in current states of barnacle species today stages in a predictable evolutionary parade. The inductive argument is the observation of what happened in the past, and the deductive argument is that what we see is intermediate, on its way to becoming what some present-day more 'advanced' stage is like.
This kind of view, which is implicitly and (as with Darwin) sometimes explicitly invoked, is that we can use the past to predict the future. And yet we routinely teach that evolution is by its essential nature locally ad hoc and contingent, based on random mutations and genetic drift--and not driven by any outside God or other built-in specific creative force.
And 'force' seems to be an apt word here.
The idea that a trait found in fossils, intermediate between some more primitive state and something seen today, implies that a similar trait today could be an 'intermediate stage' on the way to a knowable tomorrow conflates inductive observation with deductive prediction. It may indeed turn out that way, but we have no way to prove it and usually scant reason to believe it. Instead, equating induction with deduction tacitly assumes, usually without any rigorous justification, that life is a deductive phenomenon like gravity or chemical reactions.
The problem is serious: the routine equating of induction with deduction gives a false idea about how life works, even in the short term. Does a given genotype, say, predict a particular disease in someone who carries it, just because we find that genotype associated with affected patients today? This may indeed be so, especially if a true causal mechanism is known; but it cannot be assumed. We know this from well-observed recent history: secular trends in environmental factors with disease consequences have been documented, meaning that the same genotype is not always associated with the same risk. There is no guarantee of a future repetition, not even in principle.
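To put toy numbers on that point (all of these counts are invented for illustration; they come from no study), the same genotype can carry very different risks in two birth cohorts whose environments differ:

```python
# Hypothetical penetrance of one genotype in two birth cohorts whose
# environments differ; all counts are invented for illustration.
cases_1930, carriers_1930 = 12, 1000   # cohort born 1930
cases_1990, carriers_1990 = 94, 1000   # same genotype, cohort born 1990

print(f"risk, 1930 cohort: {cases_1930 / carriers_1930:.1%}")  # 1.2%
print(f"risk, 1990 cohort: {cases_1990 / carriers_1990:.1%}")  # 9.4%
# Same genotype, roughly 8x the risk: the association is not a fixed law.
```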
Darwin's worldview
Darwin was, in my view, a Newtonian. That was the prevailing scientific ethos in his time: he accepted 'laws' of Nature and their infinitesimally precise action. That Nature was law-like was a prevailing, one may even say fashionable, view at the time. It was also applied to social evolution, for example in Marx's and Engels' view of the political inevitability of socialism. That barnacles can evolve various kinds of sexual identities and arrangements doesn't mean that any of what Darwin observed in them was on the way to full hermaphroditism, or later to fully distinct sexes...or, indeed, to any particular state of sexuality. But if you hold a view like his, seeing the 'intermediate' stages even contemporaneously would reinforce the inevitabilistic aspect of a Newtonian perspective, and seemingly justify using induction to make deductions.
Even giants like Darwin are products of their times, as all we peons are. We gain comfort from equating induction with deduction--from assuming that the past we can observe allows us to predict the future. That makes assertions feel comfortingly safe, giving us the sense that we understand the complex environment through which we must wend our way in life. But in science, at least, we should know the emptiness of equating the past with the future. Too bad we can't seem to see further.
Thursday, April 20, 2017
Some genetic non-sense about nonsense genes
By
Ken Weiss
The April 12 issue of Nature has a research report and a main article about what is basically presented as the discovery that people typically carry doubly knocked-out genes, yet show no effect. The editorial (p. 171) notes that the report (p. 235) uses an inbred population to isolate double-knockout genes (that is, recessive homozygous null mutations) and to look at their effects. The population sampled, from Pakistan, has high levels of consanguineous marriage. The criterion for a knockout mutation was based on the protein-coding sequence.
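For readers unfamiliar with how such double knockouts are called, here is a minimal sketch of the general logic only--not the paper's actual pipeline; the gene names, consequence labels, and genotype records are hypothetical stand-ins:

```python
# Consequence classes conventionally treated as putative loss-of-function (pLoF).
PLOF = {"stop_gained", "frameshift_variant",
        "splice_donor_variant", "splice_acceptor_variant"}

# Hypothetical annotated genotype records: (gene, consequence, genotype).
variants = [
    ("GENE_A", "stop_gained",        "1/1"),  # homozygous pLoF: double knockout
    ("GENE_B", "frameshift_variant", "0/1"),  # heterozygous: one working copy left
    ("GENE_C", "missense_variant",   "1/1"),  # homozygous, but not clearly pLoF
]

def double_knockouts(records):
    """Return genes carried with a homozygous putative loss-of-function genotype."""
    return {gene for gene, csq, gt in records if csq in PLOF and gt == "1/1"}

print(double_knockouts(variants))  # {'GENE_A'}
```

Note that the filter sees only the protein code; the objections below are largely about everything such a filter cannot see.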
We have no reason to question the technical accuracy of the papers, nor their relevance to biomedical and other genetics, but there are reasons to assert that this is nothing newly discovered, and that the story misses the really central point: one that should, I think, undermine the expensive Big Data/GWAS approach to biological causation.
First, for some years now there have been reports of samples of individual humans (perhaps also of yeast, but I can't recall specifically) in which both copies of a gene appear to be inactivated. The criteria for saying so are generally indirect, based on nonsense, frameshift, or splice-site mutations in the protein code. That is, there are other aspects of coding regions that may be relevant to whether this is a truly thorough search to establish that whatever is coded really is non-functional. The authors mention some of these. But, basically, costly as it is, this is science on the cheap, because it clearly addresses only some aspects of gene functionality. It would obviously be almost impossible to show either that the gene was never expressed or that it never worked. For our purposes here, we need not question the finding itself. But the fact that this is not a first discovery does raise the question of why a journal like Nature is so desperate for Dramatic Finding stories, since this one really should instead be a report in one of the many specialty human genetics journals.
Secondly, there are causes of gene inactivation other than coding mutations. They have to do with regulatory sequences, and inactivating mutations in that part of a gene's functional structure are much more difficult, if not impossible, to detect with any completeness. A gene's coding sequence may itself seem fine, yet its regulatory sequences may simply not enable it to be expressed. Gene regulation depends on epigenetic DNA modification, on multiple transcription factor binding sites, on the functional state of the many proteins required to activate a gene, and on other aspects of the local DNA environment (such as RNA editing or RNA interference). The point here is that there are likely to be many other instances of people with complete, or effectively complete, double knockouts of genes.
Thirdly, the assertion that these double KOs have no effect depends on various assumptions. Mainly, it assumes that the sampled individuals will not, in the future, experience the otherwise-expected phenotypic effects of their defunct genes. Effects may depend on age, sex, and environmental exposures, rather than being a congenital yes/no functional outcome.
Fourthly, there may be many coding mutations that make the protein non-functional but that are ignored by this sort of study because they aren't clear knockout mutations; yet they are present in whatever data are used for comparison of phenotypic outcomes. There are post-translational modifications, RNA editing, RNA modification, and other aspects of a 'gene' that this approach is not picking up.
Fifthly, and by far the most important point, I think: this is the tip of the iceberg of redundancy in genetic functions. In that sense, the current paper is a kind of factoid reflecting what GWAS has been showing in great, if implicit, detail for a long time: there is great complexity and redundancy in biological functions. Individual mapped genes typically affect trait values or disease risks only slightly. Different combinations of variants at tens, hundreds, or even thousands of genome sites can yield essentially the same phenotype (and here we ignore the environment, which makes things even more causally blurred; see the sketch after the final point below).
Sixthly, other samples and certainly other populations, as well as individuals within the Pakistani database, surely carry various portions of redundant pathways, from plenty of them to none. Indeed, the inbreeding that was used in this study obviously affects the rest of the genome, and there's no particular way to know in what way or, more importantly, in which individuals. The authors found a number of basically trivial or no-effect results as it is, even after their hunt across the genome. Whether some individuals had an attributable effect of a particular double knockout is problematic at best. Every sample, even of the same population, and certainly of other populations, will have different background genotypes (homozygous or not), so this is largely a fishing expedition in a particular pond that cannot seriously be extrapolated to other samples.
Finally, this study cannot address the effect of somatic mutation on phenotypes and their risk of occurrence. Who knows how many local tissues have experienced double-knockout mutations and produced (or not produced) some disease or other phenotypic outcome? Constitutive genome sequencing cannot detect this. Surely we should know this very inconvenient fact by now!
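The redundancy point (the fifth above) is easy to put in rough quantitative terms. A toy calculation, with invented numbers chosen only to show the combinatorics: if a trait merely requires 'enough' contributing variants among many possible sites, an astronomical number of distinct genotypes yield the same phenotype.

```python
from math import comb

sites = 100   # hypothetical number of genome sites able to contribute to a trait
needed = 10   # suppose any 10 'active' variants suffice for the same phenotype

print(f"{comb(sites, needed):,} genotypes give the same trait value")
# -> 17,310,309,456,440 distinct, phenotypically equivalent combinations
```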
Given the well-documented and pervasive biological redundancy, it is not any sort of surprise that some genes can be non-functional and the individual phenotypically within a viable, normal range. Not only is this not a surprise, especially by now in the history of genetics, but its most important implication is that our Big Data genetic reductionistic experiment has been very successful! It has, or should have, shown us that we are not going to be getting our money's worth from that approach. It will yield some predictions in the sense of retrospective data fitting to case-control or other GWAS-like samples, and it will be trumpeted as a Big Success, but such findings, even if wholly correct, cannot yield reliable true predictions of future risk.
Does environment, by any chance, affect the studied traits? We have, in principle, no way to know what future environmental exposures (or somatic mutations) will be like. The by now very well documented leaf-litter of rare and/or small-effect variants plagues GWAS for practical statistical reasons (and is why usually only a fraction of heritability is accounted for). Naturally, finding a single doubly inactivated gene may, but by no means need to, yield reliable trait predictions.
By now, we know of many individual genes whose coded function is so proximate or central to some trait that mutations in such genes can have predictable effects. This is the case with many of the classical 'Mendelian' disorders and traits that we've known for decades. Molecular methods have admirably identified the gene and mutations in it whose effects are understandable in functional terms (for example, because the mutation destroys a key aspect of a coded protein's function). Examples are Huntington's disease, PKU, cystic fibrosis, and many others.
However, these are at best the exceptions that lured us into thinking that even more complex, often late-onset traits would be mappable, so that we could parlay massive investment in computerized data sets into solid predictions and identify the 'druggable' genes-for that Big Pharma could target. This was predictably an illusion, as some of us were saying long ago, and for the right reasons. Everyone should know better now, and this paper just reinforces the point--to the extent that one can assert that it's the political-economic aspects of science funding, science careers, and hungry publications, and not the science itself, that drive the persistence of efforts to continue or expand the same methods anyway. Naturally (or should one say reflexively?), the authors advocate a huge Human Knockout Project to study every gene--today's reflex Big Data proposal.**
Instead, it's clearly time to recognize the relative futility of this, and change gears to more focused problems that might actually punch their weight in real genetic solutions!
** [NOTE added in a revision. We should have a wealth of data by now, from many different inbred mouse and other animal strains, and from specific knockout experiments in such animals, to know that the findings of the Pakistani family paper are to be expected. About 1/4 to 1/3 of knockout experiments in mice have no effect, or not the same effect as in humans, or no or a different effect in other inbred mouse strains. How many times do we have to learn the same lesson? Indeed, with existing genomewide sequence databases from many species, one can search for 2KO'ed genes. We don't really need a new megaproject to have lots of comparable data.]
Wednesday, March 29, 2017
The (bad) luck of the draw; more evidence
By
Ken Weiss
A while back, Vogelstein and Tomasetti (V-T) published a paper in Science in which they argued that most cancers cannot be attributed to known environmental factors, but are instead due simply to the errors in DNA replication that occur throughout life when cells divide. See our earlier 2-part series on this.
Essentially the argument is that knowledge of the approximate number of at-risk cell divisions per unit of age could account for the age-related pattern of increase in cancers of different organs, if one ignored some obviously environmental causes like smoking. Cigarette smoke is a mutagen and if cancer is a mutagenic disease, as it certainly largely is, then that will account for the dose-related pattern of lung and oral cancers.
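In essence, V-T's analysis is a correlation, on log scales, between estimated lifetime stem-cell divisions in a tissue and that tissue's lifetime cancer risk. Here is a minimal sketch of that kind of calculation; the three tissue values are invented placeholders, not V-T's published estimates.

```python
import math

# Hypothetical (lifetime stem-cell divisions, lifetime cancer risk) per tissue.
tissues = {
    "tissue_1": (1e12, 0.05),
    "tissue_2": (1e10, 0.003),
    "tissue_3": (1e8,  0.0001),
}

xs = [math.log10(d) for d, _ in tissues.values()]
ys = [math.log10(r) for _, r in tissues.values()]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"log-log correlation: {pearson(xs, ys):.2f}")
```

The higher that correlation across many tissues, the more of the tissue-to-tissue variation in risk is 'explained' by division counts alone; that is the whole logic of the claim.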
This got enraged responses from environmental epidemiologists whose careers are vested in the idea that if people would avoid carcinogens they'd reduce their cancer risk. Of course, this is partly just the environmental epidemiologists' natural reaction to their ox being gored--threats to their grant largesse and so on. But it is also true that environmental factors of various kinds, in addition to smoking, have been associated with cancer; some dietary components, viruses, sunlight, even diagnostic x-rays if done early and often enough, and other factors.
Most associated risks from agents like these are small compared to smoking, but they are not zero, and a legitimate objection to V-T's paper might be the suggestion that environmental pollution, dietary excess, and so on don't matter when it comes to cancer. I think V-T are saying no such thing. Clearly some environmental exposures are mutagens, and it would take a really hard-core reactionary to assert that mutations are unrelated to cancer. Other external or lifestyle agents are mitogens; they stimulate cell division, and it would be silly not to think they could have a role in cancer. If and when they do, it is not by causing mutations per se. Mitogenic exposures in themselves just stimulate cell division, which is dangerous if the cell is already transformed into a cancer cell. But it is also a way to increase cancer risk by just what V-T stress: the natural occurrence of mutations when cells divide.
There are a few who argue that cancer is due not to mutations but to transposable elements moving around and/or inserting into the genome, where they can cause cells to 'misbehave', or to other, perhaps unknown, factors such as tissue organization.
These alternatives are, currently, considered a rather minor cause of cancer. In response to their critics, V-T have just published a new multi-national analysis that they suggest supports their theory. They attempted to correct for the number of at-risk cells and so on, and found a convincing pattern that supports the intrinsic-mutation viewpoint.
This is at least in part an unnecessary food-fight. When cells divide, DNA replication errors occur. This seems well-documented; indeed, Vogelstein did some work years ago showing evidence of somatic mutation--that is, DNA changes that are not inherited--in the genomes of cancer cells compared to normal cells of the same individual. For decades this has been known in various levels of detail. Of course, showing that this is causal rather than coincidental is a separate problem, because the fact that mutations occur during cell division doesn't necessarily mean that the mutations are causal. However, for several cancers, the repeated involvement of specific genes, and the demonstration of mutations in the same gene or genes in many different individuals, or of the same effect in experimental mice, and so on, is persuasive evidence that mutational change is important in cancer.
The specifics of that importance are in a sense separate from the assertion that environmental epidemiologists are complaining about. Unfortunately, to a great extent this is a silly debate. In essence, professional pride and careerism aside, the debate should not be about whether mutations are involved in cancer causation but whether specific environmental sources of mutation are identifiable and individually strong enough, as x-rays and tobacco smoke are, to be identified and avoided. Smoking targets particular cells in the oral cavity and lungs. But exposures that are more generic--individually rare, not associated with a specific item like smoking, and unavoidable--might raise the rate of somatic mutation generally. Just having a body temperature may be one such factor, for example.
I would say that we are inevitably exposed to chemicals and so on that will potentially damage cells, mutation being one such effect. V-T are substantially correct, from what the data look like, in saying that (in our words) namable, specific, and avoidable environmental mutagens are not the major systematic, organ-targeting cause of cancer. Vague and/or generic exposure to mutagens will lead to mutations more or less randomly among our cells (though perhaps, depending on the agent, differently depending on how deep in our bodies the cells are relative to the outside world or other routes of exposure). The more at-risk cells there are, and the longer they're at risk, the greater the chance that some cell will experience a transforming set of changes.
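That 'more cells, at risk for longer' logic is just cumulative probability. A sketch with an arbitrary, invented per-division risk, purely to show the shape of the relationship:

```python
# Probability that at least one of n divisions produces a transforming hit,
# if each division independently carries probability p of doing so.
p = 1e-9  # invented per-division risk, chosen only to show the shape

for n in (1e6, 1e9, 1e12):
    risk = 1 - (1 - p) ** n
    print(f"{n:.0e} divisions -> cumulative risk {risk:.4f}")
```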
Most of us probably inherit mutations in some of these genes from conception, and have to await other events (whether mutational or of another nature, as mentioned above). The age patterns of cancers seem very convincingly to show that. The real question is the degree to which specific, identifiable, avoidable mutational agents can be found. It seems silly--or, perhaps as likely, mere professional jealousy--to resist that idea.
These statements apply even if cancers are not all, or not entirely, due to mutational effects. And, remember, not all of the mutations required to transform a cell need be of somatic origin. Since cancer is mostly, and obviously, a multi-factor disease genetically (not a single mutation as a rule), we should not have our hackles raised if we find what seems obvious, that mutations are part of cell division, part of life.
There are curious things about cancer, such as our large body size and yet delayed onset ages relative to the occurrence of cancer in smaller, shorter-lived animals like mice. And different animals of different lifespans and body sizes, even different rodents, have different lifetime cancer risks (some of which may be the result of details of their inbreeding history, or of inbreeding itself). Mouse cancer rates increase with age, and hence with the number of at-risk cell divisions, but their substantial risk at very young ages, despite many fewer cell divisions (yet similar genome sizes), shows that even the spontaneous-mutation idea of V-T has problems. After all, elephants are huge and live very long lives; why don't they get cancer much earlier?
Overall, if correct, V-T's view should not give too much comfort to our 'Precision' genomic medicine sloganeers--another aspect of budget protection--because the bad-luck mutations are generally somatic, not germline, and hence not susceptible to Big Data epidemiology, genetic or otherwise, that depends on germ-line variation as the predictor.
Related to this are the numerous reports of changes in life expectancy among various segments of society, based on behaviors--most recently, for example, the opioid epidemic among whites in depressed areas of the US. Such environmental changes are not specifically predictable, not even in principle, and can't be built into genome-based Big Data, or into the budget-promoting promises coming out of NIH about such 'precision'. Even estimated lifetime cancer risks associated with mutations in clear-cut risk-affecting genes, such as BRCA1 mutations and breast cancer, vary greatly from population to population and study to study. The V-T debate, and their obviously valid point, regardless of the details, is only part of the lifetime cancer-risk story.
ADDENDUM 1
Just after posting this, I learned of a new story on this 'controversy' in The Atlantic. It is really a silly debate, as noted in my original version. It tacitly makes many different assumptions about whether this or that tinkering with our lifestyles will add to or reduce the risk of cancer and hence support the anti-V-T lobby. If we're going to get into the nitty-gritty and typically very minor details about, for example, whether the statistical colon-cancer-protective effect of aspirin shows that V-T were wrong, then this really does smell of academic territory defense.
Why do I say that? Because if we go down that road, we'll have to say that statins are cancer-causing, and so is exercise, and kidney transplants and who knows what else. They cause cancer by allowing people to live longer, and accumulate more mutational damage to their cells. And the supposedly serious opioid epidemic among Trump supporters actually is protective, because those people are dying earlier and not getting cancer!
The main point is that mutations are clearly involved in carcinogenesis, cell division life-history is clearly involved in carcinogenesis, environmental mutagens are clearly involved in carcinogenesis, and inherited mutations are clearly contributory to the additional effects of life-history events. The silly extremism to which the objectors to V-T would take us would be to say that, obviously, if we avoided any interaction whatsoever with our environment, we'd never get cancer. Of course, we'd all be so demented and immobilized with diverse organ-system failures that we wouldn't realize our good fortune in not getting cancer.
The story, and much of the discussion on all sides, is also rather naive even about the nature of cancer (and about how many mutations, or which ones, it takes to get cancer); but that's for another post sometime.
ADDENDUM 2
I'll add another new bit to my post that I hadn't thought of when I wrote the original. We have many ways to estimate mutation rates, in nature and in the laboratory. They include parent-offspring comparisons in genomewide sequencing samples, and there have been sperm-to-sperm comparisons. I'm sure there are many other sets of data (see Michael Lynch, Trends in Genetics, Aug 2010; 26(8): 345-352). These give a consistent picture, and one can say, if one wants to, that the inherent mutation rate is due to identifiable environmental factors, but given the breadth of the data that's not much different from saying that mutations are 'in the air'. There are even sex-specific differences.
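The parent-offspring ('trio') approach reduces to simple arithmetic: count the new variants seen in a child but in neither parent, and divide by the number of transmitted base pairs. A sketch with round numbers (illustrative, though of the order typically reported for humans; they are not Lynch's figures):

```python
# Hypothetical trio-sequencing tally, using round numbers.
de_novo_mutations = 70                # variants in the child, in neither parent
genome_size = 3.1e9                   # haploid human genome, base pairs
transmitted_bases = 2 * genome_size   # one genome copy inherited per parent

rate = de_novo_mutations / transmitted_bases
print(f"estimated rate: {rate:.2e} per base pair per generation")  # ~1.1e-08
```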
The numerous mutation detection and repair mechanisms built into genomes add to the idea that mutations are part of life--that they are not, for example, an artifact of modern human lifestyles. Of course, evolution depends on mutation, so the rate cannot be, and never has been, reduced to zero--a species that couldn't change doesn't last. Mutations occur in plants and animals and prokaryotes, in all environments, and, I believe, generally at rather similar species-specific rates.
If you want to argue that every mutation has an external (environmental) cause rather than an internal molecular one, that is merely saying there's no randomness in life or imperfection in molecular processes. That is as much a philosophical as an empirical assertion (as perhaps any quantum physicist can tell you!). The key, as asserted in the post here, is that for the environmentalists' claim to make sense, to be a mutational cause in the meaningful sense, the force or factor must be systematic and identifiable and tissue-specific, and it must be shown how it gets to the internal tissue in question and not to other tissues on the way in, etc.
Given how difficult it has been to chase down most environmental carcinogenic factors to which exposure is more than very rare--and the search has been going on for a very long time, with only a few found that are, in themselves, clearly causal (ultraviolet radiation, Human Papilloma Virus, ionizing radiation, the ones mentioned in the post)--whatever is left over must be very weak, non-tissue-specific, rare, and the like. Even radiation-induced lung cancer in uranium miners has been challenging to prove (for example, because miners also largely were smokers).
It is not much of a stretch to say that even if, in principle, all mutations arising in our body's lifetime were due to external exposures, and even if the relevant mutagens could be identified and convincingly shown to be specifically carcinogenic in specific tissues, in practice the aggregate exposures to such mutagens are unavoidable and epistemically random with respect to tissue and gene. That, I would say, is the essence of the V-T finding.
Quibbling about that aspect of carcinogenesis is for those who have already determined how many angels dance on the head of a pin.
Thursday, May 19, 2016
Another look at 'complexity'
By
Ken Weiss
A fascinating and clear description of one contemporary problem of sciences involved in 'complexity' can be found in an excellent discussion of how brains work, in yesterday's Aeon Magazine essay ("The Empty Brain," by Robert Epstein). Or rather, of how brains don't work. Despite the ubiquity of the metaphor, brains are not computers. Newborn babies, Epstein says, are born with brains that can learn, respond to the environment and change as they grow.
As Epstein puts it: "But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever."

We are absolutely unqualified to discuss or even comment on the details of the neurobiology discussed. Indeed, even the author himself doesn't provide any sort of explanation of how brains actually work, using general hand-waving terms that are almost tautologically true, as when he says that experiences 'change' the brain. Such change must involve countless neural connections (what else is there in the brain that is relevant?), and would be entirely different in two different people.
In dismissing the computer metaphor as a fad based on current culture--which seems like a very apt critique--he substitutes vague reasons without giving a better explanation. So if we don't somehow 'store' an image of things in some 'place' in the brain, we obviously do still retain the ability to recall it. If the data-processing imagery is misleading, what else could there be?
We have no idea! But one important thing this essay reveals is that the problem of understanding multiple-component phenomena is a general one. The issues with the brain seem essentially the same as the issues in genomics, which we write about all the time, in which causation of the 'same' trait in different people is not due to the same causal factors (and we are struggling to figure out what those factors are in the first place).
[Image: A human brain, but what is it? Source: Wikipedia]
In some fields, like physics, chemistry, and cosmology, each item of a given kind--an electron, a field, a photon, a mass--is identical to every other, and their interactions are replicable (if current understanding is correct). For complexities like the motions of many interacting galaxies, each with many stars, planets, and interstellar material and energy, the computational and mathematical details are far too intricate and extensive for simple solutions, so one has to break the pattern down into subsets and simulate them on a computer. This seems to work well, however, and the reason is that the laws of behavior in physics apply equally to every object or component.
Biology is composed of molecules, and at their level of course the same must be true. But at anything close to the level of our needs for understanding, replicability is often very weak, except in the general sense that each person is 'more or less' alike in physiology, neural structures, and so on. At the level of underlying causation, we know that we're generally each different, often in ways that are important. This applies to normal development, to health, and even to behavior. Evolution works by screening differences, because that's how new species and adaptations arise. So it is difference that is fundamental to us, and part of that is that each individual with the 'same' trait has it for different reasons. Those reasons may be nearly the same or very different--we have no a priori way to know, and no general theory that is of much use in predicting. We should stop pouring resources into projects that nibble away at tiny details, a convenient distraction from the hard thinking that we should be doing (as well as from addressing the many clearly tractable problems in genetics and behavior, where causal factors are strong and well-known).
What are the issues?
There are several issues here, and it's important to ask how we might think about them. Our current scientific legacy has us trying to identify fundamental causal units and then to show how they 'add up' to produce the trait we are interested in. 'Add up' means they act independently, and each may, in a given individual, have its own particular strength (for example, variants at multiple contributing genes, with each person carrying a unique set of variants, and each variant having some specifiable independent effect). When one speaks of 'interactions' in this context, what is usually meant is that two (usually) factors combine beyond just adding up. The classical example within a given gene is 'dominance', in which the effect of the Aa genotype is not just the sum of the A and the a effects. Statistical methods allow for two-way interactions in roughly this way, by including terms like z*A*B (some quantitative coefficient z times the A and B states in the individual), assuming that this is the same in every A-B instance (that is, z is constant).
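In model terms, that paragraph describes an ordinary regression with a product term. A minimal sketch, using simulated data with arbitrary coefficients, of fitting y = b0 + bA*A + bB*B + z*(A*B) by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.integers(0, 3, n)   # genotype at one site, coded 0/1/2
B = rng.integers(0, 3, n)   # genotype at a second site

# Simulate a trait with additive effects plus one pairwise interaction (z = 0.8).
y = 1.0 + 0.5 * A + 0.3 * B + 0.8 * A * B + rng.normal(0, 1, n)

# Design matrix: intercept, A, B, and the AxB product (interaction) term.
X = np.column_stack([np.ones(n), A, B, A * B])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["b0", "bA", "bB", "z(AxB)"], np.round(coef, 2))))
```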
This is very generic (not based on any theory of how these factors interact), but for the general inference that they do act in relevant ways, it seems fine. Theories of causality invoke such patterns as paths of factor interaction, but they almost always make strong simplifying assumptions: that interactions are only pair-wise; that there is no looping (the presence of A and B sets up the effect, but A and B don't keep interacting in ways that might change it, and there is no feedback from other factors); and that the sizes of effects are fixed rather than different in each individual context.
For discovery purposes this may be fine in many multivariate situations, and that's what the statistical-package industry is about. But the assumptions may not be accurate, and/or the number and complexity of interactions may be too great to be usefully inferred from practical data: too many interactions for achievable sample sizes, parameters affected by unmeasured variables, individual effects too small to reach statistical 'significance' yet in aggregate accounting for the bulk of the outcome, and so on.
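To see how quickly 'too many interactions for achievable sample sizes' bites, a back-of-the-envelope count--just arithmetic, not data from any study--is sobering:

```python
# Just arithmetic: the number of possible interaction terms explodes
# combinatorially with the number of measured factors.
from math import comb

for n in (10, 100, 1000):
    print(f"{n} factors: {comb(n, 2):,} pairwise terms, "
          f"{comb(n, 3):,} three-way terms")
# 1000 factors already imply ~500,000 pairwise and ~166 million three-way
# parameters--far more than any achievable sample could estimate.
```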
These are not newly discovered issues, but often they can only be found by looking under the rug, where they've been conveniently swept because our statistical industry doesn't, and cannot, adequately deal with them. This is not a fault of the statistics, except in the sense that the models are not accurate enough; in really complex situations, which seem to be the rule rather than the exception, this is simply not an appropriate way to make inferences.
We need, or should seek, something different. But what?
Finding better approaches is not easy, because we don't know what form they should take. Can we just tweak what we have, or are we asking the wrong sorts of questions for the methods we know about? Are our notions of causality somehow fundamentally inadequate? We don't know the answers. But what we now do have is knowledge of the causal landscape we face. It tells us that enumerative approaches are what we know how to do, but also, we know, not an optimal way to achieve understanding. The Aeon essay describes yet another such situation, so we know that we face the same sort of problem--which we call 'complexity', as a not very helpful catchword--in many areas. Modern science has shown this to us. Now we need to use appropriate science to figure it out.
Wednesday, January 27, 2016
"The Blizzard of 2016" and predictability: Part III: When is a health prediction 'precise' enough?
By Ken Weiss
We've discussed the use of data and models to predict the weather in the last few days (here and here). We've lauded the successes, which are many, and noted the problems, including people not heeding advice. Sometimes that's due, as a commenter on our first post in this series noted, to previous predictions that did not pan out, leading people to ignore predictions in the future. Some weather forecasters, like all media these days, tend to exaggerate or dramatize things, a normal part of our society's way of getting attention (and resources).
We also noted the genuine challenges to prediction that meteorologists face. Theirs is a science based on very sound physical principles and theory, which, as a meteorologist friend put it, constrain what can and might happen, and make good forecasting possible. In that sense the challenge for accuracy lies in the complexity of global weather dynamics and in inevitably imperfect data, which may defy perfect analysis even by fast computers. There are essentially random or unmeasured movements of molecules and so on, leading to 'chaotic' properties of weather--indeed, the iconic example of chaos, the so-called 'butterfly effect': if a butterfly flaps its wings, the initially tiny and unseen perturbation can proliferate through the atmosphere, leading to unpredicted, indeed wildly unpredictable, changes in what happens.
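For the curious, this sensitivity is easy to demonstrate with the logistic map--the textbook toy model of chaos, a stand-in here, not an actual weather model: two runs that differ by one part in a billion at the start soon bear no resemblance to each other.

```python
# A toy demonstration of sensitivity to initial conditions, using the
# logistic map (the textbook chaos model, not a weather model).
def logistic(x, r=4.0):
    return r * x * (1 - x)

x, y = 0.2, 0.2 + 1e-9  # the second start is the 'butterfly flap'
for step in range(60):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: |difference| = {abs(x - y):.2e}")
```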
[Image: The Butterfly Effect, far-reaching effects of initial conditions; Wikipedia, source]
Reducing such effects is largely a matter of needing more data. Radar and satellite data are more or less continuous, but many other key observations are only made many miles apart, both on the surface and into the air, so that meteorologists must try to connect them with smooth gradients, or estimates of change, between the observations. Hence the limited number of future days (a few days to a week or so) for which forecasts are generally accurate.
Meteorologists' experience, given their resources, provides instructive parallels with, as well as differences from, the biomedical sciences, which aim for precise prediction, often of things decades in the future, such as disease risk based on genotype at birth or lifestyle exposures. We should pay attention to those parallels and differences.
When is the population average the best forecast?
Open physical systems, like the atmosphere, change but don't age. Physical continuity means that today is a reflection of yesterday, but the atmosphere doesn't accumulate 'damage' the way people do, at least not in a way that makes a difference to weather prediction. It can move, change, and refresh, with a continuing influx and loss of energy, evaporation and condensation, circulating movement, and so on. By contrast, we are each on a one-way track, and a population continually has to start over, with its continual influx of new births and loss to death. In that sense, a given set of atmospheric conditions today has essentially the same future risk profile as such conditions had a year or a century or a millennium ago. In a way, that is what it means to have a general atmospheric theory. People aren't like that.
By far, most individual genetic and even environmental risk factors identified by recent Big Data studies alter lifetime risk by only a small fraction. That is why the advice changes so frequently and inconsistently. Shouldn't eggs and coffee be either good or harmful for you? Shouldn't a given genetic variant definitely either put you at high risk, or not?
The answer is typically no, and the fault is in the reporting of data, not the data themselves. This is for several very good reasons. There is measurement error. From everything we know, the kinds of outcomes we are struggling to understand are affected by a very large number of separate causally relevant factors. Each individual is exposed to a different set or level of those factors, which may be continually changing. The impact of risk factors also changes cumulatively with exposure time--because we age. And we are trying to make lifetime predictions, that is, ones of open-ended duration, often decades into the future. We don't ask "Will I get cancer by Saturday?", but "Will I ever get cancer?" That's a very different sort of question.
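The arithmetic of open-ended horizons makes the difference plain. Assuming, purely for illustration, a constant made-up annual risk (real hazards change with age, which only strengthens the point):

```python
# Illustrative arithmetic only: a constant, made-up annual risk compounds
# very differently over a week than over a lifetime.
def cumulative_risk(years, annual_risk=0.005):
    """Probability of at least one occurrence over the given horizon."""
    return 1 - (1 - annual_risk) ** years

print(f"by Saturday: {cumulative_risk(1 / 52):.4%}")  # ~0.01%
print(f"in a year:   {cumulative_risk(1):.2%}")       # 0.50%
print(f"in 50 years: {cumulative_risk(50):.1%}")      # ~22%
```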
Each person is unique, like each storm, but we rarely have the kind of replicable sampling of the entire 'space' of potentially risk-affecting genetic variants--and we never will, because many genetic or even environmental factors are very rare and/or their combinations essentially unique, they interact and they come and go. More importantly, we simply do not have the kind of rigorous theoretical basis that meteorology does. That means we may not even know what sort of data we need to collect to get a deeper understanding or more accurate predictive methods.
The unique combinations of a multiplicity of risk factors contributing to a given outcome mean that the effect of each factor is generally very small, and that even within an individual the mix is continually changing. Lifetime risks for a trait are also necessarily averaged across all other traits--for example, across all competing causes of death or disease.
A fatal early heart attack is the best preventive against cancer! There are exceptions, of course, but generally, forecasts are weak to begin with, and over longer predictive time periods they will in many ways simply approximate the population--that is, public health--average. That is a kind of analogy with weather forecasts, which, beyond a few days into the future, move towards the climate average.
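A toy competing-risks simulation (with invented hazard rates) shows the point: because you can only die once, raising one cause-specific hazard lowers the lifetime risk of every competing cause, with no change to the biology of those causes at all.

```python
# Toy competing-risks simulation; all hazard rates are invented.
import random

def first_cause(hazards, max_age=110):
    """Simulate one life year by year; return whichever cause strikes first."""
    for _ in range(max_age):
        for cause, h in hazards.items():
            if random.random() < h:
                return cause
    return "survived"

random.seed(1)
n = 50_000
for heart in (0.005, 0.02):  # low vs high made-up annual heart hazard
    deaths = sum(first_cause({"heart": heart, "cancer": 0.005}) == "cancer"
                 for _ in range(n))
    print(f"heart hazard {heart}: lifetime cancer risk ~ {deaths / n:.1%}")
```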
Disease forecasts change people's behavior (we stop eating eggs or forego our morning coffee, say), each person doing so, or not, to his or her own extent. That is, feedback from the forecast affects the very risk process being forecast, changing the risks themselves in unknown ways. By contrast, weather forecasts can change behavior as well (we bring our umbrella with us), but the change doesn't affect the weather itself.
[Image: Parisians in the rain with umbrellas, by Louis-Léopold Boilly (1803)]
Of course, there are many genes in which variants have very strong effects. For those, forecasts are not perfect, but the details aren't worth worrying about: if there are treatments, you take them. Many such disorders are due to single genes, and the trait may be present at birth. The mechanism can be studied because the problem is focused. As a rule we don't need Big Data to discover and deal with them.
The epidemiological and biomedical problem is with attempts to forecast complex traits, in which nearly every instance is causally unique. Well, every weather situation is unique in its details, too--but those details can all be related to a single unifying theory that is very precise in principle. Again, that's what we don't yet have in biology, and there is no really sound scientific justification for collecting reams of new data, which may refine predictions somewhat but may not go much farther. We need to develop a better theory, or perhaps even to ask whether there is such a formal basis to be had--or is the complexity we see just what there is?
Meteorology has ways to check its 'precision' within days, whereas the biomedical sciences have to wait decades for their rewards and punishments. In the absence of tight rules and ways to correct errors, constraints on biomedical business as usual are weak. We think a key reason for this is that we must rely not on externally applied theory but on internal comparisons, like cases vs controls. We can test for statistical differences in risk, but there is no reason these will be the same in other samples, or in the future. Even when a gene or dietary factor is identified by such studies, its effects are usually not very strong, even if the mechanism by which it affects risk can be discovered. We see this repeatedly, even for risk factors that seemed obvious.
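To see how weak such effects typically are in absolute terms, consider hypothetical but realistically modest odds ratios applied to a made-up 10% baseline risk; the arithmetic is standard, the numbers are purely illustrative:

```python
# Hypothetical numbers, standard arithmetic: converting a case-control
# odds ratio into absolute risk shows how little a 'significant' finding
# may move an individual's forecast.
def carrier_risk(baseline, odds_ratio):
    """Apply an odds ratio to a baseline risk; return the carrier's risk."""
    odds = baseline / (1 - baseline) * odds_ratio
    return odds / (1 + odds)

for OR in (1.1, 1.3, 2.0):  # the scale of many mapped risk factors
    print(f"OR {OR}: risk goes from 10% to {carrier_risk(0.10, OR):.1%}")
# even OR 1.3, a solidly 'replicated' effect, moves 10% only to ~12.6%
```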
We are constrained not just to use internal comparisons but to extrapolate the past to the future. Our comparisons, say between cases and controls, are retrospective and almost wholly empirical rather than resting on adequate theory. The 'precision' predictions we are being promised are basically just applications of those retrospective findings to the future. It's typically little more than extrapolation, and because risk factors are complex and each person is unique, the extrapolation largely assumes additivity: we just add up the risk estimates for the various factors measured on existing samples, and use that sum as our estimate of future risk.
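A minimal sketch of that additive extrapolation--hypothetical variants, hypothetical effect sizes, the standard log-odds bookkeeping--looks like this:

```python
# A minimal sketch of additive risk-score extrapolation: per-variant log
# odds ratios estimated retrospectively are simply summed and applied to
# a new person. All names, effects, and the baseline are hypothetical.
import math

past_estimates = {"rs0001": 0.05, "rs0002": -0.02, "rs0003": 0.10}
baseline_log_odds = math.log(0.10 / 0.90)  # 10% population baseline

def predicted_risk(genotype):
    """Pure additivity: no interactions, no change over time, no context."""
    total = baseline_log_odds + sum(
        past_estimates[snp] * count for snp, count in genotype.items())
    return 1 / (1 + math.exp(-total))  # log-odds back to probability

print(f"{predicted_risk({'rs0001': 2, 'rs0002': 1, 'rs0003': 0}):.1%}")  # ~10.7%
```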
Thus, while for meteorology Big Data makes sense because there is strong underlying theory, in many aspects of the biomedical and evolutionary sciences this is simply not the case, at least not yet. Compared to meteorology, the biomedical and genetic sciences are the really hard ones! We are arguably just as likely to progress in our understanding by accumulating results from carefully focused questions, where we're tracing some real causal signal (e.g., traits with specific, known strong risk factors), as by just feeding the incessant demands of the Big Data worldview. But this of course is a point we've written (ranted?) about many times.
You bet your life, or at least your lifestyle!
If you venture out on the highway despite a forecast snowstorm, you are placing your life in your hands. You are also imposing dangers on others (because accidents often involve multiple vehicles). In the case of disease, if you are led by scientists or the media to take their 'precision' predictions too seriously, you are doing something similar, though most likely mainly affecting yourself.
Actually, that's not entirely true. If you smoke or hog up on MegaBurgers, you certainly put yourself at risk, but you risk others, too. That's because those instances of disease that truly are strongly, and even mappably, genetic (which seems true of subsets of even most 'complex' diseases) are masked by the majority of cases that are due to easily avoidable lifestyle factors; the causal 'noise' from risky lifestyles makes genetic causation harder to tease out.
Of course, taking minor risks too seriously also has known, potentially serious consequences, such as intervening on something that was only weakly problematic to begin with. Operating on a slow-growing prostate or colon cancer in older people may do more damage than the cancer ever would. There are countless other examples.
Life as a Garden Party
The need is to understand weak predictability, and to learn to live with it. That's not easy.
I'm reminded of a time when I was a weather officer stationed at an Air Force fighter base in the eastern UK. One summer, on a Tuesday morning, the base commander called me over to HQ. It wasn't for the usual morning weather briefing.....
"Captain, I have a question for you," said the Colonel.
"Yes, sir?"
"My wife wants to hold a garden party on Saturday. What will the weather be?"
"It might rain, sir," I replied.
The Colonel was not very pleased with my non-specific answer, but this was England, after all!
And if I do say so myself, I think that was the proper, and accurate, forecast.**
[Image: Plus ça change... Rain drenches royal garden party, 2013; The Guardian]
**(It did rain. The wife was not happy! But I'd told the truth.)