My composer friend wants to be quite modern about creating beautiful music. He doesn't like to use computer programs for composing but he has devised another 'modern' way to compose, given that, in writing a piece, he often changes his mind. Scratching out notes on paper to replace them with 'better' ones makes for a real mess on the working pages, and he'd then have to transcribe his work onto new pages, and that in itself introduces room for mistakes. So he had an idea.
He purchased a set of notes and musical symbols, printed individually on a kind of flexible plastic. Copies of each possible note and notation element were in boxes in a little tray. As he composed, he merely took each required note from its place in the tray, and used its static electricity to place it on a page with printed staff-lines. If he changed his mind, it was easy to remove or replace a given note, and put it back in its box in the tray without generating an inky mess on the page and having to keep starting over to make his work-in-progress legible.
But there turned out to be a serious, indeed even tragic, problem. He liked working in his studio, right in front of a window giving him an inspiring view of his garden. But, after days of work composing an incomparably ethereal and beautiful piece, a gust blew through the window, riffled the pages, and shook all the notes off the page and onto the table! What a scattered mess! And what a heartbreaking loss of all that work!
Of course, you could say that the composition with all its beauty was in some sense still there, right before him: all the required notes were indeed still there--every one. But they were in a pile, no longer with any order from which he could reconstruct the composition just by picking the notes up and placing them back on the page. So, it was literally all there---but none of what mattered was!
As my composer friend told me this story, it occurred to me that this was analogous to the 'pile' of DNA letters (As, Cs, Gs, and Ts) that is found by sequencing people with and without some trait, like a disease. The letters differ greatly among individuals with the 'same' trait, because they don't have the trait for the same genetic reason. And the sampled individuals' genomes vary in literally countless ways that have nothing to do with the disease. Unlike the score, the 'letters' are still in their original order, but genes don't make a score as far as we are concerned because, unlike an orchestra, we don't know how to 'play' them!
In a sense, each person we see who is playing the same tune, so to speak, is doing so from a different score. Some shared notes may be involved, but they are all jumbled up with shared, and not-shared, notes that have nothing to do with the tune.
And yet we are widely promised, and widely being made to pay for, the idea that by looking through the jumble of genetic 'notes' we can predict just about anything you can name about each individual's traits.
Indeed, unlike the composer's problem, there are all sorts of notes that are not even visible to us (they are called 'somatic mutations'). We yearn for a health-giving genomic 'tune', which is a very natural way to feel, but we are unable (or, at least, unwilling) to face the music of genomic reality.
And, of course, this mega-scale 'omics 'research' is all justified with great vigor by NIH, as if it is on the very verge of discovering fundamental findings that will lead to miraculous cures, indeed cures for 'All of us'. At what point is it justified to refer to it as a kind of culpable fraud, a public con job?
By our bigger, bigger, bigger approach, we have entrenched 'composers' trying to read scores that are to a great extent unreadable in the way being attempted. We are so intent on this, like rows of monks transcribing sacred manuscripts in a remote monastery, that we stay committed to something we have every legitimate reason to know isn't the way things are.
Monday, August 13, 2018
Big Data: the new Waiting for Godot
By Ken Weiss
In Samuel Beckett's cryptic play, Waiting for Godot, two men spend the entire play anticipating the arrival of someone, Godot, at which point presumably something will happen--one can say, perhaps, that the wait will have been for some achieved objective. But what? Could it simply mean that they can then go somewhere else? Or, perhaps, there will be no end because Godot will never, in fact, arrive.
A good discussion of all of this is on the BBC Radio 4 The Forum podcast. Apparently, Beckett insisted that any such answers were in the play itself--he didn't imply that there was some external meaning, such as that Godot was God, or that the play was an allegory for the Cold War--which is one reason the play is so enigmatic.
Was the play written intentionally to be a joke, or a hoax? Of course, since the author refused to answer, or perhaps even to recognize the legitimacy of the question, we'll never know. Or perhaps that, in itself, is the tipoff that it really is a hoax. Or maybe (I think more likely), because it was written in France in 1949, it's an existentialist-era statement of the angst that comes from the recognition that the important questions in life don't have answers.
Waiting for the biomedical Promised Land
That was then, but today we are witnessing real-life versions of the play: things just as cleverly open-ended, with the 'What happens then?' question only having a vague, deferred answer, as in Beckett's title. And, as in the play, it is not clear how self-aware even some of the perpetrators are of what they are about.
I refer to the possibility that we are witnessing various Big Data endeavors, unknowingly imitative but as cleverly and cryptically open-ended as the implied resolution that will happen when Godot arrives. Big Data 'omics is a current, perhaps all too convenient, scientific version of the play, that we might call Waiting for God'omics. The arrival of the objective--indeed, not really stated, but just generically promised as, for example, 'precision genomic medicine' for 'All of Us'--is absolutely as slyly vague as what Vladimir and Estragon were presumably waiting for. The genomic Godot will never arrive!
This view is largely but not entirely cynical, for reasons that are at least a bit subtle themselves.
Reaching the oasis, the end of the rainbow, or the Promised Land is bad for business
One might note that if the 'omics Godot were ever to arrive, it would be the end of the Big Data (or should one say Big Gravy?) train, so obviously our Drs Vladimirs and Estragons must ensure that such a tragedy--arrival at the promised land, the elimination of all diseases in everyone, or whatever--never happens in real life. Is there any sense that anyone seriously thinks we would reach resolution of the cause of disease, with precision for all of us, say, and be able (that is, willing) to close down the Big Budget nature of our proliferating 'omictical me-too world?
We have entrenched the search for Godot, a goal so vague as to be unattainable. Even the proper use of the term 'precision' implies an asymptote, a truth that one never reaches but can get ever closer to. If we could get there, as is implied, we should have been promised 'exact' genomic medicine. And wouldn't this imply that then, finally, we'll divert the resources towards cures and prevention?
However, even if the perpetrators of the Big Promises never think of it, or aren't aware of it, we must note that the goal cannot be reached even with the best and most honorable of intentions. Because of births and deaths, and environmental changes, and mutations and recombination, there truly never is a palm-draped oasis at which our venture could cease. There will never be an 'all' of us, and genetic causation is ever-changing (in part because of the similarly dynamic environment), meaning that there are no such things as risks to be approached with 'precision'. Risks are changeable, not stable, and indeed not fixed numerical values. At best, they are collective population (or sample) averages. So there is never a 'there' there, anywhere. There is only a different one everywhere.
But awareness of these facts doesn't seem to be part of the 'omicsalyptic promises with which we are inundated. They seem, by contrast, rote promises that are little if any different from political, economic, or religious promises--if only we do this, we'd get to a Promised Land. But such a land does not exist.
If we had, say, a real national health system, it would be properly and avowedly open-ended without anyone honorable objecting (if it were done well). And epidemiologically, of course, there will always be new mutations, recombinations, environments and the like to try to understand--diseases with, or without, strong genotype-phenotype causation. There will always be a need for health research (and basic science). But science, of all fields of human endeavor, should be honest. It should not hold out the promise that Godot will arrive but should, in a sense, openly acknowledge that that can never happen.
But this doesn't let off the guilty hook those who are hawking today's implicit Big Data, big open-ended budget promise that by goosing up research now we'll soon eliminate genetic disease (I recall that Francis Collins did indeed, not all that long ago, promise that this Paradise would come soon--um, I think his date was something like 2010!). It's irresponsible, self-interested promising, of course. And those in genomics who are intelligent enough to deserve to be in genomics do, or should, know that very well.
Like Vladimir and Estragon, we'll always be told that we're waiting for Godot, and that he'll be coming soon.
NOTE: One might observe that Godotism is a firmly entrenched strategy elsewhere in our society, for example in regard to theoretical physics, where there will never be a collider big enough to answer the questions about fundamental particles: coming to closure would be as fiscally threatening to physics as it is to the life sciences. Science is not alone in this, but our society does not pay it nearly enough skeptical heed.
[Image credit: www.mckellen.com]
Monday, June 1, 2015
Through a glass, very darkly: a comparison with genomics
By Ken Weiss
One of our favorite BBC Radio 4 programs is called In Our Time (available as a downloadable podcast or to play online). Every week, host Melvyn Bragg and three academic guests discuss some topic of interest, be it from science, philosophy, history, the arts or whatever, for about 40 minutes, explaining the topic and making it understandable and very interesting. This program is like a college education, except that it is free, eclectic, digestible...and without grades or exams!
The May 28 episode is about glass. What is glass? How is it made? What is its molecular structure? A key point of the discussion is that glass is a complex molecular structure, composed variously of sand or polymers, with other elements added, then heated, shaped, and cooled. The key fact is that the molecules are irregularly arranged: what gives glass its properties is that it is not a crystal, despite how it may seem. For a given chemical composition and heating/cooling process, the properties of a piece of glass are predictable and repeatable, but not in the sense of a crystal. Glass goes through a 'phase transition' from liquid to solid when it cools, but it's not an orderly transition such as is seen in water and many other substances, and in crystal formation.
In a crystal, all the molecules are precisely arranged in spatial order relative to each other. The arrangement is generally predictable (understandable theoretically) and repeatable. It has specifiable mathematical relationships. But what makes glass glass is that it is not arranged in that way. Instead, a piece of glass's molecules are uniquely or randomly arranged; this means that the structure of any piece of glass can't be predicted, although it can be determined post hoc. What that disorderly transition means is that no two vases or windows are identical at the molecular level, even if they have the same macro-properties. The 'impurities' that are added affect the arrangement of the molecules in ways that provide different strength, melting or cooling temperatures, color, or refraction of light passing through.
There are unlimited ways to make glass: adding different materials in different relative amounts, heated according to their properties and the desired result. The molecules in each of a set of dinner glasses are basically the same in relative proportions, but entirely unrelated in their arrangement. But how then can a factory turn out countless drinking glasses, jelly jars, or optical lenses that seem identical?
The reproducible randomness of glass
The answer essentially has to do with large numbers. It is much like, say, the ideal gas law, which says that the pressure or temperature of a gas in a container depends on the size of the container and the number of molecules of (any) gas inside it. Each molecule is careering around, banging into other molecules of the gas or into the walls of the container, and caroming off like billiard balls. It isn't possible to measure each encounter, as there are typically gazillions of them. But statistically, the number of collisions is roughly the same for a given set of conditions, so the net result is statistically highly predictable, without having to examine any individual molecule.
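For readers who want the actual formula being gestured at, the standard textbook form of the ideal gas law (nothing here is specific to the BBC discussion) is:

```latex
PV = nRT
% P: pressure, V: volume of the container, n: amount of gas (moles),
% R: the gas constant, T: absolute temperature
```

The point for the analogy is that this aggregate relation holds, and is all one needs for prediction, without tracking any individual molecule.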
Similar properties apply to glass, according to this BBC discussion. Each piece of glass has uncountable numbers of molecules, more or less randomly arranged. But there are so many of them that their net macro-scale properties are highly predictable, without having to examine any individual molecule. But that's no longer true at the molecular level.
Thinking about the nature of glass provides an illustrative way to think about another kind of assemblage, but one that has very different properties.
The irreproducible randomness of genomes
Genomes are molecules that interact with other molecules. This interaction has been shaped by evolutionary history to lead to the development of a fertilized seed or egg into an adult, and to respond to environmental conditions of various sorts. So the action or 'use' of genomes leads to changes in the organism. Some are gradual and 'normal', others are sudden and 'abnormal'; that is, we call the change 'disease' or 'puberty' and so on. Whether these could usefully be viewed as phase transitions I don't know. The changes are, in terms of genes, only partly defined, often even when we know there are risk alleles that alter the timing of such transitions.
As with glass, the complex relationship between any genotype and the organism's traits is empirical rather than wholly predictable from the 'molecules' (the genotypic elements). Similarly, enumerating the causal elements gets us only about as far as it does with glass. Some properties of glass, like strength and color, can be predicted by knowing its constituents, but these are collective properties, not predictable from enumerations of the individual atoms. Genomics may be a bit more predictive, but not all that different in most cases.
Overall, the analogy seems at best imperfect in specifics, but very useful as a way of thinking about the power, or limits of power, of enumerative prediction, and an indicator of collective, if individually unique, prediction.
Monday, March 2, 2015
When even well-posed questions are hard to answer
On Friday, in acknowledgement of Rare Disease Day, our daughter Ellen blogged about living with a rare disease. She wrote eloquently about her wish to understand why she has this disease, including, if it's a single gene disorder, knowing the causal variant. She wrote about the advantages of this when navigating a medical system that isn't always sensitive to rare diseases, but in which genetics has become the gold standard. We fully support her wish to understand why she has this disease, and have tried to help as much as we can. We would do the DNA work ourselves if we could.
Even so, she mentioned that her parents, Ken and I, are skeptics about a lot of genetic research. Yes, that's true, but another word for that is 'realist'. We are alive at a time in history when more is known about genes and genomes than ever before, and for decades we've been hearing promises of what this new knowledge will mean for medicine, and the promises roll on. Once we all have our genomes on a disk, we'll be able to predict and treat whatever it is our DNA foretells.
Except, except, Ellen's genome is on a disk. Or at least her exome, the protein-coding parts of her genome. Her disease, hypokalemic periodic paralysis, is one of several forms of periodic paralysis, which have been found to be associated with three different ion channel genes. At one time a researcher in Germany was offering free genotyping to anyone diagnosed with the disease. Ellen sent blood samples, but was told that she doesn't have any of the known causal variants in these genes. She was also involved in a large whole exome study of unexplained Mendelian disease, but all they were able to tell her was that she doesn't have any potentially causal de novo mutations, mutations that neither Ken nor I have. She is the only family member with HKPP, and as such, the initial question in a search for the cause is whether she has a variant that we don't have, that might be responsible.
And, to her frustration, that is all she knows. It has been suggested that she go the clinical genetics route, having her DNA tested for known causes of HKPP, but that seems unlikely to be helpful, given that she knows what disease she has, just doesn't know why, and clinical labs don't look for new causal genes or variants, but instead a battery of those that are known.
Ellen has classic symptoms and classic triggers, and her disease is pretty well controlled at the moment, so identifying the cause, as she wrote in her post, might not change her treatment, but it would ease her mind about future dealings with the medical system. As importantly, it might help future patients avoid the lengthy, destructive diagnostic odyssey she herself experienced, which itself would be a very satisfying outcome.
Big Data advocates will say that the problem is that not enough people with HKPP have been sequenced, and once we've got a million genomes or more, that will facilitate identifying Ellen's and others' causal variants. But only 1 in 200,000 people have HKPP, so one million is unlikely to help. And, though the data are rather sparse, some estimates based on those data suggest that a fairly large minority, a third or so, won't have one of the known causal genetic variants. As with most diseases, the phenotypes vary greatly, and again as with most diseases, this is likely to be because every genome is unique, and genetic background matters, along with exposure to other triggering factors.
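As a quick back-of-the-envelope check on why 'a million genomes' is unlikely to help here, take the 1-in-200,000 prevalence figure quoted above at face value; everything else below is just arithmetic:

```python
prevalence = 1 / 200_000      # approximate HKPP prevalence cited above
cohort_size = 1_000_000       # a 'million genomes'-style database

expected_cases = prevalence * cohort_size
print(expected_cases)         # ~5 affected individuals expected in the whole cohort
```

Five or so cases, themselves likely heterogeneous in cause, are not much raw material for gene mapping.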
Perhaps there's an as-yet unidentified gene that would explain many of the unidentified cases, or there are many unique pathways to the disease, or both, but given the rarity and the heterogeneity of the periodic paralyses, it would take a huge amount of luck for even a large database to answer Ellen's question. We should perhaps call it dumb luck, because the investigators vacuum up generic data without specific regard to, say, the physiology of this particular disorder (and the same for countless other disorders). Of course, collecting data on every possible physiological or environmental factor, mostly with weak individual effects, isn't possible and that is a dilemma for modern public health science.
In addition, it's known from affected families that penetrance of alleles related to the periodic paralyses is not 100% -- some people with a 'causal' variant never experience an attack, making associating genotype with phenotype even harder. Again, genetic background may affect this but, as with many genetic disorders with variable penetrance, it's not at all clear. Incomplete penetrance is a fact, but also a fudge factor, because it leaves the impression the trait really is 'genetic'; in fact, we often don't know how many people have such mutations but no symptoms at all, because they aren't screened (but some studies looking for such asymptomatic cases have easily found them, and they can be as common as the 'causal' mutations in affected patients).
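For readers unfamiliar with the term, 'penetrance' here can be read as a simple conditional probability (this is the standard textbook definition, not anything specific to the periodic paralyses):

```latex
\text{penetrance} \;=\; P(\text{affected} \mid \text{carries the variant})
```

Incomplete penetrance just means this probability is less than 1, and estimating it requires knowing the denominator--all carriers--which is exactly what is unknown when only symptomatic people get screened.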
Further, it's possible that there are non-ion-channel-related causes of these channelopathies. That is, something upstream is going wrong. In that case, it's unclear where even to begin to look for genetic causation. For example, hypothetically: ion channels respond to the ionic concentrations inside the cell and in its environs, so factors that affect those concentrations themselves could produce effects similar to ion channel defects per se. Again just surmising, there are known environmental triggers for attacks, but these may act on the ion concentrations themselves rather than on channel protein function. And, of course, both could be at work, which would be rather expected given the many precedents for disease complexity.
And, it's possible that Ellen's disease is polygenic, or not genetic at all, though given that many cases of periodic paralysis, including in families, seem to have a single genetic cause, this seems unlikely.
Genetics asks two basic questions: What causes disease X? And, who will get it? The promises of the past few decades are that answers to both of these questions are just around the corner for most diseases. The NIH Office of Rare Diseases Research reports that there are 7000 known rare diseases (by the US definition, diseases that each affect fewer than 200,000 people in the country). The cause of many of these diseases has been identified, and by some criteria over 6000 specific genes have been associated with some, usually rare, single-gene disorder. In many cases, it's possible to predict who will get the disease, and that is where genetic counseling is so useful. It is, in our view, also where our limited research resources should be directed.
But, if you read MT at all regularly, you know what we think about the promise of predicting common, complex diseases with genes. Current science is very far from answering the two simple questions, what causes common, complex disease X?, and who will get it? And, you know that we think that's because these questions can't be answered in any way approximating the promise of, say, precision medicine.
But single-gene disorders are a different kind of problem. What causes Ellen's HKPP? That seems to be a well-posed question, and should be answerable. But to date, it hasn't been. Labs are reporting 25-30% success in identifying the cause of rare genetic diseases (some report somewhat higher rates), so she is not at all unique. We commented last week on the problem of identifying specific at-risk subgroups more effectively than blanket epidemiological studies currently can.
Are we skeptics? Or are we realists? When even the 'easy' cases, like Ellen's, the low-hanging fruit, are hard, what does this mean about the promises for genomics?
[Image: Lazuli Bunting; rare birds in Central Pennsylvania; Wikipedia, Leander Sylvester Keyser]
Wednesday, February 4, 2015
Exploring genomic causal 'precision'
By Ken Weiss
Regardless of whether some geneticists object to the cost or scientific cogency of the currently proposed Million Genomes project, it is going to happen. Genomic data clearly have a role in health and medical practice. The project is receiving kudos from the genetics community, but it's easy to forget that many questions about the actual nature of genomic causation, and about the degree to which that understanding can in practice lead to seriously 'precise', individualized predictive or therapeutic medicine, remain at best unanswered. The project is inevitable if for no other reason than that DNA sequencing costs are rapidly decreasing. So let's assume the data, and think about what our current state of knowledge tells us we'll be able to predict from it all.
An important scientific (rather than political or economic) point in regard to recent promises is that, currently, disease prediction is actually not prediction but data-fitting retrodiction. The data reflect what has happened in the past to bearers of identified genotypes. Using the results for prediction is to assume that what is past is prologue, and to extrapolate retrospectively estimated risks to the future. In fact, the individual genomewide genotypes that have been studied to estimate past risk will never recur in the future: there are simply too many contributing variants for any sampled person's particular genotype ever to arise again.
Secondly, if the current theory underlying causation, and measures like heritability, is even remotely correct, the individual genomic factors that account for the bulk of the risk are inherited in a Mendelian way but do not, as a rule, cause traits that way. Instead, each factor's effects are genome-context-specific and act in a combinatorial way with the other contributing factors, including the other parts of the genome, the other cells in the individual, and more.
Thirdly, in general, most risk seems not to be due to inherited genetic factors or context, because heritability is usually far below 100%. Risk is due in large part to lifestyle exposures and interactions. It is very important to realize that we cannot, even in principle, know what current subjects' future environmental or lifestyle exposures will be, though we do know that they will differ in major ways from the exposures of the patients or subjects from whom current retrospective risks have been estimated. It is troubling enough that we are not good at evaluating, or even measuring, current subjects' past exposures, whose effects we are now seeing along with their genotypes.
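As a reminder of what 'heritability far below 100%' means formally, the standard quantitative-genetics definition of narrow-sense heritability (stated here generically, not as a claim about any particular disease) is the fraction of phenotypic variance attributable to additive genetic variance:

```latex
h^2 = \frac{V_A}{V_P} = \frac{V_A}{V_A + V_D + V_I + V_E}
% V_A: additive genetic variance, V_D: dominance, V_I: epistatic (interaction),
% V_E: environmental variance; gene-environment covariance and interaction terms omitted
```

When h^2 is well below 1, most of the phenotypic variance is, by definition, not captured by the additive genetic component on which these risk estimates lean.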
In a nutshell, the assumption underlying current 'personalized' medicine is one of replicability of past observations, and statistical assessments are fundamentally based on that notion in one way or another.
Furthermore, most risk estimates used are, perforce and for practical reasons, based essentially on additive models: add up the estimated risk from each relevant genome site (hundreds of them) to get the net risk. Depending on the analysis, this leaves little room for non-additive effects, because the effects are estimated statistically from a population of samples, and so on. These issues are well known to statisticians, perhaps less so to many geneticists, even if there are many reasons, good and bad, to keep them in the shadows. Biologically, as extensive systems analysis shows clearly, DNA functions by its coded products interacting with each other and with everything else the cell is exposed to. There is simply no reason to assume that within each individual those interactions are strictly additive at the mechanistic level, even if they are assessed (estimated) statistically from large population samples.
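To make the 'add it up' point concrete, here is a minimal sketch of the additive logic being described; the site names and effect sizes are invented for illustration, and this is not any particular published risk-score method:

```python
# Hypothetical per-allele effect estimates for a handful of genome sites
effects = {"site_1": 0.020, "site_2": -0.010, "site_3": 0.005}

# One person's allele counts (0, 1, or 2 copies of the scored allele) at those sites
genotype = {"site_1": 2, "site_2": 0, "site_3": 1}

# The additive model: the net score is just the sum of count * estimated effect
net_score = sum(genotype[site] * effects[site] for site in effects)
print(net_score)  # 0.045 -- any interaction among sites is invisible to this sum
```

Real genomewide scores do the same thing over hundreds or thousands of sites; the sum is easy to compute, but by construction it cannot represent the context-dependent, non-additive effects just described.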
For these and several other fundamental reasons, we're skeptical about the million genome project, and we've said that upfront (including in this blog post). But supporters of the project are looking at the exact same flood of genomic data we are, and seeing evidence that the promises of precision medicine are going to be met. They say we're foolish, we say they're foolish, but who's right? Well, the fundamental issue is the way in which genotypes produce phenotypes, and if we can parameterize that in some way, we can anticipate the realms in which the promised land can be reached, the ways in which it is not likely to be reached, and how best to discriminate between them.
Simulation
Based on work we've done over the past few years, one avenue we think should be taken seriously, which can be done at very low cost and could potentially save a large amount of costly wheel-spinning, is computer simulation of the data and of the approaches one might take to analyze them.
Computer simulation is a well-accepted method of choice in fields dealing with complex phenomena, such as chemistry, physics, and cosmology. It allows one to build various assumptions in (or out), and to see how they affect results. Most importantly, it allows testing whether the results match empirical data. When data are too complex, total enumeration of factors, much less analytical solution of their interactions, is simply not possible.
Biological systems have the kind of complexity that these other physical-science fields have to deal with. A good treatment of the nature of biological systems, and their 'hyper-astronomical' complexity, is Andreas Wagner's recent book The Arrival of the Fittest, which illustrates the types of known genetically relevant complexity that we're facing. If simulation of cosmic (merely 'astronomical') complexity is a method of choice in astrophysics, among other areas, it should be legitimate for mere genomics.
A computer simulation can be deterministic or probabilistic, and with modern technology can mimic most of the sorts of things one would like to know in the promised miracle era of genomewide sequencing of everything that moves. Simulation results are not real biology, of course, any more than simulated multiple galaxies in space are real galaxies. But simulated results can be compared to real data. As importantly, with simulation, there is no measurement or ascertainment error, since you know the exact 'truth', though one can introduce sampling or other sorts of errors to see how they affect what can be inferred from imperfect real data. If simulated parameters, conditions, and results resemble the real world, then we've learned something. If they don't, then we've also learned something, because we can adjust the simulations to try to understand why.
Many sneer at simulation as 'garbage in, garbage out'. That's a false defense of relying on empirical data, which we know are loaded with all sorts of errors. Just as with simulation, an empiricist can design samples and collect and analyze empirical data in garbage-in, garbage-out ways, too.
Computer simulation can be done at a tiny fraction of the cost of collecting empirical data. Simulation involves no errors or mistakes, no loss or inadvertent switching of blood samples, no problems due to measurement errors or imprecise definition of a phenotype, because you get the exact data that were simulated. It is very fast if a determined effort is made to do it. Even if the commitment is made to collect vast amounts of data, one can use simulation to make the best use of them.
Most simulations are built to aid in some specific problem. For example, under (say) a model of some small number of genes, with such-and-such variant frequency, how many individuals would one need to sample to get a given level of statistical power to detect the risk effects in a case-control study?
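As an illustration, here is a deliberately crude sketch of that kind of design simulation; the allele frequency, relative risk, baseline risk, sample sizes, and the simple two-proportion test are all arbitrary choices of mine, not anything from a real study design:

```python
import math
import random

def simulate_power(p=0.2, rr=1.3, base_risk=0.05,
                   n_cases=1000, n_controls=1000, reps=200, seed=1):
    """Crude estimate of the power to detect a single risk allele in a
    case-control study, by brute-force simulation (illustrative only)."""
    random.seed(seed)
    detections = 0
    for _ in range(reps):
        cases, controls = [], []
        # generate individuals until both groups are filled
        while len(cases) < n_cases or len(controls) < n_controls:
            g = sum(random.random() < p for _ in range(2))   # risk-allele count: 0, 1, or 2
            affected = random.random() < base_risk * (rr ** g)
            if affected and len(cases) < n_cases:
                cases.append(g)
            elif not affected and len(controls) < n_controls:
                controls.append(g)
        # compare case vs control allele frequencies with a two-proportion z-test
        f_case = sum(cases) / (2 * n_cases)
        f_ctrl = sum(controls) / (2 * n_controls)
        f_all = (sum(cases) + sum(controls)) / (2 * (n_cases + n_controls))
        se = math.sqrt(f_all * (1 - f_all) * (1 / (2 * n_cases) + 1 / (2 * n_controls)))
        if abs(f_case - f_ctrl) / se > 1.96:                 # two-sided alpha = 0.05
            detections += 1
    return detections / reps

print(simulate_power())  # fraction of simulated studies that 'detect' the allele
```

Changing the effect size, allele frequency, or sample sizes and re-running shows how quickly power rises or falls, which is exactly the kind of question such simulations are built to answer.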
In a sense, that sort of simulation is either quite specific or, in some cases, has been designed to prove some point, such as to support a proposed design in a grant application. These can be useful, or they can be transparently self-serving. But there is another sort of simulation, designed for research purposes.
Evolution by phenotype
Most genetic simulations treat individual genes as evolving in populations. They are genotype-based in that sense, essentially simulating evolution by genotype. But evolution is phenotype-based: individuals as wholes compete, reproduce, or survive, and the genetic variation they carry is or isn't transmitted as a whole. This is evolution by phenotype, and is how life actually works.
There is a huge difference between phenotype-based and gene-based simulation, and the difference is highly pertinent to the issues presently at stake. It is from multiple genetic variants, whether changing under drift or natural selection (almost always it's both, with the former having the greater effect), that we get the kind of causal complexity and elusive gene-specific causal effects that we clearly observe. And environmental effects need to be taken into account directly as well.
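To make the contrast concrete, here is a toy sketch of what 'evolution by phenotype' means; this is my own illustrative example, not the ForSim program mentioned below, and every setting in it is arbitrary:

```python
import random

N_LOCI, POP_SIZE, GENERATIONS = 200, 500, 100
EFFECTS = [random.gauss(0, 0.1) for _ in range(N_LOCI)]     # many small per-locus effects

def phenotype(genome):
    genetic = sum(allele * effect for allele, effect in zip(genome, EFFECTS))
    return genetic + random.gauss(0, 1.0)                    # plus an environmental term

# start with a population of random (haploid, for simplicity) genomes
pop = [[random.randint(0, 1) for _ in range(N_LOCI)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # selection acts on whole phenotypes: the upper half reproduces
    parents = sorted(pop, key=phenotype, reverse=True)[: POP_SIZE // 2]
    pop = []
    while len(pop) < POP_SIZE:
        mom, dad = random.sample(parents, 2)
        child = [random.choice(pair) for pair in zip(mom, dad)]   # free recombination
        if random.random() < 0.01:                                # occasional mutation
            site = random.randrange(N_LOCI)
            child[site] = 1 - child[site]
        pop.append(child)

print(sum(phenotype(ind) for ind in pop) / POP_SIZE)   # mean phenotype after selection
```

Because selection here only ever 'sees' whole phenotypes, different runs typically end up with different mixes of variants underlying similar average phenotypes, which is precisely the contrast with gene-by-gene simulation.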
I know this not just by observing the data that are so plentiful but because I and colleagues have developed an evolution by phenotype simulation program (it's called ForSim, and is freely available and open-source, so this is not a commercial--email me if you would like the package). It is one of many simulation packages available, which can be found here: NCI link. Our particular approach to genetic variation and its dynamics in populations has been used to address various problems and it can address the very questions at issue today.
With simulation, you try to get an idea of some phenomenon so complex or extensive that data in hand are inadequate or where there is reason to think proposed approaches will not deliver on what is expected. If a simulation gives results that don't match the structure of available empirical data reasonably closely, you modify the conditions and run it again, in many cases only requiring minutes and a desk-top computer. Larger or very much larger simulations are also easily and inexpensively within reach, without waiting for some massive, time-demanding and expensive new technology. Even very large-scale simulation does not require investing in high technology, because major universities already have adequate computer facilities.
Simulation of this type can include features such as:
* Multiple populations and population history, with specifiable separation depth, and with or without gene flow (admixture)
* The number of contributing genes, their length and spacing along the genome
* Recombination, mutation, and genetic drift rates, environmental effects, and natural selection of various kinds and intensities to generate variation in populations
* Additive, or function-based non-additive, single or multiple phenotype determination
* Single and multiple related or independent phenotypes
* Sequence elements that do and that don't affect phenotype(s) (e.g., mapping-marker variants)
Such simulations provide
* Deep known (saved) pedigrees
* Ability to see the status of these factors as they evolve, saving data at any point in the past (can even mimic fossil DNA)
* Each factor can be adjusted or removed, to see what difference it makes.
* Testable sampling strategies, including 'random', phenotype-based (case-control, tail, QTL, for GWAS, families and Mendelian penetrance, admixture, population structure effects)
* Precise testing of the efficacy, or conditions, for predicting retrofitted risks, to test the characteristics of 'precision' personalized medicine.
An important scientific (rather than political or economic) point in regard to recent promises, is that, currently, disease prediction is actually not prediction but data-fitting retrodiction. The data reflect what has happened in the past to bearers of identified genotypes. Using the results for prediction is to assume that what is past is prologue, and to extrapolate retrospectively estimated risks to the future. In fact, the individual genomewide genotypes that have been studied to estimate past risk will never recur in the future: there are simply too many contributing variants to generate each sampled person's genotype.
Secondly, if current theory underlying causation and measures like heritability is even remotely correct, the bulk of the risk associated with individual genomic factors are inherited in a Mendelian way but do not as a rule cause traits that way. Instead, each factor's effects are genome context-specific and act in a combinatorial way with the other contributing factors, including the other parts of the genome, and the other cells in the individual, and more.
Thirdly, in general, most risk seems not due to inherited genetic factors or context, because heritability is usually far below 100%. Risk is due in large part to lifestyle exposures and interactions. It is very important to realize that we cannot, even in principle know what current subjects' future environmental/lifestyle exposures will be, though we do know that they will differ in major ways from the exposures of those patients or subjects from whom current retrospective risks have been estimated. It is troubling enough that we are not good at evaluating, or even measuring current subjects' past exposures, whose effects we are now seeing along with their genotypes.
In a nutshell, the assumption underlying current 'personalized' medicine is one of replicability of past observations, and statistical assessments are fundamentally based on that notion in one way or another.
Furthermore, most risk estimates used are perforce for practical reasons based essentially on additive models: add up the estimated risk from each relevant genome site (hundreds of them) to get the net risk. Depending on the analysis, this leaves little room for non-additive effects, because things are estimated statistically from a population of samples, etc. These issues are well known to statisticians, perhaps less so to many geneticists, even if there are many reasons, good and bad, to keep them in the shadows. Biologically, as extensive systems analysis shows clearly, DNA functions work by its coded products interacting with each other and with everything else the cell is exposed to. There is simply no reason to assume that within each individual those interactions are strictly additive at the mechanistic level, even if they are assessed (estimated) statistically from large population samples.
For these and several other fundamental reasons, we're skeptical about the million genome project, and we've said that upfront (including in this blog post.) But supporters of the project are looking at the exact same flood of genomic data we are, and seeing evidence that the promises of precision medicine are going to be met. They say we're foolish, we say they're foolish, but who's right? Well, the fundamental issue is the way in which genotypes produce phenotypes, and if we can parameterize that in some way, we can anticipate the realms in which the promised land can be reached, ways in which it is not likely to be reached, and how best to discriminate between them.
Simulation
Based on work we've done over the past few years, one avenue we think should be taken seriously, which can be done at very low cost could potentially save a large amount of costly wheel-spinning, is computer simulation of the data and the approaches one might take to analyze it.
Computer simulation is a well-accepted method of choice in fields dealing with complex phenomena, as in chemistry, physics, and cosmology. Computer simulation allows one to build in (or out) various assumptions, and to see how they affect results. Most importantly, it allows testing whether the results match empirical data. When data are too complex, total enumeration of factors, much less analytical solutions to their interactions is simply not possible.
Biological systems have the kind of complexity that these other physical-science fields have to deal with. A good treatment of the nature of biological systems, and their 'hyper-astronomical' complexity, is Andreas Wagner's recent book The Arrival of the Fittest. This illustrates the types of known genetically relevant complexity that we're facing. If simulation of cosmic (mere 'astronomical' complexity) is a method of choice in astrophysics, among other areas, it should be legitimate for mere genomics.
A computer simulation can be deterministic or probabilistic, and with modern technology can mimic most of the sorts of things one would like to know in the promised miracle era of genomewide sequencing of everything that moves. Simulation results are not real biology, of course, any more than simulated multiple galaxies in space are real galaxies. But simulated results can be compared to real data. As importantly, with simulation, there is no measurement or ascertainment error, since you know the exact 'truth', though one can introduce sampling or other sorts of errors to see how they affect what can be inferred from imperfect real data. If simulated parameters, conditions, and results resemble the real world, then we've learned something. If they don't, then we've also learned something, because we can adjust the simulations to try to understand why.
Many sneer at simulation as 'garbage in, garbage out'. That's a false defense of relying on empirical data, which we know are loaded with all sorts of errors. Just as with simulation, an empiricist can design samples and collect empirical data in garbage-in, garbage-out ways, too.
Computer simulation can be done at a tiny fraction of the cost of collecting empirical data. Simulation involves no errors or mistakes, no loss or inadvertent switching of blood samples, no problems due to measurement errors or imprecise definition of a phenotype, because you get the exact data that were simulated. It is very fast if a determined effort is made to do it. Even if the commitment is made to collect vast amounts of data, one can use simulation to make the best use of them.
Most simulations are built to aid in some specific problem. For example, under (say) a model of some small number of genes, with such-and-such variant frequency, how many individuals would one need to sample to get a given level of statistical power to detect the risk effects in a case-control study?
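As a hedged illustration of that sort of purpose-built exercise (the allele frequencies, effect size, sample sizes, and thresholds below are arbitrary assumptions, not anyone's actual study design), the skeleton is: simulate many case-control samples under the assumed model, test each one, and count how often the effect is detected.

```python
import numpy as np

# Toy power calculation for a single variant in a case-control study.
# All parameters are illustrative assumptions, not estimates from real data.
np.random.seed(1)

n_cases, n_controls = 2000, 2000   # sample sizes to evaluate
freq_controls = 0.20               # risk-allele frequency in controls
freq_cases = 0.23                  # slightly higher frequency in cases (the 'effect')
n_reps = 2000                      # number of simulated studies
z_crit = 1.96                      # two-sided 5% significance threshold

detected = 0
for _ in range(n_reps):
    # Each person carries 2 alleles; count risk alleles in each group.
    case_alleles = np.random.binomial(2 * n_cases, freq_cases)
    control_alleles = np.random.binomial(2 * n_controls, freq_controls)
    p1 = case_alleles / (2 * n_cases)
    p2 = control_alleles / (2 * n_controls)
    # Two-proportion z-test on allele frequencies.
    pooled = (case_alleles + control_alleles) / (2 * n_cases + 2 * n_controls)
    se = np.sqrt(pooled * (1 - pooled) * (1 / (2 * n_cases) + 1 / (2 * n_controls)))
    if abs(p1 - p2) / se > z_crit:
        detected += 1

print(f"estimated power: {detected / n_reps:.2f}")
```

Change any of the assumed numbers and the answer changes, which is exactly the point: such a simulation tells you what your design can and cannot detect under the model you chose.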
In a sense, that sort of simulation is either quite specific or has, in some cases, been designed to prove some point, such as to support a proposed design in a grant application. These can be useful, or they can be transparently self-serving. But there is another sort of simulation, designed for research purposes.
Evolution by phenotype
Most genetic simulations treat individual genes as evolving in populations. They are genotype-based in that sense, essentially simulating evolution by genotype. But evolution is phenotype-based: individuals as wholes compete, reproduce, or survive, and the genetic variation they carry is or isn't transmitted as a whole. This is evolution by phenotype, and is how life actually works.
There is a huge difference between phenotype-based and gene-based simulation, and the difference is highly pertinent to the issues presently at stake. It is when multiple genetic variants change together, whether under drift or natural selection (almost always it's both, with the former having the greater effect), that we get the kind of causal complexity and elusive gene-specific effects that we clearly observe. And environmental effects need to be taken into account directly as well.
I know this not just by observing the data that are so plentiful but because I and colleagues have developed an evolution by phenotype simulation program (it's called ForSim, and is freely available and open-source, so this is not a commercial--email me if you would like the package). It is one of many simulation packages available, which can be found here: NCI link. Our particular approach to genetic variation and its dynamics in populations has been used to address various problems and it can address the very questions at issue today.
With simulation, you try to get an idea of some phenomenon so complex or extensive that the data in hand are inadequate, or where there is reason to think proposed approaches will not deliver what is expected. If a simulation gives results that don't match the structure of available empirical data reasonably closely, you modify the conditions and run it again, in many cases requiring only minutes on a desktop computer. Larger or very much larger simulations are also easily and inexpensively within reach, without waiting for some massive, time-consuming and expensive new technology. Even very large-scale simulation does not require investing in high technology, because major universities already have adequate computer facilities.
Simulation of this type can include features such as:
* Multiple populations and population history, with specifiable separation depth, and with or without gene flow (admixture)
* The number of contributing genes, their length and spacing along the genome
* Recombination, mutation, and genetic drift rates, environmental effects, and natural selection of various kinds and intensities to generate variation in populations
* Additive, or function-based non-additive, single or multiple phenotype determination
* Single and multiple related or independent phenotypes
* Sequence elements that do and that don't affect phenotype(s) (e.g., mapping-marker variants)
Such simulations provide:
* Deep known (saved) pedigrees
* Ability to see the status of these factors as they evolve, saving data at any point in the past (can even mimic fossil DNA)
* Each factor can be adjusted or removed, to see what difference it makes.
* Testable sampling strategies, including 'random', phenotype-based (case-control, tail, QTL, for GWAS, families and Mendelian penetrance, admixture, population structure effects)
* Precise testing of the efficacy, or conditions, for predicting retrofitted risks, to test the characteristics of 'precision' personalized medicine.
These are some of the things the particular ForSim system can do, listed only because I know what we included in our own program. I don't know much about what other simulation programs can do (but if they are not phenotype-based they will likely miss critical issues). It can be run on a desktop or, for grander scale, on some more powerful platform. Other features relating to issues that the currently proposed whole-genome sequencing effort implicitly raises could be built in by various parameter specifications or by program modification, at a minuscule fraction of the cost of launching off on new sequencing.
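To convey the flavor of 'evolution by phenotype' in the simplest possible code, here is a toy sketch, emphatically not ForSim, and with every number an arbitrary assumption (free recombination, truncation selection, a fixed environmental noise level): individuals carry variants at many sites, their phenotype is their genetic value plus environmental noise, and it is the whole phenotype, not any single gene, that determines who reproduces.

```python
import numpy as np

# Toy 'evolution by phenotype' sketch (not ForSim; all settings are arbitrary).
np.random.seed(0)

n_ind, n_sites, n_gen = 500, 100, 200
effects = np.random.normal(0, 1, n_sites)           # per-site effects on the phenotype
# Diploid allele counts (0/1/2) at each site, starting at frequency 0.5.
genos = np.random.binomial(2, 0.5, size=(n_ind, n_sites))

for gen in range(n_gen):
    # Phenotype = genetic contribution + environmental noise.
    phenotype = genos @ effects + np.random.normal(0, 5, n_ind)
    # Selection acts on the whole phenotype: the upper half reproduces.
    parents = genos[np.argsort(phenotype)[n_ind // 2:]]
    # Each offspring draws one allele per site from each of two random parents
    # (unlinked sites -- a deliberate simplification).
    moms = parents[np.random.randint(len(parents), size=n_ind)]
    dads = parents[np.random.randint(len(parents), size=n_ind)]
    genos = np.random.binomial(1, moms / 2) + np.random.binomial(1, dads / 2)

print("mean genetic value after selection:", (genos @ effects).mean())
```

Even this toy version makes the point: the individual sites drift and shuffle every generation, while selection only ever 'sees' the aggregate phenotype.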
Looking at this list of things to decide, you might respond, in exasperation, "This is too complicated! How on earth can one specify or test so many factors?" When you say that, without even yet pressing the 'Run' key, you have learned a major lesson from simulation! That's because these factors are, as you know very well, involved in evolutionary and genetic processes whose present-day effects we are being told can be predicted 'precisely'. Simulation both clearly shows what we're up against and may give ideas about how to deal with it.
Above is a schematic illustration of the kinds of things one can examine by simulation and check with real data. In this figure (related to work we've been involved with), mouse strains were selected for 'opposite' trait values and inbred; a founding few from each strain were then crossed, and their descendants intercrossed for many generations to let recombination break up gene blocks. Markers identified in the sequenced parental strains are then used to map variation causally related to the strains' respective inbred traits. Many aspects and details of such a design can be studied with the help of such results, and there are surprises that can guide research design (e.g., there is more variation than the nominal idea of inbreeding and 'representative' strain-specific genome sequencing generally assumes, among other issues).
Possibilities in, knowledge out
As noted above, it is common to dismiss simulation out of hand, because it's not real data, and indeed simulations can certainly be developed that are essentially structured to show what the author believes to be true. But that is not the only way to approach the subject.
A good research simulation program is not designed to generate any particular answer, but just the opposite. Simulation done properly doesn't even take much time to get to useful answers. What it gives you is not real data but verisimilitude--when you match real data that are in hand, you can make sharper, focused decisions on what kinds of new data to obtain, or how to sample or analyze them, or, importantly, what they can actually tell you. Just as importantly, if not more so, if you can't get a good approximation to real data, then you have to ask why. In either case, you learn.
Because of its low relative cost, the preparatory use of serious-level simulation should be a method of choice in the face of the kinds of genomic causal complexity that we know constitutes the real world. Careful, honest use of simulation to know about nature and as a guide is one real answer to the regularly heard taunt that if someone doesn't have a magic answer about what to do instead, s/he has no right to criticize business as usual.
Simulation, when not done just to cook the books in favor of what one already is determined to do, can show where one needs to look to gain an understanding. It is no more garbage in, garbage out than mindless data collection, but at least when mistakes or blind alleys are found by simulation, there isn't that much garbage to have to throw out before getting to the point. Well-done simulation is not garbage in, garbage out, but a very fast and cost-effective 'possibilities in, knowledge out'.
Our prediction
We happen to think that life is genetically as complex as it looks from a huge diversity of studies, large and small, on various species. One possibility is that this complexity implies there is in fact no short-cut to disease prediction for complex traits. Another is that some clever young persons, with or without major new data discoveries, will see a very different way to view this knowledge, and suggest a 'paradigm shift' in genetic and evolutionary thinking. Probably more likely is that, if we take the complexity seriously, we can develop a more effective and sophisticated approach to understanding phenogenetic processes, the connections between genotypes and phenotypes.
Thursday, November 13, 2014
Evolution of malaria resistance: 70 years on...and on....and on
By
Ken Weiss
It was about 70 years ago that the complex interplay of anemia, malaria, and genetics, and their relation to hemoglobin, was first beginning to be understood. Sickle cell anemia and its association with a globin gene variant, and similar associations between malarial susceptibility and other genes (such as G6PD, Duffy, and other globin gene mutations), were rapidly identified in roughly the same decades. The findings showed that in areas of the world with long-endemic malaria, various gene mutations seemed to be at high frequency, as if they protected against malaria. I was never involved in this directly, but I studied under Frank Livingstone and James V Neel at Michigan, two of the leaders in understanding the evolution of these protective mechanisms.
For decades we have had direct clinical evidence, mainly in Africa, but also in Sardinia, and then later in other places including southeast Asia, that at least some of the putatively protective mutations in the alpha and beta globin, and other genes did in fact protect against malaria, but that they had side effects such as various forms of anemia or other problems. Even then most of the evidence was circumstantial and based on geographic correlations.
The idea of a balanced polymorphism was suggested in regard to these variants. If you had two 'malaria-protective' alleles at the gene (one in each copy of the gene that you have), you were vulnerable to anemia, but if you had two 'normal' alleles you were susceptible to malaria; however, having one of each (a heterozygote genotype) you had some protection against both malaria and anemia. Evolution favored keeping both variants in the population, because selection worked against both homozygotes.
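For readers who like to see the arithmetic, here is a minimal sketch of that classical textbook result, with invented selection coefficients standing in for the real (and much-debated) fitness costs: when both homozygotes are less fit than the heterozygote, the allele frequency settles at an intermediate equilibrium rather than going to 0 or 1.

```python
# Classical heterozygote-advantage (balanced polymorphism) recursion.
# Selection coefficients here are invented for illustration only.
s = 0.10   # fitness cost of the 'normal' homozygote (malaria susceptibility)
t = 0.80   # fitness cost of the 'protective' homozygote (severe anemia)

p = 0.01   # starting frequency of the protective allele
for _ in range(500):
    q = 1 - p
    w_bar = p * p * (1 - t) + 2 * p * q * 1.0 + q * q * (1 - s)  # mean fitness
    p = (p * p * (1 - t) + p * q) / w_bar                        # next generation

print(round(p, 3))              # converges to roughly 0.111
print(round(s / (s + t), 3))    # the textbook equilibrium s/(s+t), also ~0.111
```

Selection against both homozygotes keeps both alleles around indefinitely, which is why such variants can sit at appreciable frequencies in malarial regions.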
Far beyond malaria: Relationship to fundamental evolutionary questions
The idea of balanced polymorphisms played into a major theoretical argument among evolutionary biologists at the time, and sickle cell anemia became a central case in point and a stereotypical classroom example. But the broader question was quite central to evolutionary theory. Balancing selection was, for many biologists who held a strongly selectionist version of Darwinism, the explanation for why there was so much standing genetic variation not just in humans but, apparently, in all species.
The theory had been that harmful mutations (the majority) are quickly purged, so the finding that there was widespread variation (polymorphism) in nature at gene after gene, the result of the type of genotyping possible then (based on protein variation), demanded explanation; balanced polymorphism provided it. This was countered by a largely new, opposing view called 'non-Darwinian' evolution, or the 'neutral' theory; it held that much or even most genetic variation had no effect on reproductive success, and that the frequencies of such variants changed over time by chance alone, that is, by 'genetic drift'. This seemed heretically anti-Darwinian, though that was a wrong reaction, and only the most recalcitrant or rabid Darwinist today denies that much of observed genomic variation evolves basically neutrally. But many saw the frequencies of variants associated with what were seen as serious recessive diseases, like PKU and Cystic Fibrosis (and others), as the result of balancing selection.
In support of the selectionist view, many variants have been found in the globin and other genes for which the frequency of one or more alleles is correlated geographically with the presence (today, at least) of endemic malaria. But there are lots of variants that might be correlated with geography for other reasons, because geographic patterns are themselves often correlated with population history. Thus, the correlations are often empirical but not clearly causal. Indeed, not many variants have been clearly shown, experimentally or clinically, actually to be functionally related to malaria resistance.
In this light it is interesting to see a rather large-scale attempt at testing whether putative malaria-associated variants really are protective. The paper ("Reappraisal of known malaria resistance loci in a large multi center study") by a large consortium of authors is in the November 2014 Nature Genetics; it is paywalled so if you don't have direct access but would like to read it, I'd be happy to email a pdf.
These authors compiled large data sets from different areas of the world which have endemic malaria caused by the specific falciparum subtype of parasite, and compared the frequency of the many candidate gene variants in sufferers of severe malaria to a large set of unaffected controls (of course some of them may later become affected).
A long time coming...and the clock still ticking
Even now, 70 years after the first ideas were suggested, we still have scant direct clinical data showing protection at a mechanistic level, so the results of this paper are still statistical. But they are at least from a reasonably designed and specific study. The authors found positive statistical associations for some of the most clear-cut classical risk alleles (sickle cell, G6PD, O blood group), but ambiguous or variable evidence even for some of these, and no statistical evidence for many other putatively causal or protective variants. Further, they found that some variants had different effects in males and females, and one SNP, in the CD40LG gene, previously found to be associated with severe malaria, was associated with reduced risk in The Gambia but significantly increased risk in Kenya. Whether this is just statistical variation or an indicator of other aspects of these local-area genomes isn't clear.
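For concreteness, this is roughly what a 'positive statistical association' amounts to in such a study, sketched here with invented allele counts rather than the paper's data: an allelic odds ratio and its confidence interval, compared between cases and controls.

```python
import math

# Invented 2x2 allele counts (NOT from the paper), just to show the arithmetic.
#                  risk allele   other allele
cases    = [400, 1600]   # alleles counted among severe-malaria cases
controls = [600, 1400]   # alleles counted among controls

a, b = cases
c, d = controls

odds_ratio = (a * d) / (b * c)
# 95% confidence interval on the log odds ratio (Woolf's method).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# With these invented counts the OR is about 0.58: an allele apparently 'protective',
# but still only a statistical statement, not a mechanism.
```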
The evidence in the positive instances is persuasive, even if just statistical, but the conflicting results and the surprising lack of findings for so many is curious as well as discouraging. How can it be that so long on, we still basically don't even know if a genetic variant is protective or not, other than the most classical ones? This shows how very challenging even 'simple' causation can be.
This raises the basic evolutionary issue in a different way. Darwin was convinced that adaptive evolution was very slow. One major reason was that rapid changes of species or adaptations were rarely observed (still true), and if they occurred they could be interpreted as acts of special creation rather than natural events. Adaptive evolution under human direction, as in agricultural breeding, clearly brings about easily measured change. But some forms of natural selection could be quite strong. Adaptive coloration is one example, and malaria should be another, because it is so common and has so strong a negative effect on health. So basic evolutionary arguments ought, it was long hoped, to demonstrate that, in this instance, balancing selection was a correct explanation of at least these polymorphisms.
In past work, one hemoglobin variant (called hemoglobin E) has apparently been sweeping across southeast Asia because there was no down side to being an EE homozygote, and it protected against malaria. But generally, the actual selective effect has been very hard to prove. The new study shows this in a sobering way. Is the story right? Have prior speculations about protective mutations been too superficially offered, and incorrect? Is the selective effect so small even in relation to malaria, that we can't see it even with samples large enough that 'nature' could have made a detectable selective difference? Or, if so gradual in a Darwinian sense, do these other mutations really make an evolutionary difference?
Several relevant points are, first, that this study only looked at one form of malaria (P. falciparum), and second, that the different putative protective genes are involved in different physiological pathways. And, as even the authors note, current patterns of disease, when antimalarial drugs are widely used, may not reflect patterns in the past, and thus it may not be possible to conclude that P. falciparum was the selective force these results suggest it may have been, plausible though that seems to be. These points suggest that even here, complexity and subtlety are involved.
Beyond evolutionary theory
More sobering than the difficulty of detecting evolutionary or even genuine physiological differences among these various genotypes is the further fact that, even for these major and rather clear causal sites, there is still basically no progress toward effective gene-based therapy. After all, the target cells are in blood (generally, red cells), among the most easily accessible of all tissues. Given the unrestrained promises repeatedly being made by the genomewide-do-everything industry, this is (or should be) a very sobering thought. Our technological tools should, one might expect, have been able to solve such comparatively clear-cut problems.
To us, this 'failure' indicates the subtlety of genome physiology. Given the hundreds of putatively causal single-gene findings by GWAS and other means, where the evidence has seemed strong, we should be showing that genomic data are, after all the expense and effort, really worth gathering. We should be making a definitive, and one might say systematic, march toward elimination of these genetic threats, perhaps the way vaccines have done against many infectious diseases. If we could actually do that, and speak of cures and prevention rather than just risk-estimation of countless minor factors, then nobody would disagree that further genomic big-science efforts were worth the investment.
Meanwhile, more than 70 years on, the largely failed effort to use that knowledge directly to rid our species of a disease that has been estimated to have killed more human beings than any other single cause, shows how far we have to go--and how important new sorts of thinking could potentially be to the effort.
And, into the bargain, perhaps we're learning a lot about how adaptive evolution works, reinforcing Darwin's ideas about its slowness, about multiple alternative or interactive pathways, and more.
Wednesday, August 21, 2013
The beetle age
By
Ken Weiss
If the way of ushering in modern science in the 1600s was largely based on instruments such as lenses and clocks, the 1700s became an age of exploration, as recently characterized by a BBC Radio 4 series on the history of science (7 Ages of Science, episode 2). Extensive commercial, imperial, and military shipping (thanks to better ship-building and navigational instruments) enabled the gathering of plant, animal, and rock specimens from around the world; Darwin and many others were examples. In biology that time is now often dismissively referred to as the age of beetle collecting, to denote the unconstrained and often unsystematic assembly of specimens.
As part of this, in the mid-1700s the development of better telescopes led to huge advances in understanding the stars and their motion. Among the pioneers of this work were William Herschel and his sister Caroline. They discovered Uranus, then the most distant planet known, as well as galaxies, and began to reveal to us for the first time how immense the universe is.
However, other questions also arose, or so one would think. But when Herschel said to the Astronomer Royal, "I want to know what the stars are made of," the latter replied, somewhat annoyed, "What we're interested in is mapping."
Modern mapping
Once, about 15 years ago, I was on a site visit to evaluate and make a funding decision on whether to award the requested grant. The investigator was describing how the trait, call it Disease X, would be investigated with the latest GWAS tools. The argument was that one required larger sample sizes than had been available before to identify spots in the genome that might be potential causes of X, and that is what the investigator proposed to do.
There had already been smaller studies that had rather convincingly identified about 5 or so genes that seemed to lead to high risk of the disease. The mechanisms by which these genes might be causal were, however, not at all clear, nor could anything be done about the problem for persons carrying the seemingly causal variants in these genes. By analogy to Herschel's question to the Astronomer Royal (what were these genes made of?), when I asked why the investigator was proposing to search for even weaker signals than the ones already known, rather than working on understanding how the latter worked, the investigator replied: "Because mapping is what I do."
In the 15 years since then, lots of mapping has been done in this spirit, and a modest number of additional genes or possible-genes have been found, but no real progress has been made in the treatment or prevention of X, nor has there been much increased understanding of the mechanisms of the previously known genes, nor any gene-based therapy.
This reveals the attitude of so many in science today. "What I do is collect [specify some form of] Big Data!" 'Big Data' has become a fashionable term that, when dropped, may suggest gravitas or insight. But we aren't doing astronomy, and while we certainly are highly ignorant about much of the nature of genomes, when it comes to funding the study of disease for purposes of public health, this is a lame excuse for business as usual. What we need are more instances of what is (also fashionably) called proof of principle: proof that knowing about risk-conferring genetic variants leads to doing something about them. That's very tough, and we have precious few such instances in genetics, but we do have enough to suggest, to us, that we should not be funding further gene-gazing expeditions into the astronomical realm of genomic complexity, at the astronomical costs such studies entail. We should instead be focusing and intensifying efforts to learn how to do something about the problems caused by the genes we already know about, and there are many, or about the problems that really do seem to be 'genetic'.
We are not supposed to be just star gazing with public health funds; society expects and is promised benefits (whereas only NASA expects benefits, in the form of more funding, for star-gazing; for the rest of us, it's basically no different from watching science fiction on television).
Science requires data, and we never have enough. There are areas where modern style beetle collecting is still very much worth doing, because our knowledge is sufficiently rudimentary. But there are areas in which we've done enough of that, and the challenge is to concentrate resources on mechanism and intervention. In many instances, that really has nothing to do with technology, but has everything to do with lifestyle, because we have clear enough understanding to know that the diseases are largely the result of environmental exposure (too many calories, smoking, and so on). Funding for public health should go there, in those important instances.
From a genome point of view there are certainly areas where primary data-collection is still crucial. To take one example, it has been clear for decades that we don't yet have nearly a clear enough idea of how mutations arise in cells during development and subsequent life, nor how those mutations lead to disease and its distribution by age and sex, including interacting with environmental components and each other. But identifying mutations and their patterns in our billions of cells, as they arise by age, and how they affect cells is a major technological challenge, far harder than collecting larger case-control studies for traits we've already studied that way before.
Monday, October 1, 2012
Be afraid of fear, not personal genomics.
It's just the way it is now. This headline. This story.
It follows the recipe. (1) Start with a headline that demonstrates controversy. (2) Present a story about science-related news (which does not require controversy to be news). (3) End it ever-so briefly and vaguely with dissent, doubt, outcry or warning.
This recipe applied to personal genomics is particularly bad.
If you read or hear that story you might be primed before you start to wonder, okay what's the worry? Glad this article will tell me, finally, what we should be concerned about concerning this brave new world of personal genomics.
But you'll be sorry when you reach the end and this is all you get:
But the idea of widespread sequencing is setting off alarm bells. How accurate are the results? How good are doctors at interpreting the results, which are often complicated and fuzzy? How well can they explain the subtleties to patients? The fear is that a lot of people could end up getting totally freaked out for no reason. And there are concerns about privacy. Scientists recently even sequenced a fetus in the womb, raising the possibility of everyone getting sequenced before or at birth — a prospect with a whole new set of questions and concerns. "I think there are lots of populationwide and individual dangers," said Mark Rothstein, a bioethicist at the University of Louisville. "We're basically not ready for a society in which very exquisite, detailed genomic information about every individual, potentially, is out there."

Why? Tell us? And I don't mean the "us" who have access to the academic journals. Or the "us" who have the patience to bushwhack through the jargon. I mean, here is your chance to share with the public who you're concerned about: Since you brought it up, tell us why we should worry.
It's unclear who deserves the complaints and the criticism for producing pieces like this, since much of the "telling us" that I'm begging for might be lying on the cutting room floor.
I'm clenched about this because right now about 20 students in my Human Variation (Anthropology 350) course at the University of Rhode Island are voluntarily participating in genotyping through 23andMe. And I'm using this curriculum for the second semester now. After last spring, when over 100 students in both Human Variation and also the introductory-level Human Origins (Anthropology 201) did 23andMe, not one student got "totally freaked out." This, along with much of my experience with genotyping and undergraduates, indicates that, with education and with understanding, personal genomics does not induce fear. Not coincidentally, participating in personal genomics aids in education.
And the fear that I'm trying to mitigate through education is the same fear that some journalists and ethicists seem to be perpetuating, if not creating.
In my experience, if you're informed, you're likely to appreciate biological complexity rather than cling to genetic determinism. If you're informed, you understand the positive and negative consequences and aspects of personal genomics. If you're informed, you don't get lured into personal genomics for all the wrong reasons. You don't order an expensive 23andMe spit kit as if it's snake oil. You don't send your vial of saliva to California, along with 300 of your precious bucks, because you think it will help you to live a longer, healthier life, or because you think it will show you your future. Spit kits are not crystal balls, are not medicine, are not cures. Plus, the results will almost certainly change! Not your genotypes, but how they're interpreted. That genomes must even be "interpreted" should be a flag, shouldn't it?
Informed citizens and consumers don't buy into personal genomics thinking it's their one and only answer-- their key to "me"-- because "me" will be increasingly different the more we learn about genetics and the links between genotypes and phenotypes. "Me" is, for most, too stubborn and conservative, while at once too big and too free, to be dictated by genotypes and probabilistic phenotypes.
All that is guaranteed with a 23andMe spit kit is that you will see parts of yourself that you haven't seen before. There's not a whole lot on the planet that's cooler than that. For most of us who will never go to Mars, at least we've got this, at least we've got innerspace.
Even if you don't get an ounce of joy from the experience, when you're informed you don't fall uncritically for claims that spit kits are dangerous or venomous.
Considering the engaging educational opportunities provided by personal genomics, considering its power to inform, spit kits may just be much-needed anti-venom.
In my experience education diminishes fear about genetic determinism because it diminishes genetic determinism. That leads me to see fear of personal genomics as a symptom of ignorance. And that's something worth being afraid of.
**
Note: Ken, Anne and I have differing views on direct-to-consumer (DTC) personal genomics like 23andMe so please remember that I speak only for myself when I write. Also, I am not paid or sponsored by 23andMe to endorse their product. I use their product, at the educational rate, to teach anthropology at the University of Rhode Island.
Friday, July 27, 2012
Genomic scientists wanted: Healthy skepticism required
Everyone makes mistakes
...but geneticists make them more often. A Comment in this week's Nature, "Methods: Face up to false positives" by Daniel MacArthur and accompanying editorial are getting a lot of notice around the web. MacArthur's point is that biologists are too often too quick to submit surprising results for publication, and scientific journals too eager to get them into print. Much more eager than studies that report results that everyone expected.
This is all encouraged by a lay press that trumpets these kinds of results often without understanding them and certainly without vetting them. Often results are simply wrong, either for technical reasons or because statistical tests were inappropriate, wrongly done, incorrectly interpreted or poorly understood. The evidence of this is that journals are now issuing many more retractions than ever before.
Peer review catches some of this before it's published, but not nearly enough; reviewers are often overwhelmed with requests and don't give a manuscript enough attention or sometimes aren't in fact qualified to do so adequately. And journal editors are clearly not doing a good enough job.
But, as MacArthur says, "Few principles are more depressingly familiar to the veteran scientist: the more surprising a result seems to be, the less likely it is to be true." And, he says, "it has never been easier to generate high-impact false positives than in the genomic era." And this is a problem because:

"Flawed papers cause harm beyond their authors: they trigger futile projects, stalling the careers of graduate students and postdocs, and they degrade the reputation of genomic research. To minimize the damage, researchers, reviewers and editors need to raise the standard of evidence required to establish a finding as fact."

It's, as the saying goes, a perfect storm. The unrelenting pressure to get results that will be published in high-impact journals, and then The New York Times, which can make a career -- i.e., get a post-doc a job or any researcher more grants, tenure, and further rewards -- combined with journals' drive to be 'high-impact' and newspapers' need to sell newspapers, all discourages time-consuming attention to detail. And, as a commenter on the Nature piece said, in this atmosphere "any researcher who [is] more self-critical than average would be at a major competitive disadvantage."
That time-consuming attention to detail would include checking and rechecking data coming off the sequencer, questioning surprising results and redoing them, driven by the recognition that even the sophisticated technology biologists now rely on for the masses of data they are analyzing can and does make mistakes. Which is why sequencing is often done 30 or more times before it's deemed good enough to believe. But doing it right takes money as well as time.
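The 'sequence it 30 or more times' point can be made roughly quantitative with a toy calculation. The per-read error rate below is an assumption for illustration, and real sequencing errors are not independent, so this understates the problem, but it shows why depth matters so much for believing a base call.

```python
from math import comb

def prob_majority_wrong(depth, per_read_error):
    """Probability that more than half of the reads at a site carry the wrong base,
    assuming independent errors (a deliberate simplification)."""
    return sum(comb(depth, k) * per_read_error**k * (1 - per_read_error)**(depth - k)
               for k in range(depth // 2 + 1, depth + 1))

# Assumed 1% per-read error rate; depths from a single read up to 30x coverage.
for depth in (1, 5, 10, 30):
    print(depth, f"{prob_majority_wrong(depth, 0.01):.2e}")
```

The chance of a majority-wrong call drops by orders of magnitude with depth, which is part of why deep coverage costs money but buys believability.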
Skepticism required
And a healthy skepticism (which we blogged about here), or, as the commenter said, some self-criticism. You don't have to work with online genomic databases very long before it becomes obvious -- at least to the healthy skeptic -- that you have to check and recheck the data. Long experience in our lab with these databases has taught us that they are full of sequence errors that aren't retracted, annotation errors, incorrect sequence assemblies, and so on. And results based on incorrect data are published and not retracted, but are very obvious to, again, the healthy skeptic who checks the data. MacArthur cautions researchers to be stringent with quality control in their own labs, which is essential, but they also need to be aware that publicly available data are not error-free, so results from comparative genomics must be approached with caution as well.
We've blogged before about a gene mapping study we're involved in. We've approached it as skeptics, and, we hope, avoided many common errors that way. This of course doesn't mean that we've avoided all errors, or that we'll reach important conclusions, but at least our eyes are open.
But just yesterday we ran into another instance of why that's important, and how insidious database errors can be. We are currently characterizing the SNPs (variants) in genes that differ between the strains of mice we're looking at to try to identify which are responsible for morphological differences between them.
The UCSC genome browser, an invaluable tool for bioinformatics, can show in one screen the structure of a gene of choice for numerous mammals. One of the ways a gene is identified is by someone having found a messenger RNA 'transcript' (copy) of the DNA sequence. That shows that the stretch of DNA that looks as if it might be a gene actually is one. We were looking at a gene that our mapping has identified as a possible candidate of interest and noticed that it was much, much shorter in mice than in any of the other mammals shown. If we had just asked the database for mouse genes in this chromosome region, we'd have retrieved just this short transcript. We might have accepted this without thinking and moved on, but this is a very unlikely result given how closely related the listed organisms are, so we knew enough to question the data.
But we checked the mouse DNA sequence and other data and, sure enough, longer transcripts corresponding more closely to what's been reported in other mammals have been reported in mice. And additional parts of the possible gene, that correspond to what is known to be in other mammal transcripts, also exist in the mouse DNA. This strongly suggests that nobody has reported the longer transcript, but that it most likely exists and is used by mice. Thus, variation in the unreported parts of the mouse genome might be contributing to the evidence we found for an effect on head shape. But it took knowledge of comparative genomics and a healthy skepticism to figure out that there was something wrong with the original data as presented.
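The cross-species sanity check described above can be automated in spirit, even though the real judgment happens while staring at the browser. Here is a hedged sketch with made-up transcript lengths (none of these numbers are our actual data): flag any species whose annotated transcript is far shorter than the cross-species median, as a prompt to go look more closely before trusting the annotation.

```python
from statistics import median

# Illustrative annotated transcript lengths (bp) for one gene across species.
# These numbers are invented; in practice they would come from a browser export.
transcript_lengths = {
    "human": 4100, "chimp": 4050, "dog": 3900, "rat": 4000, "mouse": 1200,
}

typical = median(transcript_lengths.values())

# Flag annotations less than half the cross-species median as worth re-checking.
for species, length in transcript_lengths.items():
    if length < 0.5 * typical:
        print(f"{species}: annotated transcript {length} bp vs median {typical} bp "
              "-- check whether a longer transcript simply hasn't been reported")
```

A crude filter like this doesn't replace the skeptic's eye, but it makes sure the suspicious cases get a second look rather than being accepted without thinking.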
Not a new realization
There is a wealth of literature showing many reasons why first-reports of a new finding are likely to be misleading--either wrong or exaggerated. This is not a matter of dishonest investigators! But it is a matter of too-hasty ones. The reason is that if you search for things, those that by pure statistical fluke pop out are the ones that are going to be noticed. If you're not sufficiently critical of the possibility that they are artifacts of your study design, and you take the results seriously, you will report them to the major journals. And your career takes off!
A traditional football coach once said of forward passes that there are three things that can happen (incomplete, complete, intercepted) and only one of them is good... so he didn't like to pass. Something similar applies here: If you are circumspect, you may
1. later have the let-down experience of realizing that there was some error--not carelessness, just aspects of luck or things like problems with the DNA sequencer's ability to find variants in a sample, and so on.
2. Then you don't get your first Big Story paper, much less the later ones that refine the finding (that is, acknowledge it was wrong without actually saying so).
3. Worse, if it's actually right but you wait til you've appropriately dotted your i's and crossed your t's, somebody else might find the same thing and report it, and they get all the credit! You may be wrong and later data dampens your results, but nobody remembers the exaggeration, Nature and the NY Times don't retract the story, your paper still gets all the citations (nobody 'vacates' them the way Penn State's football victories were vacated by the NCAA), you already got your merit raise based on the paper.....you win even when you lose!
So the pressures are on everyone to rush to judgment, and the penalties are mild (here, of course, we're not talking about any sort of fraud or dishonesty). Again, many papers and examples exist pointing the issues out, and the subject has been written about time and again. But in whose interest is it to change operating procedures?
Even so, it's refreshing to see this cautionary piece in a major journal. Will it make a difference? Not unless students are taught to be skeptical about results from the very start. And the journals' confessions aren't sincere: Tomorrow, you can safely bet that the same journals will be back to business as usual.
...but geneticists make them more often. A Comment in this week's Nature, "Methods: Face up to false positives" by Daniel MacArthur and accompanying editorial are getting a lot of notice around the web. MacArthur's point is that biologists are too often too quick to submit surprising results for publication, and scientific journals too eager to get them into print. Much more eager than studies that report results that everyone expected.
This is all encouraged by a lay press that trumpets these kinds of results often without understanding them and certainly without vetting them. Often results are simply wrong, either for technical reasons or because statistical tests were inappropriate, wrongly done, incorrectly interpreted or poorly understood. The evidence of this is that journals are now issuing many more retractions than ever before.Peer review catches some of this before it's published, but not nearly enough; reviewers are often overwhelmed with requests and don't give a manuscript enough attention or sometimes aren't in fact qualified to do so adequately. And journal editors are clearly not doing a good enough job.
But, as MacArthur says, "Few principles are more depressingly familiar to the veteran scientist: the more surprising a result seems to be, the less likely it is to be true." And, he says, "it has never been easier to generate high-impact false positives than in the genomic era." This is a problem because, as he writes:
"Flawed papers cause harm beyond their authors: they trigger futile projects, stalling the careers of graduate students and postdocs, and they degrade the reputation of genomic research. To minimize the damage, researchers, reviewers and editors need to raise the standard of evidence required to establish a finding as fact."
It's, as the saying goes, a perfect storm. The unrelenting pressure to get results that will be published in high-impact journals, and then in The New York Times -- which can make a career, i.e., get a post-doc a job or get any researcher more grants, tenure, and further rewards -- combined with journals' drive to be 'high-impact' and newspapers' need to sell newspapers, all discourage time-consuming attention to detail. And, as a commenter on the Nature piece said, in this atmosphere "any researcher who [is] more self-critical than average would be at a major competitive disadvantage."
That time-consuming attention to detail would include checking and rechecking data coming off the sequencer, and questioning surprising results and redoing them, driven by the recognition that even the sophisticated technology biologists now rely on for the masses of data they are analyzing can and does make mistakes. That is why each position in a genome is typically sequenced 30 or more times over before the result is deemed good enough to believe. But doing it right takes money as well as time.
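To make 'checking and rechecking' a bit more concrete, here is a minimal, hypothetical sketch of the kind of quality filter one might run on variant calls before believing a surprising result; the field names and thresholds are illustrative assumptions, not any lab's actual pipeline:

```python
# A minimal, hypothetical quality check on variant calls; thresholds are
# illustrative only, not a published standard.
from typing import List, NamedTuple

class VariantCall(NamedTuple):
    position: str        # e.g. "chr11:5248232" (made-up example)
    depth: int           # number of reads covering the site
    alt_fraction: float  # fraction of reads supporting the variant allele

def flag_for_review(call: VariantCall,
                    min_depth: int = 30,
                    het_band: tuple = (0.3, 0.7)) -> List[str]:
    """Return reasons a call should be re-examined before anyone gets excited."""
    reasons = []
    if call.depth < min_depth:
        reasons.append(f"low coverage ({call.depth}x < {min_depth}x)")
    lo, hi = het_band
    # A heterozygous call should be supported by roughly half the reads;
    # fractions far from that (and not near 1, as for a homozygote) often
    # signal mapping or sequencing artifacts worth rechecking.
    if not (lo <= call.alt_fraction <= hi) and call.alt_fraction < 0.9:
        reasons.append(f"odd allele balance ({call.alt_fraction:.2f})")
    return reasons

calls = [VariantCall("chr11:5248232", 42, 0.51),
         VariantCall("chr2:1000001", 8, 0.95),
         VariantCall("chr7:5566778", 55, 0.12)]

for c in calls:
    problems = flag_for_review(c)
    print(c.position, "looks OK" if not problems else "; ".join(problems))
```

The details would differ in any real pipeline; the point is only that the sequencer's output is itself data to be questioned.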
Skepticism required
And a healthy skepticism (which we blogged about here), or, as the commenter said, some self-criticism. You don't have to work with online genomic databases very long before it becomes obvious -- at least to the healthy skeptic -- that you have to check and recheck the data. Long experience in our lab with these databases has taught us that they are full of sequence errors that aren't retracted, annotation errors, incorrect sequence assemblies, and so on. And results based on incorrect data are published and not retracted, though they are often obvious to, again, the healthy skeptic who checks the data. MacArthur cautions researchers to be stringent with quality control in their own labs, which is essential, but they also need to be aware that publicly available data are not error-free, so results from comparative genomics must be approached with caution as well.
We've blogged before about a gene mapping study we're involved in. We've approached it as skeptics, and, we hope, avoided many common errors that way. This of course doesn't mean that we've avoided all errors, or that we'll reach important conclusions, but at least our eyes are open.
But just yesterday we ran into another instance of why that's important, and of how insidious database errors can be. We are currently characterizing the SNPs (variants) in genes that differ between the strains of mice we're looking at, to try to identify which are responsible for the morphological differences between them.
The UCSC genome browser, an invaluable bioinformatics tool, can show on one screen the structure of a chosen gene in numerous mammals. One of the ways a gene is identified is by someone having found a messenger RNA 'transcript' (copy) of the DNA sequence; that shows that the stretch of DNA that looks as if it might be a gene actually is one. We were looking at a gene that our mapping has identified as a possible candidate of interest and noticed that its annotated transcript was much, much shorter in mice than in any of the other mammals shown. If we had just asked the database for mouse genes in this chromosomal region, we'd have retrieved only this short transcript. We might have accepted this without thinking and moved on, but such a result is very unlikely given how closely related the listed species are, so we knew enough to question the data.
But we checked the mouse DNA sequence and other data and, sure enough, there is evidence for longer transcripts in mice, corresponding more closely to what's been reported in other mammals. And additional parts of the possible gene, corresponding to what is known to be in other mammals' transcripts, also exist in the mouse DNA. This strongly suggests that, although nobody has reported the longer transcript, it most likely exists and is used by mice. Thus, variation in the unreported parts of the mouse gene might be contributing to the evidence we found for an effect on head shape. But it took knowledge of comparative genomics and a healthy skepticism to figure out that there was something wrong with the original data as presented.
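The logic of that check can be sketched in a few lines. The gene name is omitted here and the numbers below are made-up stand-ins (we did the real comparison by eye in the UCSC browser), but the principle is the same: an annotated transcript dramatically shorter than its counterparts in closely related species deserves suspicion rather than acceptance:

```python
# Hypothetical annotated transcript lengths (in base pairs) for one gene.
# The values are invented for illustration; only the comparison matters.
transcript_length = {
    "human": 2100,
    "chimp": 2085,
    "dog": 2040,
    "rat": 2010,
    "mouse": 450,   # suspiciously short annotation, as in the case we describe
}

others = sorted(v for k, v in transcript_length.items() if k != "mouse")
typical = others[len(others) // 2]   # median length among the other species

ratio = transcript_length["mouse"] / typical
if ratio < 0.5:
    print(f"mouse annotation is only {ratio:.0%} of the typical length "
          f"({typical} bp) in related mammals -- check the genomic sequence "
          f"for unannotated exons before trusting the database entry")
```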
Not a new realization
There is a wealth of literature showing many reasons why first reports of a new finding are likely to be misleading--either wrong or exaggerated. This is not a matter of dishonest investigators! But it is a matter of too-hasty ones. The reason is that if you search for things, those that pop out by pure statistical fluke are the ones that are going to be noticed. If you're not sufficiently critical of the possibility that they are artifacts of your study design, and you take the results seriously, you will report them to the major journals. And your career takes off!
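To see how easily flukes 'pop out', consider this toy sketch (ours, purely illustrative): scan a large number of genetic variants none of which truly affects the trait, and the best-looking result will still look impressive:

```python
import math
import random

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)

# A hypothetical scan of 100,000 variants with NO real effect on the trait.
n_variants = 100_000
p_values = [two_sided_p(random.gauss(0, 1)) for _ in range(n_variants)]

print("smallest p-value in a purely null scan:", f"{min(p_values):.1e}")
print("'hits' at p < 0.0001 despite no real effects:",
      sum(p < 1e-4 for p in p_values))
```

Run it and the winner looks like a discovery; only stringent correction for the number of tests, and replication, keeps it honest.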
A traditional football coach once said of forward passes that there are three things that can happen (complete, incomplete, intercepted) and only one of them is good...so he didn't like to pass. Something similar applies here: if you are circumspect, you may
1. later have the let-down experience of realizing that there was some error--not carelessness, just aspects of luck or things like problems with the DNA sequencer's ability to find variants in a sample, and so on.
2. Then you don't get your first Big Story paper, much less the later ones that refine the finding (that is, acknowledge it was wrong without actually saying so).
3. Worse, if it's actually right but you wait until you've appropriately dotted your i's and crossed your t's, somebody else might find the same thing and report it, and they get all the credit! Or you may be wrong and later data may dampen your results, but nobody remembers the exaggeration: Nature and the NY Times don't retract the story, your paper still gets all the citations (nobody 'vacates' them the way Penn State's football victories were vacated by the NCAA), and you already got your merit raise based on the paper...you win even when you lose!
So the pressures are on everyone to rush to judgment, and the penalties are mild (here, of course, we're not talking about any sort of fraud or dishonesty). Again, many papers and examples exist pointing these issues out, and the subject has been written about time and again. But in whose interest is it to change operating procedures?
Even so, it's refreshing to see this cautionary piece in a major journal. Will it make a difference? Not unless students are taught to be skeptical about results from the very start. And the journals' confessions aren't sincere: you can safely bet that tomorrow the same journals will be back to business as usual.
Thursday, May 3, 2012
Metaphysics in science, Part V: Is 'risk' real or metaphysical?
By Ken Weiss
Metaphysical ideas imposed on the world, as if they were derived from the world, go against the nature of modern science and bear similarity to a long-standing but rejected idea about how we understand existence. We've discussed some facets of these issues from the point of view of modern evolutionary and genomic sciences as we see them, and to provoke thought (though not as professional philosophers or historians of science, roles to which we make no claim!).
Here we want to conclude by considering these issues in relation to an aspect of causation that we've dealt with in a previous series of posts: cases in which causation, and, in applied areas, the notion of 'risk', are probabilistic. Are these metaphysical concepts in any important sense, or are they just plain-vanilla and not-misleading conveniences, like our use of the term 'the human genome' to represent something that really doesn't exist but helps us understand what does exist?
Plato's concept, in the analogy of the cave that we have referred to, was that abstract ideals actually exist, but that all we can experience of them are shadowy, imperfect manifestations. In genetics, we only observe instances of the human genome, but there is no such thing as 'the human genome'. This doesn't bother us a bit, because we understand the usefulness of an arbitrary (that is, agreed-on) reference to organize our discussions of human genetics.
Ideas like 'chair' or 'dog' may not have Platonic reality, but again they are very useful without being misleading, relative to real chairs or dogs. In the case of dogs or genes, we even have very good, wholly material, empirical theories of populations that account for the collection of real-world objects to which we apply terms like 'dog' or 'gene'. The population concept does not require the existence of some 'ideal'.
Plato also dealt with more elusive examples, like 'good'. This is much less clear: does 'good' exist out-there in the meta-world with some reality of its own, or do we just observe instances of 'good' in the physical world? It's less clear than 'gene' or 'dog' because we haven't got a way to agree what 'good' is an arbitrary reference for. 'Good' is not a specifiable population of things.
But what about probability, say as expressed in terms of the 'risk' of getting a given disease if you carry a specific instance of some named gene?
Statistical causation: what kind of reality?
As we outlined in our series of posts on probability, the concept isn't always clear. When we speak of the probability of a given variant, say one of the two copies of a gene that a person has, being transmitted to a given offspring, what do we mean? We mean that in a long series of producing offspring, each copy will be transmitted to an offspring the same fraction of the time. That's a frequency interpretation. We have purely materialistic notions of how the molecules (DNA) randomly buzz around the nucleus of the sperm or egg precursor cell, and one of the two just happens to end up in a given sperm or egg cell. Neither copy has an advantage--that's a functional interpretation of probability.
In these instances, all we actually see are manifestations of the transmission of genes from parent to offspring. So in a sense, the 'probability' is a purely metaphysical concept: it exists in our heads whether or not anybody ever produces an actual offspring. In some ways the functional or frequency interpretations don't really matter, but in other ways the metaphysical nature is troubling. That's because we can only test its reality by experience and experience--even if our very notion of probability is correct--never precisely realizes the expected result! For example, the probability of your transmitting variant A to your next child may be 50%, and that may be as 'true' as true can be. But if you only have one child, it either received the A or it didn't. Further, in some sense (e.g., diploid organisms) we believe that the Mendelian process is universal. The cave-wall manifestations of shadows of metaphysical truths simply cannot tell you the truth!
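A tiny simulation makes the point about finite experience concrete. Under the textbook Mendelian assumption of a true 50% transmission probability (an assumption we state, not test, here), observed fractions never settle exactly on the 'true' value:

```python
import random

random.seed(2)

def transmitted_fraction(n_offspring: int, p: float = 0.5) -> float:
    """Fraction of offspring receiving variant A, each independently with probability p."""
    return sum(random.random() < p for _ in range(n_offspring)) / n_offspring

# Even if the underlying probability really is 0.5, observed fractions wobble.
for n in (1, 4, 20, 1000):
    print(f"{n:>5} offspring: observed fraction = {transmitted_fraction(n):.3f}")
```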
So we have other ways to view probability concepts about the world. One is called 'likelihood', and it is used to ask: if our metaphysical idea that there is a true probability of transmitting an A to any given child is right, what do the actual data tell us is the most likely value of that probability? Again, we're playing around with notions of truth. But if we believe--and 'believe' is the right word here--that genetic transmission works this way, we can learn from experience about it. This and other statistical ways of dealing with the probabilistic world reside largely in belief about what might be true in the world, rather than in direct proof of what's true. But even in this case we believe that one of the alternatives we are considering is actually true! And is that not itself a metaphysical statement?
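In this simple transmission case the likelihood idea can be written out directly: if k of n offspring received variant A, the likelihood of a candidate probability p is p raised to the power k times (1 - p) raised to the power (n - k), and the value of p that maximizes it is just k/n. A minimal sketch, with made-up counts:

```python
# Made-up counts: 7 of 10 offspring observed to carry variant A.
k, n = 7, 10

def likelihood(p: float) -> float:
    """Likelihood of p, proportional to the probability of the observed counts."""
    return p**k * (1 - p)**(n - k)

# Scan a grid of candidate values and report the most likely one.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=likelihood)
print("maximum-likelihood estimate:", best, "| analytically, k/n =", k / n)
```

Note what this does and does not claim: it finds the value of p best supported by the data, given the belief that some such true p exists at all.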
There is a danger in this and it seems to relate to the reason metaphysics was strongly rejected in the age of modern science, beginning around 400 years ago. The danger is that we can assume that ideas in our heads are real, yet nothing other than actual experience can tell us if we're right. So, instead, the new scientific method said, why not rely entirely on experience? Let experience show us what the truth is. After all, we want our ideas to enable us to predict future experiences, things not yet observed. That's what scientific theory is all about. As long as our theory is actually about reality, rooted in experience, this seems to work rather well, at least in practical terms.
Let's look at another example that we referred to earlier in this series. What is the 'probability' that a human with curly hair and agile thumbs will evolve from monkey stock? This is not about the frequency of events in any useful sense; it's about something-or-other regarding things that might have happened. (We did, in fact, evolve, but we could ask a similar question, like "what is the probability that a 4-fingered, 6-toed, language-speaking, fully aquatic primate will evolve?")
These really are basically metaphysical questions. What is the chance that human-like life exists on other planets (something we've discussed earlier as well, in posts about 'infinity')? Such questions seem to be about reality, but hardly are, because the answer requires a numerical value ('chance', between 0 and 1) and there is no serious way of finding out that value, much less whether it's true, much less whether the idea that some such value actually exists is itself correct.
As we've said in this series, metaphysics is vulnerable to beliefs not clearly shown by reality. Religious assertions are often accused of this fundamental fallacy. But scientific assertions are clearly also vulnerable in this way, and that matters all the more because, unlike religion, science purports to be strictly about the real, material world. Yet we believed Isaac Newton--clearly a modern scientist--until Einstein came along. So, when probabilistic causation is important, or seems to be the case, we are very vulnerable. What should we 'believe'?
To bring things back to earth, so to speak, these issues arise in full dress when it comes to interpreting genomics and inferring genetic causation, today and in evolution. The promises of individual life-experience prediction from genomes sequenced at birth, or that GWAS or biobank whole-genome sequencing will do that, or will make all known human ills disappear, are examples. They rest on largely metaphysical notions about causation and, when causation is admitted to be probabilistic, about predicted outcomes. The probabilities are treated as if they had their functional or frequency sense, but the evidence is really quite clear that such prediction is only mildly accurate. The point is that while advocates freely admit that we're not there yet, they believe that accurate--indeed perfect?--prediction is possible in principle.
When traits like the objects of GWAS and other 'omics are due not just to practicably countless contributing factors, some genetic and perhaps identifiable but others not known, but to factors each of which somehow works only probabilistically, then we are more squarely in the metaphysical world. The probabilities now are really not of the frequency or even the functional sort, except very abstractly. They are more of the belief sort. The same statements apply to many aspects of inferences made about how evolution has worked, and in particular to stories offering adaptive genetic explanations for traits seen today. Those, too, are probabilistic in the belief sense ("it seems likely that upright-walking hominids were able to compete to secure food from .....").
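As a purely illustrative toy (our own construction, with arbitrary parameters), one can simulate a trait built from many tiny genetic effects plus unmeasured influences treated as noise; even when the genetic part is known perfectly, its correlation with the trait, and hence its predictive value for any one individual, stays modest under these assumptions:

```python
import random

random.seed(3)

n_people = 2000
n_variants = 500    # 'countless' small contributing factors (arbitrary number)
effect = 0.05       # each variant's tiny effect on the trait (arbitrary)
noise_sd = 5.0      # everything unmeasured, lumped as noise (arbitrary scale)

scores, traits = [], []
for _ in range(n_people):
    genotype = [random.randint(0, 2) for _ in range(n_variants)]  # allele counts
    genetic_score = effect * sum(genotype)
    scores.append(genetic_score)
    traits.append(genetic_score + random.gauss(0, noise_sd))

def corr(xs, ys):
    """Plain Pearson correlation, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

print("correlation of perfect genetic score with trait:",
      round(corr(scores, traits), 2))
```

Change the arbitrary numbers and the answer changes; that dependence on what one assumes is exactly the point.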
It is not just a belief that no immaterial forces intervene in genetic causation, say, of a disease. It is a belief that if we knew everything, everything could be predicted. But since there is no way to replicate unique events, like individual genomewide genotypes and all their environmental experiences, we can never actually know how true this is.
Yet, and here is where we think people are dabbling in metaphysics when doing this kind of genetics: the belief system is so strong that it goes beyond an assertion that we just don't yet have adequate evidence, and actually goes against the evidence, which in the face of probabilistic complexity is already generally quite weak. It becomes, as we have said, imposing metaphysics on the real world, rather than the other way around. And this can be very misleading to science and to the distribution of the limited resources we have to understand the world. It again becomes an obeisance to belief, the exact opposite of science--a form of denial: again, it is the zen of genomics, where No means Yes.
What is metaphysical? What can we hope actually to know?
Metaphysics, as we use the term in this series, is the Platonic ideal that truth does exist somehow, and that all we see are approximate manifestations of it. Science claims to have rejected that notion. We've seen examples where metaphysical abstractions still used in science are not particularly damaging.
But in genomics we are seeing something that was predictable (and predicted), for the right reasons, decades ago: complexity is the rule, but people still want traits to parse simply. It is the investigator as an ostrich, hiding from the very truth he claims to be dedicated to finding. It is the assumption of higher-level truth, in some ways thumbing one's nose at the evidence.
Coming full circle: when is a finding a 'finding'?
We return to where we began in this series: the assertion that unless you find some hoped-for, dramatic, simple, tractable result, you haven't made a 'finding'. This attitude is such a shallow shadow of any real understanding of the nature of reality that we think it's not too much of a stretch to say it poses a threat to society. That's because overly Platonic views of the world are misleading, divert resources, can lead to awful conflicts, and so on, as history very clearly shows.
Again, Plato provided a metaphysical view of existence. Ideas about things were real, and things themselves were, in some sense, not as real. Philosophers have sliced and diced these ideas over the centuries, in many sophisticated ways. Metaphysics went from being rather central to humans (in western cultures, at least) trying to make sense of the world, to being an airy-fairy pursuit that scientists love to sneer at. But do we not much more, and much more culpably, indulge in implicit Platonic metaphysics than we care to admit?
Many philosophers have dealt with the difference between the real, empirical world we can touch and smell, and the world our neurons construct within our heads. From Aristotle and the solipsists in classical times, to Kant and many others, the issue of how or whether our limited sensory apparatus and brain can actually and truly know anything other than itself has been an open one for philosophizing. And of course there are the works of countless religious thinkers about the nature or even existence of 'things' and non-things.
Here, we're not dabbling in such ultimates, nor are we qualified even to summarize the centuries of sophisticated thought about those issues--nor, for that matter, the thoughts of poets and artists whose work deals directly with them. We are simply assuming there is a reality 'out there', and that the interest of science is in how to understand it, both pragmatically and ultimately. Our context is genetics and evolution, not whether neutrinos outrace light, or whether electrons exist in fixed locations, and all that.
But life goes beyond ordinary physics, in which, anywhere at all, every oxygen molecule is alike and all photons speed equally through vacuums. Physics and chemistry are comfortable with concepts about collections of such identical things, as abstractions representing tractable realities whose collective behavior follows nice principles, or laws--even when those principles are probabilistic, as in describing the pressure of a gas in a container, where the pressure is due to the random buzzing of huge numbers of identical objects. Pressure is empirical as a pragmatic stand-in for practically assessable instances. But it's metaphysical to the extent of the assertion that 'it', whatever it is, exists uniformly, eternally, and ubiquitously.
But evolution is not about the collective and eternal behavior of identical items, but instead is inherently about variable, ephemeral ones--from genes on up to ecosystems. We cannot assert identity in the way a chemist does, because the entirety of the life sciences is in a meaningful sense about variation.
This means that elusive issues like emergence, or statistical causation by individually unique collections of elements, aren't really like physics (even if all the elements, like you and us and globin genes and genomes, ultimately follow physical and chemical principles). It is the organization of life that's different in the sense we're considering.
One can argue that making assumptions about that organization, when it does not have specific, replicable instances, verges on Platonic metaphysics, and goes beyond convenient pragmatism. It is like asking whether 'good' exists. The danger is not that we have things we profoundly don't understand, even deep concepts like probability when we cannot confirm it in any actual way. The danger is that we really do indulge in metaphysics in the guise of science, by being immune to the messages that the real world, when it is not just instances like shadows on a cave-wall, sends us.
In a way, that has always been the deepest problem with metaphysics: it is not sufficiently constrained by reality. Yet of all fields of human endeavor, science should try very hard to understand the real world, not the ideal world or the wished-for world. Instead of the current kind of metaphysics, we should be out in the sun, where the truth, as well as its shadows, can be seen. But things can be more comfortingly simple in the cave--the cave of denial of evidence. Like the original cavemen, perhaps we prefer the comfort of the dark. At least, to a great extent, for convenience and self-interest, even some scientists are staying in the cave on purpose.
We may respect Plato, but we should not become neo-Cavemen!
A request for comments by those who know!
We have said many times in this series that we are not professional historians or philosophers of science, and that we are using terms--especially 'metaphysics'--in a particular, restricted sense. We also know of the existence of a vast literature, spanning 2500 years, on aspects of the subject. But we've only read, or really dabbled, in a tiny fraction of that literature. So if there are any MT readers who are expert in these areas, we'd be happy to have commentary that constructively addresses the issues as we have raised them, or that adds to or modifies what we've attempted to say.