My composer friend wants to be quite modern about creating beautiful music. He doesn't like to use computer programs for composing but he has devised another 'modern' way to compose, given that, in writing a piece, he often changes his mind. Scratching out notes on paper to replace them with 'better' ones makes for a real mess on the working pages, and he'd then have to transcribe his work onto new pages, and that in itself introduces room for mistakes. So he had an idea.
He purchased a set of notes and musical symbols, printed individually on a kind of flexible plastic. Copies of each possible note and notation element were in boxes in a little tray. As he composed, he merely took each required note from its place in the tray, and used its static electricity to place it on a page with printed staff-lines. If he changed his mind, it was easy to remove or replace a given note, and put it back in its box in the tray without generating an inky mess on the page and having to keep starting over to make his work-in-progress legible.
But there turned out to be a serious, indeed even tragic, problem. He liked working in his studio, right in front of a window giving him an inspiring view of his garden. But, after days of work composing a comparably ethereal and beautiful piece, a gust blew through the window, riffled the pages, and shook all the notes off the page and onto the table! What a scattered mess! And what a heartbreaking loss of all that work!
Of course, you could say that the composition with all its beauty was in some sense still there, right before him: all the required notes were indeed still there--every one. But they were in a pile, no longer with any order from which he could reconstruct the composition just by picking the notes up and placing them back on the page. So, it was literally all there---but none of what mattered was!
As my composer friend told me this story, it occurred to me that this was analogous to the 'pile' of DNA letters (As, Cs, Gs, and Ts) that is found by sequencing people with and without some trait, like a disease. The letters differ greatly among individuals with the 'same' trait, because they don't have the trait for the same genetic reason. And the sampled individuals' genomes vary in literally countless ways that have nothing to do with the disease. Unlike the score, the 'letters' are still in their original order, but genes don't make a score as far as we are concerned because, unlike an orchestra, we don't know how to 'play' them!
In a sense, each person we see who is playing the same tune, so to speak, is doing so from a different score. Some shared notes may be involved, but they are all jumbled up with shared, and not-shared, notes that have nothing to do with the tune.
And yet we are widely promised, and widely being trephined to pay for, the idea that looking through the jumble of genetic 'notes' we can predict just about anything you can name about each individual's traits.
Indeed, unlike the composer's problem, there are all sorts of notes that are not even visible to us (they are called 'somatic mutations'). We yearn for a health-giving genomic 'tune', which is a very natural way to feel, but we are unable (or, at least, unwilling) to face the music of genomic reality.
And, of course, this mega-scale 'omics 'research' is all justified with great vigor by NIH, as if it is on the very verge of discovering fundamental findings that will lead to miraculous cures, indeed cures for 'All of us'. At what point is it justified to refer to it as a kind of culpable fraud, a public con job?
By our bigger, bigger, bigger approach, we have entrenched 'composers' trying to read scores that are to a great extent unreadable in the way being attempted. We are so intense at this, like rows of monks transcribing sacred manuscripts in a remote monastery, that we are committed to something that we basically have every legitimate good reason to know isn't the way things are.
Showing posts with label All Of Us. Show all posts
Friday, May 10, 2019
Monday, August 13, 2018
Big Data: the new Waiting for Godot
By
Ken Weiss
In Samuel Beckett's cryptic play, Waiting for Godot, two men spend the entire play anticipating the arrival of someone, Godot, at which point presumably something will happen--one can say, perhaps, that the wait will have been for some achieved objective. But what? Could it simply mean that they can then go somewhere else? Or, perhaps, there will be no end because Godot will never, in fact, arrive.
A good discussion of all of this is on the BBC Radio 4 The Forum podcast. Apparently, Beckett insisted that any such answers were in the play itself--he didn't imply that there was some external meaning, such as that Godot was God, or that the play was an allegory for the Cold War--which is one reason the play is so enigmatic.
Was the play written intentionally to be a joke, or a hoax? Of course, since the author refused to answer or perhaps even to recognize the legitimacy of the question, we'll never know. Or perhaps that, in itself, is the tipoff that it really is a hoax. Or maybe (I think more likely), because it was written in France in 1949, it's an existentialist-era statement of the angst that comes from the recognition that the important questions in life don't have answers.
Waiting for the biomedical Promised Land
That was then, but today we are witnessing real-life versions of the play: things just as cleverly open-ended, with the 'What happens then?' question only having a vague, deferred answer, as in Beckett's title. And, as in the play, it is not clear how self-aware even some of the perpetrators are of what they are about.
I refer to the possibility that we are witnessing various Big Data endeavors, unknowingly imitative but as cleverly and cryptically open-ended as the implied resolution that will happen when Godot arrives. Big Data 'omics is a current, perhaps all too convenient, scientific version of the play, that we might call Waiting for God'omics. The arrival of the objective--indeed, not really stated, but just generically promised as, for example, 'precision genomic medicine' for 'All of Us'--is absolutely as slyly vague as what Vladimir and Estragon were presumably waiting for. The genomic Godot will never arrive!
This view is largely but not entirely cynical, for reasons that are at least a bit subtle themselves.
Reaching the oasis, the end of the rainbow, or the Promised Land is bad for business
One might note that if the 'omics Godot were ever to arrive, it would be the end of the Big Data (or should one say Big Gravy?) train, so obviously our Drs Vladimirs and Estragons must ensure that such a tragedy, arrival at the promised land, the elimination of all diseases in everyone, or whatever, never happens in real life. Is there any sense that anyone seriously thinks we would reach resolution of the cause of disease, with precision for all of us, say, and be able (that is, willing) to close down the Big Budget nature of our proliferating 'omictical me-too world?
We have entrenched the search for Godot, a goal so vague as to be unattainable. Even the proper use of the term 'precision' implies an asymptote, a truth that one never reaches but can get ever closer to. If we could get there, as is implied, we should have been promised 'exact' genomic medicine. And wouldn't this imply that then, finally, we'll divert the resources towards cures and prevention?
However, even if the perpetrators of the Big Promises never think or aren't aware of it, we must note that the goal cannot be reached even with the best and most honorable of intentions. Because of births and deaths, and environmental changes, and mutations and recombination, there truly never is the palm-draped oasis at which our venture could cease. There will never be an 'all' of us, and genetic causation is ever-changing (in part because of the similarly dynamic environment), meaning that there are no such things as risks to be approached with 'precision'. Risks are changeable and not stable, and indeed not fixed numerical values. At best, they are collective population (or sample) averages. So there is never a 'there' there, anywhere. There is only a different one everywhere.
But awareness of these facts doesn't seem to be part of the 'omicsalyptic promises with which we are inundated. They seem, by contrast, rote promises that are little if any different from political, economic, or religious promises--if only we do this, we'd get to a Promised Land. But such a land does not exist.
If we had, say, a real national health system, it would be properly and avowedly open-ended without anyone honorable objecting (if it were done well). And epidemiologically, of course, there will always be new mutations, recombinations, environments and the like to try to understand--disease with or without strong genotype-phenotype causation. There will always be a need for health research (and basic science). But science, of all fields of human endeavor, should be honest. It should not hold out the promise that Godot will arrive but, in a sense, openly acknowledge that that can never happen.
But this doesn't let off the guilty hook those who are hawking today's implicit Big Data, big open-ended budget promise that by goosing up research now we'll soon eliminate genetic disease (I recall that Francis Collins did indeed, not all that long ago, promise that this Paradise would come soon--um, I think his date was something like 2010!). It's irresponsible, self-interested promising, of course. And those in genomics who are intelligent enough to deserve to be in genomics do, or should, know that very well.
Like Vladimir and Estragon, we'll always be told that we're waiting for Godot, and that he'll be coming soon.
NOTE: One might observe that Godoism is a firmly entrenched strategy elsewhere in our society, for example, in regard to theoretical physics, where there will never be a collider big enough to answer the questions about fundamental particles: coming to closure would be as fiscally threatening to physics as it is to life sciences. Science is not alone in this, but our society does not pay it nearly enough skeptical heed.
Sunday, May 6, 2018
"All of us" Who are 'us'?
By
Ken Weiss
So the slogan du jour, All Of Us, is the name of a 1.4 billion dollar initiative being launched today by NIH Director Francis Collins. The plan is to enroll one million volunteers in this mega-effort, the goal of which is, well, it depends. It is either to learn how to prevent and treat "several common diseases" or, according to Dr Collins who talked about the initiative here, "It's gonna give us the information we currently lack" to "allow us to understand all of those things we don't know that will lead to better health care." He's very enthusiastic about All of Us (aka Precision Medicine), calling it a "national adventure that's going to transform medical care." This might be viewed in the context of promises in the late 1900s that by now we'd basically have solved these problems--rather than needing ever-bigger longer-term 'data'.
And one can ask how the data quality can possibly be maintained if medical records of whoever volunteers vary in their quality, verifiability, and so on. But that is a technical issue. There are sociological and ontological issues as well.
All of Us?
Serving 'all of us' sounds very noble and representative. But let's see how sincere this publicly hyped promise really is. Using very rough figures, which will serve the point, there are 320 million Americans. So 1 million volunteers would be about 0.3% of 'all' of us. So first we might ask: What about achieving some semblance of real inclusive fairness in our society, by making a special effort to oversample African Americans, Hispanics, and Native Americans, before the privileged, mainly white, middle class get their names on the rolls? That might make up for past abuses affecting their health and well-being.
So, OK, let's stop dreaming but at least make the sample representative of the country, white and otherwise. Does that imply fairness? There are, for example, about 300,000 Navajo Native Americans in the country. If All Of Us means what it promises, there would be about 950 Navajos in the sample. And about 56 Hopi tribespeople. And there are, of course, many other ethnic groups that would have to be included. Random (proportionate) sampling would include about 600,000 'white' people in the sample.
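The proportionate-sampling arithmetic above can be checked with a few lines of Python. This is only a sketch using the post's own rough figures (320 million Americans, 1 million volunteers, about 300,000 Navajo); the function name `proportional_share` is made up for illustration.

```python
# Rough figures taken from the post; they are approximations, not census data.
US_POPULATION = 320_000_000
SAMPLE_SIZE = 1_000_000


def proportional_share(group_size, population=US_POPULATION, sample=SAMPLE_SIZE):
    """Expected number of group members in a strictly proportionate sample."""
    return group_size * sample / population


# Fraction of the whole population that the sample would cover.
sampling_fraction = SAMPLE_SIZE / US_POPULATION

# Expected Navajo count if the sample mirrored the population exactly.
navajo_expected = proportional_share(300_000)

print(f"sampling fraction: {sampling_fraction:.2%}")
print(f"expected Navajo participants: {navajo_expected:.0f}")
```

With these inputs the sampling fraction comes out at roughly 0.3%, and the expected Navajo count lands in the 900s, in line with the rough 'about 950' quoted above.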
These are just crude subpopulation counts from superficial Google searching, but the point is that in no sense is the proposed self-selected sample of volunteers going to represent All Of Us in anything resembling fair distribution of medical benefits. You can't get as much detailed genomewide (not to mention environmental) data from a few hundred sampled individuals compared to hundreds of thousands. To be fair and representative in that sense, the sample would have to be stratified in some way rather than volunteer-based. It seems very unlikely that the volunteers who will be included are in some real sense going to be representative of the US, rather than, say, of university and other privileged communities, major cities, and so on--even if not because of intentional bias but simply because they are more likely to learn of All Of Us and to participate.
Of course, defining what is fair and just is not easy. For example, there are far more Anglo Americans than Navajo or Hopi. So the Anglos might expect to get most of the benefits. But that isn't what All Of Us seems to be promising. To get adequate information from a small group, given the causal complexity we are trying to understand, they should probably be heavily oversampled. Even doing that would still leave room for samples from the larger Anglo and African-American populations that are adequate for the kind of discovery we could anticipate from this sort of Big Data study of causes of common disease.
More problems than sociology
That is the sociological problem of claiming representativeness of 'all' of us. But of course there is a deeper problem that we've discussed many times, and that is the false implied promise of essentially blanket (miracle?) cures for common diseases. In fact, we know very well that complex causation, of the common diseases that are the purported target of this initiative, involves tens to thousands of variable genome locations, not to mention the environmental ones that are beyond simple counting. Further, and this is a serious, nontrivial point, we know that these sorts of contributing causes include genetic and environmental exposures in the sampled individuals' futures, and these cannot be predicted, even in principle. These are the realities.
And, even if the project were truly representative of the US population demographically, as a sample of self-selected volunteers there remains the problem of representing diseases in the population subsets. Presumably this is why they are focusing on "common diseases", but still the sample will have to be stratified by possible causal exposures (lifestyles, diets, etc.) and ethnicity, and then they'll have to have enough controls to make case-control comparisons meaningful. So, how many common diseases, and how will they be represented (males/females, early/late onset, related to what environmental lifestyles, etc.)? One million volunteers isn't going to be representative, nor large enough for a sample that has to be stratified for statistical analysis, especially if the sample also includes the ethnic diversity that the project promises.
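To see how quickly stratification thins out even a million-person sample, here is a toy calculation. Every number of levels below is a hypothetical assumption chosen for illustration, not a figure from the project; the small-group size of 938 is simply what proportionate sampling would give for a group of about 300,000 in a population of 320 million.

```python
from math import prod

SAMPLE_SIZE = 1_000_000

# Hypothetical number of levels per stratifying factor (illustrative assumptions only).
strata_levels = {
    "sex": 2,
    "onset (early/late)": 2,
    "ethnic group": 10,
    "lifestyle/diet category": 5,
    "case/control status": 2,
}

# Total number of cells when every factor is crossed with every other.
n_cells = prod(strata_levels.values())

# Average cell size if volunteers were spread evenly, which they would not be.
average_per_cell = SAMPLE_SIZE / n_cells

# A small group of about 938 volunteers, spread over the non-ethnic cells.
small_group_size = 938
cells_within_group = n_cells // strata_levels["ethnic group"]
small_group_per_cell = small_group_size / cells_within_group

print(f"cells: {n_cells}, average per cell: {average_per_cell:.0f}")
print(f"small-group average per cell: {small_group_per_cell:.1f}")
```

Under these made-up assumptions, and for a single disease, the overall average is a few thousand people per cell, while a small group is left with only a couple of dozen per cell. Adding more diseases or exposure categories shrinks the cells further.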
And there's the epistemological problem of causation being too individualistic for this kind of hypothesis-free data fishing to solve--indeed, it is just that kind of research that has shown us clearly how that kind of research is not what we need now. We need research focused on problems that really are 'genetic', and some movement of resources to new thinking, rather than perpetuating the same kind of open-ended, 'Big Data' investment.
And more
In this context, the PR seems mostly to be spin for more money for NIH and its welfare clients (euphemistically called 'universities'). Every lock on Big Money for the Big Data lobby, or perhaps belief-system, excludes funding for focused research, for example, on diseases that would seem to be tractably understood by real science rather than a massive hypothesis-free fishing expedition.
How could the 1.4 billion dollars be better spent? A legitimate goal might be to do a trial run of a linked electronic records system as part of an explicit move towards what we really need, and which would really include all of us: a real national healthcare system. This could be openly explained--we're going to learn how to run such a comprehensive system, etc., so we don't get overwhelmed with mistakes. But then for the very same reason, a properly representative project is what should be done. That would involve stratified sampling, and more properly thought-out design. But that would require new thinking about the actual biology.