Sunday, May 6, 2018

"All of us" Who are 'us'?

So the slogan du jour, All Of Us, is the name of a 1.4 billion dollar initiative being launched today by NIH Director Francis Collins.  The plan is to enroll one million volunteers in this mega-effort, the goal of which is, well, it depends.  It is either to learn how to prevent and treat "several common diseases" or, according to Dr Collins who talked about the initiative here, "It's gonna give us the information we currently lack" to "allow us to understand all of those things we don't know that will lead to better health care." He's very enthusiastic about All of Us (aka Precision Medicine), calling it a "national adventure that's going to transform medical care."  This might be viewed in the context of promises in the late 1900s that by now we'd basically have solved these problems--rather than needing ever-bigger longer-term 'data'.

And one can ask how the data quality can possibly be maintained if medical records of whoever volunteers vary in their quality, verifiability, and so on.  But that is a technical issue.  There are sociological and ontological issues as well.

All of Us?
Serving 'all of us' sounds very noble and representative.  But let's see how sincere this publicly hyped promise really is.  Using very rough figures, which will serve the point, there are 320 million Americans.  So 1 million volunteers would be about 0.3% of 'all' of us.  So first we might ask: What about achieving some semblance of real inclusive fairness in our society, by making a special effort to oversample African Americans, Hispanics, and Native Americans, before the privileged, mainly white, middle class get their names on the rolls?  That might make up for past abuses affecting their health and well-being.

So, OK, let's stop dreaming but at least make the sample representative of the country, white and otherwise.  Does that imply fairness?  There are, for example, about 300,000 Navajo Native Americans in the country.  If All Of Us means what it promises, there would be about 950 Navajos in the sample.  And about 56 Hopi tribespeople.  And there are, of course, many other ethnic groups that would have to be included.  Random (proportionate) sampling would include about 600,000 'white' people in the sample.
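The arithmetic behind these counts is simple enough to check directly. The sketch below is only an illustration using the rough figures in the text (320 million Americans, one million volunteers, about 300,000 Navajo); the function names are made up for this example, and the Hopi and 'white' population totals are back-calculated from the sample counts quoted above rather than taken from any census source.

```python
# Rough figures taken from the text; none of these are official counts.
US_POPULATION = 320_000_000
SAMPLE_SIZE = 1_000_000


def proportional_share(group_population, total=US_POPULATION, sample=SAMPLE_SIZE):
    """Expected number of group members in a proportionate sample."""
    return group_population * sample / total


def implied_population(sample_count, total=US_POPULATION, sample=SAMPLE_SIZE):
    """Group size that would yield `sample_count` under proportionate sampling."""
    return sample_count * total / sample


# Fraction of the country that one million volunteers represent.
sampling_fraction = SAMPLE_SIZE / US_POPULATION

# Forward calculation: Navajo population -> expected count in the sample.
navajo_in_sample = proportional_share(300_000)

# Reverse calculation: population sizes implied by the quoted sample counts.
hopi_population_implied = implied_population(56)
white_population_implied = implied_population(600_000)

print(f"Sampling fraction: {sampling_fraction:.2%}")
print(f"Navajo in sample: {navajo_in_sample:.0f}")
print(f"Implied Hopi population: {hopi_population_implied:,.0f}")
print(f"Implied 'white' population: {white_population_implied:,.0f}")
```

With these inputs the Navajo figure comes out slightly under the "about 950" quoted above, which is well within the roughness of the starting numbers.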

These are just crude subpopulation counts from superficial Google searching, but the point is that in no sense is the proposed self-selected sample of volunteers going to represent All Of Us in anything resembling fair distribution of medical benefits.  You can't get as much detailed genomewide (not to mention environmental) data from a few hundred sampled individuals as from hundreds of thousands.  To be fair and representative in that sense, the sample would have to be stratified in some way rather than volunteer-based.  It seems very unlikely that the volunteers who will be included are in some real sense going to be representative of the US, rather than, say, of university and other privileged communities, major cities, and so on--even if not because of intentional bias but simply because they are more likely to learn of All Of Us and to participate.

Of course, defining what is fair and just is not easy.  For example, there are far more Anglo Americans than Navajo or Hopi.  So the Anglos might expect to get most of the benefits.  But that isn't what All Of Us seems to be promising.  To get adequate information from a small group, given the causal complexity we are trying to understand, its members should probably be heavily oversampled.  Even doing that would leave room for samples from the larger Anglo and African-American populations that are adequate for the kind of discovery we could anticipate from this sort of Big Data study of causes of common disease.

More problems than sociology
That is the sociological problem of claiming representativeness of 'all' of us.  But of course there is a deeper problem that we've discussed many times, and that is the false implied promise of essentially blanket (miracle?) cures for common diseases.  In fact, we know very well that complex causation, of the common diseases that are the purported target of this initiative, involves tens to thousands of variable genome locations, not to mention the environmental ones that are beyond simple counting.  Further, and this is a serious, nontrivial point, we know that these sorts of contributing causes include genetic and environmental exposures in the sampled individuals' futures, and these cannot be predicted, even in principle.  These are the realities.

And, even if the project were truly representative of the US population demographically, as a sample of self-selected volunteers there remains the problem of representing diseases in the population subsets.  Presumably this is why they are focusing on "common diseases", but still the sample will have to be stratified by possible causal exposures (lifestyles, diets, etc) and ethnicity, and then they'll have to have enough controls to make case-control comparisons meaningful.  So, how many common diseases, and how will they be represented (males/females, early/late onset, related to what environmental lifestyles, etc.)?  One million volunteers isn't going to be representative, nor a large enough sample once it has to be stratified for statistical analysis, especially if the sample also includes the ethnic diversity that the project promises.
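To see how quickly stratification thins a sample, here is a small sketch. The stratifying factors and their numbers of levels are purely hypothetical, chosen only to illustrate the arithmetic, and the even split across cells is an idealization (real cells would be far more uneven, and any one disease affects only a fraction of participants). The Navajo count is the roughly 940 participants that proportionate sampling of the text's rough figures would yield (300,000 of 320 million, applied to one million volunteers).

```python
from math import prod


def average_cell_size(sample_size, levels_per_factor):
    """Average participants per cell if the sample were split evenly
    across every combination of the stratifying factors."""
    n_cells = prod(levels_per_factor.values())
    return sample_size / n_cells


# Hypothetical strata for a single disease; not taken from the project design.
factors = {
    "sex": 2,                # male / female
    "onset": 2,              # early / late
    "exposure_category": 5,  # e.g. five lifestyle or diet groupings
    "case_or_control": 2,
}

whole_sample_cell = average_cell_size(1_000_000, factors)
navajo_cell = average_cell_size(937.5, factors)

print(f"Cells per disease: {prod(factors.values())}")
print(f"Average per cell, whole sample: {whole_sample_cell:,.0f}")
print(f"Average per cell, Navajo subsample: {navajo_cell:.1f}")
```

Even with only four factors, a subgroup of under a thousand people is left with a couple of dozen individuals per cell, before asking how many of them actually have the disease in question.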

And there's the epistemological problem of causation being too individualistic for this kind of hypothesis-free data fishing to solve--indeed, it is just that kind of research that has shown us clearly how that kind of research is not what we need now.  We need research focused on problems that really are 'genetic', and some movement of resources to new thinking, rather than perpetuating the same kind of open-ended, 'Big Data' investment.

And more
In this context, the PR seems mostly to be spin for more money for NIH and its welfare clients (euphemistically called 'universities').  Every lock on Big Money for the Big Data lobby, or perhaps belief-system, excludes funding for focused research, for example, on diseases that would seem to be tractably understood by real science rather than a massive hypothesis-free fishing expedition.

How could the 1.4 billion dollars be better spent?  A legitimate goal might be to do a trial run of a linked electronic records system as part of an explicit move towards what we really need, and which would really include all of us: a real national healthcare system.  This could be openly explained--we're going to learn how to run such a comprehensive system, etc., so we don't get overwhelmed with mistakes.  But then for the very same reason, a properly representative project is what should be done.  That would involve stratified sampling, and more properly thought-out design.  But that would require new thinking about the actual biology.

1 comment:

Ken Weiss said...

The issues are not new. The rationale of 'common' diseases was offered long ago, in the HapMap project, the wedge that justified Big Data: if a disease is 'common' all groups have it, and the idea is that common variants are responsible, so they can be found in any ol' population like, for example, whites (who by chance have the money to buy drugs--I've been told this many times over the years, though always sotto voce).

There aren't easy answers, but we do now know that most individual instances of common diseases are due to very different non- or only partially overlapping genotypes. The alleles have different frequencies in different populations and their effects may depend on their genomic (and environmental) contexts. The point is that there is no easy Big Data solution here. The problem requires deeper thought about the science itself, and less thought about how to get mega-funding too mega to terminate when diminishing returns set in (as, in many ways, they clearly already have).

Scientists should think about the problem, not about how to garner ever more resources. Science deserves support, but on a basis related to what is being investigated, not just as a welfare system for universities. I have elsewhere written about ways to begin to reform this system. But more, more, and even more is the way of life in science now, and reform is difficult.

Relying on advertising slogans like 'precision' medicine and 'All Of Us' is being encouraged as the way to get funding, when it should be penalized. Science should be a kind of sacred area, not just another part of the hurly-burly struggle for More.