Sunday, July 15, 2018

The problems are in physics, too!

We write in MT mainly about genetics and how it is used, misused, perceived, and applied these days.  That has been our own profession, and we've hoped to make cogent critiques that (if anybody paid any attention) might lead to improvement.  At least, we hope that changes could lead to far less greed, costly herd-like me-too research, and false public promises (e.g., 'precision genomic medicine')--and hence to much greater progress.

But if biology had problems, perhaps physics, with its solid mathematical foundation for testing theory, might help us see ways to more adequate understanding.  Yes, we had physics-envy!  Surely, unlike biology, the physical sciences are at least mathematically rigorous.  Unlike biology, things in the physical cosmos are, as Newton said in his famous Principia Mathematica, replicable: make an observation in a local area, like your lab, and it would apply everywhere.  So, if the cosmos has the Newtonian property of replicability, and the Galilean property of laws written in the language of mathematics, properties that were at the heart of the Enlightenment-period's foundation of modern science, then of course biologists (including even the innumerate Darwin) have had implicit physics envy.  And for more than a century we've thus borrowed concepts and methods in the hopes of regularizing and explaining biology in the same way that the physical world is described.  Not the least of the implications of this is a rather deterministic view of evolution (e.g., of force-like natural selection) and of genetic causation.

This history has, we think, often reflected a poverty of better fundamental ideas specific to biology.  Quarks, planets, and galaxies don't fight back against their conditions, the way organisms do!  Evolution, and hence life, are, after all, at the relevant level of resolution, fundamentally based on local variation and its non-replicability.  Even Darwin was far more deterministic, in a physics-influenced way, than a careful consideration of evolution and variation warrants--and the idea of 'precision genomic medicine', so widely parroted by people who should know better (or who are faddishly chasing funds), flies in the face of what we actually know about life and evolution, and the fundamental differences between physics and biology.

Or so we thought!
Well, a fine new book by Sabine Hossenfelder, called Lost in Math, has given us a reality check if ever there was one.

In what is surely our culpable over-simplification, we would say that Hossenfelder shows that at the current level of frontier science, even physics is not so unambiguously mathematically rigorous as its reputation would have us believe.  Indeed, we'd say that she shows that physicists sometimes--often? routinely?--favor elegant mathematics over what is actually known.  That sounds rather similar to the way we favor simple, often deterministic ideas about life and disease and their evolution, based on statistical methods that assume away the messiness that is biology.  Maybe both sciences are too wedded to selling their trade to the public?  Or are there deeper issues about existence itself?

Hossenfelder eloquently makes many points about relevant ways to improve physics, and many are in the category of the sociology or 'political economics' of science--the money, hierarchies, power, vested interests and so on.  These are points we have harped on here and elsewhere, in regard to the biomedical research establishment.  She doesn't even stress them enough, perhaps, in regard to physics.  But when careers including faculty salaries themselves depend on grants, and publication counts, and when research costs (and the 'overhead' they generate) are large and feed the bureaucracy, one can't be surprised at the problems, nor that as a result science itself, the context for these socioeconomic factors, suffers.  Physics may require grand scale expenses (huge colliders, etc.) but genetics has been playing copy-cat for decades now, in that respect, entrenching open-ended Big Data projects.  One can debate--we do debate--whether this is paying off in actual progress.

Science is a human endeavor, of course, and we're all vain and needy.  Hossenfelder characterizes these aspects of the physics world, but we see strikingly similar issues in genomics and related 'omics areas.  We're sure, too, that physicists are like geneticists in the way that we behave like sheep relative to fads, while only some few are truly insightful.  Perhaps we can't entirely rid ourselves of the practical, often fiscal distractions from proper research.  But the problems have been getting systematically and palpably worse in recent decades, as we have directly experienced.  This has set the precedent and pattern for strategizing science, to grab long-term big-cost support, and so on.  Hossenfelder documents the same sorts of things in the physics world.

Adrift in Genetics
In genetics, we do not generally have deterministic forces or causation.  Genotypes are seen as determining probabilities of disease or other traits of interest.  It is not entirely clear why we have reached this state of affairs.  For example, in Mendel's foundational theory, alleles at genes (as we now call them) were transmitted with regular probabilities, but once inherited their causative effects were deterministic.  The discovery of the genetics of sexual reproduction, one chromosome set inherited from each parent, and one set transmitted to each offspring, showed why this could be the case.  The idea of independent, atomic units of causation made sense, and was consistent with the developing sciences of physics and chemistry in Mendel's time as he knew from lectures he attended in Vienna.

However, Mendel carefully selected clearly segregating traits to study, and knew that not all traits behaved this way.  So an 'atomic' theory of biological causation was in a sense following 19th-century scientific advances (or fads), and was in that sense forced onto selectively chosen data.  It was later used to rationalize non-segregating traits by the 'modern evolutionary synthesis' of the early 20th century.  But it was a theory that, in a sense, 'atomized' genetic causation in a physics-like way, with the number of inherited alleles essentially determining the quantitative value of a trait in the organism.  This was very scientific in the sense of science at the time.

Today, by contrast, the GWAS approach treats even genetic causation itself, not just its transmission, as somehow probabilistic.  The reasons for this are badly under-studied and often rationalized, but might in reality be at the core of what would be a proper theory of genetic causation.  One can, after the fact, rationalize genotype-based trait 'probabilities', but this is in deep ways wrong: it borrows from physics the idea of replicability, and then equates retrospective induction (the results in a sample of individuals with or without a disease, for example) with prospective risks.  That is, it tacitly assumes a kind of gene-by-gene deterministic probability of causation.  One deep fallacy in this is the assumption that a gene's effects can be isolated, when genes are in themselves inert: only by interacting do DNA segments 'do' anything.  Far worse, one may say epistemologically worse if not fatal, is that we know that future conditions in life, unlike those in the cosmos, are not continuous, deterministic, or predictable.

That is, extending induction to deduction is tacitly assumed in genomics, but it is an unjustified convenience.  Indeed, we know the prevalence of traits like stature or disease changes with time, along with literally unpredictable future lifestyle exposures and mutations.  So assuming a law-like extensibility from induction to deduction is neither theoretically nor practically justifiable.

But to an extent we found quite surprising, being naive about physics, what we do in crude ways in genetics much resembles how physics rationalizes its various post hoc models to explain the phenomena outlined in Hossenfelder's book.  Our behavior seems strikingly similar to what Lost in Math shows about physics, but perhaps with a profound difference.

Lost in statistics
Genetic risk is expressed statistically (as in polygenic risk scores, for example).  Somehow, genotypes affect not the inevitability but the probability that the bearer will have a given trait or disease.  Those are not really probabilities, however, but retrospective averages estimated by induction (i.e., from present-day samples that reflect past experience).  Only by equating induction with deduction, and averages with inherent parameters that take the form of probabilities, can we turn mapping results into 'precision' genomic predictions (which seems to assume, rather nonsensically, that the probability is a parameter that can be measured with asymptotic precision).
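To make concrete what such a score actually is: as typically computed, a polygenic risk score is just a weighted sum of risk-allele counts, with weights taken from GWAS effect estimates.  A minimal sketch (the variant names, weights, and genotype here are invented for illustration):

```python
# GWAS-estimated per-allele effects (e.g., log odds ratios) for a few
# hypothetical variants -- these numbers are made up for illustration.
effect_sizes = {"rsA": 0.12, "rsB": 0.05, "rsC": -0.08}

# One person's genotype: count of the scored allele at each site (0, 1, or 2)
genotype = {"rsA": 2, "rsB": 1, "rsC": 0}

def polygenic_score(effects, dosages):
    """Weighted sum of allele dosages: the standard additive PRS."""
    return sum(effects[v] * dosages.get(v, 0) for v in effects)

score = polygenic_score(effect_sizes, genotype)  # 0.12*2 + 0.05*1 + 0 = 0.29

# The score is then compared to the distribution of scores in a reference
# sample -- i.e., to a retrospective average, not to an inherent probability.
```

The arithmetic is trivially simple; the contentious part, as argued above, is treating the resulting number as an individual's 'probability' rather than as a sample-derived average.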

For example, if a fraction p of people with a given genotype in our study have disease x, there is no reason to think that they were all at the same 'risk', much less that in some future sample the fraction will be the same.  So, in what sense, in biology at least, is a probability an inherent parameter?  If it isn't, what is the basis for equating induction with deduction, even probabilistically?
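The point that a shared fraction p need not mean shared risk can be shown with a tiny simulation (all numbers invented): carriers split between a 5% and a 35% risk class produce the same observed fraction that a homogeneous 20% risk would.

```python
import random

random.seed(1)  # for reproducibility of this sketch

# 10,000 people share the same genotype, but (because of unmeasured
# genetic background and environment) fall into two hidden risk classes.
risks = [0.05] * 5_000 + [0.35] * 5_000

affected = sum(1 for r in risks if random.random() < r)
observed_p = affected / len(risks)

true_average = sum(risks) / len(risks)  # exactly 0.20

# observed_p comes out near 0.20, yet not one person in the sample
# actually carried a 20% risk: the retrospective fraction is an average
# over hidden heterogeneity, not a parameter of the genotype.
```

A study estimating p = 0.20 from these data would be numerically 'correct' and still say nothing true about any individual's risk.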

There is, we think, an even deeper problem.  Statistics, the way we bandy the term about, is historically largely borrowed from the physical sciences, where sampling and measurement issues affect precision--and, we think profoundly, phenomena are believed to be truly replicable.  We'd like to ask Dr Hossenfelder about this, but we, at least, think that statistics developed in physics largely to deal with measurement issues when rigorous deterministic parameters were being estimated.  Even in quantum physics, probabilities seem to be treated as true underlying parameters, at least in the sense of being observational aspects of measuring deterministic phenomena (well, don't quote us on this!).

But these properties are [sic] precisely what we do not have in biology.  Biology is based on evolution which is inherently based on variation and its relation to local conditions over long time periods.  This does not even consider the vagaries of (sssh!) somatic mutation, which makes even 'constitutive' genotypes, the basic data of this field, an illusion of unknowable imprecision (e.g., it differs uniquely with individual, age, tissue, and environmental exposure).

In this sense, we're also Lost in Statistics.  Our borrowing of scientific notions from the history of the physical sciences, including statistics and probability, is a sign that we really have not yet developed an adequate, much less mature, theory of biology.  Physics envy, even if physics were not Lost in Math, is a result of the course of science history, a pied piper for the evolutionary and genetic sciences.  It is made worse by the herd-like behavior of human activities, especially under the kinds of careerist pressures that have been built into the academic enterprise.  Yet the profession seems not even to recognize this, much less seriously to address it!

Taking what we know in biology seriously
The problems are real and, while they'll never be entirely fixed, because we're only human, they are deeply in need of reform.  We've been making these points for a long time in relation to genetics, but perhaps naively didn't realize that similar issues affect physics, a field that appears, at least to the outsider, much more rigorous.

Nonetheless, we do think that the replicability aspects of physics, even with its frontier uncertainties, make it more mathematically--more parametrically--tractable compared to evolution and genetics, because the latter depend on non-replication.  This is fundamental, and we think suggests the need for really new concepts and methods, rather than ones essentially borrowed from physics.

At a higher and more profound, but sociological, level, one can say that the research enterprise is lost in much more than math.  It will never be perfect, but perhaps it can be improved; that may require much deeper thinking than even physics requires.

This is just our view: take a serious look at Hossenfelder's assessment of physics, and think about it for yourself.

Thursday, June 14, 2018

Thinking about science upon entering the field. Part III: Ethics and Responsibilities

Here is the third of a four-part series of posts by Tristan Cofer, a graduate student in chemical ecology here at Penn State.  He has been thinking about the profession he is being trained for, and the broader setting in which it is taking place, and in which he will have a place:

Growing up in a medical household, I remember being more than a little impressed by what seemed to me to be the many responsibilities that physicians were expected to have towards their patients. Serving on call every third or fourth night, working weekends and holidays, not to mention the years spent in school or as a resident and intern, seemed to me to go beyond the so-called Hippocratic imperative to ‘first, do no harm’, and instead to border on an ethical mandate that one should always strive to do the most good. I am, no doubt, engaging in some hero worship here, and I concede that the extent to which this mandate actually informs a physician’s conduct (much less whether it really exists) is debatable. However, I would argue that for many people, myself included, ‘good medicine’ by and large means medicine that does the most good.

This relationship between healthcare and ethical responsibility is perhaps unsurprising given the influence that physicians have over our, and our loved ones’, mental and physical wellbeing. Simply put, we want to know that the people that we trust with the things that are most important to us are indeed trustworthy. That being said, I find it somewhat curious that, by comparison, we in the scientific community are not held to a similar ethical standard. This, to me, raises the often-unconsidered, if not outright ignored, question: What are our social responsibilities as scientists?

Science, like medicine, is embedded in the culture(s) in which it is practiced. It is a humanistic enterprise in that we as humans undertake it, and like everything we do, it comes with baggage that oftentimes remains unchecked. I wouldn’t claim here that scientists give no consideration to the social frameworks in which they work (that would be both unfair and untrue); only that, based on my own experiences thus far in graduate school, discussions about a scientist’s social responsibilities have been mostly self-interested, concerning internal matters such as research ethics and the like. These conversations are no doubt valuable, in that we need to know that our colleagues are doing work that we can trust and build on; however, they hardly encourage one to think beyond one's rather limited responsibilities to the chosen profession.

How much, for instance, should we expect our research to reflect the public’s values and interests? Because research is typically funded by taxpayer dollars, one might argue that, by extension, it is also carried out in their name. Is it, therefore, ethically reprehensible to conduct research that does not directly benefit the public in some way? Are we not also obligated to set research objectives with minority or special interest groups in mind? What happens when our interests conflict with the public’s? For example, can we defend using public funding to conduct research in evolutionary biology, knowing that some groups vehemently oppose teaching evolutionary theory?

Moreover, how should we deal with situations in which our internal responsibilities to ‘Science’ and our external responsibilities to the public are at odds with each other? Is it permissible to develop technologies that can quite literally change the world, without considering the people with whom we share it? Is this even possible? Are we even the best candidates to answer these questions, or should we consult ‘outsiders’ from the humanities and elsewhere in our discussions concerning the questions mentioned above? These discussions may seem like an unnecessary hindrance to scientific advancement, and perhaps they are. But maybe, that’s what we need.

Admittedly, I might be barking up the wrong tree here. Yes, Science has the potential to greatly benefit and harm the public, but so too do politics, business, and any other enterprise with deep pockets and a global reach. As a friend, much smarter than myself, once told me, maybe ‘Science is no more than a good way to keep smart people off the street’. At the end of the day, we all need to make a living, and conversations like these have the potential to make that harder to do. For better or worse, there is considerable pressure (both external and self-imposed) on scientists to do whatever they need to in order to bring in grants, publish to get tenure and advance their careers, and appease the powers-that-be to protect their self-interests. Most people either don’t want to, or can’t, risk rocking the proverbial boat—especially when there is little precedent to do so.

A new biomedical insight?

Here is a thoughtful and timely quote:
". . . . as no single disease can be fully understood in a living person; for every living person has his individual peculiarities and always has his own peculiar, new, complex complaints unknown to medicine—not a disease of the lungs, of the kidneys, of the skin, of the heart, and so on, as described in medical books, but a disease that consists of one out of the innumerable combinations of ailments of those organs. This simple reflection can never occur to doctors . . . . because it is the work of their life to undertake the cure of disease, because it is for that that they are paid, and on that they have wasted the best years of their life.  And what is more, that reflection could not occur to the doctors because they saw that they unquestionably were of use . . .  not because they made the patient swallow drugs, mostly injurious (the injury done by them was hardly perceptible because they were given in such small doses). They were of use, were needed, were indispensable in fact (for the same reason that there have always been, and always will be, reputed healers, witches, homœopaths and allopaths), because they satisfied the moral cravings of the patient . . . . They satisfied that eternal human need of hope for relief, that need for sympathetic action that is felt in the presence of suffering, that need that is shown in its simplest form in the little child, who must have the place rubbed when it has hurt itself. The child . . . . feels better for the kissing and rubbing. The child cannot believe that these stronger, cleverer creatures have not the power to relieve its pain. . . ."
The language seems a bit arcane, and this is a translation, but its cogency as a justification for today's Big Data feeding frenzy is clear.  People who are ill, or facing death, will naturally grasp at whatever straws may be offered them.  In one way or another, this has been written about even back to Hippocrates.

Of course, palliation or cure of those disorders that can be eased or cured should be the first order and obligation of medicine.  Where nothing like that is clearly known, trials of possible treatments are surely in order, provided the patient understands at least the basic nature of the research: for example, that some participants are being given placebos while others get the treatment under investigation.  Science doesn't know everything, and we often must learn the hard way, by trial and error.

Given that, perhaps the most important job of responsible science is to temper its claims, and to offer doses of the reality that life is a temporary arrangement, and that we need to get the most out of that bit of it we are privileged to have.  So research investment should be focused on tractable, definable problems, not grandiose open-ended schemes.  But promises of the latter are nothing new to society (in medicine or other realms of life).

The problem with false promises, by preachers of any type, is that they mislead the gullible, and in many cases this is known by those making the promises--or could and should be known.  The role of false promise in religion is perhaps debatable, but its role in science, while understandable given human ego and the struggle for attention, careers, and funding, is toxic.  People suffering from poverty, hardship, or disease seek and deserve solace.  But science needs to be protected from the temptations of huckstering, so that it can do its very important business as objectively as is humanly possible.

By the way, the quote is from about 150 years ago, from War and Peace, Tolstoy's 1869 masterpiece about the nature of causation in human affairs.

Wednesday, May 9, 2018

Are common diseases generally caused by 'common' variants?

The funding mantra of genomewide mapping is that common variants cause common disease (CV-CD).  This was convenient for HapMap and other association-based attempts to find genetic causation.  The approach didn't require very dense genotyping or massive sample sizes, for example. Normally, based on Mendel's widely known experiments and so on, one would expect anything 'genetic' to run in families; however, because of notions like low penetrance--low probability of having the trait even if you've inherited the genetic variants--small nuclear families can't work, as a rule, and big enough families would be too costly or even impossible to ascertain.  In particular, for traits due to the effects of many genes or environmental factors and/or only weak causal variants' effects, families would not really be practicable.

So, conveniently, when DNA sequencing on a genomewide scale became practicable, the idea was that sequence variants might not have wholly determinative effects but the effects might be enough that we just need to find them in the population as a whole, not the smallish families that we have a hard enough time ascertaining.  People carrying such a variant would have a higher probability of showing the trait.

It was a convenient short-cut, but there is a legitimate evolutionary rationale behind it: the same mutation will not recur very often, so if there are many copies of a causative allele (sequence variant) in a population, they are probably identical by descent (IBD), from a single ancestral mutational event.  In that sense, genomewide association studies (GWAS) are finding family members carrying the same allele, but without having to work through the actual (inaccessible, very large, multi-generation ancestral) pedigrees that connect them.  If the IBD assumption were not basically true, then different instances of the same nucleotide change would have different local genomic backgrounds, and the effects would often or even likely vary among the descendants of different mutations, affecting association tests--though the analysis rarely, if ever, attempts to detect or adjust for this.

In principle it can work well if a trait really is caused by alleles at a tractably small number of genes.  That's a very big 'if', but assuming it (which is similar to assuming the trait is a classical Mendelian trait), one can find association of the allele with the trait among affected people because, at that site at least, they are distant relatives.  To be detected, though, the effect of a given allele has to be strong enough, and its frequency in the sample high enough, to pass a statistical significance test.  This is a potentially major issue: when a study tests millions of sites across the genome, the significance threshold must be made correspondingly stringent, which in a sense requires an allele's frequency and/or individual impact to be high.
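The multiple-testing arithmetic can be sketched in a few lines of Python.  The allele counts below are invented for illustration; the 5e-8 'genome-wide' threshold is the conventional Bonferroni-style correction for roughly a million independent tests:

```python
import math

def chi2_p_1df(chi2):
    # For 1 degree of freedom, chi2 = Z^2, so
    # P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def allele_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, chi2_p_1df(chi2)

# Testing ~1 million independent sites at a family-wise error rate of 0.05
# gives the conventional genome-wide significance threshold:
GENOME_WIDE = 0.05 / 1_000_000  # 5e-8

# Invented counts: risk-allele frequency 22% in 5,000 cases vs 20% in
# 5,000 controls (10,000 alleles per group).
chi2, p = allele_association(2200, 7800, 2000, 8000)
# p comes out around 5e-4: 'significant' by the usual 0.05 standard,
# yet nowhere near the genome-wide threshold.
```

So a 2-percentage-point frequency difference in a sample of 10,000 people, which would pass an ordinary significance test easily, fails genome-wide correction by about four orders of magnitude; this is the sense in which only alleles that are common enough, with effects strong enough, can surface in a GWAS.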

In essence, this is the underpinning and implicit justification for the huge GWAS empire.  There are many details, but one important assertion by the leaders of the new EverBigger (and more costly) AllOfUs project, is that common diseases are their target.  Rare diseases generally just won't show up often enough to find statistically reliable 'hits'.

Of course, 'common' is a subjective term, and if one searches millions of genome sites whose allele frequencies vary in the sample, tons of them might be 'common' by such a definition.  And they will have to have strong enough effects to be detectable, under suitably convincing significance criteria.  So we might expect CV-CD to be a proper description of such studies.  But there is a subtle difference: the implication (and once, 20 years ago, the de facto expectation) was that one or a few common variants cause the common disease.

Obviously, if that assumption of convenience were roughly true, then one can think of pharmaceutical or other preventive measures to target the causal variants in these genes in affected persons.  In fact, we have largely based the nearly 20-year GWAS effort on such a wedge rationale, starting with smaller-scale projects like HapMap.  Unfortunately, that was a huge success!

Why unfortunately?  Because, no matter how you define 'common', what we've clearly found, time and again, trait after trait, is that these common diseases are in each case due to effects of a different set of 'common' alleles whose effects are individually weak.  In that sense, the individual allele per se is not very predictive, because many unaffected people also carry that allele.  Every case is genetically unique so one Pharma does not fit all.  It is, I would assert, highly misrepresentative if not irresponsible to suggest otherwise, as is the common PR legerdemain.

Instead, what we know very clearly is that in many or most 'common' disease instances, since each case is caused by a different set of alleles, not only is each case causally unique, but no one allele is, in itself, even nearly necessary for the disease.  There isn't usually the single 'druggable' target of Pharma's dreams.  There was perhaps legitimate doubt about this 20 years ago when the adventure began, but no longer.

Indeed, it is generally rare for anything close to a majority of cases, compared to controls, to share any given allele, and even when that happens, the risk, as statistically estimated by comparing cases and controls, is usually only slightly attributable to that allele's effects.  Even then, most variation is typically not being accounted for, as measured by the trait's estimated heritability, because it seems due to a plethora of alleles too weak or rare to be detected in the sample, even if they are there and are, collectively, the greatest contributor to risk.  And, of course, we've not mentioned lifestyles and other environmental factors, nor the often largely non-overlapping results from different populations, nor various other factors.

The non-Mendelian Mendelian reality of life
I think that as a community we were led into this causal cul-de-sac by taking Mendel too literally, or too hopefully.  To be sure, some traits are qualitative--they appear in two or a few distinct states, like green vs yellow peas--and these are basically the kinds of traits Mendel studied, because they were tractable.  In such cases each gene transmits in families in a regular way that, in his honor, we call 'Mendelian'.  And human genetics has had great success identifying such traits and their causal genes (cystic fibrosis is one well-known example, but there have been many others).  However, common diseases are generally not caused by individual alleles at single genes.  Quantitative geneticists, such as agricultural breeders, have basically known about the complexity of most traits for a century, even if specific contributing genes couldn't be identified until methods like GWAS came along 15-20 years ago.

Since we know all this now, from countless studies, it is irresponsible to hijack huge funding for more and more of the same, based on a CV-CD promise that neither the public nor many investigators understand (or, if they do, dare acknowledge).  One might go farther and suggest that this makes 'CV-CD' a semantic shell-game that Congress and the public are still buying--bravely assuming that the administrators and scientists themselves, who are pushing this view, actually understand the genomic (and environmental) landscape.

NIH Director Collins is busy and has to worry about his institute's budget.  He may or may not know the kinds of things we've mentioned here--but he should!  His staff and his advisors should!  We have not invented these problems, whether or not we've explained them fully or precisely enough.  We have no vested interest in the viewpoint we're expressing.  But the evidence shows that research should now be capitalizing, so to speak, on what we've actually learned from the genomic mapping era, rather than just doing more of the same, no matter how safe that is for careers (a structural problem that society should remedy).

Instead of ever more wheel-spinning, what we really need is new thinking, different rather than just more of the same Big Data enumeration.  Until new ideas bubble up, neither we nor anyone else can specify what they should be.  Continuing to pay for ever bigger data serves several immediate interests very well: the academic enterprise whose lifeblood includes faculty salaries and overhead funding for research done in their institution, the media and equipment suppliers who thrive on ever-biggerness, and the administrators and scientists whose imagination is too impoverished to generate some actual ideas.  More is easier, more insightful is very much harder.

So, yes, common diseases are caused by common variants--tens or hundreds of them!  Enumerating them is becoming a stale, repetitive, costly business, and maybe 'business' is the right word.  The public is paying for more, but in a sense getting less.  Until some day, someone thinks differently.

Sunday, May 6, 2018

"All of us" Who are 'us'?

So the slogan du jour, All Of Us, is the name of a 1.4 billion dollar initiative being launched today by NIH Director Francis Collins.  The plan is to enroll one million volunteers in this mega-effort, the goal of which is, well, it depends.  It is either to learn how to prevent and treat "several common diseases" or, according to Dr Collins who talked about the initiative here, "It's gonna give us the information we currently lack" to "allow us to understand all of those things we don't know that will lead to better health care." He's very enthusiastic about All of Us (aka Precision Medicine), calling it a "national adventure that's going to transform medical care."  This might be viewed in the context of promises in the late 1990s that by now we'd basically have solved these problems--rather than needing ever-bigger, longer-term 'data'.

And one can ask how the data quality can possibly be maintained if medical records of whoever volunteers vary in their quality, verifiability, and so on.  But that is a technical issue.  There are sociological and ontological issues as well.

All of Us?
Serving 'all of us' sounds very noble and representative.  But let's see how sincere this publicly hyped promise really is.  Using very rough figures, which will serve the point, there are 320 million Americans, so 1 million volunteers would be about 0.3% of 'all' of us.  So first we might ask: What about achieving some semblance of real inclusive fairness in our society, by making a special effort to oversample African Americans, Hispanics, and Native Americans, before the privileged, mainly white, middle class get their names on the rolls?  That might make up for past abuses affecting their health and well-being.

So, OK, let's stop dreaming but at least make the sample representative of the country, white and otherwise.  Does that imply fairness?  There are, for example, about 300,000 Navajo Native Americans in the country.  If All Of Us means what it promises, there would be about 950 Navajos in the sample.  And about 56 Hopi tribespeople.  And there are, of course, many other ethnic groups that would have to be included.  Random (proportionate) sampling would include about 600,000 'white' people in the sample.
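The proportional-sampling arithmetic behind these figures is easy to check.  A minimal sketch (population counts are the same rough, Google-level estimates used above, not authoritative census data; the ~18,000 Hopi and ~192 million 'white' figures are back-calculated from the "about 56" and "about 600,000" quoted in the text):

```python
# Back-of-the-envelope check of the proportional-sampling figures above.
# All population counts are rough, illustrative estimates.
US_POPULATION = 320_000_000
SAMPLE_SIZE = 1_000_000

def proportional_share(subgroup_size):
    """Expected slots for a subgroup under strictly proportional sampling."""
    return SAMPLE_SIZE * subgroup_size / US_POPULATION

print(f"Sample as share of 'all of us': {SAMPLE_SIZE / US_POPULATION:.2%}")  # 0.31%
print(f"Navajo  (~300,000 people): ~{proportional_share(300_000):.0f}")      # ~938
print(f"Hopi    (~18,000 people):  ~{proportional_share(18_000):.0f}")       # ~56
print(f"'White' (~192,000,000):    ~{proportional_share(192_000_000):,.0f}") # ~600,000
```

The numbers match the rough figures in the text, which is exactly the problem: strictly proportional sampling leaves the smaller groups with sample sizes far too small for genomewide inference.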

These are just crude subpopulation counts from superficial Google searching, but the point is that in no sense is the proposed self-selected sample of volunteers going to represent All Of Us in anything resembling a fair distribution of medical benefits.  You can't get as much detailed genomewide (not to mention environmental) data from a few hundred sampled individuals as you can from hundreds of thousands.  To be fair and representative in that sense, the sample would have to be stratified in some way rather than volunteer-based.  It seems very unlikely that the volunteers will be representative of the US in any real sense, rather than of, say, university and other privileged communities, major cities, and so on--even if not because of intentional bias, but simply because such people are more likely to learn of All Of Us and to participate.

Of course, defining what is fair and just is not easy.  For example, there are far more Anglo Americans than Navajo or Hopi, so the Anglos might expect to get most of the benefits.  But that isn't what All Of Us seems to be promising.  To get adequate information from a small group, given the causal complexity we are trying to understand, that group should probably be heavily oversampled.  Even then, there would still be room for enough samples from the larger Anglo and African-American populations for the kind of discovery we could anticipate from this sort of Big Data study of the causes of common disease.

More problems than sociology
That is the sociological problem of claiming representativeness of 'all' of us.  But of course there is a deeper problem that we've discussed many times, and that is the false implied promise of essentially blanket (miracle?) cures for common diseases.  In fact, we know very well that complex causation, of the common diseases that are the purported target of this initiative, involves tens to thousands of variable genome locations, not to mention the environmental ones that are beyond simple counting.  Further, and this is a serious, nontrivial point, we know that these sorts of contributing causes include genetic and environmental exposures in the sampled individuals' futures, and these cannot be predicted, even in principle.  These are the realities.

And even if the project were truly representative of the US population demographically, as a sample of self-selected volunteers there remains the problem of representing diseases in the population subsets.  Presumably this is why they are focusing on "common diseases", but the sample will still have to be stratified by possible causal exposures (lifestyles, diets, etc.) and ethnicity, and then there will have to be enough controls to make case-control comparisons meaningful.  So, how many common diseases, and how will they be represented (males/females, early/late onset, related to which environmental lifestyles, etc.)?  One million volunteers is neither representative nor a large enough sample once it has to be stratified for statistical analysis, especially if it must also include the ethnic diversity the project promises.
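To see how fast a million volunteers fragment once the sample must be stratified, just multiply out a few stratification factors.  The factor counts below are purely illustrative assumptions, not the project's actual design:

```python
# How quickly 1 million volunteers fragment under stratification:
# multiply out a few plausible stratification factors.
# All counts here are illustrative assumptions.
diseases = 10          # "several common diseases" plus controls
sexes = 2
onset_classes = 2      # early vs late onset
ethnic_groups = 20
exposure_classes = 5   # crude lifestyle/diet categories

strata = diseases * sexes * onset_classes * ethnic_groups * exposure_classes
per_stratum = 1_000_000 // strata
print(strata, "strata; about", per_stratum, "volunteers per stratum")
```

Even these modest factors yield thousands of strata and only a few hundred volunteers per stratum on average--and under proportionate sampling the minority-group strata would hold far fewer than the average.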

And there's the epistemological problem that causation is too individualistic for this kind of hypothesis-free data fishing to solve--indeed, it is just this kind of research that has shown us clearly that it is not what we need now.  We need research focused on problems that really are 'genetic', and some movement of resources towards new thinking, rather than perpetuating the same kind of open-ended 'Big Data' investment.

And more
In this context, the PR seems mostly to be spin for more money for NIH and its welfare clients (euphemistically called 'universities').  Every dollar locked up by the Big Data lobby, or perhaps belief system, excludes funding for focused research--for example, on diseases that could be tractably understood by real science rather than by a massive hypothesis-free fishing expedition.

How could the 1.4 billion dollars be better spent?  A legitimate goal might be to do a trial run of a linked electronic records system as part of an explicit move towards what we really need, and what would really include all of us: a real national healthcare system.  This could be openly explained--we're going to learn how to run such a comprehensive system, etc., so we don't get overwhelmed with mistakes.  But for the very same reason, a properly representative project is what should be done.  That would involve stratified sampling and a more properly thought-out design.  But that would require new thinking about the actual biology.

Thursday, April 26, 2018

Gene mapping: More Monty Python than Monty Python

The gene for ... (Monty Python)
Here's a link to a famous John Cleese (of Monty Python fame) sketch on gene mapping.  We ask you to decide which is funnier: the sketch, or the daily blast of GWAS reports and their proclaimed transformative findings.  Which is more Monty than the full Monty?

Why we keep spending money on papers that keep showing how Monty Pythonish genomewide association with complex traits is, is itself a valid question.  To say, with a straight face, that we now know of hundreds, much less thousands, of genomewide sites that affect some trait--in some particular sample of humans, with much or most of the estimated heritability still unaccounted for--without saying that enough is enough, is almost in itself a comedy routine.

We have absolutely no reason--or, at least, no need--to criticize individual mapping papers.  Surely there are false findings, misused statistical tests, and so on, but that is part of normal life in science, because we don't know everything and have to make assumptions.  Some of the findings will be ephemeral, sample-specific, and so on.  That doesn't make them wrong.  Instead, the critique should be aimed at authors who present such work with a straight face as if it is (1) important and (2) novel in any real way, while (3) never acknowledging that, with so many qualitatively similar results by now, the paper itself shows why we should stop public funding of this sort of work.  We should move on to more cogent science that reflects, but doesn't just repeat, the discovery of genomic causal (or, at least, associational) complexity.

The bottom line
What these studies show, and there is no reason to challenge the results per se, is that complex traits are not to be explained by simple, much less additive, genetic models.  There is massive causal redundancy, with similar traits due to dissimilar genotypes.  But this shouldn't be a surprise.  Indeed, we can easily account for it in terms of evolutionary phenomena, both processes like gene duplication and the survival protection that alternative pathways provide.
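A toy model makes the redundancy point concrete: if a trait appears whenever any one of several alternative pathways is intact, then many dissimilar genotypes produce the same phenotype.  This is a hypothetical two-pathway sketch, not real biology:

```python
# Toy illustration of causal redundancy: the trait appears whenever
# ANY one of two alternative pathways is intact, so very different
# genotypes yield the same phenotype.  Hypothetical model, not data.
import itertools

N_LOCI = 6  # loci 0-2 form pathway A, loci 3-5 form pathway B


def trait(genotype):
    pathway_a = all(genotype[0:3])   # pathway A intact?
    pathway_b = all(genotype[3:6])   # pathway B intact?
    return pathway_a or pathway_b    # redundant: either suffices


genotypes = list(itertools.product([0, 1], repeat=N_LOCI))
with_trait = [g for g in genotypes if trait(g)]
print(len(with_trait), "of", len(genotypes), "genotypes show the trait")
```

Of the 64 possible genotypes, 15 different ones show the trait, and knocking out any single locus in one pathway changes nothing as long as the other pathway is whole--which is exactly the kind of dissimilar-genotype, similar-phenotype pattern GWAS keeps finding.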

Even if each GWAS 'hit' is correct and not some sort of artifact, it is unclear what the message is.  To us, who have no vested interest in continuing, open-ended GWAS efforts with ever-larger samples, the bottom line is that this is not the way to understand biological causation.

We reach that view on genomic considerations alone, without even considering the environmental and somatic mutation components of phenotype generation, though these are often obviously determinative (as secular trends in risk clearly show).  We reach this view without worrying about the likelihood that many or perhaps even most of these 'hits' are some sort of statistical, sampling, analytic or other artifact, or are so indirectly related to the measured trait, or so environment-dependent as to be virtually worthless in any practical sense.

What GWAS ignore
There are also three clear facts that are swept under the rug, or simply ignored, in this sort of work.  One is somatic mutations, which are not detected in constitutive genomewide studies but could be very important (e.g., in cancer).  The second is that DNA is inert and does something only in interaction with other molecules; many of those interactions relate to environmental and lifestyle exposures, which candid investigators know are usually dreadfully inaccurately measured.  The third is that future mutations, not to mention future environments, are unpredictable, even in principle.  Yet the repeatedly stressed objective of GWAS is 'precision' predictive medicine.  It sounds like a noble objective, but it's not so noble given the known and knowable reasons these promises can't be met.

So, if biological causation is complex, as these studies and diverse other sorts of direct and indirect evidence clearly show, then why can't we pull the plug on these sorts of studies, and instead, invest in some other mode of thinking, some way to do focused studies where genetic causation is clear and real, rather than continuing to feed the welfare state of GWAS?

We're held back by inertia, and the lack of better ideas, but another important if not defining constraint is that investigator careers depend on external funding and that leads to safe me-too proposals.  We should stop imitating Monty Python, and recognize that if the gene-causation question even makes sense, some new way of thinking about it is needed.

Wednesday, April 25, 2018

Improving access to healthcare can usually make malaria go away

Drug-resistant malaria has emerged in Southeast Asia several times in history and subsequently spread globally.  When there are no other antimalarials to use, this has led to public health and humanitarian disasters, especially in high-transmission settings (such as parts of sub-Saharan Africa).

Currently there is a single effective antimalarial left: artemisinin.  But malaria parasites in Southeast Asia are already developing resistance to it, leading many in the malaria research community and in public health to worry that we will soon be left with untreatable malaria.

One proposed solution to this problem has been to attempt to eliminate the parasite from regions where drug resistance consistently emerges. The proposed strategy uses a combination of increasing access to health care (so that ill people can be quickly diagnosed and treated, therefore reducing transmission) and targeting asymptomatic reservoirs by asking everyone who lives in a community where there is a large reservoir to take antimalarials, regardless of whether or not they feel ill (mass drug administration).

In Southeast Asia malaria largely persists in remote, difficult-to-access areas.  The parasite thrives in conflict zones and on the fringes of society.  These are the areas that frequently don’t have strong healthcare or surveillance systems, and some have even argued that control or elimination would be impossible in such areas because of these difficulties.

Today, on World Malaria Day, my colleagues and I published the results of 3 years of an elimination campaign in Karen State, Myanmar.  The job is not complete.  But this work has shown that it is feasible to set up a health care system even in remote and difficult-to-access areas, and that most villages can achieve elimination by beefing up the health care system alone.  In places with high proportions of people carrying asymptomatic malaria, access to health care alone doesn’t suffice and malaria persists for a longer period.  With high participation in mass drug administration, which requires a great deal of community engagement, these communities are able to quickly eliminate the parasites as well.  We are hopeful that similar programs will be expanded throughout Southeast Asia, regardless of the geographic and political characteristics of the regions, so that elimination can be achieved and sustained.

Malaria (P. falciparum) incidence in the target area over three years. The project expanded over the three years, and overall incidence has decreased.

Link to the main paper:
Effect of generalised access to early diagnosis and treatment and targeted mass drug administration on Plasmodium falciparum malaria in Eastern Myanmar: an observational study of a regional elimination programme

Link to a detailed description of the setup of the project: