Friday, October 19, 2018

Nyah, nyah! My study's bigger than your study!!

It looks like a food-fight at the Precision Corral!  Maybe the Big Data era is over!  That's because what we really seem to need (of course) is even bigger GWAS or other sorts of enumerative (or EnumerOmics) studies, because then (and only then) will we really realize how complex traits are caused, so that we can produce 'precision' genomic medicine to cure all that ails us.  After all, there is no such thing as enough 'data' or a big (and open-ended) enough study.  Of course, because so much knowledge....er, money....is at stake, such a food-fight is not just children in a sandbox, but purported adults, scientists even, wanting more money from you, the taxpayer (what else?).  The contest will never end on its own.  It will have to be ended from the outside, in one way or another, because it is predatory: it takes resources away from what might be focused, limited, but actually successful problem-solving research.

The idea that we need larger and larger GWAS studies, not to mention almost any other kind of 'omics enumerative study, reflects the deeper fact that we have no idea what to do with what we've got.  The easiest word to say is "more", because that keeps the fiscal floodgates open.  Just as preachers keep the plate full by promising redemption in the future--a future that, like an oasis to desert trekkers, can be a mirage never reached--scientists are modern preachers who've learned the tricks of the trade.  And, of course, since each group wants its floodgates to stay wide open, it must resist even the faintest suggestion that somebody else's gates might open wider.

There is a kind of desperate defense, as well as food fight, over the situation.  This, at least, is one way to view a recent exchange: Boyle et al. (Cell 169(7):1177-86, 2017**) assert that a few key genes, perhaps with rare alleles, are the 'core' genes responsible for complex diseases, while lesser, often indirect or incidental genes scattered across the genome provide other pathways to affect a trait, and are what GWAS detect.  If this model were to take hold, it might threaten the gravy train of more traditional, more mindless, Big Data chasing.  As a plea to avoid that, Wray et al.'s falsely polite spitball in return (Cell 173:1573-80, 2018**) urges that things really are spread all over the genome, differently so in everyone.  Thus, of course, the really true answer is some statistical prediction method, after we have more and even larger studies.

Could it be, possibly, that this is at its root merely a defense of large statistical databases and Big Data per se, expressed as if it were a legitimate debate about biological causation?  Could it be that, for vested interests, if you have a well-funded hammer everything can be presented as if it were a nail (or, rather, a bucket's worth of nails, scattered all over the place)?

Am I being snide here? 
Yes, of course. I'm not the Ultimate Authority to adjudicate who's right, or what metric to use, or how many genome sites, in which individuals, can dance on the head of the same 'omics trait.  But I'm not just being snide.  One reason is that both the Boyle and Wray papers are right, as I'll explain.

The arguments seem in essence to assert that complex traits are due either to many genetic variants strewn across the genome, or to a few rare larger-effect 'core' alleles here and there, complemented by variants across the genome that affect the trait through indirect pathways to the 'main' genes ('omnigenic').  Or is it that we can tinker with GWAS results, and various technical measurements from them, to get at the real truth?

We are chasing our tails these days in an endless-seeming circle to see who can do the biggest and most detailed enumerative study, to find the most and tiniest of effects, with the most open-ended largesse, while Rome burns.  Rome, here, stands for the victims of the many diseases that might be studied, with actual positive therapeutic results, by more focused, if smaller, studies.  Or, in many cases, by a real effort at revealing and ameliorating the lifestyle exposures that typically, one might say overwhelmingly, are responsible for common diseases.

If, sadly, it were to turn out that there is no more integrative way, other than add-'em-up, by which genetic variants cause or predispose to disease, then at least we should know that and spend our research resources elsewhere, where they might do good for someone other than universities.  I actually happen to think that life is more integratively orderly than merely enumeratively additive, and that more thoughtful approaches, indeed reflecting findings from decades of GWAS data, might lead to better understanding of complex traits.  But this seemingly can't be achieved just by sampling extensively enough to estimate 'interactions'.  The interactions may, and I think probably do, have higher-level structure that can be addressed in other ways.

But if not, if these traits are as they seem and there is no such simplifying understanding to be had, then let's come clean to the public and invest our resources in other ways to improve our lives, before these additive trivia add up to our end--when those supporting the work tire of the exaggerated promises.

Our scientific system, which we collectively let grow like mushrooms because it was good for our self-interests, puts us in a situation where we must sing for our supper (often literally, if investigators' salaries depend on grants).  No one can be surprised at the cacophony of top-of-the-voice arias ("Me-me-meeeee!").  Human systems can't be perfect, but they can be perfected.  At some point, perhaps we'll start doing that.  If it happens, it will only partly reflect the particular scientific questions at issue, because it's mainly about the underlying system itself.


**NOTE: We provide links to sources, but, yep, they are paywalled--unless you just want to see the abstract or have access to an academic library.  If you have the loony idea that as a taxpayer you have already paid for this research, so that private selling of its results should be illegal--sorry!--that's not our society.

Tuesday, October 16, 2018

Where has all the thinking gone....long time passing?

Where did we get the idea that our entire nature, not just our embryological development but everything else, was pre-programmed by our genome?  After all, the very essence of Homo sapiens, compared to all other species, is that we use culture--language, tools, etc.--to do our business, rather than just our physical biology.  In a serious sense, we evolved to be free of our bodies: our genes made us freer from our genes than most if not all other species!  And we evolved to live long enough to learn--language, technology, etc.--in order to live our thus-long lives.

Yet isn't an assumption of pre-programming the only assumption under which anyone could legitimately promise 'precision' genomic medicine?  Of course, Mendel's work, adopted by human geneticists over a century ago, allowed great progress in understanding how genes lead to at least the simpler of our traits, those with discrete (yes/no) manifestations.  These traits include many diseases that really, perhaps surprisingly, do behave in Mendelian fashion, for which concepts like dominance and recessiveness have been applied and, sometimes, at least approximately hold up to closer scrutiny.

Even 100 years ago, agricultural and other geneticists who could do experiments largely confirmed the extension of Mendel to continuously varying traits, like blood pressure or height.  They reasoned that many genes (whatever they were, which was unknown at the time) contributed individually small effects.  If each gene had two states in the usual Aa/AA/aa classroom-example sense, but there were countless such genes, their joint action could approximate continuously varying traits whose measure was, say, the number of A alleles in an individual.  This view was also consistent with the observed correlation of trait measure with degree of kinship among relatives.  This history has been thoroughly documented.  But there are some bits, important bits, missing, especially when it comes to the fervor for Big Data 'omics analysis of human diseases and other traits.  In essence, we are still, a century later, conceptual prisoners of Mendel.
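To see the classical argument at work, here is a toy simulation (a sketch with invented numbers, not drawn from any real study): many loci, each with a trivially small additive effect, jointly produce a smoothly continuous trait.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters only -- not from any actual study
n_loci = 500       # biallelic loci, each with a tiny additive effect
freq_A = 0.5       # frequency of the 'A' allele at every locus
n_people = 10_000

# Each person carries 0, 1, or 2 copies of 'A' at each locus
genotypes = rng.binomial(2, freq_A, size=(n_people, n_loci))

# The classical additive model: the trait is simply the total allele count
trait = genotypes.sum(axis=1)

# With many small contributions, the trait is approximately normal
print(f"mean = {trait.mean():.1f}, sd = {trait.std():.1f}")
# Expected: mean ~ 2*500*0.5 = 500, sd ~ sqrt(2*500*0.25) ~ 15.8
```

A histogram of `trait` looks like the classic bell curve, which is all the early quantitative geneticists needed to reconcile Mendel with continuous variation.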

'Omics over the top: key questions generally ignored
Let us take GWAS (genomewide association studies) at face value.  GWAS find countless 'hits', sites of whatever sort across the genome whose variation affects variation in WhateverTrait you choose to map (everything simply must be 'genomic' or some other 'omic, no?).  WhateverTrait varies because every subject in your study has a different combination of contributing alleles.  Somewhat resembling classical Mendelian recessiveness, contributing alleles are found in cases as well as controls (or across the measured range of quantitative traits like stature or blood pressure), where the measured trait reflects how many A's one has: WhateverTrait is essentially the sum of A's in 'cases', which may be interpreted as a risk--some sort of 'probability' rather than certainty--of having been affected or of having the measured trait value.

We usually treat risk as a 'probability,' a single value, p, that applies to everyone with the same genotype.  Here, of course, no two subjects have exactly the same genotype, so some sort of aggregate risk score, adding up each person's 'hits', is assigned a p.  This, however, tacitly assumes that each site contributes some fixed risk or 'probability' of affection.  But that treats these values as if they were essential to the site, each thus acting as a parameter of risk.  That is, sites are treated as carrying a kind of fixed value or, one might say, 'force', relative to the trait measure in question.
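As a minimal sketch of that add-'em-up logic (every number below is made up; real scores plug in published effect-size estimates), an aggregate risk score might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-site effect estimates, as a GWAS might report them
# (log-odds per risk allele); purely illustrative values
n_sites = 100
betas = rng.normal(0.0, 0.05, n_sites)      # tiny effects, as GWAS typically finds
genotype = rng.binomial(2, 0.3, n_sites)    # one person's risk-allele counts

baseline_log_odds = -2.0                    # assumed population baseline

# The tacit assumption: each site contributes a fixed increment, and a
# person's 'risk' is the sum, pushed through a logistic function
log_odds = baseline_log_odds + betas @ genotype
p = 1 / (1 + np.exp(-log_odds))
print(f"assigned 'probability' of affection: {p:.3f}")
```

The arithmetic is trivial; the epistemology--treating those fixed per-site increments as parameters of Nature--is the issue.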

One obvious and serious issue is that these are necessarily estimated from past data, that is, by induction from samples.  Not only is there sampling variation that usually is only crudely estimated by some standard statistical variation-related measure, but we know that the picture will be at least somewhat different in any other sample we might have chosen, not to mention other populations; and those who are actually candid about what they are doing know very well that the same people living in a different place or time would have different risks for the same trait.

No study is perfect, so we use some conveniently assumed well-behaved regression/correction adjustments to account for the statistical 'noise' due to factors like age, sex, and unmeasured environmental effects.  Much worse than these issues, there are clear sources of imprecision, and the obvious major one--taboo even to think about, much less to mention--is that relevant future factors (mutations, environments, lifestyles) are unknowable, even in principle.  So what we really do, are forced to do, is extend what the past was like to the assumed future.  Besides this, we don't count somatic changes (mutations arising in body tissues during life, that were not inherited), because they'd mess up our assertions of 'precision', and we can't measure them well in any case (so just shut one's eyes and pretend the ghost isn't in the house!).

All of these together mean that we are estimating risks from imperfect existing samples and past life-experience, but treating them as underlying parameters so that we can extend them to future samples.  What that does is equate induction with deduction, assuming the past is rigorously parametric and will be the same in the future;  but this is simply scientifically and epistemologically wrong, no matter how inconvenient it is to acknowledge this.  Mutations, genotypes, and environments of the future are simply unpredictable, even in principle.

None of this is a secret, or a new discovery, in any way.  What it is, is inconvenient truth.  These things should, by themselves--without even badgering investigators about the environmental factors that, we know very well, typically predominate--have been enough to show that the NIH's precision promises cannot be accurate ('precise'), or even accurate to a knowable degree.  Yet this 'precision' sloganeering is being sheepishly aped all over the country by all sorts of groups who don't think for themselves and/or who go along lest they get left off the funding gravy train.  This is the 'omics fad.  If you think I am being too cynical, just look at what's being said, done, published, and claimed.

These are, to me, deep flaws in the way the GWAS and other 'omics industries, very well-heeled, are operating these days, to pick the public's pocket (pharma may, slowly, be awakening-- Lancet editorial, "UK life science research: time to burst the biomedical bubble," Lancet 392:187, 2018).  But scientists need jobs and salaries, and if we put people in a position where they have to sing in this way for their supper, what else can you expect of them?

Unfortunately, there are much more serious problems with the science, and they have to do with the point-cause thinking on which all of this is based.

Even a point-cause must act through some process
By far most of the traits, disease or otherwise, that are being GWAS'ed and 'omicked these days, at substantial public expense, are treated as if the mapped 'causes' are point causes.  If there are n causes, and a person has an unlucky set m out of many possible sets, one adds 'em up and predicts that person will have the target trait.  And there is much that is ignored, assumed, or wishfully hidden in this 'will'.  It is not clear how many authors treat it, tacitly, as a probability vs a certainty, because no two people in a sample have the same genotype and all we know is that they are 'affected' or 'unaffected'.

The genomics industry promises, essentially, that from conception onward, your DNA sequence will predict your diseases, even if only in the form of some 'risk'; the latter is usually a probability and despite the guise of 'precision' it can, of course, be adjusted as we learn more.  For example, it must be adjusted for age, and usually other variables.  Thus, we need ever larger and more and longer-lasting samples.  This alone should steer people away from being profiteered by DNA testing companies.  But that snipe aside, what does this risk or 'probability' actually mean?

Among other things, those candid enough to admit it know that environmental and lifestyle factors have a role, interacting with the genotype if not, usually, overwhelming it.  That means, for example, that the genotype only confers some, often modest, risk probability, the actual risk being much more affected by lifestyle factors, most of which are not measured, not measured with accuracy, or not even yet identified.  And usually there is some aspect that relates to age, or some assumption about what 'lifetime' risk means.  Whose lifetime?

Aspects of such a 'probability'
There are interesting, longstanding issues about these probabilities, even if we assume they have some kind of meaning.  Why do so many important diseases, like cancers, only arise at some advanced age?  How can a genomic 'risk' be so delayed, and so different among people?  Why do mice, whose genotypes are very similar to ours (which is why we do experiments on them to learn about human disease), only live to 3 while we live to our 70s and beyond?

Richard Peto raised some of these questions many decades ago.  But they were never really addressed, even in an era when NIH et al were spending much money on 'aging' research, including studies of lifespan.  There were generic evolutionary theories for why some diseases are deferred to later ages (antagonistic pleiotropy), but nobody tried seriously to explain that from a molecular/genetic point of view.  Why do mice live only 3 years, anyway?  And so on.

These are old questions and very deep ones but they have not been answered and, generally, are conveniently forgotten--because, one might argue, they are inconvenient.

If a GWAS score increases the risk of a disease that has a long-delayed onset pattern, often striking late in life, and highly variable among individuals or over time, what sort of 'cause' is that genotype?  What is it that takes decades for the genes to affect the person?  There are a number of plausible answers, but they get very little attention, at least in part because that stands in the way of the vested interests of entrenched, too-big-to-kill, Big Data faddish 'research' that demands instant promises to the public it is trephining for support.  If the major reason is lifestyle factors, then the very delayed onset should be taken as persuasive evidence that the genotype is, in fact, by itself not a very powerful predictor.

Why would the additive effects of some combination of GWAS hits lead to disease risk?  That is, in our complex nature, why would each gene's effects be independent of each other contributor's?  In fact, mapping studies usually show evidence that other things, such as interactions, are important--but they are at present almost impossibly complex to understand.

Does each combination of genome-wide variants have a separate age-onset pattern, and if not, why not?  And if so, how does the age effect work (especially if not due to person-years of exposure to the truly determining factors of lifestyle)?  If such factors are at play, how can we really know, since we never see the same genotype twice? How can we assume that the time-relationship with each suspect genetic variant will be similar among samples or in the future?  Is the disease due to post-natal somatic mutation, in which case why make predictions based on the purported constitutive genotypes of GWAS samples?

Obviously, if long-delayed onset patterns are due not to genetics but to lifestyle exposures interacting with genotypes, then perhaps lifestyle exposures should be the health-related target, not exotic genomic interventions.  Of course, the value of genome-based prediction clearly depends on environmental/lifestyle exposures, and the future of these exposures is obviously unknowable (as we clearly do know from seeing how unpredictable past exposures have affected today's disease patterns).

The point here is that our reliance on genotypes is a very convenient way of keeping busy, bringing in the salaries, but not facing up to the much more challenging issues that the easy one (run lots of data through DNA sequencers) can't address.  I did not invent these points, and it is hard to believe that at least the more capable and less me-too scientists don't clearly know them, if quietly.  Indeed, I know this from direct experience.  Yes, scientists are fallible, vain, and we're only human.  But of all human endeavors, science should be based on honesty because we have to rely on trust of each other's work.

The scientific problems are profound, not easily solved, and not soluble in a hurry.  But much of the problem comes from the funding and careerist system that shackles us.  This is the deeper explanation in many ways.  The paint on the House of Science is the science itself, but it is the House that supports that paint that is the real problem.

A civically responsible science community, and its governmental supporters, should be freed from the iron chains of relentless Big Data for their survival, and start thinking, seriously, about the questions that their very efforts over the past 20 years, on trait after trait, in population after population, and yes, with Big Data, have clearly revealed.

Saturday, October 6, 2018

And yet it moves....our GWAScopes and Galileo's lesson on reality

In 1633, Galileo Galilei was forced to recant before the Pope his ideas about the movement of the Earth, or else to face the most awful penalty.  As I understand the story, he did recant....but after leaving the Cathedral, he stomped his foot on the ground, and declared "And yet it moves!"  For various reasons, usually reflecting their own selfish vested interests, the powers that be in human society frequently stifle unwelcome truths, truths that would threaten their privileged well-being.  It was nothing new in Galileo's time--and it's still prevalent today.


Galileo: see Wikipedia "And yet it moves"
All human endeavors are in some ways captives of current modes of thinking--world-views, beliefs, power and economic structures, levels of knowledge, and explanatory frameworks.  Religions and social systems often, or perhaps typically, constrain thinking.  They provide comforting answers and explanations, and people feel threatened by those not adhering--those not like us in their views.  The rejection of heresy applies far beyond formal religion.  Dissenters or non-believers are part of 'them' rather than 'us', a potential threat, and it is thus common if not natural to distrust, exclude, or even persecute them.

At the same time, the world is as the world really is, especially when it comes to physical Nature.  And that is the subject of science and scientific knowledge.  We are always limited by current knowledge, of course, and history has shown how deeply that can depend on technology, as Galileo's experience with the telescope exemplifies.

When you look through a telescope . . . . 
In Galileo's time, it was generally thought--or perhaps believed is a better word--that the cosmos was God's creation as known by biblical authority.  It was created in the proverbial Genesis way, and the earth--with we humans on it--was the special center of that creation.  The crystal spheres bearing the stars and planets circled around and ennobled us with their divine light.  In the west, at least, this was not just the view, it was what had (with few exceptions) seemed right since the ancients.

But knowledge is often, if not always, limited by our senses, and they in turn are limited by our sensory technology.  Here, the classical example is the invention of the telescope and, eventually, what that cranky thinker Galileo saw through it.  Before his time, we had only our naked eyes to see the sun move, and the stars seemed quite plausibly to be crystal spheres bearing twinkles of light, rotating around us.

If you don't know the story, Wikipedia or many other sources can be consulted. But it was dramatic!  Galileo's experience taught science a revolutionary lesson about reality vs myth and, very directly, about the importance of technology in our understanding of the world we live in.

The lesson from Galileo was that when you look through a telescope you are supposed to change your mind about what is out there in Nature.  The telescope lets you see what's really there--even if it's not what you wanted to see, or thought you'd see, or would be most convenient for you to see.


Galileo's telescope (imagined).  source: news.nationalgeographic.com
From Mendel's eyes to ours
Ever since antiquity, plant and animal breeders empirically knew about inheritance, that is, about the physical similarities between parents and offspring.  Choose parents with the most desirable traits, and their offspring will have those traits, at least, so to speak, on average.  But how does that work?

Mendel heard lectures in Vienna that gave him some notion of the particulate nature of matter.  When, in trying to improve agricultural yields, he noticed discrete differences, he decided to test their nature in pea plants, which he knew well and which were manageable subjects of experiments to understand the Molecular Laws of Life (my phrase, not his).

Analogies are never perfect, but we might say that Mendel's picking discrete, manageable traits was like pre-Newtonians looking at stars but not at what controlled their motion.  Mendel got an idea of how parents and offspring could resemble each other in distinct traits.  And just as the telescope was the instrument that allowed Galileo to see the cosmos better, and to do more observing than guessing, geneticists got their Galilean equivalent in genomewide mapping (GWAS), which allowed us to do less guessing about inheritance and to see it better.  We got our GWAScope!

But what have we done with our new toy?   We have been mesmerized by gene-gazing.  Like Galileo's contemporaries who, finally accepting that what he saw really was there and not just an artifact of the new instrument, gazed through their telescopes and listed off this and that finding, we are on a grand scale just enumerating, enumerating, and enumerating.  We even boast about it.  We build our careers on it.

That me-too effort is neither surprising nor unprecedented.  But it has also become what Kuhn called 'normal science'.  It is butting our heads against a wall.  It is doing more and more of the same, without realizing that what we see is what's there, but we're not explaining it.  From early in the 20th century we had quantitative genetics theory--the theory that agricultural breeders have used in formal ways for the past century, making traditional breeding, which had been around since the discovery of agriculture, more formalized and empirically rigorous.  But we didn't have the direct genetic 'proof' that the theory was correct.  Now we do, and we have it in spades.

We are spinning wheels and spending wealth on simple gene-gazing.  It's time, it's high time, for some new insight to take us beyond what our GWAScopes can see, digesting and understanding what our gene-gazing has clearly shown.

Unfortunately, at present we have an 'omics Establishment that is as entrenched, for reasons we've often discussed here on MT, as the Church was for explanations of Truth in Galileo's time.  It is now time for us to go beyond gene-gazing.  GWAScopes have given us the insight--but who will have the insight to lead the way?

Wednesday, August 15, 2018

On the 'probability' of rain (or disease): does it make sense?

We typically bandy the word probability around, as if we actually understand it. The term, or a variant of it like probably, can be used in all sorts of contexts that, on the surface seem quite obvious and related to some sense of uncertainty; e.g., "That's probably true," or "Probably not."  But is it so obvious?  Are the concepts clear at all?  When are they, actually, more than just informally and subjectively, meaningful?

Will it rain today?  Might it?  What is the chance of rain?
One of the typical uses of probabilistic terms in daily life has to do with weather predictions.  As a former meteorologist myself, I find this a cogent context in which to muse about these terms, but with extensions that have much deeper relevance.

Here is an episode of a generally very fine BBC Radio 4 program called More or Less, whose mission is to educate listeners on the proper use and understanding of numbers, statistics, probabilities and the like.  This episode deals--somewhat unclearly and, to me, quite vaguely, unsatisfactorily, and even somewhat defensively--with the use and interpretation of weather forecasts.

So what does a forecast calling for an x% chance of rain mean?  Let's think of an imaginary chessboard laid over a particular location.  It is raining under the black squares, but not under the white.  There is nothing probabilistic about this: 50% of people in the area will experience rain.  If I don't know where you live, exactly, I'd have to say that you have a 50% chance of rain, but that has nothing to do with the weather itself, only with my uncertainty about where you live.  Even then it's misleadingly vague, since people don't live randomly across a region (they are, for example, usually clustered in some sub-regions).

Another interpretation is that I don't know where the black and white squares will be exactly, at any given time, but my weather models predict that in about half of the region, rain will fall.  This could be because my computer models, necessarily based on imperfect measurement and imperfect theory, are therefore imperfect--but I run them many times, making small random changes in various values to account for that imperfection, and I find that among these model runs, 50% of the time at any given spot, or 50% of the entire area under consideration, experiences rain.

Or is it that an imaginary chessboard is moving overhead, so that 50% of the land will be under black squares, and hence getting rain, at any given time?  Then any given area will get rain only 50% of the time, but every area will certainly get rain at some point during the forecast period--indeed, every area will be getting rain half of the period.  In that case the best forecast is that you will get wet if you stay outside all day, but if you only run out to get the mail you might not.  Might??

Or is it that my models are imperfect, but theory or experience tells me there is a 50% chance of any rain in the area--that is, my knowledge can tell me no more than that?  In that case, any given place will have this guesstimated chance of rain.  But does that mean at any given time during the forecast period, or at every time during it?  Or is it that my knowledge is very good, but the meteorological factors--the nature of atmospheric motion and so on--only probabilistically form droplets that are large enough not just to be clouds but to fall to earth?  That is, is it the atmospheric process itself that is probabilistic--at least according to the theory, since I can't observe every droplet?

If a rain-generating front is passing through the area, it could rain everywhere along the front, but only until the front has moved past the area.  Thus, it may rain with 100% certainty, but only 50% of the specified time, if the front takes that amount of time to pass through.
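These interpretations are genuinely different, and a toy simulation makes the contrast concrete (the 50% figures below are invented; this is a sketch, not how any forecast office actually computes its numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Interpretation A: spatial coverage.  Rain falls, with certainty, on half
# the region; '50%' reflects only where you happen to be standing.
spots = rng.random(100_000) < 0.5        # True = rain at that spot
print(f"coverage: rain is certain over {spots.mean():.0%} of the region")

# Interpretation B: ensemble frequency.  The model is re-run many times with
# slightly perturbed inputs; it rains at your exact location in half the runs.
runs = rng.random(1_000) < 0.5           # True = rain in that model run
print(f"ensemble: it rained at your house in {runs.mean():.0%} of the runs")

# Both report '50%', yet they describe different worlds: certain rain for
# half the people, versus uncertain rain for everyone.
```

The same numerical forecast, two incompatible meanings--which is precisely the trouble with a bare 'probability'.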

I've undoubtedly mentioned only some of the many things that weather forecast probabilities can be intended or interpreted to mean.  It is not clear--and the BBC program shows this--that everyone, or perhaps even anyone, making them actually understands, or is thinking clearly about, what these probability forecasts mean.  Even meteorologists themselves, especially when dumbing things down for the average Joe who only wants to know if he should carry his brolly, are likely ('probably'?!) unclear about these values.  Probably they mean a bit of this and a bit of that.  I wonder if anyone can know which of the meanings is being used in any given forecast.

Well, fine, everyone knows that nobody really knows everything about the weather.  Anyway, it's not that big of a deal if you get an unexpected drenching now and then, or more often haul your raincoat to work but never need it.

But what about things that really matter, like your future health?  My doc takes my blood pressure and looks at my weight, and may warn me that I am at 'risk' of a heart attack or stroke--that without taking some preventive measures I may (or probably will) have such a fate.  That's a lot more important than a soaked shirt.  But what does it mean?  Isn't everybody at some risk of these diseases?  Does my doc actually know?  Does anybody?  Who is thinking clearly about these kinds of risk pronouncements?

OK, caveats, caveats: but will I get diabetes?
'Precision' genomic medicine is one of the marketing slogans of the day in genomics, the very vague (I would say culpably false) promise that from your genotype we can predict your future--that's what 'precision' implies.  The same applies even if, weaseling now, the promise includes environmental factors as well as genomic ones.  And the idea implies knowledge not just of some vague probability; by implication it means perfection--prediction with certainty.  But to what extent--if any at all--is the promise, or can the promise be, true?  What would it mean for it to be 'true'?  After all, anyone might get, say, type 2 diabetes, mightn't they?  Or, more specifically, what does such a sentence itself even mean, if anything?

We know that, today at least, some people get diabetes sometime in their lives, and even if we don't know why or which ones, that seems like a safe assertion.  But to say that any person, not specifically identified, might become diabetic is rather useless.  We want a reason--a cause--and if we have that we assume it will enable us to identify specifically vulnerable individuals.  Even then, however, we don't know more than to say, in some sense that we may not even understand as well as we think we do, that not all the vulnerable will get the disease: but we seem to think that they share some probability of getting it.  But what does that mean, and how do we get such figures?

Does it mean that among all those with a given GWAS genotype: (1) a fraction f will get diabetes?  (2) a fraction f will get diabetes if they live beyond some specified age?  (3) a fraction f will get diabetes before they die, if they live the same lifestyle and diet as those from whom the risk was estimated?  (4) a net fraction f will get diabetes, pro-rated year by year as they age?  (5) a net fraction related to f will get diabetes, adjusted for current age, sex, race, etc.?

What about each individual consulting their Big Data genomic counselor?  Are these fractions f related to each individual as a probability p=f that s/he will get diabetes (conditional on things like items 1-5 above)?  That is, is every person at the same risk?

Only if we can equate our past sample, from which we estimated f by induction, with the probability p asserted by deduction for each new individual, might this, even in principle, lead to 'precision genomic medicine'.  It is prediction, not just description, that we are being promised.  Even if we were thinking in public health terms, this is essentially the same, because it would relate to the fraction of individuals who will be affected in the future, each person being exposed to the same probability.

Of course, we might believe that each person has some unique probability of getting diabetes (related, again, to the above items), and that f reflects the mix (e.g., average) of these probabilities.  But then we have to assume that all the genotypes and lifestyles and so on in the current group, whose future we're offering 'precision' predictions about, are exactly like those in the sample from which the predictions were derived--that this mix of risks is, somehow, conserved.  How can such an assumption ever be justified?
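A toy simulation shows why incidence data alone can't settle this (all numbers invented): a population where everyone shares p = 0.2, and one where individual risks are wildly heterogeneous but average to 0.2, produce identical-looking aggregate results.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Scenario A: everyone shares exactly the same risk, p = 0.2
shared = rng.random(n) < 0.2

# Scenario B: heterogeneous risks -- half the people at 0.35, half at 0.05 --
# whose *average* is still 0.2
individual_p = np.where(rng.random(n) < 0.5, 0.35, 0.05)
mixed = rng.random(n) < individual_p

# From incidence alone, the two scenarios are indistinguishable...
print(f"incidence A: {shared.mean():.3f}   incidence B: {mixed.mean():.3f}")
# ...yet what a 'precision' counselor should tell any given person differs
# radically between them, and aggregate data can't say which world we're in.
```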

Of course, we know very well that no current sample whose future we want to be precise about will be exactly the same as the past sample from which the probabilities (or fractions) were derived.  Obviously, much will differ, but we also know that we simply have no way to assess by how much it will differ.  For example, future diets, sociopolitical, and other factors that affect risk will not be the same as those in the past, and are inherently unpredictable.  So, on what meaningful basis can 'precision' prediction be promised?

Just for fun, let's take the promise of precision genomic medicine at its face value.  I go to the doc, who tells me
"Based on your genome sequence, I must advise you of your fate in regard to diabetes."
"Thanks, doc.  Fire away!"
"You have a 23.5% chance of getting the disease."
"Wow!  That sounds high!  That means I have a 23.5% chance that I won't die in a car or plane crash, right?  That's very comforting.  And if about 10% of people get cancer, then of my 76.5% chance of not getting diabetes, it means only a 7.65% chance of cancer!  Again, wow!"
"But wait, Doc!  Hold on a minute.  I might get diabetes and cancer, right?  About a 7.65% percent chance of that, right?"
"Um, well, um, it doesn't work quite that way [to himself, sotto voce: "at least I think so..."].....that's because you might die of diabetes, so you wouldn't get cancer.  Of course, the cancer could come first, but it would linger, because you have to live long enough to experience your 23.5% risk of diabetes.  That would not be good news.  And, of course, you could get diabetes and then get in a crash.  I said get diabetes, not die of it, after all!"
I gather you, too, can imagine how to construct many different sorts of fantasy conversations like this, even rashly assuming that your doctor understood probability, had read his New England Journal regularly when not too sleepy after a day's work at the clinic--and that the article in the NEJM was actually accurate, and that NIH sincerely knew what it was promising in the way of genomic predictability.  But wait!  The medical journals, and even the online genotyping scam companies--you can probably name one or two of them--change your estimated risks from time to time as new 'data' come in.  So when can I assume the case is closed and I (well, the Doc) really know the true probabilities?
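For what it's worth, here is what the patient's arithmetic would give if the two risks really were independent, lifetime-stable numbers--assumptions the dialogue itself undermines (the figures are the dialogue's made-up ones):

```python
# Back-of-envelope check on the dialogue's arithmetic, assuming (dubiously!)
# independent, lifetime-stable risks -- the dialogue's invented numbers
p_diabetes = 0.235
p_cancer = 0.10

p_both = p_diabetes * p_cancer                   # independence assumption
p_neither = (1 - p_diabetes) * (1 - p_cancer)

print(f"both diseases: {p_both:.4f}")            # ~0.024, not the patient's 0.0765
print(f"neither:       {p_neither:.4f}")         # ~0.689
# Competing risks -- dying of one disease before the other can appear --
# break even this simple product rule, which is the doctor's mumbled point.
```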

I mean, what if there are no such true probabilities, because even if there were, not just knowledge, but also circumstances (cultural, not to mention mutations) continually change, and what if we have no way whatever to know how they're gonna change?  Then what is the use of these 'precision' predictions?  They, at best, only apply to a single, current instance.  So what (if anything at all) does 'precision' mean?

It only takes a tad of thinking to see how precisely imprecise these promises all are--must be--except as very short-term extrapolations of what past data showed, and extrapolations of unknown (and unknowable) 'precision'.  Except, of course, for the very precise truth that you, as a taxpayer, are going to foot the bill for a whole lot more of this sort of promise.

Unlike the weather, we don't have anything close to as rigorous an understanding of human biology and cultures as we do of the behavior of gases and fluids (the atmosphere).  We might want to say, self-protectively and with more honest modesty, that our use of 'probability' is very subjective and really just means an extrapolated rough average of some unspecifiable sort.  But then that doesn't sound like the glowing promise of 'precision', does it?  One has to wonder what sort of advice would make scientifically proper, and honorable, use of the kind of probabilistic, vague, ephemeral evidence we have when we rely on 'omics approaches--even if it's the best we can do at present.

In meteorology, it used to be (when I was playing that game) that we'd joke "persistence is the best forecast".  This was, of course, for short range, but short range was all we could do with any sort of 'precision'.  We are pretty much in that situation now, in regard to genomics and health.

The difference is, weather forecasters are honest, and admit what they don't know.

Tuesday, August 14, 2018

The Placebome.....can you believe that!

Is it only religion that feeds and reassures the gullible, no matter what catastrophes strike?

When a baby is born with serious health issues, this is apparently the loving God's will (to test the parents' faith; God can, after all, save the baby's soul).  But rather than just blaming God, perhaps one's faith in this same devilish Being, that faith itself, could have curative powers.  At least those powers might extend to the believer him or herself.

When a person's mood ameliorates a disease, yet no formal medical treatment has been involved, that is a psychological effect.  When the person is in a case-control drug trial study, in which s/he has (though unaware of it) been given a sugar pill--a placebo--rather than the drug under test, and that person's health improves anyway, that is called the placebo effect.

It is important when testing a new drug to have a way to determine whether it really does nothing (or, indeed, is harmful) rather than its intended effect.  Since people who are ill might get better or worse for various reasons, a drug trial often compares those patients given the drug with those who are given a placebo.  The drug is considered to be efficacious if it does something, rather than nothing--nothing, that is, as is assumed about the placebo.

But are some unjustified if convenient assumptions being made in this long-used standard comparison as a test of a new drug's efficacy?  Placebo-controlled studies have long been relatively standard, if not indeed mandatory, for drug approval.  But how well are the comparisons--and their underlying assumptions--understood?  The answer may not be as obvious as is generally assumed.

Back pain that's a headache
What about this paper by Carvalho et al., in the journal Pain (vol 157, number 12, 2016)?  The authors did a randomized controlled trial of open-label placebos (OLPs), taken on the usual dosing schedule for the usual 3 weeks, in patients suffering low back pain.  The authors found clear (that is, statistically significant) reduction in symptoms--even though these patients knew they were taking a placebo.  Perhaps they still thought they were taking medicine, or perhaps just being in a study seemed to them, somehow, to be a form of care, something positive--that is, systematically better than no treatment.  But this is not supposed to happen, and it relates to a variety of very important, if equally inconvenient, issues about what counts as evidence, what counts as therapy, and so on.


The samples in the Carvalho study were small and one can quibble about the quality of the research if one wants to dismiss it.  (E.g., if it were really true, why wasn't it published in a major journal? Did reactionary reviewers from these journals keep it from being published there?).  Still, if the placebo effect is real, the idea should not be a surprise.  Biologically, there really need be no reason why subjects must be blinded to being given placebos in order for them to work.  

But is it appropriate to ask whether, in a similar way, religious faith might have a placebo effect, and if so, should it be part of case-control studies of new drugs or treatments?  If so, then.....

....some things to consider
Here's an interesting thought: if the placebo effect is real, then how do we know that actual medicines work?  They may seem better than placebos in comparison studies, but what if a substantial fraction of the treatment effect arises for religious or other belief-related reasons--that is, what if the treated subjects experience a kind of placebo effect too?  Then the case-control distinction is less than one thinks: perhaps, as a result, the efficacy of the medicine is actually substantially less than is credited by the standard kinds of placebo-comparison study.  Perhaps placebo-response is part of the case side of the comparison as well as the control side, and without it the 'case' effect would no longer be significant, or as significant?
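A toy decomposition, with invented numbers, shows how the standard subtraction works--and where it can fail:

```python
# Toy two-arm trial decomposition (all numbers invented).
# Assume the observed improvement in each arm splits as:
#   drug arm    = drug-specific effect + placebo-type response
#   placebo arm = placebo-type response
drug_specific = 0.10
placebo_response = 0.30

drug_arm = drug_specific + placebo_response   # 0.40 observed improvement
placebo_arm = placebo_response                # 0.30 observed improvement

# The standard logic: subtracting the arms cancels the placebo component
print(f"estimated drug effect: {drug_arm - placebo_arm:.2f}")   # 0.10

# But the cancellation holds only if the placebo-type response is identical
# in both arms.  If belief-related response is stronger in the drug arm
# (patients guess their assignment, beliefs differ between groups, etc.),
# the subtraction no longer isolates the drug:
drug_arm_stronger_belief = drug_specific + 0.40
print(f"biased estimate: {drug_arm_stronger_belief - placebo_arm:.2f}")  # 0.20
```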

If we are doing a placebo-based test of a new drug, should case and control religious or other beliefs be identified, and matched in the two groups?  What about atheists--is that also a comparable faith, or would it serve as a control on such faith?  


Even to acknowledge the possibility that we've under-rated the placebo effect, over-rated the drugs we rely on, and that belief systems can have such an effect, raises interesting and important questions.  What if we told a patient that s/he had a placebic genotype and thus, say, tended to believe everything s/he heard or read?  Would s/he then realize this and stop believing, blocking the placebo effect?  Would not knowing whether s/he were a case or control actually reduce even the 'case' effect?  Would we tell such people of some meds they could take to 'cure' this placebo-responsive trait?  Would they take them?  These could be interesting areas to explore, though deciding how to do definitive studies would, by the very nature of the subject, not be easy.

And yet. . . .
Of course, scientists being the way they are, there is now a proposed 'placebome' project (Hall et al., Trends in Mol Med, 21 (5), 2015).  The researchers want to search for genomic regions that affect the effect, which, they claim, varies among people, and hence they assume it must be 'genetic' (this might even be reasonable, in principle, but is way too premature for yet another GWAS project).  Is it as silly, bandwagonish, transparent, and premature a version of unquestioning belief and/or marketing as one can imagine?  I think so--you can, if you wish, of course, look at the paper and judge for yourself.

But even if this is capitalizing on the 'omics fad, a transparent me-too money-seeking strategy that our venal system imposes, that doesn't vitiate the idea that placebic effects could, in fact, be both real and important.  Nor does it vitiate the idea that truly thoughtful, systematic ways of investigating their nature--not just some statistical results related to them--would be possible and appropriate.  But to do this, how would such a study be designed?

One thing this all suggests to me is that we may not have defined placebos carefully (or knowledgeably) enough, or don't understand what is going on that could account for a physiological (as opposed to 'merely psychological') effect.  Since we have the embedded notion that science is about material technology, statistics, and so on, perhaps we just don't believe (and that's the right word for it) that things can happen that are not part of our science heritage, which largely derives from reductionist physics.  If we've not looked in a properly designed way for this effect, perhaps we should.  At the very least, there may be much to learn.

But before rushing to the 'omics market, there are interesting questions to ask.  Why aren't religious believers who pray for God's grace generally healthier than non-believers?  Or is there, in fact, a notable but undocumented difference?  Does serious religiosity serve as a placebo in daily life, and if not, why not?  If there are measurable physiological or neural pathways that can be identified during placebic experience, are they potential therapeutic targets?

But there's a deeper more serious question
The fact of placebo effects is generally interesting, but it raises an important, very curious issue.  How can a placebo effect work on the diversity of traits for which it has been suggested?  If all a placebic effect does is make you feel better no matter how sick you are, then it's not really placebic, in that it doesn't mimic the drug being taken and shouldn't affect the specific disease, just the patient's mood.  But if it can affect the disease, how can that be?

Placebos seem to work alongside many different drugs and treatments, for many physically and/or physiologically different and unrelated disorders.  At least, I think that is what has been reported.  But these involve different tissues and systems.  So how does the patient 'know' which tissue or physiological system to fix--that is, which cell type a real medicine would be targeting--when believing s/he has taken some effective medicine?

I know very little about the placebo effect, and it doubtless shows in this post, to anyone who does.  But I think these are important, indeed fundamental, questions that include, but go beyond, asking whether the effect is real: they ask what the effect could actually be.  Before we untangle these issues, and understand what the placebo effect really is, we should be highly skeptical of any 'omic project claiming that it will map it and find out what genes are responsible for it.  Among other things, as I've tried to point out here, one needs to know what 'it' actually is.  And as regards genetic studies, is there the proper kind of plausibility evidence on which to build an 'omics case: is there, for example, any reason at all to believe the placebo effect is familial?

There is already huge waste of research money chasing 'omics fads these days, while real problems go under-served.  One need not jump on every bandwagon.  If there are real questions here, and there seem to be, then the groundwork needs to be laid before we go genome searching.

Thursday, June 14, 2018

A new biomedical insight?

Here is a thoughtful and timely quote:
". . . . as no single disease can be fully understood in a living person; for every living person has his individual peculiarities and always has his own peculiar, new, complex complaints unknown to medicine—not a disease of the lungs, of the kidneys, of the skin, of the heart, and so on, as described in medical books, but a disease that consists of one out of the innumerable combinations of ailments of those organs. This simple reflection can never occur to doctors . . . . because it is the work of their life to undertake the cure of disease, because it is for that that they are paid, and on that they have wasted the best years of their life.  And what is more, that reflection could not occur to the doctors because they saw that they unquestionably were of use . . .  not because they made the patient swallow drugs, mostly injurious (the injury done by them was hardly perceptible because they were given in such small doses). They were of use, were needed, were indispensable in fact (for the same reason that there have always been, and always will be, reputed healers, witches, homoeopaths and allopaths), because they satisfied the moral cravings of the patient . . . . They satisfied that eternal human need of hope for relief, that need for sympathetic action that is felt in the presence of suffering, that need that is shown in its simplest form in the little child, who must have the place rubbed when it has hurt itself. The child . . . . feels better for the kissing and rubbing. The child cannot believe that these stronger, cleverer creatures have not the power to relieve its pain. . . ."
The language seems a bit arcane, and this is a translation, but its cogency as a justification for today's Big Data feeding frenzy is clear.  People who are ill, or facing death, will naturally grasp at whatever straws may be offered them.  In one way or another, this has been written about even as far back as Hippocrates.

Of course, palliation or cure of those disorders that can be eased or cured should be the first order of business and obligation of medicine.  Where nothing like that is clearly known, trials of possible treatments are surely in order, if the patient understands at least the basic nature of the research--for example, that some are being given placebos while others get the treatment under investigation.  Science doesn't know everything, and we often must learn the hard way, by trial and error.

Given that, perhaps the most important job of responsible science is to temper its claims, and to offer doses of the reality that life is a temporary arrangement, and that we need to get the most out of the bit of it we are privileged to have.  So research investment should be focused on tractable, definable problems, not grandiose open-ended schemes.  But promises of the latter are nothing new to society (in medicine or other realms of life).

The problem with false promises, by preachers of any type, is that they mislead the gullible, and in many cases this is known by those making the promises--or could and should be known.  The role of false promise in religion is perhaps debatable, but its role in science, while understandable given human ego and the struggle for attention, careers, and funding, is toxic.  People suffering, of poverty, hardship, or disease, seek and deserve solace.  But science needs to be protected from the temptations of huckstering, so that it can do its very important business as objectively as is humanly possible. 

By the way, the quote is from about 150 years ago, from War and Peace, Tolstoy's 1869 masterpiece about the nature of causation in human affairs.

Thursday, April 26, 2018

Gene mapping: More Monty Python than Monty Python

The gene for ...... (Monty Python)
Here's a link to a famous John Cleese (of Monty Python fame) sketch on gene mapping.  We ask you to decide whether this is funnier than the daily blast of GWAS reports and their proclaimed transformative findings: which is more Monty than the full Monty.

Why we keep spending money on papers that keep showing how MontyPythonish genomewide association with complex traits is, is itself a valid question.  To say with a straight face that we now know of hundreds, much less thousands, of genomewide sites that affect some trait--in some particular sample of humans, with much or most of the estimated heritability yet unaccounted for--without saying that enough is enough, is almost in itself a comedy routine.

We have absolutely no reason--or, at least, no need--to criticize anything about individual mapping papers.  Surely there are false findings, mis-used statistical tests, and so on, but that is part of normal life in science, because we don't know everything and have to make assumptions.  Some of the findings will be ephemeral, sample-specific, and so on.  That doesn't make them wrong.  Instead, the critique should be aimed at authors who present such work with a straight face as if it is (1) important and (2) novel in any really novel way, and (3) who don't acknowledge that, with so many qualitatively similar results by now, the paper shows why we should stop public funding of this sort of work.  We should move on to more cogent science that reflects, but doesn't just repeat, the discovery of genomic causal (or, at least, associational) complexity.

The bottom line
What these studies show, and there is no reason to challenge the results per se, is that complex traits are not to be explained by simple, much less additive, genetic models.  There is massive causal redundancy, with similar traits due to dissimilar genotypes.  But this shouldn't be a surprise.  Indeed, we can easily account for it in terms of evolutionary phenomena, both processes like gene duplication and the survival protection that alternative pathways provide.

Even if each GWAS 'hit' is correct and not some sort of artifact, it is unclear what the message is.  To us, who have no vested interest in continuing, open-ended GWAS efforts with ever-larger samples, the bottom line is that this is not the way to understand biological causation.

We reach that view on genomic considerations alone, without even considering the environmental and somatic mutation components of phenotype generation, though these are often obviously determinative (as secular trends in risk clearly show).  We reach this view without worrying about the likelihood that many or perhaps even most of these 'hits' are some sort of statistical, sampling, analytic or other artifact, or are so indirectly related to the measured trait, or so environment-dependent as to be virtually worthless in any practical sense.

What GWAS ignore
There are also three clear facts that are swept under the rug, or just ignored, in this sort of work.  One is somatic mutation, which is not detected in constitutive genomewide studies but could be very important (e.g., in cancer).  The second is that DNA is inert and does something only in interaction with other molecules; many of those interactions relate to environmental and lifestyle exposures, which candid investigators know are usually dreadfully inaccurately measured.  The third is that future mutations, not to mention future environments, are unpredictable, even in principle.  Yet the repeatedly stressed objective of GWAS is 'precision' predictive medicine.  It sounds like a noble objective, but it's not so noble given the known and knowable reasons these promises can't be met.

So, if biological causation is complex, as these studies and diverse other sorts of direct and indirect evidence clearly show, then why can't we pull the plug on these sorts of studies, and instead, invest in some other mode of thinking, some way to do focused studies where genetic causation is clear and real, rather than continuing to feed the welfare state of GWAS?

We're held back by inertia, and the lack of better ideas, but another important if not defining constraint is that investigator careers depend on external funding and that leads to safe me-too proposals.  We should stop imitating Monty Python, and recognize that if the gene-causation question even makes sense, some new way of thinking about it is needed.

Sunday, October 15, 2017

Understanding Obesity? Fat Chance!

Obesity is one of our more widespread and serious health-threatening traits.  Many large-scale mapping as well as extensive environmental/behavioral epidemiological studies of obesity have been done over recent decades.  But if anything, the obesity epidemic seems to be getting worse.

There's deep meaning in that last sentence: the prevalence of obesity is changing rapidly.  This is being documented globally, and happening rapidly before our eyes.  Perhaps the most obvious implication is that this serious problem is not due to genetics!  That is, it is not due to genotypes that in themselves make you obese.  Although everyone's genotype is different, the changes are happening during lifetimes, so we can't attribute it to the different details of each generation's genotypes or their evolution over time. Instead, the trend is clearly due to lifestyle changes during lifetimes.

Of course, if you see everything through gene-colored lenses, you might argue (as people have) that sure, it's lifestyles, but only some key nutrient-responding genes are responsible for the surge in obesity.  These are the 'druggable' targets that we ought to be finding, and it should be rather easy, since the change is so rapid that the responsible genes must be few; so even if we can't rein in McD and KFC toxicity, or passive TV-addiction, we can at least medicate the result.  That was always, at best, wishful thinking, and at worst, rationalization for funding Big Data studies.  Such a simple explanation would be good for KFC, and an income flood for BigPharma, the GWAS industry, DNA sequencer makers, and more.....except not so good for those paying the medical price, and those who are trying to think about the problem in a disinterested scientific way.  Unfortunately, even when it is entirely sincere, that convenient hope for a simple genetic cause is being shown to be false.

A serious parody?
Year by year, more factors are identified that, by statistical association at least and sometimes by experimental testing, contribute to obesity.  A very fine review of this subject has appeared in the mid-October 2017 issue of Nature Reviews Genetics, by Ghosh and Bouchard, which takes seriously not just genetics but all the plausible causes of obesity, including behavior and environment, and their relationships as best we know them, and outlines the current state of knowledge.

Ghosh and Bouchard provide a well-caveated assessment of these various threads of evidence now in hand, and though they do end up with the pro forma plea for yet more funding to identify yet more details, they provide a clear picture that a serious reader can take seriously on its own merits.  However, we think that the proper message is not the usual one.  It is that we need to rethink what we've been investing in so heavily.

To their great credit, the authors melded behavioral, environmental, and genetic causation in their analysis. This is shown in this figure, from their summary; it is probably the best current causal map of obesity based on the studies the authors included in their analysis:



If this diagram were being discussed by John Cleese on Monty Python, we'd roar with laughter at what was an obvious parody of science.  But nobody's laughing, and this isn't a parody!  Nor is it of unusual shape or complexity: diagrams like this (though with little if any environmental component) have been produced by analyzing gene expression patterns even in the early development of the simple sea urchin.  Our not laughing is understandable, because these are serious diagrams.  What is harder to understand is that we seem to react only by saying we need more of the same.  I think that is rather weird, for scientists, whose job it is to understand, not just list, the nature of Nature.

We said at the outset of this post that 'the obesity epidemic seems to be getting worse'.  There's a deep message there, but one essentially missing even from this careful obesity paper: many of the causal factors, including genetic variants, are changing before our eyes.  The frequencies of genetic variants change from population to population and generation to generation, so that all samples will look different.  And mutations happen in every meiosis, adding new variants to a population every time a baby is born.  The results of many studies, as reflected in the current summary by Ghosh and Bouchard, show that while many gene regions contribute to obesity, their total net contribution is still minor.  It is possible, though perhaps very difficult to demonstrate, that an individual site might account for more than a minimal share of the trait in some individual carriers, in ways GWAS results can't really identify.  And the authors do cite published opinions claiming a higher efficacy of GWAS for obesity than we think is seriously defensible; but even if we're wrong, causation is very complex, as the figure shows.
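
To make that concrete, here is a toy sketch of my own (not from the paper) of why no two samples or generations present quite the same variant landscape: even with no selection at all, allele frequencies wander between generations.  The population size and starting frequency below are arbitrary.

```python
# Toy Wright-Fisher sketch: a variant's frequency drifts from generation to
# generation, so each sample and each generation looks somewhat different.
import random

def next_generation(freq, n_individuals):
    """Resample 2N allele copies binomially from the current frequency."""
    copies = sum(random.random() < freq for _ in range(2 * n_individuals))
    return copies / (2 * n_individuals)

freq = 0.05                        # a modestly rare variant
for gen in range(1, 11):
    freq = next_generation(freq, 1000)
    print(f"generation {gen:2d}: frequency = {freq:.4f}")
```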

The individual genomic variants will vary in their presence or absence or frequency or average effect among studies, not to mention populations.  In addition, most contributing genetic variants are too rare or weak to be detected by the methods used in mapping studies, because of the constraints on statistical significance criteria, which is why so much of the trait's heritability in GWAS is typically unaccounted for by mapping.  These aspects and their details will differ greatly among samples and studies.
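
A rough power calculation shows why such variants slip through the net.  The sketch below is my own back-of-envelope approximation, not anything from the papers discussed: a single-SNP test of a quantitative trait, where the variant explains a given fraction of trait variance and must clear the conventional genome-wide threshold of alpha = 5e-8.

```python
# Back-of-envelope GWAS power: a variant explaining a tiny fraction of trait
# variance rarely clears genome-wide significance unless samples are enormous.
# Approximation: the test z-statistic is ~ Normal(sqrt(N * var_explained), 1).
from scipy.stats import norm

def gwas_power(n, var_explained, alpha=5e-8):
    """Approximate power of a single-SNP test in a sample of size n."""
    ncp = (n * var_explained) ** 0.5      # mean of the z statistic under H1
    z_crit = norm.isf(alpha / 2)          # two-sided significance cutoff
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

for n in (10_000, 100_000, 1_000_000):
    for v in (1e-3, 1e-4, 1e-5):          # variant explains 0.1%, 0.01%, 0.001%
        print(f"N={n:>9,}  var explained={v:.0e}  power={gwas_power(n, v):.3f}")
```

The variants below the detection horizon still contribute to heritability; they simply never reach the table of 'hits', which is one mechanical reason the mapped loci account for so little of it.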

Relevant risk factors will come or go or change in exposure levels in the future--but these cannot be predicted, not even in principle.  Their interactions and contributions are also manifestly context-specific, as secular trends clearly show.  Even with the set of known genetic variants and other contributing factors, there are essentially an unmanageable number of possible combinations, so that each person is genetically and environmentally unique, and the complex combinations of future individuals are not predictable.
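
The arithmetic behind 'unmanageable' is easy to check.  This little count, an illustration of mine that ignores environment entirely, shows how fast genotype combinations outrun any conceivable sample:

```python
# With L biallelic loci, each person carries one of 3 genotypes per locus
# (0, 1 or 2 copies of the variant), so there are 3**L possible combinations.
import math

for loci in (10, 100, 1000):
    digits = loci * math.log10(3)
    print(f"{loci:>5} loci -> about 10^{digits:.0f} genotype combinations")
```

With only a hundred loci, the number of possible genotypes already dwarfs the number of humans who have ever lived, before a single environmental factor enters the picture.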

Risk assessment is essentially based on replicability, which in a sense is why statistical testing can be used (and these sorts of results rely heavily on it).  However, because these risk-factor combinations are each unique, they're not replicable.  At best, as some advocate, the individual effects are additive, so that if we measure each factor in an individual we can add up the effects and predict that person's obesity (if the effects are not additive, even this won't work).  We can probably predict, if perhaps not control, at least some of the major risk factors (people will still down pizzas or fried chicken while sitting in front of a TV).  But even the known genetic factors in total account for only a small percentage of the trait's variance (the authors' Table 2), though the paper cites more optimistic authors.
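
For what it's worth, the additive logic is trivially simple to write down, which is part of its appeal.  The sketch below uses entirely hypothetical factor names, effect sizes, and baseline, and is only as good as the additivity assumption itself:

```python
# Minimal additive-prediction sketch: trait = baseline + sum of effect * dose.
# All names, effect sizes, and the baseline are hypothetical illustrations.
effects = {
    "variant_A_allele_count": 0.12,     # per risk allele (0, 1 or 2)
    "variant_B_allele_count": -0.05,
    "tv_hours_per_day": 0.30,
    "fried_meals_per_week": 0.08,
}

def additive_prediction(person, effects, baseline=25.0):
    """Predicted trait value (say, BMI) if, and only if, effects are additive."""
    return baseline + sum(effects[k] * dose for k, dose in person.items())

person = {"variant_A_allele_count": 2, "variant_B_allele_count": 1,
          "tv_hours_per_day": 4, "fried_meals_per_week": 3}
print(additive_prediction(person, effects))   # ~26.63 under these toy numbers
```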

The result of these indisputable facts is that as long as our eyes are focused, for research strategic reasons or lack of better ideas, on the litter of countless minor factors, even those we can identify, we have a fat chance of really addressing the problem this way.

If you pick any of the arrows (links) in this diagram, you can ask how strong or necessary that link is, how much it may vary among samples or depend on the European nature of the data used here, or to what extent even its identification could be a sampling or statistical artifact.  Links like 'smoking' or 'medication', not to mention specific genes, even if they're wholly correct, surely have quantitative effects that vary among people even within the sample, and the effect sizes probably often have very large variances.  Many exposures are notoriously inaccurately reported or measured, or change in unmeasured ways.  Some are quite vague, like 'lifestyle' and 'eating behavior', among many others--both hard to define and hard to assess with knowable precision, much less predictability.  Whether their many effects are additive or interact in more complex ways is another issue, and the connectivity diagram may be tentative in many places.  Maybe--probably?--in such traits simple behavioral changes would override most of these behavioral factors, leaving those persons for whom obesity really is due to their genotype, who would then be amenable to gene-focused approaches.

If this is a friable diagram, that is, if the items, strengths, connections and so on are highly changeable, even through no fault whatever of the authors, we can ask when and where and how this complex map is actually useful, no matter how carefully it was assembled.  Indeed, even if this is a rigidly accurate diagram for the samples used, how applicable is it to other samples or to the future?  Or how useful is it in predicting not just group patterns, but individual risk?

Our personal view is that the rather ritual plea for more and more and bigger and bigger statistical association studies is misplaced, and, in truth, a way of maintaining funding and the status quo, something we've written much about--the sociopolitical economics of science today.  With obesity rising at a continuing rate and about a third of the US population recently reported as obese, we know that the future health care costs for the consequences will dwarf even the mega-scale genome mapping on which so much is currently being spent, if not largely wasted.  We know how to prevent much or most obesity in behavioral terms, and we think it is entirely fair to ask why we still pour resources into genetic mapping of this particular problem.

There are many papers on other complex traits that might seem simpler, like stature and blood pressure, not to mention more mysterious ones like schizophrenia or intelligence, in which hundreds of genomewide sites are implicated, strewn across the genome.  Different studies find different sites, and in most cases most of the heritability is not accounted for, meaning that many more sites are at work (and this doesn't include environmental effects).  In many instances, even the trait's definition may be comparably vague, or may change over time.  This is a causal landscape whose overall 'shape' complex traits share, even though every detail differs within and between traits.  That in itself is a tipoff that there is something consistent about these landscapes, but we've not yet really awakened to it, or learned how to approach it.

Rather than being skeptical about Ghosh and Bouchard's careful analysis or their underlying findings, I think we should accept their general nature, even if the details in any given study or analysis may not individually be so rigid and replicable, and ask: OK, this is the landscape--what do we do now?

Is there a different way to think about biological causation?  If not, what is the use or point of this kind of complexity enumeration, in which every person is different and the risks of the future may not be those estimated from past data to produce figures like the one above?  The rapid change in prevalence shows how unreliable these factors must be as predictors--they are retrospective summaries of the particular patterns of the study subjects.  Since we cannot predict the strengths or even the presence of these or other, new factors, what should we do?  How can we rethink the problem?

These are the harder questions, much harder than analyzing the data; but they are in our view the real scientific questions that need to be asked.

Saturday, June 17, 2017

The GWAS hoax....or was it a hoax? Is it a hoax?

A long time ago, in 2000, in Nature Genetics, Joe Terwilliger and I critiqued the idea then being pushed by the powers-that-be that genomewide mapping of complex diseases was going to be straightforward, because of the 'theory' (that is, rationale) then being proposed that common variants cause common disease.  At one point, the idea was that only about 50,000 markers would be needed to map any such trait in any global population.  My collaborators and I can claim that in several papers in prominent journals, in a 1992 Cambridge Press book, Genetic Variation and Human Disease, and many times on this blog, we have pointed out numerous reasons, based on what we know about evolution, why this was going to be a largely empty promise.  It has been inconvenient for this message to be heard, much less heeded, for reasons we've also discussed in many blog posts.

Before we get into that, it's important to note that unlike me, Joe has moved on to other things, like helping Dennis Rodman's diplomatic efforts in North Korea (here, Joe's shaking hands as he arrives on his most recent trip).  Well, I'm more boring by far, so I guess I'll carry on with my message for today.....




There's now a new paper, coining a new catch-word (omnigenic), to proclaim the major finding that complex traits are genetically complex.  The paper seems solid and clearly worthy of note.  The authors examine the chromosomal distribution of sites that seem to affect a trait, in various ways including chromosomal conformation.  They argue, convincingly, that mapping shows that complex traits are affected by sites strewn across the genome, and they provide a discussion of the pattern and findings.

The authors claim an 'expanded' view of complex traits, and as far as that goes it is justified in detail.  What they add to the current picture is the idea that mapped traits are affected by 'core' genes but that other regions spread across the genome also contribute.  In my view, the idea of core genes is either largely obvious (as a toy example, levels of insulin will relate to the insulin gene) or will prove to be unclear.  I say this because one can probably always retroactively identify mapped locations and proclaim 'core' elements; but why should any genome region that affects a trait be considered 'non-core'?

In any case, that would be just a semantic point if it were not, predictably, the phrase that launched a thousand grant applications.  I think neither the basic claim of conceptual novelty nor the breathless, exploitive treatment of it by the news media is warranted: we've known these basic facts about genomic complexity for a long time, even if the new analysis provides other ways to find or characterize the multiplicity of contributing genome regions.  This assumes that mapping markers are close enough to functionally relevant sites that the latter can be found; that the unmappable fraction of the heritability isn't leading to over-interpretation of what is 'mapped' (reached significance); and that what isn't mapped won't change the picture.

However, I think the first thing we really need to do is understand the futility of thinking of complex traits as genetic in the 'precision genomic medicine' sense, and the last thing we need is yet another slogan by which hands can remain clasped around billions of dollars for Big Data resting on false promises.  Yet even the new paper itself ends with the ritual ploy, the assertion of the essential need for more information--this time, on gene regulatory networks.  I think it's already safe to assure any reader that these, too, will prove to be as obvious and as elusively ephemeral as genome wide association studies (GWAS) have been.

So was GWAS a hoax on the public?
No!  We've had a theory of complex (quantitative) traits since the early 1900s.  Other authors argued similarly, but RA Fisher's famous 1918 paper is the usual landmark.  His theory was, simply put, that infinitely many genome sites contribute to quantitative (what we now call polygenic) traits.  The general model has jibed with the age-old experience of breeders, who have used empirical strategies to improve crop and pet species.  Since association mapping (GWAS) became practicable, breeders have used mapping-related genotypes to help select animals for breeding; but genomic causation is so complex and changeable that they've recognized even this will have to be regularly updated.
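
In modern notation, the model Fisher formalized can be sketched roughly as follows; this is a schematic rendering of the additive polygenic idea, not his 1918 notation:

```latex
% Additive polygenic model, schematically:
%   trait value = population mean + summed allelic effects + environment
P = \mu + \sum_{i=1}^{n} \beta_i g_i + \varepsilon
% g_i: allele count at site i;  beta_i: its (typically tiny) effect.
% Fisher's 'infinitesimal' limit: n grows large while each beta_i shrinks,
% recovering the smooth, normally distributed traits that breeders observe.
```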

But when genomewide mapping of complex traits was first really done (a prime example being the BRCA genes and breast cancer), it seemed that apparently complex traits might, after all, have mappable genetic causes.  BRCA1 was found by linkage mapping in multiply affected families (an important point!), in which a strong-effect allele was segregating.  Association mapping was a tool of convenience: it used random samples (like cases vs controls) because one could hardly get sufficient multiply affected families for every trait one wanted to study.  GWAS rested on the assumption that genetic variants were identical by descent from common ancestral mutations, so that a current-day sample captured the latest descendants of an implied deep family: quite a conceptual coup, based on the ability to identify marker alleles across the genome identical by descent from the un-studied shared remote ancestors.

Until it was tried, we really didn't know how tractable such mapping of complex traits might be.  Perhaps heritability estimates based on quantitative statistical models were hiding what really could be enumerable, replicable causes, in which case mapping could lead us to functionally relevant genes.  It was certainly worth a try!

But it was quickly clear that this was in important ways a fool's errand.  Yes, some good things were to be found here and there, but the hoped-for miracle findings generally weren't there to be found. This, however, was a success not a failure!  It showed us what the genomic causal landscape looked like, in real data rather than just Fisher's theoretical imagination.  It was real science.  It was in the public interest.

But that was then.  It taught us its lessons, in clear terms (of which the new paper provides some detailed aspects).  But it long ago reached the point of diminishing returns.  In that sense, it's time to move on.

So, then, is GWAS a hoax?
Here, the answer must now be 'yes'!  Once the lesson is learned, bluntly speaking, continuing on is more a matter of keeping the funds flowing than of gaining profound new insights.  Anyone paying attention should by now know very well what the GWAS etc. lessons have been: complex traits are not genetic in the usual sense of being due to tractable, replicable genetic causation.  Omnigenic traits, the new catchword, will prove the same.

There may not literally be infinitely many contributing sites, core or peripheral, as in the original statistical models, but infinitely many isn't so far off.  Hundreds or thousands of sites that together account for only a fraction of the heritability mean essentially infinitely many contributors, for any practical purpose.  This is particularly so since the set is not a closed one: new mutations are always arising and current variants dying away, and, along with somatic mutation, the number of contributing sites is open-ended, and not enumerable within or among samples.

The problem is actually worse.  All these data are retrospective statistical fits to samples of past outcomes (e.g., sampled individuals' blood pressures, or cases' vs controls' genotypes).  Past experience is not an automatic prediction of future risk.  Future mutations are not predictable, not even in principle.  Future environments and lifestyles, including major climatic dislocations, wars, epidemics and the like, are not predictable, not even in principle.  Future somatic mutations are not predictable, not even in principle.

GWAS almost uniformly have found (1) different mapping results in different samples or populations, (2) only a fraction of heritability is accounted for by tens, hundreds, or even thousands of genome locations and (3) even relatively replicable 'major' contributors, themselves usually (though not always) small in their absolute effect, have widely varying risk effects among samples.

These facts are all entirely expectable on evolutionary considerations, and they have long been known, in principle, from indirect evidence, and from detailed mapping of complex traits.  There are other well-known reasons why, based on evolutionary considerations among other things, this kind of picture should be expected.  They involve the blatantly obvious redundancy in genetic causation, which results from the origin of genes by duplication and from the highly complex pathways to our traits.  We've written about them here in the past.  So, given what we now know, more of this kind of Big Data is a hoax, and as such a drain on public resources and, perhaps worse, on the public trust in science.

What 'omnigenic' might really mean is interesting.  It could mean that we're pressing up ever more intensely against the log-jam of understanding based on an enumerative gestalt about genetics.  Ever more detail, always promising that if we just enumerate and catalog a bit more (in this case, the authors say, gene regulatory networks) we'll understand.  But that is a failure to ask the right question: why and how could every trait be affected by every part of the genome?  Until someone starts looking at the deeper mysteries we've been identifying, we won't have the transformative insight that seems to be called for, in my view.

To use Kuhn's term, this really is normal science pressing up against a conceptual barrier, in my view.  The authors work the details, but there's scant hint that they recognize we need something more than more of the same.  What is called for, I think, is young people who haven't already been propagandized into the current way of thinking, the current grantsmanship path to careers.

Perhaps more importantly, I think the situation is at present an especially cruel hoax, because there are real health problems, and real, tragic, truly genetic diseases that a major shift in public funding could enable real science to address.

Thursday, April 20, 2017

Some genetic non-sense about nonsense genes

The April 12 issue of Nature has a research report and a main article about what is basically presented as the discovery that people typically carry doubly knocked-out genes, but show no effect.  The idea, as presented in the editorial (p 171), is that the report (p 235) uses an inbred population to isolate double-knockout genes (that is, recessive homozygous null mutations) and look at their effects.  The population sampled, from Pakistan, has high levels of consanguineous marriage.  The criteria for a knockout mutation were based on the protein-coding sequence.

We have no reason to question the technical accuracy of the papers, nor their relevance to biomedical and other genetics, but there are reasons to assert that this is nothing newly discovered, and that the story misses the really central point that should, I think, be undermining the expensive Big Data/GWAS approach to biological causation.

First, for some years now there have been reports of samples of individual humans (perhaps also of yeast, but I can't recall specifically) in which both copies of a gene appear to be inactivated.  The criteria for saying so are generally indirect, based on nonsense, frameshift, or splice-site mutations in the protein code.  That is, there are other aspects of coding regions that would be relevant to a truly thorough search to establish that whatever is coded really is non-functional.  The authors mention some of these.  But, basically, costly as it is, this is science on the cheap, because it clearly addresses only some aspects of gene functionality.  It would obviously be almost impossible to show either that the gene was never expressed or that it never worked.  For our purposes here, we need not question the finding itself.  The fact that this is not a first discovery does raise the question of why a journal like Nature is so desperate for Dramatic Finding stories, since this one really should instead be a report in one of many specialty human genetics journals.
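
To see how coarse such criteria are, here is roughly the kind of filter involved, in schematic form.  The consequence labels follow common variant-annotation vocabularies, but the function itself is my own simplification, not the paper's pipeline:

```python
# Schematic 'double knockout' filter: call a gene knocked out if a variant's
# coding annotation is putative loss-of-function (pLoF) and the individual is
# homozygous for it.  Everything subtler than the annotation is invisible here.
PLOF_CONSEQUENCES = {"stop_gained", "frameshift_variant",
                     "splice_donor_variant", "splice_acceptor_variant"}

def is_putative_double_knockout(consequence, genotype):
    """True for a homozygous pLoF call; says nothing about actual function."""
    return consequence in PLOF_CONSEQUENCES and genotype == (1, 1)

print(is_putative_double_knockout("stop_gained", (1, 1)))       # True
print(is_putative_double_knockout("missense_variant", (1, 1)))  # False, even
# though a missense change can destroy the protein just as thoroughly
```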

Secondly, there are causes of gene inactivation other than coding mutations.  They have to do with regulatory sequences, and inactivating mutations in that part of a gene's functional structure are much more difficult, if not impossible, to detect with any completeness.  A gene's coding sequence may itself seem fine, yet its regulatory sequences may simply not enable it to be expressed.  Gene regulation depends on epigenetic DNA modification, on multiple transcription factor binding sites, on the functional state of the many proteins required to activate a gene, and on other aspects of the local DNA environment (such as RNA editing or RNA interference).  The point here is that there are likely to be many other instances of people with complete, or effectively complete, double knockouts of genes.

Thirdly, the assertion that these double KOs have no effect depends on various assumptions.  Mainly, it assumes that the sampled individuals will not, in the future, experience the otherwise-expected phenotypic effects of their defunct genes.  Effects may depend on age, sex, and environmental effects rather than necessarily being a congenital yes/no functional effect.

Fourthly, there may be many coding mutations that make the protein non-functional but that are ignored by this sort of study because they aren't clear knockout mutations; yet they are present in whatever data are used for comparison of phenotypic outcomes.  There are post-translational modifications, RNA editing, RNA modification, and other aspects of a 'gene' that this approach does not pick up.

Fifthly, and by far most important, I think, is that this is the tip of the iceberg of redundancy in genetic functions.  In that sense, the current paper is a kind of factoid that reflects what GWAS has been showing in great, if implicit, detail for a long time: there is great complexity and redundancy in biological functions.  Individual mapped genes typically affect trait values or disease risks only slightly.  Different combinations of variants at tens, hundreds, or even thousands of genome sites can yield essentially the same phenotype (and here we ignore the environment which makes things even more causally blurred).
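
The redundancy point is easy to illustrate with a toy additive model of my own devising: simply count how many distinct genotypes produce exactly the same trait value.

```python
# Toy illustration of causal redundancy: under a simple additive model, many
# distinct genotypes yield an identical trait value.  Six loci, presence (1)
# or absence (0) of a risk allele at each; trait = number of risk alleles.
from itertools import product

n_loci, target = 6, 3
equivalent = [g for g in product((0, 1), repeat=n_loci) if sum(g) == target]
print(len(equivalent), "distinct genotypes give the same trait value")  # 20
# With hundreds of loci (and environments), the equivalence classes explode.
```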

Sixthly, other samples and certainly other populations, as well as individuals within the Pakistani data base, surely carry various aspects of redundant pathways, from plenty of them to none.  Indeed, the inbreeding that was used in this study obviously affects the rest of the genome, and there's no particular way to know in what way, or more importantly, in which individuals.  The authors found a number of basically trivial or no-effect results as it is, even after their hunt across the genome. Whether some individuals had an attributable effect of a particular double knockout is problematic at best.  Every sample, even of the same population, and certainly of other populations, will have different background genotypes (homozygous or not), so this is largely a fishing expedition in a particular pond that cannot seriously be extrapolated to other samples.

Finally, this study cannot address the effect of somatic mutation on phenotypes and their risk of occurrence.  Who knows how many local tissues have experienced double-knockout mutations and produced (or not produced) some disease or other phenotype outcome.  Constitutive genome sequencing cannot detect this.  Surely we should know this very inconvenient fact by now!

Given the well-documented and pervasive biological redundancy, it is not any sort of surprise that some genes can be non-functional and the individual phenotypically within a viable, normal range. Not only is this not a surprise, especially by now in the history of genetics, but its most important implication is that our Big Data genetic reductionistic experiment has been very successful!  It has, or should have, shown us that we are not going to be getting our money's worth from that approach.  It will yield some predictions in the sense of retrospective data fitting to case-control or other GWAS-like samples, and it will be trumpeted as a Big Success, but such findings, even if wholly correct, cannot yield reliable true predictions of future risk.

Does environment, by any chance, affect the studied traits?  We have, in principle, no way to know what environmental exposures (or somatic mutations) will be like.  The by-now very well documented leaf-litter of rare and/or small-effect variants plagues GWAS for practical statistical reasons (and is why usually only a fraction of heritability is accounted for).  Naturally, finding a single doubly inactivated gene may, but need not, yield reliable trait predictions.

By now, we know of many individual genes whose coded function is so proximate or central to some trait that mutations in such genes can have predictable effects.  This is the case with many of the classical 'Mendelian' disorders and traits that we've known for decades.  Molecular methods have admirably identified the gene and mutations in it whose effects are understandable in functional terms (for example, because the mutation destroys a key aspect of a coded protein's function).  Examples are Huntington's disease, PKU, cystic fibrosis, and many others.

However, these are at best the exceptions that lured us to think that even more complex, often late-onset traits would be mappable so that we could parlay massive investment in computerized data sets into solid predictions and identify the 'druggable' genes-for that Big Pharma could target.  This was predictably an illusion, as some of us were saying long ago and for the right reasons.  Everyone should know better now, and this paper just reinforces the point, to the extent that one can assert that it's the political economic aspects of science funding, science careers, and hungry publications, and not the science itself, that leads to the persistence of drives to continue or expand the same methods anyway.  Naturally (or should one say reflexively?), the authors advocate a huge Human Knockout Project to study every gene--today's reflex Big Data proposal.**

Instead, it's clearly time to recognize the relative futility of this, and change gears to more focused problems that might actually punch their weight in real genetic solutions!

** [NOTE added in a revision.  We should have a wealth of data by now, from many different inbred mouse and other animal strains, and from specific knockout experiments in such animals, to know that the findings of the Pakistani family paper are to be expected.  About 1/4 to 1/3 of knockout experiments in mice have no effect, or not the same effect as in humans, or no or different effects in other inbred mouse strains.  How many times do we have to learn the same lesson?  Indeed, with existing genomewide sequence databases from many species, one can search for 2KO'ed genes.  We don't really need a new megaproject to have lots of comparable data.]