Friday, May 10, 2019

My composer friend wants to be quite modern about creating beautiful music. He doesn't like to use computer programs for composing, but he has devised another 'modern' way to compose, given that, in writing a piece, he often changes his mind. Scratching out notes on paper to replace them with 'better' ones makes for a real mess on the working pages; he'd then have to transcribe his work onto new pages, and that in itself introduces room for mistakes. So he had an idea.
He purchased a set of notes and musical symbols, printed individually on a kind of flexible plastic. Copies of each possible note and notation element were in boxes in a little tray. As he composed, he merely took each required note from its place in the tray, and used its static electricity to place it on a page with printed staff-lines. If he changed his mind, it was easy to remove or replace a given note, and put it back in its box in the tray without generating an inky mess on the page and having to keep starting over to make his work-in-progress legible.
But there turned out to be a serious, indeed even tragic, problem. He liked working in his studio, right in front of a window giving him an inspiring view of his garden. But, after days of work composing an incomparably ethereal and beautiful piece, a gust blew through the window, riffled the pages, and shook all the notes off the pages and onto the table! What a scattered mess! And what a heartbreaking loss of all that work!
Of course, you could say that the composition with all its beauty was in some sense still there, right before him: all the required notes were indeed still there--every one. But they were in a pile, no longer with any order from which he could reconstruct the composition just by picking the notes up and placing them back on the page. So, it was literally all there--but none of what mattered was!
As my composer friend told me this story, it occurred to me that this was analogous to the 'pile' of DNA letters (As, Cs, Gs, and Ts) that is found by sequencing people with and without some trait, like a disease. The letters differ greatly among individuals with the 'same' trait, because they don't have the trait for the same genetic reason. And the sampled individuals' genomes vary in literally countless ways that have nothing to do with the disease. Unlike the score, the 'letters' are still in their original order, but genes don't make a score as far as we are concerned because, unlike an orchestra, we don't know how to 'play' them!
In a sense, each person we see who is playing the same tune, so to speak, is doing so from a different score. Some shared notes may be involved, but they are all jumbled up with shared, and not-shared, notes that have nothing to do with the tune.
And yet we are widely promised, and widely being dunned to pay for, the idea that by looking through the jumble of genetic 'notes' we can predict just about anything you can name about each individual's traits.
Indeed, unlike the composer's problem, there are all sorts of notes that are not even visible to us (they are called 'somatic mutations'). We yearn for a health-giving genomic 'tune', which is a very natural way to feel, but we are unable (or, at least, unwilling) to face the music of genomic reality.
And, of course, this mega-scale 'omics' research is all justified with great vigor by the NIH, as if it is on the very verge of fundamental discoveries that will lead to miraculous cures, indeed cures for 'All of Us'. At what point is it justified to refer to it as a kind of culpable fraud, a public con job?
With our bigger, bigger, bigger approach, we have entrenched 'composers' trying to read scores that are, to a great extent, unreadable in the way being attempted. We are so intent on this, like rows of monks transcribing sacred manuscripts in a remote monastery, that we are committed to something we have every legitimate reason to know isn't the way things are.
Thursday, November 5, 2015
Red meat makes a good, scary cancer story....but is it?
By Ken Weiss
It's off again, on again: don't eat processed meat, don't eat red meat, or you'll get colon cancer!! Eat fish (well, unless it has mercury) or chicken (unless it has salmonella), or 'the other white meat': pork (remember the billboards?). They're safe!
A few years ago we seemed to have been given some relief when stories suggested that red meat (beef) was OK after all (of course, the lives of the cows were awful, and eating beef meant you doped up on antibiotics, but at least it didn't give you colon cancer).
Recently, a statement (now apparently offline) released by the International Agency for Research on Cancer (IARC), a part of the World Health Organization, asserted that eating processed meat and red meat 'causes' cancer. Actually, the report was a bit more nuanced than the headlines, but journalists have to make a living, no?
In response to strong backlash, the WHO was quickly forced to 'clarify' their clarion call to vegetarianism -- here's a link to their Q&A on the subject. They now acknowledge, or 'clarify', that what they had done was simply add the meats to a list of known nasties that cause cancer. Putting meat on a causal list is one thing, but dishing it out to the media is another, and a rather irresponsible way to play for publicity (of course, if the news media made an exception and actually did their job of being skeptics, this wouldn't have unwound as it did).
In any case, the bottom line was basically that even two strips of bacon a day increase your colon cancer risk by 18%. That sounds like a whopping and terrifying difference! The WHO put this in the same carcinogenic-substance category as asbestos and tobacco. As they quickly clarified, that is in a sense a warning list, but the 18% figure is what got into the news and may have, at least temporarily, slammed the bacon and hamburger industry, if anybody still listens to the daily Big Warnings. However, let's hold all cynicism for the moment, hard as that is to do, and look a bit more closely at what was said.
First, there seems little doubt that processed meats 'cause' cancer. That doesn't mean an innocent-looking strip of bacon will give you cancer. Instead, what it means is that various high-quality studies have found a dose-response pattern in which higher or longer exposure levels earlier in life are associated with higher cancer incidence later on. We know that correlation is not the same as causation, and that lifestyle factors are highly correlated. Thus, for example, those in dire poverty don't eat tons of processed meat, and those who eat less salami also eat more brussels sprouts, take vitamins, don't smoke, lay off the double gin and tonics.....and, of course, go to the ashram regularly to get their minds off the bacon they didn't eat at breakfast and the aftertaste of dinner's brussels sprouts, and say a mantra to stay calm after giving up everything that's fun.
Now, in the west, the lifetime risk of colon cancer is about 5%; that's what you get if you tote up the probability of developing cancer at age 40, 45, 50, .... 100. If the 18% figure is credible, that risk is about that much higher in those who dose up on pastrami and burgers; the estimate was based on eating two strips of bacon, or the equivalent, every day. Of course, by far most of these cancers occur in older people (over the age of 60, say). That means the risk figures mainly apply to you if you live to old age; for those who die earlier of other things, their actual risk turned out to be zero--they enjoyed their visits to McD's and the deli! That's why smoking is, in a literal epidemiological sense, a preventive relative to colon cancer (smoking will kill you of something else first). There's no joking about cancer, but the basic idea is that, of those who live long enough, about 5% get colon cancer at some age. And while we don't know about meat-eating habits, risks have been declining in recent years in developed countries (and, I think, increasing in other countries as they westernize).
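For illustration only, here is a minimal sketch of what 'toting up' age-specific risks into a lifetime figure looks like; the age-specific numbers below are invented for the example, not real epidemiological rates.

```python
# Illustrative only: hypothetical age-specific colon cancer risks per 5-year
# interval (NOT real data), chosen so the 'toted up' lifetime figure lands near 5%.
age_specific_risk = {
    40: 0.001, 45: 0.001, 50: 0.002, 55: 0.003, 60: 0.004,
    65: 0.005, 70: 0.007, 75: 0.008, 80: 0.009, 85: 0.010,
}

p_escape = 1.0                       # probability of never developing colon cancer
for age in sorted(age_specific_risk):
    p_escape *= 1.0 - age_specific_risk[age]

lifetime_risk = 1.0 - p_escape
print(f"lifetime risk: {lifetime_risk:.1%}")   # ~4.9% with these invented numbers
```

The only point of the sketch is that a lifetime figure like 5% is an accumulation over decades, and most of it accrues at older ages.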
Eat meat and lower your risk!
At a baseline of 5%, an 18% increase means a lifetime risk of about 6%. Now if you hog up even more, your risk will go higher, perhaps much higher. But wait a minute. How many people actually dish up so heavily on processed and red meat (including steak and burgers)? Surely some do. In fact, we don't know exactly where the lifetime risk estimate of 5% comes from; if it's from a population sample, then it wouldn't have regressed out meat-eating, and the figure would already include meat-eaters. However, let's ignore all these potential confounding or confusing issues and just consider the 18% figure on its own, as a given, as the risk difference between abstainers and sausage gluttons.
Now in modern countries with health care systems, one routine health-care procedure is regular colonoscopy in older adults. A recent estimate was that regular colonoscopy can prevent about 53% of colon cancers; the reason is that precancerous polyps are found and excised so they can't transform into cancer. Actually, you can find even more dramatic estimates of the preventive effectiveness if you scan the web. Likewise, you'll find many other lifestyle factors widely cited as having protective effects, including exercise, vitamins, eating vegetables, and the like.
Let's just do a bit of back-of-the-envelope numerology to make the point that if you're a bacon hog but have regular screening, get your exercise and all that, and you reduce your meat-elevated risk by 50%, then your net risk is around 3%, about half the 'average' of 5%. One can surmise that if you stop your bacon fix, but then figure you're fine and don't do the other preventives, many of which are likely to be wanting in the meat-hog's normal lifestyle, then the actual effect of your 'healthy' baconless diet change will be to increase your cancer risk!
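A minimal sketch of that back-of-the-envelope arithmetic, using the figures quoted above (the 50% screening effect is just a rough rounding of the estimate mentioned earlier, and everything here is approximate):

```python
baseline_lifetime_risk = 0.05   # ~5% lifetime colon cancer risk quoted above
relative_increase = 0.18        # the 18% relative increase for daily processed meat
screening_reduction = 0.50      # rough preventive effect of regular screening etc.

# Bacon hog, no screening: the relative increase applied to the baseline.
risk_bacon = baseline_lifetime_risk * (1 + relative_increase)       # ~5.9%

# Bacon hog who also gets screened, exercises, and so on.
risk_bacon_screened = risk_bacon * (1 - screening_reduction)        # ~3.0%

print(f"baseline:             {baseline_lifetime_risk:.1%}")
print(f"bacon, no screening:  {risk_bacon:.1%}")
print(f"bacon plus screening: {risk_bacon_screened:.1%}")
```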
This is a lesson in complex causation and oversimplified news stories. Processed meat may be a risk factor for colon cancer, but throwing irresponsibly simplified figures like raw meat to the news media leads to worse, rather than better, information for the public.
So, as Hippocrates said, moderation in all things. Eat your reuben (OK, yes, along with some broccoli). But go one better than Hippocrates: get scoped!
Thursday, October 22, 2015
My grandmother's dementia and me
My father's mother had Alzheimer's disease, or dementia of some sort, as did her sister. Both lived with us at different times when I was a child, my great-aunt until she died in the bedroom upstairs, and my grandmother until she was impossible for my parents to care for, at which time they found a very kind, very patient woman with a big house in the country, and she went to live there.
These two sisters, the only children in their family, were always close. They both worked all their lives, and were extremely competent and very kind. My great-aunt never married; her fiancé had gone off to fight in the Spanish-American war, but died during an outbreak of yellow fever in Florida before he ever got to Cuba. But, she lived with a cousin for many years. When my parents finally cleaned out the apartment after my great-aunt died, one of the things they found in the attic was a skull that must have once been used for teaching anatomy. No one had any clue how it ended up in that attic. My parents have displayed it in their living room for most of my life. My mother's theory, after years of living with it, is that this is the skull of a poor man who was suffering from an abscessed tooth, and he shot himself in the head because he couldn't stand the pain. Here's a sketch.
[Sketch by A Buchanan]
My grandmother married and had one child, my father. My grandparents, my great-aunt and her cousin all lived perhaps half an hour from us, in the town where my father had grown up, and my grandfather drove them all to visit us on Sunday afternoons. He loved driving -- he enjoyed taking my sisters and me for drives in the country. What I remember most about these drives was the overwhelming odor of his strong cigars. (He used to enjoy shooting woodchucks, too, happy to be doing farmers such a favor. I remember going with him and my grandmother once on such an outing, but I refused to take a shot, which disappointed him. He would steady his gun on the roof of the car, aim and shoot. He draped the one woodchuck he killed the day I was with him over the gate into the field he'd shot it in, so that the farmer would take note. One Sunday when they came to visit, there was a bullet hole in the roof of the car, over the passenger side -- I don't remember that that was ever explained.)
Dementia does unpredictable things to people. My great-aunt -- Aunt, we called her, as my father had -- was always cheerful and sweet, if a bit confused. Every morning she would ask where she was, but she was still able to play cribbage with us, and she loved having us comb her long thin hair, past grey and now yellowed, and pin it into a bun. I don't remember that she ever fussed about anything.
My grandmother, on the other hand, was distraught with worry from the moment she woke, to the moment she went to bed, and probably long after that. She would sit at the kitchen table all day every day, every few minutes asking the same worried questions in the same frantic way. She was miserable. Occasionally she was able to access a part of her brain that reminded her that she was confused, and that made things even worse.
Apart from being two different versions of the same heart-wrenching story that could be told by so many people, this raises several questions. Were these two sisters suffering from very different forms of the same disease? Or did they have two different diseases?
And, did the fact that both his mother and his aunt had dementia mean that my father was at higher risk of dementia himself? Apparently not, as he is now in his late 80s, still very active, very engaged, mentally and even physically. In turn, does this mean that my sisters and I don't have to worry about dementia ourselves?
Or is it secular trends in Alzheimer's disease that we should pay attention to?
One measure of a condition's impact is its prevalence, that is, the fraction of the population affected at a given point in time. A recent BBC Radio 4 program, More or Less, discussed changes in Alzheimer's prevalence over time, after a paper reporting (among many other things) decreased prevalence of dementia in the UK was published in The Lancet ("Global, regional, and national disability-adjusted life years (DALYs) for 306 diseases and injuries and healthy life expectancy (HALE) for 188 countries, 1990–2013: quantifying the epidemiological transition," Murray et al.). According to the study, the prevalence of dementia in British people over age 65 has declined by more than 20% in the last 20 years; it's currently about 7 percent of that segment of the population.
This is in striking contrast to a recent report in the UK estimating that 1/3 -- 33%! -- of British children born in 2015 will have dementia in later life. Tim Harford, presenter of More or Less, pointed out, though, that it's odd that this number was taken seriously by anyone, given that it is equivalent to thinking that predictions made 100 years ago, when AIDS wasn't known, antibiotics not yet discovered, and so on, would have any credibility. And the 1/3 estimate was based on 20-year-old data. (A quick check of the prevalence of dementia in the UK is a bit confusing -- many sites caution that the number of people with Alzheimer's disease is rising rapidly. It's an Alzheimer's time bomb, they warn. But given that the population is both aging and growing, this isn't, in itself, a surprise, or very meaningful in relation to individual biological risk because, again, it's the fraction of the population that is affected that is the significant statistic. To be clearer, if more people live longer, even the same age-specific risk of getting a disease will lead to more people with the disease, that is, higher prevalence in the population. Of course, the number of affected individuals is relevant to the health care burden.)
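To make that last point concrete, here is a minimal sketch with invented numbers: the age-specific risks are held fixed, only the age structure of the population changes, and the number of affected people rises anyway.

```python
# Hypothetical age-specific dementia prevalence (NOT real data), held fixed.
risk_by_age = {"65-74": 0.03, "75-84": 0.10, "85+": 0.25}

def affected(population_by_age):
    """Expected number of affected people given fixed age-specific risks."""
    return sum(population_by_age[a] * risk_by_age[a] for a in risk_by_age)

# Same total population of over-65s (1,000,000), different age structures.
younger_population = {"65-74": 600_000, "75-84": 300_000, "85+": 100_000}
older_population   = {"65-74": 450_000, "75-84": 350_000, "85+": 200_000}

print(f"{affected(younger_population):,.0f} affected")  # 73,000
print(f"{affected(older_population):,.0f} affected")    # 98,500 -- more cases, same risks
```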
How predictable is dementia?
Carol Brayne, one of hundreds of authors on the Lancet report and interviewed for More or Less, speculates that the reported fall in prevalence has to do with changes in 'vascular health', as incidence of heart attacks and stroke have fallen as well. She suggests that it seems as though the things we have been doing in western countries to prevent cardiovascular disease have been working.
But of course this assumes we know the cause of dementia, and that it's in some sense a cardiovascular disease. But, we don't understand the cause nearly well enough to say this, and in fact, like most chronic diseases, dementia is many different conditions, with many different causes.
The genetic causal factors related to Alzheimer's disease include mutations in a few genes, but these account for only a fraction of cases. Mutations in the two presenilin genes can lead to early-onset Alzheimer's. The most commonly discussed genetic risk factor is the E4 allele of the ApoE gene, whose product is involved in fat transport in the blood. It seems to be associated with the development of plaque in the brains of people with late-onset (60s and over) Alzheimer's, but the association is complex: people without the E4 allele also develop plaque, people with plaque may not have dementia, and the causal mechanisms are unclear. Risk seems to depend on whether one carries one or two copies of the E4 allele, seems to be higher for women than for men, and is apparently affected by environmental factors, but carrying the allele does seem to raise risk from something like 10-15% in people over 80 to 30-50%.
What this means is that, even if the statistics were reliable, the risk estimates stable, and environmental contributions minimal, having two copies of the risk allele is clearly not a guarantee of Alzheimer's disease. And in some populations (in Nigeria, e.g.) having two copies isn't associated with Alzheimer's at all. In addition, while the association with increased risk has long been described, the physiology is still not understood. GWAS have reported other genetic risk factors, but none nearly as consistently as ApoE4, nor as strong.
The reported decline in dementia prevalence is not new; we blogged in 2013 about dramatically decreasing rates in the UK, as well as in Denmark, as reported by Gina Kolata then. So, how can prevalence be declining rapidly when the strongest risk factor we know of is genetic, and the frequency of that variant is not changing nearly enough to account for the data? Or is Carol Brayne right that dementia is a vascular disease, vascular diseases are on the decline, and so Alzheimer's is, too?
Indeed, even the definition of whether you 'have' Alzheimer's or not is changeable and imprecise, and researchers don't even agree on what an Alzheimer's brain looks like. A good discussion of these various factors, including social and economic aspects and the history of studies of Alzheimer's, is the book The Alzheimer Conundrum, by Margaret Lock, a fine medical anthropologist at McGill in Canada (and friend of ours).
Can Alzheimer's be prevented?
The causes of Alzheimer's disease are so poorly understood that the best prevention on offer is to exercise, quit smoking, and maintain a social life -- very generic advice that could apply to a lot of things! If we don't know what causes it; if environmental risk factors probably matter but aren't really understood; if relevant past environmental agents are unknown, future environments impossible to predict, and genetic risk factors not good predictors -- then we certainly don't know how to predict population prevalence rates, not to mention who is most likely to develop the disease. (NB: this is pertinent to late-onset dementia; early-onset is more likely to have a genetic cause, and is thus more likely to be predictable.)
Given the experience of two generations in my family, should I worry about developing dementia or not? If my grandmother and great-aunt had the ApoE4 risk allele, my father may or may not have it, and my sisters and I may or may not. If they did and my father does too, it's a good example of an allele with "incomplete penetrance," one for which genetic background or environmental risk factors, or both, are also necessary -- which makes predicting dementia difficult, whether or not we have the risk allele. If they didn't have it, something else caused their dementia, and we have no idea what that was. Indeed, they were both social, never smoked, and walked to work for decades.
To me, as to most people, dementia is frightening. But, obviously, my family history is useless in terms of determining my risk -- my grandmother had it, my father doesn't.
Still, every time I forget someone's name, I think of my grandmother.
Thursday, March 12, 2015
Simulating complexity and predicting the future
Predicting complex disease is the latest genomics flavor of the day. Or rather, it's the old flavor with a new name -- precision medicine. So, we were pleased to be alerted to a new paper (H/T Peter Tennant and Mel Bartley; "The mathematical limits of genetic prediction for complex chronic disease," Keyes et al., Journal of Epidemiology and Community Health) that addresses the prediction question by simulating a lot of data to look at how plausible it will be to predict complex disease, given the wealth of potentially interacting risk factors that will need to be taken into account. This question is of course particularly timely, given the new million-genome precision medicine effort proposed by President Obama and endorsed by the head of the NIH, among many others.
[The Crystal Ball, by John William Waterhouse: scrying in crystal; Wikipedia]
A few weeks ago, Ken blogged about the advantages of using computer simulation to probe causal connections in genetics and epidemiology (here and here). Simulations can be valuable because they allow exploration of complexity with known assumptions built in explicitly and hence testably, and because there are no data or measurement errors (unless introduced intentionally and then they're still identifiable). If the results resemble real data, then one has confidence in the assumptions. If not, conditions can be changed to explore why not. Also, things far too complex to be affordably tested in the real world can be simulated, and simulation is fast and inexpensive, as a way to explore the nature of causation in a given context.
Keyes et al. simulate one million populations, with 10,000 individuals per population, to explore how possible it will be to predict complex diseases, given epistasis (gene-gene interaction) and gene-by-environment interaction. They point out that while genetic and epidemiological studies have been useful for finding correlations between risk factors and disease, they've been less useful for predicting which individuals will develop a given disease.
And, they point out that genome wide association studies (GWAS) have been invaluable for demonstrating that complex diseases are by and large polygenic, and that different subsets of many genes are apparently interacting in individuals who share a disease. But, they haven't been useful for prediction of complex traits.
But identifying interacting genes, and gene-by-gene interactions, has proven to be difficult. Thus, Keyes et al. write, "[i]n this paper, we use simulated data of one million separate populations to demonstrate the drivers of the association between a germline genetic risk factor and a disease outcome, drawing observations that have implications for personalised medicine and genetic risk prediction."
They first create a hypothetical disease, one that is caused by a germline genetic variant and environmental exposure to one or more risk factors. Risk of disease is higher in those exposed to both than would be expected from adding the separate effects of the genetic and environmental exposures. And, importantly, the disease can also be caused in many other ways. Keyes et al. varied the rate of genetic exposure, environmental exposure, and background prevalence of disease in each of their simulated populations.
They simulated that enormous number of populations in order to accommodate every possible prevalence of the combination of risk factors, from 1 to 100%. They then compared nine different scenarios of genetic and environmental risk exposure (low, moderate, and high), estimating the risk of disease for those with, compared with those without, the risk allele.
"Using simulations that span the range of potential possible prevalences of genes, environmental factor and unrelated factors, we show that the magnitude of both the risk ratio and risk difference [risk of disease to those exposed to the genetic risk factor vs those not exposed] association between a genetic factor and health outcome depends entirely on the prevalence of two factors: (1) the factors that interact with the genetic variant of interest; and (2) the background rate of disease in the population. These results indicate that genetic risk factors can only adequately predict disease in the presence of common interacting factors, suggesting natural limits on the predictive ability of individual common germline genetic factors in preventative medicine."

And, four conclusions. First, predicting complex disease from genes will continue to be largely unsuccessful, unless the environmental context and gene interactions are understood. Second, it's when background disease rates are low, and environmental risk factors common, that predicting disease from genes is going to be most reliable. Third, environmental context is important in predicting the effect of genes. And, fourth, non-replicability of many genotype/phenotype studies is likely to be due to differing prevalences of genetic and environmental risk factors in the different study populations. Trait 'heritability' is context dependent, not an inherent characteristic of the trait itself.
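This is not the authors' code, but a minimal sketch in the same spirit (all prevalences and risks below are invented): a variant that only matters in the presence of an environmental exposure looks weaker or stronger depending purely on how common that exposure is.

```python
import random

random.seed(1)

def simulate(n, p_gene, p_env, background_risk, interaction_risk):
    """Simulate one population in which disease risk is elevated only when
    the genetic variant and the environmental exposure occur together."""
    carrier_cases = carriers = other_cases = others = 0
    for _ in range(n):
        g = random.random() < p_gene
        e = random.random() < p_env
        risk = interaction_risk if (g and e) else background_risk
        disease = random.random() < risk
        if g:
            carriers += 1
            carrier_cases += disease
        else:
            others += 1
            other_cases += disease
    return (carrier_cases / carriers) / (other_cases / others)  # risk ratio for carriers

# Same variant, same biology; only the prevalence of the interacting exposure differs.
for p_env in (0.05, 0.5, 0.95):
    rr = simulate(100_000, p_gene=0.2, p_env=p_env,
                  background_risk=0.01, interaction_risk=0.05)
    print(f"exposure prevalence {p_env:.0%}: risk ratio for carriers ~ {rr:.1f}")
```

The biology is identical in all three runs; only the context changes, which is essentially the paper's point about why genotype-phenotype results so often fail to replicate across populations.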
Our simulation program, ForSim, referred to earlier, takes a more complex and evolutionarily sophisticated approach, one that specifies things less explicitly and that can be applied to multiple populations and other situations. But if anything, with causation and variation simulated more realistically, causal effects come out even less precisely estimable or predictable than in the current paper, whose results are already quite convincing.
We'd just add that while this study is certainly a cautionary tale, the authors don't, in our view, acknowledge the full import of their conclusions. Every genome is unique and unpredictable, and future environments are unpredictable, even in principle, so that if predicting complex diseases depends on knowing environmental and genomic context, it's not going to be possible. It may be possible to retrodict complex disease based on understanding past environment and observed genes and genomes, but solving the prediction problem is another question.
Thursday, February 5, 2015
Populations, individuals and imprecise disease prediction
As Michael Gerson writes in the Washington Post, "Preventable infectious disease is making its return to the developed world, this time by invitation." When anti-vaxxers were few, and they chose not to expose their kids to what they consider toxins, their kids benefited from the herd immunity that resulted from most parents choosing to have their kids vaccinated (or, as Gerson puts it, anti-vaxxers chose to be free-riders). They could claim there were no costs to their action (or non-action), because as long as they were a small minority, there weren't.
But unlike many complex non-infectious diseases, infectious diseases are very predictable. Once the proportion of a population that is immunized falls below a certain threshold, as determined by the rigorous and empirically tested mathematics of infectious disease, the kids of anti-vaxxers become sitting ducks for disease once they are exposed, and the disease is then likely to spread even to the immunized population, because no vaccine is 100% effective. And this is happening in the US now with measles. Anti-vaxxers have convinced enough of their neighbors not to vaccinate that they can no longer claim there is no cost, only benefit, to their beliefs.
[Map: from Mother Jones, 2014]
In theory, herd immunity protects a population from measles when at least 90-95% of the population is vaccinated, so the above map would suggest that even in Oregon, with a rate of non-medical immunization exemption over 6%, the disease would be unable to gain a foothold. But infectious disease researcher Marcel Salathé, who is here at Penn State, nicely describes herd immunity here, and suggests that something closer to 100% coverage would actually be required to protect against measles, because of pockets of lower vaccination rates, non-random mixing of the population, and so forth.
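For the curious, the textbook arithmetic behind that 90-95% figure looks like this (a minimal sketch of the standard well-mixed approximation, not Salathé's analysis): the fraction that must be immune is 1 - 1/R0, where R0 is the average number of secondary cases one infection causes in a fully susceptible population.

```python
def herd_immunity_threshold(r0):
    """Classic approximation: the fraction immune needed so that each case
    infects, on average, fewer than one other person."""
    return 1.0 - 1.0 / r0

# Measles R0 is commonly quoted as roughly 12-18 in a well-mixed population.
for r0 in (12, 15, 18):
    print(f"R0 = {r0}: need ~{herd_immunity_threshold(r0):.0%} immune")
# R0 = 12: ~92%; R0 = 15: ~93%; R0 = 18: ~94%
# And, as noted above, clustering of unvaccinated people pushes the practical
# requirement higher than this idealized calculation suggests.
```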
The science on the safety and efficacy of vaccination is well-established. Vaccines can have side-effects, but it's pretty clear the list doesn't include autism. (Has anyone estimated the prevalence of autism in anti-vaxxer communities yet? If they were right that the MMR vaccine causes autism, the rate should be a lot lower in unvaccinated kids by now, no?) The freedom of choice issue, of whether the state has a right to require individuals to be vaccinated, is a live one, and any conscientious objector to the right of society to make decisions for individuals has to be struggling with this one, given the societal consequences. This is distinctly not the same as an individual's freedom to decide whether to smoke or to drink jumbo soft drinks because in the case of vaccines, what's good for society is also good for individuals, and vice versa.
[Measles virus; Wikipedia (Cynthia S Goldsmith, Content Provider, CDC)]
But that's not what interests me particularly here. What interests me is the interplay between population and individual disease dynamics. Infectious disease dynamics depend on the proportion of susceptible individuals in the population; too few and the disease dies out, enough and the disease sticks around, cyclically infecting people, as with the flu, or remaining endemic, as with, e.g., venereal diseases. So, in a very real sense infectious disease happens to a group, in a group, and because of a group, at the same time it's happening to individuals in that group. But chronic non-infectious diseases (CNIDs) don't work that way. Chronic non-infectious diseases happen to individuals only, irrespective of what's happening to anyone else in the population (though, see below).
But, what we know about and predict for individuals depends on what we know (or think we know) about a CNID in a population. Epidemiologists collect data in a group on what may be relevant risk factors, and then statistically estimate their importance and impact. Observations on a single individual don't have the power to allow epidemiologists to determine which risk factors are likely to be important. That requires repeated observations, on many people.
So, calculations of how likely you are to have a heart attack given your age, body mass index, cholesterol levels and so on are based on population associations between such factors and actual heart attacks. These are based on past observed experience in many individuals. As we've written many times before, it's not really clear what 'risk' represents, other than the observed proportion of a population with apparent past exposure to tested risk factors who went on to develop a disease. But, a lot of people with the same risk factors didn't develop disease, or have a heart attack or whatever, and a lot of people without those risk factors did. So, risk estimates are population-based statistics that may or may not apply to you -- or anyone individually, really. They are collective data, and clearly don't explain all risk, or allow precise prediction.
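As a minimal sketch of how such calculators work (the coefficients below are invented for illustration, not from any published risk equation), population-derived associations are simply plugged into a formula and applied to one person's measurements:

```python
import math

# Hypothetical logistic-regression coefficients estimated from a past cohort
# (invented numbers, not any real risk score).
INTERCEPT = -7.0
COEFS = {"age_years": 0.06, "total_cholesterol_mmol": 0.25, "smoker": 0.7}

def ten_year_risk(age_years, total_cholesterol_mmol, smoker):
    """A population-derived probability applied to one individual."""
    x = (INTERCEPT
         + COEFS["age_years"] * age_years
         + COEFS["total_cholesterol_mmol"] * total_cholesterol_mmol
         + COEFS["smoker"] * int(smoker))
    return 1.0 / (1.0 + math.exp(-x))

# The output is a group-based estimate: many people with this profile never
# have a heart attack, and many with a 'better' profile do.
print(f"{ten_year_risk(60, 6.0, smoker=True):.1%}")
```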
Now, social epidemiologists would say that chronic diseases can be as much a result of population factors as infectious diseases are. Smoking, obesity, drinking, stress-related diseases are correlated with social class, so in a very real sense, population dynamics can affect risk of CNIDs as well. So, if we want to explain CNIDs by distal rather than proximal risk factors, population dynamics become important, as with infectious diseases. But they are still population-level factors -- not every low-income individual is obese or has high blood pressure, and not every overweight individual is poor. But, everyone with measles has been infected by the measles virus.
The population issue, I think, goes a long way toward explaining why it's so hard to predict chronic disease, genetic or otherwise. We're forced to infer group statistics to individuals, and that's never going to be precise.
Thursday, October 16, 2014
What if Rev Jenyns had agreed? Part III. 'Group' selection in individuals, too.
By
Ken Weiss
We have been using Darwin's and Wallace's somewhat different views of evolution to address some questions of evolutionary genetics and their consequences for today's attempts to understand the biological, especially genomic, basis of traits of interest. Darwin had a more particularistic, individual-level focus on the dynamics of evolutionary change, and Wallace a more group-focused, ecological one.
As a foil, we noted that a friend of Darwin's, Leonard Jenyns was offered the naturalist's job on the Beagle first, but turned it down, opening the way for Darwin. We mused about how we might think today had Wallace's view of evolution, announced in the same year that Darwin's was, been the first view of the new theory. Where we'd be now if we'd had a more group than individual focus is of course not knowable, but we feel Wallace's viewpoint, at least in some senses, has been wrongly neglected.
| HMS Beagle in the Straits of Magellan |
Population genetic theory traces what happens to genetic variants in a population over time. Almost without exception the theory treats each individual as representing a single genotype. We take individual blood samples or cheek swabs, and let our "Next-Gen" sequencer grind out the nucleotide sequences as though on a proverbial assembly line. In this sense, each individual--or, rather, the individual's genotype--is taken to be the unit of evolution.
Populations were, and generally still are, seen as a mix of these individual, internally non-varying, homogeneous units, each having a genotype. But that's an obviously inaccurate way to view life, another reflection of the difference in viewpoints about variation in life that we've been characterizing symbolically in terms of Darwin's and Wallace's differing emphases in their views of evolution.
There is a strong tendency to equate genotypes with the traits they cause. This derives from the tendency to reduce natural selection to screening of single genes, because if single genes cannot be detected effectively by selection, they generally won't have high predictive value for biomedicine either. It is easy to see the issue.
But individuals are populations too
Let's ask something very simple: What is your 'genotype'? You began life as a single fertilized egg with two instances of the human genome, one inherited from each parent (here, we’ll ignore the slight complication of mitochondrial DNA). Two sets of chromosomes. But that was you then, not as you are now. Now, you’re a mix of countless billions of cells. They’re countless in several ways. First, cells in most of your tissues divide and produce two daughter cells, in processes that continue from fertilization to death. Second, cells die. Third, mutations occur, so that each cell division introduces numerous new DNA changes in the daughter cells. These somatic (body cell) mutations don’t pass to the next generation (unless they occur in the germline) but they do affect the cells in which they are found.
But how do we determine your genotype? This is usually done from thousands or millions of cells—say, by sequencing DNA extracted from a blood sample or cheek swab. So what is usually sequenced is an aggregate of millions of instances of each genome segment, among which there is variation. The resulting analysis picks up, essentially, the most common nucleotides at each position. This is what is then called your genotype and the assumption is that it represents your nature, that is, all your cells that in aggregate make you what you are.
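A toy version of what that reported 'genotype' really is: a consensus taken over many sequenced copies of the same segment, in which minority (including somatic) variation is simply voted away. This is a sketch of the idea only, not of any particular sequencing pipeline; the reads are made up.

```python
from collections import Counter

# Short reads covering the same genome segment, drawn from many different
# cells in a blood sample (hypothetical data).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGA",   # a somatic variant present in a minority of cells
]

# 'Your genotype' as usually reported: the most common base at each position.
consensus = "".join(
    Counter(column).most_common(1)[0][0]
    for column in zip(*reads)
)
print(consensus)  # ACGTACGT -- the minority variant has vanished from the record
```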
In fact, however, you are not just a member of a population of different competing individuals each with their inherited genotypes. In every meaningful sense of the word each person, too, is a population of genomes. A person's cells live and/or compete with each other in a Darwinian sense, and his/her body and organs and physiology are the net result of this internal variation, in the same sense that there is an average stature or blood pressure among individuals in a population.
If we were to clone a population of individuals, each from a single identical starting cell, and house them in entirely identical environments, there would still be variation among them (we see this, imperfectly, in colonies of inbred laboratory strains such as of mice). They are mostly the same, but not entirely. That’s because they are aggregates of cells, with genomes varying around their starting genome.
Yesterday we tried to describe why the traits in individuals in populations have a central tendency: most people have pretty similar stature or glucose levels or blood pressure. The reason is a group-evolutionary phenomenon. In a population, many different genomic elements contribute to the trait, and because the population is here and hence has evolved successfully in its competitive environment, the mix of elements and their individual frequencies is such that random draws of these elements mainly generate rather similar results.
It is this distribution of random draws of all the genetic variants in the population that determines the context and hence the success of a given variant. But the process is a relative one, depending on context rather than on the absolute effects of individual variants. Gene A's success depends on B's presence and vice versa, across the genome. There is always a small number of outliers, having drawn unusual combinations, and evolution screens these in a way that results in a central tendency that may shift over time, etc.
The same explanation accounts for the traits in individuals. There would be a central tendency in our hypothetical cloned mice. That’s because the somatic mutations generate many different cells, but most are not too different from each other. As in evolution in populations, if they are dysfunctional the cell dies (or, in some instances, they doom the whole cell-population to death, as when somatic mutations cause cancer in the individual). Otherwise, they usually comprise a population near the norm.
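The central tendency itself requires no special machinery, just many modest contributions added together. A minimal simulation, with effect sizes and frequencies invented purely for illustration:

```python
import random
random.seed(2)

n_variants = 500       # hypothetical contributing sites
n_individuals = 2000

# Each site has a small, randomly signed effect and some population frequency.
effects = [random.gauss(0, 1) for _ in range(n_variants)]
freqs = [random.uniform(0.05, 0.95) for _ in range(n_variants)]

def trait_value() -> float:
    # One 'random draw' from the population's pool of variants:
    # the sum of the effects of whichever variants this individual carries.
    return sum(e for e, f in zip(effects, freqs) if random.random() < f)

values = [trait_value() for _ in range(n_individuals)]
mean = sum(values) / n_individuals
sd = (sum((v - mean) ** 2 for v in values) / n_individuals) ** 0.5
print(f"mean = {mean:.1f}, sd = {sd:.1f}")
# Most draws land near the mean; only a small tail of unusual combinations
# falls far from it -- the raw material that selection screens.
```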
Is somatic variation important?
An individual is a group, or population, of differing cells. In terms of the contribution of genetic variation among those cells, our knowledge is incomplete, to say the least. From a given variant's point of view (and here we ignore the very challenging aspect of environmental effects), there may be some average risk--that is, phenotype among all sampled individuals with that variant in their sequenced genome. But somatically acquired variation will affect that variant's effects, and generally we don't yet know how to take that into account, so it represents a source of statistical noise, or variance, around our predictions. If the variant's risk is 5%, does that mean that 5% of carriers are at 100% risk and the rest zero? Or all are at 5% risk? How can we tell? Currently we have little way to tell and, I think, manifestly even less interest in this problem.
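The two readings of a '5% risk' really are indistinguishable from population data alone, as a tiny simulation (numbers hypothetical) makes clear:

```python
import random
random.seed(3)

n_carriers = 100_000

# Scenario A: a hidden 5% of carriers are at essentially 100% risk, the rest at ~0%.
doomed = set(random.sample(range(n_carriers), k=n_carriers // 20))
affected_a = sum(1 for i in range(n_carriers) if i in doomed)

# Scenario B: every carrier faces the same uniform 5% risk.
affected_b = sum(1 for _ in range(n_carriers) if random.random() < 0.05)

# To within sampling noise, both scenarios yield the same observed frequency
# of disease among carriers -- which is all a population association study sees.
print(f"scenario A: {affected_a / n_carriers:.3f}   scenario B: {affected_b / n_carriers:.3f}")
```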
Cancer is a good, long-studied example of the potentially devastating nature of somatic variation, because there is what I've called 'phenotype amplification': a cell that has inherited (from the person's parents or the cell's somatic ancestors) a carcinogenic genotype will not in itself be harmful, but it will divide unconstrained so that it becomes noticeable at the level of the organism. Most somatic mutations don't lead to uncontrolled cell proliferation, but they can be important in more subtle ways that are very hard to assess at present. But we do know something about them.
Evolution is a process of accumulation of variation over time. Sequences acquire new variants by mutations in a way that generates a hierarchical relationship, a tree of sequence variation that reflects the time order of when each variant first arrived. Older variants that are still around are typically more common than newer ones. This is how the individual genomes inherited by members of a population come to differ, and it is part of the reason that a group perspective can be an important but neglected aspect of our desire to relate genotypes to traits, as discussed yesterday. Older variants are more common and easier to find, but are unlikely to be too harmful, or they would not still be here. Rarer variants are very numerous in our huge, recently expanded human population. They can have strong effects but their rarity makes them hard to analyze by our current statistical methods.
However, the same sort of hierarchy occurs during life as somatic mutations arise in different cells at different times in individual people. Mutations arising early in embryonic development are going to be represented in more descendant cells, perhaps even all the cells in some descendant organ system, than recent variants. But because recent variants arise when there are many cells in each organ, the organ may contain a large number of very rare, but collectively important, variants.
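A tiny simulation of that within-body hierarchy, under deliberately crude assumptions (strictly binary cell divisions, a fixed number of new mutations per division, no cell death): mutations arising in the first few divisions end up in large fractions of the final cell population, while the far more numerous late mutations are each confined to a handful of cells.

```python
import random
from collections import Counter
random.seed(4)

MUTATIONS_PER_DIVISION = 3   # hypothetical rate, for illustration only
N_GENERATIONS = 12           # 2**12 = 4096 final cells

def grow(cell_mutations, generation, carrier_counts):
    """Recursively divide a cell, tallying which mutations reach the final cells."""
    if generation == N_GENERATIONS:
        for m in cell_mutations:
            carrier_counts[m] += 1
        return
    for _ in range(2):  # two daughters, each picking up new somatic mutations
        new = {(generation, random.random()) for _ in range(MUTATIONS_PER_DIVISION)}
        grow(cell_mutations | new, generation + 1, carrier_counts)

counts = Counter()
grow(frozenset(), 0, counts)

# Average number of final cells carrying a mutation, by the division at which it arose.
carried, n_mutations = Counter(), Counter()
for (gen, _), n_cells in counts.items():
    carried[gen] += n_cells
    n_mutations[gen] += 1
for gen in sorted(carried):
    print(f"arose at division {gen:>2}: carried by ~{carried[gen] / n_mutations[gen]:.0f} of 4096 cells")
```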
The mix of variants, their relative frequencies, and their distribution of resulting effects are thus a population rather than individual phenomenon, both in populations and individuals. Reductionist approaches done well are not ‘wrong’, and tell us what can be told by treating individuals as single genotypes, and enumerating them to find associations. But the reductionist approach is only one way to consider the causal nature of life.
Our society likes to enumerate things and characterize their individual effects. Group selection is controversial in the sense of explaining altruism, and some versions of group selection as an evolutionary theory have well-demonstrated failings. But properly considered, groups are real entities that are important in evolution, and that helps account for the complexity we encounter when we force hyper-reductionistic, individual thinking to the exclusion of group perspectives. The same is true of the group nature of individuals' genotypes.
We have taken Darwin and Wallace as representatives of these differing perspectives. Had Jenyns taken the boat ride he was offered, we'd have been more strongly influenced by Wallace's population perspective because we wouldn't have had Darwin's. Instead, Darwin's view won, largely because of his social position and being in the London hub of science, as has been well-documented. A consequence is that the ridicule to which group-based evolutionary arguments have been subjected is a reflection of the resulting constricted theoretical ideology of many scientists—but not of the facts that science is trying to explain.
What needs to be worked on is not, or certainly not just, increased sample size to somehow make enumerative individual prediction accurate. For reasons we've tried to suggest, retrospective fitting to the particular agglomerate of genotypes does not yield accurate individual prediction--and here we're not even considering non-genomic aspects of each genome site's environment. Instead, we should try to develop a better population-based understanding of the mix of variants and their frequencies, and a better sense of what a given allele's 'effect' is, when we know each allele's effect is neither singular nor absolute, but strictly relative to its context, both in terms of its individual and population occurrences. It's not obvious (to us, at least) how to do that, or how such an understanding might relate to whether accurate individualized prediction is likely to be possible in general.
Friday, October 3, 2014
An example of the problem of risk projection
By
Ken Weiss
One of the biggest problems in biomedical, including genomic, disease risk prediction is that it is almost always based on projections of past risks into the future. We wrote about that the other day (here), but here's yet another example--and they abound.
| Baby swimming; Wikipedia |
The Oct 1 NYTimes had a story about a boom in pre-school fitness programs. If parents, and it will largely be middle-class privileged parents, adopt this fad, it may have long-term, even lifelong, implications for the future health of the kids who partake. If the Times is right that this is a boom industry, one can imagine a whole generation of super healthy upper and upper middle class future adults in the making. That would be quite good (unless, of course, it turns out that various muscle, skeletal, or other traits are harmed by overdoing this early exercise), and so a beneficial practice for individual and public health.
But, even if they are healthier than today's adults, eventually these babies will develop diseases as they grow older. From our perspective as scientists who think about pitfalls to doing science, this raises some potential problems for future researchers doing disease genetics or environmental epidemiology, looking for risk factors associated with disease.
1. If it's predominantly parents of a given ancestry, European urbanites say, who enroll their kids, this can induce false positive genomic signals. Any other kind of clustering related to who enrolls can be equally problematic;
2. The kids themselves may not remember, or investigators decades from now may not be aware of these early fitness programs even to ask about them. The exposure to such programs' effects may as a result go under-reported in epidemiological or genetic association studies, leading to distorted estimates of other risk factors;
3. If parents who enroll their kids are, as seems likely, themselves into fitness plans, there can be a familial association of altered risk with genotype that will be challenging to identify and correct for, as the effects could seem to be genetic;
4. If the kids are inculcated with other health-habits, based on today's do-this/don't-do-that fashions (e.g., here's a story in the Times about 6 year olds choosing to be vegan), there will be correlations with later disease that will not necessarily be identifiable, and indeed, it may be the parents' attitudes that are responsible, not the kids' genotypes or behavioral choices freely made.

Our society already spends much media ink and research resources in hyping risk estimates for genes and lifestyle factors alike, that are made retrospectively based on the behavioral and exposure antecedents of today's disease cases, as ascertained by means such as interview questionnaires (Did you smoke? How much, for how long? Did you get exercise when you were a child? How much, for how long? How many eggs did you eat per week when you were in your twenties?). Those are not only quite inaccurate, involving things occurring decades ago, but the chronic, complex disease risks we're exposed to today generally won't materialize for decades into the future. Indeed, if we read about a risky behavior or food, this makes a lot of us change our behavior, yet another complication--and one which operates regularly as we read advice from the latest research, not always aware of its potential weaknesses.
So risk estimates are about the future, but future exposures can't even in principle be known. This is obvious, so why is awareness of the problem so low? And what, if anything, can we do about it besides discounting risk estimates and acknowledging that they usually have unknown precision?
Monday, June 3, 2013
Reporting incidental genetic findings could be a goldmine
So, now that the cost of whole genome sequencing (WGS) is falling to nearly affordable levels, you're thinking of having your genome sequenced. Or maybe you could just do your whole exome (WES), which, if you don't know it already, is the sequence of all the protein-coding regions--a very small part--of the genome. Perhaps you've already been genotyped and you know your ear wax type and your purported genetic risk of diabetes, but you think that if it's possible to know more about your genes and genetic risk factors, you really want to know. Maybe you wonder if you're at risk of a specific disease that may run in your family, or maybe you're just curious.
Or, perhaps you've got a rare life-altering disease, you're the only member of your family with this disease as far as you know, and you'd like to know why you have it, if possible. Maybe you'd like to know whether you're at risk of passing it on to any children you may have. Or, WES or WGS might simply be diagnostic.
Or, you believe that the more whole genome sequences in public databases the better, and you'd like to contribute yours. You understand that many genetic risk factors aren't yet well-defined or understood, but you'd like to contribute to the learning process.
Reasons for sequencing vary greatly, from personal choice to clinical indication, but at least these questions apply to everyone: How much do you want to know? And should you be able to make that decision for yourself?
A piece published in Science on May 31 reports that you may not be able to answer these questions for yourself in the future:
The American College of Medical Genetics and Genomics (ACMG) recently issued a statement recommending that all laboratories conducting clinical sequencing seek and report pathogenic and expected pathogenic mutations for a short list of carefully chosen genes and conditions. The recommendations establish a baseline for reporting clinically relevant incidental findings and articulate ethical principles relevant to their disclosure.

That is, the ACMG is recommending that anyone undergoing WGS for clinical purposes -- for diagnosis or evaluation of a tumor for treatment purposes, e.g. -- be informed if they have one of the disease-related mutations on the list of the College's choosing, mutations that are not related to the purpose of the initial testing--that is, not related to the disease that initiated the sequencing. These would be incidental findings. They add that they "recognize that there are insufficient data on clinical utility to fully support these recommendations" but that the list will be revised as data improve.
The diseases currently on the list were chosen, according to the Science piece, based on "the potential for medical intervention to mitigate disease, the strength of association between specific gene abnormalities and the condition, and penetrance of those genes." They are generally pediatric, rare diseases and incidental findings with respect to these diseases will, the ACMG estimates, affect perhaps 1% of people for whom clinical sequencing is done. But how useful is this, even for those 1%?
Healthy people probably don't mind learning of incidental findings, and might even welcome them, though they are already often asked if they are interested in knowing their genotype for a small set of diseases -- Alzheimer's is one -- because that information is more sensitive than, say, eye color.
But, someone already living with a challenging disease or disorder may not want to know whether they have a higher than average risk of diabetes or breast cancer or dementia, particularly if onset is decades in the future. That is, they've got a 'right not to know.' They might feel they've already got enough to worry about. Indeed, even the ACMG recognizes that risk assessment is iffy at best, even for many monogenic diseases.
The most straightforward of these diseases are generally familial, often congenital or with early onset, and families are likely to already be aware of their risk, or can be or have been genotyped for causal variants or carrier status. So, it's arguable that there's little clinical advantage to reporting apparent risk of these kinds of disorders because it will be redundant. Or more importantly, uninformative. Indeed, for many such disorders it's not clear what the population frequency of 'causal' variants is, and thus how likely the variant is to actually be causal.
And, as we've discussed many times here on MT, it's impossible to predict environmental exposures into the future. Risk is always determined retroactively, given past exposures, because that's all we've got. If a disease or disorder is due to a gene x environment interaction, as many are, including BRCA1 and 2 to some degree, then knowing that we carry a potentially causal variant is not enough information to predict disease. Indeed, that information does not exist.
Further, all of us, whether we're healthy or already affected by a genetic variant, are carrying a not insubstantial number of genetic variants, including copy-number variants, deletions or insertions, and single nucleotide polymorphisms that may or may not be associated with disease. Often not.
Currently, people undergoing clinical WGS or WES are given extensive genetic counseling, and asked whether or not they prefer to be informed of findings unrelated to their condition. The ACMG says this is not going to be practicable in the near future, when sequencing is more common and the laboratory doing the sequencing is not involved in reporting findings to the patient. Incidental findings will simply be reported as a matter of course.
We recognize that this may be seen to violate existing ethical norms regarding the patient’s autonomy and “right not to know” genetic risk information. However, in selecting a minimal list that is weighted toward conditions where prevalence may be high and intervention may be possible, we felt that clinicians and laboratory personnel have a fiduciary duty to prevent harm by warning patients and their families about certain incidental findings and that this principle supersedes concerns about autonomy, just as it does in the reporting of incidental findings elsewhere in medical practice.

But reporting incidental findings to a patient without genetic counseling is, we propose, harmful. Further, 'harm' can include telling someone who's already dealing with a challenging disorder or disease that he or she carries a genetic variant that may or may not cause disease sometime in the future. There are very rarely absolute risks associated with genetic variants.
The Science piece argues that to treat genetic information any differently from other kinds of medical information is wrong. A piece in Genetics in Medicine, published online May 30, argues that the situation is akin to when a patient has a chest x-ray to diagnose pneumonia and the radiologist sees a tumor in the lung -- of course the patient is told. It would be wrong otherwise. But a visible mass is different from a probabilistic risk.
It's true that there are many kinds of laboratory results that are not definitive, but they are evaluated within a clinical context. Risk of heart disease is rarely based on blood lipid levels alone. It seems premature to conclude, rather patronizingly at that, that patients must be told incidental genetic findings that may or may not be meaningful.
Or.....not? A remedy for reporting bias against negative results?
On the other hand, perhaps there is a very valuable piece of information that the medical establishment may, in this context, not want to know even themselves! As we noted above, many of the 'incidental' findings have to do with severe, usually early onset traits. Incidental findings of that sort in people who already know they don't have such a disease (since, for example, they're grown-ups without the pediatric problem), actually are valuable negative evidence for the genotype's causative role!
We and many others have criticized the many incentives that lead to over-claiming of results in the genomics arena, such as the publication bias against negative results. But here we may have a gold mine of tempering evidence, to undermine the kinds of confident predictions often offered. Every case of a supposedly causal mutation (or a devastating mutation in a causally associated gene) that turns up as an incidental finding should be used to adjust, and often to reduce the risk associated with a putative gene or allelic variant. Here, without mounting a specific study and then only reporting positive findings, we can accumulate negative findings in a classically unbiased way--by studying people without the target trait (similar to studying a disorder, like stroke, by age-sex matched hospital patients who are there for unrelated diseases).
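As a hedged illustration of how such negative evidence might be used, here is the simplest possible carrier-count adjustment of a penetrance estimate, with all numbers invented and ignoring the ascertainment corrections a real analysis would need: every unaffected adult who turns up incidentally carrying a supposedly high-penetrance, early-onset variant enlarges the denominator and pulls the estimated risk down.

```python
# Hypothetical counts, purely to illustrate the arithmetic.
affected_carriers_from_case_studies = 45    # carriers ascertained because they were sick
unaffected_carriers_from_case_studies = 5   # e.g., screened relatives

naive_penetrance = affected_carriers_from_case_studies / (
    affected_carriers_from_case_studies + unaffected_carriers_from_case_studies
)
print(f"penetrance estimated from case-based studies: {naive_penetrance:.0%}")   # 90%

# Add carriers found incidentally among adults sequenced for unrelated reasons,
# none of whom have the (early-onset) disease in question.
incidental_unaffected_carriers = 150

adjusted_penetrance = affected_carriers_from_case_studies / (
    affected_carriers_from_case_studies
    + unaffected_carriers_from_case_studies
    + incidental_unaffected_carriers
)
print(f"penetrance after counting incidental carriers: {adjusted_penetrance:.0%}")  # about 22%
```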
Could incidental findings be a gold mine of discovery for variants that do not yield as high a risk as has been thought (or hyped)?
Wednesday, August 22, 2012
The exactitude of -omics
On Exactitude in Science
Jorge Luis Borges, Collected Fictions, translated by Andrew Hurley.
…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
—Suarez Miranda, Viajes de varones prudentes, Libro IV, Cap. XLV, Lerida, 1658

We'd like to suggest that Borges' short story can be aptly applied to the current state of disease prediction. Fifteen years ago or so we were being told that once we had the human genome (HG) sequenced we'd be able to predict the diseases people were going to get, prevent them, and everyone would live to older ages than we'd ever attained before. Aside from the questionable ethics of enabling such a demographic catastrophe, not to mention the idea that "everyone" would surely be an exclusive club, this promise is not much closer to realization now than in pre-HG days.
The -omics boom was being born. This is the era of 'hypothesis free' approaches. When we don't know the cause or can't develop useful actual hypotheses, our 'hypothesis' is just that some element in the realm we're searching has causal effects. The genome was the first such realm, and the idea was that the trait had to have some genetic cause, and if we blindly search the entire genome it must be there, and so we'll find it (or, instead of 'it', some tractably small number of such causal sites).
Genomics was driven by increasing technology and was addictive because, it is not too cynical to say, it was thought-free, meat-grinder, factory science. It was lucrative, did indeed teach us a lot about what genes and genomes do, and found a modest number of important causal genes. Its success, at least in the fashion and funding senses, understandably spawned other hypothesis-free, blind technological approaches, cashing in on the cachet of the 'omics' word and its rejection of the need for actual prior hypotheses to design studies: nutriomics, connectomics, metabolomics, microbiomics, immunomics, epigenomics, and more. How much of this arose because the same people who were promising us that successful disease prediction with genetics was right around the corner realized that this just wasn't true, and needed to figure out ways to keep their labs running, we can't say; but we certainly are a fad-following, money-following research culture, and we know this is part of the story. To be fair, when other approaches hadn't solved any of the problems, there was natural appeal to a thought-free, safely factory-like turn. In any case, many of the same people who were gung-ho about genetics are now equally gung-ho about the promise of the -omics boom to bring us disease prediction and prevention that will really work this time.
The current interest in the -omics of supercentenarians, in order to figure out how they lived to their ripe old ages and thus how we can live to 120, is, we think, an example of this misguided fad. One basic assumption of this work is that every cause is individually identifiable, predictable and replicable. This is in fact true for causes with large effects--Mendelian diseases, e.g., or point-source infections like cholera or malaria and so on--but there are many paths to heart disease or stroke. When everyone's genome is unique and the causes are many and variable, each combination of environmental and genetic factors will too often be extremely rare if not singular, and identifying such rarely replicated events with current statistical-sampling based methods will be next to impossible. The idea that every cause can be identified is a reductionist approach to disease akin to the reductionist approach to evolution, which requires every trait to have an adaptive reason to have evolved when in fact sometimes it's just chance.
But, once we venture into the quest to find environmental factors that influence longevity, we're necessarily identifying these factors retroactively, if they are even identifiable, and yet none of us is going to live in the past. Future environments are unpredictable. So, again, unless a factor has large effects--heavy radiation exposure, infectious agents, toxins, e.g.--it's unlikely to be useful in predicting individual cases of disease.
We can see the issues in the proliferation of ever-more 'omics' approaches. Each omics-community advocates its realm as if it is the, or at least the critical, one. Essentially, we always add to, but rarely reduce, the number of potential causes of the traits in the lives of individuals. This adds to the combinatorial realm--the number of possible combinations of factors (and their intensity)--through which we must search. More causes, inevitably individually rare, mean that to show that a combination is causal it has to be seen enough times. That means ever-larger samples, because 'seen enough times' means often enough to rule out chance as the explanation for the association between the combination of risks and the outcome. But when there are more reasonably plausible combinations than grains of sand on the earth's beaches (this is no exaggeration--it's if anything an understatement), there aren't enough people to get such results. And subsequent generations will have different people with different combinations of risk factors.
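The grains-of-sand comparison is easy to check with back-of-the-envelope arithmetic. Taking the commonly quoted rough figure of about 7.5 × 10^18 grains of sand on Earth's beaches purely for scale, even a modest number of yes/no risk factors generates more combinations than that:

```python
# Back-of-the-envelope: combinations of binary (present/absent) risk factors.
GRAINS_OF_SAND = 7.5e18   # a commonly quoted rough estimate, used only for scale

for n_factors in (63, 100, 200):
    combos = 2 ** n_factors
    print(f"{n_factors:>3} binary factors -> {combos:.2e} combinations "
          f"(~{combos / GRAINS_OF_SAND:.1e} x the sand-grain figure)")
# Just 63 yes/no factors already exceed the sand-grain figure; 100 or 200
# (and real risk factors are not merely binary) leave it astronomically behind.
```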
We certainly wouldn't argue with the idea that what we eventually succumb to is likely to be the result of multiple -omics, that is, a combination of factors. But, we do question the idea that they will be identifiable, or useful in prediction, which is presumably the point of all this work. The current interest in documenting every possible factor that might have an effect on health and longevity is bringing us closer and closer to Borges' map of the Empire.
Monday, April 16, 2012
Is whole genome sequencing fading? Will it rebound (or relapse)?
By
Ken Weiss
There are various informal indicators that funders are losing enthusiasm for human whole genome sequencing. We've seen discussions of 'genome fatigue' in the media (Carl Zimmer, e.g., talks about this here), and one colleague said there wasn't much enthusiasm for whole genome sequencing because we hadn't found the cure for cancer or made highly useful personalized predictive medicine. Another colleague on an NIH grant review panel said that particular panel, at least, wasn't going to fund any more GWAS studies.
| DNA sequence data, Wikimedia Commons |
If this turns out to be more than a few anecdotes or personal opinions, and is actually occurring, it's understandable and to be lauded. As we think we can truthfully claim, we have for years been warning of the dangers of the kind of overkill that genomics (and, indeed, other 'omics' fads) present: promise miracles and you had better deliver!
The same thing applies to evolutionary studies that seek whole genome sequences as well as to studies designed to use such data to predict individual diseases. There are too many variants to sort through, the individual signal is too weak, and too many parts of the genome contribute to many if not most traits, for genomes to be all that important--whether for predicting future disease, normal phenotypes like behaviors, or fitness in the face of natural selection.
There are some traits, especially if close to a specific protein, in which only one or a few genes are important. There are many genes which, if broken by mutation, can cause serious problems. And as we've said numerous times, this is where the genetics money should be spent. But the nature of evolution is that it has produced complexity by involving numerous cooperating genetic elements, and traits are typically buffered against mutations. Otherwise, organisms couldn't have gotten so complex (try making a brain or liver with just one gene!). Otherwise, with so many genes and ever-present mutation, nobody in any species would ever survive.
The instances of single-gene or major-mutation causation are numerous and real. They are already handled by services like genetic counseling in biomedicine, and by evolutionary or experimental analysis. But the important nature of Nature is its complexity and at present whole genome sequence data provide too much variation for us to deal with on adequate terms.
Nature screens the success of organisms on their overall traits, regardless of what genotype contributed to it. Many of the contributing variants to a given trait are new mutations or are very rare in the population, and very difficult to detect in terms of assigning 'risk' to them. Worse, they flow through the population all the time, as individuals die and new ones are born. Since their individual effects depend on their context--the ever-changing environment and the rest of the genome--these effects are also fluid. Thus, enumerating causal variants may not be a very useful way to understand biological causation.
Of course, rumors of the demise of ever-higher throughput genomics may be greatly exaggerated. Funding may not actually be diminishing, or may return. Whether that will be a rebound towards good science, or a relapse of low payoff, is a matter of opinion.
Tuesday, February 28, 2012
Progress -- complex diseases are still complex
Ciliopathies are a class of disorders recognized only relatively recently. They are genetic disorders that affect the function of the primary, or non-motile, cilium, of which most mammalian cells have one. The normal function of these organelles still isn't well-understood, and they were long thought to be vestiges of the eukaryotic cell's evolutionary past, but now they are thought to be 'cellular antennae', involved in sensing a wide variety of signals -- chemical sensing, temperature sensing, and the sensing of movement, at least, and in vertebrate development. Here's a useful description of primary cilia.

Eukaryotic cilium.
A number of rare diseases have been associated with cilial dysfunction, including spina bifida, some forms of retinitis pigmentosa, some obesity, some diabetes and liver disease, some breathing disorders, and so forth. A paper in last week's Science by Lee et al. describes one ciliopathy, Joubert's syndrome, a rare genetic disorder that affects the cerebellum, and thus balance and coordination. This is of general interest because, as a commentary in the same issue points out, it elucidates just one aspect of why complex diseases can be so difficult to understand.
Lee et al. identified a gene, a TMEM (transmembrane) gene, that seemed to be responsible for Joubert's syndrome in 5 of the 10 families in their study. The disorder in the other families, who did not carry the same gene variant, seemed to be phenotypically identical, so they resequenced the area around the gene in question to look for possible causative variants nearby. Sequencing of the 'exome' has become de rigueur in recent years (that is, all the exons, or coding regions, in a genome; as this is only ~1% of the genome, it's a lot cheaper and faster than sequencing the entire genome). But, as this paper and commentary point out, restricting the search only to exons can miss important variants.
Indeed, Lee et al. found mutations in the neighboring related TMEM gene. Both genes, TMEM138 and TMEM216, encode transmembrane proteins, that is, proteins that rest across cell membranes with part sticking out into the space surrounding the cell where it can monitor aspects of the environment, and the other part remaining inside the cell. But the authors found no homologous regions in the genes or the resulting proteins, and thus nothing that explained why the disease could be the same in all families. This prompted them to look for shared sequence in the regulation of the expression of the two genes.
To test for coordinated expression, we examined tissue-expression patterns of human TMEM138 and TMEM216 using the microarray database and in situ hybridization of human embryos. We found tight coexpression values of human TMEM138 and TMEM216 across the major tissues, including the brain and kidneys, and similar expression patterns in various tissues, including the kidneys, cerebellar buds, and telencephalon, at 4 to 8 gestational weeks (gw) of human embryos. To test whether this coordinated expression was due to the adjacent localization, we compared mRNA levels in zebrafish versus mice, representing species before and after the gene rearrangement event. Using quantitative polymerase chain reaction (qPCR), we detected tightly coordinated expression levels in mice compared with those in zebrafish (correlation coefficient r = 0.984 versus 0.386), which suggests that TMEM138 and TMEM216 might share regulatory elements (REs) within the ~23-kb intergenic region. We further examined several experimental features and found that regulatory factor X 4, a transcription factor regulating ciliary genes, binds a RE conserved in the noncoding intergenic region to mediate coordinated expressions of TMEM138 and TMEM216.

Further analysis leads them to suggest that both genes are necessary for normal development of the cilium, and that this is because they are regulated by a shared intergenic region, a 'cis-regulatory module', or CRM, a binding site for transcription factors that regulate nearby genes but that is not itself part of those genes. How these modules arise or how the coordinated expression of genes evolves is not well-understood, but this CRM seems to explain the pattern Lee et al. found in the Joubert syndrome families they studied.
Aravinda Chakravarti and Ashish Kapoor say in their commentary on this paper, and Mendelian disease in general, that this work represents a maturing of the understanding of complex genetic disease. The genetics community should no longer be focused on single gene mutations, or even exomes (the protein-coding sections of 'genes'), but instead should recognize that complex diseases will require complex explanations.
Mutation analyses of single-gene defects have identified two puzzles: One is that not all individuals with a specific disorder have identifiable coding mutations; the other is that not all individuals with identical mutations, even in the same family, are equally affected, and some may be symptom-free. The first mystery has many suspected causes: The disorder may be due to another gene—even the adjacent one, as Lee et al. demonstrate—or arise from mutations in a gene's regulatory sequences, or be a phenocopy (a trait that is not of genetic origin but is environmentally induced and mimics the phenotype produced by a gene). This is a persistent challenge in studying an outbred organism like humans; just because a disorder is monogenic does not imply that it is monocausal. The second problem is more mysterious and far less understood. Phenotypic discordance, or variation in disease penetrance, between identical mutation bearers could result from differential environmental exposures (such as normal intelligence versus mental retardation in diet-treated versus untreated phenylketonuria).

The goal remains to determine causation as well as to predict disease. The pendulum keeps swinging between the search for common and rare variants with which to do this -- as this commentary says, "Studies of Mendelian disease should also move from its preoccupation with rare variants to a focus on common polymorphisms, particularly at regulatory sequences affecting either rare disorders like Hirschsprung disease or common disorders like myocardial infarction."
One reason for the current focus -- should we call it a 'fad'? -- on rare variants is that the heavily touted promise that everything in the universe would be explained by common genetic variants (and hence be attractive to pharmaceutical companies and useful for widespread risk prediction) has proven largely to be a bust. Since nobody will give up on predictive genotyping, the move has been to rare variants, which, not incidentally, will require extensive DNA sequencing, databases, and analysis -- and the grants that go with them -- to find and document. It's difficult not to wax cynical in this way.
As we have written many times, here and elsewhere, when there are many pathways to the same phenotype, including gene-by-environment interactions, and when everyone is genetically unique, explaining or predicting most cases of rare or common disease is likely to remain an unattainable goal. As a rule, causation involves a spectrum of strong and weak, common and rare, interacting effects.
Still, it is a sign of progress when major players, not just those of us working on a smaller scale or even those on the sidelines, are cautioning about the ineffectiveness of looking for answers only at single genes or coding regions, or in enormous studies.

Tuesday, February 14, 2012
Ptolemaic genetics: epicycles of lobbying
That was then...
Way back then, in the dark ol' days of science, the Greco-Roman astronomer Claudius Ptolemy (c. 90-168 AD) tried to explain the positions of the planets in terms of divinely perfect circles of orbit around God's home (the Earth). The idea that we were at the center of perfect celestial spheres was a standard 'scientific' explanation of the cosmos and our place in it.
[Image: Ibn al-Shatir's model for the appearances of Mercury, showing the multiplication of epicycles in a Ptolemaic enterprise; 14th century CE (Wikimedia Commons).]
But the cantankerous planets refused to play by the rules, and their paths deviated from perfect circles. Indeed, occasionally they seemed to move backward through the skies! Still, perfect circular orbits around Earth simply had to be true based on the fundamental belief system of the time, so astronomers invented numerous little deviations, called epicycles, to make the (we now know) elliptical orbital pegs fit the round holes of theory.
And then along came Nicolaus Copernicus (1473-1543 AD). And the cosmos was turned inside out: the earth was not the center of things after all!
Thomas Kuhn famously described in The Structure of Scientific Revolutions how the best and the brightest scientists struggle valiantly to fit pegs into holes they don't really fit in, until some bright person comes along and shows the benighted herd a better way to account for the same things. Copernicus, Galileo, Newton, Einstein, and others were the knights in shining armor who inaugurated some of the most noteworthy of these occasional 'scientific revolutions.' Darwin's evolutionary ideas are also a classic example.
The same kind of struggle is just what is happening now in genetics and evolutionary biology--indeed in many other fields in which statistical evidence runs headlong into causal complexity. Whether, when, or what knightly change will occur is anyone's guess.
And this is now
Everyone remembers the hoopla that met the sequencing of the human genome when it was announced (or rather, each time it was announced) -- we were promised that by now we would not only know why people get sick, but be able to predict what we'd get sick with in the future. None other than Francis Collins promised that this would be a silver-bullet reality by the early 21st century. Others were promising lifespans in the centuries: all of us would be Methuselahs!
So, all those illnesses would now be treatable or preventable in the first place. How? Well, the genome would allow us to identify druggable pathways, and common diseases must be due to common genetic variants (an idea that came to be known as common disease common variant, or CDCV), and if we could just identify them, we'd be in business. After all, didn't Darwin show us that everything about everything alive was due to genetic causation and natural selection? If that's the case, we should be able to find it, and our wizardry at engineering would take the ball and run with it. Big Pharma jumped on the 'druggable' genome bandwagon and people running big sequencing labs jumped on the CDCV idea, and genomewide association studies (GWAS) were born. And then the 1000 Genomes project, and all the -omics projects.... Big is better, of course! Not that these efforts weren't questioned at the time, based on what everyone should have known about evolution and population genetics, but the powers-that-be plowed ahead anyway.
Well, we're no longer in a minority of naysayers. It's widely recognized that GWAS haven't been very successful, relative to the loud promises being trumpeted only a few years ago. And even the successes they have had -- and numerous genes associated with traits have been identified, it must be said -- typically explain only a small amount of the variation in disease, or any trait, in fact. So now researchers are working on automating the prediction of disease from gene variants based on protein structure and other DNA-based clues. But the assumption--the belief system, really--is still that the answer is in the DNA, and disease prediction is still going to be possible.
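To see why even a genuine GWAS hit can explain so little, it helps to do the standard quantitative-genetics bookkeeping: an additive biallelic variant with allele frequency p and per-allele effect a (in units of the trait's standard deviation) contributes about 2p(1-p)a^2 of the trait variance. A toy calculation follows -- the frequencies and effect sizes are invented, though they are in the general range reported for common variants.

```python
# Toy calculation: fraction of trait variance explained by a single additive
# variant, using the standard 2p(1-p)a^2 formula. Frequencies and effect sizes
# below are invented illustrations, not results from any particular GWAS.

def variance_explained(p, a, total_variance=1.0):
    """Additive variance 2p(1-p)a^2 as a fraction of total phenotypic variance."""
    return 2 * p * (1 - p) * a ** 2 / total_variance

# A 'common variant' with a small per-allele effect (in phenotypic SD units)...
print(variance_explained(p=0.30, a=0.05))   # ~0.001 -> about 0.1% of the variance

# ...and even a fairly generous effect still explains only a sliver.
print(variance_explained(p=0.20, a=0.15))   # ~0.007 -> under 1%
```

Multiply results like these across dozens or even hundreds of hits and you still account for only a modest slice of the familial risk -- the much-discussed 'missing heritability' problem in a nutshell.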
A piece in the Feb 9 issue of Nature describes a number of state-of-the-art approaches to predicting the effects of DNA variants, in part based on what amino acid changes do to proteins. The idea now is that diseases are going to be found to be due to rare variants, and the challenge is to figure out what those variants do. In part, evolution will help us to do this.
"Sequencing data from an increasing number of species and larger human populations are revealing which variants can be tolerated by evolution and exist in healthy individuals."But, are we trying to explain a current disease, or predict the diseases someone will eventually get? These are different endeavors, though it may often be inconvenient to acknowledge that. Rare pediatric diseases that are due to single genetic mutations, or genetic diseases that cluster in families (and, again, usually with young onset age and rare) are easier to parse than the complex chronic diseases that most of us will eventually get. But, based on the comparison of the genomes that have already been sequenced, we now know that we all seem to differ from each other at something like 3 million bases. That is, we all have a genome that has never existed before and never will again. Assigning function to all that variation is from daunting to impossible -- not least because a lot of it might not even have a function. And the idea that we'll eventually be able to make predictions from those variants is based on questionable assumptions.
It's true in one sense that every disease we get is genetic -- everything that happens in our body is affected by genes -- but in another sense, much of what happens is a response to the environment, and so is environmentally determined -- that is, not due to genetic variation in susceptibility. Predicting a disease from genes when it's due to the combined action of genes and environment, therefore, is a very challenging problem.
Here is just one example of why: Native Americans throughout the Americas are about 65 years into a widespread epidemic of obesity, type 2 diabetes and gallbladder disease, diseases that were quite rare in these populations before World War II. There are a number of reasons to suspect that their high prevalence is due to a fairly simple genetic susceptibility. But, if gene variants (still not identified) are responsible, they must have been at high frequency for at least 10,000 years in the descendants of those who crossed the Bering Strait from Siberia -- which means that variants that are now detrimental were "tolerated by evolution and exist[ed] in healthy individuals" for a very long time.
If geneticists had wanted to predict, 70 years ago, what diseases Native Americans were susceptible to, these variants would have been completely overlooked, because they weren't yet causing disease. And indeed these 'risk' genes, whatever they may be, were benign -- until the environment changed. We're all walking around with variants that would kill us in some environment or other, and since we can't predict the environments we'll be living in even 20 years from now, never mind 50 or 100, the idea that we'll be able to predict which of our variants will be detrimental when we're old is just wrong. In fact, we're each walking around with substantial numbers of mutant or even 'dead' genes, with apparently no ill effect at all -- but who knows what the effect might be in a different environment.
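The logic of that example can be put in miniature with a toy simulation: give a hypothetical risk genotype a penetrance that depends on the environment, and watch it be invisible in one world and look like a 'disease gene' in another. All of the numbers below are made up for illustration; they are not estimates for any real population or any real gene.

```python
# Toy gene-by-environment simulation: the same genotype frequencies, but disease
# risk (penetrance) that depends on environment. All numbers are invented.
import random

random.seed(1)

P_CARRIER = 0.5                       # frequency of the hypothetical risk genotype
PENETRANCE = {
    "pre_war_diet":  {"carrier": 0.02, "noncarrier": 0.02},  # variant is neutral
    "post_war_diet": {"carrier": 0.30, "noncarrier": 0.05},  # variant now matters
}

def disease_rates(environment, n=100_000):
    """Simulate n people and return (rate in carriers, rate in non-carriers)."""
    cases = {"carrier": 0, "noncarrier": 0}
    counts = {"carrier": 0, "noncarrier": 0}
    for _ in range(n):
        g = "carrier" if random.random() < P_CARRIER else "noncarrier"
        counts[g] += 1
        if random.random() < PENETRANCE[environment][g]:
            cases[g] += 1
    return cases["carrier"] / counts["carrier"], cases["noncarrier"] / counts["noncarrier"]

for env in PENETRANCE:
    carrier_rate, noncarrier_rate = disease_rates(env)
    print(env, round(carrier_rate, 3), round(noncarrier_rate, 3))
```

A study done in the first environment would find nothing to report; the same genomes in the second environment yield a strong 'genetic' signal. The genotype didn't change -- the world did.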
But, OK, some of us do have single gene variants that make us sick now. Many of these have been identified, most readily when a family of affected individuals is examined (though knowing the gene is rarely of therapeutic use), but many more remain to be found. The current idea is that this can be done by looking for mutations in chromosome regions that are conserved among species, and figuring out which of these change amino acids (and thus the protein coded for by the gene). The idea is that unvarying regions are unvarying because natural selection has tested the variants that arose and found them wanting, thus eliminating them from the population. They must, therefore, be functionally important!
"A host of increasingly sophisticated algorithms predict whether a mutation is likely to change the function of a protein, or alter its expression. Sequencing data from an increasing number of species and larger human populations are revealing which variants can be tolerated by evolution and exist in healthy individuals. Huge research projects are assigning putative functions to sequences throughout the genome and allowing researchers to improve their hypotheses about variants. And for regions with known function, new techniques can use yeast and bacteria to assess the effects of hundreds of potential mammalian variants in a single experiment."
This is potentially useful, because for those with single gene mutations that cause disease -- 1 variant among 3 million other ways in which each person differs from everyone else -- homing in on the causative mutation is, again, difficult to impossible if you don't have a large family with similarly affected individuals in which to confirm the association of mutation and disease.
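For a sense of what that kind of filtering looks like in practice, here is a stripped-down sketch of the 'conserved position plus amino acid change' logic: score each position of a toy cross-species alignment for conservation, then flag a patient's variants that both change the residue and fall at a conserved site. The alignment, the variants, and the cutoff are all invented; real predictors of this general family use far richer models than this.

```python
# Stripped-down 'conservation plus amino-acid change' filter. The alignment,
# variants, and threshold are all invented; real predictors are far richer.
from collections import Counter

# Toy protein alignment: one row per species, columns are alignment positions.
alignment = [
    "MKTAYIAKQR",   # human (reference)
    "MKTAYIAKQR",   # mouse
    "MKTAYLAKQR",   # zebrafish
    "MKSAYIAKQR",   # frog
]

def conservation(column):
    """Fraction of species sharing the most common residue at this position."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)

scores = [conservation(col) for col in zip(*alignment)]

# Hypothetical patient variants: (0-based position, reference residue, observed residue).
patient_variants = [(2, "T", "T"), (5, "I", "N"), (8, "Q", "H")]

CONS_THRESHOLD = 0.9   # arbitrary cutoff for calling a position 'conserved'

for pos, ref, obs in patient_variants:
    changes_protein = ref != obs
    conserved = scores[pos] >= CONS_THRESHOLD
    flag = "candidate" if (changes_protein and conserved) else "deprioritized"
    print(f"pos {pos}: {ref}->{obs}  conservation={scores[pos]:.2f}  {flag}")
```

The catch, as the next paragraph argues, is that conservation is an ambiguous signal to hang a prediction on.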
Well, if we can do with or without a protein (or other functional DNA element), depending on the variation we have across the genome, then even when the element is important its variation in a given individual may not be causal: there are many examples where that is clearly true. Further, the same kind of evolutionary reasoning would say that centrally important -- and hence highly conserved -- parts of the genome probably cannot vary much without being lethal, largely to the embryo. So, from that equally sound Darwinian reasoning, we would expect disease-associated variation to turn up in minor genes with only small effects! The 'evolutionary conservation' argument cuts both ways, and it's not at all clear in which direction it cuts more sharply. It's a great idea, but in some ways the hope that searching for conservation will bail us out is just more wishful thinking to save business as usual.
To complicate things even more, not all amino acid changes cause disease, or even do much of anything. And again, sometimes they will be harmful only in a given environment. And, of course, not all diseases are caused by protein-changing mutations -- sometimes they are caused by disturbances to gene regulation.
[Image: Methuselah (Della Francesca, ca. 1550)]
In fairness, the many researchers trying to make sense of the limitless genetic variation pouring out of DNA sequencers recognize that it's complicated. But then, why are they still saying things like this, as quoted in the Nature piece: “The marriage of human genetics and functional genomics can deliver what the original plan of the human genome promised to medicine.”
What will come to the rescue? Do we need another 'scientific revolution'?
We have no idea when or if our current model of living Nature will be shown to be naive, or whether our understanding is basically OK but we just haven't cottoned on to a seriously better way to think about the problems, or indeed whether the computer and molecular scientists' hubristic love of technology will, in fact, be victorious -- if an answer comes, it could come from there. But we are certainly in the midst of a struggle to fit the square truths about genetics and evolution into the round holes of Mendelian and Darwinian orthodoxy.
Perhaps the problem to be solved is how to back away from enumerative, probabilistic, reductionistic treatments of complex, multiple causation, and to make inferences in other ways. We need to understand causation that arises from numerous small, or even ephemeral, effects, without relying on our current enumerative statistical methods of inference. In terms of the philosophy of science, doing that would require some replacement of the 400-year-old foundations of modern science -- the reductionistic, inductive methods that got science to the point where we can now see that we need something different.
The situation here is complicated, relative to the scientific revolutions of Copernicus', Newton's, Darwin's or even Einstein's time, by the large, institutionalized, bureaucratized fiscal juggernaut that science has become. This makes the rivalries for truth -- for explanations that this time will finally, really, truly solve the complexity problem -- even more frenzied, hubristic, grasping, and driven by lobbying than before. That adds to the normal amount of ego all of us in science have: the desire to be right, to have insight, and so on. Whether this will hasten the inspiration for a transforming better idea, or just force momentum along incremental paths and make real insight even harder to come by, is a matter of opinion.
Sadly, the science funding system, including the role of lobbying via the media, is so entrenched in our careers that dishonesty about what is claimed to the media, or even said in grants, is widespread and quietly acknowledged even by the most prominent people in the field: "It's what you have to say to get funded!", they say. But where does dissembling end and dishonesty begin when it comes to the design and reporting of studies (and here we're not referring to fraud, but to misleading results and overpromising the importance of the work)? The commitment to the ideology and the promises restrains freedom of thought, and certainly dampens innovative science. But it's a trap for those who have to have grants and credit to make their living in research institutions and the science media.
[Image: Zip-line over rainforest canopy, Costa Rica (Wikimedia)]
But right now, scientists are like tropical trees, struggling mightily to be the one that reaches the sunlight, putting the others in their shade. What we need is a conceptual zip-line over the canopy.