Tuesday, January 27, 2015

Somatic mutation: does it cut both ways?

I've written journal articles as well as blogposts here at MT, about the known and potential importance of somatic mutation (SoMu) as a cause of disease.  I referred to this in our post on 'precision' medicine yesterday, saying I'd write about it today.  So here goes, an attempt to show why SoMu may be an important causal phenomenon, one I called 'Cryptic causation' in a paper a few years ago in Trends in Genetics.

SoMu's are DNA changes that occur in dividing cells after the egg is fertilized.  Mutations arise every time cells divide, throughout life.  Each time a cell divides, the mutations it carries are transmitted to its daughter cells, and this continues throughout life (unless that site experiences another mutation at some point during its lifelong lineage).  The distinction between somatic mutations and germ line mutations goes back to Weismann's demonstration of the separation of the 'soma' and the 'germ line', the germ line being a developmental clade of cells leading to sperm and egg cells and the soma being all the other cells of the body.  A change from parent to offspring that reflects mutation arising in the germ line is the usual referent of the word 'mutation'.   Wherever such mutations arose in the embryogenesis of the gonads, they are treated as if they occurred right at the time of meiosis.  That isn't a real problem, but germ line mutation is fundamentally distinct from SoMu, because the latter are inherited in the somatic (body) tissue lineage in which they arose, but are not transmitted to offspring.
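To make the inheritance pattern concrete, here is a toy sketch of how a mutation's timing determines how much of the body carries it. It is only an illustration under strong assumptions: every cell divides in lockstep, no cells die, and each daughter cell picks up exactly `new_per_division` new mutations. The names `grow_lineage` and `carrier_fraction` are made up for this example.

```python
import itertools

def grow_lineage(generations, new_per_division=1):
    """Return one mutation set per cell after `generations` rounds of division.

    Assumption: synchronous divisions, no cell death, and a fixed number of
    new mutations per daughter cell (real rates vary and are stochastic).
    """
    next_id = itertools.count()      # each new mutation gets a unique label
    cells = [frozenset()]            # the zygote carries no somatic mutations
    for _ in range(generations):
        daughters = []
        for cell in cells:
            for _ in range(2):       # every cell yields two daughters
                new = {next(next_id) for _ in range(new_per_division)}
                daughters.append(cell | new)   # inherit, then add new ones
        cells = daughters
    return cells

def carrier_fraction(cells, mutation_id):
    """Fraction of cells that carry a given mutation."""
    return sum(mutation_id in cell for cell in cells) / len(cells)

cells = grow_lineage(4)
early = carrier_fraction(cells, 0)    # arose at the first division
late = carrier_fraction(cells, 29)    # arose at the last division
```

In this toy lineage a mutation from the first division ends up in half of all cells, while one from the last division sits in a single cell, which is the sense in which early SoMu's matter more at the level of the whole organism.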

Normally, we would dismiss somatic mutation as just one of those trivial details that has little to do with the nature of each organism--its traits.  At any given genome location, most of the cells have 'the' genome that was initially inherited.  If a SoMu breaks something in a single cell in some tissue, making that cell not behave properly, so what?  Mostly the cell will die or just while away its life not cooperating, its diffidence swamped out by the millions of neighboring cells, performing their proper duties, in the mutant cell's organ.  It will have no effect on the organism as a whole.

But that is not always so!  In some unfortunate cell, a combination of inherited and somatic variants may lead that individual cell to be hyperviable in the sense of not following the local tissue's restrictions on its growth and behavior.  It can then grow, differentiate, grow more, again and again. We have a name for this: it's called cancer.

Somatic changes may mean that different parts of a given organ have somewhat different genotypes. Some fraction of, say, a lung or stomach, may work more or less efficiently than others.  If the composite works basically well, it won't even be noticed (unless, for example, the somatically mutant clones cause differences, like local spots, in skin or hair pigment).  But when a change in one cell is early enough in embryogenesis, or there is some other sort of phenotype amplification, by which a single mutant cell can cause major effects at the organismal level, the SoMu is very important indeed.

It isn't just cancer that may result from somatic mutation.  Epilepsy is a possible example, where mutant neurons may mis-fire, entraining nearby otherwise-normal neurons to engage in firing, and producing a local seizure.  I suggested this possibility a few years ago in the Trends in Genetics paper, though the subject is so difficult to test that although it is a plausible way to account for the locality of seizures, the idea has been conveniently ignored.

There are theories that mitochondria, of which cells contain hundreds or thousands, may mutate relatively rapidly and function badly.  They are an important way the cell obtains energy, and the mitochondrial DNA is not in the nucleus and is not prowled by mutation-repair mechanisms the way chromosomes are.  Some have suggested that SoMu's accumulate in neurons in the brain, and since the neurons don't replicate much if at all, they can gradually become damaged.  It's been suggested that this may account for some senile dementia or other aging-related traits.

Beware, million genome project!
What has this got to do with the million genome project?  An important fact is that SoMu's are in body tissues but are not part of the constitutive (inherited) genome, as is routinely sampled from, say, a cheek swab or blood sample.  The idea underlying the massive attempts at genomewide mapping of complex traits, and the new culpably wasteful 'million genomes' project by which NIH is about to fleece the public and ensure that even fewer researchers get grants because the money's all been soaked up by DNA sequencing, Big Data induction labs, is that we'll be able to predict disease precisely, from whole genome sequence, that is, from constitutive genome sequence of hordes of people.  We discussed this yesterday, perhaps to excess. Increasing sample size, one might reason, will reduce measurement error and make estimates of causation and risk 'precise'.  That is in general a bogus self-promoting ploy, among other reasons because rare variants and measurement and sample errors or issues may not yield a cooperating signal-to-noise ratio.

So I think that the idea of wholesale, mindless genome sequencing will yield some results but far less than is promised and the main really predictable result, indeed precisely predictable result, is more waste thrown onto mega-labs, to keep them in business.

Anyway, we're pretty consistent with our skepticism, nay, cynicism about such Big Data fads as mainly grabs in tight times for funding that's too long-lasting or too big to kill, regardless of whether it's generating anything really useful.

One reason for this is that SoMu cannot be detected in the kind of whole genome sequences being ground out by the machinery of this big industry.  If you have SoMu's in vulnerable tissues, say lung or stomach or muscle, you may be at quite substantial increased risk for some nasty disease, but that will be entirely unpredictable from your constitutive genome because the mutation isn't to be found in your blood cells.  Now, thinking about that, sequencing is not so precise after all, is it?

I've tried to point these things out for many years, but except for cancer biologists the potential problem is hardly even investigated (except, in a different sort of fad, by epigeneticists looking for DNA marking that affects gene expression in body cells but that, also, cannot be detected by whole genome sequencing).

In fact, epigenetics is a similar though perhaps in some ways tougher problem.  DNA marking affects gene expression by changing it in local tissues, which reflects cellularly local environmental events and hence constitutive genomics can't evaluate it directly.  On the other hand, epigenetic marking of functional elements can easily and systematically be reversed, also enzymatically in response to specific environmental changes at the cell level.  These are somatic changes in DNA dynamics, but at least SoMu, if detected, basically doesn't get reversed within the same organism and is 'permanent' in that sense, and hence easier to interpret.

But--the mistake may go in the opposite direction!
But I've myself neglected another potentially quite serious problem.  SoMu's arise in the embryonic development of the tissues we use to get constitutive genome sequences.  The lineage leading to blood and other tissues divides from other lineages reasonably early in development.  The genome sequenced in blood is not in fact your constitutive genome!  Information found there may not be in your other tissues, and hence may not be informative about your risks for traits involving gene expression.

The push for precision based on genomewide sequencing is also misguided in a second sense, the reverse of the non-detectability of SoMu's in blood samples: what is found in 'constitutive' genomes in blood samples may actually not be found in the rest of the body and may not have been in your inherited genome!

This may not be all that easy to check.  First, comparing parent to offspring, one should see a difference, that is, non-transmitted alleles in both parties.  But since neither the parent's blood nor the offspring's blood is entirely their 'constitutive' genome, it may be difficult to know just what was inherited.  Even if most sites don't change and follow parent-offspring patterns, it doesn't take that many changes to cause disease-related traits (if it did, then why would so much funding be going to 'Mendelian', that is, single gene, usually single-mutation traits?).

One could check sequences in individuals' tissues that are not in the same embryonic fate-map segment as blood, or compare cheek cells and blood, or other things of that nature.  In my understanding at least, lineages leading to cheek cells (ectodermal origin) and blood cells (mesodermal origin) separate quite early in development.  So comparing the two (being careful only to sample white cells and epithelial cells) could reveal the extent of the problem.
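As a rough sketch of what such a comparison might look like once variants have been called in each tissue, the following partitions two call sets into shared and tissue-private variants. The `Variant` tuple, the `compare_tissues` function, and the example calls are all hypothetical, and the sketch ignores sequencing error and low-fraction mosaic variants, which would be the hard part in practice.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    """A variant call identified by position and alleles (illustrative only)."""
    chrom: str
    pos: int
    ref: str
    alt: str

def compare_tissues(blood, cheek):
    """Split two call sets into shared and tissue-private variants.

    Shared variants are candidates for being inherited (or for arising
    before the two lineages split); private ones are candidates for
    somatic mutation after the split.
    """
    blood, cheek = set(blood), set(cheek)
    return {
        "shared": blood & cheek,
        "blood_only": blood - cheek,
        "cheek_only": cheek - blood,
    }

# Made-up calls, purely for illustration.
blood_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
cheek_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr7", 42, "G", "A")]
result = compare_tissues(blood_calls, cheek_calls)
```

The size of the `blood_only` set relative to `shared` would be one crude measure of how far a blood-based sequence departs from the rest of the body.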

It might comfortingly show that little is at issue, but that should be checked.  However, of course, that would be costly and would slow down the train to get that Big Funding out of Congress and to keep the Big Labs and their sequencers in their constituencies in operation.

Still, if we are being fed promises that are more than just ploys for mega-funding in tight times,  or playing out of the belief system that inherited genome sequence is simply all there is to life, or is enough to know about, then we need to become able to look where genetic variation manifests its effects:  at the local cell level.  Even for a true-believer in DNA as everything, a blood-based sequence can only tell us so much--and that may not include the variation that exists in the person's other tissues.

Well, one might wish to defend the Infinite Genomes Project by saying that at least constitutive genome sequences from blood samples get most, or the main, signal by which genetic variation affects risk of traits like disease.  But is that even true?

First, huge genomewide mapping studies routinely, one might say notoriously relative to the genome faith, account for only a fraction, usually small fraction, of the estimated overall genetic contribution as estimated by measures like heritability.  Predictive power is quite limited (and here we're not even considering environments, which cloud the picture greatly).

But second, risk from constitutive genome sequence is, as a rule and especially for complex or late-onset traits that are so important to our health and longevity, accounting only for a fraction of overall risk.  That is, heritability is far below 100%.  So the bulk of risk is not to be found in such sequence data.  And while 'environment' is clearly of major importance, SoMu appears as environment in genomic studies, because the variants are not in constitutive sequences and not shared between parents and offspring in family studies.  This may be especially important for traits that really do seem to involve genes in the cellular mechanism, as so clearly shown by cancers.
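The bookkeeping behind that point can be written out in a simple additive sketch. This assumes, purely for illustration, that trait variance splits into independent inherited, somatic, and environmental components with no interactions; the symbol V_S for the somatic component is introduced here and is not a standard quantity.

```latex
% Assumed additive decomposition of phenotypic variance:
V_P = V_G + V_S + V_E
% Heritability as usually estimated counts only inherited variance,
% so V_S is lumped in with 'environment':
h^2 = \frac{V_G}{V_P}
% The fraction that is 'genetic' at the cell level is at least as large:
\frac{V_G + V_S}{V_P} \ge h^2
```

Under these assumptions, any nonzero V_S means the trait is more 'genetic' at the cell level than the heritability estimate alone indicates.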

Thus, it is not accurate to say that at least we even get the bulk of genetic (meaning inherited) risk accounted for by pie-in-the-sky exhaustive genome sequencing.  Yet, testing for SoMu is not even on the agenda of Big Data advocates.

How much more one would get from a serious approach to SoMu--which would require some serious innovative thinking--remains untested.  It's not on the agenda not because we know it's relatively unimportant, but because it's hard to test, and in that sense hard to use to grease the wheels of current projects for which an excuse to keep funding is what is really being sought by the Big Data advocates.  It's safer, even if we know it's got its limits and we don't really know what those limits are.

A real 'genomic' approach should include checking for the problems caused by SoMu--in both directions!


Tania Gálvez San José said...

Hello Holly Dunsworth, sorry to write here but I would like to ask you something about your hypothesis about why we are born at 9 months and "the obstetric dilemma". If it is something metabolic, why are we mothers capable of breastfeeding exclusively for 6 months? That is, we are the givers of the only food our children have for 9 months of pregnancy + 6-7 months of exclusive breastfeeding. Why is there an energy limitation inside the womb and not outside with breastfeeding? Thank you for your answer.

Daniel said...

This post is ridiculous in several ways. Firstly, it overestimates the likely impact of somatic mutation on complex disease. Secondly, it falsely claims that modern genomics researchers are ignoring the phenomenon, when in fact there are numerous research projects currently underway exploring it (published examples below). And finally, it criticizes the value of large sequencing studies, when in fact such studies are precisely what will be required to establish the precise role of somatic mutation in these diseases.

Indeed, thus far our best evidence for a role of somatic mutations in complex disease comes from *gasp* large sequencing studies:



Both papers show convincingly that somatic mutations in known cancer-associated genes are present at some level in the blood of many healthy individuals, that this burden increases with age, and that it associates modestly with health outcomes.

There is a fundamental tension at the heart of nearly every post on this blog, and this one is no exception. You argue biology is complicated - and I agree. But you then argue that large-scale science is not required to understand it - and you are wrong. Complex, multi-causal phenomena require large samples to understand, and the only way to make progress in these complex diseases is to employ exactly the kinds of large-scale studies you hate. Sorry.

Holly Dunsworth said...

This is for Tania.
There is a limit to how long you can exclusively feed your baby with breastmilk. At some point, a mother's body cannot manufacture enough milk to nourish a growing child on a diet of her milk alone. For humans, evidence of this limit would be unethical to achieve experimentally/purposefully, but our model (of which you write) shows that lactating mothers are operating at that high metabolic rate and that hypothetical sustained ceiling just as infants become pretty big, are eating (it seems like) all the time, moms are generally back to their pre-pregnancy weight (or below, like me, I'm right there with a 5 month infant right now) and that's when we start adding supplemental foods to infant diets. There is a huge literature on lactation energetics across mammals if you're interested. Therein you could find out more about limitations on mothers as milk-factories.

Ken Weiss said...

[This replaces a poorly worded response, which was deleted]

Wouldn't expect you to agree with us, Daniel. If you are discussing cancer, there isn't much if any difference from what we said in our post. Clearly other traits will be similar in regard to somatic mutation, but that is not really being looked for (and I think epilepsy may be another example). That is because you'd have to biopsy the tissues in ways, or so often, as to be impractical to say the least. But the variants aren't in constitutive genome sequences.

That hematopoietic cancer-associated variants are present in blood of healthy individuals proves one of the points in a sense, because those are somatically arisen in the tissue at risk which is the tissue sampled.

An issue we raised is whether the blood data reflect other tissues (because, for example, the blood-borne mutation is somatic in origin and not inherited). So, a BRCA1 mutation in blood that arose somatically in the hematopoietic embryogenic 'fate map', after its divergence from breast-tissue lineages, does not put the person at risk of breast cancer--or if it does, and there's a mechanism (some sort of induction or horizontal transfer), it would be worth some major prize to show that.

Cancer, being clonal, is the archetype of phenotype amplification, because a single mutant cell grows as a clade that leads to an organismal-level problem. There are other possible amplification mechanisms. One is when somatic mutations arise early in an organ's or tissue's embryonic fate-map, and can be absent in blood but present in much or all of the tissue's cells.

In that sense, as we said, blood not bearing such a variant could be misleading if a somatic mutation in earlier fate-map branches occurred in the gene. How important that is, is something we suggest be given more attention.

Big studies where they're warranted are warranted. Big mindless studies, mainly rescuing huge long-term studies that have already shown that major individual genetic risk factors aren't really there, are a low payoff for their cost. Lots of independent investigators could be funded to pursue diverse new ideas--most won't pay off, of course, because that's the nature of science, but at least the chance is greater (in my view, though I know not in yours).

If you need a gargantuan study to find something, then either it's rare, which raises its own epistemological as well as importance issues, or in some instances less costly ways may exist to find it.

Rare, high-penetrance variants can be very important, a lot more so than a bucketful of miscellaneous tiny-effect variants for largely environment-dependent late-onset traits. We have said often and in recent posts including today, that the former are worth attention even if the cost is high relative to the number of beneficiaries.

Bunches of rare, or even common, low-penetrance polygenic variants exist in hordes, but we don't need more massive genome sequencing to know that because we already do. And there are plenty of relevant epistemological issues even in interpreting those findings. Or, put another way, if we agree about complexity, and that no one yet really understands its structure (and, evolutionarily, that may not even be something unitary to expect), then before we drain resources with ever-more GWAS etc., now sequence-based rather than by mere millions of mapping markers, maybe some actually effective study designs to understand the nature of complexity, that don't just assume enumerability or require sampling all of the human species, would be called for.

Of course, again, we can't expect someone from your perspective to like what we say. Even getting a response from you may mean we have raised relevant and important issues. Naturally, we don't think our readers, such as they be, are so naive that they can't tell the difference between our uninformed fairy stories and the truth.

Serious discussion rather than pre-emptive commitments such as are being reported out of NIH, is merited.

Daniel said...

Ken, thanks for your reply.

It's worth noting that the somatic mutations in blood appear to have a link to heart disease, as well as to the risk of hematological malignancies. The former association is still being pursued but looks pretty solid.

I can tell you that various smaller-scale studies have chased a role for somatic mutations in other tissues in other complex diseases, with no clear link established yet. NIMH currently has an active call for funding proposals for exploring this association in psychiatric disease (http://grants.nih.gov/grants/guide/pa-files/PAR-14-173.html), belying your claim that no-one is taking this possibility seriously. There just isn't any solid evidence that somatic mutations in tissues outside ones with capacity for rapid clonal expansion (e.g. blood) will play any role whatsoever in non-cancer diseases.

"maybe some actually effective study designs to understand the nature of complexity, that don't just assume enumerability or require sampling all of the human species"

Have I missed the part where you outlined the experimental design you'd like to see, in which you have reasonable power to detect a biologically plausible effect size without needing deep tissue sequencing in thousands of individuals?

Somatic mutations, if they do indeed play any substantial role in non-cancer diseases, will be scattered across many sites in the genome, will be present at a low depth in any given individual, and will contribute to only a fraction of cases for any given disease. (This sentence is what I believe to be the most likely scenario; I'd be happy to entertain reasonable arguments against it.) Under these circumstances there is simply no getting around the need for sequencing of the relevant tissues in very large sample sizes to establish their role. Unless you have suggestions for "effective study designs" that somehow overcome fundamental statistical realities?

"Even getting a response from you may mean we have raised relevant and important issues."

Please don't do this. I was pretty clear how I felt about your post. My negative comments do not imply that your post contains useful information. This discussion is interesting, but if you persist in this misinterpretation I'm happy to go back to not commenting.

Ken Weiss said...

I apologize for the mistake of thinking you took anything we said seriously! At least my view can't be interpreted as protecting a vested interest, since I'm not doing this sort of science any more (but, for what it's worth I do know at least a little, having spent most of my career doing what was 'big' data science for its time).

As to somatic mutation, given that finding it is a challenge, I think it’s premature to say, as you do, that “There just isn’t any solid evidence that somatic mutations in tissues outside ones with capacity for rapid clonal expansion…will play any role whatsoever in non-cancer diseases” when hardly anyone has looked (the scattering of current work that you refer to doesn't belie the point). Epilepsy is one possibility, as I mentioned in the post (with phenotype amplification via entrainment of neurons), and one can imagine many others. Documenting it would of course be problematic.

But, I would predict just what you say, in a sense. If somatic mutations are a causal complement to inherited variation, then huge samples will be required to find them, just as huge samples were required to find regularly inherited genomic factors, as by GWAS. They would largely, in a sense, be the same sorts of changes that we see in constitutive causal variation.

On the other hand, there may be contexts in which important causal variation that would be lethal to embryos can accumulate harmlessly during life with serious organismal effects only under some circumstances such as I tried to describe in the post. It could also be the case that somatic mutations of a different sort than we see in inherited variation could have pathogenic effects when arising in the context of mature tissue.

Anyway, it would be exceedingly strange to find the kinds of complex traits that have been so heavily examined by GWAS and related approaches were suddenly simple when caused by somatic mutation, while being essentially polygenic when viewed via constitutive genomic variation. If careful studies showed such a thing, that would really be worth following up.

To say that somatic mutations will contribute to ‘only a fraction’ of cases (assuming you mean trivial fraction) is an assumption, and it is certainly wrong relative to cancer, where most mutations must be somatic, even in those inheriting a ‘cancer’ mutation. Recent work on retinoblastoma mutations and epigenetic marking, if it holds up, might show the point.

Somatic contributions to causation should be a larger challenge than trying to understand causation from inherited variation alone, since it would be comparably complex (though recombination is of a different sort and that might actually simplify things somewhat), with clades of differently behaving, differently genomic subsets. I think the studies characterizing the dynamics of tumors show this, as expected.

The degree of symmetry of causal effects in populations of people (or animals) diverging from common ancestry, and populations of cells diverging from a zygote, is an important issue and citing a scattering of studies that are looking at such things does not support arguments one way or another.

In any case, somatic mutation was just one of several apparently truly causal factors that massive constitutive genome sequencing doesn't identify (and this is besides the logistic costs and cogency of tying such sequencing to rescue decades-old projects which is what at least one story said was the plan for the Precision Medicine project).

Response to be continued below....

Ken Weiss said...

If one needs millions of genome sequences to pick up some causal signal that somehow is too weakly penetrant or rare to be seen even in hundreds of thousands in GWAS studies etc., then it will be interesting how one argues for their importance given the other problems both in public health and in genome sciences.

My answer is not a miracle new method, big or small scale. Whether I have such a method is irrelevant to the issues of what we currently know. There are focused well-posed questions one can ask, that may very well require constitutive genome sequencing, but perhaps where other things can also be included. As far as complex biomedical traits go, if public health measures greatly reduced the obvious, manifest, strong lifestyle factors, the cases that would remain would plausibly be the rarer but truly ‘genetic’ ones, and they would, like any rare trait with strong genetic causation, be worthy of and amenable to intense scrutiny.

To justify extensive somatic mutation work on 'hypothesis-free' pure induction studies even if including somatic mutation would be to square the complexity we already see with GWAS results. But omitting somatic mutation, just like omitting epigenetic and microbiomic factors, is to put limits on what can be done with constitutive sequencing. Adding them probably wouldn't even help account for 'hidden' heritability. If hyper-scale sequencing doesn't just add noise upon noise, most of it with irreplicable sets of trivial contributors, then all the evidence gathered in the last couple of decades, experimental and observational, on a broad array of plant and animal species, has been wrong.

In fact, my views do not minimize the role of genetic causation. Somatic mutation effects could mean that, at the cell level, traits are even more 'genetic' than the heritability suggests, and yet not included in the measure of heritability.

But documenting complexity and its nature would in my view be a lot more effective in model systems where the variation is controlled and can be better understood. But I don’t know if even the extensive genome-scale work on yeast, bacteria, Drosophila, Arabidopsis and others, including controlled mouse multi-way intercrosses, has found things to be very much simpler or more enumerably ‘genetic’.

Even population genetic simulation, which I am doing, can contribute a lot at trivial cost, and very fast, and without things like sequencing or measurement errors. Simulation (though not with my particular program without major change) could include somatic mutation. The results can lead to well-posed questions for which, when relevant, genome sequencing etc would be well justified.