Monday, January 19, 2015

We can see the beast....but it's been us!

The unfathomable horrors of what the 'Islamists' are doing these days can hardly be exaggerated.  It is completely legitimate, from the usual mainstream perspective at least, to denigrate the perpetrators in the clearest possible way, as simply absolute evil.  But a deeper understanding raises sobering questions.

It's 'us' pointing at 'them' at the moment, and some aspects of what's going on reflect religious beliefs: Islam vs Christianity, Judaism, or the secular western 'faith'.  If we could really believe that we were fundamentally better than they are, we could feel justified in denigrating their wholly misguided beliefs, and try to persuade them to come over to our True beliefs about morally, or even theologically, acceptable behavior.

Unfortunately, the truth is not so simple.  Nor is it about what 'God' wants.  The scientific atheists (the Marxists) slaughtered their dissenters or sent them to freeze in labor camps by the multiple millions.  It was the nominally Christian (and even Socialist) Nazis who gassed their targets by the millions.  And guess who's bombing schools in Palestine these days?

Can we in the US feel superior?  Well, we have the highest per capita jailed population, and what about slavery and structural racism?  Well, what about the Asians?  Let's see, the rape of Nanking, Mao's Cultural Revolution, the rapacious Huns.....

Charlie Hebdo is just a current example that draws sympathy, enrages, and makes one wonder about humans.  Haven't we learned?  I'd turn it around and ask: has anything even really changed?

Christians have made each other victims, of course.  Read John Foxe's Book of Martyrs from England in the 1500's (or read about the more well-known Inquisition).  But humans are equal opportunity slaughterers.  Think of the crusades and back-and-forth Islamic-Christian marauding episodes.  Or the Church's early systematic 'caretaking' of the Native Americans almost from the day Columbus first got his sneakers wet in the New World, not to mention its finding justification for slavery (an idea going back to those wonderful classic Greeks, and of course previously in history).  Well, you know the story.

Depiction of Spanish atrocities committed in the conquest of Cuba in Bartolomé de Las Casas's "Brevisima relación de la destrucción de las Indias", 1552.   The rendering was by the Flemish Protestant artist Theodor de Bry. Public Domain. 

But this post was triggered not just by the smoking headlines of the day, but because I was reading about that often idealized gentle, meditative Marcus Aurelius, the Roman Emperor in the second century AD.  In one instance, some--guess who?--Christians had been captured by the Romans and were being tortured: if they didn't renounce their faith, they were beheaded (sound familiar?) or fed to the animals in a colosseum.  And this was unrelated to the routine slavery of the time. Hmmm...I'd have to think about whether anyone could conceive of a reason that, say, lynching was better than beheading.

It is disheartening, even in our rightful outrage at the daily news from the black-flag front, to see that contemporary horrors are not just awful, they're not even new!  And, indeed, part of our own Western heritage.

Is there any science here?  If not, why not?
We try to run an interesting, variable blog, mainly about science and also its role in society.  So the horrors on the Daily Blat are not as irrelevant as they might seem:  If we give so much credence, and resources, to science, supposedly to make life better, less stressful, healthier and longer, why haven't we moved off the dime in so many of these fundamental areas that one could call simple decency--areas that don't even need much scientific investment to document?

Physics, chemistry and math are the queens of science.  Biology may be catching up, but today that seems to be mainly to the extent that we are applying molecular reductionism (everything in terms of DNA, etc.).  That may be physics worship or it may be good; time will tell, but of course applied biology can claim many major successes.  The reductionism of these fields gives them a kind of objective, or formalistic, rigor.  Controlled samples and studies, with powerful and even precise instrumentation, make it possible to measure and evaluate data, and to form credible, testable theory about the material world.

But a lot of important things in life seem so indirect, relative to molecules, that one would think there could also be, at least in principle,  comparably effective social and behavioral sciences that did more than lust after expensive, flashy reductionist equipment (DNA sequencing, fMRI imaging, super-computing, etc.) and the like.  Imaging and other technologies certainly have made much of the physical sciences possible by enabling us to 'see' things our organic powers, our eyes, nose, ears, etc.,  could not detect.  But the social sciences?  How effective or relevant is that lust to the problems being addressed?

The cycling and recycling of social science problems seems striking.  We have plentiful explanations for things behavioral and cultural, and many of them sound so plausible.  We have formal theories structured as if they were like physics and chemistry: Marxism and related purportedly materialist theories of economics, cultural evolution, and behavior, and 'theories' of education, which are legion even as the actual results have been sliding for decades.  We have libraries-full of less quantitatively or testably rigorous, more word-waving 'theories' by psychologists, anthropologists, sociologists, economists and the like.  But the flow of history and, one might say, its repeated disasters, shows, to me, that we as yet don't in fact have anything very rigorous, despite a legacy going back to Plato and the Greek philosophers.

We spend a lot of money on the behavioral and social sciences with 'success' ranging from very good for very focal types of traits, to none at all when it comes to what are the major sociocultural phenomena like war, equity, and many others.  We have journal after journal, shelves full of books of social 'theory', including some (going back at least to Herbert Spencer) that purport to tie physical theory to biology to society, and Marx and Darwin are often invoked, along with ideas like the second law of thermodynamics and so on.  Marx wanted a social theory as rigorous as physics, and materialist, too, but in which there would be an inevitable, equitable end to the process.  Spencer had an end in mind, too, but one with a stable inequality of elites and the rest.  Not exactly compatible!

And this doesn't include social theories derived from this or that world religion.  Likewise, of course, we go through psychological and economic theories as fast as our cats go through kibbles, and we've got rather little to show for it that could seriously claim respect as science in the sense of real understanding of the phenomena.  When everyone needs a therapist, and therapists are life-long commitments, something's missing.





Karl Marx and Herbert Spencer, condemned to face each other for eternity at Highgate Cemetery in London (photos: A Buchanan)

Either that, or these higher levels of organized traits simply don't follow 'laws' the way physical phenomena do.  But that seems implausible since we're made of physical stuff, and such a view would take us back to the age-old mind-matter duality, and the endless debate about free will, consciousness, soul, and all the rest back through the ages.  And while this itemization is limited to western culture, there isn't anything more clearly 'true' in the modern East, nor in the cultures elsewhere or before ours.

Those with vested interests in their fMRI machines, super-computer modeling, or therapy practices will likely howl 'Foul!'  It's hard not to believe that in the past there was a far smaller percentage of people with various behavioral problems needing chemical suppression or endless 'therapy' than there is today.  But if there was, and things are indeed changing for the worse, this further makes the point.  Why aren't mental health problems declining, after so much research?

You can defend the social sciences if you want, but in my personal view their System is, like the biomedical one, a large vested interest that keeps students off the street for a few years, provides comfy lives for professors, fodder for the news media and lots of jobs in the therapy and self-help industries (including think-tanks for economics and politics).....but has not turned daily life, even in the more privileged societies, into Nirvana.

One can say that those interests just like things to stay the way they are, or argue that while their particular perspective can't predict every specific any more than a physicist can predict every molecule's position, generic, say, Darwinian competition-is-everything views are simply true. Such assertions--axioms, really--are then just accepted and treated as if they're 'explanations'. If you take such a view, then we actually do understand everything!  But even if these axioms--Darwinian competition, e.g.--were true, they have become such platitudes that they haven't proven themselves in any serious sense, because if they had we would not have multiple competing views on the same subjects.  Despite debates on the margins, there is, after all, only one real chemistry, or physics, even if there are unsolved aspects of those fields.

The more serious point is this:  we have institutionalized research in the 'soft' as well as 'hard' sciences.  But a cold look at much of what we spend funding on, year after year without demanding actual major results, would suggest that the lack of real results is itself perhaps the more real, or at least more societally important, problem these fields should be addressing--and with the threat of less or no future funding if something profoundly better doesn't result.  In a sense, engineering works in the physical sciences because we can build bridges without knowing all the factors involved in precise detail.  But social engineering doesn't work that way.

After all, if we are going to spend lots of money on minorities (like professors, for example), we would do better to take an engineering approach to problems like 'orphan' (rare) diseases, which are focused and in a sense molecular, and where actual results could be hoped for.  The point would be to shift funds from wasteful, stodgy areas that aren't going very far.  Even if working on topics like orphan diseases is costly, there are no paths to the required knowledge other than research with documentable results.  Shifting funding in that direction would temporarily upset various interests, but would instead provide employment dollars to areas and people who could make a real difference, and hence would not undermine the economy overall.

At the same time, what would it take for there to be a better kind of social science, the product of which would make a difference to human society, so we no longer had to read about murders and beheadings?

Thursday, January 15, 2015

When the cat brings home a mouse

To our daughter's distress, she needs to find a new home for her beloved cats, so overnight we've gone from no cats to three cats, while we try to find them someplace new.  I haven't lived with cats since I was a kid really, because I was always allergic.  When I visited my daughter, I'd get hives if Max, her old black cat, sadly now gone, rubbed against my legs, and I always at least sneezed even when untouched by felines.  But now with three cats in the house, I'm allergy-free and Ken, never allergic to cats before, is starting to sneeze -- loudly.


Old Max

Casey


Oliver upside-down


But the mystery of the immune system is just one of the mysteries we're confronting -- or that's confronting us -- this week.  Here's another.  The other day my daughter brought over a large bag of dry cat food.  I put it in a closet, but the cats could smell it, and it drove them nuts, so I moved it into the garage.  A few days later I noticed that the cats were all making it clear that they really, really wanted to go into the garage, but we were discouraging that given the dangers of spending time in a location with vehicles that come and go unpredictably. I just assumed they could smell the kibbles, or were bored and wanted to explore new horizons.

But two nights ago I went out to the garage myself to get pellets for our pellet stove, and Mu managed to squeeze out ahead of me.  He made a mad dash for the kibbles.  Oliver was desperate to follow, but I squeezed out past him and quickly closed the door.  At which point, Mu came prancing back, squeaking.  Oh wait, he wasn't squeaking, it was the mouse he was carrying in his mouth that was squeaking!  He was now just as eager to get back in the house as he'd been to get out.  After a few minutes he realized that wasn't going to happen, so he dropped the now defunct mouse, and I let him back in.

Mu, the Hunter
So, that 'tear' in the kibbles bag that I'd noticed a few days before?  Clearly made by a gnawing mouse (mice?).  And the cats obviously had known about this long before I did.  But how did Mu know exactly where to make a beeline to catch the mouse?  He'd never seen where I put the bag, nor the mouse nibbling at it!  And I have to assume the other cats would have been equally able hunters had they been given the chance.

Amazing.  A whole undercurrent of sensory awareness and activity going on right at our feet, and we hadn't clued in on any of it.  I'd made unwarranted assumptions about holes in the bag, but the cats knew better.  Yes, I could have looked more closely at the kibble that had spilled out of the bag and noticed the mouse droppings.  But I didn't, because, well, because it didn't occur to me.

Though, now that I'm clued in, I believe we've got another mouse...


Mu and Ollie at the door to the garage yesterday afternoon


And?
I might even have been able to detect the mouse without seeing any of the evidence, just like the cats, if I'd tuned in more attentively, but I'm pretty sure it would have required better hearing.  In any case, other bits of evidence more suited to my perceptive powers were available, but I didn't notice.  I take this as yet another cautionary tale about how we know what we know, and I will claim it applies as well to politics, economics, psychology, forensics, religion, science, and more.  We build our case on preconceived notions, beliefs, assumptions, what we think is true, rarely re-evaluating those beliefs -- unless we're forced to, when, say, Helicobacter pylori is found to cause stomach ulcers, or our college roommate challenges our belief in God, or economic austerity does more harm than good.

As Holly often says, scientists shouldn't fall in love with their hypothesis.  Hypotheses are made to be tested; stretched, pounded, dropped on the floor and kicked, and afterwards, and continually, examined from every possible angle, not defended to the death.  But we often get too attached, and don't notice when the cat brings home a mouse.

An illustrative blog post in The Guardian by Alberto Nardelli and George Arnett last October tells a similar tale (h/t Amos Zeeberg on Twitter).  "Today’s key fact: you are probably wrong about almost everything."  Based on a survey by Ipsos Mori, Nardelli and Arnett report disconnects between what people around the world believe is true about the demographics of their country, and what's actually true.

So, people in the US overestimate the percentage of Muslims in the country, thinking it's 15% when it's actually 1%.  Japanese think the percentage of Muslims is 4% when it's actually 0.4%, and the French think it's 31% while it's actually 8%.

In the US, we think immigrants make up 32% of the population, but in fact they are 13%.  And so on.  We think we know, but very often we're wrong.  We're uninformed, ill-informed, or under-informed, even while we think we're perfectly well informed.

Source: The Guardian

The Guardian piece oozes political overtones, sure.  But I think it is still a good example of how we go about our days, thinking we're making informed decisions, based on facts, but it's not always so.  A minority of Americans accept evolution, despite the evidence; you made up your mind about whether Adnan is guilty or innocent if you listened to Serial, even though you weren't a witness to the murder, and the evidence is largely circumstantial.  And so on.  And this all has consequences.

In a sense, even if we are right about what we think, or its consequences, based on what we know, it's hard to know if we are missing relevant points because we simply don't have the data, or haven't thought to evaluate it correctly, as happened to me with Mu and the mouse.  We have little choice but to act on what we know, but we do have a choice about how much confidence, or hubris, we attribute to what we know, and whether to consider that what we know may not be all there is to know.

This is sobering when it comes to science, because the evidence for a novel or alternative interpretation might be there to be seen in our data, but our brains aren't making the connections, because we're not primed to or because we're unaware of aspects of the data.  We think we know what we're seeing, and it's hard to draw different conclusions.

Fortunately, occasionally an Einstein or a Darwin or some other grand synthesizer comes along and looks at the evidence in a different way, and pushes us forward.  Until then, it's science as usual; incremental gains based on accepted wisdom.  Indeed, even when such a great synthesizer provides us with dramatically better explanations of things, there is a tendency to assume that now, finally, we know what's up, and to place too much stock in the new theory......repeating the same cycle again.

Tuesday, January 13, 2015

The Genome Institute and its role

The NIH-based National Human Genome Research Institute (NHGRI) has for a long time been funding the Big Data kinds of science that are growing like mushrooms on the funding landscape.  Even if overall funding is constrained, and even if this also applies to the NHGRI (I don't happen to know), the sequestration of funds in too-big-to-stop projects is clear.  Even Francis Collins and some NIH efforts to reinvigorate individual-investigator R01 awards don't really seem to have stopped the grab for Big Data funds.

That's quite natural.  If your career, status, or lab depends on how much money you bring into your institution, or how many papers you publish, or how many post-docs you have in your stable, or your salary and space depend on that, you will have to respond in ways that generate those score-counting coups.  You'll naturally exaggerate the importance of your findings, run quickly to the public news media, and do whatever other manipulations you can to further your career.  If you have a big lab and the prestige and local or even broader influence that goes with that, you won't give that up easily so that others, your juniors or even competitors can have smaller projects instead.  In our culture, who could blame you?

But some bloggers, Tweeters, and Commenters have been asking if there is a solution to this kind of fund sequestration, largely reserved (even if informally) for the big usually private universities.  The arguments have ranged from asking if the NHGRI should be shut down (e.g., here) to just groping for suggestions.  Since many of these questions have been addressed to me, I thought I would chime in briefly.

First, a bit of history or perspective, as informally seen over the years from my own perspective (that is, not documented or intended to be precise, but a broad view as I saw things):
The NHGRI was located administratively where it was for reasons I don't know.  Several federal institutes were supporting scientific research.  NIH was about health, and health 'sells', and understandably a lot of funding is committed to health research.  It was natural to think that genome sequences and sciences would have major health implications, if the theory that genes are the fundamental causal elements of life was in fact true.  Initially James Watson, co-discoverer of DNA's structure, and perhaps others advocated the effort.  He was succeeded by Francis Collins, who is a physician and a clever politician.
However, there was competition for the genome ‘territory’, at least with the Atomic Energy Commission.  I don’t know if NSF was ever in the ‘race’ to fund genomic research, but one driving force at the time was the fear of mutations that atomic radiation (therapeutic, from wars, diagnostic tests, and weapons fallout) generated.  There was also a race with the private sector, notably Celera as a commercial competitor that would privatize the genome sequence.  Dr Collins prominently, successfully, and fortunately defended the idea of open and free public access.  The effort was seen as important for many reasons, including commercial ones, and there were international claimants in Japan, the UK, and perhaps elsewhere, that wanted to be in on the act.  So the politics were rife as well as the science, understandably.
It is possible that only with the health-related promises was enough funding going to be available, although nuclear fears about mutations and the Cold War probably contributed, along with the usual less savory self-interest, to AEC's interests.
Once a basic human genome sequence was available, there was no slowing the train.  Technology, including public and private innovation, promised much quicker sequencing in the future, and it quickly became available even to ordinary labs (like mine, at the time!).  And once the Genome Institute (and other places such as the Sanger Centre in Britain and centers in Japan, China, and elsewhere) were established, they weren't going to close down!  So other sequences entered the picture--microbes, other species, and so on.
It became a fad and an internecine competition within NIH.  I know from personal experiences at the time that program managers felt the need to do 'genomics' so they would be in on the act and keep their budgets.  They had to contribute funds, in some way I don't recall, to the NHGRI's projects or in other ways keep their portfolios by having genomics as part of this.  -Omics sprang up like weeds, and new fields such as nutrigenomics, cancer genomics, microbiomics and many more began to pull in funding, and institutes (and the investigators across the country) hopped aboard.  Imitation, especially when funds and current fashion are involved, is not at all a surprise, and efficiency or relative payoff in results took the inevitable back seat: promises rather than deliveries naturally triumphed.
In many ways this has led to the current era of exhaustively enumerative Big Data: a return to 17th century induction.  This has to do not just with competition for resources, but with a changed belief system also spurred by computing power: just sample everything and pattern will emerge!
Over the decades the biomedical (and to some lesser extent biological) university establishment grew on the back of the external funding which was so generous for so long.  But it has led to a dependency.  Along with exponential growth in the number of competitors, hierarchies of elite research groups developed--another natural human tendency.  We all know the career limitations that are resulting from this.  And competition has meant that deans and chairs expect investigators always to be funded, in part because there aren't internal funds to keep labs running in the absence of grants. It's been a vicious self-reinforcing circle over the past 50 years.
As hierarchies built, private donors were convinced (conned?) into believing that their largesse would lead to the elimination of target diseases ('target' often meaning those in the rich donors' families). Big Data today is the grandchild of the major projects, like the Manhattan Project in WWII, that showed that some kinds of science could be done on a large scale.  Many, many projects during past decades showed something else: Fund a big project, and you can't pull the plug on it!  It becomes too entrenched politically.  
The precedents were not lost on investigators!  Plead for bigger, longer studies, with very large investments, and you have a safe bet for decades, perhaps your whole career.  Once started, cost-benefit analysis has a hard time paring back, much less stopping, such projects.  There are many examples, and I won't single any of them out.  But after some early splash, by and large they have reached diminishing returns without any real sense of termination: too big to kill.
This is to some extent the same story with the NHGRI.  The NIH has become too enamored of Big Data to keep the NHGRI as limited or focused as perhaps it should have been (or should be).  In a sense it became an openly anti-focused-research sugar daddy (Dr Collins said, perhaps officially, that NHGRI didn't fund 'hypothesis-based research') based on pure inductionism and reductionism, so it did not have to have well-posed questions.  It basically bragged about not being focused.
This could be a change in the nature of science, driven by technology, that is obsolescing the nature of science that was set in motion in the Enlightenment era, by the likes of Galileo, Newton, Bacon, Descartes and others.  We'll see.  But the socioeconomic, political sides of things are part of the process, and that may not be a good thing.
Will focused, hypothesis-based research make a comeback?  Not if Big Data yields great results, but decades of it, no matter how fancy, have not shown the major payoff that has been promised.  Indeed, historians of science often write that the rationale, that if you collect enough data its patterns (that is, a theory) will emerge, has rarely been realized.  Selective retrospective examples don't carry the weight often given them.

There is also our cultural love affair with science.  We know very clearly that many things we might do at very low cost would yield health benefits far exceeding even the rosy promises of the genomic lobby.  Most are lifestyle changes.  For example, even geneticists would (privately, at least) acknowledge that if every 'diabetes' gene variant were fixed, only a small fraction of diabetes cases would be eliminated. The recent claim that much of cancer is due just to bad mutational luck has raised lots of objections--in large part because Big Data researchers' business would be curtailed. Everyone knows these things.


What would it take to kill the Big Data era, given the huge array of commercial, technological, and professional commitments we have built, if it doesn't actually pay off on its promises?  Is focused science a nostalgic illusion? No matter what, we have a major vested interest on a huge scale in the NHGRI and other similar institutes elsewhere, and grantees in medical schools are a privileged, very well-heeled lot, regardless of whether their research is yielding what it promises.


Or, put another way, where are the areas in which Big Data of the genomic sort might actually pay, and where is this just funding-related institutional and cultural momentum?  How would we decide?


So what to do?  It won't happen, but in my view the NHGRI does not, and never did, belong properly in NIH.  It should have been in NSF, where basic science is done.  Only when clearly relevant to disease should genomics be funded for that purpose (and by NIH, not NSF).  It should be focused on soluble problems in that context.
NIH funds the greedy maw of medical schools.  The faculty don't work for the university, but for NIH.  Their idea of 'teaching' often means giving 5-10 lectures a year that mainly consist of self-promoting reports about their labs, perhaps the talks they've just given at some meeting somewhere.  Salaries are much higher than at non-medical universities--but in my view grants simply should not pay faculty salaries.  Universities should.  If research is part of your job's requirements, it's their job to pay you.  Grants should cover research staff, supplies and so on.
Much of this could happen (in principle) if the NHGRI were transferred to NSF and had to fund on an NSF-level budget policy.  Smaller amounts, to more people, on focused basic research.  The same total budget would go a lot farther, and if it were restricted to non-medical school investigators there would be the additional payoff that most of them actually teach, so that they disseminate the knowledge to large numbers of students who can then go out into the private sector and apply what they've learned.  That's an old-fashioned, perhaps nostalgic(?) view of what being a 'professor' should mean.
Major pare-backs of grant size and duration could be quite salubrious for science, making it more focused and in that sense accountable.  The employment problem for scientists could also be ameliorated.  Of course, in a transition phase, universities would have to learn how to actually pay their employees.
Of course, it won't happen, even if it would work, because it's so against the current power structure of science.  And although Dr Collins has threatened to fund more small R01 grants it isn't clear how or whether that will really happen.  That's because there doesn't seem to be any real will to change among enough people with the leverage to make it happen, and the newcomers who would benefit are, like all such grass-roots elements, not unified enough.
These are just some thoughts, or assertions, or day-dreams about the evolution of science in the developed world over the last 50 years or so.  Clearly there is widespread discontent, and clearly there is large funding going on with proportionately few results.  Major results in biomedical areas can't be expected overnight.  But we might expect that research had more accountability.

Thursday, January 8, 2015

Genomewide mapping and a correlation fallacy

When there isn't an adequate formal theory for determining cause and effect, we often must rely on searches for statistical associations between variables that we, for whatever reason, think might cause an outcome and the occurrence of the outcome itself.  One criterion is that the putative cause must arise before its effect, that is, the outcome of interest.  That time-order is sometimes not clear in the kinds of data we collect, but we would normally say we're lucky in genetics because a person is 'exposed' to his or her genotype from the moment of conception.  Everything that might cause an outcome, say a disease, comes after that.  So gene mapping searches for correlations between inherited genomic variation and variation in the outcome.  But the story is not as crystal clear as is typically presented.

In genomewide mapping studies, like case-control GWAS (or QTL mapping for quantitative traits), we divide the data into categories, based on say two variants (SNP alleles), A and B, at some genome position X. Then, if the outcome--say some disease under investigation--is more common among A-carriers than among B-carriers at some chosen statistical significance level, it is common to infer or even to assert that the A-allele is a causal factor for the disease (or, less often put this way, that B is causally protective).  The usual story is that the difference is far from categorical, that is, the A-bearing group is simply at a higher probabilistic risk of manifesting the trait.
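
As a minimal sketch of what such a test amounts to (all counts below are invented for illustration), one can cross-tabulate allele carriers against case status and compute an odds ratio and a chi-square p-value:

```python
# Hypothetical single-SNP case-control test; the counts are invented.
import numpy as np
from scipy.stats import chi2_contingency

#                  A-carriers  non-A-carriers
table = np.array([[620,        880],     # cases
                  [540,        960]])    # controls

chi2, p, dof, expected = chi2_contingency(table, correction=False)

# Odds ratio for carrying A in cases versus controls
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

print(f"odds ratio = {odds_ratio:.2f}, chi-square p = {p:.2g}")
# A 'significant' p-value here is what typically gets reported as "A is
# associated with the disease" -- which is not the same as showing that
# A causes the disease in any given carrier.
```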

However, the usually unstated inference is that the presence of SNP A has some direct, even if only probabilistic, effect in causing the outcome.  The gene may, for example, be involved in some signaling pathway related to the disease, so that variation in A affects the way the pathway, as a whole, protects or fails to protect the person.

Strategies to protect against statistical artifact
We know that many, and typically even most, people with the disease do not carry the A allele, because the relative risks associated with the A allele are usually quite modest.  So the correlation might be false, because, as we know and too often overlook, correlation does not in itself imply causation.  One way to test for potential artifact is to compare AA, AB, and BB genotypes to see if the 'dose' (number of copies) of A is correlated with some aspect of the disease.  Usually there isn't enough data to resolve such differences with much convincing statistical rigor.
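
As a rough sketch of such a dose test (genotype counts invented; a logistic regression of case status on allele count stands in here for a formal trend test, assuming statsmodels is available):

```python
# Sketch of an allele-dose (trend) test: regress case status on the number
# of copies of A (0, 1, or 2). All genotype counts are invented.
import numpy as np
import statsmodels.api as sm

# genotype:                       BB   AB   AA
cases_by_genotype    = np.array([400, 450, 150])
controls_by_genotype = np.array([500, 420, 110])

totals = cases_by_genotype + controls_by_genotype
dose   = np.repeat([0, 1, 2], totals).astype(float)      # copies of A per person
status = np.concatenate([np.repeat([1, 0], [c, k])        # 1 = case, 0 = control
                         for c, k in zip(cases_by_genotype, controls_by_genotype)])

fit = sm.Logit(status, sm.add_constant(dose)).fit(disp=0)
print(f"log-odds per extra A copy = {fit.params[1]:.3f}, p = {fit.pvalues[1]:.3g}")
# A roughly linear dose effect is at least consistent with A itself doing
# something; usually, though, samples are too small for this to be convincing.
```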

Another approach to protect against false associations is to extend the study and see if the same association is retained.  But this is often very costly because of sample size demands or, if the study is in a small population, perhaps impossible.  Likewise, people exit and enter study groups, changing the mix of variation and risking obscuring signal.  One way to try to show systematic effects is to do a meta-analysis, by pooling studies.  If the overall correlation is still there, even if individual studies have come to different risk estimates, one may have more confidence.  To my understanding this is usually not done by regressing allele frequency against risk, which seems like something that should be done; but there is heterogeneity in method, genotyping accuracy, and size among studies, so this is likely to be problematic.
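
A minimal sketch of the fixed-effect, inverse-variance pooling that underlies a basic meta-analysis, with the per-study estimates invented for illustration:

```python
# Minimal inverse-variance (fixed-effect) pooling of per-study log odds
# ratios; the study estimates and standard errors are invented.
import numpy as np

log_or = np.array([0.12, 0.05, 0.20, -0.02, 0.09])   # per-study ln(OR)
se     = np.array([0.06, 0.08, 0.10,  0.07, 0.05])   # per-study standard errors

w         = 1.0 / se**2                     # inverse-variance weights
pooled    = np.sum(w * log_or) / np.sum(w)  # pooled ln(OR)
pooled_se = np.sqrt(1.0 / np.sum(w))

print(f"pooled OR = {np.exp(pooled):.3f} "
      f"(95% CI {np.exp(pooled - 1.96*pooled_se):.3f}"
      f" to {np.exp(pooled + 1.96*pooled_se):.3f})")
# Heterogeneity among studies (different methods, genotyping accuracy,
# sample sizes) is exactly what this simple pooling glosses over.
```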

An issue that seems often, if not usually, to have been overlooked
An upshot of this typical kind of finding of very weak effect sizes is that the disorder is the result of any of a variety of genomic backgrounds (genomewide genotypes) as well as lifestyle exposures that aren't being measured.  The background differences may complement the A's effects in varying ways so that the net effect is real, but on average weak.  That's why non-A carriers have almost the same level of risk (again, that is, the net effect size of A is small).

But the problem arises when the excess risk in the A-carriers is assumed to be due to that allele.  In fact, and indeed very likely, even in many affected A-carriers the disease may have arisen because of risky variables in networks other than the one involving the 'A/B' gene.  That is why nearly as high a proportion of non-A-carriers are affected.  Because of independent assortment and recombination among genes and chromosomes, the same distribution of backgrounds will be found in the A-bearing cases (though none of the genome-types will be the same even between any two individuals).  In those A-individuals their outcome may be due entirely to variants other than the A allele itself, for example, because of variants in genes in other networks.  That is, some, many, or even most A-bearing cases may not, in fact, be affected because of the A allele.

This seems to me very likely to be a common phenomenon, given what we know about the complex genotypic variation at the thousands of potentially relevant sites typed in genomewide analyses like GWAS in any sample.  One well-known issue, which GWAS methods can and often do correct for, is that some factors related to population structure (origins and marriage patterns among the sampled individuals) can induce false correlations.  But even after that correction, given the true underlying causal complexity, it is likely that for some SNP sites it is only chance distributions of different complex genotypes between the A and non-A SNP genotypes that suffice to generate the weak statistical effects, when so many sites in the genome are tested.
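
A toy illustration of the multiple-testing side of this: if a million SNPs with no effect at all are tested, weak 'significant' signals appear by chance alone at any ordinary threshold (the simulation below is purely hypothetical):

```python
# How many purely chance 'hits' emerge when very many null SNPs are tested:
# simulated p-values for SNPs with no true effect at all.
import numpy as np

rng = np.random.default_rng(4)
n_snps = 1_000_000
p_values = rng.uniform(size=n_snps)        # under the null, p-values are uniform

print("nominal p < 1e-4:   ", (p_values < 1e-4).sum())    # ~100 spurious hits
print("genome-wide p < 5e-8:", (p_values < 5e-8).sum())   # usually 0
# Stringent genome-wide thresholds exist for exactly this reason; but when the
# true effects are this weak, faint real signals and faint chance signals are
# hard to tell apart.
```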

Suppose the A allele's estimated effects raise the risk of the disease to 5% in the A-carriers, and let's assume for the moment the convenient fiction that there is no error in this estimate.  It may be the case that 5% of the A-carriers are doomed to the disease, the allele acting in them with essentially deterministic strength, whereas the other 95% of A-carriers are risk-free.  Or it could be that every A-carrier's risk is elevated by some fraction so the average is 5%.  Given that almost as many cases are usually seen in non-A carriers, such uniformity seems unlikely to be true.  Almost certainly, at least in principle, the A-carriers have a distribution of risk, for whatever background genomic, environmental, or stochastic reasons, whose average is 5%.  These alternative interpretations are very difficult to test, and when does anyone actually bother?
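
A tiny simulation (with made-up numbers) shows why these interpretations can't be told apart from the observed counts alone: an all-or-nothing 5% and a uniform 5% produce essentially the same number of cases among carriers:

```python
# Two hypothetical models of "5% risk among A-carriers" that produce the same
# expected case count, and so cannot be distinguished from counts alone.
import numpy as np

rng = np.random.default_rng(1)
n_carriers = 100_000

# Model 1: 5% of carriers are essentially doomed; the rest are risk-free.
doomed = rng.random(n_carriers) < 0.05
cases_model1 = doomed.sum()

# Model 2: every carrier has a uniform 5% risk.
cases_model2 = (rng.random(n_carriers) < 0.05).sum()

print(cases_model1, cases_model2)   # both close to 5,000
# Telling the models apart would need information beyond the aggregate count:
# family data, repeat events, or mechanistic evidence.
```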

The problem relates to intervention strategies
For many years, we have known from all sorts of mapping studies that most identified sites have very weak effects.  We know that in many cases environmental factors are vastly more important (because, for example, the disease prevalence has changed dramatically in the last few decades).  But the justification (rationale, or excuse) for continuing the huge Big Data approach is that it will at least identify 'druggable target' genes so the culpable pathway can be intervened in.  Hoorah! for Big Pharma--even if the gene itself isn't that important, a network has many intervention points.

However, to the extent that the potential correlation fallacy discussed here is at play, targeting genetically based therapy at the A allele may fail not because the targeting doesn't work but because most A-carriers are affected mainly or exclusively for other reasons.  If the inferential fallacy is not addressed, think of how long and how costly it would be to end up doing better.

The correlation fallacy discussed here doesn't even assume that the A-allele is not a true risk factor, which as we noted above may often be the case if the results in Big Data studies are largely statistical artifacts.  The issue is that the A effect is unlikely to be very strong (if it were, it would be easier to see, or would show up in family studies, as some rare alleles in fact do), and that most individuals in both the A and non-A categories are affected for completely unrelated reasons.  Again, what we know about recombination, somatic mutation, independent assortment, and the complexity of gene and gene-environment interactions suggests that this simply must often be true.  The correlation fallacy may pervasively lurk behind widely proclaimed discoveries from genome mapping.

Wednesday, January 7, 2015

The complex evolution of personality

So, apparently even sea anemones have personalities.  The idea that non-human animals can be measurably, say, bolder or shyer than others of their species may or may not be a surprising idea to you, perhaps depending on how many cats, dogs, horses, laboratory mice you have known.  But, scientists who study animal behavior are currently focusing on animal personality in a big way.  The BBC Radio 4 program Discovery discussed this the other day, and to us, the discussion raised some unintended points.

Hermit crab; Wikipedia

Presenter Adam Hart interviewed behavioral scientists studying personality in animals as diverse as songbirds and sea anemones. All agreed that variation is the norm.  Daniel Nettle, at the University of Newcastle, described five different dimensions to human personality: extroversion, neuroticism, agreeableness, conscientiousness, openness to experience.  They aren't all found in non-human animals, he said, though some seem to be, and all seem to be present in chimps.

Personality variation in the great tit, a small songbird, has been studied by many people (e.g., here and here).  For example, Samantha Patrick from the University of Gloucestershire described catching birds in the wild and releasing them into a room furnished with artificial trees, which they hadn't seen before.  Each bird's behavior upon first seeing the room is recorded, and its 'exploration score' calculated, to determine where the bird sits on a boldness/shyness range.  Fast explorers are more aggressive, and more willing to take risks, than slow explorers.  And, when birds are artificially selected for parental aggression or calm, heritability of such personality traits is consistently around 50% -- that is, about 50% of the variation in the behavior seems to have a genetic source.  But behavioral plasticity has been found to be common in great tits as well.

Great tit; Wikipedia, photo by Lviatour
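
For readers unfamiliar with where such a heritability figure comes from in a selection experiment, here is a hedged, purely illustrative sketch of the 'realized heritability' calculation (the breeder's equation, h^2 = R/S); the scores and the response are invented:

```python
# Purely illustrative 'realized heritability' via the breeder's equation
# h^2 = R / S. All exploration scores are invented.
population_mean = 50.0    # mean exploration score before selection
selected_mean   = 62.0    # mean score of the boldest birds chosen to breed
offspring_mean  = 56.0    # mean score of their offspring (hypothetical)

S  = selected_mean - population_mean     # selection differential
R  = offspring_mean - population_mean    # response to selection
h2 = R / S

print(f"realized heritability h^2 = {h2:.2f}")   # 0.50 with these made-up numbers
# Roughly: offspring recover about half the advantage their selected parents
# had, which is the sense in which 'heritability is about 50%'.
```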

But it's not just vertebrates that are being studied.  Mark Briffa at Plymouth University studies boldness and shyness in hermit crabs. He disturbs them by lifting them out of the water and turning them upside-down, leaves them thus for five seconds, and then replaces them in the water, and measures the time it takes for them to return to normal.  He then gives them a boldness rating.

Hart asked Briffa whether boldness in hermit crabs is at all equivalent to human extroversion.  That is, whether understanding hermit crab behavior gives us any insights into human behavior.  Briffa's answer was that humans share an ancient neurobiology with other animals, including hermit crabs, and looking at that will help "simplify the problem" of human behavior.  Maybe.

So, there are conscientious crabs, those that spend a lot of time investigating empty snail shells as they make the decision about which one to move into next.  And there are bold hermit crabs, who seem to choose with little consideration.  There are evolutionary trade-offs to each of these behaviors, Briffa said.  The conscientious crabs get better shells, but waste a lot of time looking, while the bold crabs get iffier shells but save time.  Time for what, exactly, it wasn't clear.  It can't be for making more hermit crabs, because personality traits are not linked with fitness, and Briffa was not the only interviewee who said this about the animal they study.  That is, there's no individual reproductive advantage to being bold or shy, conscientious or not.  If there were, of course, there'd be a lot less variation in personality, because it would have been selected out of the population in favor of the personality trait that led to more offspring.

How did these traits evolve?
This means that these traits aren't here because they were favored by natural selection, at least not in the present if these studies are any indication.  They could be here just by chance for reasons of ecology or population structure of some kind in the species' pasts.  Or, it might mean that there's a lot more plasticity in personality than is being reported, or identified by current methods, and that in some way plasticity is genetically enabled.  And indeed this would be expected, given that adaptability is so widespread that we've called it a fundamental principle of life.  A brain that can sense its circumstances, evaluate them, and plan responses may be what has evolved, but different brains, even if they were developed from the same genotype, might make different decisions.

However, Hart and the interviewees asserted that natural selection has favored a variety of personality types.  But if no personality type has more offspring, gradually out-reproducing the others, this can't be.  It's only possible if group selection -- natural selection that can 'see', and thus choose, traits that benefit a group -- is at work.

So, great tits can be shy or bold.  If an entire flock is bold, they are often on the move and able to locate new food sources, but they ignore each other, and that's bad for the cohesion of the group.  Shy birds stay together, but they don't move to new food sources, and that's bad for the health of the group.  A mix of bold and shy birds is ideal; the bold birds ensure that the flock moves to new food sources, and the shy birds follow.  Group selection would have favored a flock that includes a mix of personality traits, for the benefit of the flock rather than only a single personality trait, with its serious flaws.

But, this is a controversial issue.  Even Darwin, who himself addressed behavior, including that of humans and our closer relatives, was rather mixed on this point.  The kind of mixed flock just described could be a case of complex balanced polymorphism if the traits are genetically determined.  Too many bold birds, bad for the group; too many shy, bad for the group.  The bold/shy genotypes' fitness is a function of the population in which they occur.  That would be standard genetic theory.
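
A toy model (all parameters invented) of that balanced-polymorphism idea: if each morph does better when it is rare, the population settles at a stable mix of bold and shy, with no group selection needed:

```python
# Toy frequency-dependent selection maintaining a bold/shy mix: each morph
# does worse the more common it is. Parameters are invented.
import numpy as np

def next_freq(p, s=0.2):
    """One generation of a haploid bold/shy model; p = frequency of 'bold'."""
    w_bold = 1 + s * (1 - p)   # boldness pays off when bold birds are rare
    w_shy  = 1 + s * p         # shyness pays off when shy birds are rare
    return p * w_bold / (p * w_bold + (1 - p) * w_shy)

for p0 in (0.05, 0.5, 0.95):
    p = p0
    for _ in range(200):
        p = next_freq(p)
    print(f"start {p0:.2f} -> equilibrium {p:.2f}")   # all converge near 0.5
```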

Group selection is coming back into fashion, being invoked these days by all sorts of theoretical modelers and empirical investigators.  It has a checkered history.  The issue is that individuals pass on their genes or not, and a genotype that engenders a particular behavior that is good for the group is fine -- so long as it's even better for the individuals with the genotype.  The reason this is contentious is that those strong Darwinians who are unhappy with any sort of resistance to pure individual selection argue that a genotype that favors the group cannot proliferate if that is at the relative expense of the individual with the genotype.  Otherwise, the group may do fine, but the genotype cannot become relatively more common over time.  At least not within the group.

The issues are quite mixed, and anthropomorphizing evolutionary modelers often say that an altruistic group-favoring gene variant will either be outcompeted, or that in effect it enables 'cheaters', without the good-guy genotype, to succeed at the good-guys' expense.  So the altruism-conferring variant loses out in the end.  If a group grows at the relative expense of other groups of the same species, the good-guy variants might overall increase in frequency (since the group without good guys disappears), but eventually, from a strongly deterministic Darwinian view of natural selection, the good-guy gene will get driven out.

How much does the work described above help explain behavior and the evolution of behavior in humans, where sociocultural factors clearly play a larger part in behavior than in non-humans?  The assumption, presumably, is that what's being explored here is the genetic aspect of behavior.  But, if personality is only 50% heritable in great tits, who don't share the extent of cultural influence on behavior that humans have, then it's hard to accept that we're getting at something that can be explained primarily by hard-wiring in humans.  At the very least, chance is playing a role comparable in strength to selection in determining group and individual success.  Of course, anthropomorphizing is difficult to avoid and it's difficult to tell when it's justified or not; indeed, cultural evolution has been described even in great tits, with the spread of learned behavior across a wide area.

Darwin wrote quite a lot about behavior, including altruism, aesthetics, personality and so on in Descent of Man, where he tried to show continuity between humans and other species.  His specific, even explicitly stated, agenda was to displace religious creationism as an explanation for animal (and plant) diversity.  So he wanted humans to have traits that other animals have.  Several chapters deal with his ideas of these sorts of behavioral and communal sharing, including explanations of altruism.  He stressed behavioral continuity also in his Expression of the Emotions in Man and Animals.

Darwin was hand-waving much of the time when he did this.  And while he talked of selection, his main point was continuity and descent from common ancestry.  This is very different from observing what other species do, assessing when or how or if a trait is actually genetic in a way simple enough for selection to screen it at the gene level, and determining if, in fact, variants of the trait affect fitness.  Short term observations and risk of things like anthropomorphizing, and the likelihood that behavioral patterns for individuals may vary during their lives or in different circumstances, make the area difficult to study definitively in evolutionary terms.

But one thing is definitively clear:  animals do behave in variable ways, and that's fascinating enough.

Tuesday, January 6, 2015

Is cancer just bad luck? Part II. It's a genetic, but usually unpredictable, disease

Yesterday, we discussed some history of research on the cause and predictability of cancer.  Today, we'll try to raise some questions that seem to have been overlooked in the recent Tomasetti and Vogelstein paper in Science that argues that much or most cancer, with a few notable and clear exceptions, does not arise from inherited genetic mutations, nor from lifestyle exposures, but arises just by bad luck during the countless cell divisions that occur during our lives.  Much reaction to the paper has overlooked these issues as well.

In the usual use of the term, cancer is not genetic because there are only a few types of cancer that are clearly due to inherited variations in known individual genes. Even these are usually only a subset of all instances of cancer of the particular organ in question.  Most breast cancer does not involve inherited variation in the BRCA1 or BRCA2 genes, for example.

At the same time, some cancers, most notably breast but also colorectal and some other cancers, show family correlations of risk, suggesting that multiple contributing inherited variants might be involved. By far the bulk of cancers are 'sporadic' in the sense that they arise without detectable genetic risk factors.  Even large-scale GWAS type studies find very few genome sites that contribute more than individually very small, barely detectable, risk.

Before the frenetic genome mapping era began around 20 years ago, it seemed clear that with few exceptions (those perhaps mainly due to viruses) cancer was the archetype of a lifestyle-related disease.  Smoking caused a very clear risk of lung cancer.  Some viral exposures caused cancers. Colorectal cancers were largely due to low-roughage western diets, and various things like hormone drugs, coffee, and you-name-it, were suspects.  In addition, we knew clearly that ionizing radiation such as in x-rays and in uranium miners caused cancer risk.

The genome-wielders largely took over, of course, but that was as much a sociopolitical coup as it was based on any serious level of science.  DNA was fashionable, sequencers were fancy (and expensive), and we could search the whole genome to find the culprit variants.  This turned out largely to be a big, low-payoff bust, though not all geneticists are candid enough to admit it.  Still, to many, with the few known exceptions, cancer has been seen as not a genetic disease.

But it's 'genetic' nonetheless!
This may all be true--it certainly is so empirically.  It gives the impression cancer is not really a genetic disease, in the usual sense of the word, meaning due to inherited risk.  But another sense of the word refers to mechanism, and cancer generally does seem clearly to be genetic in that sense.  It's just that the source of the variation is among cells within the body rather than among people (really, conceptuses) in a population.  Or, more properly it's a mix.  In fact if it were really genetic in the inherited sense the fetus would not develop properly, so one should never expect a really deterministic variant to 'cause' cancer by itself.  In this sense, cancer really is, if anything, the archetype of a genetic disease.  Here's why.

Diseases all must arise in some way or other in the behavior of cells.  Usually, it will be some collective aspect of cells: say, the pancreas's cells, as a whole, just don't make enough insulin.  Or, by the way diets and other factors affect them, the bloodstream carries too much of the wrong kinds of fats, and they clog arteries.

But cancer is a disease of a single cell that then goes awry, and its cellular descendants.  The reason is that its genes are not responding in the usually self-restrained way for their local tissue environment. The genes could be induced by viral insertions, or by somatic mutations (that is, mutations occurring in body cells but that were not in the sequences inherited by the individual at his/her conception). The mutations cause the cells to divide without the usual orderly constraints.

Somatic mutations are not in the germ line and are not transmitted from parents to offspring.  They don't generate family risk correlations.  They can't be found by GWAS or other studies based on sequencing inherited genomes.  But they are genetic changes nonetheless, and many studies have shown that tumor cells do share mutational changes not found in normal tissue from the same person, and that as a tumor grows, spreads, and develops drug resistance, the cells in different descendant parts of the cancer acquire even further mutational changes.

So, while the fact that most cancer is not predictable from inherited genotypes is a disappointment, at least for genetic epidemiologists, it's a genetic disease nonetheless.  It's just hard or impossible to detect individual cells with a combination of the 'wrong' changes so as to found a tumor lineage.

At the same time, there is no reason to doubt that countless inherited genetic variants can affect risk, and make a cell more vulnerable to transformative somatic mutations.  It's just that, as GWAS types of research show, the majority of these have individually very small effects--that's because they only have an effect when some other unlucky mutation(s) happen to arise in the same cell during the person's life.  But there can be uncountably many such heritable weak-effect genome-types that simply can't be found by the current mapping techniques, and that's why such techniques don't find them.

And, yes, it's 'environmental'
Yesterday, we started this series stimulated by the Tomasetti and Vogelstein paper, in which they related the number of dividing cells in a tissue to the risk and age of onset of cancers of that tissue.  They showed statistically that, with the few known exceptions such as smoking and lung cancer, cancer rates correlated pretty well with these considerations.  Since we ourselves were working with site-specific and worldwide age-patterns of cancer, and formulating somatic-mutational models in those pre-genetic days, these ideas were already rather well-established, so the new paper uses newer data and seems very good and apt, but the idea isn't as new as the headlines and attention made it seem.  If anything, the profession at large should never have got to the point of expecting better tumor predictability than was at hand.

Still, environmental risk factors are not ruled out by that analysis.  Environmental or life-history risk factors, like diet or reproductive history and so on, stimulate cell divisions and in that way can affect the risk of mutations arising in the way Tomasetti and Vogelstein suggested: simply the normal errors in DNA copying.  Since the exposure has to affect a cell in a given tissue and in a particular relevant gene being used by that tissue, it is no surprise that the exposure's net effect, and hence predictability, is usually very small.  Still, exposure to environmental agents must contribute to mutations if the agent is known to be mutagenic or to stimulate cell-division.  So epidemiologists may be right that mutagenic or mitogenic exposures can have carcinogenic effect, but Tomasetti and Vogelstein are right that this will be essentially undetectable.  In no way does their analysis relate to the carcinogenic effect per se, just to the net magnitude.  Indeed, we know that such predictions, except relating to a few risk factors like smoking and UV light and HPV virus, haven't proven to be very powerful or reliable.  So there's nothing new here, except to the extent that genetic or environmental epidemiologists are in denial.

But actually, there are very clear environmental factors related to cancer risk.  They have to do with the subtle concept of competing causes.  If mutations arising by chance during cell division ultimately lead to transforming genotypes in some cell, then the longer one lives the more likely such changes are to arise in at least one such cell in the person.  This is generally why most cancer rates rise with age in ways correlated with rates of cell division.

So, if we were to obtain wonderful preventive measures to eliminate heart disease and stroke, cancer rates would go dramatically up!  That is simply because those who now no longer died from the former would be alive to await the latter.  That is environmental causation, even if indirect!  Likewise, if we really want to reduce the risk of cancer, all we need do is keep eating McBurgers in greater and greater amounts, start some wars, or continue to over-use antibiotics: then we'll all die off of other causes, before we're old enough to get cancer.
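
A back-of-the-envelope competing-risks sketch makes the point; the hazards below are invented, not real epidemiological rates:

```python
# Toy competing-risks illustration (all hazards invented): if people no longer
# die of heart disease, more of the cohort lives long enough to get cancer, so
# lifetime cancer incidence rises.
import numpy as np

ages = np.arange(40, 101)
cancer_hazard = 1e-4 * (ages / 40.0) ** 5    # annual risk of developing cancer
heart_hazard  = 5e-4 * (ages / 40.0) ** 4    # annual risk of dying of heart disease

def lifetime_cancer_incidence(heart_scale):
    alive_and_cancer_free = 1.0
    incidence = 0.0
    for c, h in zip(cancer_hazard, heart_scale * heart_hazard):
        incidence += alive_and_cancer_free * c          # new cancers this year
        alive_and_cancer_free *= (1 - c) * (1 - h)      # survive both this year
    return incidence

print("with heart disease:   ", round(lifetime_cancer_incidence(1.0), 3))
print("heart disease removed:", round(lifetime_cancer_incidence(0.0), 3))
# The second number is larger, purely because fewer people die of something
# else first: 'environmental' causation by way of competing causes.
```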

Among the many things that have been said, now unattributed, by many people including myself: because of the somatic mutational nature of cancer, if you were to live long enough you would get cancer of every organ you have.

Yes, luck is involved!
Indeed, even for inheritors of risk alleles, it need not be that a case of the associated cancer is actually due to that allele--obviously so, since the same tumor can arise in people without the allele, who are usually the vast majority of cases.  So other factors are involved, and neither the natural occurrence of mutations during cell division nor environmental mutagens or promoters changes the fact that which exposed person ends up with the wrong mutations in the vulnerable cells is simply a matter of luck.  An environmental mutagen has to hit the wrong set of genes in the wrong cell.  Naturally, and fortunately, the odds are against such bad luck.

Tomasetti and Vogelstein are essentially saying that only the internal luck of mis-copying of DNA causes cancer.  But environmental factors contribute to those errors, even if any individual exposure has a very weak effect relative to a given type of cancer.  Relative to all cancers, it's harder to say, because through most of history few have lived long enough to generate the kind of data needed; and since the risk per cell per division is small, and cell division generally slows with age, the newer evidence from an aging population will be statistically weak: cancer rates taper off at the oldest ages, cancers grow more slowly, and the elderly have more urgent problems to deal with, as a rule.

But even if these findings are true but not revolutionary, not so fast!
The idea of risk per at-risk cell per cell division, on which Tomasetti and Vogelstein based their analysis, makes sense, even if it was essentially known decades ago.  We ourselves built multi-hit mutation-accumulation models, based on somatic mutations, that seemed to provide reasonably good fits to the known age-onset patterns of specific cancers.  But the T and V paper's analysis actually raises some issues suggesting that the authors may have given too 'pat' an explanation.

Even in the mid-20th century it was known that different species of animals get a similar array of cancers, but that their accelerating age-specific risks, in principle related to the relative number of cells, were correlated with each species' typical lifespan.  And this had little if anything to do with environmental exposures, since the animals involved were typically ones we managed, or ones living in rather uniform environments.  This is not a trivial observation!

For example, inbred animals tell the tale for tissues with a particular life history of mitosis.  Mice housed in essentially identical conditions develop an array of tumors at age-specific rates--but mice get them in months, while we get them in decades.  This problem was raised around 1970 by the prominent epidemiologist Richard Peto, but it seems to have basically just been (conveniently) ignored.  There are also strain-specific cancer risks in mice and other animals (including dogs and cats) suggesting that inherited vulnerability genotypes, though not single-gene variants, may be involved.  If the number of cells at risk, or their division rates, were all that mattered under the just-bad-luck theory, then tiny mice should never get cancer!  And elephants and cows should be dropping over with huge tumors very early in life.
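A back-of-envelope calculation shows the problem (often called Peto's paradox).  The cell counts, division numbers, and per-division transformation probability below are round guesses chosen only to illustrate the scaling, not measured values.

```python
# Back-of-envelope Peto's-paradox arithmetic. All numbers are round illustrative
# guesses, not measurements; the point is the scaling, not the particular figures.

P_PER_DIVISION = 1e-15   # assumed probability that one cell division yields a transformed cell

species = {
    # name: (approx. number of at-risk cells, approx. lifetime divisions per cell lineage)
    "mouse":    (3e9,  1e3),
    "human":    (3e13, 5e3),
    "elephant": (3e15, 1e4),
}

for name, (cells, divisions) in species.items():
    lifetime_divisions = cells * divisions
    # probability that at least one division goes wrong somewhere in the body
    risk = 1 - (1 - P_PER_DIVISION) ** lifetime_divisions
    print(f"{name:9s}: ~{lifetime_divisions:.0e} divisions -> naive lifetime risk ~{risk:.1%}")
```

Under such naive scaling, mice would essentially never get cancer and elephants would be riddled with tumors early in life, yet observed lifetime risks across mammals of very different sizes and lifespans are broadly comparable.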

This raises another interesting issue about theory vs data in understanding cancer.  Among the transformative ideas of the latter half of the 1900s was that cancer is a 'multistage' disorder that arises only after several events have occurred in some unlucky cell lineage in the body (or have been inherited).  Early results suggested that only 2 events might be responsible.  A number of biostatistical epidemiologists began fitting, or I'd say 'forcing', 2- or 3-stage models to the data.  That is, they had a specific theory, based on the fragmentary evidence then available, and fit the data to it, to estimate, for example, the rates at which the events occurred.  Then they had to explain what those events were--say, a cell-division inducer and a mutation.  But there was very little substantial evidence that this was the general story of cancer, and the evidence was far weaker than the commitment to the model.

Ranajit Chakraborty and I took a different approach.  We applied a more open multiple-hit model and let the data speak for themselves; that is, we estimated, rather than pre-specified, the number of hits required.  We got, I think, better fits and better explanations.  The number of hits was higher, though at the time nothing was known about what the hits actually were.  Around 1990 Adam Connor and I suggested that the age pattern of cancer could be accounted for by the age-related probability that some individual cell would acquire some critical set of changes; here, too, we didn't specify the number.  This also seemed to fit the age-patterns, and both approaches suggested that cancers as a group were due to similar genetic processes (whether or not different genes were affected in each instance--there were no useful data on that at the time), while leaving open the number of events involved.  Since then, it has become clear that many different genes, in different combinations in different instances of cancer of the same organ (lung, stomach, etc.), are involved.  In all, these facts and findings account for the complexity of cancer (and, indeed, of many other common normal or abnormal traits).
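As a generic illustration of letting the data suggest the number of hits, here is an Armitage-Doll-style sketch in Python; it is not the specific model described above, and the incidence figures are fabricated for illustration.  Under a simple k-hit model, age-specific incidence rises roughly as age to the power k-1, so the slope of log-incidence on log-age estimates the number of hits minus one.

```python
# Generic Armitage-Doll-style sketch (not the specific models described above):
# if incidence I(t) is roughly proportional to t**(k-1), the log-log slope
# estimates k-1, so the number of hits can be read off the data rather than
# fixed in advance. Incidence values below are fabricated for illustration.

import numpy as np

ages = np.array([35, 45, 55, 65, 75, 85])           # midpoints of age classes (years)
incidence = np.array([4, 18, 60, 150, 330, 640])     # made-up cases per 100,000 per year

slope, intercept = np.polyfit(np.log(ages), np.log(incidence), 1)
estimated_hits = slope + 1

print(f"log-log slope: {slope:.2f}")
print(f"implied number of 'hits': about {estimated_hits:.1f}")
```

The point of such an exercise is simply that the data, not a prior commitment to 2 or 3 stages, determine the estimate.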

But if 'luck' means that some individual cell has, for whatever reason, acquired an initiating set of mutations or growth stimuli, then we can expect that, to a great extent, each transformed cell is transformed for a different genotypic reason; no one gene need be involved, nor is any one gene sufficient.  You just get a bad roll of the mutational dice in one of your cells, regardless of whether the mutation is due only to DNA copying errors or has been affected by external agents.  The difference would be rather slight, and the main correlation (as in Tomasetti and Vogelstein) relates to how many cell-turnovers are at risk.

But the species differences show that something other than just 'luck', or luck modulated by lifestyle factors, is involved, and what that something is remains basically unknown.  That suggests that the Tomasetti and Vogelstein interpretation is itself missing something important (though it won't change the empirical fact that neither inherited genotypes nor most environmental exposures have highly predictive effects).

In sum
Cancer is more, and less, than pure luck.  And its causes are still poorly understood.  We think, as we've said, that the Tomasetti and Vogelstein paper points to many things that are shown by new data--but little if anything that wasn't known, shown, and understood for the right reasons a generation ago.  The love affair with inherited genotypes, enabled, encouraged, and funded by a variety of enthusiasms, opportunities, and vested interests, has distracted attention from building on what we already knew.  The problem is that the somatic mutational nature of cancer doesn't lead to tidy prediction, prevention, or intervention, at least not with current thinking.  But that's where future thinking should be going.

Monday, January 5, 2015

Is cancer just bad luck? Part I. Known risk factors are poor predictors

Cancers are a highly unpredictable set of diseases, representing a fundamental problem in understanding causation, which Tomasetti and Vogelstein address in a recent paper in Science ("Variation in cancer risk among tissues can be explained by the number of stem cell divisions", 2 Jan 2015, Vol 347, Issue 6217).  This paper has gotten a lot of notice, both approving and not.  A bit of history might be helpful.

Cancers are due to cell proliferation gone wrong, that is, cells not obeying the constraints on division and differentiation of their particular tissue.  The idea has been that this is due either to exposure to some environmental or lifestyle risk factor, or to genetic predisposition.  Both seem to be true at the population level: breast cancer, for example, is associated with age at first birth, whether a woman breast-feeds, number of children, alcohol consumption, and so forth, and with clear genetic risk factors, like some BRCA1 and BRCA2 risk alleles.  In populations, people who smoke are more likely to get lung and other cancers, people with HPV infection are more likely to get cervical cancer, and so on.  But this doesn't mean that everyone who smokes, or who carries a particular genetic risk allele, will get cancer, and that's the issue.

Even if a risk factor is known, that doesn't explain the immediate cause of a tumor at the cell level.  That cause is gene(s) misbehaving, causing the cell to divide at an inappropriate time.  So the idea for decades had been that environmental agents that stimulated cell division put cells at risk of incurring a mutation, and environmental mutagens caused those changes, which were the ultimate or final causes of cancer.  Hence the search for 'cancer' genes.


In the old days (the 1990s!), direct searches for genes were generally not possible, with a few exceptions where viruses seemed to change genes in a cancer-causing way.  But some cancers seemed to be clearly familial, that is, inherited in a Mendelian way in families.  They were statistically predictable, but with the problem that the risk depended on whether you had inherited a risk gene, and we could only make a probabilistic statement about that.  A few lucky breaks showed that finding such genetic mutations was possible.  Specific inherited cancer risk genes were first and most clearly demonstrated for a couple of childhood tumors, most notably, perhaps, the eye cancer retinoblastoma.  A fortuitous chromosomal deletion allowed the responsible gene to be identified, which was rare at that time for biomedical genetics, a field then largely confined to predicting risk with no understanding of, nor ability to test, the actual causal gene.  There were a few other similarly lucky discoveries.

However, when genotyping on a genome-wide scale became possible, the idea was clearly that we could search the entire genome for locations that were co-transmitted or associated with a given type of cancer.  There have been many different methods, and a few clear successes.  The hallmark, and indeed one of the first genome-wide screens to yield a major risk factor, was the finding that the BRCA1 and BRCA2 genes could, when carrying one of several particular mutations, lead to a very high lifetime risk of cancer.  This was done in large, multi-generational families, but the success spurred methods to search more generally in populations (what we now call GWAS and other kinds of searches).  The BRCA discovery led to the rampant genome-wide approach that we have seen over the past 15-20 years.  The idea underlying this work has been to find risk variants strong enough, if not to be transmitted clearly in families, at least to affect risk consistently, and this has been extended to basically every trait someone could get a grant to study.

But even when BRCA causation was found, there were important questions.  Those inheriting a high-risk BRCA mutation who did in fact get breast (or ovarian) cancer did not get the disease until mid- to late life.  The lifetime risk was very high indeed, and some unfortunately got separate cancers in each breast.  Yet this was not the rule.  So, if the gene 'caused' the cancer, why did it take so long to do it?  An obvious answer is environmental factors.  Also, by far most cancers do not segregate in families in Mendelian fashion the way BRCA mutation effects can, and indeed relatives share only slightly elevated risk.  Even cases carry only slightly elevated risk relative to controls for most variants found by cancer-related gene mapping.  One would think that the final risk might be due to the additional contribution of environmental factors.

Epidemiological studies of environmental risk factors for cancer have identified the major ones -- smoking, asbestos, exposure to UV light and X-rays, exposure to some chemicals used in agriculture, and so on.  So, many (especially environmental epidemiologists, who don't have a stake in the competition for genomic funding) have argued that if genomic variation isn't a good predictor, environmental variation must be!  But after extensive work, environmental factors don't explain all cases of any given cancer either, nor can exposure history reliably predict cancers: only a small minority even of smokers, for example, goes on to develop lung cancer.  And, indeed, unlike smoking and a few others, most environmental associations and candidate factors aren't clear mutagens or promoters.  So what's going on??

Why don't environmental or genetic risk factors explain all the risk?
This is the problem that Cristian Tomasetti, a mathematician, and Bert Vogelstein addressed.  Vogelstein was one of the pioneers of the search for somatic mutations, that is, mutational changes that make a cell misbehave but need not have been inherited, having instead been generated during the person's life.  Years ago Vogelstein applied a particular technique to show that tumor cells contained a particular kind of mutation (called 'loss of heterozygosity') that was not found in non-cancer cells from the same individual, but was often found in particular genome regions for a given type of cancer (in particular, colorectal cancer).  That was rather clear evidence (and there was evidence from a growing number of other researchers, too) that cancer was indeed a 'genetic' disease, but not one due just to inherited variants.

Tomasetti and Vogelstein point out that current data suggest that only 5-10% of cancers are caused by heritable factors, and that environmental factors can't explain the wide disparities in risk of cancer among different tissues.  They wondered how much cancer is caused by chance and how much by environmental factors.  By "chance" they mean things that just happen to go wrong during the DNA copying that occurs with cell division, which is when a tumor gets started.  Their analysis suggests that these changes are inherent molecular copying errors that don't have to be induced by environmental factors.

Writing in the same issue of Science in which the paper appears, Jennifer Couzin-Frankel describes the work:
In a paper published...this week in Science, Vogelstein and Cristian Tomasetti, who joined the biostatistics department at Hopkins in 2013, put forth a mathematical formula to explain the genesis of cancer. Here’s how it works: Take the number of cells in an organ, identify what percentage of them are long-lived stem cells, and determine how many times the stem cells divide. With every division, there’s a risk of a cancer-causing mutation in a daughter cell. Thus, Tomasetti and Vogelstein reasoned, the tissues that host the greatest number of stem cell divisions are those most vulnerable to cancer. When Tomasetti crunched the numbers and compared them with actual cancer statistics, he concluded that this theory explained two-thirds of all cancers.
Tomasetti and Vogelstein estimate the stochastic, or chance, effects "associated with the lifetime number of stem cell divisions within each tissue."  These effects can be mathematically distinguished from environmental risk factors.  They predicted "that there should be a strong, quantitative correlation between the lifetime number of divisions among a particular class of cells within each organ (stem cells) and the lifetime risk of cancer arising in that organ."  And this is what they found, and how they determined that two-thirds of all cancers are due to chance: the changes that occur just by bad luck during DNA replication.
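As a rough sketch of the kind of calculation involved, here is a Python snippet correlating log lifetime stem cell divisions with log lifetime risk for a handful of tissues; the numbers are illustrative orders of magnitude, not the paper's actual dataset (which covered 31 tissue types and yielded a correlation of about 0.8).

```python
# Sketch of the Tomasetti-and-Vogelstein-style correlation, using a few illustrative
# (divisions, lifetime risk) pairs rather than the paper's actual data.

import math

tissues = {
    # tissue: (total lifetime stem cell divisions, lifetime cancer risk) -- illustrative values
    "colorectal":          (1e12, 5e-2),
    "lung (nonsmoker)":    (1e10, 5e-3),
    "liver":               (3e9,  7e-3),
    "pancreatic":          (3e11, 1.4e-2),
    "bone (osteosarcoma)": (1e8,  4e-4),
    "brain (glioma)":      (3e8,  2e-3),
}

xs = [math.log10(d) for d, _ in tissues.values()]
ys = [math.log10(r) for _, r in tissues.values()]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(xs, ys)
print(f"correlation of log(divisions) with log(lifetime risk): r = {r:.2f}")
print(f"variance 'explained' in the paper's sense: r^2 = {r*r:.2f}")
```

Squaring that correlation is what gives the "two-thirds of the variation explained" figure quoted in the coverage.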

There are also life-history aspects of cell division that are generally consistent with this.  For example, neurons stop, or at least greatly slow, their division as the brain matures, while glial (supporting) cells keep dividing, and most brain cancers in adults are gliomas.  Retinoblastoma (eye cancer) risk is concentrated at birth and in early childhood, after which retinal cells have stopped dividing.  But radiation treatment (an environmental mutagen) for retinoblastoma has been found, in the past at least, to lead to later bone cancer at ages when bones are rapidly growing.

This has generated some attempts at rebuttal, which is not surprising, because many hopes as well as vested interests among geneticists and environmental epidemiologists are threatened by the finding.  But in fact, based on work and then-current ideas we ourselves were involved in back in the 1970s and 80s, the current kerfuffle reflects culpable misunderstanding, ignoring of long-standing evidence, wishful thinking, and looking away from some facts that raise challenges even for the 'new' explanation of cancer causation.  We'll discuss that tomorrow.