Tuesday, November 18, 2014

Darwin and the evolution of my brain

My drawing ability stalled out at the second-grade level.  To wit, this drawing of an imaginary place that I did in reply to a request from my daughter just 3 months ago.




It is embarrassingly bad, and it perfectly illustrates why I didn't draw for my entire life.  At least by hand.  (What are those green blobs in the pond? Frogs? Lily pads?)  Ok, yes, somehow I did manage to produce illustrations for two different 'serious' books, but I thought of that as 90% Adobe Illustrator and 10% me.  But I did learn as I went along, or maybe AI and I trained each other.

Here's one of my first Illustrator drawings, from our earlier book.


From Genetics and the Logic of Evolution, Weiss and Buchanan, 2004



I remember how much I struggled just to make this simple line drawing.

And then one of my favorites, which not coincidentally was among the last I did for The Mermaid's Tale.

From The Mermaid's Tale; Weiss and Buchanan, 2009



Ok, two of my favorites.

C. elegans body plan; The Mermaid's Tale; Weiss and Buchanan, 2009


I had learned a lot about using AI by the time that book was finished.  But see the drawing above for how much of that carried over to hand drawing.

And then a few months ago, by chance I saw that a painter at the local art co-op was offering a beginning drawing class. On a whim, I decided to sign up.  I knew someone in college who drew her way through Norway on a postcard-sized sketchbook, and I always thought it would be wonderful to be able to do that, though I had no illusion that I would ever be able to.  Still, I liked the idea of perhaps being able to improve my drawing at least to the point of being able to enjoy doing it.

I took the list of supplies we'd need to the art store before the first class, bought the pencils, erasers, pencil sharpener, and the suggested sketch pad, which was so large that it was unwieldy to carry.  I was embarrassed to walk out of the store with that 18 x 24" sketch pad under my arm.  I felt a total fraud.




The first class was a bit intimidating -- one woman was already seated at her easel, halfway through copying a da Vinci drawing, which didn't help.  As it turned out, she was a private student, but I didn't know that at the time.  I sat down at one of the free drawing tables and opened the sketch pad to the first yawningly empty sheet of paper.  I laid my 2HB pencils and my erasers and my sharpener next to the paper, and looked around at the 6 other students doing the same.  What if they were all as good as the woman copying the master?

Introductions followed -- relief, the rest of us really were beginners -- and then the instructor sat down to demonstrate what he was asking us to do.  He put a simple box on the pedestal that was the hub of the circle of tables and nervous students, and began to draw.  Yes, he did put up his thumb to measure the size of the object.  It was in fact a revelation to me that artists actually do that -- and the beginning of the evolution of my brain.

After watching how it was supposed to be done, I sat back down to try to draw the box myself.  Huge blank sheet, everyday object, render to paper.  Just picking up the pencil was awkward, and the act of putting the first line to paper felt like it was being done by someone else's arm, driven by someone else's brain.  But I did it, and this is what I drew.




Ok, tentative lines, no attention to technical issues, but there are lines on paper.  That was good enough for day 1.

We had eight classes, each three hours long, each utterly basic but utterly eye-opening to someone stuck in second-grade drawing mode.  Perspective! Oh, that's why I could draw the diagrammatic figures for the books that I did with Illustrator!  No need to make them look life-like.  Oh, we're supposed to draw what we see, not what we think we see!  Revolution.  Negative space! A whole new way of seeing.  Organizational lines, vanishing points, units; all basic, all essential.

As I practiced, somehow the rust fell away, and my muscles started to be willing to move.  Not just arm muscles, but the seeing, rendering muscles.  We went outside to draw houses for one class, and here's the one I drew.




Still tentative, still some technical issues, and the house looks rather more haunted in my rendering than it does on the street (it's a very tidy, well-kept house, in fact).  But still: progress, I thought.  I started to sort of like what I drew, so I kept drawing.

And then the other day I woke up wondering if I could draw Darwin.  Who does that?  So, me, online photo of Darwin, sketchbook and a pencil.  I wish I'd taken more photos as I worked, because the fascinating thing, to me, is that at some point early on, my lines on paper began to actually look like the famously familiar photo of this man.  This absolutely amazed me, and continues to.

Here's a photo of just the face, before I added the trimmings, which turn out not to be necessary for the effect.  I actually kind of like this picture better than the 'finished' one.  But the thing is, it was Darwin after just the first eye was done.



Now, 'finished' (which brings up another artistic problem: how do you know when you're finished?)




How does the brain turn lines on paper into a sense of a three-dimensional person?  Is it because we know this image so well that we excuse my raw attempt at rendering it, and fill in the blanks?  Simple (or not so simple) pattern recognition?  That could be.

And, to pull this tale back to the beginning of evolutionary time, at the end of each drawing class we looked at all of our drawings, and 'critiqued' them.  To me the fascinating thing, each week, was how very differently each of the seven of us put pencil to paper; same beginnings, usually the same object, totally different renderings.  One woman drew a bird's-eye view of the house she was sitting in front of, in the beautiful dark confident lines she used for everything; another man drew every shingle on the roof of his chosen house.  Speciation in action.

As well as evolution of the mind.  My mind, my understanding, my confidence and ability to tell my muscles what to do.  I have a long way to go, having drawn my way into a number of technical corners in just this one Darwin drawing, and I have no idea where to even begin working with color, but I've learned a lot.  Not least about what this optical illusion, this effect of graphite on paper, tells us about the power of the brain: its constant, effortless brilliance at solving 'the binding problem', putting together what so many different parts of the brain are perceiving and making final sense of it all.  Indeed, to the brain, this isn't a 'problem' at all.  Even ant and bee and crow and dolphin brains can do it.  It's a 'problem' only for those who want to put it into words.

Friday, November 14, 2014

A Groundhog Day blog redux

Someone wondered the other day why we keep saying the same thing over and over on our blog.  "Ok, ok, we know things are complex, get on with your life." We, of course, wonder why we have to keep repeating ourselves.  But his query reminded me that we've dealt with this issue before, so we're rerunning a post from 2012.

The Groundhog Day blog?

Sometimes it seems that we're posting the same story over and over again.  Here are some new study results, here's what the authors say they mean, and here's what we think they really mean.  Usually a lot less than the authors report.  Just this week, does aspirin prevent cancer?  Should we eat eggs?  And a post asking simply how we can tell if results are credible.  If you read us regularly you know we don't just pick on epidemiology.  We give genetics the same treatment -- why, for example, should we believe any GWAS results?  And should we expect to find genes 'for' most diseases?  Or behaviors?  The same for all those adaptive stories that 'explain' the reason some trait evolved.  And Holly is equally circumspect about claims in paleoanthropology, which of course is why we love her posts!

Is it just being curmudgeonly to ask these questions?  Or is it that where some see irreducible complexity others see a simple explanation that actually works?

An isomorphic problem
The important thing about these various issues in modern science is that, from the point of view of gaining knowledge about the causal world, they are isomorphic problems.  They have similar characteristics and are (currently) addressed by approaches with similar logic--in terms of study design, and with similar assumptions on which study design, data collection, and methods of analysis are all based.  The similarities in underlying causal structure include the following:
  1. Many different factors contribute causally to the outcome
  2. Most of the individual factors contribute only a small amount
  3. The effect of a given factor depends in various ways on the other factors in the individual
  4. The frequency of exposure to the factors varies greatly among individuals
  5. Sampling conditions (how we get the data we use to identify causal elements) vary or can't really be standardized
  6. The conditions change all the time
  7. The evidence for causation is often indirect (esp. in reconstructing evolution)
  8. We have no underlying theory that is adequate to the task, and so we use 'internal' criteria
These days, we use the word 'complexity' to describe such situations.  That word is often used in a way that seems to imply wisdom or even understanding on the part of those who use it, so it has become a professionalized flash-word often with little content.

Often, people use the word, but persist in applying enumerative, reductionist approaches that we inherited over the past 400 years largely from the physical sciences (we've posted on this subject before).  This is based essentially on the repeatability of experiments or situations.  We try to identify individual causal elements and study them on their own.  But if the nature of causation is the integrated effects of uniquely varying individuals, then only the individual strong (often rare) factors will be easily identified and characterized in this way.

Item #8 above is important.  In physics we have strongly formal theory which yields precise predictions under given conditions.  There is measurement error, and the predictions are sometimes probabilistic, but the probabilities involved and the statistics for analyzing error were designed for such situations.  We compare actual data to predictions from that externally derived theory.  That is, we have a theory not derived from the data itself.  It is critical to science that the theory is derived not just in our heads but largely from prior data.  But it is external to the new data that we use to test the theory's accuracy.

In the situations we are facing in genetics, evolution, biomedicine, and health, we have little similar theory, and the predictions of what we have are not precise, or our assumptions are too general.  Even the statistical aspects of measurement error or probabilistic causation are not based on rigorously specified expectations from theory.  Our theory is simply too vague at this stage.  So what do we do?

We use internal test criteria.  That is, we test the data against itself.  We compare cases and controls, or different species of apes' skeletons, or different diets.  We don't use some serious-level theory to predict, from primary biology, that so many eggs per day, or some specific genotype at many sites in the genome, will have some specific effect; we predict only that there is a per-egg outcome.  We don't know why, so we can't really test the idea that eggs really are causal, because we know there are many variables we just aren't adequately measuring or understanding.  When we do find strong causal effects, however, which does happen and is the goal of this kind of research, then subsequently we can perhaps develop a real theoretical base for our ideas.  But the track record of this approach is mixed.

This is also often called a hypothesis-free approach.  For most of the glory period in science, the scientific method was specifically designed to force you to declare your idea in a controlled way, and test it.  But when this didn't work very well, as in the above areas, we adopted a hypothesis-free approach that allowed internal controls and tests: our 'hypothesis' is just that eggs do something; we don't have to specify how or why.  In that sense, we are simply ignoring the rules of historically real science, and even boasting that we are doing science anyway, by just collecting as much data as we can, as comprehensively as we can, in the hopes that some truth will fall out.

The central tenet of science for the last 400 years has been the idea that a given cause will always produce the same effect.  Even if the world is not deterministic, and the result will not be the same exact one, it will at least have some probability distribution specifying the relative frequency with which we'll observe a given outcome (like Heads vs Tails in coin-flipping).  But we really don't even have such criteria in the problems we're writing about.  Even when we try to replicate, we often don't get the same answer, and do not have good explanations for that.

When we're in this situation, of course we can expect to get the morass of internally inconsistent results that we see in these areas, and it's for the same basic epistemological reason!  That is, the same reason relative to the logic of our study designs and testing in these very different circumstances (genetics, epidemiology, etc.).  Yet that doesn't seem to slow down the machine that cranks out cranky results: our system is not designed to let us slow down.  We have to keep the funds coming in and the papers coming out.

And then of course there's cause #9. Most of us have some underlying ideology that shapes our interpretation of results.

This is all a fault of us and the system.  We can't be faulted for Nature's complexity.  The issues are much more--yes--complex than we've described here, but we think this captures the gist of the problem.  Scientific methods are very good when we have a good theory, or when we are dealing with collections of identical objects (like oxygen or water molecules, etc.), but not when the objects and their behavior are not identical and we can't specify how they aren't.  We all clearly see the problem.  But we haven't yet developed an adequate way to deal with it.

Comet: it's not just a cleanser any more!


  

Thursday, November 13, 2014

Evolution of malaria resistance: 70 years on...and on...and on

It was about 70 years ago that the complex interplay of anemia, malaria, and genes, particularly the hemoglobin genes, first began to be understood.  Sickle cell anemia's association with a globin gene variant, and similar associations between malarial susceptibility and variants in other genes (such as G6PD, Duffy, and other globin mutations), were identified in roughly the same decades.  The findings showed that in areas of the world with long-endemic malaria, various gene mutations seemed to be at high frequency, as if they protected against malaria.  I was never involved in this directly, but I studied under Frank Livingstone and James V. Neel at Michigan, two of the leaders in understanding the evolution of the protective mechanisms.

For decades we have had direct clinical evidence, mainly in Africa, but also in Sardinia, and later in other places including southeast Asia, that at least some of the putatively protective mutations in the alpha and beta globin genes, and in other genes, did in fact protect against malaria, but had side effects such as various forms of anemia or other problems.  Even then, most of the evidence was circumstantial, based on geographic correlations.

The idea of a balanced polymorphism was suggested for these variants.  If you had two 'malaria-protective' alleles at the gene (one in each of your two copies of the gene), you were vulnerable to anemia, and if you had two 'normal' alleles you were susceptible to malaria; but if you had one of each (a heterozygous genotype), you had some protection against both malaria and anemia.  Evolution favored keeping both variants in the population, because selection worked against both homozygotes.
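For the record, the standard textbook arithmetic behind this balance is simple; here is a sketch (the parameter values are illustrative assumptions, not estimates from any particular data set):

```latex
% Heterozygote advantage. Genotype fitnesses (s and t are selection
% coefficients; both values below are assumed, for illustration):
%   w_AA = 1 - s   (two 'normal' alleles: malaria-susceptible)
%   w_AS = 1       (heterozygote: some protection on both fronts)
%   w_SS = 1 - t   (two 'protective' alleles: anemia)
% Selection then holds the protective allele S at the stable equilibrium
\[
  \hat{q} = \frac{s}{s + t}
\]
% E.g., with s ~ 0.1 (malaria mortality of AA homozygotes) and t ~ 1
% (SS nearly lethal without treatment), \hat{q} ~ 0.09, roughly the
% sickle allele frequencies reported from malaria-endemic regions.
```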

Plasmodium falciparum life cycle; Wikipedia

Far beyond malaria: Relationship to fundamental evolutionary questions
The idea of balanced polymorphisms played into a major theoretical argument among evolutionary biologists at the time, and sickle cell anemia became a central case in point and a stereotypical classroom example.  But the broader question was quite central to evolutionary theory.  Balancing selection was, for many biologists who held a strongly selectionist version of Darwinism, the explanation for why there was so much apparently standing genetic variation, in humans and indeed in all species.

The theory had been that harmful mutations (the majority) are quickly purged, so the finding of widespread variation (polymorphism) in nature at gene after gene -- revealed by the type of genotyping possible then, based on protein variation -- demanded explanation; balanced polymorphism provided it.  This was countered by a largely new, opposing view called 'non-Darwinian' evolution, or the 'neutral' theory; it held that much or even most genetic variation had no effect on reproductive success, and that the frequency of such variants changed over time by chance alone, that is, by 'genetic drift'.  This seemed heretically anti-Darwinian, though that was a wrong reaction, and only the most recalcitrant or rabid Darwinist today denies that much of observed genomic variation evolves basically neutrally.  But many saw the frequency of variants associated with what were seen as serious recessive diseases, like PKU and cystic fibrosis (and others), as the result of balancing selection.
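To see what 'chance alone' means concretely, here is a minimal Wright-Fisher-style simulation in Python (population size, starting frequency, and number of generations are all arbitrary choices for illustration):

```python
# Neutral genetic drift: the allele frequency changes across generations
# purely by binomial sampling, with no selection at all.
import numpy as np

rng = np.random.default_rng(42)
N = 500    # diploid population size, so 2N gene copies (assumed)
p = 0.5    # starting allele frequency (assumed)

for generation in range(200):
    # the next generation's allele count is a binomial draw from the
    # current frequency
    p = rng.binomial(2 * N, p) / (2 * N)

print(f"frequency after 200 generations: {p:.3f}")
# Repeated runs wander to different frequencies, and every run eventually
# drifts to fixation or loss, by chance alone.
```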

In support of the selectionist view, many variants have been found in the globin and other genes for which the frequency of one or more alleles is correlated geographically with the presence (today, at least) of endemic malaria.  But allele frequencies can be correlated with many other geographic factors, because geography is itself correlated with population history.  Thus, the correlations are often empirical but not clearly causal.  Indeed, not many variants have actually been shown, experimentally or clinically, to be functionally related to malaria resistance.

In this light it is interesting to see a rather large-scale attempt at testing whether putative malaria-associated variants really are protective.  The paper ("Reappraisal of known malaria resistance loci in a large multicenter study"), by a large consortium of authors, is in the November 2014 Nature Genetics; it is paywalled, so if you don't have direct access but would like to read it, I'd be happy to email a pdf.

These authors compiled large data sets from different areas of the world with endemic malaria caused by the falciparum species of parasite, and compared the frequencies of the many candidate gene variants in sufferers of severe malaria with those in a large set of unaffected controls (some of whom, of course, may later become affected).
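The statistical core of such a comparison is straightforward; here is a minimal sketch in Python (the counts are invented for illustration, and the actual study of course used far more elaborate models, with covariates and corrections for population structure):

```python
# Toy case-control test: is a candidate allele rarer in severe-malaria
# cases than in controls? All counts below are invented.
from scipy.stats import fisher_exact

# 2x2 table: rows = carriers / non-carriers, columns = cases / controls
table = [[120, 210],    # carriers among 1000 cases and 1000 controls
         [880, 790]]    # non-carriers

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
# An odds ratio below 1 suggests the allele is protective; the consortium
# found convincing signals like this only for the classical variants.
```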

A long time coming...and the clock still ticking
Even now, 70 years after the first ideas were suggested, we still have scant direct clinical data showing protection at a mechanistic level, so the results of this paper are still statistical.  But they are at least from a reasonably designed and specific study.  The authors found positive statistical association for some of the most clear-cut classical risk alleles (sickle cell, G6PD, O blood group), but ambiguous or variable evidence even for some of these, and no statistical evidence for many other putatively causal or protective variants.  Further, they found that some variants had different effects in males and females, and one SNP, in the CD40LG gene, previously found to be associated with severe malaria, was associated with reduced risk in The Gambia but significantly increased risk in Kenya.  Whether this is just statistical variation or an indicator of other aspects of these local-area genomes isn't clear.

The evidence in the positive instances is persuasive, even if just statistical, but the conflicting results and the surprising lack of findings for so many variants are curious as well as discouraging.  How can it be that, so long on, we still basically don't know whether a given variant is protective or not, other than the most classical ones?  This shows how very challenging even 'simple' causation can be.

This raises the basic evolutionary issue in a different way.  Darwin was convinced that adaptive evolution was very slow.  One major reason was that rapid changes of species or adaptations were rarely observed (still true), and if they occurred they could be interpreted as creationist rather than natural events.  Adaptive evolution under human direction, as in agricultural breeding, clearly brings about easily measured change.  But some forms of natural selection could be quite strong.  Adaptive coloration is one; malaria should be another, because it is so common and has so strong a negative effect on health.  So basic evolutionary arguments ought, it was long hoped, to demonstrate that, in this instance, balancing selection was a correct explanation of at least these polymorphisms.

In past work, one hemoglobin variant (called hemoglobin E) has apparently been sweeping across southeast Asia, because there was no downside to being an EE homozygote and it protected against malaria.  But generally, the actual selective effect has been very hard to prove.  The new study shows this in a sobering way.  Is the story right?  Have prior speculations about protective mutations been too superficially offered, and incorrect?  Is the selective effect so small, even in relation to malaria, that we can't see it even in samples large enough that 'nature' could have made a detectable selective difference?  Or, if selection is so gradual in a Darwinian sense, do these other mutations really make an evolutionary difference?

Several relevant points: first, this study looked at only one form of malaria (P. falciparum); and second, the different putative protective genes are involved in different physiological pathways.  And, as even the authors note, current patterns of disease, when antimalarial drugs are widely used, may not reflect patterns in the past, and thus it may not be possible to conclude that P. falciparum was the selective force these results suggest it may have been, plausible though that seems.  These points suggest that even here, complexity and subtlety are involved.

Beyond evolutionary theory
More sobering than the difficulty of detecting evolutionary, or even genuine physiological, differences among these various genotypes is the further fact that even for these major and rather clear causal sites, there has been basically no progress in effective gene-based therapy.  After all, the target cells are in blood (generally, red cells), among the most easily accessible of all tissues.  Given the unrestrained promises repeatedly being made by the genomewide-do-everything industry, this is (or should be) a very sobering thought.  Our technological tools should, one might expect, have been able to solve such comparatively clear-cut problems.

To us, this 'failure' indicates the subtlety of genome physiology.  Given the hundreds of putatively causal single-gene findings by GWAS and other means, where the evidence has seemed strong, we should be showing that genomic data are, after all the expense and effort, really worth gathering.  We should be making a definitive, and one might say systematic, march toward elimination of these genetic threats, perhaps the way vaccines have done against many infectious diseases.  If we could actually do that, and speak of cures and prevention rather than just risk-estimation of countless minor factors, then nobody would disagree that further genomic big-science efforts were worth the investment.

Meanwhile, more than 70 years on, the largely failed effort to use that knowledge directly to rid our species of a disease that has been estimated to have killed more human beings than any other single cause, shows how far we have to go--and how important new sorts of thinking could potentially be to the effort.

And, into the bargain, perhaps we're learning a lot about how adaptive evolution works, reinforcing Darwin's ideas about its slowness, about multiple alternative or interactive pathways, and more.

Wednesday, November 12, 2014

On cancer genetics

What 'causes' cancer?  This was a very mysterious disease for a long time, and there were many theories about it.  Prominently, around 1960, a major idea was proposed by Nobel laureate Macfarlane Burnet, an eminent Australian immunologist.  The idea was known as the 'forbidden clone' theory; it was about autoimmune disease but, more generally, about somatic mutation.  The idea of cancer as a somatic mutational disease made sense if cancer arose from single founder cells, as accumulating evidence suggested, and yet was generally not inherited.  If it is 'genetic' in its etiological mechanism, what else could it be?  Viral causes were also found, though I cannot recall when, relative to the rest of this history.

The idea of a mix of inherited and somatic mutations had appeal in the sense that if you inherited part of a mutational pathway to cancer, but not all of it, your parents would be unaffected but you would only have to 'await' a complementary somatic mutation for some cell to be transformed to a cancer state.  This thinking led Al Knudson in the early 70s to propose such a mechanism for the pediatric eye cancer retinoblastoma--a marvelous insight for which a Nobel prize would not have been inappropriate.  There, it has turned out that the major event is a second, somatic, mutational 'hit' in the RB gene itself, and the tumors occur so early in life that perhaps few other somatic events are needed to transform a retinoblast.  Also, retinoblasts may not divide much, if at all, after development, so if you escape the second event while the retina is developing, then you're safe.
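Knudson's logic can be put in back-of-the-envelope form (a standard textbook sketch; the symbols are illustrative, not taken from his paper):

```latex
% Let \mu be the probability that one RB allele suffers a somatic 'hit'
% in a given retinoblast, and N the number of retinoblasts at risk.
% The expected number of transformed cells is then roughly
\[
  E[\text{tumors} \mid \text{inherited carrier}] \approx \mu N,
  \qquad
  E[\text{tumors} \mid \text{no inherited hit}] \approx \mu^{2} N .
\]
% One factor of \mu versus two is why carriers develop tumors earlier and
% often bilaterally, while sporadic cases are later and nearly always
% unilateral.
```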

The idea of cancer as a somatic mutational disease is widely acknowledged, though most of the ink is spilled lauding discoveries of inherited cancer-predisposing variants, of which the best known are variants in the BRCA1 and 2 genes (but there are others).  Virally induced cancers seem to be due to viruses incorporating into inappropriate locations in the genome, so while the causal change is externally acquired rather than inherited, the cell-specific mechanism is consistent with these other ideas.

It is still correct that, with a few exceptions like retinoblastoma, even those who inherit a high-risk variant such as in the BRCA genes typically do not get their cancer till much later in life.  And it is also true that inherited variants seem to need many subsequent complementary mutations for a cell to be transformed.  Thus, even BRCA mutations are in themselves not a cause of cancer.  Indeed, if the story is being correctly understood, the BRCA genes are involved in mutation detection and repair, so that the associated breast and ovarian (and perhaps a few other) cancers are really due, at the cellular level, to other mutational changes that directly affect the cell's behavior.

Somatic mutations are generally hard to study.  Even in cancer, a concentrated source of cells with such mutations, this is a challenge, because a tumor grows rapidly and spreads; even if all tumor cells are somatic descendants of the original transformed cell, these cells continue to acquire further mutations.  This accounts, in part at least, for the spread (metastasis) of tumors and their evolution of drug resistance.

Most attention in the search for cancer-related mutations has been on protein-changing variants--exome mutations.  But if cancer is a lineage of cells that do not constrain their processes or rate of cell division, then one might suspect that regulatory variation would be as important as, or even more important than, protein structure itself; that is, normal proteins related to cell behavior may cause problems if there are too many or too few of them in a cell under various conditions.  This has led to expanded, though more difficult, searches of DNA sequence in tumors.

Regulatory somatic mutations in cancer
A paper in the November 2014 issue of Nature Genetics, by Weinhold et al., reports on regulatory mutations found in cancer cells.  The authors used some existing cancer genome databases that compared cancerous tissue to normal ('matched normal') control samples.  The samples were small and had various other limitations, as the authors note, but the point is that in screening whole-genome sequences they found a number of gene-regulating regions that carried multiple mutations in the data, and thus seemed to harbor regulatory somatic mutations.
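A toy version of that kind of recurrence screen might look like the sketch below (heavily hedged: the authors' actual pipeline is far more elaborate, with mutation-rate covariates, and the region names and counts here are invented):

```python
# Count somatic mutations per regulatory region across tumors, and flag
# regions hit more often than a uniform Poisson expectation would allow.
from collections import Counter
from scipy.stats import poisson

# invented data: (tumor_id, region_id) for each observed somatic mutation
mutations = [(1, "promoter_A"), (2, "promoter_A"), (3, "promoter_A"),
             (1, "enhancer_17"), (4, "enhancer_3")]

n_regions = 50                      # total regions screened (assumed)
rate = len(mutations) / n_regions   # average mutations per region

for region, count in Counter(r for _, r in mutations).items():
    p = poisson.sf(count - 1, rate)   # P(X >= count) under uniform chance
    if p < 0.05 / n_regions:          # crude Bonferroni correction
        print(f"{region}: {count} mutations across tumors, p = {p:.2g}")
```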

This is interesting even beyond the tentative nature of the paper itself.  One might speculate that even when protein variation is responsible for the cell's initial transformation to found a tumor, the subsequent aspects of growth, metastasis, drug resistance and so on may well be due to changes in the regulatory behavior of the cells descended from the original tumor.

It is theoretically obvious, and specifically well documented, that different parts of tumors contain different somatic-origin mutations.  This paper suggests that classical genes are not the only place to look for such variation.  Searching the 'noncoding' parts of the genome, which make up the vast majority and are still largely not understood, will be daunting.  How complex, how unique to individuals, and how tractable this approach will turn out to be is hard to predict.  But as we've noted recently here on MT, the evolution of cells within a given individual's lifetime is comparably (or more) complex than the evolution of individuals in a species.  Documenting this variation in adequate detail may require very different sorts of methods, but the story is surely going to be interesting.  How well it aids therapy is another story entirely.

Tuesday, November 11, 2014

Genomics: finding a lid to fit one's kettle?

I recently read Laurence Sterne's Tristram Shandy, and this led me to begin re-reading one of the books that was a precursor to the jumbled, chaotic, but often hilarious adventures of Tristram, namely, the 16th century Gargantua and Pantagruel by Francois Rabelais.  In the Preface to Book I, I noticed that Rabelais spoke of one Friar Lubin, who went to great lengths "to find a lid to fit his kettle."



The context was Rabelais' argument that retrospective meanings were often assigned, by presumed sages, to the works of classics such as Homer and Ovid.  Rabelais' idea was that these authors wrote wonderful stuff, but afterwards scholars combed through it to find subtle meanings that were never really there.  The relevance to science, as I interpret this, is the widespread, perhaps quite natural tendency for investigators convinced that something is true to force results into an interpretation consistent with that conviction.  Could this lead geneticists to make more of a specific mapping-based genome location than is really there, and thus to be distracted from functions that might be more important?

A century of work has found many normal and disease traits that are tractably genetic in the sense that one or at most a few, or a choice of one among a few, identifiable genetic loci are responsible.  But simple genetic causation is far from what is being routinely promised as the more general case, especially for the common, important complex traits that are the major public health problems, and a main target of genomics today.  For those traits, rather than single genes, tens to even thousands of different genome regions are being found to have statistically detectable association with the traits, usually detectable only in huge samples and/or with individually very small effects.  Yet such findings are commonly claimed as triumphs, and the investigators go to great lengths to find in them a lid to fit their kettle.

One possible current example of this kind of Procrustean approach is a paper in the current issue of Cell ("Lessons from a Failed γ-Secretase Alzheimer Trial," De Strooper, Cell, Nov 6, 2014).  Protein complexes called γ-secretases have been thought, on various criteria related to the amyloid plaques associated with Alzheimer disease, to be likely targets for therapeutic inhibitors, but a directly relevant drug trial found some negative consequences and failed to find the hoped-for positive effect.  The author of the Cell paper argues that this 'No' actually means 'Yes' if the research is just allowed to continue:  "This pessimism is unwarranted: analysis of available information presented here demonstrates significant confounds for interpreting the outcome of the trial and argues that the major lessons pertain to broad knowledge gaps that are imperative to fill."

De Strooper presents a vigorous and technically specific set of arguments, and he may be right, of course; but even if so in this case, No-means-Yes arguments are seen rather more often than any actual beef.  It's easy to make fun of this, and indeed, if there is good plausibility evidence, a single study, especially one with statistically based inference, may not be a definitive refutation of an idea.  If one has what seems like a good idea, it is natural and right not to give up on it too easily.  But the frequency of this persistence, and the rather typical lack of strong follow-up confirmation, at least raises serious questions about our criteria for inference and for giving up on an idea that isn't panning out.

If we choose in advance some significance level, say α = 0.05, as a cutoff for finding a signal, and we design a sample that according to our model should be able to detect an effect of the size we expect, but the study arrives at a p-value of, say, 0.06, then in technical terms we should abandon our hypothesis; but of course we usually don't.  We call 0.06 'suggestive' and press ahead with our hypothesis.  This seems like cheating, and in a sense it is.  But in a deeper sense, if we realize the arbitrariness of all our inferential criteria (parsimony, falsifiability, significance....), then we realize that inference is a subjective kind of collective sense of acceptance (or not) of hypotheses.  In that light, the γ-secretase Carry On Regardless attitude may not be so wrong--even if it shows that belief, not just objectivity, is important in sciences like genomics that would fancy themselves rigorously objective.
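A quick simulation shows just how arbitrary the cutoff is: replicate studies of the very same true effect scatter their p-values on both sides of it (effect size, sample size, and replicate count below are arbitrary assumptions):

```python
# Many replicate studies of one fixed, real effect: how often does p land
# under 0.05, and how often in the 'suggestive' zone just above it?
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
true_effect, n, reps = 0.4, 50, 1000   # all assumed for illustration

pvals = np.array([
    ttest_ind(rng.normal(true_effect, 1.0, n),
              rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(reps)
])

print(f"p < 0.05: {(pvals < 0.05).mean():.0%} of studies")
print(f"0.05 <= p < 0.10 ('suggestive'): "
      f"{((pvals >= 0.05) & (pvals < 0.10)).mean():.0%} of studies")
```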

Another example is the search that some investigators are making for rare rather than common variants causing disease.  In principle this makes sense, since most variants in the human genome are rare, and this may be especially true of harmful variants, because evolution (natural selection) will on average work against them.  There are various techniques for a rare-variant approach, such as finding a given gene in which different sequence variants are seen in different cases of the disease.  This is persuasive not in the sense that the nature of the specific variants themselves shows why they are pathogenic, but because multiple observations of the same gene at least suggest it might be causal.  Historically, once relatively common variants were used to map causation of some pediatric traits, like PKU or cystic fibrosis, subsequent sequencing of the gene in patients found a large variety of different variants--typically hundreds!--that are themselves too rare to generate statistical association on their own.  If we can now assume that the gene is the cause, then we can infer that the newly found mutations are causal.  That is an assumption that can be questioned, because under it many seemingly innocuous variants (e.g., in noncoding, intronic, or synonymous sites) are blamed as being causal.
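In code, the simplest 'burden' version of the different-variants-same-gene tactic looks something like this (a deliberately minimal sketch: real methods are more elaborate, and the toy genotypes are invented):

```python
# Collapse all rare variants in one gene into carrier / non-carrier status,
# then ask whether carriers are enriched among cases.
from scipy.stats import fisher_exact

def burden_table(case_genos, control_genos):
    """Each genotype is a list of rare-variant allele counts in one gene."""
    case_carriers = sum(any(g) for g in case_genos)
    ctrl_carriers = sum(any(g) for g in control_genos)
    return [[case_carriers, len(case_genos) - case_carriers],
            [ctrl_carriers, len(control_genos) - ctrl_carriers]]

# invented toy data: 0/1 indicators at three rare sites in one gene
cases    = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

odds_ratio, p = fisher_exact(burden_table(cases, controls))
print(f"odds ratio = {odds_ratio:.1f}, p = {p:.2f}")
# Note that with samples this tiny, even a large odds ratio is nowhere
# near significance -- part of why rare-variant claims are so hard to test.
```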

Another tactic for attributing cause to rare variants is to find the same variant in affected relatives, especially a parent and offspring.  This plausibly appears as high-penetrance (Mendelian dominant) inheritance.  Of course, roughly half of all sequence variants found in any parent will be found in any given offspring, but if there is functional, experimental, or other substantial reason to suspect a particular gene, or the inherited rare variant seems culpable (e.g., a premature stop codon), then such transmission would seem to be at least plausibly convincing.  This seems to be widely accepted logic, but is it right?

Fitting data to prior ideas: Procrustean beds or lidless kettles?
The answer is, undoubtedly sometimes, but probably in most cases not really.  How can that be?  The reason, if not the trait, is simple: genetic variants have their effects only in their environmental and genomic context.  If a given variant is not always seen in association with a trait, or there is no particular known functional reason to 'blame' a given genome location for the trait, then one has to ask why one can make a causal assumption.  We know from many mapping studies by now that variant-specific risks are usually very small, often detectable only in huge samples.  In other words, by far most people carrying the variant don't get the disease, so it's a tad strange to think of it as a 'causal' finding. 
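The arithmetic behind that last point is worth spelling out (the numbers are illustrative assumptions only):

```python
# Why a statistically solid but small relative risk says little about
# any individual carrier.
baseline_risk = 0.01   # assumed lifetime risk in non-carriers: 1%
relative_risk = 1.2    # assumed GWAS-scale effect

carrier_risk = baseline_risk * relative_risk
print(f"risk in carriers: {carrier_risk:.1%}")                        # 1.2%
print(f"carriers who never get the disease: {1 - carrier_risk:.1%}")  # 98.8%
```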

Even in genes where known variants with clearly strong, unquestioned effects are widely accepted (e.g., major mutations in the BRCA genes in relation to breast cancer), the risk estimated from samples is neither 100% nor similar across cohorts.  Something differs among affected carriers of these variants, and that is context.  The context is either environmental or genomic.  In the case of BRCA, the genes are thought to function to detect mutations in the cell and stimulate their correction, or to kill the cell.  Their role in cancer is that, in a meaningful sense, the tumor is caused by other variants in the genome, not BRCA itself.  But lifestyle factors somehow seem clearly also to be involved--how would that be if BRCA is a mutation-repair gene?

Parent-offspring transmission of rare variants certainly may indicate that they play some role in the outcome, but it's possibly (perhaps likely?) because of other genetic (or environmental) co-conditions in the individuals.  Offspring inherit much besides a single variant from their parents, after all. 

Various studies, based on DNA sequence analysis, have by now shown that we each typically carry around tens or more defunct or seriously damaged genes.  The variants may be pathogenic in some individuals but not in others.  The reason again must be context, that is, something other than the gene itself.  If not, it is some probabilistic aspect of causation about which we can usually only speculate (or assume without even a guess about mechanism)--or simply use 'probability' as a fudge factor to make our story seem scientifically convincing.

Weak signals may not be low fruit, but pointers to elsewhere
Ironically and oddly, finding rare variants in various individuals, or finding variants common enough to generate a statistically significant association but with only low relative risk, may mainly mean that the bearers also carry other risk factor(s) that made the target variant 'causal' in the few observed cases.  Most of the time--in most contexts--there seems to be no excess risk, or else one would expect the gene to be easily identified even in modest samples, as CF and PKU and many other traits were.  Small effect is what small relative risks mean, and small relative risks are by far the rule in mapping studies.

Indeed, claiming success by forcing the conclusion that the identified gene is 'the' cause in these individuals, even in parent-offspring pairs, may be another way of finding lids to fit investigators' kettles.  Again, if there is a conclusion, it might better be that when small-effect variants are found, it is the context of the rest of the genome (plus the life experience of the cases) that is as key to understanding the trait as the target 'hit' site itself.  The discovered hit may be involved, but mainly as a pointer to some other factor(s) that really account for the effect.  If the identified gene itself is so important, why do we identify only a few rare variants in that gene associated with risk, even if transmitted in families?  That is, why don't we see some higher-frequency mutations in the same gene, as we do with many of the other largely single-allele traits?


These questions apply even to those who argue that finding these cases, the 'low hanging fruit' as such things are often called, is a worthy objective that we can attain, even in the face of complexity.  Of course, there are population genetic (evolutionary history) reasons why this may be so, since variant frequencies are affected by chance among other things.  And finding a cherry is not evidence against its involvement.  When an inactivating variant is found to be transmitted, this is certainly plausibility evidence worth following and, after all, many single-gene disorders have been identified once there was a clear-enough trail to follow.  Still, even knockout-mouse confirmations are not always definitive support by any means, and as we noted above, healthy people may harbor as many 'bad' genetic variants as those affected.  But if the finding is confirmed, then of course therapeutic approaches can be contemplated.

However, the great lack of clear therapeutic consequences from the vast majority of GWAS-like findings is consistent with the idea that the target site is in truth mainly pointing us to other things, which are what we really need to know about.  That is, thinking of the picked cherry as really causal may be a mistaken way to interpret genomic data, even if the cherry is a small part of the story.  If this is being too critical, then it is only to match the predominant view, which is being too promotional.

An upside of these ideas could be to lead investigators to take the context-dependent aspect of such findings more seriously and see what else may be accompanying the rare variant in question, or what it may interact with.  There are, of course, efforts to do this, but it is not an easy problem, because such follow-ups lead back into the web of complexity; but perhaps using these situations as entry points we can find some order there.

To make more of it than that may suggest that oftentimes we've decided ahead of time what sort of kettle we have, and will fit whatever lids we find to it.

Monday, November 10, 2014

Dragonflies and innate understanding of physics

"The human mind possesses a basic probabilistic knowledge."  So say Fontanari et al. in a newly published paper in PNAS ("Probabilistic cognition in two indigenous Mayan groups").  They asked whether formal schooling was a necessary foundation for a sense of chance by comparing two unschooled Mayan groups with Mayan schoolchildren and a control, and determined that no formal education is required for making "correct probabilistic evaluations."

This paper hit the popular news media.  "We are all natural bookmakers," said New Scientist.  And the senior author, Vittorio Girotto, said,
"We wanted to show that this sense of chance exists, that it is universal, and that you do not need to be trained to evaluate uncertainty," says Girotto. "We have good evidence now that the human mind does possess this ability."
Researchers have also reported that infants have a sense of "intuitive physics," seemingly being born with the ability to understand gravity (that is, by 2 months of age, they expect an object to fall -- really, who understands gravity?), and to expect that an object doesn't cease to exist when hidden from view.

And, studies (e.g., here and here) suggest that by 5 or 6 months, infants have a sense of numbers.  But then, so, apparently, do non-human primates, such as tamarins.  When two objects were hidden behind a screen, tamarins expected to see two objects when the screen was lifted; when there were three, the animals looked at the objects longer than when they were presented with the expected number, suggesting surprise or confusion.  But then, dogs are good at playing Frisbee because they understand physics, too -- what goes up there, comes down here.

And, even crows understand water displacement, knowing that if they raise the water level in a small beaker, they'll be able to pluck out a piece of floating food.



And look at how bats, and even dragonflies, track their in-flight prey.



There seem to be several things going on when studies like the one on probability in unschooled Mayans make the news.  To those of us who are schooled, probability, mathematics, physics -- or even grammar -- can seem like rather esoteric subjects that take years of training to master, or even to vaguely understand (though really, who understands probability?).  Traditional schooling has divided the world we know into disciplines that have names and bodies of knowledge that must be mastered.

But, in large part, formal education is giving names to things we already knew.  We have already internalized grammar as infants, we have a grasp of essential physical or mathematical principles, and it seems some basic understanding of chance as well. Essentially we're formalizing our description of the world we know from experience, but clearly we -- and dogs and tamarins, and crows and dragonflies and many other animals -- know that world before we know words or equations or models or principles that describe it.  And indeed, most animals never get to that stage.  I think we all can do this not because we have an innate sense of physics, or grammar, but because our brains have evolved to be able to recognize some kind of order, and to make generalizations from what we experience.  It's apparently important to survival, because so many organisms have evolved the same ability.

In this context, we should keep in mind that mathematics is just an elegant way of describing relationships, and really exists only because over the millennia humans did in fact realize that relationships had regularity.  The fact, long known to western science, that the Mayans had very sophisticated calendars shows that the recent news story is no surprise at all -- indeed, it would be very surprising were it not so.  How things work in the brain is, however, a different order of question.

Holly elegantly suggests it simply comes down to pattern recognition.  Frisbees follow predictable arcs, objects don't disappear inexplicably, if there are 5 yellow tokens and only 1 red, the chance of choosing a yellow one is higher than the chance of choosing a red one.  I'm happy with that.