Tuesday, February 18, 2014

The Ape In the Trees

I'd guess that few people outside of primate paleontology are familiar with the story of The Ape in the Tree. It's one of those stories in paleontology that never gets old, even if you've heard Alan Walker, one of the story's main unravelers and my Ph.D. advisor, tell it many times. Now, thanks to a new discovery by our team, out today in Nature Communications, we've got a new story to tell alongside it: The Ape in the Trees.

Alan Walker in the mid 80s excavating at the Kaswanga Primate Site, Rusinga Island, Kenya
Because it's a juicy one, let's recount the ol' Ape in the Tree first.

The "Ape" refers to Proconsul, a genus of fossil hominoids from the earliest part of the ape radiation. Proconsul remains are preserved in deposits dating to the early part of the Miocene epoch at sites in Kenya and Uganda. These animals weren't brachiating, suspending themselves, or knuckle-walking in the trees or on the ground the way extant gibbons, siamangs, orangutans, gorillas, chimpanzees, and bonobos do. They were what we call generalized arboreal quadrupeds, moving about the trees with relatively slow but strong grasping of their hands and feet, and without the aid of a tail for balance. Body size estimates across the few known species range from 9-90 kg (20-200 pounds), but little in the way of functional anatomical differences (which translate into behavioral differences) is apparent.

Proconsul skull discovered by Mary Leakey on Rusinga Island in the late 1940s.
Mary's skull was commemorated on a postage stamp.
In the story, "Ape" specifically refers to just one partial skeleton found at R114 on Rusinga Island, Kenya.

KNM-RU 2036. Proconsul partial skeleton from site R114. The green bits are the additional parts found by Walker and colleagues decades after the initial discovery marked in blue.
Rusinga is an island in Lake Victoria, connected to the mainland by a short human-built causeway. But the lake, and hence the ability of Western Kenyan land to become surrounded by its water, is fairly recent--within the last 2 million years or so--and the lake has fluctuated greatly in depth ever since. Here's how it looks now:

© Shipman and Walker

© Shipman and Walker
20-18 million years ago, at the time of Proconsul, Rusinga Island's terrain was part of the land flanking the then-active Kisingiri Volcano, which lies just south of Mbita, out of frame of the map above. It's in large part thanks to the volcanic deposits that such wonderful preservation occurs on Rusinga.

Two fossilized grasshoppers found preserved in a deinothere (ancient elephant relative) footprint near the R114 (“ape in the tree”) site on Rusinga Island, Kenya. The one in the back is flipped upside down. © Alan Walker

Articulated bones of the right and left feet of an adult Proconsul (museum catalog no. KPS III). These ape feet remained intact upon burial and during fossilization, and were excavated roughly 20 million years later at the Kaswanga Primate Site, Rusinga Island, Kenya. © Alan Walker
People have been studying Proconsul for nearly 90 years (with the initial find at a limestone quarry in Koru, Western Kenya). And they've been studying the fossils and rocks of Rusinga Island, the site which has produced the most Proconsul fossils, for nearly as long.

Site R114 was recognized early on in the scientific history of Rusinga Island. A few curious circular deposits, of a different rock composition than the surrounding matrix (and containing interesting fossils, including Proconsul KNM-RU 2036, the blue bits on the figure above), were hypothesized to be potholes. This explanation held until the mid 1980s, when Alan Walker and his friends started poking around in the Rusinga collections in Nairobi. Walker noticed that some Proconsul bones collected from Rusinga Island by Louis Leakey and colleagues had been misidentified as belonging to other animals. This inspired a joint team from Walker's university, the Johns Hopkins University, and the National Museums of Kenya to go to Rusinga Island to re-examine sites there. The expeditions were successful. They recovered more Proconsul bones to join the partial skeleton KNM-RU 2036 (the green bits on the figure above). But what was even more exciting was their investigation of the pothole idea.

They simply (I mean that in spirit, not sweat) dug down at the edges of the main pothole, the one that had produced the Proconsul bones, and found something surprising. Instead of forming a basin shape in the ground like a good pothole, the edges actually got wider the deeper they dug.

The pothole that wasn't. (R114, Rusinga Island)
Instead of a pothole, this curious fossil-filled feature appears to be a fossil tree trunk that was filled in with sediment and fossils and then buried deep by surrounding sediment. To explain how the Proconsul and other creatures, like a rabbit and an ungulate, came to be preserved inside it, Walker and colleagues surmised that the trunk was once hollow and was used as a den by a carnivore, such as a creodont.

Figure from Alan Walker and Mark Teaford (1989) The Hunt for Proconsul. Scientific American 260(1): 82. Diagram drawn by Tom Prentiss. Also reproduced in Walker and Shipman’s 2005 book The Ape in the Tree
So there you have it... that's "The Ape in the Tree" in a nutshell... You can read much more about it, and other stories about Proconsul, in the book of the same name.


Now, onto the brand new story of The Ape in the Trees...

A team of us has been working the sites on Rusinga and nearby Mfangano Island, annually or more, since 2006. We've been fortunate to collaborate internationally and to share this work with many undergraduate and graduate students too.

Systematic survey at site R107 in 2011

We've been exceedingly lucky to have a geologist on our team, Dan Peppe, who is also a paleobotanist. That's because Rusinga preserves not just fossil tree trunks but also exquisitely preserved fossil leaves.
Figure from "A morphotype catalog and paleoenvironmental interpretations of early Miocene fossil leaves from the Hiwegi Formation, Rusinga Island, Lake Victoria, Kenya" [link]

And, as discovered at R3, a particularly rich site, by Peppe's doctoral student Lauren Michel, there are also well-preserved root systems. And these are located among fossil tree trunks, within nicely preserved paleosols (ancient soils) that happen to contain fossils of Proconsul and another primate, Dendropithecus.

Lauren Michel and Dan Peppe hard at work.
That's what we've published today in Nature Communications.

Reconstruction of site R3, the fossil forest, with Proconsul (foreground) and Dendropithecus (higher up), by artist Jason Brougham.

This time, then, the story's not about an ape skeleton actually inside a tree; it's about an ape (and other creatures) dying in a preserved fossil forest of trees. The root systems and the distances between trunks point to a closed-canopy forest, meaning that arboreal creatures like Proconsul and Dendropithecus needn't have come down to the ground to travel, and that the ground was most likely very shady, with an ecosystem that reflected it. Further, the fossil leaf morphologies point to a wet and warm forest. All consistent with what many primates, especially apes, prefer for habitat today. What's more, this habitat reconstruction jibes nicely with the behavioral interpretations we've made for Proconsul based on anatomy.

We started this long-term project at such historically well-known sites not just to find more of the ancient fossil apes they're famous for (which we have!), but because we wanted to describe the paleoenvironments in which those apes lived, died, and evolved. The findings in this paper are already more than I dared to dream would come from our work. It's some of the best evidence linking ape to habitat that we could ask for. It really speaks to the power of collaboration and perseverance.

Glorious fossils. Not too shabby of a campsite either.

*** 

To get a sense of what we do at Rusinga, here's a nice film that a crew from the American Museum of Natural History put together about our work.

Science Bulletins: Expedition Rusinga - Uncovering Our Adaptive Origins from AMNH on Vimeo.

Our work's also featured in the Hall of Human Origins at the AMNH, including a fun interactive exhibit that shows us doing many fieldwork-type things, like sieving endless piles of dirt.

And here's a peek at the ape tail loss story that Proconsul tells, which will be part of Neil Shubin's three-part series in April on PBS called Your Inner Fish.

***

The paper

Michel, L., et al. (18 February 2014) Remnants of an ancient forest provide ecological context for Early Miocene fossil apes. Nature Communications. doi:10.1038/ncomms4236

Monday, February 17, 2014

In search of scientific blunders

The current (March 6, 2014) issue of The New York Review of Books includes a review (paywall) by physicist Freeman Dyson of a new book by Mario Livio called Brilliant Blunders: Colossal Mistakes by Great Scientists that Changed Our Understanding of Life and the Universe. The book tells the story of five famous mistakes by giants of science: Charles Darwin, William Thomson (Lord Kelvin), Linus Pauling, Fred Hoyle, and Albert Einstein. But Dyson writes that after reading the book, he now sees mistakes by great scientists everywhere: Isaac Newton, James Clerk Maxwell, Gregor Mendel, himself. The thesis is that these great and greatly influential thinkers made major mistakes, but essentially because they were thoughtful about them and had good reasons relative to then-current theory, the price in reputation that they, or society, paid was slight (unlike the price of military or other real-world blunders).

Darwin
Darwin ignored Mendel and Mendel ignored Darwin.  But who cares?  They did so despite vastly important work that, well, that worked.  In a sense, their mistakes reflect the boldness of their contributions.  Dyson argues that "The greatest scientists are the best losers."

In contrast to these bold thinkers, most modern scientists, caught in the current system, are, we think, pressured to play safe, and too often that's just what they do. We have major problems facing us in terms of understanding biological evolution and causation, and as we've recently written, we personally think the name of the game today is generally just that: play safe, don't make blunders, because if you do, your continued success in the system is at risk. No tenure, no grants, no promotions. (Note, we say 'blunders', that is, mistakes or wrong guesses even if made for good reasons, not fraud or other culpable misdeeds.)

We've tried to take the basic statistical basis of modern evolutionary and genomic inference to task for clinging to conservative, established modes of thought rather than addressing what we believe are genuine, deep problems that are not (we feel) amenable to multivariate sampling statistics and their kin.

The proof isn't in the pudding!
The current peer review system is supposed to be a quality control system for grants and publications, so that work is judged fairly on merits alone. Of course, like all large institutional systems, it has its flaws. Some flaws are an easy price to pay for fairness rather than Old Boy-ism, but the system also very strongly and clearly leads to safe conventionalism. It can accommodate real innovation, but it makes innovation hard. Still, we have at least lived under the illusion that citation counts are an accurate recognition of quality.

Now, a new commentary in Science ("Peering into Peer Review," by Jeffrey Mervis) reports that this doesn't seem to be so. It describes two studies of the citation records of papers from nearly 1500 projects funded by the National Heart, Lung, and Blood Institute (NHLBI) of the NIH between 2001 and 2008. At least among the projects that were funded, there is no sign that a better priority score meant higher impact; the funded projects with the poorest review scores were cited as many times as the studies with the highest scores. Michael Lauer, head of the Division of Cardiovascular Sciences at the NHLBI, who carried out the studies, was surprised.
"Peer review should be able to tell us what research projects will have the biggest impacts," Lauer contends.  "In fact, we explicitly tell scientists it's one of the main criteria for review.  But what we found is quite remarkable.  Peer review is not predicting outcomes at all.  And that's quite disconcerting."
In recent years, 8-10% or so of grant applications have been funded, so even the funded grants with the lowest priority scores are in the top tier, and it's no surprise if they are all cited equally. But, as Mervis points out, the fraction of grants funded was much higher in the early years covered by these studies, and presumably, if peer review ranking of applications is at all meaningful, there should have been quality differences between the highest and lowest ranked grants. Another difference, one that probably should have made a difference in the quality of results, was that the lowest ranked grants generally received less funding.
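
To see why "all in the top tier" makes equal citation unsurprising, here's a toy range-restriction simulation in Python (ours, with invented numbers): even if review scores tracked eventual citations fairly well across all applications, funding only the top slice leaves so little score variation among funded projects that the within-funded correlation nearly vanishes.

```python
# Toy simulation of range restriction; all numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
score = rng.normal(size=n)                    # hypothetical peer-review score
citations = 0.7 * score + rng.normal(size=n)  # impact partly driven by merit

funded = score > np.quantile(score, 0.90)     # fund only the top ~10%
corr_all = np.corrcoef(score, citations)[0, 1]
corr_funded = np.corrcoef(score[funded], citations[funded])[0, 1]
print(f"correlation, all applications: {corr_all:.2f}")     # ~0.57
print(f"correlation, funded only:      {corr_funded:.2f}")  # ~0.28
```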

If citation counts actually reflect quality, this indicates that a whole lot of good research is not being funded these days -- which of course is no surprise. However, this study highlights something else about citations, too, if in an oblique way: we're not talking about a lot of citations. According to the commentary, each publication, whether from a grant with a top, middle, or lower tier priority score, was cited 13 or 14 times. To us, that's not high impact. Perhaps the equal paucity of citations should be of more concern than the fact that high and low scoring grants are equally cited. Perhaps this shows the unsurprising arbitrary aspect of grant ratings and reviews, but it also shows something that not many people talk about, which is that most publications are mostly ignored. This isn't a surprise either, given the number of papers published every year, but it should probably be more sobering than it seems to be.

Or could something beyond these issues be going on?

There is more than bad luck here--much more!
Perhaps if we were really funding scientific quality--innovation, creative thinking, and really bellying up to core, serious problems--we should expect a strong negative correlation. The top scores should go to deep, creative proposals that, almost by nature, are likely to fail. These would not be crank or superficial proposals, but seriously good probing of the most important (and hence truly highest priority) research. Most of us know very well that being really creative is a sure way not to be funded, and that is widely (if privately) acknowledged: write a safe, incremental proposal, promise major results even if you know that's unlikely, and then use the money to do something you really think is new.

If not a strong negative correlation overall, at least there should be a few really new findings that got lots of citations and recognition while most top-scoring projects went nowhere. That, at least, could be an indicator that we're funding the best real science.

One wonders how, or if, the Large Hadron Collider team would have reported their very large experiment if they had not found evidence for the Higgs Boson--or, are they forcing a positive interpretation out of their statistical data? We can't judge, and the power of wishful thinking is always there, though we think a clear negative would have been reported as such.

One can say that much of today's science is going nowhere or littlewhere (GWAS and much else), though the 'negative' results are often the safe, highly publishable, and lauded results that a big-data study is guaranteed to find. The first few times no real mapping results were achieved, and we saw that polygenic traits really did turn out to be polygenic, we might have counted that as a major positive finding: a theory, even if a very generic one, was confirmed. Of course, a GWAS study may make a big strike, but more typically, instead of saying 'our GWAS basically found nothing, showing that our idea was wrong and some other idea is needed', even a big strike-out is made into a big story as if there were a real lesson learned; minor hits are lauded. Indeed, empty big-data results are usually not viewed as the mark of a failed theory, but are highly published and publicized to justify even bigger studies of the same type.

The kind of busts that regularly result in the safe market that predominates today are not really tests of any theory, and so do not come up with definitively empty results. They are not high-quality blunders. And even these top-ranked, generally safe studies show no relation between priority scores and citations.

If what we want is safe science, in areas where we just need routine work on some new project, even if it's important--such as a universal flu vaccine or data on obesity and disease--then we could establish an Engineering Science funding pool and build a firewall between that highly relevant but specific, goal-oriented work and basic science. Nominally, NSF should be the place for the latter and NIH for the former, but that's not how things are today, because both have become the social welfare net for universities and their faculty. Good work does get done, but a lot of trivial work is supported even if it's of good technical quality, and the pressure is to do that, while actual innovation is in a sense punished by the conservative nature of the system.

It would be terrific if the very best science tested real ideas that went nowhere.  Then we could encourage development that would lead to more blunders, and the correlated emergence of more Darwins and Einsteins.

Friday, February 14, 2014

Let's shake on it!

Well, folks, as to how your research money is being spent:

Here's the Bulletin of the Week (or, at least, Day) from Nature Heredity, not your usual source of such earth-shattering news. Nature Heredity reports that a GWAS twin study finds no genetic evidence for handedness.* Since GWAS is the Ptolemaic theory of our time (see yesterday's post and comments), this latest epicycle of not-to-be-questioned methodology is not just a shocker, but a confirmation (of something). This must be at least as much so as yesterday's Big Story, that a genome sequence from a pre-Columbian US skeleton shows that Native Americans really and truly were--descended from Asians!

Let's shake on it--it's a big deal...isn't it?
That the handedness story was published at all, and then given such play, will certainly reinforce the idea that no 'paradigm shift' is currently needed in genomics or evolutionary biology. This will bring comfort to hundreds of investigators and thousands of their graduate student trainees learning truth by doing their masters' work for them. Some might have been worried that at least a few people have questioned the current ability of our science to provide definitive genetic answers to every question we might think to ask about any trait of vital societal importance.

Actually, what this study really shows, once and for all, is a fundamental fact about modern genetics:

The left hand doesn't know what the right hand is doing!**


--------------------------
* We add a caveat that the study itself suggests. The authors were able to analyze a mere 3940 twins, so they could only detect genomic single-SNP effects with an odds ratio of around 2.0 or greater; they could not detect sites with an OR of 1.2 or less. The point, of course, is to provide an advance justification for anyone to propose a gargantuan meta-analysis of, say, 100,000 twins, to detect the grains of sand that contribute to this vitally important trait.
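
For readers who want to see where numbers like those come from, here's a minimal power sketch in Python. It assumes a simple allelic case-control test at genome-wide significance (alpha = 5e-8), a minor allele frequency of 0.2, and roughly 10% 'cases' (left-handers) among the 3940 subjects--all of those are our assumptions for illustration, not the study's own analysis.

```python
# Back-of-the-envelope power for a single-SNP allelic test.
# MAF, alpha, and the case/control split are assumptions, not the study's.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

def snp_power(odds_ratio, n_cases, n_controls, maf=0.2, alpha=5e-8):
    """Approximate power of a two-sided allelic case-control test."""
    p0 = maf                                            # allele freq in controls
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # implied freq in cases
    es = proportion_effectsize(p1, p0)                  # Cohen's h
    # Each person contributes two alleles to the allelic test.
    return NormalIndPower().power(es, nobs1=2 * n_cases, alpha=alpha,
                                  ratio=n_controls / n_cases,
                                  alternative='two-sided')

for odds_ratio in (2.0, 1.5, 1.2):
    print(f"OR {odds_ratio}: power ~ {snp_power(odds_ratio, 394, 3546):.2f}")
# Roughly: an OR of 2.0 is detectable (power near 1); an OR of 1.2 is hopeless.
```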

** And apparently the right hand doesn't know what the right hand is doing here!  This story is in Heredity, not Nature!  Our mistake, thinking the BBC only picks up Nature/Science stories!

Thursday, February 13, 2014

Even more on posing the well-posed....

Here's another installment on the idea of well-posed questions in evolutionary and biomedical genetics, and related areas of epidemiology. We wrote about why the kinds of questions that these fields have been able to ask about causation have changed, and why well-posed questions are less and less askable now. We've tried to deconstruct the questions that are being asked currently and explore why they aren't well-posed and why they can't be. Today, we think further about what it means to pose answerable questions about causality.

The causal assumption
A well-posed question can be about the fact of a numerical association, say between some specified variable and some outcome.  Or, it can be about the reason for that association.  Do we live in a causal universe?  This may seem like a very stupid question, but it is actually quite profound. If we live in a causal universe, then everything we see must have a cause: there can be no effect, no observation, without an antecedent cause--a situation that existed before our observation and that is responsible for it.

Actually, some physicists argue from what is known of quantum mechanics (physical processes at the atomic or smaller scale) that there are, indeed, effects that do not have causes. One way to think about causality, without being quite as strange relative to common sense as much of quantum mechanics is, is this: in a causal universe, the world must be ultimately deterministic. Some set of conditions must make our outcome inevitable! If you knew those conditions, the probability of the effect would be 1 (100%).

In that sense, the word 'probability' has no real meaning. But if, to the contrary, outcomes were not inevitable, because things were truly probabilistic, then our attempts to understand nature would or should be different in some way. The usual idea of 'laws' of Nature is that the cosmos is deterministic, even if many things like sampling and measurement yield errors relative to our ability to infer the whole, exact truth. And 'chaos' theory says that even the tiniest errors, even in a truly deterministic system, can lead to uncontrollably or unknowably large errors.
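
A minimal sketch of that chaos point, using the classic logistic map (our illustration, not from any particular study): the update rule is perfectly deterministic, yet starting values differing by one part in a million soon produce completely different trajectories.

```python
# The logistic map x' = r*x*(1 - x) with r = 4 is deterministic but chaotic.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, y = 0.300000, 0.300001   # initial conditions differing by 1e-6
for step in range(1, 51):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: x = {x:.6f}, y = {y:.6f}, |x - y| = {abs(x - y):.6f}")
# By step ~30 the two trajectories are unrecognizably different.
```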

These ideas are fundamental when we think about our topics of interest in genetics, public health, and evolutionary biology.

The statistical gestalt
A widespread approach in these areas is to see how much juice we can squeeze out of a rock with ever-fancier statistical methods (this is helped by the convenient software we have to play with).  Whether the variables are genomic or environmental or both, we'll slice and dice this way and that, and find the fine-points of 'risk' (that is, retrospectively ascertained already achieved outcomes).

Now, one may think that there really is some 'juice' in the rock: that the world is truly deterministic, and that the juice we see is the causal truth masked by all the mismeasured and unmeasured variables, incomplete samples, and the other errors we fallible beings inevitably make. Our statistical results may not be good for prediction, or may be good but to an unknown extent; but if we think of the world as causal, there are, for each instance of some trait, wholly deterministic conditions that made it as inevitable as sunrise. In defense of our hard-to-interpret methods and results, we'll say "it's the best we can do, and it will work at least for some sets of factors," and so let's keep on using our same statistical methods to squeeze the rock.

One may explain this view by saying that life is complex, and statistical sampling and inferential methods are how to address it.   This could be, and certainly in part seems to be, because many issues of importance involve so many factors, interacting in so many different ways that each individual is unique (even if causation is 100% deterministic).  This raises fundamental issues as to how we could ever deliver the promised successes (people will still get sick, despite what Francis Collins and so many others implicitly promise).  Perhaps what we're doing is not just the best we can do now, it's the best that can be done, because it's just how Nature is!

Rococo mirror and stuccowork, Schloss Ludwigsburg, Stuttgart

But we should keep in mind one aspect of the usual sorts of statistical methods, no matter how many rococo filigrees they may be gussied up with. These methods find associations (correlated presence) among variables, and if we can specify which variable occurred first, then we can interpret the earlier ones as causes and the later ones as effects. We can estimate the strength or 'significance' of the observed association. But we should confess openly that these are only what was captured in some particular sample. Even if they truly do reflect underlying causation, that causation can be so indirect as to be largely impenetrable (unless the cause is so strong that we can design experiments to demonstrate it, or find it with samples so small and cheap that our Chair will penalize us for not proposing a bigger, more expensive study).

Now, associations of this sort can be very indirect relative to the actual underlying mechanism. Why? The mix of factors--their relative proportions or frequencies--in our sample may affect the strength of association. An associated, putatively causal factor may be correlated with some truly causal, but unmeasured, factor. And the complexity of contexts may mean that causation only occurs in the presence of other factors (context-dependency).
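
Here's a toy simulation of that second point, with invented variables: a measured factor X that has no causal effect on the outcome Y still shows a strong association with it, simply because both track an unmeasured factor U.

```python
# Confounding in miniature: U is the true, unmeasured cause; X merely tracks U.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
U = rng.normal(size=n)          # unmeasured true cause
X = U + rng.normal(size=n)      # measured factor, causally inert for Y
Y = 2 * U + rng.normal(size=n)  # outcome driven entirely by U

# X "predicts" Y strongly even though intervening on X would do nothing.
print(f"corr(X, Y) = {np.corrcoef(X, Y)[0, 1]:.2f}")  # ~0.63
```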

If we do not have a theory, or mechanism, of causation, we are largely doomed to floundering in the sea of complexity, as we clearly are experiencing in many relevant areas today.  More data won't help if it generates more heterogeneity rather than enough replications of similar causal situations.  More data will feed SPSS or Plink, very fine analysis software, but not necessarily real knowledge.

This might be true, but to me it's just stalling to justify business more or less as usual. We have had a century of ever-increasing data scale and numerical statistical tricks, and we are where we are. As we and others note, there have been successes. But most of them are due to strong, replicable effects (that were or could have been found with cheaper methods and smaller, more focused studies). We're not that much better off for many of the more common problems.

Often it is argued that we can use a simplified 'model' of causation, ignoring minor effects and looking for just the most important ones. This can of course be a good strategy, and sometimes a true attempt is made to connect relevant variables with at least half-way relevant interactions (linear, multiplicative, etc.). But more often what is modeled is a set of basic regression or correlation associations between some sets of variables (such as a 'network'), without a serious attempt at a mechanism that explains the quantitative causal relationships; such a model in a sense still relies on very indirect associations (usually we just don't have sufficient knowledge to be more specific). We've posted before (in a series that starts here, e.g.) on some of the 'strange' phenomena we know apply to genetics, which mean that local sites along the genome can have their effects only because of complex, very non-local aspects of the genome. Significant associations only sometimes lead to understanding of why.

But if we have a good causal model, both the statistical associations and the nature of causation become open to understanding--and to prediction. With a theory, or mechanism, we can sample more efficiently, control for and understand measurement and other errors more effectively, select variables more relevantly, separate out human fallibility (the truly statistical aspects of studies) from causation, interpret the results more specifically, and predict--an ultimate goal--with more confidence and precision. A good causal mechanism provides these and many other huge advantages over generic association testing, which is mainly what is going on today.

Of course, it's hellishly difficult to develop good mechanistic understanding of life, even compared to controlled areas like chemistry. But that is to me no excuse for the many-layered removal between what we estimate from data now and what we should be trying to understand. Even if it's harder to get grants when we actually try to do things right.

There are some seemingly insuperable issues, and we've written about them before. One is that we fit models to retrospective data, about events that have already happened, yet we often speak of that retrodiction as if it means the models have predictive value. But we know that many risk factors, even genetic ones, depend on future events unpredictable in principle, so we can't know enough about what we're modeling. And second, if we're modeling one outcome--diabetes, say--then the risks of other 'competing' outcomes, say cancer rates, clearly affect our risks for the target outcome, and in subtle ways. If cancer were eliminated, that alone would raise the risk of diabetes, because people would live to have longer exposures to diabetes risk factors. Since we don't know about such changes in competing factors that may occur, this is not really a problem that we can build into causal or mechanistic models of a given trait--a real dilemma and, probably, a rationale for doing the best statistical data-fitting that we can. But I think we too routinely use such things as excuses for not thinking more deeply.

Of course, if the causation in our cosmos is truly probabilistic, and effects just don't have causes, or we don't know the probability values involved or how such values are 'determined' (or if they are), then we may really be in the soup when it comes to predicting things we want to predict that have many complex contributing factors.

Forming well posed questions, not easy but ....
To me personally, it's time to stop squeezing the statistical rock. Whether or not the world is ultimately deterministic, it is time to ask: what if our current concepts about how the world works are somehow wrong, or, more particularly, what if the concepts underlying statistical approaches are not of the right kind? Or what if we rely on vanilla methods chosen because they allow us to go collect data without actually understanding anything (this is, after all, rather what genomics' boast of 'hypothesis-free' methods confesses)?

Even if we have no good alternatives (yet), one can ask how we might look for something different--not while we continue to do the same sorts of statistical sampling approaches, but instead of continuing to do that. Theory can have a bad name if it just means woolly speculation or great abstraction irrelevant to the real world. But much of standard statistical analysis is based on probability theory and degrees of indirection and 'models' so generic that they are just as woolly, even if they generate answers--because the 'answers' aren't really answers, and won't really be answers, until we are somehow forced to start asking well-posed questions.

Some day, some lucky person, will hit upon a truly new idea....

Wednesday, February 12, 2014

More on well-posed questions and how to ask them

Yesterday we discussed what are called 'well-posed' questions. Science is the enterprise by which we try to answer questions we have about Nature and the world. Science is, if anything, a kind of method of investigation, a philosophy of what counts as knowledge. But it's not trivial that you can't answer a question until you've asked it, and if you haven't done that properly, well....

The history of western approaches to asking and answering questions about Nature is long and varied. In the earliest western writings, the 'classics' such as Ptolemy, Aristotle, the Pythagoreans, Hippocrates, the Euclideans, and the Epicureans, the idea was that one could observe the world and then think out theories of how the world is. There were experimental works and formal theories, like geometry, but much was done without formal or systematic ways of codifying knowledge. Science was, to a great extent, opinion. There were schools of thought populated by groups with different opinions, but little way to resolve them. Some became dominant (such as the four humors in medicine), but their epistemology--their knowledge basis--was informal.

French salon in the age of Enlightenment; Wikipedia

In the so-called 'Enlightenment' period in European history (to our chagrin, we know very little about the history of science and philosophy elsewhere in the world), the idea took hold that only by careful, systematic observation of specific, constrained instances of Nature could answers to general questions be developed. Induction based on repeated observation, reduction to basic causal elements, and formal types of reasoning (especially mathematics) were the approaches to the world. Underlying this (as in classical times before, to an extent) was the idea that Nature is a material phenomenon (perhaps started and occasionally interrupted by God, depending on one's religious views) that follows consistent processes, or 'laws'. If Nature were not consistent, it would be un-knowable, because what we thought we knew could always be excepted here, there, or anywhere.

Along with laws came notions that we now know as well-posed questions. As we said yesterday, a well-posed question is one that is available for study by proper science. Whether the Mona Lisa or pepperoni pizza is good is not a well-posed question. Among many reasons, the answer differs from person to person, without any seriously known predictability, and such answers are personal judgments that, unlike laws of Nature, can change willy-nilly.

The fallibility of well-posed questions
But, as we noted yesterday, a well-posed question should have these sorts of properties:
  1. It is structured so that a solution exists, at least in principle 
  2. Such a solution should be unique 
  3. The solution's behavior changes systematically with the initial conditions 
  4. An appropriately sufficient set of variables should be adequately measured.
But there are perhaps ways to clarify some problems that we didn't mention. For example, most statements in science rest upon definitions. Definitions of terms for things in the world can be arbitrary--we choose them as humans, assuming they apply to reality. But they need not be as applicable as they sound. So a scientific question can be clear-sounding but in fact not well-posed:

Q1:  "What is the cause of type 2 diabetes?"

Sounds fine, and it is a simple sentence, but does it meet the criteria of a well-posed problem?  And is it answerable?

For starters, one must define what is meant by 'type 2 diabetes'. The implication is that it is a state of being (assuming we're discussing humans, or at least mammals). But what 'state'? One must specify more or, as the humanities jargon would have it, 'unpack' the phrase. Some measurement is implied, but what? Or is it some outcome? If we define 't2d' to be some level of blood glucose, measured in some specific way, then we might argue that Q1 is well-posed. Indeed, we can answer it with a discussion of the biomechanism that causes high blood glucose. But that's not helpful if what we really want to know is why that mechanism has gone awry in some people.

And often what happens is that the definition changes--diagnostic glucose cutoffs are changed, or the definition shifts from glucose levels to some other trait, such as blood pressure, neuropathy, vision problems, or some set of these purported glucose-dependent outcomes. That is, we can define the trait in terms of a risk state (glucose levels) or an outcome state (kidney failure).

Blood glucose meter; Wikipedia

Now the question, no matter how simple in structure, is not so clearly well-posed. Even with a constant definition, the measure can change. We measure not blood glucose but glycosylated hemoglobin, or we use a different measuring machine. Then we face various data consistency problems, and certainly the question of whether past data can be used any longer.

Or, we could say that 'diabetes' is the state when you have a particular genotype.  That may sound strange, but in a law-like universe isn't the genotype just a form of trait, and other things like glucose level or kidney failure just secondary, indirect measures of the  'true' trait?

Q2:  "Do factors f1, f2, f3, and f4 cause diabetes?"

This version sounds better because it's more specific.  Here, let's now forget the problems with defining diabetes, taking that for granted.  We just move the same issues towards f1--f4: defining them and how they are measured.  Now we have to consider whether the question means all these factors, or just 'some combination' of them.  If the latter, then the question is not well-posed until it is re-phrased to be clear about what 'combination' means.  Presence or absence?  Some quantitative measure on the factors?  If f1 alone never leads to diabetes, is it a 'cause'?

If we define terms clearly then one might assume that 'a' solution to the question does exist, at least in principle, at least for some specific data.  But if definitions can change then the solutions will change, having no bearing on Nature but only on human culture--the way we choose to make our definitions and measurements.  One thing that is clear is that most of the time questions like Q2 are not clearly posed.

What about uniqueness?  If the answer is not unique, then the law-like assumptions about Nature are somewhat strange.  Causes should have outcomes and if you measure carefully, shouldn't you get just one outcome?  Or one cause per outcome?  But we know that's not true (in the broadest sense, there are many ways to die, but there are also many paths to heart disease, or asthma and allergic response, and so forth.)  If many different causes have purely law-like properties, then Q1 is actually not a well-posed question, no matter how simple and clear it sounds.  Q2 might be better, for example, if each possible combination of the f's has a unique outcome; but if there is a factor f5 that is not being measured, the question will not have a unique answer.

For this reason, the escape from the idea of uniqueness is to say that many different causal situations each uniquely cause diabetes. That is correct in a sense, but then even Q2 is poorly posed.

Q3:  "Do factors f1, f2, f3, and f4 cause diabetes by altering its probability of occurring?"

This is a common escape-valve way to pose the question. It's an escape valve because it introduces probabilities, which have to do with repetition of identical conditions--something that never happens. But it allows almost anyone to collect almost any sort of data and 'answer' the question, in the technical sense of getting an answer. Probability is an ethereal concept, but one that inevitably involves fractions of outcomes of a given type out of multiple outcomes, real or imagined. Or it may mean something about the fundamental way that the factors act or interact. One attractive attribute is that it can mean anything, and you don't have to specify it in clear terms. Statistical analysis yields a test, but since the outcome may or may not occur, Q3 can always have an 'answer'. If some statistical significance level is used, the answer is 'yes' or 'no', but such levels are subjective, not objective. And even if the true (but unknown) answer is 'no', we know from sampling theory that finding exactly zero effect is unlikely even when there is no actual causation involved.
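
A tiny simulation of that last sampling-theory point (our illustration, with invented numbers): even when exposure truly has no effect, the estimated odds ratio in any finite sample scatters around 1 and essentially never lands exactly on it.

```python
# Two groups with the SAME 10% outcome risk; the estimated OR still wanders.
import numpy as np

rng = np.random.default_rng(0)
odds = lambda p: p / (1 - p)
for trial in range(1, 4):
    exposed = rng.binomial(1, 0.10, size=2_000)    # exposure has no effect
    unexposed = rng.binomial(1, 0.10, size=2_000)  # identical risk
    or_hat = odds(exposed.mean()) / odds(unexposed.mean())
    print(f"trial {trial}: estimated OR = {or_hat:.2f}")  # near, but not exactly, 1
```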

So, criterion 3 for well-posed questions: does 'diabetes' change in some orderly way with changes in the test variables? Often we can't really tell, or we use some assumed patterning test to make statistical judgments (e.g., regression coefficients). Again this requires many aspects of definitions and assumptions about the nature of replicability. In particular, given what we know about genes and environmental factors, the mix of factors and of sampled contexts is unique in each study or sample. This means that we cannot actually replicate findings, and replication is one of the classical ways to check our answers to well-posed questions (or even to ill-posed questions).

Next?

In a law-like universe, we should be able to replicate exactly, at least in principle.  In the living world, we can't.  In the living world, replication is not part of evolution's deal--variation is.  In that sense, even non-replication doesn't invalidate a finding from a previous study.  So we're stuck to a great extent with approximate statistical answers to questions as we happen to pose them today, that we want to extend to the future but know that this depends on changes in the factors which we cannot in principle predict.

Statistical sampling methods and probabilistic thinking are the ways we address the ill-posed nature of many of our evolutionary and biomedical questions. Sometimes this works well enough to answer poorly-posed questions satisfactorily enough for our particular purposes.  With strong causal factors that are prevalent in the population of inference, things work and Enlightenment scientific criteria take us where we want to go.  In that case, we don't even need well-posed questions, or we don't care how they fail to measure up, because we get what we want.

But what we are facing along a broad front of evolutionary and biomedical (and agricultural) interests are poorly posed questions that are not giving us what we want.  What we tried to suggest yesterday, and we've discussed in many posts, is that what is needed are well-posed questions for which our methods and concepts of causality are apt, or questions that force us to develop appropriate methods and concepts of causality.

It's time for someone to fill in the blank:

Q4:

Tuesday, February 11, 2014

The well-posed problem problem

Yesterday Ken wrote about the Gary Taubes piece in Sunday's NYT that asked why nutrition science has failed to explain the obesity and type 2 diabetes epidemics in the US and elsewhere. As Ken wrote, "After decades of huge, expensive studies by presumably the most knowledgeable investigators doing state-of-the-art science, we know relatively little, with relatively little firmness, about what the foods we eat do for or to us." He suggested that this can be attributed in large part to the state of the research establishment and the pressure scientists feel to keep their labs churning out 'product' -- Taubes notes that there have been 600,000 papers and books on obesity in the last few decades, with little true understanding gained. The science industry relies on, and indeed encourages, more of the same reductionist thinking that hasn't worked in the past. And even if not intentionally, that discourages true innovation.

Well-posed questions
So, the research establishment is a large part of the problem. But must an unwieldy, top-heavy, conceptually conservative research establishment be inefficient? Is it possible that even a large establishment could be better at answering the questions it asks? Maybe not, but if it is at least possible in principle, perhaps the place to start is to wonder whether it's the questions being asked that are the problem.

There's an idea in mathematics that posits that to explain a physical phenomenon, the problem should be "well-posed".  That is,
  1. It is structured so that a solution exists, at least in principle
  2. Such a solution should be unique
  3. The solution's behavior changes systematically with the initial conditions
And to make this more relevant to our discussion, we'll add another criterion:
     4. An appropriately sufficient set of variables should be adequately measured.
Is it reasonable to apply this framework to the kinds of questions asked in epidemiology, as well as genetics?  Sometimes, certainly.  In 1854, John Snow asked why people living around the Broad Street water pump in a neighborhood in London were getting cholera.  His solution was that the water was contaminated.  Most people at the time believed cholera was caused by dirty air, so this wasn't a wildly popular solution.  But he was right, and when the water was no longer contaminated, people no longer got sick.   A well-posed problem with a unique solution.

John Snow's cholera map - the halcyon days of epidemiology (Wikipedia)

In the early days of epidemiology, the field could ask well-posed questions about causality, and indeed it correctly discovered the causes of many infectious diseases, as well as of diseases caused by environmental risk factors, like asbestosis or smoking-induced lung cancer or, not even that long ago, Legionnaires' disease. Genetics, too, grew up expecting to be able to ask, and answer, well-posed questions -- what causes cystic fibrosis, or Tay-Sachs disease, or Huntington's disease?

But if we ask what causes obesity, and we try to answer it with even state-of-the-art epidemiological or genetic methods, we are now in the territory of ill-posed questions. Something(s) must cause it, because it exists. But we know that everyone gets there in their own way -- no two people eat exactly the same things and expend exactly the same amount of energy on the path to obesity, so there can and will be no single answer to the question. In fact, this is true of many single-gene diseases as well -- for example, there are over 1000 different variants attributed to causing cystic fibrosis, at least in the sense that they are found in affected persons but transmitted from unaffected parents, suggesting that the combination is responsible for the trait. However, it is often found, even in purportedly simple genetic diseases, that a large fraction of people with the 'causal' allele are unaffected, as is the case for hemochromatosis (an iron-absorption disease). This version of reality can't be the answer to a well-posed question.

And therein lies a fundamental problem. If there can be so many answers to questions of causation, how do we know when we're done? Have we found the answer when we identify the cause of Mary's obesity, or is it not a solution until we've found that, in general, taking in too many calories relative to calories expended is the cause? That is, until we can generalize from one or more observations and assume we've found the solution for all cases? Have we found the cause of cystic fibrosis when we know which variant causes John's, or when we know that there are thousands of variants of the CFTR gene that can be causal? And what, then, of persons with 'causal' genotypes who are disease-free?

This isn't a trivial problem, because when we try to explain individual issues from the group level -- when we're taking the generalizations we've made from individuals and trying to apply them to people who didn't contribute to the original data set -- we're doing this without some important relevant information, and we often don't know what that is.  We're also, in effect, assuming that everyone is alike, which we know is not true.  And, when we get into the realm of prediction, we're assuming that future environments will be just like past environments, and we know that's not true.  Even then, we assign group risks probabilistically, since usually not everyone in the group (say, smokers) gets the disease.  What kind of cause are we implying?

Well, causation becomes statistical, and that can mean we don't really understand the mechanism or, often, that we are not measuring enough relevant variables. We use terms like 'cause' too casually, when what we really mean is that the factor contributes to the trait, at least in some circumstances, among those who have the factor.

So it becomes impossible to know how a group-level result applies to any given individual, either to explain or to predict disease. What we know is that smoking causes cancer. Except when it doesn't. Or, let's consider Gary Taubes' favorite poison these days: sugar. Can he actually answer the question of what causes obesity and diabetes with a single answer -- 'sugar', as he seems to believe? If he can, then it is a well-posed question. But then why is my 86-year-old father thin and diabetes-free even though he has been known to order 2 desserts following hearty restaurant meals? Indeed, he used to feed his children ice cream for dinner (we did appreciate it). And his own father had type 2 diabetes, so there's no excuse for my father to be free of this disease. Or, so far, his 3 children, no longer young, who were given those long-ago ice cream dinners. So, sugar can't be the answer to what causes obesity, though it might be an answer of sorts; but asking what causes obesity and type 2 diabetes can't be a well-posed question if it isn't going to have a unique solution in some specifiable way.

Even thoughtful people who believe GWAS have been a great success will acknowledge that it's hard to know what to make of findings of tens or hundreds of genes with small effects for many diseases. What causes X? Well, either this group of 200 genes, or that group, or that other group there. But with 200 genes, each with only (say) two different alleles (sequence states), there are 3 to the 200th power different genotypes, even if only a tiny fraction -- unknown -- of them ever actually exists in the population, much less in our particular study sample.
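
The arithmetic behind that figure is easy to check (standard combinatorics, nothing more): each biallelic gene has three genotypes (AA, Aa, aa), so 200 such genes allow 3^200 combinations.

```python
# Three genotypes per biallelic gene (AA, Aa, aa), across 200 genes.
n_genotypes = 3 ** 200
print(f"3^200 ~ {n_genotypes:.3e}")        # ~2.7e95 possible multilocus genotypes
print(f"({len(str(n_genotypes))} digits)")
```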

Add to this the environmental factors -- though we usually don't know which ones matter, or which ones are significant for which people. It is a kind of legerdemain by which we seem scientific in saying that gene X raises the relative risk of an outcome by such-and-such an amount. What do we really mean by that, other than that this was the excess number of cases in our retrospectively collected data, which we extrapolate to everyone else? Is the answer in any way clear? Or was the question about 'risk' not well-posed?

Perhaps, then, we must strike the well-posed problem criterion for understanding disease. Forget unique solutions: for all practical purposes, no solution exists when an effect is different for everyone. That is clearly so, since a unique observation can't be repeated to test the original assertion.

Or, better said, when you've got essentially as many solutions as you have individuals, or you've only got aggregate solutions, it's both hard to predict and hard to know how to apply group solutions to individuals. We're all told to lower our LDL cholesterol and raise our HDL, which seems to lower heart disease risk -- except when it doesn't. And this is something that can't be predicted because, well, because it depends. It depends on many unknown and unknowable factors; in many, many cases, causation is not reducible.

The best that epidemiology has come up with for determining whether a theorized cause really is one is a set of criteria pulled together long ago, the Hill criteria, a list of 9 rules that looks nice on the face of it: the larger the association between a factor and an effect, the more likely the factor is the cause; the association should be consistent; it should seem plausible; and so forth. The only problem is that only one of these criteria always must be true -- the cause must precede the effect. So, in effect, we've got a branch of science that sets out to find the causes of disease but has no clue how to tell when it's done.* Never being done is nice for keeping the medical research establishment in business, but not so nice for the enhancement of knowledge.

A colleague and friend of ours, Charlie Sing, is a genetic epidemiologist at Michigan.  He has for decades tried to stress the clear fact that it is not just a 'cause' but its context that is relevant.  One gene is another gene's context in any given individual, and ditto for environmental exposures.  Without the context there is no cause, or at least no well-posed cause question.  If the contexts are too complex and individual, and not systematic, or involve too many factors, we're in the soup.  Because, it depends.

Where do we go from here?
So, what's the bottom line? We don't know what questions to ask, we don't know when we've answered them, and we don't know how to do it better. Our questions, and even many of our answers, sound clear but are in fact not well-posed. For reasons of historical success in other fields, we're stuck in a reductionist era, looking for single explanations for disease, but that's no longer working. It's time to expand our view of causation, to understand what it means that my father can eat all the desserts he wants and never gain weight, while that seems not to be true for a lot of people. But it is fair, indeed necessary, to ask: what are we supposed to do if we reject reductionism, which has stood the test of at least a few centuries' time?

Gregor Mendel, who asked well-posed questions, and answered them

It seems likely that we were seduced by the early successes of epidemiology with point-causes with large effects -- infectious diseases -- and we were similarly seduced by Mendel's carefully engineered successes with similar point causes -- single genes -- for carefully chosen traits, but these are paradigms that don't fit the complex world we're now in.  What Mendel showed was that causal elements were inherited with some specifiable probability, and he did that in a well-posed setting (selective choice of traits and highly systematized breeding experiments).  But Mendel's ideas rested on the notion that while the causal elements (we now call them alleles) were transmitted in a probabilistic way, once inherited they acted deterministically.  Every pea plant with the same genotype had the same color peas, etc.  We now know that that's an approximation for some simple situations, but not really applicable generally.

We are really good at gathering and generating data.  Big Data.  Making sense of it is harder.  The casual assumption is that we can, do, and should, ask well-posed questions, but at present perhaps we no longer can.  When every person is different, when there can be more than one pathway to a given outcome (and defining outcomes itself is problematic), and when the pathways are complex, finding simple answers in heterogeneous data sets isn't going to happen.  And finding complex answers in heterogeneous data sets is something we don't really know how to do yet.  And again, generalizing from individuals to groups and back again adds its own layer of difficulty because of just what our friend Charlie says -- the answers are context dependent, but you've lost the context.

There have been a number of attempts to form statistical analyses that take many variables into account. Some go site by site along the genome and sum up the estimated risk for the variants found at each site, then account for other variables such as sex and perhaps age and others, maybe even some environmental factors. The basic idea is that context is identified as combinations of factors that together have a discernible effect that the variables don't have on their own.
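
As a concrete illustration, here is a minimal sketch of that kind of site-by-site score in Python. It is a generic polygenic-score-style sum with invented weights and covariate adjustments -- not any specific published method.

```python
# A generic polygenic-score-style sum; weights and covariate effects invented.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_sites = 1_000, 200

genotypes = rng.integers(0, 3, size=(n_people, n_sites))  # 0/1/2 risk-allele copies
weights = rng.normal(0.0, 0.05, size=n_sites)             # per-site log-odds (hypothetical)
sex = rng.integers(0, 2, size=n_people)
age = rng.uniform(30, 80, size=n_people)

# Site-by-site sum of estimated risk, then a crude covariate adjustment.
log_odds = genotypes @ weights + 0.3 * sex + 0.02 * (age - 50)

# Turn the summed log-odds into a per-person risk probability.
risk = 1.0 / (1.0 + np.exp(-(log_odds - log_odds.mean())))
top_decile = np.sort(risk)[-n_people // 10:]
print(f"mean risk in the top decile of scores: {top_decile.mean():.2f}")
```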

This kind of statistical partitioning may help identify very strong risk categories, such as responses to some environmental factor in people with some particular genotype. But in general everyone is at some risk, and we as a society will have to ask how much detail is worth knowing -- or whether, when too many variables constitute exchangeable sets (that is, sets with similar risk), its usefulness diminishes. Will it be expected to tell us how many cigs and cokes and Snickers bars we can consume, and how many minutes to spend on the treadmill, for our genotype, sex, and age, an apple a day and all, to keep our risk of, say, stroke to less than some percent for the next 10 years? Or is that not a very useful kind of goal -- or a well-posed question? Time will tell.

An unfair rebuke
We are often chastised for being critical of the current system because, while what we say is correct, we haven't provided the magic what-to-do-instead answer. That is little more than a rebuke of self-interested convenience, because if our points are correct then the system is misbehaving and needs to change, rather than just keep the factory humming. The rebuke is like a juror voting to convict an accused, even without any evidence against him, because the defense hasn't identified the actual guilty person.

That we can't supply miracles is certainly a profound limitation on our part.  But it isn't an excuse for still looking for lost keys under the lamp-post when that's not where they were lost.  If anything, our certainly acknowledged failure should be a stimulus to our critics to do better.

If we can't expect to ask well-posed questions any longer, what kinds of questions can we ask?


--------------
*This is somewhat hyperbolic, because with infectious diseases, or environmental factors with large effects -- coal dust and Black Lung Disease, asbestos and asbestosis, e.g. -- it is possible to know when we've found the cause, given experimental data and overwhelming evidence from exposure data. But it's certainly true for the chronic, late-onset diseases that are killing us these days.

Monday, February 10, 2014

Eat, drink, and be .... confused!

Is beer good for your health or is it a slow killer?  What about, say, bread, broccoli, wine, eggs, or (dare we say it?), sex?

Beer; Yebisu Beer Museum, Wikipedia

The answers are:
1.  Yes!  Anything we consume will keep us alive, but that just facilitates the path to our final end.  Metabolism and conjugal energy expenditure generate waste products, heat, cell division, and so on.  That leads to death!

Or 2.  Nobody knows!  After decades of huge, expensive studies by presumably the most knowledgeable investigators doing state-of-the-art science, we know relatively little, and with relatively little firmness, about what the things we eat do for or to us.

In a Times article on Sunday, nutrition journalist Gary Taubes excoriates the nutrition research mill -- properly so, we think -- for decades of generating ever more numerous studies in nutritional epidemiology without garnering many firm or important conclusions.
The 600,000 articles — along with several tens of thousands of diet books — are the noise generated by a dysfunctional research establishment. Because the nutrition research community has failed to establish reliable, unambiguous knowledge about the environmental triggers of obesity and diabetes, it has opened the door to a diversity of opinions on the subject, of hypotheses about cause, cure and prevention, many of which cannot be refuted by the existing evidence. Everyone has a theory. The evidence doesn’t exist to say unequivocally who’s wrong.
Even the basic questions about foods and eating are not being answered with much rigor; even the common-wisdom recommendations related to obesity, dietary abuses, and so on are often on quite shaky ground.  This is a huge research establishment that has its hands on the funding agencies and is not being held accountable for delivering actual goods commensurate with the public investment.

Why can't we figure out the relationship of nutrition to disease in so many cases?  Largely because we've got a reductionist science that is good at finding causes of disease that have large effects -- smoking, asbestos, the agents of infectious diseases -- but lousy at explaining diseases that are due to gradual exposure to multiple interacting factors over decades.  And when some people exposed to what looks like a risk factor -- obesity, say -- develop diabetes but others don't, or when lots of sugar in the diet seems to be associated with obesity in some people but not in others, our methods really fail us.  We see the exact parallel in genetics, as we've written many times.  Indeed, does DDT cause Alzheimer's disease, a result we blogged about just last week?

Also on Sunday, the BBC reported that vitamin C is an effective treatment for cancer.  How many times do we still have to hear more of this this-finally-is-true conjecture about vitamin C?  It's been going on for many decades.  The current story appeared in a journal called Science Translational Medicine.  The very phrase 'translational medicine' reflects our rather bourgeois industrialization of the research system, with its business and status basis.  It is a self-congratulatory catch-phrase with cachet, one that suggests that biomedical research in the past had no interest in preventing or combating disease.  It suggests that grants were not given by NIH for such purposes (of course, NIH does fund a lot of research that's irrelevant to health), which is just plain silly.  Or, rather, as we have often suggested, it's an establishment's typical way of making itself sound salubrious to the taxpayer we milk for our careers.  So why is there even a Science Translational Medicine journal?  Because Nature has one?  Because there might be advertising or subscription gains to be made?  Because each science publisher has to keep up with the perceived proverbial Joneses?  Because the bloated professoriate clamors for ever more status-sounding places to publish its work?

What we're doing in the biomedical and health research establishment is, to a great extent, ever more of the same kinds of studies, only bigger and with more costly and sophisticated hardware and software.  Anyone with a computer can get SurveyMonkey software and design a questionnaire.  If you've got a degree in public health, you know how to hire a bunch of nurse-interviewers, phone-callers, data-base miners and the like, and send them out on the streets to draw various sorts of random samples; test and quality-check questionnaires in a standard way; then increase the samples, hold lots of meetings, employ data-enterers, and after a few years start using push-button statistical software to pour out papers (and contact the Times and BBC 'science' journalists to trumpet your work).  And, every year or so, write new grants to follow up your important 'translational' research.

Is there anything new here?
The state of play is well known, at least to the thoughtful contingent of researchers in the game.  Taubes, who has been guilty of simplistic advocacy himself -- as even he acknowledges, by confessing his personal preference for sugar as the one-size-fits-all evil -- clearly identifies the nature of the problem.  He doesn't really offer a solution.

We can't either, but we do say that what is not being done is making better use of our wetware: our brainpower.  Taubes' story was about nutrition research; we have been harping on the same issues with respect to genomics, and have also critiqued epidemiology (including nutritional epidemiology) in past posts.

There is no magic answer for ginning up real insight and creativity.  But there may be ways of numbing or mesmerizing the part of society that might produce creativity, and a huge factory-like establishment of drone workers who need the factory to keep spewing out 'product' may have just that effect.  Our own idea is that part of what is needed is at least to make the soil -- the research ecology -- more likely to engender innovation.  That would mean down-sizing, slowing down, thinking more, and returning research to being more of a profession than an industry.  There are too many professors, too pressured to grind out too many papers or hustle too many grants to keep the administrators and careerists happy.  Too many administrators who, dependent on the cream, need to keep the factory humming.  Time to think, or re-think, or be inspired creatively by odd facts is hard to come by when the pressure is to get grants or lose your job.

Better synthetic, rather than narrowly technical, education is needed, but we haven't been generating the kinds of people who can teach it.  Instead, we have trained a body of academic professors who have been brought up in and entrained by the system, who depend on it, and who hence perhaps can't see, or can't afford to see, what is actually happening.  But despite at least some people pointing out the problems, there is nothing on the horizon that seems able to stimulate real change.

All of this is true.  It is also true that the problems are difficult, that many if not most faculty have a sincere drive to help public or individual health, that high technology is at least somewhat effective and more than just a set of expensive showy toys, and that administrators are needed for big, expensive systems.  The methods are often canned in ready-made software, but that doesn't make them wrong, even if we are clearly thinking wrongly in some way.  Until we're shown better, we may act like fad-following sheep, but we do so because that's what we, collectively, know how to do.  And until our jobs and self-esteem are no longer constructed around the impatient, short-term factory mentality, one cannot expect us to act very differently.

We do, after all, need to eat, drink, and try to be merry, for tomorrow we die whether or not we like to accept that.