Tuesday, March 29, 2016

Statistical Reform.....or Safe-harbor Treadmill Science?

We have recently commented on the flap in statistics circles about the misleading use of significance test results (p-values) rather than a more complete and forthright presentation of the nature of the results and their importance (three posts, starting here).  There has been a lot of criticism of what boils down to misrepresentative headlines publicizing what are in essence very minor results.  The American Statistical Association recently published a statement about this, urging clearer presentation of results.  But one may ask questions both about that statement and about the practice in general. Our recent set of posts discussed the science.  But what about the science politics in all of this?

The ASA is a trade organization whose job it is, in essence, to advance the cause and use of statistical approaches in science.  The statistics industry is not a trivial one.  There are many companies that make and market statistical analytic software.  Then there are the statisticians themselves and their departments and jobs.  So one has to ask: is the ASA statement, and the other hand-wringing, sincere and profound, or, to what extent, is this a vested interest protecting its interests?  Is it a matter of finding a safe harbor in a storm?

Statistical analysis can be very appropriate and sophisticated in science, but it is also easily mis- or over-applied.  It's fair to say that without it many academic and applied fields would be in deep trouble; the sociopolitical sciences and many biomedical sciences fall into this category.  Without statistical methods to compare and contrast sampled groups, these areas rest on rather weak theory.  Statistical 'significance' can be used to mask what is really low informativeness or low importance under a patina of very high quantitative sophistication.  Causation is the object of science, but statistical methods too often do little more than describe some particular sample.

When a problem arises, as here, there are several possible reactions.  One is to stop and realize that it's time for deeper thinking: that current theory, methods, or approaches are not adequately addressing the questions that are being asked.  Another reaction is to do public hand-wringing and say that what this shows is that our samples have been too small, or our presentations not clear enough, and we'll now reform.  

But if the effects being found are, as is the case in this controversy, typically very weak and hence not very important to society, then the enterprise and the promised reform seem rather hollow. The reform statements have had almost no component that suggests that re-thinking is what's in order. In that sense, what's going on is a stalling tactic, a circling of wagons, or perhaps worse, a manufactured excuse to demand even larger budgets and longer-term studies, that is to demand more--much more--of the same.

The treadmill problem

If that is what happens, it will keep scientists and software outfits and so on on the same treadmill they've been on, the one that has led to the problem.  It will also be contrary to good science.  Good science should be forced by its 'negative' results to re-think its questions. This is, in general, how major discoveries and theoretical transformations have occurred.  But with the corporatization of academic professions, both commercial and in the trade-union sense, we have an inertial factor that may actually impede real progress.  Of course, those dependent on the business will vigorously resist or resent such a suggestion. That's normal and can be expected, but it won't help unless a spirited attack on the problems at hand goes beyond more-of-the-same.

Is it going to stimulate real new thinking, or mainly just strategized thinking for grants and so on?

So is the public worrying about this a holding action or a strategy? Or will we see real rather than just symbolic, pro forma reform? Based on the way things work these days, the likelihood is that we won't.

There is a real bind here. Everyone depends on the treadmill and keeping it in operation. The labs need their funding and publication treadmills, because staff need jobs and professors need tenure and nice salaries. But if by far most findings in this arena are weak at best, then what journals will want to publish them? They have to publish something and keep their treadmill going. What news media will want to trumpet them, to feed their treadmill? How will professors keep their jobs or research-gear outfits sell their wares?

There is fault here, but it's widespread, a kind of silent conspiracy, and not everyone is even aware of it. It's been built up gradually over the past few decades, like the frog in slowly heating water who doesn't realize he's about to be boiled alive. We wear the chains we've forged in our careers. It's not just a costly matter, or one of understandable careerism. It's a threat to the integrity of the enterprise itself.
We have known many researchers who have said they have to be committed to a genetic point of view because that's what you have to do to get funded, to keep your lab going, to get papers in the major journals, or to have a prominent, influential career. One person applying for a gene mapping study to find even lesser genomic factors than the few already well established said, when it was suggested that rather than find still more genes perhaps the known genes might now be investigated instead, "But mapping is what I do!"  Many a conversation I've heard has been quiet boasting about applying for funding for work that's already been done, so one can try something else (which isn't what's being proposed for reviewers to judge).

If this sort of 'soft' dishonesty is part of the game (if you even think it's 'soft'), and yet science depends centrally on honesty, why do we think we can trust what's in the journals?  How many seriously negating details are not reported, or buried in huge 'supplemental' files, or not visible because of intricate data manipulation? Gaming the system undermines the very core of science: its integrity.  Laughing about gaming the system adds insult to injury.  But gaming the system is being taught to graduate students early in their careers (it's called 'grantsmanship').


We have personally encountered this sort of attitude, expressed only in private of course, again and again in the last couple of decades during which big studies and genetic studies have become the standard operating mode in universities, especially in biomedical science (it's rife in other areas like space research, too, of course).


There's no bitter personal axe being ground here.  I've retired, had plenty of funding through the laboratory years, and our work was published and recognized.  The problem is one of science, not a personal one.  The challenge of understanding genetics, development, causation and so forth is manifestly not an easy one, or these issues would not have arisen.

It's only human, perhaps: the last couple of generations of scientists systematically built up an inflated research community, and the industries that serve it, much of which depends on research grant funding, largely at the public trough, with jobs and labs at stake.  The members of the profession know this, but are perhaps too deeply immersed to do anything major to change it, unless some sort of crisis forces that upon us. People well-heeled in the system don't like these thoughts being expressed, but all but the proverbial 1%-ers, cruising along just fine in elite schools with political clout and resources, know there's a problem and know they dare not say too much about it.


The statistical issues are not the cause.  The problem is a combination of the complexity of biological organisms as they have evolved, and the simplicity of human desires to understand (and not to get disease).  We are pressured not just to understand, but to translate that into dramatically better public and individual health.  Sometimes it works very well, but we naturally press the boundaries, as science should.  But in our current system we can't afford to be patient.  So, we're on a treadmill, but it's largely a treadmill of our own making.

Wednesday, March 23, 2016

Playing the Big Fiddle while Rome burns?

We seem to have forgotten the trust-busting era that was necessary to control monopolistic acquisition of resources.  That was over a century ago, and now we're again allowing already huge companies to merge and coalesce.  It's rationalized in various ways, naturally, by those who stand to gain.  It's the spirit and the power structure of our times, for whatever reason.  Maybe that explains why the same thing is happening in science as universities coo over their adoption of 'the business model'.

We're inundated with jargonized ways of advertising to co-opt research resources, with our 'omics' and 'Big Data' labeling.  Like it or not, this is how the system is working in our media and self-promotional age.  One is tempted to say that, as with old Nero, it may take a catastrophic fire to force us to change.  Unfortunately, that imagery is apparently quite wrong.  There were no fiddles in Nero's time, and if he did anything about the fire it was to help sponsor various relief efforts for those harmed by it.  But whatever imagery you want, our current obsession with scaling up to find more and more that explains less and less is obvious. Every generation has its resource competition games, always labeled as for some greater good, and this is how our particular game is played.  But there is a fire starting, and at least some have begun smelling the smoke.

Nero plucks away.  Source: Wikipedia images, public domain
The smolder threatens to become an urgent fire, truly, and not just as a branding exercise.  It is a problem recognized not just by nay-saying cranks like us who object to how money is being burnt to support fiddling with more-of-the-same-not-much-new research.  It is an area where a major application of funds could have enormously positive impact on millions of people, and where causation seems to be quite tractable and understandable enough that you could even find it with a slide rule.

We refer to the serious, perhaps acute, problem of antibiotic resistance.  Different bugs are being discovered to be major threats, or to have evolved to become so, both for us and for the plants and animals that sacrifice their lives to feed us. Normal evolutionary dynamics, complemented by our agricultural practices, our population density and movement, and perhaps other aspects of our changing of local ecologies, are opening space for the spread of new or newly resistant pathogens.

This is a legitimate and perhaps imminent threat on a potentially catastrophic scale.  Such language is not an exercise in self-promotional rhetoric by those warning us of the problem. There is plenty of evidence that epidemic or even potentially pandemic shadows loom.  Ebola, Zika, MRSA, persistent evolving malaria, and more should make the point, and we have history to show that epidemic catastrophes can be very real indeed.

Addressing this problem rather than a lot of the wheel-spinning, money-burning activities now afoot in the medical sciences would be where properly constrained research warrants public investment.  The problem involves the ecology of the pathogens, our vulnerabilities as hosts, weaknesses in the current science, and problems in the economics of such things as antibacterial drugs or vaccinations.  These problems are tractable, with potentially huge benefit.

For a quick discussion, here is a link to a program on antibiotic resistance from More or Less, the BBC Radio statistical watchdog.  Of course there are many other papers and discussions as well.  We're caught between urgently increasing need, and the logistics, ecology, and economics that threaten to make the problem resistant to any easy fixes.

There's plenty of productive science that can be done that is targeted to individual causes that merit our attention, and for which technical solutions of the kind humans are so good at might be possible. We shouldn't wait to take antibiotic resistance seriously.  But clearing away the logjam of resource commitments in genetic and epidemiological research--to large, weakly informative statistical efforts well into diminishing returns, or to research based on rosy promises where we know there are few flowers--will not be easy.  We are in danger of fiddling around, detecting risk factors with ever-decreasing effect sizes, until the fire spreads to our doorsteps.

Tuesday, March 22, 2016

The statistics of Promissory Science. Part II: The problem may be much deeper than acknowledged

Yesterday, I discussed current issues related to statistical studies of things like genetic or other disease risk factors.  Recent discussion has criticized the misuse of statistical methods, including a statement on p-values by the American Statistical Association.  As many have said, the over-reliance on p-values can give a misleading sense that significance means importance of a tested risk factor.  Many touted claims are not replicated in subsequent studies, and analysis has shown this may preferentially apply to the 'major' journals.  Critics have suggested that p-values not be reported at all, or only if other information like confidence intervals (CIs) and risk-factor effect sizes is included (I would say prominently included). Strict adherence will likely undermine what even expensive major studies can claim to have found, and it will become clear that many purported genetic, dietary, etc., risk factors are trivial, unimportant, or largely uninformative.

However, today I want to go farther and question whether even these correctives go far enough, or whether they would perhaps serve as a convenient smokescreen for far more serious implications of the same issue. There is reason to believe the problem with statistical studies is more fundamental and broad than has been acknowledged.

Is reporting p-values really the problem?
Yesterday I said that statistical inference is only as good as the correspondence between the mathematical assumptions of the methods and what is being tested in the real world.  I think the issues at stake rest on a deep disparity between them.  Worse, we don't and often cannot know which assumptions are violated, or how seriously.  We can make guesses and do all auxiliary tests and the like, but as decades of experience in the social, behavioral, biomedical, epidemiological, and even evolutionary and ecological worlds show us, we typically have no serious way to check these things.

The problem is not just that significance is not the same as importance. A somewhat different problem with standard p-value cutoff criteria is that many of the studies in question involve many test variables, such as complex epidemiological investigations based on long questionnaires, or genomewide association studies (GWAS) of disease. Normally, p=0.05 means that by chance one test in 20 will seem to be significant, even if there's nothing causal going on in the data (e.g., if no genetic variant actually contributes to the trait).  If you do hundreds or even many thousands of 0.05 tests (e.g., of sequence variants across the genome), even if some of the variables really are causative, you'll get so many false positive results that follow-up will be impossible.  A standard way to avoid that is to correct for multiple testing by using only p-values that would be achieved by chance only once in 20 times of doing a whole multivariable (e.g., whole genome) scan.  That is a good, conservative approach, but means that to avoid a litter of weak, false positives, you only claim those 'hits' that pass that standard.
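To make that correction concrete, here is a minimal sketch in Python; the number of tests, the cutoff, and the Bonferroni-style adjustment are generic illustrations, not the procedure of any particular study.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tests = 10_000   # e.g., variants tested in a genome-wide scan
alpha = 0.05       # nominal per-test cutoff

# Simulate tests where nothing causal is going on:
# under the null hypothesis, p-values are uniform on (0, 1).
null_p = rng.uniform(0, 1, n_tests)

# At the nominal cutoff, roughly 5% of purely null tests 'succeed' anyway.
print("hits at p < 0.05:", np.sum(null_p < alpha))                   # ~500 false positives

# Bonferroni-style correction: demand that a whole scan would produce
# a chance hit only about once in 20 scans.
adjusted_alpha = alpha / n_tests                                      # 5e-06 here
print("hits at adjusted cutoff:", np.sum(null_p < adjusted_alpha))   # almost always 0
```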

You know you're only accounting for a fraction of the truly causal elements you're searching for; the rest are the litter of weakly associated variables that you're willing to ignore in order to identify the most likely true ones.  This is good conservative science, but if your problem is to understand the beach, you are forced to ignore all the sand, though you know it's there.  The beach cannot really be understood by noting its few detectable big stones.

Sandy beach; Wikipedia, Lewis Clark

But even this sensible play-it-conservative strategy has deeper problems.

How 'accurate' are even these preferred estimates?
The metrics like CIs and effect sizes that critics are properly insisting be (clearly) presented along with or instead of p-values face exactly the same issues as the p-value: the degree to which what is modeled fits the underlying mathematical assumptions on which test statistics rest.

To illustrate this point, the Pythagorean Theorem in plane geometry applies exactly and universally to right triangles. But in the real world there are no right triangles!  There are approximations to right triangles, and the value of the Theorem is that the more carefully we construct our triangle the closer the square of the hypotenuse is to the sum of the squares of the other sides.  If your result doesn't fit, then you know something is wrong and you have ideas of what to check (e.g., you might be on a curved surface).

Right triangle; Wikipedia

In our statistical study case, knowing an estimated effect size and how unusual it is seems to be meaningful, but we should ask how accurate these estimates are.  But that question often has almost no testable meaning: accurate relative to what?  If we were testing a truth derived from a rigorous causal theory, we could ask by how many decimal places our answers differ from that truth.  We could replicate samples and increase accuracy, because the signal to noise ratio would systematically improve.  Were that to fail, we would know something was amiss, in our theory or our instrumentation, and have ideas how to find out what that was.  But we are far, indeed unknowably far, from that situation.  That is because we don't have such an externally derived theory, no analog to the Pythagorean Theorem, in important areas where statistical study techniques are being used.

In the absence of adequate theory, we have to concoct analyses that rest almost entirely on internal comparison to reveal whether 'something' of interest (often something we don't or cannot specify) is going on.  We compare data such as cases vs controls, which forces us to make statistical assumptions, for example that our sample of diseased vs normal subjects differs only in (say) coffee consumption, or that the distribution of variation in unmeasured variables is random with regard to coffee consumption among our case and control subjects. This is one reason, for example, that even statistically significant correlation does not imply causation or importance. The underlying, often unstated assumptions are often impossible to evaluate. The same problem relates to replicability: for example, in genetics, you can't assume that some other population is the same as the population you first studied.   Failure to replicate in this situation does not undermine a first positive study.  For example, a result of a genetic study in Finland cannot be replicated properly elsewhere because there's only one Finland!  Even another study sample within Finland won't necessarily replicate the original sample.  In my opinion, the need for internally based comparison is the core problem, and a major reason why theory-poor fields often do so poorly.

The problem is subtle
When we compare cases and controls and insist on a study-wide 5% significance level to avoid a slew of false-positive associations, we know we're being conservative as described above, but at least those variables that do pass the adjusted test criterion are really causal with their effect strengths accurately estimated.  Right?  No!

When you do gobs of tests, some very weak causal factor may by good luck pass your test. But of those many contributing causal factors, the estimated effect size of the lucky one that passes the conservative test is something of a fluke.  The estimated effect size may well be inflated, as experience in follow-up studies often or even typically shows.
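A small simulation can make this concrete; the following Python sketch of the effect-size inflation described here (often called the 'winner's curse') uses entirely invented numbers and is only an illustration of the principle.

```python
import numpy as np

rng = np.random.default_rng(1)

n_factors = 1000      # many genuinely causal but weak factors
true_effect = 0.05    # each has the same small true effect
se = 0.05             # sampling noise on each study's estimate
z_cutoff = 3.3        # stringent, multiple-testing-corrected threshold

# Each factor's estimated effect = true effect + sampling noise.
estimates = true_effect + rng.normal(0, se, n_factors)
winners = estimates[estimates / se > z_cutoff]   # the few that pass the test

print("true effect:                 ", true_effect)
print("number passing the threshold:", winners.size)
print("mean estimate among winners: ", round(winners.mean(), 3))
# The 'winners' passed mainly because their noise happened to be favorable,
# so their estimated effects sit well above the true value.
```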

In this sense it's not just p-values that are the problem, and providing ancillary values like CIs and effect sizes in study reports is something of a false pretense of openness, because all of these values are vulnerable to similar problems.  The promise to require these other data is a stopgap, or even a strategy to avoid adequate scrutiny of the statistical inference enterprise itself.

It is nobody's fault if we don't have adequate theory.  The fault, dear Brutus, is in ourselves, for practicing Promissory Science, and feigning far deeper knowledge than we actually have.  We do that rather than come clean about the seriousness of the problems.  Perhaps we are reaching a point where the let-down from over-claiming is so common that the secret can't be kept in the bag, and the paying public may get restless.  Leaking out a few bits of recognition and promising reform is very different from letting it all out and facing the problem bluntly and directly.  The core problem is not whether a reported association is strong or meaningful, but, more importantly, that we don't know--nor know how to know.

This can be seen in a different way.   If all studies, including negative ones, were reported in the literature, then it would be only right that the major journals should carry those findings that are most likely true, positive, and important.  That's the actionable knowledge we want, and a top journal is where the most important results should appear.  But the first occurrence of a finding, even if it turns out later to be a lucky fluke, is after all a new finding!  So shouldn't investigators report it, even though lots of other similar studies haven't yet been done?  That could take many years or, as in the example of Finnish studies, be impossible.  We would expect negative results to be far more numerous and less interesting in themselves if we just tested every variable we could think of willy-nilly, but in fact we usually have at least some reason to look, so it is far from clear what fraction of negative results would undermine the traditional way of doing business.  Should we wait for years before publishing anything? That's not realistic.

If the big-name journals are still seen as the place to publish, and their every press conference and issue announcement is covered by the splashy press, why should they change?  Investigators may feel that if they don't stretch things to get into these journals, or just publish negative results, they'll be thought to have wasted their time or done poorly designed studies.  Besides normal human vanity, the risk is that they will not be able to get grants or tenure.  That feeling is the fault of the research, reputation, university, and granting systems, not the investigator.  Everyone knows the game we're playing. As it is, investigators and their labs have champagne celebrations when they get a paper in one of these journals, like winning a yacht race, which is a reflection of what one could call the bourgeois nature of the profession these days.

How serious is the problem?  Is it appropriate to characterize what's going on as fraud, hoax, or silent conspiracy?  Probably in some senses yes; at least there is certainly culpability among those who do understand the epistemological nature of statistics and their application.  'Plow ahead anyway' is not a legitimate response to fundamental problems.

When reality is closely enough approximated by statistical assumptions, causation can be identified, and we don't need to worry about the details.  Many biomedical and genetic, and probably even some sociological problems are like that.  The methods work very well in those cases.  But this doesn't gainsay the accusation that there is widespread over-claiming taking place and that the problem is a deep lack of sufficient theoretical understanding of our fields of interest, and a rush to do more of the same year after year.

It's all understandable, but it needs fixing.  To be properly addressed, an entrenched problem requires more criticism even than this one has been getting recently.  Until better approaches come along, we will continue wasting a lot of money in the rather socialistic support of research establishments that keep on doing science that has well-known problems.

Or maybe the problem isn't the statistics, after all?
The world really does, after all, seem to involve causation and at its basis seems to be law-like. There is truth to be discovered.  We know this because when causation is simple or strong enough to be really important, anyone can find it, so to speak, without big samples or costly gear and software. Under those conditions, numerous details that modify the effect are minor by comparison to the major signals.  Hundreds or even thousands of clear, mainly single-gene based disorders are known, for example.  What is needed is remediation, hard-core engineering to do something about the known causation.

However, these are not the areas where the p-value and related problems have arisen.  Those problems arise when very large and SASsy studies seem to be needed, and the reason is that the causal factors involved are weak and/or complex.  Along with trying to root out misrepresentation and failure to report the truth adequately, we should ask whether, perhaps, the results showing frustrating complexity are correct.

Maybe there is not a need for better theory after all.  In a sense the defining aspect of life is that it evolves not by the application of external forces, as in physics, but by internal comparison--which is just what survey methods assess.  Life is the result of billions of years of differential reproduction, by chance and various forms of selection--that is, continual relative comparison by local natural circumstances.  'Differential' is the key word here.  It is the relative success among peers today that determines the genomes and their effects that will be here tomorrow.  In a way--in effect, if often unwittingly and for lack of better ideas--that's just the sort of comparison made in statistical studies.

From that point of view, the problem is that we don't want to face up to the resulting truth, which is that a plethora of changeable, individually trivial causal factors is what we find because that's what exists.  That we don't like that, don't report it cleanly, and want strong individual causation is our problem, not Nature's.

Wednesday, March 16, 2016

The statistics of Promissory Science. Part I: Making non-sense with statistical methods

Statistics is a form of mathematics, a way devised by humans for representing abstract relationships. Mathematics comprises axiomatic systems, which make assumptions about basic units such as numbers; basic relationships like adding and subtracting; and rules of inference (deductive logic); and then elaborates these to draw conclusions that are typically too intricate to reason out in other less formal ways.  Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships. When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature.  But when the correspondence is not good, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.

You can't drive without a license, but anyone with R or SAS can be a push-button scientist.  Anybody with a keyboard and some survey generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose who to subject to the survey--your 'sample'.  You can grace the results with the term 'data', implying true representation of the world, and be off and running.  Sample and survey designers may be intelligent, skilled, well-trained in survey design, and of wholly noble intent.  There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).

Population sciences, including biomedical, evolutionary, social and political fields, are experiencing an increasingly widely recognized crisis of credibility.  The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to the assumptions--with the emphasis these days on the 'or not', and an all-too-frequent dismissal of the underlying issues in favor of a patina of technical, formalized results.  Every capable statistician knows this, but of course might be out of business if openly paying it enough attention. And many statisticians may be rather uninterested in, or too foggy about, the philosophy of science to understand what goes beyond the methodological technicalities.  Jobs and journals depend on not being too self-critical.  And therein lie rather serious problems.

Promissory science
There is the problem of the problems--the problems we want to solve, such as understanding the cause of disease so that we can do something about it.  When causal factors fit the assumptions, statistical or survey study methods work very well.  But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to be to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself.  There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods and beliefs, and the need for research continuity.  It may be very understandable, because one can't just quickly uproot everything or order up deep new insights.  But it may be viewed as abuse of public trust as well as of the science itself.

The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them.  Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys.  And here is a FiveThirtyEight link to what was the basis of the podcast.

The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'.  We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills.  It's a strategy, a tactic, and at least the top investigators are very well aware of it.  Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future.  The funding agencies, wanting to seem prudent and responsible to taxpayers with their resources, demand some 'societal impact' section on grant applications.  But there is in fact little if any accountability in this regard, so one can say they are essentially bureaucratic window-dressing exercises.

Promissory science is an old game, practiced since time immemorial by preachers.  It boils down to promising future bliss if you'll just pay up now.  We needn't be (totally) cynical about this.  When we set up a system that depends on public decisions about resources, we will get what we've got.  But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.

A growing at least partial awareness
The problem of results that are announced by the media, journals, and universities but that don't deliver the advertised promises is complex and widespread.  Research has become so costly that warning sirens are sounding as it becomes clear that the promised goods are not being delivered.

One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting.  It's too bad because a negative result on a well-designed test is legitimately valuable and informative.  A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or share-holders may tire of funding yet more negative studies.  Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem at least to be thriving and growing if incomplete and not enforceable.  But this non-reporting problem has been written about so much that we won't deal with it here.

Instead, there is a different sort of problem.  The American Statistical Association has recently noted an important issue, which is the use and (often) misuse of p-values to support claims of identified  causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent).  FiveThirtyEight has a good discussion of the p-value statement.

The usual interpretation is that p represents the probability that, if there is in fact no causation by the test variable, its apparent effect arose just by chance.  So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance you'd see an association at least this strong anyway is no greater than 5%; that is, there is some evidence for a causal connection.

Trashing p-values is becoming a new cottage industry!  Now JAMA is on the bandwagon, with an article showing, in a survey of 25 years of biomedical literature covering well over a million papers, that a far disproportionate and increasing number of studies reported statistically significant results.  Here is the study on the JAMA web page, though it is not public domain yet.

Besides the apparent reporting bias, the JAMA study found that those papers generally failed to provide adequate fleshing out of that result.  Where are all the negative studies that statistical principles might expect to be found?  We don't see them, especially in the 'major' journals, as has been noted many times in recent years.  Just as importantly, authors often did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value. In a sense that means authors didn't say what range of effects is consistent with the data.  They reported a non-random effect but often didn't give the effect size, that is, didn't say how large the effect was, even assuming that effect was unusual enough to support a causal explanation. So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
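To see how a trivial difference like that can nonetheless come out 'highly significant', here is a small illustrative simulation in Python; the risks and the (deliberately enormous) sample size are invented for the example, not taken from the JAMA survey or any real study.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)

n = 50_000_000                                 # people per group (deliberately huge)
cases_exposed   = rng.binomial(n, 0.0101)      # 1.01% risk in the exposed group
cases_unexposed = rng.binomial(n, 0.0100)      # 1.00% risk in the unexposed group

p1, p0 = cases_exposed / n, cases_unexposed / n
pooled = (cases_exposed + cases_unexposed) / (2 * n)
se = sqrt(2 * pooled * (1 - pooled) / n)       # standard error of the difference
z = (p1 - p0) / se
p_value = erfc(abs(z) / sqrt(2))               # two-sided normal p-value

print(f"risk difference: {p1 - p0:.5f}")       # ~0.0001, i.e., trivial
print(f"z = {z:.1f}, p = {p_value:.2g}")       # yet comfortably 'significant'
```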

Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis to gain sufficient statistical power to detect effects and test for their consistency).  This paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked', as he puts it.  The same must be said of 'precision genomic' or 'personalized' medicine, or 'Big Data', and other sorts of imitative sloganeering going on from many quarters who obviously see this sort of promissory science as what you have to do to get major funding.  We have set ourselves a professional trap, and it's hard to escape.  For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have, in a sense, the least reliable results in terms of their replicability.  But he's been raising these points in the same journals that he shows are culpable of the problem, rather than boycotting those journals.  We're in a trap!

These critiques of current statistical practice are the points getting most of the ink and e-ink.  There may be a lot of cover-ups of known issues, and even hypocrisy, in all of this, and perhaps more open or understandable tacit avoidance.  The industry (e.g., drug, statistics, and research equipment) has a vested interest in keeping the motor running.  Authors need to keep their careers on track.  And, in the fairest and non-political sense, the problems are severe.

But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science.  We'll discuss this tomorrow.

Tuesday, March 15, 2016

Obesity and diabetes: Actual epigenetics or just IVF?

This press release that appeared in my newsfeed titled "You are what your parents ate!" caught my eye because I'm a new mom of a new human and also because I study and teach human evolution.

So I clicked on it.

And after that title primed me to think about me!, the photo further encouraged my assumption that this is really all about humans.


"You are what your parents ate!"

But it's about mice. Yes, evolution, I know, I know. We share common ancestry with mice which is why they can be good experimental models for understanding our own biology. But we have been evolving separately from mice for a combined total of over 100 million years. Evolution means we're similar, yes, but evolution also means we're different.

Bah. It's still fascinating, mice or men, womice or women! So I kept reading and learned how new mice made with IVF--that is, made of eggs and sperm from lab-induced obese and diabetic mouse parents, but born of healthy moms--inherited the metabolic troubles of their biological parents. And by inherited, we're not talking genetically, because these phenotypes are lab-induced. We're talking epigenetically. So the eggs and sperm did it, but not the genomes they carry!

This isn't so surprising if you've been following the burgeoning field of epigenetics, but it's hard to look away. This fits with how we see secular increases in human obesity and adult-onset diabetes--it can't be genomic evolution, it must be epigenetic evolution, whatever that means!

As the press release says...
"From the perspective of basic research, this study is so important because it proves for the first time that an acquired metabolic disorder can be passed on epigenetically to the offspring via oocytes and sperm- similar to the ideas of Lamarck and Darwin," said Professor ...
Whole new ways of thinking are so exciting.

Except when you remember a two-year-old piece by Bethany Brookshire (because you use it to teach a course on sex and reproduction) which explained something that suggests we may have a major experimental problem with the study above.

In IVF, the sperm gets isolated (or "washed") from the semen.

You know what happens, to mice in particular, when there's no semen? Obesity and other symptoms of metabolic syndrome! There are placental differences too. This was published in PNAS.


"Offspring of male mice without seminal fluid had bigger placentas (top right) and increased body fat (bottom right) compared with offspring of normal male mice (left images)" from The fluid part of semen plays a seminal role by Bethany Brookshire.

So I went back to look at the original paper that the press release with the donut lady was about. I wanted to see if they are aware of this potential problem with IVF and whether it explains their findings, rather than the trendy concept of epigenetics...

So even though they titled it "Epigenetic germline inheritance of diet-induced obesity and insulin resistance," I wanted to see if they at least accounted for this trouble with semen, like how it's probably important, how its absence may bring about the same phenotypes they're tracking, and how IVF doesn't use semen.

But I don't have access to Nature Genetics.

Who has access to Nature Genetics, can check out the paper, and wants to write the ending of this blog post?

Step right up! Post your work in the comments (or email me holly_dunsworth@uri.edu, and please include a pdf of the paper so I can see too) and I'll paste it right here.

Update 12:19 pm
Two very good comments below are helpful. Please read those.

I'll add that I now have the pdf of the paper (but not the Supplemental portion where all the methods live and other important information resides). This quote from the second paragraph implies they do not agree with the finding of (or have forgotten about) the phenotypic variation apparently caused by sperm washed of their seminal fluid:
"The use of IVF enabled us to ensure that any inherited phenotype was exclusively transmitted via gametes."
As the second commenter (Anonymous) pointed out below, there does not appear to be a comparison of development or behavior between any of the IVF mice and mice made by mouse sex. So there is no way to tell whether their IVF mice exhibit the same metabolic changes that the semen/semenless study found. Therefore, it is neither possible to work the semen issue into the explanation nor to rule out its effects. Seems like a missed opportunity.

Completely unrelated and inescapable... I'm a little curious about how the authors decided to visualize their data like this:


Tuesday, March 8, 2016

Murmurations and you

I have a doctorate in Public Health, which means that, unlike a 'real doctor', I was trained to think in terms of the health of populations, not of specific individuals.  Public Health of course, when done appropriately, can have an enormous impact on the health of individuals, but in a very real way that's a side effect of gathering group information and instituting measures meant to affect a group.  Clean water, fluoridated water, vaccinations, window screens, anti-smoking campaigns, and so much more are all public health measures targeting whole populations, without regard for the specific cavities or cases of cholera or lung cancer that the measure will actually prevent.  This is because, of course, smoking doesn't make every smoker sick, just enough of them that aiming to convince whole populations not to smoke can make a large enough difference to population health that it's worth the cost and effort.

You've probably seen those murmuration videos showing enormous flocks of birds flying as if they were one; undulating, turning, responding as though they have a collective mind.  Here's one of a flock of starlings being hunted by a peregrine falcon one evening in Rome. The starlings fly so unpredictably that, at least this time, the falcon is unable to catch a meal.


Source: BBC One

According to the Cornell Lab of Ornithology, murmurations almost always arise in response to the detection of a predator: a falcon or a hawk that has come for its dinner, as with the starlings in Rome.  So, a bird or birds detect the predator and sound the alarm, which triggers the whole flock to take off. But, how do they stay together?  Who decides where they're going next, and how does the rest of the flock get the message?

Young et al. report, in a 2013 paper in PLOS Computational Biology, that once in flight each bird is noticing and responding to the behavior only of its seven nearest neighbors.  The murmuration, the movement of the group, then, is due to local responses that create the waves of motion that can be seen in the evening sky.  There is no single leader, just many, many local responses happening almost simultaneously.
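For the curious, here is a toy Python sketch of that kind of topological rule, in which each simulated 'bird' steers toward the average heading of its seven nearest neighbors; it is not the Young et al. model, just an illustration of how purely local responses, with no leader, can produce group-wide alignment.

```python
import numpy as np

rng = np.random.default_rng(3)

N, K, STEPS = 200, 7, 100       # birds, neighbors each bird watches, time steps
SPEED, NOISE = 0.5, 0.05        # constant flight speed, a little random wobble

pos = rng.uniform(0, 50, size=(N, 2))       # starting positions
ang = rng.uniform(0, 2 * np.pi, size=N)     # starting headings

for _ in range(STEPS):
    # pairwise distances between all birds
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :K]        # each bird's K nearest neighbors
    # steer toward the neighbors' average heading, plus a little noise
    mean_x = np.cos(ang[nearest]).mean(axis=1)
    mean_y = np.sin(ang[nearest]).mean(axis=1)
    ang = np.arctan2(mean_y, mean_x) + rng.normal(0, NOISE, N)
    pos += SPEED * np.column_stack((np.cos(ang), np.sin(ang)))

# Order parameter: 0 = headings random, 1 = everyone flying the same way.
print("alignment:", round(float(np.abs(np.exp(1j * ang).mean())), 2))
```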

The same kinds of dynamics explain the movements of schools of fish as well.  These strategies work to some extent, but fish are routinely attacked by sharks, which can scoop up multiple individuals at a time, and surely sometimes birds of prey manage to snap up a luckless bird among the thousands or millions in a flock.  But most of the fish or the birds do get away, so it's a winning strategy for the group.  Public Health in action.

Well-known, very prolific British epidemiologist George Davey Smith was interviewed on the BBC Radio 4 program The Life Scientific not long ago.  He's a medical doctor with a degree in Public Health as well, so he's been trained to think in terms of both the population and the individual.  He is currently interested in what genes can tell us about environmental influences on health.  One of his contributions to this question is the analytical tool called Mendelian Randomization, which aims to tease out environmental triggers of a trait given a particular genetic risk factor.  That is, the idea is to divide a study sample into individuals with and without a particular genetic variant, to determine whether their history of exposure to an apparent risk factor might be responsible for the disease.  In this instance, the gene isn't modifiable, but exposure might be.
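As a rough, hedged illustration of that logic (not Davey Smith's code or data), here is a Python sketch of the simplest textbook form of Mendelian randomization, the Wald ratio, in which the genetic variant serves as a proxy for the exposure; every effect size and the confounding structure below are invented purely for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Invented setup: a variant that raises an exposure (say, a biomarker),
# a confounder that affects both exposure and outcome,
# and a modest true causal effect of the exposure on the outcome.
genotype   = rng.binomial(2, 0.3, n)                          # 0/1/2 copies of an allele
confounder = rng.normal(0, 1, n)
exposure   = 0.5 * genotype + confounder + rng.normal(0, 1, n)
outcome    = 0.2 * exposure + confounder + rng.normal(0, 1, n)

# Naive observational estimate: biased upward by the shared confounder.
naive = np.cov(exposure, outcome)[0, 1] / np.var(exposure)

# Wald ratio: gene-outcome association divided by gene-exposure association.
gene_outcome  = np.cov(genotype, outcome)[0, 1] / np.var(genotype)
gene_exposure = np.cov(genotype, exposure)[0, 1] / np.var(genotype)
mr_estimate = gene_outcome / gene_exposure

print("true causal effect: 0.2")
print("naive estimate:    ", round(naive, 2))        # inflated by confounding
print("MR (Wald) estimate:", round(mr_estimate, 2))  # close to 0.2
```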

In the interview, Davey Smith said that his primary interest is in population health, and that if a Public Health measure can reduce incidence of disease, he's happy.  So, if everyone in a population is on statins, say, and that reduces heart disease and stroke without major side effects, he would consider that a successful Public Health measure.  Even if it's impossible to know just whose stroke or heart attack was prevented.  The success of Public Health can only be evaluated at the population level, not the individual level.

So much for personalized, predictive medicine.  That's fine, my training is in Public Health, too, so I'm ok with that.  Except that Davey Smith is also a fan of large, longitudinal studies maintained in perpetuity because, as he said, they have yielded more results at lower cost than most any other kind of epidemiological study.

But there are problems with such studies, and if the idea is to identify modifiable environmental risk factors, a major problem is that these studies are always retrospective.  And, as we've written here so often, future environments are not predictable in principle.  Presumably the aim of these large studies is to use Big Data to determine which Public Health measures are required to reduce risk of which diseases, and if that is done -- so that large segments of the population are put on statins or change from saturated to unsaturated fats or start to exercise or quit smoking -- this changes environmental exposures, and thus the suite of diseases that people are then at risk of.

So, Public Health always has to be playing catch-up.  Controlling infectious diseases can be said to have been a cause of the increase in cancer and obesity and heart disease and stroke, by increasing the number of people who avoided infectious disease and lived long enough to be at risk of these later diseases.  So, in that sense, putting whole populations on statins is going to cause the next wave of diseases that will kill most of us, even if we don't yet know what these diseases will be.  Maybe even infectious diseases we currently know nothing about.

Even though, after putting their favored Public Health measure into effect, all the starlings outwitted the falcon that particular night in Rome, they're all eventually going to die of something.

Friday, March 4, 2016

When evolutionary-minded medicine gets it (possibly) wrong about childbirth interventions

No one is saying that medicine isn't brilliant and hasn't saved lives. But it does intervene more than necessary when it comes to pregnancy and childbirth.

Part of that unnecessary intervention is driven by lack of experience. Part is an economically-driven disrespect for time. (Give childbirth some motherlovin' time.) Another part, related very much to experience, is how difficult it is to decide when intervention is and isn't necessary, especially when things are heating up. But another part of the trouble actually lies in the evolutionary perspective. Unfortunately it's not all rainbows and unicorns when M.D.s embrace evolution. Instead, evolutionary thinking is biasing some medical professionals into believing that, for example, birth by surgical caesarean is an "evolutionary imperative."

Here's one recent example in The American Journal of Obstetrics & Gynecology of how the evolutionary perspective is (mis)guiding arguments for increased medical intervention in childbirth.

link to paper
It's a fairly straight-forward study of over 22,000 birth records at a hospital in Jerusalem. The authors ask whether birth weight (BW) or head circumference (HC) is more of a driver of childbirth interventions (instrumental delivery and unplanned caesareans) than the other. Of course, the focus is on the biggest babies with the biggest heads causing all the trouble, so the authors narrow the data down to the 95th percentile for both. Presumably they're asking this question about BW and HC because both can be estimated with prenatal screening. So there's the hope of improving delivery outcomes here. And, of course, the reason they ask whether head size or body mass is more of a problem is because of evolution. They anticipate that they'll discover that heads are a bigger problem than bodies because of the well-known "obstetrical dilemma" (OD) hypothesis in anthropology.

OD thinking goes like this: Big heads and small birth canals are adaptive for our species' cognition and locomotion, respectively, but the two traits cause a problem at birth, which is not only difficult but results in our species' peculiar brand of useless babies. (But see and see.)

So, since we're on the OD train, it's no surprise when we read how the authors demonstrate and, thus, conclude that indeed HC (head circumference) is more strongly associated with childbirth interventions than BW (birth weight), at least when we're up in the 95th percentile of BW and HC. Okay.

They use this finding to advocate for prenatal estimation of head size to prepare for any difficulties a mother and her fetus may be facing soon. Okay.

Sounds good. Sounds really good if you support healthy moms and babies. But it also sounds really good if you already see these risks to childbirth through the lens of the "obstetrical dilemma" with that OD thinking helping you to support "the evolutionary imperative" of the c-section. Okay.

Too many "Okays" you're thinking? You're right. There's a catch.

When you dig into the paper you see that "large HC" heads are usually about an inch (~ 2.5 cm) greater in circumference than "normal HC" ones. (Never mind that we chopped up a continuum of quantitative variation to put heads in arbitrary categories for statistical analysis.) And when you calculate the head diameter from the head circumference, there is less than 1 cm difference between "large" and "normal" neonatal heads in diameter. That doesn't seem like a whole lot considering how women's bony pelvic dimensions can vary more than that.  Still, these data suggest that the difference between a relatively low risk of having a c-section and a relatively high risk of having a c-section amounts to less than a centimeter in fetal head diameter. And maybe it does. Nobody's saying that big heads aren't a major problem sometimes! But maybe there's something else to consider that the paper absolutely didn't.
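(A quick aside on that arithmetic: treating the head as roughly circular, a circumference difference of about 2.5 cm corresponds to a diameter difference of about 0.8 cm, which is where the less-than-a-centimeter figure comes from. A two-line check in Python:)

```python
from math import pi

# ~2.5 cm difference in circumference, head approximated as a circle
print(round(2.5 / pi, 2), "cm difference in diameter")   # ~0.8 cm
```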

Neonatal heads get squeezed and molded into interesting shapes in the birth canal.

The data say that normal HC babies get born vaginally more often than large HC ones. But this is based on the head measures of babies who are already born! If we're pitting the head circumference (HC) of babies plucked from the uterus against the HC of babies who've been through the birth canal (hello!), then of course the vaginally delivered ones could have smaller HCs.

C-sected babies tend to have rounder heads than the ones squeezed by the birth canal. It's impossible to know but I'm fairly confident about this, at least for a subsample of a population: Birth the same baby from the same mother both ways, vaginally and surgically, and its head after c-section will have a larger HC than its squeezed conehead will after natural birth.


Measuring newborn head circumference (HC). source
When we're talking about roughly 2.5 cm difference in circumference or less than 1 cm difference in diameter, then I'd say it's possible that neonatal cranial plasticity is mucking up these data; we're sending c-sected babies over into the "large HC" part of the story just because they were c-sected in the first place. So without accounting for this phenomenon, the claim that large head circumference is more of a cause of birth intervention, of unplanned c-sections, than large body mass isn't as believable.

If these thoughts about neonatal cranial molding are worthwhile, then here we have a seemingly useful and very high-profile professional study, grounded in the popular but deeply flawed obstetrical dilemma hypothesis, that is arguing for medical intervention in childbirth based solely on the difference in head size measures induced by those very medical interventions. 

The circle of life!


Thursday, March 3, 2016

Humans are master meaning generators

A hashtag in the sky above a school at dusk in southern Rhode Island.  
Was it put there intentionally? What does it mean?

For as long as we’ve been writing about exquisite Paleolithic cave paintings and carefully crafted Stone Age tools we’ve been debating their meanings.  And the debate carries on because meaning is difficult to interpret and that’s largely because “what does it mean?” is a loaded question.

“Meaning” is a hallmark of humanity and, as the thinking often goes, it is a unique aspect of Homo sapiens. No other species is discussing meaning with us. We’re alone here. So we’re supposed to be at least mildly shocked when we learn that Neanderthals decorated their bodies with eagle talons. And it’s supposed to be even harder to fathom that Neanderthals marked symbolic thinking on cave walls. But such is the implication of lines marked by Neanderthals in the shape of a hashtag at Gibraltar.

source: "The Gibraltar Museum says scratched patterns found in the Gorham’s Cave, in Gibraltar, are believed to be more than 39,000 years old, dating back to the times of the Neanderthals. Credit: EPA/Stewart Finlayson"
This sort of meaningful behavior, combined with the fact that many of us are harboring parts of the Neanderthal genome, encourages us to stop seeing Neanderthals as separate from us. But another interpretation of the hashtag is one of mere doodling; its maker was not permanently and intentionally scarring the rock with meaning. These opposing perspectives on meaning, whether it’s there or not, clash when it comes to chimpanzee behavior as well.  

We’ve grown comfortable with the ever-lengthening list of chimpanzee tool use and tool-making skills that researchers are reporting back to us. But a newly published chimpanzee behavior has humans scratching their heads. Chimpanzees in West Africa fling stones at trees and hollow tree trunks. The stones pile up in and around the trees, looking like a human-made cairn (intentional landmark) in some cases.  Males are most often the throwers, pant-hooting as they go, which is a well-known score to various interludes of chimpanzee social behavior. 

source: "Mysterious stone piles under trees are the work of chimpanzees.© MPI-EVA PanAf/Chimbo Foundation"
Until now, chimp behaviors that employ nature’s raw materials—stones, logs, branches, twigs, leaves—have been easy to peg as being “for” a reason. They’re for cracking open nutritious nuts, for stabbing tasty bushbabies (small nocturnal primates), or for termite fishing. But throwing stones at trees has nothing to do with food. If these chimps do it for a reason then it’s a little more esoteric. 

Maybe they do it for pleasure, to let off steam, or to display, or maybe they do it because someone else did it. It may be all of those things at once, and maybe so much more. Maybe you’d call that ritual. Maybe you wouldn’t. Maybe you’d say that they do it because that’s what chimps do in those groups: they walk on their knuckles; they eat certain foods; they make certain sounds; they sleep in certain terms in certain trees; and they do certain things with rocks, like fling them in certain places. Maybe we could just say that this behavior is the way of certain chimpanzees, hardly more mystifying than other behaviors that we’ve come to expect of them.

For comparison, I have certain ways. There are piles of books near my desk. They pile up on tables and shelves. I could fling books on the floor but I don’t. I’m not against flinging them on the floor; it’s just not how things are usually done. I share this behavior with many other, but not all, humans in the presence of books, tables, and shelves.  Until I wrote this paragraph, I never gave it much thought; it’s not something that factors even remotely into how I see the world or my place in it, and yet the piling of books on tables and shelves is quite a conspicuous and, therefore, large part of my daily life.
  
So, why isn’t someone setting up a camera trap in my office and writing up “human accumulative book piling” in Nature? Because this type of behavior, whatever it means, is quintessentially human. No one could claim to discover it in a prestigious publication unless they discovered it in a nonhuman. And they did.

Normally what we do when we learn something new about chimpanzee behavior is we end up crossing one more thing off our list of uniquely human traits. “Man the tool-maker” was nixed decades ago. What should we cross off the list now with this new chimp discovery? Would it be “ritual” and by extension “meaning,” or would it be “piling up stuff”? About that Neanderthal hashtag, do we cross off “art” or “symbolism” and by extension “meaning,” or would we just cross off “doodling,” which holds a quite different meaning? Rather than crossing anything off our list, do we welcome Neanderthals into our kind so we can keep our monopoly on hashtags? Whatever we decide, case by case, trait by trait, we usually interpret our shrinking list of uniquely human traits to be clear demonstration that other animals are becoming more human-like the more we learn about the world.

That’s certainly one way to see it.  But there’s another, more existential, and therefore, arguably, more human way to look at that shrinking list of uniquely human traits: Humans are becoming less human-like the more we learn about the world.

#WhatDoesThatEvenMean #PantHoot #Hashtag #ThisIsMyCaveWall