
Tuesday, July 5, 2016

When scientific theory constrains

It's good from time to time to reflect on how we know what we think we know.  And to remember that, as at any other time in history, much of what we now think is true will sooner or later be found to be false or, often, only partially or inaccurately correct.  Some of this is because values change -- not so long ago, for example, homosexuality was considered to be an illness.  Some is because of new discoveries -- when archaea were first discovered they were thought to be exotic microbes inhabiting extreme environments, but they're now known to live in all environments, even in and on us.  And of course these are just two of countless examples.

But what we think we know can also be influenced by our assumptions about what is true. It's all too easy to look at data and interpret it in a way that makes sense to us, even when multiple interpretations are possible.  This can be a particular problem in social science, where we've got a favorite theory and the data can be seen to confirm it; this is perhaps easiest to notice if you yourself aren't wedded to any of the theories.  But it's also true in biology. It is understandable: we want to assert that we now know something, and we are rewarded for insight and discoveries rather than for humbly hesitating to make claims.

Charitable giving
The other day I was listening to the BBC Radio 4 program Analysis on the charitable impulse.  Why do people give to charity?  It turns out that a lot of psychological research has been done on this, to the point that charities are now able to manipulate us into giving.  If you call your favorite NPR station to donate during a fund drive, e.g., if you're told that the caller just before you gave a lot of money, you're more likely to make a larger donation than if you're told the previous caller pledged a small amount.

A 1931 advertisement for the British charity, Barnardo's Homes; Wikipedia

Or, if an advertisement pictures one child, and tells us the story of that one child, we're more likely to donate than if we're told about 30,000 needy children.  This works even if we're told the story of two children, one after the other.  But, according to one of the researchers, if we're shown two children at once, and told that if we give, the money will randomly go to just one of the children, we're less likely to give.  This researcher interpreted this to mean that two is too many.

But there seem to me to be other possible interpretations, given that the experiment changes more than one variable.  Perhaps we don't like the idea that someone else will choose who gets our money.  Or we feel uncomfortable knowing that we've helped only one child when two are needy.  But surely something other than 'two is too many' is at work, given that in 2004 so many people around the world donated so much money to organizations helping tsunami victims that many had to start turning donations down.  Those were anonymous victims, in great numbers.  Though, as the program noted, people weren't nearly as generous to the great number of victims of the 2015 earthquake in Nepal, with no obvious explanation.

The researcher did seem to be wedded to his one-versus-too-many interpretation, despite the contradictory data.  In fact, I would suggest that the methods, as presented, don't allow him to legitimately draw any conclusion at all.  Yet he readily did.

Thinness microbes?
The Food Programme on BBC Radio 4 is on to the microbiome in a big way.  Two recent episodes (here and here) explore the connection between gut microbes, food, and health and the program promises to update us as new understanding develops.  As we all know by now, the microbiome, the bug intimates that accompany us through life, in and on our body, may affect our health, our weight, our behavior, and perhaps much more.  Or not.


Pseudomonas aeruginosa, Enterococcus faecalis and Staphylococcus aureus on Tryptic Soy Agar.  Wikipedia

Obesity, asthma, atopy, periodontal health, rheumatoid arthritis, Parkinson's, Alzheimer's, autism, and many many more conditions have been linked with, or are suggested to be linked with, in one way or another, our microbiome.  Perhaps we're hosting the wrong microbes, or not a diverse enough set of microbes, or we wipe the good ones out with antibiotics along with the bad, or with alcohol, and what we eat may have a lot to do with this.

One of the researchers interviewed for the program was experimenting with a set of identical twins in Scotland.  He varied their diets, having them eat, for example, lots of junk food and alcohol, or a very fibrous diet, and documented changes in their gut microbiomes, which apparently can change quite quickly with changes in diet.  The most diverse microbiome was associated with the high-fiber diet. Researchers seem to feel that diversity is good.

Along with a lot of enthusiasm and hype, though, mostly what we've got in microbiome research so far is correlations.  Thin people tend to have a different set of microbes than obese people, and people with a given neurological disease might statistically share a specific subset of microbes.  But this tells us nothing about cause and effect -- which came first, the microbiome or the condition?  And because the microbiome can change quickly and often, how long and how consistently would an organism have to reside in our gut before it causes a disease?
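To see how a confounder can manufacture exactly this kind of correlation, here is a minimal simulation with entirely made-up numbers: a hypothetical "diet quality" variable drives both microbiome diversity and body weight, while the two have no direct causal link at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000  # hypothetical sample size

# Invented confounder: "diet quality" influences BOTH variables below;
# diversity and weight have no direct causal connection to each other.
diet = rng.normal(0, 1, n)
diversity = 0.8 * diet + rng.normal(0, 1, n)   # microbiome driven by diet
weight = -0.8 * diet + rng.normal(0, 1, n)     # weight driven by diet too

# Yet the two show a sizeable correlation, with zero direct causation.
r = np.corrcoef(diversity, weight)[0, 1]
print(round(r, 2))
```

Observational microbiome data can't distinguish this scenario from a direct causal effect; only intervention (or careful causal design) can.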

There was some discussion of probiotics in the second program, the assumption being that controlling our microbiome affects our health.  Perhaps we'll soon have probiotic yogurt or kefir or even a pill that keeps us thin, or prevents Alzheimer's disease.  Indeed, this was the logical conclusion from all the preceding discussion.

But one of the researchers, inadvertently I think, suggested that perhaps this reductionist conclusion was unwarranted.  He cautioned that thinking about probiotic pills rather than lifestyle might be counterproductive.  But except for factors with large effects such as smoking, the effect of "lifestyle" on health is rarely obvious.  We know that poverty, for example, is associated with ill health, but it's not so easy to tease out how and why.  And, if the microbiome really does directly influence our health, as so many are promising, the only interesting relevant thing about lifestyle would be how it changes our microbiomic makeup.  Otherwise, we're talking about complexity, multiple factors with small effects -- genes, environmental factors, diet, and so on, and all bets about probiotics and "the thinness microbiome" are off.  But, the caution was, to my mind, an important warning about the problem of assuming we know what we think we know; in this case, that the microbiome is the ultimate cause of disease.

The problem of theory
These are just two examples of the problem of assumption-driven science. They are fairly trivial, but if you are primed to notice, you'll see it all around you. Social science research is essentially the interpretation of observational data from within a theoretical framework. Psychologists might interpret observations from the perspective of behavioral, or cognitive, or biological psychology, e.g., and anthropologists, at least historically, from, say, a functionalist or materialist or biological or post-modernist perspective. Even physicists interpret data based on whether they are string theorists or particle physicists.

And biologists' theoretical framework? I would suggest that two big assumptions that biologists make are reductionism and let's call it biological uniformitarianism. We believe we can reduce causation to a single factor, and we assume that we can extrapolate our findings from the mouse or zebrafish we're working on to other mice, fish and species, or from one or some people to all people. That is, we assume invariance rather than that what we can expect is variation. There is plenty of evidence to show that by now we should know better.

True, most biologists would probably say that evolutionary theory is their theoretical framework, and many would add that traits are here because they're adaptive, because of natural selection. Evolution does connect people to each other and people to other species, but it has done so by working on differences, not replicated identity, and there is no rule for the nature or number of those differences, or for extrapolating from one species or individual to another. We know nothing that contradicts evolutionary theory, but that every trait is adaptive is an assumption, and a pervasive one.

Theory and assumption can guide us, but they can also improperly constrain how we think about our data, which is why it's good to remind ourselves from time to time to think about how we know what we think we know. As scientists we should always be challenging and testing our assumptions and theories, not depending on them to tell us that we're right.

Wednesday, March 16, 2016

The statistics of Promissory Science. Part I: Making non-sense with statistical methods

Statistics is a form of mathematics, a way devised by humans for representing abstract relationships. Mathematics comprises axiomatic systems, which make assumptions about basic units such as numbers; basic relationships like adding and subtracting; and rules of inference (deductive logic); and then elaborates these to draw conclusions that are typically too intricate to reason out in other less formal ways.  Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships. When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature.  But when the correspondence is not good, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.

You can't drive without a license, but anyone with R or SAS can be a push-button scientist.  Anybody with a keyboard and some survey generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose who to subject to the survey--your 'sample'.  You can grace the results with the term 'data', implying true representation of the world, and be off and running.  Sample and survey designers may be intelligent, skilled, well-trained in survey design, and of wholly noble intent.  There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).

Population sciences, including the biomedical, evolutionary, social, and political fields, are experiencing an increasingly widely recognized crisis of credibility.  The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to their assumptions--with the emphasis these days on the 'or not', and often a dismissal of the underlying issues in favor of a patina of technical, formalized results.  Every capable statistician knows this, but of course might be out of business if openly paying it enough attention. And many statisticians may be rather uninterested in, or too foggy on, the philosophy of science to see what lies beyond the methodological technicalities.  Jobs and journals depend on not being too self-critical.  And therein lie rather serious problems.

Promissory science
There is the problem of the problems--the problems we want to solve, such as in understanding the cause of disease so that we can do something about it.  When causal factors fit the assumptions, statistical or survey study methods work very well.  But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself.  There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods and beliefs, and the need for research continuity.  It may be very understandable, because one can't just quickly uproot everything or order up deep new insights.  But it may be viewed as abuse of public trust as well as of the science itself.

The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them.  Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys.  And here is a FiveThirtyEight link to what was the basis of the podcast.

The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'.  We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills.  It's a strategy, a tactic, and at least the top investigators are very well aware of it.  Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future.  The funding agencies, wanting to seem prudent and responsible to taxpayers with their resources, demand some 'societal impact' section on grant applications.  But there is in fact little if any accountability in this regard, so one can say they are essentially bureaucratic window-dressing exercises.

Promissory science is an old game, practiced since time immemorial by preachers.  It boils down to promising future bliss if you'll just pay up now.  We needn't be (totally) cynical about this.  When we set up a system that depends on public decisions about resources, we will get what we've got.  But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.

A growing at least partial awareness
The problem of results that are announced by the media, journals, universities, and so on, but that don't deliver the advertised promises, is complex and widespread.  In part because research has become so costly, warning sirens are sounding as it becomes clear that the promised goods are not being delivered.

One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting.  It's too bad because a negative result on a well-designed test is legitimately valuable and informative.  A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or share-holders may tire of funding yet more negative studies.  Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem at least to be thriving and growing if incomplete and not enforceable.  But this non-reporting problem has been written about so much that we won't deal with it here.

Instead, there is a different sort of problem.  The American Statistical Association has recently noted an important issue: the use and (often) misuse of p-values to support claims of identified causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent).  FiveThirtyEight has a good discussion of the p-value statement.

The usual interpretation is that p represents the probability that, if the test variable in fact has no causal effect, its apparent effect arose just by chance.  So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance you'd see this association anyway is no greater than 5%; that is, there is some evidence for a causal connection.
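That logic can be demonstrated directly.  Below is a small simulation, with arbitrary toy numbers, of thousands of 'studies' in which the null hypothesis is true by construction; roughly 5% of them still cross the p < 0.05 threshold (here, |z| > 1.96) purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies, n = 10_000, 100  # made-up: 10,000 studies of 100 subjects per group

# Both groups are drawn from the SAME distribution: the null is true
# in every single study, by construction.
a = rng.normal(0, 1, size=(n_studies, n))
b = rng.normal(0, 1, size=(n_studies, n))

# Two-sample z-test statistic for each study.
diff = a.mean(axis=1) - b.mean(axis=1)
se = np.sqrt(a.var(axis=1, ddof=1) / n + b.var(axis=1, ddof=1) / n)
z = diff / se

# |z| > 1.96 corresponds to two-sided p < 0.05.  With no causation anywhere,
# about 5% of studies still cross the threshold, purely by chance.
false_positive_rate = np.mean(np.abs(z) > 1.96)
print(round(false_positive_rate, 3))
```

Run 10,000 such studies and publish only the "significant" ones, and you have manufactured several hundred spurious findings without any misconduct at all.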

Trashing p-values is becoming a new cottage industry!  Now JAMA is on the bandwagon, with an article showing, in a survey of well over a million biomedical papers from the past 25 years, that a far disproportionate and increasing number of studies reported statistically significant results.  Here is the study on the JAMA web page, though it is not in the public domain yet.

Besides the apparent reporting bias, the JAMA study found that those papers generally failed to flesh out that result adequately.  Where are all the negative studies that statistical principles would lead us to expect?  We don't see them, especially in the 'major' journals, as has been noted many times in recent years.  Just as importantly, authors often did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value.  In a sense that means authors didn't say what range of effects is consistent with the data.  They report a non-random effect, but often didn't give the effect size -- that is, how large the effect was, even assuming the association was unusual enough to support a causal explanation.  So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
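The 1%-to-1.01% example can be made concrete with a back-of-the-envelope calculation, using hypothetical numbers: with a large enough sample, even that trivial risk difference crosses the significance threshold, which is exactly why the effect size and confidence interval matter.

```python
import math

# Hypothetical numbers: baseline risk 1.00%, "exposed" risk 1.01%.
p1, p2 = 0.0100, 0.0101
n = 20_000_000  # per group; only an enormous study makes this detectable

diff = p2 - p1
se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
z = diff / se                               # comfortably past 1.96
ci = (diff - 1.96 * se, diff + 1.96 * se)   # 95% confidence interval

print(f"z = {z:.2f}")
print(f"risk difference = {diff:.4%}")      # 0.01 percentage points
print(f"95% CI = ({ci[0]:.4%}, {ci[1]:.4%})")
```

The test declares "significance", but the confidence interval makes plain that the entire range of plausible effects is practically negligible, information the p-value alone hides.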

Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in the biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis, to gain sufficient statistical power to detect effects and to test for their consistency).  The paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked', as he puts it.  The same must be said of 'precision genomic' or 'personalized' medicine, 'Big Data', and other sorts of imitative sloganeering from many quarters who obviously see this sort of promissory science as what you have to do to get major funding.  We have set ourselves a professional trap, and it's hard to escape.  For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have, in a sense, the least reliable results in terms of replicability.  But he's been raising these points in the same journals he shows are culpable, rather than boycotting them.  We're in a trap!
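For readers unfamiliar with the mechanics, here is a sketch of the simplest form of meta-analysis, fixed-effect inverse-variance pooling, applied to five invented studies: no single study reaches significance on its own, yet the pooled estimate does, which is the gain in power the parenthesis above refers to.

```python
import numpy as np

# Invented effect estimates and standard errors from five small studies.
effects = np.array([0.20, 0.05, 0.35, -0.10, 0.15])
ses = np.array([0.15, 0.20, 0.18, 0.25, 0.12])

w = 1 / ses**2                      # weight each study by its precision
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
z = pooled / pooled_se

# Each individual study's z is below 1.96; the pooled z exceeds it.
print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f}, z = {z:.2f}")
```

Of course, pooling heterogeneous studies also pools their biases, which is part of Ioannidis's complaint: statistical power is no substitute for fit between the studies' assumptions and reality.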

These critiques of current statistical practice are the points getting most of the ink and e-ink.  There may be a lot of cover-ups of known issues, and even hypocrisy, in all of this, and perhaps more open or understandable tacit avoidance.  The industry (e.g., drug, statistics, and research equipment) has a vested interest in keeping the motor running.  Authors need to keep their careers on track.  And, in the fairest and non-political sense, the problems are severe.

But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science.  We'll discuss this tomorrow.

Wednesday, May 2, 2012

Metaphysics in science, Part IV. When causation is complex, what is it? Real or metaphysical?

This series has dealt with what it means to be scientific rather than metaphysical, and with whether, in a sense, science has ever really abandoned its ages-long flirtation with ideas imposed on the world, rather than letting the world determine our ideas.

In previous posts we dealt with metaphysical notions like 'the human genome' or 'the globin gene', which do not really imply the actual existence of Platonic ideals, and serve mainly as pragmatic guides for our understanding of the world and the practice of science.

We then addressed why, whether, and how the failure of data to support a theory should lead us to abandon it.  If it doesn't, then the theory is in a way shown to be a Platonic ideal assumed to be true, rather than the kind of empirical truth we supposedly seek in science.  We mentioned a couple of examples in which Darwin held to his theory, correctly judging that the overall evidence overrode mistaken notions of genes, but also imposed his theory on data, as when he constructed what amounted to 'progressive' theories of evolution in his study of barnacles.

Then we asked why GWAS and related omics approaches, which did not deliver the expected, and promised, accounting for important diseases, have not led to an abandonment of the underlying theory of major-gene causation, and whether that shows that, for whatever set of reasons, metaphysics is driving our material assessment of the world of genetics and evolution.  These are all real and, we think, important issues that are rarely addressed by scientists.

However, there are other issues that are important and we'd like to comment on two of them.  They are complexity, and statistical causation.  Here, we discuss the first of these two issues. 

Complexity and emergence: what are they?
So far in this series, we've considered rather simple theories:  A gene exists.  It codes for protein or its regulation in cells.  Antibiotic resistance results from genetic variants rising in frequency if they help the organism surmount the lethal challenge.

We've seen that these don't really seem to pose any serious systematic or fundamental threat to the notion that we can express our understanding of  the world in such abstract terms.

But what about when the theory gets more complex, when, say, many different factors interact to contribute to a single net result?  The net result, like a building, is sometimes called an 'emergent' phenomenon relative to the contributing components (bricks and steel beams).  That is, enumerating the components, or studying them even down to the level of the atom, won't tell you much at all about the building itself.  What shape will it take?  How many stories will it be?  (We could estimate that by counting the bricks and beams, yes, but that won't be very precise.)  What's it going to be used for?  Who will use it?  Will the roof leak?

Let's apply this to disease genetics.  Let's say that diabetes is the building, and many different genes and environmental factors the bricks and mortar.  We can't easily go forward or backward from here -- we can't reliably predict diabetes from the genes, and we certainly can't predict future environments, nor can we retrodict genes or environment knowing someone has diabetes.  In this instance, what kind of truth is an emergent phenomenon, relative to a material theory of the world?  Is it metaphysical in any way that should concern us?

If the result can't be predicted from the components, then more is going on than a list of those components.  The net result may be given a name, but this becomes more metaphysical than physical in some causal senses.  It's causally not so strictly utilitarian in the way 'the globin gene' guides us to study the instances of globin genes in actual people.

If every case of diabetes is due to a different set of causal factors, working and interacting in different ways in each instance, then diabetes is a different kind of reality, a somewhat metaphysical notion that exists independent of its assumed ordinary causality.  These are not just abstract philosophical questions, but in fact underlie our decisions about how to approach causation.  When our assumptions are unstated, and we don't think about why we're asking the scientific questions we ask, and designing the studies we design, our understanding of complex traits can easily become ensnared by their complexity, and this all becomes even more problematic if we assume we're looking at a simple trait. Even iron-clad ideas about causation, or the most appropriate uses of metaphysical convenience, can lapse into metaphysical vapor.

These are things you have to think about to grasp them, perhaps.  At least, we do!  If every instance is causally different, so that we cannot enumerate the causes (because, for example, we need large samples or replications to show that they are really causes), then the emergent thing verges on a metaphysical ideal: the trait may seem real enough, in our heads, but causally elusive in the world.  It is an assumption that it is a causally unitary....what?  It is too easy to assume its reality, and to assume that if we but have big enough studies, or whatnot, we will be able to treat it by the usual reductionist methods (enumerating its causal bricks), when that may not be its reality as far as the current scientific method is concerned.  That is the 'emergence' problem, and we're not very good at solving it.  Instead, we wish it away through metaphysical ideals.

One strongly problematic aspect of all of this is related to, but goes far beyond, complexity and arbitrary agreed-on working definitions.  It is how we view probabilistic 'causation', to which we turn next.

Sunday, April 12, 2009

Never say die

Today's news has a story of a report that female mice continue to generate at least some new egg cells after they are born. Eggs, as well as heart and other muscle, and most types of neurons, were for a long time believed to be post-mitotic: that is, they could not be regenerated. But if various reports are accurate, all of these types of cells can regenerate at least to some extent.

If the exceptions to the once-held 'rules' are real, and not trivial, this can be potentially good news for the development of therapeutic approaches in which an individual's own cells could be used to generate lost or damaged cells. But it also raises some interesting basic scientific questions, too.

Most if not all of these results come from animal models. So why is it touted as good news for humans? If one is a fervent Darwinist, and thinks that the pressure of competition is always pushing species towards ever more specialized 'adaptive' states, then there is no reason to expect that human cells would behave the same way as those of laboratory models. But, if one believes that animal models, such as mice or chicks (much less flies and flatworms!) represent the human state, one is correspondingly less rigidly Darwinian.

The issue is a practical one. Regardless of one's views about natural selection, we know that our models are only approximate: but to what extent can we trust results from work with animal models? Many of us work with such models every day (in our case, with mice) and daily lab life can be very frustrating as a result, because it is easy to see that not even the animal models are internally consistent or invariant. But there is an even more profound issue here.

Whether we're working with animal models or taking some other approach, we design our research, and interpret our results, in light of what we accept ('believe'?) to be true. If what we accept is reasonably accurate, we can do our work without too much concern. But if our basic assumptions are far from the truth, we can be way off.

As we noted in another post, every scientist before today is wrong. We are always, to some extent, fishing in the dark. It's another frustration, when you build a study around a bunch of published papers, and then discover that they've all copied one another on basic assumptions and, like the drunk looking for his keys under the lamp-post, have been exploring the same territory in ever-increasing, but perhaps ever more trivial detail.

Yet in fact we can never know just where our assumptions may be wrong. On the other hand, we can't just design new experiments, presumably to advance knowledge beyond its current state, without in some sense building upon that state. How can one be freed of assumptions--such as interpreting data as if certain cell types cannot divide and replenish--and yet do useful research?

There is no easy answer, perhaps no answer at all, except that we have to keep on plugging away unless or until we realize we're getting nowhere, or until someone has a better, transformative idea.  For the former, we're trapped in a research enterprise system that presses us to keep going just to keep the millwheels turning.  For the latter, we have to wait for a stroke of luck, and those may come only once in a century or more.