Wednesday, November 28, 2018

Induction-deduction, and replicability: is there any difference?

In what sense--what scientific sense--does the future resemble the past?  Or perhaps, to what extent does it?  Can we know?  If we can't, then what credence for future prediction can we give to results of studies today, necessarily from the past experience of current samples?  Similarly, in what sense can we extrapolate findings on this sample to some other sample or population?  If these questions are not easily answerable (indeed if they are answerable at all!), then much of current, and currently very widespread and expensive science, is at best of unclear, questionable value.

We can look at these issues in terms of a couple of standard aspects of science: the relationship between induction and deduction; and the idea of replicability.  Induction and deduction basically come from the Enlightenment time in western history, when it was found in a formal sense that the world of western science--which at that time meant physical science--followed universal 'laws' of Nature.  At that time, life itself was generally excluded from this view, not least because it was believed to be the result of ad hoc creation events by God.

The induction--deduction problem
Some terminology:  I will make an important distinction between two terms.  By induction I mean drawing a conclusion from specific observed data (e.g., estimating some presumed causal parameter's value).  Essentially, this means inferring a conclusion from the past, from events that have already occurred. But often what we want to do is to predict the future.  We do that, often implicitly, by treating observed past values as estimates of causal parameters that apply generally and therefore to the future; I refer to that predictive process, derived from observed data, as deduction.  So, for example, if I flip a coin 10 times and get 5 Heads, I assume that this is somehow built into the very nature of coin-flipping so that the probability of Heads on any future flip is 0.5 (50%).
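The coin example can be put in a few lines of Python. This is only a rough sketch of the distinction as I am using the terms: the function names are made up for illustration, and the 'shifted' coin with a 0.3 chance of Heads is a purely hypothetical assumption, not data from anywhere.

```python
import random

def estimate_p_heads(flips):
    """Induction: estimate P(Heads) from flips that have already happened."""
    return sum(flips) / len(flips)

def predicted_heads(p_hat, n_future):
    """'Deduction' in the sense used here: treat the past estimate as a fixed
    parameter and project it onto flips that have not happened yet."""
    return p_hat * n_future

def simulate_flips(p_true, n, rng):
    """Flip a hypothetical coin whose true P(Heads) is p_true, n times."""
    return [rng.random() < p_true for _ in range(n)]

# The example from the text: 10 observed flips, 5 of them Heads (True = Heads).
past = [True, False, True, True, False, False, True, False, True, False]
p_hat = estimate_p_heads(past)          # 5 / 10 = 0.5
expected = predicted_heads(p_hat, 100)  # 0.5 * 100 = 50.0

# Assumption for illustration only: one future in which the process is
# unchanged (law-like) and one in which it has quietly shifted to 0.3.
rng = random.Random(2018)
unchanged = simulate_flips(0.5, 100, rng)
shifted = simulate_flips(0.3, 100, rng)

print(f"estimate from the past: {p_hat}")
print(f"predicted Heads in 100 future flips: {expected}")
print(f"Heads observed if the process is unchanged: {sum(unchanged)}")
print(f"Heads observed if the process has shifted: {sum(shifted)}")
```

The estimate itself carries no information about which of the two futures we are in; that is the extra assumption being made when induction is treated as deduction.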

If we can assume that induction implies deduction, then what we observe in our present or past observations will persist so that we can predict it in the future.  In a law-like universe, if we are sampling properly, this will occur and we generally assume this means with complete precision if we had perfect measurement (here I speculate, but I think that quantum phenomena at the appropriate scale have the same universally parametric properties).

Promises like 'precision genomic medicine', which I think amount to culpably public deceptions, effectively equate induction with deduction: we observe some genomic elements associated in some statistical way with some outcome, and assume that the same genome scores will similarly predict the future of people decades from now.  There is no serious justification for this assumption at present, nor any quantification of how large the errors might be in assuming the predictive power of past observations, in part because mutations and lifestyle clearly have major effects, but especially because these are unpredictable--even in principle.  Indeed, there is another, much deeper problem of a similar kind, that has gotten recent--but to me often quite naive--attention: replicability.

The replicability problem
Studies, perhaps especially in social and behavioral fields, report findings that others cannot replicate.  This is being interpreted as suggesting that (ignoring the rare outright fraud), there is some problem with our decision-making criteria, other forms of bias, or poor study designs.  Otherwise, shouldn't studies of the same question agree?  There has been a call for the investigators involved to improve their statistical analysis (i.e., keep buying the same software!! but use it better), report negative results, and so on.

But this is potentially, and I think fundamentally, naive.  It assumes that such study results should be replicable.  It assumes, as I would put it, that at the level of interest, life = physics.  This is, I believe, not just wrong but fundamentally so.

The assumption of replicability is not really different from equating induction to deduction, except in some subtle way applied to a more diverse set of conditions.  Induction of genomic-based disease risk is done on a population like, say, case-control samples, and then applied to the same population in terms of its current members' future disease risks.  But we know very well that different genotypes are found in different populations, so it is not clear what degree of predictability we should, or can, assume.

Replicability is similar except that in general a result is assumed to apply across populations or samples, not just to the same sample's future.  That is, I think, an even broader assumption than the genomics-precision promise that does, at least nominally, now recognize population differences.

The real, the deeper problem is that we have absolutely no reason to expect any particular degree of replicability between samples for these kinds of things.  Evolution is about variation, locally responsive and temporary, and that applies to social behavior as well.  We know that 'distance' or difference accumulates (generally) gradually over time and separation as a property of cultural as well as biological evolution.  The same obviously applies even more to psychological and sociological samples and inferences from them.

It is silly to think that samples of, say, this year's college seniors at X University will respond to questionnaires in the same way as samples of some other class or university or beyond.  Of course, college students come cheap to researchers, and they're convenient.  But they are not 'representative' in the replicability sense except by some sort of rather profound assumption.  This is obvious, yet it is a tacit assumption of a great deal of research (biological, psychological, and sociological).

Even social scientists acknowledge the local and temporary nature of many of the things they investigate, because the latter are affected by cultural and historical patterns, fads, fashions, and so much more.  Indeed, the idea of replicability is to me curious to begin with.  Thus, a study that fails to replicate some other study may not reflect failings in either, and the idea that we should replicate in this kind of way is a carryover of physics envy.  Perhaps in many situations, a replication result is what should be examined most closely! The social and even biological realms are simply not as 'Newtonian', or law-like, as is the real physical realm in which our notions of science--especially the very idea of a law-like replicability--arose. Not only is failure to replicate not necessarily suspect at all, but replicability should not generally be assumed.  Or, put another way, a claim that replicability is to be expected is a strong claim about Nature that requires very strong evidence!

This raises the very deep problem that in the absence of replicability assumptions, we don't know what to expect of the next study, after we've done the first.....or is this a justification for just keeping the same studies going (and funded) indefinitely?  That's of course the very rewarding game being played in genomics.

Monday, November 19, 2018

It is unethical to teach evolution, no matter the organism, without confronting racism and sexism

People say we’re the storytelling ape. I hear that. Though conjuring fiction is beyond me, and I only remember the worst punchlines, I love trading stories and so do you. Storytelling is a definitively human trait. But if stories make us human, what went wrong with the mother of them all?

Human origins should be universally cherished but it’s not even universally known. It just doesn’t appeal to most people. This goes far beyond religion. Human evolution hasn’t caught on despite it being over 150 years old.  Where it has, it’s subversive or offensive. We have a problem. How could my life be subversive or offensive? How could yours?

Whether or not we evolved to tell stories, the one about where we came from should be beloved, near and dear to our hearts, not cold, clinical, and pedantic, not repulsive or embarrassing, not controversial, racist, sexist and anti-theist, not merely “survival of the fittest,” end of story, not something that only pertains to the world’s champions of wealth or babymaking. We deserve so much better. We deserve a sprawling, heart-thumping, face-melting epic, inspiring its routine telling and retelling. It’s time for a human evolution that’s fit for all humankind.

Such a human evolution requires a new narrative, both hyper-sensitive to the power of narrative and rooted in science that is light years ahead of Victorian dogma. This is the antidote to a long history of weaponizing human nature against ourselves. Our 45th president credits the survival-of-the-fittest brand of human evolution for his success over less kick-ass men in business and in bed. Pick-up artists and men’s rights activists, inspired by personalities like Jordan Peterson, use mistaken evolutionary thinking to justify their sexism and misogyny. Genetic and biological determinism have a stranglehold on the popular imagination, where evolution is frequently invoked to excuse inequity, like in the notorious Google Memo. Public intellectuals like David Brooks and Jon Haidt root what seems like every single observation of 2018 in tropes from Descent of Man. And there's the White House memo that unscientifically defines biological sex. Evolution is all wrapped up in white supremacy and a genetically-destined patriarchy.  This is not evolution. And this is not my evolution. I know you're nodding your head along with  me.

Without alternative perspectives, who can blame so many folks for outright avoiding evolutionary thinking? We must lift the undeserved stigma on our species' origins story and rip it away from those who would perpetuate its abuses.


It took me a while to get to this point, to have this view that I wish I'd had from the very beginning. No one should feel defensive in reaction to my opinion, which is...

Evolution educators, even if sticking to E. coli, fruit flies, or sticklebacks, must confront the ways that evolutionary science has implicitly undergirded and explicitly promoted, or has naively inspired so many racist, sexist, and otherwise harmful beliefs and actions. We can no longer arm students with the ideas that have had harmful sociocultural consequences without addressing them explicitly, because our failure to do so effectively is the primary reason these horrible consequences exist. The worst of all being a human origins that refuses humanity.

Make this history ancient history. We've waited too long.  (image: Marks, 2012)

So many of us are still thinking and teaching from the charged tradition of demonstrating that evolution is true. Thanks to everyone's hard work, it is undeniably true. Now we must go beyond this habit of reacting to creationism and instead react to a problem that is just as old but is far more urgent because it actually affects human well-being.

Bad evolutionary thinking and its siblings, genetic determinism and genetic essentialism, are used to justify civil rights restrictions, human rights violations, white supremacy, and the patriarchy. And as a result, evolution is avoided and unclaimed by scholars, students, and their communities who know this all too well.

In Why be against Darwin? Creationism, racism, and the roots of anthropology,* Jon Marks explains how early anthropologists, in the immediate wake of Darwin's ideas, faced a dilemma. If they were to continue as if there were a "psychic unity of (hu)mankind" then they felt compelled to reject an evolution which was being championed by some influential scientific racists. Marks writes, "So either you challenge the authority of the speaker to speak for Darwinism or you reject the program of Darwinism." Anyone who knows someone who's not a fan of evolution knows that the latter option is a favorite still today. And it's not creationism and it's not science denial. It's the rejection of what we know to be an outdated and tainted notion of evolution. No one can update and clean up evolution as powerfully as we can if we do it ourselves, right there, in the classroom.

We are teaching more and more people evolution, which may be exciting, but only if we are equally energetic in confronting its sordid past. I can say this without attracting any indignation (right?) because evolution has a sordid present.

Let's put that to an end.

Here I offer some general suggestions for how to do that and I'm speaking to all of us, whether we teach  a course dedicated to human origins and evolution, whether we teach a course dedicated to evolution and only cover humans for part of it, whether we teach a course dedicated to evolution but exclude humans entirely... because we all have to actively fix this. Learners will apply evolutionary thinking to humans, whether or not your focal organisms are human. Making rules in one domain and transferring them to new ones is humanity's jam. Eugenics is proof that our jam can go rancid.

And while we're actively disassociating the reality of evolution (which is just a synonym for 'nature' and for 'biology') from all the shitty things humans do in its name, we can help make it more personal as we all deserve our origins story to be. We deserve a human origins we can embrace.

Model that personal satisfaction in thinking evolutionarily about your own life. Don't be afraid to bring the humanities into your evolution courses.

Choose examples and activities focused on the evolution of the human body or focused on the unity of the species. Go there if you don't already.  Here are some awesome lesson plans: 

Guide students in composing scientifically sourced and scientifically sound origins stories for their favorite things in life, like their friends or pizza (maybe by tracking down the origins of wheat, lactase persistence, cooking, teeth, or even way back to the first eaters of anything at all).

For actively dismantling evolution's racist/etc past and present, may I suggest checking out and maybe assigning (+ the Marks article linked above):

10 Facts about human variation by Marks

Is Science Racist? by Marks

Racing around, getting nowhere* by Weiss (fellow mermaid) and Fullerton

A Dangerous Idea: Eugenics and the American Dream (film)

If you are feeling under-prepared or uncomfortable going beyond biology in your course, find a colleague who can help out or do it entirely for you. If they're on campus, pick their brains about assignments or activities, or ask them for a guest lecture.  If they're not on campus, invite them to campus or connect them to your classroom via Skype. There are all stripes of anthropologists (and there are also historians) who are comfortable with, and more than happy to, help you cover evolution as it should be, which is to explicitly include its sociocultural context and consequences.

*This article is open access but if for some reason you still cannot access it, just email me and I will send you the PDF.

Additional Resources of Relevance...

There's no such thing as a 'pure' European—or anyone else – Gibbons (Science)

A lot of Southern whites are a little bit black – Ingraham (Washington Post)

From the Belgian Congo to the Bronx Zoo (NPR)

A True and Faithful Account of Mr. Ota Benga the Pygmy, Written by M. Berman, Zookeeper – Mansbach

In the Name of Darwin – Kevles (PBS)

Are humans hard-wired for racial prejudice? – Sapolsky (LA Times)

How to write about Africa – Wainaina (Granta)

Colonialism and narratives of human origins in Asia and Africa – Athreya and Ackerman

Frederick Douglass’s fight against scientific racism – Herschthal (NYT)

The unwelcome revival of race science – Evans (The Guardian)

#WakandanSTEM: Teaching the evolution of skin color – Lasisi

For Decades, Our Coverage Was Racist. To Rise Above Our Past, We Must Acknowledge It: We asked a preeminent historian to investigate our coverage of people of color in the U.S. and abroad. Here’s what he found – Goldberg (NatGeo)

There’s No Scientific Basis for Race – It's a Made-Up Label: It's been used to define and separate people for millennia. But the concept of race is not grounded in genetics – Kolbert (NatGeo)

Why America’s Black Mothers and Babies Are in a Life-or-Death Crisis – Villarosa (The New York Times)

The labor of racism – Davis (Anthrodendum)

Being black in America can be hazardous to your health – Khazan (The Atlantic)

White People Are Noticing Something New: Their Own Whiteness – Bazelon (The New York Times)

Ancestry Tests Pose a Threat to Our Social Fabric: Commercial DNA testing isn’t just harmless entertainment. It’s keeping alive ideas that deserve to die – Terrell (Sapiens)

Surprise! Africans are not all the same (or why we need diversity in science) – Lasisi

Why white supremacists are chugging milk (and why geneticists are alarmed) – Harmon (NYT)

Everyday discrimination raises women's blood pressure – Yong (The Atlantic)

How the alt-right’s sexism lures men into white supremacy – Romano (Vox)

Sex Redefined – Ainsworth (Nature)

Peace Among Primates – Sapolsky (The Greater Good)

Against Human Nature—Ingold

Thursday, November 8, 2018

The horseshoe crab and the barnacle: induction vs deduction in evolution

Charles Darwin had incredible patience.  After his many-year, global voyage on the HMS Beagle, he nestled in at Down House, where he was somehow able to stay calm and study mere barnacles to an endless extent (and to write 4--four--books on these little creatures).  Who else would have had the obsessive patience (or independent wealth and time on one's hands) to do such a thing?

      From Darwin's books on barnacles (web image capture)
Darwin's meticulous work and its context in his life and thinking are very well described in Rebecca Stott's compelling 2003 book, Darwin and the Barnacle, which I highly recommend, as well as in the discussion of these topics in Desmond and Moore's 1991 Darwin biography, The Life of a Tormented Evolutionist.  These are easier, for seeing the points I will describe here, than plowing through Darwin's own detailed tomes (which, I openly confess, I have only browsed).  His years of meticulous barnacle study raised many questions in Darwin's mind about how species acquire their variation, and his pondering this eventually led to his recognition of 'evolution' as the answer, which he published only years later, in 1859, in his Origin of Species.

Darwin was, if anything, a careful and cautious person, and not much given to self-promotion.  His works are laden with appropriate caveats including, one might surmise, careful defenses lest he be found to have made interpretive or theoretical mistakes.  Yet he dared make generalizations of the broadest kind.  It was his genius to see, in the overwhelming variation in nature, the material for understanding how natural processes, rather than creation events, led to the formation of new species.  This was implicitly true of his struggle to understand the wide variation within and among species of barnacles, variation that enabled evolution, as he later came to see. Yet the same variation provided a subtle trap:  it allowed escape from accusations of undocumented theorizing, but was so generic that in a sense it made his version of a theory of evolution almost unfalsifiable in principle.

But, in a subtle way, Mr Darwin, like all geniuses, was also a product of his time.  I think he took an implicitly Newtonian, deterministic view of natural selection.  As he said, selection could detect the 'smallest grain in the balance' [scale] of differences among organisms, that is, could evaluate and screen the tiniest amount of variation.  He had, I think, only a rudimentary sense of probability; while he often used the word 'chance' in the Origin, it was in a very casual sense, and I think that he did not really think of chance or luck (what we call genetic drift) as important in evolution.  This view, I would assert, persists widely, if largely implicitly, today.

One important aspect of barnacles to which Darwin paid extensive attention was their sexual diversity.  In particular, many species were hermaphroditic.  Indeed, in some species he found small, rudimentary males literally embedded for life within the body of the female.  Other species were more sexually dichotomous.  These patterns caught Darwin's attention.  In particular, he viewed this transect in evolutionary time (our present day) not just as a catalog of today, but also as a cross-section of tomorrow.  He clearly thought that what we saw today among barnacle species represented the path that other species had taken towards becoming the fully sexually dichotomous forms (independent males and females) seen in some species today: the intermediates were on their way to these subsequent stages.

This is a deterministic view of selection and evolution: "an hermaphrodite species must pass into a bisexual species by insensibly small stages" from single organisms having both male and female sex organs to the dichotomous state of separate males and females (Desmond and Moore: 356-7).

But what does 'must pass' mean here?  Yes, Darwin could array his specimens to show these various types of sexual dimorphism, but what would justify thinking of them as progressive 'stages'?  What latent assumption is being made?  It is to think of the different lifestyles as stages along a path leading to some final inevitable endpoint.

If this doesn't raise all sorts of questions in your mind, why not?  Why, for example, are there any intermediate barnacle species here today?  Over the eons of evolutionary time why haven't all of them long ago reached their final, presumably ideal and stable state?  What justifies the idea that the species with 'intermediate' sexuality in Darwin's collections are not just doing fine, on their way to no other particular end?  Is something wrong with their reproduction?  If so, how did they get here in the first place?  Why are there so many barnacle species today with their various reproductive strategies (states)?

Darwin's view was implicitly of the deterministic nature of selection--heading towards a goal which today's species show in their various progressive stages.  His implicit view can be related to another, current controversy about evolution.

Rewinding the tape
There has for many recent decades been an argument about the degree of directedness or, one might say, predictability in evolution.  If evolution is the selection among randomly generated mutational variants for those whose survival and reproduction are locally, at a given time, favored, then wouldn't each such favored path be unique, none really replicable or predictable?

Not so, some biologists have argued!  Their view is essentially that environments are what they are, and will systematically--and thus predictably--favor certain kinds of adaptation.  There is, one might quip, only one way to make a cake in a particular environment.  Different mutations may arise, but only those that lead to cake-making will persist.  Thus, if we could 'rewind the tape' of evolution and go back to way back when, and start again, we would end up with the same sorts of adaptations that we see with the single play of the tape of life that we actually have. There would, so to speak, always be horseshoe crabs, even if we started over.  Yes, yes, some details might differ, but nothing important (depending, of course, on how carefully you look--see my 'Plus ça ne change pas', Evol. Anthropol, 2013, a point others have made, too).

Others argue that evolution is so rooted in local chance and contingency, that there would be no way to predict the details of what would evolve, could we start over at some point.  Yes, there would be creatures in each local niche, and there would be similarities to the extent that what we would see today would have to have been built from what genetic options were there yesterday, but there the similarity would end.
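The contingency side of this argument can be illustrated with a toy simulation. This is only a sketch, assuming a textbook Wright-Fisher model of pure genetic drift (no selection, no mutation); the population size, number of generations, and the function names are hypothetical choices made for illustration, not anything measured.

```python
import random

def wright_fisher(p0, pop_size, generations, rng):
    """Allele frequency after pure drift in a diploid population.

    Each generation, the 2 * pop_size gene copies are drawn at random
    according to the current frequency p (binomial sampling).
    """
    copies = 2 * pop_size
    p = p0
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
    return p

def replay_tape(n_replays, p0=0.5, pop_size=50, generations=100):
    """'Rewind the tape': identical starting point, a different run of luck
    (a different random seed) on each replay."""
    return [
        wright_fisher(p0, pop_size, generations, random.Random(seed))
        for seed in range(n_replays)
    ]

outcomes = replay_tape(5)
for seed, freq in enumerate(outcomes):
    print(f"replay {seed}: final allele frequency = {freq:.2f}")
```

Every replay starts from the same allele frequency under the same rules, yet nothing in the model makes the replays end in the same place. Adding consistent selection to such a model would pull them together, which is, in miniature, what the two camps are arguing about.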

Induction, deduction, and the subtle implications of the notion of 'intermediate' forms
Stott's book, Darwin and the Barnacle, discusses Darwin's work in terms of the presumed intermediate barnacle stages he found.  But the very use of such terms carries subtle implications. It conflates induction with deduction; it assumes what is past will be repeated.  It makes of evolution what Darwin also made of it: a deterministic, force-like phenomenon.  Indeed, it's not so different from a form of creationism.

This has deeper implications.  Among them are repeatability of environments and genomes, at least to the extent that their combination in local areas--life, after all, operates strictly on local areas--will be repeated elsewhere and else-times.  Only by assuming not only the repeatability of environments but also of genomic variation, can one see in current states of barnacle species today stages in a predictable evolutionary parade.  The inductive argument is the observation of what happened in the past, and the deductive argument is that what we see is intermediate, on its way to becoming what some present-day more 'advanced' stage is like.

This kind of view, which is implicitly and (as with Darwin) sometimes explicitly invoked, is that we can use the past to predict the future.  And yet we routinely teach that evolution is by its essential nature locally ad hoc and contingent, based on random mutations and genetic drift--and not driven by any outside God or other built-in specific creative force.

And 'force' seems to be an apt word here.

The idea that, because a trait found in fossils was intermediate between some more primitive state and something seen today, a similar trait today could be an 'intermediate stage' on the way to a knowable tomorrow, conflates inductive observation with deductive prediction.  Today's trait may indeed turn out to be such a stage, but we have no way to prove it and usually scant reason to believe it.  Instead, equating induction with deduction tacitly assumes, usually without any rigorous justification, that life is a deductive phenomenon like gravity or chemical reactions.

The problem is serious: the routine equating of induction with deduction gives a false idea about how life works, even in the short-term.  Does a given genotype, say, predict a particular disease in someone who carries it, because we find that genotype associated with affected patients today?  This may indeed be so, especially if a true causal reason is known; but it cannot be assumed to be.  We know this from well-observed recent history: Secular trends in environmental factors with disease consequences have indeed been documented, meaning that the same genotype is not always associated with the same risk.  There is no guarantee of a future repetition, not even in principle.

Darwin's worldview
Darwin was, in my view, a Newtonian.  That was the prevailing science ethos in his time.  He accepted 'laws' of Nature and their infinitesimally precise action.  That Nature was law-like was a prevailing, and one may say fashionable, view at the time. It was also applied to social evolution, for example, as in Marx's and Engels' view of the political inevitability of socialism.  That barnacles can evolve various kinds of sexual identities and arrangements doesn't mean any of what Darwin observed in them was on the way to full hermaphrodism or even later to fully distinct sexes...or, indeed, to any particular state of sexuality.  But if you have a view like his, seeing the intermediate stages, even contemporaneously, would reinforce the inevitabilistic aspect of a Newtonian perspective, and seemingly justify using induction to make deductions.

Even giants like Darwin are products of their times, as all we peons are.  We gain comfort from equating deduction with induction, that the past we can observe allows us to predict the future.  That makes it comfortingly safe to make assertions, the feeling that we understand the complex environment in which we must wend our way through life.  But in science, at least, we should know the emptiness of the equation of the past with the future.  Too bad we can't seem to see further.

Friday, October 19, 2018

Nyah, nyah! My study's bigger than your study!!

It looks like a food-fight at the Precision Corral!  Maybe the Big Data era is over!  That's because what we really seem to need (of course) is even bigger GWAS or other sorts of enumerative (or EnumerOmics) studies, because then (and only then) will we really realize how complex traits are caused, so that we can produce 'precision' genomic medicine to cure all that ails us.  After all, there is no such thing as enough 'data' or a big (and open-ended) enough study.  Of course, because so much is at stake, such a food-fight is not just among children in a sand box, but among purported adults, scientists even, wanting more money from you, the taxpayer (what else?).  The contest will never end on its own.  It will have to be ended from the outside, in one way or another, because it is predatory: it takes resources away from what might be focused, limited, but actually successful problem-solving research.

The idea that we need larger and larger GWAS studies, not to mention almost any other kind of 'omics enumerative study, reflects the deeper idea that we have no idea what to do with what we've got.  The easiest word to say is "more", because that keeps the fiscal flood gates open.  Just as preachers keep the plate full by promising redemption in the future--a future that, like an oasis to desert trekkers, can be a mirage never reached--scientists are modern preachers who've learned the tricks of the trade.  And, of course, since each group wants its flood gates to stay wide open it must resist any even faint suggestion that somebody else's gates might open wider.

There is a kind of desperate defense, as well as food fight, over the situation.  This, at least, is one way to view a recent exchange.  Boyle et al. (Cell 169(7):1177-86, 2017**) asserted that some few key genes, perhaps with rare alleles, scattered across the genome are the 'core' genes responsible for complex diseases, but that lesser, often indirect or incidental genes across the genome provide other pathways to affect a trait, and are detected in GWAS.  If a focus on this model were to take place, it might threaten the gravy train of more traditional, more mindless, Big Data chasing. As a plea to avoid that, Wray et al. returned a falsely polite spitball (Cell 173:1573-80, 2018**) urging that things really are spread all over the genome, differently so in everyone.  Thus, of course, the really true answer is some statistical prediction method, after we have more and even larger studies.

Could it be, possibly, that this is at its root merely a defense of large statistical data bases and Big Data per se, expressed as if it were a legitimate debate about biological causation?  Could it be that for vested interests, if you have a well-funded hammer everything can be presented as if it were a nail (or, rather, a bucket's worth of nails, scattered all over the place)?

Am I being snide here? 
Yes, of course. I'm not the Ultimate Authority to adjudicate about who's right, or what metric to use, or how many genome sites, in which individuals, can dance on the head of the same 'omics trait.  But I'm not just being snide.  One reason is that both the Boyle and Wray papers are right, as I'll explain.

The arguments seem in essence to assert that complex traits are due either to many genetic variants strewn across the genome, or to a few rare larger-effect alleles here and there complemented by nearby variants that may involve indirect pathways to the 'main' genes, and that these are scattered across the genome ('omnigenic').  Or that we can tinker with GWAS results and various technical measurements from them to get the real truth?

We are chasing our tails these days in an endless-seeming circle to see who can do the biggest and most detailed enumerative study, to find the most and tiniest of effects, with the most open-ended largesse, while Rome burns.  Rome, here, is the victims of the many diseases which might be studied with actual positive therapeutic results by more focused, if smaller, studies.  Or, in many cases, by a real effort at revealing and ameliorating the lifestyle exposures that typically, one might say overwhelmingly, are responsible for common diseases.

If, sadly, it were to turn out that there is no more integrative way, other than add-'em-up, by which genetic variants cause or predispose to disease, then at least we should know that and spend our research resources elsewhere, where they might do good for someone other than universities.  I actually happen to think that life is more integratively orderly than its effects typically being enumeratively additive, and that more thoughtful approaches, indeed reflecting findings of the decades of GWAS data, might lead to better understanding of complex traits.  But this seemingly can't be achieved by just sampling extensively enough to estimate 'interactions'.  The interactions may, and I think probably do, have higher-level structure that can be addressed in other ways.

But if not, if these traits are as they seem, and there is no such simplifying understanding to be had, then let's come clean to the public and invest our resources in other ways to improve our lives before these additive trivia add up to our ends when those supporting the work tire of exaggerated promises.

Our scientific system, which we collectively let grow like mushrooms because it was good for our self interests, puts us in a situation where we must sing for our supper (often literally, if investigators' salary depends on grants).  No one can be surprised at the cacophony of top-of-the-voice arias ("Me-me-meeeee!").  Human systems can't be perfect, but they can be perfected.  At some point, perhaps we'll start doing that.  If it happens, it will only partly reflect the particular scientific questions at issue, because it's mainly about the underlying system itself.

**NOTE: We provide links to sources, but, yep, they are paywalled --unless you just want to see the abstract or have access to an academic library.  If you have the looney idea that as a taxpayer you have already paid for this research so private selling of its results should be illegal--sorry!--that's not our society.

Thursday, October 18, 2018

When is a consistent account in science good enough?

We often want our accounts in science to be consistent with the facts.  Even if we can't explain all the current facts, we can always hope to say, truthfully, that our knowledge is imperfect but our current theory is at least largely true....or something close to that....until some new 'paradigm' replaces it.

It is also only natural to sneer at our forebears' primitive ideas, about which we, naturally, now know much better.  Flat earth?  Garden of Eden?  Phlebotomy?  Phlogiston?  Four humors?  Prester John, the mysterious Eastern Emperor who will come to our rescue?  I mean, really!  Who could ever have believed such nonsense?

Prester John to the rescue (from Br Library--see Wikipedia entry)
In fact, leaders among our forebears accepted these and much else like it, took them as real, sought them for solace from life's cares not just because they were promised (as in religious figures) but as earthly answers.  Or, to seem impressively knowledgeable, found arcane ways to say "I dunno" without admitting it.  And, similarly, many used ad hoc 'explanations' for personal gain--as self-proclaimed gurus, promisers of relief from life's sorrows or medical woes (usually, if you cross their palms with silver first).

Even in my lifetime in science, I've seen forced after-the-fact 'explanations' of facts, and the way a genuine new insight can show how wrong those explanations were, because the new insight accounts for them more naturally or in terms of some other new facts, forces, or ideas.  Continental drift was one that had just come along in my graduate school days.  Evolution, relativity, and quantum mechanics are archetypes of really new ideas that transformed how our forebears had explained what is now our field of endeavor.

Such lore, and our more broad lionizing of leading political, artistic or other similarly transformative figures, organizes how we think.  In many ways it gives us a mythology, or ethnology, that leads us to order success into a hierarchy of brilliant insights.  This, in turn, and in our careerist society, provides an image to yearn for, a paradigm to justify our jobs, indeed our lives, make them meaningful--make them important in some cosmic sense, and really worth living.

Indeed, even ordinary figures from our parents, to the police, generals, teachers, and politicians have various levels of aura as idols or savior figures, who provide comforting answers to life's discomfiting questions.  It is natural for those burdened by worrisome questions to seek soothing answers.

But of course, all is temporary (unless you believe in eternal heavenly bliss).  Even if we truly believe we've made transformative discoveries or something like that during our lives, we know all is eventually dust.  In the bluntest possible sense, we know that the Earth will some day destruct and all our atoms scatter to form other cosmic structures.

But we live here and now and perhaps because we know all is temporary, many want to get theirs now, and we all must get at least some now--a salary to put food on the table at the very least.  And in an imperfect and sometimes frightening world, we want the comfort of experts who promise relief from life's material ills as much as preachers promise ultimate relief.  This is the mystique often given to, or taken by, medical professionals and other authority figures.  This is what 'precision genomic medicine' was designed, consciously or possibly just otherwise, to serve.

And we are in the age of science, the one True field (we seem to claim) that delivers only objectively true goods; but are we really very different from those in similar positions of other sorts of lore?  Is 'omics any different from other omnibus beliefs-du-jour?  Or do today's various 'omical incantations and promises of perfection (called 'precision') reveal that we are, after all, even in the age of science, only human and not much different from our typically patronized benighted forebears?

Suppose we acknowledge that the latter is, at least to a considerable extent, part of our truth.  Is there a way that we can better use, or better allocate, resources to make them more objectively dedicated to solving the actually soluble problems of life--for the public everyday good, and perhaps less used, as from past to today, to gild the thrones of those making the promises of eternal bliss?

Or does sociology, of science or any other aspect of human life, tell us that this is, simply, the way things are?

Wednesday, October 17, 2018

The maelstrom of science publishing: once you've read it, when should you shred it?

There is so much being published in the science literature--a veritable tsunami of results.  New journals are being started almost monthly, it seems, and mainly or only by for-profit companies.  There seems to be a Malthusian growth of the number of scientists, which has certainly produced a genuine explosion of research and knowledge, but the intense pressure on scientists to publish has perhaps changed the relative value of every paper.

And as I look at the ancient papers (that is, ones from 2016-17) that I've saved in my Must-Read folder, I see all sorts of things that, if they had actually been widely read, much less heeded, would mean that many papers being published today might not seem so original.  At least, new work might better reflect what we already know--or should know if we cared about or read that ancient literature.

At least I think that, satire aside, in the rush to publish what's truly new, as well as for professional score-counting and so on, and with the proliferating plethora of journals, the past is no longer prologue (sorry, Shakespeare!) as it once was and, one can argue, should still be.  The past is just the past; it doesn't seem to pay to recognize, much less to heed it, except for strategic citation-in-passing reasons and because bibliography software can be used to winnow out citable papers so that reviewers of papers or grant applications won't be negative because their work wasn't cited.  You can judge for yourself whether this is being realistic or too cynical (perhaps both)!

The flux of science publishing is enormous for many reasons.  Not least is the expansion in the number of scientists.  But this is exacerbated by careerist score-counting criteria that have been growing like the proverbial Topsy in recent decades: the drive to get grants, big and bigger, long and longer.  Often in biomedical sciences, at least, grants must include investigator salaries, so there is massive self-interest in enumerable 'productivity'.  The journals proliferate to fill this market, and of course to fill the coffers of the publishers' self-interest.  Too cynical?

Over the years, in part to deflate Old Boy networks, 'objective' criteria have come to include, besides grants garnered, a faculty member's number of papers, ranking of the journals they're in, citation counts, and other 'impact factor' measures.  This grew in some ways also to feed the growing marketeering by vendors, who even provide score-counting tools, and by university bureaucracies.  More generally, it reflects the way middle-class life, the life most of us now lead, has become--attempts to earn status, praise, wealth, and so on by something measurable and therefore ostensibly objective.  Too cynical?

Indeed, it is now common for graduate students--or even undergrads--to attend careerism seminars.  Instruction in how to get published, how to get funded, how to work the System.  This may be good in a sense, or at least realistic, even if it was not so when, long ago, I was a graduate student.  It does, however, put strategizing rather than science up front, a first-year learning priority.  One wonders how much time is lost that, in those bad old days, was spent thinking and learning about the science itself.  We were, for example, to spend our 2-year Master's program learning our field, only then to get into a lab and do original work, which was what a PhD was about.  It is fair to ask whether this is just a change in our means of being and doing, without effect on the science itself, or whether careerism is displacing or even replacing really creative science?  When is objection to change nothing more than nostalgic cynicism?

Is science more seriously 'productive' than it used to be?
Science journals have always been characterized largely by the minutiae they publish, because (besides old boy-ism) real, meaty, important results are hard to come by.  Most observation in the past, and experiment these days, yields little more than curios.  You can see this by browsing decades-old volumes even of the major science journals.  The reports may be factually correct, but of minimal import.  Even though science has become a big industry rather than the idle rich's curiosity, most science publishing now, as in the past, might more or less still be vanity publishing.  Yet, as science has become more of a profession, there are important advances, so it is not clear whether science is now more splash than substance than it was in the past.

So, even if science has become an institutionalized, established, middle-class industry, and most of us will go down and out, basically unknown in the history of our fields, that has probably always been the case.  Any other view probably is mainly retrospective selective bias: we read biographies of our forebears, making them seem few and far between, and all substantial heroes; but what we are reading is about those forebears who really did make a difference.  The odd beetle collector is lost to history (except maybe to historians, who themselves may be making their livings on arcane minutiae).  So if that's just reality, there is no need to sneer cynically at it.

More time and energy are taken up playing today's game than was the case, or was necessary, in the past--at least I think that is pretty clear, if impossible to prove.  Even in the chaff-cloud, lasting knowledge does seem to be much more per year than it used to be.  That seems real, but it reveals another reality.  We can only deal with so much.  With countless papers published weekly, indeed many of them reviews (so we don't have to bother reading the primary papers), overload is quick and can be overwhelming.

That may be cynical, but it's also a reality.  My Must-Read folder on my computer is simply over-stuffed, with perhaps a hundred or more papers that I 'Saved' every year.  When I went to try to clean my directory this morning, I was overwhelmed: what papers before, say, 2015 are still trustworthy, as reports or even as reviews of then-recent work?  Can one even take reviews seriously, or cite them or past primary papers without revealing one's out-of-dateness?  New work obviously can obsolesce prior reviews. Yet reviews make the flood of prior work at least partially manageable.  But would it be safer just to Google the subject if it might affect one's work today?  It is, at least, not just cynicism to ask.

Maybe to be safe, given this situation, there would be two solutions:
1.  Just Google the subject and get the most recent papers and reviews; 
2.  There should be software that detects and automatically shreds papers in a Science Download directory, that haven't had any measurable impact in, say, 5 or (to be generous) 10 years.  We already have sites like Reddit, whose contents may not have a doomsday eraser.  But in science, to have mercy on our minds and our hard discs, what we need is Shred-it!

Tuesday, October 16, 2018

Where has all the thinking gone....long time passing?

Where did we get the idea that our entire nature, not just our embryological development, but everything else, was pre-programmed by our genome?  After all, the very essence of Homo sapiens, compared to all other species, is that we use culture--language, tools, etc.--to do our business rather than just our physical biology.  In a serious sense, we evolved to be free of our bodies: our genes made us freer from our genes than most if not all other species!  And we evolved to live long enough to learn--language, technology, etc.--in order to live our thus-long lives.

Yet isn't an assumption of pre-programming the only assumption by which anyone could legitimately promise 'precision' genomic medicine?  Of course, Mendel's work, adopted by human geneticists over a century ago, allowed great progress in understanding how genes lead at least to the simpler of our traits, with discrete (yes/no) manifestations, traits that do include many diseases that really, perhaps surprisingly, do behave in Mendelian fashion, and for which concepts like dominance and recessiveness have been applied and that, sometimes, at least approximately hold up to closer scrutiny.

Even 100 years ago, agricultural and other geneticists who could do experiments largely confirmed the extension of Mendel to continuously varying traits, like blood pressure or height.  They reasoned that many genes (whatever they were, which was unknown at the time) contributed individually small effects.  If each gene had two states in the usual Aa/AA/aa classroom example sense, but there were countless such genes, their joint action could approximate continuously varying traits whose measure was, say, the number of A alleles in an individual.  This view was also consistent with the observed correlation of trait measure with kinship-degree among relatives.  This history has been thoroughly documented.  But there are some bits, important bits, missing, especially when it comes to the fervor for Big Data 'omics analysis of human diseases and other traits.  In essence, we are still, a century later, conceptual prisoners of Mendel.
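The classroom argument about many two-state genes adding up to a continuous-looking trait can be illustrated with a toy simulation.  This is only a sketch under simple assumptions of my own choosing for illustration (independent loci, an allele frequency of 0.5, every A counting equally, and the made-up function name simulate_trait); it is not a model of any real trait.

```python
import random
from collections import Counter

def simulate_trait(n_loci, n_people, allele_freq=0.5, seed=0):
    """Return each person's count of 'A' alleles summed over n_loci two-allele loci."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_people):
        # Two allele copies per locus; each copy is 'A' with probability allele_freq.
        total = sum(1 for _ in range(2 * n_loci) if rng.random() < allele_freq)
        counts.append(total)
    return counts

# One locus gives at most three classes (aa, Aa, AA): a discrete, Mendelian-looking trait.
one_locus = Counter(simulate_trait(n_loci=1, n_people=2000))
# Many loci give many closely spaced classes: the trait starts to look continuous.
many_loci = Counter(simulate_trait(n_loci=100, n_people=2000))

print("distinct trait values, 1 locus:", len(one_locus))
print("distinct trait values, 100 loci:", len(many_loci))
```

Note what the sketch builds in by hand: every A has the same fixed effect in every person, and nothing but the genotype matters.  Those are exactly the assumptions questioned below.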

'Omics over the top: key questions generally ignored
Let us take GWAS (genomewide association studies) on their face value.  GWAS find countless 'hits', sites of whatever sort across the genome whose variation affects variation in WhateverTrait you choose to map (everything simply must be 'genomic' or some other 'omic, no?).  WhateverTrait varies because every subject in your study has a different combination of contributing alleles.  Somewhat resembling classical Mendelian recessiveness, contributing alleles are found in cases as well as controls (or across the measured range of quantitative traits like stature or blood pressure), where the measured trait reflects how many A's one has: WhateverTrait is essentially the sum of A's in 'cases', which may be interpreted as a risk--some sort of 'probability' rather than certainty--of having been affected or of having the measured trait value.

We usually treat risk as a 'probability,' a single value, p, that applies to everyone with the same genotype.  Here, of course, no two subjects have exactly the same genotype so some sort of aggregate risk score, adding up each person's 'hits', is assigned a p.  This, however, tacitly assumes that each site contributes something like a fixed risk or 'probability' of affection.  But this treats these values as if they were essential to the site, each thus acting as a parameter of risk.  That is, sites are treated as a kind of fixed value or, one might say 'force', relative to the trait measure in question.
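To make the tacit assumption concrete, here is a minimal sketch of an add-'em-up risk score.  The site names, the weights, the baseline, and the logistic conversion of a score into a p are all hypothetical choices of mine for illustration; none of them comes from any actual study or from any particular GWAS software.

```python
import math

# Hypothetical per-site weights (log-odds added per copy of the 'risk' allele).
SITE_WEIGHTS = {"site1": 0.10, "site2": 0.05, "site3": 0.20}
BASELINE_LOG_ODDS = -2.0  # assumed baseline risk, not taken from any data

def risk_probability(genotype):
    """genotype maps site name -> number of risk alleles carried (0, 1 or 2)."""
    # The score is a plain weighted sum: each site acts as a fixed 'force'.
    score = sum(SITE_WEIGHTS[site] * genotype.get(site, 0) for site in SITE_WEIGHTS)
    log_odds = BASELINE_LOG_ODDS + score
    # Logistic transform turns the summed score into a single 'probability' p.
    return 1.0 / (1.0 + math.exp(-log_odds))

person_a = {"site1": 2, "site2": 0, "site3": 1}
person_b = {"site1": 0, "site2": 2, "site3": 1}
print(risk_probability(person_a), risk_probability(person_b))
```

The weights sit at the top as constants.  Nothing in the calculation allows them to depend on who the person is, where or when they live, or what else is in their genome, which is precisely what it means to treat them as parameters.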

One obvious and serious issue is that these are necessarily estimated from past data, that is, by induction from samples.  Not only is there sampling variation that usually is only crudely estimated by some standard statistical variation-related measure, but we know that the picture will be at least somewhat different in any other sample we might have chosen, not to mention other populations; and those who are actually candid about what they are doing know very well that the same people living in a different place or time would have different risks for the same trait.
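The sampling point alone can be shown with a toy simulation.  In this sketch (the function name, sample size, effect size, and noise level are all illustrative assumptions of mine), the 'true' per-allele effect is held fixed and only the sample changes, yet the estimated effect differs from sample to sample.

```python
import random

def estimate_effect(sample_size, true_effect, seed):
    """Least-squares slope of trait on allele count (0, 1, 2) in one simulated sample."""
    rng = random.Random(seed)
    # Allele counts for one site with allele frequency 0.5 (0:1:2 in ratio 1:2:1).
    xs = [rng.choice([0, 1, 1, 2]) for _ in range(sample_size)]
    # Trait = fixed effect per allele + everything else, lumped together as noise.
    ys = [true_effect * x + rng.gauss(0.0, 1.0) for x in xs]
    mean_x = sum(xs) / sample_size
    mean_y = sum(ys) / sample_size
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Five samples from the 'same' population: five different estimates of one effect.
estimates = [estimate_effect(sample_size=200, true_effect=0.1, seed=s) for s in range(5)]
print(estimates)
```

This is the generous case, because the simulation grants that a fixed true effect exists at all.  The argument here is that in other populations, places, or times even that cannot be assumed.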

No study is perfect, so we use some conveniently assumed well-behaved regression/correction adjustments to account for the statistical 'noise' due to factors like age, sex, and unmeasured environmental effects.  Much worse than these issues, there are clearly factors of imprecision, and the obvious major one, taboo even to think about much less to mention, is that relevant future factors (mutations, environments, lifestyles) are unknowable, even in principle.  So what we really do, are forced to do, is extend what the past was like to the assumed future.  But besides this, we don't count somatic changes (mutations arising in body tissues during life, that were not inherited), because they'd mess up our assertions of 'precision', and we can't measure them well in any case (so just shut one's eyes and pretend the ghost isn't in the house!).

All of these together mean that we are estimating risks from imperfect existing samples and past life-experience, but treating them as underlying parameters so that we can extend them to future samples.  What that does is equate induction with deduction, assuming the past is rigorously parametric and will be the same in the future;  but this is simply scientifically and epistemologically wrong, no matter how inconvenient it is to acknowledge this.  Mutations, genotypes, and environments of the future are simply unpredictable, even in principle.

None of this is a secret, or new discovery, in any way.  What it is, is inconvenient truth.  These things should have been enough, by themselves, and without badgering investigators about environmental factors (which, we know very well, typically predominate), to keep any of the NIH's precision promises from being accurate ('precise'), or even accurate to a knowable degree.  Yet this 'precision' sloganeering is being, sheepishly, aped all over the country by all sorts of groups who don't think for themselves and/or who go along lest they get left off the funding gravy train.  This is the 'omics fad.  If you think I am being too cynical, just look at what's being said, done, published, and claimed.

These are, to me, deep flaws in the way the GWAS and other 'omics industries, very well-heeled, are operating these days, to pick the public's pocket (pharma may, slowly, be awakening-- Lancet editorial, "UK life science research: time to burst the biomedical bubble," Lancet 392:187, 2018).  But scientists need jobs and salaries, and if we put people in a position where they have to sing in this way for their supper, what else can you expect of them?

Unfortunately, there are much more serious problems with the science, and they have to do with the point-cause thinking on which all of this is based.

Even a point-cause must act through some process
By far most of the traits, disease or otherwise, that are being GWAS'ed and 'omicked these days, at substantial public expense, are treated as if the mapped 'causes' are point causes.  If there are n causes, and a person has an unlucky set m out of many possible sets, one adds 'em up and predicts that person will have the target trait.  And there is much that is ignored, assumed, or wishfully hidden in this 'will'.  It is not clear how many authors treat it, tacitly, as a probability vs a certainty, because no two people in a sample have the same genotype and all we know is that they are 'affected' or 'unaffected'.

The genomics industry promises, essentially, that from conception onward, your DNA sequence will predict your diseases, even if only in the form of some 'risk'; the latter is usually a probability and despite the guise of 'precision' it can, of course, be adjusted as we learn more.  For example, it must be adjusted for age, and usually other variables.  Thus, we need ever larger and more and longer-lasting samples.  This alone should steer people away from being profiteered by DNA testing companies.  But that snipe aside, what does this risk or 'probability' actually mean?

Among other things, those candid enough to admit it know that environmental and lifestyle factors have a role, interacting with the genotype if not, usually, overwhelming it, meaning, for example, that the genotype only confers some, often modest, risk probability, the actual risk much more affected by lifestyle factors, most of which are not measured or not measured with accuracy, or not even yet identified.  And usually there is some aspect that relates to age, or some assumption about what 'lifetime' risk means.  Whose lifetime?

Aspects of such a 'probability'
There are interesting issues, longstanding issues, about these probabilities, even if we assume they have some kind of meaning.  Why do so many important diseases, like cancers, only arise at some advanced age?  How can a genomic 'risk' be so delayed and so different among people?  Why do mice, with very similar genotypes to humans (which is why we do experiments on them to learn about human disease), live only to 3 while we live to our 70s and beyond?

Richard Peto raised some of these questions many decades ago.  But they were never really addressed, even in an era when NIH et al were spending much money on 'aging' research including studies of lifespan.  There were generic theories that suggested, from evolutionary theory, why some diseases were deferred to later ages (the usual name is 'antagonistic pleiotropy'), but nobody tried seriously to explain why that was from a molecular/genetic point of view.  Why do mice live only 3 years, anyway?  And so on.

These are old questions and very deep ones but they have not been answered and, generally, are conveniently forgotten--because, one might argue, they are inconvenient.

If a GWAS score increases the risk of a disease that has a long delayed onset pattern, often striking late in life, and highly variable among individuals or over time, what sort of 'cause' is that genotype?  What is it that takes decades for the genes to affect the person?  There are a number of plausible answers, but they get very little attention at least in part because that stands in the way of the vested interests of entrenched too-big-to-kill Big Data faddish 'research' that demands instant promises to the public it is trephining for support.  If the major reason is lifestyle factors, then the very delayed onset should be taken as persuasive evidence that the genotype is, in fact, by itself not a very powerful predictor.

Why would the additive effects of some combination of GWAS hits lead to disease risk?  That is, in our complex nature why would each gene's effects be independent of each other contributor?  In fact, mapping studies usually show evidence that other things, such as interactions, are important--but they are at present almost too complex to be understood.

Does each combination of genome-wide variants have a separate age-onset pattern, and if not, why not?  And if so, how does the age effect work (especially if not due to person-years of exposure to the truly determining factors of lifestyle)?  If such factors are at play, how can we really know, since we never see the same genotype twice? How can we assume that the time-relationship with each suspect genetic variant will be similar among samples or in the future?  Is the disease due to post-natal somatic mutation, in which case why make predictions based on the purported constitutive genotypes of GWAS samples?

Obviously, if long delayed onset patterns are due not to genetic but to lifestyle exposures interacting with genotypes, then perhaps lifestyle exposures should be the health-related target, not exotic genomic interventions.  Of course, the value of genome-based prediction clearly depends on environmental/lifestyle exposures, and the future of these exposures is obviously unknowable (as we clearly do know from seeing how unpredictable past exposures have affected today's disease patterns).

The point here is that our reliance on genotypes is a very convenient way of keeping busy, bringing in the salaries, but not facing up to the much more challenging issues that the easy one (run lots of data through DNA sequencers) can't address.  I did not invent these points, and it is hard to believe that at least the more capable and less me-too scientists don't clearly know them, if quietly.  Indeed, I know this from direct experience.  Yes, scientists are fallible, vain, and we're only human.  But of all human endeavors, science should be based on honesty because we have to rely on trust of each other's work.

The scientific problems are profound and not easily solved, and not soluble in a hurry.  But much of the problem comes from the funding and careerist system that shackles us.  This is the deeper explanation in many ways.  The  paint on the House of Science is the science itself, but it is the House that supports that paint that is the real problem.

A civically responsible science community, and its governmental supporters, should be freed from the iron chains of relentless Big Data for their survival, and start thinking, seriously, about the questions that their very efforts over the past 20 years, on trait after trait, in population after population, and yes, with Big Data, have clearly revealed.