Thursday, September 12, 2013

Organisms with fuzzy borders

Everyone's on the human microbiome story these days -- the bacteria in our gut that make us fat, thin, sick, well -- but a new paper in Current Biology ("Horizontally Transmitted Symbionts and Host Colonization of Ecological Niches", Henry et al., Sept 9, 2013) tells the story of the aphid microbiome and the role that symbionts play in the life history of this insect.  Symbionts don't make aphids fat or thin, but they do have other significant effects.  

Symbiotic bacteria in the insect gut are common, as they must be if they're not pathogenic. They have been the subject of research for decades.  They are often inherited, in that offspring have the microbiome of their parents, and the bacterial mix can affect interactions between members of a given species and the use of plants on which insects feed, boost the insect's immune system, affect its response to temperature changes, or even manipulate its reproduction.  Insects can be host to multiple simultaneous symbionts, ecosystems which undoubtedly have their own internal dynamics:  Ferrari and Vavre suggest in a 2011 paper that symbionts and their hosts are "dynamic communities that affect and are affected by the communities in which they are embedded."

Aphids, also known as "plant lice," are sap-sucking insects that either live on or destroy plants, depending on your point of view.  If you are a farmer, they're very destructive but if you happen to be an aphid, you're just doing your job.  

Pea aphid, Acyrthosiphon pisum, mother and nymphs. Wikipedia

Aphids can host both primary and secondary symbionts; most aphids are infected with primary symbionts which can be essential to survival as they supplement the insect's nutrition and can protect the insect from pathogens, while fewer aphids host secondary symbionts, bacteria that play less crucial roles but that may still improve the insect's lot in life, including increasing its reproductive fitness.

Symbionts can reside in the cytoplasm of their hosts' cells, and are most often transferred vertically, from mother to offspring, but Henry et al. report considerable horizontal transmission as well, from adult insect to adult insect, just as the small genetic ring-structures called plasmids are transmitted from bacterium to bacterium. The mechanisms of horizontal transfer aren't well-documented, but Jennifer White notes, in a perspective on the Henry paper in Current Biology, that it can be via the host plant, via a shared natural enemy, or via sexual transmission.

Henry et al. looked at genetic diversity, ecological correlates, and mode of transmission for 1,104 pea aphids from 11 kinds of plants, in 155 different places in 14 countries.  They used the DNA sequence of the aphid's primary symbiont, the vertically inherited Buchnera, as the basis for grouping the insects by symbiont similarity.  They used nuclear markers from the aphids to group them by adaptation to specific host plants.  They also determined which of four secondary symbionts, if any, the aphid was hosting.  Knowing this alone, one would expect there to be a 'tree' of historical and/or geographical relationships based on these aspects of sharing, and this is apparently so:

The authors report that host plant was non-randomly associated with symbiont.  That is, which plant the aphid preferred was correlated with the symbionts it hosted. And, genetic structure of the symbiont was associated with host ecology -- sometimes but not always this included preferred plants, but always geographic factors such as temperature and aridity of the locale. 

Henry et al. found horizontal transfer to be common, and "associated with aphid lineages colonizing new ecological niches, including novel plant species and climatic regions."  This is a spatial kind of phylogeny rather than a temporal parent-offspring tree of relationships.  Indeed, they found that similar species of symbionts infect aphids on similar host plants in similar ecosystems around the world, and that mode of transmission is associated with type of plant and ecological setting.  This finding suggests that there is also a temporal family tree, and the authors wondered which came first then, the symbiont or the adaptation to host plant.  Whichever, it likely happened long ago.

They used a Bayesian approach to test for correlated evolution between two traits.  This means they had some prior notion of relationships that might be found, and used data to refine the likely true story.  This is a method they explained fully in the supplemental material.  "We found clear evidence that the pea aphid’s colonization of particular host plants is associated with infections by two of the four symbiont species."

And they found that aphids will switch to particular host plants when infected with a particular symbiont; indeed, they don't switch to these plants at all when they aren't infected by these symbionts.  "Our work supports the idea that symbionts assist their host in exploiting specific ecological niches and occupying different climatic zones and is consistent with the hypothesis that symbionts form a horizontal gene pool that is actively sampled by hosts when confronting novel environmental challenge."

That means that the evolutionary history has been fostered not just by individual competition, but by shared interactions and success.  The authors suggest that the symbiont influences choice of feed plant because there is a fitness benefit to feeding on a specific plant.  And this in turn would be beneficial to the bacterium, which thrives when the aphid thrives.  Henry et al. state that "rates of colonization of new host plants are higher when a symbiont is carried rather than when aphids on particular host plant have higher rates of gaining certain bacteria."  They conclude that "secondary symbionts constitute a eukaryote horizontal gene pool, a reservoir of potential adaptations, or preadaptation."

There are no new basic concepts in this work, even though it seems to be very well done.  But every well-documented instance of complex interactions of a kind we have long known about at more visible levels of observation, now documented down to the molecular and cellular level, reinforces an overall view of the web-like network that evolution has woven.

Whether transmitted horizontally or vertically, or whether symbionts need to be considered purely in the light of selective adaptation, asking how symbionts can affect the behavior of their host reflects a fairly recent view of biology in which organisms have fuzzy borders. In this case, the symbiosis is good for the host and the bacteria, if not so good for the plant.  But it does reinforce the view that cooperation is important in life. 

Wednesday, September 11, 2013

Having a ball with the latest Great Finding in science!

You can never tell what's been funded in the name of hard-core, serious-minded, fundamental paradigm-shifting, transformative science.  Limited journal space (and limited space in magazines and online 'science' news sources) means that only the hottest, most trenchant research is published to an astounded audience.

This week's mega-story, about a paper published in PNAS, is told, among many other places, on the BBC--and meriting a headline at that:  Testicle size is related to child-rearing behavior!  Hey, guys!  Size does matter after all!  The tinier your privates (well, some of them) are, the kinder, gentler you are and the better fathering you provide your beleaguered spouse.  This goes strikingly against the theory that males are all about rutting and fighting and disappearing after their mate starts to 'show' that she's preggie and not so interested in rolls in the hay.


So tender and sweet a Dad.  (image from the BBC web page)


Instead, those men with tinier equipment prefer to stay home watching Junior, cooing and doing all those wishy-washy girlie things.  (Unfortunately, blog ethics standards prohibit us from showing images of the research material itself; however, readers with deep scientific interest in the research can find similar items, of varying description, in many other more explicit web sites.)  Now this is a surprise, since child-rearing could actually be related to fitness (which, we remind readers, is about successful reproduction).  So why is it that the Big Brute image of maleness, and the drive that selection supposedly instills in guys to leave the nest and go out on pick-up missions, ever became the Real Truth about sexual reproduction?  If this is so important, of course, we'd expect that over evolutionary time, the success of the smaller set would have become established by natural selection. By now, we should all have marbles instead of basketballs, and this should not vary very much, and there should be a gene 'for' keeping size under control.  So how is it that this Fundamental trait is still so variable, or that so much science (that is, simple-story evolutionary speculation about how size matters) has been so widely accepted?  We'll leave that for you to contemplate, as the topic du jour.

Tuesday, September 10, 2013

Who begat you??

Henry Louis Gates, who has done so much to interest people in DNA-based ancestry testing, tells the story of his own first ancestry report. He was told that his maternal line probably traced back to the Nubian people in Egypt. Given that the ancestors of most African Americans left Africa by force, it's difficult for their descendants to trace their family histories further back than their arrival on these shores, so Gates was excited to learn of his North African past.

Five years and another ancestry test later, however, he was told that his maternal ancestors were in fact more likely to have been European than African. Why didn't the first test give the same results? As we recall the story, it was because the goal of the first ancestry testing company was to trace African American ancestry back to Africa. They had no European samples in their database, and this forced the closest fits to be African, even when they weren't.

The quality of any ancestry results depends on the representativeness of the ancestral data. If an ancestry company only had, say, Tahitians, Mongolians and Finnish Saami samples with which to compare our DNA, we'd all look like an admixture of Tahitian, Mongolian and Saami. And much the same is true when a researcher is trying to determine the geographic history of a population, or to search for genetic signatures of natural selection.
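To make the point concrete, here is a toy sketch of how a 'closest fit' behaves when the true source population is missing from the reference panel. This is only an illustration under our own assumptions: the population names, the three markers, and every allele frequency are invented, and real ancestry companies use far more markers and more sophisticated statistics.

```python
# Toy illustration: a sample is always assigned to the closest population
# *in the reference panel*, even when its true source is not in the panel.
# All names and allele frequencies below are invented for illustration only.

def distance(sample, reference):
    """Mean absolute difference in allele frequency across the sample's markers."""
    return sum(abs(sample[m] - reference[m]) for m in sample) / len(sample)

def closest_population(sample, panel):
    """Return the name of the panel population nearest to the sample."""
    return min(panel, key=lambda name: distance(sample, panel[name]))

# Hypothetical frequencies of one allele at three markers.
full_panel = {
    "population_A": {"snp1": 0.10, "snp2": 0.80, "snp3": 0.30},
    "population_B": {"snp1": 0.60, "snp2": 0.20, "snp3": 0.70},
    "population_C": {"snp1": 0.90, "snp2": 0.50, "snp3": 0.10},
}

# A sample that truly resembles population_B.
sample = {"snp1": 0.55, "snp2": 0.25, "snp3": 0.65}

# The same panel with population_B left out of the database.
limited_panel = {k: v for k, v in full_panel.items() if k != "population_B"}

print(closest_population(sample, full_panel))     # population_B
print(closest_population(sample, limited_panel))  # population_C
```

With the full panel the sample lands where it belongs; with the limited panel it is still confidently assigned to something, just the wrong thing.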

Population ancestry determination
A recent paper in BioEssays ("SNP ascertainment bias in population genetic analyses: Why it is important, and how to correct it," Lachance and Tishkoff, online 9 July 2013, in print Sept 2013) discusses the biases that may be built into demographic and genetic analyses if researchers determine allele frequencies -- the frequency of different gene variants -- based on single nucleotide polymorphisms (SNPs) which have been found to vary in substantial frequency in other, perhaps distantly related populations. (SNPs are single sites in the genome where some people have, say, a T and others an A.) These criteria for identifying variable sites that are to be used later to examine ancestry of general samples of people can bias the ancestry estimates.

Lachance and Tishkoff report the results of population genetic analyses using whole genome sequences of 15 African hunter-gatherers: 5 Hadza from Tanzania, 5 Pygmies from Cameroon, and 5 Sandawe from Tanzania. The sequences contained millions of variants that hadn't been seen in other populations before -- and this will be true of every population with a history of geographic isolation. Indeed, each of us contains hundreds of brand-new mutations private only to us, and others exist private to our near kin, and so on. Depending on the sample size and source, one will detect some, but by no means all, genetic variation in the population (any population). Lachance and Tishkoff compared their analyses done with whole genomes with SNP-based analysis (that is, a subset of variable sites, not whole sequence, sites that were identified in some other samples), and determined that the latter are biased.

SNP ascertainment bias is the systematic deviation of population genetic statistics from theoretical expectations, and it can be caused by sampling a nonrandom set of individuals or by biased SNP discovery protocols. Unless the whole genome of every individual in a population is sequenced there will always be some form of SNP ascertainment bias.

Population statistics are affected by SNP ascertainment bias in a variety of ways. For example, populations may falsely seem to have shrunk in size, and the age estimate of SNPs may be biased toward older SNPs because pre-ascertained SNPs are usually older than population-specific SNPs. Measures of population heterozygosity, and its variation, can be affected because those measures depend on how much variation there is within and between populations. If you only use sites found to vary substantially in, say, Europeans (as the earliest SNP sets were), then statistically speaking Europeans are likely to seem more variable than other populations.
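A small simulation can show why ascertaining SNPs in one population makes that population look more variable. This is a rough sketch under assumptions of our own, not the authors' analysis: two populations drift independently from shared ancestral allele frequencies, the drift is modeled crudely as Gaussian noise, and the function names, the cutoff of 0.1, and the noise size are all illustrative choices.

```python
import random

def clip(x):
    """Keep an allele frequency within [0, 1]."""
    return min(1.0, max(0.0, x))

def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)

def mean_het(pairs, idx):
    """Average heterozygosity in population idx (0 or 1) over the given sites."""
    return sum(heterozygosity(pair[idx]) for pair in pairs) / len(pairs)

def simulate(n_sites=20000, drift_sd=0.1, maf_cutoff=0.1, seed=1):
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        ancestral = rng.random() ** 3  # assumption: most alleles are rare
        p1 = clip(ancestral + rng.gauss(0.0, drift_sd))  # population 1
        p2 = clip(ancestral + rng.gauss(0.0, drift_sd))  # population 2
        sites.append((p1, p2))
    # Ascertainment: keep only sites that are common in population 1.
    ascertained = [s for s in sites if min(s[0], 1.0 - s[0]) >= maf_cutoff]
    return {
        "all_pop1": mean_het(sites, 0),
        "all_pop2": mean_het(sites, 1),
        "asc_pop1": mean_het(ascertained, 0),
        "asc_pop2": mean_het(ascertained, 1),
    }

result = simulate()
for key, value in result.items():
    print(key, round(value, 3))
```

Across all sites the two populations are, by construction, about equally variable. On the ascertained sites alone, population 1 appears more heterozygous than population 2, purely because of how the sites were chosen.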

 Population geneticists and evolutionary biologists are often interested in determining whether signatures of natural selection can be detected in a specific population. That is, is there genetic evidence that the population has adapted to local environmental conditions? One major signal is reduced variation, because strong selection can eliminate all but the most favored variant(s) in a given relevant gene. But if you only look at common variants, you may not see this. Lachance and Tishkoff report that they identified many possible targets of natural selection when they looked at the complete genome sequences, but most of these were missed when the analyses used the pre-ascertained SNPs.

The authors suggest a number of possible ways to work around SNP ascertainment bias, the best being whole genome sequencing, which of course is still prohibitively expensive for even a moderate sample size. Others include correcting for ascertainment bias in the models and analyses, although this isn't perfect.

 The idea is that if you sequence whole genomes in enough people from enough geographic sites you can identify sites that vary without only looking in population 2 at sites you already found vary in population 1. The degree to which this is feasible without introducing your own types of sampling bias is debatable, as is the adequate sample size--and whether it is worth the expenditure. But it is at least important to realize the extent to which what you decide to collect may, in advance and even unwittingly, affect what you conclude. How and how well one can actually detect natural selection's signatures with such data is another subject, and it's a difficult one. We should not leap into costly sequencing efforts just to satisfy some curiosity on the subject, because there are many other kinds of issues that one would have to consider--and it's often not clear what the value even of a correctly identified signal is.

 And, there are considerations other than SNP ascertainment bias that affect the reliability of population genetic analyses. They include sample size and choice, among other things. But, as with personal ancestry testing, if you're trying to understand population 2 primarily with knowledge about population 1, your results may not be reliable if they don't 'triangulate' with regard to enough appropriate alternatives. And it can be hard to tell if you've done that.

Nonetheless, while recognition of these issues is at least 20 years old, they are worth being aware of.

Monday, September 9, 2013

Talking shop, and so much more

Ken and I are just back from a wonderful few days in Rhode Island.  Holly invited Ken to speak in the University of Rhode Island biology seminar series and we were delighted with the chance to visit the coast, meet some of Holly's students and colleagues, eat seafood, and most of all, spend real time with her, who we mostly see only virtually, and her husband Kevin.

A few of the things we pondered:
  • Why is Elroy the dog digging that trench under the bushes in the back yard?
  • White, tomato or clear clam chowder?
  • If there are multiverses, and you die on every one of them, are you still immortal?
  • Do dogs have self-awareness?
  • Why are oysters so genetically diverse?
  • What's worse, hurricanes or blizzards?
  • Can science writers be advocates? Can they avoid being advocates?
  • Are Maine lobsters better than Rhode Island ones?
  • Why do so many genetic results point to polygenic causation when so many genes are conserved? Or, why are so many genes conserved when polygenic causation is ubiquitous?
  • Does anyone understand quantum mechanics?
  • Why do we live so far from Rhode Island?

And a few of the things we saw:

View from Beavertail lighthouse

If I were a bee, I would be unable to pass this seductive hibiscus by (it is a hibiscus, isn't it?). Such lovely modularity, too.

Cliff Walk; we pretended the Newport mansions behind us weren't there.

Ken, Kevin and Holly: Were they talking about quantum mechanics or where to have our next meal? Can't remember.

Elroy; a great dog, he's even written a book (and you can follow him on Twitter: @ElroyBeefstu).  No doubt, dogs have self-awareness.  And so much more.  

Or maybe they were talking about who makes Bad Boy vodka.

We try to forget how landlocked Central PA is.
And then we tried hard to figure out what this gull was eating, but failed. Crab? Skate? Mermaid's tail?


The mesmerizing view of the indefinite, if not infinite sea...




Before coffee, draped in hydrangea from the gorgeous bush in the front yard.  
Thank you so much, Holly and Kevin.  Such a fine weekend.

Friday, September 6, 2013

Fruit and diabetes - a cocktail of results

Do you remember how much fresh fruit you ate last year?  Or, ok, in the last three months?  Or even last week?  What about differentiating between fruits; strawberries vs cantaloupe vs blueberries?  And how many??

We ask because a new study published in the British Medical Journal ("Fruit consumption and risk of type 2 diabetes: results from three prospective longitudinal cohort studies," Muraki et al.), that's getting a lot of news play, reports that eating a serving of blueberries at least three times a week protects against type 2 diabetes (t2d).  Blueberries are best, but grapes and raisins are second best and apples and pears are third.  Cantaloupe, on the other hand, seems to be a risk factor, as does drinking your fruit as juice rather than eating it whole. 

The study looked at food questionnaire data from a ton of people; "66 105 women from the Nurses’ Health Study (1984-2008), 85 104 women from the Nurses’ Health Study II (1991-2009), and 36 173 men from the Health Professionals Follow-up Study (1986-2008)."  So the data must be robust--yes?

Food questionnaires are a well-established tool for eliciting dietary information, and they are used all the time.  But they are also notoriously unreliable, for reasons that are easy to understand; it's really hard, first, to think in terms of standardized portion sizes and, second, to remember how often you eat a food, particularly if it's seasonal.  And there's the subtle possibility of responder bias--knowing what the study is looking for.

An anecdote, for what it's worth.  My mother has been filling out food questionnaires as a subject in the Nurses' Health Study for decades.  I remember her reaction to the first one she was asked to complete because I was a graduate student in public health at the time, learning in epidemiology classes about state-of-the-art tools like diet surveys, so the fact that they might be seriously flawed was an eye opener.  Suffice it to say, she did a lot of guessing. It's not good science to extrapolate from a single case, I know, but the following, from the paper, suggests it might actually be valid in this instance.
The food frequency questionnaires were validated against diet records among 173 participants in the Nurses’ Health Study in 1980 and 127 participants in the Health Professionals Follow-up Study in 1986. Corrected correlation coefficients between food frequency questionnaire and diet record assessments of individual fruit consumption were 0.80 for apples, 0.79 for bananas, and 0.74 for oranges in women, and 0.67 for total whole fruits, 0.76 for fruit juice, 0.95 for bananas, 0.84 for grapefruit, 0.76 for oranges, 0.70 for apples and pears, 0.59 for raisins and grapes, and 0.38 for strawberries in men.
So portion size and frequency are hard to remember, and that's a problem.  But maybe there's something else influencing the results of this study.  We're betting it's a lot easier to remember when you consume something every day, like a glass of orange juice, or banana on your cereal, but harder to remember when you have something that is more likely to be only seasonally available, like plums, or apricots.  This could explain why there seems to be such variation in the accuracy with which people remember different fruits and juice.

But probably more significantly, we'd bet that people who are already at lower risk of t2d because they exercise, or watch their diets, are thinner and so forth, are the same people who include more fruits and vegetables in their diets.  So, blueberry consumption may be indicative of a low-risk lifestyle rather than nutritional components that protect against t2d.  And this is what was found.
In all three cohorts, total whole fruit consumption was positively correlated with age, physical activity, multivitamin use, total energy intake, fruit juice consumption, and the modified alternate health eating index score, and was inversely associated with body mass index and current smoking. Whole fruit consumption was associated with an increased probability of using post-menopausal hormones in the Nurses’ Health Study and with a reduced probability of using oral contraceptives in the Nurses’ Health Study II.
The investigators corrected for these correlations -- that is, essentially asked the question, "In the lower risk group in the study, are those who eat X fruit at even lower risk?".  These adjustments weakened the associations, which isn't surprising, and in fact adjusting for other things like gestational diabetes or cancer also attenuated the associations.  But they still found that blueberries, raisins, grapes, pears and apples were correlated with lower risk, and cantaloupe with higher.  Methodological questions aside, could blueberries really be protective against type 2 diabetes, but not cantaloupe?
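To see how a lifestyle factor alone can manufacture an apparent protective effect, and what stratifying on it does, here is a toy simulation. It is only a sketch under invented assumptions: blueberries have no effect at all in this made-up world, a single yes/no 'healthy lifestyle' variable stands in for all the correlates listed above, and every probability is hypothetical. It is not the adjustment method used in the paper.

```python
import random

def risk_ratio(records):
    """Diabetes risk among blueberry eaters divided by risk among non-eaters."""
    def risk(group):
        return sum(r["diabetes"] for r in group) / len(group)
    eaters = [r for r in records if r["blueberries"]]
    others = [r for r in records if not r["blueberries"]]
    return risk(eaters) / risk(others)

def simulate(n=100000, seed=1):
    """Generate people whose diabetes risk depends only on lifestyle."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        healthy = rng.random() < 0.5
        # Hypothetical: healthy-lifestyle people eat blueberries more often...
        blueberries = rng.random() < (0.6 if healthy else 0.2)
        # ...and have lower diabetes risk, regardless of blueberries.
        diabetes = rng.random() < (0.05 if healthy else 0.15)
        records.append(
            {"healthy": healthy, "blueberries": blueberries, "diabetes": diabetes}
        )
    return records

people = simulate()
crude = risk_ratio(people)
within_healthy = risk_ratio([r for r in people if r["healthy"]])
within_unhealthy = risk_ratio([r for r in people if not r["healthy"]])

print("crude risk ratio:", round(crude, 2))
print("within healthy:", round(within_healthy, 2))
print("within unhealthy:", round(within_unhealthy, 2))
```

The crude risk ratio comes out well below 1, as if blueberries were protective, while within each lifestyle group it sits near 1. In real data the confounders are many, correlated, and imperfectly measured, so adjustment can weaken an association without anyone being sure what remains.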

The idea is that fruits are rich in fiber, antioxidants and phytochemicals, all of which are presumed to be protective.  But, some are sugary, which might instead be a risk factor.  Results of studies of the role of specific fruits in risk of t2d have varied, with different fruits sometimes implicated in risk and sometimes in protection. "In eight previous prospective studies, the association between total fruit consumption and risk of type 2 diabetes was examined, and the results were mixed."

And, the study found that association of risk with glycemic load was inconsistent, and varied by cohort; "...a significant, inverse association was found in the Nurses’ Health Study, but not in the other two cohorts."  Indeed, this wasn't the only outcome that varied by cohort.    
In the Nurses’ Health Study II and Health Professionals Follow-up Study, banana consumption was associated with a lower risk of type 2 diabetes, whereas in the Nurses’ Health Study a non-significant positive association was found. The association for strawberry consumption was significantly positive in the Health Professionals Follow-up Study but was non-significant and inverse in the Nurses’ Health Study.
The authors conclude about their findings that most, but not all, "were quite consistent among three cohorts." We would suggest, however, that despite the large cohort sizes, methodological issues, particularly recall issues with the food questionnaires, are significant enough that you shouldn't make your decisions about which fruit to put on your cereal based on these results.

The issues are mainly examples of data (recall) error, and confounding:  so many factors go together in this society, bombarded by health-advice 'news' (and 'scientific' reports like the blueberry study), that it is very difficult to disentangle them.  Usually, then, each individual factor has a small effect on its own.

You can make your own judgement about whether these kinds of study are worth reading, publishing, or funding.  You might, for example, classify them with GWAS and other 'Big Data' studies, and say that when many different factors are at play, with individually minor effects, and people know what the latest advice is and adjust their behavior (often subtly) accordingly, that we're just asking to be confounded when we ask about the wisdom of exposure to individual factors.

Unless, of course, you're from New England and you're partial to blueberries.  In that case: go ahead and dose up!  Otherwise, read something more worthwhile than massive, inconclusive, weak-factor studies.

Thursday, September 5, 2013

Sudden silence from North Korea! The Terwilliger Debacle.

On Tuesday we posted about the arrival of tuba player, basketball aficionado, and sometime geneticist Joe Terwilliger in North Korea, to help shepherd the illustrious (and illustrated) Dennis Rodman around the country, so he (Dennis) won't get lost.  Joe is conversant, or better, in the North Korean dialect--at least as we thought--so this seemed like a reasonable way for a once-serious geneticist to spend his time.


But today the horrible news of Joe's trip is.....no news.  A total silence.  As quiet as a crowd before a foul shot....and what a foul shot this has turned out to be!  Where's all the brave talk and hoopla that usually accompanies any attention paid to the Perfect Republic?

Well, as long as you promise not to tell anyone, we will tell you.  We have learned (quite confidentially) that this was not another example of dictatorial failure to run the spin machine properly.  No!  It was a linguistic disaster, and you'll quickly know who was responsible.

You may have seen photos the other day of 500,000 young North Korean girls, all spontaneously assembled in identical pink dresses, on a Pyongyang parade square.  The news wires said this was to celebrate some latest government success or other.  We wondered what that may have been, but we've learned that, instead, it was the result of the Terwilliger Debacle.

These nubile young servants of the State had, in fact, been assembled to have a Celebrate Genetics Day march, then to file one at a time by a Q-tip station to have a cheek (i.e., oral!) swab to donate their life-substance to the nation.  This was a Terwilliger-arranged MegaMegaHyper GWAS study, to humble the rest of the world even more than a nuclear test would have, by generating more DNA data and more trivial but significant 'findings' than the world has ever known. This in itself is very strange: Normally, one would not expect Joe Terwilliger to encourage anyone to undertake another GWAS study.  Normally, he advises them to do some actual science instead.

But then the true nature of the disaster became evident, and we can now explain the sudden silence! True, Joe is reasonably fluent in various languages, and he's plenty proud of it. But what the secret wires have informed us is that Joe's linguistic overconfidence has backfired big-time.  In teaching the Kim crew about genomics, he told them they would have to sequence 6.2 billion units for each person they included in the study, and this material would have to be assembled in their new Shame-the-West lab.

But in his haste to be included in this trip, and tag along with the illustrious (and illustrated) Mr Rodman, Joe got his Korean phonemes mixed up when orally querying his Rosetta Stone Korean language program for the word he needed, and what he actually translated to the North Koreans was beans, not genes!

Well, the North Koreans don't know as much genetics as they do nuclear physics, and whether or not Joe's instructions seemed strange to them, Kim's loyal citizens obediently and with amazing speed mobilized the required trainloads of bushel upon bushel, to be secretly delivered, from every corner of the country and in the dark of night, to the Lab, in order for the Big Data result to be sprung on an unsuspecting world, before the illustrious (and illustrated) Mr Rodman would end his 'good-will only' tour.

Instead, the catastrophe has occurred: the beans entirely clogged up the sequencing machines--and there is no Joe Terwilliger to be seen!  (Fortunately, the North Koreans don't yet realize what's happened, and are searching the sequencer manual to see how they failed to set it up right).  We don't know how you say HORSE in Korean, but we did, however, hear one report that the sound of a bouncing basketball could be heard in a gym near the lab.

Joe's obvious ulterior motive was that now the Koreans will have to invite him back, to teach a class on how to operate sequencing equipment.

Wednesday, September 4, 2013

Collecting data is easy; making sense of it is not

Big Data
More Big Data is on the horizon -- a new estimate of the number of viruses that known mammal species harbor is generating excitement over the plausibility of sequencing them all.  A paper in mBio ("A Strategy To Estimate Unknown Viral Diversity in Mammals," Anthony et al.) reports that sequencing the Indian Flying Fox, a bat, the first species whose 'virodiversity' was characterized, found 58 viruses, 50 of them previously unknown.  Extrapolating from this, researchers estimated that with 5,500 known mammal species, mammals harbor at least 320,000 viruses.
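The headline number is easy to reproduce with back-of-the-envelope arithmetic. The sketch below simply multiplies the two figures given above; it assumes that every mammal species carries about as many viruses as the flying fox and that none are shared between species, which is our simplification and may well differ from the estimation procedure in the paper itself.

```python
# Naive extrapolation from one bat species to all known mammal species.
# Assumption (ours): every species harbors the same number of viruses
# as the Indian Flying Fox, with no viruses shared between species.
viruses_per_species = 58
known_mammal_species = 5500

estimate = viruses_per_species * known_mammal_species
print(estimate)             # 319000
print(round(estimate, -4))  # 320000, i.e. "at least 320,000" after rounding
```

A single species is a thin basis for a figure this large, which is part of why the estimate should be read as an order of magnitude rather than a count.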

The majority of viruses that infect humans come to us from wildlife, so if we sequence them all, which now seems feasible, or at least imaginable in principle, we'd have at least some handle on the kinds of epidemics that might be in our future.  And this for a mere $6.3 billion, trivial compared with the cost of a pandemic, assuming that's what would be in store otherwise. And it would be even less costly if only 85% of all viruses were characterized.
If annualized over a 10-year period, the discovery of 85% of mammalian viral diversity would be just $140 million/year, which is both a one-off cost and a fraction of the cost of globally coordinated pandemic control programs such as the “One World, One Health” program, estimated at $1.9 to 3.4 billion per year, recurring.  While these programs will not themselves prevent the emergence of new zoonotic viruses, they will further contribute to pandemic preparedness by enhancing our understanding of viral ecology and the mechanisms of disease emergence and by providing sequences and other insights that reduce the morbidity, mortality, and economic impact of emerging infectious diseases by expediting recognition and intervention. 
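The cost figures quoted here can be put side by side with a little arithmetic. This is just a sketch using the numbers already given ($6.3 billion for everything, $140 million per year for ten years for 85%, and $1.9 to 3.4 billion per year for the comparison program); the variable names are ours and all amounts are in millions of dollars.

```python
# All amounts in millions of US dollars, taken from the figures quoted above.
full_cost = 6300           # sequencing essentially all mammalian viruses
annual_cost_85 = 140       # per year, for 85% of viral diversity
years = 10
one_health_low = 1900      # "One World, One Health", per year, low estimate
one_health_high = 3400     # per year, high estimate

total_cost_85 = annual_cost_85 * years           # 1400, i.e. $1.4 billion
share_of_full = total_cost_85 / full_cost        # about 0.22
ratio_vs_high = annual_cost_85 / one_health_high # about 0.04
ratio_vs_low = annual_cost_85 / one_health_low   # about 0.07

print(total_cost_85)
print(round(share_of_full, 2))
print(round(ratio_vs_high, 2), round(ratio_vs_low, 2))
```

So stopping at 85% would cost roughly a fifth of the full bill, and each year's spending would be on the order of 4 to 7 percent of the annual cost of the comparison program. The last 15% of viruses accounts for most of the money.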
This all sounds extremely valuable to global public health.  But, will it be?  We have known about the mode of transmission of influenza for decades, and have developed vaccines which we're all encouraged to get each fall, but we are unable to eliminate the virus in its animal reservoirs.  And it mutates every year, and it's yet another virus that medicine is unable to treat, and it still kills tens of thousands every winter.

MERS coronavirus; Wikimedia Commons
We know everything we need to know about HIV and it's still killing people.  We worry about avian flu, and mutations that will make it easily transmissible between humans, and are unable to prevent that, even though the virus has been well characterized.  Even when the reservoir for MERS is discovered, we'll not be able to treat it, nor prevent its transmission to humans, nor prevent its mutation.  Sequencing a virus is easy; it's what gets done after that that's hard.  And having 320,000 viral sequences won't change that.

Another Framingham -- alas
At least one commenter on this project likens its potential to that of the Framingham Project, a decades-long study of heart disease which many believe has been responsible for a sharp decline in cardiovascular disease in the West.  But in fact incidence of CVD began to fall before any Framingham-related interventions became widespread, and indeed the underlying reason is still not understood.  And some of the fundamental Framingham findings have not been borne out by subsequent research. So, if the project is similar to Framingham it may well be in collecting huge amounts of data that may not make much difference to public health.

The project has the potential to be similar to Framingham in another major way, and that is in tying up money for decades.  The investigators are already talking about ten years of funding.  And ten years from now, the project will be too big to defund.

And of course this project will have a lot in common with human genetics -- let's just collect the data, and then it will tell us what it means.  Science has gotten very very good at collecting loads of data.  But that doesn't always correlate with asking the right questions, or understanding what it means.

There seems to be a clear pattern that investigators get together to hatch ever larger, longer and costlier projects, argued on fanciful promises that often boil down to fear tactics (horrible epidemic if you don't fund us!!!).  That is the 'Big Data' worldview.  Time has already proven that these are very costly, difficult to terminate after they reach diminishing returns, compete with focused individual investigator research, and are a big grab on shrinking or limited funds.

So the very first question one should ask, after filtering out the excess verbiage, is would this, at this time, given other problems being faced, be worth investing in?  Or should the investigators be told to focus on more immediate or real problems, or areas of virology and epidemiology with more likely short-term relevance?  After all, some years, say ten, from now, technology costs may have plummeted and this could be more affordable, given a recovered economy.  But affordability is not necessarily the same as guaranteed progress.