Tuesday, November 17, 2009

Wallace, Bates and the Ten Thousand Beetles Project

In the middle 1800s, numerous European adventurers sallied forth into the wild and largely unknown regions of the world, to see what they could find. There were many such collectors, but two of the most famous were Henry Bates (in whose honor Batesian mimicry for selective advantage of protective coloration was named) and Alfred Wallace (after whom the theory of evolution was not named but should have been at least co-named). They slogged through the Amazon basin, and (in Wallace's case) the islands of what is now Indonesia.

They were 'naturalists', collecting and studying the diversity of exotic (to Europeans) animal species of these tropical wonderlands. Their purpose was mixed but an important aspect was that they supported themselves and their expeditions by sending home trophy species to sell to the wealthy to show off in their parlors and impress their friends. We often think of them now as 'beetle collectors'.

This is rather like the 10,000 vertebrate genome project being proposed by a large consortium of modern beetle collectors (Genome 10K, they're calling it).  The idea is to sequence the genomes of 10,000 vertebrate species.  The cost of sequencing a genome is dropping and is promised soon to be around $1000.  Thus, this project can be done for a mere $10 billion!  Compared to landing on the moon, making another nuclear submarine, or who knows how many other mega-projects, it seems cheap, almost a bargain, and well within the routine claims that sciences are making of the public honey-pot.

Not so fast! For starters, this is only the foot in the door. Once these sequences are done, there will inevitably follow an open-ended demand for persons to stuff, curate, and protect the specimens, for gear and programmers to house them in the Museum of Natural Genomes.

So is this proposal one that should be given priority? It is certainly a legitimate scientific objective. But such proposals are becoming the routine thinking of Big Science, with very little expressed concern for the actual likely impact of the research, much less what else could be done with the same resources. We can't know for sure, and much will of course be learned, even including some surprising or even important facts here and there. But is it likely that such data will truly transform any thinking we already can do, or could do with only, say, 100 more sequences, targeted to specific problems? After all, we already know a lot about cows and dogs and salamanders. What is the sequence (of single individuals, deprived of phenotype data) going to tell us? After all, beetles are beetles.

It may seem that we're once again just being cranks and moaning about the state of the scientific world. But think about this: Just the entry fee of $10 billion alone could fund 10,000 million-dollar grants to do more focused, question-driven science. Or 100,000 $100,000 grants. That's a lot of real questions, and a lot of investigators (with careers of their own to worry about) who would be funded but won't be if the neo-beetles are collected. And the proliferation of opportunities would continue because there would be no demand for unending Museum maintenance expenses.

There is, of course, another very big difference between today and the glory days of Wallace and Bates canoeing upriver with their butterfly nets. In those days, vain wealthy people were paying the tab. Now, it's still feathering the scientists' nests, but it's with our money as taxpayers, and given all the lobbying and maneuvering, we don't really have a say in the science that gets done. This is a rather big change.

Maybe the proposers of the 10K vertebrate genome project feel like they're small potatoes compared to the already presumptuous 100,000 human genomes project. And you know very well that insect and plant people will see the precedent, and we'll have the 10,000 Weed and Crop project, the 10,000,000 insect genome project (barely scratching the surface), and sea urchins will be next.

After all, if what you have to do to be a red-blooded American is simply propose more and more and more (and bigger), then where's the limit? (It's not the sky, as NASA's proposals clearly show.)

So, we suggest a stunning precedent that would recognize some societal responsibility. Instead of more-more-more, how about those proposing the 10K vertebrate genome project saying, in a civic-minded way, that this is clearly scientific back-burner material of no urgency, and should wait until we have the $99.99 genome and automatic annotation and data-maintenance systems that don't require endless additional funding to handle the additional data?

What? Wait a whole decade before we can see the South Assetia Leaping Mudwort's sequence? Yes. Because sequencing costs will surely come down. If by then we really still need to do our DNA-collecting, we'll be able to. Or maybe the novelty will have worn off, and we'll have other things on our minds.

But putting things in perspective, we should admit that our favorites are those big ones with the huge curved jaws and iridescent green carapace. Now they were really worth the effort!


Sam said...

$10 billion!?! $1000 x 10,000 = $10 billion??? Better check your zeros. Does your position change if the cost is cut by a factor of 1000 (assuming the predictions for sequencing costs are accurate)?

Anne Buchanan said...

Actually, what the Nature blurb says about the cost of the project is this:

"The group is looking for funding for the main phase of the project, which could cost anywhere from US$10 million to $100 million, depending on the costs to process and sequence each sample. The team anticipates that sequencing costs will drop below $10,000 per genome within a few years, making it feasible to sequence the entire genomes of 10,000 vertebrates within this budget."

So, ahem, it does look like our math is off, but probably by more like a factor of 100--the cost of a single genome for this project isn't likely to be down to $1000. $100 million is still a lot of money that could be going elsewhere.

Thanks for catching that, Sam.
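
The arithmetic behind this correction can be checked with a short sketch. The Python below is only illustrative: the function and variable names are hypothetical, and the figures are the ones quoted in the post and in the comments above (10,000 species, $1000 or $10,000 per genome, and the $10 billion originally stated).

```python
# Illustrative check of the cost figures discussed in the post and comments.
# All names here are made up for this sketch; the numbers come from the text.

N_GENOMES = 10_000               # species in the proposed project
POST_CLAIM = 10_000_000_000      # the $10 billion stated in the post


def project_cost(cost_per_genome, n_genomes=N_GENOMES):
    """Total sequencing cost in dollars, ignoring curation and annotation."""
    return cost_per_genome * n_genomes


# At the hoped-for $1000 genome (the figure used in the post).
cost_at_1000 = project_cost(1_000)        # 10,000,000 dollars
# At the $10,000 genome anticipated in the Nature blurb.
cost_at_10000 = project_cost(10_000)      # 100,000,000 dollars

# How far off the post's $10 billion was under each assumption.
overstatement_at_1000 = POST_CLAIM // cost_at_1000     # factor of 1000
overstatement_at_10000 = POST_CLAIM // cost_at_10000   # factor of 100

# What the upper budget estimate would buy as smaller grants instead.
million_dollar_grants = cost_at_10000 // 1_000_000     # 100 grants
hundred_k_grants = cost_at_10000 // 100_000            # 1000 grants

print(f"$1000/genome:   ${cost_at_1000:,} (post off by {overstatement_at_1000}x)")
print(f"$10,000/genome: ${cost_at_10000:,} (post off by {overstatement_at_10000}x)")
print(f"Alternative: {million_dollar_grants} $1M grants or {hundred_k_grants} $100K grants")
```

On these assumptions, the sequencing itself comes to $10 million to $100 million, which matches the range in the Nature blurb, and the alternative use of the larger figure would be about 100 million-dollar grants rather than 10,000.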

Ken Weiss said...

Yes, this is a freshman mistake! So the degree of excess is less (or much less) than we said. I think that doesn't really change the argument, that this is a low-priority item in tight times when there are higher priority things that might be done, and especially to be able to spread the wealth around.

And also, at present, the curation and genome annotation costs are higher than they will eventually become.

Still, we should have checked our numbers more carefully. Rather than edit the post, covering our tracks, we'll leave it out for all to see. (That spares us the danger that our corrected numbers would also be wrong, the way Darwin's calculation in the Origin of Species, about elephant proliferation, was changed but, as I recall, never came out right.)

Sam said...

Complete Genomics in Mountain View, CA (home of Google) says they sequenced three human genomes for $4400/ea in a recent Science paper. George Church was involved so I'm guessing that the 100,000 genome project you mentioned is going to involve some sort of Complete Genomics/Google/Church unholy trinity.

Presumably, the 10k project will be able to attain similar costs, and they will continue to drop as the project progresses.

As much as a question of cost, though, in all these projects is how the sampling is designed in order to maximize information quantity and prevent the diminishing returns that must accompany sequencing the 10,000th vertebrate.

My understanding is that the 1000 genome project is essentially replicating the HapMap sampling. So, once again we will have a huge data set that purports to tell us about human variation from three (or four) populations.

In the 10k project, the decision will come down to whether it is more useful to sample certain clades (those that are cute/cuddly or we have an economic reason to favor) extensively, or to get as broad a spectrum as possible. 10,000 is a lot of species; maybe a reasonable balance can be achieved (maybe 5,000 would be enough).

These kinds of scientific considerations would go a long way toward helping to judge if the overall costs are justified.

Ken Weiss said...

Well, 100 is a lot of species as far as that goes. And the other 4,900 (or 9,900) can wait without important loss until the cost is down to $99.99, and then anything anybody wants will be worthwhile by most measures.

The 1,000 genome sequence was explicitly done to keep the funding stream going. I'm not 'in' on what is going to be included, but as I understand it a good fraction will be the same HapMap samples. The advertising of the project claimed it would help identify rare disease variants, which is not likely at all, no matter what samples are used.

Without studying specific variation, a phenotype-free genome sequence isn't very informative given all we already know about vertebrates, their relationships, and evolution.

All sorts of reasons will be raised in defense of do-it-now, and I can think of some myself. But my own view is that refining vertebrate phylogeny and that sort of thing is fine, but very low priority.

Not too long from now it will be doable for not much more than the price of a butterfly net.

My view is based on accountability for results that mean something for more than a rarefied few academics, but also to spread available research funding around, yes, in smaller packets, but to many more people who might have crisper ideas.

Of course, you may not agree!

Sam said...

I do agree, but I think a major source of the problem is the system that drives researchers to propose bigger and bigger projects. 10,000, as far as I can tell, is a completely arbitrary number chosen because it sounds impressive, and had to be bigger than the 1000 genomes project.

Expanding the number of vertebrate genomes could have profound effects on our ability to identify functionally important non-coding regions of the genome such as regulatory elements, because comparisons of organisms of varying taxonomic distances are vital. So if the project had such a targeted goal it might be justifiable. This would require far fewer than 10,000 species, and would be more than simple beetle collecting. But it also requires a more nuanced understanding of the science and is therefore more difficult to sell to the public.

How can we rein in investigators' claims in a funding environment that seems to encourage and reward this kind of behavior? On the flip side, would the human genome project ever have gotten off the ground if we did? To a certain extent, grandiose claims do drive technological advances that allow even grandioser projects to be envisioned.

Ken Weiss said...

I basically agree. I am probably less convinced that sequence comparisons will do what you say (about regulation) in ways that can't be done by other approaches or with, as you say, fewer (far fewer) species. Regulatory regions move around and highly conserved ones may mainly already be detectable with sparser genome trees, where the background correlation is less than among closer species.

Anyway, the cultural aspect of science that leads to ever-grander claims as a strategy also becomes a way of thinking, and that can distort the questions that are asked. Because it is part of our culture, it isn't going to be changed very much, if at all, by 'rational' arguments because that's not what it's about.

My other argument is about alternative uses of funds, research or otherwise, that don't get done because grandiose projects co-opt them. I prefer wider distribution of funding, to more people (and more junior people). And I think there's no reason not to prioritize, and delay things until their cost comes down. At $10 per sequence, nobody would complain about large-sample sequencing. That day will come, in your career if not mine. And better, more focused, large-scale designs can be more definitive, more quickly anyway.

I do think uncritical schemes like this, driven by no particular hypothesis, are just like beetle collecting.

Some good will always come of it (including career support for scientists), but I do raise the question of that, relative to the interests of the public who's paying the tab.

These, of course, are subjective judgments. Indeed, my cost-estimating gaffe shows that like everyone else, I'm vulnerable to seeing things through my own lenses and biases.

Steve Bates said...

Ken, I came here to offer the same correction as Sam in his first comment, but more just to say hello after more than three decades. As you know, I have no qualifications to comment on the content of your blog, but I surely am enjoying reading it... the writing is superb and I presume the science is the same!

Ken Weiss said...

Yes, Steve, it was a dumb error and showed both that we didn't pay close enough attention, and that we are trapped, like everyone, in our own prejudices (less likely to catch mistakes that tend to support our own point of view).

Our basic point is not changed, however: proliferation of more, largely for its own sake, is the way things are done these days, and investigators often know that's just what they're doing. That doesn't mean there is no value to the science, but it is not the best way to set priorities.

But, to your main point, it's great to hear from you after so many years! I've heard a bit about you from Anne via the recent reunion with Nancy and good ol' Bob Schwartz, and it makes us nostalgic for our Houston days!

Let's keep in touch. Did you know our daughter is a musician, now in Barcelona training in baroque violin performance?