The other day I was browsing through recently published papers in PLoS Biology and came across this one.
Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, et al. (2011) The Genomic Standards Consortium. PLoS Biol 9(6): e1001088. doi:10.1371/journal.pbio.1001088.
I'm interested in this sort of thing since back in the olden days (1993) I spent a bit of time at GenBank exploring annotation issues with a view to correcting the growing number of errors that were being propagated in online databases.
Abstract
A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.
It's an insoluble problem and I doubt very much that a new organization is going to help.
But that's not what I want to talk about. Near the end of the article in PLoS Biology you find this paragraph.
The Internet has resulted in a Cambrian explosion of productivity and data sharing through the adoption of a huge stack of agreed-upon protocols (standards) that allow many devices and programs to communicate to the transformative benefit of the everyday user [26]. Enabling access to user-generated content is key to harnessing the resources of a distributed community: Flickr has over 5 billion photographs uploaded, and Wikipedia has over 3.5 million English articles as of this writing. Standards for organizing sequence data will be similarly needed as sequencing instruments themselves, especially as these instruments are more and more commoditized and owned by individuals rather than institutions.
I'm sad to find this sort of content-free language creeping into scientific journals. We've been spared up to now but it looks like the 23 scientists listed as authors feel comfortable with this new style of writing.
I'll agree that "Cambrian" (terrible hackneyed metaphor) and "transformative" (shudder: that just brings Alan Sokal to mind) should never have made it into that paragraph, but if you remove those two words, all of the remaining ones have content. It's not riveting stuff, but they are saying something concrete.
This is a fuzzy one. The article is classed as a "community page" and not a scientific paper.
From the PLoS article descriptions:
The Community Page is a forum for organizations and societies to highlight their efforts to enhance the dissemination and value of scientific knowledge.
So it is not a journal article, but part of their "magazine section". But it does get a full citation. Not sure what to think of that class of articles.
The Cambrian explosion line makes me cringe.
This is not a research article but is published under the header of Community Pages. Different writing styles are appropriate for different types of writing. Does it belong in a scientific journal? Sure.
Well the language does not bother me, but the idea of yet another group inventing more "standards" does bother me. In my opinion these groups consist of people who want me to do work (complying with these standards can be VERY difficult) for them, for free.
ReplyDeleteIf they want to use the data that I generate, I would think that they should do a little work too. I spent the better part of a month hammering my square data into the round hole that was the required format. Since then, nobody has downloaded that data. Time not well spent.
Whether or not the flowery, non sequitur language of the article is appropriate, 'omics' standards are unfortunately sorely needed. Datasets have become unreasonably large and extremely tedious to convert between the multitude of competing, nearly identical formats. We have different formats for raw short reads, mapped reads, annotation files, etc. Some of these are more compact, or store more pieces of valuable information. As it stands, downloading any 'seq' dataset from GenBank requires you to assume that the depositors handled the data in a way you're comfortable with (it's not raw data). As the NCBI short-read archive is having problems maintaining its funding, a serious discussion about these issues is required.
The article is labeled as community, but if that's not acceptable, then we need some appropriate (i.e., not on blogs) location to discuss the possibility of such standards, or else you're going to a) require serious computational skills and b) waste a lot of time whenever you're involved in this type of work.
The language is terrible but it's not a research paper. Standards are good! Better to have 60% in a good, universally available standard than 100% in a mishmash of everyone's own idea of what is good. The group developing the standard needs to be small and influential with journal editors. GenBank is terribly outdated. Usability of most other databases is limited at best. PDB is by far the best because it has the oldest and most adhered-to standards.
Anonymous,
If they want to use the data that I generate, I would think that they should do a little work too. I spent the better part of a month hammering my square data into the round hole that was the required format. Since then, nobody has downloaded that data. Time not well spent.
I think you would have to be more specific. I, for one, prefer everybody to have an easy time downloading and using my data. As things are, when I download something, even from NCBI, I can't assume that I can handle the files in the very same way. Annotations don't have the same quality, et cetera. I used to hear some people complain (and it was almost true for me as well) about spending 80% of their time just dealing with unusual data formats and cleaning up before they could do any analyses. This will be solved once we are fully into the hyper-sequencing era (almost there), when rejecting data that does not conform to one standard or another will not affect the analyses.
Anonymous,
Also, I think that if your data was square and the standards round, then you chose the wrong standard. The idea of standards is not to force squares into round holes, but to have square holes for square data, and round holes for round data. If there is no hole of the right shape, complain and suggest a better one.
Anyway, if you don't want people to easily use your data that's fine. You will be one giant whose shoulders will not be available for others to stand and see farther.
Anyway, if you don't want people to easily use your data that's fine.
No, it's not fine. There is no modern research program, not even one, that exists without taxpayers footing a significant portion of the bill. We taxpayers want our money to be used efficiently, and that includes disseminating results widely.
It reminds me of so many articles that have the phrase "toward a new..." In other words, they want you to go a particular direction even if you don't want to go there or think it is a valid line of reasoning.
Dr. Moran, you seem to enjoy taking down the arguments in the purportedly "serious" books and articles of the IDiots and accommodationists. And while I enjoy reading your posts, Dembski et al. seem like low-hanging fruit.
Are you familiar with Conor Cunningham's "Darwin's Pious Idea: Why the Ultra-Darwinists and Creationists Both Get It Wrong"? It came out in December of last year. I haven't read it, but I've perused several reviews and it seems right up your alley. It has been praised by some prominent philosophers, churchmen, accommodationists, academics, and others.
He is strongly in favor of scientific evolution, and he is a Christian. He does, however, criticize both the religious critics of evolution and the atheistic neo-Darwinian critics of religion for relying on a similar Paley-esque (mis)understanding of religion.
Would love to hear your breakdown of a more thoughtful attempt at accommodationism, beyond the bullshit that keeps coming from Seattle.
DK,
No, it's not fine. There is no, not even one, modern research program that exists without taxpayers footing significant portion of the bill. We taxpayers want our money to be used efficiently - and that includes disseminating results widely.
My apologies. You are right.
anonymous says,
Are you familiar with Conor Cunningham's "Darwin's Pious Idea: Why the Ultra-Darwinists and Creationists Both Get It Wrong".
I've read the book. Before I obtained a copy I commented on Cunningham's serious misunderstanding of modern evolutionary theory as seen in a video: A Defense of the "Theistic Evolution" Version of Creationism.
That's the problem with his book. He attacks many of the same abuses I attack. The difference is that I recognize a better version of evolutionary theory that deserves to be popularized. Cunningham seems to think that all biologists believe in the nonsense promoted by the most extreme ultra-Darwinians and evolutionary psychologists. He hasn't bothered to read an introductory textbook on evolution.
Cunningham has a particular hatred for Richard Dawkins and The Selfish Gene. I do not agree with his statements about Dawkins.
Cunningham is a theologian and philosopher at the University of Nottingham in England. He writes extensively about biology.
Ironically, this same group of philosophers complain bitterly whenever a scientist ventures into the realm of theology or philosophy.
Isn't that strange?
I'm interested in knowing where they got their information for "Wikipedia has over 3.5 million English articles as of this writing"
Was it Wikipedia? Does that mean they cited Wikipedia in a scientific journal?
Is that allowed? I feel like since Grade 9 I've always been told by teachers and lecturers that citing Wikipedia would mean an automatic (insert % here) deduction from your paper.
Perhaps this article does belong in a scientific journal, but whatever the writers got paid I'd like a (insert $ here) deduction.