Sandwalk: BioGPS

Saturday, January 17, 2009

BioGPS

BioGPS is billed as a "Biology Gene Portal System." It's another database. You can read the review on genomeweb but you will have to register [GNF Team Rolls Out BioGPS Gene Portal for Users and Contributors].

The brains behind BioGPS is Andrew Su at the Genomics Institute of the Novartis Research Foundation (GNF) in San Diego (USA). According to the genomeweb article ...

As scientists move forward in analyzing experimental results, they generally consult up to a dozen "standard web sites" Su said, such as Entrez Gene, Ensembl, UniProt, or the Mouse Genome Informatics site. Each site delivers "partially overlapping gene annotation," so users must visit each, enter their search, learn the interface, and learn how to find each of the genes of interest on that site, he said. "Often that is a quite daunting process."

The idea behind BioGPS, Su said, is to avoid that process as well as reveal to researchers smaller and less-known gene portals that scientists might have missed.

Call me skeptical. The author of the article, Vivian Marx, contacted me and asked me to check out BioGPS. I have a long-standing interest in biological databases dating back to an early attempt to improve and update GenBank by adding annotation. That attempt was a failure—for very sound reasons [Errors in Sequence Databases].

I looked at my favorite genes on BioGPS. Here's the link to their homepage: BioGPS. The first thing you notice is that the database is restricted to rat, mouse, and human genes. The second thing you notice is that there's no value added. The data appears to be copied from other databases. This includes all of the errors, omissions, and misinterpretations found at each site. The emphasis is on expression data; that's what overwhelms the visible record of each gene.

Here's an example. This is the human HSPA1L gene. It happens to be a member of the HSP70 gene family. HSP70 proteins are the major chaperones of the cell. The HSPA1L version is specifically expressed in testes.

The expression data is correct but none of the databases mention that this gene is a developmentally regulated member of the HSP70 gene family even though that information has been in the literature for almost twenty years. You don't learn anything from visiting BioGPS that you wouldn't learn from visiting most other databases and, more importantly, you don't learn the information that might be most important to your research because it isn't in any of the databases. Anyone looking at this record would be puzzled by the lack of connection between the correct expression profile and all of the other information.

It gets worse. If you check out the rat HSPA1L gene you won't even learn that it is developmentally regulated because the expression profile doesn't include testes. The links to this gene suggest that it responds to stress, but it doesn't.

This is just one example of the problems with biological databases. Collecting together links from a variety of databases doesn't help. It just ensures that the errors from each database will be combined, creating maximum confusion.

I'm quoted correctly in the article ...

Larry Moran, a biochemist at the University of Toronto, told BioInform by e-mail that he had looked at a few of his "favorite genes" in the portal. "I don't think it's a very useful database," he said, since it is a summary of information gleaned from other databases with "no attempt at annotation."

In addition, he said, "much of the information is wrong or misleading," such as some of the expression profiles, which "seem to be incorrect; probably because the data is for another gene and not the one in the database record."

Users "who would rely on that sort of expression data would be making a very serious mistake," he said.

Reacting to these comments, Su said, "I think it is a good thing, in terms of making those errors more widely seen. The more eyes that see it, the more likely that that error will be fixed."

Being able to detect errors, however, has to be connected to the ability to fix it, he said. "This is the wiki principle, everybody can edit it, everybody can fix it, everybody has the responsibility and the power to make sure it's correct."

In an ideal world, researchers will fix errors in the databases and a Wiki-like system seems like a good idea. The experiment is already underway [A Gene Wiki]. But, as it turns out, this approach is incredibly naive as I discovered from attempts to fix GenBank a few decades ago. Nobody's going to do it. It's way too much work and there's no motivation to share information on public databases.

I received an email message from one of the authors of the expression data. As you might expect, the expression data profiles that are so prominently featured in the BioGPS database records are from the team at The Genomics Institute of the Novartis Research Foundation (e.g. Su et al., 2004). Much of it may be correct (it certainly succeeded with the HSPA1L gene), but I think it's wrong for HSPA1A.

My correspondent pointed out that his expression data has been widely used by hundreds of researchers and the papers have tons of citations.¹ He described several studies that have made important discoveries based on the expression profiles that have been published. I don't doubt that this is true. That's not the point. The point is whether taking the expression data and adding links from other sources makes BioGPS a valuable resource.

Not as far as I can see.

1. The idea that a paper must be correct just because it is widely quoted is something that troubles me greatly. It seems to be part of the new way of doing science.

Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., Cooke, M.P., Walker, J.R. and Hogenesch, J.B. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. (USA) 101:6062-6067. [PubMed]

10 comments :

G8TR said...: Hi Larry -- why don't we let your readers decide whether you summarized my email accurately.

Larry,

I read your comments in Genome Web with regard to Andy Su's BioGPS project. I know how easy it is to be misquoted, and suspect that may be at play here.

However, as the senior author of the two existing gene atlas papers I thought I would respond to the two comments they ran with.

First, "much of the information is wrong or misleading". To be precise, none of the information is wrong -- all of the probe sets are annotated and we provide this information in a convenient format. Some of the probe sets may not be accurate reporters of expression, not sensitive enough, cross hybridize with other genes, etc. and we address these issues in the paper and provide recommendations for interpretation (Su et al., PNAS, 2002; Su et al., PNAS, 2004). More importantly, these resources are in wide and productive use in the research community (as evidenced by their citation and more than 10,000,000! instances they have been utilized -- you read the number correctly). To my delight, most of the use is by everyday biologists like myself -- this is what we intended. Some of the follow up work has been phenomenal, for example, Vamsi Mootha and his colleagues at the Broad have cloned not one but three different human disease loci using this resource! In short, just because you did not find the resource useful does not mean it is uniformly so -- the body of evidence in fact strongly argues against it.

Second, your comment that you "don't think it's very useful because it's a summary of existing data" misses the point entirely. BioGPS is designed to seamlessly and extensibly integrate existing data from several to many sources to allow people to more quickly accomplish their work. For example, you can collate expression information from several databases including GNF's, the Allen Brain project, Gensat, etc., and your own datasets without doing any of the heavy lifting yourself. If you want to work on the gene in the lab, another view allows you to get reagent information from public and commercial sources -- compare prices of antibodies if you want, easily w/o having to go to each database and look things up.

I realize the paper isn't yet written, so you can't benefit from explanations of the intent and features. However, I respectfully suggest injecting a little caution in publicly commenting on projects that are outside of your area of expertise.

Finally, I am a fan of your science blog -- with regards,

John

--
John Hogenesch, Ph.D.
Associate Professor, Pharmacology
Associate Director, Penn Genome Frontiers Institute
Institute for Translational Medicine and Therapeutics
University of Pennsylvania School of Medicine
810 BRBII/III
421 Curie Blvd
Philadelphia 19104-6160
phone 484-842-4232
hogenesc@mail.med.upenn.edu
http://bioinf.itmat.upenn.edu/hogeneschlab/; Saturday, January 17, 2009 12:21:00 PM
Unknown said...: This is more generally related to your idea of a gene wiki. I think one of the reasons there is little interest in doing that is there is no recognition and a possible loss of control when editing a wiki. I spend some time updating Wikipedia articles on some genes/processes in which I am interested but it doesn't go on my CV and it's not considered an authoritative source. A lot of people would consider it a waste of time until some sort of acknowledgement is given to these efforts. That's why it's easier to just mash together a dozen databases and claim that it's comprehensive (all the while being able to blame the source databases for any errors); Saturday, January 17, 2009 12:57:00 PM
G8TR said...: The main point of BioGPS is that it's extensible and customizable. The goal is not creating new data -- rather providing easier access to existing resources -- and hooks for other developers or data providers to not have to recode the underpinnings.

Use case: let's say you want to buy an antibody against YFG (your favorite gene). BioGPS provides a way to enter YFG with links to commercially available antibodies without you having to know about the companies a priori or enter YFG at each one.

Finally, no one is claiming BioGPS is comprehensive -- no annotation effort can ever be comprehensive. We'll never know all the things there are to know about our genes.; Saturday, January 17, 2009 1:32:00 PM
Larry Moran said...: G8TR says,

Hi Larry -- why don't we let your readers decide whether you summarized my email accurately.

Thanks for quoting the email message. I have a policy of not quoting directly from personal email.; Saturday, January 17, 2009 6:27:00 PM
G8TR said...: Your very welcome.; Saturday, January 17, 2009 8:00:00 PM
Anonymous said...: His very welcome?; Saturday, January 17, 2009 11:21:00 PM
G8TR said...: Speech recognition software for repetitive motion disorder. Thanks for pointing out the error, though, it was certainly germane to the discussion.; Sunday, January 18, 2009 5:35:00 AM
Andrew said...: BioGPS's target audience

"Larry's right, we're not attempting to do annotation, so BioGPS might not be useful to him. But he's not our target audience either..."; Monday, January 19, 2009 4:53:00 PM
Larry Moran said...: Here's what the BioGPS team says on BioGPS's target audience.

Larry has focused his entire scientific career on the study of HSP70 family genes. For people like Larry who only care about a handful of genes, they really don't have a great need for gene portals. They know their genes backward and forward, and they get their information directly by following the primary literature. Relative to that, every gene portal will be missing important information.

You miss the point. There are hundreds of people who are knowledgeable about a small subset of genes. Almost all of them agree that the existing databases are not very accurate with respect to their genes.

What's the logical conclusion?

I suppose you could conclude that the only thing wrong with biological databases is that they aren't very accurate for those genes that have been intensively studied but they are extremely useful for all the other genes.

Large scale experiments, such as expression studies, are very important and useful but the very nature of such work means that the researchers don't know very much about the genes they're working with.

The goal is to connect the survey results with the experts on individual genes to see if the survey results are accurate. You don't do this by simply linking to existing biological databases and hoping that everyone will assume the database entries are accurate, and so are the survey results.

Real science means getting down and dirty and exploring the details. Superficial isn't going to work and it could, in fact, be very harmful.; Tuesday, January 20, 2009 10:31:00 AM
Andrew said...: Reply to Larry's comment posted here.

"Does your blender also cook eggs for you in the morning? If not, is the blender useless?"; Tuesday, January 20, 2009 1:57:00 PM
