
Thursday, May 06, 2010

I Don't Have Time for This!

 
The banner headline on the front page of The Toronto Star says, "U of T cracks the code." You can read the newspaper article on their website: U of T team decodes secret messages of our genes. ("U of T" refers to the University of Toronto - our newspaper thinks we're the only "T" university in the entire world.)

The hyperbole is beyond disgusting.

The work comes from labs run by Brendan Frey and Ben Blencowe and it claims to have discovered the "splicing code" mediating alternative splicing (Barash et al., 2010). You'll have to read the paper yourself to see if the headlines are justified. It's clear that Nature thought it was important 'cause they hyped it on the front cover of this week's issue.

The frequency of alternative splicing is a genuine scientific controversy. We've known for 30 years that some genes are alternatively spliced to produce different protein products. The controversy is over what percentage of genes have genuine biologically relevant alternative splice variants and what percentage simply exhibit low levels of inappropriate splicing errors.

Personally, I think most of the predicted splice variants are impossible. The data must be detecting splicing errors [Two Examples of "Alternative Splicing"]. I'd be surprised if more than 5% of human genes are alternatively spliced in a biologically relevant manner.

Barash et al. (2010) disagree. They begin their paper with the common mantra of the true believers.
Transcripts from approximately 95% of multi-exon human genes are spliced in more than one way, and in most cases the resulting transcripts are variably expressed between different cell and tissue types. This process of alternative splicing shapes how genetic information controls numerous critical cellular processes, and it is estimated that 15% to 50% of human disease mutations affect splice site selection.
I don't object to scientists who hold points of view that are different than mine—even if they're wrong! What I object to is those scientists who promote their personal opinions in scientific papers without even acknowledging that there's a genuine scientific controversy. You have to look very carefully in this paper for any mention of the idea that a lot of alternative splicing could simply be due to mistakes in the splicing machinery. And if that's true, then the "splicing code" that they've "deciphered" is just a way of detecting when the machinery will make a mistake.

We've come to expect that science writers can be taken in by scientists who exaggerate the importance of their own work, so I'm not blaming the journalists at The Toronto Star and I'm not even blaming the person who wrote the University of Toronto press release [U of T researchers crack 'splicing code']. I'll even forgive the writers at Nature for failing to be skeptical [The code within the code] [Gene regulation: Breaking the second genetic code].

It's scientists who have to accept the blame for the way science is presented to the general public.
Frey compared his computer decoder to the German Enigma encryption device, which helped the Allies defeat the Nazis after it fell into their hands.

“Just like in the old cryptographic systems in World War II, you’d have the Enigma machine…which would take an instruction and encode it in a complicated set of symbols,” he said.

“Well, biology works the same way. It turns out to control genetic messaging it makes use of a complicated set of symbols that are hidden in DNA.”
Given the number of biological activities needed to grow and govern our bodies, scientists had believed humans must have 100,000 genes or more to direct those myriad functions.

But that genomic search of the 3 billion base pairs that make up the rungs of our twisting DNA ladders revealed a meagre 20,000 genes, about the same number as the lowly nematode worm boasts.

“The nematode has about 1,000 cells, and we have at least 1,000 different neuron (cells) in our brains alone,” said Benjamin Blencowe, a U of T biochemist and the study’s co-senior author.

To achieve this huge complexity, our genes must be monumental multi-taskers, with each one having the potential to do dozens or even hundreds of different things in different parts of the body.

And to be such adroit role switchers, each gene must have an immensely complex set of instructions – or a code – to tell them what to do in any of the different tissues they need to perform in.
I wish I had time to present a good review of the paper but I don't. Sorry.


Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J., and Frey, B.J. (2010) Deciphering the splicing code. Nature 465: 53–59. [doi:10.1038/nature09000] [Supplementary Information]

26 comments :

crf said...

It's like they're code-breaking the Enigma machine?

And you're disagreeing that there is even a code! THEN, OMG, THAT'S JUST WHAT HITLER WOULD SAY!

Georgi Marinov said...

That the paper is overhyped beyond any justifiable level is undeniable. It's also undeniable that it is completely unjustified to assume that just because you see something in the data, it is functional. However, I don't think the "5% functional alternative transcripts" number is correct either, as if you look at deep sequencing data, many more genes than that show expression of different dominant isoforms between cell types, which to me implies that the phenomenon is more widespread than that. Of course not all of it is due to alternative splicing, we have alternative TSS choice and polyadenylation sites, but still, there is more diversity than 5%. A lot, maybe most, of those junctions you can find in the data are probably due to errors in a mechanism that doesn't always have to be accurate for the organism to survive, but not all of them.

Larry Moran said...

Georgi Marinov says,

However, I don't think the "5% functional alternative transcripts" number is correct either, as if you look at deep sequencing data, many more genes than that show expression of different dominant isoforms between cell types, which to me implies that the phenomenon is more widespread than that. Of course not all of it is due to alternative splicing, we have alternative TSS choice and polyadenylation sites, but still, there is more diversity than 5%.

We agree that a lot of what passes for variation is probably accident.

What we need to know is how much is biologically relevant. I agree with you that quantification of various alternatively spliced transcripts is the key bit of data we need. I'd love to see a table showing how many genes have reproducible levels of different transcripts in different cell types where the levels are clearly stated.

What cutoff would convince you that a real biological phenomenon is in play? Would the minor mRNA have to be greater than 0.1% of the major one or would it have to be 1% or 10%?

The fact that there could be different splice variants in one tissue than in another is not proof that they are functional. Given that different tissues have different splice factors, we expect that different error levels will occur in different tissues. Some of the deep sequencing methods are perfectly capable of picking up less than one transcript per cell and distinguishing that from 1/10th that amount. It looks like a ten-fold difference but is it relevant?
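A minimal sketch of the per-cell arithmetic behind this point, assuming (an illustrative assumption, not something stated in the post) that transcript copies are Poisson-distributed among cells. The 0.5 and 0.05 copies-per-cell figures and the function name are hypothetical.

```python
import math


def fraction_of_cells_with_transcript(mean_copies_per_cell: float) -> float:
    """Fraction of cells holding at least one copy, assuming copies are
    Poisson-distributed across cells (an illustrative assumption)."""
    return 1.0 - math.exp(-mean_copies_per_cell)


# Hypothetical bulk averages for one isoform in two tissues, ten-fold apart.
tissue_a = 0.5   # copies per cell
tissue_b = 0.05  # copies per cell

fold_change = tissue_a / tissue_b
print(f"fold change: {fold_change:.1f}")
for name, mean in (("A", tissue_a), ("B", tissue_b)):
    share = fraction_of_cells_with_transcript(mean)
    print(f"tissue {name}: {share:.1%} of cells contain any copy")
```

Under this assumption the ten-fold difference in the bulk average means roughly 39% versus 5% of cells carrying even a single copy, which is the kind of difference that is easy to measure but hard to interpret biologically.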

If you know of any paper that provides information on the levels of various transcripts per cell for all 20,000 genes then I'll be happy to read it. Heck, if you can even point me to papers that show 1000 genes like this I'd change my position.

Waiting .....

While Georgi is searching the literature, the rest of you might enjoy looking at the predicted alternatively spliced RNA for the most highly conserved gene in biology [The Frequency of Alternative Splicing]. Does anyone seriously think this gene has 18 different mRNAs, some of which only encode a small piece of the 70 kDa protein?

This is what you need to defend if you're a proponent of the idea that most human genes show alternative splicing. Good luck.

Georgi Marinov said...

The literature on transcript quantification from deep sequencing is just appearing now, but people have been working on it for a while.

http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1621.html#/

Of course, there is no published paper that looks at all 20,000 genes, because this would involve doing RNA-Seq on a lot of tissues and cell types, which is certainly being done in various labs around the world, but it will take some time for the results to be published.

BTW, I was not referring to different minor isoforms found at 1% of the level of the major isoform, I was referring to major isoform switches between cell types

Anonymous said...

...the rest of you might enjoy looking at the predicted alternatively spliced RNA for the most highly conserved gene in biology

Actually, I'd be more interested in predicted alternatively spliced RNA for rapidly evolving recent genes. You're seriously (and unfairly) stacking the deck in favor of spurious results by specifying "the most highly conserved gene in biology" for your analysis.

Psi Wavefunction said...

IMNSHO, alternative splicing is way overrated. Just subfunctionalisation on a little bit of crack...

Larry Moran said...

Anonymous says,

You're seriously (and unfairly) stacking the deck in favor of spurious results by specifying "the most highly conserved gene in biology" for your analysis.

I take this as an admission that you can't explain the predicted alternative transcripts of the BiP gene, right?

If you recognize that the predicted alternative transcripts of a well-studied gene are mostly artifacts, then what's your basis for concluding that the predictions of a less well-studied gene are mostly accurate?

I thought rationality was something that scientists are supposed to admire? :-)

Larry Moran said...

Georgi Marinov says,

BTW, I was not referring to different minor isoforms found at 1% of the level of the major isoform, I was referring to major isoform switches between cell types

Fine. What's your cutoff? What percentage of the predominant form do you consider to be a "major isoform"? It would be good to specify this before the results are in so you can't be accused of post-hoc rationalization.

Let's say one tissue has 5000 copies per cell of a particular mRNA and 100 copies of a minor isoform (2%). Another tissue has only 150 copies of each isoform per cell (50%). Does that count?

These numbers are important. It astonishes me that support for abundant, biologically relevant, alternative splicing is so widespread when nobody knows what the data says.
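The two-tissue example can be worked through explicitly. In this small sketch, the 2% figure matches the minor isoform as a fraction of the major one (100/5000), while the 50% figure matches each isoform's share of the total (150/300), so the code reports both measures. The cutoffs tested are the 0.1%, 1% and 10% values raised earlier in the thread; `passes_cutoff` is a hypothetical rule, not an established criterion.

```python
from dataclasses import dataclass


@dataclass
class IsoformCounts:
    tissue: str
    major: float  # copies per cell of the predominant mRNA
    minor: float  # copies per cell of the alternative isoform


def minor_to_major(c: IsoformCounts) -> float:
    """Minor isoform level as a fraction of the major isoform."""
    return c.minor / c.major


def minor_share_of_total(c: IsoformCounts) -> float:
    """Minor isoform level as a fraction of all copies of the gene's mRNA."""
    return c.minor / (c.major + c.minor)


def passes_cutoff(c: IsoformCounts, cutoff: float) -> bool:
    """Hypothetical rule: call the minor isoform 'real' if it reaches
    `cutoff` as a fraction of the major isoform."""
    return minor_to_major(c) >= cutoff


examples = [IsoformCounts("tissue 1", 5000, 100), IsoformCounts("tissue 2", 150, 150)]
for c in examples:
    verdicts = {cut: passes_cutoff(c, cut) for cut in (0.001, 0.01, 0.10)}
    print(c.tissue,
          f"minor/major={minor_to_major(c):.0%}",
          f"share of total={minor_share_of_total(c):.1%}",
          verdicts)
```

Which denominator is used, and which cutoff is chosen, decides whether the first tissue "counts", which is why the cutoff needs to be stated before the results are in.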

Anonymous said...

If you recognize that the predicted alternative transcripts of a well-studied gene are mostly artifacts, then what's your basis for concluding that the predictions of a less well-studied gene are mostly accurate?

Now you're being silly. The predictions are a starting point, to be followed up with experimental data. Of course the predictions are not going to be completely accurate (the recent UofT paper notwithstanding). The decision of which predictions are worth expending resources for experimental confirmation has to be informed by our knowledge of the underlying biology of the genes in question.

Georgi Marinov said...

Let's say one tissue has 5000 copies per cell of a particular mRNA and 100 copies of a minor isoform (2%). Another tissue has only 150 copies of each isoform per cell (50%). Does that count?

There aren't that many genes expressed at 5000 copies per cell in any cell type, so it's not fair to require that. What I was talking about is one isoform being expressed at say, N1 copies per cell in one cell type with some or no minor isoforms there, and another isoform being expressed at N2 copies per cell with some or no minor isoforms in that cell type, where both N1 and N2 are significantly larger than the expression of the minor isoforms. In the paper I cited, at conservative cutoff levels, this happens a few hundred to a thousand times during a single developmental transition.

I am by no means a supporter of the "I see a read mapping here, therefore there must be a functional transcript it is coming from" approach to the data, but you are often at the other extreme, being too ready to dismiss data altogether.
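The "major isoform switch" criterion described above can be written down as a rule. The sketch below is one possible reading of it, not the method used in the cited paper: a switch requires a different dominant isoform in each cell type, and in each cell type the dominant isoform must exceed every other isoform by a fold factor. The function names and the `min_fold=5.0` default are hypothetical stand-ins for "significantly larger".

```python
def dominant(isoforms: dict[str, float]) -> tuple[str, float]:
    """Return the name and level of the most abundant isoform."""
    name = max(isoforms, key=isoforms.__getitem__)
    return name, isoforms[name]


def is_major_switch(cell_a: dict[str, float], cell_b: dict[str, float],
                    min_fold: float = 5.0) -> bool:
    """True if the dominant isoform differs between the two cell types and,
    in each cell type, the dominant isoform is at least `min_fold` times
    every other isoform (a hypothetical threshold)."""
    name_a, n1 = dominant(cell_a)
    name_b, n2 = dominant(cell_b)
    if name_a == name_b:
        return False
    for cell, top_name, top in ((cell_a, name_a, n1), (cell_b, name_b, n2)):
        others = [v for k, v in cell.items() if k != top_name]
        if others and top < min_fold * max(others):
            return False
    return True


# Hypothetical copies per cell for two isoforms of one gene.
print(is_major_switch({"iso1": 40, "iso2": 2}, {"iso1": 3, "iso2": 30}))  # True
print(is_major_switch({"iso1": 10, "iso2": 8}, {"iso1": 8, "iso2": 10}))  # False
```

The second case flips which isoform is on top, but neither isoform dominates clearly, so under this rule it is not counted as a switch.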

Larry Moran said...

One of my colleagues has alerted me to a paper that provides some quantitative data for us to sink our teeth into.

Work from Chris Burge's lab at MIT indicates that 86% of human genes produce a minor isoform that is 15% or more of the level of the major isoform. If this data holds up, it strongly suggests that the minor alternatively spliced form isn't just "noise."

Wang et al. (2008) Nature 456: 470-476.
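The statistic quoted here can be stated precisely as a computation. This toy sketch takes the "minor isoform" to be the second most abundant one and counts genes where it reaches 15% of the most abundant; that reading, the function names, and the abundances are illustrative assumptions, not Wang et al.'s pipeline, and the hard part (estimating isoform abundances from reads) is skipped entirely.

```python
def has_substantial_minor(isoforms: list[float], threshold: float = 0.15) -> bool:
    """True if the second most abundant isoform is at least `threshold`
    times the most abundant one."""
    if len(isoforms) < 2:
        return False
    top, second = sorted(isoforms, reverse=True)[:2]
    return top > 0 and second / top >= threshold


def fraction_of_genes(genes: dict[str, list[float]], threshold: float = 0.15) -> float:
    """Fraction of genes whose minor isoform clears the threshold."""
    hits = sum(has_substantial_minor(v, threshold) for v in genes.values())
    return hits / len(genes)


# Hypothetical isoform abundances (any consistent unit) for four genes.
toy = {
    "geneA": [100.0, 30.0],
    "geneB": [100.0, 5.0, 2.0],
    "geneC": [50.0],
    "geneD": [20.0, 18.0],
}
print(f"{fraction_of_genes(toy):.0%}")
```

With these made-up numbers two of the four genes pass, so it prints 50%. Note that the statistic says nothing by itself about function; it only measures relative abundance.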

Larry Moran said...

Georgi Marinov says,

... but you are often at the other extreme, being too ready to dismiss data altogether

I teach a course on "Scientific Controversies and Misconceptions" and one of the main points is that in real scientific controversies the data is contradictory.

I try to make graduate students (and colleagues) understand the implications of this. What it means is that the controversy isn't likely to be resolved with just one or two experiments. It also means that some of the data has to be wrong.

Scientists are obliged to recognize and respect other interpretations of the data. In my case, I'm fully aware of the deep sequencing experiments (but see comment above) and the EST data. I'm fully aware of the fact that many people interpret this data as evidence for massive amounts of biologically relevant alternative splicing.

I still prefer to interpret the data as being mostly due to errors in splicing, or "noise." I think the data showing that much of it is noise is more credible than the data showing that it is functional. (see Melamud and Moult (2009) Nucl. Acid Res. 37:4862-4872).

The thing that troubles me the most is when scientists—mostly those on the other side of this controversy—completely ignore the fact that there even IS a controversy. They publish papers where they don't even bother to reference the work of people who disagree with their interpretation. Note, for example, that there are no references to proponents of "noise" in the paper from the Blencowe/Frey labs.

We realize, of course, why they're doing this. If most alternative splicing is due to mistakes in the splicing machinery then all the recent paper has done is develop a method for predicting when these errors are likely to happen. That interpretation would greatly change the nature of their discovery and might even mean that their paper wouldn't be on the cover of Nature. So, I can guess why they would ignore the controversy.

This might qualify as unethical behavior. What do you think?

DK said...

I am 100% with Larry on this one. The fact is, ALL of the transcript detection employed in these 'omics studies is not quantitative. Sure, authors like to pretend that it is, but no, it isn't. The question is simple and the way to answer it is very straightforward. But very labor-intensive. First, forget RNA. Protein-coding gene expression is not transcription. It's translation. So, make and purify Ab against every exon from >50 genes known to have tissue-specific isoforms. And from >50 genes that we don't know this about (including some, like HSA and actin, where we know for sure that there are no splice isoforms at all). After that, only about 10,000 Western runs ought to tell what's real and what's not. It will also provide an experimental test for the predictions made by paper biochemists cracking the enigma codes. But, of course, that's too difficult and not sexy enough to actually be funded and be done.

Anonymous said...

This might qualify as unethical behavior. What do you think?

Good question. This might have qualified as unethical 30 years ago, but this is science in the 21st century and zealously ruthless self-promotion is required for survival today. I think as long as students of the literature are aware of this trend the harm done is not too great. Unfortunately, however, the general public (read: media) really gets misled quite badly: a topic you have previously discussed at some length.

Anonymous said...

1. This back and forth was better than the best journal club meeting I have ever attended. Thank you all.

2. Re the 86% of human genes making an alternatively spliced product that is at 15% or more of the level of the major splice variant (phew): does this suggest functionality merely because of the high level? (Or have I missed your point completely?) I am perfectly willing to accept such a high error rate; look at the estimated 10-20% of fertilized eggs and blastocysts that spontaneously give up.

Anonymous said...

I teach introductory biology courses at the community college level (soon in a tenure-track position), often for non-majors who have little interest in the subject. I like to think that this gives me a better snapshot of what the majority of people are thinking regarding biology than my soon-to-end research job at TSRI. If it does, then your average person is very confused when they hear about contradictory findings (real or oversold). Probably the most important thing I do in these classes is try to explain how science works and why disagreement is so important.

Georgi Marinov said...

Work from Chris Burge's lab at MIT indicates that 86% of human genes produce a minor isoform that is 15% or more of the level of the major isoform. If this data holds up, it strongly suggests that the minor alternatively spliced form isn't just "noise."

Wang et al. (2008) Nature 456: 470-476.


The problem with the Wang et al. paper is that at the time reads were single-end 32bp, which seriously confounds mapping when you're dealing with splices, and no real attempt was made there to quantify individual transcripts.

Reads now are paired-end, 75-100bp, and will be even longer with further improvements in sequencing technologies, and transcript-level quantification, while still computationally difficult and not getting it right 100% of the time, at least exists, so we can say a lot more about the transcriptome and with much higher confidence.
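Transcript-level quantification is hard mainly because many reads are compatible with more than one isoform. Tools such as RSEM handle this with expectation-maximization (EM); the sketch below is a toy version of that idea only, assuming equal isoform lengths and error-free read assignment. The function name and read counts are invented for illustration.

```python
def em_isoform_fractions(read_compat: list[set[str]], n_iter: int = 200) -> dict[str, float]:
    """Toy EM: each read lists the isoforms it is compatible with; returns the
    estimated fraction of reads from each isoform. Assumes equal isoform
    lengths and error-free mapping (simplifications for illustration)."""
    isoforms = sorted(set().union(*read_compat))
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        counts = dict.fromkeys(isoforms, 0.0)
        # E-step: split each read among its compatible isoforms in proportion to theta.
        for compat in read_compat:
            total = sum(theta[i] for i in compat)
            for i in compat:
                counts[i] += theta[i] / total
        # M-step: the new fractions are the expected shares of all reads.
        theta = {i: counts[i] / len(read_compat) for i in isoforms}
    return theta


# Hypothetical reads: 30 unique to A, 10 unique to B, 60 compatible with both.
reads = [{"A"}] * 30 + [{"B"}] * 10 + [{"A", "B"}] * 60
est = em_isoform_fractions(reads)
print({k: round(v, 3) for k, v in est.items()})
```

Splitting the shared reads evenly would give 60% / 40%; the EM estimate converges to 75% / 25%, because the unique reads suggest A is three times as abundant as B. Short single-end reads leave more reads in the ambiguous pool, which is part of why read length matters so much here.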

SPARC said...

This was so predictable: DI's Casey Luskin jumps on the bandwagon.

SPARC said...

Let's hope that the authors didn't use PLIER for their analyses.

Georgi Marinov said...

Scientists are obliged to recognize and respect other interpretations of the data. In my case, I'm fully aware of the deep sequencing experiments (but see comment above) and the EST data. I'm fully aware of the fact that many people interpret this data as evidence for massive amounts of biologically relevant alternative splicing.

I still prefer to interpret the data as being mostly due to errors in splicing, or "noise." I think the data showing that much of it is noise is more credible than the data showing that it is functional. (see Melamud and Moult (2009) Nucl. Acid Res. 37:4862-4872).

The thing that troubles me the most is when scientists—mostly those on the other side of this controversy—completely ignore the fact that there even IS a controversy. They publish papers where they don't even bother to reference the work of people who disagree with their interpretation. Note, for example, that there are no references to proponents of "noise" in the paper from the Blencowe/Frey labs.


See, I agree with you about the basic point of most of the novel transcripts being noise, and this comes from having actually worked with the data and seen where those come from and what they look like. What I don't agree with is the 5% number; I claim it is higher than that, that's all.


We realize, of course, why they're doing this. If most alternative splicing is due to mistakes in the splicing machinery then all the recent paper has done is develop a method for predicting when these errors are likely to happen. That interpretation would greatly change the nature of their discovery and might even mean that their paper wouldn't be on the cover of Nature. So, I can guess why they would ignore the controversy.

This might qualify as unethical behavior. What do you think?


Yes, it is unethical behavior. But I am not certain that it is the case in all papers that people have published on the subject.

Georgi Marinov said...

I am 100% with Larry on this one. The fact is, ALL of the transcript detection employed in these 'omics studies is not quantitative. Sure, authors like to pretend that it is, but no, it isn't. The question is simple and the way to answer it is very straightforward. But very labor-intensive. First, forget RNA. Protein-coding gene expression is not transcription. It's translation.

The problem with this is that transcripts don't have to be coding for proteins to be functional.

So, make and purify Ab against every exon from >50 genes known to have tissue-specific isoforms. And from >50 genes that we don't know this about (including some, like HSA and actin, where we know for sure that there are no splice isoforms at all). After that, only about 10,000 Western runs ought to tell what's real and what's not. It will also provide an experimental test for the predictions made by paper biochemists cracking the enigma codes. But, of course, that's too difficult and not sexy enough to actually be funded and be done.

I have a hard time seeing how such a thing would ever even work. People are having a hard enough time trying to make good, reliably working and sufficiently specific antibodies against whole proteins; against exons it is a whole other nightmarish level of difficulty.

DK said...

The problem with this is that transcripts don't have to be coding for proteins to be functional

That's a cop-out. Whenever people talk about function of splice isoforms, they talk mainly, if not exclusively, about protein isoforms.

I have a hard time seeing how such a thing would ever even work.

Is it possible that this is because you've never done it yourself?

People are having a hard enough time trying to make good, reliably working and sufficiently specific antibodies against whole proteins; against exons it is a whole other nightmarish level of difficulty

If people are having hard time making good Ab for Westerns, then it only means that these people are completely incompetent.

These aren't MAbs we are talking about, and they aren't Ab that should recognize native folds. Stick an exon into a bacterial expression vector (make it into a fusion protein if the exon is < 150 bp), purify the protein (denatured or not, doesn't matter) = perfect antigen. Purify total IgG against a membrane containing a crapload of the antigen = perfect primaries. (Preabsorb against the fusion partner if necessary.) 1.5 months from A to Z. Trivial. Can, if desired, be made into a quasi-high-throughput thing.

charlie wagner said...

I read the paper.

They're right. You're wrong.

Sorry!

Devin said...

^
You must have misread it. Don't worry, it happens.