Sunday, January 09, 2011

Splicing Error Rate May Be Close to 1%

Alex Ling alerted me to an important paper in last month's issue of PLoS Genetics. Pickrell et al. (2010) looked at low abundance RNAs in order to determine how many transcripts showed evidence of possible splicing errors. They found a lot of "alternatively" spliced transcripts where the new splice junction was rarely used and was not conserved in other species. They attribute this to splicing errors. Their calculation suggests that the splicing apparatus makes a mistake about 0.7% of the time per intron.

This has profound implications for the interpretation of alternative splicing data. If Pickrell et al. are correct (and they aren't the only ones to raise this issue), then claims about alternative splicing being a common phenomenon are wrong. At the very least, those claims are controversial, and every time you see such a claim in the scientific literature it should be accompanied by a statement about possible artifacts due to splicing errors. If you don't see that mentioned in the paper then you know you aren't dealing with a real scientist.
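
To get a feel for what a 0.7% per-intron error rate could mean for whole transcripts, here is a minimal Python sketch. It assumes, as a simplification and not as a claim from the paper, that every intron is mis-spliced independently and with the same probability. The function name and the intron counts are purely illustrative.

```python
def prob_at_least_one_error(n_introns, per_intron_error=0.007):
    """Probability that a transcript carries at least one splicing error.

    Assumption (not from the paper): each intron is mis-spliced
    independently with the same probability `per_intron_error`.
    """
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    # P(no error in any intron) = (1 - p)^n, so P(>= 1 error) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_intron_error) ** n_introns


if __name__ == "__main__":
    # Illustrative intron counts only; real genes vary widely.
    for n in (1, 4, 8, 20, 50):
        p = prob_at_least_one_error(n)
        print(f"{n:>2} introns: {p:.1%} of transcripts with at least one error")
```

Under these assumptions the fraction of imperfect transcripts grows with the number of introns, so a gene with many introns could yield a noticeable pool of rare, aberrant transcripts without any of them being functional.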

Here's the abstract and the author summary.
Abstract

While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.

Author Summary

Most human genes are split into pieces, such that the protein-coding parts (exons) are separated in the genome by large tracts of non-coding DNA (introns) that must be transcribed and spliced out to create a functional transcript. Variation in splicing reactions can create multiple transcripts from the same gene, yet the function for many of these alternative transcripts is unknown. In this study, we show that many of these transcripts are due to splicing errors which are not preserved over evolutionary time. We estimate that the error rate in the splicing of an intron is about 0.7% and demonstrate that there are two major types of splicing error: errors in the recognition of exons and errors in the precise choice of splice site. These results raise the possibility that variation in levels of alternative splicing across species may in part be due to variation in splicing error rate.


Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy Splicing Drives mRNA Isoform Diversity in Human Cells. PLoS Genet 6(12): e1001236. doi:10.1371/journal.pgen.1001236

12 comments:

  1. A bit off-topic but:
    "Abstract" and "Author Summary"? What's up with that?

    Abstract (noun): A summary of a text, scientific article, document, speech, etc.

    The two are synonymous - as also evidenced by the obvious redundancy of the text.

  2. I have always maintained that there is a link between alternative splicing and gene duplication.

    AS is basically a way of achieving variation relatively quickly compared to GD and subsequent mutation.

    There is also little doubt that AS can lead to frameshifts and premature truncations which are almost always useless.

  3. They doubled the number of previously observed splice junctions. Maybe these guys are just extra good at generating bogus data?

  4. From the PLoS website...

    Abstract

    The abstract of the paper should be succinct; it must not exceed 300 words. Authors should mention the techniques used without going into methodological detail and should summarize the most important results. While the abstract is conceptually divided into three sections (Background, Principal Findings, and Significance), please do not apply these distinct headings to the abstract within the article file. We would however encourage you to include Background, Principal Findings, and Significance headings within the abstract field of the submission system. Please do not include any citations and avoid specialist abbreviations.
    Author Summary

    We ask that all authors of research articles include a 150–200 word non-technical summary of the work as part of the manuscript to immediately follow the abstract. This text is subject to editorial change, should be written in the first-person voice, and should be distinct from the scientific abstract. Aim to highlight where your work fits within a broader context; present the significance or possible implications of your work simply and objectively; and avoid the use of acronyms and complex terminology wherever possible. The goal is to make your findings accessible to a wide audience that includes both scientists and non-scientists. Authors may benefit from consulting with a science writer or press officer to ensure they effectively communicate their findings to a general audience.

  5. Thanks for pointing out that paper. I love papers that end with testable hypotheses for their claims.

  6. Reza says,

    AS is basically a way of achieving variation relatively quickly compared to GD and subsequent mutation.

    Statements like that get us into semantic difficulties that I'd prefer to avoid. Mutations that affect splicing can produce variation, just as all mutations produce variation.

    When you use the term "AS" (alternative splicing) it implies biological function—or at least it should. I prefer to restrict the term "alternative splicing" to those events that are known to be biologically relevant. If you apply the term "alternative splicing" to all splicing errors then it becomes meaningless.

    Most forms of true alternative splicing involve inserting or removing an exon in the middle of a gene. If you take a typical protein-encoding gene then it's extremely unlikely that such an event will lead to a functional product. Thus, I doubt very much whether this is a common way of evolving new protein functions. If that were true then you would expect a significant number of genes to show true alternative splicing and that has not been demonstrated.

    I suspect that less than 5% of human genes exhibit true functional alternative splicing and even those examples are highly regulated. If you want a new functional protein it's probably much easier to duplicate the gene and mutate one of the copies. That's what the evidence suggests.

  7. Thanks for the comments on our paper.

    They doubled the number of previously observed splice junctions. Maybe these guys are just extra good at generating bogus data?

    This was (perhaps obviously) one of our first concerns. The data which convinced us that this was not the case is presented in Figure 1; in particular, nearly all the new junctions carry GT-AG dinucleotides intronic of the inferred splice junction (this known sequence specificity of the splicing reaction was not used in their identification), and we recapitulate the previously observed periodic pattern of positions of alternative splice junctions.

  8. I don't understand how they can draw a line and say these are errors and these are not. This quote, from a gene duplication paper, works just as well here: "Our inclination is to categorize the most frequent and kinetically favourable interactions as the right ones and the minor ones as tolerable errors, but in the absence of any grand designer there are no right or wrong interactions — just handholds of different sizes that selection can use to climb a fitness mountain."

    Shouldn't the question be about how adaptive these events are?

    The most worrisome aspect of studying AS, to me, is RT polymerase errors. There is clear evidence of widespread RT errors, especially template switching, which would look just like AS without the proper controls. I suspect that the reason we keep finding novel events is that they keep being made during the RT step.

    Conant GC, Wolfe KH. Turning a hobby into a job: how duplicated genes find new functions. Nature reviews. Genetics. 2008;9(12):938-50. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19015656.

  9. Alternative splicing that produces functional isoforms is observed in the majority of human genes. The higher an organism is on the evolutionary ladder, the higher the percentage of its genes that are alternatively spliced. I make this statement as a lifelong AS researcher, and the data I implicitly recruit to make it come from painstaking functional analyses -- not from the type of work like this, that generates more noise than signal.

  10. Athena Andreadis says,

    Alternative splicing that produces functional isoforms is observed in the majority of human genes.

    That is an incorrect statement. The only way you could rationalize such a statement is to re-define the word "functional."

    The higher an organism is on the evolutionary ladder, the higher the percentage of its genes that are alternatively spliced.

    There is no such thing as an evolutionary ladder and there's no scientific way to tell whether humans are "higher" than fruit flies or cyanobacteria. Your use of those words is very disturbing.

    I make this statement as a lifelong AS researcher, and the data I implicitly recruit to make it come from painstaking functional analyses -- not from the type of work like this, that generates more noise than signal.

    It saddens me to think that a "life-long" AS researcher actually believes that "painstaking functional analysis" has demonstrated your point—namely, that the majority of genes undergo biologically relevant (functional) alternative splicing.

    I'm not denying that alternative splicing exists. We've been teaching that to undergraduates for 30 years. I'm not denying that your favorite gene (tau) exhibits alternative splicing. What I'm challenging is the claim that a large percentage of human primary transcripts are alternatively spliced just because you can detect low levels of strange transcripts in some tissues.

  11. Below is a direct quote from the RNA-Seq paper by the Christopher Burge lab ("Alternative isoform regulation in human tissue transcriptomes", Nature 456, 470-476, 2008):

    "Some of these events may involve exclusively low frequency alternatively spliced isoforms. However, 92% of multi-exon genes were estimated to undergo alternative splicing when considering only events for which the relative frequency of the minor (less abundant) isoform exceeded 15% in one or more samples."

  12. re the paper from the Burge lab.

    It's good to see that some scientists address this issue of isoform abundance.

    They analyzed ten human tissues and five cancer cell lines. It would be interesting to know how many of the significant alternative splice variants only show up in the cancer cell lines.
