Tuesday, June 27, 2017

Debating alternative splicing (Part IV)

In Debating alternative splicing (Part III) I discussed a review published in the February 2017 issue of Trends in Biochemical Sciences. The review examined the data on detecting predicted protein isoforms and concluded that there was little evidence they existed.

My colleague at the University of Toronto, Ben Blencowe, is a forceful proponent of massive alternative splicing. He responded in a letter published in the June 2017 issue of Trends in Biochemical Sciences (Blencowe, 2017). It's worth looking at his letter in order to understand the position of alternative splicing proponents. He begins by saying,
It is estimated that approximately 95% of multiexonic human genes give rise to transcripts containing more than 100 000 distinct AS events [3,4]. The majority of these AS events display tissue-dependent variation and 10–30% are subject to pronounced cell, tissue, or condition-specific regulation [4].
Ben references two papers. The first one, Pan et al. (2008) (ref. 3), is from his own lab. It documents the existence of abundant splice variants and estimates that "95% of multiexon genes undergo alternative splicing." This is misleading because the results only show the existence of splice variants. Whether these can be explained as splicing errors or alternative splicing (AS) is what's being debated. Nobody questions the existence of abundant splice variants but that doesn't mean they are produced by AS.

The second paper is Wang et al. (2008). It was also published nine years ago. The authors looked at splice variants in different tissues and found that the number and the amount of these splice variants varied from tissue to tissue. This is the result predicted for both the splicing errors explanation and the massive alternative splicing explanation so it doesn't distinguish between these conflicting explanations. Nevertheless, Wang et al. conclude, "... most alternative splicing events are regulated between tissues, providing an important element of support for the hypothesis that alternative splicing is a principal contributor to the evolution of phenotypic complexity in mammals."

The conclusion is doubly flawed because they have not demonstrated that they are dealing with true alternative splicing and they have not demonstrated that they are viewing regulation. The word "regulation" is not an observation; it's a conclusion. It is a loaded word. "Regulation" implies that the variation observed has biological relevance. But not all variation is due to regulation. This is the same problem we encounter in the pervasive transcription debate. Just because a transcript is tissue-specific doesn't mean it's biologically relevant. Similarly, just because some splicing errors are restricted to certain cells doesn't mean they are regulated.

Blencowe's opening arguments are seriously flawed. I don't think he's making as strong a case as he thinks.
The latter events are significantly enriched for evolutionary conservation and frame-preserving potential.
The Wang et al. paper showed that there were several classes of transcript variants. Some of them showed lots of evidence of being real examples of alternative splicing. In that class, about 60% of skipped exon events showed preservation of the reading frame. In the class that looked the least promising—a class that included most events—only 41% of the skipped exon events preserved reading frame. (You expect 33% by chance alone.)
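The 33% baseline is simple arithmetic: skipping an exon leaves the downstream reading frame intact only when the exon's length is a multiple of three, and if exon lengths were spread evenly over the three possible remainders that would happen one time in three. Here is a minimal Python sketch of that reasoning. The function names and the exon lengths are made up for illustration (they are not data from Wang et al.), and real exon lengths are not perfectly uniform, so treat the result as a rough expectation only.

```python
from fractions import Fraction


def preserves_frame(exon_length: int) -> bool:
    """Skipping an exon keeps the downstream frame only if its length is a multiple of 3."""
    return exon_length % 3 == 0


def frame_preserving_fraction(exon_lengths) -> Fraction:
    """Fraction of skipped exons whose removal leaves the reading frame intact."""
    lengths = list(exon_lengths)
    kept = sum(preserves_frame(n) for n in lengths)
    return Fraction(kept, len(lengths))


# Hypothetical lengths: 300 consecutive values, so each remainder
# (0, 1, 2) is equally represented. This is an assumption, not real data.
uniform_lengths = range(60, 360)
chance = frame_preserving_fraction(uniform_lengths)
print(f"frame-preserving by chance: {float(chance):.1%}")
```

Against that baseline, 41% is only a modest enrichment while 60% is a substantial one, which is the contrast drawn above.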

As for "conservation," the results in the Wang et al. paper do not say what you might think. There's no data on whether variants are present in other species. The evidence on conservation and preservation of open reading frame is far less impressive than Blencowe implies in his letter. In fact, most studies show that the production of splice variants is NOT conserved.
Moreover, dozens of independent studies have shown that subsets of differentially regulated AS events are significantly enriched in genes that function in common biological processes and pathways, and that, where characterized, exons in these splicing ‘networks’ have important functions [5,6]. For example, AS networks function extensively in the remodeling of cytoskeletal interactions, signaling cascades, and gene regulatory pathways, and the literature contains hundreds of examples in which translated splice variants contribute important roles in development, cell and tissue homeostasis, animal behavior, diseases, as well as other processes.
This is an important point. We would like to know the number of proven examples of alternative splicing in order to bring the debate into focus.

It's well known that most transcript variants have not (yet?) been shown to have biological relevance. This means that the splicing error explanation is viable. On the other hand, it is well known that alternative splicing is a real phenomenon. There are many well-established examples.

However, the mere fact that splice variants are common in important genes is not evidence of alternative splicing because ALL multiexon genes have splice variants. Blencowe argues that "hundreds" of examples of real alternative splicing exist but I'm not aware of the data supporting that claim. Nevertheless, I'm willing to assume that several hundred human protein-coding genes are alternatively spliced. That's 1% of the total and it's a long way from 95%. This is not a debate about the mere existence of alternative splicing; it's a debate about its relative frequency.

Ben then raises certain technical issues with the Tress et al. paper. He wonders whether the inability to detect predicted protein isoforms can be due to the lack of sensitivity of the mass spec assays. The answer is "yes," that's a possibility. Such objections can always be raised when one is trying to demonstrate the absence of something. It's part of the problem with proving the negative.

However, there's a built in positive control for these experiments and that's their ability to detect the dominant predicted isoform for protein-coding genes. The results do detect these isoforms as Tress et al. point out in their response to Blencowe's letter (Tress et al., 2017b). Many of those isoforms are present in substantial amounts in the cells being assayed but many are relatively low abundance proteins. It doesn't mean that mass spec will detect all minor isoforms if they exist but it does establish that the assays detect what they are supposed to detect.

Blencowe then asks whether the conclusions in Tress et al. (2017a) are justified by the results. Here's the response from the authors ...
Question two is ‘are the conclusions justified based on the findings?’ In our opinion the available evidence leaves little room for doubt. Most protein coding genes have main protein isoforms, and most alternative exons are subject to neutral or near-neutral evolution. We believe our conclusions are well substantiated and invite readers to judge for themselves in the article and related papers.
I agree with Tress et al. Their conclusions are justified. They are not "proven." They are as tentative as most scientific conclusions should be. So far, I don't think Ben Blencowe has presented an adequate defense of his claim and I don't think his conclusion is as well-justified as the one he is criticizing.

The third issue Blencowe raises is whether the splice variants are actually translated in spite of the fact that the protein products haven't been detected. He reviews data indicating that "tens of thousands of splice variant transcripts have been detected in polysome fractions." But there are just as many technical objections to those experiments as the ones he is criticizing. In fact, as Tress et al. point out in their response, all kinds of non-coding RNAs are detected in polysomes indicating that this is not a reliable indication of translation.

What we have is a conflict between direct assays for proteins in the mass spec experiments and indirect assays in ribosome profiling experiments. The results do not agree. You can't continue to argue for the existence of tens of thousands of protein isoforms unless you can actually prove they exist. But that's exactly what Blencowe does when he says,
In summary, when collectively considering multiple sources of false negative detection rates for splice variants in LC-MS/MS data, previous results demonstrating that protein abundance is predominately related to transcript abundance, and recent results from detecting splice variant sequences associated with ribosomes, it is apparent that most splice variants detected in transcriptome profiling data are likely translated. Therefore, it is possible that most splice variants contribute to cellular function. Unfortunately, the authors of [1] have unnecessarily dismissed as a ‘theory’ a large body of experimental data from numerous laboratories demonstrating extensive roles for AS in the remodeling of cellular networks that have diverse roles in critical biological processes.
I understand where Ben is coming from. He has a lot invested in the idea that massive alternative splicing is a real phenomenon. However, I think he could do a much better job of presenting his case if he would just acknowledge the alternative explanation (splicing errors) and discuss the results in light of that possibility. I think he is exaggerating the evidence for biological function and ignoring counter evidence such as lack of sequence conservation.

Tress et al. (2017b) demonstrate in their response to Blencowe that they understand the evolutionary significance of conservation and the lack of conservation. Here's what they say in defense of their view that most splice variants are due to splicing errors.
A wealth of genome-wide genetic variation data from human populations has recently become available, enabling us to test whether alternative exons are undergoing purifying selection (whether they really are innovations). The variant data we analyzed were only available from 2012. These show that most alternative exons are evolving neutrally: they have a much higher non-synonymous to synonymous substitution ratio, and a 10-fold higher proportion of potentially damaging high-impact variants. Most interestingly, these figures do not decrease with increasing allele frequency, as would be expected if alternative exons were under selection pressure.

The neutral or near-neutral selection pressures apparent in the current population are a very strong suggestion that most alternative variants have not been evolutionary selected to have important cellular roles.
It seems to me that this is strong evidence in favor of the splicing error explanation.
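The allele-frequency argument can be made concrete with a toy calculation. Under purifying selection, damaging (mostly non-synonymous) variants are kept rare, so the non-synonymous to synonymous ratio should fall among common variants. Under neutrality it should stay roughly flat. The Python sketch below only illustrates that logic. The variant lists, the 1% frequency cutoff, and the function name are all hypothetical, and the raw count ratio used here is a simplification (a proper dN/dS analysis normalizes by the number of available sites). None of this is the pipeline used by Tress et al.

```python
def nonsyn_syn_ratio_by_class(variants, common_threshold=0.01):
    """Ratio of non-synonymous to synonymous variant counts for rare and common variants.

    variants: iterable of (allele_frequency, kind), kind being "nonsyn" or "syn".
    common_threshold: hypothetical cutoff separating rare from common variants.
    """
    counts = {
        "rare": {"nonsyn": 0, "syn": 0},
        "common": {"nonsyn": 0, "syn": 0},
    }
    for freq, kind in variants:
        cls = "common" if freq >= common_threshold else "rare"
        counts[cls][kind] += 1
    # Skip a class with no synonymous variants to avoid dividing by zero.
    return {cls: c["nonsyn"] / c["syn"] for cls, c in counts.items() if c["syn"] > 0}


# Made-up exon under purifying selection: non-synonymous variants stay rare.
constrained = (
    [(0.001, "nonsyn")] * 6 + [(0.001, "syn")] * 3
    + [(0.2, "nonsyn")] * 2 + [(0.2, "syn")] * 4
)
# Made-up exon evolving neutrally: the ratio is the same at every frequency.
neutral = (
    [(0.001, "nonsyn")] * 6 + [(0.001, "syn")] * 3
    + [(0.2, "nonsyn")] * 4 + [(0.2, "syn")] * 2
)

print("constrained:", nonsyn_syn_ratio_by_class(constrained))
print("neutral:    ", nonsyn_syn_ratio_by_class(neutral))
```

In the constrained toy exon the ratio drops from rare to common variants. In the neutral one it does not, which is the pattern Tress et al. report for most alternative exons.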

Blencowe closes his criticism with the following ....
Tress and colleagues have already shown promise for the more comprehensive detection of translated splice variants and will undoubtedly prove to be valuable in future studies [7,14]. Moreover, it is important to appreciate that it can take a single research group years of effort to determine the biological function of a single AS event. As such, an important goal for future studies will be to further develop high-throughput methods for interrogating the functions of splice variants. In the meantime, one should be mindful of the old aphorism, ‘absence of evidence is not evidence of absence’.
I agree with Ben that the real proof of the pudding lies in hard-core biochemistry and molecular biology. It takes a lot of work to demonstrate that predicted AS events are biologically relevant. But that's exactly what has to happen if we are ever going to be convinced that 95% of human genes are alternatively spliced to produce functional protein isoforms. I disagree with Ben that this controversy is going to be resolved in his favor by more genomics and proteomics. That hasn't worked so far. Ask ENCODE how it's working out for them.

Finally, the old aphorism is somewhat disingenuous. Ben is trying to shift the burden of proof onto his opponents. He is criticizing them for not proving that the protein isoforms do not exist. That's not fair. The burden of proof is on him and his supporters to show that alternative splicing is real. As more and more attempts to demonstrate the existence of abundant protein isoforms result in failure, it becomes increasingly difficult to maintain they exist. At some point, the absence of evidence in support of massive alternative splicing should cause proponents to rethink their position.

Note: The front page story in The Toronto Star refers to a 2010 paper by Barash et al. That paper claims to have discovered an extensive regulatory code controlling alternative splicing. There's no mention of splicing errors.

Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J., and Frey, B.J. (2010) Deciphering the splicing code. Nature, 465:53-59. [doi: 10.1038/nature09000]

Blencowe, B.J. (2017) The relationship between alternative splicing and proteomic complexity. Trends in Biochemical Sciences, 42:407-408. [doi: 10.1016/j.tibs.2017.04.001]

Pan, Q., Shai, O., Lee, L.J., Frey, B.J., and Blencowe, B.J. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics, 40:1413-1415. [doi: 10.1038/ng.259]

Tress, M.L., Abascal, F., and Valencia, A. (2017a) Alternative splicing may not be the key to proteome complexity. Trends in Biochemical Sciences, 42:98-110. [doi: 10.1016/j.tibs.2016.08.008]

Tress, M.L., Abascal, F., and Valencia, A. (2017b) Most alternative isoforms are not functionally important. Trends in Biochemical Sciences, 42:408-410. [doi: 10.1016/j.tibs.2017.04.002]

Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature, 456:470-476. [doi: 10.1038/nature07509]


  1. RNA-Seq experiments (the standard type, not CAGE experiments) are steady-state snapshots of the stable RNAs in a cell (just like a Northern Blot, but more sensitive). Unlike microarray experiments, the number of reads sequenced is directly proportional to the number of transcripts (after normalization, and a bunch of other caveats). Therefore, I would expect that any transcripts produced by splicing events (whether adventitious or regulated) that introduce an early stop codon into the ORF of a protein-coding RNA will be quickly degraded by the cell's nonsense-mediated decay (NMD) machinery. I believe NMD proteins are associated with active ribosomes. This means that most of the aberrantly spliced transcripts would probably not be detected in an RNA-Seq experiment, due to QC. Could this be used to argue that detectable RNA isoforms containing frameshifts are NOT being actively translated (because otherwise they would get degraded)? [full disclosure: I have not yet read every paper cited in the four-part series of blog posts]

    1. We could quibble endlessly about whether splicing errors can be detected by RNA-Seq experiments. It doesn't seem productive to argue that they can't be detected, therefore the splice variants must be functional.

      The debate can only be resolved by demonstrating that most genes actually produce multiple functional protein isoforms. Alternatively, they could be producing splice variants that play a role in regulation even if they aren't translated.

      The problem with this "debate" is that it is being completely ignored by proponents of alternative splicing. They act as if their case has already been proven. This false view is widely promoted in the scientific literature and in the popular press.

    2. Hi,

      Several important points here.

      Firstly RNA-Seq cannot be used to determine the proportion of transcripts because (a) reads are too short and (b) transcript reconstruction algorithms are rubbish.

      Second NMD transcripts are indeed associated to the ribosomes, so in fact you would expect to find them in RNA-Seq experiments (before they get to the ribosomes). Indeed many dominant transcripts from RNA-Seq experiments are NMD variants (though this might not be real and could be the fault of the poor transcript reconstruction methods).

      Thirdly some RNA transcripts with frameshifts are NMD variants (and these shouldn't get translated) but many are not. We even find some protein evidence for a few of these non-NMD translated frameshifts and one or two seem to be highly conserved.

      I think we may finally have started a debate on the importance of alternative splicing; one of our two AS papers has been in the top 5 most read TIBS papers since October and I would hope that this is going to be reflected in the scientific literature. The door is certainly now open.

      Unfortunately, the dichotomy between what is found at the transcript level on one side and the conservation, genetic variation data and proteomics data on the other is likely to only get wider in the short term. New long read technology has the capability of finding entire transcripts and I have just seen a talk claiming to have found long read evidence for 192 variants for MAPK10. If that extrapolates to the whole genome that would be something like 200,000 transcripts. MAPK10 is actually a gene that we find proteomics and conservation evidence for, but for only one alternative splice event.

    3. Firstly RNA-Seq cannot be used to determine the proportion of transcripts because (a) reads are too short and (b) transcript reconstruction algorithms are rubbish.

      Splicing events give rise to split reads in RNAseq experiments, which will inform you about the use of exon cassettes and exon extension. Quantification of split reads does not rely on reconstruction algorithms and works perfectly fine with short reads.

      That being said, I agree with Larry that we could quibble endlessly about this, but it is not going to resolve this debate.

    4. @Corneel Oh, sure, but that wasn't what was said in the original comment ("the number of reads sequenced is directly proportional to the number of transcripts"). To detect distinct splice events in RNA-Seq you don't need transcript reconstruction algorithms, you only need to map the short read correctly. To assume the presence of a whole transcript, you need transcript reconstruction methods to guess which read maps to which transcript, and that's where you end up with weird and wonderful results.

      And you are quite right, this discussion doesn't resolve anything, but it is important. The problems with these methods is something that needs to be taken into account when determining (for example) tissue-specific expression of transcripts in large-scale experiments. Transcript reconstruction algorithms are treated far more seriously than they should be.
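The distinction drawn in this exchange, counting split reads at individual junctions versus reconstructing whole transcripts, can be illustrated with a "percent spliced in" (PSI) calculation for a single cassette exon. The Python sketch below uses one common convention (average the two inclusion junctions, then divide by inclusion plus skipping). The function name and read counts are made up, and this is not necessarily the formula used in any of the papers discussed here.

```python
def percent_spliced_in(upstream_junction, downstream_junction, skipping_junction):
    """Estimate PSI for one cassette exon from split-read counts.

    upstream_junction / downstream_junction: reads spanning the two junctions
        that join the cassette exon to its neighbours (support inclusion).
    skipping_junction: reads joining the neighbouring exons directly (support skipping).
    Returns None when no junction reads were observed.
    """
    # Inclusion is supported by two junctions, so average them to keep the
    # inclusion and skipping evidence on a comparable scale.
    inclusion = (upstream_junction + downstream_junction) / 2
    total = inclusion + skipping_junction
    if total == 0:
        return None
    return inclusion / total


# Hypothetical counts for a single exon in a single sample.
psi = percent_spliced_in(upstream_junction=30, downstream_junction=50, skipping_junction=10)
print(f"PSI = {psi:.2f}")
```

Note that this needs only correctly mapped short reads. It says how often one exon is included, not which full-length transcripts exist or in what proportions, and it says nothing about whether a skipping event is regulated or an error.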

    5. Michael Tress writes, "I think we may finally have started a debate on the importance of alternative splicing; ..."

      I appreciate the attempt but I don't see any evidence that there's a real debate going on. It's going to be very difficult for proponents of AS to admit there's a controversy. That's because admitting there's a "debate" means their previous papers are called into question and their grant proposals may be challenged.

      I see no evidence that any of my colleagues, or any members of their labs, are willing to debate the issue of massive alternative splicing. It's very much like the ENCODE situation where few of the researchers are willing to admit they might have been wrong. They continue to publish papers about the importance of pervasive transcription and the millions of transcription factor binding sites.

      The problem could easily be solved if we had competent reviewers who would insist that the rhetoric in most papers be toned down. However, most of those reviewers are themselves heavily invested in the idea that there's massive complexity in the human genome and all they need to do is attract lots of grant money to investigate it.

    6. The comments we have had suggest that at least some people who were previously convinced of the importance of AS are starting to realise that there are still a lot of questions to be answered, but it's certainly not going to change overnight.

  2. Is there some way to estimate the amount of wasteful metabolic activity involved in nonfunctional splicing and transcribing? Like, how much energy could an organism save if it could (miraculously) never do that?

    1. http://www.pnas.org/content/112/51/15690.full.pdf

      An enduring mystery of evolutionary genomics concerns the mechanisms responsible for lineage-specific expansions of genome size in eukaryotes, especially in multicellular species. One idea is that all excess DNA is mutationally hazardous, but weakly enough so that genome-size expansion passively emerges in species experiencing relatively low efficiency of selection owing to small effective population sizes. Another idea is that substantial gene additions were impossible without the energetic boost provided by the colonizing mitochondrion in the eukaryotic lineage. Contrary to this latter view, analysis of cellular energetics and genomics data from a wide variety of species indicates that, relative to the lifetime ATP requirements of a cell, the costs of a gene at the DNA, RNA, and protein levels decline with cell volume in both bacteria and eukaryotes. Moreover, these costs are usually sufficiently large to be perceived by natural selection in bacterial populations, but not in eukaryotes experiencing high levels of random genetic drift. Thus, for scaling reasons that are not yet understood, by virtue of their large size alone, eukaryotic cells are subject to a broader set of opportunities for the colonization of novel genes manifesting weakly advantageous or even transiently disadvantageous phenotypic effects. These results indicate that the origin of the mitochondrion was not a prerequisite for genome-size expansion.

    2. read that...not obvious..1%? 10%? 50%? They seem to estimate metabolic costs per gene, not the cost of all irrelevant transcriptions.

  3. What makes you think splicing mistakes are expected to have tissue specificity?

    1. Two obvious reasons:

      1) The number of splicing mistakes should be correlated with the number of transcriptions, and those are tissue specific.

      2) If there are tissue-specific alternative splicings of some genes (there are), then whatever causes those might also operate on sites that just happen to resemble the sites in the genuine alternatively spliced genes.

      There may be other reasons that I haven't thought of.

  4. Just out of curiosity: Are circular RNAs functional?

  5. Would it be possible to look for ubiquitination of proteins produced by splice variants as a proxy for determining function? From a quick scan of Google hits it appears that ubiquitination can be detected by LC/MS which could allow high throughput screening of the proteasome. Just a thought.

    1. I think that this would have the same advantages and drawbacks as standard proteomics experiments. In the end you are looking for an extra two glycines on every peptide that has a lysine (well, mostly lysines). However, this does raise an important point: post-translational modifications are one of the main complications in MS experiments. There is a huge variety of PTMs and they are never all accounted for, which is partly why there are so many unassigned spectra (it's certainly not due to AS). To complicate matters, several PTMs are still poorly characterized and tend to produce many false positive identifications (N-terminal acetylations, for example). There does seem to have been some progress here recently though.

  6. It seems to me if Blencowe is right, then he has a discovery that could lead to the curing of countless diseases.
    I also think he is engaged in some wishful thinking on the subject.
    If both of the above statements are correct, then I do not find the situation to be at all surprising.

    I wonder what would happen if I said “I think you are engaged in some wishful thinking on this subject,” to him directly.

    I also wonder if it would help someone do the countless hours of tedious lab work that will be required to learn more about this if that person were engaged in wishful thinking.
    Isn’t that why we expect someone who isn’t engaged in wishful thinking to attempt to replicate the work?

  7. Larry, you might be interested in this relevant paper:

    "Alternative splicing and the evolution of phenotypic novelty" (2017)

  8. If alternative splicing is mostly noise, how do you explain the level of splicing variation among vertebrates?

    Science 21 December 2012:
    Vol. 338 no. 6114 pp. 1587-1593 DOI: 10.1126/science.1230612
    The Evolutionary Landscape of Alternative Splicing in Vertebrate Species
    Nuno L. Barbosa-Morais, Manuel Irimia, Qun Pan, Hui Y. Xiong, Serge Gueroussov, Leo J. Lee, Valentina Slobodeniuc, Claudia Kutter, Stephen Watt, Recep Çolak, TaeHyung Kim, Christine M. Misquitta-Ali, Michael D. Wilson, Philip M. Kim, Duncan T. Odom, Brendan J. Frey, Benjamin J. Blencowe
    "How species with similar repertoires of protein-coding genes differ so markedly at the phenotypic level is poorly understood. By comparing organ transcriptomes from vertebrate species spanning ~350 million years of evolution, we observed significant differences in alternative splicing complexity between vertebrate lineages, with the highest complexity in primates. Within 6 million years, the splicing profiles of physiologically equivalent organs diverged such that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are cis-directed. However, a subset of pronounced splicing changes are predicted to remodel protein interactions involving trans-acting regulators. These events likely further contributed to the diversification of splicing and other transcriptomic changes that underlie phenotypic differences among vertebrate species."

    1. Bill Cole writes: "If alternative splicing is mostly noise, how do you explain the level of splicing variation among vertebrates "

      The most obvious answer is species specific variation in the proteins that control splicing. Mutations could potentially change the specificity of the proteins that detect splice sites causing differences in the levels of splice variants between species.

    2. The most likely explanation is species specific changes in intron sequences that generate spurious splice sites. Some changes will be in the splice sites themselves generating weaker or stronger matches to the ideal splice site.

      I'm skeptical of the idea that there are more splicing errors in primates compared to other mammalian lineages. Also, the authors are postulating rates of adaptive change that are totally unreasonable. Finally, there's not a shred of evidence supporting the idea that phenotypic differences between vertebrate species are due to alternative splicing.

  9. "I'm skeptical of the idea that there are more splicing errors in primates compared to other mammalian lineages."

    According to the paper, mouse alternative splicing levels are about 20%, chimps are at 50%, and humans are at 95%.

    1. If most splice variants are due to splicing errors then the more you study a species the more errors you should detect. The number of different splice variants will depend on the number of different tissues you look at and the depth of the study.

      Unless all these variables are controlled, I remain skeptical of any data suggesting a big difference between splice variants in different mammals.

      If most of the splice variants are biologically significant then the problem is even worse. It means that for some unknown reason thousands of adaptive mechanisms of alternative splicing have arisen specifically in the human lineage in the past five million years.

      Meanwhile, those same adaptations did not arise in any other mammalian lineage for some unknown reason. That doesn't make any sense.
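The sampling-depth point in this reply can be sketched numerically. If each rare error variant shows up in any given read with a small probability p, the chance of seeing it at least once in N reads is 1 - (1 - p)^N, so the expected number of distinct variants detected keeps rising with depth until it saturates. The Python sketch below uses entirely hypothetical numbers (1000 possible error variants, each at a frequency of 1 in 100,000 reads) and assumes reads are independent. It illustrates the argument and is not an estimate for any real species.

```python
def expected_variants_detected(variant_frequencies, reads):
    """Expected number of distinct variants seen at least once in `reads` reads.

    variant_frequencies: per-read probability of observing each variant.
    Assumes independent reads, so P(seen at least once) = 1 - (1 - p) ** reads.
    """
    return sum(1 - (1 - p) ** reads for p in variant_frequencies)


# Hypothetical pool of rare splicing-error variants, identical in both "species".
error_variants = [1e-5] * 1000

shallow = expected_variants_detected(error_variants, reads=10_000)
deep = expected_variants_detected(error_variants, reads=1_000_000)

print(f"shallow study: about {shallow:.0f} of 1000 variants detected")
print(f"deep study:    about {deep:.0f} of 1000 variants detected")
```

With the same underlying error rate, the heavily studied species appears to have roughly ten times as many splice variants simply because it was sequenced more deeply. Unless depth and tissue coverage are matched, differences in variant counts between mammals cannot be read as differences in biology.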

    2. There is also the issue that changes of coding sequence, and also possibly of flanking regulatory sequences, would affect many of these variants. If the functions of these variants were different, that would imply a lot more selective constraint on these sequences. It would be hard to change one variant to achieve a greater adaptation without making the function of another variant worse.