Saturday, December 08, 2018

The persistent myth of alternative splicing

I'm convinced that widespread alternative splicing does not occur in humans or in any other species. It's true that the phenomenon exists but it's restricted to a small number of genes—probably fewer than 1000 genes in humans. Most of the unusual transcripts detected by modern technology are rare and unstable, which is consistent with the idea that they are due to splicing errors. Genome annotators have rejected almost all of those transcripts.

You can see links to my numerous posts on this topic at Alternative splicing and the gene concept, and at Are splice variants functional or noise?

The figure shows an idealized version of alternative splicing producing three different proteins with different combinations of exon coding regions.1 The idea is that cells can increase diversity by employing alternative splicing and this is often used as a rationale to explain how humans can get away with the same number of genes as other animals. According to this view, humans use alternative splicing to make several different proteins from a single gene. It's widely believed that >90% of human genes use alternative splicing to make several hundred thousand different proteins from only 20,000 protein-coding genes.

The other day I was reading a very interesting paper on bacterial group II introns where the authors demonstrated that group II intron RNAs in Lactococcus lactis could insert themselves into bacterial mRNAs by reverse splicing (LaRoche-Johnston et al., 2018). Recall that spliceosomal introns in eukaryotes probably evolved from group II introns. This paper fills in one of the crucial steps in that process, but that is not the main take-home lesson from the paper. You can tell what the authors think is important from the title: "Bacterial group II introns generate genetic diversity by circularization and trans-splicing from a population of intron-invaded mRNAs."

They have chosen to emphasize the idea that bacteria can generate diversity through alternative splicing just like eukaryotes. Here's how they view alternative splicing in eukaryotes.
A hallmark of eukaryotic cells is the ability of their numerous introns, sequences that interrupt genes, to generate genetic diversity through the expression of several different protein variants from a single gene.
The authors are repeating an idea—I think it's a myth—that alternative splicing is widespread in eukaryotes and its purpose is to generate diversity. Like most authors these days, they think this is a proven fact that does not need to be critically examined.

I'm interested in tracing the origin of this idea to see what evidence has been advanced to support it, so I was intrigued when the authors included a reference to a paper I hadn't seen before. It was a 2017 review by Bush et al. in Philosophical Transactions B with an interesting title: "Alternative splicing and the evolution of phenotypic novelty" (Bush et al., 2017). The authors begin by explaining the problem they are trying to solve ...
Soon after the publication of the human genome sequence—which revealed a lower than expected number of protein coding genes—alternative splicing was proposed as a candidate to explain the diversification in the number of cell types observed in some eukaryotic lineages (a higher number of cell types in a given species is assumed to reflect increased organism complexity). As gene duplication has long been associated with functional innovation, it was initially assumed that overall gene number should correlate with the number of cell types. Gene duplication rates, however, failed to reflect the diversification of cell types observed in several eukaryotic lineages. There is a significant correlation between the two but weaker than expected and only moderately better if the analysis is restricted to metazoans. Whole-genome duplication events at the base of the vertebrate lineage, which have the highest number of cell types among eukaryotes, have comparable numbers of genes to most invertebrates (that have not undergone whole genome duplication). The poor relationship between a species' cell-type diversity and its total gene number has become known as ‘the G-value paradox’.
That's a bad beginning because, in my opinion, there's no such thing as the "G-value paradox;" thus, the authors are attempting to solve a problem that doesn't exist [Deflated egos and the G-value paradox]. As you might have guessed, the "solution" is alternative splicing.
Although other genomic features have been shown to correlate with cell-type number and may be important contributors to the evolution of complexity, alternative splicing—as a mechanism allowing transcript diversification in the absence of increases in gene number—is a prime candidate to explain the G-value paradox. Comparative studies have reported marked differences in the prevalence of alternative splicing across eukaryotic lineages as well as a significant correlation between alternative splicing and the number of cell types per species. These results are in principle consistent with an adaptive role of alternative splicing in determining a genome's functional information capacity and facilitating transcript diversification in species with greater numbers of cell types.
Statements like that are no longer surprising because there seems to be an overwhelming consensus within the field that widespread alternative splicing explains the low number of genes in humans. I argue that there's no need to "explain" the low number of genes and, furthermore, alternative splicing is not common.

In most cases where there is a genuine scientific controversy, there will be facts and evidence to support both sides, and the argument is over how to interpret those facts. The claim that most unusual transcripts are just noise due to splicing errors rests on the following lines of evidence.
  • Splicing is associated with a known error rate that's consistent with the production of frequent spurious transcripts.
  • The unusual transcripts are usually present at less than one copy per cell.
  • The unusual transcripts are rapidly degraded and usually don't leave the nucleus.
  • The transcripts are not conserved.
  • The predicted protein products of these transcripts have never been detected.
  • The number of different unusual transcripts produced from each gene makes it extremely unlikely that they could all be biologically relevant.
  • The number of detectable transcripts correlates with the length of the gene and the number of introns, which is consistent with splicing errors.
  • Gene annotators who have looked closely at the data have determined that >90% of the unusual transcripts are spurious junk RNA or noise.
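The first and seventh points above are linked by simple arithmetic, and a minimal sketch may help. It assumes that each intron is spliced independently with the same small error probability. The function name and the error rate used here are hypothetical, chosen only for illustration and not taken from any of the papers discussed. Under that assumption, the fraction of transcripts carrying at least one splicing error rises with the number of introns, which is the correlation you would expect if most variants are noise.

```python
def fraction_with_error(n_introns: int, per_intron_error: float) -> float:
    """Probability that a transcript has at least one mis-spliced intron.

    Assumes every intron is spliced independently with the same error
    probability (a simplifying assumption, not a measured property).
    """
    return 1.0 - (1.0 - per_intron_error) ** n_introns


# Hypothetical per-intron error rate, for illustration only.
ASSUMED_ERROR_RATE = 0.001

if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        frac = fraction_with_error(n, ASSUMED_ERROR_RATE)
        print(f"{n:>3} introns: fraction with at least one error = {frac:.4f}")
```

Whatever error rate you plug in, the trend is the same: genes with more introns are expected to yield more aberrant transcripts without any need to invoke function.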
If this were a genuine scientific controversy, there should be an equal amount of evidence in support of the competing hypothesis. Let's see what evidence this review cites to support the following claim in the second paragraph of the paper.
Alternative splicing is common in many eukaryote lineages, including metazoans, fungi and plants, with deep transcriptome sequencing of the human genome showing over 95% of multi-exon genes produce at least one alternatively spliced isoform [10,11].
Let's check out references 10 and 11 to see if there's strong evidence of alternative splicing.

Reference 10 is to a 2008 Nature Genetics paper by my colleagues from the University of Toronto (Toronto, Ontario, Canada) (Pan et al., 2008). I'm very familiar with this paper. The authors used the new technique of mRNA-Seq to assay six different tissues for mRNAs containing a splice junction sequence. These represent events where canonical splicing did not occur. By combining their data with earlier data from other experiments they estimate that ~95% of all human genes produce unusual transcripts. They estimate that a typical multi-exon gene produces an average of seven different transcript variants.

Note that I'm careful to use the term "transcript variants" because without further evidence we have no way of knowing whether these transcripts are noise or real examples of alternative splicing. Unfortunately, my colleagues didn't make this distinction—they refer to all events as alternative splicing. They don't even mention the possibility that they could be looking at splicing artifacts.

So, the authors of the more recent paper (Bush et al., 2017) are correct to refer to the 2008 paper as support for their claim because that's what the 2008 paper by my colleagues actually said. However, my colleagues didn't present any evidence that their abundant transcripts were functional and that alternative splicing actually makes a significant contribution to genetic diversity. It looks like a false claim of abundant alternative splicing is being accepted without critical evaluation—perhaps because it was published in a good journal and everyone assumes it underwent rigorous peer review.

What about the second reference, reference 11? That's a 2008 Nature paper by Wang et al. Those authors are much more strident in their claims; for example, they begin their paper by saying,
The mRNA and protein isoforms produced by alternative processing of primary RNA transcripts may differ in structure, function, localization, or other properties. Alternative splicing in particular is known to affect more than half of all human genes, and has been proposed as the primary driver of the evolution of phenotypic complexity in mammals.
If you've been following my discussion you will know that it's simply not true that half of all human genes exhibit alternative splicing. What IS true is that multiple transcript variants can be detected in these genes but whether they are noise or not remains to be determined.

The Wang et al. paper sets out to extend the data using mRNA-Seq in the same way as the Pan et al. paper except they assayed more tissues and collected more transcript sequences. They determined that "alternative splicing is nearly universal." Even when they restrict their analysis to more abundant transcripts, they calculate that 92% of multi-exon genes undergo alternative splicing. The authors do not discuss splicing errors and they present no evidence that these transcripts represent biologically relevant alternative splicing.

So what we have is a couple of frequently cited papers from 2008 that made unsubstantiated claims about the abundance of alternative splicing. Those claims have been widely accepted in spite of the fact that there's plenty of evidence that they are wrong. Most scientists seem to be completely unaware of the fact that the data can be best explained by splicing errors so they publish papers assuming that abundant alternative splicing is a well-documented fact.

How did this happen? The points I'm making have all been published in the scientific literature so they should be known to anyone who researches the topic. If you are thinking about working in this field, I recommend a recent paper by Bhuiyan et al. (2018). Here's what they say in the abstract.
Although most genes in mammalian genomes have multiple isoforms, an ongoing debate is whether these isoforms are all functional as well as the extent to which they increase the functional repertoire of the genome. To ground this debate in data, it would be helpful to have a corpus of experimentally-verified cases of genes which have functionally distinct splice isoforms (FDSIs).
This is how a scientific paper should be written. You explain the hypothesis and outline a test to see if it is correct. Here's what they conclude after looking at the data,
Recent studies have challenged whether most genes can produce multiple functional splice isoforms and our results can offer something to both sides of the debate. We acknowledge that other researchers may have different definitions of a functional splice isoform, but we view the debate within our operational definition – a functional splice isoform is one that is necessary for the gene’s overall function.

One side of the debate claims that most genes have multiple functionally distinct isoforms. Viewing our findings optimistically, we provide what is to our knowledge the only substantial list of human and mouse genes for which this is actually documented to be true. The low number of genes with such evidence can be interpreted as a vast opportunity for experimentalists to identify the functions of the isoforms for > 80% of genes. The other side of the debate approaches alternative splicing with a less Panglossian view, with the null hypothesis being that most isoforms do not have a specific distinct function. Multiple studies taking a genomic or evolutionary perspective have concluded that it is unlikely that most genes have multiple functional splice isoforms. Viewed pessimistically, our data is consistent with this body of work. If the literature lacks supporting evidence for widespread FDSIs, the null hypothesis should be maintained and claims that every observed isoform has a function to be discovered should be viewed skeptically.

To our knowledge, this report represents the first effort to curate the literature in order to determine the genes where splicing increases the genome’s functional potential. Such individual reports have been generally ignored in the debate about the function of alternative splicing, which has instead focused on databases and high-throughput data sets. Our estimate that only 4% of human and 9% of mouse genes have evidence for functionally distinct isoforms serves both a sobering reminder of the limited evidence, and a motivation for increased experimental efforts to settle the debate.
Let's hope that more and more scientists wake up to the fact that there's limited evidence for widespread alternative splicing and that in the absence of evidence the null hypothesis is junk RNA.

I'm not holding my breath.

1. Real alternative splicing also occurs with noncoding RNA genes but I'll restrict my discussion to protein-coding genes.

Bhuiyan, S.A., Ly, S., Phan, M., Huntington, B., Hogan, E., Liu, C.C., Liu, J., and Pavlidis, P. (2018) Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics, 19:637. [doi: 10.1186/s12864-018-5013-2]

Pan, Q., Shai, O., Lee, L.J., Frey, B.J., and Blencowe, B.J. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics, 40:1413-1415. [doi: 10.1038/ng.259]

Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., and Burge, C.B. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature, 456:470-476. [doi: 10.1038/nature07509]


  1. In your previous post you suggest that run-of-the-mill housekeeping genes also have alternate splice forms. Did you look this up or are you just assuming? Because if it's true that housekeeping genes have on average the same number of splice forms as the general average, I think that would be another bit of compelling evidence that most splice forms aren't functional. Of course it would then be nice to take a few genes for which there are easy assays (hexokinase?) and show that the protein products of the alt splice forms are not functional.

    1. I looked at all the genes in the gluconeogenesis/glycolysis and citric acid cycle pathways. I also looked at the genes for the major subunits of RNA polymerase. They all have multiple splice variants and in all cases no protein variants have been detected.

      Here's the data for TPI1 - the triose phosphate isomerase gene.

      Splice variants of the human triose phosphate isomerase gene: is alternative splicing real?

    2. Here are some of the splice variants in the gene for the large subunit of RNA polymerase.

      Two Examples of "Alternative Splicing"

  2. A few months ago I had a quick (and dirty) look at whether the number of splice variants in an organism correlates with the effective population size of that species. There is a negative correlation, and to me the most plausible explanation is that most of these variants are junk and selection can't get rid of them because of the "drift barrier".

  3. Watch this terrible, terrible video on introns and alternative splicing by Nature Video, which seems to insinuate that it's probably all there for some sort of adaptive reason.

    1. I'm struggling to understand why Nature would produce such a video without input from knowledgeable experts. Part of the problem, evident in this video, is that the warnings of Gould and Lewontin about the dangers of adaptationism have largely been ignored by most scientists and science journalists. It is inconceivable to the authors of this video that introns could NOT have an adaptive function so they use a scattergun approach by mentioning all possible functions of introns, even those that are patently ridiculous.