I am a strong supporter of the idea that most splice variants are due to splicing errors and that only a few percent of human genes undergo true alternative splicing.
This is a disagreement about the definition of "function." Is the mere existence of multiple splice variants evidence that they are biologically relevant (functional) or should we demand evidence of function—such as conservation—before accepting such a claim?
Background: what are splice variants?
Let me begin by defining some terms. Modern techniques are capable of detecting specific RNA molecules that may be present at less than one copy per cell. By scanning many different tissues, workers have compiled extensive lists of transcripts that are complementary to various parts of the genome. This gave rise to the idea of pervasive transcription, which was one of the reasons why ENCODE researchers claimed that most of our genome is functional.
Most knowledgeable scientists now agree that many of those transcripts are spurious products of accidental transcription. Many of them overlap known genes, and a primary transcript that covers a splice site will be processed by splicing. This gives rise to transcripts that are characterized as splice variants, and such transcripts are not so easily dismissed as mistakes by workers in the field of alternative splicing. That's because alternative splicing is a real phenomenon that has been well-studied in a few genes since the early 1980s.
I restrict the term "alternative splicing" to those situations where the alternate transcripts are known to be biologically relevant, or when we have a strong reason to suspect true alternative splicing. In situations where the transcript variants don't have the characteristics of true alternative splicing, and where there's no evidence of biological relevance, I will refer to those transcripts as "transcript variants" or "splice variants." This differs from standard usage in the field where all the splice variants are automatically assumed to be examples of true alternative splicing.1
It's hard to find an up-to-date database that lists all the variants for an individual gene but it seems from scanning older databases that there may be dozens of splice variants for most genes. One of the most widely quoted papers in the field is the Pan et al. (2008) paper from the Blencowe/Frey labs. This is the paper where they claim that 95% of human multiexon protein-coding genes are alternatively spliced and that there are, on average, "at least seven alternative splicing events" per gene.
I reject this terminology. I would say there are at least seven splice variants per gene and it remains to be seen whether they are examples of splicing errors or true alternative splicing. Nevertheless, in spite of the lack of supporting evidence—other than the mere existence of splice variants—this paper is widely quoted as evidence of pervasive alternative splicing.
An example of splice variants
The top figure below shows some of the splice variants for the human triose phosphate isomerase gene (TPI1) from the Ensembl: human database. I think these are only a small subset of the variants that have been reported for this gene but even in this small subset you can see predictions of eight different proteins plus two variants that don't encode proteins.
The bottom figure shows the same data for the mouse gene [Ensembl: mouse]. There are only three variants of the mouse TPI1 gene in the Ensembl database and only one of them is predicted to make a different protein—one that's missing the C-terminal half of the protein. Note that the patterns of transcript variants of the mouse and human genes are not the same. Production of these variants is not conserved in mammals.
Triose phosphate isomerase is an important metabolic enzyme found in all species, including bacteria. The enzyme catalyzes an important reaction in gluconeogenesis/glycolysis. The structure of the protein is well known and its function is well understood. It seems very unlikely that humans would make seven functional variants of this protein, especially since none of them are found in other mammals.
(Note: There seems to be an increasing reluctance to publish examples of transcript variants for specific genes. I can't recall when I've last seen any images like the ones I posted above. I wonder if this is because the proponents of alternative splicing are embarrassed to show representations of the data or whether they don't look at it themselves. I suspect the latter explanation. It seems as though workers in the field are increasingly relying on bioinformatic analysis of transcript variant databases without ever actually looking at specific genes to see if the databases make sense. It's time to re-issue my Challenge to Fans of Alternative Splicing.)
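For what it's worth, it's easy to look at a specific gene yourself. Here's a minimal sketch (my own illustration, not something from the papers discussed here) that pulls the transcripts currently annotated for TPI1 from the public Ensembl REST API and prints them by biotype; the endpoint is real, but treat the exact field names as assumptions since the REST schema can change between releases. Running the same query for human and mouse is a quick way to see how different the two sets of annotated variants are.

```python
# Minimal sketch: list the transcripts Ensembl currently annotates for a gene,
# using the public REST API. Assumes the 'requests' package is installed;
# the response fields ('Transcript', 'id', 'biotype', 'display_name') follow
# the current REST schema and may change between Ensembl releases.
import requests

SERVER = "https://rest.ensembl.org"

def list_transcripts(symbol, species="homo_sapiens"):
    """Return the transcript records annotated for a gene symbol."""
    url = f"{SERVER}/lookup/symbol/{species}/{symbol}"
    response = requests.get(url, params={"expand": 1},
                            headers={"Content-Type": "application/json"})
    response.raise_for_status()
    return response.json().get("Transcript", [])

if __name__ == "__main__":
    for species in ("homo_sapiens", "mus_musculus"):
        transcripts = list_transcripts("TPI1", species)
        print(f"{species}: {len(transcripts)} annotated transcripts")
        for t in transcripts:
            print(f"  {t.get('display_name', t['id'])}\t{t.get('biotype')}")
```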
The Deflated Ego Problem
The controversy over the frequency of alternative splicing is related to something I call The Deflated Ego Problem. The "problem" is based on the view that humans are extraordinarily complex compared to other species and that this complexity should be reflected in the number of genes. Many scientists were "shocked" to discover that humans don't have very many more genes than the nematode Caenorhabditis elegans and even fewer genes than some flowering plants.
In order to preserve their view of human exceptionalism, these shocked scientists have been forced to come up with an explanation for this "anomaly." I listed seven of these explanations in the Deflated Ego post but the one I want to draw your attention to is alternative splicing. The idea is that while humans may not have a lot more genes than nematodes, they make much better use of those genes by producing multiple proteins from each gene. Thus, the complexity of humans is explained by alternative splicing and not by an increase in the number of genes.
The lack of genes is often referred to as the G-value paradox (see Deflated egos and the G-value paradox). It's only a problem if you haven't been following the work of developmental biologists over the past forty years. They have established that complexity and species differences are usually explained by changes in how genes are regulated and not by the evolution of large numbers of new genes [Revisiting the deflated ego problem]. There is no "problem" and scientists should not have been shocked.2
Here's an explicit explanation of the imaginary problem as expressed by Gil Ast in a 2005 Scientific American article (Ast, 2005).
The Alternative Genome

The old axiom "one gene, one protein" no longer holds true. The more complex an organism, the more likely it became that way by extracting multiple protein meanings from individual genes.

When a first draft of the human sequence was published the following summer, some observers were therefore shocked by the sequencing team's calculation of 30,000 to 35,000 protein-coding genes. The low number seemed almost embarrassing. In the years since, the human genome map has been finished and the gene estimate has been revised downward still further, to fewer than 25,000. During the same period, however, geneticists have come to understand that our low count might actually be viewed as a mark of our sophistication because humans make such incredibly versatile use of so few genes.

Through a mechanism called alternative splicing, the information stored in the genes of complex organisms can be edited in a number of ways, making it possible for a single gene to specify two or more distinct proteins. As scientists compare the human genome to those of other organisms, they are realizing the extent to which alternative splicing accounts for much of the diversity among organisms with relatively similar gene sets ....

Indeed, the prevalence of alternative splicing appears to increase with an organism's complexity—as many as three quarters of all human genes are subject to alternative splicing. The mechanism itself probably contributed to the evolution of that complexity and could drive our further evolution.

This view has become standard dogma in the alternative splicing world so that almost every new paper begins with a reference to it as though it were established theory. It seems to be widely accepted that multiple versions of metabolic enzymes such as triose phosphate isomerase will explain human complexity.3
But it is not a fact that most genes exhibit some form of alternative splicing; it's merely speculation designed to assuage deflated egos. Furthermore, the explanation relies on the assumption that less complex animals must make fewer proteins from a similar set of genes. Recent experiments have shown that this assumption is false so the whole argument falls apart [Alternative splicing in the nematode C. elegans].
Explain these facts
Here's a modified list of things that need explaining if you think that alternative splicing is widespread in humans. The original list was posted more than a year ago [The persistent myth of alternative splicing].
- Splicing is associated with a known error rate that's consistent with the production of frequent spurious splice variants. Explain why this fact is ignored.
- The unusual transcript variants are usually present at less than one copy per cell. Explain how thousands of such rare transcripts could have a function.
- The unusual transcript variants are rapidly degraded and usually don't leave the nucleus. What is their function?
- The transcripts are not conserved, as expected if they are splicing errors. Give a rational evolutionary explanation for why we should ignore the lack of sequence conservation.
- In the vast majority of cases, the predicted protein products of these transcripts have never been detected. Explain that.
- The number of different unusual transcripts produced from each gene makes it extremely unlikely that they could all be biologically relevant. Explain how such strange transcripts, and even stranger protein variants, could have evolved.
- The number of detectable transcripts correlates with the length of the gene and the number of introns, which is consistent with splicing errors (see the toy calculation below this list). Explain how this is consistent with biologically relevant alternative splicing.
- Gene annotators who have looked closely at the data have determined that >90% of them are spurious junk RNA or noise and they have not been included in the standard reference database. Why do genome annotators dismiss most splice variants?
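To see why the error-rate and intron-count points belong together, here's a toy calculation. It is my own illustration, not data from any of the studies cited here, and the per-intron error rate is an assumed round number. Even a small, constant probability of mis-splicing per intron predicts that long, intron-rich genes will generate many rare, aberrant splice variants.

```python
# Toy model only: if each intron is mis-spliced independently with a small
# probability p, what fraction of a gene's primary transcripts carry at least
# one splicing error? The value of p is illustrative; published estimates of
# per-intron error rates vary.
def fraction_with_errors(n_introns, p=0.01):
    """Fraction of primary transcripts with at least one mis-spliced intron."""
    return 1 - (1 - p) ** n_introns

for n in (2, 7, 20, 50):
    print(f"{n:>2} introns: {fraction_with_errors(n):.1%} of transcripts "
          "carry at least one splicing error")
```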
The Ule and Blencowe paper of 2019
This brings me, finally, to the paper I want to discuss. It was published last October (2019).
Ule, J., and Blencowe, B.J. (2019) Alternative splicing regulatory networks: Functions, mechanisms, and evolution. Molecular Cell, 76:329-345. [doi: 10.1016/j.molcel.2019.09.017]

This review article begins with the statement that "Transcripts from nearly all human protein-coding genes undergo one or more forms of alternative splicing ...." This statement is misleading, at best. I could easily make the case that nearly all genes produce multiple transcript variants but most of them are due to splicing errors. The interesting question is how many of them might, instead, be due to biologically relevant alternative splicing. The burden of proof is on those who claim functionality and, in the absence of evidence of function, the default assumption is junk RNA.
Most of the review article deals with the variety of RNA-binding and DNA-binding proteins that give rise to splice variants. I don't find this very interesting since it's not clear whether these are spurious binding events that give rise to errors in splicing or whether they are biologically relevant.
The authors clearly believe that alternative splicing "... accounts for the vast range of biological complexity and phenotypic attributes across metazoan species." They conclude that, "... it is becoming clear that alternative splicing has been particularly important for enriching proteomic complexity in animals in ways that have provided an expanded toolkit for evolution."
It's important to note that the authors are aware of the fact that the pattern of production of splice variants is not conserved between species. In fact, they explicitly mention this point in support of their claim that "... alternative splice patterns have diverged rapidly among species." They believe that the lack of conservation can be explained away by postulating rapid selection such that the patterns of thousands of genes are different, even between closely related species. This is a common rationale (rapid selection for divergence) used to dismiss the lack of sequence conservation.
The other interpretation, of course, is that most of the splice variants are due to splicing errors and that's why they are not conserved (see Using conservation to determine whether splice variants are functional for an extended discussion of this issue).
The most interesting part of the review paper, in my opinion, is the section called "Function versus Noise or Evolutionary Fodder." This is the part of the paper that deals with the controversy and it's good to see it finally addressed since most papers on alternative splicing ignore it. Here's how Ule and Blencowe begin this section ....
As the number of alternative splicing events detected in large-scale sequencing studies continues to rise, it has been argued that only a minor fraction of splice variants are regulated or translated or are of functional importance (Tress et al., 2017).

The paper they reference (Tress et al., 2017a) was covered in an earlier post on this blog [Debating alternative splicing (part II)]. What Tress et al. did was to use mass spectrometry to look for the protein variants predicted by alternative splicing. The authors analyzed the results of eight large-scale experiments and reached the following conclusions ...
Alternative splicing is well documented at the transcript level, and microarray and RNA-seq experiments routinely detect evidence for many thousands of splice variants. However, large-scale proteomics experiments identify few alternative isoforms. The gap between the numbers of alternative variants detected in large-scale transcriptomics experiments and proteomics analyses is real and is difficult to explain away as a purely technical phenomenon. While alternative splicing clearly does contribute to the cellular proteome, the proteomics evidence indicates that it is not as widespread a phenomenon as suggested by transcript data. In particular, the popular view that alternative splicing can somehow compensate for the perceived lack of complexity in the human proteome is manifestly wrong. [my emphasis, LAM]

... The results from large-scale proteomics experiments are in line with evidence from cross-species conservation, human population variation studies, and investigations into the relative effect of gene expression and alternative splicing. Gene expression levels, not alternative splicing, seem to be the key to tissue specificity. While a small number of alternative isoforms are conserved across species, have strong tissue dependence, and are translated in detectable quantities, most have variable tissue specificities and appear to be evolving neutrally. This suggests that most annotated alternative variants are unlikely to have a functional cellular role as proteins. [my emphasis, LAM]

As you might have guessed, Ben Blencowe was unhappy with this result so he responded with a critical letter published in the same journal a few months later (Blencowe, 2017) [see Debating alternative splicing (Part IV)]. In that letter, he made the same points that he makes in the Ule and Blencowe review; namely that the mass spec experiments are flawed for technical reasons—they are not detecting protein variants that should be there. However, the authors do concede that, "... alternative splicing events lie on an evolving spectrum of regulation and functionality; therefore, it is very challenging to draw a line between those that are functional or non-functional."
Tress et al. responded to Blencowe's letter back in 2017 (Tress et al., 2017b). As experts in proteomics they were probably aware of all of the objections that Blencowe raised, and many more. After considering Blencowe's criticisms, they write, "We believe our conclusions are well substantiated and invite readers to judge for themselves in the article and related papers."
Resolving the controversy
I don't think it's possible to state conclusively that almost all human protein-coding genes produce protein variants by biologically relevant alternative splicing. Scientists who make such claims are wrong because there's nothing to support such a claim other than wishful thinking. On the other hand, it's not possible to conclude that most splice variants are noise, although I firmly believe that the evidence tilts in the direction of noise. The appropriate null hypothesis is that the transcripts do not have a function and the burden of proof is on those who make the claim for function.
The main problems I have with the alternative splicing literature are: (1) proponents of widespread alternative splicing are using questionable evolutionary arguments to rationalize their claim, and (2) they are mostly ignoring any objections to their claims and refusing to acknowledge that they could be mistaken.
It's interesting that Ule and Blencowe do not address any of the other criticisms of alternative splicing. They only respond to one paper. Here's a short list of other papers they might have considered.
Bhuiyan, S.A., Ly, S., Phan, M., Huntington, B., Hogan, E., Liu, C.C., Liu, J., and Pavlidis, P. (2018) Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics, 19:637. [doi: 10.1186/s12864-018-5013-2]
Bitton, D.A., Atkinson, S. R., Rallis, C., Smith, G.C., Ellis, D.A., Chen, Y.Y., Malecki, M., Codlin, S., Lemay, J.-F., and Cotobal, C. (2015) Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast. Genome Research. [doi: 10.1101/gr.185371.114]
Hsu, S.-N., and Hertel, K.J. (2009) Spliceosomes walk the line: splicing errors and their impact on cellular function. RNA biology, 6:526-530. [doi: 10.4161/rna.6.5.986]
Melamud, E., and Moult, J. (2009a) Stochastic noise in splicing machinery. Nucleic acids research, gkp471. [doi: 10.1093/nar/gkp471]
Melamud, E., and Moult, J. (2009b) Structural implication of splicing stochastics. Nucleic acids research, gkp444. [doi: 10.1093/nar/gkp444]
Mudge, J.M., and Harrow, J. (2016) The state of play in higher eukaryote gene annotation. Nature Reviews Genetics, 17:758-772. [doi: 10.1038/nrg.2016.119]
Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet, 6:e1001236. [doi: 10.1371/journal.pgen.1001236]
Saudemont, B., Popa, A., Parmley, J.L., Rocher, V., Blugeon, C., Necsulea, A., Meyer, E., and Duret, L. (2017) The fitness cost of mis-splicing is the main determinant of alternative splicing patterns. Genome biology, 18:208. [doi: 10.1186/s13059-017-1344-6]
Stepankiw, N., Raghavan, M., Fogarty, E.A., Grimson, A., and Pleiss, J.A. (2015) Widespread alternative and aberrant splicing revealed by lariat sequencing. Nucleic acids research, 43:8488-8501. [doi: 10.1093/nar/gkv763]
Tress, M. L., Martelli, P. L., Frankish, A., Reeves, G. A., Wesselink, J. J., Yeats, C., Ísólfur Ólason, P., Albrecht, M., Hegyi, H., Giorgetti, A. et al. (2007) The implications of alternative splicing in the ENCODE protein complement. Proceedings of the National Academy of Sciences, 104:5495-5500. [doi: 10.1073/pnas.0700800104]
Zhang, Z., Xin, D., Wang, P., Zhou, L., Hu, L., Kong, X., and Hurst, L. D. (2009) Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC biology, 7:23. [doi:10.1186/1741-7007-7-23]
Debating alternative splicing (part I)
Debating alternative splicing (part II)
Debating alternative splicing (Part III)
Debating alternative splicing (Part IV)
1. Some authors recognize this problem but they solve it by distinguishing between functional alternative splicing and spurious alternative splicing. I don't think this is helpful.
2. They should not have been shocked for other reasons, as well.
3. I'm well aware of the fact that other types of genes could be alternatively spliced; especially genes involved in regulating gene expression. However, the proponents of alternative splicing do not single out specific types of genes; instead they claim that 90% of all genes are alternatively spliced. This must include thousands of conserved genes required for normal metabolic events. I focus attention on those genes to illustrate the absurdity of the claim.
- Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome
- False History and the Number of Genes 2010
- False history and the number of genes: 2016
Ast, G. (2005) The alternative genome. Scientific American, 292:58-65. [doi: 10.1038/scientificamerican0405-58]
Tress, M.L., Abascal, F., and Valencia, A. (2017a) Alternative splicing may not be the key to proteome complexity. Trends in biochemical sciences, 42:98-110. [doi: 10.1016/j.tibs.2016.08.008]
Tress, M.L., Abascal, F., and Valencia, A. (2017b) Most Alternative Isoforms Are Not Functionally Important. Trends in biochemical sciences, 42:408-410. [doi: 10.1016/j.tibs.2017.04.002]
Reposted with edits:
DeleteThanks for an interesting blog post. It seems like this disagreement has hallmarks of an endless, unsolvable argument, similar to the neutralist-selectionist debate. Just like in that debate it’s not clear how it could ever be resolved (let’s say we determine beyond a shadow of doubt that 25% of all substitutions are adaptive — who was right then, Kimura or Gillespie?). Here, one could endlessly evaluate examples of genes where isoforms are or aren’t important.
You gave an example of a gene where you claim isoforms are not likely to be functional. In return, I could give examples of well documented cases where alternative isoforms are functionally important: RORgt (RORg isoform which specifies a sub-type of T-helper cells) or constitutive androstane receptor where alternative isoforms have been shown to have affinity for different ligands; there are many others. Where does it lead us? Even if just 10% of annotated isoforms are "really" functional, that would still make alternative splicing an important mechanism.
Additionally, you cannot rely on the fact that isoforms are not annotated in other species to infer lack of conservation. The quality of annotation in species other than human and mouse is much lower so it is not at all surprising that they wouldn’t be annotated in other species.
This is not an endless unsolvable argument. The most important point seems to be the one you are missing. In the absence of evidence for function we should not just assume that most splice variants represent real biologically relevant examples of alternative splicing.
That does not mean that there are no examples of alternative splicing. I've put these well-known examples in my textbooks beginning in 1987. Pointing out that there are real examples does not contribute to the discussion.
The reason why I continue to mention examples where it's extremely unlikely that most (any?) of the splice variants are functional is to counter the widely-held (and unsubstantiated) belief that all those splice variants mean something.
You can easily resolve the debate by showing that most of the splice variants are functional in a large number of genes. Until you do that, the only reasonable conclusion is that they are non-functional because that's what is consistent with the available data (e.g. lack of conservation) and that's how you form a null hypothesis in science.
BTW, the neutralist-selectionist debate is over. The neutralists won. About 90% of our genome is junk and it's evolving at the neutral rate.
DeleteIt seems a bit disingenuous to attack me for mentioning specific examples as not contributing to the debate after you wrote "There seems to be an increasing reluctance to publish examples of transcript variants for specific genes. (...) It seems as though workers in the field are increasingly relying on bioinformatic analysis of transcript variant databases without ever actually looking at specific genes to see if the databases make sense."
DeleteIf all you wanted to claim is "not *all* annotated alternative isoforms are functional" that would obviously be true. It would also not be very interesting, or in opposition to what anyone else is claiming. What you really seem to be implying is that 'hardly any' alternative isoforms are functional. This is where it becomes an ill-defined controversy. How many of the annotated isoforms have to be functional for one side to win? Is 50% the magic number and so if less than half of the isoforms are correctly annotated then we should automatically assume they're all unimportant? What if the annotation eventually improves and >50% is correct, does that suddenly make alternative splicing an important biological phenomenon?
Should we automatically assume all annotation is true? Of course not. Should we be skeptical of any individual annotated alternative isoform in absence of functional data? Yes. Should we seek independent validation that they're functional? Absolutely.
Nevertheless, none of this should lead us to conclude that the mechanism itself is unimportant. For comparison, consider the early attempts to annotate genes after human genome was first sequenced. Initially, there were far too many genes annotated, and many annotations were wrong. Following your logic, should we conclude that genes are not important for organismal complexity?
Since you mentioned the supposed lack of conservation again, I will reiterate: the reason why you think alternative isoforms are not conserved is because most species are understudied compared to human and mouse. Indeed, if alternative isoforms were all noise as you imply, you would expect the annotations to contain similar numbers of non-conserved isoforms in different species. In fact, human and mouse have many more annotated isoforms compared to other mammals but that's just due to study bias, nothing to do with conservation.
As for your second comment, I'm beginning to regret mentioning the neutralist-selectionist debate as I can see that it caused some confusion: it's not about what fraction of the genome is evolving neutrally -- if you spend any amount of time reading about it, you will see people argue mostly about what happens in protein-coding regions.
Do you agree with the widely reported claim that "nearly all human protein-coding genes undergo one or more forms of alternative splicing" as the opening sentence of the Ule & Blencowe paper says? If so, what is the evidence that you rely on to make such a claim? If not, how would you express your personal view of alternative splicing?
DeleteI think you are missing the point about annotation. Lots of scientists make claims about the human genome. Some of those claims are confirmed when expert annotators examine the evidence closely and some claims are rejected. In the case of the number of genes, you point out that the initial estimates of more than 30,000 protein-coding genes were incorrect and annotators rightly rejected about 10,000 candidates.
The point is NOT that there are still genes in the human genome; the point is that expert annotators found lots of mistakes. Those same annotators have looked closely at the reported splice variants for each gene and concluded that most of them are probably splicing errors and not true examples of alternative splicing.
I think that's significant, do you? Or do you think that all those splice variants of the TPI gene that were rejected are biologically significant and the annotators made a serious mistake in rejecting them?
You don't seem to be familiar with the data on the conservation of splice variants. The Blencowe lab (among others) has looked at production of splice variants in many species and concluded that the patterns are not conserved. For example, they discovered that "approximately half of alternatively spliced exons among species separated by ~6 million years are different" (see ref below). They're talking about RNA-Seq data from nine different organs in humans and chimpanzees.
Even the strongest proponents of alternative splicing concede that the production of variants in most genes is not conserved to any great extent. On the other hand, the production of splice variants in the well-studied examples of alternative splicing IS conserved across species separated by tens of millions of years.
So we are left with an interesting observation. When function has been sufficiently demonstrated, we see conservation. When there's no evidence of function, we see no conservation.
This observation is significant to skeptics but not to true believers. The AS proponents are quick to come up with any number of excuses to explain away the lack of conservation. You repeated one of them; namely that alternative splicing is probably conserved but we just don't have enough data to see it ("study bias").
Fine, let’s look at annotations and data on isoform conservation. I examined the TPI1 gene you used to demonstrate lack of conservation between alternative isoforms. I have no idea if you chose it deliberately or at random but this example doesn’t support your claims.
You wrote that there are 8 human isoforms and 3 mouse isoforms with a complete lack of conservation between them. Case closed? Not quite.
First of all, human and mouse TPI1 genes are encoded on opposite strands. Your figure doesn’t take that into account, making the isoform structures look more different than they are in reality.
If you look carefully at transcript annotations, three of the human isoforms have “3’ CDS incomplete” and among the remaining five, two isoform pairs have identical amino-acid sequences: there are actually three unique, complete protein products for the human gene: 286aa, 167aa and 249aa long. In mouse, there are two unique isoforms: 299aa and 167 aa.
I aligned them and the human ‘long’ (286aa) isoform aligns very well against the mouse ‘long’ (299aa) isoform. The same is true of the short (167aa) isoforms. In other words, these two isoform pairs are in fact conserved.
Now, what about the remaining, apparently non-conserved human isoform? Here is the curious thing, it turns out that this is the canonical isoform: according to annotations (MANE and APPRIS) and indeed the one for which a human structure has been solved recently (PDB: 6nlh). Hm.
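(For anyone who wants to repeat this kind of check, here is a minimal sketch of the pairwise comparison using Biopython; the FASTA file names are placeholders for whichever isoform protein sequences you export from Ensembl or UniProt.)

```python
# Sketch of the isoform comparison described above (placeholder file names;
# export the two protein sequences you want to compare before running).
from Bio import SeqIO, Align
from Bio.Align import substitution_matrices

# Global protein alignment with BLOSUM62 and affine gap penalties.
aligner = Align.PairwiseAligner()
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5

human = SeqIO.read("human_TPI1_isoform.fasta", "fasta")
mouse = SeqIO.read("mouse_TPI1_isoform.fasta", "fasta")

best = aligner.align(human.seq, mouse.seq)[0]
print(f"lengths: {len(human.seq)} aa vs {len(mouse.seq)} aa")
print(f"alignment score: {best.score:.1f}")
print(best)  # a high score with few gaps is what "aligns very well" means here
```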
BTW, the neutralist-selectionist debate is over. The neutralists won. About 90% of our genome is junk and it's evolving at the neutral rate.
Really? The original argument was about genetic variation in coding sequences, and about substitutions in coding sequences. The amount of junk DNA outside of coding sequences would seem to be irrelevant to that.
To be precise, the original argument was about starch gel isozymes, was it not?
@John: Yes, but the restriction to mobility of bands on starch gels is hardly relevant now. The corresponding question would be nonsynonymous changes in coding sequences.
Delete@Joe and @Greg
I see your point. There appear to be many scientists who restrict the "neutralist-selectionist" debate to a discussion over the effect of mutations in amino acid codons. I guess the selectionists concede that the vast majority of alleles outside of coding regions are (nearly-)neutral but they still want to argue that most changes within coding regions will affect fitness.
@Larry: They might also add control regions outside of the exons.
Delete@Joe
I'm not sure I understand your comment about control regions. In your opinion is the neutralist-selectionist debate mostly about whether changes in codons are neutral or not or is it about what parts of the entire genome are under selection or not? If it's the latter then ALL functional regions are important not just coding DNA and regulatory sequences. This includes coding regions (~1%), and regulatory sequences (<0.2%) but also origins of replication (~0.3%), centromeres (~1%), SARs (~0.3%), telomeres (~0.1%), noncoding genes (~0.6%), and functional regions of introns. In addition, there seems to be about 4% more of the genome that's conserved but where the function isn't clear.
If all of this is included in the neutralist-selectionist debate then it's clearly more than just a debate about codons, right?
Do you agree with Greg when he said the following?
As for your second comment, I'm beginning to regret mentioning the neutralist-selectionist debate as I can see that it caused some confusion: it's not about what fraction of the genome is evolving neutrally -- if you spend any amount of time reading about it, you will see people argue mostly about what happens in protein-coding regions.
Hi,
DeleteJust to clear up the information on TPI1. Greg is right that there are only 3 sequence distinct isoforms annotated at the moment in Ensembl/GENCODE. The main isoform is clearly the one with 248 amino acids (it is conserved in yeast). The other two transcripts differ in their ATG. The upstream ATG may be functional. We find a peptide and it is conserved in mammals. Note that it is NOT the same as the 299 amino acid transcript in mouse, mouse is not (yet) annotated with this isoform. The 167 amino acid isoform is annotated in mouse, as Greg said, but curiously it doesn't seem to be annotated anywhere else (yet). It is unlikely that this isoform is functional, apart from the lack of conservation, it would break the structure and remove functional residues.
@Michael Tress
There seems to be some confusion (probably my fault) about the meaning of "annotation." I showed the splice variants that Ensembl puts on their entry for TPI1.
TPI1
The data has ten different splice variants and eight of them show a protein-coding region. That's why I said there were eight variants with "predicted" protein variants. I'm aware of the fact that GenBank and other versions of Ensembl only show three actual isoforms of the protein but this only reinforces my point that annotators have rejected a lot of the splice variants. I was actually thinking of the many more splice variants reported in other databases that don't even make it to Ensembl.
I agree that the 248 (249 if you count the methionine) aa is the correct version. Do I understand you correctly that you have found a peptide corresponding to the upstream 37 aa's predicted for isoform 2 and that you also find it in other mammals? I have searched the literature for any report of a TPI enzyme with an extra 37 amino acids at the N-terminus and failed to find anything. The amino acid sequence of this region doesn't look very "normal" to me. Is the peptide abundant? Is it found in a variety of tissues? Do you know of anyone who has actually detected a larger version of TPI?
You are quite right, the main isoform is 249 residues, not 248. My bad. The 286 residue version (let's keep calling it "isoform 2" even though UniProt have it as the main isoform, P60174) is a curious case. There are several peptides to support the 37 extra residues. PeptideAtlas is full of them and some of them have good peptide-spectrum matches (PSM). So I am convinced that it is translated (at least occasionally). There is much less evidence for this region, though. While there are peptides with more than 50,000 detected PSM for the main protein, the peptide with most observations corresponding to isoform 2 has only 131 PSM. Also I just checked and there is no peptide evidence for the equivalent of isoform 2 in mouse. But that may only be because they are fewer mouse proteomics experiments. Where is it detected? Testis, ovary and CD8 cells appear, but mostly testis.
DeleteThe upstream 37 residues aren't human specific. They are annotated throughout eutheria. You can see them here in this alignment (https://www.uniprot.org/align/A2020041463E7E78CFC6242B71761763234FF46DC0198B7D - note, this alignment will only be available for a week). However, the first 37 residues are clearly less conserved than the main portion of TPI1.
At the transcript level the evidence for the upstream exon extension that generates isoform 2 is marginal. There is also no evidence of tissue-specific splicing. This isn't surprising since we will shortly show that tissue specificity in AS is highly correlated with cross-species conservation, and isoform 2 can't be detected beyond mammals.
It's hard to imagine a function for these extra 37 residues. It is highly unlikely to fold along with the TPI1 structure for example. However, it may be noisy translation. We find incontrovertible peptide evidence for the incorporation of more than 30 Alu exons (transposable elements introduced in the primate lineage, obviously) into principal and alternative isoforms. However, we didn't find any evidence at all to suggest that these Alu exons were functional. This might be similar. It is possible that the presence of a start codon so close to the start codon of the principal isoform allows it to be translated sufficiently to be detected at the protein level occasionally.
I would add that we've been using proteomic evidence as a much better proxy of function than transcription evidence. But I am sure the protein level tolerates some degree of noise too, especially in unstructured regions. I mean, some isoforms seen at the protein level may not be functional - although proteomics and evolutionary conservation mostly agree.
DeleteIf you are going to make the case that widespread alternative splicing generates protein diversity then it seems pretty obvious that detecting the presence of the alternative proteins is important. Ben isn't very worried about the lack of evidence for these proteins because he thinks you have just failed to detect them for some reason. I would like to point out that we're talking about tens of thousands of predicted proteins that seem to be missing. What do you think of Ben's argument?
(He's also not very worried about the fact that the presumed alternatively spliced transcripts are also missing, in the sense that very few of them are present in concentrations sufficient to be functional.)
There are always excuses, like if "everything is functional" was the null hypothesis. When alternative splicing was found to not be conserved between species (e.g. between mouse and human), rather than thinking it may not be functional, proponents of massive AS interpreted that AS was key in species diversification/innovation... it made us human. Using population genetic variation in human we showed that wasn't the case, most alternative exons are evolving neutrally, they are not human-specific innovations. This doesn't seem to matter. The same applies to all other strands of evidence; there are always excuses to ignore them. Like with proteomics. There are of course limitations in detection sensitivity. Proteomics is much more limited than transcriptomics. We (Michael) have done a lot of work to show that detection sensitivity limitations do not explain the paucity of alternative protein isoforms. But that doesn't seem to matter either.
DeleteJust wanted to add something: that we try to fight this battle does not mean we don't think alternative splicing is a wonderful, amazing, real phenomenon. We have contributed a few papers on real AS cases: homologous exons are very interesting and highly conserved, certain transposable elements have been co-opted through AS, etc.
DeleteTo add to what Fede wrote … there’s a lot I could write on the issue of MS proteomics evidence and identification of alternative splice isoforms (or the non-coding feature du jour where researchers are just as naive about its use). I will try to be brief.
DeleteWe find very few peptides for alternative isoforms. Ben’s explanation (and that of many others) for this is that there must be technical limitations on the coverage of MS. Without getting into details, it is true that there are technical limitations, but even after taking these into account the number of AS variants in standard MS experiments is orders of magnitude below what would be expected. There may be many biological reasons for this, of course. The fact that we don’t find many alternative peptides in MS experiments does not mean that the transcripts are not translated in some form/quantity.
At the same time many research groups have found considerable peptide evidence for alternative isoforms (and non-coding regions). Unfortunately, it is possible to find evidence for any coding feature if you aren’t sufficiently careful. A recent paper, for example, detected peptide evidence for more than 1,000 mouse alternative splice isoforms. However, the same data showed that they also “identified” peptides that mapped to 597 olfactory receptors (without investigating nasal tissues), which does rather suggest that they might have done something wrong. Sadly getting MS proteomics evidence massively wrong often pays. The less care one takes with MS proteomics experiments and the more exciting the claims, the easier it is to publish in important journals (cough https://www.sciencedirect.com/science/article/pii/S0092867419305082 /cough).
A while ago I had a conversation about this topic with a colleague at work. Like me, he had no real dog in this fight other than an interest in keeping abreast in biology and in understanding things accurately. But when I explained this viewpoint that the evidence for pervasive alternative splicing does not in fact exist he looked at me like I was crazy. It was surprisingly difficult to get him to see that the presence of some functional alternative splicing does not mean that all splicing variants are functional.
*Sigh*
It's strange that this logical fallacy is so common among scientists who should know better. I agree with others that it's related to pan-adaptationism - the view that just about everything is an adaptation. When looking at the abundance of splice variants, adaptationists just assume that they are functional because they are unaware of any other possibility. When they see proof that one or two genes have genuine alternative splicing this becomes confirmation that their original assumption was correct (confirmation bias).
Larry, can you help me with this?
DeleteYou said "About 90% of our genome is junk and it's evolving at the neutral rate."
I was looking for some papers and I found this:
"In an ambitious undertaking, Pouyet et al. – who are based at the University of Bern, the Swiss Institute of Bioinformatics and the University of Zurich – discovered how much of the human genome can really be used for this style of demographic analysis. Their results showed that only 5% of the genome is truly evolving neutrally, with the remaining 95% being affected by some kind of natural selection"
Harris, 2018 (https://elifesciences.org/articles/41491)
Does this affect your statement that 90% of our genome is junk and is evolving at a neutral rate?
I will really appreciate if you can answer me. Btw, I'm just trying to understand, not implying that I think you are wrong.
A lot of typos, sorry.
The neutral theory incorporates not only neutral substitution but purifying selection as well, so Harris et al. need to show that much of their 95% which is selected is not just purifying selection.
Thank you, Joe. But what I am struggling to understand is why non-functional elements of our genome are under purifying selection? Shouldn't they be "invisible" to selection?
DeleteDoes it has something to do with the transition from junk to garbage DNA that Dan Grau talks about?
I should say what the paper Harris refers to claims. Here is the abstract.
DeleteDisentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
Pouyet et al., 2018.
https://doi.org/10.7554/eLife.36317.001
@João
DeleteJoe is in a better position to answer your questions but the key sentence in the "digest" is, "This suggests that while most of our genetic material is formed of non-functional sequences, the vast majority of it evolves indirectly under some type of selection."
If I understand the paper correctly, what they are showing is that lots of junk DNA is linked to sites that are under selection so they are sometimes dragged along with those sites. They also looked at biased gene conversion following recombination. This favors substitution of G or C at mismatches like A-C or G-T. GC-biased gene conversion can lead to weak selection for GC base pairs in junk DNA.
Thank you, Larry. Think I got it. Do you think it is possible that background selection could lead to some sequence conservation of DNA?
*junk DNA
DeleteI think that "purifying selection on linked sites" does not, in the long run, cause sequence to be conserved. Thus even if it is widespread in the genome, it cannot explain conservation of a sequence which does not itself have function being conserved, but is just near a functional sequence.
But even nonfunctional DNA will still be under some level of purifying selection not to cause problems for other cellular processes. So if mutations in nonfunctional DNA cause some disease state that negatively affects reproductive fitness, they can be selected against.
DeleteAnd too high levels of expression of nonfunctional sites, even if it results in relatively benign or inactive RNA or proteins, still carries some metabolic cost. If this metabolic cost gets high enough it can affect reproductive fitness. So ultimately, mutations that cause upregulation of nonfunctional DNA to those levels of expression will be selected against too.
Those are two examples of purifying selection that you'd expect to operate on nonfunctional junk-DNA.
Let me clarify matters after some statements of mine that might be unclear. Kimura and co.'s Neutral Theory allowed for both neutral sites and those experiencing purifying selection. The bottom line was that most of the polymorphism was neutral, and most of the nucleotide substitution was neutral. I once asked Kimura whether, if 50% of the polymorphism was proven neutral, he would feel vindicated. He said yes. Interestingly enough, I asked the same question of Bryan Clarke, a leading panselectionist. He also said yes. (Further in my next comment)
Kimura was concentrating on coding sequences and sequences of known function. He did not concede that it was enough for the noncoding DNA to mostly be Junk DNA -- he argued that the substitution and polymorphism in the coding sequences was mostly neutral too.
DeleteI have, since this thread started, contacted Brian Charlesworth, who has worked in this area a lot, and my colleague Kelley Harris, who wrote the Harris commentary referred to in the comments. Both agreed that in saying that background selection made changes in much of the Junk DNA "nonneutral", they were referring to a change of substitution rates. But they agreed that substitutions would then look neutral in that they would not favor some base substitutions over others, and the sequences at those sites would not be conserved.
One last correction: with background selection there can also be no change in substitution rates, but there would be a change in the amount of polymorphism.
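(A back-of-the-envelope way to see this point, with made-up numbers: background selection lowers the effective population size at linked neutral sites, which reduces polymorphism, but the long-run neutral substitution rate stays equal to the mutation rate, so those sites still do not end up conserved.)

```python
# Toy numbers only: neutral expectations for diversity vs. substitution rate.
# pi ~ 4*Ne*mu depends on Ne, so background selection (lower Ne at linked
# sites) reduces polymorphism; the neutral substitution rate k = mu does not
# depend on Ne, so the sequence is not conserved in the long run.
mu = 1e-8  # assumed per-site, per-generation mutation rate

for Ne in (20_000, 10_000):  # e.g. without and with linked purifying selection
    pi = 4 * Ne * mu         # expected heterozygosity at a neutral site
    k = mu                   # expected long-run substitution rate
    print(f"Ne = {Ne:>6}: expected pi = {pi:.1e}, substitution rate = {k:.1e}")
```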