Friday, June 23, 2017

Debating alternative splicing (part I)

I recently had a chance to talk science with my old friend and colleague Jack Greenblatt. He has recently teamed up with some of my other colleagues at the University of Toronto to publish a paper on alternative splicing in mouse cells. Over the years I have had numerous discussions with these colleagues since they are proponents of massive alternative splicing in mammals. I think most splice variants are due to splicing errors.

There's always a problem with terminology whenever we get involved in this debate. My position is that it's easy to detect splice variants but they should be called "splice variants" until it has been firmly established that the variants have a biological function. This is not a distinction that's acceptable to proponents of massive alternative splicing. They use the term "alternative splicing" to refer to any set of processing variants regardless of whether they are splicing errors or real examples of regulation. This sometimes makes it difficult to have a discussion.

In fact, most of my colleagues seem reluctant to admit that some splice variants could be due to meaningless errors in splicing. Thus, they can't be pinned down when I ask them what percentage of variants are genuine examples of alternative splicing and what percentage are splicing mistakes. I usually ask them to pick out a specific gene, show me all the splice variants that have been detected, and explain which ones are functional and which ones aren't. I have a standing challenge to do this with any one of three sets of genes [A Challenge to Fans of Alternative Splicing].
  1. Human genes for the enzymes of glycolysis
  2. Human genes for the subunits of RNA polymerase with an emphasis on the large conserved subunits
  3. Human genes for ribosomal proteins
I realize that proponents of massive alternative splicing are not under any obligation to respond to my challenge but it sure would help if they did.

Here's my position.
  1. Most splice variants are present at very low concentrations. They are most easily explained as splicing errors since we know the spliceosome is error-prone (Hsu and Hertel, 2009). As is so often the case in these discussions, the null hypothesis should be no function.
  2. The term "alternative splicing" should be reserved for those cases where a biological function of the splice variants has been demonstrated.
  3. Hyperbole about the prevalence and importance of presumed alternative splicing should be avoided unless you can back it up with solid evidence.
  4. All publications concerning presumed alternative splicing should mention that there's a controversy and discuss why they prefer to treat their data as biologically relevant. They should describe what steps they have taken to eliminate spurious splicing errors from their data.
I've discussed these issues many times on this blog ....
At the present time, the overwhelming opinion among workers in the field of alternative splicing is that most splice variants are biologically relevant. The "splicing error" view that I support is definitely the minority position. There aren't many papers in the scientific literature that challenge the current dogma. Here's a short list of the most important ones.
Hsu, S.-N., and Hertel, K.J. (2009) Spliceosomes walk the line: splicing errors and their impact on cellular function. RNA biology, 6:526-530. [doi: 10.4161/rna.6.5.986]

Melamud, E., and Moult, J. (2009) Stochastic noise in splicing machinery. Nucleic acids research, gkp471. [doi: 10.1093/nar/gkp471]

Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet, 6:e1001236. [doi: 10.1371/journal.pgen.1001236]

Tress, M.L., Abascal, F., and Valencia, A. (2016) Alternative Splicing May Not Be the Key to Proteome Complexity. Trends in Biochemical Sciences. [doi: 10.1016/j.tibs.2016.08.008]

Zhang, Z., Xin, D., Wang, P., Zhou, L., Hu, L., Kong, X., and Hurst, L. D. (2009) Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC biology, 7:23. [doi: 10.1186/1741-7007-7-23]
I especially like the Pickrell et al. paper because it presents the controversy in the correct scientific manner. The abstract is great ....
While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.
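To get a feel for what a per-intron error rate of about 0.7% means for a whole transcript, here is a rough back-of-the-envelope sketch. It assumes that errors at different introns are independent and that every intron has the same error rate; both are simplifying assumptions of mine, not claims from the paper. The function name and the intron counts are just for illustration.

```python
def fraction_with_error(per_intron_error_rate: float, n_introns: int) -> float:
    """Probability that a transcript contains at least one mis-spliced intron.

    Assumes every intron has the same error rate and that errors at
    different introns are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - per_intron_error_rate) ** n_introns


# Approximate per-intron error rate quoted in the Pickrell et al. (2010) abstract.
ERROR_RATE = 0.007

# Illustrative intron counts only; real genes vary widely.
for n in (1, 5, 10, 20, 50):
    share = fraction_with_error(ERROR_RATE, n)
    print(f"{n:>2} introns: {share:.1%} of transcripts carry at least one error")
```

Under these assumptions a gene with ten introns would have roughly 7% of its transcripts carrying at least one splicing error, which is more than enough to account for a population of low-abundance variants.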
Let me make my position very clear. I'm a strong supporter of the idea that most splice variants are due to splicing errors. I'm a strong supporter of the idea that the proper way to do science is to provide evidence for any claims of functionality. (Don't just assume it.) I strongly believe that if a field is controversial (and this one certainly is) then you have an ethical obligation to mention the controversy and point out why you disagree with your opponents. This criticism is directed mainly at proponents of massive alternative splicing; they must demonstrate that they understand the criticism of their position.

Here's the paper that Jack Greenblatt and his colleagues published in February.
Han, H., Braunschweig, U., Gonatopoulos-Pournatzis, T., Weatheritt, R. J., Hirsch, C.L., Ha, K.C., Radovani, E., Nabeel-Shah, S., Sterne-Weiler, T., Wang, J., O'Hanlon, D.O., Pan, Q., Ray, D., Zheng, H., Vizeacoumar, F., Datti, A., Magomedova, L., Cummins, C.L., Hughes, T.R., Greenblatt, J.F., Wrana, J.L., Moffat, J., and Blencowe, B.J. (2017) Multilayered control of alternative splicing regulatory networks by transcription factors. Molec. Cell, 65:539-553. e537. [doi: 10.1016/j.molcel.2017.01.011]

Networks of coordinated alternative splicing (AS) events play critical roles in development and disease. However, a comprehensive knowledge of the factors that regulate these networks is lacking. We describe a high-throughput system for systematically linking trans-acting factors to endogenous RNA regulatory events. Using this system, we identify hundreds of factors associated with diverse regulatory layers that positively or negatively control AS events linked to cell fate. Remarkably, more than one-third of the regulators are transcription factors. Further analyses of the zinc finger protein Zfp871 and BTB/POZ domain transcription factor Nacc1, which regulate neural and stem cell AS programs, respectively, reveal roles in controlling the expression of specific splicing regulators. Surprisingly, these proteins also appear to regulate target AS programs via binding RNA. Our results thus uncover a large “missing cache” of splicing regulators among annotated transcription factors, some of which dually regulate AS through direct and indirect mechanisms.
The idea is to identify all those factors that contribute to producing splice variants. It turns out there are many factors involved, including known transcription factors. The data say nothing about biological function, since the variants are just as likely to be splicing errors caused by spurious binding of factors as they are to be true alternative splicing. The authors are assuming that these factors are involved in "regulation" but they don't discuss the alternative explanation; namely, splice errors and artifacts.

Let's look at the introduction to see if it meets the suggested standards for papers on this subject.
Alternative splicing (AS) is the process by which different combinations of splice sites in precursor mRNA (pre-mRNA) are selected to generate structurally and functionally distinct mRNA and protein variants. It acts widely to expand the functional and regulatory capacity of metazoan genomes (Irimia and Blencowe, 2012; Licatalosi and Darnell, 2010; Nilsen and Graveley, 2010). For example, nearly all transcripts from human multi-exon genes are alternatively spliced, and a substantial fraction of these splice variants are differentially expressed in a cell- and tissue-specific manner (Pan et al., 2008; Wang et al., 2008). AS has critical roles in diverse biological processes, including cell-fate determination, and misregulation of AS is associated with numerous diseases (Daguenet et al., 2015; Jangi and Sharp, 2014; Kalsotra and Cooper, 2011). An important challenge is to understand how networks of AS events are coordinately regulated to impart their biological roles in different cellular contexts.
The good news is that they associate alternative splicing with functionally distinct mRNA and protein variants. The bad news is that they don't present any evidence that the variants they look at are truly functional.

The introduction contains the standard dogma about alternative splicing. They assume, without proof, that "nearly all" human protein-coding genes are alternatively spliced to produce functional isoforms. This is a true example of begging the question since the real question is not what causes alternative splicing but whether it actually exists. You shouldn't be investigating coordinate regulation until you can establish that there really is coordinate regulation in any meaningful sense. The authors ignore any controversy that challenges their claims even though they are certainly aware of it.

The authors make the standard implicit assumption that cell- and tissue-specific splicing is evidence of function. In fact, this assumption is wrong since the alternative explanation—incorrect splicing due to inappropriate binding of various factors—will also be tissue-specific.


30 comments:

  1. Hasn't anyone compared splice variants of glycolytic enzymes, pol2 etc between mouse and rat or human and chimp? I think any differences would clearly suggest lack of function.

  2. The typical protein-coding gene has about 10-20 splice variants listed in the various databases.

    There are two distinct possibilities.

    1. All splice variants have a biological function.
    2. Only a subset of those variants have a function.

    The first possibility is indefensible.

    If the second possibility is correct then alternative splicing papers need to tell us which variants they are examining and why they think they are functional.

    Replies
    1. But how do you verify function? Short of showing the protein product interacting with other components of the cell the only thing I can think of is inserting a construct using knock-in that preserves coding but removes splice junctions. But even if this had an effect one could always blame it on the construct's integration or expression etc.

    2. Nobody says it's going to be easy but that's no reason to just ASSUME that the variants are functional.

    3. I think it's possible to come up with some heuristics though, even in the absence of knockout experiments that directly test individual splice variants.

      1) Conservation - is the splice variant consistent across species? This is probably the strongest indirect evidence.

      2) Developmental regulation - does the ratio of splice variants change between cell types? This type of evidence is not as strong, but does at least show that it's not a simple "x% of all splicing goes wrong". This line is stronger if you can show that the "alternative" form is the majority form in one or more cell types.

      3) Predicted coding capacity. This is harder to interpret since there are known functional alternative variants that alter the reading frame and functional alternative variants that do not.


      But it's helpful also to consider carefully what is meant by "functional". Consider the case of a gene with one major splice isoform that is unambiguously functional, and a bunch of minor splice isoforms that do not code for translatable proteins and are degraded by nonsense-mediated decay.

      If the gene in question is dosage sensitive, one can argue that the functional role of the alternative splicing is to prevent overdose of the gene product, and/or to regulate tissue specificity by altering the splicing pattern in different cell types. It's essentially a post-transcriptional form of transcriptional regulation.

    4. The developmental "regulation" or tissue-specificity argument is weak. There is alternative splicing that is functional, and there are splicing factors that are expressed differentially in different conditions and cell types. Having differences in the splicing machinery in different tissues can perfectly well result in tissue-specific noise. Hence, finding that some exons are slightly more frequent in one tissue than in another proves nothing. If they were functional we should see some selection to preserve those alternative exons, but this is not the case for most of them.

      As suggested by Peter, there are many possibilities in which alternative splicing could be functional without coding for functional proteins. However, as stated in this blog entry, the "null hypothesis should be no function". Multiple strands of evidence agree on the null: proteomics, evolution, protein structure... Which does not imply that there is relevant alternative splicing!

    5. Sorry, I meant to say "Which does not imply that there is not relevant alternative splicing!" I mean, there IS

    6. Hmm. I fear we are wandering into territory where one man's noise is another man's regulation. Tissue-specific "noise" in splicing efficiency is indistinguishable from tissue-specific "regulation" of the level of the major transcript isoform.

      In the suggested scenario, there are two factors - promoter strength and splicing pattern - which cooperate to regulate the level of the functional transcripts. Both can vary in a tissue-specific manner, and a mutation perturbing one element of the equation could trigger adaptive change in the other.

      Say I introduce a splice site mutation into a cell line meaning that gene A now generates 30% useless transcripts - a hypomorphic allele. Given sufficient time and selective pressure, the cell line could doubtless adapt by evolving a stronger promoter. If I now mutate the splice site back again, suddenly gene A is overdosed.

      There will now be purifying selection to maintain both the changes in the promoter and the changes in the splice site.
      So, once the promoter has adapted and equilibrium is restored, does that make the splice site mutation "functional"? From one point of view, yes - it's now necessary in order to prevent the gene being overdosed. From another point of view, it's just a bit of random biochemical noise that the cell has learned to tolerate.

      Formally, it's an epistatic interaction between two linked variants - one at the splice site and another in the promoter.

      I have the same quandary when pondering miRNA regulation of transcription - to what extent is this regulation a necessary part of making a cell work well, and to what extent is it an accumulation of pointless complexity that the cell can't eliminate?

      If you were making a human from scratch, how many genes would you actually need?

  3. A worthwhile question might be how a splice variant influences the conservation of sequence similarity in the resulting protein. If a variant interrupts a domain or chops off a region that is otherwise conserved then I would be suspicious as to whether it means anything. Perhaps that's a facile view - seems like it should be front and centre to me...

    Replies
    1. There have been several attempts to answer this question. When protein structure people look at the predicted isoforms they aren't impressed. Most of the predictions make no sense.

      There have also been attempts to detect the isoforms using mass spec. Those haven't been successful suggesting that the isoforms don't exist.

      As you might imagine, proponents of alternative splicing have advanced arguments to discount these observations but they sound a lot like special pleading.

      Keep in mind that many genes encode proteins that are found in large complexes. For example, the genes for the large subunits of RNA polymerase exhibit dozens of RNA variants with dozens of predicted protein isoforms. Many of them are missing large internal stretches due to exon skipping. None of the people studying RNA polymerase, including Jack Greenblatt, have stumbled upon these predicted isoforms over the past 40 years. It makes no sense that they would play a role in transcription.

      The latest version of the RefSeq database has eliminated and ignored the vast majority of splice variants. The annotators have only retained one or two variants per gene - the ones most likely to encode a functional variant. Most of those are probably incorrect. It emphasizes the fact that most variants have already been classified as splicing errors.

    2. "As you might imagine, proponents of alternative splicing have advanced arguments to discount these observations but they sound a lot like special pleading."

      I like that. In the past, when it was found that most alternative exons were not conserved between mouse and human, proponents of AS argued that these AS events were species-specific innovations, giving an even more important role for AS. Now, with population variation data we can test whether alternatively-spliced exons are innovations in the human lineage. Summary=no signature of selection (except on those that are already conserved between species).

  4. Your internet colleague Federico Abascal wrote a very good paper that most alternative splices may not even be translated!

    Replies
    1. Sal, does that mean you don't think alternative splicing is a big thing? I'm happy to see that you don't go for every single bit of bio-woo.

    2. I don't think anyone really knows what alternative splicing really does. I was just trying to help Larry out with his own hypothesis of junk and see what he says and whether he agrees with Federico's paper.

      But we do know choosing the wrong splice site is implicated in disease. So even if the alternative splices are not translated, they may have regulatory roles.

      But I'm for fair and honest discussion of all available facts. I thought I'd give Larry a little help since he's been kind enough to give me a little help with my work. Quid pro quo.

      But it may all be moot if he's wrong in the end. I've insisted we don't really know very much at all relative to what can be known. It's a little early in the game.

      FWIW, the authors of the rival Biochem textbook, Lehninger, have a favorable view of Alu's role and Alternative splicing. They practically said what I've said about Alu's value in the genome. So, surely there is room for a diversity of opinions in the present vacuum of actual laboratory facts.

    3. Sal,

      Of course we know what alternative splicing really does. There are some excellent examples in the textbooks. We know how alternative splicing is regulated and we know the structures of the various factors and their binding sites. Most of this knowledge is 30 years old.

      It's because we know so much about real, biologically meaningful alternative splicing that we are skeptical of many of the claims being made today.

      It's not true to say that "choosing the wrong splicing site" causes genetic disease. It's far better to say that some diseases are caused by mutations that disrupt normal splicing. Some of these mutations affect normal splice sites but most of them occur in junk DNA (introns) where they create new splice sites.

    4. Some of these mutations affect normal splice sites but most of them occur in junk DNA (introns) where they create new splice sites.

      Thank you. That's a question I was wondering about. Causation of disease through modification of junk to create new splice sites makes sense.

      Is it still reasonable to conclude mutations in junk DNA are a fair amount less likely to cause problems for the organism than mutations in useful DNA?

      However, before going too far down the path that this is a reason junk exists, see T Ryan Gregory's blog: http://www.genomicron.evolverzone.com/2009/12/does-junk-dna-protect-against-mutation/

    5. Interesting blog post. (at Genomicron).

  5. Slight typo, called to your attention just because the term is in quotes:

    My position is that it's easy to detect splice variants but they should be called "spice variants" until it has been firmly established that the variants have a biological function.

    Replies
    1. Thanks. Fixed.

      I admire people who can see such things. I can't. I'm missing the good spelling alleles ... or maybe my spelling gene is spliced incorrectly. :-)

    2. I think it's my lack of familiarity with the subject as opposed to you and most readers that makes me go over individual words more carefully. :-)

  6. If Alternative Splicing were adaptive, shouldn't it exhibit genetic variation (perhaps with deviant individuals (genotypes) showing functional effects)?

    Replies
    1. Usually we would expect quite the opposite; strong purifying selection and very little variation. This is why conservation of the presence of splice variants across species is such strong evidence of function.

  7. This comment has been removed by the author.

  8. Is there documentation of the role of splicing as a route to enabling evolution of new functionality? Does alternative splicing relax selective constraints on conservation of functional gene sequences?

    Replies
    1. It seems to me that it would tighten selective constraints, not relax them. The point of alternative splicing, and here I limit the term to functional products, is to get two or more functions out of one gene, and both, or multiple, functions must have somewhat different selective regimes that constrain sequences in different ways.

    2. There is a nice example at least. A transposable element inserted within two pre-existing vertebrate genes in the ancestor of mammals and was co-opted for new functions. A new alternatively spliced isoform was created in such a way that the original isoform was preserved. AS here was key to the evolution of new functionality.
      Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals
      F Abascal, ML Tress, A Valencia
      Bioinformatics 31 (14), 2257-2261

    3. And there are all the reliable cases of AS, many of which are based on the alternative splicing of mutually exclusive homologous exons. But most of them are very ancient. Here AS contributed to functional innovation, but as these events are very ancient it suggests AS is not a main contributor to functional novelty

    4. There is a nice example at least.

      This is one of the 34 examples by those same authors listed in Tress, Abascal, and Valencia (2017). Those authors have been tireless opponents of massive alternative splicing over the past ten years. That hasn't stopped them from documenting and reporting the few genuine cases of alternative splicing.

  9. I only just skimmed through this recent paper, but doesn't it imply that a great deal of alternative splicing is stochastic in nature?

    Hu, J., Boritz, E., Wylie, W., & Douek, D. C. (2017). Stochastic principles governing alternative splicing of RNA. PLoS Computational Biology, 13(9), e1005761. https://doi.org/10.1371/journal.pcbi.1005761
