Tuesday, October 09, 2018

Alternative splicing and the gene concept

I just learned about a workshop scheduled for the end of this month. The topic is: Evolutionary Roles of Transposable Elements and Non-coding DNA: The Science and the Philosophy.

I'd love to attend but it's just a small workshop designed to encourage dialogue between scientists and philosophers who are interested in the topic. Here's a list of the speakers ...
  • Ryan Gregory: Junk DNA, genome size, and the onion test.
  • Stefan Linquist: Four decades debating junk DNA and the Phenotype Paradigm is (somehow) alive and well.
  • Chris Ponting: 92.9% of the human genome evolved neutrally.
  • Paul Griffiths: Both adaptation and adaptivity are relevant to diagnosing function.
  • Ford Doolittle: Selfish genes and selfish DNA: is there a difference?
  • Justin Garson: Biological functions, the liberality problem, and transposable elements.
  • Joyce Havstad: Evolutionary Thinking about Critique of Function Talk.
  • Guillaume Bourque: Impact of transposable elements on human gene regulatory networks.
  • Ulrich Stegmann: On parity, genetic causation and coding.
  • Stephen Downes: Understanding non-coding variants as disease risk alleles.
  • Alexander Palazzo: How nuclear retention and cytoplasmic export of RNAs reduces the deleteriousness of junk DNA.
  • David Haig: Pax somatica
  • Cedric Feschotte: Transposable elements as catalysts of genome evolution.
There's a reading list for the workshop and several of the papers are new to me [Recommended Reading]. I was particularly interested in one of the papers by Stephen M Downes, a philosopher at the University of Utah and one of the participants in the upcoming workshop.
Downes, S.M. (2004) Alternative splicing, the gene concept, and evolution. History and philosophy of the life sciences:91-104. [PDF]
The paper discusses two of my favorite topics: alternative splicing and "what is a gene?" Another philosopher who's interested in defining the biological gene is Paul Griffiths and he will also be at the meeting. I remember talking to Paul and Karola Stotz at the junk DNA meeting in London a few years ago where I tried to explain that alternative splicing may not be real. They were not convinced.

Paul and Karola have written a book about genes where they claim that recent discoveries in genomics, including abundant alternative splicing, have overthrown the standard definition of a molecular gene. Their view on the importance of alternative splicing is not substantially different from that expressed by Stephen Downes in his 2004 paper so I'll concentrate on that paper.

Downes claims that the human proteome is enormously more complex than the number of genes would suggest. He is repeating a claim that, even today, is popular in the scientific literature. That doesn't make it true: in fact, there is no scientific evidence to support such a claim and plenty to refute it [The proteome complexity myth] [How many proteins in the human proteome?]. Downes goes on to offer an explanation for this imagined disparity between the number of genes and the number of proteins: the explanation is alternative splicing.

Griffiths and Stotz make the same argument on page 69 of their book ...
Another discovery of the postgenomic era has been the discrepancy between the number of genes in a genome and the number of products derived from them. For example, the human proteome outnumbers the number of discrete protein-coding genes by at least one order of magnitude. The human genome contains in the region of 20-25,000 genes (the correct number is still not known), while predictions have given numbers as high as 1 million proteins (Mueller et al., 2007). As we will show at length in 4.4 and 4.5, this discrepancy is explained by the fact that cellular mechanisms use the same coding region to make many different products and combine resources from different coding regions to make products.
I don't believe that there's a serious discrepancy that needs explaining. The reference quoted by Griffiths and Stotz does, indeed, make the claim that there may be up to one million different proteins in human cells but it's important to understand where this estimate comes from. Here's what Mueller et al. say in their review,
The relatively low number of human genes suggests that complexity of human biology is achieved through regulation on the transcriptional, post-transcriptional and post-translational level. Alternative splicing and translation as well as post-translational modification (e.g.: phosphorylation, glycosylation and proteolytic cleavage) both contribute to a “proteomic stratification” process that produces a protein population with a diversity that is several orders of magnitude higher than that of the number of genes encoding them. Correspondingly, it has been estimated that the human proteome comprises up to 1,000,000 protein species.
It looks like the estimate of one million different proteins is partly based on the assumption that alternative splicing is a real phenomenon, in which case using the estimate to support the idea of alternative splicing seems like circular reasoning. But we don't need to quibble about "estimates" because there's real data to consider (see below).

Setting aside alternative splicing, there's still a major flaw in the argument that an enormous proteome requires rethinking fundamental concepts. Most of the Mueller et al. article is devoted to post-translational modifications that have been understood for decades. If every one of the 20,000 gene products has 50 such variants, then there would be one million different protein species in the proteome but, if true, this is not a "discrepancy" and it would not require any extraordinary explanation like alternative splicing. In other words, there's no mystery that needs explaining.
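
For anyone who wants to check the arithmetic behind that figure, here is a minimal sketch. It assumes only the round numbers quoted above (about 20,000 genes and an upper estimate of one million protein species); the function name is made up for illustration and nothing here is measured data.

```python
# Back-of-the-envelope check of the proteome arithmetic.
# Both inputs are the round numbers quoted in the post, not measurements.
GENES = 20_000                # approximate number of protein-coding genes
CLAIMED_PROTEINS = 1_000_000  # upper estimate cited from Mueller et al. (2007)


def variants_per_gene(total_species: int, genes: int) -> float:
    """Average number of protein species each gene product must supply."""
    return total_species / genes


required = variants_per_gene(CLAIMED_PROTEINS, GENES)
print(f"{required:.0f} variants per gene product")
```

The point of the division is simply that the one-million estimate only works if the average gene product exists in about 50 distinct forms.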

However, even the idea that the average polypeptide gene product gives rise to 50 different post-translational functional variants is ridiculous. For example, it would mean that each of the enzymes of the glycolytic pathway and the citric acid cycle has, on average, 50 different variants. These enzymes have been studied for half a century and there's no evidence to support such a claim. There's no evidence that every one of the subunits of RNA polymerases has 50 different variants nor is there any evidence that the subunits of the mitochondrial electron transport complexes exist in 50 different biologically relevant variations.

So, we can dismiss one of the major rationalizations for abundant alternative splicing but that doesn't mean that alternative splicing has been disproved. For that we have to look at the direct evidence. The evidence for abundant transcript variants for each multi-exon gene is solid. The important question is whether these variants are just the result of sloppy splicing, in which case they are junk RNA, or whether they are biologically relevant RNAs with a function, in which case they are genuine examples of alternative splicing.

Several groups have used sophisticated techniques to look for the alternative splice variants and they haven't found them [How many proteins in the human proteome?]. For those who are interested in seeing the actual experimental evidence, I recommend a paper by Bhuiyan et al. (2018). They say,
In this paper we take steps to address the gap between the commonplace assumption that most genes have more than one distinct functional product and evidence-based reality.
The "evidence-based reality" is that only ~5% of curated genes produce functionally diverse isoforms. In other words, massive alternative splicing is not supported by the available evidence. Most transcript variants are junk RNA produced by splicing errors.

The gene annotators have already decided that the vast majority of transcript variants are due to splicing errors. They have been purged from the databases. A typical gene in the genome database now has only two or three potential variants and most of those have not been shown to have a function. It's quite reasonable to hypothesize that only 5% of human protein-coding genes are involved in alternative splicing to produce two or more functional protein variants.

I've covered this debate in a series of posts from last year so I won't repeat the arguments here [Are splice variants functional or noise?].1

I believe I'm correct when I say that genuine alternative splicing is not a widespread phenomenon. I'm absolutely certain I'm correct when I say that there's no evidence supporting the claim that almost all genes are alternatively spliced and that the average gene produces ten or more different functional variants.

But that's not the main point I'm trying to make. My main argument with philosophers who write about the gene concept is that they are uncritically accepting outlandish claims without considering alternative explanations. It may be true that every gene produces multiple splice variants with multiple promoters and transcription termination sites, in which case we may or may not need to revise our definition of a gene. However, it may also be true that those variants just represent sloppy biology and they have no biological function, in which case we don't need to upend our understanding of the molecular gene.

It's wrong for philosophers (and scientists) to just assume that one of those possibilities is correct and then use that, possibly incorrect, assumption to re-define the gene. Real philosophers (and scientists) should be absolutely sure of their facts before making such a radical proposal.

P.S. I define a gene as, "A gene is a DNA sequence that is transcribed to produce a functional product." [Debating philosophers: The molecular gene] [Philosophers talking about genes] [What Is a Gene?]. The functional product is RNA and it may be further processed to give rise to ribosomal RNA, snoRNA, or any number of other functional RNAs. It may also give rise to mRNA that's then translated to produce a protein. There are many genuine examples of alternative splicing but that doesn't affect my definition of a gene. It just means that the primary transcript (= functional product) can be subsequently processed in several different ways.

1. [Debating alternative splicing (part I)] [Debating alternative splicing (part II)] [Debating alternative splicing (Part III)] [Debating alternative splicing (Part IV)]

Bhuiyan, S.A., Ly, S., Phan, M., Huntington, B., Hogan, E., Liu, C.C., Liu, J., and Pavlidis, P. (2018) Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics 19:637. [doi: 10.1186/s12864-018-5013-2]

Mueller, M., Martens, L., and Apweiler, R. (2007) Annotating the human proteome: Beyond establishing a parts list. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics, 1774(2):175-191. [doi: 10.1016/j.bbapap.2006.11.011]


  1. It occurs to me that even for the small minority of genes that actually have more than one functional isoform, it might not even be the case that those variants are somehow there for any adaptive reason.
    So it's not that the alternatively spliced variant is necessary to the organism in any way. Rather it just happens to be the case that some small minority of proteins can function even in the absence (or with excessive copies) of particular exons.

    1. In our paper that Larry refers to (Bhuiyan et al., 2018), when we say ~5% of genes have functionally distinct splice isoforms (FDSIs), we define this by the splice isoform's necessity for the gene's overall function. To (self) quote:

      "To establish the extent to which splice isoforms increase the functional repertoire of the genome, we need data on which genes have functionally distinct splice isoforms (FDSIs). Identification of genes with FDSIs requires experimental support to demonstrate the necessity of each splice isoform.[...] This idea readily extends to isoforms; if a single isoform is made absent and that isoform is necessary for the normal function of the gene, then a consequence (change in phenotype) would be expected. A gene has FDSIs if two or more isoforms meet this criterion independently (Fig. 1a)."

    2. Thank you for the clarification. That implies there could be more genes with alternative splice isoforms which are not necessary for "normal" organismal function, yet nevertheless are capable of performing the function of the primary isoform.

    3. Mikkel, at least for our paper, we tried to do away with the idea of "primary" and alternative transcripts. For the limited subset of genes with functionally distinct splice isoforms (FDSIs), what makes a transcript the primary one? Primary implies some level of importance, and if two or more splice isoforms of a gene are necessary, then they're both important.

      Nevertheless, you are correct - the splice isoforms of the curated genes without evidence of necessity may still be capable of a "function", though, as Larry has pointed out, there is limited evidence of that in functional genomics studies and I agree. Perhaps "genes with functionally redundant splice isoforms" is the correct term.

      As somewhat of an aside, from our curation, we could only identify 43 human and mouse genes with FDSIs, and due to this limited number of genes, we avoided making sweeping generalizations in the paper about the 43 genes with FDSIs. I would hypothesize though that the splicing of these 43 genes is evolving under negative selection because their FDSIs are likely necessary for reproductive success.

      Future researchers looking to provide new evidence for functional distinctness should prioritize their research towards splice isoforms likely evolving under negative selection. Given the neutral theory and the nearly neutral theory of evolution, randomly selecting splice isoforms will be a waste of resources as most splice isoforms are likely nonfunctional.

  2. "It's quite reasonable to hypothesize that only 5% of human protein-coding genes are involved in alternative splicing to produce two or more functional protein variants."

    It seems to me that this is a significant issue for the selfish gene picture, which holds that natural selection optimizes the genome. (Ditto epigenetics, unless methylation etc. are to be viewed as genetically determined processes that have undergone positive selection.)

    In gene selectionist popularizations, "genes" are effectively the Evolution God's commandments written in DNA rather than stone. In that context, revising the optimized genome does imply a new understanding of "gene."

  3. What is the "phenotype paradigm"? The only reference I have found so far is in the title of a paywalled article by Ford Doolittle, and the abstract says nothing about it.

    1. Doolittle and Sapienza coined the term "phenotype paradigm" in their 1980 paper on selfish genes. They are referring to the adaptationist idea that all sequences are selected for their effect on the phenotype of the organism. They were proposing that there's a different kind of selection; namely, selection at the level of selfish genes, as in transposons. In that case, the phenotype of the organism is not relevant because selection is occurring only for the survival of the selfish gene (transposon).

      We know that bacterial and eukaryotic transposons have to make multiple copies of themselves before they are inactivated by mutations. In this way, they preserve functional copies that carry the elements essential for their survival. We can think of this as a form of selection at the level of the gene that's independent of the organism.

      We know that this is a rare phenomenon accounting for only a tiny percentage of a typical genome and we know that in the big picture of evolution it's just a footnote. However, philosophers of biology tend to put a great deal of emphasis on the discovery of selfish DNA and the overthrowing of the phenotype paradigm. I don't think I've ever heard of the phenotype paradigm in the scientific literature except in 1980.

      I'm going to meet with Stefan Linquist next week to discuss this issue. There seems to be a big difference between how scientists view the selfish DNA papers (interesting, but not terribly important) and how philosophers see them (groundbreaking, and paradigm shifting).

    2. @Larry: So there's no difference in practice between "weird exception" and "paradigm shift", at least as far as these philosophers are concerned. An interesting contrast with mathematics, where the statement "prime numbers are odd" is false, while in a field like biology it would be generally true.