Saturday, December 15, 2018

Alternative splicing in the nematode C. elegans

The importance of alternative splicing is highly controversial. In the case of humans, the competing views are: (a) more than 90% of human protein-coding genes are alternatively spliced to produce multiple protein isoforms, and (b) less than 10% of human genes are alternatively spliced and most of the splice variants detected are due to splicing errors.

In addition to this fundamental difference in how to interpret the data, there's a controversy over the meaning and significance of abundant alternative splicing, assuming that it exists. The consensus view among the workers in the field is that alternative splicing is ubiquitous and it explains why humans are so complex when they have only the same number of genes as "lower" species like the nematode C. elegans. This was the view expressed by Gil Ast in a 2005 Scientific American article on "The Alternative Genome."
Spring of 2000 found molecular biologists placing dollar bets, trying to predict the number of genes that would be found in the human genome when the sequence of its DNA nucleotides was completed. Estimates at the time ranged as high as 153,000. After all, many said, humans make some 90,000 different types of protein, so we should have at least as many genes to encode them. And given our complexity, we ought to have a bigger genetic assortment than the 1,000-cell Caenorhabditis elegans, which has a 19,500-gene complement, or corn, with its 40,000 genes.

When a first draft of the human sequence was published the following summer, some observers were therefore shocked by the sequencing team's calculation of 30,000 to 35,000 protein-coding genes. The low number seemed almost embarrassing. In the years since, the human genome map has been finished and the gene estimate has been revised downward still further, to fewer than 25,000. During the same period, however, geneticists had come to understand that our low count might actually be viewed as a mark of sophistication because humans make such incredibly versatile use of so few genes.

Through a mechanism called alternative splicing, the information stored in the genes of complex organisms can be edited in various ways, making it possible for a single gene to specify two or more distinct proteins. As scientists compare the human genome to those of other organisms, they are realizing the extent to which alternative splicing accounts for much of the diversity among organisms with relatively similar gene sets. In addition, within a single organism, alternative splicing allows different tissue types to perform diverse functions working from the same small gene assortment.

Indeed, the prevalence of alternative splicing appears to increase with an organism's complexity—as many as three quarters of all human genome genes are subject to alternative splicing. The mechanism itself probably contributed to the evolution of that complexity and could drive our further evolution
Gil Ast is referring to the widely-held view that there's a problem with the low number of human genes. Like the majority of scientists these days, he thought that the number of genes should correlate with complexity, and the fact that the most sophisticated species (humans) didn't have a lot more genes than a lowly nematode like C. elegans was a major shock. This apparent paradox became known as the G-value paradox and it was "solved" by recognizing that sophisticated species like us can get away with so few genes by making many different proteins from each gene.

I addressed the errors in this view in a recent post so let me just summarize two of the main facts that I dispute [The persistent myth of alternative splicing]:
  1. We don't make 90,000 or 200,000 or one million different proteins as many scientists claim. Instead, the total number of distinct polypeptide chains is closer to 20,000.
  2. Alternative splicing is not common in humans; probably fewer than 1000 genes are alternatively spliced.
The purpose of this post is to dispute another crucial part of the argument; namely, the idea that the specialness of humans is due to alternative splicing. This part of the argument depends on showing that less complex species exhibit a lot less alternative splicing so they produce fewer proteins even though they have the same number of genes as we do. Gil Ast makes this part of the argument very clear when he says that the amount of alternative splicing increases with complexity. If that weren't true then the entire argument would fall apart. What if most C. elegans genes were also alternatively spliced? How would that explain the G-value paradox?

Well, as it turns out, we've known for some time that C. elegans genes are also alternatively spliced but this hasn't seemed to matter to most scientists. The most recent analysis of the data indicates that 94% of C. elegans genes are alternatively spliced using the same criteria that are used in humans (Tourasse et al., 2017). Oops! There goes that argument.

Notice that I was careful to specify that Tourasse et al. used the same criteria as others have used to identify alternative splicing in humans. That's because the authors of the latest study are far more skeptical about alternative splicing than most others who work in this field. They are aware of the evidence for splicing errors so they know that the mere existence of splice variants is not evidence of biologically relevant alternative splicing. They also know the following bit of information that seems to have escaped the notice of many scientists.
... these observations support the widespread existence of biological noise in the splicing process. The sensitivity of modern deep sequencing methods for transcriptome characterisation ensures that even rare aberrantly spliced messengers will be detected alongside functional splice forms.
In an effort to distinguish between noise and genuine alternative splicing, the authors divided their 6 million splice variants into several categories corresponding to the concentration of the splice variants and the level of expression of the gene they came from. The idea here is that there should be a correlation if the variants are splicing errors: highly expressed genes are more likely to produce rare splice variants due to splicing errors.
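To see why a constant error rate would produce that correlation, here's a minimal sketch in Python. It is not the authors' method. The function name and the error rate of one mis-spliced read per 100,000 are assumptions chosen purely for illustration, and the sketch treats every read as an independent draw.

```python
def prob_detect_error_variant(total_reads: int, error_rate: float) -> float:
    """Chance of seeing at least one mis-spliced read among total_reads.

    Simplifying assumption: each read independently comes from a
    splicing error with probability error_rate.
    """
    return 1.0 - (1.0 - error_rate) ** total_reads


# Hypothetical error rate, not a measured value.
ASSUMED_ERROR_RATE = 1e-5

# Two hypothetical genes that differ only in expression level.
low_expression = prob_detect_error_variant(1_000, ASSUMED_ERROR_RATE)
high_expression = prob_detect_error_variant(1_000_000, ASSUMED_ERROR_RATE)

print(f"low expression:  {low_expression:.3f}")
print(f"high expression: {high_expression:.3f}")
```

With the same assumed error rate, the gene with 1,000 reads shows an error variant only about 1% of the time while the gene with a million reads almost always shows one. Under these assumptions, rare variants piling up in highly expressed genes is what noise should look like.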

I'm impressed with their reasoning ability in the following passages so I'll quote them all.[1]
We reasoned that by exploring a large collection of RNA-seq datasets we could compile a nearly comprehensive list of splice junctions in the C. elegans genome. The total amount of data used in our study provides a robust quantitative measure of the frequency of usage of each detected alternative splice junction and the expanded dynamic range obtained also allows for discrimination between genuine alternative splicing and potential biological noise. Our observation that “rare” junctions come disproportionately from highly expressed genes and are less conserved in other nematode species, could indicate that most of these junctions correspond to biological noise causing accidental splicing outside of the preferred functional sites. A similar analysis in human cell lines also found that rare alternative splicing events are more frequently observed in highly expressed genes and tend to be less conserved across species (Pickrell et al., 2010). The authors of that study also found that the small fraction of reads coming from unannotated rare junctions covered a large number of likely spurious splicing events.

While it is likely that our discrimination between rare and robust splice variants offers a good approximation for the functionality threshold of any given isoform, it is almost certain that some exceptions will apply. It is possible, for example, that a ubiquitously expressed gene also encodes a rare variant with a limited cell specificity, but studying and validating this kind of event will constitute its own challenge. In that context our classification should be considered a warning sign: these “rare” events are unlikely to be functional and studying them will be not be trivial as they are not detected in most RNA-seq experiments.

Our compendium based meta analysis provides a widely expanded dynamic range of detection with genes having between 0 and 10^7 reads per junctions. If we considered every splice junction detected by any RNA-seq experiment in our compendium we would conclude that ~94% of C. elegans genes are submitted to alternative splicing (Table 1). If we consider only genes for which there is a second isoform with a frequency of at least 1% of the major isoform, this number drops to 35% (~7,000 genes). This could be a valid definition since our analysis suggests that the majority of "rare" splicing events corresponds to biological noise rather than a conserved functional mechanism. However, it is possible that for some genes a rare isoform indeed is critical for a cell specific function. Conversely, we cannot proclaim that every event that is above the 1% threshold is a genuine alternative splicing event. If we place the bar at 5% of the gene expression level, then only 4,700 genes have more than one isoform. There is no objective quantitative criteria that can systematically discriminate between functional and spurious alternative isoforms at this time.
Regardless of whether the results indicate that 94% or 25% or even fewer genes are alternatively spliced, the point is that the same reasoning applies to human splice variants. The fact that there's no significant difference between alternative splicing in C. elegans and humans means that you can't use that excuse to soothe your Deflated Ego on learning that humans and C. elegans have approximately the same number of genes.
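The way the answer shifts with the cutoff can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used by Tourasse et al.: the function names and the toy read counts are invented, and only the 1% and 5% cutoffs and the gene counts (~7,000 and 4,700) come from the passage quoted above.

```python
from typing import Dict, List


def has_second_isoform(isoform_reads: List[int], min_fraction: float) -> bool:
    """True if the second most abundant isoform was detected and reaches
    min_fraction of the major isoform's read count."""
    if len(isoform_reads) < 2:
        return False
    ranked = sorted(isoform_reads, reverse=True)
    major, second = ranked[0], ranked[1]
    return second > 0 and second >= min_fraction * major


def fraction_alternatively_spliced(
    genes: Dict[str, List[int]], min_fraction: float
) -> float:
    """Share of genes counted as alternatively spliced at a given cutoff."""
    if not genes:
        return 0.0
    hits = sum(has_second_isoform(reads, min_fraction) for reads in genes.values())
    return hits / len(genes)


# Hypothetical read counts per isoform, invented for illustration only.
toy_genes = {
    "geneA": [100_000, 40],  # second isoform at 0.04% of the major one
    "geneB": [5_000, 120],   # 2.4%
    "geneC": [800, 90],      # 11.25%
    "geneD": [2_000],        # a single isoform
}

any_variant = fraction_alternatively_spliced(toy_genes, 0.0)
at_1_percent = fraction_alternatively_spliced(toy_genes, 0.01)
at_5_percent = fraction_alternatively_spliced(toy_genes, 0.05)

# Arithmetic from the quoted figures: 35% corresponding to ~7,000 genes
# implies a total of roughly 20,000 genes, so 4,700 genes is about 23.5%.
implied_total_genes = round(7_000 / 0.35)
share_at_5_percent_cutoff = 4_700 / implied_total_genes
```

In the toy data the share of "alternatively spliced" genes falls from three in four to one in four as the cutoff goes from any detected variant to 5%. The same arithmetic on the quoted figures gives about 23.5% at the 5% cutoff, which is roughly the 25% figure mentioned above.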

1. Under normal circumstances this line of reasoning should not be exceptional—it should be common practice. The fact that it's unusual illustrates the abysmal quality of most scientific papers these days.

Image Credits: C. elegans: Wikipedia; Gil Ast: Edmond J. Safra Center for Bioinformatics

Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet, 6(12), e1001236. [doi: 10.1371/journal.pgen.1001236]

Tourasse, N.J., Millet, J.R., and Dupuy, D. (2017) Quantitative RNA-seq meta analysis of alternative exon usage in C. elegans. Genome Research, gr.224626.117. [doi: 10.1101/gr.224626.117]


Georgi Marinov said...

Is Bigelowiella on your to-do list for alternative splicing?

John Harshman said...

Thanks. I figured.

Unknown said...

I believe that is what is called a 'smackdown' in wrestling lingo.
Nicely done.