
Friday, December 09, 2016

Using conservation to determine whether splice variants are functional

We've been having a discussion about function and how to recognize it. This is important when it comes to determining how much junk is in our genome [see Restarting the function wars (The Function Wars Part V)]. There doesn't seem to be any consensus on how to define "function" although there's general agreement on using sequence conservation as a first step. If some sequence under investigation is conserved in other species then that's a good sign that it's under negative selection and has a biological function. What if it's not conserved? Does that rule out function? The correct answer is "no" because one can always come up with explanations/excuses for such an observation. We discussed the example of de novo genes, which, by definition, are not conserved.

Let's look at another example: splice variants. Splice variants are different forms of RNA produced from the same gene. If they are biologically relevant then they will produce different forms of the protein (for protein-coding genes). This is an example of alternative splicing if, and only if, relevance has been proven.

Unfortunately, most scientists use the term "alternative splicing" simply to describe the presence of splice variants without determining whether they are functional or not. This is confusing. It often reflects the implicit assumption that any splice variant has to be functional when, in fact, that's the most important question. This is a classic example of begging the question.

There's a widespread myth circulating in the scientific literature. It assumes that alternative splicing is a proven fact and it explains another myth called proteome diversity. Proteome diversity is the idea that human cells make at least 100,000 different proteins because each gene can encode about five different polypeptides. The "evidence" for 100,000 different proteins is based on predictions from detecting splice variants and their potential proteins. As you can see, the argument is specious because it's circular [see How many proteins in the human proteome?].

Lest you think I'm making this up, let's look at a 2010 review of alternative splicing published in Nature [Nilsen and Graveley, 2010]. This is a prestigious journal so the review must be scientifically accurate, right? Here's the abstract ....
Expansion of the eukaryotic proteome by alternative splicing

The collection of components required to carry out the intricate processes involved in generating and maintaining a living, breathing and, sometimes, thinking organism is staggeringly complex. Where do all of the parts come from? Early estimates stated that about 100,000 genes would be required to make up a mammal; however, the actual number is less than one-quarter of that, barely four times the number of genes in budding yeast. It is now clear that the 'missing' information is in large part provided by alternative splicing, the process by which multiple different functional messenger RNAs, and therefore proteins, can be synthesized from a single gene.
The scientifically correct title should be: "Is there an expanded proteome and does alternative splicing contribute?"¹ The authors recognize three "outstanding questions." The "foremost" is whether the extent of alternative splicing can account for organismal complexity. The third one is whether there is a splicing code that regulates alternative splicing.

The second "outstanding question" is ...
Another crucial question is how many mRNA isoforms are functionally relevant? Teleology suggests that if an isoform exists, it is important (similarly to the way in which 'junk' DNA is now considered to be treasure). But this idea is hard to prove and is difficult for some to accept. Unfortunately, the function of alternative mRNA isoforms has been carefully analysed for only a small proportion of genes. Such studies have, however, provided numerous examples in which alternative splicing clearly gives rise to functionally distinct isoforms (Table 2). But just as discernible changes in phenotype are frequently not observed even when entire genes are deleted, there are also many cases in which functional distinctions between mRNA isoforms have not been observed.
There's a logical inconsistency here. This is clearly the "foremost" question. If the isoforms are NOT "functionally relevant" then the other two questions are meaningless. If there's really any doubt about the relevance of splice variants then why isn't this reflected in the title?

We know the answer. It's because the authors of this review are confident that splice variants really are biologically functional. Why? Because they're there (teleology?). This isn't a useful criterion for determining function. We would like to have real evidence to support our claims.

What if we apply the selected-effect criterion by asking whether the presence of various splice variants is conserved in different species? Nilsen and Graveley considered this test (comparative genomics) but the results were disappointing as they always are in this field. As a general rule, splice variants are not conserved. Not to worry, there's an explanation/excuse ...
Nevertheless, comparative genomic approaches have limitations. Although evolutionary conservation provides strong evidence for the functional significance of a particular splicing pattern, the converse is not always true....

These examples show the high level of evolutionary plasticity that alternative splicing provides. Because small changes (that is, point mutations) in either exons or introns can create or destroy splicing control elements, it is easy to envisage that splicing patterns are constantly evolving: advantageous mutations would rapidly be selected for, and deleterious mutations would be selected against. Indeed, we speculate that 'non-conserved' changes in splicing patterns might underlie the observed phenotypic variations between species and between individuals within species.
There you have it. If the splice variants were conserved then everyone would have looked to that as evidence of function. But if they are not conserved then that's evidence of species specificity, plasticity, and rapid evolution. Your speculation can't be falsified by comparative genomics.

Nice trick.

I was prompted to write this post when I saw a paper that had just been published a few days ago. The paper looked at the conservation of alternative splicing in four species of Drosophila (Gibilisco et al., 2016). I wondered how the authors would deal with lack of conservation.

The first line of the introduction wasn't promising ...
Alternative pre-mRNA splicing (“AS”) greatly expands the proteome diversity within species by creating different combinations of exons from the same genomic loci [1].

Reference #1 is the review by Nilsen and Graveley that I described above. Here are some other references that they could have mentioned ...
Hsu, S.-N., and Hertel, K.J. (2009) Spliceosomes walk the line: splicing errors and their impact on cellular function. RNA biology, 6:526-530. [doi: 10.4161/rna.6.5.986]

Melamud, E., and Moult, J. (2009) Stochastic noise in splicing machinery. Nucleic acids research, gkp471. [doi: 10.1093/nar/gkp471]

Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet, 6:e1001236. [doi: 10.1371/journal.pgen.1001236]

Tress, M.L., Abascal, F., and Valencia, A. (2016) Alternative Splicing May Not Be the Key to Proteome Complexity. Trends in Biochemical Sciences. [doi: 10.1016/j.tibs.2016.08.008]

Zhang, Z., Xin, D., Wang, P., Zhou, L., Hu, L., Kong, X., and Hurst, L. D. (2009) Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC biology, 7:23. [doi:10.1186/1741-7007-7-23]
Gibilisco et al. didn't mention these papers (except Pickrell et al.) because that's not how science works these days. If you are committed to a favorite hypothesis then you're not supposed to mention those who disagree with you. That's just confusing your readers! :-)

Let's get back to the point. What did Gibilisco et al. find when they compared splice variants in different species? Did it confirm their assumption that most splice variants are functional? Of course not. They found the same thing everyone else finds; namely, that splice variants are usually not conserved. What do they conclude? They conclude that alternative splicing is evidence of species-specific adaptive evolution! Thus, sequence conservation is not a good test of function because it can't falsify your belief.

The only thing that distinguishes this paper from others is the inclusion of the following paragraph in the discussion. It looks a lot like something the referees demanded because if the authors really believed it they would have written the rest of the paper very differently.
While species-specific clustering for alternative splicing is consistent with lineage-specific adaptive evolution [6, 7], it may also support the hypothesis that much splicing is due to erroneous splice site choice, producing non-functional isoforms targeted for degradation/nonsense-mediated decay [38, 39]. These presumably deleterious splicing events would therefore be unlikely to be evolutionarily conserved among species.
So, when we look closely at what's in the scientific literature, we see that the selected-effect criterion is often used to support the conclusion of biological function if the sequences/phenomena are conserved. However, lack of conservation is almost never interpreted to mean that the sequences are just noise. There's always an excuse.

In this case, lack of conservation really does indicate that splice variants are just noise: errors in splicing. Most so-called "alternative splicing" is a myth not supported by evidence. That myth is taking a long time to die in spite of the fact that there's no solid evidence to support it and plenty of data that challenges it.


1. I'm ignoring the other problems in the abstract. One of them is that early estimates by knowledgeable experts of the number of genes predicted about 30,000, not 100,000 [see False history and the number of genes: 2016]. Also, the phrase "it is now clear that" should be changed to "it's an open question whether."

Gibilisco, L., Zhou, Q., Mahajan, S., and Bachtrog, D. (2016) Alternative Splicing within and between Drosophila Species, Sexes, Tissues, and Developmental Stages. PLoS Genet 12:e1006464. [doi: 10.1371/journal.pgen.1006464]

Nilsen, T.W., and Graveley, B.R. (2010) Expansion of the eukaryotic proteome by alternative splicing. Nature, 463:457-463. [doi: 10.1038/nature08909]

14 comments:

Arlin said...

+1 for my colleagues Eugene Melamud and John Moult.

txpiper said...

"There doesn't seem to be any consensus on how to define "function" although there's general agreement on using sequence conservation as a first step."

I'd think function should be about functionality. I guess the confusion about the definition is over my head. But defining 'conserved' is pretty easy. That just means immutable.

Mikkel Rumraket Rasmussen said...

If conserved means a sequence is immutable, then nothing in biology is conserved. Go deep enough in time and any gene will show some degree of divergence no matter how central it is to cellular life as we know it. Ribosomal proteins or RNA, tRNA, ribonucleotide reductase, key proteins in the electron transport chain, it doesn't matter, they all diverge at some point.

Federico Abascal said...

Conserved here means "evolving under purifying selection"; something is "conserved" if it has diverged less than expected by chance.
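A toy illustration of this definition, assuming two pre-aligned pairs of sequences: one from the region of interest and one from a region presumed to evolve neutrally. The function names, the `margin` parameter, and the example sequences are hypothetical. A real analysis would use a substitution model and a statistical test rather than a raw comparison of mismatch fractions.

```python
def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def looks_conserved(candidate_pair, neutral_pair, margin=0.0):
    """True if the candidate region diverged less than the neutral reference.

    `margin` is a hypothetical safety threshold, not a statistical test.
    """
    return divergence(*candidate_pair) < divergence(*neutral_pair) - margin


# Made-up sequences: 1 of 8 sites differs in the candidate, 4 of 8 in the neutral region.
candidate = ("ACGTACGT", "ACGTACGA")
neutral = ("ACGTACGT", "TCGAACTA")
print(looks_conserved(candidate, neutral))  # True
```

Note that under this definition "conserved" is relative to a neutral baseline, so a sequence can accumulate changes and still count as conserved.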

Federico Abascal said...

"There you have it. If the splice variants were conserved then everyone would have looked to that as evidence of function. But if they are not conserved then that's evidence of species specificity, plasticity, and rapid evolution. Your speculation can't be falsified by comparative genomics."

Thankfully, we can now not only compare genomes but also look into the patterns of genomic variation within populations. If alternative exons were species-specific innovations they should be evolving under selection in those species. We recently interrogated data from the 1000 Genomes Project and found that most alternative exons do not show such signatures of selection. Not only are their ratios of non-synonymous to synonymous substitutions close to neutral expectations, but these ratios do not change with increasing allele frequencies. Hence, the evolutionary innovation “trick” cannot be invoked.
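As a rough illustration of the allele-frequency test described in this comment, the sketch below bins variants by allele frequency and reports the ratio of non-synonymous to synonymous variant counts in each bin. It is not the method used in the paper. The function name, the bin edges, and the example data are assumptions made for illustration, and a raw count ratio is a simplification of a properly normalized dN/dS.

```python
from bisect import bisect_right


def ns_ratio_by_frequency(variants, bin_edges=(0.005, 0.05)):
    """Ratio of non-synonymous to synonymous variant counts per frequency bin.

    variants:  iterable of (allele_frequency, is_nonsynonymous) pairs.
    bin_edges: ascending cut points (hypothetical defaults); n edges give n + 1 bins.
    Returns one ratio per bin, or None where a bin has no synonymous variants.
    """
    counts = [[0, 0] for _ in range(len(bin_edges) + 1)]  # [nonsyn, syn] per bin
    for freq, is_nonsyn in variants:
        idx = bisect_right(bin_edges, freq)
        counts[idx][0 if is_nonsyn else 1] += 1
    return [n / s if s else None for n, s in counts]


# Made-up data: the ratio falls as frequency rises, the pattern expected
# when non-synonymous changes are being removed by purifying selection.
example = [
    (0.001, True), (0.001, True), (0.002, False),   # rare
    (0.01, True), (0.02, False),                    # low frequency
    (0.3, True), (0.4, False), (0.5, False),        # common
]
print(ns_ratio_by_frequency(example))  # [2.0, 1.0, 0.5]
```

A ratio that stays flat across the bins is what neutral evolution predicts, which is the pattern the comment reports for most alternative exons.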

TheOtherJim said...

Are you referring to the Trends in Biochemical Sciences paper, with the corrected proof on-line at the moment?

Federico Abascal said...

Yes, that one

TheOtherJim said...

Thanks!

Eric said...

If functional DNA was immutable then we wouldn't see synonymous mutations.

The fact of the matter is that deleterious mutations can occur in functional DNA which is then selected against. The removal of deleterious alleles within the population is what produces conservation of sequence.

Eric said...

20-40% of NY taxis have dents in them. Therefore, dents must be functional. Since different taxis have different dents, this explains the difference in function between them.

tantrev said...

Is it not presumptuous to demand unconserved DNA to be assumed as “noise”, rather than “functional”? It seems the bar for declaring DNA as “noise” should be significantly higher than function. Indeed, I cannot think of any experiment that can conclusively declare a DNA segment to be mere “noise”. The accusation that comparative genomics uses circular logic to defend its functionality claims should be equally applied to the “noise” dogma that demands conservation for function.

Faizal Ali said...

It seems the bar for declaring DNA as “noise” should be significantly higher than function. Indeed, I cannot think of any experiment that can conclusively declare a DNA segment to be mere “noise”.

One word: Onions.

judmarc said...

The accusation that comparative genomics uses circular logic to defend its functionality claims should be equally applied to the “noise” dogma that demands conservation for function.

Any logically true statement is a logical tautology and is thus subject to the accusation that it is "circular." 2+2=4 is such a tautology or "circular" statement. The fact that functional sequences change less quickly than neutral or deleterious ones is self-evident, and also equally as true, tautological, and "circular" as 2+2=4.

Besides lutesuite's apt citation of onions, there is also the GULOP pseudogene. There's your experiment, performed for you by Nature.

Anonymous said...

"Indeed, I cannot think of any experiment that can conclusively declare a DNA segment to be mere “noise”."

What about the experiment where several presumed "noise" DNA segments were removed and the mice produced with the modified DNA lived just fine? What about the professor who had his students sequence their own DNA in a region with several small introns and found that some had those introns and some didn't, yet they were all healthy?