Sandwalk: Two Examples of "Alternative Splicing"

Wednesday, November 12, 2008

Two Examples of "Alternative Splicing"

THEME:
Transcription
Last week I bumped into a colleague who teaches in our third-year molecular biology course. I was lamenting the sad state of science these days and we got to talking about alternative splicing. I repeated my complaint that many of the predicted alternative splice variants are artifacts. It makes no sense that conserved genes would be producing alternative protein variants that are species specific. I am convinced that the EST databases are full of artifacts and that most predicted splice variants do not exist.

My colleague was shocked. He is firmly convinced that most human genes express a number of different protein products that are produced as the result of alternatively spliced mRNA precursors. I asked him if he had ever looked at his favorite genes to see if the predicted variants make any sense. The ones that I've looked at certainly don't. (Join in the fun: see the challenge below.)

My colleague is very knowledgeable about the genes for the major subunits of eukaryotic RNA polymerase since it was his lab that cloned the first one. I suggested that he look at the predicted alternative splice variants of the two human genes and let me know if he is still convinced that these variants make biological sense. I'm not sure he will do it so let's take a look ourselves.

Eukaryotic RNA polymerase is a complex protein machine consisting of ten different subunits. Two of the subunits, Rpb1 and Rpb2, are more commonly known as A and B. In the human genome they are encoded by the genes POLR2A and POLR2B respectively [RNA Polymerase Genes in the Human Genome].

If you click on the Entrez Gene URLs you will end up at a page that summarizes what is known about the gene. Down the right-hand side of the page there are links to several other webpages, including a link to AceView, a database of alternative splice variants. Before following this link to the POLR2A variants, let's note that on the annotated Entrez Gene website there are no alternative splice variants listed. Apparently someone has decided that the predicted variants are probably artifacts.

Go to the AceView page for POLR2A. The first thing you see is a short explanation.

RefSeq annotates one representative transcript (NM included in AceView variant.a), but Homo sapiens cDNA sequences in GenBank, filtered against clone rearrangements, coaligned on the genome and clustered in a minimal non-redundant way by the manually supervised AceView program, support at least 11 spliced variants.

AceView summary
Note that this locus is complex: it appears to produce several proteins with no sequence overlap.
Expression: According to AceView, this gene is expressed at very high level, 4.8 times the average gene in this release. The sequence of this gene is defined by 537 GenBank accessions from 518 cDNA clones, some from breast (seen 40 times), marrow (29), head neck (19), brain (18), eye (18), leukopheresis (18), lung tumor (18) and 132 other tissues. We annotate structural defects or features in 13 cDNA clones.
Alternative mRNA variants and regulation: The gene contains 29 different introns (28 gt-ag, 1 gc-ag). Transcription produces 13 different mRNAs, 11 alternatively spliced variants and 2 unspliced forms. There are 7 probable alternative promotors and 5 non overlapping alternative last exons (see the diagram). The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, overlapping exons with different boundaries, alternative splicing or retention of 4 introns. 337 bp of this gene are antisense to spliced gene pluvu, raising the possibility of regulated alternate expression.
Protein coding potential: 10 spliced and the unspliced mRNAs putatively encode good proteins, altogether 11 different isoforms (3 complete, 4 COOH complete, 4 partial), some containing domains RNA polymerase Rpb1, domain 1, RNA polymerase, alpha subunit, RNA polymerase Rpb1, domain 3, RNA polymerase Rpb1, domain 4, RNA polymerase Rpb1, domain 5, RNA polymerase Rpb1, domain 6, RNA polymerase Rpb1, domain 7, Eukaryotic RNA polymerase II heptapeptide repeat [Pfam]. The remaining 2 mRNA variants (1 spliced, 1 unspliced) appear not to encode good proteins.

Here's the figure showing the various predicted alternatively spliced transcripts and the various different proteins.

It's really difficult to imagine that any of these are biologically relevant. How could a small bit of the large RNA polymerase subunit ever be part of the RNA polymerase protein complex? It's not a surprise that the Entrez Gene annotators have ignored these predictions.
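For readers who want to check the bookkeeping, here is a minimal sketch that tallies the counts quoted in the AceView summary for POLR2A. The numbers are copied from the quoted text; the dictionary layout and the function names (`total`, `fraction_complete`) are my own and purely illustrative. This is not how AceView computes anything.

```python
# Counts copied from the AceView summary for POLR2A quoted above.
# The structure and names here are hypothetical, chosen for illustration.
SUMMARY = {
    "introns": {"gt-ag": 28, "gc-ag": 1},
    "mrnas": {"spliced": 11, "unspliced": 2},
    "isoforms": {"complete": 3, "COOH complete": 4, "partial": 4},
}


def total(category: str) -> int:
    """Sum the sub-counts reported for one category of the summary."""
    return sum(SUMMARY[category].values())


def fraction_complete() -> float:
    """Fraction of predicted protein isoforms that AceView calls complete."""
    return SUMMARY["isoforms"]["complete"] / total("isoforms")


if __name__ == "__main__":
    for category in SUMMARY:
        print(category, total(category))
    print("fraction of isoforms that are complete:", round(fraction_complete(), 2))
```

The sub-counts add up to the totals AceView reports (29 introns, 13 mRNAs, 11 isoforms), and only 3 of the 11 predicted isoforms are described as complete. The rest are truncated at one end or the other.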

If, as I believe, most of the small ESTs on which these predictions are based are artifacts, then the overall pattern makes sense. What you see are examples of splicing errors where an intron has not been correctly removed. These extremely rare splicing errors are copied into cDNA during construction of EST libraries and specifically selected by screening out all the correctly spliced mRNAs. (That's how you make most EST libraries.)

Here's what AceView says about the gene for the other large subunit [AceView: POLR2B].

RefSeq annotates one representative transcript (NM included in AceView variant.a), but Homo sapiens cDNA sequences in GenBank, filtered against clone rearrangements, coaligned on the genome and clustered in a minimal non-redundant way by the manually supervised AceView program, support at least 9 spliced variants.

Once again, AceView notes that the annotated human genome has ignored the predicted alternative splice variants but maintains that there are at least nine of them.

Here's the figure. Decide for yourself whether this is credible.

There are several well-known examples of human genes producing different protein variants due to alternative splicing. The ones I can think of off the top of my head are the genes for class I antigens, α-tropomyosin, and calcitonin. I'm sure there are half a dozen others.

Here's the challenge. See if you can find a human gene for a well-studied protein where the structure of the protein is known and there are multiple protein variants derived by alternative splicing. I bet that readers of Sandwalk can't find very many where the predicted variants make any sense and are likely to be biologically significant.

What does this mean? Whenever you look at your favorite well-studied gene you see that the predictions of alternative splicing are silly. So why should we believe the genome-wide analyses? Is it just a coincidence that the more we learn about a given gene the more we become willing to reject the ESTs as artifacts? Or is it possible that alternative splicing is mostly confined to those genes that have not been well studied?

10 comments :

Anonymous said...: Sorry for the laundry list of citations.

This protein is well-known to some of us, at least.

Some more examples from a very quick spin around the interwebs (guess what one of the Pubmed search handles was):

Schöning JC, Streitner C, Meyer IM, Gao Y, Staiger D.
Reciprocal regulation of glycine-rich RNA-binding proteins via an interlocked feedback loop coupling alternative splicing to nonsense-mediated decay in Arabidopsis.
Nucleic Acids Res. 2008 Nov 4. [Epub ahead of print]

Dinkins RD, Majee SM, Nayak NR, Martin D, Xu Q, Belcastro MP, Houtz RL, Beach CM, Downie AB.
Changing transcriptional initiation sites and alternative 5'- and 3'-splice site selection of the first intron deploys Arabidopsis protein isoaspartyl methyltransferase2 variants to different subcellular compartments.
Plant J. 2008 Jul;55(1):1-13.

Puyaubert J, Denis L, Alban C.
Dual targeting of Arabidopsis holocarboxylase synthetase1: a small upstream open reading frame regulates translation initiation and protein targeting.
Plant Physiol. 2008 Feb;146(2):478-91.

Bove J, Kim CY, Gibson CA, Assmann SM.
Characterization of wound-responsive RNA-binding proteins and their splice variants in Arabidopsis.
Plant Mol Biol. 2008 May;67(1-2):71-88.

Bocobza S, Adato A, Mandel T, Shapira M, Nudler E, Aharoni A.
Riboswitch-dependent gene regulation and its evolution in the plant kingdom.
Genes Dev. 2007 Nov 15;21(22):2874-9.

Muralla R, Chen E, Sweeney C, Gray JA, Dickerman A, Nikolau BJ, Meinke D.
A bifunctional locus (BIO3-BIO1) required for biotin biosynthesis in Arabidopsis.
Plant Physiol. 2008 Jan;146(1):60-73.

Zhang XC, Gassmann W.
Alternative splicing and mRNA levels of the disease resistance gene RPS4 are induced during defense responses.
Plant Physiol. 2007 Dec;145(4):1577-87.

Rossignol P, Collier S, Bush M, Shaw P, Doonan JH.
Arabidopsis POT1A interacts with TERT-V(I8), an N-terminal splicing variant of telomerase.
J Cell Sci. 2007 Oct 15;120(Pt 20):3678-87.

Castells E, Puigdomènech P, Casacuberta JM.
Regulation of the kinase activity of the MIK GCK-like MAP4K by alternative splicing.
Plant Mol Biol. 2006 Jul;61(4-5):747-56.

Lee JR, Jang HH, Park JH, Jung JH, Lee SS, Park SK, Chi YH, Moon JC, Lee YM, Kim SY, Kim JY, Yun DJ, Cho MJ, Lee KO, Lee SY.
Cloning of two splice variants of the rice PTS1 receptor, OsPex5pL and OsPex5pS, and their functional characterization using pex5-deficient yeast and Arabidopsis.
Plant J. 2006 Aug;47(3):457-66.

de la Fuente van Bentem S, Vossen JH, Vermeer JE, de Vroomen MJ, Gadella TW Jr, Haring MA, Cornelissen BJ.
The subcellular localization of plant protein phosphatase 5 isoforms is determined by alternative splicing.
Plant Physiol. 2003 Oct;133(2):702-12.

Savaldi-Goldstein S, Aviv D, Davydov O, Fluhr R.
Alternative splicing modulation by a LAMMER kinase impinges on developmental and transcriptome expression.
Plant Cell. 2003 Apr;15(4):926-38.

Jasinski S, Perennes C, Bergounioux C, Glab N.
Comparative molecular and functional analyses of the tobacco cyclin-dependent kinase inhibitor NtKIS1a and its spliced variant NtKIS1b.
Plant Physiol. 2002 Dec;130(4):1871-82.

Macknight R, Duroux M, Laurie R, Dijkwel P, Simpson G, Dean C.
Functional significance of the alternative transcript processing of the Arabidopsis floral promoter FCA.
Plant Cell. 2002 Apr;14(4):877-88.

Dinesh-Kumar SP, Baker BJ.
Alternatively spliced N resistance gene transcripts: their possible role in tobacco mosaic virus resistance.
Proc Natl Acad Sci U S A. 2000 Feb 15;97(4):1908-13.

Zhou DX, Kim YJ, Li YF, Carol P, Mache R.
COP1b, an isoform of COP1 generated by alternative splicing, has a negative effect on COP1 function in regulating light-dependent seedling development in Arabidopsis.
Mol Gen Genet. 1998 Feb;257(4):387-91.; Wednesday, November 12, 2008 7:06:00 PM
BioDiaz2 said...: Well, I can understand the great deal of research going on with alternative splicing and micro RNAs as well as small RNAs. However, like you said, I found that according to AceView there are about 34 spliced variants for PFKM, of which only ONE is recognized

http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?c=geneid&org=9606&l=5213; Wednesday, November 12, 2008 8:45:00 PM
Anonymous said...: My experience with Western blots (and clean antibodies!) of quite a number of different proteins shows that typically there is one major band, sometimes a couple of isoforms, and the rest are degradation products that can be controlled. So I share your views on alternative splicing in general. IMHO, they make a lot more sense from a mechanistic point of view.

Still, allow me to play devil's advocate and offer one possibility of the importance of rare alternative splice variants.

Regulation. Or it might be better called "accidental regulation" or "interference regulation". Dysfunctional fragments of multidomain proteins retaining enough function to compete with the "normal" protein for its target(s), for example.

If we are to subscribe to the notion of a cell being a chaotic system where some small changes can potentially bring about large consequences, this is certainly a possibility that cannot be discounted offhand without experimental evidence.

Although it's a lot of work, the experimental resolution seems to be straightforward:
transgenes that don't have [some] introns. If in the majority of cases such transgenes don't have a phenotype, then there is much better footing for a sensible notion that the majority of detected splice variants are spurious ones without a functional role.

Hopefully someone is up to this arduous task, even with the odds of it being a "negative", unpublishable result.; Wednesday, November 12, 2008 10:16:00 PM
Anonymous said...: dk, I'm sorry I couldn't summarize all of the studies in my list (or any of them, for that matter) - the list itself was too long. The paper by Dinesh-Kumar and Baker is just the sort of transgenic study you suggest, and it shows quite clearly that alternative splicing is required for the biological activity of the N gene.; Wednesday, November 12, 2008 10:25:00 PM
Anonymous said...: To art:
The paper by Dinesh-Kumar and Baker is just the sort of transgenic study you suggest, and it shows quite clearly that alternative splicing is required for the biological activity of the N gene.

Yes, but Larry's point (and mine) is not that alternative splicing has no role! OF COURSE it has a role. The issue in question is of relative frequency/importance. Is it a norm or is it rather an exception?

Correct me if I am wrong, but all of the following are well known to occur but involve only a clear minority of genes/proteins in eukaryotes:

- overlapping genes;
- polycistronic mRNA;
- intein splicing;
- polyproteins.

My gut feeling is that functionally important alternative splicing belongs to this list. And I don't think I saw unequivocal evidence either pro or contra.

P.S. Just looked at the AceView annotation for human actins. 20 splice variants for the gamma gene and 3 for cardiac alpha. Now, that is just rubbish! Without any of its exons, actin is dead, dead, dead - nothing but unfolded garbage. But I suppose that it can be argued that cleaning up garbage is part of normal cellular life and thus can be "regulatory" ...; Thursday, November 13, 2008 12:31:00 AM
Anonymous said...: Hi dk,

Recall that my list was in response to Larry's challenge:

"Here's the challenge. See if you can find a gene for a well-studied protein where the structure of the protein is known and there are multiple protein variants derived by alternative splicing. I bet that readers of Sandwalk can't find very many where the predicted variants make any sense and are likely to be biologically significant."

I had no problems finding lots of cases.

I suspect that some readers are going to get confused by Larry's terminology (I may be one of these!). For me, when Larry claims that the alternatively-spliced RNAs shown here are EST artifacts, he is saying that they are the results of some activity of reverse transcriptase (and/or Taq polymerase) that occurs in the test tube; in other words, these RNA variants do not occur in the cell. I don't buy this claim.

OTOH, if, by artifact, Larry means non-functional RNA (such as an RNA derived from a splicing error), then I could agree that a sizeable fraction of the RNAs such as shown in the browser displays here are in this class. The relative sizes of the two classes (functional vs error) are open questions. But I don't think the universe of functionally-relevant alternative-spliced RNAs is nearly as tiny as Larry is implying.

Which brings me to sort of a counter-challenge - how many studies, perhaps analogous to Dinesh-Kumar and Baker, perhaps using other approaches, can readers here cite that show that an alternatively-spliced RNA has no function?; Thursday, November 13, 2008 10:58:00 AM
Anonymoustache said...: Alternative splicing is, as art pointed out, clearly relevant in many cases. I also agree that many of the predicted splice forms are artifacts that could exist within cells. These may not just be due to incorrect splicing, but could also be splicing intermediates. It appears that many introns, especially large ones, are removed not in one step but via a step-wise ratcheting mechanism that involves "stepping-stone" splicing events. Look up "recursive splicing".
Anyway, I also think that many theoretical predictions of isoforms get fooled by very real splice sites that may only be used for recursive splicing and that may not contribute to alternative splice-form diversity.; Friday, November 14, 2008 12:08:00 PM
Unknown said...: I have a question and I hope someone reads these older posts. Before the human genome was sequenced the number floating around was 100,000 human genes. Now the correct figure is 19,000+ genes but what is the current figure for gene products? If this 100,000 gene products figure is still correct then this becomes a combinational problem. The prediction would be around 50-60% of all genes having at least 1 alternative and some 300 genes with maybe 50 or more alternatives. I would think that looking at a gene family would be a way to go.; Tuesday, May 25, 2010 2:05:00 PM
Larry Moran said...: Bill asks,

I have a question and I hope someone reads these older posts. Before the human genome was sequenced the number floating around was 100,000 human genes.

That number was mere speculation by people who hadn't studied the problem. See Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome.

Now the correct figure is 19,000+ genes but what is the current figure for gene products?

Every gene has to make a product so there must be at least 20,000 protein products (plus a good number of RNA products). It's possible for a single gene to make several different protein products by splicing and dicing the primary transcript or the nascent polypeptide. Nobody knows for sure how many different products are synthesized in this manner. It's still controversial. Personally, I don't think there are more than 25,000 distinct proteins produced in human cells.

If this 100,000 gene products figure is still correct then this becomes a combinational problem. The prediction would be around 50-60% of all genes having at least 1 alternative and some 300 genes with maybe 50 or more alternatives. I would think that looking at a gene family would be a way to go.

The hard work is being done. So far, there have only been a few dozen genes that have been shown to produce more than one biologically relevant protein.; Tuesday, May 25, 2010 4:02:00 PM
Unknown said...: Thanks Larry. I have actually seen this 100,000 figure used in calculations in some papers.; Tuesday, May 25, 2010 6:54:00 PM
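
The numbers in the exchange above can be checked with a rough back-of-envelope sketch. It rests on assumptions that are not in the thread: "at least 1 alternative" is read as exactly one extra product, "50 or more" as exactly 50, 55% stands in for "50-60%", and the two groups of genes are treated as additive. The function names are hypothetical. It gives a floor under those readings, not a measurement of anything.

```python
# Back-of-envelope arithmetic on the figures quoted in the comments.
# Assumptions (not from the thread): exactly one extra product for the
# "at least 1 alternative" genes, exactly 50 extra for the "50 or more"
# genes, and the two groups counted additively.
GENES = 19_000


def products_per_gene(total_products: int, genes: int = GENES) -> float:
    """Average number of products each gene must make to reach a total."""
    return total_products / genes


def minimum_products(
    genes: int = GENES,
    frac_with_one_alt: float = 0.55,  # midpoint of the quoted 50-60%
    heavy_genes: int = 300,
    heavy_alts: int = 50,
) -> int:
    """Lower-bound product count under the assumptions stated above."""
    one_per_gene = genes  # every gene makes at least one product
    single_alternatives = round(genes * frac_with_one_alt)
    heavy_alternatives = heavy_genes * heavy_alts
    return one_per_gene + single_alternatives + heavy_alternatives


if __name__ == "__main__":
    print(round(products_per_gene(100_000), 2))
    print(minimum_products())
    print(products_per_gene(25_000, genes=20_000))
```

Under these readings, 100,000 products from 19,000 genes needs an average of a little over five products per gene, while the distribution described in the question gives a floor of about 44,000. Larry's figure of no more than 25,000 proteins from 20,000 genes works out to 1.25 per gene.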
