Sandwalk: An Example of Faulty Logic from Cold Spring Harbor

Friday, September 19, 2008

A press release from Cold Spring Harbor Laboratory promotes the work of Michael Zhang and Adrian Krainer, who work on splicing factors. In a typical attempt to hype the significance of the work, the press release claims that each human gene has many different variants produced by alternative splicing [CSHL team traces extensive networks regulating alternative RNA splicing].

That may or may not be correct (I happen to think it's mostly an artifact of EST cloning), but that's not the point I want to make here. The main point is the rationalization, explained thus:

Biologists involved in the Human Genome Project were frankly astonished to discover that everything that makes us human is the product of a set of only 23,000 or so genes.¹ That number in itself, though several times smaller than prior estimates, is not shocking; it is the relative size of other genomes that surprised scientists.

The common fruit fly that hovers over your ripening bananas, for instance, possesses some 14,000 genes. It's perfectly obvious that human beings are vastly more complex, biologically, than a fly. Molecular biologists have demonstrated in recent years that it is not the number of genes that is the key to complexity but rather the number and diversity of gene products that a given set of genes can instruct cells to manufacture.

Rather than a single gene ordering the production of a single kind of protein, as scientists used to assume, it turns out that individual genes can in some cases give rise to dozens or even thousands of different proteins, thanks to a phenomenon called alternative splicing.

This is one version of what I call The Deflated Ego Problem. The "problem" is that some people are really, really upset about the fact that humans may have only a few thousand more genes than a fruit fly.

So they look for some way to reflate their egos and one of the most common arguments is the one shown above. (It's excuse #1 on the list.) It goes like this: humans are so much more complex than fruit flies, even though they have only a few thousand more genes, because each human gene does double or triple duty. Each human gene makes several different proteins by alternative splicing of the primary RNA transcript.

Viola! Problem solved.

Except for one little nasty fact. Drosophila genes also show abundant levels of alternative splicing. In fact they produce just as many variants per gene as humans, if you believe the EST data (which I don't).

Oops. There goes that solution. Humans don't have that many more proteins than fruit flies after all.

This is such an obviously bogus argument that I'm surprised it still appears in the scientific literature. Doesn't anyone realize that in order for it to salvage deflated egos there has to be no, or much less, alternative splicing in fruit flies?
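The arithmetic behind this objection can be sketched in a few lines. The gene counts are the ones quoted in the press release; the variants-per-gene values are hypothetical placeholders chosen only for illustration, not measurements.

```python
# Gene counts as quoted in the press release above.
HUMAN_GENES = 23_000
FLY_GENES = 14_000


def protein_count(genes: int, variants_per_gene: float) -> float:
    """Naive estimate: every gene yields the same number of protein variants."""
    return genes * variants_per_gene


def human_to_fly_ratio(human_variants: float, fly_variants: float) -> float:
    """Ratio of estimated human proteins to estimated fly proteins."""
    return protein_count(HUMAN_GENES, human_variants) / protein_count(
        FLY_GENES, fly_variants
    )


# Hypothetical variants-per-gene values, for illustration only.
same = human_to_fly_ratio(3.0, 3.0)    # flies splice as much as humans
skewed = human_to_fly_ratio(3.0, 1.0)  # flies barely splice at all

print(f"equal splicing in both species: {same:.2f}x")
print(f"splicing only in humans:        {skewed:.2f}x")
```

When both species get the same multiplier it cancels out, and the ratio falls back to the ratio of gene counts (23,000 / 14,000, about 1.6). The argument only inflates the human number if flies are assumed to splice much less.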

Some people were surprised and embarrassed by the "low" number of genes in the human genome but others were pretty happy that their estimates were close to the mark [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome].

25 comments:

RBH, Friday, September 19, 2008 11:47:00 PM
From the release:

It's perfectly obvious that human beings are vastly more complex, biologically, than a fly.

That's not at all obvious to me. On what measure are humans vastly more complex, biologically, than a fly?
Sigmund, Saturday, September 20, 2008 3:17:00 AM
"On what measure are humans vastly more complex, biologically, than a fly?"
I suspect it's the same measure that makes a blue whale vastly more complex than a human.
Optimistix, Saturday, September 20, 2008 5:01:00 AM
I would have no problem if human beings were the least complex organisms on earth.

However, if one does believe the EST data, the fruitfly does not produce as many splice variants per alternatively spliced gene as human - see, for instance,

http://bioinfo.mbi.ucla.edu/ASAP2/

I don't really care about the "humans are superior/complex" stuff, but as a Bioinformatics Ph.D student working on alternative splicing, I've read several papers about the extent of alternative splicing - not all estimates are based on ESTs.

You might find it interesting that the latest editions of the classic textbooks by Alberts et al. and Lodish et al. mention that up to 60% of human genes are estimated to be alternatively spliced, and that it is an important mechanism of post-transcriptional regulation.
Larry Moran, Saturday, September 20, 2008 9:17:00 AM
Rileen says,

You might find it interesting that the latest editions of the classic textbooks by Alberts et al. and Lodish et al. mention that up to 60% of human genes are estimated to be alternatively spliced, and that it is an important mechanism of post-transcriptional regulation.

Yes, isn't that astonishing!

There's no way that 60% of all human genes can have biologically relevant alternative splice variants. I don't understand how modern science could have deteriorated to the point where such silly speculations are considered serious.

BTW, Rileen, it's important to remember that in this controversy the skeptics are not denying the existence of biologically relevant alternative splicing. There are numerous excellent examples. We've been teaching about them for 25 years.

What we are questioning is the prevalence of the phenomenon and, especially, the ridiculous speculations on the various databases.

Here's an example from AceView [3309] of the most highly conserved gene in all of biology (human BiP/HSP70).

The predicted alternative variants would remove the hydrophobic core of the protein or the active site in the N-terminal half of the protein. It's beyond belief that anyone would think that humans could have evolved such a functional variant. (Other mammals show different "variants.")

You can look at the predictions for any well known protein where the structure has been solved and reach the same conclusion. Thus, in those cases where we know a lot about the gene/protein the alternative splice site predictions do not make any sense. It's logical to assume that this applies to almost all other genes as well.

I followed your link to the ASAPII database and keyed in the Drosophila version of the BiP gene (Hsc3). Remember that this is the most highly conserved gene in all of biology. Here are the so-called splice variants in Drosophila [Hsc70-3]. As you can see there are even more of them than for the human gene. Furthermore, the introns are in different places so the various permutations of exons are quite different. Do you really believe that fruit flies specifically make such radical, biologically relevant, variants of a highly conserved protein?

For the sake of comparison, here are the mouse [HspA5bp1] and human [HSPA5] versions of the BiP gene from your preferred database. Are you—or anyone else—prepared to defend the idea that two mammals could make a variety of different functional variants of a highly conserved protein?
The Other Jim, Saturday, September 20, 2008 1:27:00 PM
This comment has been removed by the author.
The Other Jim, Saturday, September 20, 2008 1:35:00 PM
Larry,

In your opinion, what are the EST database errors due to?
(This is an honest question.)
Larry Moran, Saturday, September 20, 2008 3:25:00 PM
Jim asks,

In your opinion, what are the EST database errors due to?

In most cases they are probably real copies of splicing errors. From what we know of biological processes, we expect that splicing will be error prone and incorrectly spliced precursors will hang around for a bit before being degraded in the nucleus. They may be copied into cDNA and converted to ESTs.

Another source of error is the copying of DNA fragments that contaminate the RNA prep.

Most of the alternative splicing predictions seem to be based on single examples of ESTs from a single RNA preparation. The technique is quite capable of picking up extremely rare events like splicing errors. In fact, most of the isolation techniques enrich for rare transcripts that haven't been detected before.
SteveF, Saturday, September 20, 2008 6:22:00 PM
Viola! Problem solved.

How do you solve a problem with a viola? ;-)
Anonymous, Saturday, September 20, 2008 9:28:00 PM
LM: In most cases they are probably real copies of splicing errors. From what we know of biological processes, we expect that splicing will be error prone and incorrectly spliced precursors will hang around for a bit before being degraded in the nucleus.

"Error" and "incorrect", but only in the sense of "doesn't make a protein with the function we humans expect it to have". This gets to the issue of why we have splicing at all. The traditional (adaptationist) answer is that the genetic modularity provided by splicing enables more efficient evolution of new gene combinations. Do you also reject the evolutionary utility of RNA splicing in the general case, as well as in the case of alternative splicing?
Larry Moran, Saturday, September 20, 2008 9:28:00 PM
stevef asks,

How do you solve a problem with a viola? ;-)

I'm glad someone appreciates my sense of humor.
Optimistix, Sunday, September 21, 2008 5:41:00 AM
But alternative splicing produces alternative transcripts, not proteins. Whether these then lead to functional protein variants is another question. To me there are two questions:

(1) Is alternative splicing (AS) widespread (in mammals, say)?

(2) Does it lead to a lot of different functional variants of proteins?

You seem to be saying that the answer to (2) is "no", therefore so is the answer to (1).

From whatever I've read so far, I lean towards "yes" for (1), and "no" for (2).

Of course, in that case one would ask whether this AS is noise, or some sort of regulation? Many people seem to think that at least some of it is regulation via a coupling of AS and NMD.

ASAP is not my preferred database, I just linked to it because it's one of the standard ones, and has a table summarizing what they concluded about various organisms, human and fruitfly included.

The high estimates of AS are not based only on ESTs; papers using microarrays and full-length cDNAs, as well as new ones using RNA-seq, have all estimated high rates of AS. Do you think the other methods also involved artifacts of some sort?

There was a paper in PNAS last year about AS in the "ENCODE complement" of the human genome. They also reported a figure around 60%, but they noted that in many of the cases the predicted impact of AS on the protein would be disruptive, and said that this indicates that the role of AS in producing functional proteins may have been overestimated :-)

Which brings me to something else I keep asking people: how many functional proteins do you think the human proteome has, and could you point me to a good reference for the same? I keep hearing "well over a hundred thousand", but have yet to see a paper about it (remember that my background is not in biology, and I am trying to learn as much as I can).

One last question regarding the bit about whether human and mouse would produce different variants of a conserved protein - would you be less skeptical about conserved AS inferred using full length cDNAs in human as well as mouse?
Larry Moran, Sunday, September 21, 2008 11:21:00 AM
Rileen asks,

But alternative splicing produces alternative transcripts, not proteins. Whether these then lead to functional protein variants is another question. To me there are two questions:

(1) Is alternative splicing (AS) widespread (in mammals, say)?

(2) Does it lead to a lot of different functional variants of proteins?

You seem to be saying that the answer to (2) is "no", therefore so is the answer to (1).

That's correct. The term "alternative splicing" has no meaning if it's only splicing errors. We've been using the term for almost three decades to refer to a biological phenomenon that leads to different functional versions of a protein (or an active RNA).

One last question regarding the bit about whether human and mouse would produce different variants of a conserved protein - would you be less skeptical about conserved AS inferred using full length cDNAs in human as well as mouse?

Yes, definitely I would be less skeptical. If a lab could show reproducibly that some mouse tissue produced functional mRNAs with a different coding region than human mRNAs then that would be good evidence of alternative splicing.

Are you thinking of a good example of a conserved gene where this happens? Can you post a reference?
Larry Moran, Sunday, September 21, 2008 11:37:00 AM
anonymous asks,

"Error" and "incorrect", but only in the sense of "doesn't make a protein with the function we humans expect it to have".

Not exactly. The conclusion is that it doesn't make a protein that makes any sense in terms of everything we know about gene expression and evolution.

The proteins could, in fact, be functional but it would mean throwing out all of our current models. I'm not saying that we shouldn't do this if the predictions of alternative splicing turn out to be correct; what I'm saying is that this kind of extraordinary claim would require extraordinary evidence.

This gets to the issue of why we have splicing at all. The traditional (adaptationist) answer is that the genetic modularity provided by splicing enables more efficient evolution of new gene combinations. Do you also reject the evolutionary utility of RNA splicing in the general case, as well as in the case of alternative splicing?

Yes, I reject that argument. It requires that evolution look ahead to the future and that has been refuted as a possible model for evolution.

I favor the explanation that modern spliceosomal processing arose from Group II introns, which are a form of transposon. The spread of spliceosomal processing in eukaryotes may be due to adaptations to protect the genome from accidental inserts of Group II transposons. In other words, it's a defense against molecular parasites.

Subsequent beneficial effects of spliceosomal processing are epiphenomena.
Optimistix, Sunday, September 21, 2008 12:55:00 PM
I actually meant that the AS pattern would be the same in both human and mouse, i.e., it would be supported by full-length cDNA rather than "partial" ESTs, in both human and mouse. But you seem to have interpreted it as different events being supported in the two transcriptomes. Which is also fine; the bottom line is that you would trust full-length cDNAs more than ESTs for the purpose of inferring AS (did I get that right?).

Two references on inferring AS using full-length cDNAs rather than ESTs (by the same group of people):

http://nar.oxfordjournals.org/cgi/content/full/34/14/3917

http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D104

I am yet to see a paper on conserved AS using this approach, but I'm sure it's only a matter of time.
Anonymous, Sunday, September 21, 2008 2:02:00 PM
As a biologist, I don't think we as a community really understand what complexity is. After all, we know that even systems with only a few rules can display astonishing behavior (e.g., the Game of Life, the Mandelbrot set, etc.). I think at heart many biologists are still stamp collectors.
TheBrummell, Sunday, September 21, 2008 3:36:00 PM
It's perfectly obvious that human beings are vastly more complex, biologically, than a fly.

I also hate statements like this. To repeat others, above, this is NOT OBVIOUS at all.

Did YOU undergo metamorphosis from a legless maggot into a walking, singing, flying wonder?
Larry Moran, Sunday, September 21, 2008 4:59:00 PM
Rileen says,

Two references on inferring AS using full-length cDNAs rather than ESTs (by the same group of people):

http://nar.oxfordjournals.org/cgi/content/full/34/14/3917

http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D104

The first study took 56,417 so-called "full-length" cDNAs and started to eliminate as many artifacts as they could find. They also eliminated all those cases where there was a single transcript at the locus. They ended up with a set of 18,297 cDNAs or 32% of the total. What this means is that many of the so-called "full-length" cDNAs were suspect.

The bottom line is that there were 6005 genes with variant mRNA due to alternative splicing. This is 25% of all genes. Several of these variants did not affect the sequence of the mature protein leaving only 3015 (12%) that did.

This is getting much closer to the number I expect, which is about 5%.

One of the genes where they detected alternative splice variants is HSPA8, a member of the HSP70 gene family—the most highly conserved genes in biology.

The authors propose a variant where 25 of the most highly conserved amino acid residues are deleted to give a functional variant. This is highly unlikely.

I conclude that their cDNA library is better than ESTs but it still contains artifacts.
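As a quick check on the figures quoted in this comment, here is a minimal sketch. The total number of genes is not stated here, so it is back-calculated from the quoted "25% of all genes" and should be read as an assumption, not a value taken from the paper.

```python
# Counts quoted in the comment above.
full_length_cdnas = 56_417
retained_cdnas = 18_297
genes_with_as = 6_005
genes_with_protein_change = 3_015

# Fraction of "full-length" cDNAs that survived filtering.
retained_fraction = retained_cdnas / full_length_cdnas

# Assumption: total gene count implied by "6005 genes ... 25% of all genes".
implied_total_genes = genes_with_as / 0.25

# Share of all (implied) genes whose variants alter the mature protein.
protein_change_fraction = genes_with_protein_change / implied_total_genes

# Share of alternatively spliced genes whose variants alter the protein.
share_of_as_genes = genes_with_protein_change / genes_with_as

print(f"cDNAs retained:              {retained_fraction:.1%}")
print(f"implied total genes:         {implied_total_genes:.0f}")
print(f"genes with altered protein:  {protein_change_fraction:.1%}")
print(f"...as a share of AS genes:   {share_of_as_genes:.1%}")
```

Under that assumption the retained fraction matches the quoted 32%, the protein-altering share lands between 12% and 13%, and about half of the alternatively spliced genes have variants that change the protein sequence.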
Anonymous, Sunday, September 21, 2008 9:35:00 PM
LM: The spread of spliceosomal processing in eukaryotes may be due to adaptations to protect the genome from accidental inserts of Group II transposons.

How does efficient splicing defend genes from disruption by transposon insertion?
Optimistix, Monday, September 22, 2008 3:30:00 AM
But I think their numbers are also low (partly) due to coverage issues - with higher coverage, they would have more loci with multiple transcripts (which would be identical in case of constitutive splicing). So the discarded cases are not necessarily errors, it's just that with one transcript at a locus, you can't conclude anything about constitutive or alternative splicing.

I would like to know how many proteins you think are there in humans, or other organisms, for that matter.
Larry Moran, Monday, September 22, 2008 9:17:00 AM
Rileen says,

But I think their numbers are also low (partly) due to coverage issues - with higher coverage, they would have more loci with multiple transcripts (which would be identical in case of constitutive splicing). So the discarded cases are not necessarily errors, it's just that with one transcript at a locus, you can't conclude anything about constitutive or alternative splicing.

You'll be able to make excuses for all studies by using such arguments. They aren't going to get us anywhere.

What I'd like to do is to examine some specific cases to see if they make sense. Do you have some specific examples of genes with alternative splice variants other than the classic examples that we can all agree on?

When I look at specific examples of genes that I'm familiar with, the proposed splice variants do not make sense. They are very unlikely to produce biologically relevant protein variants.

What I'm asking is whether that's the case with all examples. All you need to do is supply a few counter examples.

I would like to know how many proteins you think are there in humans, or other organisms, for that matter.

I think there are about 20,000 protein-encoding genes in the human genome. Many of the protein products can be post-translationally modified in various ways, so the total number of different proteins (which I assume is what you were asking about) is much higher. Every phosphorylated protein, for example, exists in at least two states: phosphorylated and unphosphorylated.

I don't know how many different variants there are but I wouldn't be surprised if the total comes to over 50,000.

Why do you ask?
Anonymous, Monday, September 22, 2008 10:22:00 AM
As if alternative splicing isn't bad enough, now trans-splicing makes a splash in Science. :)

A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells.
Science. 2008 Sep 5;321(5894):1357-61.
Li H, Wang J, Mor G, Sklar J.
http://www.sciencemag.org/cgi/content/full/321/5894/1357
Optimistix, Monday, September 22, 2008 11:12:00 AM
As I'm not a biologist, I am not quite sure which cases are implied when you say "the classic examples that we can all agree on". I presume these are the ones which were already known 10 or more years ago, and some of them can be found in textbooks.

This paper talks about cases where AS in genes encoding single-pass transmembrane (TM) proteins is predicted to remove transmembrane anchoring domains and thereby create soluble protein isoforms:

Widespread production of novel soluble protein isoforms by alternative splicing removal of transmembrane anchoring domains

Yi Xing, Qiang Xu, Christopher Lee

http://www.febsletters.org/article/S0014-5793(03)01354-1/abstract

Would you say that makes biological sense?

The question about the number of proteins in humans was because an oft-invoked argument in papers and talks is that we have "well over 100,000 proteins" but "only 20,000-25,000 genes", and so there is a "genome-proteome gap", and AS is offered as one of the mechanisms (along with post-translational modification etc.) which explain this gap. I reckon you consider this line of argument utter tosh :-)
Larry Moran, Monday, September 22, 2008 2:56:00 PM
Rileen says,

This paper talks about cases where AS in genes encoding single-pass transmembrane (TM) proteins is predicted to remove transmembrane anchoring domains and thereby create soluble protein isoforms:

The authors use a computer program to predict transmembrane (TM) domains. Then they look at all the alternative transcripts predicted by their analysis of EST data (and other data). If one of the predicted transcripts removes all or part of a predicted TM domain then they count this as evidence that the prediction is accurate.

Here's the data for the SDC4 gene encoding syndecan-4 [SDC4]

This is the example they specifically mention in their paper so it's safe to assume that this is one of the best examples.

What do you think? I assume that transcript "a" is the normal mRNA and transcript "b" is the isoform with the computer-predicted TM domain. Has there been anything published that proves that the predicted TM domain really is a transmembrane domain?

What do you think of transcripts "c" and "d"? Do you reject those as unreasonable transcripts but accept "b"?
Optimistix, Monday, September 22, 2008 4:20:00 PM
Actually it would appear that transcript a is the normal one which also encodes a TM domain (via exon 5, presumably). The AceView page you linked to mentions several papers on SDC4, e.g.:

http://jb.oxfordjournals.org/cgi/reprint/119/5/979

http://www.jbc.org/cgi/content/full/280/52/42573

And here's a page on SDC4 from another, "manually annotated" database :

http://www.hprd.org/summary?protein=08366&isoform_id=08366_1&isoform_name=

The existence of the TM domain seems well accepted, which might be why the authors chose to highlight this example.

If I got that right, then transcript b would actually correspond to the "soluble" isoform, where AS removes the TM domain that is normally present.

Transcript d seems to suggest 2 cassette exons compared to others, with one of them absent from the "usual" form altogether, and d looks really funny. I'd say they could be rare variants expressed at a very low level, you might say that's an "excuse" :-p
Sparky, Monday, September 22, 2008 6:20:00 PM
Apropos this discussion: a recent article on the Δp53 splice variant (open access), in which it is shown that the proposed splicing results in a misfolded protein highly prone to aggregation. Unless the splice fortuitously excises a whole domain, alternative assemblies of internal introns are likely to produce misfolded and useless species.