Sandwalk: Philip Ball's view of alternative splicing

Thursday, October 31, 2024


Genomics is a powerful tool that allows you to collect massive amounts of data that can point the way to new understanding. But it can also be abused when the results are overinterpreted. We saw an extraordinary example of this in 2012 when ENCODE made unsubstantiated claims that were quickly challenged.

I'm reminded of the caution from Sydney Brenner who warned us about genomics (Brenner, 2000) and the warning in Dan Graur's harsh critique of the 2012 ENCODE claims (Graur et al., 2013) where they said ...

The Editor-in-Chief of Science, [Bruce Alberts,] has recently expressed concern about the future of "small science," given that ENCODE-style Big Science grabs the headlines that decision makers so dearly love. Actually the main function of Big Science is to generate massive amounts of easily accessible data. The road from data to wisdom is quite long and convoluted. Insight, understanding, and scientific progress are generally achieved by "small science." ...

High-throughput genomics and the centralization of science funding have enabled Big Science to generate "high-impact false positives" by the truckload. Those involved in Big Science will do well to remember the depressingly true popular maxim: "If it is too good to be true, it is too good to be true."

We conclude that the ENCODE Consortium has, so far, failed to provide a compelling reason to abandon the prevailing understanding among evolutionary biologists according to which most of the human genome is devoid of function. The ENCODE results were predicted by one of its lead authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass media hype, and public relations may well have to be rewritten.

Philip Ball is one of many science writers who don't distinguish between the existence of a feature and whether it is meaningful. For example, he takes ENCODE workers at their word when they claim that the human genome contains far more non-coding genes than protein-coding genes, even though that claim is based on the mere existence of a transcript without any evidence that it is functional and not just junk RNA [Philip Ball says RNA may rule our genome]. Now, it may eventually turn out to be true that our genome is chock full of regulatory RNA genes but for now that speculation lacks evidence and runs contrary to the evidence of a sloppy genome that's 90% junk. It is not fair to mislead the general public by not fully presenting both sides of the controversy.¹

All researchers need to realize that the best scientific practice is produced when, like Darwin, they persistently search for flaws in their arguments.

Bruce Alberts et al. (2015)
"Self-correction in science at work"
Science 348: 1420

Graur is not alone in pointing out the difference between collecting data and the established methods of forming and testing a hypothesis. He is not the only one who reminds us that extraordinary claims require extraordinary evidence, and let's be clear that saying there are more non-coding genes than coding genes IS an extraordinary claim. All the critics of ENCODE have made it clear that it is scientific malpractice to make outrageous claims without mentioning all the evidence against your claim.

Large-scale genomics experiments have looked at all the RNAs in various tissues and mapped them to the genome. Many of them appear to come from protein-coding genes and that's not a surprise since these genes cover about 40% of the genome. Most of the transcripts are derived from introns and the surprise was that many of them seem to be splice variants that do not correspond to the canonical mRNA sequence that was previously characterized or predicted. The researchers who discovered these variants jumped to the conclusion that they must be due to alternative splicing, a conclusion that was based on the mere existence of the splice variants and not on any evidence that they were biologically significant.

One of the groups was based in a sister department to mine at the University of Toronto. They published a paper in 2008 claiming that 95% of multiexon genes undergo alternative splicing (Pan et al., 2008). This paper is widely quoted in the popular press and in the scientific literature. That's an extraordinary claim that flies in the face of common sense.

Let's see how Philip Ball handles alternative splicing. I'm quoting from his recent book How Life Works: A User's Guide to the New Biology (pp. 170-171). Judge for yourselves whether you think this science writer is doing a good job of presenting both sides of a major controversy.

Around 90% of our genes give rise to more than one mRNA by alternative splicing. It is particularly common in the brain, for reasons not fully understood. Proteins called neurexins, which control the adhesion between cells and are an essential component of the formation of synapses (the junctions of neurons), are alternatively spliced into vast numbers of different forms. A gene called Dscam1 encodes a protein that enables neurons to recognize each other so that they can avoid fusing amid the tangle of long filamentary axons ... It's thought that various "isoforms" of the Dscam1 protein are produced at random by alternative splicing, and that they act as arbitrary cell-surface labels that distinguish one axon from another. In the fruit fly, almost twenty thousand different alternatively spliced variants of the Dscam1 protein have been observed—all from a single gene. (It's not clear how many of them actually have a biological function, though.)

It is in this way that, from around twenty thousand genes, our cells can make between eighty thousand and four hundred thousand different proteins. That the number is still so uncertain testifies to how much we still don't understand about the human proteome. Alternative splicing and polyadenylation (p. 121) of mRNA shows that there is at least as much regulation going on after transcription of a gene has begun—that is, en route from RNA to protein—as there is before transcription happens, when it may be turned up or down with the involvement of regulatory sites on DNA itself.

Alternatively spliced proteins are essential components of our molecular toolkit, being mainly involved in the processes most central to the operation of complex organisms: signalling, cell communication, and regulation of development. Thanks to regulatory mechanisms, splicing is tissue-specific. The different cell types don't just have a different repertoire of genes turned on and off, but a different array of proteins made from them. It's no surprise then, to find that alternative splicing is common in multicellular eukaryotes with many tissue types, but much less so in single-celled eukaryotes (let alone prokaryotes).

Philip Ball is reporting on a widespread belief in abundant alternative splicing. This idea permeates the scientific literature, making it difficult to find skeptical viewpoints. The majority of molecular biologists are firmly convinced that almost all human protein-coding genes produce several different functional isoforms as a result of alternative splicing.

The truth is more complicated. True alternative splicing has been well-documented and the best examples have been reported in the textbooks since the 1980s. Nobody questions those examples because the protein isoforms have been detected and their functions have been demonstrated. However, most of the speculation about widespread alternative splicing is based solely on the detection of splice variants in large-scale genomics experiments. It's possible that these splice variants are just splicing errors that are readily detected in sensitive transcriptome studies.

It is wrong to just blindly assume that all those splice variants are biologically relevant and declare that 90% of our genes give rise to multiple protein isoforms. It's the same problem that we see with transcripts and transcription factor binding sites. The difference is that with transcripts we've come to recognize that many transcripts are junk RNA so nobody believes the original ENCODE claims that 80% of our genome is functional. Similarly, nobody believes the original ENCODE claim that every TF binding site represents a genuine regulatory sequence. We now know that large genomes full of junk DNA will give rise to millions of spurious transcription factor binding sites and that's why even the ENCODE researchers now refer to them as "candidate" cis-regulatory elements (cCREs).

This skepticism about transcripts and TF binding sites has not permeated the alternative splicing literature in spite of the fact that the same rules apply. The null hypothesis has to be splicing error when you detect a new splice variant. You cannot claim that this is evidence of alternative splicing without doing the hard work required to prove the case.

I write about this in my book in Chapter 6: How Many Genes? How Many Proteins? (pp. 154-169) and I've written dozens of blog posts on alternative splicing (see the list below).

The skeptical view of alternative splicing goes like this.

Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it as well as those that agree with it.

Richard Feynman (1985)
"Cargo Cult Science"
in "Surely You're Joking, Mr. Feynman!"

Respect the null hypothesis. Don't assume that every splice variant is biologically relevant. You need evidence to support such a positive claim. In the absence of such evidence the default assumption is splicing error. (Don't use the term "alternative splicing" unless it refers to a biologically relevant process.)

Understand splicing error rates. There are lots of papers on the error rate of splicing. It can be as high as 0.1% meaning that incorrectly spliced transcripts will be present in all cells. Some of these errors are due to inappropriate binding of tissue-specific splice factors so many splicing mistakes will be confined to certain cell types.

Understand the importance of concentration. Most splice variants are present at less than one copy per cell. That's a good indication that they are splicing errors and not true alternative splicing.

Can the predicted protein isoforms be detected? The answer is "no" in the vast majority of cases. Most protein-coding genes produce a single protein that's similar to the homologous protein in all other species.

Understand evolution and the importance of conservation. Are the observed splice variants present in other closely-related species? Is there evidence that they are preserved by natural selection? If the answer is "no" then splicing error is a better explanation than biological function.

Use common sense. Does it make sense that alternative splicing would produce multiple functional protein isoforms for each of the 10,000 or so housekeeping genes? Why would humans need to have multiple versions of RNA polymerase subunits or the subunits of ATP synthase? Why would we need multiple versions of every enzyme involved in amino acid metabolism or lipid biosynthesis?

"Ball is one of the most meticulous, precise science writers out there. He is the antithesis of hypey, "dumb-it-down" reporting. He is MUCH more credible than you are, Laurence."

John Horgan July, 2024

Understand junk DNA. 90% of our genome is junk. Much of that resides in introns whose size correlates well with the size of the genome. The more junk DNA you have the bigger the introns and the bigger the introns the greater the chance of splicing error.

Be wary of arguments from medical relevance. Some people argue that alternative splicing must be important because there are many genetic diseases that are caused by mistakes in splicing due to loss-of-function mutations. But many of the ones that have been studied carefully show that the genetic defect is due to an intron mutation that creates a spurious splice site. These are gain-of-function errors in junk DNA and they support the idea that splicing errors are significant.

Be concerned about bias, especially your own. Many scientists are upset about the fact that humans have "only" 20,000 protein-coding genes. They were surprised when the sequence of the human genome was published because they had not kept up with the literature on the number of genes. (See Revisiting the deflated ego problem.) If you are one of those people, you need to be careful about accepting unsubstantiated just-so stories supporting your view that humans must be a lot more complicated at the molecular level than nematodes and fruit flies. Alternative splicing is one of those stories; it seems to justify the "surprising" fact that we only have 20,000 genes by postulating that each one produces multiple proteins. In order to make sense, the argument requires that simpler species have a lot less alternative splicing but, unfortunately for the logic, it turns out that when you look just as closely for transcripts in other species (e.g. nematodes, plants) they have lots and lots of low-level splice variants just like we do. (See ad hoc rescue.)

Watch out for cherry-picking. Cherry-picking is a form of fallacy where touting the existence of individual cases is used as justification for an unwarranted extrapolation. It's commonly seen in the scientific literature where, for example, the discovery of a small bit of new functional DNA sequence is used as evidence that all junk DNA must be functional. Or, the existence of many confirmed regulatory RNAs must mean that all transcripts must be regulatory RNAs. Or, the fact that some transposons have secondarily acquired a function must mean that all transposon sequences must have some (unknown) function. Same with pseudogenes. There are genuine examples of alternative splicing. Highlighting them is evidence that the phenomenon exists but it is not evidence that all splice variants are due to alternative splicing.
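To put rough numbers on the point about splicing error rates, here is a back-of-the-envelope sketch. It takes the 0.1% figure quoted above as a per-intron error rate and assumes that each intron is spliced independently; both are simplifying assumptions on my part. The gene with 8 introns and the pool of one million transcripts are hypothetical values chosen only for illustration, and the function names are made up for this sketch.

```python
def fraction_misspliced(error_rate_per_intron: float, n_introns: int) -> float:
    """Fraction of transcripts expected to carry at least one splicing error.

    Assumes every intron is removed independently with the same error rate,
    which is a simplification made for illustration.
    """
    # Probability that all introns are spliced correctly, subtracted from 1.
    return 1.0 - (1.0 - error_rate_per_intron) ** n_introns


def expected_misspliced(total_transcripts: int,
                        error_rate_per_intron: float,
                        n_introns: int) -> float:
    """Expected number of mis-spliced transcripts in a pool of transcripts."""
    return total_transcripts * fraction_misspliced(error_rate_per_intron, n_introns)


if __name__ == "__main__":
    error_rate = 0.001       # 0.1%, the upper figure quoted in the post
    introns = 8              # hypothetical gene with 8 introns
    transcripts = 1_000_000  # hypothetical number of transcripts sampled

    frac = fraction_misspliced(error_rate, introns)
    count = expected_misspliced(transcripts, error_rate, introns)
    print(f"Fraction with at least one splicing error: {frac:.4%}")
    print(f"Expected mis-spliced transcripts per {transcripts:,}: {count:,.0f}")
```

Under these assumptions, a little under 1% of the transcripts from such a gene would be mis-spliced, which is thousands of aberrant molecules in a pool of a million. A sensitive transcriptome study would be expected to pick up many of them as low-abundance splice variants even if none of them has a function.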

Blog posts on alternative splicing

The number of splice variants in a species correlates inversely with the population size - what does that mean?

1. Ball has responded to my criticism by claiming that he has, in fact, presented views that conflict with his main message. It's true that you can comb through his book and his latest essays and find references to contrary points of view but these are never presented in a coherent manner that challenges his main message.

Brenner, S. (2000) Biochemistry strikes back. Trends in Biochemical Sciences 25:584. [doi: 10.1016/S0968-0004(00)01722-9]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A. and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5:578-590. [doi: 10.1093/gbe/evt028]

Pan, Q., Shai, O., Lee, L.J., Frey, B.J. and Blencowe, B.J. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics 40:1413-1415. [doi: 10.1038/ng.259]

3 comments :

Mitchell said...: Important post; Thursday, October 31, 2024 5:42:00 PM
gert korthof said...: Larry wrote "Use common sense. Does it make sense ..."
That's dangerous advice in evolutionary biology! Does it make sense to have introns in the first place? I would say: NO! It's an accident.; Friday, November 01, 2024 4:35:00 AM
Michael Tress said...: Agree that this is an important post. Too busy to do a long critique now, but Ball's assumption is that most of the 80,000-400,000 proteins detected (which I think will probably turn out to be correct) are "involved in the processes most central to the operation of complex organisms".

There are AS isoforms that are highly conserved, tissue-specific and involved in critical cellular roles, and many of them are generated from swapping of one tandem duplicated exon for another (a criminally ignored set of AS variants). However, the vast majority of AS isoforms are expressed in tiny quantities, have little or no evidence of conservation and are not tissue-specific. There's no way they have important roles in any cellular process, and they ought to be regarded as biological noise unless shown otherwise.

The few examples that are reported (e.g. TBXT, TRPV1) don't disprove the rule.; Monday, November 04, 2024 8:47:00 AM
