Wednesday, November 12, 2008

Genes and Straw Men

Just in case there's someone who doesn't understand the concept of "straw man," here's a good description from Wikipedia: Straw Man.
A straw man argument is an informal fallacy based on misrepresentation of an opponent's position.[1] To "set up a straw man," one describes a position that superficially resembles an opponent's actual view, yet is easier to refute. Then, one attributes that position to the opponent. For example, someone might deliberately overstate the opponent's position.[1] While a straw man argument may work as a rhetorical technique—and succeed in persuading people—it carries little or no real evidential weight, since the opponent's actual argument has not been refuted.[2]

The term is derived from the practice in ages past of using human-shaped straw dummies in combat training. In such training, a scarecrow is made in the image of the enemy, sometimes dressed in an enemy uniform or decorated in some way to vaguely resemble them. A trainee then attacks the dummy with a weapon such as a sword, club, bow or musket. Such a target is, naturally, immobile and does not fight back, and is therefore not a realistic test of skill compared to a live and armed opponent. It is occasionally called a straw dog fallacy, scarecrow argument, or wooden dummy argument.[citation needed] In the UK, it is sometimes called Aunt Sally, with reference to a traditional fairground game.
You'd be surprised how often this fallacy comes up—and it's not just IDiots who use it.

The other day I attended a seminar by Jacek Majewski of McGill University (Montreal, Quebec, Canada). The subject was alternative splicing.

As most of you already know, this is a controversial field. Many people believe that alternative splicing is very common and that 50-70% of all human genes produce multiple versions of proteins due to alternative splicing. Majewski is one of those people.

Others, and I am one of them, believe that much of the data is based on artifacts, especially expressed sequence tag (EST) artifacts. We believe that there are some well-established and well-studied examples of alternative splicing, but these represent only a small percentage of the total genes in the human genome.1 We'll call these two groups the "splicing is common" advocates and the "splicing is rare" advocates.

The "common" group likes to think of themselves as the leading edge of a paradigm shift. They believe that alternative splicing is so common that it requires a new way of looking at biology. Unfortunately, in their haste to promote the new paradigm, they often misrepresent the other side. As a matter of fact, the very existence of a legitimate scientific controversy is often deliberately overlooked because they set up a straw man that is easily refuted.

Here's an example. In Majewski's seminar he started by describing the current "dogma" of one gene-one enzyme. According to him, most biologists are wedded to the idea that each gene makes a single protein. They believe, according to Majewski, that the intermediate step of mRNA synthesis is unimportant. He even showed a slide illustrating the dogma. It represents the old paradigm.

At the end of the seminar I pointed out that we have been teaching a different version of information flow for over thirty years. I mentioned that all the leading textbooks talk about splicing and alternative splicing and, furthermore, this material has been in the textbooks for 25 years (e.g. Genes II by Benjamin Lewin published in 1983). I asked him if he actually knew any scientists who believed in the dogma that he described. His response was confusing but he didn't back down.

Why is this important? Because most of the "common" advocates focus on convincing us that alternative splicing is real rather than focusing on whether it is common. By refuting the straw man they hope to bolster their case for the prevalence of alternative splicing. But they do no such thing. Most scientists are well aware of alternative splicing and have been for decades. The dispute is not over whether it occurs but whether it is common. The straw man version of the opposition does not exist.

I was prompted to write about this form of rhetorical device by reading an article in Monday's New York Times. The article (Now: The Rest of the Genome) was written by Carl Zimmer. Most of you know what I think of Carl Zimmer. He is one of the best science writers on the planet [Carl Zimmer at Chautauqua] but this time he slipped up.

Zimmer writes about Sonja Prohaska, a bioinformatician at the University of Leipzig in Germany.
... new large-scale studies of DNA are causing her and many of her colleagues to rethink the very nature of genes. They no longer conceive of a typical gene as a single chunk of DNA encoding a single protein. “It cannot work that way,” Dr. Prohaska said. There are simply too many exceptions to the conventional rules for genes.

It turns out, for example, that several different proteins may be produced from a single stretch of DNA. Most of the molecules produced from DNA may not even be proteins, but another chemical known as RNA. The familiar double helix of DNA no longer has a monopoly on heredity. Other molecules clinging to DNA can produce striking differences between two organisms with the same genes. And those molecules can be inherited along with DNA.

The gene, in other words, is in an identity crisis.
I don't think there are any significant number of biochemists or molecular biologists who literally believe that every gene encodes a single protein. Everyone I know understands that there are ribosomal RNA genes, tRNA genes, and genes for all kinds of small RNAs. Everyone I know understands alternative splicing. (On the other hand, nobody I know thinks that epigenetics is any threat to our definition of a gene.)

If the gene has an identity crisis, which it does, it's not because of ignorance of these phenomena; it's because we can't all agree on a good definition. My own preference is to define a gene as "a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?], and I've been using that definition in my own textbooks since 1989.

It's sad to hear that up until recently Sonja Prohaska and her colleagues believed in a long-discredited definition of a gene. It suggests that throughout her undergraduate and graduate education she never heard of ribosomal RNA genes or alternative splicing. (She got her Ph.D. in 2005.) Either that or she's deliberately setting up a straw man.

Carl Zimmer goes on to describe recent work on the analysis of the human genome, especially the work done by the ENCODE project.
Encode’s results reveal the genome to be full of genes that are deeply weird, at least by the traditional standard of what a gene is supposed to be. “These are not oddities — these are the rule,” said Thomas R. Gingeras of Cold Spring Harbor Laboratory and one of the leaders of Encode.

A single so-called gene, for example, can make more than one protein. In a process known as alternative splicing, a cell can select different combinations of exons to make different transcripts. Scientists identified the first cases of alternative splicing almost 30 years ago, but they were not sure how common it was. Several studies now show that almost all genes are being spliced. The Encode team estimates that the average protein-coding region produces 5.7 different transcripts. Different kinds of cells appear to produce different transcripts from the same gene.
With all due respect to Carl, these sentences contradict what he implied earlier on. Yes, it's true that scientists have known about alternative splicing for 30 years. In other words, they have known for at least that long that the old idea about one gene-one protein is incorrect. So what was the point of letting readers think that Sonja Prohaska's personal misunderstanding of a gene has any relevance?

As I mentioned above, the scientific controversy over alternative splicing is about how common it is and not about whether modern scientists recognize its existence. And it has nothing to do with the modern understanding of a gene since for the past 20 years everyone has incorporated alternative splicing into their understanding of a gene.

Thomas Gingeras is clearly on the "common" side of the issue and not on the "rare" side. Unfortunately, Zimmer doesn't do a good job of providing balance here. A better way to describe the results would be ...
Taken at face value, some of the published results from the ENCODE project suggest that, far from being a rare event, alternative splicing may be very common. In fact, some scientists think that most of our genes produce several different proteins due to alternative splicing. They even suggest that an average gene may produce five or six different alternatively spliced transcripts.

Other scientists dispute these results, pointing out that the predicted alternatively spliced transcripts make no sense for those genes that have been well-studied. These predictions are being quietly removed from the annotated human genome database. As more and more genes are being looked at, the number of proven protein variants gets smaller and smaller.

The original predictions rely heavily on the sequences of small bits of RNA called "ESTs" and it is becoming increasingly clear that many, perhaps most, ESTs are artifacts. It is quite possible that talk of changing paradigms is premature and the number of genes exhibiting alternative splicing may be closer to what scientists thought twenty years ago.

These are interesting times in genome research and, like all new fields, the preliminary results are exciting and provocative. Who knows whether the preliminary results will lead to new ways of looking at biology? Time will tell.

1. I'm using the human genome as an example. The same arguments apply to other genomes.

[Image Credit: The Information Paradox: A Favorite Theist Logical Fallacy: The Straw Man]


  1. Small edit to make this very appropriate for yet another subject...

    Why is this important? Because most of the "common" advocates focus on convincing us that [functional non-protein-coding DNA] is real rather than focusing on whether it is common. By refuting the straw man they hope to bolster their case for the prevalence of [functional non-protein-coding DNA]. But they do no such thing. Most scientists are well aware of [functional transposable elements] and have been for decades. The dispute is not over whether it occurs but whether it is common. The straw man version of the opposition does not exist.

  2. I once searched out and read all of the first English references to and uses of Straw Man (via the OED, of course), and the definition given by Wiki is only part of the story. I had always thought this was the best description, but it is likely that the phrase has multiple origins.

    Not that this matters to your point, of course.

  3. Are you saying that the Wikipedia definition of "straw man" is, itself, a straw man?

    (sorry, someone had to say it...)

  4. There are more "uses" for alternative processing than just making different protein isoforms.

    Also, splicing is at least as error-riddled as transcription initiation. The resulting mistakes are not experimental artifacts, they are reality and they factor mightily into the overall picture of RNA biosynthesis and degradation.

  5. There is no question that people are overselling ncRNA and alternative splicing

    Regarding the real question about how common it is, it seems like it is very common:

  6. I didn't like that article either - and I hate that the whole idea of DNA regulatory elements is simply glossed over.

  7. This is shocking... I'm a 3rd year U of T student and I knew about alternative splicing and thought it was pretty common knowledge. I can't believe someone with a PhD in anything anywhere near the field of genetics, or biology in general, could not only be unaware of it but, upon finding out about it, make such grand statements on it without looking into it at all first.

    Seems like deliberate deception must be the case.