Saturday, January 08, 2011

Extraordinary Claims about Human Genes

Sandra Porter of Discovering Biology in a Digital World recently attended a talk by Chris Mason of Cornell University. According to Sandra, Chris Mason made the following claims based on his analysis of RNAs from various tissues (human? mammal?). [Next Generation Sequencing adds thousands of new genes]
  1. A large fraction of the existing genome annotation is wrong.
  2. We have far more than 30,000 genes, perhaps as many as 88,000.
  3. About ten thousand genes use over 6 different sites for polyadenylation.
  4. 98% of all genes are alternatively spliced.
  5. Several thousand genes are transcribed from the "anti-sense" strand.
  6. Lots of genes don't code for proteins. In fact, most genes don't code for proteins.
I bet that every one of those claims is wrong.

There's a saying about extraordinary claims—they require extraordinary evidence. In this case, I'm pretty sure the "evidence" is the detection of low abundance transcripts using highly sensitive sequencing technology. Anyone who's ever learned about DNA binding proteins knows about non-specific binding and they know that spurious transcription is inevitable. In order to overthrow our view of the number of genes and how they behave, you will have to convince me that you've ruled out accidental spurious transcription (junk RNA).

I think it's somewhat disingenuous to be giving a talk where you claim we have 88,000 genes and 98% of them are alternatively spliced. (The term "alternative splicing" implies biological significance and not just splicing errors.)

In order to evaluate transcriptome data we need to know the abundance of the transcript. It's not sufficient to simply report that such-and-such region of the genome was transcribed. Researchers have got to report the average number of transcripts per cell in the tissue they are analyzing. I'm betting that if we saw those data we would instantly recognize that the so-called new "genes" are producing fewer than one transcript per cell. If that's the case, those transcripts can't be biologically significant in a large mammalian cell.
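To make the arithmetic concrete, here is a rough sketch of how a relative abundance could be turned into an average copy number per cell. It is not taken from the talk or from any published pipeline. It assumes abundance is expressed as transcripts per million (TPM), and the default of 200,000 mRNA molecules per cell is a hypothetical round figure chosen only for illustration. The function names are made up for this example.

```python
# Rough sketch: convert a relative abundance into average copies per cell.
# ASSUMPTION: abundance is given in transcripts per million (TPM).
# ASSUMPTION: total_mrna_per_cell is a hypothetical round number, not a
# measured value for any particular tissue.

def copies_per_cell(tpm: float, total_mrna_per_cell: float = 200_000) -> float:
    """Estimate the average number of copies of one transcript per cell."""
    if tpm < 0 or total_mrna_per_cell <= 0:
        raise ValueError("tpm must be >= 0 and total_mrna_per_cell must be > 0")
    # A transcript at X TPM makes up X out of every million mRNA molecules.
    return tpm * total_mrna_per_cell / 1_000_000


def is_below_one_copy(tpm: float, total_mrna_per_cell: float = 200_000) -> bool:
    """True if the estimate is less than one transcript per cell on average."""
    return copies_per_cell(tpm, total_mrna_per_cell) < 1.0


if __name__ == "__main__":
    # Hypothetical example values, not real measurements.
    examples = [("abundant_mRNA", 500.0), ("rare_transcript", 0.5)]
    for name, tpm in examples:
        estimate = copies_per_cell(tpm)
        flag = "below one copy per cell" if is_below_one_copy(tpm) else "one or more copies per cell"
        print(f"{name}: {estimate:.2f} copies per cell ({flag})")
```

Under these assumptions, a transcript detected at 0.5 TPM works out to about 0.1 copies per cell, or roughly one molecule in every ten cells. That is the kind of number that would have to be reported before calling a transcribed region a gene.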


  1. 98% of all genes are alternatively spliced.

    I wonder how many of these transcripts contain premature termination codons and will undergo nonsense mediated mRNA decay anyway.

  2. I'm even willing to believe the "98% of the genes are alternatively spliced" part, just because we tend to underestimate the complexities of cellular processes. But I think, as the paper you referenced in your next post pointed out, the key point here is functional relevance. If only these big numbers were thrown around with some explanation following them, rather than just to impress the audience.

  3. @Sparc

    take a look at this paper

  4. I sometimes wonder how much of this comes from errors in high-throughput methods.