- A large fraction of the existing genome annotation is wrong.
- We have far more than 30,000 genes, perhaps as many as 88,000.
- About ten thousand genes use over 6 different sites for polyadenylation.
- 98% of all genes are alternatively spliced.
- Several thousand genes are transcribed from the "anti-sense" strand.
- Lots of genes don't code for proteins. In fact, most genes don't code for proteins.
There's a saying that extraordinary claims require extraordinary evidence. In this case, I'm pretty sure the "evidence" is the detection of low-abundance transcripts using highly sensitive sequencing technology. Anyone who has ever learned about DNA-binding proteins knows about non-specific binding, and knows that spurious transcription is inevitable. In order to overthrow our view of the number of genes and how they behave, you will have to convince me that you've ruled out accidental spurious transcription (junk RNA).
I think it's somewhat disingenuous to be giving a talk where you claim we have 88,000 genes and 98% of them are alternatively spliced. (The term "alternative splicing" implies biological significance and not just splicing errors.)
In order to evaluate transcriptome data we need to know the abundance of each transcript. It's not sufficient to simply report that such-and-such region of the genome was transcribed. Researchers have got to report the average number of transcripts per cell in the tissue they are analyzing. I'm betting that if we saw those data we would instantly recognize that the so-called new "genes" are producing fewer than one transcript per cell. If that's the case, those transcripts can't be biologically significant in a large mammalian cell.
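
To make the "transcripts per cell" test concrete, here is a rough sketch of the arithmetic. It assumes abundances are reported in TPM (transcripts per million) and that a typical cell holds a fixed total number of mRNA molecules. The figure of 200,000 molecules per cell is only an illustrative assumption, the function names are hypothetical, and the TPM values are made up for the example, not taken from any dataset.

```python
# Rough conversion from relative abundance (TPM) to average copies per cell.
# ASSUMPTION: total_transcripts_per_cell is an illustrative placeholder;
# the real number varies by cell type and should be measured or cited.

def copies_per_cell(tpm, total_transcripts_per_cell=200_000):
    """Estimate average transcript copies per cell from a TPM value."""
    return tpm * total_transcripts_per_cell / 1_000_000


def flag_low_abundance(tpm_by_region, threshold=1.0,
                       total_transcripts_per_cell=200_000):
    """Return regions whose estimated copies per cell fall below threshold."""
    flagged = {}
    for region, tpm in tpm_by_region.items():
        copies = copies_per_cell(tpm, total_transcripts_per_cell)
        if copies < threshold:
            flagged[region] = copies
    return flagged


if __name__ == "__main__":
    # Made-up TPM values, purely for illustration.
    example = {"known_gene": 50.0, "novel_region_A": 0.5, "novel_region_B": 2.0}
    for region, copies in flag_low_abundance(example).items():
        print(f"{region}: about {copies:.2f} copies per cell")
```

Under these assumptions, anything below 5 TPM works out to less than one copy per cell, which is exactly the kind of number that ought to appear alongside any claim of a newly discovered "gene."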