I came across this paper while doing research on alternative splicing. The introduction annoyed me. It illustrates what to my mind are some serious problems with modern scholarship.
Scotti, M.M. and Swanson, M.S. (2016) RNA mis-splicing in disease. Nature Reviews Genetics 17:19–3 [doi: 10.1038/nrg.2015.3]
Here's part of the first paragraph in the paper.
Recent analysis from the Encyclopedia of DNA Elements (ENCODE) project (GRCh38, Ensembl79) indicates that most of the human genome is transcribed and consists of ~60,000 genes (~20,000 protein-coding genes, ~16,000 long non-coding RNAs (lncRNAs), ~10,000 small non-coding RNA and 14,000 pseudogenes). Although this gene inventory will change with further analysis, the number of protein-coding genes is surprisingly low given the proteomic complexity that is evident in many tissues, particularly the central nervous system (CNS). High resolution mass spectrometry studies have identified peptides encoded by most of these annotated genes, but the number of isoforms expressed from this gene set has been estimated to be at least 5–10-fold higher. For example, long-read sequence analysis of adult mouse prefrontal cortex neurexin (Nrxn) mRNAs indicates that only three Nrxn genes produce thousands of isoform variants. This diversity is primarily generated by alternative splicing, with >90% of human protein-coding genes producing multiple mRNA isoforms.
Here are some of the problems I have with this introduction. My opinions on these issues differ from those of the authors.
- I think that pseudogenes are not genes.
- I think there are NOT ~16,000 lncRNAs and ~10,000 small-noncoding RNA genes. Instead, there are approximately this many putative or predicted genes, many of which will undoubtedly turn out not to be genes. Some of them will be pseudogenes.
- I don't think there's a discrepancy between the known number of protein-coding genes and proteomic complexity; therefore, it is misleading to say that the number of protein-coding genes is "surprisingly low."
- I'm pretty sure that nobody has ever proposed a truly scientific "estimate" of isoforms showing that the number should be 5-10-fold higher than the number of genes. This is all speculation and guesswork based mostly on deflated egos.
- It is not true that >90% of human genes produce multiple mRNA isoforms by alternative splicing. What IS true is that for every human gene researchers have detected low levels of non-canonical splice events upon careful analysis of the transcriptome. We do not know whether these represent true biologically relevant alternative splicing or simply splicing errors. All available evidence suggests that the vast majority are splicing errors.
But surely there has to be a better way of expressing these opinions, one that makes it clear that the authors aren't stating facts but just their own personal views based on their own interpretation of the literature? This becomes very important if there's widespread scientific controversy over some of these opinions. (It's not so important if there's widespread agreement, or consensus, in the scientific community. In those cases, you aren't obliged to mention alternative views held by kooks.)
I believe that scientists have an ethical obligation to distinguish between fact and opinion and to make it very clear in their writings which is which. I don't know whether Scotti and Swanson know that their statements are controversial and are deliberately avoiding any mention of the controversy, or whether they actually believe that their statements are factual. Either way, we have a problem.