Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent¹ (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from nearly 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisense RNAs, miRNAs, and circular RNAs. The paper goes off the rails in the first paragraph of the Results section, where they immediately equate transcripts with genes. They report the following:

  • 19,107 mRNA genes (188 novel)
  • 18,387 lncRNA genes (13,175 novel)
  • 7,309 asRNA genes (2,519 novel)
  • 5,427 miRNAs
  • 5,427 circRNAs

As Sandwalk readers know, there's a bit of a controversy over the functionality of transcripts. I maintain that most noncoding transcripts are junk RNA resulting from spurious transcription; therefore, it is incorrect to associate each transcript with a gene [On the misrepresentation of facts about lncRNAs] [How many lncRNAs are functional?].

I'm not the only one who's skeptical about lncRNAs. I don't have time to list all the papers that discuss the controversy, but here's one of the most important ones.

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2(1-11). [doi: 10.3389/fgene.2015.00002]

The genomes of large multicellular eukaryotes are mostly comprised of non-protein coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately these RNAs have been arbitrarily assigned various names, such as “intergenic RNA,” “long non-coding RNAs” etc., which have led to some confusion in the field. Many researchers believe that these transcripts represent a vast, unchartered world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here, we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.

Here's the problem. Not only do the Ghent scientists make the mistake of equating transcripts with genes, they also completely ignore the controversy. They do not reference the Palazzo and Lee paper in their list of 90(!) references, nor do they reference any other papers that question the functionality of noncoding transcripts.

This is not right. Is it possible that they are completely unaware of the controversy in their own field? Or is there another explanation?²

I'm reminded of something said by one of my favorite scientists when discussing "cargo cult science."

Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can — if you know anything at all wrong, or possibly wrong — to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it.

      Richard Feynman "Cargo Cult Science"

The essence of cargo cult science is a complete lack of skepticism and an unwillingness to consider any explanation other than the one preferred by the cult.


1. The title of this post is a take-off on "How They Brought the Good News from Ghent to Aix." I'm aware that the citizens of Gent are mostly Flemish and that they prefer the spelling "Gent." I'm only using the English version of the name because that's the one used in the preprint.

2. I'm normally a fan of Hanlon's razor but there are times when stupidity just doesn't seem to be the right answer.

1 comment:

Mark Sturtevant said...

The famous quote from Sir Peter Medawar in a book review seems appropriate here: "...its author can be excused of dishonesty only on the grounds that before deceiving others he has taken great pains to deceive himself."