Wednesday, June 21, 2017

John Mattick still claims that most lncRNAs are functional

Most of the human genome is transcribed at some time or another in some tissue or another. The phenomenon is now known as pervasive transcription. Scientists have known about it for almost half a century.

At first the phenomenon seemed really puzzling since it was known that coding regions accounted for less than 1% of the genome and genetic load arguments suggested that only a small percentage of the genome could be functional. It was also known that more than half the genome consists of repetitive sequences that we now know are bits and pieces of defective transposons. It seemed unlikely back then that transcripts of defective transposons could be functional.

Part of the problem was solved with the discovery of RNA processing, especially splicing. It soon became apparent (by the early 1980s) that a typical protein-coding gene was stretched out over 37,000 bp, of which only 1300 bp were coding region. The rest was introns, and intron sequences appeared to be mostly junk.

Since there are about 20,000 protein-coding genes, this means that almost 25% of the genome is transcribed just to produce those 20,000 proteins. This accounts for some of the pervasive transcription. If you add in the known genes for all the noncoding RNAs, you get to about 30% of the genome.
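Those percentages follow from simple multiplication. Here's a rough sketch of the arithmetic in Python. The haploid genome size of about 3.2 billion bp is my assumption (it isn't stated above), and the calculation ignores overlapping genes and the wide variation in gene length, so treat it as a back-of-the-envelope check, nothing more.

```python
# Back-of-the-envelope check of the genome fractions quoted above.
# ASSUMPTION: haploid human genome of ~3.2 billion bp (not stated in the post).
GENOME_BP = 3_200_000_000
N_GENES = 20_000        # protein-coding genes (from the post)
GENE_SPAN_BP = 37_000   # typical gene length including introns (from the post)
CODING_BP = 1_300       # typical coding region per gene (from the post)


def genome_fraction(bp_per_gene: int, n_genes: int = N_GENES,
                    genome_bp: int = GENOME_BP) -> float:
    """Fraction of the genome covered if every gene spans bp_per_gene.

    Overlapping genes are ignored, so this is an upper-bound style estimate.
    """
    return n_genes * bp_per_gene / genome_bp


transcribed = genome_fraction(GENE_SPAN_BP)  # whole gene spans (exons + introns)
coding = genome_fraction(CODING_BP)          # coding sequence only

print(f"transcribed for protein-coding genes: {transcribed:.1%}")
print(f"coding sequence: {coding:.2%}")
```

With these inputs the sketch gives roughly 23% for gene spans and about 0.8% for coding sequence, in line with "almost 25%" and "less than 1%".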

What about the rest? The latest deep sequencing studies from ENCODE and other sources suggest that about 80% of the genome is transcribed. Are there huge numbers of undiscovered genes for noncoding RNAs?

The growing consensus is that most of this transcription is spurious or accidental, produced by inappropriate transcription initiation at random sites in the genome. This is consistent with data showing that most transcripts are present at very low concentrations in the cells where they are detected. It's also consistent with the observation that most of these transcripts come from regions of the genome that are not conserved; those regions could be junk DNA.

Another bit of evidence is largely negative. After decades of looking for function, only a relatively small proportion of those transcripts have been shown to have a biological function. Furthermore, the cell-type specificity of these transcripts is consistent with low-level spurious transcription due to inappropriate binding of cell-specific transcription factors [see How many lncRNAs are functional: can sequence comparisons tell us the answer?].

As my colleagues Alex Palazzo and Eliza Lee point out, the default, or null, hypothesis should be that these transcripts do not have a function. The onus is on those who wish to show that most of them have a function (Palazzo and Lee, 2015).

John Mattick is one of those who defend the functionality of pervasive transcription. He thinks the human genome is full of thousands of noncoding genes that exquisitely regulate the protein-coding genes. He assumes that most of the transcripts are functional just because they exist.

Mattick was awarded a prestigious prize a few years ago for his great insight (?) [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research]. Let me remind you of the citation ...
The Award Reviewing Committee commented that Professor Mattick’s “work on long non-coding RNA has dramatically changed our concept of 95% of our genome”, and that he has been a “true visionary in his field; he has demonstrated an extraordinary degree of perseverance and ingenuity in gradually proving his hypothesis over the course of 18 years.”
I was hoping that, over time, Mattick would begin to appreciate the view that most transcripts are non-functional. I was hoping that he would, at the very least, begin to understand the arguments of his opponents (I am one).

That's a forlorn hope, as his latest paper shows. Mattick and his colleagues work at the Garvan Institute of Medical Research in Sydney, Australia. The Garvan Institute is associated with the University of New South Wales. Their latest review appears in Trends in Genetics (Deveson et al., 2017). Here's the abstract ...
The combination of pervasive transcription and prolific alternative splicing produces a mammalian transcriptome of great breadth and diversity. The majority of transcribed genomic bases are intronic, antisense, or intergenic to protein-coding genes, yielding a plethora of short and long non-protein-coding regulatory RNAs. Long noncoding RNAs (lncRNAs) share most aspects of their biogenesis, processing, and regulation with mRNAs. However, lncRNAs are typically expressed in more restricted patterns, frequently from enhancers, and exhibit almost universal alternative splicing. These features are consistent with their role as modular epigenetic regulators. We describe here the key studies and technological advances that have shaped our understanding of the dimensions, dynamics, and biological relevance of the mammalian noncoding transcriptome.
Let's see how he deals with the controversy.

Most of the genome is junk. There is abundant evidence that most of our genome (~90%) is junk. That means most of pervasive transcription produces junk RNA. Mattick and his colleagues don't deal with the evidence in favor of junk DNA.

Sequence conservation. The sequences and expression of most transcripts are not conserved. This is exactly what you expect for spurious transcription of junk DNA. This doesn't bother Mattick and his colleagues because they think the functional transcripts are likely human specific and therefore not conserved.

Low abundance. The paper argues that low abundance is not a problem because many of the lncRNAs are highly concentrated in a small number of cells rather than being distributed at low levels over thousands of cells.
One of the key concerns about the biological relevance of lncRNAs has been their low abundance in tissue samples, sometimes argued to be a manifestation of "transcriptional noise." However, accumulating evidence suggests that this reflects heightened spatiotemporal precision rather than low abundance background noise.

Lack of proven function. The fact that only a few transcripts have a known function is not a problem because we have some excellent examples of functional RNAs. Furthermore, assays for function by deletion or knock-out have revealed that some previously uncharacterized lncRNAs might be functional. For example, a recent study showed that deleting 700 lncRNAs uncovered 51 (7%) that produced a change in the phenotype of cancer cells. Another knock-out study showed that 499 out of 16,401 (3%) affected cell growth. I prefer to see this glass as mostly empty.
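It helps to spell out both sides of those numbers. This minimal Python sketch just recomputes the hit rates from the counts given above and prints the fraction of tested lncRNAs that showed no detectable effect in each screen. The function name and the screen labels are hypothetical, chosen only for illustration.

```python
# Recompute the hit rates from the two screens described above.
# Counts come from the post; hit_rate and the labels are illustrative only.
def hit_rate(hits: int, tested: int) -> float:
    """Return the fraction of tested lncRNAs that produced a phenotype."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


screens = {
    "deletion study (cancer-cell phenotype)": (51, 700),
    "knock-out study (cell growth)": (499, 16_401),
}

for name, (hits, tested) in screens.items():
    rate = hit_rate(hits, tested)
    # The complement (1 - rate) is the "mostly empty" part of the glass.
    print(f"{name}: {hits}/{tested} = {rate:.1%} hits, "
          f"{1 - rate:.1%} no detectable effect")
```

Rounded, that's about 7% and 3% hits, which means roughly 93% and 97% of the tested lncRNAs showed no effect in these assays.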

Specificity. The fact that many transcripts are only found in some cell types is often used as an indication of function and that's how Mattick and his colleagues see it. However, cell specificity is also what you expect from spurious transcription. They don't mention that.

Alternative splicing. Mattick believes that most (probably all) protein-coding genes undergo alternative splicing to produce at least 10-12 different protein isoforms. To him, this is an example of fine-tuned regulation. He cites papers showing that many lncRNA precursors are also alternatively spliced to produce a variety of different forms. To his way of thinking, this analogy to protein-coding genes implies that lncRNAs are also part of a sophisticated gene regulation network.
The exon-centric architecture of lncRNAs, in which exons are recombined into a dizzying diversity of isoforms, can also be reconciled with an RNA-driven developmental program. This implies that exons may act as discrete functional domains, each with a unique and specific affinity for external biomolecules (specific protein domains, DNA motifs, etc.). Modular recombination of lncRNA exons may enable diverse and dynamic interactions, for instance by delivering a particular chromatin remodeler to particular sites in the genome at specific moments in development.
I think most splice variants in protein-coding genes are due to splicing errors and the typical protein-coding gene makes only one polypeptide. All available evidence is consistent with this view.

Since I don't believe in abundant alternative splicing in protein-coding genes, I don't think abundant splice variants in non-functional transcripts of junk DNA are significant.

The controversy over functional transcripts continues. It's disappointing that proponents of function fail to address the arguments of their opponents and it's disappointing that most of them still believe in abundant alternative splicing. It's disappointing that they ignore the arguments in favor of junk DNA. It's disappointing that they dismiss sequence conservation as an important criterion in deciding function.

This is largely a clash of worldviews. Some of us see biology as inherently sloppy and messy, and we aren't the least bit surprised about splicing errors, transcription errors, and junk DNA. Others see cells more like finely tuned Swiss watches where every molecule has a precise role to play and every molecule has a function. All you have to do is discover that function.

Deveson, I.W., Hardwick, S.A., Mercer, T.R., and Mattick, J.S. (2017) The Dimensions, Dynamics, and Relevance of the Mammalian Noncoding Transcriptome. Trends in Genetics 33:464-478. [doi: 10.1016/j.tig.2017.04.004]

Palazzo, A.F., and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2. [doi: 10.3389/fgene.2015.00002]


  1. It seems that Mattick's arguments are very much like the arguments used by adaptationists. If the cell makes an RNA molecule, then it must be functional simply by the fact that it makes the RNA molecule. The same goes for alternative splicing. I can only see this as mistaken anthropomorphizing on the part of Mattick and others.

    Enzymes and molecules have no intentions and are not 100% precise. The neat little equations that we drew in high school chemistry class were idealized formulas. 2H₂ + O₂ does not always produce 2H₂O. That reaction also produces varying amounts of hydrogen peroxide and oxygen-containing radicals. Enzymes don't react with just one substrate, and they don't always produce just one product. Transcription factors don't bind to just one DNA sequence; they can also bind to slightly different DNA sequences, even if only weakly.

    Chemistry and biology are messy processes, and Mattick seems to sweep this under the rug.

  2. One question that haunts the pervasive transcription believers remains:

    What is the ultimate advantage of such a strategy in producing in excess to discard a large fraction rather than more strictly targeting the production to the cellular needs?

    If you choose the evolutionary paradigm first and foremost, no matter where the evidence is pointing, then you have no choice but to come up with another paradigm that contradicts the very foundation your first paradigm had been built on...

    Occam's razor is never applied...

    1. Pervasive transcription isn't there because it's an advantage. It's there because natural selection can't get rid of it, because it's an unavoidable consequence of intermolecular binding. It's just a fact that a transcription factor can bind more than one particular sequence of DNA.

    2. To build on what MRR is saying, evolution rarely finds an optimum, only what is good enough. Somewhat sloppy gene regulation, or "leaky transcription" in molecular biology terms, is good enough. The metabolic burden of junk DNA is probably not that large, especially when you look at the metabolic burden for things such as locomotion. Lifting a 200 lb rock probably uses up more energy than the total amount of junk RNA you produce in a week (just a guess on my part, but probably not too far off).

      As long as junk RNA does not interfere with function, it just isn't that deleterious. On top of that, the changes needed for ultra-specific, tight gene regulation would require tons of changes to multiple proteins, and the selective pressure just isn't there for such high precision.

  3. What is their evidence for alternative splicing in lncRNA? Do they see long noncoding transcripts from particular loci with various 'introns' removed and 'exons' joined together? That would be fairly interesting, I suppose, but like you I don't see how that strongly argues for function.

    1. I'd actually expect alternatively spliced RNA, because some introns are self-splicing and you could easily see how they could sometimes fail to work, producing alternative variants of the same lncRNA.

    2. Since lncRNA is "long non-coding" RNA, I would suspect that there are no exons in lncRNA. Some people may define the entire lncRNA molecule to be a single intron. I also wouldn't doubt that lncRNA molecules would produce stem-loop structures that would be susceptible to DICER activity, or other enzymes that modify functional RNAs. However, that is what we would expect from junk RNA since random RNA sequences can produce stem-loop structures.

    3. Since lncRNA is "long non-coding" RNA, I would suspect that there are no exons in lncRNA.

      ?? Sure there are. Whenever a transcript is processed by the splicing machinery, the portions that are excised are the introns and the portions that make it to the mature product are the exons.
      Whether the mature product is translated or not is irrelevant.

  4. I find it quite extraordinary that scientists should be advised to label something which they don't fully understand as "junk" (i.e. of no interest). This doesn't seem (to me) like a scientific approach to anything.

    1. Your mistake here is to think that we don't understand it, and that, just because we know from several lines of evidence that it's mostly junk, it is therefore without interest. It is of great interest regardless of whether it is junk or not.

    2. I suspect you mean that you hypothesize that it is.

    3. No. I meant the words I used in a direct and very literal sense. No ambiguity.

    4. Well, I'm not trying to upset anyone, but I recall that there was a time when some scientists "knew" that all the non-coding DNA was junk.