Friday, September 27, 2024

John Mattick's seminar at the University of Toronto

I just learned that John Mattick gave a seminar this morning at the Department of Cell & Systems Biology at the University of Toronto. Unfortunately, I was unable to attend.

Most Sandwalk readers will recognize Mattick as one of the few remaining vocal opponents of junk DNA. He is probably best known for his dog-ass plot but this is only one of the ways he misrepresents science.

The title of Mattick's seminar was "Enhancers are genes." Here's the abstract.

There are hundreds of thousands of highly alternatively spliced long noncoding RNAs (lncRNAs) expressed from the human genome. There are hundreds of publications describing the involvement of lncRNAs in various developmental, neurological and disease processes, including our recent work showing that a lncRNA is required for hippocampal but not cortical memory and another for spatial navigation in females. However, a conceptual framework for understanding lncRNA evolution and function is lacking. Synthesizing the evidential landscape, it appears that most lncRNAs are the products of genetic loci called enhancers, which control the spatiotemporal patterns of development, estimated to number hundreds of thousands, perhaps well over a million, in the human genome. A widely accepted model proposed in the 1980s posited that enhancers function as binding sites for transcription factors that loop to contact promoters of target genes, but it has since emerged that enhancers are transcribed in the cells in which they are active to produce (like protein-coding genes) both short bidirectional transcripts and long multi-exonic RNAs. A variety of studies have shown that enhancer-derived lncRNAs (elncRNAs) are required for enhancer function and that alternative splicing of elncRNAs alters enhancer action. The emerging picture is that elncRNAs scaffold topologically associated phase-separated chromatin domains by interaction with histone modifiers, transcription factors and other proteins containing intrinsically disordered regions (IDRs) and guide these proteins to target genes by RNA-DNA interactions. The recognition that enhancers are genes explains the g-value enigma, the cell-specific expression and rapid evolution of lncRNAs under positive selection for adaptive radiation, and the genome-wide incidence of transcription start sites, splice sites and epigenetic modifications documented by the ENCODE project. 
Moreover, the extensive alternative splicing of lncRNAs and posttranslational modifications of the IDRs in proteins required for developmental processes provides a framework for understanding the numbers and features of the majority of lncRNAs and how trillions of cell fate decisions are made accurately during human ontogeny. The next challenge is to decipher the structure-function relationships in lncRNAs.

The first sentence is not correct. What Mattick meant to say is that there are hundreds of thousands of transcripts and many of them are associated with abundant splice variants. It is extremely misleading to equate "transcripts" and "lncRNAs" because this implies that those transcripts have a known function of some sort. Similarly, the term "alternative splicing" should be reserved for those splice variants that are known to be associated with function. Most splice variants are just errors in splicing. [Splicing errors or alternative splicing?]

The second sentence is correct. There are, indeed, hundreds of publications suggesting that some transcripts have a function. Most of them fail to make such a connection.

It is not true that a "conceptual framework for understanding lncRNA evolution and function is lacking." The well-established conceptual framework is that scientists just have to recognize that most transcripts are spurious junk RNA and you need to provide solid evidence if you want to convince others that a given transcript is a functional lncRNA. That evidence has been convincingly provided for dozens of lncRNAs but "dozens" is a lot less than hundreds of thousands (see Palazzo and Lee, 2015). This is all covered in my book: Chapter 8: Noncoding Genes and Junk RNA.

Mattick goes on to state that there might be as many as one million enhancer-associated lncRNAs required for regulation. Assuming that there are 25,000 genes (a generous estimate), that means as many as 40 of these regulatory lncRNAs per gene. Really? Is there anyone other than John Mattick who thinks this is reasonable? Why would 10,000 housekeeping genes, such as the genes for glycolytic enzymes or ribosomal proteins, need such complicated regulation? Is there even a single well-documented example of a human gene that is regulated by half-a-dozen regulatory RNAs?
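For anyone who wants to check the arithmetic, here is a minimal sketch of the back-of-the-envelope calculation. The two input numbers are the ones quoted in the paragraph above; the function name is just an illustrative label, and the calculation assumes the regulatory lncRNAs are spread evenly over all genes, which is a simplification.

```python
# Rough check of the "40 regulatory lncRNAs per gene" figure.
# Both inputs come from the text above; the even spread across
# genes is a simplifying assumption, not a biological claim.

ENHANCER_LNCRNAS = 1_000_000  # upper estimate cited from the abstract
GENES = 25_000                # generous estimate of the human gene count


def regulators_per_gene(n_regulators: int, n_genes: int) -> float:
    """Average number of regulatory RNAs per gene, assuming an even spread."""
    return n_regulators / n_genes


if __name__ == "__main__":
    print(regulators_per_gene(ENHANCER_LNCRNAS, GENES))  # 40.0
```

A smaller gene count would only make the ratio larger, so 25,000 is the figure most favorable to the claim.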

It's true that there is some good solid evidence showing that transcription of the regulatory sequences of some genes may be functionally important. But there's also good evidence that some of the transcripts are simply due to spurious transcription fired off in the wrong direction from bidirectional promoters. Mattick is very fond of generalizing from a few examples; in this case, the fact that some enhancer RNAs have a function does not mean that all transcripts in enhancer regions are functional.

If there is a well-documented case for the function of a transcript in an enhancer region, then the transcribed region qualifies as a gene. You would say that there is a non-coding gene spanning this locus and the locus also includes an enhancer for a different gene. It doesn't make a lot of sense to say that the particular enhancer is a gene and it certainly doesn't make any sense to phrase it like Mattick does where he implies that all enhancers are genes.

The G-value enigma, or G-value paradox, refers to the problem that some scientists faced when they learned that humans had roughly the same number of protein-coding genes as many other species. This didn't fit with their view that humans are much more complicated than other animals so humans should have a lot more genes. The term was first described by Hahn and Wray (2002) when the first draft of the human genome was published but Mattick and his colleagues also see this as a serious problem (Taft et al., 2007). I see it as part of the Deflated Ego Problem. There's no reason to think that humans need to have lots more genes than other species. [Revisiting the deflated ego problem] [Deflated egos and the G-value paradox]

Mattick thinks that most protein-coding genes are regulated by dozens of "enhancer" genes and this resolves his deflated ego problem and confirms the ENCODE claims. Really?

Finally, according to the abstract, alternative splicing and intrinsically disordered regions in proteins help us understand the possible functions of lncRNAs and developmental biology. I wish I could have heard more about this great understanding because I didn't think there was a problem understanding development.

I hope somebody who reads this was at the seminar so they can fill me in on all the paradigms that were shifted.

Should my sister department have invited John Mattick to give a seminar? You could argue that it was a good opportunity to refute most of his crazy ideas but I'd be surprised if that actually happened during the question period after the seminar. I'm worried that there were many students at the seminar who didn't get to hear why Mattick's ideas are not widely accepted by knowledgeable scientists. I hope they read my book.


Hahn, M.W. and Wray, G.A. (2002) The g-value paradox. Evolution and Development 4:73-75. [doi: 10.1046/j.1525-142X.2002.01069.x]

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2(1-11). [doi: 10.3389/fgene.2015.00002]

Taft, R.J., Pheasant, M. and Mattick, J.S. (2007) The relationship between non‐protein‐coding DNA and eukaryotic complexity. BioEssays 29:288-299. [doi: 10.1002/bies.20544]
