Monday, September 04, 2023

John Mattick's paradigm shaft

Paradigm shifts are rare but paradigm shafts are common. A paradigm shaft is when a scientist describes a false paradigm that supposedly ruled in the past then shows how their own work overthrows that old (false) paradigm.1 In many cases, the data that presumably revolutionizes the field is somewhat exaggerated.

John Mattick's view of eukaryotic RNAs is a classic example of a paradigm shaft. At various times in the past he has declared that molecular biology used to be dominated by the Central Dogma, which, according to him, supported the concept that the only function of DNA was to produce proteins (Mattick, 2003; Morris and Mattick, 2014). More recently, he has backed off this claim a little bit by conceding that Crick allowed for functional RNAs but that proteins were the only molecules that could be involved in regulation. The essence of Mattick's argument is that past researchers were constrained by adherence to the paradigm that the only important functional molecules were proteins and that RNA served only an intermediate role in protein synthesis.

Mattick claims to have overthrown that dogmatic paradigm by discovering a huge array of regulatory RNAs that control gene expression in humans. According to him, there are so many of these regulatory RNAs that there's little room for junk DNA in our genome.

The latest expression of this point of view can be found in the book he has written with Paulo Amaral: RNA The epicenter of genetic information: A new understanding of molecular biology. You can read the entire book online for free here.

We need to discuss whether Mattick's views really represent a new understanding of molecular biology—do the textbooks need to be rewritten?—or whether this is a paradigm shaft. Here's the latest version of Mattick's (and Amaral's) concept of the importance of regulatory RNA from the book.

The central dogma has held true to this day (except for the speculative transfer of information directly from DNA to protein), but became widely interpreted, including by Watson, as DNA makes RNA makes protein with its implicit assumption, not necessarily intended by Crick, that RNA functions only as an intermediate.

The two decades 1953 to 1972 were exhilarating and the new crop of molecular biologists were rightly pleased with what had been achieved, but their self-satisfaction and hubris were palpable. The lac operon and the Central Dogma consolidated the notion that (with exceptions like the few rRNA and tRNA types) genes are synonymous with proteins, and that all genetic information, including regulatory information, is transacted by proteins, not only in bacteria but also in developmentally complex plants and animals.

The belief that genes are synonymous with proteins reflected the mechanical zeitgeist of the age. Bicycles and cars have parts, and so do organisms—proteins that carry oxygen (hemoglobin), form skin (keratins), signal energy levels (insulin) or control the activity of other genes (transcription factors), etc. It was just assumed that these 'conserved' components, whose expression is regulated by trans-acting transcription factors acting on malleable adjacent promoter-operator sequences, were enough to explain all of biology.

It's true that many scientists assume that genes encode proteins and they don't consider noncoding genes. It's also true that there are many sites on the internet that define genes as protein-coding. However, when we're talking about paradigms, we don't take into account the mistaken beliefs of the hoi polloi who don't know any better. We're talking about scientists like Francis Crick and Jim Watson and others who were actively working in the field of gene expression. They were smart scientists and they knew about noncoding genes back in the 1970s.

And anyone who didn't know about noncoding genes by the end of the 1980s wasn't paying attention.

It's just not true that the active paradigm among knowledgeable molecular biologists was that genes are synonymous with proteins and it's not true that this conclusion was implied by the Central Dogma.

I remember going to phage meetings in the early 1970s where the role of regulatory RNAs in bacteriophage λ was being actively investigated. I'm thinking specifically of the antisense RNA transcribed from the control region. Later on, the antisense regulatory RNA transcribed from Q was shown to be a true regulatory RNA. I don't recall anyone saying that regulatory RNAs were forbidden by the Central Dogma and I don't recall anyone saying that genes could not produce functional noncoding RNAs.

By the time I was a post-doc I was working near a lab run by Lucien Caro, who was studying plasmid DNA replication. DNA replication in many bacterial plasmids is controlled by a regulatory RNA and I don't recall anyone saying that regulatory RNAs were forbidden by some mysterious paradigm.

Throughout the 1980s and 1990s we learned of snRNAs, snoRNAs, miRNAs, siRNAs, and piRNAs and, again, I didn't hear any announcements about overthrowing a paradigm.

I conclude that Mattick's description of such a protein-only paradigm is false.

The next step in a paradigm shaft is to show how you have overthrown the old (false) paradigm.

This book focuses on RNA as the main player in cell and developmental biology, but also on chromatin composition and regulatory logic. While most educated in the pre-genomic era taught gene regulation is primarily carried out by proteins, this became hard to reconcile the finding that genes encoding regulatory RNAs vastly outnumber protein-coding genes in humans, and the demonstrations of widespread sequence-specific guidance of effector proteins by RNAs. The simplicity and logic of base-pairing for sense-antisense target recognition and the ability of RNA to form complex three-dimensional structures are almost as old as the double-helix itself. The existence of regulatory RNAs was hinted during the early period molecular biology by genetic observations in fruit flies and maize, and by the appearance of unexplained bands in biochemical fractionations, but these were treated as oddities or interpreted through the lens of transcription factors, until the genome projects revealed the full extent of RNA expression in plants and animals.

We highlight the pioneers and controversies that accompanied the many unexpected observations, with particular attention to those that challenge the prevailing consensus, often ignored, at least at first. The book spans the early confusion about the functions of proteins and nucleic acids, the elucidation of the double helix and the genetic code, the premature relegation of RNA to intermediary between gene and protein, the strange genomes and genetics of plants and animals, and the misguided musings that underpinned the idea of junk DNA. We chronicle the spectacular advances brought by gene cloning and genome sequencing, the small and large regulatory RNA revolutions, and the slowly dawning realization of the central role of transposon-derived sequences, intrinsically disordered proteins, 'enhancers' and the RNA directed epigenetic processes in multicellular development, which we have tried to integrate into a new framework for understanding genetic programming.

Facts are important. Mattick and Amaral tell you that "genes encoding regulatory RNAs vastly outnumber protein-coding genes in humans" but that's not true. There are only a few hundred genes for regulatory RNAs that are supported by genuine evidence of function. There are thousands of transcripts that Mattick and his colleagues would like to believe are real regulatory RNAs but so far there's no evidence to support such a claim and plenty of evidence suggesting that those transcripts are just noise or junk RNA.2

What this means is that the false paradigm has not been overthrown. This is a classic paradigm shaft of the 3rd kind (false paradigm, false overthrow, false data).

By the way, pervasive transcription has been known since the early 1970s and abundant regulatory RNAs were discovered in the 1980s. Genome projects did not contribute to those early discoveries; they merely confused the issue by publicizing the extent of extraneous junk RNA transcripts in an effort to solve the Deflated Ego Problem.

This claim that the discovery of noncoding genes is a new idea has been published before; for example in Morris and Mattick (2014) they say,

Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms.

This implies that noncoding genes were only discovered in the 21st century. That would be a big surprise to the students who read my 1994 textbook where I said ...

We will define a gene as a region of DNA that is transcribed. It is important to keep this definition in mind; it means that there can be genes that do not encode proteins since not all transcripts are messenger RNA.3

I must have missed the message about a paradigm.

You might wonder how John Mattick can get away with making these claims but that's easy to explain. The false narrative that he is promoting is widely believed and it permeates the scientific literature. Most of today's young scientists are convinced that the scientists who worked out the fundamentals of gene expression back in the 1960s and 1970s believed that all genes encoded proteins and that the concept of noncoding genes never entered their minds because it was ruled out by the Central Dogma. This false narrative is connected to opposition to junk DNA because, according to the myth, when those stupid scientists realized that only 2% of the human genome encoded protein, they just assumed that all the noncoding DNA was junk.

Here's a short list of some of those stupid scientists: Thomas Jukes, Francis Crick, Sidney Brenner, Tomoko Ohta, Jacques Monod, Motoo Kimura, Masatoshi Nei, Ford Doolittle, and Susumu Ohno.

1. The term "paradigm shaft" was first used in a comment on my blog by "Diogenes" about ten years ago. Diogenes prefers to remain anonymous but has confirmed that they originated the phrase.

2. The latest summary of gene counts in the human genome [ensembl annotation] lists 25,134 noncoding genes, down from about 45,000 as recently as a few years ago. These are actually putative genes, for the most part, especially the 18,000 lncRNA genes. And they certainly don't "vastly outnumber protein-coding genes."

3. Moran, L., Scrimgeour, K.G., Horton, H.R., Ochs, R.S., and Rawn, J.D. (1994) Biochemistry, Neil Patterson Publishers/Prentice Hall, Englewood Cliffs, NJ (USA)

Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non‐protein‐coding RNAs in complex organisms. BioEssays, 25:930-939. [doi: 10.1002/bies.10332]

Morris, K.V. and Mattick, J.S. (2014) The rise of regulatory RNA. Nature Reviews Genetics, 15(6), 423-437. [doi: 10.1038/nrg3722]


Gary S. Hurd said...

Thanks Larry.

I'll need 2 passes to absorb this one.

John Harshman said...

If only someone would write a book to set Mattick straight on all that.

Larry Moran said...

Nobody pays any attention to a book that raises the possibility that they've been working for years on spurious transcripts and nonexistent regulatory sequences. Their granting agencies might not like to hear that. Best to ignore such a book and hope that very few people read it.

Stephen Matheson said...

I skimmed the preface and first chapter and while I don't read Mattick and wish others would ignore him, I think the paradigm shaft he's digging isn't the one you quote. The authors are clear IMO about the history of the Central Dogma and they don't imply that any of the people of the time (in your list) thought that gene = protein coding. They make a predictable hash of the discussions of selfish DNA in the mid 70s and that alone shows they don't think that people were equating genes with proteins. Still, I love the phrase (new to me) 'paradigm shaft' and agree that Mattick is out to lunch.

Joe Felsenstein said...

@Larry, yes, they probably won't read that book. I suspect that they have great plans for genome-wide high-throughput multi-investigator surveys to pin down the function of all parts of the genome. Which would be a great, perhaps Nobel-worthy project. In view of that the guy in Toronto is just being a spoilsport. Best to ignore him.

Larry Moran said...

@Stephen Matheson

Here's a quote from his latest paper, "Since then [1950's], the prevailing assumptions in molecular genetics have been, with minor exceptions, that 'genes' encode proteins, ..."

It's true that in the past few years he has backed off saying that this belief is due to the Central Dogma. That's because several of us have pointed out that he doesn't understand the Central Dogma. It took almost two decades to get him to read Crick's papers instead of just putting them in his references but I guess we have to count that as progress.

Here's a quote from his 2003 paper on "Challenging the dogma."

"The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence it has been generally assumed that genes generally code for proteins, and that proteins fulfil not only most structural and catalytic but also most regulatory functions, in all cells, from microbes to mammals."

There are many other examples. He has probably apologized for screwing up the Central Dogma and credited his critics for correcting him, but I must have missed that paper. The link to his 2003 paper is in this post.

How Much Junk in the Human Genome?

Larry Moran said...

@Stephen Matheson: Diogenes coined the term "paradigm shaft" on this blog several years ago. I agree that it's a useful term and that's why I use it.