
Friday, October 11, 2024

Philip Ball says RNA may rule our genome

Philip Ball is on a roll. He has published a new book plus several articles in popular magazines, and he has appeared in a bunch of podcasts and YouTube videos. The message is always the same: he claims that it's time for a revolution in biology.

Ball's ideas are complicated and I won't go into all of them in this article. Instead, I want to focus on one of his more scientific claims; namely, the claim that genomic data has overthrown the fundamental principles of molecular biology. Let's look at his recent (May 14, 2024) article in Scientific American: Revolutionary Genetics Research Shows RNA May Rule Our Genome.¹

The subtitle of the article is "Scientists have recently discovered thousands of active RNA molecules that can control the human body" and that's the issue that I want to discuss here.

Ball begins with the same old myth that writers like him have been repeating for many years. He claims that before ENCODE most molecular biologists were really stupid. According to Philip Ball, most of us thought that coding DNA was the only functional part of the genome and most of the rest was junk DNA.

Making proteins was thought to be the genome’s primary job. Genes do this by putting manufacturing instructions into messenger molecules called mRNAs, which in turn travel to a cell’s protein-making machinery. As for the rest of the genome’s DNA? The “protein-coding regions,” [Thomas] Gingeras says, were supposedly “surrounded by oceans of biologically functionless sequences.” In other words, it was mostly junk DNA.

This is extremely misleading. Knowledgeable scientists knew that coding regions took up only about 1% of the genome but that about 10% of the genome is functional. That leaves plenty of room for regulatory sequences, non-coding genes, and other functional DNA elements.

Ball notes that the ENCODE papers published in 2012 showed that up to 75% of the genome is transcribed at some time or another. He claims that pervasive transcription was a surprise but even that's not true. The idea that a large fraction of the genome is transcribed has been common knowledge among the experts for more than 50 years. By the end of the 1970s they knew that protein-coding genes were huge because of large introns and we now know that these genes take up almost 40% of the genome. If you add in the known non-coding genes then genes cover about 45% of the genome so at least that much is regularly transcribed. Most of it is introns and introns are junk. In addition, lots of spurious transcripts arise from bits and pieces of viruses and transposons that litter the genome.

Furthermore, the idea of pervasive transcription was widely promoted in the first stage of ENCODE when they published a series of papers in 2007. There was a lot of discussion back then over whether most of those transcripts were junk. [The ENCODE publicity campaign of 2007] What this means is that by 2012, knowledgeable scientists were well aware of the fact that 45% of the genome was genes (and therefore transcribed) and much of the rest of the genome was also transcribed but the transcripts were junk RNA.
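
The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration using the rounded percentages quoted in this post. The names are hypothetical, and the last subtraction assumes that every annotated gene is counted within ENCODE's 75% figure.

```python
# Rounded percentages of the human genome quoted in this post (approximate).
PROTEIN_CODING_GENES = 40   # protein-coding genes, introns included
ALL_GENES = 45              # protein-coding plus known non-coding genes
CODING_SEQUENCE = 1         # coding regions only
ENCODE_TRANSCRIBED = 75     # upper bound reported by ENCODE in 2012


def genome_breakdown():
    """Return derived percentages of the genome; illustrative arithmetic only."""
    return {
        # Share of the genome taken up by known non-coding genes.
        "noncoding_genes": ALL_GENES - PROTEIN_CODING_GENES,
        # Share that lies within genes but is not coding sequence (mostly introns).
        "genes_minus_coding": ALL_GENES - CODING_SEQUENCE,
        # Assumption: all genes fall inside the 75% transcribed fraction.
        "transcribed_outside_genes": ENCODE_TRANSCRIBED - ALL_GENES,
    }


if __name__ == "__main__":
    for name, pct in genome_breakdown().items():
        print(f"{name}: about {pct}% of the genome")
```

On these numbers, roughly 30% of the genome would be transcribed outside of known genes, and that is the fraction whose function is in dispute.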

That's not exactly the story that Philip Ball wants to tell.

So it came as rather a shock when, in several 2012 papers in Nature, he and the rest of the ENCODE team reported that at one time or another, at least 75 percent of the genome gets transcribed into RNAs. The ENCODE work, using techniques that could map RNA activity happening along genome sections, had begun in 2003 and came up with preliminary results in 2007. But not until five years later did the extent of all this transcription become clear. If only 1 to 2 percent of this RNA was encoding proteins, what was the rest for? Some of it, scientists knew, carried out crucial tasks such as turning genes on or off; a lot of the other functions had yet to be pinned down. Still, no one had imagined that three quarters of our DNA turns into RNA, let alone that so much of it could do anything useful.

He wants you to believe that almost all of those transcripts are functional. That's the revolution that he's promoting.

Ball does mention that many of us were skeptical about function but he dismisses the criticism.

Now it looks like ENCODE was basically right. Dozens of other research groups, scoping out activity along the human genome, also have found that much of our DNA is churning out “noncoding” RNA. It doesn’t encode proteins, as mRNA does, but engages with other molecules to conduct some biochemical task. By 2020 the ENCODE project said it had identified around 37,600 noncoding genes—that is, DNA stretches with instructions for RNA molecules that do not code for proteins. That is almost twice as many as there are protein-coding genes. Other tallies vary widely, from around 18,000 to close to 96,000. There are still doubters, but there are also enthusiastic biologists such as Jeanne Lawrence and Lisa Hall of the University of Massachusetts Chan Medical School. In a 2024 commentary for the journal Science, the duo described these findings as part of an “RNA revolution.”

This is the heart of the argument. Philip Ball sides with a small number of scientists who claim that our cells produce tens of thousands of regulatory RNAs in spite of the fact that there's no evidence to support such a claim. Yes, it's true that there are many regulatory RNAs with well-defined functions (siRNAs, miRNAs, lncRNAs) but that doesn't mean that all transcripts have a biologically relevant function. The number of proven regulatory non-coding genes is far less than the number of protein-coding genes.

I think it's wrong to say that there are so many non-coding genes and to imply that this discovery counts as an "RNA revolution." He could easily have explained that this is a genuine controversy with reasonable people on both sides. He could have said that we have to wait for much more data on the possible function of transcripts before knowing whether they come from real genes or are just spurious junk RNA. He could have said that he really likes the idea that molecular biologists were stupid but that he can't prove it, yet.

He could have said all those things and been an accurate reporter. But he didn't. That's not his style.

Genomics

Genomics began in the 1990s when several genome sequencing projects were finished. The human genome project was underway but there were attempts to identify genes by isolating and sequencing cDNAs that were presumably derived from mRNAs. That attempt was a miserable failure because most of those "expressed sequence tags" (ESTs) came from junk RNA and not from mRNA.

Once the human genome project was finished, the ENCODE project was started in order to identify all the functional regions of the genome. The main characteristic of genomic science is to collect data on genomes, cells, or tissues without regard for whether the data tells us anything about specific genes. For example, ENCODE has mapped all transcripts and all transcription factor binding sites and that data serves as the starting point for more detailed analysis.

In the beginning (2007, 2012), the ENCODE researchers tended to attribute biologically relevant function to everything that they detected but following the extensive criticism they received back then they now refer to these RNAs and binding sites as "candidate" regulatory RNAs or "candidate" regulatory sites. [ENCODE and their current definition of "function"]

It's this confusion about the discoveries of genomics that prompted Sydney Brenner to say,

If one surveys the so-called ‘new way of doing biology’ that is omic science, it has several characteristics; it is based on high-throughput methods, on making observations on as much as possible at the same time, and on its reliance on technological improvements to enhance, improve and often automate many old methods. Thus arrays of oligonucleotide probes are used to measure mRNA expression rather than the old method of ‘dot blots’. I am all for these technological advances but what dismays me about omic science is its departure from the hypothesis-generating-experiment basis of scientific investigation. I have even heard claims that it will liberate us from the domination of hypothesis, that is, thinking, in biology.

This was published in an essay titled "Biochemistry strikes back" (Brenner, 2000). What Brenner meant was that it's okay to generate lots of data but in order to determine its significance you need to get down to basic biochemistry and show that a given RNA or a given binding site has a biologically relevant function. That's what he means by hypothesis-driven science as opposed to simple data collection.

Other scientists are harsher, and some of them refer to ENCODE-like genomics experiments as stamp collecting. That's not meant as a compliment.

Post-genomics

Philip Ball is fond of saying that we are now in the post-genomics era. I'm not sure I agree. Ball's version of post-genomics is to accept all of the functional claims of genomic scientists without bothering to confirm them by doing hard-core biochemistry. He assumes that almost all the transcripts and almost all the transcription factor binding sites are functional just because they exist. If he is correct, then this really is a revolution because no knowledgeable scientist thought that every gene needed to be regulated by several regulatory RNAs. I call this the "naive post-genomics" era but it's actually more like a naive acceptance of genomics results.

I'm waiting for the real post-genomics era—an era that I call "skeptical post-genomics." That will be a time when most scientists take Sydney Brenner's criticism to heart and start to look carefully for meaningful function in the genomics data. I wrote about this in the last chapter of my book "Zen and the Art of Coping with a Poorly-Designed Genome" (p. 302).

More than two decades have passed since Sydney Brenner published his essay “Biochemistry strikes back” where he warned us about the demise of biochemistry and the increasing emphasis on omics. If he were still alive I’m sure he would be disappointed that his worse case scenario—that genomics would supplant hypothesis-driven research—has come true. At the time, he expressed confidence that most of the flaws of omic science would vanish when scientists realize that their results have to be interpreted in an evolutionary framework—a framework that includes junk. That hasn’t happened.


1. This article was originally published with the title “The New Code of Life” in Scientific American Magazine Vol. 330 No. 6 (June 2024), p. 40

Brenner, S. (2000) Biochemistry strikes back. Trends in Biochemical Sciences 25:584. [doi: 10.1016/S0968-0004(00)01722-9]

10 comments :

Gregory Morgan said...

Ball is speaking at Stevens IT this week

Anonymous said...

Typo on date of Ball's Scientific American article - should be 2024, not 2014

Neil Taylor said...

I'm confused by this: "by 2012, knowledgeable scientists were well aware of the fact that 45% of the genome was genes (and therefore transcribed) and much of the rest of the genome was also transcribed but the transcripts were junk RNA".

45% of the genome was genes?

My understanding is that a gene is a stretch of DNA which is transcribed to produce a functional product (which doesn't have to be a protein!). Am I understanding correctly that most of the 45% figure includes introns? Of the 45% of the genome which you're describing as genes, what proportion produces a functional product?

Is it that 1% of the genome produces functional proteins, about 10% may be transcribed and regulatory, and the remaining 34%, though transcribed, is junk? And among the 55% outside of what you've called genes, is there again a small portion which may be regulatory, even though it isn't transcribed, with the rest junk? Thanks for clarifying; the 45% of the genome producing genes has really confused me.

Larry Moran said...

@Neil Taylor

A gene is the entire sequence that's transcribed. It includes introns even though they are mostly junk. The reason why we say that the gene has to produce a functional product is to eliminate transcribed regions that do not result in production of a functional product.

Coding regions make up about 1% of the human genome but there's another 9% that's assumed to be functional. It includes regulatory sequences, origins of replication, telomeres, centromeres, and SARs. None of these are transcribed. It also includes non-coding genes that are transcribed.

The reason for emphasizing the size of genes in my post is to remind people that almost half of the genome is genes and that means about half of the genome will be transcribed frequently. It's to counter that false notion that stupid molecular biologists thought that only exons were transcribed.

People with a degree in physics (Philip Ball) may have been surprised to learn that introns were transcribed but that's his problem.

Larry Moran said...

@Gregory Morgan I found a notice that Philip Ball is giving a Zoom talk on Nov. 6th sponsored by "The Center for Science Writings" at Stevens.

The title of the talk is "Beyond the Gene" and here's the blurb: "Acclaimed science writer Philip Ball will discuss his book “How Life Works: A User’s Guide to the New Biology,” which explores scientific challenges to gene-centric biology and argues for a bold new vision of life. Bestselling author Siddhartha Mukherjee, MD, says Ball’s book “has exciting implications for the future of biology. I could not put it down.”"

I assume the professors at the center for science writings think that Philip Ball is an example of a good science writer. That's sad and their students will suffer if they can't tell the difference between good and bad science writing. However, it's not surprising since the director of the center is John Horgan and he is a big fan of Philip Ball. You might recall that Horgan and I had a discussion about this on Facebook where Horgan said, "Ball is one of the most meticulous, precise science writers out there. He is the antithesis of hypey, "dumb-it-down" reporting. He is MUCH more credible than you are, Laurence."

Ted said...

Since non-experts find exons and introns unnecessarily difficult to remember, I propose calling them "genelets" and "junklets." Genes consist of genelets separated by junklets, but the junklets are much longer. Since the entire gene, including junklets, is transcribed, there is extensive transcription, but most of the genome is still junk.

John Harshman said...

As long as we're coining terms, maybe it would be more exciting if you started talking about the junkome.

Graham Jones said...

A long time ago I saw an analogy between introns and adverts which interrupted news stories.

Nice idea John, but ...
https://uncommondescent.com/junk-dna/you-knew-this-had-to-happen-junkomics/


John Harshman said...

Graham, sounds like an interesting paper. Unfortunately, the link is broken. Do you know the actual reference? And does the real paper refer to "junkomics"?

Graham Jones said...

A bit of digging suggests the author was Pawan K. Dhar, but I don't know which paper.