Ball's ideas are complicated and I won't go into all of them in this article. Instead, I want to focus on one of his more scientific claims; namely, the claim that genomic data has overthrown the fundamental principles of molecular biology. Let's look at his recent (May 14, 2024) article in Scientific American: Revolutionary Genetics Research Shows RNA May Rule Our Genome.1
The subtitle of the article is "Scientists have recently discovered thousands of active RNA molecules that can control the human body" and that's the issue that I want to discuss here.
Ball begins with the same old myth that writers like him have been repeating for many years. He claims that before ENCODE most molecular biologists were really stupid. According to Philip Ball, most of us thought that coding DNA was the only functional part of the genome and most of the rest was junk DNA.
Making proteins was thought to be the genome’s primary job. Genes do this by putting manufacturing instructions into messenger molecules called mRNAs, which in turn travel to a cell’s protein-making machinery. As for the rest of the genome’s DNA? The “protein-coding regions,” [Thomas] Gingeras says, were supposedly “surrounded by oceans of biologically functionless sequences.” In other words, it was mostly junk DNA.
This is extremely misleading. Knowledgeable scientists knew that coding regions took up only about 1% of the genome but that about 10% of the genome is functional. That leaves plenty of room for regulatory sequences, non-coding genes, and other functional DNA elements.
Ball notes that the ENCODE papers published in 2012 showed that up to 75% of the genome is transcribed at some time or another. He claims that pervasive transcription was a surprise but even that's not true. The idea that a large fraction of the genome is transcribed has been common knowledge among the experts for more than 50 years. By the end of the 1970s they knew that protein-coding genes were huge because of large introns and we now know that these genes take up almost 40% of the genome. If you add in the known non-coding genes then genes cover about 45% of the genome so at least that much is regularly transcribed. Most of it is introns and introns are junk. In addition, lots of spurious transcripts arise from bits and pieces of viruses and transposons that litter the genome.
Furthermore, the idea of pervasive transcription was widely promoted in the first stage of ENCODE when they published a series of papers in 2007. There was a lot of discussion back then over whether most of those transcripts were junk. [The ENCODE publicity campaign of 2007] What this means is that by 2012, knowledgeable scientists were well aware of the fact that 45% of the genome was genes (and therefore transcribed) and much of the rest of the genome was also transcribed but the transcripts were junk RNA.
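The arithmetic behind these numbers is simple enough to check. Here is a minimal Python sketch that uses only the approximate percentages quoted above; the function and key names are illustrative assumptions, and the figures are round numbers rather than measurements.

```python
# Approximate fractions quoted in this post, as percent of the human genome.
CODING = 1          # protein-coding sequence
FUNCTIONAL = 10     # all DNA assumed to be functional
ALL_GENES = 45      # protein-coding plus known non-coding genes, introns included
TRANSCRIBED = 75    # ENCODE's 2012 figure for pervasive transcription


def genome_budget():
    """Return simple differences between the quoted percentages (hypothetical helper)."""
    return {
        # Functional DNA that is not coding sequence.
        "functional_noncoding": FUNCTIONAL - CODING,
        # Everything that is not assumed to be functional.
        "junk": 100 - FUNCTIONAL,
        # Even if all functional DNA sat inside genes, at least this much
        # of the genome would be transcribed gene sequence with no function.
        "min_junk_inside_genes": ALL_GENES - FUNCTIONAL,
        # Transcription reported by ENCODE beyond what genes already cover.
        "transcribed_outside_genes": TRANSCRIBED - ALL_GENES,
    }


if __name__ == "__main__":
    for name, percent in genome_budget().items():
        print(f"{name}: {percent}%")
```

On these figures, the only part of the 75% that is not already covered by known genes is about 30% of the genome, and that is the part whose transcripts are in dispute.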
That's not exactly the story that Philip Ball wants to tell.
So it came as rather a shock when, in several 2012 papers in Nature, he and the rest of the ENCODE team reported that at one time or another, at least 75 percent of the genome gets transcribed into RNAs. The ENCODE work, using techniques that could map RNA activity happening along genome sections, had begun in 2003 and came up with preliminary results in 2007. But not until five years later did the extent of all this transcription become clear. If only 1 to 2 percent of this RNA was encoding proteins, what was the rest for? Some of it, scientists knew, carried out crucial tasks such as turning genes on or off; a lot of the other functions had yet to be pinned down. Still, no one had imagined that three quarters of our DNA turns into RNA, let alone that so much of it could do anything useful.
He wants you to believe that almost all of those transcripts are functional—that's the revolution that he's promoting.
Ball does mention that many of us were skeptical about function but he dismisses the criticism.
Now it looks like ENCODE was basically right. Dozens of other research groups, scoping out activity along the human genome, also have found that much of our DNA is churning out “noncoding” RNA. It doesn’t encode proteins, as mRNA does, but engages with other molecules to conduct some biochemical task. By 2020 the ENCODE project said it had identified around 37,600 noncoding genes—that is, DNA stretches with instructions for RNA molecules that do not code for proteins. That is almost twice as many as there are protein-coding genes. Other tallies vary widely, from around 18,000 to close to 96,000. There are still doubters, but there are also enthusiastic biologists such as Jeanne Lawrence and Lisa Hall of the University of Massachusetts Chan Medical School. In a 2024 commentary for the journal Science, the duo described these findings as part of an “RNA revolution.”
This is the heart of the argument. Philip Ball sides with a small number of scientists who claim that our cells produce tens of thousands of regulatory RNAs in spite of the fact that there's no evidence to support such a claim. Yes, it's true that there are many regulatory RNAs with well-defined functions (siRNAs, miRNAs, lncRNAs) but that doesn't mean that all transcripts have a biologically relevant function. The number of proven regulatory non-coding genes is far less than the number of protein-coding genes.
I think it's wrong to say that there are so many non-coding genes and to imply that this discovery counts as an "RNA revolution." He could easily have explained that this is a genuine controversy with reasonable people on both sides. He could have said that we have to wait for much more data on the possible function of transcripts before knowing whether they come from real genes or are just spurious junk RNA. He could have said that he really likes the idea that molecular biologists were stupid but that he can't prove it, yet.
He could have said all those things and been an accurate reporter. But he didn't. That's not his style.
Genomics
Genomics began in the 1990s when several genome sequencing projects were finished. The human genome project was underway but there were attempts to identify genes by isolating and sequencing cDNAs that were presumably derived from mRNAs. That attempt was a miserable failure because most of those "expressed sequence tags" (ESTs) came from junk RNA and not from mRNA.
Once the human genome project was finished, the ENCODE project was started in order to identify all the functional regions of the genome. The main characteristic of genomic science is to collect data on genomes, cells, or tissues without regard for whether the data tells us anything about specific genes. For example, ENCODE has mapped all transcripts and all transcription factor binding sites and that data serves as the starting point for more detailed analysis.
In the beginning (2007, 2012), the ENCODE researchers tended to attribute biologically relevant function to everything that they detected but following the extensive criticism they received back then they now refer to these RNAs and binding sites as "candidate" regulatory RNAs or "candidate" regulatory sites. [ENCODE and their current definition of "function"]
It's this confusion about the discoveries of genomics that prompted Sydney Brenner to say,
If one surveys the so-called ‘new way of doing biology’ that is omic science, it has several characteristics; it is based on high-throughput methods, on making observations on as much as possible at the same time, and on its reliance on technological improvements to enhance, improve and often automate many old methods. Thus arrays of oligonucleotide probes are used to measure mRNA expression rather than the old method of ‘dot blots’. I am all for these technological advances but what dismays me about omic science is its departure from the hypothesis-generating-experiment basis of scientific investigation. I have even heard claims that it will liberate us from the domination of hypothesis, that is, thinking, in biology.
This was published in an essay titled "Biochemistry strikes back" (Brenner, 2000). What Brenner meant was that it's okay to generate lots of data but in order to determine its significance you need to get down to basic biochemistry and show that a given RNA or a given binding site has a biologically relevant function. That's what he means by hypothesis-driven science as opposed to simple data collection.
Other scientists are harsher and some of them refer to ENCODE-like genomics experiments as stamp collecting—that's not meant as a compliment.
Post-genomics
Philip Ball is fond of saying that we are now in the post-genomics era. I'm not sure I agree. Ball's version of post-genomics is to accept all of the functional claims of genomic scientists without bothering to confirm them by doing hard-core biochemistry. He assumes that almost all the transcripts and almost all the transcription factor binding sites are functional just because they exist. If he is correct, then this really is a revolution because no knowledgeable scientist thought that every gene needed to be regulated by several regulatory RNAs. I call this the "naive post-genomics" era but it's actually more like a naive acceptance of genomics results.
I'm waiting for the real post-genomics era—an era that I call "skeptical post-genomics." That will be a time when most scientists take Sydney Brenner's criticism to heart and start to look carefully for meaningful function in the genomics data. I wrote about this in the last chapter of my book "Zen and the Art of Coping with a Poorly-Designed Genome" (p. 302).
More than two decades have passed since Sydney Brenner published his essay “Biochemistry strikes back” where he warned us about the demise of biochemistry and the increasing emphasis on omics. If he were still alive I’m sure he would be disappointed that his worse case scenario—that genomics would supplant hypothesis-driven research—has come true. At the time, he expressed confidence that most of the flaws of omic science would vanish when scientists realize that their results have to be interpreted in an evolutionary framework—a framework that includes junk. That hasn’t happened.
1. This article was originally published with the title “The New Code of Life” in Scientific American Magazine Vol. 330 No. 6 (June 2024), p. 40
Brenner, S. (2000) Biochemistry strikes back. Trends in Biochemical Sciences 25:584. [doi: 10.1016/S0968-0004(00)01722-9]
20 comments :
Ball is speaking at Stevens IT this week
Typo on date of Ball's Scientific American article - should be 2024, not 2014
I'm confused by this: "by 2012, knowledgeable scientists were well aware of the fact that 45% of the genome was genes (and therefore transcribed) and much of the rest of the genome was also transcribed but the transcripts were junk RNA".
45% of the genome was genes?
My understanding is that a gene is a stretch of DNA which is transcribed to produce a functional product (which doesn't have to be a protein!). Am I understanding correctly that most of the 45% figure includes introns? Of the 45% of the genome which you're describing as genes, what proportion produces a functional product?
Is it 1% of the genome produces functional proteins, about 10% may be transcribed and regulatory, and the remaining 34% though transcribed is junk? While among the 55% outside of what you've called genes, is there again a small portion which may be regulatory, even though it isn't transcribed, and the rest junk? Thanks for clarifying, the 45% of the genome producing genes has really confused me.
@Neil Taylor
A gene is the entire sequence that's transcribed. It includes introns even though they are mostly junk. The reason why we say that the gene has to produce a functional product is to eliminate transcribed regions that do not result in production of a functional product.
Coding regions make up about 1% of the human genome but there's another 9% that's assumed to be functional. It includes regulatory sequences, origins of replication, telomeres, centromeres, and SARs. None of these are transcribed. It also includes non-coding genes that are transcribed.
The reason for emphasizing the size of genes in my post is to remind people that almost half of the genome is genes and that means about half of the genome will be transcribed frequently. It's to counter that false notion that stupid molecular biologists thought that only exons were transcribed.
People with a degree in physics (Philip Ball) may have been surprised to learn that introns were transcribed but that's his problem.
@Gregory Morgan I found a notice that Philip Ball is giving a Zoom talk on Nov.6th sponsored by "The Center for Science Writings" at Stevens.
The title of the talk is "Beyond the Gene" and here's the blurb: "Acclaimed science writer Philip Ball will discuss his book “How Life Works: A User’s Guide to the New Biology,” which explores scientific challenges to gene-centric biology and argues for a bold new vision of life. Bestselling author Siddhartha Mukherjee, MD, says Ball’s book “has exciting implications for the future of biology. I could not put it down.”"
I assume the professors at the center for science writings think that Philip Ball is an example of a good science writer. That's sad and their students will suffer if they can't tell the difference between good and bad science writing. However, it's not surprising since the director of the center is John Horgan and he is a big fan of Philip Ball. You might recall that Horgan and I had a discussion about this on Facebook where Horgan said, "Ball is one of the most meticulous, precise science writers out there. He is the antithesis of hypey, "dumb-it-down" reporting. He is MUCH more credible than you are, Laurence."
Since non-experts find exons and introns unnecessarily difficult to remember, I propose calling them "genelets" and "junklets." Genes consist of genelets separated by junklets, but the junklets are much longer. Since the entire gene, including junklets, is transcribed, there is extensive transcription, but most of the genome is still junk.
As long as we're coining terms, maybe it would be more exciting if you started talking about the junkome.
A long time ago I saw an analogy between introns and adverts which interrupted news stories.
Nice idea John, but ...
https://uncommondescent.com/junk-dna/you-knew-this-had-to-happen-junkomics/
Graham, sounds like an interesting paper. Unfortunately, the link is broken. Do you know the actual reference? And does the real paper refer to "junkomics"?
A bit of digging suggests the author was Pawan K. Dhar, but I don't know which paper.
I’m sorry to say that Larry’s commentary here is dismayingly inaccurate.
Let’s get this one out of the way first:
“He claims that before ENCODE most molecular biologists were really stupid.”
I have never made this claim and never would – it is a pure fabrication on Larry’s part. I guess this is what John Horgan meant in his comment to Larry: credible writers don’t just make up stuff.
Which brings us to this:
“[Ball] wants you to believe that almost all of those transcripts [of 75% of the genome] are functional—that's the revolution that he's promoting.”
This too is sheer fabrication. I don’t say this in my article, nor in my book. Instead, I say pretty much what Larry seems to want me to say, but for some reason he will not admit it – which is that there is controversy about how many of the transcripts are functional. For example (from my article):
“Yet only a few of these [noncoding transcripts] have been shown to have specific functions, and how many of them really do remains an open question. “I personally don’t think all of those RNAs have an individual role,” Lawrence says…
“Brennecke advises caution about current high estimates of the number of noncoding genes. Although he agrees that such genes “have been underappreciated for a long time,” he says we should not leap to assuming that all lncRNAs have functions. Many of them are transcribed only at low levels, which is what one would expect if indeed they were just random noise. Geneticist Adrian Bird of the University of Edinburgh points out that the abundance of the vast majority of ncRNAs seems to be well below one molecule per cell. “It is difficult to see how essential functions can be exerted by an ncRNA if it is absent in most cells,” he says…
“Some in the ENCODE team now agree that not all of the 75 percent or so of human genome transcription might be functionally significant.”
I also point out that “At its roots, the controversy over noncoding RNA is partly about what qualifies a molecule as “functional.””
I hope this is enough to show that Larry has constructed a straw man that is not based on my words. I will not speculate about why, but I hope readers will see how wholly unreliable his commentary is.
So, just to recap: “Ball's version of post-genomics is to accept all of the functional claims of genomic scientists without bothering to confirm them by doing hard-core biochemistry. He assumes that almost all the transcripts and almost all the transcription factor binding sites are functional just because they exist.” This is simply false, as anyone can easily verify by reading what I wrote. Larry is on a crusade, I understand that – but it is deeply unprofessional to make stuff up in order to pursue it.
One final point: it is rather bizarre not even to mention that my article discusses the work on microRNAs that have just won this year’s Nobel, and that I interviewed and quoted one of the laureates. Perhaps these facts caused Larry some discomfort, but I’m only speculating there.
Thanks for your clarifications. Just out of curiosity, what fraction of the human genome do you think is junk DNA (and by that I mean nonfunctional)?
And just as importantly, what do you make of the case for junk DNA overall, such as the arguments from genetic load, most DNA constituting what appears to be degraded remnants of ancient transposons and viruses, interspecies genome size variation, etc?
Not entirely sure if Mikkel's question is for me, but I'm happy to answer. 10% of the genome being functional seems fairly uncontroversial. A member of ENCODE has suggested to me that a figure of 30% seems plausible. (Larry claims to have read my book and so he presumably knows this, but has chosen to suppress it.) Those feel to me like good lower and upper limits at this point. Of course, as my article says, a lot rests on the definition of functional. It seems inevitable that there is a lot of our DNA for which it would be hard to ascribe a meaningful function.
Thanks all for engaging in discussion. As a lay person it is fascinating.
Dr Ball, my impression is you are saying that a new paradigm is needed, while my understanding is that Prof Moran sees the current understanding of biochemistry developed by the likes of Crick and the modern synthesis incorporating neutral and nearly neutral drift with natural selection and population genetics as having sufficient explanatory power and hence doesn't need adding to.
He would rather not only the general public but also many practitioners could better understand the current paradigm and not abandon it for ideas which do not add anything particularly new.
This does have very real consequences because billions of pounds, dollars, etc. are being spent on grant applications that make over-extravagant claims about new paradigms when the reality is likely to be false positive after false positive as spurious transcription etc. is confused with biologically and medically relevant functionality.
Dr Ball what is the key point your new paradigm explains which is useful for working scientists beyond the "conventional" view expressed by Prof Moran etc?
@Philip Ball Your book has a very poor index - there's no entry for "junk DNA" so I had to flip through the pages to find where you discuss junk DNA. The relevant passage seems to be in the section on "Mapping the Transcriptome" (pp. 122-127) where you discuss the 2012 ENCODE results suggesting that they threatened "cherished ideas."
You point out that the ENCODE researchers claimed that 80% of the human genome is transcribed suggesting that 80% must have some biochemical function. You then spend several pages mentioning the dispute over function while emphasizing that "ENCODE was arguably at least as important as the Human Genome Project" (p. 125) because it demonstrated that non-coding genes were a big discovery that challenged the old way of thinking. Near the end of that section (p. 126) you say, "It's probably not 80 percent - ENCODE member Bradley Bernstein guesses that 30 percent might be a more realistic figure. But even then, he says, 'there is a huge amount of regulatory DNA [and thus RNA] controlling the 20,000 protein-coding genes - way more than coding DNA.'"
Now you are saying that you accept 70% junk DNA as the lower limit and you are comfortable with 90% junk DNA as the upper limit. That means you are comfortable with the idea that the ENCODE researchers were very wrong (70% junk) or spectacularly wrong (90% junk) when they announced the demise of junk DNA in 2012. It means that the ENCODE critics (I am one) were mostly correct when they challenged the ENCODE claims the day after they were published twelve years ago. It means that the smart, knowledgeable, molecular biologists of the 1970s may have been right when they said that 90% of our genome is junk and only 2% coded for protein meaning that another 8% contains non-coding genes and other functional elements such as regulatory sequences. Don't you think that deserved a mention in your books and articles, especially since some of those smart molecular biologists are Nobel Laureates?
Since you admit that the ENCODE researchers were wrong about the importance of pervasive transcription, that raises a serious question about how to define function. You never engage that issue by explaining how you decide whether a given transcript has a function or not. Why not? Why not discuss all of the important issues that have been raised in the scientific literature? Why don't you back up your claim that our genome may contain as many as 37,000 lncRNA genes (p. 125)?
Now let's look at the section on "RNA for RNA's Sake" beginning on page 115 where you say ...
"The lac operon is often presented as the paradigm of gene regulation. But while the regulatory element here is a protein (Lac1), gene regulation is more often orchestrated by small RNA molecules. In other words, these RNAs are not made as a means to another end (the synthesis of a protein) but have another function in their own right. They are said to be 'noncoding' because they don't encode protein molecules."
"When they were first observed [in the 1960s? LAM], noncoding RNAs were thought to be transcriptional errors: derived from bits of meaningless DNA transcribed by mistake. As is so often the case in science, the error is revealing. There was absolutely no a priori reason to think that cells would do something like this accidently; the only real reason to dismiss such an observation out of hand is that it doesn't fit with preconceptions. While we assumed that the only bits of DNA that mattered are the protein-coding genes, what else could noncoding RNA be but a mistake?"
How do these statements jibe with the fact that later on in the chapter you admit that most of the ENCODE transcripts are junk? And why do you keep misrepresenting history by pretending that the molecular biologists of the 70s, 80s, and 90s knew nothing about functional non-coding RNAs? And why do you say that "there was absolutely no a priori reason" to postulate junk RNA as accidents of transcription when there were, in fact, lots of excellent reasons for such criticisms and, furthermore, you accept them later on?
No Larry, this is absolutely not where we start from. I have pointed out that you have fabricated statements by me, and you have now neither backed up those claims nor retracted them. I think you owe it to your readers to show them whether or not you are arguing in good faith by addressing that issue.
You are also now apparently implying that you had not actually previously read my book, despite making all kinds of disparaging remarks about it.
In any event, you know that even if 30% of the genome is functional, that provides plenty of space for a huge number of nc genes.
I think you are being disingenuous when you challenge the claim that our genome might have as many as 37,000 lncRNA genes. You surely know very well that figures like this (and larger ones), estimated from a variety of methods, can be found in papers like this one:
https://pmc.ncbi.nlm.nih.gov/articles/PMC10055485/
I’m sure you disagree with those figures, and that’s fine. But you should not be implying that these numbers are made up.
I do not say that molecular biologists of the 70s-90s knew nothing about functional non-coding RNA. Of course they knew something about it, since they knew about rRNA and tRNA for example. But any reader who cares to look at the Google Ngram of “non-coding RNA” will get a clear sense of how perceptions about its significance have changed in the past two decades. And they should rest assured that that very informal measure is very much in line with the way and the extent it has been discussed in the scientific literature, whatever Larry says.
It is another fabrication to say that I never engage with the issue of how one defines function. That question is very clearly discussed both in my book and in my Scientific American article. Truly, it's hard to engage in this discussion at all when you just keep making stuff up.
Neil Taylor: thank you for asking this. It's actually a false dichotomy though. All I am doing is pointing out how our understanding of molecular and cell biology has continued to advance since the time of Crick et al. These advances have, in my view, been remarkable and exciting, and they make this a very vibrant time for biology. And of course it's natural that such advances in understanding change some of the ideas that came before - that's normal in science. My book explains in detail what some of these changes are. I find it peculiar that Larry seems reluctant to accept that anything significant has been discovered in these fields since the 1980s - his position seems to me to be to say "we already knew all this" (or perhaps "I already knew all this") to whatever comes along. That seems a shame.
@Philip Ball I have good reasons for being skeptical about ENCODE's claim of 37,000 lncRNA genes. It's based on all the reasons listed in the Ponting & Haerty (2022) paper that you have read. Most of those reasons were also listed in the earlier paper by Palazzo and Lee (2015). I cover them in: How many lncRNAs are functional: can sequence comparisons tell us the answer?. You may also have read the detailed discussion of this topic in Chapter 8 of my book.
I also understand the concept of the null hypothesis, do you? If you claim that something has a function (e.g. 37,000 lncRNA genes) then the onus is on you to provide evidence for that claim. What evidence did ENCODE use in order to convince you that they had identified 37,000 RNAs having a biologically significant function? Did you discuss that evidence in any of your recent publications or are you just accepting ENCODE's word? Your faith in the ENCODE researchers doesn't seem justified given their track record.
@Philip Ball says, "You are also now apparently implying that you had not actually previously read my book, despite making all kinds of disparaging remarks about it."
B.S. Just because I had to go back and find the specific sentence where you mentioned that "only" 30% may be functional according to Bradley Bernstein does not mean that I hadn't previously read your book.
Have you memorized the page numbers in my book where I refute these claims? Have you even read my book? If so, why didn't you quote all the famous scientists who think that 90% of our genome is junk?
The book, subtitled A User’s Guide to the New Biology, is written by a journalist and, predictably, doesn’t offer much new information, barely reaching an undergraduate level. Nevertheless, it makes a rather substantial claim. It’s no surprise: hype, along with the quest for authority and influence, is a driving force in much of journalism.