I'm working on a chapter about pervasive transcription and how it relates to the junk DNA debate. I found a short review in Nature from 2002 so I decided to see how much progress we've made in the past 15 years.

Most of our genome is transcribed at some time or another in some tissue. That's a fact we've known about since the late 1960s (King and Jukes, 1969). We didn't know it back then, but it turns out that a lot of that transcription is introns. In fact, the observation of abundant transcription led to the discovery of introns.

We have about 20,000 protein-coding genes and the average gene is 37.2 kb in length. Thus, the total amount of the genome devoted to these genes is about 23%. That's the amount that's transcribed to produce primary transcripts and mRNA. There are about 5000 noncoding genes that contribute another 2%, so genes occupy about 25% of our genome.
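As a rough check on that arithmetic, here is a minimal Python sketch. The gene count, the average gene length, and the 2% noncoding contribution come from the paragraph above. The genome size of roughly 3.2 Gb is my assumption (it is not stated above), and the names are purely illustrative.

```python
# Figures taken from the text
PROTEIN_CODING_GENES = 20_000
AVG_GENE_LENGTH_KB = 37.2
NONCODING_GENE_FRACTION = 0.02  # stated in the text, not derived here

# Assumption (not stated in the text): haploid genome of about 3.2 Gb
GENOME_SIZE_KB = 3_200_000


def protein_coding_fraction(n_genes=PROTEIN_CODING_GENES,
                            avg_len_kb=AVG_GENE_LENGTH_KB,
                            genome_kb=GENOME_SIZE_KB):
    """Fraction of the genome covered by protein-coding genes (introns included)."""
    return n_genes * avg_len_kb / genome_kb


coding = protein_coding_fraction()
total = coding + NONCODING_GENE_FRACTION

print(f"protein-coding genes: {coding:.1%} of the genome")
print(f"all genes:            {total:.1%} of the genome")
```

With these inputs, protein-coding genes come out at a bit over 23% and all genes at about 25%, which matches the figures in the text. A somewhat different genome size shifts the result by only a point or so.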
The old data from 50 years ago showed that more than 50% of the genome is transcribed. By 2002 there were studies indicating that this number could be even higher—perhaps as much as 80-90%. This is pervasive transcription although the term didn't become popular until ENCODE published their preliminary study in 2007 (Birney et al., 2007).
The question back then (2002) is the same question we ask today. What is all that RNA doing? Let's see how the question was answered in a review by Carina Dennis (Dennis, 2002).
One thing hasn't changed. The review begins with false hype about the Central Dogma. That's still common today even in top-notch scientific journals that should know better [The Central Dogma of Molecular Biology].
Biology's 'central dogma', laid down in the 1950s, states that genetic information flows from DNA to RNA to protein. Since then, numerous studies have shown that RNA does more than simply serve this intermediary function. But is it time to cast dogma aside and completely rethink the role of RNA? The answer is yes, according to a handful of geneticists. In multicellular organisms, they argue, the majority of RNA molecules are the principal actors in largely unexplored networks of gene regulation.

The opening paragraph lays out the main claim of the adaptationist camp. They believe that most of these RNAs have a function of some sort and the most likely role is in regulation of gene expression. Then, as now, the main proponent of this idea is John Mattick.
Mattick believes the discovery of abundant transcripts overthrows the Central Dogma of Molecular Biology and forces scientists to finally recognize the importance of noncoding RNA. He's wrong about the Central Dogma and he's wrong when he thinks that knowledgeable scientists didn't know about noncoding RNAs. Mattick believes that all those RNAs are involved in exquisite control of gene expression in "higher" organisms.
Why would anyone believe such a thing? It's because of the Deflated Ego Problem. Here's how it was explained in 2002.
It's an appealing idea, because comparisons of gene numbers don't seem to explain the difference between simple and complex organisms. We have only about two or three times as many protein-coding genes as the nematode Caenorhabditis elegans or the fruitfly Drosophila melanogaster, which, in turn, have only about twice as many as the yeast Saccharomyces cerevisiae.

The "problem" was this. Humans are "higher" organisms. We are way more complex than the "lower" species so many scientists expected us to have lots more genes. They were flummoxed when it turned out that we don't have many more genes than other animals. Their view of human exceptionalism was threatened. Their egos were deflated.
Now, most Sandwalk readers know that they shouldn't have been surprised for two reasons.
- The approximate number of genes in the human genome had been known for a long time [False history and the number of genes: 2016].
- The discoveries of developmental biologists in the 1980s showed that you could easily get complexity and variation by changing the timing of gene expression. The main players were protein transcription factors. You didn't need new genes.
Messy business

This is the same debate we're having today, 15 years later. There's plenty of evidence that 90% of our genome is junk, so most pervasive transcription must be junk RNA with no biological function. That view is consistent with what we know about the biochemistry of transcription (it is messy) and the weak power of natural selection in complex species. But there are still many scientists who think we have several hundred thousand genes, most of them specifying regulatory RNAs.
Many researchers believe that the process of transcribing RNA from DNA is inherently messy. “My opinion is simply that transcription might be a noisy process, and that a lot of RNAs are made for no good reason,” says Jean-Michel Claverie of the Institute of Structural Biology and Microbiology in Marseille, part of the CNRS, France's national research agency. Sean Eddy, a computational biologist at Washington University in St Louis, Missouri, suspects that the cell rationalizes the costs of running a watertight operation against allowing some leakage — and leaky transcription wins out. “It's cheap to make transcripts,” he argues. “The cell can tolerate a high level of transcriptional slop.”
The controversy is promoted actively by science writers who, with few exceptions, want to believe that most of our genome is functional and we need to explain complexity. That's why they dismiss noise and messiness as an explanation of pervasive transcription. The buzzword today is "dark matter" as in "The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA" (Kapranov et al., 2010). Since 15 years of work has failed to prove most transcripts are functional, the fall-back strategy is to label them mysterious "dark matter," implying that they must have functions but we just don't know what they are.
It's not dark matter and it's not functional. It's junk RNA produced by sloppy transcription. It may take another 15 years to convince everyone that we don't need thousands and thousands of new genes to generate complexity and that the vast majority of our genome is junk.
1. For an example of John Mattick's strange views on complexity, see: Genome Size, Complexity, and the C-Value Paradox. If you don't know about Dog's Ass Plots then this is a good time to learn!
Birney, E., Stamatoyannopoulos, J.A., Dutta, A., Guigó, R., Gingeras, T.R., Margulies, E.H., Weng, Z., Snyder, M., Dermitzakis, E.T., Thurman, R.E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [doi: 10.1038/nature05874]
Dennis, C. (2002) Gene regulation: The brave new world of RNA. Nature 418:122-124. [doi: 10.1038/418122a]
Kapranov, P., St Laurent, G., Raz, T., Ozsolak, F., Reynolds, C.P., Sorensen, P.H., Reaman, G., Milos, P., Arceci, R.J., and Thompson, J.F. (2010) The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA. BMC Biology 8:149. [doi: 10.1186/1741-7007-8-149]
King, J.L., and Jukes, T.H. (1969) Non-Darwinian evolution. Science 164:788-798. [PDF]