

Sunday, March 03, 2024

Nils Walter disputes junk DNA: (5) What does the number of transcripts per cell tell us about function?

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the fifth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' The fourth post makes the case that differing views on junk DNA are mainly due to philosophical disagreements.

-Nils Walter disputes junk DNA: (1) The surprise

-Nils Walter disputes junk DNA: (2) The paradigm shaft

-Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function'

-Nils Walter disputes junk DNA: (4) Different views of non-functional transcripts

Transcripts vs junk DNA

The most important issue, according to Nils Walter, is whether the human genome contains huge numbers of genes for lncRNAs and other types of regulatory RNAs. He doesn't give us any indication of how many of these potential genes he thinks exist or what percentage of the genome they cover. This is important since he's arguing against junk DNA but we don't know how much junk he's willing to accept.

There are several hundred thousand transcripts in the RNA databases. Most of them are identified as lncRNAs because they are longer than 200 bp. Let's assume, for the sake of argument, that 200,000 of these transcripts have a biologically relevant function and therefore there are 200,000 non-coding genes. A typical size might be 1000 bp, so these genes would take up about 6.5% of the genome. That's about 10 times the number of protein-coding genes and more than 6 times the amount of coding DNA.

That's not going to make much of a difference in the junk DNA debate since proponents of junk DNA argue that 90% of the genome is junk and 10% is functional. All of those non-coding genes can be accommodated within the 10%.

The ENCODE researchers made a big deal out of pervasive transcription back in 2007 and again in 2012. We can quibble about the exact numbers but let's say that 80% of the human genome is transcribed. We know that protein-coding genes occupy at least 40% of the genome, so much of this pervasive transcription is introns. If all of the presumptive regulatory genes are located in the remaining 40% (i.e. none in introns), and the average size is 1000 bp, then this works out to about 1.24 million non-coding genes. Is this reasonable? Is this what Nils Walter is proposing?
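These numbers are easy to check. Here's a minimal back-of-the-envelope sketch in Python; the genome size (~3.1 Gb) and the 1000 bp average gene length are assumptions for the purpose of illustration, not measurements.

```python
# Back-of-the-envelope check of the numbers above.
# Assumptions (not measurements): haploid genome ~3.1 Gb, average non-coding gene ~1,000 bp.
GENOME_BP = 3.1e9

# Scenario 1: 200,000 functional lncRNA genes averaging 1,000 bp each.
lnc_bp = 200_000 * 1_000
print(f"200,000 lncRNA genes -> {lnc_bp / GENOME_BP:.1%} of the genome")  # ~6.5%

# Scenario 2: pervasive transcription. Suppose ~80% of the genome is transcribed,
# protein-coding genes (mostly introns) cover ~40%, and the remaining ~40%
# consists entirely of 1,000 bp non-coding genes.
remaining_bp = 0.40 * GENOME_BP
print(f"implied non-coding genes: {remaining_bp / 1_000:,.0f}")           # ~1,240,000
```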

I think there's some confusion about the difference between large numbers of functional transcripts and the bigger picture of how much total junk DNA there is in the human genome. I wish the opponents of junk DNA would commit to how much of the genome they think is functional and what evidence they have to support that position.

But they don't. So instead we're stuck with debates about how to decide whether some transcripts are functional or junk.

What does transcript concentration tell us about function?

If most detectable transcripts are due to spurious transcription of junk DNA then you would expect these transcripts to be present at very low levels. This turns out to be true as Nils Walter admits. He notes that "fewer than 1000 lncRNAs are present at greater than one copy per cell."

This is a problem for those who advocate that many of these low abundance transcripts must be functional. We are familiar with several of the ad hoc hypotheses that have been advanced to get around this problem. John Mattick has been promoting them for years [John Mattick's new paradigm shaft].

Walter advances two of these excuses. First, he says that a critical RNA may be present at an average of one molecule per cell but it might be abundant in just one specialized cell in the tissue. Furthermore, its expression might be transient so it can only be detected at certain times during development and we might not have assayed cells at the right time. I assume he's advocating that there might be a short burst of a large number of these extremely specialized regulatory RNAs in these special cells.

As far as I know, there aren't many examples of such specialized gene expression. You would need at least 100,000 examples in order to make a viable case for function.

His second argument is that many regulatory RNAs are restricted to the nucleus, where they only need to bind to one regulatory sequence to carry out their function. This ignores the laws of mass action that govern such interactions. If you apply the same reasoning to proteins, then you would only need one lac repressor protein to shut down the lac operon in E. coli, but we've known for 50 years that this doesn't work, even though the lac repressor's association constant shows that it is one of the tightest-binding proteins known [DNA Binding Proteins]. This is covered in my biochemistry textbook on pages 650-651.1

If you apply the same reasoning to mammalian regulatory proteins then it turns out that you need 10,000 transcription factor molecules per nucleus in order to ensure that a few specific sites are occupied. That's not only because of the chemistry of binary interactions but also because the human genome is full of spurious sites that resemble the target regulatory sequence [The Specificity of DNA Binding Proteins]. I cover this in my book in Chapter 8: "Noncoding Genes and Junk RNA" in the section titled "On the important properties of DNA-binding proteins" (pp. 200-204). I use the estrogen receptor as an example based on calculations that were done in the mid-1970s. The same principles apply to regulatory RNAs.
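To see why abundance matters, here's an illustrative equilibrium (mass action) estimate of how often a single target site would be occupied. The nuclear volume and dissociation constant below are assumed round numbers, not values from Walter's paper or from my book, and the sketch ignores competition from the millions of spurious near-consensus sites, which makes the real requirement even higher.

```python
# Illustrative mass-action sketch (assumed numbers, not data from any source).
AVOGADRO = 6.022e23
NUCLEAR_VOLUME_L = 5e-13   # ~0.5 picolitres (assumption)
KD_SPECIFIC_M = 1e-9       # 1 nM dissociation constant for the specific site (assumption)

def site_occupancy(n_molecules: int, kd: float = KD_SPECIFIC_M) -> float:
    """Fraction of time one specific site is bound (simple Langmuir isotherm)."""
    conc = n_molecules / (AVOGADRO * NUCLEAR_VOLUME_L)   # molar concentration in the nucleus
    return conc / (conc + kd)

for n in (1, 100, 10_000):
    print(f"{n:>6} molecules per nucleus -> site occupied {site_occupancy(n):.0%} of the time")
# ~0% for a single molecule, ~25% for 100, ~97% for 10,000
```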

This is a disagreement based entirely on biochemistry and molecular biology. There aren't enough examples (evidence) to make the first argument convincing and the second argument makes no sense in light of what we know about the interactions between molecules inside of the cell (or nucleus).

Note: I can almost excuse the fact that Nils Walter ignores my book on junk DNA, my biochemistry textbook, and my blog posts, but I can't excuse the fact that his main arguments have been challenged repeatedly in the scientific literature. A good scientist should go out of their way to seek out objections to their views and address them directly.


1. In addition to the thermodynamic (equilibrium) problem, there's a kinetic problem. DNA binding proteins can find their binding sites relatively quickly by one dimensional diffusion—an option that's not readily available to regulatory RNAs [Slip Slidin' Along - How DNA Binding Proteins Find Their Target].

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Thursday, July 28, 2016

False history and the number of genes: 2016

There's an article about junk DNA in the latest issue of New Scientist. The title is: You are junk: Why it’s not your genes that make you human. The author is Colin Barras, a science writer from Michigan with a Ph.D. in paleontology.

He begins with .....
IT WAS a discovery that threatened to overturn everything we thought about what makes us human. At the dawn of the new millennium, two rival teams were vying to be the first to sequence the human genome. Their findings, published in February 2001, made headlines around the world. Back-of-the-envelope calculations had suggested that to account for the sheer complexity of human biology, our genome should contain roughly 100,000 genes. The estimate was wildly off. Both groups put the actual figure at around 30,000. We now think it is even fewer – just 20,000 or so.

"It was a massive shock," says geneticist John Mattick. "That number is tiny. It’s effectively the same as a microscopic worm that has just 1000 cells."

Saturday, April 07, 2018

Required reading for the junk DNA debate

This is a list of scientific papers on junk DNA that you need to read (and understand) in order to participate in the junk DNA debate. It's not a comprehensive list because it's mostly papers that defend junk DNA and refute arguments for massive amounts of function. The only exception is the paper by Mattick and Dinger (2013).1 It's the only anti-junk paper that attempts to deal with the main evidence for junk DNA. If you know of any other papers that make a good case against junk DNA then I'd be happy to include them in the list.

If you come across a publication that argues against junk DNA, then you should immediately check the reference list. If you do not see some of these references in the list, then don't bother reading the paper because you know the author is not knowledgeable about the subject.

Brenner, S. (1998) Refuge of spandrels. Current Biology, 8:R669-R669. [PDF]

Brunet, T.D., and Doolittle, W.F. (2014) Getting “function” right. Proceedings of the National Academy of Sciences, 111:E3365-E3365. [doi: 10.1073/pnas.1409762111]

Casane, D., Fumey, J., et Laurenti, P. (2015) L’apophénie d’ENCODE ou Pangloss examine le génome humain. Med. Sci. (Paris) 31: 680-686. [doi: 10.1051/medsci/20153106023] [The apophenia of ENCODE or Pangloss looks at the human genome]

Cavalier-Smith, T. (1978) Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. Journal of Cell Science, 34(1), 247-278. [PDF]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Doolittle, W.F., Brunet, T.D., Linquist, S., and Gregory, T.R. (2014) Distinguishing between “function” and “effect” in genome biology. Genome biology and evolution 6, 1234-1237. [doi: 10.1093/gbe/evu098]

Doolittle, W.F., and Brunet, T.D. (2017) On causal roles and selected effects: our genome is mostly junk. BMC biology, 15:116. [doi: 10.1186/s12915-017-0460-9]

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Graur, D. (2017) Rubbish DNA: The functionless fraction of the human genome. In Evolution of the Human Genome I (pp. 19-60). Springer. [doi: 10.1007/978-4-431-56603-8_2 (book)] [PDF]

Graur, D. (2017) An upper limit on the functional fraction of the human genome. Genome Biology and Evolution, 9:1880-1885. [doi: 10.1093/gbe/evx121]

Graur, D., Zheng, Y., Price, N., Azevedo, R. B., Zufall, R. A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Graur, D., Zheng, Y., and Azevedo, R.B. (2015) An evolutionary classification of genomic function. Genome Biology and Evolution, 7:642-645. [doi: 10.1093/gbe/evv021]

Gregory, T. R. (2005) Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics, 6:699-708. [doi: 10.1038/nrg1674]

Haerty, W., and Ponting, C.P. (2014) No Gene in the Genome Makes Sense Except in the Light of Evolution. Annual review of genomics and human genetics, 15:71-92. [doi:10.1146/annurev-genom-090413-025621]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC biology, 11:58. [doi:10.1186/1741-7007-11-58]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Mattick, J. S., and Dinger, M. E. (2013) The extent of functionality in the human genome. The HUGO Journal, 7:2. [doi: 10.1186/1877-6566-7-2]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in biology and medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Niu, D. K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and biophysical research communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Ohno, S. (1972) An argument for the genetic simplicity of man and other mammals. Journal of Human Evolution, 1:651-662. [doi: 10.1016/0047-2484(72)90011-5]

Ohno, S. (1972) So much "junk" in our genome. In H. H. Smith (Ed.), Evolution of genetic systems (Vol. 23, pp. 366-370): Brookhaven symposia in biology.

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLOS Genetics, 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

Thomas Jr, C.A. (1971) The genetic organization of chromosomes. Annual review of genetics, 5:237-256. [doi: 10.1146/annurev.ge.05.120171.001321]


1. The paper by Kellis et al. (2014) is ambiguous. It's clear that most of the ENCODE authors are still opposed to junk DNA even though the paper is mostly a retraction of their original claim that 80% of the genome is functional.

Saturday, October 13, 2018

The great junk DNA debate


I've been talking to philosophers lately about the true state of the junk DNA controversy. I imagine what it would be like to stage a great debate on the topic. It's easy to come up with names for the pro-junk side: Dan Graur, Ford Doolittle, Sean Eddy, Ryan Gregory, etc. It's hard to think of any experts who could defend the idea that most of our genome is functional. The only scientist I can think of who would accept such a challenge is John Mattick but let's imagine that he could find three others to join him in the great debate.

I claim that the debate would be a rout for the pro-junk side. The data and the theories are all on the side of those who would argue that 90% of our genome is junk. I don't think the functionalists could possibly defend the idea that most of our genome is functional. What do you think?

Assuming that I'm right, why is it that the average scientist doesn't know this? Why do they still believe there's a good case for function when none of the arguments stand up to close scrutiny? And why are philosophers not conveying the true state of the controversy to their readers? I'm told that an anti-junk philosopher like Evelyn Fox Keller is held in high regard even though her arguments are easy to refute [When philosophers talk about genomes]. I'm told that John Mattick is highly respected in philosophy circles even though knowledgeable scientists have little use for his writings.

Can readers help me identify papers by philosophers of science that come down on the side of junk DNA and conclude that experts like Graur, Doolittle, and others are almost certainly correct?


Image Credit: The cartoon is by Tom Gauld and it was published online at The New York Times Magazine website. I hope they will consider it fair use on an educational blog. See: Junk DNA comments in the New York Times Magazine.

Wednesday, July 08, 2009

Junk DNA and the Scientific Literature

 
A discussion about junk DNA has broken out in the comments to Monday's Molecule #128: Winners.

Charlie Wagner, an old talk.origins fan, wonders why junk DNA advocates are still around given that there have been several recent papers questioning the idea that most of our genome is junk.

Charlie asks ...
So why are Larry and many others still clinging to the myth of "junk DNA"? Do they not read the literature?
Of course we read the literature, Charlie, but unlike you we read all of the literature. You can't just pick out the papers that support your position and assume that the question has been settled.

The skill in reading the scientific literature is to put things into perspective and maintain a certain degree of skepticism. It's just not true that everything published in scientific journals is correct. An important part of science is challenging the consensus and many scientists try to make their reputation by coming up with interpretations that break new ground. The success of science depends on the few that are correct but let's not forget that most of them turn out to be wrong.

THEME

Genomes & Junk DNA
The trick is to recognize the new ideas that may be on to something and ignore those that aren't. This isn't easy but experienced scientists have a pretty good track record. Inexperienced scientists may not be able to distinguish between legitimate challenges to dogma and ones that are frivolous. The problem is even more severe for non-scientists and journalists. They are much more likely to be sucked in by the claims in the latest paper—especially if it's published in a high profile journal.

Lots of scientists don't like the idea of junk DNA because it doesn't fit into their view of how evolution works. They gleefully announce the demise of junk DNA whenever another little bit of noncoding DNA is discovered to have a function. They also attach undue significance to recent studies showing that a large part of mammalian genomes is transcribed at one time or another, in spite of the fact that this phenomenon has been known for decades and is perfectly consistent with what we know about spurious transcription.

I've addressed many of the specific papers in previous postings. You can review my previous postings by clicking on the Theme Box URL. The bottom line is "don't trust everything you read in the recent scientific literature."

Another good rule of thumb is never trust any paper that doesn't give you a fair and accurate summary of the "dogma" they are opposing. When you challenge the concept of junk DNA, for example, it's not good enough to just present a piece of new evidence that may not fit the current "dogma." You also have to deal with all the evidence that was used to create the consensus view in the first place and show how it can be better explained by your new model. A good place to start is The Onion Test.


The figure is from Mattick (2004), an excellent example of what I'm talking about. This is a paper attacking the current consensus on junk DNA, but in doing so it uses a figure that reveals an astonishing lack of understanding of genomes. This makes everything else in the paper suspect. The figure was chosen by Ryan Gregory as the classic example of a Dog's Ass Plot.

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Sci Am. 291:60-67.

Monday, September 11, 2017

What's in Your Genome?: Chapter 4: Pervasive Transcription (revised)

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes ]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?].

Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs. I've finally got a respectable draft of this chapter. This is an updated summary—the first version is at: What's in Your Genome? Chapter 4: Pervasive Transcription.

Wednesday, August 17, 2011

Don Johnson


Don Johnson has written a book that I'm probably going to have to buy (and read) if I ever hope to understand Intelligent Design Creationism.

Who is Don Johnson? Here's what it said on Uncommon Descent a few months ago [Why one scientist checked out of Darwinism].
The author worked for ten years as a Senior Research Scientist in the medical and scientific instrument field. The complexity of life came to the forefront during continued research, especially when his research group was involved with recombinant DNA during the late 1970′s. … After several years as an independent consultant in laboratory automation and other computer fields, he began a 20-year career in university teaching, interrupted briefly to earn a second Ph.D. in Computer and information Sciences from the University of Minnesota. Over time, the author began to doubt the natural explanations that had been so ingrained. It was science, and not his religion, that caused his disbelief in the explanatory powers of nature in a number of key areas including the origin and fine-tuning of mass and energy, the origin of life with its complex information content, and the increase in complexity in living organisms. This realization was not achieved easily, as he had to admit that he had been duped into believing concepts that were scientifically unfounded. The fantastic leaps of faith required to accept the natural causes in these areas demand a scientific response to the scientific-sounding concepts that in fact have no known scientific basis.”
Sounds like a typical run-of-the-mill creationist. He has several of the common characteristics of Intelligent Design Creationist proponents: (1) religion, (2) a background in engineering and/or computer science, (3) no obvious expertise in evolutionary biology, (4) multiple Ph.D.s. I'm really intrigued by the fact that so many IDiots have more than one Ph.D. because I hang out with real scientists all the time and none of them have ever felt the need to be a graduate student more than once in their lives.

Why is this book interesting? Well, for one thing, there's this excerpt from Don Johnson's website [Science Integrity (sic)].
"In the absolute sense, one cannot rule out design of anything since a designer could design something to appear as if it weren’t designed. For example, one may not be able to prove an ordinary-looking rock hadn’t been designed to look as if it were the result of natural processes. The 'necessity of design,' however, is falsifiable. To do so, merely prove that known natural processes can be demonstrated (as opposed to merely speculated from unknown science) to produce: the fine-tuning empirically detectable in the Universe, life from non-life (including the information and its processing systems), the vast diversity of morphology suddenly appearing in the Cambrian era, and the increasing complexity moving up the tree of life (with the accompanying information increase and irreducibly complex systems). If those can be demonstrated with known science, the 'necessity of design' will have been falsified in line with using Occam’s Razor principles for determining the most reasonable scenarios. If the 'necessity of design' is falsified, some may continue to BELIEVE in design, but ID would no longer be appropriate as science." (p. 92)
Isn't that cool? It absolves Intelligent Design Creationism from any burden of proof since things are said to be designed unless you can prove the negative. If real scientists can't prove beyond a shadow of doubt that life came from non-life then design can't be falsified and must be true.

It doesn't matter how many times we can demonstrate that some things evolved, that still doesn't demonstrate that evolution is true. We can only do that if we fill in the most famous gaps existing in the early 21st century. That's the only way to falsify Intelligent Design Creationism. One of the ironies is that there's really no explanation to falsify other than "it has to be designed." This is quite clever. By refusing to offer an explanation of how life began, or how animal diversity arose 500 million years ago, the IDiots insulate themselves from the same criticism they level at evolutionary explanations.

I was prompted to write about Don Johnson after reading another excerpt from his book, one that particularly impressed Denyse O'Leary. She posted it on Uncommon Descent: What will be the next time and money-wasting error Darwinism leads scientists into?1
Researchers are discovering that what had been dismissed as evolution’s relics are actually vital to life. What used to be considered evidence for neo-Darwinism gene-formation mechanism can no longer be use as such evidence. In this case, neo-Darwinism has been a proven science inhibitor as it postponed serious investigation of the non-coding DNA within the genome, which was “one of the biggest mistakes in the history of molecular biology” [John Mattick, BioEssays, 2003 930-939].” This is reminiscent of the classification of 86 (later expanded to 180) human organs as “vestigial” that Robert Wiedersheim (1893) believed “lost their original physiological significance.” in that they were vestiges of evolution. Functions have since been discovered for all 180 organs that were thought to be vestigial, including the wings of flightless birds, the appendix, and the ear muscles of humans.”
This is more than a little confusing since the statement is wrong about the scientific facts. But even more interesting is the implication that the presence of junk DNA and/or vestigial organs is a threat to Intelligent Design Creationism. What kind of threat? Here's how Denyse O'Leary describes it.
The explicit reason for both the junk DNA error and the vestigial organs error was the need to find evidence for Darwinism in the form of stuff in life forms that doesn’t work. Without that need, these errors would not have been made.
Setting aside the lie about these being errors, let's try and see why this is such a big deal for the IDiots.

As we saw from the first quotation, everything is assumed to be designed unless we can prove that the "big four" have a purely natural explanation. So why would the IDiots be concerned about some little fish like junk DNA and vestigial organs? If a large part of our genome turns out to be junk and at least one organ turns out to be truly vestigial, does this mean Intelligent Design Creationism is falsified?

Not bloody likely. The real issue here is not whether Intelligent Design Creationism has a better explanation for the organization of the human genome. It doesn't. The real issue is that these topics can be used to discredit science and evolutionary biologists. (Hence, the title of the articles.)

As I point out in class, this is the 21st century and everyone needs to have science on their side. This includes the IDiots and the climate change deniers. They can't just take the position that they are opposed to science—even though they are. That strategy hasn't worked since Darwin.

So, what do you do when the science seems to refute your claims? You resort to the only option available, attack the science and discredit the messengers. That's why we see so many stories about evil "Darwinists" and that's why people like Denyse O'Leary pounce on any opportunity to point out errors and mistakes in the scientific literature. And if you can't find any real mistakes you can always just make them up.

Intelligent Design Creationism is not about proposing alternative explanations. It's about attacking evolution and evolutionary biologists. Don't believe me? Just look at the books and the blogs. Something like 99.9% of what's written by the IDiots is attacking evolution and science. When's the last time you ever saw anything explained by Intelligent Design Creationism?


1. Aren't you glad that Denyse O'Leary is a professional journalist? Can you imagine what her titles might look like if she didn't have professional training?

Tuesday, October 29, 2013

The Khan Academy and AAMC Teach the Central Dogma of Molecular Biology in Preparation for the MCAT

Here's a presentation by Tracy Kovach, a 3rd year medical student at the University of Virginia School of Medicine. Sandwalk readers will be familiar with my view of Basic Concepts: The Central Dogma of Molecular Biology and the widespread misunderstanding of Crick's original idea. It won't be a surprise to learn that a 3rd year medical student is repeating the old DNA to RNA to protein mantra.

I suppose that's excusable, especially since that's what is likely to be tested on the MCAT. I wonder if students who take my course, or similar courses that correctly teach the Central Dogma, will be at a disadvantage on the MCAT?

The video is posted on the Khan Academy website at: Central dogma of molecular biology. What I found so astonishing about the video presentation is that Tracy Kovach spends so much time explaining how to remember "transcription" and "translation" and get them in the right order. Recall that this video is for students who are about to graduate from university and apply to medical school. I expect high school students to have mastered the terms "transcription" and "translation." I'm pretty sure that students in my undergraduate class would be insulted if I showed them this video. They would be able to describe the biochemistry of transcription and translation in considerable detail.


There are people who think that the Central Dogma is misunderstood to an even greater extent than I claim. They say that the Central Dogma is widely interpreted to mean that the only role of DNA information is to make RNA which makes protein. In other words, they fear that belief in that version of the Central Dogma rules out any other role for DNA. This is the view of John Mattick. He says that the Central Dogma has been overthrown by the discovery of genes that make functional RNA but not protein.

I wonder if students actually think that this is what the Central Dogma means? Watch the first few minutes of the video and give me your opinion. Is this what she is saying?


Monday, February 23, 2015

Should universities defend free speech and academic freedom?

This post was prompted by a discussion I'm having with Jerry Coyne on whether he should be trying to censor university professors who teach various forms of creationism.

I very much enjoyed Jerry Coyne's stance on free speech in his latest blog website post: The anti-free speech police ride again. Here's what he said,

Thursday, July 28, 2016

You are junk

There's an article about junk DNA in the latest issue of New Scientist (July 27, 2016) [You are junk: Why it’s not your genes that make you human]. I've already discussed the false meme at the beginning of the article [False history and the number of genes: 2016]. Now it's time to look at the main argument.

The subtitle is ...
Genes make proteins make us – that was the received wisdom. But from big brains to opposable thumbs, some of our signature traits could come from elsewhere.
You can see where this is going. You start with a false paradigm, "Genes make proteins make us," then proceed to refute it. This is called "paradigm shafting."1

Wednesday, March 13, 2024

Nils Walter disputes junk DNA: (7) Conservation of transcribed DNA

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the seventh post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth and sixth posts address specific arguments in the junk DNA debate.


Sequence conservation

If you don't know what a transcript is doing then how are you going to know whether it's a spurious transcript or one with an unknown function? One of the best ways is to check and see whether the DNA sequence is conserved. There's a powerful correlation between sequence conservation and function: as a general rule, functional sequences are conserved and non-conserved sequences can be deleted without consequence.

There might be an exception to the conservation criterion in the case of de novo genes. They arose relatively recently so there's no history of conservation. That's why purifying selection is a better criterion. Now that we have the sequences of thousands of human genomes, we can check to see whether a given stretch of DNA is constrained by selection or whether it accumulates mutations at the rate we expect if its sequence were irrelevant junk DNA (the neutral rate). The results show that less than 10% of our genome is being preserved by purifying selection. This is consistent with all the other arguments that 90% of our genome is junk and inconsistent with arguments that most of our genome is functional.
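In practice, the constraint test boils down to comparing the observed variation in a region with the variation expected at the neutral rate. Here's a minimal sketch of that logic; the neutral rate and variant counts are illustrative assumptions, not data from any study.

```python
# Minimal sketch of a purifying selection (constraint) test with made-up numbers.
NEUTRAL_RATE = 1.0e-3   # assumed variants per bp expected in neutrally evolving DNA

def constraint_ratio(observed_variants: int, region_bp: int,
                     neutral_rate: float = NEUTRAL_RATE) -> float:
    """Observed / expected variation; values well below 1 suggest purifying selection."""
    expected = neutral_rate * region_bp
    return observed_variants / expected

print(constraint_ratio(150, 1_000_000))   # 0.15 -> strongly constrained, likely functional
print(constraint_ratio(980, 1_000_000))   # 0.98 -> evolving at the neutral rate, like junk DNA
```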

This sounds like a problem for the anti-junk crowd. Let's see how it's addressed in Nils Walter's article in BioEssays.

There are several hand-waving objections to using conservation as an indication of function and Walter uses them all plus one unique argument that we'll get to shortly. Let's deal with some of the "facts" that he discusses in his defense of function. He seems to agree that much of the genome is not conserved even though it's transcribed. In spite of this, he says,

"... the estimates of the fraction of the human genome that carries function is still being upward corrected, with the best estimate of confirmed ncRNAs now having surpassed protein-coding genes,[12] although so far only 10%–40% of these ncRNAs have been shown to have a function in, for example, cell morphology and proliferation, under at least one set of defined conditions."

This is typical of the rhetoric in his discussion of sequence conservation. He seems to be saying that there are more than 20,000 "confirmed" non-coding genes but only 10%-40% of them have been shown to have a function! That doesn't make any sense since the whole point of this debate is how to identify function.

Here's another bunch of arguments that Walter advances to demonstrate that a given sequence could be functional but not conserved. I'm going to quote the entire thing to give you a good sense of Walter's opinion.

A second limitation of a sequence-based conservation analysis of function is illustrated by recent insights from the functional probing of riboswitches. RNA structure, and hence dynamics and function, is generally established co-transcriptionally, as evident from, for example, bacterial ncRNAs including riboswitches and ribosomal RNAs, as well as the co-transcriptional alternative splicing of eukaryotic pre-mRNAs, responsible for the important, vast diversification of the human proteome across ∼200 cell types by excision of varying ncRNA introns. In the latter case, it is becoming increasingly clear that splicing regulation involves multiple layers synergistically controlled by the splicing machinery, transcription process, and chromatin structure. In the case of riboswitches, the interactions of the ncRNA with its multiple protein effectors functionally engage essentially all of its nucleotides, sequence-conserved or not, including those responsible for affecting specific distances between other functional elements. Consequently, the expression platform—equally important for the gene regulatory function as the conserved aptamer domain—tends to be far less conserved, because it interacts with the idiosyncratic gene expression machinery of the bacterium. Consequently, taking a riboswitch out of this native environment into a different cell type for synthetic biology purposes has been notoriously challenging. These examples of a holistic functioning of ncRNAs in their species-specific cellular context lay bare the limited power of pure sequence conservation in predicting all functionally relevant nucleotides.

I don't know much about riboswitches so I can't comment on that. As for alternative splicing, I assume he's suggesting that much of the DNA sequence for large introns is required for alternative splicing. That's just not correct. You can have effective alternative splicing with small introns. The only essential parts of intron sequences are the splice sites and a minimum amount of spacer.

Part of what he's getting at is the fact that you can have a functional transcript where the actual nucleotide sequence doesn't matter so it won't look conserved. That's correct. There are such sequences. For example, there seem to be some examples of enhancer RNAs, which are transcripts in the regulatory region of a gene where it's the act of transcription that's important (to maintain an open chromatin conformation, for example) and not the transcript itself. Similarly, not all intron sequences are junk because some spacer sequence is required to maintain a minimum distance between splice sites. All this is covered in Chapter 8 of my book ("Noncoding Genes and Junk RNA").

Are these examples enough to toss out the idea of sequence conservation as a proxy for function and assume that there are tens of thousands of such non-conserved genes in the human genome? I think not. The null hypothesis still holds. If you don't have any evidence of function then the transcript doesn't have a function—you may find a function at some time in the future but right now it doesn't have one. Some of the evidence for function could be sequence conservation but the absence of conservation is not an argument for function. If conservation doesn't work then you have to come up with some other evidence.

It's worth mentioning that, in the broadest sense, purifying selection isn't confined to nucleotide sequence. It can also take into account deletions and insertions. If a given region of the genome is deficient in random insertions and deletions then that's an indication of function in spite of the fact that the nucleotide sequence isn't maintained by purifying selection. The maintenance definition of function isn't restricted to sequence—it also covers bulk DNA and spacer DNA.

(This is a good time to bring up a related point. The absence of conservation (size or sequence) is not evidence of junk. Just because a given stretch of DNA isn't maintained by purifying selection does not prove that it is junk DNA. The evidence for a genome full of junk DNA comes from different sources and that evidence doesn't apply to every little bit of DNA taken individually. On the other hand, the maintenance function argument is about demonstrating whether a particular region has a function or not and it's about the proper null hypothesis when there's no evidence of function. The burden of proof is on those who claim that a transcript is functional.)

This brings us to the main point of Walter's objection to sequence conservation as an indication of function. You can see hints of it in the previous quotation where he talks about "holistic functioning of ncRNAs in their species-specific cellular context," but there's more ...

Some evolutionary biologists and philosophers have suggested that sequence conservation among genomes should be the primary, or perhaps only, criterion to identify functional genetic elements. This line of thinking is based on 50 years of success defining housekeeping and other genes (mostly coding for proteins) based on their sequence conservation. It does not, however, fully acknowledge that evolution does not actually select for sequence conservation. Instead, nature selects for the structure, dynamics and function of a gene, and its transcription and (if protein coding) translation products; as well as for the inertia of the same in pathways in which they are not involved. All that, while residing in the crowded environment of a cell far from equilibrium that is driven primarily by the relative kinetics of all possible interactions. Given the complexity and time dependence of the cellular environment and its environmental exposures, it is currently impossible to fully understand the emergent properties of life based on simple cause-and-effect reasoning.

The way I see it, his most important argument is that life is very complicated and we don't currently understand all of its emergent properties. This means that he is looking for ways to explain the complexity that he expects to be there. The possibility that there might be several hundred thousand regulatory RNAs seems to fulfil this need so they must exist. According to Nils Walter, the fact that we haven't (yet) proven that they exist is just a temporary lull on the way to rigorous proof.

This seems to be a common theme among those scientists who share this viewpoint. We can see it in John Mattick's writings as well. It's as though the logic of having a genome full of regulatory RNA genes is so powerful that it doesn't require strong supporting evidence and can't be challenged by contradictory evidence. The argument seems somewhat mystical to me. Its proponents are making the a priori assumption that humans just have to be a lot more complicated than what "reductionist" science is indicating and all they have to do is discover what that extra layer of complexity is all about. According to this view, the idea that our genome is full of junk must be wrong because it seems to preclude the possibility that our genome could explain what it's like to be human.


Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Thursday, May 20, 2010

Junk RNA or Imaginary RNA?

RNA is very popular these days. It seems as though new varieties of RNA are being discovered just about every month. There have been breathless reports claiming that almost all of our genome is transcribed and most of this RNA has to be functional even though we don't yet know what the function is. The fervor with which some people advocate a paradigm shift in thinking about RNA approaches that of a cult follower [see Greg Laden Gets Suckered by John Mattick].

We've known for decades that there are many types of RNA besides messenger RNA (mRNA encodes proteins). Besides the standard ribosomal RNAs and transfer RNAs (tRNAs), there are a variety of small RNAs required for splicing and many other functions. There's no doubt that some of the new discoveries are important as well. This is especially true of small regulatory RNAs.

However, the idea that a huge proportion of our genome could be devoted to synthesizing functional RNAs does not fit with the data showing that most of our genome is junk [see Shoddy But Not "Junk"?]. That hasn't stopped RNA cultists from promoting experiments leading to the conclusion that almost all of our genome is transcribed.

Late to the Party

Several people have already written about this paper including Carl Zimmer and PZ Myers. There are also summaries in Nature News and PLoS Biology.
That may change. A paper just published in PLoS Biology shows that the earlier work was prone to artifacts. Some of those RNAs may not even be there and others are present in tiny amounts.

The work was done by Harm van Bakel in Tim Hughes' lab, right here in Toronto. It's only a few floors, and a bridge, from where I'm sitting right now. The title of their paper tries to put a positive spin on the results: "Most 'Dark Matter' Transcripts Are Associated With Known Genes" [van Bakel et al. (2010)]. Nobody's buying that spin. They all recognize that the important result is not that non-coding RNAs are mostly associated with genes but the fact that they are not found in the rest of the genome. In other words, most of our genome is not transcribed in spite of what was said in earlier papers.

Van Bakel compared two different types of analysis. The first, called "tiling arrays," is a technique where bulk RNA (cDNA, actually) is hybridized to a series of probes on a microchip. The probes are short pieces of DNA corresponding to genomic sequences spaced every few thousand base pairs along each chromosome. When some RNA fragment hybridizes to one of these probes you score that as a "hit." The earlier experiments used this technique and the results indicated that almost every probe could hybridize an RNA fragment. Thus, as you scanned the chip you saw that almost every spot recorded a "hit." The conclusion is that almost all of the genome is transcribed even though only 2% corresponds to known genes.

The second type of analysis is called RNA-Seq and it relies on direct sequencing of RNA fragments. Basically, you copy the RNA into DNA, selecting for small 200 bp fragments. Using new sequencing technology, you then determine the sequence of one (single end) or both ends (paired end) of this cDNA. You may only get 30 bp of good sequence information but that's sufficient to place the transcript on the known genome sequence. By collecting millions of sequence reads, you can determine what parts of the genome are transcribed and you can also determine the frequency of transcription. The technique is much more quantitative than tiling experiments.
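The logic of the RNA-Seq analysis is straightforward: map each read to the genome and ask whether it falls inside an annotated gene or in an intergenic region. The toy sketch below illustrates the idea with made-up coordinates and read positions; it is not the van Bakel et al. pipeline.

```python
# Toy version of the read-assignment logic described above (hypothetical coordinates).
annotated_genes = [(10_000, 25_000), (60_000, 90_000)]   # (start, end) pairs, made up

def classify(read_start: int) -> str:
    """Assign a mapped read to 'genic' or 'intergenic' based on the annotation."""
    return "genic" if any(s <= read_start < e for s, e in annotated_genes) else "intergenic"

mapped_reads = [12_500, 61_000, 88_000, 150_000, 14_000]  # hypothetical mapped positions
counts = {"genic": 0, "intergenic": 0}
for r in mapped_reads:
    counts[classify(r)] += 1

print(counts)   # {'genic': 4, 'intergenic': 1} -- most transcription maps to known genes
```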

van Bakel et al. show that using RNA-Seq they detect very little transcription from the regions between genes. On the other hand, using tiling arrays they detect much more transcription from these regions. They conclude that the tiling arrays are producing spurious results—possibly due to cross-hybridization or possibly due to detection of very low abundance transcripts. In other words, the conclusion that most of our genome is transcribed may be an artifact of the method.

The parts of the genome that are presumed to be transcribed but for which there is no function are called "dark matter." Here's the important finding in the authors' own words.
To investigate the extent and nature of transcriptional dark matter, we have analyzed a diverse set of human and mouse tissues and cell lines using tiling microarrays and RNA-Seq. A meta-analysis of single- and paired-end read RNA-Seq data reveals that the proportion of transcripts originating from intergenic and intronic regions is much lower than identified by whole-genome tiling arrays, which appear to suffer from high false-positive rates for transcripts expressed at low levels.
Many of us dismissed the earlier results as transcriptional noise or "junk RNA." We thought that much of the genome could be transcribed at a very low level but this was mostly due to accidental transcription from spurious promoters. This low level of "accidental" transcription is perfectly consistent with what we know about RNA polymerase and DNA binding proteins [What is a gene, post-ENCODE?, How RNA Polymerase Binds to DNA]. Although we might have suspected that some of the "transcription" was a true artifact, it was difficult to see how the papers could have failed to consider such a possibility. They had been through peer review and the reviewers seemed to be satisfied with the data and the interpretation.

That's gonna change. I suspect that from now on everybody is going to ignore the tiling array experiments and pretend they don't exist. Not only that, but in light of recent results, I suspect more and more scientists will announce that they never believed the earlier results in the first place. Too bad they never said that in print.


van Bakel, H., Nislow, C., Blencowe, B. and Hughes, T. (2010) Most "Dark Matter" Transcripts Are Associated With Known Genes. PLoS Biology 8: e1000371 [doi:10.1371/journal.pbio.1000371]

Friday, December 13, 2019

The "standard" view of junk DNA is completely wrong

I was browsing the table of contents of the latest issue of Cell and I came across this ....
For decades, the miniscule protein-coding portion of the genome was the primary focus of medical research. The sequencing of the human genome showed that only ∼2% of our genes ultimately code for proteins, and many in the scientific community believed that the remaining 98% was simply non-functional “junk” (Mattick and Makunin, 2006; Slack, 2006). However, the ENCODE project revealed that the non-protein coding portion of the genome is copied into thousands of RNA molecules (Djebali et al., 2012; Gerstein et al., 2012) that not only regulate fundamental biological processes such as growth, development, and organ function, but also appear to play a critical role in the whole spectrum of human disease, notably cancer (for recent reviews, see Adams et al., 2017; Deveson et al., 2017; Rupaimoole and Slack, 2017).

Slack, F.J. and Chinnaiyan, A.M. (2019) The Role of Non-coding RNAs in Oncology. Cell 179:1033-1055 [doi: 10.1016/j.cell.2019.10.017]
Cell is a high-impact, refereed journal so we can safely assume that this paper was reviewed by reputable scientists. This means that the view expressed in the paragraph above did not raise any alarm bells when the paper was reviewed. The authors clearly believe that what they are saying is true and so do many other reputable scientists. This seems to be the "standard" view of junk DNA among scientists who do not understand the facts or the debate surrounding junk DNA and pervasive transcription.

Here are some of the obvious errors in the statement.
  1. The sequencing of the human genome did NOT show that only ~2% of our genome consisted of coding region. That fact was known almost 50 years ago and the human genome sequence merely confirmed it.
  2. No knowledgeable scientist ever thought that the remaining 98% of the genome was junk—not in 1970 and not in any of the past fifty years.
  3. The ENCODE project revealed that much of our genome is transcribed at some time or another but it is almost certainly true that the vast majority of these low-abundance, non-conserved, transcripts are junk RNA produced by accidental transcription.
  4. The existence of noncoding RNAs such as ribosomal RNA and tRNA was known in the 1960s, long before ENCODE. The existence of snoRNAs, snRNAs, regulatory RNAs, and various catalytic RNAs was known in the 1980s, long before ENCODE. Other RNAs such as miRNAs, piRNAs, and siRNAs were well known in the 1990s, long before ENCODE.
How did this false view of our genome become so widespread? It's partially because of the now highly discredited ENCODE publicity campaign orchestrated by Nature and Science but that doesn't explain everything. The truth is out there in peer-reviewed scientific publications but scientists aren't reading those papers. They don't even realize that their standard view has been seriously challenged. Why?


Friday, July 03, 2015

The fuzzy thinking of John Parrington: The Central Dogma

My copy of The Deeper Genome: Why there's more to the human genome than meets the eye has arrived and I've finished reading it. It's a huge disappointment. Parrington makes no attempt to describe what's in your genome in more than general hand-waving terms. His main theme is that the genome is really complicated and so are we. Gosh, golly, gee whiz! Re-write the textbooks!

You will look in vain for any hard numbers such as the total number of genes or the amount of the genome devoted to centromeres, regulatory sequences etc. etc. [see What's in your genome?]. Instead, you will find a wishy-washy defense of ENCODE results and tributes to the views of John Mattick.

John Parrington is an Associate Professor of Cellular & Molecular Pharmacology at the University of Oxford (Oxford, UK). He works on the physiology of calcium signalling in mammals. This should make him well-qualified to write a book about biochemistry, molecular biology, and genomes. Unfortunately, his writing leaves a great deal to be desired. He seems to be part of a younger generation of scientists who were poorly trained as graduate students (he got his Ph.D. in 1992). He exhibits the same kind of fuzzy thinking as many of the ENCODE leaders.

Let me give you just one example.

Tuesday, February 27, 2024

Nils Walter disputes junk DNA: (1) The surprise

Nils Walter attempts to present the case for a functional genome by reconciling opposing viewpoints. I address his criticisms of the junk DNA position and discuss his arguments in favor of large numbers of functional non-coding RNAs.

Nils Walter is Francis S. Collins Collegiate Professor of Chemistry, Biophysics, and Biological Chemistry at the University of Michigan in Ann Arbor (Michigan, USA). He works on human RNAs and claims that, "Over 75% of our genome encodes non-protein coding RNA molecules, compared with only <2% that encodes proteins." He recently published an article explaining why he opposes junk DNA.

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non-protein coding, or non-coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is “junk”, a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from most of this genome “junk” often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their basis, which arise from distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in hopes of stimulating synergy between scientific fields.

Sunday, September 09, 2012

Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco

Brendan Maher is a Feature Editor for Nature. He wrote a lengthy article for Nature when the ENCODE data was published on Sept. 5, 2012 [ENCODE: The human encyclopaedia]. Here's part of what he said,
After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. Now that phase has come to a close, signalled by the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.
I expect encyclopedias to be much more accurate than this.

As most people know by now, there are many of us who challenge the implication that 80% of the genome has a function (i.e., it's not junk).1 We think the Consortium was not being very scientific by publicizing such a ridiculous claim.

The main point of Maher's article was that the ENCODE results reveal a huge network of regulatory elements controlling expression of the known genes. This is the same point made by the ENCODE researchers themselves. Here's how Brendan Maher expressed it.

The real fun starts when the various data sets are layered together. Experiments looking at histone modifications, for example, reveal patterns that correspond with the borders of the DNaseI-sensitive sites. Then researchers can add data showing exactly which transcription factors bind where, and when. The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being.
I think that much of this hype comes from a problem I've called The Deflated Ego Problem. It arises because many scientists were disappointed to discover that humans have about the same number of genes as many other species yet we are "obviously" much more complex than a mouse or a pine tree. There are many ways of solving this "problem." One of them is to postulate that humans have a much more sophisticated network of control elements in our genome. Of course, this ignores the fact that the genomes of mice and trees are not smaller than ours.

Tuesday, March 13, 2018

Making Sense of Genes by Kostas Kampourakis

Kostas Kampourakis is a specialist in science education at the University of Geneva, Geneva (Switzerland). Most of his book is an argument against genetic determinism in the style of Richard Lewontin. You should read this book if you are interested in that argument. The best way to describe the main thesis is to quote from the last chapter.

Here is the take-home message of this book: Genes were initially conceived as immaterial factors with heuristic values for research, but along the way they acquired a parallel identity as DNA segments. The two identities never converged completely, and therefore the best we can do so far is to think of genes as DNA segments that encode functional products. There are neither 'genes for' characters nor 'genes for' diseases. Genes do nothing on their own, but are important resources for our self-regulated organism. If we insist in asking what genes do, we can accept that they are implicated in the development of characters and disease, and that they account for variation in characters in particular populations. Beyond that, we should remember that genes are part of an interactive genome that we have just begun to understand, the study of which has various limitations. Genes are not our essences, they do not determine who we are, and they are not the explanation of who we are and what we do. Therefore we are not the prisoners of any genetic fate. This is what the present book has aimed to explain.

Saturday, March 21, 2015

How the genome lost its junk according to John Parrington

I really hate it when publishers start to hype a book several months before we can read it, especially when the topic is controversial. In this case, it's Oxford University Press and the book is "The Deeper Genome: Why there is more to the human genome than meets the eye." The author is John Parrington.

The title of the promotion blurb is: How the Genome Lost its Junk on the Canadian version of the Oxford University Press website. It looks like this book is going to be an attack on junk DNA.

We won't know for sure until June or July when the book is published. Until then, the author and the publisher will have free rein to sell their ideas without serious opposition or pushback.

Here's the prepublication hype. I'm going to buy this book and read it as soon as it becomes available. Stay tuned for a review.

Wednesday, March 09, 2016

A 2004 kerfuffle over pervasive transcription in the mouse genome

The first drafts of the human genome sequence were published in 2001. There was still work to do on "finishing" the sequence but a lot of the International Human Genome Project (IHGP) team shifted to work on the mouse genome. The FANTOM Consortium and the RIKEN Genome Exploration Groups (I and II) published an analysis of mouse transcripts in December 2002.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H. et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420:563-573. [doi: 10.1038/nature01266]

Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 ‘transcriptional units’, contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense–antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

Thursday, December 22, 2022

Junk DNA, TED talks, and the function of lncRNAs

Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.

The fact that most of our genome is transcribed has been known for 50 years but that fact only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and this label has stuck.

By the end of the 1970s we knew that much of this transcription was due to introns. The latest data shows that protein-coding genes and known noncoding genes occupy about 45% of the genome and most of that is intron sequences that are mostly junk. That leaves 30-40% of the genome that is transcribed at some point, producing something like one million transcripts of unknown function.