
Showing posts with label Junk DNA. Show all posts

Sunday, November 10, 2024

Do plants have junk DNA?

Current Opinion in Plant Biology has a special edition devoted to Genome studies and molecular genetics 2024. The only paper (so far) that discusses plant genomes is one devoted to RNAs. Here's the abstract ...

Anyatama, A., Datta, T., Dwivedi, S. and Trivedi, P.K. (2024) Transcriptional junk: Waste or a key regulator in diverse biological processes? Current Opinion in Plant Biology 82:102639. [doi: 10.1016/j.pbi.2024.102639]

Plant genomes, through their evolutionary journey, have developed a complex composition that includes not only protein-coding sequences but also a significant amount of non-coding DNA, repetitive sequences, and transposable elements, traditionally labeled as “junk DNA”. RNA molecules from these regions, labeled as “transcriptional junk,” include non-coding RNAs, alternatively spliced transcripts, untranslated regions (UTRs), and short open reading frames (sORFs). However, recent research shows that this genetic material plays crucial roles in gene regulation, affecting plant growth, development, hormonal balance, and responses to stresses. Additionally, some of these regulatory regions encode small proteins, such as miRNA-encoded peptides (miPEPs) and microProteins (miPs), which interact with DNA or nuclear proteins, leading to chromatin remodeling and modulation of gene expression. This review aims to consolidate our understanding of the diverse roles that these so-called “transcriptional junk” regions play in regulating various physiological processes in plants.

Saturday, October 26, 2024

Three lungfish species have huge genomes

Lungfish are our closest living fish cousins. All living terrestrial vertebrates (e.g. amphibians, mammals, reptiles) descend from a common ancestor with lungfish. The split occurred about 400 million years ago (400 Ma) (Devonian) when there were 70-100 different lungfish species.

This relationship (lungfish-tetrapods) was firmly established recently by comparing the genome of the Australian lungfish (Neoceratodus forsteri) with that of tetrapods (Meyer et al., 2021). The other possibility had been coelacanth-tetrapods. Coelacanths and lungfish are related—they form the class Sarcopterygii (lobe-finned fish).

Friday, September 27, 2024

John Mattick's seminar at the University of Toronto

I just learned that John Mattick gave a seminar this morning at the Department of Cell & Systems Biology at the University of Toronto. Unfortunately, I was unable to attend.

Most Sandwalk readers will recognize Mattick as one of the few remaining vocal opponents of junk DNA. He is probably best known for his dog-ass plot but this is only one of the ways he misrepresents science.

Sunday, September 01, 2024

Scite Assistant (AI) answers the question "How much of the human genome consist of junk DNA?"

Scite Assistant is billed as "your AI research partner" and as "ChatGPT for researchers." It's supposed to draw on peer-reviewed published scientific papers for its information and it will give you an answer with genuine citations.

That sounds like a good idea until you realize that the scientific literature is full of misinformation and conflicting information. What we need is an AI assistant that can help us sort through the misinformation and give us a genuine well-informed answer on controversial issues.

Let's pick the question of junk DNA as a completely random (!) example of such an issue. The scientific literature is full of false information about the origin of the term "junk DNA" and what it was originally intended to describe. It's also full of false information about recent results and how they pertain to junk DNA.

Monday, August 12, 2024

Zach Hancock explains junk DNA

Zach Hancock is a postdoc in ecology & evolutionary biology at the University of Michigan. He has a YouTube channel with several thousand subscribers. You might recall that he interviewed me last year when my book came out [Zach Hancock interviews me on his YouTube channel].

He has just posted a new video on junk DNA that's well worth watching. He tries to correct all the falsehoods and misinformation on junk DNA, especially those promoted by creationists.


Wednesday, June 05, 2024

Tom Cech writes about the "dark matter" of the genome

Tom Cech won a Nobel Prize for discovering one example of a catalytic RNA. He recently published an article in the New York Times extolling the virtues of RNA and non-coding genes [The Long-Overlooked Molecule That Will Define a Generation of Science]. There's a fair amount of hype in the article but the main point is quite valid—over the past fifty years we have learned about dozens of important non-coding RNAs that we didn't know about at the beginning of molecular biology [see: Non-coding RNA, Non-coding DNA].

The main issue in this field concerns the number of non-coding genes in the human genome. I cover the available data in my book and conclude that there are fewer than 1000 (p.214). Those scientists who promote the importance of RNA (e.g. Tom Cech) would like you to believe that there are many more non-coding genes; indeed, most of those scientists believe that there are more non-coding genes than coding genes (i.e. > 20,000). They rarely present evidence for such a claim beyond noting that much of our genome is transcribed.

Tom Cech is wise enough to avoid publishing an estimate of the number of non-coding genes but his bias is evident in the following paragraph from near the end of his article.

Although most scientists now agree on RNA's bright promise, we are still only beginning to unlock its potential. Consider, for instance, that some 75 percent of the human genome consists of dark matter that is copied into RNAs of unknown function. While some researchers have dismissed this dark matter as junk or noise, I expect it will be the source of even more exciting breakthroughs.

Let's dissect this to see where the bias lies. The first thing you note is the use of the term "dark matter" to make it sound like there's a lot of mysterious DNA in our genome. This is not true. We know a heck of a lot about our genome, including the fact that it's full of junk DNA. Only 10% of the genome is under purifying selection and assumed to be functional. The rest is full of introns, pseudogenes, and various classes of repetitive sequences made up mostly of degraded transposons and viruses. The entire genome has been sequenced—there's not much mystery there. I don't know why anyone refers to this as "dark matter" unless they have a hidden agenda.

The second thing you notice is the statement that 75% of the genome is transcribed at some time or another and, according to Tom Cech, these transcripts have an unknown function. That's strange since protein-coding genes take up roughly 40% of our genome and we know a great deal about coding DNA, UTRs, and introns. If you add in the known examples of non-coding genes, this accounts for an additional 2-3% of the genome.1

Almost all the rest of the transcripts come from non-conserved DNA and those transcripts are present at less than one copy per cell. As the ENCODE researchers noted in 2014, they are likely to be junk RNA resulting from spurious transcription. I'd say we know a great deal about the fraction of the genome that's transcribed and there's not much indication that it's hiding a plethora of undiscovered functional RNAs.
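The accounting in the last two paragraphs can be written out as a quick back-of-envelope calculation. This is only a sketch: it uses the round numbers quoted above (75% transcribed, roughly 40% in protein-coding genes, 2-3% in known non-coding genes) and it assumes that the protein-coding and known non-coding regions fall entirely within the transcribed 75%. The variable names are mine.

```python
# Rough accounting of the transcribed fraction of the human genome,
# using the round percentages quoted in the post (not precise measurements).
transcribed_pct = 75.0             # fraction of the genome copied into RNA at some time
protein_coding_genes_pct = 40.0    # exons + UTRs + introns of protein-coding genes
known_noncoding_pct = (2.0, 3.0)   # low and high estimates for known non-coding genes

# Assumption: both known categories lie entirely inside the transcribed 75%.
leftover_low = transcribed_pct - protein_coding_genes_pct - known_noncoding_pct[1]
leftover_high = transcribed_pct - protein_coding_genes_pct - known_noncoding_pct[0]

print(f"Transcribed DNA outside known genes: {leftover_low:.0f}-{leftover_high:.0f}% of the genome")
```

Under those assumptions, more than half of the transcribed DNA is already accounted for by genes we know a great deal about. Only the leftover third of the genome produces the low-abundance transcripts that are in dispute.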


Photo credit: University of Colorado, Boulder.

1. In my book I make a generous estimate of 5,000 non-coding genes in order to avoid quibbling over a smaller number and in order to demonstrate that even with such an obvious over-estimate the genome is still 90% junk.

Sunday, May 05, 2024

Junk DNA debate: Casey Luskin vs Dan Stern Cardinale

Here's a link to the junk DNA debate between Dan Stern Cardinale and Casey Luskin. The debate took place on May 2, 2024.

I mentioned in a previous post that Luskin should have been called out on his repeated attempts to equate junk DNA with non-coding DNA. This allowed him to portray all non-coding functions as evidence against junk DNA. [Casey Luskin posts misleading quotes about junk DNA].

There are several other things that I would have done differently. I would have made it clear that 10% of the genome is functional and we don't know the function of some of that fraction. Thus, all newly discovered functional regions could still fit into the 10% and 90% of the genome is still junk. Every time Casey mentions a new function he should have been challenged to specify exactly what percentage of the genome he is referring to. (Dan tried to do this but he was too nice, and let Casey off the hook.)

The idea here is to make it clear to viewers that recent discoveries of functional regions do not affect the idea that most of our genome is junk.

I would also attempt to get Casey to admit that there's a scientific controversy over junk DNA so there are many papers defending junk DNA and criticizing the arguments of junk DNA opponents. For every quotation from a scientist who opposes junk, there's an equally significant quotation from one who supports junk. Why does Casey only quote scientists who agree with him? Is this cherry-picking? Is selectively rattling off quotations and references from people who agree with you a reasonable way to have a serious scientific debate?

I think the arguments over transcripts should begin with presenting all the scientific evidence that spurious transcripts exist - for example, random DNA sequences inserted into a cell nucleus are transcribed and spurious transcription is easily documented in well-studied organisms such as bacteria and yeast. The characteristics of spurious transcription are that the transcripts are present in very small amounts, that they are rapidly degraded, that they come from regions of the genome that are not under purifying selection, and they are cell/tissue specific. So what is the most reasonable explanation when you look at such transcripts?

Casey Luskin's attempt to avoid the best explanation (spurious transcription) is a classic example of ad hoc rescue and it might have been useful to point this out to viewers.

Regulation is not new. There was serious discussion and debate over the amount of the genome devoted to regulation back in the late 1960s when the concept of junk DNA was first proposed. Casey should have been challenged to state what percentage of the genome is devoted to regulation and if he comes up with an unreasonable number he should have to give examples of many well-studied genes that have been shown to have that level of regulation. (Hint: There aren't any.) All of the detailed work on the regulation of dozens of specific human genes has shown that you don't need more than a few transcription factor binding sites to control expression. Is there any reason to suppose that the other genes require ten or a hundred times more regulatory sequences to control expression?

What is the trend line? Ever since the ENCODE publicity disaster of 2012 there has been a flood of papers defending junk DNA and the data supporting junk DNA is now stronger than it has ever been because we now know from hundreds of thousands of human genome sequences that only about 10% is under purifying selection. There have also been a lot of papers fleshing out the 10% of the genome that's functional. There have only been a handful of papers published in the past ten years that seriously attempt to present evidence that most of our genome is functional. I would have challenged Casey to come up with a single scientific publication in the past ten years claiming, with supporting data, that most of the genome is functional.


Saturday, May 04, 2024

Casey Luskin posts misleading quotes about junk DNA

On Thursday May 2, 2024, Casey Luskin and Dan Stern Cardinale debated junk DNA on the YouTube channel "The NonSequitor Show." David Klinghoffer thinks that this debate went very well for the ID side [Debate: Casey Luskin Versus Rutgers Biologist Dan Cardinale, Thursday, May 2]. I agree with Klinghoffer; Luskin did an excellent job of promoting his case because many of his statements and claims were not challenged effectively.

I'll be putting up a separate post on the debate but for now I'd like to address an article by Casey Luskin that he posted before the debate as preparation for what he was going to say. The article consists of a bunch of quotes from prominent scientists about junk DNA [“Junk DNA” from Three Perspectives: Some Key Quotes]. Here are the three perspectives, according to Luskin.

Category 1: Quotes from evolutionists claiming (or repeating the widespread belief) that non-coding DNA is “junk” and has no function.

Some of the quotes represent the actual position of junk DNA proponents but Luskin has also picked out stupid quotes from scientists who think, incorrectly, that all non-coding DNA is junk. This is deliberate as we will see below.

Category 2: Early quotes from intelligent design theorists predicting function for non-coding “junk” DNA.

Luskin builds the case for function in non-coding DNA by quoting religious scientists who "predict" that there will be functional DNA in non-coding regions of the genome. This is disingenuous at best because Luskin knows full well that from the very beginning of the scientific debate we knew about functional non-coding DNA. It was never the case that all non-coding DNA was assumed to be junk.

Category 3: Quotes from mainstream scientific sources saying that we’ve experienced a shift in our thinking that junk DNA actually has function.

Many of these quotes are from scientists announcing that some non-coding DNA has a function. They support Luskin's false claim that all non-coding DNA was thought to be junk and the discovery of functional regions of non-coding DNA has resulted in a "paradigm shift" in our view of the human genome.

Casey Luskin should not have been allowed to get away with equating junk DNA and non-coding DNA in the debate. He should have been challenged to retract that false claim at the very beginning of the debate and called out whenever he used the term "non-coding DNA" during the debate.


Friday, March 29, 2024

Why do Intelligent Design Creationists still lie about junk DNA?

Intelligent Design Creationists are heavily invested in refuting junk DNA because it casts doubt on their model of an intelligently designed human. Over the years they have advanced all kinds of arguments against junk DNA and some ID supporters actually address the real scientific issues (e.g. Jonathan Wells). However, most Intelligent Design Creationists are as ignorant about the scientific dispute over junk DNA as they are about evolution and lots of other science issues that conflict with their underlying religious beliefs.

A few days ago (March 26, 2024), the Discovery Institute's Center for Science and Culture published a short video on "The MYTH of Junk DNA" where they ignored most of the science and appealed to the majority of creationists who don't care about the truth. We have enough data to conclude that the Discovery Institute isn't just ignorant of the real science but is actually lying in this video. We know this because there are prominent Senior Fellows of the Center for Science and Culture who know that the material in this video is wrong and/or misleading.

Thursday, March 21, 2024

Science misinformation is being spread in the lecture halls of top universities

Should universities remove online courses that contain incorrect or misleading information?

There are lots of scientific controversies where different scientists have conflicting views. Eventually these controversies will be solved by normal scientific means involving evidence and logic but for the time being there isn't enough data to settle a genuine scientific controversy. Many of us are interested in these controversies and some of us have chosen to invest time and effort into defending one side or the other.

But there's a dark side of science that infects these debates—false or misleading information used to support one side of a legitimate controversy. To give just one example, I'm frustrated at the constant reference to junk DNA being defined as non-coding DNA. Many scientists believe that this was the way junk DNA was defined by its earliest proponents and then they go on to say that the recent discovery of functional non-coding DNA refutes junk.

I don't know where this idea came from because there's nothing in the scientific literature from 50 years ago to support such a ridiculous claim. It must be coming from somewhere since the idea is so widespread.

Where does misinformation come from and how is it spread?

Monday, March 18, 2024

Intelligent design creationists think junk DNA is a placeholder for ignorance

Paul Nelson is a Senior Fellow of the Discovery Institute—the most important source of intelligent design propaganda. Paul and I have been disagreeing about science for many years. He is prone to interpret anything he finds in the scientific literature as support for the idea that scientists have misunderstood their subject matter and failed to recognize that science supports intelligent design. My goal has always been to try and explain the actual science and why his interpretations are misguided. I have not been very successful.

The photo was taken in London (UK) in 2016 at a meeting on evolution. It looks like I'm holding my breath because I'm beside a creationist but I assure you that's not what was happening. We actually get along quite well in spite of the fact that he's wrong about everything. :-)

Friday, March 15, 2024

Nils Walter disputes junk DNA: (9) Reconciliation

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the ninth and last post in the series. I'm going to discuss Walter's view on how to tone down the dispute over the amount of junk in the human genome. Here's a list of the previous posts.


"Conclusion: How to Reconcile Scientific Fields"

Walter concludes his paper with some thoughts on how to deal with the controversy going forward. I'm using the title that he chose. As you can see from the title, he views this as a squabble between two different scientific fields, which he usually identifies as geneticists and evolutionary biologists versus biochemists and molecular biologists. I don't agree with this distinction. I'm a biochemist and molecular biologist, not a geneticist or an evolutionary biologist, and still I think that many of his arguments are flawed.

Let's see what he has to say about reconciliation.

Science thrives from integrating diverse viewpoints—the more diverse the team, the better the science.[107] Previous attempts at reconciling the divergent assessments about the functional significance of the large number of ncRNAs transcribed from most of the human genome by pointing out that the scientific approaches of geneticists, evolutionary biologists and molecular biologists/biochemists provide complementary information[42] was met with further skepticism.[74] Perhaps a first step toward reconciliation, now that ncRNAs appear to increasingly leave the junkyard,[35] would be to substitute the needlessly categorical and derogative word RNA (or DNA) “junk” for the more agnostic and neutral term “ncRNA of unknown phenotypic function”, or “ncRNAupf”. After all, everyone seems to agree that the controversy mostly stems from divergent definitions of the term “function”,[42, 74] which each scientific field necessarily defines based on its own need for understanding the molecular and mechanistic details of a system (Figure 3). In addition, “of unknown phenotypic function” honors the null hypothesis that no function manifesting in a phenotype is currently known, but may still be discovered. It also allows for the possibility that, in the end, some transcribed ncRNAs may never be assigned a bona fide function.

First, let's take note of the fact that this is a discussion about whether a large percentage of transcripts are functional or not. It is not about the bigger picture of whether most of the genome is junk in spite of the fact that Nils Walter frames it in that manner. This becomes clear when you stop and consider the implications of Walter's claim. Let's assume that there really are 200,000 functional non-coding genes in the human genome. If we assume that each one is about 1000 bp long then this amounts to 6.5% of the genome—a value that can easily be accommodated within the 10% of the genome that's conserved and functional.
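Here's the arithmetic behind that 6.5% figure as a short sketch. The gene count and average length are the hypothetical values from the paragraph above; the genome size of about 3.1 billion bp is my assumption for the haploid human genome, and the function name is just for illustration.

```python
# Back-of-envelope check: what fraction of the genome would a given
# number of non-coding genes occupy?
GENOME_SIZE_BP = 3_100_000_000  # assumed haploid human genome size (~3.1 Gb)


def genome_fraction(gene_count: int, mean_gene_length_bp: int,
                    genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Return the fraction of the genome covered by gene_count genes
    of the given average length (ignores any overlap between genes)."""
    return gene_count * mean_gene_length_bp / genome_size_bp


# Hypothetical values from the text: 200,000 genes of about 1000 bp each.
fraction = genome_fraction(200_000, 1_000)
print(f"{fraction:.1%} of the genome")
```

Even with a number of non-coding genes that's ten times the number of protein-coding genes, the total stays below the 10% of the genome that's under purifying selection. Longer average gene lengths would change the answer, which is why the length assumption matters as much as the gene count.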

Now let's look at how he frames the actual disagreement. He says that the groups on both sides of the argument provide "complementary information." Really? One group says that if you can delete a given region of DNA with no effect on the survival of the individual or the species then it's junk and the other group says that it still could have a function as long as it's doing something like being transcribed or binding a transcription factor. Those don't look like "complementary" opinions to me.

His first step toward reconciliation starts with "now that ncRNAs appear to increasingly leave the junkyard." That's not a very conciliatory way to start a conversation because it immediately brings up the question of how many ncRNAs we're talking about. Well-characterized non-coding genes include ribosomal RNA genes (~600), tRNA genes (~200), the collection of small non-coding genes (snRNA, snoRNA, microRNA, siRNA, PiWi RNA)(~200), several lncRNAs (<100), and genes for several specialized RNAs such as 7SL and the RNA component of RNAse P (~10). I think that there are no more than 1000 extra non-coding genes falling outside these well-known examples and that's a generous estimate. If he has evidence for large numbers that have left the junkyard then he should have presented it.
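To make the tally explicit, here's a small sketch that adds up the approximate counts listed above. The numbers are the rough figures from the paragraph (the lncRNA entry uses the upper bound of 100), the category labels are paraphrased, and the figure of 20,000 protein-coding genes is the round number used elsewhere on this blog.

```python
# Approximate counts of well-characterized non-coding genes, taken from
# the rough figures quoted in the text (not a curated database).
known_noncoding_genes = {
    "ribosomal RNA": 600,
    "tRNA": 200,
    "small RNAs (snRNA, snoRNA, microRNA, siRNA, Piwi RNA)": 200,
    "lncRNA (upper bound)": 100,
    "specialized RNAs (7SL, RNase P RNA, etc.)": 10,
}

known_total = sum(known_noncoding_genes.values())

# Generous allowance for additional non-coding genes outside these classes.
extra_allowance = 1_000
upper_estimate = known_total + extra_allowance

PROTEIN_CODING_GENES = 20_000  # round number for comparison

print(f"Well-characterized non-coding genes: ~{known_total}")
print(f"Generous upper estimate: ~{upper_estimate}")
print(f"Ratio to protein-coding genes: {upper_estimate / PROTEIN_CODING_GENES:.2f}")
```

Even the generous total is about a tenth of the number of protein-coding genes, which is a long way from the claim that non-coding genes outnumber coding genes.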

Walter goes on to propose that we should divide non-coding transcripts into two categories: those with well-characterized functions and "ncRNA of unknown function." That's ridiculous. That is not an "agnostic and neutral term." It implies that non-conserved transcripts that are present at less than one copy per cell could still have a function in spite of the fact that spurious transcription is well-documented. In fact, he basically admits this interpretation at the end of the paragraph where he says that using this description (ncRNA of unknown function) preserves the possibility that a function might be discovered in the future. He thinks this is the "null hypothesis."

The real null hypothesis is that a transcript has no function until it can be demonstrated. Notice that I use the word "transcript" to describe these RNAs instead of "ncRNA" or "ncRNA of unknown phenotypic function." I don't think we lose anything by using the word "transcript."

Walter also addresses the meaning of "function" by claiming that different scientific fields use different definitions as though that excuses the conflict. But that's not an accurate portrayal of the problem. All scientists, no matter what field they identify with, are interested in coming up with a way of identifying functional DNA. There are many biochemists and molecular biologists who accept the maintenance definition as the best available definition of function. As scientists, they are more than willing to entertain any reasonable scientific arguments in favor of a different definition but nobody, including Nils Walter, has come up with such arguments.

Now let's look at the final paragraph of Walter's essay.

Most bioscientists will also agree that we need to continue advancing from simply cataloging non-coding regions of the human genome toward characterizing ncRNA functions, both elementally and phenotypically, an endeavor of great challenge that requires everyone's input. Solving the enigma of human gene expression, so intricately linked to the regulatory roles of ncRNAs, holds the key to devising personalized medicines to treat most, if not all, human diseases, rendering the stakes high, and unresolved disputes counterproductive.[108] The fact that newly ascendant RNA therapeutics that directly interface with cellular RNAs seem to finally show us a path to success in this challenge[109] only makes the need for deciphering ncRNA function more urgent. Succeeding in this goal would finally fulfill the promise of the human genome project after it revealed so much non-protein coding sequence (Figure 1). As a side effect, it may make updating Wikipedia and encyclopedia entries less controversial.

I agree that it's time for scientists to start identifying those transcripts that have a true function. I'll go one step further; it's time to stop pretending that there might be hundreds of thousands of functional transcripts until you actually have some data to support such a claim.

I take issue with the phrase "solving the enigma of human gene expression." I think we already have a very good understanding of the fundamental mechanisms of gene expression in eukaryotes, including the transitions between open and closed chromatin domains. There may be a few odd cases that deviate from the norm (e.g. Xist) but that hardly qualifies as an "enigma." He then goes on to say that this "enigma" is "intricately linked to the regulatory roles of ncRNAs" but that's not a fact, it's what's in dispute and why we have to start identifying the true function (if any) of most transcripts. Oh, and by the way, sorting out which parts of the genome contain real non-coding genes may contribute to our understanding of genetic diseases in humans but it won't help solve the big problem of how much of our genome is junk because mutations in junk DNA can cause genetic diseases.

Sorting out which transcripts are functional and which ones are not will help fill in the 10% of the genome that's functional but it will have little effect on the bigger picture of a genome that's 90% junk.

We've known that less than 2% of the genome codes for proteins since the late 1960s—long before the draft sequence of the human genome was published in 2001—and we've known for just as long that lots of non-coding DNA has a function. It would be helpful if these facts were made more widely known instead of implying that they were only discovered when the human genome was sequenced.

Once we sort out which transcripts are functional, we'll be in a much better position to describe all the facts when we edit Wikipedia articles. Until that time, I (and others) will continue to resist the attempts by the students in Nils Walter's class to remove all references to junk DNA.


Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Thursday, March 14, 2024

Nils Walter disputes junk DNA: (8) Transcription factors and their binding sites

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the eighth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth, sixth, and seventh posts address specific arguments in the junk DNA debate.

Wednesday, March 13, 2024

Nils Walter disputes junk DNA: (7) Conservation of transcribed DNA

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the seventh post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth and sixth posts address specific arguments in the junk DNA debate.


Sequence conservation

If you don't know what a transcript is doing then how are you going to know whether it's a spurious transcript or one with an unknown function? One of the best ways is to check and see whether the DNA sequence is conserved. There's a powerful correlation between sequence conservation and function: as a general rule, functional sequences are conserved and non-conserved sequences can be deleted without consequence.

There might be an exception to the conservation criterion in the case of de novo genes. They arise relatively recently so there's no history of conservation. That's why purifying selection is a better criterion. Now that we have the sequences of thousands of human genomes, we can check to see whether a given stretch of DNA is constrained by selection or whether it accumulates mutations at the rate we expect if its sequence were irrelevant junk DNA (neutral rate). The results show that less than 10% of our genome is being preserved by purifying selection. This is consistent with all the other arguments that 90% of our genome is junk and inconsistent with arguments that most of our genome is functional.

This sounds like a problem for the anti-junk crowd. Let's see how it's addressed in Nils Walter's article in BioEssays.

There are several hand-waving objections to using conservation as an indication of function and Walter uses them all plus one unique argument that we'll get to shortly. Let's deal with some of the "facts" that he discusses in his defense of function. He seems to agree that much of the genome is not conserved even though it's transcribed. In spite of this, he says,

"... the estimates of the fraction of the human genome that carries function is still being upward corrected, with the best estimate of confirmed ncRNAs now having surpassed protein-coding genes,[12] although so far only 10%–40% of these ncRNAs have been shown to have a function in, for example, cell morphology and proliferation, under at least one set of defined conditions."

This is typical of the rhetoric in his discussion of sequence conservation. He seems to be saying that there are more than 20,000 "confirmed" non-coding genes but only 10%-40% of them have been shown to have a function! That doesn't make any sense since the whole point of this debate is how to identify function.

Here's another bunch of arguments that Walter advances to demonstrate that a given sequence could be functional but not conserved. I'm going to quote the entire thing to give you a good sense of Walter's opinion.

A second limitation of a sequence-based conservation analysis of function is illustrated by recent insights from the functional probing of riboswitches. RNA structure, and hence dynamics and function, is generally established co-transcriptionally, as evident from, for example, bacterial ncRNAs including riboswitches and ribosomal RNAs, as well as the co-transcriptional alternative splicing of eukaryotic pre-mRNAs, responsible for the important, vast diversification of the human proteome across ∼200 cell types by excision of varying ncRNA introns. In the latter case, it is becoming increasingly clear that splicing regulation involves multiple layers synergistically controlled by the splicing machinery, transcription process, and chromatin structure. In the case of riboswitches, the interactions of the ncRNA with its multiple protein effectors functionally engage essentially all of its nucleotides, sequence-conserved or not, including those responsible for affecting specific distances between other functional elements. Consequently, the expression platform—equally important for the gene regulatory function as the conserved aptamer domain—tends to be far less conserved, because it interacts with the idiosyncratic gene expression machinery of the bacterium. Consequently, taking a riboswitch out of this native environment into a different cell type for synthetic biology purposes has been notoriously challenging. These examples of a holistic functioning of ncRNAs in their species-specific cellular context lay bare the limited power of pure sequence conservation in predicting all functionally relevant nucleotides.

I don't know much about riboswitches so I can't comment on that. As for alternative splicing, I assume he's suggesting that much of the DNA sequence for large introns is required for alternative splicing. That's just not correct. You can have effective alternative splicing with small introns. The only essential parts of intron sequences are the splice sites and a minimum amount of spacer.

Part of what he's getting at is the fact that you can have a functional transcript where the actual nucleotide sequence doesn't matter so it won't look conserved. That's correct. There are such sequences. For example, there seem to be some examples of enhancer RNAs, which are transcripts in the regulatory region of a gene where it's the act of transcription that's important (to maintain an open chromatin conformation, for example) and not the transcript itself. Similarly, not all intron sequences are junk because some spacer sequence is required to maintain a minimum distance between splice sites. All this is covered in Chapter 8 of my book ("Noncoding Genes and Junk RNA").

Are these examples enough to toss out the idea of sequence conservation as a proxy for function and assume that there are tens of thousands of such non-conserved genes in the human genome? I think not. The null hypothesis still holds. If you don't have any evidence of function then the transcript doesn't have a function—you may find a function at some time in the future but right now it doesn't have one. Some of the evidence for function could be sequence conservation but the absence of conservation is not an argument for function. If conservation doesn't work then you have to come up with some other evidence.

It's worth mentioning that, in the broadest sense, purifying selection isn't confined to nucleotide sequence. It can also take into account deletions and insertions. If a given region of the genome is deficient in random insertions and deletions then that's an indication of function in spite of the fact that the nucleotide sequence isn't maintained by purifying selection. The maintenance definition of function isn't restricted to sequence—it also covers bulk DNA and spacer DNA.

(This is a good time to bring up a related point. The absence of conservation (size or sequence) is not evidence of junk. Just because a given stretch of DNA isn't maintained by purifying selection does not prove that it is junk DNA. The evidence for a genome full of junk DNA comes from different sources and that evidence doesn't apply to every little bit of DNA taken individually. On the other hand, the maintenance function argument is about demonstrating whether a particular region has a function or not and it's about the proper null hypothesis when there's no evidence of function. The burden of proof is on those who claim that a transcript is functional.)

This brings us to the main point of Walter's objection to sequence conservation as an indication of function. You can see hints of it in the previous quotation where he talks about "holistic functioning of ncRNAs in their species-specific cellular context," but there's more ...

Some evolutionary biologists and philosophers have suggested that sequence conservation among genomes should be the primary, or perhaps only, criterion to identify functional genetic elements. This line of thinking is based on 50 years of success defining housekeeping and other genes (mostly coding for proteins) based on their sequence conservation. It does not, however, fully acknowledge that evolution does not actually select for sequence conservation. Instead, nature selects for the structure, dynamics and function of a gene, and its transcription and (if protein coding) translation products; as well as for the inertia of the same in pathways in which they are not involved. All that, while residing in the crowded environment of a cell far from equilibrium that is driven primarily by the relative kinetics of all possible interactions. Given the complexity and time dependence of the cellular environment and its environmental exposures, it is currently impossible to fully understand the emergent properties of life based on simple cause-and-effect reasoning.

The way I see it, his most important argument is that life is very complicated and we don't currently understand all of its emergent properties. This means that he is looking for ways to explain the complexity that he expects to be there. The possibility that there might be several hundred thousand regulatory RNAs seems to fulfil this need so they must exist. According to Nils Walter, the fact that we haven't (yet) proven that they exist is just a temporary lull on the way to rigorous proof.

This seems to be a common theme among those scientists who share this viewpoint. We can see it in John Mattick's writings as well. It's as though the logic of having a genome full of regulatory RNA genes is so powerful that it doesn't require strong supporting evidence and can't be challenged by contradictory evidence. The argument seems somewhat mystical to me. Its proponents are making the a priori assumption that humans just have to be a lot more complicated than what "reductionist" science is indicating and all they have to do is discover what that extra layer of complexity is all about. According to this view, the idea that our genome is full of junk must be wrong because it seems to preclude the possibility that our genome could explain what it's like to be human.


Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Sunday, March 03, 2024

Nils Walter disputes junk DNA: (5) What does the number of transcripts per cell tell us about function?

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the fifth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' The fourth post makes the case that differing views on junk DNA are mainly due to philosophical disagreements.

-Nils Walter disputes junk DNA: (1) The surprise

-Nils Walter disputes junk DNA: (2) The paradigm shaft

-Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function'

-Nils Walter disputes junk DNA: (4) Different views of non-functional transcripts

Transcripts vs junk DNA

The most important issue, according to Nils Walter, is whether the human genome contains huge numbers of genes for lncRNAs and other types of regulatory RNAs. He doesn't give us any indication of how many of these potential genes he thinks exist or what percentage of the genome they cover. This is important since he's arguing against junk DNA but we don't know how much junk he's willing to accept.

There are several hundred thousand transcripts in the RNA databases. Most of them are identified as lncRNAs because they are bigger than 200 bp. Let's assume, for the sake of argument, that 200,000 of these transcripts have a biologically relevant function and therefore there are 200,000 non-coding genes. A typical size might be 1000 bp so these genes would take up about 6.5% of the genome. That's about 10 times the number of protein-coding genes and more than 6 times the amount of coding DNA.
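The arithmetic behind that estimate is easy to check. Here's a minimal sketch in Python. The genome size of 3.1 billion bp is an assumption (it isn't stated above, but it reproduces the 6.5% figure), and the function name is made up for illustration.

```python
# Assumed haploid human genome size in base pairs (not stated in the post).
GENOME_SIZE_BP = 3.1e9


def genome_fraction(n_genes, avg_gene_bp, genome_bp=GENOME_SIZE_BP):
    """Fraction of the genome occupied by n_genes of a given average size."""
    return n_genes * avg_gene_bp / genome_bp


# 200,000 hypothetical non-coding genes at 1000 bp each.
fraction = genome_fraction(200_000, 1_000)
print(f"{fraction:.1%}")  # prints 6.5%
```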

That's not going to make much of a difference in the junk DNA debate since proponents of junk DNA argue that 90% of the genome is junk and 10% is functional. All of those non-coding genes can be accommodated within the 10%.

The ENCODE researchers made a big deal out of pervasive transcription back in 2007 and again in 2012. We can quibble about the exact numbers but let's say that 80% of the human genome is transcribed. We know that protein-coding genes occupy at least 40% of the genome so much of this pervasive transcription is introns. If all of the presumptive regulatory genes are located in the remaining 40% (i.e. none in introns), and the average size is 1000 bp, then this could be about 1.24 million non-coding genes. Is this reasonable? Is this what Nils Walter is proposing?
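For anyone who wants to play with the numbers, the calculation can be sketched as follows. This is just the back-of-the-envelope estimate from the paragraph above, assuming a genome size of about 3.1 billion bp (my inference from the 1.24 million figure, not a number given in the post). The function name is hypothetical.

```python
# Assumed haploid human genome size in base pairs (inferred, not stated).
GENOME_SIZE_BP = 3.1e9


def max_gene_count(fraction_of_genome, avg_gene_bp, genome_bp=GENOME_SIZE_BP):
    """Number of genes of a given size that fit in a fraction of the genome."""
    return fraction_of_genome * genome_bp / avg_gene_bp


transcribed = 0.80      # fraction of the genome that is transcribed
protein_coding = 0.40   # fraction occupied by protein-coding genes (mostly introns)
remaining = transcribed - protein_coding

count = max_gene_count(remaining, 1_000)
print(f"{count / 1e6:.2f} million")  # prints 1.24 million
```

Changing the average gene size or the transcribed fraction shows how sensitive the gene count is to those two guesses.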

I think there's some confusion about the difference between large numbers of functional transcripts and the bigger picture of how much total junk DNA there is in the human genome. I wish the opponents of junk DNA would commit to how much of the genome they think is functional and what evidence they have to support that position.

But they don't. So instead we're stuck with debates about how to decide whether some transcripts are functional or junk.

What does transcript concentration tell us about function?

If most detectable transcripts are due to spurious transcription of junk DNA then you would expect these transcripts to be present at very low levels. This turns out to be true as Nils Walter admits. He notes that "fewer than 1000 lncRNAs are present at greater than one copy per cell."

This is a problem for those who advocate that many of these low abundance transcripts must be functional. We are familiar with several of the ad hoc hypotheses that have been advanced to get around this problem. John Mattick has been promoting them for years [John Mattick's new paradigm shaft].

Walter advances two of these excuses. First, he says that a critical RNA may be present at an average of one molecule per cell but it might be abundant in just one specialized cell in the tissue. Furthermore, its expression might be transient so it can only be detected at certain times during development and we might not have assayed cells at the right time. I assume he's advocating that there might be a short burst of a large number of these extremely specialized regulatory RNAs in these special cells.

As far as I know, there aren't many examples of such specialized gene expression. You would need at least 100,000 examples in order to make a viable case for function.

His second argument is that many regulatory RNAs are restricted to the nucleus where they only need to bind to one regulatory sequence to carry out their function. This ignores the mass action laws that govern such interactions. If you apply the same reasoning to proteins then you would only need one lac repressor protein to shut down the lac operon in E. coli but we've known for 50 years that this doesn't work in spite of the fact that the lac repressor association constant shows that it is one of the tightest binding proteins known [DNA Binding Proteins]. This is covered in my biochemistry textbook on pages 650-651.¹

If you apply the same reasoning to mammalian regulatory proteins then it turns out that you need 10,000 transcription factor molecules per nucleus in order to ensure that a few specific sites are occupied. That's not only because of the chemistry of binary interactions but also because the human genome is full of spurious sites that resemble the target regulatory sequence [The Specificity of DNA Binding Proteins]. I cover this in my book in Chapter 8: "Noncoding Genes and Junk RNA" in the section titled "On the important properties of DNA-binding proteins" (pp. 200-204). I use the estrogen receptor as an example based on calculations that were done in the mid-1970s. The same principles apply to regulatory RNAs.
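To see why a single molecule isn't enough once spurious sites are in the picture, here's a rough equilibrium sketch in Python. It's only an illustration under stated assumptions: all quantities are expressed as molecules per nucleus, and the dissociation constants and the number of spurious sites are hypothetical values picked for the example, not the measured figures for the lac repressor or the estrogen receptor. The function name is also made up.

```python
def specific_site_occupancy(copies, kd_specific, n_spurious, kd_spurious):
    """Equilibrium occupancy of one specific site competing with spurious sites.

    All arguments are in units of molecules per nucleus (hypothetical values).
    Solves the mass balance: copies = free + bound_specific + bound_spurious.
    """
    def total_for(free):
        bound_specific = free / (free + kd_specific)
        bound_spurious = n_spurious * free / (free + kd_spurious)
        return free + bound_specific + bound_spurious

    # total_for() increases with free, so bisect for the free amount.
    lo, hi = 0.0, float(copies)
    for _ in range(200):
        mid = (lo + hi) / 2
        if total_for(mid) > copies:
            hi = mid
        else:
            lo = mid
    free = (lo + hi) / 2
    return free / (free + kd_specific)


# Hypothetical parameters: tight specific binding, 100,000 weak spurious sites.
low = specific_site_occupancy(1, 1.0, 100_000, 1_000.0)
high = specific_site_occupancy(10_000, 1.0, 100_000, 1_000.0)
print(f"1 copy: {low:.3f}   10,000 copies: {high:.3f}")
```

With these made-up parameters the specific site is almost never occupied at one copy per nucleus and almost always occupied at 10,000 copies. That's the qualitative point; the actual numbers depend entirely on the real binding constants.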

This is a disagreement based entirely on biochemistry and molecular biology. There aren't enough examples (evidence) to make the first argument convincing and the second argument makes no sense in light of what we know about the interactions between molecules inside of the cell (or nucleus).

Note: I can almost excuse the fact that Nils Walter ignores my book on junk DNA, my biochemistry textbook, and my blog posts, but I can't excuse the fact that his main arguments have been challenged repeatedly in the scientific literature. A good scientist should go out of their way to seek out objections to their views and address them directly.


1. In addition to the thermodynamic (equilibrium) problem, there's a kinetic problem. DNA binding proteins can find their binding sites relatively quickly by one dimensional diffusion—an option that's not readily available to regulatory RNAs [Slip Slidin' Along - How DNA Binding Proteins Find Their Target].

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Saturday, March 02, 2024

Nils Walter disputes junk DNA: (4) Different views of non-functional transcripts

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is trying to explain the conflict between proponents of junk DNA and their opponents. His main focus is building a case for large numbers of non-coding genes.

This is the fourth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In this post I'll describe the heart of the dispute according to Nils Walter.

-Nils Walter disputes junk DNA: (1) The surprise

-Nils Walter disputes junk DNA: (2) The paradigm shaft

-Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function'

Thursday, February 29, 2024

Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function'

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is trying to explain the conflict between proponents of junk DNA and their opponents. His main focus is building a case for large numbers of non-coding genes.

This is the third post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift.

-Nils Walter disputes junk DNA: (1) The surprise

-Nils Walter disputes junk DNA: (2) The paradigm shaft

Any serious debate requires some definitions and the debate over junk DNA is no exception. It's important that everyone is on the same page when using specific words and phrases. Nils Walter recognizes this so he begins his paper with a section called "Starting with the basics: Defining 'function' and 'gene'."

Tuesday, February 27, 2024

Nils Walter disputes junk DNA: (2) The paradigm shaft

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is trying to explain the conflict between proponents of junk DNA and their opponents. His main focus is building a case for large numbers of non-coding genes.

This is the second post in the series. The first one outlines the issues that led to the current paper.

Nils Walter disputes junk DNA: (1) The surprise

Walter begins his defense of function by outlining a "paradigm shift" that's illustrated in Figure 1.

FIGURE 1: Assessment of the information content of the human genome ∼20 years before (left)[110] and after (right)[111] the Human Genome Project was preliminarily completed, drawn roughly to scale.[9] This significant progress can be described per Thomas Kuhn as a “paradigm shift” flanked by extended periods of “normal science”, during which investigations are designed and results interpreted within the dominant conceptual frameworks of the sub-disciplines.[9] Others have characterized this leap in assigning newly discovered ncRNAs at least a rudimentary (elemental) biochemical activity and thus function as excessively optimistic, or Panglossian, since it partially extrapolates from the known to the unknown.[75] Adapted from Ref. [9].

Reference #9 is a paper by John Mattick promoting a "Kuhnian revolution" in molecular biology. I've already discussed that paper as an example of a paradigm shaft, which is defined as a strawman "paradigm" set up to make your work look revolutionary [John Mattick's new paradigm shaft]. Here's the figure from the Mattick paper.

The Walter figure is another example of a paradigm shaft, not to be confused with a real paradigm shift.¹ Both pie charts misrepresent the amount of functional DNA since they don't show regulatory sequences, centromeres, telomeres, origins of replication, and SARs (scaffold attachment regions). Together, these account for more functional DNA than the functional regions of protein-coding genes and non-coding genes. We didn't know the exact amounts in 1980 but we sure knew they existed. I cover this in Chapter 5 of my book: "The Big Picture."

The 1980 view also implies, incorrectly, that we knew nothing about the non-functional component of the genome when, in fact, we knew by then that half of our genome was composed of transposon and viral sequences that were likely to be inactive, degenerate fragments of once active elements. (John Mattick's figure is better.)

The 2020 view implies that most intron sequences are functional since introns make up more than 40% of our genome but only about 3% of the pie chart. As far as I know, there's no evidence to support that claim. About 80% of the pie chart is devoted to transcripts identified as either small ncRNAs or lncRNAs. The implication is that the discovery of these RNAs represents a paradigm shift in our understanding of the genome.

The alternative explanation is that we've known since the late 1960s that most of the human genome is transcribed and that these transcripts (most of which turned out to be introns) are junk RNA that is confined to the nucleus and rapidly degraded. Advances in technology have enabled us to detect many examples of spurious transcripts that are present transiently at low levels in certain cells. I cover this in Chapter 8 of my book: "Noncoding Genes and Junk RNA."

The whole point of Nils Walter's paper is to defend the idea that most of these transcripts are functional and the alternative explanation is wrong. He's trying to present a balanced view of the controversy so he's well aware of the fact that some of us interpret the red part of the pie chart as spurious transcripts (junk RNA). If he's wrong, and I am right, then there's no paradigm shift.

You don't get to shift the paradigm all on your own, even if John Mattick is on your side. A true paradigm shift requires that the entire community of scientists changes their perspective and that hasn't happened.

In the next few posts we'll see whether Nils Walter can make a strong case that all those lncRNAs are functional. They cover about two-thirds of the genome in the pie chart. If we assume that the average length of these long transcripts is 2000 bp then this represents one million transcripts and potentially one million non-coding genes.
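Here's a quick check of that estimate in Python. It's a sketch that assumes a genome size of about 3.1 billion bp, which isn't stated in the post, and it takes "two-thirds" and 2000 bp at face value. The function name is hypothetical.

```python
# Assumed haploid human genome size in base pairs (not stated in the post).
GENOME_SIZE_BP = 3.1e9


def transcript_count(genome_fraction, avg_transcript_bp, genome_bp=GENOME_SIZE_BP):
    """Non-overlapping transcripts of a given length covering part of the genome."""
    return genome_fraction * genome_bp / avg_transcript_bp


n_transcripts = transcript_count(2 / 3, 2_000)
print(f"about {n_transcripts / 1e6:.1f} million transcripts")
```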


1. The term "paradigm shaft" was coined by reader Diogenes in a comment on this blog from many years ago.

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Wednesday, February 14, 2024

Copilot answers the question, "What is junk DNA?"

The Microsoft browser (Edge) has a built-in function called Copilot. It's an AI assistant based on ChatGPT-4.

I decided to test it by asking "What is junk DNA?" and here's the answer it gave me.