More Recent Comments

Showing posts sorted by date for query junk dna.

Thursday, July 06, 2023

James Shapiro doesn't like junk DNA

Shapiro doubles down on his claim that junk DNA doesn't exist.

It's been a while since we've heard from James Shapiro. You might recall that James A. Shapiro is a biochemistry/microbiology professor at the University of Chicago and the author of a book promoting natural genetic engineering. I reviewed his book and didn't like it very much—Shapiro didn't like my review [James Shapiro Never Learns] [James Shapiro Responds to My Review of His Book].

Tuesday, June 27, 2023

Gert Korthof reviews my book

Gert Korthof thinks that the current view of evolution is incomplete and he's looking for a better explanation. He just finished reading my book so he wrote a review on his blog.

Scientists say: 90% of your genome is junk. Have a nice day! Biochemist Laurence Moran defends junk DNA theory

The good news is that I've succeeded in making Gert Korthof think more seriously about junk DNA and random genetic drift. The bad news is that I seem to have given him the impression that natural selection is not an important part of evolution. Furthermore, he insists that "evolution needs both mutation and natural selection" because he doesn't like the idea that random genetic drift may be the most common mechanism of evolution. He thinks that statement only applies at the molecular level. But "evolution" doesn't just refer to adaptation at the level of organisms. It's just not true that all examples of evolution must involve natural selection.

I think I've failed to explain the null hypothesis correctly because Korthof writes,

It's clear this is a polemical book. It is a very forceful criticism of ENCODE and everyone who uncritically accepts and spreads their views including Nature and Science. I agree that this criticism is necessary. However, there is a downside. Moran writes that the ENCODE research goals of documenting all transcripts in the human genome was a waste of money. Only a relatively small group of transcripts have a proven biological function ("only 1000 lncRNAs out of 60,000 were conserved in mammals"; "the number with a proven function is less than 500 in humans"; "The correct null hypothesis is that these long noncoding RNAs are examples of noisy transcription, or junk RNA"). Furthermore, Moran also thinks it is a waste of time and money to identify the functions of the thousands of transcripts that have been found because he knows its all junk. I disagree. The null hypothesis is an hypothesis, not a fact. One cannot assume it is true. That would be the 'null dogma'.

That's a pretty serious misunderstanding of what I meant to say. I think it was a worthwhile effort to document the number of transcripts in various cell types and all the potential regulatory sequences. What I objected to was the assumption by ENCODE researchers that these transcripts and sites were functional simply because they exist. The null hypothesis is no function and scientists must provide evidence of function in order to refute the null hypothesis.

I think it would be a very good idea to stop further genomic surveys and start identifying which transcripts and putative regulatory elements are actually functional. I'd love to know the answer to that very important question. However, I recognize that it will be expensive and time-consuming to investigate every transcript and every putative regulatory element. I don't think any lab is going to assign random transcripts and random transcription factor binding sites to graduate students and postdocs because I suspect that most of those sequences aren't going to have a function. If I were giving out grant money, I'd give it to some other lab. In that sense, I believe that it would be a waste of time and money to search for the function of tens of thousands of transcripts and over one million transcription factor binding sites.

That's not dogmatic. It's common sense. Most of those transcripts and binding sites are not conserved and not under purifying selection. That's pretty good evidence that they aren't functional, especially if you believe in the importance of natural selection.

There's a lot more to his review, including some interesting appendices. I recommend that you read it carefully to see a different perspective than the one I advocate in my book.


Saturday, May 20, 2023

Chapter 10: Turning Genes On and Off

Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive.

The ENCODE researchers and their allies claim that the human genome contains more than 600,000 regulatory sites and that means an average of 24 per gene covering about 10,000 bp per gene. I explain why these numbers are unreasonable and why most of the sites they identify have nothing to do with biologically significant regulation.
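For readers who want to check the arithmetic behind those claims, here is a minimal sketch in Python; the gene count of roughly 25,000 is my own illustrative assumption used to recover the quoted averages, not a figure taken from the book.

# Back-of-the-envelope check of the ENCODE-style regulatory numbers.
# The gene count is an illustrative assumption, not a figure from the book.

regulatory_sites = 600_000       # claimed number of regulatory sites
genes = 25_000                   # assumed total number of genes
genome_size = 3_100_000_000      # approximate haploid human genome, bp
bp_per_gene_regulation = 10_000  # claimed regulatory DNA per gene

sites_per_gene = regulatory_sites / genes              # ~24
total_regulatory_bp = bp_per_gene_regulation * genes   # ~250 Mb
fraction_of_genome = total_regulatory_bp / genome_size # ~8%

print(f"sites per gene: {sites_per_gene:.0f}")
print(f"total regulatory DNA: {total_regulatory_bp / 1e6:.0f} Mb")
print(f"fraction of genome: {fraction_of_genome:.1%}")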

This chapter also covers the epigenetics hype and restriction/modification.

Click on this link to see more.
Chapter 10: Turning Genes On and Off


Wednesday, May 17, 2023

Chapter 9: The ENCODE Publicity Campaign

In September 2012, the ENCODE researchers published a bunch of papers claiming to show that 80% of the human genome was functional. They helped orchestrate a massive publicity campaign with the help of Nature—a campaign that succeeded in spreading the message that junk DNA had been refuted.

That claim was challenged within 24 hours by numerous scientists on social media. They pointed out that the ENCODE researchers were using a ridiculous definition of function and that they had completely ignored all the evidence for junk DNA. Over the next two years there were numerous scientific papers criticizing the ENCODE claims and the ENCODE researchers were forced to retract the claim that they had proven that 80% of the genome is functional.

I discuss what went wrong and lay the blame mostly on the ENCODE researchers who did not behave as proper scientists when presenting a controversial hypothesis. The editors of Nature share the blame for not doing a proper job of vetting the ENCODE claims and not subjecting the papers to rigorous peer review. Science writers also failed to think critically about the results they were reporting.

Click on this link to see more.
Chapter 9: The ENCODE Publicity Campaign


Monday, May 15, 2023

Chapter 8: Noncoding Genes and Junk RNA

I think there are no more than 5,000 noncoding genes but many scientists claim that there are tens of thousands of newly discovered noncoding genes. I describe the known noncoding genes (less than 1000) and explain why many of the transcripts detected are just junk RNA produced by spurious transcription. The presence of abundant noncoding genes will not solve the Deflated Ego Problem.

This chapter covers the misconceptions about the Central Dogma and how they are incorrectly used to try and discredit junk DNA. The views of John Mattick are explained and refuted. I end the chapter with a plea to adopt a worldview that can accommodate messy biochemistry and a sloppy genome that's full of junk DNA.

Click on this link to see more.

Chapter 8: Noncoding Genes and Junk RNA

Thursday, May 11, 2023

Chapter 7: Gene Families and the Birth & Death of Genes

This chapter describes gene families in the human genome. I explain how new genes are born by gene duplication and how they die by deletion or by becoming pseudogenes. Our genome is littered with pseudogenes: how do they evolve and are they all junk? What are the consequences of whole genome duplications and what does it teach us about junk DNA? How many real ORFan genes are there and why do some people think there are more? Finally, you will learn why dachshunds have short legs and what "The Bridge on the River Kwai" has to do with the accuracy of the human genome sequence.

Click on this link to see more.

Gene Families and the Birth and Death of Genes

Saturday, March 25, 2023

ChatGPT lies about junk DNA

I asked ChatGPT some questions about junk DNA and it made up a Francis Crick quotation and misrepresented the view of Susumu Ohno.

We have finally restored the Junk DNA article on Wikipedia. (It was deleted about ten years ago when Wikipedians decided that junk DNA doesn't exist.) One of the issues on Wikipedia is how to deal with misconceptions and misunderstandings while staying within the boundaries of Wikipedia culture. Wikipedians have an aversion to anything that looks like editorializing so you can't just say something like, "Nobody ever said that all non-coding DNA was junk." Instead, you have to find a credible reference to someone else who said that.

I've been trying to figure out how far the misunderstandings of junk DNA have spread, so I asked ChatGPT (from OpenAI) again.

Wednesday, March 08, 2023

A small crustacean with a very big genome

The Antarctic krill genome is the largest animal genome sequenced to date.

Antarctic krill (Euphausia superba) is a species of small crustacean (about 6 cm long) that lives in large swarms in the seas around Antarctica. It is one of the most abundant animals on the planet in terms of biomass and numbers of individuals.

It was known to have a large genome with abundant repetitive DNA sequences making assembly of a complete genome very difficult. Recent technological advances have made it possible to sequence very long fragments of DNA that span many of the repetitive regions and allow assembly of a complete genome (Shao et al. 2023).

The project involved 28 scientists from China (mostly), Australia, Denmark, and Italy. To give you an idea of the effort involved, they listed the sequencing data that was collected: 3.06 terabases (Tb) of PacBio long-read sequences, 734.99 Gb of PacBio circular consensus sequences, 4.01 Tb of short reads, and 11.38 Tb of Hi-C reads. The assembled genome is 48.1 Gb, which is considerably larger than that of the African lungfish (40 Gb), until now the largest fully sequenced animal genome.

The current draft has 28,834 protein-coding genes and an unknown number of noncoding genes. About 92% of the genome is repetitive DNA that's mostly transposon-related sequences. However, there is an unusual amount of highly repetitive DNA organized as long tandem repeats and this made the assembly of the complete genome quite challenging.

The protein-coding genes in the Antarctic krill are longer than in other species due to the insertion of repetitive DNA into introns but the increase in intron size is less than expected from studies of other large genomes such as lungfish and Mexican axolotl. It looks like more of the genome expansion has occurred in the intergenic DNA compared to these other species.

This study supports the idea that genome expansion is mostly due to the insertion and propagation of repetitive DNA sequences. Some of us think that the repetitive DNA is mostly junk DNA, but in this case it seems unusual that there would be so much junk in the genome of a species with such a huge population size (about 350 trillion individuals). The authors were aware of this problem, and they were able to calculate an effective population size because they had sequence data from individuals all around Antarctica. The effective population size (Ne) turned out to be one billion times smaller than the census population size, indicating that the krill population was much smaller in the recent past. Their data strongly suggest that this smaller population existed as recently as 10 million years ago.
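A minimal sketch of the arithmetic behind that population-size argument; the census figure and the billion-fold ratio are the ones quoted above, and the rest is simple division.

# Arithmetic behind the krill effective population size argument.
# Census size and the billion-fold ratio are the figures quoted above.

census_size = 350e12   # about 350 trillion individuals
ratio = 1e9            # Ne reported to be ~one billion times smaller

effective_population_size = census_size / ratio
print(f"approximate Ne: {effective_population_size:,.0f}")   # ~350,000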

The authors don't mention junk DNA. They seem to favor the idea that large genomes are associated with crustaceans that live in polar regions and that large genomes may confer a selective advantage.


Shao, C., Sun, S., Liu, K., Wang, J., Li, S., Liu, Q., Deagle, B.E., Seim, I., Biscontin, A., Wang, Q. et al. (2023) The enormous repetitive Antarctic krill genome reveals environmental adaptations and population insights. Cell 186:1-16. [doi: 10.1016/j.cell.2023.02.005]

Wednesday, March 01, 2023

Definition of a gene (again)

The correct definition of a molecular gene isn't difficult but getting it recognized and accepted is a different story.

When writing my book on junk DNA I realized that there was an issue with genes. The average scientist, and consequently the average science writer, has a very confused picture of genes and the proper way to define them. The issue shouldn't be confusing for Sandwalk readers since we've covered that ground many times in the past. I think the best working definition of a gene is, "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]

Thursday, February 16, 2023

What are the best Nobel Prizes in biochemistry & molecular biology since 1945?

The 2022 Nobel Prize in Physiology or Medicine went to Svante Pääbo “for his discoveries concerning the genomes of extinct hominins and human evolution”. It's one of a long list of Nobel Prizes awarded for technological achievement. In most cases, the new techniques led to a better understanding of science and medicine.

Since World War II, there have been significant advances in our understanding of biology but most of these have come about by the slow and steady accumulation of knowledge and not by paradigm-shifting breakthroughs. These advances don't often get recognized by the Nobel Prize committees because it's difficult to single out any one individual or any single experiment that merits a Nobel Prize. In some cases the Nobel Prize committees have tried to recognize major advances by picking out leaders who have made important contributions over a number of years, but their choices don't always satisfy others in the field. One of the notable successes is the awarding of Nobel Prizes to Max Delbrück, Alfred D. Hershey and Salvador E. Luria “for their discoveries concerning the replication mechanism and the genetic structure of viruses” (Nobel Prize in Physiology or Medicine 1969). Another is Edward B. Lewis, Christiane Nüsslein-Volhard and Eric F. Wieschaus “for their discoveries concerning the genetic control of early embryonic development” (Nobel Prize in Physiology or Medicine 1995).

Birds of a feather: epigenetics and opposition to junk DNA

There's an old saying that birds of a feather flock together. It means that people with the same interests tend to associate with each other. Its extended meaning refers to the fact that people who believe in one thing (X) tend to also believe in another (Y). It usually means that X and Y are both questionable beliefs and it's not clear why they should be associated.

I've noticed an association between those who promote epigenetics far beyond its reasonable limits and those who reject junk DNA in favor of a genome that's mostly functional. There's no obvious reason why these two beliefs should be associated with each other, but they are. I assume it's related to the idea that both beliefs are presumed to be radical departures from the standard dogma, so they reinforce the idea that the author is a revolutionary.

Or maybe sloppy thinking is simply the common thread.

Here's an example from Chapter 4 of a 2023 edition of the Handbook of Epigenetics (Third Edition).

The central dogma of life had clearly established the importance of the RNA molecule in the flow of genetic information. The understanding of transcription and translation processes further elucidated three distinct classes of RNA: mRNA, tRNA and rRNA. mRNA carries the information from DNA and gets translated to structural or functional proteins; hence, they are referred to as the coding RNA (RNA which codes for proteins). tRNA and rRNA help in the process of translation among other functions. A major part of the DNA, however, does not code for proteins and was previously referred to as junk DNA. The scientists started realizing the role of the junk DNA in the late 1990s and the ENCODE project, initiated in 2003, proved the significance of junk DNA beyond any doubt. Many RNA types are now known to be transcribed from DNA in the same way as mRNA, but unlike mRNA they do not get translated into any protein; hence, they are collectively referred to as noncoding RNA (ncRNA). The studies have revealed that up to 90% of the eukaryotic genome is transcribed but only 1%–2% of these transcripts code for proteins, the rest all are ncRNAs. The ncRNAs less than 200 nucleotides are called small noncoding RNAs and greater than 200 nucleotides are called long noncoding RNAs (lncRNAs).

In case you haven't been following my blog posts for the past 17 years, allow me to briefly summarize the flaws in that paragraph.

  • The central dogma has nothing to do with whether most of our genome is junk
  • There was never, ever, a time when knowledgeable scientists defended the idea that all noncoding DNA is junk
  • ENCODE did not "prove the significance of junk DNA beyond any doubt"
  • Not all transcripts are functional; most of them are junk RNA transcribed from junk DNA

So, I ask the same question that I've been asking for decades. How does this stuff get published?


Sunday, January 01, 2023

The function wars are over

In order to have a productive discussion about junk DNA we needed to agree on how to define "function" and "junk." Disagreements over the definitions spawned the Function Wars that became intense over the past decade. That war is over and now it's time to move beyond nitpicking about terminology.

The idea that most of the human genome is composed of junk DNA arose gradually in the late 1960s and early 1970s. The concept was based on a lot of evidence dating back to the 1940s and it gained support with the discovery of massive amounts of repetitive DNA.

Various classes of functional DNA were known back then including: regulatory sequences, protein-coding genes, noncoding genes, centromeres, and origins of replication. Other categories have been added since then but the total amount of functional DNA was not thought to be more than 10% of the genome. This was confirmed with the publication of the human genome sequence.

From the very beginning, the distinction between functional DNA and junk DNA was based on evolutionary principles. Functional DNA was the product of natural selection and junk DNA was not constrained by selection. The genetic load argument was a key feature of Susumu Ohno's conclusion that 90% of our genome is junk (Ohno, 1972a; Ohno, 1972b).
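For readers who haven't seen the genetic load argument before, here is a minimal illustration in Python; the mutation rate and the tolerable load are round numbers chosen for intuition, not Ohno's actual figures, and the sketch assumes that every mutation landing in functional DNA is deleterious.

# Simplified illustration of the genetic load argument (round numbers only).
# Assumes every mutation in functional DNA is deleterious, which overstates
# the case but conveys the logic of the argument.

new_mutations_per_generation = 100   # rough order of magnitude for humans
tolerable_deleterious_per_gen = 10   # generous limit the population can bear

max_functional_fraction = tolerable_deleterious_per_gen / new_mutations_per_generation
print(f"maximum functional fraction: {max_functional_fraction:.0%}")   # ~10%, i.e. ~90% junk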

Thursday, December 22, 2022

Junk DNA, TED talks, and the function of lncRNAs

Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.

The fact that most of our genome is transcribed has been known for 50 years, but it only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and the label has stuck.

By the end of the 1970s we knew that much of this transcription was due to introns. The latest data show that protein-coding genes and known noncoding genes occupy about 45% of the genome, and most of that is intron sequence, which is mostly junk. That leaves 30-40% of the genome that is transcribed at some point, producing something like one million transcripts of unknown function.

Wednesday, December 21, 2022

A University of Chicago history graduate student's perspective on junk DNA

A new master's thesis on the history of junk DNA has been posted. It's from the Department of History at the University of Chicago.

My routine scan for articles on junk DNA turned up the abstract of an M.A. thesis on the history of junk DNA: Requiem for a Gene: The Problem of Junk DNA for the Molecular Paradigm. The supervisor is Professor Emily Kern in the Department of History at the University of Chicago. I've written to her to ask for a copy of the thesis and for permission to ask her, and her student, some questions about the thesis. No reply so far.

Here's the abstract of the thesis.

“Junk DNA” has been at the center of several high-profile scientific controversies over the past four decades, most recently in the disputes over the ENCODE Project. Despite its prominence in these debates, the concept has yet to be properly historicized. In this thesis, I seek to redress this oversight, inaugurating the study of junk DNA as a historical object and establishing the need for an earlier genesis for the concept than scholars have previously recognized. In search of a new origin story for junk, I chronicle developments in the recognition and characterization of noncoding DNA sequences, positioning them within existing historiographical narratives. Ultimately, I trace the origin of junk to 1958, when a series of unexpected findings in bacteria revealed the existence of significant stretches of DNA that did not encode protein. I show that the discovery of noncoding DNA sequences undermined molecular biologists’ vision of a gene as a line of one-dimensional code and, in turn, provoked the first major crisis in their nascent field. It is from this crisis, I argue, that the concept of junk DNA emerged. Moreover, I challenge the received narrative of junk DNA as an uncritical reification of the burgeoning molecular paradigm. By separating the history of junk DNA from its mythology, I demonstrate that the conceptualization of junk DNA reveals not the strength of molecular biological authority but its fragility.

It looks like it might be a history of noncoding DNA but I won't know for certain until I see the entire thesis. It's only available to students and staff at the University of Chicago.


Friday, December 16, 2022

Can the AI program ChatGPT pass my exam?

There's a lot of talk about ChatGPT and how it can prepare lectures and get good grades on undergraduate exams. However, ChatGPT is only as good as the information that's popular on the internet and that's not always enough to get a good grade on my exam.

ChatGPT is an artificial intelligence (AI) program that's designed to answer questions using a style and language that's very much like the responses you would get from a real person. It was developed by OpenAI, a tech company in San Francisco. You can create an account and log in to ask any question you want.

Several professors have challenged it with exam questions and they report that ChatGPT would easily pass their exams. I was skeptical, especially when it came to answering questions on controversial topics where there was no clear answer. I also suspected that ChatGPT would get its answers from the internet, which means that popular, but incorrect, views would likely be part of ChatGPT's response.

Here are my questions and the AI program's answers. It did quite well in some cases but not so well in others. My main concern is that programs like this might be judged to be reliable sources of information despite the fact that the real source is suspect.

Monday, December 12, 2022

Did molecular biology make any contribution to evolutionary theory?

Some evolutionary biologists think—incorrectly, in my opinion—that molecular biology has made no contributions to our understanding of evolution.

PNAS published a series of articles on Gregor Mendel and one of them caught my eye. Here's what Nicholas Barton wrote in his article The "New Synthesis".

During the 1960s and 1970s, there were further conceptual developments—largely independent of the birth of molecular biology during the previous two decades (15). First, there was an understanding that adaptations cannot be explained simply as being “for the good of the species” (16, 17). One must explain how the genetic system (including sexual reproduction, recombination, and a fair meiosis, with each copy of a gene propagating with the same probability) is maintained through selection on individual genes, and remains stable despite mutations that would disrupt the system (17, 19, 20). Second, and related to this, there was an increased awareness of genetic conflicts that arise through sexual reproduction; selfish elements may spread through biased inheritance, even if they reduce individual fitness (19, 21, 22). In the decade following the discovery that DNA carries genetic information, all the fundamental principles of molecular biology were established: the flow of information from sequences of DNA through RNA to protein, the regulation of genes by binding to specific sequences in promoters, and the importance of allostery in allowing arbitrary regulatory networks (23, 24). Yet, the extraordinary achievements of molecular biology had little effect on the conceptual development of evolutionary biology. Conversely, although evolutionary arguments were crucial in the founding of molecular biology, they have had rather little influence in the half-century since (e.g., ref. 25). Of course, molecular biology has revealed an astonishing range of adaptations that demand explanation—for example, the diversity of biochemical pathways, that allow exploitation of almost any conceivable resource, or the efficiency of molecular machines such as the ribosome, which translates the genetic code. Technical advances have brought an accelerating flood of data, most recently, giving us complete genome sequences and expression patterns from any species. Yet, arguably, no fundamentally new principles have been established in molecular biology, and, in evolutionary biology, despite sophisticated theoretical advances and abundant data, we still grapple with the same questions as a century or more ago.

This does not seem fair to me. I think that neutral theory, nearly neutral theory, and the importance of random genetic drift relied heavily on work done by molecular biologists. Similarly, the development of dating techniques using DNA and protein sequences is largely the work of molecular biologists. It wasn't the adaptationists or the paleontologists who discovered that humans and chimpanzees shared a common ancestor 5-7 million years ago and it wasn't either of those groups who discovered the origin of mitochondria.

And some of us are grappling with the idea that most of our genome is junk DNA, a question that never would have occurred to evolutionary biologists from a century ago.

Barton knows all about modern population genetics and the importance of neutral theory because later on he says,

If we consider a single allele, then we can see it as “effectively neutral” if its effect on fitness is less than ∼1/2Ne. This idea was used by Ohta (54) in a modification of the neutral theory, to suggest why larger populations might be less diverse than expected (because a smaller fraction of mutations would be effectively neutral), and why rates of substitution might be constant per year rather than per generation (because species with shorter generation times might tend to have large populations, and have a smaller fraction of effectively neutral mutations that contribute to long-term evolution). Lynch (21) has applied this concept to argue that molecular adaptations that are under weak selection cannot be established or maintained in (relatively) smaller populations, imposing a “drift barrier” to adaptation. Along the same lines, Kondrashov (55) has argued that deleterious mutations with Nes ≈ 1 will accumulate, steadily degrading the population. Both ideas seem problematic if we view adaptation as due to optimization of polygenic traits: Organisms can be well adapted even if drift dominates selection on individual alleles, and, under a model of stabilizing selection on very many traits, any change that degrades fitness can be compensated.

Barton may think that the drift-barrier hypothesis is "problematic" but it certainly seems like a significant advance that owes something to molecular biology.
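As an aside, the "effectively neutral" threshold that Barton refers to is easy to illustrate; a minimal sketch, where the population sizes are arbitrary examples.

# The "effectively neutral" threshold mentioned in the quoted passage:
# alleles with |s| < 1/(2*Ne) behave roughly as if they were neutral.
# The population sizes below are arbitrary examples.

def neutral_threshold(ne: float) -> float:
    return 1.0 / (2.0 * ne)

for ne in (1e4, 1e6, 1e9):
    print(f"Ne = {ne:.0e}: effectively neutral when |s| < {neutral_threshold(ne):.1e}")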

What do you think? Do you agree with Barton that "... the extraordinary achievements of molecular biology had little effect on the conceptual development of evolutionary biology"?


Thursday, December 01, 2022

University of Michigan biochemistry students edit Wikipedia

Students in a special topics course at the University of Michigan were taught how to edit a Wikipedia article in order to promote function in repetitive DNA and downplay junk.

The Wikipedia article on Repeated sequence (DNA) was heavily edited today by students who were taking an undergraduate course at the University of Michigan. One of the student leaders, Rasberry Neuron, left the following message on the "Talk" page.

This page was edited for a course assignment at the University of Michigan. The editing process included peer review by four students, the Chemistry librarian at the University of Michigan, and course instructors. The edits published on 12/01/2022 reflect improvements guided by the original editing team and the peer review feedback. See the article's History page for information about what changes were made from the previous version.

References to junk DNA were removed by the students but quickly added back by Paul Gardner who is currently fixing other errors that the students have made.

I checked out the webpage for the course at CHEM 455_505 Special Topics in Biochemistry - Nucleic Acids Biochemistry. The course description is quite revealing.

We now realize that the human genome contains at least 80,000 non-redundant non-coding RNA genes, outnumbering protein-coding genes by at least 4-fold, a revolutionary insight that has led some researchers to dub the eukaryotic cell an “RNA machine”. How exactly these ncRNAs guide every cellular function – from the maintenance and processing to the regulated expression of all genetic information – lies at the leading edge of the modern biosciences, from stem cell to cancer research. This course will provide an equally broad as deep overview of the structure, function and biology of DNA and particularly RNA. We will explore important examples from the current literature and the course content will evolve accordingly.

The class will be taught from a chemical/molecular perspective and will bring modern interdisciplinary concepts from biochemistry, biophysics and molecular biology to the fore.

Most of you will recognize right away that there are factually incorrect statements (i.e. misinformation) in that description. It is not true that there are at least 80,000 noncoding genes in the human genome. At some point in the future that may turn out to be true but it's highly unlikely. Right now, there are at most 5,000 proven noncoding genes. There are many scientists who claim that the mere existence of a noncoding transcript is proof that a corresponding gene must exist but that's not how science works. Before declaring that a gene exists you must present solid evidence that it produces a biologically relevant product [Most lncRNAs are junk] [Wikipedia blocks any mention of junk DNA in the "Human genome" article] [Editing the Wikipedia article on non-coding DNA] [On the misrepresentation of facts about lncRNAs] [The "standard" view of junk DNA is completely wrong] [What's In Your Genome? - The Pie Chart] [How many lncRNAs are functional?].

I'm going to email a link to this post to the course instructors and some of the students. Let's see if we can get them to discuss junk DNA.


Saturday, November 19, 2022

How many enhancers in the human genome?

In spite of what you might have read, the human genome does not contain one million functional enhancers.

The Sept. 15, 2022 issue of Nature contains a news article on "Gene regulation" [Two-layer design protects genes from mutations in their enhancers]. It begins with the following sentence.

The human genome contains only about 20,000 protein-coding genes, yet gene expression is controlled by around one million regulatory DNA elements called enhancers.

Sandwalk readers won't need to be told the reference for such an outlandish claim because you all know that it's the ENCODE Consortium summary paper from 2012—the one that kicked off their publicity campaign to convince everyone of the death of junk DNA (ENCODE, 2012). ENCODE identified several hundred thousand transcription factor (TF) binding sites and in 2012 they estimated that the total number of base pairs involved in regulating gene expression could account for 20% of the genome.

How many of those transcription factor binding sites are functional and how many are due to spurious binding to sites that have nothing to do with gene regulation? We don't know the answer to that question but we do know that there will be a huge number of spurious binding sites in a genome of more than three billion base pairs [Are most transcription factor binding sites functional?].
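To see why a huge number of spurious sites is expected, here is a minimal sketch; the motif length is an illustrative assumption, and real motifs are degenerate, which only increases the expected count.

# Expected chance occurrences of a short recognition sequence in a large
# genome, assuming random sequence with equal base frequencies.
# The motif length is an illustrative assumption.

genome_size = 3.2e9   # bp
motif_length = 8      # a typical short transcription factor motif

expected_per_strand = genome_size / 4**motif_length
expected_both_strands = 2 * expected_per_strand
print(f"expected chance matches: ~{expected_both_strands:,.0f}")   # ~100,000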

The scientists in the ENCODE Consortium didn't know the answer either, but what's surprising is that they didn't even know there was a question. It never occurred to them that some of those transcription factor binding sites have nothing to do with regulation.

Fast forward ten years to 2022. Dozens of papers have been published criticizing the ENCODE Consortium for their lack of knowledge of the basic biochemical properties of DNA-binding proteins. Surely nobody who is interested in this topic believes that there are one million functional regulatory elements (enhancers) in the human genome?

Wrong! The authors of this Nature article, Ran Elkon at Tel Aviv University (Israel) and Reuven Agami at the Netherlands Cancer Institute (Amsterdam, Netherlands), didn't get the message. They think it's quite plausible that the expression of every human protein-coding gene is controlled by an average of 50 regulatory sites even though there's not a single known example of such a gene.

Not only that, for some reason they think it's only important to mention protein-coding genes in spite of the fact that the reference they give for 20,000 protein-coding genes (Nurk et al., 2022) also claims there are an additional 40,000 noncoding genes. This is an incorrect claim since Nurk et al. have no proof that all those transcribed regions are actually genes, but let's play along and assume that there really are 60,000 genes in the human genome. That reduces the average to "only" 17 enhancers per gene. I don't know of a single gene that has 17 or more proven enhancers, do you?
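The arithmetic in the last two paragraphs is simple enough to spell out; a minimal sketch using only the numbers quoted above.

# Enhancers-per-gene arithmetic using the numbers quoted above.

claimed_enhancers = 1_000_000
protein_coding_genes = 20_000
claimed_total_genes = 60_000   # protein-coding plus putative noncoding genes

print(f"per protein-coding gene: {claimed_enhancers / protein_coding_genes:.0f}")   # 50
print(f"per gene if there are 60,000 genes: {claimed_enhancers / claimed_total_genes:.0f}")   # ~17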

Why would two researchers who study gene regulation say that the human genome contains one million enhancers when there's no evidence to support such a claim and it doesn't make any sense? Why would Nature publish this paper when surely the editors must be aware of all the criticism that arose out of the 2012 ENCODE publicity fiasco?

I can think of only two answers to the first question. Either Elkon and Agami don't know of any papers challenging the view that most TF binding sites are functional (see below) or they do know of those papers but choose to ignore them. Neither answer is acceptable.

I think that the most important question in human gene regulation is how much of the genome is devoted to regulation. How many potential regulatory sites (enhancers) are functional and how many are spurious non-functional sites? Any paper on regulation that does not mention this problem should not be published. All results have to be interpreted in light of conflicting claims about function.

Here are some examples of papers that raise the issue. The point is not to prove that these authors are correct (although they are) but to show that there's a controversy. You can't just state that there are one million regulatory sites as if it were a fact when you know that the results are being challenged.

"The observations in the ENCODE articles can be explained by the fact that biological systems are noisy: transcription factors can interact at many nonfunctional sites, and transcription initiation takes place at different positions corresponding to sequences similar to promoter sequences, simply because biological systems are not tightly controlled." (Morange, 2014)

"... ENCODE had not shown what fraction of these activities play any substantive role in gene regulation, nor was the project designed to show that. There are other well-studied explanations for reproducible biochemical activities besides crucial human gene regulation, including residual activities (pseudogenes), functions in the molecular features that infest eukaryotic genomes (transposons, viruses, and other mobile elements), and noise." (Eddy, 2013)

"Given that experiments performed in a diverse number of eukaryotic systems have found only a small correlation between TF-binding events and mRNA expression, it appears that in most cases only a fraction of TF-binding sites significantly impacts local gene expression." (Palazzo and Gregory, 2014)

One surprising finding from the early genome-wide ChIP studies was that TF binding is widespread, with thousand to tens of thousands of binding events for many TFs. These number do not fit with existing ideas of the regulatory network structure, in which TFs were generally expected to regulate a few hundred genes, at most. Binding is not necessarily equivalent to regulation, and it is likely that only a small fraction of all binding events will have an important impact on gene expression. (Slattery et al., 2014)

Detailed maps of transcription factor (TF)-bound genomic regions are being produced by consortium-driven efforts such as ENCODE, yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood. (White et al., 2013)

One outstanding issue is the fraction of factor binding in the genome that is "functional", which we define here to mean that disturbing the protein-DNA interaction leads to a measurable downstream effect on gene regulation. (Cusanovich et al., 2014)

... we expect, for example, accidental transcription factor-DNA binding to go on at some rate, so assuming that transcription equals function is not good enough. The null hypothesis after all is that most transcription is spurious and alterantive transcripts are a consequence of error-prone splicing. (Hurst, 2013)

... as a chemist, let me say that I don't find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that are designed to bind to specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance; a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant. (Jogalekar, 2012)


Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., et al. (2022) The complete sequence of a human genome. Science, 376:44-53. [doi:10.1126/science.abj6987]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi: 10.1038/nature11247]

Saturday, November 05, 2022

Nature journalist is confused about noncoding RNAs and junk

Nature Methods is one of the journals in Nature Portfolio published by Springer Nature. Its focus is novel methods in the life sciences.

The latest issue (October, 2022) highlights the issues with identifying functional noncoding RNAs and the editorial, Decoding noncoding RNAs, is quite good—much better than the comments in other journals. Here's the final paragraph.

Despite the increasing prominence of ncRNA, we remind readers that the presence of a ncRNA molecule does not always imply functionality. It is also possible that these transcripts are non-functional or products from, for example, splicing errors. We hope this Focus issue will provide researchers with practical advice for deciphering ncRNA’s roles in biological processes.

However, this praise is mitigated by the appearance of another article in the same journal. Science journalist Vivien Marx has written a commentary with a title that was bound to catch my eye: How noncoding RNAs began to leave the junkyard. Here's the opening paragraph.

Junk. In the view of some, that’s what noncoding RNAs (ncRNAs) are — genes that are transcribed but not translated into proteins. With one of his ncRNA papers, University of Queensland researcher Tim Mercer recalls that two reviewers said, “this is good” and the third said, “this is all junk; noncoding RNAs aren’t functional.” Debates over ncRNAs, in Mercer’s view, have generally moved from ‘it’s all junk’ to ‘which ones are functional?’ and ‘what are they doing?’

This is the classic setup for a paradigm shaft. What you do is create a false history of a field and then reveal how your ground-breaking work has shattered the long-standing paradigm. In this case, the false history is that the standard view among scientists was that ALL noncoding RNAs were junk. That's nonsense. It means that these old scientists must have dismissed ribosomal RNA and tRNA back in the 1960s. But even if you grant that those were exceptions, it means that they knew nothing about Sidney Altman's work on RNAse P (Nobel Prize, 1989), or 7SL RNA (Alu elements), or the RNA components of spliceosomes (snRNAs), or PiWiRNAs, or snoRNAs, or microRNAs, or a host of regulatory RNAs that have been known for decades.

Knowledgeable scientists knew full well that there are many functional noncoding RNAs, including some that are called lncRNAs. As the editorial says, these knowledgeable scientists are warning about attributing function to all transcripts without evidence. In other words, many of the transcripts found in human cells could be junk RNA in spite of the fact that there are also many functional noncoding RNAs.

So, Tim Mercer is correct, the debate is over which ncRNAs are functional and that's the same debate that's been going on for 50 years. Move along folks, nothing to see here.

The author isn't going to let this go. She decides to interview John Mattick, of all people, to get a "proper" perspective on the field. (Tim Mercer is a former student of Mattick's.) Unfortunately, that perspective contains no information on how many functional ncRNAs are present and on what percentage of the genome their genes occupy. It's gonna take several hundred thousand lncRNA genes to make a significant impact on the amount of junk DNA but nobody wants to say that. With John Mattick you get a twofer: a false history (paradigm strawman) plus no evidence that your discoveries are truly revolutionary.

Nature Methods should be ashamed, not for presenting the views of John Mattick—that's perfectly legitimate—but for not putting them in context and presenting the other side of the controversy. Surely at this point in time (2022) we should all know that Mattick's views are on the fringe and most transcripts really are junk RNA?