Friday, May 07, 2021

More misinformation about junk DNA: this time it's in American Scientist

Emily Mortola and Manyuan Long have just published an article in American Scientist about Turning Junk into Us: How Genes Are Born. The article contains a lot of misinformation about junk DNA that I'll discuss below.

Emily Mortola is a freelance science writer who worked with Manyuan Long when she was an undergraduate (I think). Manyuan Long is the Edna K. Papazian Distinguished Service Professor of Ecology and Evolution in the Department of Ecology and Evolution at the University of Chicago. His main research interest is the origin of new genes. It's reasonable to suspect that he's an expert on genome structure and evolution.

The article is behind a paywall, so most of you can't see anything more than the opening paragraphs. Let's look at those first. The second sentence is ...

As we discovered in 2003 with the conclusion of the Human Genome Project, a monumental 13-year-long research effort to sequence the entire human genome, approximately 98.8 percent of our DNA was categorized as junk.

This is not correct. The paper on the finished version of the human genome sequence was published in October 2004 (Finishing the euchromatic sequence of the human genome) and the authors reported that the coding exons of protein-coding genes covered about 1.2% of the genome. However, the authors also noted that there are many genes for tRNAs, ribosomal RNAs, snoRNAs, microRNAs, and probably other functional RNAs. Although they don't mention it, the authors must also have been aware of regulatory sequences, centromeres, telomeres, origins of replication and possibly other functional elements. They never said that all noncoding DNA (98.8%) was junk because that would be ridiculous. It's even more ridiculous to say it in 2021 [Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means].

The part of the article that you can see also lists a few "Quick Takes" and one of them is ...

Close to 99 percent of our genome has been historically classified as noncoding, useless “junk” DNA. Consequently, these sequences were rarely studied.

This is also incorrect as many scientists have pointed out repeatedly over the past fifty years or so. At no time in the past 50 years has any knowledgeable scientist ever claimed that all noncoding DNA is junk. I'm sorely tempted to accuse the authors of this article of lying because they really should know better, especially if they're writing an article about junk DNA in 2021. However, I reluctantly defer to Hanlon's razor.

Mortola and Long claim that mammalian genomes have between 85% and 99% junk DNA and wonder if it could have a function.

To most geneticists, the answer was that it has no function at all. The flow of genetic information—the central dogma of molecular biology—seems to leave no role for all of our intergenic sequences. In the classical view, a gene consists of a sequence of nucleotides of four possible types--adenine, cytosine, guanine, and thymine--represented by the letters A, C, G, and T. Three nucleotides in a row make up a codon, with each codon corresponding to a specific amino acid, or protein subunit, in the final protein product. In active genes, harmful mutations are weeded out by selection and beneficial ones are allowed to persist. But noncoding regions are not expressed in the form of a protein, so mutations in noncoding regions can be neither harmful nor beneficial. In other words, "junk" mutations cannot be steered by natural selection.

Those of you who have read this far will cringe when reading that. There are so many obvious errors in that paragraph that applying Hanlon's razor seems very complimentary. Imagine saying in the 21st century that the Central Dogma leaves no role at all for regulatory sequences or ribosomal RNA genes! But there's more; the authors double down on their incorrect understanding of "gene" in order to fit their misunderstanding of the Central Dogma.

What Is a Gene, Really?

In our de novo gene studies in rice, to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

Five Things You Should Know if You Want to Participate in the Junk DNA Debate

The authors admit in the next paragraph that some pseudogenes may produce functional RNAs that are never translated into proteins but they don't mention any other types of gene. I can understand why you might concentrate on protein-coding genes if you are studying de novo genes but why not just say that there are two types of genes and either one can arise de novo? But there's another problem with their definition: they left out a key property of a gene. It's not sufficient that a given stretch of DNA is transcribed and the RNA is translated to make a protein: the protein has to have a function before you can say that the stretch of DNA is a gene [What Is a Gene?]. We'll see in a minute why this is important.
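To make the ORF criterion in the quoted definition concrete, here is a minimal Python sketch of a toy ORF scanner. It is only an illustration under stated assumptions: the function name `find_orfs` and the `min_codons` parameter are hypothetical choices of mine, not anything taken from the Mortola and Long article, and the scanner only looks at the forward strand using the standard ATG start codon and the three standard stop codons.

```python
# Illustrative sketch only: a toy ORF scanner, not the authors' pipeline.
# Assumes the standard genetic code: ATG as the start codon and
# TAA, TAG, TGA as the three stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    `start` is the index of the A in ATG, `end` is exclusive and includes
    the stop codon. `min_codons` (a hypothetical parameter) is the minimum
    number of codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first in-frame ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# A made-up 16-nucleotide example: ATG AAA TTT TAG sits in reading frame 2.
example = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(example):
    print(frame, start, end, example[start:end])
```

A scanner like this only reports that an open reading frame exists. It says nothing about whether the stretch is actually transcribed and translated, or whether the resulting peptide does anything, which is the missing criterion discussed above.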

The main point of the paper is the birth of de novo genes and the authors discuss their work with the rice genome. They say they've discovered 175 de novo genes but they don't say how many have a real biological function. This is an important problem in this field and it would have been fascinating to see a description of how they go about assigning a function to their mostly small peptides [The evolution of de novo genes]. I'm guessing that they just assume a function as soon as they recognize an open reading frame in a transcript.

As you can see from the title of the article, the emphasis is on the idea that de novo genes can arise from junk DNA—a concept that's not seriously disputed. The one good thing about the article is that the authors do not directly state that the reason for junk DNA is to give rise to new genes but this caption is troubling.

The Human Genome Project was a 13-year-long research effort aimed at mapping the entire human genetic sequence. One of its most intriguing findings was the observation that the number of protein-coding genes estimated to exist in humans--approximately 22,300--represents a mere 1.2 percent of our whole genome, with the other 98.8 percent being categorized as noncoding, useless junk. Analyses of this presumed junk DNA in diverse species are now revealing its role in the creation of genes.

Why do science writers continue to spread misinformation about junk DNA when there's so much correct information out there? All you have to do is look [More misconceptions about junk DNA - what are we doing wrong?].


  1. Thanks, Larry.

    I am a scientist trained in another field, now an on-line defender (alas that such a thing is necessary) and expositor of evolution science, and find your takedowns of high-flown nonsense invaluable.

  2. It would be very interesting to know just how these ideas arise and are perpetuated. How did all these folks first learn that non-coding = junk, historically? How did they learn that the central dogma has been overturned? Was it in undergraduate or graduate courses? Did they absorb it from their advisors or from colleagues at meetings?

    Any sociologists of science reading your blog?

    1. In the 1970s DNA sequencing was very difficult and expensive.
      Soon after the discovery of how DNA stored sequences used to replicate proteins, there began a bidding war for research funds to specify the DNA sequences and identify their function. In the battle, non-coding sections were called "junk DNA" since there was no obvious function that could be intuitively connected with a particular gene. A protein coding sequence clearly had a function, even if what the protein did was unknown at the time. Since building a sequence database was then extremely expensive (and boring), the argument against deciphering non-translated "Junk DNA" won out. But the possible functionality of "Junk DNA" was raised in the late 1970s. The argument was simple: there was an evolutionary cost to making copies of useless DNA. Since this cost was being paid, the "Junk" must have a function. The human genome project was conceived after the discovery of polymerase chain reaction (PCR) in 1983. Many researchers were still objecting to spending scarce research money on non-coding sequences as late as 1989.

    2. There are a variety of reasons. A big problem is that one scientist will make a mistake which others repeat until it eventually becomes a "fact". The central dogma was explained incorrectly years ago in a book by James Watson, which may have been the source of this misconception. A big takeaway from this blog is that scientists aren't reading the primary literature, and have incorrect ideas about what these ideas meant to the people who came up with them in the first place. While we don't have the time to go through decades of papers when we cite something, you should be sure to do your homework before making big claims. This is inexcusable, as most scientists today have access to more literature at their computer than their predecessors could have dreamt of.

      A related problem is that scientists really abuse the terminology and use technical jargon inconsistently. Terms like epigenetics, metagenomics, junk DNA, and the central dogma have a variety of meanings that vary considerably between scientists. We often argue past each other because we don't understand what the other person believes.

      Another problem is that people feel the need to oversell the science. For decades, we have been exposed to headlines, such as "Scientists Found a Possible Cure for Cancer!", when a more appropriate headline would be "New research has made an incremental improvement of our understanding of a specific type of cancer. This may contribute to modest improvements in treatment years from now." It's fashionable to state that scientists discovered something that Darwin didn't know, as if our current understanding of biology hasn't changed since the 1800s. This isn't limited to the science writers either. Scientists deserve a lot of blame by claiming that their work overturned past ideas, will lead to a paradigm shift, overthrows past dogmas, etc. There are many cases of scientists claiming to have discovered something new that was in fact discovered decades ago.

    3. Gary, I don't think your explanation can possibly be right. Many non-protein-coding, functional sequences were known in the 1970s. Why, one of the first genes to be sequenced for a lot of taxa was a small-subunit rRNA. How could anyone have decided that non-coding = junk, given that and the many other known functional sequences?

      I suspect that some of this has to do with the deflated ego problem.

    4. @Gary S. Hurd

      I was part of a team of post-docs who cloned and characterized one of the first eukaryotic protein-coding genes back in 1977. The first ones cloned were genes where we could use semi-purified abundant mRNAs as probes.

      I can assure you that back then we never, ever, thought that all noncoding DNA was junk. Quite the contrary, since one of our main objectives was to characterize the regulatory sequences around our genes.

      Of course we knew that mammalian genomes had a lot of junk and of course we weren't interested in putting any graduate students or post-docs on projects such as cloning and sequencing junk DNA. That would be suicidal.

      It's just as suicidal today. We'd love to see the ENCODE labs devote some of their resources to proving that every transcription factor binding site has a function but they secretly know that this would likely never lead to a publication and would ruin the careers of many graduate students and post-docs.

      You said, "The argument was simple: there was an evolutionary cost to making copies of useless DNA. Since this cost was being paid, the "Junk" must have a function."

      That was indeed the adaptationist argument against junk DNA and it's still the rationale used today by many scientists who don't understand molecular evolution. However, the argument was shown to be flawed with the development of Nearly-Neutral Theory and other advances in population genetics. Some of us now know that even if junk DNA is deleterious it can still exist.

      BTW, back in the 1990s there was a massive effort to focus on functional regions of the human genome by cloning and sequencing cDNAs (ESTs). It was a massive failure since they just ended up cloning a lot of junk RNAs.

    5. Incidentally, I've spent a lot of time and effort amplifying (sometimes by cloning) and sequencing junk DNA, assuming we count introns. It has the valuable function of being very handy in phylogenetics. That's probably why God created so much of it.

  3. The "evolutionary cost" of making useless DNA is almost inconsequential. I delved into the topic a few years ago and calculated that the amount of DNA in a typical diploid cell was only a small percentage, maybe 1%-2%, of total cell dry mass. Also, it is easier for selection to increase the amount of DNA, such as useless DNA hitch-hiking on useful DNA duplications (for example), than it is for selection or drift to eliminate useless DNA through the fixation of near-neutral deletions (i.e. junk elimination).
    IMHO, ideas, such as that junk DNA has "the purpose of providing" material for new genes, arise through occult teleology: The feeling that the universe is progressive and everything has a meaning that goes beyond mere proximate causality. Biochemists like Long may not guard themselves as strongly against this type of thinking as do hard-core evolutionists.

    1. I get that the average biochemist and molecular biologist may not be aware of the latest thinking in evolutionary biology but if they venture into that field then shouldn't they do a bit of reading to get up to speed?

      Anyway, that doesn't apply to Manyuan Long because he's a professor in a department of evolution.

      What perplexes me about these issues is that scientists like Long (and the ENCODE researchers) usually have a fairly large group of graduate students and post-docs and I assume they have frequent meetings and discussions. It's surprising to me that none of those students and post-docs raised the issue of whether noncoding DNA is all junk or what is the proper definition of the Central Dogma.

      I know that when I was a graduate student and a post-doc we often challenged the thinking of our advisors and fellow students. I also know for a fact that my own graduate students and post-docs never hesitated to tell me when I was wrong!

      Does that not happen any more?

    2. I have read with interest your continuing campaign against biologists who appear to reject junk DNA as a reality. Your arguments sound persuasive to a layman like myself (a physicist), but why don't the targets of your opprobrium respond? All the physics controversies I remember from my 55 years in the profession engendered public debates, sometimes short and brutal, sometimes quiet but chronic, usually quite easy to follow in the literature. Where are those debates on the extent of junk DNA?

    3. I don't know the answer to your question. I often send copies of my posts to the scientists and science writers I mention but I never get a reply. I suspect they think I'm a kook given the overwhelming prevalence of the false counter-narrative.

      They have no reason to think they are wrong given that everyone around them thinks like they do. However, a bit of research should reveal that some of their ideas are, at the very least, controversial, but they don't seem to have taken that literature into account.

    4. Few people seem to be able to admit they were wrong. That might be another reason.

    5. I keep suspecting that part of the reason researchers hold onto the idea of most of the genome being "functional" is: "Think of all the grants we can apply for!"