Sandwalk: Search results for "junk dna"

Showing posts sorted by date for query "junk dna".

Saturday, May 09, 2026

Pervasive transcription = genes + noise

Most of the DNA in the human genome is transcribed at some point in development or in some cell type. This fact has been known since the late 1960s.

There are basically two types of transcripts. Functional transcripts mostly come from genes although there might be a few exceptions (e.g. enhancer RNAs). Non-functional transcripts can be produced by pseudogenes or from virus and transposon fossils. They can also be due to transcriptional noise caused by spurious transcription.

Philosophers talk about junk DNA

There is considerable debate in the scientific literature over the amount of junk DNA in the human genome. The standard model was developed 50 years ago; it postulated that only 10% of the genome is functional and 90% is junk. Most of the evidence since then has supported that model but there are many scientists who reject it.

This would seem to be fertile ground for philosophers of biology and, indeed, there are some philosophers who have made a significant contribution, mostly in sorting out how to define function (Brunet et al., 2021; Linquist et al., 2020). Also, many philosophers are interested in the history of biology and some (e.g. Morange, 2020) have done a good job of describing the history of the junk DNA concept.

Ask Gemini: "What is the difference between junk DNA and non-coding DNA?"

This is weird. I was a bit bored so I asked Gemini the following question: "What is the difference between junk DNA and non-coding DNA?" I thought the first answer was so wrong that I decided to ask it again to see if I got the same answer.

The second answer was quite different because Gemini noticed that I had bookmarked Sandwalk, a blog written by Laurence Moran, a champion of the 'junk DNA' concept. Is it trying to give me the answer it thinks I want or the best possible scientific answer?

Note: Here is the correct answer.

Non-coding DNA refers to the part of the genome that doesn't code for proteins. It's one way to partition the genome - you could also refer to regulatory sequences and non-regulatory sequences.

By the late 1960s scientists knew of lots of functional non-coding DNA such as regulatory sequences and non-coding genes such as those for ribosomal RNA and tRNA. (There are other non-coding functional elements.) It became apparent that most of the human genome consisted of non-functional DNA or junk DNA. The original model back then was that 10% is functional and 90% is junk. The 10% that is functional consisted of 1-2% coding DNA and about 8% of functional non-coding DNA.
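As a rough illustration of the arithmetic in that model, the sketch below converts the percentages into base pairs. The genome size of about 3.1 billion base pairs is an assumption supplied for illustration (it is not stated above), the coding fraction is taken as 2%, the upper end of the 1-2% range, and the function and variable names are hypothetical.

```python
# Rough arithmetic for the 10% functional / 90% junk model.
# ASSUMPTION: haploid human genome of ~3.1 billion bp (not stated in the post).
GENOME_BP = 3_100_000_000

# Fractions from the model described above.
# Coding is taken as 2%, the upper end of the 1-2% range.
FRACTIONS = {
    "coding": 0.02,
    "functional_noncoding": 0.08,
    "junk": 0.90,
}


def partition_genome(genome_bp, fractions):
    """Return base pairs per category; the fractions must sum to 1."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return {name: round(genome_bp * frac) for name, frac in fractions.items()}


if __name__ == "__main__":
    for name, bp in partition_genome(GENOME_BP, FRACTIONS).items():
        print(f"{name:22s} {bp / 1e6:8.0f} Mb")
```

Under those assumptions the functional non-coding portion is about four times larger than the coding portion, and both together are dwarfed by the junk fraction.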

No knowledgeable scientist ever said that all non-coding DNA was junk; that's a lie that continues to be perpetuated in scientific publications and the popular media even though it has been repeatedly debunked.

Most of the data that has accumulated over the past 50+ years has supported the idea that 90% of the human genome is junk and only 10% is functional.

The Gemini answers relate to the debate concerning whether AI is really intelligent and, more importantly, whether the popular (free) algorithms are spreading misinformation.

How many pseudogenes in the human genome?

There are somewhat fewer than 25,000 genes in the human genome and there are probably about the same number of pseudogenes.

Pseudogenes are sequences that resemble real functional genes but they contain mutations that render them non-functional. They are very real examples of junk DNA.

There are four kinds of pseudogenes. Duplicated pseudogenes arise from a gene duplication event when one of the original copies mutates. Duplicated pseudogenes retain all of the features of the original gene, including introns and adjacent regulatory sequences. The inactivating mutation may occur in the gene itself—for example in the coding region of a protein coding gene—in which case the pseudogene may still be transcribed. Duplicated pseudogenes are usually found adjacent to their parent gene.

Processed pseudogenes arise when the normal transcript is copied by reverse transcriptase and the DNA copy is reintegrated into the genome. Processed pseudogenes don't have introns or regulatory sequences and they are not near their parent gene. Most processed pseudogenes come from transcripts that are expressed in the germ line.
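The distinguishing features of the two kinds described so far can be written down as a small decision rule. This is only a sketch: the record type, its field names, and the "unclassified" fallback are hypothetical, and real pseudogenes are often degraded enough that the features are ambiguous. It covers only the duplicated and processed kinds described above.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneCandidate:
    """Hypothetical record; field names are illustrative, not from any annotation tool."""
    name: str
    has_introns: bool
    has_regulatory_sequences: bool
    adjacent_to_parent: bool


def classify_pseudogene(p):
    """Apply the features described above; anything that does not fit is left unclassified."""
    # Duplicated pseudogenes keep introns and regulatory sequences.
    # They are usually, but not always, next to the parent gene, so adjacency is not required.
    if p.has_introns and p.has_regulatory_sequences:
        return "duplicated"
    # Processed pseudogenes are reverse-transcribed copies of a transcript:
    # no introns, no regulatory sequences, and not near the parent gene.
    if not p.has_introns and not p.has_regulatory_sequences and not p.adjacent_to_parent:
        return "processed"
    return "unclassified"


if __name__ == "__main__":
    examples = [
        PseudogeneCandidate("example_1", True, True, True),     # invented example
        PseudogeneCandidate("example_2", False, False, False),  # invented example
    ]
    for candidate in examples:
        print(candidate.name, "->", classify_pseudogene(candidate))
```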

How can we combat the spread of misinformation?

This is a serious question. We (Sandwalk readers) know that there's a lot of science misinformation being spread in the popular science literature.¹ So far, scientists have been spectacularly unsuccessful in stopping it.

The misinformation covers all aspects of science but my particular bugaboos are evolution, genomes, and junk DNA.

I'm going to quote the first few paragraphs from an article on the Knowable Magazine website. It seems to be associated with Annual Reviews and it certainly looks like it should be a credible source of science information.

The article is The silent majority: RNAs that don’t make proteins. The author is Christina Szalinski and here's how she describes herself on her website.

I know science.

I became a science writer in 2013 after finishing my PhD in cell biology at the University of Pittsburgh. So when it comes to writing, I can shake out the molecular tangles, unravel the cellular threads, and wade through the formidable details of scientific studies.

Is it wrong to specifically identify science writers who are spreading misinformation? Is it cruel or mean to imply that they don't understand their subject?

Do other science writers and their organizations have any obligation to police their own discipline to ensure scientific accuracy?

Does anybody have any good ideas on how to clean up this mess?

Here's an excerpt from the article. I don't think I need to explain what's wrong.

When scientists first cracked the genetic code, they expected a simple story: DNA makes RNA, and that RNA, known as messenger RNA, makes proteins. Proteins would do all the important work — building tissues, fighting infections, digesting food.

But when the DNA of our genome was finally sequenced, researchers encountered a head-scratcher: The 20,000-plus genes that carry instructions for making our proteins account for less than 2 percent of our DNA. What was the rest of it good for?

For years, the remaining 98 percent was dismissed as “junk DNA” — evolutionary debris, filler. But as sequencing technology improved, a startling picture emerged. Our cells were busy making RNA copies of all those expanses, not just making messenger RNA — or mRNA — from the protein-coding genes. They were churning out vast quantities of RNA molecules with no known purpose.

The question became: Why would cells waste so much energy on copying that junk?

Today, however, the importance of this non-coding RNA — the catchall term for RNA molecules that don’t carry instructions for proteins — is undeniable. Non-coding RNAs turn out to regulate everything from embryonic development to immune responses to brain function. They help determine which genes get turned on and off, and when. They can promote cancer or suppress it.

I contacted the author last week to warn her that I was about to publish this post. I asked if she wished to comment or to provide the source of her information on the history of the field. I did not get a reply.

The problem with this kind of description is that it misrepresents the way science is done. Most scientific models are due to slow and steady, incremental advances building on previous studies. That kind of science is (usually) self-correcting—when new information becomes available, the old models are revised.

The picture that is being presented to the general public is that old scientists were pretty stupid because they thought there was only one kind of gene (protein coding) and that everything else in the genome (98%) had to be junk. According to that false history, the old fuddy-duddies were shown to be totally wrong when the human genome was sequenced and thousands of non-coding genes were discovered for the first time. That disproves junk DNA according to the false history.

Is there a way of writing the true history in a way that's accessible to the general public? I don't know but I thought I would give it a try in order to show modern science writers how it could be done.

It's not easy. Read my attempt below and let me know if it works.

Scientists were actively working out the functions of DNA back in the 1950s and 1960s. By the mid-1960s they had discovered two kinds of genes. The majority encoded proteins but there were also non-coding genes that specified important RNAs such as ribosomal RNA (rRNA) and transfer RNA (tRNA) that were used in protein synthesis.

Scientists also established that DNA contained regulatory elements that controlled the expression of those two types of genes. Other functional DNA elements were also identified at this time.

Most of this work was done in bacteria and their viruses where genes took up a very large percentage of the DNA in their chromosomes. However, it soon became apparent that this was not the case in humans where the coding regions of the protein-coding genes seemed to account for only 2% of the genome. (The genome is the total amount of DNA in all chromosomes.) Other functional elements, such as non-coding genes and regulatory sequences only accounted for a bit more of the genome.

This gave rise to a model developed by the leading experts of the time, including several Nobel Laureates. They proposed that only 10% of the human genome is functional and 90% is junk DNA. Based on a lot of experimental data, they estimated that there were about 30,000 genes in the human genome.

Additional non-coding genes specifying regulatory RNAs were identified at this time (early 1970s) but the biggest advance in this area occurred in the 1980s with the discovery of a host of genes specifying various new RNAs. Some of these new non-coding genes specified RNAs that acted like protein enzymes to catalyze biochemical reactions. Others were involved in regulating gene expression and still others were structural components of large cellular complexes.

These results, and others from the 1990s, raised the number of non-coding genes in humans to as many as several thousand but they still only accounted for a fraction of the total number of protein-coding genes.

The first draft of the human genome was published 25 years ago and it confirmed the model developed more than 50 years ago. There were about 30,000 genes, just as the experts had predicted, and most of the human genome was junk.

Subsequent work on identifying features of the human genome has, by and large, confirmed this model but there are scientists who are skeptical.

Most of the human genome is transcribed into RNAs—a fact that was known 50 years ago—but many of the leading experts concluded that most of those RNAs were probably junk RNA of various sorts. The idea here is that the human genome is very messy and it gives rise to lots of spurious, accidental RNAs that are not biologically relevant. Most of those RNAs are present in small amounts and they are rapidly degraded. They are not conserved in our closest relatives. (Sequence conservation is a good indication of function and lack of sequence conservation is a good indication of junk.)

The skeptics, on the other hand, argue that most of those RNAs have a function and there are far more non-coding genes than protein-coding genes. The debate continues to this day.

1. And, unfortunately, in the legitimate scientific literature.

Monday, April 06, 2026

How can philosophy contribute to science?

I've written quite a bit about the perceived conflict between science and philosophy and defended my view that science is best described in broad terms as a way of knowing that requires evidence, skepticism, and rational thinking. As far as I know, there is no other way of knowing that has produced true knowledge.

In this sense, the proper practice of philosophy has to involve science (and by that I mean evidence) if the results are going to produce knowledge. There's lots to debate on this topic, including discussions about the meaning of "knowledge" [Is science the only way of knowing?].

But that's not what I want to talk about today. Today's topic is about the contribution that philosophers can make to science. I'll focus on philosophers of biology and on scientific topics that I'm knowledgeable about and I'll assume that most philosophers agree with Elisabeth Lloyd when she says, "As a philosopher of science, I have always been oriented towards addressing problems that scientists have, not so much problems that philosophers have. That is how to do good philosophy of science."¹

Now, let me be clear about the issue. It is blindingly obvious that philosophers could use their deep understanding of logic and argumentation to make significant contributions to biology, especially in cases where scientists are misusing logic. The question is not whether philosophy is capable of contributing to biology but whether it is actually fulfilling that potential.

Philosophers and definitions of evolution and allele

One of my Facebook friends posted a link to an article on genetic drift on the Stanford Encyclopedia of Philosophy website [Genetic Drift]. The author is Roberta Millstein and the article is a recent update to an older version that I questioned ten years ago [A philosopher's view of random genetic drift]. I noted on Facebook that this was "Another example of philosophers who don’t understand modern science." By this I meant that the article seemed to ignore the abundant molecular evidence of drift.

That prompted a response from defenders of philosophy and Roberta Millstein joined in. Here's the essence of her defense of philosophy.

The Stanford Encyclopedia of Philosophy is, as the name suggests, about philosophy. Thus, the entry surveys views about drift in philosophy -- starting with the history of drift because some philosophical views about drift, such as my own, take that as their starting point -- which include debates about what drift is and other philosophical topics.

Perhaps if biologists had been crystal clear and consistent about what drift is, there would have been less to write about, but there is good evidence for the claim I make in the entry and elsewhere, that in fact scientists use the term in different ways, some of which I think are unproductive (e.g., describing drift in terms of outcome rather than as causal process).

I'll post a separate article about her views on genetic drift but here I want to address the point that biologists aren't always "crystal clear." It turns out that Millstein doesn't agree with my definition of evolution or my definition of allele. So when I try to make the case that fixation of neutral alleles is a clear example of random genetic drift, this is challenged by evidence that not all biologists accept my definitions, so genetic drift isn't as solidly established as I might think.

Carl Zimmer writes about AlphaGenome

We may not know a lot about how artificial intelligence (AI) algorithms work but the one thing we do know is that they are only as good as their databases. If you ask an AI program to tell you when Charles Darwin was born then chances are good it's going to give you the correct answer because that information is in Wikipedia and lots of other reliable online sources.

However, if you ask it to tell you how many genes are in the human genome it will not give you the correct answer. The correct answer is that we don't know for sure because it depends on how you define a gene and how many non-coding genes there are using various definitions. That's not the answer you will get. (I personally believe that there are only about 1000 non-coding genes but I don't expect a good "intelligence" program to favor my view over others. I DO expect it to not favor other opinions over mine.)

I just asked ChatGPT and it told me that there are tens of thousands of non-coding genes based on the Human Genome Project plus GENCODE and Ensembl annotations. This is correct ... and misleading. It's giving the best answer it can based on the databases it searches. However, many of us are skeptical of the GENCODE and Ensembl annotations and for good reason. They tend to err on the side of inclusion in order to avoid false negatives. In other words, they don't want to risk ignoring a real biologically relevant feature for lack of evidence so they deliberately risk including a lot of false positives. This is why those databases include a lot of questionable features such as non-coding genes, multiple transcription start sites, multiple splice variants, and tons of potential regulatory elements.
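The trade-off described here, accepting more false positives in order to avoid false negatives, can be shown with a toy example. Everything in the sketch below is invented for illustration: the evidence scores, the true/false labels, and the thresholds do not come from any real annotation pipeline, and the function name is hypothetical.

```python
# Toy candidates: (evidence_score, is_real). All values are invented for illustration.
CANDIDATES = [
    (0.95, True), (0.90, True), (0.70, True), (0.40, True),
    (0.60, False), (0.45, False), (0.35, False), (0.30, False),
    (0.20, False), (0.10, False),
]


def count_errors(candidates, threshold):
    """Annotate everything scoring >= threshold; return (false_positives, false_negatives)."""
    false_pos = sum(1 for score, real in candidates if score >= threshold and not real)
    false_neg = sum(1 for score, real in candidates if score < threshold and real)
    return false_pos, false_neg


if __name__ == "__main__":
    # A lower threshold means a more inclusive annotation.
    for threshold in (0.8, 0.5, 0.25):
        fp, fn = count_errors(CANDIDATES, threshold)
        print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

In this toy set, lowering the threshold from 0.8 to 0.25 removes both false negatives but lets in four false positives, which is the pattern an inclusive annotation policy would be expected to produce.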

Along comes AlphaGenome. It's an AI program designed to scan those GENCODE and Ensembl databases to identify important features that might play a role in genetic diseases. What could possibly go wrong? [How intelligent is artificial intelligence?] [Will AlphaGenome from Google DeepMind help us understand the human genome?]

The average science writer jumped all over the original announcement of AlphaGenome to let us all know that artificial intelligence was going to solve the problem of the mysterious genome. Apparently the complexity of the human genome has astonished scientists ever since the first human genome sequence was published 25 years ago.¹ The typical article on AlphaGenome fits nicely into the common theme that AI is soon going to rule the world.

That's why I was excited to pick up my copy of the New York Times yesterday and see that Carl Zimmer had written about AlphaGenome. Finally, an intelligent, highly respected, science writer was going to give us the truth. Here's the article that I saw in my version of the paper. (It was originally published several weeks ago on January 28, 2026.)

What a disappointment! Zimmer goes with the hype about AlphaGenome and repeats some of the tropes that he has avoided in the past. For example, he writes about how alternative splicing can create hundreds of different proteins from a single gene and how regulatory sequences can lie thousands or millions of base pairs away from a gene. (There's no question that this is true for a small number of transcription factor binding sites but the vast majority are close to the promoter.)

Zimmer gives an example showing that AlphaGenome identified a regulatory sequence for a gene called TAL1, implying that the program will help decipher the rest of the genome. The general tone of the newspaper article is that AlphaGenome will be of great help to scientists who want to understand the human genome.

I checked the online version of Carl Zimmer's article in order to prepare for this blog post. I was surprised to see that there were lots of things in the online version that weren't in the newspaper article. For example, Zimmer quotes my colleague Alex Palazzo saying that everybody uses AlphaFold to study proteins. Later on in the article Zimmer notes that, "But the more scientists studied the human genome, the more complicated and messy it turned out to be." The newspaper article left out the words "and messy" and that's significant because junk DNA supporters like Alex Palazzo often refer to the human genome as "messy" and full of junk DNA. That's a very different perspective from that of opponents of junk DNA, who emphasize things like "complicated" and "mysterious."²

Zimmer has an even more revealing section that's in the online version but not the newspaper version.

Peter Koo, a computational biologist at Cold Spring Harbor Laboratory in New York who was not involved in the project, said that AlphaGenome represented an important step forward in applying artificial intelligence to the genome. “It’s an engineering marvel,” he said.

But Dr. Koo and other outside experts cautioned that it represented just one step on a long road ahead. “This is not AlphaFold, and it’s not going to win the Nobel Prize,” said Mark Gerstein, a computational biologist at Yale.

AlphaGenome will be useful. Dr. Gerstein said that he would probably add it to his toolbox for exploring DNA, and others expect to follow suit. But not all scientists trust A.I. programs like AlphaGenome to help them understand the genome.

“I see no value in them at all right now,” said Steven Salzberg, a computational biologist at Johns Hopkins University. “I think there are a lot of smart people wasting their time.”

The end of the online article is quite different from the final paragraphs of the newspaper article. In the newspaper article, Zimmer describes the TAL1 result then ends it with the paragraph starting with "In reality." I've highlighted that paragraph in the quotations below from the online version.

The AlphaGenome researchers shared their TAL1 predictions with Dr. Marc Mansour, a hematologist at University College London who spent years uncovering the leukemia-driving mutations with lab experiments.

“It was quite mind-blowing,” Dr. Mansour said. “It really showed how powerful this is.”

But, Dr. Mansour noted, AlphaGenome’s predictive powers fade the farther its gaze strays from a particular gene. He is now using AlphaGenome in his cancer research but does not blindly accept its results.

“These prediction tools are still prediction tools,” he said. “We still need to go to the lab.”

Dr. Salzberg of Johns Hopkins is less sanguine about AlphaGenome, in part because he thinks its creators put too much trust in the data they trained it on. Scientists who study splice sites don’t agree on which sites are real and which are genetic mirages. As a result, they have created databases that contain different catalogs of splice sites.

“The community has been working for 25 years to try to figure out what are all the splice sites in the human genome, and we’re still not really there,” Dr. Salzberg said. “We don’t have an agreed-upon gold-standard set.”

Dr. Pollard also cautioned that AlphaGenome was a long way from being a tool that doctors could use to scan the genomes of patients for threats to their health. It predicts only the effects of a single mutation on one standard human genome.

In reality, any two people have millions of genetic differences in their DNA. Assessing the effects of all those variations throughout a patient’s body remains far beyond AlphaGenome’s industrial-strength power.

“It is a much, much harder problem — and yet that’s the problem we need to solve if we want to use a model like this for health care,” Dr. Pollard said.

The net effect of these differences is to transform the article from one that promotes AlphaGenome in the newspaper version to one that's far more skeptical in the online version. I believe that the online version is far more accurate and reflects the high standard that I expect from Carl Zimmer. I'm assuming that the newspaper article was edited for the New York Times supplement that I read and I'm assuming that Zimmer did not approve of that edit.

Note: The cartoon was generated by ChatGPT in response to the request, "draw a cartoon illustrating GIGO - garbage in garbage out."

Note: The photo is from 10 years ago when Carl was in Toronto working on his junk DNA article for The New York Times [Is Most of Our DNA Garbage?]. That's Alex Palazzo on the left, then me, Ryan Gregory, and Carl Zimmer on the right.

1. Most knowledgeable scientists were not astonished to learn that 90% of our genome really is junk and there are fewer than 30,000 genes.

2. See the last chapter of my book: "Chapter 11: Zen and the Art of Coping with a Sloppy Genome."

Tuesday, February 10, 2026

How intelligent is artificial intelligence?

Over the past few years I've been assessing AI algorithms to see if they can answer difficult questions about junk DNA, alternative splicing, evolution, epigenetics and a number of other topics. As a general rule, these AI algorithms are good at searching the internet and returning a consensus view of what's out there. Unfortunately, the popular view on some of these topics is wrong and most AI algorithms are incapable of sorting the wheat from the chaff.

In most cases, they aren't even capable of recognizing that there's a controversy and that their preferred answer might not be correct. They are quite capable of getting their answer from known kooks and unreliable, non-scientific websites [The scary future of AI is revealed by how it deals with junk DNA].

Others have now recognized that there's a problem with AI so they devised a set of expert questions that have definitive, correct, answers but the answers cannot be retrieved by simple internet searches. The idea is to test whether AI algorithms are actually intelligent or just very fast search engines that can summarize the data they retrieve and create an intelligent-sounding output.

Is epigenetics one of the best ideas of the 21st century?

New Scientist has produced a special issue on the best ideas of the 21st century. Here's the complete list.

The Third Way Evolution Conference

The Third Way of Evolution is a strange organization composed of mavericks who think they're not getting enough attention. Here's how they describe their movement.

The vast majority of people believe that there are only two alternative ways to explain the origins of biological diversity. One way is Creationism that depends upon intervention by a divine Creator. That is clearly unscientific because it brings an arbitrary supernatural force into the evolution process. The commonly accepted alternative is Neo-Darwinism, which is clearly naturalistic science but ignores much contemporary molecular evidence and invokes a set of unsupported assumptions about the accidental nature of hereditary variation. Neo-Darwinism ignores important rapid evolutionary processes such as symbiogenesis, horizontal DNA transfer, action of mobile DNA and epigenetic modifications. Moreover, some Neo-Darwinists have elevated Natural Selection into a unique creative force that solves all the difficult evolutionary problems without a real empirical basis. Many scientists today see the need for a deeper and more complete exploration of all aspects of the evolutionary process.

Answers in Genesis uses the latest DNA research to destroy evolutionary proof (not!)

There's been so much bad news this week that I thought you might enjoy a little humor to lighten your day. Here are some devout Young Earth Creationists making fun of some stupid comments they've found on the internet and calling on some professor to "destroy" evolutionists who believe in junk DNA [Latest in DNA Research Destroys Evolutionary “Proof”].

Even more regulatory elements?

The expression of genes is regulated at many levels but one of the most important is regulation at the level of transcription. Transcription initiation is controlled by transcription factors that bind to sequences near the promoter and either activate or repress transcription.

A lot of work has been done on transcription regulation in mammals over the past 40 years. The general impression from these detailed studies of individual genes is that regulation usually involves a relatively small number of transcription factors that bind to sequences within 1000 bp or so of the transcription start site.

This model was challenged by the ENCODE studies in 2012. ENCODE researchers claimed to have discovered hundreds of thousands of cis-regulatory elements (CREs) covering a substantial percentage of the genome. If they are correct, then this means that there are dozens of transcription factors controlling the expression of every gene.

Will AlphaGenome from Google DeepMind help us understand the human genome?

I recently reported that Google's AI program does a horrible job of summarizing the junk DNA controversy [The scary future of AI is revealed by how it deals with junk DNA]. That led to a discussion about the "intelligence" in artificial intelligence and whether AI was capable of distinguishing between accurate and inaccurate data.

Google DeepMind is an artificial intelligence research laboratory headquartered in London, UK. Two of its programmers, Demis Hassabis and John Jumper, were awarded the 2024 Nobel Prize in Chemistry for developing AlphaFold, a program that predicts the tertiary structure of proteins.

The activity of "random" DNA supports the junk DNA model

I complain a lot about the quality of science writing but today's post is very different. I want to highlight an article by Michael Le Page that he just published in New Scientist. It's one of the best articles on junk DNA that I've ever seen in popular science magazines and newspapers [Human-plant hybrid cells reveal truth about dark DNA in our genome].

I've admired Michael Le Page for many years because of his articles on climate change and evolution. It doesn't surprise me that he's right about junk DNA.

The scary future of AI is revealed by how it deals with junk DNA

Today I did a Google search for the term "JUNK DNA" and, as usual, the first thing I saw was the Google AI description of junk DNA. It's wrong, but that's not the scary part. The most frightening thing about the AI description is that it promotes three videos that misrepresent science and two of them are from well known kooks.

What does this tell you about current versions of AI? It tells you that it is not intelligent in any meaningful sense of the word. It tells you that Google AI is incapable of distinguishing between scientific facts and ignorance. It tilts toward the loudest voices on the internet and, as we all know, those voices are frequently wrong.

How many lncRNA genes in the human genome? (2025)

There is considerable controversy over the total number of genes in the human genome. The number of protein-coding genes is pretty well established at somewhere between 19,500 and 20,000. It's the number of non-coding genes that's disputed.

There's general agreement on the number of well-defined small RNA genes such as snRNAs, snoRNAs, microRNAs etc. Similarly, the number of ribosomal RNA and tRNA genes is known. The problem is with identifying genuine long non-coding RNA genes (lncRNA genes). Estimates vary from fewer than 20,000 to more than 200,000 but most of these estimates fail to define what they mean by "gene." Many scientists seem to think that any detectable transcript must come from a gene.

This doesn't make any sense since we know that spurious transcripts exist and they don't come from genes by any meaningful definition of gene. The only reasonable definition of a molecular gene is a DNA sequence that's transcribed to produce a functional product.¹

The idea that spurious, non-functional, transcripts exist has been described in the scientific literature for many decades. One of my favorites is in a paper by Ponting and Haerty (2022) quoting another paper from thirteen years ago by Ulitsky and Bartel.

The cellular transcriptional machinery does not perfectly discriminate cryptic promoters from functional gene promoters. This machinery is abundant and so can engage sites momentarily depleted of nucleosomes and rapidly initiate transcription. The chance occurrence of splice sites can then facilitate the capping, splicing, and polyadenylation of long transcripts. A very large number of such rare RNA species are detectable in RNA-sequencing experiments whose properties are virtually indistinguishable from those of bona fide lncRNAs. Consequently, “a sensible [null] hypothesis is that most of the currently annotated long (typically >200 nt) noncoding RNAs are not functional, i.e., most impart no fitness advantage, however slight” (Ulitsky and Bartel, 2013: p. 26).

The important point here is that the correct null hypothesis is that these transcripts don't have a biologically relevant function and the burden of proof is on researchers to demonstrate function before assigning them to a genuine gene. My colleagues at the University of Toronto made the same point in a paper published in 2015.

In the absence of sufficient evidence, a given ncRNA should be provisionally labeled as non-functional. Subsequently, if the ncRNA displays features/activities beyond what one would expect for the null hypothesis, then we can reclassify the ncRNA in question as being functional. (Palazzo and Lee, 2015)

There are a number of well-defined lncRNAs that have been shown to have distinct reproducible functions. The key question is how many of these biologically relevant lncRNA genes exist in the human genome. I struggled with the answer to this question when I was writing my book. I finally decided to make a generous estimate of 5000 non-coding genes and that implies several thousand lncRNA genes (p. 127). I now think that estimate was far too generous and there are probably fewer than 1000 genuine lncRNA genes.

I have not scoured the literature for all the examples of human lncRNAs having good evidence of function but my impression is that there are only a few hundred. This post was incited by a recent publication by researchers from the Hospital for Sick Children and the University of Toronto (Toronto, Canada) who characterized another functional lncRNA called CISTR-ACT that plays a role in regulating cell size (Kiriakopulos et al., 2025).

I was prompted to revisit this controversy by the accompanying press release that said ...

Unlike genes that encode for proteins, CISTR-ACT is a long non-coding RNA (or lncRNA) and is part of the non-coding genome, the largely unexplored part that makes up 98 per cent of our DNA. This research helps show that the non-coding genome, often dismissed as ‘junk DNA’, plays an important role in how cells function.

We're used to this kind of misinformation² in press releases but I thought it would be a good idea to read the paper. As I expected, there's nothing in the paper about junk DNA but here's the first sentence of the introduction.

The human genome contains more long non-coding RNAs (lncRNAs) than protein-coding genes (GENCODE v49) which regulate genes and chromatin scaffolding.

The latest version of GENCODE (Release 49) claims that there are 35,899 lncRNA genes. This is the only reference in the Kiriakopulos et al. paper to the number of lncRNA genes. There's no mention of the controversy and none of the papers that discuss the controversy are referenced.

The GENCODE number is close to the latest version of Ensembl, which lists 35,042 lncRNA genes. I couldn't find any good explanation for these numbers or for the definition of "gene" that they are using. What's interesting is how these numbers are climbing every year. For example, a paper from two years ago listed a number of sources and you can see that the RefSeq and GENCODE numbers are much smaller than today's numbers (Amaral et al., 2023).³
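To put the gap between the databases and my own estimate in perspective, here's a rough back-of-the-envelope sketch. It assumes the two annotated counts quoted above and treats my "fewer than 1000 genuine lncRNA genes" estimate as an upper bound. That bound is an opinion, not a measurement, and the function and variable names are only illustrative.

```python
# Annotated lncRNA gene counts quoted in this post.
annotated = {"GENCODE Release 49": 35_899, "Ensembl": 35_042}

# Assumption: upper bound on genuine lncRNA genes, taken from the
# "probably fewer than 1000" estimate earlier in the post.
GENUINE_UPPER_BOUND = 1_000


def max_genuine_fraction(annotated_count: int,
                         upper_bound: int = GENUINE_UPPER_BOUND) -> float:
    """Largest fraction of annotated genes that could be genuine under the bound."""
    return min(1.0, upper_bound / annotated_count)


for source, count in annotated.items():
    print(f"{source}: at most {max_genuine_fraction(count):.1%} genuine")
```

If the upper bound is anywhere near correct, fewer than 3% of the annotated lncRNA "genes" in either database would be genuine genes.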

We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims.

Ponting and Haerty (2022)

It's perfectly acceptable to state your preferred view on lncRNAs when you publish a paper. The authors of the recent paper may want to believe that there are more lncRNA genes than protein-coding genes but I think it's important for them to define what they mean by "gene" when they make such a claim. What's not acceptable, in my opinion, is to ignore a genuine scientific controversy by not mentioning in the introduction that there are other legitimate views.

It's a shame that they didn't do that because their paper is a good example of the hard work that needs to be done in order to demonstrate that a particular lncRNA has a biologically relevant function.

In closing, I want to emphasize the recent review by Ponting and Haerty (2022)⁴ that points out the importance of the problem and the kinds of experiments that need to be done in order to establish that a given RNA comes from a real gene. This is how a scientific controversy should be addressed. Here's the abstract of that paper ...

Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.

1. See Wikipedia: Gene; What Is a Gene?; Definition of a gene (again); Must a Gene Have a Function?.

2. No knowledgeable scientist ever said that all non-coding DNA was junk. We've known about non-coding genes for more than half-a-century.

3. See How many genes in the human genome (2023)?

4. See Most lncRNAs are junk

Amaral, P., Carbonell-Sala, S., De La Vega, F.M., Faial, T., Frankish, A., Gingeras, T., Guigo, R., Harrow, J.L., Hatzigeorgiou, A.G., Johnson, R. et al. (2023) The status of the human gene catalogue. Nature 622:41-47. [doi: 10.1038/s41586-023-06490-x]

Kiriakopulos et al. (2025) LncRNA CISTR-ACT regulates cell size in human and mouse by guiding FOSL2. Nature communications: (in press). [doi: 10.1038/s41467-025-67591-x]

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in genetics 6:2(1-11). [doi: 10.3389/fgene.2015.00002]

Ponting, C.P. and Haerty, W. (2022) Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annual review of genomics and human genetics 23. [doi: 10.1146/annurev-genom-112921-123710]

Ulitsky, I. and Bartel, D.P. (2013) lincRNAs: genomics, evolution, and mechanisms. Cell 154:26-46. [doi: 10.1016/j.cell.2013.06.020]

Thursday, December 11, 2025

How many regulatory sites in the human genome?

The current best model of the human genome is that only 10% is functional and 90% is junk. This model was first developed over half a century ago (see Junk DNA). From the very beginning, the model recognized that regulatory sequences would make up a significant proportion of the functional elements but early suggestions that most of the repetitive DNA would turn out to be involved in regulation were rejected.

As more and more data accumulated on regulatory sequences, it became apparent that most regulatory sequences of pol II (RNA polymerase II) genes could be found in relatively short regions of DNA just upstream of the transcription start site. It also became apparent that for each transcription factor there were thousands of transcription factor binding sites even though only a small number were actually involved in genuine gene regulation.¹

Google AI references a "Biblical Genetics" video in claiming that junk DNA is no longer considered junk

90% of the human genome is junk DNA.

Today I did a routine search for "junk DNA" "2025" to see if misinformation is still dominating the web. It is, but that's not the most surprising thing I discovered. Here's what Google AI told me at the top of the search page.

In 2025, "junk DNA" is no longer considered junk, as new studies show it plays vital roles in gene regulation and development. Research from 2025 indicates that these sequences, many of which come from ancient viruses, can act as "genetic switches" that influence how genes are turned on or off and how cells respond to their environment. This has led to potential breakthroughs in regenerative medicine and cancer treatment by providing new therapeutic targets.

This video explains how what was once considered junk DNA has been found to contain thousands of new genes:

The video is by Robert Carter, who has a Ph.D. in molecular biology. His site is called Biblical Genetics. He also posts on creation.com.

Carter sounds like he knows what he's talking about but he's just parroting all the misinformation that permeates the scientific literature. The main message of this video is that scientists were shocked to discover that the human genome only had 20,000 protein coding genes but we now know (no, we don't) that each gene makes many different proteins and that accounts for the "missing" complexity that all the experts had expected.¹

We also "know" (no, we don't) that scientists have discovered tens of thousands of new protein coding genes that make small proteins. He references a Science article by Elizabeth Pennisi who has been spreading misinformation about the human genome for more than 25 years.

It's not surprising that Robert Carter wants to discredit the idea of junk DNA. What's surprising is that Google AI is directing readers to a creationist video.

1. The knowledgeable experts predicted that the human genome would have fewer than 30,000 genes and that's exactly what was found when the human genome sequence was published.

Thursday, September 25, 2025

Wednesday talk at the University of Toronto: Larry Moran on "What's in Your Genome"

I'm giving a talk next Wednesday (October 1st) to the members of the Senior College (retired faculty). It's at the University of Toronto Faculty Club at 10am. I'll talk for 50 mins then there's a coffee break followed by 50 mins of questions and discussion.

Guests are welcome but you'll have to pay $10 to cover the cost of coffee and cookies. You can also register to watch my talk on Zoom. If you'd like to stay for lunch at the Faculty Club, you'll have to let me know so I can put you down as a guest.

Here's the link to register: What's in Your Genome?

Wednesday Talk: Wednesday, October 1, 2025, 10am-12pm.

In-person at the Faculty Club and on Zoom
Larry Moran, Biochemistry, University of Toronto

Title: “What’s in Your Genome?”

Abstract: Scientists have been studying the human genome for more than 70 years but today there is considerable controversy about what’s in our genome. The publication of the complete sequence of the human genome in 2001 did nothing to resolve the controversy. For many scientists, the data confirmed their predictions that we have about 30,000 genes and most of our genome is useless junk DNA. Other scientists were shocked to learn that we have so few genes so they began the search for other explanations. Today, the majority of molecular biologists and biochemists believe that most of our genome is functional and there may be as many as 100,000 extra genes that weren’t identified in 2001. The majority of experts in molecular evolution disagree: they believe that 90% of our genome is junk DNA. I will summarize the data from both sides of the controversy and discuss the role that science journalism has played in misrepresenting scientific discoveries about the human genome.


