
Showing posts with label Junk DNA. Show all posts

Sunday, January 04, 2026

Will AlphaGenome from Google DeepMind help us understand the human genome?

I recently reported that Google's AI program does a horrible job of summarizing the junk DNA controversy. [The scary future of AI is revealed by how it deals with junk DNA] That led to a discussion about the "intelligence" in artificial intelligence and whether AI was capable of distinguishing between accurate and inaccurate data.

Google DeepMind is an artificial intelligence research laboratory headquartered in London, UK. Two of its programmers, Demis Hassabis and John Jumper, were awarded the 2024 Nobel Prize in Chemistry for developing AlphaFold, a program that predicts the tertiary structure of proteins.

Wednesday, December 31, 2025

The activity of "random" DNA supports the junk DNA model

I complain a lot about the quality of science writing but today's post is very different. I want to highlight an article by Michael Le Page that he just published in New Scientist. It's one of the best articles on junk DNA that I've ever seen in popular science magazines and newspapers [Human-plant hybrid cells reveal truth about dark DNA in our genome].

I've admired Michael Le Page for many years because of his articles on climate change and evolution. It doesn't surprise me that he's right about junk DNA.

Sunday, December 28, 2025

The scary future of AI is revealed by how it deals with junk DNA

Today I did a Google search for the term "JUNK DNA" and, as usual, the first thing I saw was the Google AI description of junk DNA. It's wrong, but that's not the scary part. The most frightening thing about the AI description is that it promotes three videos that misrepresent science and two of them are from well-known kooks.

What does this tell you about current versions of AI? It tells you that it is not intelligent in any meaningful sense of the word. It tells you that Google AI is incapable of distinguishing between scientific facts and ignorance. It tilts toward the loudest voices on the internet and, as we all know, those voices are frequently wrong.

Friday, December 19, 2025

How many lncRNA genes in the human genome? (2025)

There is considerable controversy over the total number of genes in the human genome. The number of protein-coding genes is pretty well established at somewhere between 19,500 and 20,000. It's the number of non-coding genes that's disputed.

There's general agreement on the number of well-defined small RNA genes such as snRNAs, snoRNAs, microRNAs etc. Similarly, the number of ribosomal RNA and tRNA genes is known. The problem is with identifying genuine long non-coding RNA genes (lncRNA genes). Estimates vary from fewer than 20,000 to more than 200,000 but most of these estimates fail to define what they mean by "gene." Many scientists seem to think that any detectable transcript must come from a gene.

This doesn't make any sense since we know that spurious transcripts exist and they don't come from genes by any meaningful definition of gene. The only reasonable definition of a molecular gene is a DNA sequence that's transcribed to produce a functional product.1

The idea that spurious, non-functional, transcripts exist has been described in the scientific literature for many decades. One of my favorites is in a paper by Ponting and Haerty (2022) quoting another paper from 2013 by Ulitsky and Bartel.

The cellular transcriptional machinery does not perfectly discriminate cryptic promoters from functional gene promoters. This machinery is abundant and so can engage sites momentarily depleted of nucleosomes and rapidly initiate transcription. The chance occurrence of splice sites can then facilitate the capping, splicing, and polyadenylation of long transcripts. A very large number of such rare RNA species are detectable in RNA-sequencing experiments whose properties are virtually indistinguishable from those of bona fide lncRNAs. Consequently, “a sensible [null] hypothesis is that most of the currently annotated long (typically >200 nt) noncoding RNAs are not functional, i.e., most impart no fitness advantage, however slight” (Ulitsky and Bartel, 2013: p. 26).

The important point here is that the correct null hypothesis is that these transcripts don't have a biologically relevant function and the burden of proof is on researchers to demonstrate function before assigning them to a genuine gene. My colleagues at the University of Toronto made the same point in a paper published in 2015.

In the absence of sufficient evidence, a given ncRNA should be provisionally labeled as non-functional. Subsequently, if the ncRNA displays features/activities beyond what one would expect for the null hypothesis, then we can reclassify the ncRNA in question as being functional. (Palazzo and Lee, 2015)
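The decision rule in that quote can be written down as a short sketch. This is only an illustration of the logic, not anything taken from Palazzo and Lee: the `Transcript` class, its two evidence flags, and the `min_evidence` threshold are all hypothetical names and choices made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """A candidate ncRNA with hypothetical, illustrative evidence flags."""
    name: str
    under_purifying_selection: bool = False   # assumed evidence type
    reproducible_phenotype: bool = False      # assumed evidence type


def classify(t: Transcript, min_evidence: int = 1) -> str:
    """Start from the null hypothesis and reclassify only on evidence.

    min_evidence is an arbitrary illustrative threshold, not a published one.
    """
    evidence = sum([t.under_purifying_selection, t.reproducible_phenotype])
    if evidence >= min_evidence:
        return "functional"
    # No evidence beyond the null expectation: keep the provisional label.
    return "provisionally non-functional"


# Hypothetical transcripts, not real loci.
print(classify(Transcript("transcript_A")))
print(classify(Transcript("transcript_B", reproducible_phenotype=True)))
```

The point of the sketch is the default: a transcript with no recorded evidence comes out as provisionally non-functional, and the label only changes when evidence is added.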

There are a number of well-defined lncRNAs that have been shown to have distinct reproducible functions. The key question is how many of these biologically relevant lncRNA genes exist in the human genome. I struggled with the answer to this question when I was writing my book. I finally decided to make a generous estimate of 5000 non-coding genes and that implies several thousand lncRNA genes (p. 127). I now think that estimate was far too generous and there are probably fewer than 1000 genuine lncRNA genes.

I have not scoured the literature for all the examples of human lncRNAs having good evidence of function but my impression is that there are only a few hundred. This post was inspired by a recent publication by researchers from the Hospital for Sick Children and the University of Toronto (Toronto, Canada) who characterized another functional lncRNA called CISTR-ACT that plays a role in regulating cell size (Kiriakopulos et al., 2025).

I was prompted to revisit this controversy by the accompanying press release that said ...

Unlike genes that encode for proteins, CISTR-ACT is a long non-coding RNA (or lncRNA) and is part of the non-coding genome, the largely unexplored part that makes up 98 per cent of our DNA. This research helps show that the non-coding genome, often dismissed as ‘junk DNA’, plays an important role in how cells function.

We're used to this kind of misinformation2 in press releases but I thought it would be a good idea to read the paper. As I expected, there's nothing in the paper about junk DNA but here's the first sentence of the introduction.

The human genome contains more long non-coding RNAs (lncRNAs) than protein-coding genes (GENCODE v49) which regulate genes and chromatin scaffolding.

The latest version of GENCODE (Release 49) claims that there are 35,899 lncRNA genes. This is the only reference in the Kiriakopulos et al. paper to the number of lncRNA genes. There's no mention of the controversy and none of the papers that discuss the controversy are referenced.

The GENCODE number is close to the latest version of Ensembl, which lists 35,042 lncRNA genes. I couldn't find any good explanation for these numbers or for the definition of "gene" that they are using. What's interesting is how these numbers are climbing every year. For example, a paper from two years ago listed a number of sources and you can see that the RefSeq and GENCODE numbers are much smaller than today's numbers (Amaral et al., 2023).3
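To put those annotation counts in perspective, here is a rough back-of-the-envelope sketch that uses only numbers quoted in this post: the upper end of the protein-coding range (20,000), the two lncRNA counts, and the estimate of fewer than 1000 genuine lncRNA genes. The variable and function names are made up for the illustration.

```python
# All figures are the ones quoted in this post; the names are illustrative.
PROTEIN_CODING_MAX = 20_000          # upper end of the 19,500-20,000 range
ANNOTATED_LNCRNA = {
    "GENCODE Release 49": 35_899,
    "Ensembl": 35_042,
}
POST_UPPER_ESTIMATE = 1_000          # "probably fewer than 1000 genuine lncRNA genes"


def ratio_to_protein_coding(count: int) -> float:
    """Annotated lncRNA genes per protein-coding gene."""
    return count / PROTEIN_CODING_MAX


def share_possibly_genuine(count: int) -> float:
    """Upper bound on the fraction of annotated lncRNA genes that would be
    genuine if the post's estimate is right."""
    return POST_UPPER_ESTIMATE / count


for source, count in ANNOTATED_LNCRNA.items():
    print(
        f"{source}: {ratio_to_protein_coding(count):.2f}x the protein-coding count; "
        f"at most {share_possibly_genuine(count):.1%} would be genuine genes"
    )
```

If the estimate of fewer than 1000 is anywhere near correct, fewer than 3% of the annotated lncRNA "genes" in either database would qualify as genes under the functional definition.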

We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims.

Ponting and Haerty (2022)

It's perfectly acceptable to state your preferred view on lncRNAs when you publish a paper. The authors of the recent paper may want to believe that there are more lncRNA genes than protein-coding genes but I think it's important for them to define what they mean by "gene" when they make such a claim. What's not acceptable, in my opinion, is to ignore a genuine scientific controversy by not mentioning in the introduction that there are other legitimate views.

It's a shame that they didn't do that because their paper is a good example of the hard work that needs to be done in order to demonstrate that a particular lncRNA has a biologically relevant function.

In closing, I want to emphasize the recent review by Ponting and Haerty (2022)4 that points out the importance of the problem and the kinds of experiments that need to be done in order to establish that a given RNA comes from a real gene. This is how a scientific controversy should be addressed. Here's the abstract of that paper ...

Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.


1. See Wikipedia: Gene; What Is a Gene?; Definition of a gene (again); Must a Gene Have a Function?.

2. No knowledgeable scientist ever said that all non-coding DNA was junk. We've known about non-coding genes for more than half-a-century.

3. See How many genes in the human genome (2023)?

4. See Most lncRNAs are junk

Amaral, P., Carbonell-Sala, S., De La Vega, F.M., Faial, T., Frankish, A., Gingeras, T., Guigo, R., Harrow, J.L., Hatzigeorgiou, A.G., Johnson, R. et al. (2023) The status of the human gene catalogue. Nature 622:41-47. [doi: 10.1038/s41586-023-06490-x]

Kiriakopulos et al. (2025) LncRNA CISTR-ACT regulates cell size in human and mouse by guiding FOSL2. Nature Communications: (in press). [doi: 10.1038/s41467-025-67591-x]

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2(1-11). [doi: 10.3389/fgene.2015.00002]

Ponting, C.P. and Haerty, W. (2022) Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annual Review of Genomics and Human Genetics 23. [doi: 10.1146/annurev-genom-112921-123710]

Ulitsky, I. and Bartel, D.P. (2013) lincRNAs: genomics, evolution, and mechanisms. Cell 154:26-46. [doi: 10.1016/j.cell.2013.06.020]

Thursday, December 11, 2025

How many regulatory sites in the human genome?

The current best model of the human genome is that only 10% is functional and 90% is junk. This model was first developed over half a century ago (see Junk DNA). From the very beginning, the model recognized that regulatory sequences would make up a significant proportion of the functional elements but early suggestions that most of the repetitive DNA would turn out to be involved in regulation were rejected.

As more and more data accumulated on regulatory sequences, it became apparent that most regulatory sequences of pol II (RNA polymerase II) genes could be found in relatively short regions of DNA just upstream of the transcription start site. It also became apparent that for each transcription factor there were thousands of transcription factor binding sites even though only a small number were actually involved in genuine gene regulation.1

Tuesday, October 21, 2025

Google AI references a "Biblical Genetics" video in claiming that junk DNA is no longer considered junk

90% of the human genome is junk DNA.

Today I did a routine search for "junk DNA" "2025" to see if misinformation is still dominating the web. It is, but that's not the most surprising thing I discovered. Here's what Google AI told me at the top of the search page.

In 2025, "junk DNA" is no longer considered junk, as new studies show it plays vital roles in gene regulation and development. Research from 2025 indicates that these sequences, many of which come from ancient viruses, can act as "genetic switches" that influence how genes are turned on or off and how cells respond to their environment. This has led to potential breakthroughs in regenerative medicine and cancer treatment by providing new therapeutic targets.

This video explains how what was once considered junk DNA has been found to contain thousands of new genes:

The video is by Robert Carter who has a Ph.D. in molecular biology. His site is called Biblical Genetics. He also posts on creation.com.

Carter sounds like he knows what he's talking about but he's just parroting all the misinformation that permeates the scientific literature. The main message of this video is that scientists were shocked to discover that the human genome only had 20,000 protein coding genes but we now know (no, we don't) that each gene makes many different proteins and that accounts for the "missing" complexity that all the experts had expected.1

We also "know" (no, we don't) that scientists have discovered tens of thousands of new protein coding genes that make small proteins. He references a Science article by Elizabeth Pennisi who has been spreading misinformation about the human genome for more than 25 years.

It's not surprising that Robert Carter wants to discredit the idea of junk DNA. What's surprising is that Google AI is directing readers to a creationist video.


1. The knowledgeable experts predicted that the human genome would have fewer than 30,000 genes and that's exactly what was found when the human genome sequence was published.

Thursday, September 25, 2025

Wednesday talk at the University of Toronto: Larry Moran on "What's in Your Genome"

I'm giving a talk next Wednesday (October 1st) to the members of the Senior College (retired faculty). It's at the University of Toronto Faculty Club at 10am. I'll talk for 50 mins then there's a coffee break followed by 50 mins of questions and discussion.

Guests are welcome but you'll have to pay $10 to cover the cost of coffee and cookies. You can also register to watch my talk on Zoom. You can also stay for lunch at the Faculty Club but you'll have to let me know so I can put you down as a guest.

Here's the link to register: What's in Your Genome?

 

Wednesday Talk: Wednesday, October 1, 2025, 10am-12pm.

In-person at the Faculty Club and on Zoom

Larry Moran, Biochemistry, University of Toronto

Title: “What’s in Your Genome?”

Abstract: Scientists have been studying the human genome for more than 70 years but today there is considerable controversy about what’s in our genome. The publication of the complete sequence of the human genome in 2001 did nothing to resolve the controversy. For many scientists, the data confirmed their predictions that we have about 30,000 genes and most of our genome is useless junk DNA. Other scientists were shocked to learn that we have so few genes so they began the search for other explanations. Today, the majority of molecular biologists and biochemists believe that most of our genome is functional and there may be as many as 100,000 extra genes that weren’t identified in 2001. The majority of experts in molecular evolution disagree —they believe that 90% of our genome is junk DNA. I will summarize the data from both sides of the controversy and discuss the role that science journalism has played in misrepresenting scientific discoveries about the human genome.


Monday, July 21, 2025

Endogenous retrovirus sequences can be transcriptionally active: the reality vs. the hype

A recent paper on characterizing endogenous retrovirus sequences has attracted some attention because of a press release from Kyoto University that focused on refuting junk DNA. But it turns out that there's no mention of junk DNA in the published paper.

Let's start with a little background. Retroviruses are RNA viruses that go through a stage where their RNA genomes are copied into DNA by reverse transcriptase. The virus may integrate into the host genome and be carried along for many generations producing low levels of virus particles [Retrotransposons/Endogenous Retroviruses]. The integrated copies are called endogenous retroviruses (ERVs).

Our genome contains about 31 different families of ERVs that have integrated over millions of years. Most of the original virus genomes have acquired mutations, including insertions and deletions, and they are no longer active. These sequences account for about 8% of our genome.

Thursday, July 17, 2025

Predatory journals are helping to spread misinformation in the scientific literature

At the end of last year (2024) I posted an article about distinguished molecular biologist William Haseltine who published an article in Forbes about A New Dogma Of Molecular Biology: A Paradigm Shift. The article was about overthrowing the Central Dogma of Molecular Biology because of the discovery of thousands of non-coding genes. There is no paradigm shift. It's a paradigm shaft. [William Haseltine misrepresents molecular biology and calls for a paradigm shift]

Tuesday, May 06, 2025

L'ADN poubelle: Junk DNA

This is a podcast in French on the topic of junk DNA. The moderator is Thomas C. Durand of La Tronche en Biais, a YouTube channel that focuses on critical thinking. Durand interviews two scientists from l’Université Paris Cité (City University of Paris), Didier Casane and Patrick Laurenti.

It's a two hour video that discusses all the relevant topics on the human genome and junk DNA. The most exciting part for me comes at 56 mins when the moderator asks Casane and Laurenti to recommend a book on the subject (see screenshot on right). Patrick Laurenti suggests that my book should be translated into French but I don't think that's going to happen.


Saturday, April 12, 2025

Templeton Foundation funds a grant on transposons

The John Templeton Foundation supports "interdisciplinary research and catalyze conversations that enable people to pursue lives of meaning and purpose." Many of these projects have religious themes or religious implications. The foundation is well-known for its support of projects that promote the compatibility of science and religion. You can see a list of recent grants here.

Templeton recently awarded a grant of $607,686 (US) to study the role of transposons in the human genome. The project leader is Stefan Linquist, a philosopher from the University of Guelph (Guelph, Ontario, Canada). Stefan has published a number of papers on junk DNA and he promotes the definition of functional DNA as DNA that is subject to purifying selection [The function wars are over]. Other members of the team include Ryan Gregory and Ford Doolittle who are prominent supporters of junk DNA.

Saturday, March 29, 2025

Tom Cech rejects junk DNA

A few months ago (June, 2024) I commented on an article by Tom Cech in The New York Times. [Tom Cech writes about the "dark matter" of the genome] In that article he expressed the view that 75% of the human genome consists of "dark matter" that is copied into RNAs of unknown function. He believes that many of these mysterious RNAs will turn out to have exciting functions.

I suspected that Cech is opposed to junk DNA and that suspicion is confirmed in his new book The Catalyst.

Monday, March 24, 2025

Google's "Generative AI" lies about junk DNA

Every now and then I check Google to see if there's any news about junk DNA. I use "junk DNA" as my search query.

The first thing I see at the top of the results page is a summary of the topic created by Google's Generative AI, which it claims is experimental. The AI summary is different every time you start a new search but all of the responses are similar in that they criticize the idea of junk DNA. Here's an example from today,

Saturday, February 15, 2025

Junk DNA is gradually making its way into mainstream textbooks

The idea that most of the human genome is junk originated more than 50 years ago. Since then, evidence in support of this concept has steadily accumulated but it has been strongly resisted by most biochemists and molecular biologists. Opposition is even stronger among scientists in other fields and in the general public thanks to a steady stream of anti-junk articles in the popular press.

Much of this opposition to junk DNA stems from a massive publicity campaign launched by ENCODE researchers and the leading science journals back in 2012.

It's likely that most of the controversy over junk DNA is related to differing views on evolution and the power of natural selection. Most people think that natural selection is very powerful so that modern species must be extremely well-adapted to their present environment. They tend to believe that complexity is simply a reflection of sophisticated fine-tuning and this must apply to the human genome. According to this view, the presence of huge amounts of DNA with an unknown function is just a temporary situation and in the next few years most of this 'dark matter' will turn out to have a function. It has to have a function otherwise natural selection would have eliminated it.

Thursday, January 16, 2025

Intelligent Design Creationists launch a new attack on junk DNA (are they getting worried?)

The Center for Science and Culture (sic) and the Discovery Institute (sic) have published another propaganda video on junk DNA. The emphasis is on their claim that ID predicted a functional genome and that prediction turned out to be correct! The difference between this video and previous attempts to rationalize their failures is that I now get a personal mention and a caricature in this latest video.

I think I understand the problem. The ID creationists are getting worried about junk DNA as they realize that more and more scientists are beginning to understand the real problems with the ENCODE data and previous claims of function. This is why they are attempting to rebut the science behind junk DNA. But the real problem is that they simply don't understand the science as you can see in the video.

Once again, we are faced with a question about whether Intelligent Design Creationists are stupid or lying (or both).


Wednesday, December 18, 2024

BREAKING NEWS: Intelligent Design Creationists claim that this year's Nobel Prize refutes junk DNA and confirms IDC predictions!

This is a Come Let Us Reason Together (sic) podcast moderated by Lenny Esposito with Fazale "Fuz" Rana of Reasons to Believe and Casey Luskin of the Discovery Institute. They discuss this year's Nobel Prize for the discovery of microRNAs and "how it supports intelligent design and weakens the evolutionary paradigm." Casey Luskin devoted a post to the topic on the Discovery Institute propaganda blog: 2024 Nobel Prize Awarded for the Discovery of Function for a Type of “Junk DNA”.

Enjoy! (Spot the lies.1)


1. In Luskin's case, we know he is lying. [Is Casey Luskin lying about junk DNA or is he just stupid?]

Sunday, November 10, 2024

Do plants have junk DNA?

Current Opinion in Plant Biology has a special edition devoted to Genome studies and molecular genetics 2024. The only paper (so far) that discusses plant genomes is one devoted to RNAs. Here's the abstract ...

Anyatama, A., Datta, T., Dwivedi, S. and Trivedi, P.K. (2024) Transcriptional junk: Waste or a key regulator in diverse biological processes? Current Opinion in Plant Biology 82:102639. [doi: 10.1016/j.pbi.2024.102639]

Plant genomes, through their evolutionary journey, have developed a complex composition that includes not only protein-coding sequences but also a significant amount of non-coding DNA, repetitive sequences, and transposable elements, traditionally labeled as “junk DNA”. RNA molecules from these regions, labeled as “transcriptional junk,” include non-coding RNAs, alternatively spliced transcripts, untranslated regions (UTRs), and short open reading frames (sORFs). However, recent research shows that this genetic material plays crucial roles in gene regulation, affecting plant growth, development, hormonal balance, and responses to stresses. Additionally, some of these regulatory regions encode small proteins, such as miRNA-encoded peptides (miPEPs) and microProteins (miPs), which interact with DNA or nuclear proteins, leading to chromatin remodeling and modulation of gene expression. This review aims to consolidate our understanding of the diverse roles that these so-called “transcriptional junk” regions play in regulating various physiological processes in plants.

Saturday, October 26, 2024

Three lungfish species have huge genomes

Lungfish are our closest living fish cousins. All living terrestrial vertebrates (e.g. amphibians, mammals, reptiles) descend from a common ancestor with lungfish. The split occurred about 400 million years ago (400 Ma) (Devonian) when there were 70-100 different lungfish species.

This relationship (lungfish-tetrapods) was firmly established recently by comparing the genome of the Australian lungfish (Neoceratodus forsteri) with that of tetrapods (Meyer et al., 2021). The other possibility had been coelacanth-tetrapods. Coelacanths and lungfish are related—they form the class Sarcopterygii (lobe-finned fish).

Friday, September 27, 2024

John Mattick's seminar at the University of Toronto

I just learned that John Mattick gave a seminar this morning at the Department of Cell & Systems Biology at the University of Toronto. Unfortunately, I was unable to attend.

Most Sandwalk readers will recognize Mattick as one of the few remaining vocal opponents of junk DNA. He is probably best known for his dog-ass plot but this is only one of the ways he misrepresents science.

Sunday, September 01, 2024

Scite Assistant (AI) answers the question "How much of the human genome consist of junk DNA?"

Scite Assistant is billed as "your AI research partner" and as "ChatGPT for researchers." It's supposed to draw on peer-reviewed published scientific papers for its information and it will give you an answer with genuine citations.

That sounds like a good idea until you realize that the scientific literature is full of misinformation and conflicting information. What we need is an AI assistant that can help us sort through the misinformation and give us a genuine well-informed answer on controversial issues.

Let's pick the question of junk DNA as a completely random (!) example of such an issue. The scientific literature is full of false information about the origin of the term "junk DNA" and what it was originally intended to describe. It's also full of false information about recent results and how they pertain to junk DNA.