Showing posts sorted by date for query "junk dna".

Sunday, January 04, 2026

Will AlphaGenome from Google DeepMind help us understand the human genome?

I recently reported that Google's AI program does a horrible job of summarizing the junk DNA controversy. [The scary future of AI is revealed by how it deals with junk DNA] That led to a discussion about the "intelligence" in artificial intelligence and whether AI was capable of distinguishing between accurate and inaccurate data.

Google DeepMind is an artificial intelligence research laboratory headquartered in London, UK. Two of its researchers, Demis Hassabis and John Jumper, were awarded the 2024 Nobel Prize in Chemistry for developing AlphaFold, a program that predicts the tertiary structure of proteins.

Wednesday, December 31, 2025

The activity of "random" DNA supports the junk DNA model

I complain a lot about the quality of science writing but today's post is very different. I want to highlight an article by Michael Le Page that he just published in New Scientist. It's one of the best articles on junk DNA that I've ever seen in popular science magazines and newspapers [Human-plant hybrid cells reveal truth about dark DNA in our genome].

I've admired Michael Le Page for many years because of his articles on climate change and evolution. It doesn't surprise me that he's right about junk DNA.

Sunday, December 28, 2025

The scary future of AI is revealed by how it deals with junk DNA

Today I did a Google search for the term "JUNK DNA" and, as usual, the first thing I saw was the Google AI description of junk DNA. It's wrong, but that's not the scary part. The most frightening thing about the AI description is that it promotes three videos that misrepresent science, and two of them are from well-known kooks.

What does this tell you about current versions of AI? It tells you that it is not intelligent in any meaningful sense of the word. It tells you that Google AI is incapable of distinguishing between scientific facts and ignorance. It tilts toward the loudest voices on the internet and, as we all know, those voices are frequently wrong.

Friday, December 19, 2025

How many lncRNA genes in the human genome? (2025)

There is considerable controversy over the total number of genes in the human genome. The number of protein-coding genes is pretty well established at somewhere between 19,500 and 20,000. It's the number of non-coding genes that's disputed.

There's general agreement on the number of well-defined small RNA genes such as snRNAs, snoRNAs, microRNAs, etc. Similarly, the number of ribosomal RNA and tRNA genes is known. The problem is with identifying genuine long non-coding RNA genes (lncRNA genes). Estimates vary from fewer than 20,000 to more than 200,000 but most of these estimates fail to define what they mean by "gene." Many scientists seem to think that any detectable transcript must come from a gene.

This doesn't make any sense since we know that spurious transcripts exist and they don't come from genes by any meaningful definition of gene. The only reasonable definition of a molecular gene is a DNA sequence that's transcribed to produce a functional product.1

The idea that spurious, non-functional transcripts exist has been described in the scientific literature for many decades. One of my favorite descriptions is in a paper by Ponting and Haerty (2022) that quotes a 2013 paper by Ulitsky and Bartel.

The cellular transcriptional machinery does not perfectly discriminate cryptic promoters from functional gene promoters. This machinery is abundant and so can engage sites momentarily depleted of nucleosomes and rapidly initiate transcription. The chance occurrence of splice sites can then facilitate the capping, splicing, and polyadenylation of long transcripts. A very large number of such rare RNA species are detectable in RNA-sequencing experiments whose properties are virtually indistinguishable from those of bona fide lncRNAs. Consequently, “a sensible [null] hypothesis is that most of the currently annotated long (typically >200 nt) noncoding RNAs are not functional, i.e., most impart no fitness advantage, however slight” (Ulitsky and Bartel, 2013: p. 26).

The important point here is that the correct null hypothesis is that these transcripts don't have a biologically relevant function and the burden of proof is on researchers to demonstrate function before assigning them to a genuine gene. My colleagues at the University of Toronto made the same point in a paper published in 2015.

In the absence of sufficient evidence, a given ncRNA should be provisionally labeled as non-functional. Subsequently, if the ncRNA displays features/activities beyond what one would expect for the null hypothesis, then we can reclassify the ncRNA in question as being functional. (Palazzo and Lee, 2015)

There are a number of well-defined lncRNAs that have been shown to have distinct reproducible functions. The key question is how many of these biologically relevant lncRNA genes exist in the human genome. I struggled with the answer to this question when I was writing my book. I finally decided to make a generous estimate of 5000 non-coding genes and that implies several thousand lncRNA genes (p. 127). I now think that estimate was far too generous and there are probably fewer than 1000 genuine lncRNA genes.

I have not scoured the literature for all the examples of human lncRNAs with good evidence of function but my impression is that there are only a few hundred. This post was inspired by a recent publication by researchers from the Hospital for Sick Children and the University of Toronto (Toronto, Canada) who characterized another functional lncRNA called CISTR-ACT that plays a role in regulating cell size (Kiriakopulos et al., 2025).

I was prompted to revisit this controversy by the accompanying press release that said ...

Unlike genes that encode for proteins, CISTR-ACT is a long non-coding RNA (or lncRNA) and is part of the non-coding genome, the largely unexplored part that makes up 98 per cent of our DNA. This research helps show that the non-coding genome, often dismissed as ‘junk DNA’, plays an important role in how cells function.

We're used to this kind of misinformation2 in press releases but I thought it would be a good idea to read the paper. As I expected, there's nothing in the paper about junk DNA but here's the first sentence of the introduction.

The human genome contains more long non-coding RNAs (lncRNAs) than protein-coding genes (GENCODE v49) which regulate genes and chromatin scaffolding.

The latest version of GENCODE (Release 49) lists 35,899 lncRNA genes. This is the only reference in the Kiriakopulos et al. paper to the number of lncRNA genes. There's no mention of the controversy and none of the papers that discuss the controversy are referenced.

The GENCODE number is close to the latest version of Ensembl, which lists 35,042 lncRNA genes. I couldn't find any good explanation for these numbers or for the definition of "gene" that they are using but what's interesting is how these numbers are climbing every year; for example, a paper from two years ago listed a number of sources and you can see that the RefSeq and GENCODE numbers are much smaller than today's numbers (Amaral et al., 2023).3

We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims.

Ponting and Haerty (2022)

It's perfectly acceptable to state your preferred view on lncRNAs when you publish a paper. The authors of the recent paper may want to believe that there are more lncRNA genes than protein-coding genes but I think it's important for them to define what they mean by "gene" when they make such a claim. What's not acceptable, in my opinion, is to ignore a genuine scientific controversy by not mentioning in the introduction that there are other legitimate views.

It's a shame that they didn't do that because their paper is a good example of the hard work that needs to be done in order to demonstrate that a particular lncRNA has a biologically relevant function.

In closing, I want to emphasize the recent review by Ponting and Haerty (2022)4 that points out the importance of the problem and the kinds of experiments that need to be done in order to establish that a given RNA comes from a real gene. This is how a scientific controversy should be addressed. Here's the abstract of that paper ...

Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.


1. See Wikipedia: Gene; What Is a Gene?; Definition of a gene (again); Must a Gene Have a Function?.

2. No knowledgeable scientist ever said that all non-coding DNA was junk. We've known about non-coding genes for more than half-a-century.

3. See How many genes in the human genome (2023)?

4. See Most lncRNAs are junk

Amaral, P., Carbonell-Sala, S., De La Vega, F.M., Faial, T., Frankish, A., Gingeras, T., Guigo, R., Harrow, J.L., Hatzigeorgiou, A.G., Johnson, R. et al. (2023) The status of the human gene catalogue. Nature 622:41-47. [doi: 10.1038/s41586-023-06490-x]

Kiriakopulos et al. (2025) LncRNA CISTR-ACT regulates cell size in human and mouse by guiding FOSL2. Nature Communications (in press). [doi: 10.1038/s41467-025-67591-x]

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2(1-11). [doi: 10.3389/fgene.2015.00002]

Ponting, C.P. and Haerty, W. (2022) Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annual Review of Genomics and Human Genetics 23. [doi: 10.1146/annurev-genom-112921-123710]

Ulitsky, I. and Bartel, D.P. (2013) lincRNAs: genomics, evolution, and mechanisms. Cell 154:26-46. [doi: 10.1016/j.cell.2013.06.020]

Thursday, December 11, 2025

How many regulatory sites in the human genome?

The current best model of the human genome is that only 10% is functional and 90% is junk. This model was first developed over half a century ago (see Junk DNA). From the very beginning, the model recognized that regulatory sequences would make up a significant proportion of the functional elements but early suggestions that most of the repetitive DNA would turn out to be involved in regulation were rejected.

As more and more data accumulated on regulatory sequences, it became apparent that most regulatory sequences of pol II (RNA polymerase II) genes could be found in relatively short regions of DNA just upstream of the transcription start site. It also became apparent that for each transcription factor there were thousands of transcription factor binding sites even though only a small number were actually involved in genuine gene regulation.1

Tuesday, October 21, 2025

Google AI references a "Biblical Genetics" video in claiming that junk DNA is no longer considered junk

90% of the human genome is junk DNA.

Today I did a routine search for "junk DNA" "2025" to see if misinformation is still dominating the web. It is, but that's not the most surprising thing I discovered. Here's what Google AI told me at the top of the search page.

In 2025, "junk DNA" is no longer considered junk, as new studies show it plays vital roles in gene regulation and development. Research from 2025 indicates that these sequences, many of which come from ancient viruses, can act as "genetic switches" that influence how genes are turned on or off and how cells respond to their environment. This has led to potential breakthroughs in regenerative medicine and cancer treatment by providing new therapeutic targets.

This video explains how what was once considered junk DNA has been found to contain thousands of new genes:

The video is by Robert Carter who has a Ph.D. in molecular biology. His site is called Biblical Genetics. He also posts on creation.com.

Carter sounds like he knows what he's talking about but he's just parroting all the misinformation that permeates the scientific literature. The main message of this video is that scientists were shocked to discover that the human genome only had 20,000 protein-coding genes but we now know (no, we don't) that each gene makes many different proteins and that accounts for the "missing" complexity that all the experts had expected.1

We also "know" (no, we don't) that scientists have discovered tens of thousands of new protein-coding genes that make small proteins. He references a Science article by Elizabeth Pennisi who has been spreading misinformation about the human genome for more than 25 years.

It's not surprising that Robert Carter wants to discredit the idea of junk DNA. What's surprising is that Google AI is directing readers to a creationist video.


1. The knowledgeable experts predicted that the human genome would have fewer than 30,000 genes and that's exactly what was found when the human genome sequence was published.

Thursday, September 25, 2025

Wednesday talk at the University of Toronto: Larry Moran on "What's in Your Genome"

I'm giving a talk next Wednesday (October 1st) to the members of the Senior College (retired faculty). It's at the University of Toronto Faculty Club at 10am. I'll talk for 50 mins then there's a coffee break followed by 50 mins of questions and discussion.

Guests are welcome but you'll have to pay $10 to cover the cost of coffee and cookies. You can also register to watch my talk on Zoom. You can also stay for lunch at the Faculty Club but you'll have to let me know so I can put you down as a guest.

Here's the link to register: What's in Your Genome?

 

Wednesday Talk: Wednesday, October 1, 2025, 10am-12pm.

In-person at the Faculty Club and on Zoom

Larry Moran, Biochemistry, University of Toronto

Title: “What’s in Your Genome?”

Abstract: Scientists have been studying the human genome for more than 70 years but today there is considerable controversy about what’s in our genome. The publication of the complete sequence of the human genome in 2001 did nothing to resolve the controversy. For many scientists, the data confirmed their predictions that we have about 30,000 genes and most of our genome is useless junk DNA. Other scientists were shocked to learn that we have so few genes so they began the search for other explanations. Today, the majority of molecular biologists and biochemists believe that most of our genome is functional and there may be as many as 100,000 extra genes that weren’t identified in 2001. The majority of experts in molecular evolution disagree; they believe that 90% of our genome is junk DNA. I will summarize the data from both sides of the controversy and discuss the role that science journalism has played in misrepresenting scientific discoveries about the human genome.


Monday, July 21, 2025

Endogenous retrovirus sequences can be transcriptionally active: the reality vs. the hype

A recent paper on characterizing endogenous retrovirus sequences has attracted some attention because of a press release from Kyoto University that focused on refuting junk DNA. But it turns out that there's no mention of junk DNA in the published paper.

Let's start with a little background. Retroviruses are RNA viruses that go through a stage where their RNA genomes are copied into DNA by reverse transcriptase. The virus may integrate into the host genome and be carried along for many generations producing low levels of virus particles [Retrotransposons/Endogenous Retroviruses]. The integrated copies are called endogenous retroviruses (ERVs).

Our genome contains about 31 different families of ERVs that have integrated over millions of years. Most of the original virus genomes have acquired mutations, including insertions and deletions, and they are no longer active. These sequences account for about 8% of our genome.

Thursday, July 17, 2025

Predatory journals are helping to spread misinformation in the scientific literature

At the end of last year (2024) I posted an article about distinguished molecular biologist William Haseltine who published an article in Forbes about A New Dogma Of Molecular Biology: A Paradigm Shift. The article was about overthrowing the Central Dogma of Molecular Biology because of the discovery of thousands of non-coding genes. There is no paradigm shift. It's a paradigm shaft. [William Haseltine misrepresents molecular biology and calls for a paradigm shift]

Thursday, May 22, 2025

Is AI really "intelligent"? Here are 13 biology questions to test the latest AI algorithms.

Last night I attended a talk by Chris DiCarlo who warned us about the dangers of AI. I'm sure he's right to be worried but I'm skeptical about some of the hype surrounding AI. For example, Chris said that just a few years ago the best AI algorithms were performing at high school level but now they are at Ph.D. level. The implication is that it won't be long before AI is smarter than humans.

Here's the problem. I can only access the cheap versions of AI such as ChatGPT and Scite Assistant but I can also see the results of Google's Generative AI whenever I do a Google search. Chris has access to more sophisticated versions so that's what he might be referring to when he says they operate at the Ph.D. level of intelligence.

Monday, May 19, 2025

A new higher mutation rate in humans includes indels in repetitive DNA regions

Theme: Mutation (definition, mutation types, mutation rates, phylogeny, controversies)

There are three ways of estimating the human mutation rate. The Biochemical Method is based on the known error rate of DNA replication and the average number of cell divisions between generations. It gives a rate of about 130 mutations per generation.

The Phylogenetic Method assumes that a large fraction of mammalian genomes is evolving at the neutral rate because it is junk DNA. Since we know that the rate of fixation of neutral alleles is equal to the mutation rate, we can estimate the mutation rate if we know the total number of nucleotide differences between two species (e.g. humans and chimpanzees) and the approximate time of divergence from a common ancestor. This gives an estimate of about 112 mutations per generation.
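
The arithmetic behind these two estimates can be sketched in a few lines of Python. This is only a rough illustration: the input values below (genome size, replication error rate, number of germ-line cell divisions, percent sequence difference, divergence time, and generation time) are assumptions chosen as round numbers, not figures stated in this post, and the function names are hypothetical.

```python
# Rough sketch of two ways to estimate the human mutation rate.
# All numeric inputs are illustrative assumptions, not values from this post.

HAPLOID_GENOME_BP = 3.2e9  # assumed haploid genome size in base pairs


def biochemical_estimate(error_rate_per_bp, divisions_female, divisions_male):
    """Mutations per generation from DNA replication errors.

    Each gamete carries one haploid genome that was copied once per
    germ-line cell division, so the two parental lineages are summed.
    """
    errors_per_replication = error_rate_per_bp * HAPLOID_GENOME_BP
    return errors_per_replication * (divisions_female + divisions_male)


def phylogenetic_estimate(fraction_different, divergence_years, generation_years):
    """Mutations per generation from sequence divergence between two species.

    Assumes most of the genome evolves neutrally, so the rate of fixation
    equals the mutation rate, and that differences are split evenly
    between the two lineages since the common ancestor.
    """
    total_differences = fraction_different * HAPLOID_GENOME_BP
    differences_per_lineage = total_differences / 2
    generations = divergence_years / generation_years
    return differences_per_lineage / generations


if __name__ == "__main__":
    # Assumed inputs: 1e-10 errors per bp per replication,
    # 30 divisions in the female germ line, 400 in the male germ line.
    bio = biochemical_estimate(1e-10, 30, 400)
    # Assumed inputs: 1.4% human-chimp difference, 5 million years
    # of divergence, 25-year generations.
    phylo = phylogenetic_estimate(0.014, 5e6, 25)
    print(f"biochemical estimate:  ~{bio:.0f} mutations per generation")
    print(f"phylogenetic estimate: ~{phylo:.0f} mutations per generation")
```

With these assumed inputs the biochemical sketch gives about 138 mutations per generation, in the same range as the figure of about 130 quoted above, and the phylogenetic sketch gives 112. Both results scale linearly with the inputs, so modest changes in the assumed number of cell divisions or the divergence time shift the estimates by a similar proportion.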

Tuesday, May 06, 2025

L'ADN poubelle: Junk DNA

This is a podcast in French on the topic of junk DNA. The moderator is Thomas C. Durand of La Tronche en Biais, a YouTube channel that focuses on critical thinking. Durand interviews two scientists from l’Université Paris Cité (City University of Paris), Didier Casane and Patrick Laurenti.

It's a two-hour video that discusses all the relevant topics on the human genome and junk DNA. The most exciting part for me comes at 56 mins when the moderator asks Casane and Laurenti to recommend a book on the subject (see screenshot on right). Patrick Laurenti suggests that my book should be translated into French but I don't think that's going to happen.


Saturday, May 03, 2025

Saturday, April 12, 2025

Templeton Foundation funds a grant on transposons

The John Templeton Foundation supports "interdisciplinary research" and aims to "catalyze conversations that enable people to pursue lives of meaning and purpose." Many of these projects have religious themes or religious implications. The foundation is well-known for its support of projects that promote the compatibility of science and religion. You can see a list of recent grants here.

Templeton recently awarded a grant of $607,686 (US) to study the role of transposons in the human genome. The project leader is Stefan Linquist, a philosopher from the University of Guelph (Guelph, Ontario, Canada). Stefan has published a number of papers on junk DNA and he promotes the definition of functional DNA as DNA that is subject to purifying selection [The function wars are over]. Other members of the team include Ryan Gregory and Ford Doolittle who are prominent supporters of junk DNA.

Saturday, March 29, 2025

Tom Cech rejects junk DNA

A few months ago (June, 2024) I commented on an article by Tom Cech in The New York Times. [Tom Cech writes about the "dark matter" of the genome] In that article he expressed the view that 75% of the human genome consists of "dark matter" that is copied into RNAs of unknown function. He believes that many of these mysterious RNAs will turn out to have exciting functions.

I suspected that Cech is opposed to junk DNA and that suspicion is confirmed in his new book The Catalyst.

Monday, March 24, 2025

Google's "Generative AI" lies about junk DNA

Every now and then I check Google to see if there's any news about junk DNA. I use "junk DNA" as my search query.

The first thing I see at the top of the results page is a summary of the topic created by Google's Generative AI, which it claims is experimental. The AI summary is different every time you start a new search but all of the responses are similar in that they criticize the idea of junk DNA. Here's an example from today,

Friday, March 21, 2025

The misinformation spread by ENCODE in 2012 is gradually being recognized

I want to draw your attention to an excellent online book on bacterial genomes: Bacterial Genomes: Trees and Networks. The author is Aswin Sai Narain Seshasayee of the National Centre for Biological Sciences at the Tata Institute of Fundamental Research in Bangalore, India. Here's a link to Chapter 3: The genome: how much DNA? where he explains why bacterial genomes don't have very much junk DNA.

The chapter contains an excellent summary of the history of genome sizes in bacteria and eukaryotes and a detailed description of both the c-value paradox and the mutation load arguments. The relationship between junk DNA and population size is described.

I was especially pleased to see that the author didn't pull any punches in describing the ENCODE publicity campaign and their false statements about junk DNA.

In 2012, a post-human-genome project called ENCODE, which aims to experimentally identify regions of the human genome that undergo transcription—or are bound by a set of DNA-binding proteins, or undergo chemical changes called epigenetic modifications—came to a stunning conclusion that at least 80% of the human genome is functional and that it was time to sing a requiem for the concept of junk DNA! However, this conclusion, which has been severely criticised since its publication, ignores decades of well-supported arguments from evolutionary biology arising from the c-value paradox, some of which we have described here or will do so shortly; it does not quite explain why this conclusion—if broadly applied to the genomes of other multicellular eukaryotes—would not imply that a fish needs 100 times as much functional DNA as a human; and plays “fast and loose” with the definition of the term ‘function’. While the ENCODE project, a great success in many ways, has provided an invaluable resource for the study of human molecular biology, we can safely ignore its ill-fated conclusion on what fraction of the human genome is functional.


Saturday, February 15, 2025

Junk DNA is gradually making its way into mainstream textbooks

The idea that most of the human genome is junk originated more than 50 years ago. Since then, evidence in support of this concept has steadily accumulated but it has been strongly resisted by most biochemists and molecular biologists. Opposition is even stronger among scientists in other fields and in the general public thanks to a steady stream of anti-junk articles in the popular press.

Much of this opposition to junk DNA stems from a massive publicity campaign launched by ENCODE researchers and the leading science journals back in 2012.

It's likely that most of the controversy over junk DNA is related to differing views on evolution and the power of natural selection. Most people think that natural selection is very powerful so that modern species must be extremely well-adapted to their present environment. They tend to believe that complexity is simply a reflection of sophisticated fine-tuning and this must apply to the human genome. According to this view, the presence of huge amounts of DNA with an unknown function is just a temporary situation and in the next few years most of this 'dark matter' will turn out to have a function. It has to have a function otherwise natural selection would have eliminated it.

Wednesday, February 05, 2025

Why Trust Science?

Bruce Alberts,1 Karen Hopkin, and Keith Roberts have published an essay on Why Trust Science.

In this essay, we address the question of why we can trust science—and how we can identify which scientific claims we can trust. We begin by explaining how scientists work together, as part of a larger scientific community, to generate knowledge that is reliable. We describe how the scientific process builds a consensus, and how new evidence can change the ways that scientists—and, ultimately, the rest of us—see the world. Last, but not least, we explain how, as informed citizens, we can all become “competent outsiders” who are equipped to evaluate scientific claims and are able to separate science facts from science fiction.

Most of the essay describes an idealized version of how science works with an emphasis on collaboration and rigorous oversight. They claim that the work of scientists can usually be trusted because it is self-correcting.

Thursday, January 16, 2025

Intelligent Design Creationists launch a new attack on junk DNA (are they getting worried?)

The Center for Science and Culture (sic) and the Discovery Institute (sic) have published another propaganda video on junk DNA. The emphasis is on their claim that ID predicted a functional genome and that prediction turned out to be correct! The difference between this video and previous attempts to rationalize their failures is that I now get a personal mention and a caricature in this latest video.

I think I understand the problem. The ID creationists are getting worried about junk DNA as they realize that more and more scientists are beginning to understand the real problems with the ENCODE data and previous claims of function. This is why they are attempting to rebut the science behind junk DNA. But the real problem is that they simply don't understand the science as you can see in the video.

Once again, we are faced with a question about whether Intelligent Design Creationists are stupid or lying (or both).