Showing posts sorted by date for query "junk dna".

Tuesday, September 05, 2023

John Mattick's new paradigm shaft

John Mattick continues to promote the idea that he is leading a paradigm shift in molecular biology. He believes that he and his colleagues have discovered a vast world of noncoding genes responsible for intricate gene regulation in complex eukaryotes. The latest salvo was fired a few months ago in June 2023.

Mattick, J.S. (2023) A Kuhnian revolution in molecular biology: Most genes in complex organisms express regulatory RNAs. BioEssays:2300080. [doi: 10.1002/bies.202300080]

Thomas Kuhn described the progress of science as comprising occasional paradigm shifts separated by interludes of ‘normal science’. The paradigm that has held sway since the inception of molecular biology is that genes (mainly) encode proteins. In parallel, theoreticians posited that mutation is random, inferred that most of the genome in complex organisms is non-functional, and asserted that somatic information is not communicated to the germline. However, many anomalies appeared, particularly in plants and animals: the strange genetic phenomena of paramutation and transvection; introns; repetitive sequences; a complex epigenome; lack of scaling of (protein-coding) genes and increase in ‘noncoding’ sequences with developmental complexity; genetic loci termed ‘enhancers’ that control spatiotemporal gene expression patterns during development; and a plethora of ‘intergenic’, overlapping, antisense and intronic transcripts. These observations suggest that the original conception of genetic information was deficient and that most genes in complex organisms specify regulatory RNAs, some of which convey intergenerational information.

This paper is promoted by a video in which he explains why there's a Kuhnian revolution under way. This paper differs from most of his others on the same topic because Mattick now seems to have acquired some more knowledge of the mutation load argument and the neutral theory of evolution. Now he's not only attacking the so-called "protein centric" paradigm but also the Modern Synthesis. Apparently, a slew of "anomalies" are casting doubt on several old paradigms.

This is still a paradigm shaft but it's a bit more complicated than his previous versions (see: John Mattick's paradigm shaft). Now his "anomalies" include not only large numbers of noncoding genes but also the C-value paradox, repetitive DNA, introns, enhancers, gene silencing, the g-value enigma, pervasive transcription, transvection, and epigenetics. Also, he now seems to be aware of many of the arguments for junk DNA but not so aware that he can reference any of his critics.1 His challenges to the Modern Synthesis include paramutation which, along with epigenetics, violates the paradigm of the Modern Synthesis because of non-genetic inheritance.

But the heart of his revolution is still the discovery of massive numbers of noncoding genes that only he and a few of his diehard colleagues can see.

The genomic programming of developmentally complex organisms was misunderstood for much of the last century. The mammalian genome harbors only ∼20 000 protein-coding genes, similar in number and with largely orthologous functions as those in other animals, including simple nematodes. On the other hand, the extent of non-protein-coding DNA increases with increasing developmental and cognitive complexity, reaching 98.5% in humans. Moreover, high throughput analyses have shown that the majority of the mammalian genome is differentially and dynamically transcribed during development to produce tens if not hundreds of thousands of short and long non-protein-coding RNAs that show highly specific expression patterns and subcellular locations.

The figure is supposed to show that by 2020 junk DNA had been eliminated and almost all of the mammalian genome is devoted to functional DNA—mostly in the form of noncoding genes. There's only one very tiny problem with this picture—it's not supported by any evidence that all those functional noncoding genes exist. This is still a paradigm shaft of the third kind (false paradigm, false overthrow, false data).


1. There are 124 references; Dawkins and ENCODE make the list along with 14 of his own papers. Most of the papers in my list of Required reading for the junk DNA debate are missing. The absence of Palazzo and Gregory (2014) is particularly noteworthy.

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

John Mattick's new dog-ass plot (with no dog)

John Mattick is famous for arguing that there's a correlation between genome size and complexity; notably in a 2004 Scientific American article (Mattick, 2004) [Genome Size, Complexity, and the C-Value Paradox]. That's the article that has the famous Dog-Ass Plot (left) with humans representing the epitome of complexity and genome size. He claims that this correlation is evidence that most of the genomes of complex animals must have a function. He repeats this claim in a recent paper (see below).

Mattick, J.S. (2023) RNA out of the mist. TRENDS in Genetics 39:187-207. [doi: 10.1016/j.tig.2022.11.001]

RNA has long been regarded primarily as the intermediate between genes and proteins. It was a surprise then to discover that eukaryotic genes are mosaics of mRNA sequences interrupted by large tracts of transcribed but untranslated sequences, and that multicellular organisms also express many long ‘intergenic’ and antisense noncoding RNAs (lncRNAs). The identification of small RNAs that regulate mRNA translation and half-life did not disturb the prevailing view that animals and plant genomes are full of evolutionary debris and that their development is mainly supervised by transcription factors. Gathering evidence to the contrary involved addressing the low conservation, expression, and genetic visibility of lncRNAs, demonstrating their cell-specific roles in cell and developmental biology, and their association with chromatin-modifying complexes and phase-separated domains. The emerging picture is that most lncRNAs are the products of genetic loci termed ‘enhancers’, which marshal generic effector proteins to their sites of action to control cell fate decisions during development.

Monday, September 04, 2023

John Mattick's paradigm shaft

Paradigm shifts are rare but paradigm shafts are common. A paradigm shaft is when a scientist describes a false paradigm that supposedly ruled in the past then shows how their own work overthrows that old (false) paradigm.1 In many cases, the data that presumably revolutionizes the field is somewhat exaggerated.

John Mattick's view of eukaryotic RNAs is a classic example of a paradigm shaft. At various times in the past he has declared that molecular biology used to be dominated by the Central Dogma, which, according to him, supported the concept that the only function of DNA was to produce proteins (Mattick, 2003; Morris and Mattick, 2014). More recently, he has backed off this claim a little bit by conceding that Crick allowed for functional RNAs but that proteins were the only molecules that could be involved in regulation. The essence of Mattick's argument is that past researchers were constrained by adherence to the paradigm that the only important functional molecules were proteins and RNA served only an intermediate role in protein synthesis.

Saturday, July 29, 2023

How could a graduate student at King's College in London not know the difference between junk DNA and non-coding DNA?

There's something called "the EDIT lab blog" written by people at King's College in London (UK). Here's a recent post (May 19, 2023) that was apparently written by a Ph.D. student: J for Junk DNA Does Not Exist!.

It begins with the standard false history,

The discovery of the structure of DNA by James Watson and Francis Crick in 1953 was a milestone in the field of biology, marking a turning point in the history of genetics (Watson & Crick, 1953). Subsequent advances in molecular biology revealed that out of the 3 billion base pairs of human DNA, only around 2% codes for proteins; many scientists argued that the other 98% seemed like pointless bloat of genetic material and genomic dead-ends referred to as non-coding DNA, or junk DNA – a term you’ve probably come across (Ohno, 1972).

You all know what's coming next. The discovery of function in non-coding DNA overthrew the concept of junk DNA and ENCODE played a big role in this revolution. The post ends with,

Nowadays, researchers are less likely to describe any non-coding sequences as junk because there are multiple other and more accurate ways of labelling them. The discussion over non-coding DNA’s function is not over, and it will be long before we understand our whole genome. For many researchers, the field’s best way ahead is keeping an open mind when evaluating the functional consequences of non-coding DNA and RNA, and not to make assumptions about their biological importance.

As Sandwalk readers know, there was never a time when knowledgeable scientists said that all non-coding DNA was junk. They always knew that there was functional DNA outside of coding regions. Real open-minded scientists are able to distinguish between junk DNA and non-coding DNA and they are able to evaluate the evidence for junk DNA without dismissing it based on a misunderstanding of the history of the subject.

The question is why would a Ph.D. student who makes the effort to write a blog post on junk DNA not take the time to read up on the subject and learn the proper definition of junk and the actual evidence? Why would their supervisors and other members of the lab not know that this post is wrong?

It's a puzzlement.


Thursday, July 06, 2023

James Shapiro doesn't like junk DNA

Shapiro doubles down on his claim that junk DNA doesn't exist.

It's been a while since we've heard from James Shapiro. You might recall that James A. Shapiro is a biochemistry/microbiology professor at the University of Chicago and the author of a book promoting natural genetic engineering. I reviewed his book and didn't like it very much, and Shapiro didn't like my review [James Shapiro Never Learns] [James Shapiro Responds to My Review of His Book].

Tuesday, June 27, 2023

Gert Korthof reviews my book

Gert Korthof thinks that the current view of evolution is incomplete and he's looking for a better explanation. He just finished reading my book so he wrote a review on his blog.

Scientists say: 90% of your genome is junk. Have a nice day! Biochemist Laurence Moran defends junk DNA theory

The good news is that I've succeeded in making Gert Korthof think more seriously about junk DNA and random genetic drift. The bad news is that I seem to have given him the impression that natural selection is not an important part of evolution. Furthermore, he insists that "evolution needs both mutation and natural selection" because he doesn't like the idea that random genetic drift may be the most common mechanism of evolution. He thinks that statement only applies at the molecular level. But "evolution" doesn't just refer to adaptation at the level of organisms. It's just not true that all examples of evolution must involve natural selection.

I think I've failed to explain the null hypothesis correctly because Korthof writes,

It's clear this is a polemical book. It is a very forceful criticism of ENCODE and everyone who uncritically accepts and spreads their views including Nature and Science. I agree that this criticism is necessary. However, there is a downside. Moran writes that the ENCODE research goals of documenting all transcripts in the human genome was a waste of money. Only a relatively small group of transcripts have a proven biological function ("only 1000 lncRNAs out of 60,000 were conserved in mammals"; "the number with a proven function is less than 500 in humans"; "The correct null hypothesis is that these long noncoding RNAs are examples of noisy transcription", or junk RNA"). Furthermore, Moran also thinks it is a waste of time and money to identify the functions of the thousands of transcripts that have been found because he knows its all junk. I disagree. The null hypothesis is an hypothesis, not a fact. One cannot assume it is true. That would be the 'null dogma'.

That's a pretty serious misunderstanding of what I meant to say. I think it was a worthwhile effort to document the number of transcripts in various cell types and all the potential regulatory sequences. What I objected to was the assumption by ENCODE researchers that these transcripts and sites were functional simply because they exist. The null hypothesis is no function and scientists must provide evidence of function in order to refute the null hypothesis.

I think it would be a very good idea to stop further genomic surveys and start identifying which transcripts and putative regulatory elements are actually functional. I'd love to know the answer to that very important question. However, I recognize that it will be expensive and time consuming to investigate every transcript and every putative regulatory element. I don't think any lab is going to assign random transcripts and random transcription factor binding sites to graduate students and postdocs because I suspect that most of those sequences aren't going to have a function. If I were giving out grant money I'd give it to some other lab. In that sense, I believe that it would be a waste of time and money to search for the function of tens of thousands of transcripts and over one million transcription factor binding sites.

That's not dogmatic. It's common sense. Most of those transcripts and binding sites are not conserved and not under purifying selection. That's pretty good evidence that they aren't functional, especially if you believe in the importance of natural selection.

There's a lot more to his review including some interesting appendices. I recommend that you read it carefully to see a different perspective than the one I advocate in my book.


Saturday, May 20, 2023

Chapter 10: Turning Genes On and Off

Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive.

The ENCODE researchers and their allies claim that the human genome contains more than 600,000 regulatory sites and that means an average of 24 per gene covering about 10,000 bp per gene. I explain why these numbers are unreasonable and why most of the sites they identify have nothing to do with biologically significant regulation.
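The arithmetic behind those per-gene figures can be checked with a rough sketch. The gene count of 25,000 is not stated above; it is an assumption implied by dividing 600,000 sites by 24 sites per gene, and the function and variable names are illustrative only.

```python
# Back-of-envelope check of the ENCODE regulatory-site numbers quoted above.
# ASSUMPTION: ~25,000 genes, implied by 600,000 sites / 24 sites per gene.


def sites_per_gene(total_sites: int, gene_count: int) -> float:
    """Average number of regulatory sites per gene."""
    return total_sites / gene_count


def bp_per_site(regulatory_bp_per_gene: int, sites: float) -> float:
    """Average span of one site, given the bp of regulatory DNA per gene."""
    return regulatory_bp_per_gene / sites


TOTAL_SITES = 600_000            # from the text
ASSUMED_GENES = 25_000           # assumption, not stated in the text
REGULATORY_BP_PER_GENE = 10_000  # from the text

per_gene = sites_per_gene(TOTAL_SITES, ASSUMED_GENES)
per_site = bp_per_site(REGULATORY_BP_PER_GENE, per_gene)
total_regulatory_bp = ASSUMED_GENES * REGULATORY_BP_PER_GENE

print(f"{per_gene:.0f} sites per gene")
print(f"about {per_site:.0f} bp per site")
print(f"{total_regulatory_bp / 1e6:.0f} Mb of putative regulatory DNA in total")
```

Under that assumed gene count, the claim works out to roughly 250 Mb of DNA devoted to regulation, which gives a sense of the scale being argued about.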

This chapter also covers the epigenetics hype and restriction/modification.

Click on this link to see more.
Chapter 10: Turning Genes On and Off


Wednesday, May 17, 2023

Chapter 9: The ENCODE Publicity Campaign

In September 2012, the ENCODE researchers published a bunch of papers claiming to show that 80% of the human genome was functional. They helped orchestrate a massive publicity campaign with the help of Nature, a campaign that succeeded in spreading the message that junk DNA had been refuted.

That claim was challenged within 24 hours by numerous scientists on social media. They pointed out that the ENCODE researchers were using a ridiculous definition of function and that they had completely ignored all the evidence for junk DNA. Over the next two years there were numerous scientific papers criticizing the ENCODE claims and the ENCODE researchers were forced to retract the claim that they had proven that 80% of the genome is functional.

I discuss what went wrong and lay the blame mostly on the ENCODE researchers who did not behave as proper scientists when presenting a controversial hypothesis. The editors of Nature share the blame for not doing a proper job of vetting the ENCODE claims and not subjecting the papers to rigorous peer review. Science writers also failed to think critically about the results they were reporting.

Click on this link to see more.
Chapter 9: The ENCODE Publicity Campaign


Monday, May 15, 2023

Chapter 8: Noncoding Genes and Junk RNA

I think there are no more than 5,000 noncoding genes but many scientists claim that there are tens of thousands of newly discovered noncoding genes. I describe the known noncoding genes (fewer than 1,000) and explain why many of the transcripts detected are just junk RNA produced by spurious transcription. The presence of abundant noncoding genes will not solve the Deflated Ego Problem.

This chapter covers the misconceptions about the Central Dogma and how they are incorrectly used to try and discredit junk DNA. The views of John Mattick are explained and refuted. I end the chapter with a plea to adopt a worldview that can accommodate messy biochemistry and a sloppy genome that's full of junk DNA.

Click on this link to see more.

Chapter 8: Noncoding Genes and Junk RNA

Thursday, May 11, 2023

Chapter 7: Gene Families and the Birth & Death of Genes

This chapter describes gene families in the human genome. I explain how new genes are born by gene duplication and how they die by deletion or by becoming pseudogenes. Our genome is littered with pseudogenes: how do they evolve and are they all junk? What are the consequences of whole genome duplications and what does it teach us about junk DNA? How many real ORFan genes are there and why do some people think there are more? Finally, you will learn why dachshunds have short legs and what "The Bridge on the River Kwai" has to do with the accuracy of the human genome sequence.

Click on this link to see more.

Gene Families and the Birth and Death of Genes

Saturday, March 25, 2023

ChatGPT lies about junk DNA

I asked ChatGPT some questions about junk DNA and it made up a Francis Crick quotation and misrepresented the view of Susumu Ohno.

We have finally restored the Junk DNA article on Wikipedia. (It was deleted about ten years ago when Wikipedians decided that junk DNA doesn't exist.) One of the issues on Wikipedia is how to deal with misconceptions and misunderstandings while staying within the boundaries of Wikipedia culture. Wikipedians have an aversion to anything that looks like editorializing so you can't just say something like, "Nobody ever said that all non-coding DNA was junk." Instead, you have to find a credible reference to someone else who said that.

I've been trying to figure out how far the misunderstandings of junk DNA have spread so I asked ChatGPT (from OpenAI) again.

Wednesday, March 08, 2023

A small crustacean with a very big genome

The Antarctic krill genome is the largest animal genome sequenced to date.

Antarctic krill (Euphausia superba) is a species of small crustacean (about 6 cm long) that lives in large swarms in the seas around Antarctica. It is one of the most abundant animals on the planet in terms of biomass and numbers of individuals.

It was known to have a large genome with abundant repetitive DNA sequences making assembly of a complete genome very difficult. Recent technological advances have made it possible to sequence very long fragments of DNA that span many of the repetitive regions and allow assembly of a complete genome (Shao et al. 2023).

The project involved 28 scientists from China (mostly), Australia, Denmark, and Italy. To give you an idea of the effort involved, they listed the sequencing data that was collected: 3.06 terabases (Tb) PacBio long read sequences, 734.99 Gb PacBio circular consensus sequences, 4.01 Tb short reads, and 11.38 Tb Hi-C reads. The assembled genome is 48.1 Gb, which is considerably larger than that of the African lungfish (40 Gb), which up until now was the largest fully sequenced animal genome.
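To put those numbers in perspective, here is a rough tally of the raw data relative to the finished assembly. It is only a sketch: it assumes 1 Tb = 1,000 Gb and treats "fold" as raw bases divided by assembly size, which is a crude stand-in for real coverage statistics. The variable names are illustrative.

```python
# Rough tally of the raw sequencing data listed above, in gigabases (Gb).
# ASSUMPTION: 1 Tb = 1,000 Gb, and "fold" is simply raw bases / assembly size.

raw_data_gb = {
    "PacBio long reads": 3.06 * 1000,
    "PacBio circular consensus": 734.99,
    "short reads": 4.01 * 1000,
    "Hi-C reads": 11.38 * 1000,
}
ASSEMBLY_GB = 48.1  # assembled krill genome, from the text
LUNGFISH_GB = 40.0  # African lungfish genome, from the text

total_gb = sum(raw_data_gb.values())
fold = total_gb / ASSEMBLY_GB
size_ratio = ASSEMBLY_GB / LUNGFISH_GB

print(f"total raw data: {total_gb / 1000:.2f} Tb")
print(f"raw bases per assembled base: about {fold:.0f}x")
print(f"krill assembly is {size_ratio:.2f} times the lungfish genome")
```

By this crude measure the team collected on the order of 19 Tb of sequence, several hundred bases of raw data for every base in the final assembly.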

The current draft has 28,834 protein-coding genes and an unknown number of noncoding genes. About 92% of the genome is repetitive DNA that's mostly transposon-related sequences. However, there is an unusual amount of highly repetitive DNA organized as long tandem repeats and this made the assembly of the complete genome quite challenging.

The protein-coding genes in the Antarctic krill are longer than in other species due to the insertion of repetitive DNA into introns but the increase in intron size is less than expected from studies of other large genomes such as lungfish and Mexican axolotl. It looks like more of the genome expansion has occurred in the intergenic DNA compared to these other species.

This study supports the idea that genome expansion is mostly due to the insertion and propagation of repetitive DNA sequences. Some of us think that the repetitive DNA is mostly junk DNA but in this case it seems unusual that there would be so much junk in the genome of a species with such a huge population size (about 350 trillion individuals). The authors were aware of this problem but they were able to calculate an effective population size because they had sequence data from different individuals all around Antarctica. The effective population size (Ne) turned out to be one billion times smaller than the census population size indicating that the population of krill had been much smaller in the recent past. Their data suggests strongly that this smaller population existed only 10 million years ago.
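The gap between census size and effective size is easier to appreciate with the numbers written out. This is a minimal sketch using the round figures quoted above; it assumes "one billion times smaller" can be taken literally as a factor of 1e9, and the paper's own estimate may differ.

```python
# Census vs effective population size for Antarctic krill, using the round
# numbers quoted above. ASSUMPTION: "one billion times smaller" is taken
# literally as a factor of 1e9; the paper's exact estimate may differ.

CENSUS_SIZE = 350e12    # about 350 trillion individuals, from the text
REDUCTION_FACTOR = 1e9  # "one billion times smaller", from the text

effective_size = CENSUS_SIZE / REDUCTION_FACTOR
ne_over_n = effective_size / CENSUS_SIZE

print(f"implied Ne: about {effective_size:,.0f}")
print(f"Ne / N ratio: {ne_over_n:.0e}")
```

On these figures the implied Ne is in the hundreds of thousands, which is the number that matters when asking whether selection could have kept junk DNA from accumulating.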

The authors don't mention junk DNA. They seem to favor the idea that large genomes are associated with crustaceans that live in polar regions and that large genomes may confer a selective advantage.


Shao, C., Sun, S., Liu, K., Wang, J., Li, S., Liu, Q., Deagle, B.E., Seim, I., Biscontin, A., Wang, Q. et al. (2023) The enormous repetitive Antarctic krill genome reveals environmental adaptations and population insights. Cell 186:1-16. [doi: 10.1016/j.cell.2023.02.005]

Wednesday, March 01, 2023

Definition of a gene (again)

The correct definition of a molecular gene isn't difficult but getting it recognized and accepted is a different story.

When writing my book on junk DNA I realized that there was an issue with genes. The average scientist, and consequently the average science writer, has a very confused picture of genes and the proper way to define them. The issue shouldn't be confusing for Sandwalk readers since we've covered that ground many times in the past. I think the best working definition of a gene is, "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]

Thursday, February 16, 2023

What are the best Nobel Prizes in biochemistry & molecular biology since 1945?

The 2022 Nobel Prize in Physiology or Medicine went to Svante Pääbo “for his discoveries concerning the genomes of extinct hominins and human evolution”. It's one of a long list of Nobel Prizes awarded for technological achievement. In most cases, the new techniques led to a better understanding of science and medicine.

Since World War II, there have been significant advances in our understanding of biology but most of these have come about by the slow and steady accumulation of knowledge and not by paradigm-shifting breakthroughs. These advances don't often get recognized by the Nobel Prize committees because it's difficult to single out any one individual or any single experiment that merits a Nobel Prize. In some cases the Nobel Prize committees have tried to recognize major advances by picking out leaders that have made important contributions over a number of years but their choices don't always satisfy others in the field. One of the notable successes is the awarding of Nobel Prizes to Max Delbrück, Alfred D. Hershey and Salvador E. Luria “for their discoveries concerning the replication mechanism and the genetic structure of viruses” (Nobel Prize in Physiology or Medicine 1969). Another is Edward B. Lewis, Christiane Nüsslein-Volhard and Eric F. Wieschaus “for their discoveries concerning the genetic control of early embryonic development” (Nobel Prize in Physiology or Medicine 1995).

Birds of a feather: epigenetics and opposition to junk DNA

There's an old saying that birds of a feather flock together. It means that people with the same interests tend to associate with each other. Its extended meaning refers to the fact that people who believe in one thing (X) tend to also believe in another (Y). It usually means that X and Y are both questionable beliefs and it's not clear why they should be associated.

I've noticed an association between those who promote epigenetics far beyond its reasonable limits and those who reject junk DNA in favor of a genome that's mostly functional. There's no obvious reason why these two beliefs should be associated with each other but they are. I assume it's related to the idea that both beliefs are presumed to be radical departures from the standard dogma so they reinforce the idea that the author is a revolutionary.

Or maybe it's just that sloppy thinking in one field means that sloppy thinking is the common thread.

Here's an example from Chapter 4 of a 2023 edition of the Handbook of Epigenetics (Third Edition).

The central dogma of life had clearly established the importance of the RNA molecule in the flow of genetic information. The understanding of transcription and translation processes further elucidated three distinct classes of RNA: mRNA, tRNA and rRNA. mRNA carries the information from DNA and gets translated to structural or functional proteins; hence, they are referred to as the coding RNA (RNA which codes for proteins). tRNA and rRNA help in the process of translation among other functions. A major part of the DNA, however, does not code for proteins and was previously referred to as junk DNA. The scientists started realizing the role of the junk DNA in the late 1990s and the ENCODE project, initiated in 2003, proved the significance of junk DNA beyond any doubt. Many RNA types are now known to be transcribed from DNA in the same way as mRNA, but unlike mRNA they do not get translated into any protein; hence, they are collectively referred to as noncoding RNA (ncRNA). The studies have revealed that up to 90% of the eukaryotic genome is transcribed but only 1%–2% of these transcripts code for proteins, the rest all are ncRNAs. The ncRNAs less than 200 nucleotides are called small noncoding RNAs and greater than 200 nucleotides are called long noncoding RNAs (lncRNAs).

In case you haven't been following my blog posts for the past 17 years, allow me to briefly summarize the flaws in that paragraph.

  • The central dogma has nothing to do with whether most of our genome is junk
  • There was never, ever, a time when knowledgeable scientists defended the idea that all noncoding DNA is junk
  • ENCODE did not "prove the significance of junk DNA beyond any doubt"
  • Not all transcripts are functional; most of them are junk RNA transcribed from junk DNA

So, I ask the same question that I've been asking for decades. How does this stuff get published?


Sunday, January 01, 2023

The function wars are over

In order to have a productive discussion about junk DNA we needed to agree on how to define "function" and "junk." Disagreements over the definitions spawned the Function Wars that became intense over the past decade. That war is over and now it's time to move beyond nitpicking about terminology.

The idea that most of the human genome is composed of junk DNA arose gradually in the late 1960s and early 1970s. The concept was based on a lot of evidence dating back to the 1940s and it gained support with the discovery of massive amounts of repetitive DNA.

Various classes of functional DNA were known back then including: regulatory sequences, protein-coding genes, noncoding genes, centromeres, and origins of replication. Other categories have been added since then but the total amount of functional DNA was not thought to be more than 10% of the genome. This was confirmed with the publication of the human genome sequence.

From the very beginning, the distinction between functional DNA and junk DNA was based on evolutionary principles. Functional DNA was the product of natural selection and junk DNA was not constrained by selection. The genetic load argument was a key feature of Susumu Ohno's conclusion that 90% of our genome is junk (Ohno, 1972a; Ohno, 1972b).

Thursday, December 22, 2022

Junk DNA, TED talks, and the function of lncRNAs

Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.

The fact that most of our genome is transcribed has been known for 50 years but that fact only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and this label has stuck.

By the end of the 1970s we knew that much of this transcription was due to introns. The latest data shows that protein coding genes and known noncoding genes occupy about 45% of the genome and most of that is intron sequences that are mostly junk. That leaves 30-40% of the genome that is transcribed at some point producing something like one million transcripts of unknown function.