Monday, March 04, 2024

Nils Walter disputes junk DNA: (6) The C-value paradox

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the fifth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements.

The C-value paradox

One of the important arguments for junk DNA is that it explains the observation that the size of genomes varies enormously between species. Even closely-related species can have vastly different genome sizes. When it became obvious that most of this difference is due to repetitive DNA sequences, it seemed reasonable to assume that much of this DNA was non-functional. This fit with lots of other data.

The explanation is so powerful that Ryan Gregory proposed the Onion Test as a way of testing the logic of any other explanation. What we're talking about is the explanatory power of the junk DNA model. If you are going to propose another model then it should be able to account for the vast differences in genome size in related species.

This poses a problem for the anti-junk crowd since their arguments fail the Onion Test. Let's see how Nils Walter handles this issue. He says,

The C-value paradox or enigma refers to the violation of the prior assumption of a constant ("C") amount ("value") of DNA per haploid set of chromosomes so that genome length is not generally correlated with the complexity of an organism.

Walter is mixing up the C-value paradox with the G-value paradox but that's the least of his problems. In order to maintain that most of the human genome is functional, you need to account for the fact that half of it is repetitive DNA and that this fraction of the genome accounts for a lot of the differences in genome size between different species.

He relies on the standard argument advanced by opponents of junk DNA; namely, that many of these repetitive DNA sequences in the human genome are actually functional. This argument is usually supported by giving a few isolated examples of transposons or viral sequences that have been co-opted to become functional elements in the human genome. This is a version of the cherry-picking fallacy.

The big picture—that most transposon and viral sequences are functional—fails the Onion Test and cannot be an explanation of the C-value paradox. That doesn't seem to bother junk DNA opponents.

There's no evidence of function in most of the middle repetitive DNA sequences but the absence of evidence is of no concern. There's also the embarrassing data showing that these repetitive sequences are evolving at the neutral rate just as you would expect if they were junk. Walter parrots the John Mattick argument that tries to account for that problem. Mattick says that the conclusion is wrong because the repetitive DNA is slightly functional so you can't use that as the standard for the neutral DNA rate. Here's how Walter explains it after describing some functional transposon-related elements.

Estimates of the extent of neutral evolution, or random genetic drift, of the human genome, which are often based on the assumption of the non-functionality of retrotransposon-derived sequences,[27] may have to be adjusted based on these discoveries.

Reference #27 is an old paper by Mattick from 2007 but he (Mattick) has made the same argument more recently. It's nonsense, as I explained in a post from 2013 [The Junk DNA Controversy: John Mattick Defends Design]. If you read that post, you'll see the parallels between what Nils Walter writes in 2024 and what Mattick wrote in 2013.

I covered this specific argument in Chapter 4 of my book on page 103. Here's what I wrote.

There’s one last argument against sequence conservation that I have to mention for completeness. It’s the argument used by John Mattick, one of the most prominent opponents of junk DNA, but a few others have made the same argument. They believe that the sequence conservation data is misleading because the wrong controls are being used. Specifically, they argue that scientists are using the evolution of defective transposon sequences as a measure of the neutral rate of evolution and judging all other sequences by that standard. But, they claim, lots of transposons have a function so they are not evolving at the neutral rate, just evolving slowly. Thus, according to their logic, much of the genome appears to be evolving at the neutral rate when, in fact, it’s actually somewhat conserved. Thus, a much larger proportion of the genome exhibits sequence conservation that is being overlooked because of a false premise.

Several scientists have addressed this criticism and concluded that it has no merit. With few exceptions, degenerate transposon sequences are, in fact, evolving neutrally, as are pseudogenes. Furthermore, you might recall that the back-of-the-envelope estimation of phylogenetic mutation rate agrees with the biochemical mutation rate and the direct mutation rate, and that agreement can only be true if most of the human and chimpanzee genomes are evolving at the neutral rate.

There are several papers worth reading but one of the most important is Ponting (2017) because he addresses the particular problem of sequence conservation in the twilight zone. Here's what Ponting says about Mattick's claim.

Rapid resculpting of mammalian genomes is dominated by lineage-specific insertion and deletion of transposable element (TE) sequence whose debris, together with other repetitive sequence, contribute up to two-thirds of the human genome [45]. Although occasionally it is proposed that a large fraction of TEs are functional [Mattick video, 2010], there is no evolutionary or experimental evidence to support this. Conversely, because the locations of insertion or deletion mutations in TEs occur almost exactly as would be expected from random events, the vast majority of TEs appear to be inert [47], with less than 2% of TE sequence (approximately 20 Mb) bearing the signature of constraint [44, 48].

It's disappointing that Walter doesn't reference this paper and that, in general, he doesn't address objections to his claims.

The C-value paradox, repetitive DNA, and sequence conservation are covered extensively in my book, in Chapter 2 ("The Evolution of Sloppy Genomes") and in Chapter 3 ("Repetitive DNA and Mobile Genetic Elements").

Ponting, C.P. (2017) Biological function in the twilight zone of sequence conservation. BMC Biology 15:1-9. [doi: 10.1186/s12915-017-0411-5]

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]


John Harshman said...

You seem to have spent most of your post on sequence conservation and little on the C-value paradox and Onion Test. Can you in fact explain Walter's argument against big differences in genome size as evidence of junk? It's brief, but I don't understand it at all.


However, modern sequence analyses have found that at least some species, including ancient crop plants such as corn that form allopolyploids (i.e., complete sets of chromosomes from different species) with wide hybridization between variants, provide an opportunity to unite retrotransposons in one genome following a period of divergence, which in turn leads to periodic bursts of retrotransposon and genome expansion.[65] Since the genomes of today represent a recording of sequence alterations over possibly millions of years across many ecological niches, they likely entail evolutionary imprints of ever-varying biological conditions that cannot be fully understood from just examining the end product in extant species.

What is that supposed to mean?

Larry Moran said...

@John Harshman: You have to read that in the context of the next paragraph that says, among other things, "Functionally, LINE-1 retrotransposition has been hypothesized contribute to somatic mosaicism, genome diversification and genetic innovation."

What he's arguing, I think, is that transposon-related sequences aren't junk. Instead they contribute to the long term survival and evolution of the species. The reason we see those sequences in today's genomes is because they played an important role in the past and our genome preserves a record of that success.

SPARC said...

The more I read about the article the more I get the impression that Walter is trapped in an RNA world of his own making. Just like John Mattick.

John Harshman said...

Larry: That would seem to imply that these sequences were functional at one time, in some other context, but may not be so now. Or is he talking about junk as the repository of once and future function, somehow maintained by selection because it might come in handy some day? It really isn't clear.