
Thursday, August 04, 2022

Identifying functional DNA (and junk) by purifying selection

Functional DNA is best defined as DNA that is currently under purifying selection. In other words, it can't be deleted without affecting the fitness of the individual. This is the "maintenance function" definition and it differs from the "causal role" and "selected effect" definitions [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect].

It has always been difficult to determine whether a given sequence is under purifying selection so sequence conservation is often used as a proxy. This is perfectly justifiable since the two criteria are strongly correlated. As a general rule, sequences that are currently being maintained by selection are ancient enough to show evidence of conservation. The only exceptions are de novo sequences and sequences that have recently become expendable and these are rare.

Sunday, July 31, 2022

Junk DNA causes cancer

This is a story about misleading press releases. The spread of misinformation by press offices is a serious issue that needs to be addressed.

The Institute of Cancer Research in London (UK) published a press release on July 19, 2022 with the provocative title: ‘Junk’ DNA could lead to cancer by stopping copying of DNA. The first three sentences tell most of the story.

Scientists have found that non-coding ‘junk’ DNA, far from being harmless and inert, could potentially contribute to the development of cancer.

Their study has shown how non-coding DNA can get in the way of the replication and repair of our genome, potentially allowing mutations to accumulate.

It has been previously found that non-coding or repetitive patterns of DNA – which make up around half of our genome – could disrupt the replication of the genome.

Nobody ever said that junk DNA was "inert and harmless"; in fact it is assumed to be slightly deleterious and only gets fixed because it is invisible to natural selection in small populations (Nearly Neutral Theory). And no intelligent scientist equates noncoding DNA and junk DNA, even by implication. But in any case, this article isn't about all junk DNA, it's about certain small stretches of repetitive DNA that interfere with replication so that the resulting mutations have to be fixed by repair mechanisms. The sequences most likely to interfere with replication are repeats of the dinucleotide CG, written (CG)n. As the authors point out in the discussion, these repeats are "extremely rare" in all genomes, including the human genome, suggesting that they are under negative selection.

Other, more common, repeats also show detectable in vitro interference with replisomes at replication forks. The errors introduced by replication stalling can be repaired but some of them will escape repair causing mutations. It's not clear to me why mutations in junk DNA are a problem. That's not explained in the paper.

Here's the paper.

Casas-Delucchi, C.S., Daza-Martin, M., Williams, S.L. et al. (2022) Mechanism of replication stalling and recovery within repetitive DNA. Nat Commun 13:3953 [doi: 10.1038/s41467-022-31657-x]

Accurate chromosomal DNA replication is essential to maintain genomic stability. Genetic evidence suggests that certain repetitive sequences impair replication, yet the underlying mechanism is poorly defined. Replication could be directly inhibited by the DNA template or indirectly, for example by DNA-bound proteins. Here, we reconstitute replication of mono-, di- and trinucleotide repeats in vitro using eukaryotic replisomes assembled from purified proteins. We find that structure-prone repeats are sufficient to impair replication. Whilst template unwinding is unaffected, leading strand synthesis is inhibited, leading to fork uncoupling. Synthesis through hairpin-forming repeats is rescued by replisome-intrinsic mechanisms, whereas synthesis of quadruplex-forming repeats requires an extrinsic accessory helicase. DNA-induced fork stalling is mechanistically similar to that induced by leading strand DNA lesions, highlighting structure-prone repeats as an important potential source of replication stress. Thus, we propose that our understanding of the cellular response to replication stress may also be applied to DNA-induced replication stalling.

The word "junk" does not appear anywhere in the paper and the word "cancer" appears only once in the text where it refers to a "cancer-associated" mutation in yeast. This makes me wonder why the press release uses both of these words so prominently. Does anybody have any ideas?

Perhaps it has something to do with a quotation from Gideon Coster, who is described as the study leader. He says,

We wanted to understand why it seems more difficult for cells to copy repetitive DNA sequences than other parts of the genome. Our study suggests that so-called junk DNA is actually playing an important and potentially damaging role in cells, by blocking DNA replication and potentially opening the door to cancerous mutations.

I find it strange that he refers to "so-called junk DNA" in the press release but didn't mention it in the peer-reviewed paper. He also didn't emphasize cancerous mutations in the paper.

The press release contains another quotation, this time from Kristian Helin, who is the Chief Executive of The Institute of Cancer Research. He says,

This study helps to unravel the puzzle of junk DNA – showing how these repetitive sequences can block DNA replication and repair. It’s possible that this mechanism could play a role in the development of cancer as a cause of genetic instability – especially as cancer cells start dividing more quickly and so place the process of DNA replication under more stress.

It's unclear to me how studying these mutation-inducing repeats could help "unravel the puzzle of junk DNA" but that's probably why I'm not the chief executive of a cancer research institute. I'm so stupid that I didn't even know there WAS a "puzzle" of junk DNA to be unravelled!

It's time for scientists to speak out against press releases like this one. It misrepresents the results and their interpretation as published after undergoing peer review. Instead, the press release is used as a propaganda exercise to promote the personal views of the scientists, views that they couldn't publish. This is what happened with ENCODE and it's becoming more and more common. The fact that, in this case, the personal views of these scientists are flawed only makes the situation worse.


Saturday, July 30, 2022

Wikipedia blocks any mention of junk DNA in the "Human genome" article

Wikipedia has an article on the Human genome. The introduction includes the following statement,

Human genomes include both protein-coding DNA genes and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly-repetitive sequences.

This is a recent improvement (July 22, 2022) over the original statement that simply said, "Human genomes include both protein-coding DNA genes and noncoding DNA." I noted in the "talk" section that there was no mention of junk DNA in the entire article on the human genome so I added a sentence to the end of the section quoted above. I said,

Some non-coding DNA is junk, such as pseudogenes, but there is no firm consensus over the total amount of junk DNA.1

Sunday, July 17, 2022

The Function Wars Part XIII: Ford Doolittle writes about transposons and levels of selection

It's theoretically possible that the presence of abundant transposon fragments in a genome could provide a clade with a selective advantage at the level of species sorting. Is this an important contribution to the junk DNA debate?

As I explained in Function Wars Part IX, we need to maintain a certain perspective in these debates over function. The big picture view is that 90% of the human genome qualifies as junk DNA by any reasonable criteria. There's lots of evidence to support that claim but in spite of the evidence it is not accepted by most scientists.

Most scientists think that junk DNA is almost an oxymoron since natural selection would have eliminated it by now. Many scientists think that most of our genome must be functional because it is transcribed and because it's full of transcription factor binding sites. My goal is to show that their lack of understanding of population genetics and basic biochemistry has led them astray. I am trying to correct misunderstandings and the false history of the field that have become prominent in the scientific literature.

For the most part, philosophers and their friends have a different goal. They are interested in epistemology and in defining exactly what you mean by 'function' and 'junk.' To some extent, this is nitpicking and it undermines my goal by lending support, however oblique, to opponents of junk DNA.1

As I've mentioned before, this is most obvious when it comes to the ENCODE publicity campaign of 2012 [see: Revising history and defending ENCODE]. The reason why the ENCODE researchers were wrong is that they didn't understand that many transcription factor binding sites are unimportant and they didn't understand that many transcripts could be accidental. These facts are explained in the best undergraduate textbooks and they were made clear to ENCODE researchers in 2007 when they published their preliminary results. They were wrong because they didn't understand basic biochemistry. [ENCODE 2007]

Some people are trying to excuse ENCODE on the grounds that they simply picked an inappropriate definition of function. In other words, ENCODE made an epistemology error not a stupid biochemistry mistake. Here's another example from a new paper by Ford Doolittle in Biology and Philosophy. He says,

However, almost all of these developments in evolutionary biology and philosophy passed molecular genetics and genomics by, so that publicizers of the ENCODE project’s results could claim in 2012 that 80.4% of the human genome is “functional” (Ecker et al 2012) without any well thought-out position on the meaning of ‘function’. The default assumption made by ENCODE investigators seemed to have been that detectable activities are almost always products of selection and that selection almost always serves the survival and reproductive interests of organisms. But what ENCODE interpreted as functionality was unclear—from a philosophical perspective. Charitably, ENCODE’s principle mistake could have been a too broad and level-ignorant reading of selected effect (SE) “function” (Garson 2021) rather than the conflation of SE and causal role (CR) definitions of “the F-word”, as it is often seen as being (Doolittle and Brunet 2017).

My position is that this is far too "charitable." ENCODE's mistake was not in using the wrong definition of function; their mistake was in assuming that all transcripts and all transcription factor binding sites were functional in any way. That was a stupid assumption and they should have known better. They should have learned from the criticism they got in 2007.

This is only a small part of Doolittle's paper but I wanted to get that off my chest before delving into the main points. I find it extremely annoying that there's so much ink and electrons being wasted on the function wars when the really important issues are a lack of understanding of population genetics and basic biochemistry. I fear that the function wars are contributing to the existing confusion rather than clarifying it.

Doolittle, F. (2022) All about levels: transposable elements as selfish DNAs and drivers of evolution. Biology & Philosophy 37: article number 24 [doi: 10.1007/s10539-022-09852-3]

The origin and prevalence of transposable elements (TEs) may best be understood as resulting from “selfish” evolutionary processes at the within-genome level, with relevant populations being all members of the same TE family or all potentially mobile DNAs in a species. But the maintenance of families of TEs as evolutionary drivers, if taken as a consequence of selection, might be better understood as a consequence of selection at the level of species or higher, with the relevant populations being species or ecosystems varying in their possession of TEs. In 2015, Brunet and Doolittle (Genome Biol Evol 7: 2445–2457) made the case for legitimizing (though not proving) claims for an evolutionary role for TEs by recasting such claims as being about species selection. Here I further develop this “how possibly” argument. I note that with a forgivingly broad construal of evolution by natural selection (ENS) we might come to appreciate many aspects of Life on earth as its products, and TEs as—possibly—contributors to the success of Life by selection at several levels of a biological hierarchy. Thinking broadly makes this proposition a testable (albeit extraordinarily difficult-to-test) Darwinian one.

The essence of Ford's argument builds on the idea that active transposable elements (TEs) are examples of selfish DNA that propagate in the genome. This is selection at the level of DNA. Other elements of the genome, such as genes, regulatory sequences, and origins of replication, are examples of selection at the level of the organism and individuals within a population. Ford points out that some transposon-related sequences might be co-opted to form functional regions of the genome that are under purifying selection at the level of organisms and populations. He then goes on to argue that species with large amounts of transposon-related sequences in their genomes might have an evolutionary advantage because they have more raw material to work with in evolving new functions. If this is true, then this would be an example of species level selection.

These points are summarized near the end of his paper.

Thus TE families, originating and establishing themselves abundantly within a species through selection at their own level may wind up as a few relics retained by purifying selection at the level of organisms. Moreover, if this contribution to the formation of useful relics facilitated the diversification of species or the persistence of clades, then we might also say that these TE families were once “drivers” of evolution at these higher levels, and that their possession was once an adaptation at each such higher level.

There are lots of details that we could get into later but I want to deal with the main speculation; namely, that species with lots of TE fragments in their genome might have an adaptive advantage over species that don't.

This is a challenging topic because lots of people have expressed their opinions on many of the topics that Ford covers in his article. None of their opinions are identical and many of them are based on different assumptions about things like evolvability, teleology, the significance of the problem, how to define species sorting, and whether hierarchy theory is important. Many of those people are very smart (as is Ford Doolittle) and it hurts my brain trying to figure out who is correct. I'll try to explain some of the issues and the controversies.

A solution in search of a problem?

What's the reason for speculating that abundant bits of junk DNA might be selected because they will benefit the species at some time in the next ten million years or so? Is there a problem that this speculation explains?

The standard practice in science is to suggest hypotheses that account for an unexplained observation; for example, the idea of abundant junk DNA explained the C-value Paradox and the mutation load problem. Models are supposed to have explanatory power—they are supposed to explain something that we don't understand.

Ford thinks there is a reason for retaining junk DNA. He writes,

Eukaryotes are but one of the many clades emerging from the prokaryotic divergence. Although such beliefs may be impossible to support empirically it is widely held that that was a special and evolutionarily important event....

Assuming this to be true (but see Booth and Doolittle 2015) we might ask if there are reasons for this differential evolutionary success, and are these reasons clade-level properties that have been selected for at this high level? Is one of them the possession of large and variable families of TEs?

You'll have to read his entire paper to see his full explanation but this is the important part. Ford thinks that the diversity and success of eukaryotes requires an explanation because it can't be accounted for by standard evolutionary theory. I don't see the problem so I don't see the need for an explanation.

Of course there doesn't have to be a scientific problem that needs solving. This could just be a theoretical argument showing that excess DNA could lead to species level selection. That puts it more in the realm of philosophy and Ford does make the point in his paper that one of his goals is simply to defend multilevel selection theory (MLST) as a distinct possibility. The main proponents of this idea (Hierarchy Theory) are Niles Eldredge and Stephen Jay Gould and the theory is thoroughly covered in Gould's book The Structure of Evolutionary Theory. I was surprised to discover that this book isn't mentioned in the Doolittle paper.

I don't have a problem with Hierarchy Theory (or Multilevel Selection Theory, or group selection) as a theoretical possibility. The important question, as far as I'm concerned, is whether there's any evidence to support species selection. As Ford notes, "such beliefs may be impossible to support empirically" and that may be true; however, there's a danger in promoting ideas that have no empirical support because that opens a huge can of worms that less rigorous scientists are eager to exploit.

With respect to the role of transposon-related sequences, the important question, in my opinion, is: Would life look substantially less diverse or less complex if no transposon-related sequences had ever been exapted to form elements that are now under purifying selection? I suspect that the answer is no—life would be different but no less diverse or complex.

Species selection vs species sorting

Speculations about species-level evolution are usually discussed in the context of group selection and species selection or, more broadly, as the levels-of-selection debate. Those are the terms Doolittle uses and he is very much interested in explaining junk DNA as contributing to adaptation at the species level.

But if the insertion of [transcription factor binding sites] TFBSs helps species to innovate and thus diversify (speciate and/or forestall extinction) and is a consequence of TFBS-bearing TE carriage, then such carriage might be cast as an adaptation at the level of species and maintained at that level too, by the differential extinction of TE-deficient species (Linquist et al 2020; Brunet et al 2021).

I think it's unfortunate that we don't use the term 'species sorting' instead of 'species selection' because as soon as you restrict your discussion to selection, you are falling into the adaptationist trap. Elisabeth Vrba, backed by Niles Eldredge, preferred 'species sorting' partly in order to avoid this trap.

I am convinced, on the basis of Vrba's analysis, that we naturalists have been saying 'species selection' when we really should have been calling the phenomenon 'species sorting.' Species sorting is extremely common, and underlies a great deal of evolutionary patterns, as I shall make clear in this narrative. On the other hand, true species selection, in its properly more restricted sense, I now believe to be relatively rare. (Niles Eldredge, in Reinventing Darwin (1995) p. 137)

As I understand it, the difference between 'species sorting' and 'species selection' is that the former term does not commit you to an adaptationist explanation.2 Take the Galapagos finches as an example. There has been fairly rapid radiation of these species from a small initial population that reached the islands. This radiation was not due to any intrinsic property of the finch genome that made finches more successful at speciation; it was just a lucky accident. Similarly, the fact that there are many marsupial species in Australia is probably not because the marsupial genome is better suited to evolution; it's probably just a founder effect at the species level.

Gould still prefers 'species selection' but he recognizes the problem. He points out that whenever you view species as evolving entities within a larger 'population' of other species, you must consider species drift as a distinct possibility. And this means that you can get evolution via a species-level founder effect that has nothing to do with adaptation.

Low population (number of species in a clade) provides the enabling criterion for important drift ... at the species level. The analogue of genetic drift, which I shall call 'species drift', must act both frequently and powerfully in macroevolution. Most clades do not contain large numbers of species. Therefore, trends may often originate for effectively random reasons. (Stephen J. Gould, in The Structure of Evolutionary Theory (2001) p. 736)

Let's speculate how this might relate to the current debate. It's possible that the apparent diversity and complexity of large multicellular eukaryotes is mostly due to the fact that they have small populations and long generation times. This means that there were plenty of opportunities for small isolated populations to evolve distinctive features. Thus, we have, for example, more than 1000 different species of bats because of species drift (not species selection). What this means is that the evolution of new species is due to the same reason (small populations) as the evolution of junk DNA. One phenomenon (junk DNA) didn't cause the other (speciation); instead, both phenomena have the same cause.

Michael Lynch has written about this several times, but the important, and mind-hurting, paper is Lynch (2007) where he says,

Under this view, the reductions in Ng that likely accompanied both the origin of eukaryotes and the emergence of the animal and land-plant lineages may have played pivotal roles in the origin of modular gene architectures on which further developmental complexity was built.

Lynch's point is that we should not rule out nonadaptive processes (species drift) in the evolution of complexity, modularity, and evolvability.
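
The population-size argument in the preceding paragraphs can be made concrete with a small numerical sketch. It uses Kimura's diffusion approximation for the fixation probability of a single new mutation in a diploid population, under the simplifying assumption (mine, not Lynch's or Doolittle's) that the effective population size equals the census size N. The function names and the example values of s and N are hypothetical and chosen only for illustration.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for one new mutation.

    s: selection coefficient (negative = deleterious); illustrative value.
    n: diploid population size, assumed equal to the effective size.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: the initial frequency 1/(2N)
    try:
        # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # Strongly deleterious relative to 1/N: effectively never fixes
        return 0.0

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(s, n) * 2 * n

if __name__ == "__main__":
    s = -1e-5  # hypothetical cost of carrying one extra stretch of junk DNA
    for n in (10_000, 1_000_000):
        print(n, relative_to_neutral(s, n))
```

With these made-up numbers, the slightly deleterious mutation fixes at roughly 80% of the neutral rate when N = 10,000 but essentially never when N = 1,000,000. That is the sense in which excess DNA is "invisible" to selection in small populations.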

If we used species sorting instead of species selection, it would encourage a more pluralistic perspective and a wider variety of speculations. I don't mean to imply that this issue is ignored by Ford Doolittle, only that it doesn't get the attention it deserves.

Evolvability and teleology

Ford is invoking evolvability as the solution to the evolved complexity and diversity of multicellular eukaryotes. This is not a new idea: it is promoted by James Shapiro, by Mark Kirschner and John Gerhart, and by Günter Wagner, among others. (None of them are referenced in the Doolittle paper.)

The idea here is that clades with lots of TEs should be more successful than those with less junk DNA. It would be nice to have some data that address this question. For example, is the success of the bat clade due to more transposons than other mammals? Probably not, since bats have smaller genomes than other mammals. What about birds? There are lots of bird species but birds seem to have smaller genomes than some of their reptilian ancestors.

There are dozens of Drosophila species and they all have smaller genome sizes than many other flies. In this case, it looks like the small genome had an advantage in evolvability but that's not the prediction.

The concept of evolvability is so attractive that even a staunch gene-centric adaptationist like Richard Dawkins is willing to consider it (Dawkins, 1988). Gould devotes many pages (of course) to the subject in his big Structure book. Both Dawkins and Gould recognize that they are possibly running afoul of teleology in the sense of arguing that species have foresight. Here's how Dawkins puts it ...

It is all too easy for this kind of argument to be used loosely and unrespectably. Sydney Brenner justly ridiculed the idea of foresight in evolution, specifically the notion that a molecule, useless to a lineage of organisms in its own geological era, might nevertheless be retained in the gene pool because of its possible usefulness in some future era: "It might come in handy in the Cretaceous!" I hope I shall not be taken as saying anything like that. We certainly should have no truck with suggestions that individual animals might forego their selfish advantage because of possible long-term benefits to their species. Evolution has no foresight. But with hindsight, those evolutionary changes in embryology that look as though they were planned with foresight are the ones that dominate successful forms of life.

I interpret this to mean that we should not be fooled by hindsight into looking for causes when what we are seeing is historical contingency. If you have not already read Wonderful Life by Stephen Jay Gould then I highly recommend that you get a copy and read it now in order to understand the role of contingency in the evolution of animals. You should also brush up on the more recent contributions to the tape-of-life debate in order to put this discussion about evolvability into the proper context [Replaying life's tape].

Ford also recognizes the teleological problem and even quotes Sydney Brenner! Here's how Ford explains the relationship between transposon-related sequences and species selection.

As I argue here, organisms took on the burden of TEs not because TE accumulation, TE activity or TE diversity are selected-for traits within any species, serving some current or future need, but because lower-level (intragenomic) selection creates and proliferates TEs as selfish elements. But also, and just possibly, species in which this has happened speciate more often or last longer and (even more speculatively still) ecosystems including such species are better at surviving through time, and especially through the periodic mass extinctions to which this planet has been subjected (Brunet and Doolittle 2015). ‘More speculatively still’ because the adaptations at higher levels invoked are almost impossible to prove empirically. So what I present are again only ‘how possibly’, not ‘how actually’ arguments (Resnick 1991).

This is diving deeply into the domain of abstract thought that's not well-connected to scientific facts. As I mentioned above, I tend to look on these speculations as solutions looking for a problem. I would like to see more evidence that the properties of genomes endow certain species with more power to diversify than species with different genomic properties. Nevertheless, the idea of evolvability is not going away so let's see if Ford's view is reasonable.

As usual, Stephen Jay Gould has thought about this deeply and come up with some useful ideas. His argument is complicated but I'll try and explain it in simple terms. I'm relying mostly on the section called "Resolving the paradox of Evolvability and Defining the Exaptive Pool" in The Structure of Evolutionary Theory pages 1270-1295.

Gould argues that in Hierarchy Theory, the properties at each level of evolution must be restricted to that level. Thus, you can't have evolution at the level of DNA impinging on evolution at the level of the organism. For example, you can't have selection between transposons within a genome affecting evolution at the level of organisms and population. Similarly, selection at the level of organisms can't directly affect species sorting.

What this means in terms of genomes full of transposon-related sequences is the following. Evolution at the level of species involves sorting (or selection) between different species or clades. Each of these species has different properties that may or may not make it more prone to speciation but those properties are equivalent to mutations, or variation, at the level of organisms. Some species may have lots of transposon sequences in their genome and some may have fewer, and this difference arises just by chance as do mutations. There is no foresight in generating mutations and there is no foresight in having different sized genomes.

During species sorting, the differences may confer some selective advantage so species with, say, more junk DNA are more likely to speciate but the differences arose by chance in the same sense that mutations arise by chance (i.e. with no foresight). For example, in Lenski's long-term evolution experiment, certain neutral mutations became fixed by chance so that new mutations arising in this background became adaptive [Contingency, selection, and the long-term evolution experiment]. Scientists and philosophers aren't concerned about whether those neutral mutations might have arisen specifically in order to potentiate future evolution.

Similarly, it is inappropriate to say that transposons, or pervasive transcription, or splicing errors, arose BECAUSE they encouraged evolution at the species level. Instead, as Dawkins said, those features just look with hindsight as though they were planned. They are fortuitous accidents of evolution.

Gould also makes the point, again, that we could just as easily be looking at species drift as species selection and we have to be careful not to resort to adaptive just-so stories in the absence of evidence for selection.

Here's how Gould describes his view of evolvability using the term "spandrel" to describe potentiating accidents.

Thus, Darwinians have always argued that mutational raw material must be generated by a process other than organismal selection, and must be "random" (in the crucial sense of undirected towards adaptive states) with respect to realized pathways of evolutionary change. Traits that confer evolvability upon species-individuals, but arise by selection upon organisms, provide a precise analog at the species level to the classical role of mutation at the organismal level. Because these traits of species evolvability arise by a different process (organismal selection), unrelated to the selective needs of species, they may emerge at the species level as "random" raw material, potentially utilizable as traits for species selection.

The phenotypic effects of mutation are, in exactly the same manner, spandrels at the organismal level, that is, nonadaptive and automatic manifestations at a higher level of different kinds of causes acting directly at a lower level. The exaptation of a small and beneficial subset of these spandrels virtually defines the process of natural selection. Why else do we so commonly refer to the theory of natural selection as an interplay of "chance" (for the spandrels of raw material in mutational variation) and "necessity" (for the locally predictable directions of selection towards adaptation). Similarly, species selection operates by exapting emergent spandrels from causal processes acting upon organisms.

This is a difficult concept to grasp so I urge interested readers to study the relevant chapter in Gould's book. The essence of his argument is that species sorting can only be understood at the level of species as individuals and the properties of species as the random variation upon which species sorting operates.

Michael Lynch is also skeptical about evolvability but for slightly different reasons (Lynch, 2007). Lynch is characteristically blunt about how he views anyone who disagrees with him. (I have been on the losing side of one of those disagreements and I still have the scars to prove it.)

Four of the major buzzwords in biology today are complexity, modularity, evolvability, and robustness, and it is often claimed that ill-defined mechanisms not previously appreciated by evolutionary biologists must be invoked to explain the existence of emergent properties that putatively enhance the long-term success of extant taxa. This stance is not very different from the intelligent-design philosophy of invoking unknown mechanisms to explain biodiversity.

This is harsh and somewhat unfair since nobody would accuse Ford Doolittle of ulterior motives. Lynch's point is that evolvability must be subjected to the same rigorous standards that he applies to population genetics. He questions the idea that "the ability to evolve itself is actively promoted by directional selection" and raises four objections.

  1. Evolvability doesn't meet the stringent conditions that a good hypothesis demands.
  2. It's not clear that the ability to evolve is necessarily advantageous.
  3. There's no evidence that differences between species are anything other than normal variation.
  4. "... comparative genomics provides no support for the idea that genome architectural changes have been promoted in multicellular lineages so as to enhance their ability to evolve."

Why transposon-related sequences?

One of the problems that occurred to me was why there was so much emphasis on transposon sequences. Don't the same arguments apply to pseudogenes, random duplications, and, especially, genome doublings? They do, but the paper appears to be part of a series that arose out of a 2018 meeting on Evolutionary Roles of Transposable Elements: The Science and Philosophy organized by Stefan Linquist and Ford Doolittle. That's why there's a focus on transposons. I assume that Ford could make the same case for other properties of large genomes such as pervasive transcription, spurious transcription factor binding sites, and splicing errors even if they had nothing to do with transposons.

Is this an attempt to justify junk?

I argue that genomes are sloppy and junk DNA accumulates just because it can. There's no ulterior motive in having a large genome full of junk and it's far more likely to be slightly deleterious than neutral. I believe that all the evidence points in that direction.

This is not a popular view. Most scientists want to believe that all of that excess DNA is there for a reason. If it doesn't have a direct functional role then, at the very least, it's preserved in the present because it allows for future evolution. The arguments promoted by Ford Doolittle in this article, and by others in related articles, tend to support those faulty views about the importance of junk DNA even though that wasn't the intent. Doolittle's case is much more sophisticated than the naive views of junk DNA opponents but, nevertheless, you can be sure that this paper will be referenced frequently by those opponents.

Normal evolution is hard enough but multilevel selection is even harder, especially for molecular biologists who would never think of reading The Structure of Evolutionary Theory, or any other book on evolution. That's why we have to be really careful to distinguish between effects that are adaptations for species sorting and effects that are fortuitous and irrelevant for higher level sorting.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. The same issues about function come up in the debate over alternative splicing [Alternative splicing and evolution].

2. See Vrba and Gould (1986) for a detailed discussion of species sorting and species selection and how it pertains to the hierarchical perspective.

Dawkins, R. (1988) The Evolution of Evolvability. Artificial Life, The proceedings of an Interdisciplinary Workshop on The Synthesis and Simulation of Living Systems held September 1987 in Los Alamos, New Mexico. C. G. Langton, Addison-Wesley Publishing Company: 201-220.

Lynch, M. (2007) The frailty of adaptive hypotheses for the origins of organismal complexity. Proceedings of the National Academy of Sciences 104:8597-8604. [doi: 10.1073/pnas.0702207104]

Vrba, E.S. and Gould, S.J. (1986) The hierarchical expansion of sorting and selection: sorting and selection cannot be equated. Paleobiology 12:217-228. [doi: 10.1017/S0094837300013671]

Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2022) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

Wednesday, June 22, 2022

The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect

How much of the human genome is functional? This is a problem that will be solved by biochemists, not epistemologists.

What is junk DNA? What is functional DNA? Defining your terms is a key part of any scientific controversy because you can't have a debate if you can't agree on what you are debating. We've been debating the prevalence of junk DNA for more than 50 years and much of that debate has been (deliberately?) muddled by one side or the other in order to score points. For example, how many times have you heard the ridiculous claim that all noncoding DNA was supposed to be junk DNA? And how many times have you heard that all transcripts must have a function merely because they exist?

Monday, June 13, 2022

Manolis Kellis dismisses junk DNA

Manolis Kellis is a professor of computer science at the Massachusetts Institute of Technology (MIT). Sandwalk readers will remember him as one of the ENCODE leaders who participated in the massive publicity campaign of 2012 where they attempted to prove that most of the human genome is functional, not junk. He is the lead author of the semi-retraction that was published eighteen months later. [What did the ENCODE Consortium say in 2012 and 2014?]

Kellis was interviewed in April 2022 and it's interesting to hear his current views on junk DNA especially since MIT has just been rated the top university in the world for the 11th straight year. [QS ranks MIT the world’s No. 1 university for 2022-23].

His response to a question about junk DNA begins at 58 minutes. Kellis makes three points.

  • He doesn't like the word "junk."
  • Lots of noncoding DNA has known functions such as noncoding genes and regulatory sequences.
  • Half of our genome consists of transposon sequences and their regulatory regions fueled the mammalian radiation following the asteroid impact so that modern mammalian genomes now contain a complex and sophisticated network of regulatory sequences.

As I suspected, Kellis still doesn't recognize any of the evidence for junk DNA that was briefly outlined in the Kellis et al. (2014) paper. I find it surprising that after a decade of being exposed to criticism of his stance on junk DNA he is still not capable of presenting a cogent argument against junk.


Kellis, M. et al. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) April 24, 2014 published online [doi: 10.1073/pnas.1318948111]

Saturday, May 14, 2022

Editing the Wikipedia article on non-coding DNA

I decided to edit the Wikipedia article on non-coding DNA by adding new sections on "Noncoding genes," "Promoters and regulatory sequences," "Centromeres," and "Origins of replication." That didn't go over very well with the Wikipedia police so they deleted the sections on "Noncoding genes" and "Origins of replication." (I'm trying to restore them so you may see them come back when you check the link.)

I also decided to re-write the introduction to make it more accurate but my version has been deleted three times in favor of the original version you see now on the website. I have been threatened with being reported to Wikipedia for disruptive edits.

The introduction has been restored to the version that talks about the ENCODE project and references Nessa Carey's book. I tried to move that paragraph to the section on the ENCODE project and I deleted the reference to Carey's book on the grounds that it is not scientifically accurate [see Nessa Carey doesn't understand junk DNA]. The Wikipedia police have restored the original version three times without explaining why they think we should mention the ENCODE results in the introduction to an article on non-coding DNA and without explaining why Nessa Carey's book needs to be referenced.

The group that's objecting includes Ramos1990, Qzd, and Trappist the monk. (I am Genome42.) They seem to be part of a group that is opposed to junk DNA and resists the creation of a separate article for junk DNA. They want junk DNA to be part of the article on non-coding DNA for reasons that they don't/won't explain.

The main problem is the confusion between "noncoding DNA" and "junk DNA." Some parts of the article are reasonably balanced but other parts imply that any function found in noncoding DNA is a blow against junk DNA. The best way to solve this problem is to have two separate articles: one on noncoding DNA and its functions and another on junk DNA. There has been a lot of resistance to this among the current editors and I can only assume that this is because they don't see the distinction. I tried to explain it in the discussion thread on splitting by pointing out that we don't talk about non-regulatory DNA, non-centromeric DNA, non-telomeric DNA, or non-origin DNA and there's no confusion about the distinction between these parts of the genome and junk DNA. So why do we single out noncoding DNA and get confused?

It looks like it's going to be a challenge to fix the current Wikipedia page(s) and even more of a challenge to get a separate entry for junk DNA.

Here is the warning that I have received from Ramos1990.

Your recent editing history shows that you are currently engaged in an edit war; that means that you are repeatedly changing content back to how you think it should be, when you have seen that other editors disagree. To resolve the content dispute, please do not revert or change the edits of others when you are reverted. Instead of reverting, please use the talk page to work toward making a version that represents consensus among editors. The best practice at this stage is to discuss, not edit-war. See the bold, revert, discuss cycle for how this is done. If discussions reach an impasse, you can then post a request for help at a relevant noticeboard or seek dispute resolution. In some cases, you may wish to request temporary page protection.

Being involved in an edit war can result in you being blocked from editing—especially if you violate the three-revert rule, which states that an editor must not perform more than three reverts on a single page within a 24-hour period. Undoing another editor's work—whether in whole or in part, whether involving the same or different material each time—counts as a revert. Also keep in mind that while violating the three-revert rule often leads to a block, you can still be blocked for edit warring—even if you do not violate the three-revert rule—should your behavior indicate that you intend to continue reverting repeatedly.

I guess that's very clear. You can't correct content to the way you think it should be as long as other editors disagree. I explained the reason for all my changes in the "history" but none of the other editors have bothered to explain why they reverted to the old version. Strange.


Friday, April 15, 2022

Most lncRNAs are junk

A hard-hitting review will be published in Annual Review of Genomics and Human Genetics. It shows that the case for large numbers of functional lncRNAs is grossly exaggerated.

A long-time Sandwalk reader (Ole Kristian Tørresen) alerted me to a paper that's coming out next October in Annual Review of Genomics and Human Genetics. (Thank-you Ole.) The authors of the review are Chris Ponting from the University of Edinburgh (Edinburgh, Scotland, UK) and Wilfried Haerty at the Earlham Institute in Norwich, UK. They have been arguing the case for junk DNA for the past two decades but most of their arguments are ignored. This paper won't be so easy to ignore because it makes the case forcibly and critically reviews all the false claims for function. I'm going to quote a few juicy parts because I know that many of you will not be able to access the preprint.

Tuesday, April 05, 2022

Transcription activity in repeat regions of the human genome

A detailed examination of the new complete human genome reveals that 54% of it consists of various repetitive elements. Some of them are transcribed and some aren't.

This is my fourth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

The fourth paper extends the ENCODE-type analysis of the T2T-CHM13 sequence by focusing on repeats.

Hoyt, S.J., Storer, J.M., Hartley, G.A., Grady, P.G., Gershman, A., de Lima, L.G., Limouse, C., Halabian, R., Wojenski, L., Rodriguez, M. et al. (2022) From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376:57. [doi: 10.1126/science.abk3112]

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

The most useful part of this paper is the complete analysis of all repetitive elements in the T2T-CHM13 genome. This gives us, for the first time, a complete picture of a human genome. The exact values of the various components aren't important because there's considerable variation within the human population but the big picture is informative.

These are the percentages of the human genome occupied by the different classes of repetitive DNA.

  • SINEs 12.8%
  • Retrotransposon 0.15%
  • LINEs 20.7%
  • LTRs 8.8%
  • DNA transposons 3.6%
  • simple repeats 8%

The total comes to 54%. There are other estimates that are higher because of a more lenient cutoff value for sequence similarity but this gives you a pretty good idea of what the genome looks like. Most of the transposon-related sequence consists of fragments of once active transposons so the fraction of the genome consisting of true selfish DNA capable of transposing is a small fraction of this 54%.
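The arithmetic behind that total is easy to check. Here's a minimal sketch in Python that just adds up the percentages listed above; the variable names are mine and the only data are the six values from the list. It also separates out the simple repeats, which aren't transposon-related, to show how much of the total is left for the transposon-derived classes.

```python
# Percentages of the T2T-CHM13 genome by repeat class, copied from the list above.
repeat_classes = {
    "SINEs": 12.8,
    "Retrotransposon": 0.15,
    "LINEs": 20.7,
    "LTRs": 8.8,
    "DNA transposons": 3.6,
    "simple repeats": 8.0,
}

# Sum of every class in the list.
total = sum(repeat_classes.values())

# Everything except the simple repeats is transposon-related sequence.
transposon_related = total - repeat_classes["simple repeats"]

print(f"All repeats: {total:.2f}%")
print(f"Transposon-related: {transposon_related:.2f}%")
```

The sum is 54.05%, which rounds to the 54% quoted in the post, and roughly 46% of the genome falls into the transposon-related classes.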

We have every reason to believe that most of this DNA is junk DNA based on several lines of evidence developed over the past 50 years but most of the authors of this paper are reluctant to reach that conclusion so the fact that these repetitive sequences might be junk isn't mentioned in the paper. Instead, the authors concentrate on mapping CpG methylation sites and transcribed regions. They refer to this as "functional annotation" but they don't provide a definition of function.

We provide a high-confidence functional annotation of repeats across the human genome.

As you might expect, the repeat elements that retain vestiges of promoters are often transcribed and this includes adjacent genomic sequences that are found near these promoters (e.g. near LTRs). The long stretches of short tandem repeats (e.g. satellite DNA) do not contain any sequences that resemble promoters so these regions are not transcribed. (The authors seem to be a bit surprised by this result.) Further work is needed to decide how much of this DNA is truly functional and which parts contribute to human uniqueness. Naturally, that will require much more ENCODE-type work and T2T sequencing of other primates.

Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Although we find repeat variants that appear enriched or specific to the human lineage, in the absence of T2T-level assemblies from other primate species, we cannot truly attribute these elements to specific human phenotypes. Thus, the extent of variation described herein highlights the need to expand the effort to create human and nonhuman primate pan-genome references to support exploration of repeats that define the true extent of human variation.

This will cost millions of dollars. I suspect the grant applications have already been sent.



Sunday, April 03, 2022

Epigenetic markers in the last 8% of the human genome sequence

The newly sequenced part of the human genome contains the same chromatin regions as the rest of the genome and they don't tell us very much about which regions are functional and which ones are junk.

This is my second post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

Friday, April 01, 2022

Illuminating dark matter in human DNA?

A few months ago, the press office of the University of California at San Diego issued a press release with a provocative title ...

Illuminating Dark Matter in Human DNA - Unprecedented Atlas of the "Book of Life"

The press release was posted on several prominent science websites and Facebook groups. According to the press release, much of the human genome remains mysterious (dark matter) even 20 years after it was sequenced. According to the senior author of the paper, Bing Ren, we still don't understand how genes are expressed and how they might go awry in genetic diseases. He says,

A major reason is that the majority of the human DNA sequence, more than 98 percent, is non-protein-coding, and we do not yet have a genetic code book to unlock the information embedded in these sequences.

We've heard that story before and it's getting very boring. We know that 90% of our genome is junk, about 1% encodes proteins, and another 9% contains lots of functional DNA sequences, including regulatory elements. We've known about regulatory elements for more than 50 years so there's nothing mysterious about that component of noncoding DNA.

Monday, March 14, 2022

Junk DNA

My book manuscript has been reviewed by some outside experts and they seem to have convinced my editor that my book is worth publishing. I hope we can get it finished soon. It would be nice to publish it in September on the 10th anniversary of the ENCODE disaster.

Meanwhile, I keep scanning the literature for mentions of junk DNA to see if scientists are finally coming to their senses. Apparently not, and that's a good thing because it means that my book is still needed. Here's the opening paragraph from a recent review of lncRNAs. The authors are in the Department of Medicine at the Medical College of Georgia, in Augusta, Georgia (USA).

Ghanam, A.R., Bryant, W.B. and Miano, J.M. (2022) Of mice and human-specific long noncoding RNAs. Mammalian Genome:1-12. [doi: 10.1007/s00335-022-09943-2]

Approximately ninety-eight percent of our genome is noncoding. Contrary to initial descriptions of this vast sea of sequence comprising “junk DNA” (Ohno 1972), comparative genomics and various next-generation sequencing studies have revealed millions of transcription factor binding sites (TFBS) (Vierstra et al. 2020) and tens of thousands of noncoding genes, most notably the class of long noncoding RNAs (LncRNAs), defined currently as processed transcripts of length > 200 base pairs with no protein-coding capacity (Rinn and Chang 2020; Statello et al. 2021). The widespread transcription of LncRNAs and abundance of regulatory sequences such as enhancers support the concept of a genome that is largely functional (ENCODE Project Consortium 2012). Such a dynamic genome should not be surprising given the complex nature of gene expression and gene function necessary for embryonic and postnatal development as well as disease processes.

  • No reasonable scientist, especially Susumu Ohno, ever said that all noncoding DNA was junk.
  • There are millions of transcription factor binding sites but most of them are spurious binding sites that have nothing to do with regulation. They simply reflect the expected behavior of typical DNA binding proteins in a large genome full of junk DNA.
  • Nobody has demonstrated that there are tens of thousands of noncoding genes. There may be tens of thousands of transcripts but that's not the same thing since you have to prove that those transcripts are functional before you can say that they come from genes.
  • There is currently no evidence to support the concept of a genome that is largely functional in spite of what the ENCODE researchers might have said ten years ago.
  • Such a genome would be very surprising, if it were true, given what we know about genomes, evolution, and basic biochemistry.

Except for those few minor details—I hope I'm not being too picky—that's a pretty good way to start a review of lncRNAs. :-)


Sunday, October 24, 2021

Style vs substance in science communication: The role of science writers in major science journals

Science writers have always had articles published in the leading science journals such as Science and Nature but over the past few decades their role seems to have increased so that now even lesser journals employ them to write articles, commentary, and press releases. I recently posted an example of where this can go horribly wrong [Society for Molecular Biology and Evolution (SMBE) spreads misinformation about junk DNA].

The role of science writers has come to dominate the pages of Science and Nature so that we now have a situation where only two thirds of the pages in a typical issue are devoted to actual science publications and most readers are concentrating on the news and opinions in the front part of the journal. In some cases, the science writers control the image of these journals as happened at Nature during the ENCODE publicity campaign in 2012. Over at Science, Elizabeth Pennisi has done more to spread misinformation than any scientist in the field of molecular biology.

These are cases where science writers have sacrificed substance for style. They write nice readable articles that promote the image of their journal but are scientifically incorrect.

Let's look at a specific example. Back in 2005 Science celebrated its 125th anniversary by publishing "125 Questions: What We Don't Know." One of those questions was "Why Do Humans Have So Few Genes?"—a question that scientists had adequately answered in 2005 but you wouldn't know that from the short article written by Elizabeth Pennisi [SCIENCE Questions: Why Do Humans Have So Few Genes?]. The article was full of untruths and misinformation. There were lots of other questions in that issue that were just as ridiculous if you knew the topics.

Now, you might imagine that these questions were posed by the leading researchers in their fields but you would be wrong. The list of questions was drawn up by editors and science writers as described in the anniversary issue [SCIENCE Questions: Asking the Right Question].

We began by asking Science’s Senior Editorial Board, our Board of Reviewing Editors, and our own editors and writers to suggest questions that point to critical knowledge gaps. The ground rules: Scientists should have a good shot at answering the questions over the next 25 years, or they should at least know how to go about answering them. We intended simply to choose 25 of these suggestions and turn them into a survey of the big questions facing science. But when a group of editors and writers sat down to select those big questions, we quickly realized that 25 simply wouldn’t convey the grand sweep of cutting-edge research that lies behind the responses we received. So we have ended up with 125 questions, a fitting number for Science’s 125th anniversary.

Isn't it remarkable that editors and writers are being asked to evaluate science (substance) as if their opinions were more important than those of the scientists?

Has Science learned from these mistakes? No, because a few months ago they published a new list of 125 questions in collaboration with the 125th anniversary of Shanghai Jiao Tong University: 125 Questions: Exploration and Discovery. The list of questions hasn't gotten any better; it includes questions like, "How do organisms evolve?"; "What genes make us uniquely human?"; and "How are biomolecules organized in cells to function orderly and effectively?" Many of you can imagine what the short accompanying explanation looks like and you would be right.

Pennisi's original question has disappeared but there's a very similar question in the 2021 list.

Why are some genomes so big and others very small?

Genome size, which is the amount of DNA in a cell nucleus, is extremely diverse across animals and plants, and varies more than 64,000-fold. The smallest genome recorded exists in the microsporidian Encephalitozoon intestinalis (a parasite in certain mammals), and the largest genome belongs to a flowering plant known as Paris japonica, which has 150 billion base pairs of DNA per cell (50 times larger than that of a human). Plants are interesting in that their genome size plays an important role in their biology and evolution. But as the authors of a 2017 paper in Trends in Plant Sciences wrote: “Although we now know the major contributors to genome size diversity are non-protein coding, often highly repetitive DNA sequences, why their amounts vary so much still remains enigmatic.”

Sandwalk readers know that knowledgeable scientists came up with good answers to that question about 50 years ago. One answer is that different species have different amounts of junk DNA because some species don't have large enough populations to eliminate it by natural selection. In other cases, the differences are due to polyploidization.

You would think that after all the criticism of Science over their past coverage of genomes and junk DNA that the writers and editors would know this. But they don't, and that's because science writers and editors seem to be remarkably immune to scientific criticism. (The topic probably doesn't come up when they get together at their science writers' conventions.) I'm making the case that they are so focused on style (science writing) that they just don't care about substance (scientific accuracy).

The major journals have a serious problem that they don't recognize. A lot of the stuff that appears in their journals is not scientifically accurate or, at the very least, is misleading. They're not going to fix this problem if their editorial staff is dominated by science journalists.


Monday, September 27, 2021

The biggest mistake in the history of molecular biology (not!)

The creationists are committed to proving that most of our genome is functional because otherwise the idea of an intelligent designer doesn't make a lot of sense. They reject all of the evidence that supports junk DNA and they vehemently reject the notion that 90% of our genome is junk.

I was recently alerted to a video on junk DNA produced by Creation Ministries International in which they quote John Mattick.

A leading figure in genetics, Prof. John Mattick said ...'the failure to recognize the implications of the non-coding DNA will go down as the biggest mistake in the history of molecular biology'.

The creationists are making the common mistake of equating noncoding DNA and junk DNA but the quotation sounded accurate to me since John Mattick makes similar mistakes in his publications. I decided to try and find the exact quotation and reference and the closest I could come to a direct quote was in a paper by Mattick from 2007 (Mattick, 2007). He's referring to introns—here's the exact quotation.

It should be noted that the power and precision of digital communication and control systems has only been broadly established in the human intellectual and technological experience during the past 20–30 years, well after the central tenets of molecular biology were developed and after introns had been discovered. The latter was undoubtedly the biggest surprise (Williamson, 1977), and its misinterpretation possibly the biggest mistake, in the history of molecular biology. Although introns are transcribed, since they did not encode proteins and it was inconceivable that so much non-coding RNA could be functional, especially in an unexpected way, it was immediately and almost universally assumed that introns are non-functional and that the intronic RNA is degraded (rather than further processed) after splicing. The presence of introns in eukaryotic genomes was then rationalized as the residue of the early assembly of genes that had not yet been removed and that had utility in the evolution of proteins by facilitating domain shuffling and alternative splicing (Crick, 1979; Gilbert, 1978; Padgett et al., 1986). Interestingly, while it has been widely appreciated for many years that DNA itself is a digital storage medium, it was not generally considered that some of its outputs may themselves be digital signals, communicated via RNA.

However, the idea of the biggest mistake in molecular biology predates that reference. Mattick is quoted in a Scientific American article by W. Wayt Gibbs where Gibbs is discussing the "surprising" fact that regulatory sequences are conserved and that some genes are noncoding genes (Gibbs, 2003).

“I think this will come to be a classic story of orthodoxy derailing objective analysis of the facts, in this case for a quarter of a century,” Mattick says. “The failure to recognize the full implications of this—particularly the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules—may well go down as one of the biggest mistakes in the history of molecular biology.”

The discovery of introns in the mid-1970s was definitely a surprise but it's not true, as Mattick implies, that they were immediately assumed to be junk. In fact, as he points out, there was a lot of debate over the possible role of introns in the evolution of protein-coding genes where they could stimulate exon shuffling. Later on, the presence of introns was recognized to be an essential component of alternative splicing.

As more and more intron sequences were published it became apparent that neither their sizes nor their sequences were conserved, except for the spliceosome recognition sequences. It soon became obvious that intron sequences were evolving at the neutral rate, demonstrating that they were mostly junk. Mattick assumes that this conclusion, that introns are mostly junk, is one of the biggest mistakes in molecular biology. I think the opposite is true. I think that the failure of most molecular biologists to understand junk DNA is a huge mistake.

The creationists are misquoting Mattick when they say that the classification of all noncoding DNA as junk is the biggest mistake in molecular biology. In the quotations above, Mattick is specifically referring to introns but I'm sure he won't be upset to be misquoted in that manner since he firmly believes that most noncoding DNA is functional.

There's a bit of an ironic twist here. If it were true that knowledgeable scientists in the 1970s actually believed that all noncoding DNA was junk then I'd have to agree that this would have been a big (biggest?) mistake. But they didn't and it wasn't a big mistake. As I've said many times, no knowledgeable scientist ever said that all noncoding DNA was junk since they (we) all knew about noncoding genes, regulatory sequences, centromeres, and origins of replication, all of which are functional noncoding DNA. We now know that about 1% of our genome is coding sequences and about 9% is functional noncoding DNA. The other 90% is junk.
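To put those percentages in perspective, here is a rough back-of-the-envelope sketch. The fractions are the ones quoted above; the genome size of about 3.1 billion base pairs is my assumption for illustration and is not a figure from this post.

```python
# Assumption (not from the post): haploid human genome of roughly 3.1 billion bp.
GENOME_BP = 3_100_000_000

# Fractions quoted in the paragraph above.
fractions = {
    "coding": 0.01,
    "functional noncoding": 0.09,
    "junk": 0.90,
}


def breakdown(genome_bp, fractions):
    """Return the approximate number of base pairs in each category."""
    return {name: round(genome_bp * frac) for name, frac in fractions.items()}


sizes = breakdown(GENOME_BP, fractions)
for name, bp in sizes.items():
    print(f"{name}: ~{bp / 1e6:.0f} Mb")

# Everything that is not coding is "noncoding", but only part of that is junk.
noncoding = fractions["functional noncoding"] + fractions["junk"]
junk_share_of_noncoding = fractions["junk"] / noncoding
print(f"junk as a share of noncoding DNA: {junk_share_of_noncoding:.1%}")
```

With these numbers, roughly one tenth of noncoding DNA is functional, which is why "noncoding" and "junk" can't be treated as synonyms.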

[Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means]


Mattick, J.S. (2007) A new paradigm for developmental biology. Journal of Experimental Biology 210:1526-1547. [doi: 10.1242/jeb.005017]

Gibbs, W.W. (2003) The unseen genome: gems among the junk. Scientific American 289:46-53.

Monday, May 03, 2021

More illusions/delusions of James Shapiro and Denis Noble

It was just a few weeks ago that I discussed short articles by Denis Noble and James Shapiro that were published in the journal Biosemiotics [The illusions of Denis Noble] [The illusions of James Shapiro].

Several readers questioned whether Biosemiotics is a real science journal and they were right: it's a kooky journal and that's why it publishes papers by kooks. However, we now have a new paper by Shapiro and Noble that's about to appear in a legitimate scientific journal; albeit, one that has seen better days. This would normally raise red flags concerning peer review but we're long past the time when we can count on peer review to weed out the kooks.

Here's the paper. I'm not going to discuss all the main points because they were covered in my previous posts. I'll just concentrate on the most ridiculous part in order to illustrate the (lack of) quality of this paper.1

Shapiro, J. and Noble, D. (2021) What prevents mainstream evolutionists teaching the whole truth about how genomes evolve? Progress in Biophysics and Molecular Biology. [doi: 10.1016/j.pbiomolbio.2021.04.004]

The common belief that the neo-Darwinian Modern Synthesis (MS) was buttressed by the discoveries of molecular biology is incorrect. On the contrary those discoveries have undermined the MS. This article discusses the many processes revealed by molecular studies and genome sequencing that contribute to evolution but nonetheless lie beyond the strict confines of the MS formulated in the 1940s. The core assumptions of the MS that molecular studies have discredited include the idea that DNA is intrinsically a faithful self-replicator, the one-way transfer of heritable information from nucleic acids to other cell molecules, the myth of “selfish DNA,” and the existence of an impenetrable Weismann Barrier separating somatic and germ line cells. Processes fundamental to modern evolutionary theory include symbiogenesis, biosphere interactions between distant taxa (including viruses), horizontal DNA transfers, natural genetic engineering, organismal stress responses that activate intrinsic genome change operators, and macroevolution by genome restructuring (distinct from the gradual accumulation of local microevolutionary changes in the MS). These 21st Century concepts treat the evolving genome as a highly formatted and integrated Read-Write (RW) database rather than a Read-Only Memory (ROM) collection of independent gene units that change by random copying errors. Most of the discoverers of these macroevolutionary processes have been ignored in mainstream textbooks and popularizations of evolutionary biology, as we document in some detail. Ironically, we show that the active view of evolution that emerges from genomics and molecular biology is much closer to the 19th century ideas of both Darwin and Lamarck. 
The capacity of cells to activate evolutionary genome change under stress can account for some of the most negative clinical results in oncology, especially the sudden appearance of treatment-resistant and more aggressive tumors following therapies intended to eradicate all cancer cells. Knowing that extreme stress can be a trigger for punctuated macroevolutionary change suggests that less lethal therapies may result in longer survival times.

The section on "selfish DNA" is the one that seems to have the highest number of misleading and false statements per paragraph.

1.4. The end of “selfish” or “junk” DNA

A major shortcoming of the MS is that it was based on a “gene-centric” view, which assumed that the genome is basically a collection of “genes” that are the protein-coding units of heredity and heritable variation. As we saw in the quotation from Goldschmidt's 1940 book, this view failed to take the evolutionary importance of chromosome structure into account (Goldschmidt, 1940). It also blinded evolutionary biologists to the importance of McClintock's mid-20th Century discovery of mobile “controlling elements” (McClintock, 1987). Both the ideas of genetic transposition and control of gene expression by these non-coding mobile elements did not fit within the narrow confines of the MS concepts of genome function and variation. A further empirical assault on the limited MS conceptual framework came in the late 1960s when Britten and Kohne discovered that a significant fraction of genomic DNA from complex eukaryotes consists of highly repetitive sequences rather than the unique coding sequences expected to make up the hereditary material (Britten and Kohne, 1968).

  • The title is ridiculous since no respectable scientist ever equated selfish DNA with junk DNA [Selfish genes and transposons].

  • The Modern Synthesis (MS) was not based on a "gene-centric" view.
  • For the past 50 years, no respectable scientist, and no knowledgeable expert in molecular evolution, has restricted the definition of "gene" to just protein-coding genes.
  • For the past 50 years, no expert in molecular evolution has ever thought that the genome is just a collection of protein-coding genes.
  • For the past 50 years, experts in molecular biology have known about transposons and have considered the view that some of them might be "controlling elements." They have concluded that most transposon-related sequences are just fragments of defective transposons with no biological function.
  • Nobody cares whether mobile genetic elements fit within the narrow confines of the Modern Synthesis as described by Huxley and others in the 1940s because no expert in molecular evolution has believed in that view of evolution since the late 1960s.
  • The Britten and Kohne paper established that the genomes of most multicellular eukaryotes contain large amounts of repetitive DNA. This was an attempt to resolve the C-value paradox. Britten and Kohne didn't like the idea that this could be junk DNA so they offered some speculation about function. However, further data established that most of this repetitive DNA is, indeed, junk and Britten and Kohne's speculations have been discredited. Britten and Kohne were attempting to interpret their result within the context of the adaptationist views that characterized the Modern Synthesis back then. The correct interpretation of their results came with the overthrow of the Modern Synthesis and the adoption of a new view of evolutionary theory that focused on Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. Shapiro and Noble missed that revolution so they continue to attack an old-fashioned strawman version of evolutionary theory.

Before continuing, it's important to realize that by the early 1970s selectionist thinking had been abandoned by the experts in genome evolution. By 1978 Gould and Lewontin tried, unsuccessfully, to convince all other biologists to abandon the old selectionist way of thinking [The Spandrels of San Marco and the Panglossian Paradigm]. James Shapiro and Denis Noble are among those other biologists who didn't get the message.

In order to apply selectionist thinking to explain the presence of so much non-coding DNA, evolutionary biologists called this unexpected portion of the genome “junk DNA” (Ohno, 1972) or “selfish DNA” (Orgel and Crick, 1980). Richard Dawkins used an extreme view of these “selfish genes” to erect a whole philosophy of strictly passive evolutionary gradualism (Dawkins, 1976). Today we know that the human genome contains at least 30X as much repetitive non-coding DNA as protein-coding sequences (Lander et al., 2001). Repetitive DNA provides formatting signals for transcription, epigenetic modification and chromosome mechanics and also is the most variable component in the evolutionary diversification of complex genomes (Symonová and Howell, 2018; Subirana et al., 2015; Matsubara et al., 2016; CioffiMde et al., 2015; Chalopin et al., 2015; Shao et al., 2019; Böhne et al., 2008; Li et al., 2016; Oliver et al., 2013). A 2013 plot of organismal complexity against protein-coding and non-coding DNA showed that coding DNA peaked at approximately ∼3 × 10⁷ bp, while the non-coding DNA increased linearly with growing complexity up to ∼2–3 × 10¹⁰ bp (Liu et al., 2013). In other words, non-coding DNA tracked organismal complexity better than the protein-coding genes. The “encyclopedia of DNA elements” (ENCODE) project, which largely abandoned the term “gene,” revealed that the large majority of the so-called junk DNA is actively transcribed in a regulated manner, indicating that it is functional (Consortium, 2012; Pennisi, 2012).

  • It is completely, totally, ridiculous to say that the idea of junk DNA was due to selectionist thinking. The first statement in this paragraph is powerful evidence that Shapiro and Noble don't know what they are talking about. The concept of junk DNA is a rejection of selectionist thinking.
  • The use of "noncoding DNA" is what's called a "tell."
  • Again, equating junk DNA with selfish DNA is stupid. If all the excess DNA were selfish then it isn't junk because it has a function.
  • Richard Dawkins' view on evolution is closer to the old-fashioned adaptationist view that was abandoned by the experts by the time he wrote The Selfish Gene. Dawkins' book is not really about "genes," however, as is clear to anyone who has read it. He's talking about any piece of DNA that confers a fitness advantage. The Dawkins strawman is a favorite target of the Third Way types but it's just a strawman.
  • No significant proportion of repetitive DNA has a function in spite of the references quoted above.
  • There is no significant correlation between organismal complexity and noncoding DNA. Lots of very similar species, such as onions, have very different genome sizes.
  • No knowledgeable scientist since the 1980s thinks there should be a significant correlation between the number of genes and organismal complexity. We know that most of the phenotypic differences between multicellular species are due to changes in the timing and amount of expression of a standard set of genes. This is the main discovery of evolutionary-developmental biology (evo-devo), another revolution that Shapiro and Noble missed. They should educate themselves by reading Sean B. Carroll's books.
  • The ENCODE researchers did lots of silly things but they did NOT abandon the term "gene."
  • The idea that most of our genome is functional because of ENCODE is laughable in 2021. The fact that Shapiro and Noble would bring this up is another "tell" and the fact that they would reference Elizabeth Pennisi is even more revealing. These guys are incapable of thinking critically.

Shapiro and Noble then describe a few examples of repetitive DNA sequences that have a known function and they point out that a number of noncoding genes have been identified. They imply that these functional sequences make up a significant fraction of the genome, thus calling the concept of junk DNA into question. They close the section with,

Clearly, none of the eminent scientists who wrote about junk or selfish DNA could possibly have imagined the wide range of cellular functionalities that we know today are executed by ncRNA molecules. The idea that a genome was just a collection of protein coding sequences has proved completely inadequate.

  • I don't know about you, dear reader, but I'll match those "eminent scientists" against Shapiro and Noble any day. I'd love to see them try to defend their views in a public debate against some of the leading proponents of junk DNA. I know where my money would be.

Let me close by quoting the last chapter of this paper. I don't intend to comment on it except to say that it gives new meaning to the word "irony."

The campaign to sustain the Modern Synthesis causes real harm in a number of different ways. Among doctors treating bacterial infections, ignorance of real-world evolutionary processes has led to a situation in which the available antibiotics have lost their effectiveness against many life-threatening conditions (CDC et al., 2019). Among the general public, the inability to comprehend the potential all living organisms possess for transferring and reorganizing genomic configurations makes them unprepared to form sound judgements about how society should utilize its growing arsenal of biotechnology tools acquired from our microbial neighbors, like CRISPR (Doudna, 2020). Among oncologists, MS thinking prevents the practitioners treating cancer patients from recognizing the dangers of overtreating tolerable tumors in ways that may provoke a macroevolutionary transition to a far more lethal and untreatable disease (Heng, 2019). Finally, in the battle against obscurantism and anti-evolution prejudice, insistence on an outdated set of assertions about how life can change itself leaves the defenders of rigorous scientific inquiry without satisfactory responses to critics. Clearly, the time has come for the mainstream evolution community to recognize and join the scientific reality of the 21st Century.

Finally, one of the most important properties of kooks is that they find each other and they tend to hang out together, either physically or virtually. I'm not sure why this happens since they often espouse mutually exclusive views. I'm guessing that we can explain it in two different ways: (1) they are all outsiders fighting against a common enemy; namely, real science, and (2) they lack critical thinking skills so they don't see the flaws in each other's arguments.


1. In case you didn't recognize the quality from the title.