
Friday, August 12, 2022

The surprising (?) conservation of noncoding DNA

We've known for more than half a century that a lot of noncoding DNA is functional. Why are some people still surprised? It's a puzzlement.

A paper in Trends in Genetics caught my eye as I was looking for something else. The authors review the various functions of noncoding DNA such as regulatory sequences and noncoding genes. There's nothing wrong with that but the context is a bit shocking for a paper that was published in 2021 in a highly respected journal.

Leypold, N.A. and Speicher, M.R. (2021) Evolutionary conservation in noncoding genomic regions. TRENDS in Genetics 37:903-918. [doi: 10.1016/j.tig.2021.06.007]

Humans may share more genomic commonalities with other species than previously thought. According to current estimates, ~5% of the human genome is functionally constrained, which is a much larger fraction than the ~1.5% occupied by annotated protein-coding genes. Hence, ~3.5% of the human genome comprises likely functional conserved noncoding elements (CNEs) preserved among organisms, whose common ancestors existed throughout hundreds of millions of years of evolution. As whole-genome sequencing emerges as a standard procedure in genetic analyses, interpretation of variations in CNEs, including the elucidation of mechanistic and functional roles, becomes a necessity. Here, we discuss the phenomenon of noncoding conservation via four dimensions (sequence, regulatory conservation, spatiotemporal expression, and structure) and the potential significance of CNEs in phenotype variation and disease.

Thursday, August 04, 2022

Identifying functional DNA (and junk) by purifying selection

Functional DNA is best defined as DNA that is currently under purifying selection. In other words, it can't be deleted without affecting the fitness of the individual. This is the "maintenance function" definition and it differs from the "causal role" and "selected effect" definitions [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect].

It has always been difficult to determine whether a given sequence is under purifying selection so sequence conservation is often used as a proxy. This is perfectly justifiable since the two criteria are strongly correlated. As a general rule, sequences that are currently being maintained by selection are ancient enough to show evidence of conservation. The only exceptions are de novo sequences and sequences that have recently become expendable and these are rare.

Sunday, July 31, 2022

Junk DNA causes cancer

This is a story about misleading press releases. The spread of misinformation by press offices is a serious issue that needs to be addressed.

The Institute of Cancer Research in London (UK) published a press release on July 19, 2022 with the provocative title: ‘Junk’ DNA could lead to cancer by stopping copying of DNA. The first three sentences tell most of the story.

Scientists have found that non-coding ‘junk’ DNA, far from being harmless and inert, could potentially contribute to the development of cancer.

Their study has shown how non-coding DNA can get in the way of the replication and repair of our genome, potentially allowing mutations to accumulate.

It has been previously found that non-coding or repetitive patterns of DNA – which make up around half of our genome – could disrupt the replication of the genome.

Nobody ever said that junk DNA was "inert and harmless;" in fact it is assumed to be slightly deleterious and only gets fixed because it is invisible to natural selection in small populations (Nearly Neutral Theory). And no intelligent scientist equates noncoding DNA and junk DNA, even by implication. But in any case, this article isn't about all junk DNA, it's about certain small stretches of repetitive DNA that interfere with replication so that the resulting mutations have to be fixed by repair mechanisms. The most likely sequences to interfere with replication are CG repeats, written (CG)n. As the authors point out in the discussion, these repeats are "extremely rare" in all genomes, including the human genome, suggesting that they are under negative selection.
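The nearly neutral argument can be made concrete with a rough numerical sketch. The function below uses Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, under the simplifying assumptions that the effective population size equals the census size and that fitness effects are additive. The selection coefficient and the two population sizes are hypothetical values chosen only for illustration; they are not taken from the paper or the press release.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation for a new mutation (initial frequency 1/(2n))
    in a diploid population of size n, assuming effective size == n
    and additive fitness effects. s < 0 means deleterious."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: pure drift
    # expm1 keeps precision when the exponents are tiny
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_to_neutral(s, n):
    """Fixation probability as a fraction of the neutral expectation."""
    return fixation_probability(s, n) / fixation_probability(0, n)

# Hypothetical values: a slightly deleterious insertion (s = -0.00001)
s = -1e-5
small = relative_to_neutral(s, 1_000)       # small population
large = relative_to_neutral(s, 1_000_000)   # large population
print(f"small population: {small:.3f} of the neutral rate")
print(f"large population: {large:.3e} of the neutral rate")
```

With these illustrative numbers the mutation fixes at close to the neutral rate in the small population and essentially never in the large one, which is the sense in which slightly deleterious DNA is "invisible" to selection when populations are small.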

Other, more common, repeats also show detectable in vitro interference with replisomes at replication forks. The errors introduced by replication stalling can be repaired but some of them will escape repair causing mutations. It's not clear to me why mutations in junk DNA are a problem. That's not explained in the paper.

Here's the paper.

Casas-Delucchi, C.S., Daza-Martin, M., Williams, S.L. et al. (2022) Mechanism of replication stalling and recovery within repetitive DNA. Nat Commun 13:3953 [doi: 10.1038/s41467-022-31657-x]

Accurate chromosomal DNA replication is essential to maintain genomic stability. Genetic evidence suggests that certain repetitive sequences impair replication, yet the underlying mechanism is poorly defined. Replication could be directly inhibited by the DNA template or indirectly, for example by DNA-bound proteins. Here, we reconstitute replication of mono-, di- and trinucleotide repeats in vitro using eukaryotic replisomes assembled from purified proteins. We find that structure-prone repeats are sufficient to impair replication. Whilst template unwinding is unaffected, leading strand synthesis is inhibited, leading to fork uncoupling. Synthesis through hairpin-forming repeats is rescued by replisome-intrinsic mechanisms, whereas synthesis of quadruplex-forming repeats requires an extrinsic accessory helicase. DNA-induced fork stalling is mechanistically similar to that induced by leading strand DNA lesions, highlighting structure-prone repeats as an important potential source of replication stress. Thus, we propose that our understanding of the cellular response to replication stress may also be applied to DNA-induced replication stalling.

The word "junk" does not appear anywhere in the paper and the word "cancer" appears only once in the text where it refers to a "cancer-associated" mutation in yeast. This makes me wonder why the press release uses both of these words so prominently. Does anybody have any ideas?

Perhaps it has something to do with a quotation from Gideon Coster, who is described as the study leader. He says,

We wanted to understand why it seems more difficult for cells to copy repetitive DNA sequences than other parts of the genome. Our study suggests that so-called junk DNA is actually playing an important and potentially damaging role in cells, by blocking DNA replication and potentially opening the door to cancerous mutations.

I find it strange that he refers to "so-called junk DNA" in the press release but didn't mention it in the peer-reviewed paper. He also didn't emphasize cancerous mutations in the paper.

The press release contains another quotation, this time from Kristian Helin, who is the Chief Executive of The Institute of Cancer Research. He says,

This study helps to unravel the puzzle of junk DNA – showing how these repetitive sequences can block DNA replication and repair. It’s possible that this mechanism could play a role in the development of cancer as a cause of genetic instability – especially as cancer cells start dividing more quickly and so place the process of DNA replication under more stress.

It's unclear to me how studying these mutation-inducing repeats could help "unravel the puzzle of junk DNA" but that's probably why I'm not the chief executive of a cancer research institute. I'm so stupid that I didn't even know there WAS a "puzzle" of junk DNA to be unravelled!

It's time for scientists to speak out against press releases like this one. It misrepresents the results and their interpretation as published after undergoing peer review. Instead, the press release is used as a propaganda exercise to promote the personal views of the scientists, views that they couldn't publish. This is what happened with ENCODE and it's becoming more and more common. The fact that, in this case, the personal views of these scientists are flawed only makes the situation worse.


Saturday, July 30, 2022

Wikipedia blocks any mention of junk DNA in the "Human genome" article

Wikipedia has an article on the Human genome. The introduction includes the following statement,

Human genomes include both protein-coding DNA genes and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly-repetitive sequences.

This is a recent improvement (July 22, 2022) over the original statement that simply said, "Human genomes include both protein-coding DNA genes and noncoding DNA." I noted in the "talk" section that there was no mention of junk DNA in the entire article on the human genome so I added a sentence to the end of the section quoted above. I said,

Some non-coding DNA is junk, such as pseudogenes, but there is no firm consensus over the total amount of junk DNA.1

Thursday, July 28, 2022

Kat Arney defends junk DNA

I'm a big fan of Kat Arney and I loved her 2016 book Herding Hemingway's Cats where she interviews a number of prominent scientists. If you haven't read it you should try and get a copy even if it's just to read the chapters on Mark Ptashne, Dan Graur, and Adrian Bird. The last chapter begins with an attempt to interview Evelyn Fox Keller but don't be put off by that because the rest of the chapter is very scientific.

Kat Arney gets mentioned a couple of times in my book and I quote her opinion of epigenetics from the chapter on Adrian Bird. She has a much better understanding of genes, genomes, and junk DNA than any other person who's ever written a book on those subjects. I especially like what she has to say about her journey of discovery on page 259 near the end of the book.

Things that I thought were solid fact have been exposed as dogma and scientific hearsay, based on little evidence but repeated enough times by researchers, textbooks, and journalists until they feel real.
Kat Arney (2016)

Kat Arney has just (July 28, 2022) posted a Genetics Society podcast on Genetics Unzipped. The main title is Does size matter when it comes to your genes and the subsections are "Where have all the genes gone?" "Genes or junk?" and "Are you more special than an onion?" You can listen to the podcast on that site (24 minutes) or read the entire transcript.

I don't entirely agree with everything she says in the podcast but she should be applauded for defending junk DNA in the face of all the scientific hearsay that's out there. Good for her.

Here are three things that I might have said differently.

  • I don't agree with her historical account of estimates of the number of genes in the human genome [False History and the Number of Genes 2010]. The knowledgeable experts in the field were predicting about 30,000 genes and their estimates weren't far off. The figure below is from Hatje et al. (2019). Note the anomalous estimates from the GeneSweep lottery and the EST data. The EST data were known to be highly suspect. This is important because the false narrative promotes the idea that scientists knew very little about the human genome before the sequence was published and it promotes the idea that there's some great mystery (too few genes) that needs to be solved.
  • I disagree with her statement that "actual genes makes up less than 2% of all the DNA in the whole human genome." My disagreement depends somewhat on the definition of a gene but that's not really controversial. We're talking about the molecular gene, which is defined as "a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]. There are exceptions but this is the best definition we have. The fact that a great many scientists are confused about this is no excuse. Genes include introns so the typical human gene is quite large. In fact, about 45% of the human genome is devoted to genes. This is a far cry from the small percentage (<2%) that consists only of coding regions.
  • Kat Arney says, "So, given that most of our genome isn’t actually genes, what does the rest of it do? Well, it’s complicated, and there’s still a lot we don’t know." My quibble here is subtle but I think it's important. I think we have a pretty good handle on the functional parts of our genome and I don't expect any surprises. We know that about 10% of our genome is conserved and we can account for most of that functional DNA. The rest is not a mystery. We know that most of it consists of various flotsam and jetsam related to transposons and things like pseudogenes and dead viruses. This is junk DNA by any definition and we should stop pretending that it's a big mystery. When we say that 90% of our genome is junk that's not a reflection of ignorance; it's an evidence-based conclusion.
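The percentages in the three points above are easier to compare when they are put on a common scale. The short sketch below converts them to base pairs, assuming a haploid genome of roughly 3.1 billion base pairs (a round figure that is not given in the post) and treating "<2%" as an upper bound of 2%. The categories overlap, so they are not expected to sum to 100%.

```python
GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size

# Fractions quoted in the text above
fractions = {
    "coding regions (upper bound)": 0.02,
    "genes, including introns": 0.45,
    "conserved (functional)": 0.10,
}
# Junk is taken here as everything that is not conserved
fractions["junk"] = 1.0 - fractions["conserved (functional)"]

def to_megabases(fraction, genome_bp=GENOME_BP):
    """Convert a genome fraction to megabases."""
    return fraction * genome_bp / 1e6

for label, fraction in fractions.items():
    print(f"{label:30s} {fraction:5.0%} {to_megabases(fraction):8.0f} Mb")
```

The point of the comparison is that "genes" and "coding regions" differ by more than an order of magnitude, and that most of the sequence inside genes (introns) still falls on the junk side of the ledger.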

Hatje, K., Mühlhausen, S., Simm, D., and Kollmar, M. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays, 0(0), 1900066. [doi: 10.1002/bies.201900066]

Sunday, July 17, 2022

The Function Wars Part XIII: Ford Doolittle writes about transposons and levels of selection

It's theoretically possible that the presence of abundant transposon fragments in a genome could provide a clade with a selective advantage at the level of species sorting. Is this an important contribution to the junk DNA debate?

As I explained in Function Wars Part IX, we need to maintain a certain perspective in these debates over function. The big picture view is that 90% of the human genome qualifies as junk DNA by any reasonable criteria. There's lots of evidence to support that claim but in spite of the evidence it is not accepted by most scientists.

Most scientists think that junk DNA is almost an oxymoron since natural selection would have eliminated it by now. Many scientists think that most of our genome must be functional because it is transcribed and because it's full of transcription factor binding sites. My goal is to show that their lack of understanding of population genetics and basic biochemistry has led them astray. I am trying to correct misunderstandings and the false history of the field that have become prominent in the scientific literature.

For the most part, philosophers and their friends have a different goal. They are interested in epistemology and in defining exactly what you mean by 'function' and 'junk.' To some extent, this is nitpicking and it undermines my goal by lending support, however oblique, to opponents of junk DNA.1

As I've mentioned before, this is most obvious when it comes to the ENCODE publicity campaign of 2012 [see: Revising history and defending ENCODE]. The reason why the ENCODE researchers were wrong is that they didn't understand that many transcription factor binding sites are unimportant and they didn't understand that many transcripts could be accidental. These facts are explained in the best undergraduate textbooks and they were made clear to ENCODE researchers in 2007 when they published their preliminary results. They were wrong because they didn't understand basic biochemistry. [ENCODE 2007]

Some people are trying to excuse ENCODE on the grounds that they simply picked an inappropriate definition of function. In other words, ENCODE made an epistemology error not a stupid biochemistry mistake. Here's another example from a new paper by Ford Doolittle in Biology and Philosophy. He says,

However, almost all of these developments in evolutionary biology and philosophy passed molecular genetics and genomics by, so that publicizers of the ENCODE project’s results could claim in 2012 that 80.4% of the human genome is “functional” (Ecker et al 2012) without any well thought-out position on the meaning of ‘function’. The default assumption made by ENCODE investigators seemed to have been that detectable activities are almost always products of selection and that selection almost always serves the survival and reproductive interests of organisms. But what ENCODE interpreted as functionality was unclear—from a philosophical perspective. Charitably, ENCODE’s principle mistake could have been a too broad and level-ignorant reading of selected effect (SE) “function” (Garson 2021) rather than the conflation of SE and causal role (CR) definitions of “the F-word”, as it is often seen as being (Doolittle and Brunet 2017).

My position is that this is far too "charitable." ENCODE's mistake was not in using the wrong definition of function; their mistake was in assuming that all transcripts and all transcription factor binding sites were functional in any way. That was a stupid assumption and they should have known better. They should have learned from the criticism they got in 2007.

This is only a small part of Doolittle's paper but I wanted to get that off my chest before delving into the main points. I find it extremely annoying that so much ink and so many electrons are being wasted on the function wars when the really important issues are a lack of understanding of population genetics and basic biochemistry. I fear that the function wars are contributing to the existing confusion rather than clarifying it.

Doolittle, F. (2022) All about levels: transposable elements as selfish DNAs and drivers of evolution. Biology & Philosophy 37: article number 24 [doi: 10.1007/s10539-022-09852-3]

The origin and prevalence of transposable elements (TEs) may best be understood as resulting from “selfish” evolutionary processes at the within-genome level, with relevant populations being all members of the same TE family or all potentially mobile DNAs in a species. But the maintenance of families of TEs as evolutionary drivers, if taken as a consequence of selection, might be better understood as a consequence of selection at the level of species or higher, with the relevant populations being species or ecosystems varying in their possession of TEs. In 2015, Brunet and Doolittle (Genome Biol Evol 7: 2445–2457) made the case for legitimizing (though not proving) claims for an evolutionary role for TEs by recasting such claims as being about species selection. Here I further develop this “how possibly” argument. I note that with a forgivingly broad construal of evolution by natural selection (ENS) we might come to appreciate many aspects of Life on earth as its products, and TEs as—possibly—contributors to the success of Life by selection at several levels of a biological hierarchy. Thinking broadly makes this proposition a testable (albeit extraordinarily difficult-to-test) Darwinian one.

The essence of Ford's argument builds on the idea that active transposable elements (TEs) are examples of selfish DNA that propagate in the genome. This is selection at the level of DNA. Other elements of the genome, such as genes, regulatory sequences, and origins of replication, are examples of selection at the level of the organism and individuals within a population. Ford points out that some transposon-related sequences might be co-opted to form functional regions of the genome that are under purifying selection at the level of organisms and populations. He then goes on to argue that species with large amounts of transposon-related sequences in their genomes might have an evolutionary advantage because they have more raw material to work with in evolving new functions. If this is true, then this would be an example of species level selection.

These points are summarized near the end of his paper.

Thus TE families, originating and establishing themselves abundantly within a species through selection at their own level may wind up as a few relics retained by purifying selection at the level of organisms. Moreover, if this contribution to the formation of useful relics facilitated the diversification of species or the persistence of clades, then we might also say that these TE families were once “drivers” of evolution at these higher levels, and that their possession was once an adaptation at each such higher level.

There are lots of details that we could get into later but I want to deal with the main speculation; namely, that species with lots of TE fragments in their genome might have an adaptive advantage over species that don't.

This is a challenging topic because lots of people have expressed their opinions on many of the topics that Ford covers in his article. None of their opinions are identical and many of them are based on different assumptions about things like evolvability, teleology, the significance of the problem, how to define species sorting, and whether hierarchy theory is important. Many of those people are very smart (as is Ford Doolittle) and it hurts my brain trying to figure out who is correct. I'll try and explain some of the issues and the controversies.

A solution in search of a problem?

What's the reason for speculating that abundant bits of junk DNA might be selected because they will benefit the species at some time in the next ten million years or so? Is there a problem that this speculation explains?

The standard practice in science is to suggest hypotheses that account for an unexplained observation; for example, the idea of abundant junk DNA explained the C-value Paradox and the mutation load problem. Models are supposed to have explanatory power—they are supposed to explain something that we don't understand.

Ford thinks there is a reason for retaining junk DNA. He writes,

Eukaryotes are but one of the many clades emerging from the prokaryotic divergence. Although such beliefs may be impossible to support empirically it is widely held that that was a special and evolutionarily important event....

Assuming this to be true (but see Booth and Doolittle 2015) we might ask if there are reasons for this differential evolutionary success, and are these reasons clade-level properties that have been selected for at this high level? Is one of them the possession of large and variable families of TEs?

You'll have to read his entire paper to see his full explanation but this is the important part. Ford thinks that the diversity and success of eukaryotes requires an explanation because it can't be accounted for by standard evolutionary theory. I don't see the problem so I don't see the need for an explanation.

Of course there doesn't have to be a scientific problem that needs solving. This could just be a theoretical argument showing that excess DNA could lead to species level selection. That puts it more in the realm of philosophy and Ford does make the point in his paper that one of his goals is simply to defend multilevel selection theory (MLST) as a distinct possibility. The main proponents of this idea (Hierarchy Theory) are Niles Eldredge and Stephen Jay Gould and the theory is thoroughly covered in Gould's book The Structure of Evolutionary Theory. I was surprised to discover that this book isn't mentioned in the Doolittle paper.

I don't have a problem with Hierarchy Theory (or Multilevel Selection Theory, or group selection) as a theoretical possibility. The important question, as far as I'm concerned, is whether there's any evidence to support species selection. As Ford notes, "such beliefs may be impossible to support empirically" and that may be true; however, there's a danger in promoting ideas that have no empirical support because that opens a huge can of worms that less rigorous scientists are eager to exploit.

With respect to the role of transposon-related sequences, the important question, in my opinion, is: Would life look substantially less diverse or less complex if no transposon-related sequences had ever been exapted to form elements that are now under purifying selection? I suspect that the answer is no—life would be different but no less diverse or complex.

Species selection vs species sorting

Speculations about species-level evolution are usually discussed in the context of group selection and species selection or, more broadly, as the levels-of-selection debate. Those are the terms Doolittle uses and he is very much interested in explaining junk DNA as contributing to adaptation at the species level.

But if the insertion of [transcription factor binding sites] TFBSs helps species to innovate and thus diversify (speciate and/or forestall extinction) and is a consequence of TFBS-bearing TE carriage, then such carriage might be cast as an adaptation at the level of species and maintained at that level too, by the differential extinction of TE-deficient species (Linquist et al 2020; Brunet et al 2021).

I think it's unfortunate that we don't use the term 'species sorting' instead of 'species selection' because as soon as you restrict your discussion to selection, you are falling into the adaptationist trap. Elisabeth Vrba, backed by Niles Eldredge, preferred 'species sorting' partly in order to avoid this trap.

I am convinced, on the basis of Vrba's analysis, that we naturalists have been saying 'species selection' when we really should have been calling the phenomenon 'species sorting.' Species sorting is extremely common, and underlies a great deal of evolutionary patterns, as I shall make clear in this narrative. On the other hand, true species selection, in its properly more restricted sense, I now believe to be relatively rare. (Niles Eldredge, in Reinventing Darwin (1995) p. 137)

As I understand it, the difference between 'species sorting' and 'species selection' is that the former term does not commit you to an adaptationist explanation.2 Take the Galapagos finches as an example. There has been fairly rapid radiation of these species from a small initial population that reached the islands. This radiation was not due to any intrinsic property of the finch genome that made finches more successful at speciation; it was just a lucky accident. Similarly, the fact that there are many marsupial species in Australia is probably not because the marsupial genome is better suited to evolution; it's probably just a founder effect at the species level.

Gould still prefers 'species selection' but he recognizes the problem. He points out that whenever you view species as evolving entities within a larger 'population' of other species, you must consider species drift as a distinct possibility. And this means that you can get evolution via a species-level founder effect that has nothing to do with adaptation.

Low population (number of species in a clade) provides the enabling criterion for important drift ... at the species level. The analogue of genetic drift—which I shall call 'species drift' must act both frequently and powerfully in macroevolution. Most clades do not contain large numbers of species. Therefore, trends may often originate for effectively random reasons. (Stephen J. Gould, in The Structure of Evolutionary Theory (2001) p. 736)

Let's speculate how this might relate to the current debate. It's possible that the apparent diversity and complexity of large multicellular eukaryotes is mostly due to the fact that they have small populations and long generation times. This means that there were plenty of opportunities for small isolated populations to evolve distinctive features. Thus, we have, for example, more than 1000 different species of bats because of species drift (not species selection). What this means is that the evolution of new species is due to the same reason (small populations) as the evolution of junk DNA. One phenomenon (junk DNA) didn't cause the other (speciation); instead, both phenomena have the same cause.

Michael Lynch has written about this several times, but the important, and mind-hurting, paper is Lynch (2007) where he says,

Under this view, the reductions in Ng that likely accompanied both the origin of eukaryotes and the emergence of the animal and land-plant lineages may have played pivotal roles in the origin of modular gene architectures on which further developmental complexity was built.

Lynch's point is that we should not rule out nonadaptive processes (species drift) in the evolution of complexity, modularity, and evolvability.

If we used species sorting instead of species selection, it would encourage a more pluralistic perspective and a wider variety of speculations. I don't mean to imply that this issue is ignored by Ford Doolittle, only that it doesn't get the attention it deserves.

Evolvability and teleology

Ford is invoking evolvability as the solution to the evolved complexity and diversity of multicellular eukaryotes. This is not a new idea: it is promoted by James Shapiro, by Mark Kirschner and John Gerhart, and by Günter Wagner, among others. (None of them are referenced in the Doolittle paper.)

The idea here is that clades with lots of TEs should be more successful than those with less junk DNA. It would be nice to have some data that address this question. For example, is the success of the bat clade due to more transposons than other mammals? Probably not, since bats have smaller genomes than other mammals. What about birds? There are lots of bird species but birds seem to have smaller genomes than some of their reptilian ancestors.

There are dozens of Drosophila species and they all have smaller genome sizes than many other flies. In this case, it looks like the small genome had an advantage in evolvability but that's not the prediction.

The concept of evolvability is so attractive that even a staunch gene-centric adaptationist like Richard Dawkins is willing to consider it (Dawkins, 1988). Gould devotes many pages (of course) to the subject in his big Structure book. Both Dawkins and Gould recognize that they are possibly running afoul of teleology in the sense of arguing that species have foresight. Here's how Dawkins puts it ...

It is all too easy for this kind of argument to be used loosely and unrespectably. Sydney Brenner justly ridiculed the idea of foresight in evolution, specifically the notion that a molecule, useless to a lineage of organisms in its own geological era, might nevertheless be retained in the gene pool because of its possible usefulness in some future era: "It might come in handy in the Cretaceous!" I hope I shall not be taken as saying anything like that. We certainly should have no truck with suggestions that individual animals might forego their selfish advantage because of possible long-term benefits to their species. Evolution has no foresight. But with hindsight, those evolutionary changes in embryology that look as though they were planned with foresight are the ones that dominate successful forms of life.

I interpret this to mean that we should not be fooled by hindsight into looking for causes when what we are seeing is historical contingency. If you have not already read Wonderful Life by Stephen Jay Gould then I highly recommend that you get a copy and read it now in order to understand the role of contingency in the evolution of animals. You should also brush up on the more recent contributions to the tape-of-life debate in order to put this discussion about evolvability into the proper context [Replaying life's tape].

Ford also recognizes the teleological problem and even quotes Sydney Brenner! Here's how Ford explains the relationship between transposon-related sequences and species selection.

As I argue here, organisms took on the burden of TEs not because TE accumulation, TE activity or TE diversity are selected-for traits within any species, serving some current or future need, but because lower-level (intragenomic) selection creates and proliferates TEs as selfish elements. But also, and just possibly, species in which this has happened speciate more often or last longer and (even more speculatively still) ecosystems including such species are better at surviving through time, and especially through the periodic mass extinctions to which this planet has been subjected (Brunet and Doolittle 2015). ‘More speculatively still’ because the adaptations at higher levels invoked are almost impossible to prove empirically. So what I present are again only ‘how possibly’, not ‘how actually’ arguments (Resnick 1991).

This is diving deeply into the domain of abstract thought that's not well-connected to scientific facts. As I mentioned above, I tend to look on these speculations as solutions looking for a problem. I would like to see more evidence that the properties of genomes endow certain species with more power to diversify than species with different genomic properties. Nevertheless, the idea of evolvability is not going away so let's see if Ford's view is reasonable.

As usual, Stephen Jay Gould has thought about this deeply and come up with some useful ideas. His argument is complicated but I'll try and explain it in simple terms. I'm relying mostly on the section called "Resolving the paradox of Evolvability and Defining the Exaptive Pool" in The Structure of Evolutionary Theory pages 1270-1295.

Gould argues that in Hierarchy Theory, the properties at each level of evolution must be restricted to that level. Thus, you can't have evolution at the level of DNA impinging on evolution at the level of the organism. For example, you can't have selection between transposons within a genome affecting evolution at the level of organisms and populations. Similarly, selection at the level of organisms can't directly affect species sorting.

What this means in terms of genomes full of transposon-related sequences is the following. Evolution at the level of species involves sorting (or selection) between different species or clades. Each of these species has different properties that may or may not make it more prone to speciation but those properties are equivalent to mutations, or variation, at the level of organisms. Some species may have lots of transposon sequences in their genomes and some may have fewer, and this difference arises just by chance, as do mutations. There is no foresight in generating mutations and there is no foresight in having different sized genomes.

During species sorting, the differences may confer some selective advantage so species with, say, more junk DNA are more likely to speciate but the differences arose by chance in the same sense that mutations arise by chance (i.e. with no foresight). For example, in Lenski's long-term evolution experiment, certain neutral mutations became fixed by chance so that new mutations arising in this background became adaptive [Contingency, selection, and the long-term evolution experiment]. Scientists and philosophers aren't concerned about whether those neutral mutations might have arisen specifically in order to potentiate future evolution.

Similarly, it is inappropriate to say that transposons, or pervasive transcription, or splicing errors, arose BECAUSE they encouraged evolution at the species level. Instead, as Dawkins said, those features just look with hindsight as though they were planned. They are fortuitous accidents of evolution.

Gould also makes the point, again, that we could just as easily be looking at species drift as species selection and we have to be careful not to resort to adaptive just-so stories in the absence of evidence for selection.
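To make the drift point concrete, here is a toy sketch. It is not taken from Gould, Lenski, or Doolittle. It assumes a simple Wright-Fisher resampling model, and the function name and parameters are my own, chosen only for illustration. A variant with no effect on fitness starts in a few "individuals" (organisms in a population or, by analogy, species in a clade) and is resampled each generation until it is lost or fixed.

```python
import random


def neutral_fixation_fraction(pop_size, start_copies, trials, seed=0):
    """Fraction of trials in which a neutral variant drifts to fixation.

    Hypothetical illustration: a plain Wright-Fisher model with no selection.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        copies = start_copies
        # Keep resampling until the variant is either lost (0) or fixed (pop_size).
        while 0 < copies < pop_size:
            freq = copies / pop_size
            # Each slot in the next generation inherits the variant with
            # probability equal to its current frequency; fitness plays no role.
            copies = sum(rng.random() < freq for _ in range(pop_size))
        if copies == pop_size:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    # A variant present in 5 of 20 individuals, with no selective advantage.
    print(neutral_fixation_fraction(pop_size=20, start_copies=5, trials=2000))
```

Under neutrality the expected fixation probability equals the starting frequency, so the printed fraction should hover near 0.25. Some variants win without any selection at all, which is why a difference in outcome is not, by itself, evidence for species selection.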

Here's how Gould describes his view of evolvability using the term "spandrel" to describe potentiating accidents.

Thus, Darwinians have always argued that mutational raw material must be generated by a process other than organismal selection, and must be "random" (in the crucial sense of undirected towards adaptive states) with respect to realized pathways of evolutionary change. Traits that confer evolvability upon species-individuals, but arise by selection upon organisms, provide a precise analog at the species level to the classical role of mutation at the organismal level. Because these traits of species evolvability arise by a different process (organismal selection), unrelated to the selective needs of species, they may emerge at the species level as "random" raw material, potentially utilizable as traits for species selection.

The phenotypic effects of mutation are, in exactly the same manner, spandrels at the organismal level—that is, nonadaptive and automatic manifestations at a higher level of different kinds of causes acting directly at a lower level. The exaptation of a small and beneficial subset of these spandrels virtually defines the process of natural selection. Why else do we so commonly refer to the theory of natural selection as an interplay of "chance" (for the spandrels of raw material in mutational variation) and "necessity" (for the locally predictable directions of selection towards adaptation). Similarly, species selection operates by exapting emergent spandrels from causal processes acting upon organisms.

This is a difficult concept to grasp so I urge interested readers to study the relevant chapter in Gould's book. The essence of his argument is that species sorting can only be understood at the level of species as individuals and the properties of species as the random variation upon which species sorting operates.

Michael Lynch is also skeptical about evolvability but for slightly different reasons (Lynch, 2007). Lynch is characteristically blunt about how he views anyone who disagrees with him. (I have been on the losing side of one of those disagreements and I still have the scars to prove it.)

Four of the major buzzwords in biology today are complexity, modularity, evolvability, and robustness, and it is often claimed that ill-defined mechanisms not previously appreciated by evolutionary biologists must be invoked to explain the existence of emergent properties that putatively enhance the long-term success of extant taxa. This stance is not very different from the intelligent-design philosophy of invoking unknown mechanisms to explain biodiversity.

This is harsh and somewhat unfair since nobody would accuse Ford Doolittle of ulterior motives. Lynch's point is that evolvability must be subjected to the same rigorous standards that he applies to population genetics. He questions the idea that "the ability to evolve itself is actively promoted by directional selection" and raises four objections.

  1. Evolvability doesn't meet the stringent conditions that a good hypothesis demands.
  2. It's not clear that the ability to evolve is necessarily advantageous.
  3. There's no evidence that differences between species are anything other than normal variation.
  4. "... comparative genomics provides no support for the idea that genome architectural changes have been promoted in multicellular lineages so as to enhance their ability to evolve."

Why transposon-related sequences?

One of the problems that occurred to me was why there was so much emphasis on transposon sequences. Don't the same arguments apply to pseudogenes, random duplications, and, especially, genome doublings? They do, but the paper appears to be part of a series that arose out of a 2018 meeting on Evolutionary Roles of Transposable Elements: The Science and Philosophy organized by Stefan Linquist and Ford Doolittle. That's why there's a focus on transposons. I assume that Ford could make the same case for other properties of large genomes such as pervasive transcription, spurious transcription factor binding sites, and splicing errors even if they had nothing to do with transposons.

Is this an attempt to justify junk?

I argue that genomes are sloppy and junk DNA accumulates just because it can. There's no ulterior motive in having a large genome full of junk and it's far more likely to be slightly deleterious than neutral. I believe that all the evidence points in that direction.

This is not a popular view. Most scientists want to believe that all of that excess DNA is there for a reason. If it doesn't have a direct functional role then, at the very least, it's preserved in the present because it allows for future evolution. The arguments promoted by Ford Doolittle in this article, and by others in related articles, tend to support those faulty views about the importance of junk DNA even though that wasn't the intent. Doolittle's case is much more sophisticated than the naive views of junk DNA opponents but, nevertheless, you can be sure that this paper will be referenced frequently by those opponents.

Normal evolution is hard enough but multilevel selection is even harder, especially for molecular biologists who would never think of reading The Structure of Evolutionary Theory, or any other book on evolution. That's why we have to be really careful to distinguish between effects that are adaptations for species sorting and effects that are fortuitous and irrelevant for higher level sorting.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. The same issues about function come up in the debate over alternative splicing [Alternative splicing and evolution].

2. See Vrba and Gould (1986) for a detailed discussion of species sorting and species selection and how they pertain to the hierarchical perspective.

Dawkins, R. (1988) The Evolution of Evolvability. Artificial Life, The proceedings of an Interdisciplinary Workshop on The Synthesis and Simulation of Living Systems held September 1987 in Los Alamos, New Mexico. C. G. Langton, Addison-Wesley Publishing Company: 201-220.

Lynch, M. (2007) The frailty of adaptive hypotheses for the origins of organismal complexity. Proceedings of the National Academy of Sciences 104:8597-8604. [doi: 10.1073/pnas.0702207104]

Vrba, E.S. and Gould, S.J. (1986) The hierarchical expansion of sorting and selection: sorting and selection cannot be equated. Paleobiology 12:217-228. [doi: 10.1017/S0094837300013671]

Friday, July 15, 2022

Alternative splicing and evolution

The important issue is whether alternative splicing is ubiquitous or rare. What are the evolutionary implications?

I believe that almost all of the splice variants that are routinely detected in eukaryotic cells are the product of splicing errors. (I've summarized the data on splicing errors in the Wikipedia article on Intron.) Database annotators have rejected several hundred thousand of these variants so that the typical human gene now lists only a handful of possible splice variants and very few of these have been experimentally confirmed as genuine examples of alternative splicing.

There are excellent examples of biologically relevant alternative splicing but they are confined to a small number of genes (<5%) and in almost all cases there are only a small number of alternatives (usually two) [Alternative splicing: function vs noise].

Saturday, July 09, 2022

Do we need a new theory of evolution?

The classic Modern Synthesis is effectively dead. It was replaced by a more modern version that includes Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. Proponents of the "Extended Evolutionary Synthesis" don't have anything significant to add to our current understanding of evolutionary theory.

The latest kerfuffle in evolution is over a recent article published in The Guardian by Stephen Buranyi, Do we need a new theory of evolution? The subtitle of the article summarizes the issue ...

A new wave of scientists argues that mainstream evolutionary theory needs an urgent overhaul. Their opponents have dismissed them as misguided careerists – and the conflict may determine the future of biology.

I think Stephen Buranyi did a pretty good job of covering the controversy as long as you ignore the first four paragraphs of his article. He talked to all the right people1 and he got to the gist of the fundamental problem; namely, the over-emphasis on natural selection as the only significant player in evolution. There's no question that this is a serious problem. Here's a quotation from his article.

Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2021) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

Tuesday, June 28, 2022

The Function Wars Part XI: Stefan Linquist responds to my critique

Stefan Linquist is a philosopher at the University of Guelph (Guelph, Ontario, Canada). He recently published a paper on function that I discussed [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect]. This is his response.


Hi Larry,

First, thank you for giving my paper a careful read. The intended audience is the community of biologically-minded philosophers who seem largely convinced that:

1) Genes are so passé. More specifically, when it comes to explaining phenotypic development and evolution, such non-genic factors as noncoding RNA, maternally inherited methylation patterns, repetitive elements, etc. are equally if not more significant than genes. It is a short step to the view that most of these elements are somehow functional for the organism. Stated pejoratively, thinkers like John Mattick and Evelyn Fox-Keller have had a significant intellectual founder-effect on my discipline. My paper attempts to push back against this trend.

2) Molecular biology can and should ignore evolution. The idea here is that when it comes to the search for molecular mechanisms, it doesn’t matter if genomes are the product of multi-level evolution or if they had been created by God. When you work on mechanisms, you do experiments, and evolutionary considerations are irrelevant to how those experiments are conducted and interpreted. Or, so the thinking goes.

Many of your blog posts present counter arguments to these ideas with a level of understanding and precision that exceeds my efforts in this paper. Nonetheless, I want to take issue with your one suggestion (if I understand correctly) that biochemists tend to operate with a sophisticated understanding of the genome. My position is that biochemistry might be necessary, but is not sufficient for an informed view of genomics. Without Darwinian reasoning, biochemistry leads down unnecessary blind alleys.

Obvious to whom?

Let me be upfront that I am something of an academic bumpkin in comparison to fancy city-folk like you, or Palazzo, or Graur, or Doolittle, or Haig. My knowledge of molecular genetics is largely self taught. This is partly why I am perplexed by statements like the following. In a special collection of Chromosome Research entitled, “Transposable elements and the multidimensional genome” (2018), P.A. Larson (the collection editor) opens with this doozy:

“There is no such thing as “junk DNA.” Indeed, a suite of discoveries made over the past few decades have put to rest this misnomer and have identified many important roles that so-called junk DNA provides to both genome structure and function…”

Is it me? Or is it him? It’s him, right? My point is simply that it can’t be obvious to everyone within the molecular biological community that not every binding site or repetitive element is somehow functional for the organism. This is to say nothing of the hype surrounding lncRNA.

My argument in the paper is that the missing piece of information is an understanding of where the majority of eukaryotic DNA comes from: a byproduct of coevolutionary interactions between parasitic TEs and the cell. Indeed, I provide evidence in another publication, Transposon dynamics and the epigenetic switch hypothesis, that over the past two decades or so, within the fields of molecular biology and biomedicine, interest in TEs has steadily declined. This trend is surprising given that over the same period we have come to learn just how prevalent TEs are in most genomes. I think that I can show, in another forthcoming paper, that this trend toward ignoring TE coevolutionary dynamics is associated with the increased biomedicalization of molecular biology as a discipline (more on that another time, perhaps). Whether this decline of interest in TEs is responsible for the tendency to interpret junk DNA as somehow functional is a further question.

Another factor that I find perplexing is the trickle of molecular biology majors who attend my philosophy of biology undergraduate seminar. I'm not surprised that they show up at all, rather I'm surprised about their conviction that any biochemically active region of the genome simply must be functional for the organism. "Functional until proven otherwise" seems to be the mantra that one must memorize in order to pass the med-school admissions exam. When I suggest to them that Darwinian reasoning leads to an alternative hypothesis about most of the DNA in eukaryotic genomes, they balk. Some just leave my class: “What does he know, just a philosopher.” Such is the life of an academic bumpkin from the intellectual sticks.

This is all to say that, yes, you are correct that my paper presents no new biological data. In a sense, it is old news. But it is news that many people –even some academic city slickers-- seem not to have absorbed.

I like to think of philosophy journals as a clearing house for discussions that are extremely important, but would be unlikely to elbow their way into the pages of most scientific journals. Aside of helpful blogs such as yours, where else are we to debate the theoretical framing and interpretation of junk DNA?

What’s with the philosophical obsession over functions?

It’s true that my article focuses on this longstanding debate over CR vs SE functions. I can imagine that from the perspective of a molecular biologist (with such a rich ontology to draw from, and so many fine grained distinctions at your disposal) this binary must appear ham-fisted.

Let me say two things. First, I repeat that my main audience is the community of biological philosophers. In this context, these basic categories of function and the debates that surround them provide a lingua franca. To have this discussion without connecting it to function concepts would seem odd. Second, I think that you and I would both be happy if the word “function” in genetics were restricted to what I elsewhere call maintenance functions. That is, to elements that have been maintained by purifying selection. However, many of my colleagues are so convinced of point 2 (above) that this proposal is essentially a non-starter. That is, they maintain that since molecular biology doesn't investigate causal role functions (a big assumption, but let it go for now), then this discipline can ignore Darwinian reasoning. My argument is that this inference is too quick. A problem with CR functions is their permissiveness: any old strand of DNA can have some CR function or other. What we need is some way to sort the functional wheat from the junky chaff. To do that, thinking about selective history is your best bet. In effect, you can deny entrance to Darwin at the front door if you want, but eventually you’ll have to let him in through the back.

A final note on the term “selective history.” You suggested that I should have instead used “evolution” in order to discourage a Panglossian view of the genome. The issue I see with your suggestion is that “evolution” is too vague –it really just means change over time. My contention is that one needs to do more than just consider historical (e.g. phylogenetic) details in order to take a biologically informed view of the genome. In addition, one needs to think about how the cell coevolves with parasitic TEs. Maybe “coevolutionary dynamics” would have been a better choice.

Finally, a plug. The paper you read is part of a special collection in Biology and Philosophy that I co-edited with Ford Doolittle entitled, “Function, junk and transposable elements: contested issues in the science of genomics.” As I write, I see that three papers have so far appeared and the other two (including mine and one coauthored by Alex Palazzo) should see the light of day soon:

Function, junk and transposable elements: contested issues in the science of genomics

Hopefully some of these will provide additional fodder.
Cheers,
Stefan

Thursday, June 23, 2022

The Function Wars Part X: "Spam DNA"?

The authors of a recent paper think we need a new term "spam DNA" to describe some features of the human genome.

Fagundes, N.J., Bisso-Machado, R., Figueiredo, P.I., Varal, M. and Zani, A.L. (2022) What We Talk About When We Talk About “Junk DNA”. Genome Biology and Evolution 14:evac055. [doi: 10.1093/gbe/evac055]

“Junk DNA” is a popular yet controversial concept that states that organisms carry in their genomes DNA that has no positive impact on their fitness. Nonetheless, biochemical functions have been identified for an increasing fraction of DNA elements traditionally seen as “Junk DNA”. These findings have been interpreted as fundamentally undermining the “Junk DNA” concept. Here, we reinforce previous arguments that this interpretation relies on an inadequate concept of biological function that does not consider the selected effect of a given genomic structure, which is central to the “Junk DNA” concept. Next, we suggest that another (though ignored) confounding factor is that the discussion about biological functions includes two different dimensions: a horizontal, ecological dimension that reflects how a given genomic element affects fitness in a specific time, and a vertical, temporal dimension that reflects how a given genomic element persisted along time. We suggest that “Junk DNA” should be used exclusively relative to the horizontal dimension, while for the vertical dimension, we propose a new term, “Spam DNA”, that reflects the fact that a given genomic element may persist in the genome even if not selected for on their origin. Importantly, these concepts are complementary. An element can be both “Spam DNA” and “Junk DNA”, and “Spam DNA” can also be recruited to perform evolved biological functions, as illustrated in processes of exaptation or constructive neutral evolution.

The authors are scientists at the Federal University of Rio Grande do Sul in Brazil. They are concerned about the origins of junk DNA and whether true selected effect functions (strong selected effect = SSE) conflict with the definition of junk DNA. Here's how they put it,

Paradoxically as it may seem, under the SSE definition, elements that contribute positively to fitness and are maintained by purifying selection would still count as “junk” only because they did not originate as an adaptation.

This is essentially correct according to how many philosophers define selected effect functions but that issue was resolved by focusing on purifying selection as the important criterion and ignoring the history of the trait (= maintenance function, MF). There is only a 'paradox' if you stick to the philosophy definition of function (i.e. SSE) and even then, the paradox only exists if the SSE definition is the only way to identify junk DNA. [see: The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect] The authors recognize this since they include a good discussion of this other definition (MF) and its advantages. Nevertheless, they propose a new term called "spam DNA" to help clarify the problem.

"Spam DNA" represents every genomic element which has not been selected for during its origin in the genome, even if it currently participates in relevant biological functions.

All of the DNA in the light blue box is spam DNA. Note that it includes DNA that is currently functional as long as it originated from junk DNA as they define it. Also, some junk DNA isn't spam DNA as long as it arose from the inactivation of DNA that used to have a function. Thus, pseudogenes aren't spam and neither are bits and pieces of transposons.

This isn't helpful. The current debate is about how much of our genome is junk so who cares about the history of individual sequences? A significant amount of what we currently define as junk DNA may have come from once-active transposons but we may never be able to trace the history of each piece of junk DNA. Does it fall into the first category in the figure (functional to junk) or is it spam DNA? Is this really important? No, it is not.

Wednesday, June 22, 2022

The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect

How much of the human genome is functional? This is a problem that will be solved by biochemists, not epistemologists.

What is junk DNA? What is functional DNA? Defining your terms is a key part of any scientific controversy because you can't have a debate if you can't agree on what you are debating. We've been debating the prevalence of junk DNA for more than 50 years and much of that debate has been (deliberately?) muddled by one side or the other in order to score points. For example, how many times have you heard the ridiculous claim that all noncoding DNA was supposed to be junk DNA? And how many times have you heard that all transcripts must have a function merely because they exist?

Tuesday, June 14, 2022

Distrust simplicity (and turn off your irony meters)

I just stumbled upon an opinion piece published in EMBO Reports on May 22, 2022. The author is Frank Gannon who is identified as the former Director of the QIMR Berghofer Medical Research Institute in Brisbane, Australia and the title of the article is "Seek simplicity and distrust it."

I'm about to quote some excerpts from the article but before doing so I need to warn you to turn off your irony meters—even if you have the latest version with the most recent software updates.

Gannon's main point is that scientists should seek simple explanations but they must be willing to abandon them when better data comes along. He gives us some examples.

However, it seems that there is a collective amnesia among scientists such that we forget to distrust the simplicity that we pursue on our path to insight. The central dogma of molecular biology—that information flows unidirectionally from DNA to RNA to protein—was overturned, at least in part, with the discovery that this linear cascade could be reversed by reverse transcription.

Really? The Central Dogma of Molecular Biology was overturned, "at least in part," by reverse transcriptase? (It wasn't.) If you are going to write about a topic like this then you'd better make sure you know what you're talking about.

The great quote from Jacques Monod “What is true for E. coli is true for the elephant”, held valid only until the discovery of introns in eukaryotes. As I was close to the earliest data that pointed to the existence of split genes, I am well aware of the incredulity of biologists when they realised that genetic material did not have the same simple design irrespective of the organism.

Monod's statement was never supposed to be taken as literally as that.1 He was referring to the unity of biochemistry (Friedman, 2004). This is clear from what he says in Chance and Necessity, "Today we know that from the bacterium to man the chemical machinery is essentially the same, in both its structure and functioning." He meant that all species have DNA, RNA, and protein and that these molecules carry out the same roles in humans as they do in bacteria. The essence of this simple observation is as true today as it was 50 years ago.

The death of “Junk DNA”—a term, coined in 1972 by Susumu Ohno for the non-coding parts of the genome—has been more gradual. The perception that exons are the only useful part of the genome has been proven wrong with the discoveries of noncoding RNA, the controlling roles of intra-genomic areas, the essential interactions between distant genomic regions and peptides encoded by short open frame regions.

Did you turn off your irony meter? Don't say I didn't warn you. Jacques Monod (and Susumu Ohno) would be surprised to learn that in 1972 they knew nothing about noncoding genes and regulatory sequences.

More seriously, how did we ever get to the stage where a prominent scientist who frequently publishes opinion pieces in EMBO Reports could be so ignorant of the junk DNA controversy after all that's been written about it in the past ten years?



1. Besides, introns exist in bacteria.

Friedman, H.C. (2004) From Butyribacterium to E. coli: An Essay on Unity in Biochemistry. Perspectives in Biology and Medicine 47:47-66. [doi: 10.1353/pbm.2004.0007]

Monday, June 13, 2022

Manolis Kellis dismisses junk DNA

Manolis Kellis is a professor of computer science at the Massachusetts Institute of Technology (MIT). Sandwalk readers will remember him as one of the ENCODE leaders who participated in the massive publicity campaign of 2012 where they attempted to prove that most of the human genome is functional, not junk. He is the lead author of the semi-retraction that was published eighteen months later. [What did the ENCODE Consortium say in 2012 and 2014?]

Kellis was interviewed in April 2022 and it's interesting to hear his current views on junk DNA especially since MIT has just been rated the top university in the world for the 11th straight year. [QS ranks MIT the world’s No. 1 university for 2022-23].

His response to a question about junk DNA begins at 58 minutes. Kellis makes three points.

  • He doesn't like the word "junk."
  • Lots of noncoding DNA has known functions such as noncoding genes and regulatory sequences.
  • Half of our genome consists of transposon sequences and their regulatory regions fueled the mammalian radiation following the asteroid impact so that modern mammalian genomes now contain a complex and sophisticated network of regulatory sequences.

As I suspected, Kellis still doesn't recognize any of the evidence for junk DNA that was briefly outlined in the Kellis et al. (2014) paper. I find it surprising that after a decade of being exposed to criticism of his stance on junk DNA he is still not capable of presenting a cogent argument against junk.


Kellis, M. et al. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) April 24, 2014 published online [doi: 10.1073/pnas.1318948111]

Monday, June 06, 2022

My father on D-day

Today is the 78th anniversary of D-Day—the day British, Canadian, and American troops landed on the beaches of Normandy in World War II.1

For us baby boomers it always meant a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. His missions were limited to quick strikes and reconnaissance during the first few days of the invasion because Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th) his squadron landed in Crepon, Normandy and things became very hectic from then on with several close support missions every day [see Hawker Hurricanes and Typhoons in World War II].


Monday, May 16, 2022

Wikipedia editors want to suppress an article on junk DNA

I've been trying to fix the Wikipedia article on Noncoding DNA but it's quite a challenge because the page is controlled by editors who are opposed to junk DNA and I am accused of starting an "edit war" that goes against the consensus. On a parallel track, I have proposed creating a separate Wikipedia article on junk DNA where we can present the evidence for and against junk. This is being discussed under the "Talk" thread on the "Non-coding DNA" article.

Here's an exchange between me [Genome42] and one of the editors who exerts control over the noncoding DNA page. It shows you what we are up against.

Let's get back to the main topic. Is there anyone here who objects to creating a separate page for junk DNA? If you object, please explain why because it seems to me that we really need such a page in order to explain to viewers what the main issues are in the controversy. We need some place to put all the evidence showing that 90% of the human genome is junk and to explain why many scientists reject this evidence. Genome42 (talk) 20:18, 15 May 2022 (UTC)

I looked at pubmed and searched for "junk dna" to see how prominent this topic even is. It seems the term is declining in usage in the scientific literature [7] (see the "results by year"). This is despite all of the abundant media coverage it still gets. I would say that if the usage in the scientific literature was rising then perhaps it would be good a good idea, but the reverse is happening. I see an increasing number of papers calling for abandoning the term altogether too. Just an FYI, one of the original reasons for the merge of the junk DNA to this article was that it was causing too much confusion and edit warring as a separate page. When merged you could have the general article on noncoding DNA without the fireworks and a section isolating the controversies coming from it rather than having 2 pages on the same topic with the Junk DNA article mixing controversy with general information on noncoding DNA. Ramos1990 (talk) 21:28, 15 May 2022 (UTC)

Are you serious? Do you really believe that the debate is over and junk DNA doesn't exist just because the opinions you prefer to read are against junk? You don't seem to be knowledgeable about this topic. I can help you get up to speed. Read these articles on my blog.

Also, you seem to be genuinely confused about the difference between junk DNA and noncoding DNA. Think of it this way. Genomes can be divided into centromeric DNA and non-centromeric DNA and the junk is located in the non-centromeric DNA. Does that mean we should have an article on non-centromeric DNA where we discuss junk? We can also split the genome between regulatory DNA and non-regulatory DNA but I don't see you calling for an article on non-regulatory DNA where we discuss junk DNA.

The only reason why you favor discussing junk DNA in an article on non-coding DNA is because you think that junk DNA was once defined as non-coding DNA and this article will prove that some non-coding DNA has a function - therefore it is not all junk. That's an extremely biased, and incorrect, view. No knowledgeable scientist ever defended the claim that all noncoding DNA was junk. Do you think we didn't know about noncoding genes, regulatory sequences, and origins of replication back in the 1960s?

Genomes can be separated into functional DNA and junk DNA and that's where the debate is. The non-coding DNA fraction is a heterogeneous mixture of functional elements and junk DNA and it's very confusing to mix them. An article on junk DNA will discuss all of the various functional regions of the genome and how common they are in the human genome. We will see that if you add them all up you only get to about 5% of the genome. The article will discuss the evidence for junk DNA and the arguments against claims for abundant function. None of that is appropriate in an article on non-coding DNA.

It's easy for me to see why there was "edit warring" over a junk DNA article. It's because many of the editors here are opposed to junk DNA so they try to suppress the legitimate scientific debate. You need to recognize that what you are doing here is expressing a very personal and biased opinion about the topic of junk DNA and you are using your position to start edit wars in order to censor any views in favor of junk DNA. Genome42 (talk) 14:49, 16 May 2022 (UTC)


Sunday, May 15, 2022

Describing non-coding DNA on the NIH (USA) National Human Genome Research Institute website

Here's a link to a short podcast on non-coding DNA narrated by Shurjo K. Sen, Program Director, Division of Genome Sciences. This is the complete text.

Non-coding DNA. So I could talk about this one forever because it actually happened to be the part of the genome that I did most of my PhD work in. And there used to be an older and derogatory term called junk DNA, which, thankfully, doesn't get used these days much longer. So really, the thing to keep in mind here that human genome is a vast, vast expanse of nucleotides, 3.3 billion almost. And only a very, very small fraction of that, about 2% actually codes for what we know to be proteins. And so the question is, what really happens with the rest? Is it just there doing nothing? Or does it have a function? And for many years, particularly in the earlier stages of genomics as a field, people were not really sure that the non-coding parts of the genome have a purpose for being there. And now, or I would say over the last decade or so maybe, we are only just starting to realize that there are an immense number of ways in which what we think of as non-coding actually might just have a more subtle way of passing its information along. So it may not code in the classical protein-coding sense. But there is a ton of information crucial in many, many ways that is hidden in this part of the genome.

I wish I could tell you that this is some kind of a spoof but it's not. It's an example of the poor state of science these days and of how much work we need to do to fix it. I would start by firing the Program Director of the Division of Genome Sciences.