The Microsoft browser (Edge) has a built-in function called Copilot. It's an AI assistant based on ChatGPT-4.
I decided to test it by asking "What is junk DNA?" and here's the answer it gave me.
I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.
This is the fifth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' The fourth post makes the case that differing views on junk DNA are mainly due to philosophical disagreements.
- Nils Walter disputes junk DNA: (1) The surprise
- Nils Walter disputes junk DNA: (2) The paradigm shaft
- Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function'
- Nils Walter disputes junk DNA: (4) Different views of non-functional transcripts
The most important issue, according to Nils Walter, is whether the human genome contains huge numbers of genes for lncRNAs and other types of regulatory RNAs. He doesn't give us any indication of how many of these potential genes he thinks exist or what percentage of the genome they cover. This is important since he's arguing against junk DNA but we don't know how much junk he's willing to accept.
There are several hundred thousand transcripts in the RNA databases. Most of them are identified as lncRNAs because they are bigger than 200 bp. Let's assume, for the sake of argument, that 200,000 of these transcripts have a biologically relevant function and therefore there are 200,000 non-coding genes. A typical size might be 1000 bp so these genes would take up about 6.5% of the genome. That's about 10 times the number of protein-coding genes and more than 6 times the amount of coding DNA.
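The back-of-the-envelope arithmetic above can be checked in a few lines of Python. This is only a sketch: the genome size of 3.1 Gb, the count of roughly 20,000 protein-coding genes, and the figure of roughly 1% coding DNA are assumptions that the paragraph doesn't state. They were chosen because they're consistent with its numbers.

```python
# Assumed values (not stated in the text above).
GENOME_BP = 3.1e9              # approximate size of the human genome in bp
PROTEIN_CODING_GENES = 20_000  # approximate number of protein-coding genes
CODING_FRACTION = 0.01         # approximate fraction of the genome that is coding DNA


def genome_fraction(n_genes, avg_gene_bp, genome_bp=GENOME_BP):
    """Fraction of the genome occupied by n_genes of a given average length."""
    return n_genes * avg_gene_bp / genome_bp


# Values taken from the text: 200,000 non-coding genes of about 1000 bp each.
noncoding_genes = 200_000
fraction = genome_fraction(noncoding_genes, 1_000)

print(f"Share of the genome: {fraction:.1%}")
print(f"Relative to protein-coding gene count: {noncoding_genes / PROTEIN_CODING_GENES:.0f}x")
print(f"Relative to coding DNA: {fraction / CODING_FRACTION:.1f}x")
```

Changing `GENOME_BP` or the average gene length shifts the result proportionally, so the conclusion isn't sensitive to the exact values assumed here.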
That's not going to make much of a difference in the junk DNA debate since proponents of junk DNA argue that 90% of the genome is junk and 10% is functional. All of those non-coding genes can be accommodated within the 10%.
The ENCODE researchers made a big deal out of pervasive transcription back in 2007 and again in 2012. We can quibble about the exact numbers but let's say that 80% of the human genome is transcribed. We know that protein-coding genes occupy at least 40% of the genome so much of this pervasive transcription is introns. If all of the presumptive regulatory genes are located in the remaining 40% (i.e. none in introns), and the average size is 1000 bp, then this could be about 1.24 million non-coding genes. Is this reasonable? Is this what Nils Walter is proposing?
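The upper bound in that paragraph is the inverse calculation: start from a share of the genome and ask how many genes of a given size would fit. Here's a minimal sketch, again assuming a genome size of 3.1 Gb (a figure that isn't given in the text but reproduces the 1.24 million estimate).

```python
GENOME_BP = 3.1e9  # assumed approximate size of the human genome in bp


def max_genes(genome_share, avg_gene_bp, genome_bp=GENOME_BP):
    """How many genes of avg_gene_bp fit into a given share of the genome."""
    return genome_share * genome_bp / avg_gene_bp


# Values taken from the text.
transcribed = 0.80           # share of the genome that is transcribed
protein_coding_span = 0.40   # share occupied by protein-coding genes, introns included
remaining = transcribed - protein_coding_span

upper_bound = max_genes(remaining, 1_000)
print(f"Upper bound: about {upper_bound / 1e6:.2f} million non-coding genes")
```

This is a ceiling, not an estimate: it assumes every transcribed base outside protein-coding genes belongs to a functional 1000 bp gene.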
I think there's some confusion about the difference between large numbers of functional transcripts and the bigger picture of how much total junk DNA there is in the human genome. I wish the opponents of junk DNA would commit to how much of the genome they think is functional and what evidence they have to support that position.
But they don't. So instead we're stuck with debates about how to decide whether some transcripts are functional or junk.
If most detectable transcripts are due to spurious transcription of junk DNA then you would expect these transcripts to be present at very low levels. This turns out to be true as Nils Walter admits. He notes that "fewer than 1000 lncRNAs are present at greater than one copy per cell."
This is a problem for those who advocate that many of these low abundance transcripts must be functional. We are familiar with several of the ad hoc hypotheses that have been advanced to get around this problem. John Mattick has been promoting them for years [John Mattick's new paradigm shaft].
Walter advances two of these excuses. First, he says that a critical RNA may be present at an average of one molecule per cell but it might be abundant in just one specialized cell in the tissue. Furthermore, their expression might be transient so they can only be detected at certain times during development and we might not have assayed cells at the right time. I assume he's advocating that there might be a short burst of a large number of these extremely specialized regulatory RNAs in these special cells.
As far as I know, there aren't many examples of such specialized gene expression. You would need at least 100,000 examples in order to make a viable case for function.
His second argument is that many regulatory RNAs are restricted to the nucleus where they only need to bind to one regulatory sequence to carry out their function. This ignores the mass action laws that govern such interactions. If you apply the same reasoning to proteins then you would only need one lac repressor protein to shut down the lac operon in E. coli but we've known for 50 years that this doesn't work in spite of the fact that the lac repressor association constant shows that it is one of the tightest binding proteins known [DNA Binding Proteins]. This is covered in my biochemistry textbook on pages 650-651.¹
If you apply the same reasoning to mammalian regulatory proteins then it turns out that you need 10,000 transcription factor molecules per nucleus in order to ensure that a few specific sites are occupied. That's not only because of the chemistry of binary interactions but also because the human genome is full of spurious sites that resemble the target regulatory sequence [The Specificity of DNA Binding Proteins]. I cover this in my book in Chapter 8: "Noncoding Genes and Junk RNA" in the section titled "On the important properties of DNA-binding proteins" (pp. 200-204). I use the estrogen receptor as an example based on calculations that were done in the mid-1970s. The same principles apply to regulatory RNAs.
This is a disagreement based entirely on biochemistry and molecular biology. There aren't enough examples (evidence) to make the first argument convincing and the second argument makes no sense in light of what we know about the interactions between molecules inside of the cell (or nucleus).
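The mass-action point can be illustrated with a toy equilibrium model in which a regulator has one specific target site and a large pool of weaker, spurious "decoy" sites competing for it. This is a rough sketch, not the calculation from the book or the textbook: the nuclear volume, both dissociation constants, and the number of decoy sites are placeholder values picked only to show the shape of the relationship, and the function names are hypothetical. The model also treats a single molecule as a continuous concentration, which is itself an approximation.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molar(n_molecules, volume_litres):
    """Convert a molecule count in a given volume to a molar concentration."""
    return n_molecules / (AVOGADRO * volume_litres)


def target_occupancy(n_regulators, volume_litres, kd_target, n_decoys, kd_decoy):
    """Equilibrium probability that one specific target site is occupied.

    Simplifying assumptions: decoy sites are far from saturated, and the
    single target site does not noticeably deplete the free regulator pool.
    """
    total = molar(n_regulators, volume_litres)
    decoys = molar(n_decoys, volume_litres)
    # Decoy sites soak up regulator, lowering the free concentration.
    free = total / (1.0 + decoys / kd_decoy)
    # Simple one-site binding isotherm for the specific target.
    return free / (free + kd_target)


# Placeholder parameters: illustrative only, NOT measured values.
NUCLEUS_L = 5e-13     # assumed nuclear volume in litres
KD_TARGET = 1e-9      # assumed dissociation constant for the specific site (M)
KD_DECOY = 1e-6       # assumed dissociation constant for spurious sites (M)
N_DECOYS = 1_000_000  # assumed number of spurious sites

for copies in (1, 100, 10_000):
    theta = target_occupancy(copies, NUCLEUS_L, KD_TARGET, N_DECOYS, KD_DECOY)
    print(f"{copies:>6} copies -> target occupied {theta:.1%} of the time")
```

Whatever placeholder values are used, occupancy rises with copy number and falls as decoy sites are added, which is the qualitative point: one molecule per nucleus leaves the target site empty most of the time.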
Note: I can almost excuse the fact that Nils Walter ignores my book on junk DNA, my biochemistry textbook, and my blog posts, but I can't excuse the fact that his main arguments have been challenged repeatedly in the scientific literature. A good scientist should go out of their way to seek out objections to their views and address them directly.
1. In addition to the thermodynamic (equilibrium) problem, there's a kinetic problem. DNA binding proteins can find their binding sites relatively quickly by one-dimensional diffusion, an option that's not readily available to regulatory RNAs [Slip Slidin' Along - How DNA Binding Proteins Find Their Target].
Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]
The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.
[Google Earth of Biomedical Research]
I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.
This is the fifth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements.
In order to have a productive discussion about junk DNA we needed to agree on how to define "function" and "junk." Disagreements over the definitions spawned the Function Wars that became intense over the past decade. That war is over and now it's time to move beyond nitpicking about terminology.
The idea that most of the human genome is composed of junk DNA arose gradually in the late 1960s and early 1970s. The concept was based on a lot of evidence dating back to the 1940s and it gained support with the discovery of massive amounts of repetitive DNA.
Various classes of functional DNA were known back then including: regulatory sequences, protein-coding genes, noncoding genes, centromeres, and origins of replication. Other categories have been added since then but the total amount of functional DNA was not thought to be more than 10% of the genome. This was confirmed with the publication of the human genome sequence.
From the very beginning, the distinction between functional DNA and junk DNA was based on evolutionary principles. Functional DNA was the product of natural selection and junk DNA was not constrained by selection. The genetic load argument was a key feature of Susumu Ohno's conclusion that 90% of our genome is junk (Ohno, 1972a; Ohno, 1972b).
Some people revise history by claiming that no mainstream biologists ever regarded non-protein-coding DNA as “junk.”
This claim is easily disproved: Francis Crick and Leslie Orgel published an article in Nature in 1980 (284: 604-607) arguing that such DNA “is little better than junk,” and “it would be folly in such cases to hunt obsessively” for functions in it. Since then, Brown University biologist Kenneth R. Miller, Oxford University biologist Richard Dawkins, University of Chicago biologist Jerry A. Coyne, and University of California–Irvine biologist John C. Avise have all argued that most of our DNA is junk, and that this provides evidence for Darwinian evolution and against intelligent design. National Institutes of Health director Francis Collins argued similarly in his widely read 2006 book The Language of God.
It is true that some biologists (such as Thomas Cavalier-Smith and Gabriel Dover) have long been skeptical of “junk DNA” claims, but probably a majority of biologists since 1980 have gone along with the myth. The revisionists are misinformed (or misinforming).
It's in the best interests of the IDiots to promote the idea that all "Darwinists" believed in the "myth" of junk DNA and that it wasn't until the predictions of the IDiots were confirmed (not) that the biologists changed their minds.
Nessa Carey has a virology PhD from the University of Edinburgh and is a former Senior Lecturer in Molecular Biology at Imperial College, London. She worked in the biotech and pharmaceutical industry for thirteen years and is now International Director for the UK's leading organisation for technology transfer professionals. She lives in Norfolk and is a Visiting Professor at Imperial College.
Pretty impressive.
We've known for 60 years that some non-coding DNA has a function but the latest generation of scientists thinks this was only discovered in their lifetime. Writer Kara Mason posts an article on the Department of Biomedical Informatics website at the University of Colorado.
Eukaryotic genomes mostly consist of DNA that is not translated into protein sequence. However, noncoding DNA (ncDNA) has been little studied relative to proteins. The lack of knowledge about its functional significance has led to hypotheses that much nongenic DNA is useless "junk" (Ohno, 1972) or that it exists only to replicate itself (Doolittle and Sapienza, 1980; Orgel and Crick, 1980).
Ludwig says that we now know some of the functions of non-coding DNA and one of them is regulation of gene expression.
These regulatory sequences are distributed among selfish transposons and middle or short repetitive DNAs. The genome is an extremely complex machine; functionally as well as structurally it is generally not possible to disentangle the regulatory function from the junk selfish activity. The idea of junk DNA needs to be revisited.
Of course we all know about regulatory sequences. We've known about this function of non-coding DNA for half a century. The question that interests us is not whether non-coding DNA has a function but whether a large proportion of noncoding DNA is junk.
Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics, 10(7), e1004525. [doi: 10.1371/journal.pgen.1004525]
First: State that most of our genome is junk.
Second: When more and more promoters, enhancers, repressors and other regulatory elements are discovered, claim that this of course was not included in the definition of “most of the genome”. The perfect excuse because it means you’ll never be wrong.
Last: Complain when the press does not understand that “most of our DNA” actually meant “much of our DNA , but with a lot of exceptions” and that science reporters don’t intuitively know which exceptions these are.
Post written using the zpen in dire agony over extremely poor science communication from the same persons who most eagerly criticize science communication from others.
[see the original article for links - LAM]
Oh dear. There's so much wrong with the logic of this posting that I hardly know where to begin.
Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance.
I agree with Ryan Gregory that this is extremely misleading. It implies that there are legitimate scientists who think that all non-coding DNA is junk. It would be far better to say something like this ...
Genes that encode proteins, and other genes, make up only a few percent of our genome. If you add in all of the other DNA sequences that are known to be essential you still can only account for no more than 5% of our genome. Most of the rest is thought to be junk DNA with no biological function. There are no respectable scientists who think that none of it will ever be shown to have a function but the general consensus among the defenders of junk DNA is that the vast majority of these DNA sequences, consisting mostly of defective transposons and pseudogenes, will turn out to have no function.
The authors of the paper go on to present evidence that about 5.4% of non-coding DNA has a function.
Continuing my survey of recent papers on junk DNA, I stumbled upon a review by Subash Lakhotia that has recently been accepted in The Proceedings of the Indian National Science Academy (Lakhotia, 2018). It illustrates the extent of the publicity campaign mounted by ENCODE and opponents of junk DNA. In the title of this post, I paraphrased a sentence from the abstract that summarizes the point of the paper; namely, that the 'recent' discovery of noncoding RNAs refutes the concept of junk DNA.
Lakhotia claims to have written a review of the history of junk DNA but, in fact, his review perpetuates a false history. He repeats a version of history made popular by John Mattick. It goes like this. Old-fashioned scientists were seduced by Crick's central dogma into thinking that the only important part of the genome was the part encoding proteins. They ignored genes for noncoding RNAs because they didn't fit into their 'dogma.' They assumed that most of the noncoding part of the genome was junk. However, recent new discoveries of huge numbers of noncoding RNAs reveal that those scientists were very stupid. We now know that the genome is chock full of noncoding RNA genes and the concept of junk DNA has been refuted.
Nils Walter attempts to present the case for a functional genome by reconciling opposing viewpoints. I address his criticisms of the junk DNA position and discuss his arguments in favor of large numbers of functional non-coding RNAs.
Nils Walter is Francis S. Collins Collegiate Professor of Chemistry, Biophysics, and Biological Chemistry at the University of Michigan in Ann Arbor (Michigan, USA). He works on human RNAs and claims that, "Over 75% of our genome encodes non-protein coding RNA molecules, compared with only <2% that encodes proteins." He recently published an article explaining why he opposes junk DNA.
Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]
The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non-protein coding, or non-coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is “junk”, a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from most of this genome “junk” often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their basis, which arise from distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in hopes of stimulating synergy between scientific fields.
This is a list of scientific papers on junk DNA that you need to read (and understand) in order to participate in the junk DNA debate. It's not a comprehensive list because it's mostly papers that defend junk DNA and refute arguments for massive amounts of function. The only exception is the paper by Mattick and Dinger (2013).¹ It's the only anti-junk paper that attempts to deal with the main evidence for junk DNA. If you know of any other papers that make a good case against junk DNA then I'd be happy to include them in the list.
If you come across a publication that argues against junk DNA, then you should immediately check the reference list. If you do not see some of these references in the list, then don't bother reading the paper because you know the author is not knowledgeable about the subject.
1. The paper by Kellis et al. (2014) is ambiguous. It's clear that most of the ENCODE authors are still opposed to junk DNA even though the paper is mostly a retraction of their original claim that 80% of the genome is functional.
I decided to edit the Wikipedia article on non-coding DNA by adding new sections on "Noncoding genes," "Promoters and regulatory sequences," "Centromeres," and "Origins of replication." That didn't go over very well with the Wikipedia police so they deleted the sections on "Noncoding genes" and "Origins of replication." (I'm trying to restore them so you may see them come back when you check the link.)
I also decided to re-write the introduction to make it more accurate but my version has been deleted three times in favor of the original version you see now on the website. I have been threatened with being reported to Wikipedia for disruptive edits.
The introduction has been restored to the version that talks about the ENCODE project and references Nessa Carey's book. I tried to move that paragraph to the section on the ENCODE project and I deleted the reference to Carey's book on the grounds that it is not scientifically accurate [see Nessa Carey doesn't understand junk DNA]. The Wikipedia police have restored the original version three times without explaining why they think we should mention the ENCODE results in the introduction to an article on non-coding DNA and without explaining why Nessa Carey's book needs to be referenced.
The group that's objecting includes Ramos1990, Qzd, and Trappist the monk. (I am Genome42.) They seem to be part of a group that is opposed to junk DNA and resists the creation of a separate article for junk DNA. They want junk DNA to be part of the article on non-coding DNA for reasons that they don't/won't explain.
The main problem is the confusion between "noncoding DNA" and "junk DNA." Some parts of the article are reasonably balanced but other parts imply that any function found in noncoding DNA is a blow against junk DNA. The best way to solve this problem is to have two separate articles: one on noncoding DNA and its functions and another on junk DNA. There has been a lot of resistance to this among the current editors and I can only assume that this is because they don't see the distinction. I tried to explain it in the discussion thread on splitting by pointing out that we don't talk about non-regulatory DNA, non-centromeric DNA, non-telomeric DNA, or non-origin DNA and there's no confusion about the distinction between these parts of the genome and junk DNA. So why do we single out noncoding DNA and get confused?
It looks like it's going to be a challenge to fix the current Wikipedia page(s) and even more of a challenge to get a separate entry for junk DNA.
Here is the warning that I have received from Ramos1990.
Your recent editing history shows that you are currently engaged in an edit war; that means that you are repeatedly changing content back to how you think it should be, when you have seen that other editors disagree. To resolve the content dispute, please do not revert or change the edits of others when you are reverted. Instead of reverting, please use the talk page to work toward making a version that represents consensus among editors. The best practice at this stage is to discuss, not edit-war. See the bold, revert, discuss cycle for how this is done. If discussions reach an impasse, you can then post a request for help at a relevant noticeboard or seek dispute resolution. In some cases, you may wish to request temporary page protection.
Being involved in an edit war can result in you being blocked from editing—especially if you violate the three-revert rule, which states that an editor must not perform more than three reverts on a single page within a 24-hour period. Undoing another editor's work—whether in whole or in part, whether involving the same or different material each time—counts as a revert. Also keep in mind that while violating the three-revert rule often leads to a block, you can still be blocked for edit warring—even if you do not violate the three-revert rule—should your behavior indicate that you intend to continue reverting repeatedly.
I guess that's very clear. You can't correct content to the way you think it should be as long as other editors disagree. I explained the reason for all my changes in the "history" but none of the other editors have bothered to explain why they reverted to the old version. Strange.
Larry Moran has sort-of replied to my previous blogpost but disappoints with only one substantive point. And even that one point is wrong: ID is not committed to the idea that individual genomes be well-designed; that is just an expectation some of us derive based on belief in a designer which is established on other evidence. ID would still be true if only globular proteins were designed (lookup Axe), or even if only the flagellum was designed (lookup Behe), or even if only the first life form was designed (lookup Meyer – and please read their actual work, not cheap reviews, because reviewers often dont pick up on the salient points – more below). I just say this lest readers get the impression that this is ID’s strongest point, or in any sense a weak point. It is neither.
It's true that there are some IDiots who are distancing themselves from a commitment to junk DNA. There are probably some who claim that they could live with the fact that 90% of our DNA is junk.
December 20, 1979
Dear Francis:
I am sure that you realize how frightfully angry a lot of people will be if you say that much of the DNA is junk. The geneticists will be angry because they think that DNA is sacred. The Darwinian evolutionists will be outraged because they believe every change in DNA that is accepted in evolution is necessarily an adaptive change. To suggest anything else is an insult to the sacred memory of Darwin.
This attitude is so pervasive that if no reason can be found for an evolutionary change, it is necessary to invent one. Kimura points out that one author attributed the pink color of flamingos to protective coloration against the setting sun. This type of thinking carries over into people who sequence mRNA. They claim that differences between rabbit and human globin mRNAs are because each species has its own requirements for secondary structure.
Various people have tried to think up possible functions for the regions of DNA that do not code for anything as far as is known. Roy Britten says that such DNA has a regulatory function.
Actually, the scheme proposed by Britten about ten years ago was that occasionally events of saltatory duplication, took place, so that a great many copies of a short piece of DNA were made. As time went by, the composition of a family of identical copies became changed by drift, until the copies no longer closely resemble each other. Figure 55 of the article by Britten shows a diagram of a sort of "junk DNA generating system". I note that he says on page 105 "the rate of increase in DNA content per cell resulting from saltatory replication alone may prove to be embarrassingly large and a mechanism for the loss of DNA may have to be invoked". I gather that you agree with this.
I quoted you on drift in DNA in a talk that I gave at the symposium for Emil Smith (see enclosure). Your concept of "junk DNA" presumably includes this idea. I shall look forward to hearing more about it, and I have been asked by Die Naturwissenschaften to write an article on silent changes, so I hope I can include mention of your new manuscript when I start to write mine.
With best regards,
Thomas H. Jukes