Saturday, April 12, 2025
Templeton Foundation funds a grant on transposons
Templeton recently awarded a grant of $607,686 (US) to study the role of transposons in the human genome. The project leader is Stefan Linquist, a philosopher from the University of Guelph (Guelph, Ontario, Canada). Stefan has published a number of papers on junk DNA and he promotes the definition of functional DNA as DNA that is subject to purifying selection [The function wars are over]. Other members of the team include Ryan Gregory and Ford Doolittle who are prominent supporters of junk DNA.
Friday, March 21, 2025
The misinformation spread by ENCODE in 2012 is gradually being recognized
The chapter contains an excellent summary of the history of genome sizes in bacteria and eukaryotes and a detailed description of both the c-value paradox and the mutation load arguments. The relationship between junk DNA and population size is described.
I was especially pleased to see that the author didn't pull any punches in describing the ENCODE publicity campaign and their false statements about junk DNA.
In 2012, a post-human-genome project called ENCODE, which aims to experimentally identify regions of the human genome that undergo transcription—or are bound by a set of DNA-binding proteins, or undergo chemical changes called epigenetic modifications—came to a stunning conclusion that at least 80% of the human genome is functional and that it was time to sing a requiem for the concept of junk DNA! However, this conclusion, which has been severely criticised since its publication, ignores decades of well-supported arguments from evolutionary biology arising from the c-value paradox, some of which we have described here or will do so shortly; it does not quite explain why this conclusion—if broadly applied to the genomes of other multicellular eukaryotes—would not imply that a fish needs 100 times as much functional DNA as a human; and plays “fast and loose” with the definition of the term ‘function’. While the ENCODE project, a great success in many ways, has provided an invaluable resource for the study of human molecular biology, we can safely ignore its ill-fated conclusion on what fraction of the human genome is functional.
Saturday, February 15, 2025
Junk DNA is gradually making its way into mainstream textbooks
Much of this opposition to junk DNA stems from a massive publicity campaign launched by ENCODE researchers and the leading science journals back in 2012.
It's likely that most of the controversy over junk DNA is related to differing views on evolution and the power of natural selection. Most people think that natural selection is so powerful that modern species must be extremely well-adapted to their present environment. They tend to believe that complexity is simply a reflection of sophisticated fine-tuning and that this must apply to the human genome. According to this view, the presence of huge amounts of DNA with an unknown function is just a temporary situation, and in the next few years most of this 'dark matter' will turn out to have a function. It has to have a function; otherwise, natural selection would have eliminated it.
Thursday, November 14, 2024
Science journal tries to understand misinformation
Do any of you remember the arsenic affair? That's when Science accepted a paper by Felisa Wolfe-Simon and her collaborators claiming that they isolated a bacterium that substituted arsenic for phosphorus in its DNA. The paper was published online and was severely criticized after a ridiculous NASA press conference. It was eventually refuted when Rosie Redfield and others looked closely at the bacterial DNA and showed that it did not contain arsenic. The paper has still not been retracted. [See Reviewing the "Arseniclife" Paper.]
And let's not forget the massive misinformation campaign associated with the publication of ENCODE results in 2012.
Monday, October 21, 2024
Philip Ball strikes back
In order to build the case for revolution, he tries to demonstrate a paradigm shift in our view of molecular biology by showing a huge gap between the understanding of previous generations of molecular biologists and the post-genomic view. I believe he is wrong about this for two reasons: first, he misrepresents the views of older molecular biologists and, second, he misrepresents the discoveries of the past twenty years. I tried to explain why he was wrong about these two claims in a previous post where I discussed an article he published in Scientific American in May 2024: Philip Ball says RNA may rule our genome.
Philip Ball responded to my criticism in a comment under that article.
Older molecular biologists were really stupid
I said ...
Ball begins with the same old myth that writers like him have been repeating for many years. He claims that before ENCODE most molecular biologists were really stupid. According to Philip Ball, most of us thought that coding DNA was the only functional part of the genome and most of the rest was junk DNA.
In the comment section of my earlier post, Philip Ball says,
I’m sorry to say that Larry’s commentary here is dismayingly inaccurate.
Let’s get this one out of the way first:
“He claims that before ENCODE most molecular biologists were really stupid.”
I have never made this claim and never would – it is a pure fabrication on Larry’s part. I guess this is what John Horgan meant in his comment to Larry: credible writers don’t just make up stuff.
I admit that Philip Ball never said those exact words. I'll leave it to the readers to decide whether my characterization of his position is accurate.
I stand by the statements I made, although I admit to a bit of hyperbole. Ball has said repeatedly that the molecular biologists of my generation were wedded to the idea that coding regions were the only important part of the genome and he often connects that to the Central Dogma of Molecular Biology. He also claims that the experts in molecular biology dismissed all non-coding DNA as junk. Here's how he puts it in another article that he published recently in Aeon: We are not machines.
Only around 1-2 per cent of the entire human genome actually consists of protein-coding genes. The remainder was long thought to be mostly junk: meaningless sequences accumulated over the course of evolution. But at least some of that non-coding genome is now known to be involved in regulating genes: altering, activating or suppressing their transcription in RNA and translation into proteins.
I interpret that to mean that older molecular biologists, like me, didn't know about functional non-coding DNAs such as centromeres, telomeres, origins of replication, non-coding genes, SARs, and regulatory sequences in spite of the fact that thousands of papers on these sequences were published in the 30 years that preceded the publication of the first draft of the human genome sequence. This is not true; we did know about those things. I don't think it's too much of an exaggeration to say that Philip Ball thinks we were really stupid.
Here's what he says in his book, "How Life Works" (p. 85) when he's talking about the beginning of the human genome project.
Even at its outset, it faced the somewhat troubling issue that just 2 percent or so of our genome actually accounts for protein-coding genes. The conventional narrative was that our biology was all about proteins, for each of which the genome held the template. ... But we had all this other DNA too! What was it for? The common view was that it was mostly just junk, like the stuff in our attics: meaningless material accumulated during evolution, which our cells had no motivation to clear out.
Again, his claim is that in 1990 at the beginning of the human genome project the experts in molecular biology thought that non-coding DNA was mostly junk (98% of the genome). I have repeatedly refuted this myth and challenged anyone to come up with a single scientific paper arguing that all non-coding DNA is junk. I challenge Philip Ball to find a single molecular biology textbook written before 1990 that fails to discuss regulation, non-coding genes, and other non-coding functional elements in the human genome.
The truth is that the molecular biology experts concluded in the 1970s that we had about 30,000 genes and that 90% of our genome is junk and 10% is functional. That 10% consisted of about 2% coding DNA (now thought to be only 1%) and 8% functional non-coding DNA. So the "conventional narrative" was that there was a lot more functional non-coding DNA than coding DNA.
The human genome is full of genes for regulatory RNAs.
"Ball is one of the most meticulous, precise science writers out there. He is the antithesis of hypey, "dumb-it-down" reporting. He is MUCH more credible than you are, Laurence."
John Horgan, July 2024
The title of the article I was discussing is "Revolutionary Genetics Research Shows RNA May Rule Our Genome." In that article Ball says that ENCODE was basically right and there are many more non-coding genes than protein-coding genes. I pointed out that Ball mentions some criticism of this idea but only to dismiss it. I said that "[Ball] wants you to believe that almost of all of those transcripts are functional—that's the revolution that he's promoting." Philip Ball objects to this statement ...
This too is sheer fabrication. I don’t say this in my article, nor in my book. Instead, I say pretty much what Larry seems to want me to say, but for some reason he will not admit it – which is that there is controversy about how many of the transcripts are functional.
Ball states that "ENCODE was basically right" when they claimed that 75% of our genome was transcribed and he goes on to say that ...
Dozens of other research groups, scoping out activity along the human genome, also have found that much of our DNA is churning out 'noncoding' RNA.
He says that ENCODE has identified 37,000 noncoding genes but there may be as many as 96,000. After making these definitive statements, he mentions that there are "still doubters" but then discusses why these discoveries are revolutionary. Later on he quotes John Mattick, who suspects that there may be more than 500,000 non-coding genes.
Toward the end of the article, after discussing all kinds of functional RNAs, he brings up the Ponting and Haerty review where they say that most lncRNAs are just noise. He also mentions that the low copy number of non-coding RNAs raises questions about whether they are functional but immediately counters with the standard excuses from his allies.
Ball closes the article with ...
Gingeras says he is perplexed by ongoing claims that ncRNAs are merely noise or junk, as evidence is mounting that they do many things. "It is puzzling why there is such an effort to persuade colleagues to move from a sense of interest and curiosity in the ncRNA field to a more dubious and critical one," he says.
Perhaps the arguments are so intense because they undercut the way we think our biology works. Ever since the epochal discovery about DNA's double helix and how it encodes information, the bedrock idea of molecular biology has been that there are precisely encoded instructions that program specific molecules for particular tasks. But ncRNAs seem to point to a fuzzier, more collective, logic to life. It is a logic that is harder to discern and harder to understand. But if scientists can learn to live with the fuzziness, this view of life may turn out to be more complete.
What's remarkable about the quote from a leading ENCODE worker (Gingeras) is that he is "puzzled" by scientists who are dubious and critical about claims in the ncRNA field. Isn't that what good scientists are supposed to do? Isn't that exactly what we did when we successfully challenged the dubious claims about junk DNA made in 2012?
There is no doubt in my mind that Philip Ball has fallen hook, line, and sinker for the ENCODE claims that our genome is buzzing with non-coding genes. He only brings up the counter-arguments to dismiss them and to pretend that he is being fair. Nobody who was truly skeptical about the function of transcripts would write an article with the title, "Revolutionary Genetics Research Shows RNA May Rule Our Genome."
However, as Ball points out in other comments, he does have a sentence in his book where he mentions that perhaps only 30% of the genome is functional. In the comment he says that he believes the amount of functional DNA lies somewhere between 10% and 30%. He doesn't mention that in the Scientific American article, but if he's being honest, it does mean that I was unfair when I said he believes that "almost of all of those transcripts are functional." Still, I only know that from what he now says, not from the published article.
If I were to take Philip Ball at his word—as expressed in the comment—then he must believe that most of the ENCODE transcripts are junk RNA. That's not a belief that you get from reading his published work.2 Furthermore, if I were to take him at his word, then he must believe that there are some reasonable criteria that must be applied to a transcript in order to decide whether it has a biologically relevant function. So, when he says that ENCODE identified 37,600 non-coding genes he must have these criteria in mind but he doesn't express any serious skepticism about that number. We all know that there's no solid evidence that such a large number of transcripts are functional but that doesn't bother Philip Ball. He thinks we are in the middle of an RNA revolution.
1. In commenting on my previous post, Ball says he believes that somewhere between 70% and 90% of our genome is junk, but he doesn't say this in the Scientific American article. Instead, he says that scientists were surprised to learn that 75% of the human genome is transcribed, implying that there's a lot of function. He goes on to say that "ENCODE was basically right." But what the ENCODE publicity campaign actually said was that junk DNA is dead and there's practically no junk DNA. If Ball really believes that up to 90% of the genome is junk, then to me this means that ENCODE was spectacularly wrong, not "basically right."
2. Ball says that 75% of the genome is transcribed. If Ball believes that as little as 10% may be functional then he must believe that less than 10% is transcribed to produce functional RNAs since he has to allow for regulatory sequences and other functional DNA elements. Let's say that 8% is a reasonable number. Ball seems to be willing to admit that 67% of the genome might be transcribed to produce junk RNA.
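The arithmetic in footnote 2 can be laid out as a simple genome budget. This is a minimal Python sketch, not anything Ball published: the 75% transcribed figure is the one Ball repeats, while the 10% functional floor and the 8% assigned to functional transcripts are the illustrative values assumed in the footnote, not measurements. The helper name `junk_rna_percent` is hypothetical.

```python
# Genome "budget" from footnote 2. All values are the footnote's
# illustrative assumptions, expressed as percent of the whole genome.
TRANSCRIBED = 75.0             # percent transcribed, the figure Ball repeats
FUNCTIONAL = 10.0              # lower end of Ball's 10-30% functional range
FUNCTIONAL_TRANSCRIPTS = 8.0   # assumed part of FUNCTIONAL that yields functional RNAs


def junk_rna_percent(transcribed: float, functional_transcripts: float) -> float:
    """Percent of the genome transcribed into RNA with no biological function."""
    if not 0.0 <= functional_transcripts <= transcribed <= 100.0:
        raise ValueError("expected 0 <= functional_transcripts <= transcribed <= 100")
    return transcribed - functional_transcripts


if __name__ == "__main__":
    # The leftover functional DNA covers regulatory sequences and other
    # functional elements that don't produce functional RNAs.
    other_functional = FUNCTIONAL - FUNCTIONAL_TRANSCRIPTS
    junk = junk_rna_percent(TRANSCRIBED, FUNCTIONAL_TRANSCRIPTS)
    print(f"Functional DNA not encoding functional RNA: {other_functional:.0f}%")
    print(f"Transcribed into junk RNA: {junk:.0f}%")
```

Changing `FUNCTIONAL_TRANSCRIPTS` shows how much the junk RNA figure depends on that one assumption; with the footnote's values it comes out at 67%.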
Friday, October 11, 2024
Philip Ball says RNA may rule our genome
Ball's ideas are complicated and I won't go into all of them in this article. Instead, I want to focus on one of his more scientific claims; namely, the claim that genomic data has overthrown the fundamental principles of molecular biology. Let's look at his recent (May 14, 2024) article in Scientific American: Revolutionary Genetics Research Shows RNA May Rule Our Genome.1
The subtitle of the article is "Scientists have recently discovered thousands of active RNA molecules that can control the human body" and that's the issue that I want to discuss here.
Friday, September 27, 2024
John Mattick's seminar at the University of Toronto
Most Sandwalk readers will recognize Mattick as one of the few remaining vocal opponents of junk DNA. He is probably best known for his dog-ass plot but this is only one of the ways he misrepresents science.
Sunday, September 01, 2024
Scite Assistant (AI) answers the question "How much of the human genome consist of junk DNA?"
That sounds like a good idea until you realize that the scientific literature is full of misinformation and conflicting information. What we need is an AI assistant that can help us sort through the misinformation and give us a genuine, well-informed answer on controversial issues.
Let's pick the question of junk DNA as a completely random (!) example of such an issue. The scientific literature is full of false information about the origin of the term "junk DNA" and what it was originally intended to describe. It's also full of false information about recent results and how they pertain to junk DNA.
Tuesday, February 27, 2024
Nils Walter disputes junk DNA: (1) The surprise
Nils Walter attempts to present the case for a functional genome by reconciling opposing viewpoints. I address his criticisms of the junk DNA position and discuss his arguments in favor of large numbers of functional non-coding RNAs.
Nils Walter is Francis S. Collins Collegiate Professor of Chemistry, Biophysics, and Biological Chemistry at the University of Michigan in Ann Arbor (Michigan, USA). He works on human RNAs and claims that, "Over 75% of our genome encodes non-protein coding RNA molecules, compared with only <2% that encodes proteins." He recently published an article explaining why he opposes junk DNA.
Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]
The human genome project's lasting legacies are the emerging insights into human physiology and disease, and the ascendance of biology as the dominant science of the 21st century. Sequencing revealed that >90% of the human genome is not coding for proteins, as originally thought, but rather is overwhelmingly transcribed into non-protein coding, or non-coding, RNAs (ncRNAs). This discovery initially led to the hypothesis that most genomic DNA is “junk”, a term still championed by some geneticists and evolutionary biologists. In contrast, molecular biologists and biochemists studying the vast number of transcripts produced from most of this genome “junk” often surmise that these ncRNAs have biological significance. What gives? This essay contrasts the two opposing, extant viewpoints, aiming to explain their basis, which arise from distinct reference frames of the underlying scientific disciplines. Finally, it aims to reconcile these divergent mindsets in hopes of stimulating synergy between scientific fields.
Saturday, December 16, 2023
What is the "dark matter of the genome"?
The phrase "dark matter of the genome" is used by scientists who are skeptical of junk DNA and who want to convey the impression that most of the genome consists of important DNA whose function is just waiting to be discovered. Not surprisingly, the term is often used by researchers who are looking for funding and investors to support their efforts to use the latest technology to discover this mysterious function that has eluded other scientists for over 50 years.
The term "dark matter" is often applied to the human genome but what does it mean? We get a clue from a BBC article published by David Cox last April: The mystery of the human genome's dark matter. He begins the article by saying,
Twenty years ago, an enormous scientific effort revealed that the human genome contains 20,000 protein-coding genes, but they account for just 2% of our DNA. The rest of was written off as junk – but we are now realising it has a crucial role to play.
Sunday, November 12, 2023
ChatGPT is still making up quotations from scientists
I wondered whether ChatGPT had improved in the last six months so I asked it again about junk DNA. The answers reveal that ChatGPT is still lying and spreading false information.
What is junk DNA?
ChatGPT: "Junk DNA" is a term that was historically used to describe portions of the DNA sequence that do not code for proteins or have any known function. These non-coding regions were once thought to be vestiges of evolutionary processes or simply genetic "leftovers" with no apparent purpose.
However, it's important to note that the term "junk DNA" is now considered outdated and somewhat misleading. Advances in genomics and molecular biology have revealed that many non-coding regions of DNA have important functions, even if they do not encode proteins. These regions may contain regulatory elements that control the activity of genes, play a role in the three-dimensional structure of the genome, or have other functional significance.
The Human Genome Project and subsequent genomic research efforts have shown that a significant portion of the genome is involved in the regulation of gene expression, development, and other cellular processes. The term "non-coding DNA" or "non-coding regions" is now preferred over "junk DNA" to emphasize that these sections of the genome have functions that are not immediately apparent and are still an active area of research.
In summary, while some regions of the genome were once thought to be functionless "junk," ongoing scientific research is uncovering the complexity and functional significance of non-coding DNA.
So, the answer is "no," ChatGPT hasn't gotten any better. Even after we worked on a Wikipedia entry for junk DNA, ChatGPT is still responding with the standard non-scientific perspective on junk DNA. It implies that the term junk DNA originally meant all non-coding DNA and that the discovery of function in non-coding regions is due to recent advances in genomics and molecular biology.
I asked for some specific evidence that this was, indeed, the historical definition of junk DNA.
Can you give me some direct quotes from scientists who said that all non-coding DNA was junk?
ChatGPT: The idea that non-coding DNA was "junk DNA" was a historical perspective that gained traction in the early days of molecular biology when researchers primarily focused on protein-coding genes. It's important to note that while the term "junk DNA" was used, scientists were aware that there were likely unknown functions in non-coding regions. Over time, advances in genomics and molecular biology have revealed the complexity and functionality of non-coding DNA.
Here are a couple of historical quotes that reflect the perspective on non-coding DNA from earlier times:
Francis Crick (1970): In a 1970 paper, Francis Crick, one of the co-discoverers of the structure of DNA, stated, "The ultimate aim of the DNA investigators is to identify, with some precision, the genetic material. At present, it looks as if at least 90% of the DNA in any organism is useless."
Susumu Ohno (1972): Geneticist Susumu Ohno, in his book "Evolution by Gene Duplication" (1972), stated, "The failure to recognize this [the abundance of 'junk DNA'] for so long has misled the investigators studying the higher order chromosomal structure... No doubt the junk DNA sequences in the heterochromatin and at the centromeres are examples of 'selfish' DNA."
It's important to understand that these quotes represent a historical perspective, and scientific understanding has evolved significantly since then. The term "junk DNA" has since been reconsidered as researchers have discovered various functions within non-coding regions of the genome. Scientists today emphasize the importance of non-coding DNA in regulatory processes, genome stability, and other essential biological functions.
Neither of these quotes supports the idea that all non-coding DNA was thought to be junk. The direct quote from Crick in 1970 seemed suspicious, so I asked for clarification.
Can you supply the complete reference to the Francis Crick quotation that you just gave me?
ChatGPT: I apologize for any confusion, but I made an error in my previous response. After a thorough review, I couldn't find a direct quote from Francis Crick expressing the sentiment that "at least 90% of the DNA in any organism is useless." It appears there might have been a misunderstanding or misattribution in the information I provided.
Francis Crick and James Watson, in their groundbreaking work, were primarily focused on understanding the structure of DNA, and while there were discussions about non-coding regions, the notion of "junk DNA" as we understand it today wasn't explicitly articulated by Crick in the manner suggested.
I appreciate your diligence in seeking accurate information, and I apologize for any confusion caused by the error in my response. If you have any further questions or if there's anything else I can assist you with, please let me know.
The only way you would know that ChatGPT was making up direct quotations would be if you asked for confirmation. The average person would just assume that the fake quote was accurate. It would be fun to grade a student essay that used such a quotation!
When I asked these questions last March, ChatGPT also made up false quotations that didn't even support its claim that all non-coding DNA was thought to be junk [ChatGPT lies about junk DNA].
Can you supply the complete reference to the Susumu Ohno quotation that you just gave me?
ChatGPT: Certainly, Susumu Ohno's quote about "junk DNA" is attributed to his book "Evolution by Gene Duplication," published in 1970. I must clarify that the quote I provided earlier is a paraphrased version based on the general sentiment expressed in his work rather than an exact wording. Unfortunately, I couldn't locate the precise quote within the text.
If you're interested in exploring Ohno's thoughts on this matter in more detail, you may want to refer to his book "Evolution by Gene Duplication," which was influential in discussions about the evolutionary significance of gene duplication and, by extension, the role of non-coding DNA.
This is another example of a made-up quotation and it doesn't address the main issue. ChatGPT is unable to provide any evidence that all non-coding DNA was thought to be junk DNA.
The behavior of ChatGPT should be a major concern. If we are going to rely on artificial intelligence in the future, then we had better make sure that the information it gathers is correct.
Sunday, October 15, 2023
Only 10.7% of the human genome is conserved
The Zoonomia project aligned the genome sequences of 240 mammalian species and determined that only 10.7% of the human genome is conserved. This is consistent with the idea that about 90% of our genome is junk.
The April 28, 2023 issue of Science contains eleven papers reporting the results of a massive study comparing the genomes of 240 mammalian species. The issue also contains a couple of "Perspectives" that comment on the work.
Friday, September 29, 2023
Evelyn Fox Keller (1936 - 2023) and junk DNA
Evelyn Fox Keller died a few days ago (Sept. 22, 2023). She was a professor of History and Philosophy of Science at the Massachusetts Institute of Technology (Cambridge, MA, USA). Most of the obituaries praise her for her promotion of women scientists and her critiques of science as a male-dominated discipline. More recently, she turned her attention to molecular biology and genomics and many philosophers (and others) seem to think that she made notable contributions in that area as well.
Thursday, July 06, 2023
James Shapiro doesn't like junk DNA
Shapiro doubles down on his claim that junk DNA doesn't exist.
It's been a while since we've heard from James Shapiro. You might recall that James A. Shapiro is a biochemistry/microbiology professor at the University of Chicago and the author of a book promoting natural genetic engineering. I reviewed his book and didn't like it very much; Shapiro didn't like my review [James Shapiro Never Learns] [James Shapiro Responds to My Review of His Book].
Thursday, February 16, 2023
Birds of a feather: epigenetics and opposition to junk DNA
There's an old saying that birds of a feather flock together. It means that people with the same interests tend to associate with each other. Its extended meaning refers to the fact that people who believe in one thing (X) tend to also believe in another (Y). It usually implies that X and Y are both questionable beliefs and that it's not clear why they should be associated.
I've noticed an association between those who promote epigenetics far beyond its reasonable limits and those who reject junk DNA in favor of a genome that's mostly functional. There's no obvious reason why these two beliefs should be associated with each other, but they are. I assume it's because both beliefs are presumed to be radical departures from the standard dogma, so they reinforce the idea that the author is a revolutionary.
Or maybe sloppy thinking is simply the common thread: people who think sloppily in one field tend to think sloppily in others.
Here's an example from Chapter 4 of the Handbook of Epigenetics (Third Edition), published in 2023.
The central dogma of life had clearly established the importance of the RNA molecule in the flow of genetic information. The understanding of transcription and translation processes further elucidated three distinct classes of RNA: mRNA, tRNA and rRNA. mRNA carries the information from DNA and gets translated to structural or functional proteins; hence, they are referred to as the coding RNA (RNA which codes for proteins). tRNA and rRNA help in the process of translation among other functions. A major part of the DNA, however, does not code for proteins and was previously referred to as junk DNA. The scientists started realizing the role of the junk DNA in the late 1990s and the ENCODE project, initiated in 2003, proved the significance of junk DNA beyond any doubt. Many RNA types are now known to be transcribed from DNA in the same way as mRNA, but unlike mRNA they do not get translated into any protein; hence, they are collectively referred to as noncoding RNA (ncRNA). The studies have revealed that up to 90% of the eukaryotic genome is transcribed but only 1%–2% of these transcripts code for proteins, the rest all are ncRNAs. The ncRNAs less than 200 nucleotides are called small noncoding RNAs and greater than 200 nucleotides are called long noncoding RNAs (lncRNAs).
In case you haven't been following my blog posts for the past 17 years, allow me to briefly summarize the flaws in that paragraph.
- The central dogma has nothing to do with whether most of our genome is junk
- There was never, ever, a time when knowledgeable scientists defended the idea that all noncoding DNA is junk
- ENCODE did not "prove the significance of junk DNA beyond any doubt"
- Not all transcripts are functional; most of them are junk RNA transcribed from junk DNA
So, I ask the same question that I've been asking for decades. How does this stuff get published?
Thursday, December 22, 2022
Junk DNA, TED talks, and the function of lncRNAs
Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.
The fact that most of our genome is transcribed has been known for 50 years, but it only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and the label has stuck.
By the end of the 1970s we knew that much of this transcription was due to introns. The latest data shows that protein coding genes and known noncoding genes occupy about 45% of the genome and most of that is intron sequences that are mostly junk. That leaves 30-40% of the genome that is transcribed at some point producing something like one million transcripts of unknown function.
Wednesday, December 21, 2022
A University of Chicago history graduate student's perspective on junk DNA
A new master's thesis on the history of junk DNA has been posted. It's from the Department of History at the University of Chicago.
My routine scan for articles on junk DNA turned up the abstract of an M.A. thesis on the history of junk DNA: Requiem for a Gene: The Problem of Junk DNA for the Molecular Paradigm. The supervisor is Professor Emily Kern in the Department of History at the University of Chicago. I've written to her to ask for a copy of the thesis and for permission to ask her, and her student, some questions about the thesis. No reply so far.
Here's the abstract of the thesis.
“Junk DNA” has been at the center of several high-profile scientific controversies over the past four decades, most recently in the disputes over the ENCODE Project. Despite its prominence in these debates, the concept has yet to be properly historicized. In this thesis, I seek to redress this oversight, inaugurating the study of junk DNA as a historical object and establishing the need for an earlier genesis for the concept than scholars have previously recognized. In search of a new origin story for junk, I chronicle developments in the recognition and characterization of noncoding DNA sequences, positioning them within existing historiographical narratives. Ultimately, I trace the origin of junk to 1958, when a series of unexpected findings in bacteria revealed the existence of significant stretches of DNA that did not encode protein. I show that the discovery of noncoding DNA sequences undermined molecular biologists’ vision of a gene as a line of one-dimensional code and, in turn, provoked the first major crisis in their nascent field. It is from this crisis, I argue, that the concept of junk DNA emerged. Moreover, I challenge the received narrative of junk DNA as an uncritical reification of the burgeoning molecular paradigm. By separating the history of junk DNA from its mythology, I demonstrate that the conceptualization of junk DNA reveals not the strength of molecular biological authority but its fragility.
It looks like it might be a history of noncoding DNA but I won't know for certain until I see the entire thesis. It's only available to students and staff at the University of Chicago.
Saturday, November 19, 2022
How many enhancers in the human genome?
In spite of what you might have read, the human genome does not contain one million functional enhancers.
The Sept. 15, 2022 issue of Nature contains a news article on "Gene regulation" [Two-layer design protects genes from mutations in their enhancers]. It begins with the following sentence.
The human genome contains only about 20,000 protein-coding genes, yet gene expression is controlled by around one million regulatory DNA elements called enhancers.
Sandwalk readers won't need to be told the reference for such an outlandish claim because you all know that it's the ENCODE Consortium summary paper from 2012, the one that kicked off their publicity campaign to convince everyone of the death of junk DNA (ENCODE, 2012). ENCODE identified several hundred thousand transcription factor (TF) binding sites and in 2012 they estimated that the total number of base pairs involved in regulating gene expression could account for 20% of the genome.
How many of those transcription factor binding sites are functional and how many are due to spurious binding to sites that have nothing to do with gene regulation? We don't know the answer to that question but we do know that there will be a huge number of spurious binding sites in a genome of more than three billion base pairs [Are most transcription factor binding sites functional?].
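To get a feel for why chance binding sites are unavoidable in a genome this size, here's a back-of-the-envelope sketch. It isn't taken from any of the papers discussed here. It assumes a random genome of about 3.1 billion base pairs in which every base is independent and A, C, G, and T are equally likely, and it counts exact matches to one fixed motif on both strands. Real genomes aren't random and real transcription factors tolerate mismatches, so treat the output as a rough order of magnitude and nothing more.

```python
# Back-of-the-envelope estimate of how often a fixed DNA motif appears
# purely by chance. Simplifying assumptions: bases are independent and
# equally likely, only exact matches count, and palindromic motifs are
# (incorrectly) counted twice because both strands are scanned.

GENOME_SIZE_BP = 3_100_000_000  # assumed haploid human genome size, ~3.1 billion bp


def expected_chance_matches(motif_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected number of exact chance matches to one motif, both strands."""
    positions = genome_size - motif_length + 1  # possible start positions on one strand
    p_match = 0.25 ** motif_length              # chance that one position matches
    return 2 * positions * p_match              # x2 for the reverse-complement strand


for k in (6, 8, 10, 12):
    print(f"{k:>2}-bp motif: ~{expected_chance_matches(k):,.0f} chance matches")
```

Even a 10-bp motif, longer than many real binding sites, is expected to turn up thousands of times by chance alone under these assumptions.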
The scientists in the ENCODE Consortium didn't know the answer either, but what's surprising is that they didn't even know there was a question. It never occurred to them that some of those transcription factor binding sites have nothing to do with regulation.
Fast forward ten years to 2022. Dozens of papers have been published criticizing the ENCODE Consortium for their lack of knowledge of the basic biochemical properties of DNA binding proteins. Surely nobody who is interested in this topic believes that there are one million functional regulatory elements (enhancers) in the human genome?
Wrong! The authors of this Nature article, Ran Elkon at Tel Aviv University (Israel) and Reuven Agami at the Netherlands Cancer Institute (Amsterdam, Netherlands), didn't get the message. They think it's quite plausible that the expression of every human protein-coding gene is controlled by an average of 50 regulatory sites, even though there's not a single known example of any such gene.
Not only that, for some reason they mention only protein-coding genes, in spite of the fact that the reference they give for 20,000 protein-coding genes (Nurk et al., 2022) also claims there are an additional 40,000 noncoding genes. This is an incorrect claim since Nurk et al. have no proof that all those transcribed regions are actually genes, but let's play along and assume that there really are 60,000 genes in the human genome. That reduces the average to "only" 17 enhancers per gene. I don't know of a single gene that has 17 or more proven enhancers, do you?
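The arithmetic behind those averages is simple division. The sketch below divides the claimed one million enhancers by the two gene counts mentioned above: 20,000 protein-coding genes, or 60,000 if the noncoding genes claimed by Nurk et al. are included. The function and variable names are hypothetical, chosen for illustration only.

```python
# Average number of enhancers per gene implied by the figures quoted above.
CLAIMED_ENHANCERS = 1_000_000  # the "around one million" enhancers claimed in the Nature article

gene_counts = {
    "protein-coding genes only": 20_000,
    "protein-coding + noncoding genes (Nurk et al.)": 60_000,
}


def enhancers_per_gene(enhancers: int, genes: int) -> float:
    """Hypothetical helper: mean enhancers per gene if enhancers were spread evenly."""
    return enhancers / genes


for label, genes in gene_counts.items():
    print(f"{label}: {enhancers_per_gene(CLAIMED_ENHANCERS, genes):.1f} enhancers per gene")
# protein-coding genes only: 50.0 enhancers per gene
# protein-coding + noncoding genes (Nurk et al.): 16.7 enhancers per gene
```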
Why would two researchers who study gene regulation say that the human genome contains one million enhancers when there's no evidence to support such a claim and it doesn't make any sense? Why would Nature publish this paper when surely the editors must be aware of all the criticism that arose out of the 2012 ENCODE publicity fiasco?
I can think of only two answers to the first question. Either Elkon and Agami don't know of any papers challenging the view that most TF binding sites are functional (see below) or they do know of those papers but choose to ignore them. Neither answer is acceptable.
I think that the most important question in human gene regulation is how much of the genome is devoted to regulation. How many potential regulatory sites (enhancers) are functional and how many are spurious non-functional sites? Any paper on regulation that does not mention this problem should not be published. All results have to be interpreted in light of conflicting claims about function.
Here are some examples of papers that raise the issue. The point is not to prove that these authors are correct (although they are correct) but to show that there's a controversy. You can't just state that there are one million regulatory sites as if it were a fact when you know that the results are being challenged.
"The observations in the ENCODE articles can be explained by the fact that biological systems are noisy: transcription factors can interact at many nonfunctional sites, and transcription initiation takes place at different positions corresponding to sequences similar to promoter sequences, simply because biological systems are not tightly controlled." (Morange, 2014)

"... ENCODE had not shown what fraction of these activities play any substantive role in gene regulation, nor was the project designed to show that. There are other well-studied explanations for reproducible biochemical activities besides crucial human gene regulation, including residual activities (pseudogenes), functions in the molecular features that infest eukaryotic genomes (transposons, viruses, and other mobile elements), and noise." (Eddy, 2013)
"Given that experiments performed in a diverse number of eukaryotic systems have found only a small correlation between TF-binding events and mRNA expression, it appears that in most cases only a fraction of TF-binding sites significantly impacts local gene expression." (Palazzo and Gregory, 2014)
"One surprising finding from the early genome-wide ChIP studies was that TF binding is widespread, with thousands to tens of thousands of binding events for many TFs. These numbers do not fit with existing ideas of the regulatory network structure, in which TFs were generally expected to regulate a few hundred genes, at most. Binding is not necessarily equivalent to regulation, and it is likely that only a small fraction of all binding events will have an important impact on gene expression." (Slattery et al., 2014)
"Detailed maps of transcription factor (TF)-bound genomic regions are being produced by consortium-driven efforts such as ENCODE, yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood." (White et al., 2013)
"One outstanding issue is the fraction of factor binding in the genome that is 'functional', which we define here to mean that disturbing the protein-DNA interaction leads to a measurable downstream effect on gene regulation." (Cusanovich et al., 2014)
"... we expect, for example, accidental transcription factor-DNA binding to go on at some rate, so assuming that transcription equals function is not good enough. The null hypothesis after all is that most transcription is spurious and alternative transcripts are a consequence of error-prone splicing." (Hurst, 2013)
"... as a chemist, let me say that I don't find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that are designed to bind to specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance; a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant." (Jogalekar, 2012)
Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., et al. (2022) The complete sequence of a human genome. Science, 376:44-53. [doi: 10.1126/science.abj6987]
The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi: 10.1038/nature11247]
Monday, September 05, 2022
The 10th anniversary of the ENCODE publicity campaign fiasco
On Sept. 5, 2012 ENCODE researchers, in collaboration with the science journal Nature, launched a massive publicity campaign to convince the world that junk DNA was dead. We are still dealing with the fallout from that disaster.
The Encyclopedia of DNA Elements (ENCODE) was originally set up to discover all of the functional elements in the human genome. They carried out a massive number of experiments involving a huge group of researchers from many different countries. The results of this work were published in a series of papers in the September 6th, 2012 issue of Nature. (The papers appeared on Sept. 5th.)
Sunday, September 04, 2022
Wikipedia: the ENCODE article
The ENCODE article on Wikipedia is a pretty good example of how to write a science article. Unfortunately, there are a few issues that will be very difficult to fix.
When Wikipedia was formed twenty years ago, there were many people who were skeptical about the concept of a free crowdsourced encyclopedia. Most people understood that a reliable source of information was needed for the internet because the traditional encyclopedias were too expensive, but could it be done by relying on volunteers to write articles that could be trusted?
The answer is mostly "yes", although that comes with some qualifications. Many science articles are not good; they contain inaccurate and misleading information and often don't represent the scientific consensus. They also tend to be disjointed and unreadable. On the other hand, many non-science articles are at least as good as, and often better than, anything in the traditional encyclopedias (e.g., Battle of Waterloo; Toronto, Ontario; The Beach Boys).
By 2008, Wikipedia had expanded enormously and the quality of articles was being compared favorably to those of Encyclopedia Britannica, which had been forced to go online to compete. However, this comparison is a bit unfair since it downplays science articles.