

Friday, May 07, 2021

More misinformation about junk DNA: this time it's in American Scientist

Emily Mortola and Manyuan Long have just published an article in American Scientist about Turning Junk into Us: How Genes Are Born. The article contains a lot of misinformation about junk DNA that I'll discuss below.

Emily Mortola is a freelance science writer who worked with Manyuan Long when she was an undergraduate (I think). Manyuan Long is the Edna K. Papazian Distinguished Service Professor of Ecology and Evolution in the Department of Ecology and Evolution at the University of Chicago. His main research interest is the origin of new genes. It's reasonable to expect that he's an expert on genome structure and evolution.

The article is behind a paywall, so most of you can't see anything more than the opening paragraphs. Let's look at those first. The second sentence is ...

As we discovered in 2003 with the conclusion of the Human Genome Project, a monumental 13-year-long research effort to sequence the entire human genome, approximately 98.8 percent of our DNA was categorized as junk.

This is not correct. The paper on the finished version of the human genome sequence was published in October 2004 (Finishing the euchromatic sequence of the human genome) and the authors reported that the coding exons of protein-coding genes covered about 1.2% of the genome. However, the authors also noted that there are many genes for tRNAs, ribosomal RNAs, snoRNAs, microRNAs, and probably other functional RNAs. Although they don't mention it, the authors must also have been aware of regulatory sequences, centromeres, telomeres, origins of replication and possibly other functional elements. They never said that all noncoding DNA (98.8%) was junk because that would be ridiculous. It's even more ridiculous to say it in 2021 [Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means].

The part of the article that you can see also lists a few "Quick Takes" and one of them is ...

Close to 99 percent of our genome has been historically classified as noncoding, useless “junk” DNA. Consequently, these sequences were rarely studied.

This is also incorrect as many scientists have pointed out repeatedly over the past fifty years or so. At no time in the past 50 years has any knowledgeable scientist ever claimed that all noncoding DNA is junk. I'm sorely tempted to accuse the authors of this article of lying because they really should know better, especially if they're writing an article about junk DNA in 2021. However, I reluctantly defer to Hanlon's razor.

Mortola and Long claim that mammalian genomes have between 85% and 99% junk DNA and wonder if it could have a function.

To most geneticists, the answer was that it has no function at all. The flow of genetic information—the central dogma of molecular biology—seems to leave no role for all of our intergenic sequences. In the classical view, a gene consists of a sequence of nucleotides of four possible types--adenine, cytosine, guanine, and thymine--represented by the letters A, C, G, and T. Three nucleotides in a row make up a codon, with each codon corresponding to a specific amino acid, or protein subunit, in the final protein product. In active genes, harmful mutations are weeded out by selection and beneficial ones are allowed to persist. But noncoding regions are not expressed in the form of a protein, so mutations in noncoding regions can be neither harmful nor beneficial. In other words, "junk" mutations cannot be steered by natural selection.

Those of you who have read this far will cringe when reading that. There are so many obvious errors in that paragraph that applying Hanlon's razor seems very complimentary. Imagine saying in the 21st century that the Central Dogma leaves no role at all for regulatory sequences or ribosomal RNA genes! But there's more; the authors double down on their incorrect understanding of "gene" in order to fit their misunderstanding of the Central Dogma.

What Is a Gene, Really?

In our de novo gene studies in rice, to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

[Five Things You Should Know if You Want to Participate in the Junk DNA Debate]

The authors admit in the next paragraph that some pseudogenes may produce functional RNAs that are never translated into proteins but they don't mention any other types of gene. I can understand why you might concentrate on protein-coding genes if you are studying de novo genes but why not just say that there are two types of genes and either one can arise de novo? But there's another problem with their definition: they left out a key property of a gene. It's not sufficient that a given stretch of DNA is transcribed and the RNA is translated to make a protein: the protein has to have a function before you can say that the stretch of DNA is a gene [What Is a Gene?]. We'll see in a minute why this is important.

The main point of the paper is the birth of de novo genes and the authors discuss their work with the rice genome. They say they've discovered 175 de novo genes but they don't say how many have a real biological function. This is an important problem in this field and it would have been fascinating to see a description of how they go about assigning a function to their mostly small peptides [The evolution of de novo genes]. I'm guessing that they just assume a function as soon as they recognize an open reading frame in a transcript.

As you can see from the title of the article, the emphasis is on the idea that de novo genes can arise from junk DNA—a concept that's not seriously disputed. The one good thing about the article is that the authors do not directly state that the reason for junk DNA is to give rise to new genes but this caption is troubling.

The Human Genome Project was a 13-year-long research effort aimed at mapping the entire human genetic sequence. One of its most intriguing findings was the observation that the number of protein-coding genes estimated to exist in humans--approximately 22,300--represents a mere 1.2 percent of our whole genome, with the other 98.8 percent being categorized as noncoding, useless junk. Analyses of this presumed junk DNA in diverse species are now revealing its role in the creation of genes.

Why do science writers continue to spread misinformation about junk DNA when there's so much correct information out there? All you have to do is look [More misconceptions about junk DNA - what are we doing wrong?].


Saturday, July 29, 2023

How could a graduate student at King's College in London not know the difference between junk DNA and non-coding DNA?

There's something called "the EDIT lab blog" written by people at King's College London (UK). Here's a recent post (May 19, 2023) that was apparently written by a Ph.D. student: J for Junk DNA Does Not Exist!

It begins with the standard false history,

The discovery of the structure of DNA by James Watson and Francis Crick in 1953 was a milestone in the field of biology, marking a turning point in the history of genetics (Watson & Crick, 1953). Subsequent advances in molecular biology revealed that out of the 3 billion base pairs of human DNA, only around 2% codes for proteins; many scientists argued that the other 98% seemed like pointless bloat of genetic material and genomic dead-ends referred to as non-coding DNA, or junk DNA – a term you’ve probably come across (Ohno, 1972).

You all know what's coming next. The discovery of function in non-coding DNA overthrew the concept of junk DNA and ENCODE played a big role in this revolution. The post ends with,

Nowadays, researchers are less likely to describe any non-coding sequences as junk because there are multiple other and more accurate ways of labelling them. The discussion over non-coding DNA’s function is not over, and it will be long before we understand our whole genome. For many researchers, the field’s best way ahead is keeping an open mind when evaluating the functional consequences of non-coding DNA and RNA, and not to make assumptions about their biological importance.

As Sandwalk readers know, there was never a time when knowledgeable scientists said that all non-coding DNA was junk. They always knew that there was functional DNA outside of coding regions. Real open-minded scientists are able to distinguish between junk DNA and non-coding DNA and they are able to evaluate the evidence for junk DNA without dismissing it based on a misunderstanding of the history of the subject.

The question is why would a Ph.D. student who makes the effort to write a blog post on junk DNA not take the time to read up on the subject and learn the proper definition of junk and the actual evidence? Why would their supervisors and other members of the lab not know that this post is wrong?

It's a puzzlement.


Thursday, April 07, 2011

IDiots vs Francis Collins

Here's a video where several IDiots take on Francis Collins and his book The Language of God. Jonathan Wells is prominently featured in this video and much of the attack is concerned with junk DNA. Wells makes his position very clear. He claims that modern scientific evidence has overthrown the "myth" of junk DNA and those of us who still believe in junk DNA are guilty of a "Darwin of the gaps" kind of argument. That's because, according to Wells, we don't have an explanation for junk DNA, so we attribute it to Darwinian evolution (6:00 minute mark).

There is a lot of positive evidence that much of the DNA in our genome is non-functional. Wells is dead wrong about this. Furthermore, assuming that this junk DNA is non-functional and assuming that species share a common ancestor, we can explain many observations about genomes. IDiots can't do this. They have yet to provide an explanation for shared pseudogenes1 in the chimp and human genomes, for example. And I haven't heard any IDiot explain why the primate genomes are chock full of Alu sequences derived from a particular rearranged 7SL RNA sequence while rodent genomes have SINEs from a different rearranged 7SL sequence and lots of others from a tRNA pseudogene [Junk in Your Genome: SINES].


Did Francis Collins use the existence of junk DNA as support for Darwin's theory of evolution? Here's what Wells says in the video (50 second mark).
In fact he relies on so-called junk DNA—sequences of DNA that apparently have no function—as evidence that Darwin's theory explains everything we see in living things.
I searched The Language of God for proof that Wells is correct. The best example I could find is from pages 129-130 where Collins describes the results from comparing DNA sequences of different organisms. He points out that you can compare coding regions and detect similarities between humans and other mammals and even yeast and bacteria. On the other hand, if you look at non-coding regions the similarities fall off rapidly so that there's almost no similarity between human DNA and non-mammalian genomes (e.g. chicken). This is powerful support for "Darwin's theory of evolution" according to Francis Collins. First, because you can construct phylogenetic trees based on DNA sequences and ...
Second, within the genome, Darwin's theory predicts that mutations that do not affect function (namely, those located in "junk DNA") will accumulate steadily over time. Mutations in the coding regions of genes, however, are expected to be observed less frequently, since most of these will be deleterious, and only a rare such event will provide a selective advantage and be retained during the evolutionary process. That is exactly what is observed.
I leave it as an exercise for Sandwalk readers to figure out how to explain this observation if the regions that accumulate fixed mutations aren't really junk but functional DNA. Your explanation should consist of two parts: (1) why the DNA is functional even though the sequence isn't conserved (provide evidence)2, and (2) why coding regions show fewer changes and why comparisons of different species lead to a tree-like organization.

I wasn't able to find where in Origin of Species Darwin discusses this prediction but I'm sure it must be there somewhere. Perhaps some kind reader can supply the page numbers?


1. Every knowledgeable, intelligent biologist knows that pseudogenes exist and they are junk. That's not in dispute. I haven't heard any Intelligent Design Creationists admit that there are thousands of functionless pseudogenes in our genome.

2. I can think of two or three possibilities but no evidence to support them.

Tuesday, November 05, 2013

Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means

Axel Visel is a member of the ENCODE Consortium. He is a Staff Scientist at the Lawrence Berkeley National Laboratory in Berkeley, California (USA). Axel Visel is responsible, in part, for the publicity fiasco of September 2012 where the entire ENCODE Consortium gave the impression that most of our genome is functional.

He is also the senior author on a paper I blogged about last week—the one where some journalists made a big deal about junk DNA when there was nothing in the paper about junk DNA [How to Turn a Simple Paper into a Scientific Breakthrough: Mention Junk DNA].

Dan Graur contacted him by email to see if he had any comment about this misrepresentation of his published work and he defended the journalist. Here's the email response from Axel Visel to Dan Graur.

Saturday, March 21, 2015

Junk DNA comments in the New York Times Magazine

It's always fun to be quoted in The New York Times Magazine but there's a more serious issue to discuss. I'm referring to a brief article about online comments after Carl Zimmer published a piece on "Is Most of Our DNA Garbage?" a few weeks ago. If you read the comments under that article you'll discover that we have a lot of work to do if we are going to convince the general public that our genome is full of junk.

Thursday, November 29, 2007

More Misconceptions About Junk DNA

 
iayork of Mystery Rays from Outer Space is upset about an article on retroviruses that appeared in The New Yorker [“Darwin’s Surprise” in the New Yorker]. Here's what iayork says,
The whole “junk DNA” has been thrashed out a dozen times (see Genomicron for a good start). The bottom line? If you search Pubmed for the phrase “junk DNA” you will find a total of 80 articles (compare to, say, 985 articles for “endogenous retrovirus”); and a large fraction of those 80 articles only use the phrase to explain what a poor term it is. Scientists don’t use the term “junk DNA’. Lazy journalists use it so they can sneer at scientists (who don’t use it) for using it.
Wrong! Lots of scientists use the term "junk DNA." Properly understood, it's a very useful term and has been for several decades [Noncoding DNA and Junk DNA].

Yes, it's true that journalists often don't understand junk DNA and they are easily tricked into thinking that junk DNA is a discredited concept. The journalists are wrong, not the scientists who use the term.

It's also true that there are many scientists who feel uneasy about junk DNA because it doesn't fit with their adaptationist leanings. Just because there's controversy doesn't mean that the term isn't still used by its proponents (I am one).

I'm sorry, iayork, but statements like that don't help in educating people about science. Junk DNA is that fraction of a genome that has no known function and, based on everything we know about biology, is unlikely to have some unknown function. Junk DNA happens to represent more than 90% of our genome and that's significant by my standards.


Monday, July 11, 2016

A genetics professor who rejects junk DNA

Praveen Sethupathy is a genetics professor at the University of North Carolina in Chapel Hill, North Carolina, USA.

He explains why he is a Christian and why he is "more than his genes" in Am I more than my genes? Faith, identity, and DNA.

Here's the opening paragraph ...
The word “genome” suggests to many that our DNA is simply a collection of genes from end-to-end, like books on a bookshelf. But it turns out that large regions of our DNA do not encode genes. Some once called these regions “junk DNA.” But this was a mistake. More recently, they have been referred to as the “dark matter” of our genome. But what was once dark is slowly coming to light, and what was once junk is being revealed as treasure. The genome is filled with what we call “control elements” that act like switches or rheostats, dialing the activation of nearby genes up and down based on whatever is needed in a particular cell. An increasing number of devastating complex diseases, such as cancer, diabetes, and heart disease, can often be traced back, in part, to these rheostats not working properly.

Thursday, September 20, 2012

Are All IDiots Irony Deficient?

As I'm sure you can imagine, the Intelligent Design Creationists are delighted with the ENCODE publicity. This is a case where some expert scientists support one of their pet beliefs; namely, that there's no such thing as junk DNA. The IDiots tend not to talk about other expert evolutionary biologists who disagree with them—those experts are biased Darwinists or are part of a vast conspiracy to mislead the public.

You might think that distinguishing between these two types of expert scientists would be a real challenge and you would be right. Let's watch how David Klinghoffer manoeuvres through this logical minefield at: ENCODE Results Separate Science Advocates from Propagandists. He begins with ...
"I must say," observes an email correspondent of ours, who is also a biologist, "I'm getting a kick out of watching evolutionary biologists attack molecular biologists for 'hyping' the ENCODE results."

True, and equally enjoyable -- in the sense of confirming something you strongly suspected already -- is seeing the way the ENCODE news has drawn a bright line between voices in the science world that care about science and those that are more focussed on the politics of science, even as they profess otherwise.

Wednesday, March 13, 2024

Nils Walter disputes junk DNA: (7) Conservation of transcribed DNA

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the seventh post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth and sixth posts address specific arguments in the junk DNA debate.


Sequence conservation

If you don't know what a transcript is doing then how are you going to know whether it's a spurious transcript or one with an unknown function? One of the best ways is to check and see whether the DNA sequence is conserved. There's a powerful correlation between sequence conservation and function: as a general rule, functional sequences are conserved and non-conserved sequences can be deleted without consequence.

There might be an exception to the conservation criterion in the case of de novo genes. They arise relatively recently so there's no history of conservation. That's why purifying selection is a better criterion. Now that we have the sequences of thousands of human genomes, we can check to see whether a given stretch of DNA is constrained by selection or whether it accumulates mutations at the rate we expect if its sequence were irrelevant junk DNA (neutral rate). The results show that less than 10% of our genome is being preserved by purifying selection. This is consistent with all the other arguments that 90% of our genome is junk and inconsistent with arguments that most of our genome is functional.
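As a rough illustration of that test (not from Walter's paper; the substitution counts and the neutral rate below are made-up numbers for a hypothetical region), one can ask whether a stretch of DNA shows significantly fewer substitutions than the neutral expectation (Python):

# Sketch: flag a region as constrained (purifying selection) if it shows
# significantly fewer substitutions than expected at the assumed neutral rate.
from scipy.stats import binomtest

def constrained(observed_subs, sites, neutral_rate=0.001, alpha=0.01):
    # One-sided binomial test: fewer substitutions than the neutral expectation
    test = binomtest(observed_subs, sites, neutral_rate, alternative='less')
    return test.pvalue < alpha

print(constrained(12, 50_000))   # True: consistent with purifying selection
print(constrained(48, 50_000))   # False: indistinguishable from neutral (junk-like)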

This sounds like a problem for the anti-junk crowd. Let's see how it's addressed in Nils Walter's article in BioEssays.

There are several hand-waving objections to using conservation as an indication of function and Walter uses them all plus one unique argument that we'll get to shortly. Let's deal with some of the "facts" that he discusses in his defense of function. He seems to agree that much of the genome is not conserved even though it's transcribed. In spite of this, he says,

"... the estimates of the fraction of the human genome that carries function is still being upward corrected, with the best estimate of confirmed ncRNAs now having surpassed protein-coding genes,[12] although so far only 10%–40% of these ncRNAs have been shown to have a function in, for example, cell morphology and proliferation, under at least one set of defined conditions."

This is typical of the rhetoric in his discussion of sequence conservation. He seems to be saying that there are more than 20,000 "confirmed" non-coding genes but only 10%-40% of them have been shown to have a function! That doesn't make any sense since the whole point of this debate is how to identify function.

Here's another bunch of arguments that Walter advances to demonstrate that a given sequence could be functional but not conserved. I'm going to quote the entire thing to give you a good sense of Walter's opinion.

A second limitation of a sequence-based conservation analysis of function is illustrated by recent insights from the functional probing of riboswitches. RNA structure, and hence dynamics and function, is generally established co-transcriptionally, as evident from, for example, bacterial ncRNAs including riboswitches and ribosomal RNAs, as well as the co-transcriptional alternative splicing of eukaryotic pre-mRNAs, responsible for the important, vast diversification of the human proteome across ∼200 cell types by excision of varying ncRNA introns. In the latter case, it is becoming increasingly clear that splicing regulation involves multiple layers synergistically controlled by the splicing machinery, transcription process, and chromatin structure. In the case of riboswitches, the interactions of the ncRNA with its multiple protein effectors functionally engage essentially all of its nucleotides, sequence-conserved or not, including those responsible for affecting specific distances between other functional elements. Consequently, the expression platform—equally important for the gene regulatory function as the conserved aptamer domain—tends to be far less conserved, because it interacts with the idiosyncratic gene expression machinery of the bacterium. Consequently, taking a riboswitch out of this native environment into a different cell type for synthetic biology purposes has been notoriously challenging. These examples of a holistic functioning of ncRNAs in their species-specific cellular context lay bare the limited power of pure sequence conservation in predicting all functionally relevant nucleotides.

I don't know much about riboswitches so I can't comment on that. As for alternative splicing, I assume he's suggesting that much of the DNA sequence for large introns is required for alternative splicing. That's just not correct. You can have effective alternative splicing with small introns. The only essential parts of intron sequences are the splice sites and a minimum amount of spacer.

Part of what he's getting at is the fact that you can have a functional transcript where the actual nucleotide sequence doesn't matter so it won't look conserved. That's correct. There are such sequences. For example, there seem to be some examples of enhancer RNAs, which are transcripts in the regulatory region of a gene where it's the act of transcription that's important (to maintain an open chromatin conformation, for example) and not the transcript itself. Similarly, not all intron sequences are junk because some spacer sequence is required to maintain a minimum distance between splice sites. All this is covered in Chapter 8 of my book ("Noncoding Genes and Junk RNA").

Are these examples enough to toss out the idea of sequence conservation as a proxy for function and assume that there are tens of thousands of such non-conserved genes in the human genome? I think not. The null hypothesis still holds. If you don't have any evidence of function then the transcript doesn't have a function—you may find a function at some time in the future but right now it doesn't have one. Some of the evidence for function could be sequence conservation but the absence of conservation is not an argument for function. If conservation doesn't work then you have to come up with some other evidence.

It's worth mentioning that, in the broadest sense, purifying selection isn't confined to nucleotide sequence. It can also take into account deletions and insertions. If a given region of the genome is deficient in random insertions and deletions then that's an indication of function in spite of the fact that the nucleotide sequence isn't maintained by purifying selection. The maintenance definition of function isn't restricted to sequence—it also covers bulk DNA and spacer DNA.

(This is a good time to bring up a related point. The absence of conservation (size or sequence) is not evidence of junk. Just because a given stretch of DNA isn't maintained by purifying selection does not prove that it is junk DNA. The evidence for a genome full of junk DNA comes from different sources and that evidence doesn't apply to every little bit of DNA taken individually. On the other hand, the maintenance function argument is about demonstrating whether a particular region has a function or not and it's about the proper null hypothesis when there's no evidence of function. The burden of proof is on those who claim that a transcript is functional.)

This brings us to the main point of Walter's objection to sequence conservation as an indication of function. You can see hints of it in the previous quotation where he talks about "holistic functioning of ncRNAs in their species-specific cellular context," but there's more ...

Some evolutionary biologists and philosophers have suggested that sequence conservation among genomes should be the primary, or perhaps only, criterion to identify functional genetic elements. This line of thinking is based on 50 years of success defining housekeeping and other genes (mostly coding for proteins) based on their sequence conservation. It does not, however, fully acknowledge that evolution does not actually select for sequence conservation. Instead, nature selects for the structure, dynamics and function of a gene, and its transcription and (if protein coding) translation products; as well as for the inertia of the same in pathways in which they are not involved. All that, while residing in the crowded environment of a cell far from equilibrium that is driven primarily by the relative kinetics of all possible interactions. Given the complexity and time dependence of the cellular environment and its environmental exposures, it is currently impossible to fully understand the emergent properties of life based on simple cause-and-effect reasoning.

The way I see it, his most important argument is that life is very complicated and we don't currently understand all of its emergent properties. This means that he is looking for ways to explain the complexity that he expects to be there. The possibility that there might be several hundred thousand regulatory RNAs seems to fulfil this need so they must exist. According to Nils Walter, the fact that we haven't (yet) proven that they exist is just a temporary lull on the way to rigorous proof.

This seems to be a common theme among those scientists who share this viewpoint. We can see it in John Mattick's writings as well. It's as though the logic of having a genome full of regulatory RNA genes is so powerful that it doesn't require strong supporting evidence and can't be challenged by contradictory evidence. The argument seems somewhat mystical to me. Its proponents are making the a priori assumption that humans just have to be a lot more complicated than what "reductionist" science is indicating and all they have to do is discover what that extra layer of complexity is all about. According to this view, the idea that our genome is full of junk must be wrong because it seems to preclude the possibility that our genome could explain what it's like to be human.


Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Thursday, May 30, 2013

What Does the Bladderwort Genome Tell Us about Junk DNA?

The so-called "C-Value Paradox" was described over thirty years ago. Here's how Benjamin Lewin explained it in Genes II (1983).
The C value paradox takes its name from our inability to account for the content of the genome in terms of known function. One puzzling feature is the existence of huge variations in C values between species whose apparent complexity does not vary correspondingly. An extraordinary range of C values is found in amphibians where the smallest genomes are just below 10⁹ bp while the largest are almost 10¹¹. It is hard to believe that this could reflect a 100-fold variation in the number of genes needed to specify different amphibians.
Since then we have accumulated dozens and dozens of examples of very similar looking species with vastly different genome sizes. These observations require an explanation and the best explanation by far is that most of the DNA in the genomes of multicellular species is junk. In fact, it's the data on genome sizes that provide the best evidence for junk DNA.
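Here's the back-of-the-envelope arithmetic that turns the C-value observations into a junk estimate (the 100 Mb figure for functional DNA is an assumed round number for illustration, not a measurement):

# If similar-looking species need roughly the same amount of functional DNA,
# then the bigger genomes must be mostly junk.
functional_bp = 100e6                    # assume ~100 Mb of functional DNA
for genome_bp in (1e9, 1e10, 1e11):      # the ~100-fold amphibian range
    junk = 1 - functional_bp / genome_bp
    print(f"{genome_bp:.0e} bp genome -> {junk:.1%} junk")
# 1e+09 bp genome -> 90.0% junk
# 1e+10 bp genome -> 99.0% junk
# 1e+11 bp genome -> 99.9% junk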

Monday, December 19, 2011

Jonathan McLatchie and Junk DNA

 
THEME: Genomes & Junk DNA
Jonathan McLatchie takes on PZ Myers in a spirited attack on junk DNA [Treasure in the Genetic Goldmine: PZ Myers Fails on "Junk DNA"]. The Intelligent Design Creationists are convinced that most of our genome is functional because that's what a good designer would create. They claim that junk DNA is a myth and their "evidence" is selective quotations from the scientific literature. They ignore the big picture, as they so often do.

I discussed most of the creationist arguments in my review of The Myth of Junk DNA.

Jonathan McLatchie analyzes three arguments made by PZ Myers in his presentation at Skepticon IV. In that talk PZ said that introns are junk, telomeres are junk, and transposons are junk. I have already stated that I disagree with PZ on these points [see PZ Myers Talks About Junk DNA]. Now I want to be clear on why Jonathan McLatchie is wrong.
  1. Introns are mostly junk. I think PZ exaggerated a bit when he dismissed all introns as junk. My position is that we should treat introns as functional elements of a gene even though many (but not all) of them could probably be deleted without affecting the survival of the species. Each intron has about 50-80 bp of essential information that's required for proper splicing [Junk in Your Genome: Protein-Encoding Genes]. The rest of the intron, which can be thousands of base pairs in length, is mostly junk [Junk in Your Genome: Intron Size and Distribution]. Some introns contain essential gene regulatory regions and some contain essential genes. That does not mean that all intron sequences are functional. (A rough calculation follows this list.)
  2. Telomeres are not junk. I don't think telomeres are junk [Telomeres]. They are absolutely required for proper DNA replication. PZ Myers agrees that telomeres (and centromeres) are functional DNA (28 minutes into the talk). Jonathan McLatchie claims that PZ describes telomeres as junk DNA, "Myers departs from the facts, however, when he asserts that these telomeric repetitive elements are non-functional." McLatchie is not telling the truth.
  3. Defective Transposons are Junk. PZ Myers talks about transposons as mobile genetic elements and states that transposons make up more than half of our genome. That's all junk according to PZ Myers. My position is that the small number of active transposons are functional selfish genes and the real junk is the defective transposon sequences that make up most of the genome [Transposon Insertions in the Human Genome]. Thus, I differ a bit from PZ's position. Jonathan McLatchie, like Jonathan Wells, argues that because the occasional defective transposon in the odd species has acquired a function, this means that most of the defective transposon sequences (~50% of the genome) are functional. This is nonsense.
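Here's the rough calculation behind point 1 (the intron size is an assumed typical value, not a figure from PZ's talk or my earlier posts):

# Only ~50-80 bp of a typical intron is essential splicing information;
# the rest of the intron is mostly junk.
essential_bp = 80              # upper end of the essential splicing signals
typical_intron_bp = 5_000      # assumed size of an average human intron
junk_fraction = 1 - essential_bp / typical_intron_bp
print(f"~{junk_fraction:.0%} of a {typical_intron_bp:,} bp intron is junk")   # ~98%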

[Image Credit: The image shows human chromosomes labelled with a telomere probe (yellow), from Christopher Counter at Duke University.]

Monday, September 10, 2012

Science Writes Eulogy for Junk DNA

Elizabeth Pennisi is a science writer for Science, the premier American science journal. She's been writing about "dark matter" for years, focusing on how little we know about most of the human genome and ignoring all of the data that says it's mostly junk [see SCIENCE Questions: Why Do Humans Have So Few Genes?].

It doesn't take much imagination to guess what Elizabeth Pennisi was going to write when she heard about the new ENCODE data. Yep, you guessed it. She says that the ENCODE Project Writes Eulogy for Junk DNA.

THEME: Genomes & Junk DNA
Let's look at the opening paragraph in her "eulogy."
When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000 and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.

Saturday, October 05, 2013

Barry Arrington, Junk DNA, and Why We Call Them Idiots

You're really not going to believe what's going on over at Uncommon Descent. Not only are we witnessing the meltdown of Barry Arrington, but we may also be witnessing the beginning of the end of Intelligent Design Creationism. The IDiots are manoeuvring themselves into such an extreme position that no intelligent person can possibly support them. Just read the comments.

I'm reminded of the word "pathos" but I had to look it up to make sure I got it right. It means something that causes people to feel pity, sadness, or even compassion. It's the right word to describe what's happening. It's also similar to the word "pathetic."

Here's what's happening.

As you know, Barry Arrington claimed that the IDiots made a prediction. They predicted that there's no such thing as junk DNA. They predicted that most of our genome would turn out to have a function [Let’s Put This One To Rest Please]. That much is true. It makes perfect sense because an Intelligent Design Creator wouldn't create a genome that was 90% junk.

Tuesday, May 24, 2011

Junk & Jonathan: Part 6—Chapter 3

This is part 6 of my review of The Myth of Junk DNA. For a list of other postings on this topic see the link to Genomes & Junk DNA in the "theme box" below or in the sidebar under "Themes."

We learn in Chapter 9 that Wells has two categories of evidence against junk DNA. The first covers evidence that sequences probably have a function and the second covers specific known examples of functional sequences. In the first category there are two lines of evidence: transcription and conservation. Both of them are covered in Chapter 3 making this one of the most important chapters in the book. The remaining category of specific examples is described in Chapters 4-7.

The title of Chapter 3 is Most DNA Is Transcribed into RNA. As you might have anticipated, the focus of Wells' discussion is the ENCODE pilot project that detected abundant transcription in the 1% of the genome that they analyzed (ENCODE Project Consortium, 2007). Their results suggest that most of the genome is transcribed. Other studies support this idea and show that transcripts often overlap and many of them come from the opposite strand in a gene giving rise to antisense RNAs.

The original Nature paper says,
... our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another.
The authors of these studies firmly believe that evidence of transcription is evidence of function. This has even led some of them to propose a new definition of a gene [see What is a gene, post-ENCODE?]. There's no doubt that many molecular biologists take this data to mean that most of our genome has a function and that's the same point that Wells makes in his book. It's evidence against junk DNA.

What are these transcripts doing? Wells devotes a section to "Specific Functions of Non-Protein-Coding RNAs." These RNAs may be news to most readers but they are well known to biochemists and molecular biologists. This is not the place to describe all the known functional non-coding RNAs but keep in mind that there are three main categories: ribosomal RNA (rRNA), transfer RNA (tRNA), and a heterogeneous category called small RNAs. There are dozens of different kinds of small RNAs including unique ones such as the 7SL RNA of the signal recognition particle, the RNA component of RNase P, and the guide RNA in telomerase. Other categories include the spliceosome RNAs, snoRNAs, piRNAs, siRNAs, and miRNAs. These RNAs have been studied for decades. It's important to note that the confirmed examples are transcribed from genes that make up less than 1% of the genome.

One interesting category is called "long noncoding RNAs" or lncRNAs. As the name implies, these RNAs are longer than the typical small RNAs. Their functions, if any, are largely unknown although a few have been characterized. If we add up all the genes for these RNAs and assume they are functional it will account for about 0.1% of the genome so this isn't an important category in the discussion about junk DNA.

Theme: Genomes & Junk DNA
So, we're left with a puzzle. If more than 90% of the genome is transcribed but we only know about a small number of functional RNAs then what about the rest?

Opponents of junk DNA—both creationists and scientists—would have you believe that there's a lot we don't know about genomes and RNA. They believe that we will eventually find functions for all this RNA and prove that the DNA that produces them isn't junk. This is a genuine scientific controversy. What do their scientific opponents (I am one) say about the ENCODE result?

Criticisms of the ENCODE analysis take two forms ...
  • The data is wrong and only a small fraction of the genome is transcribed
  • The data is mostly correct but the transcription is spurious and accidental. Most of the products are junk RNA.
Criticisms of the Data

Several papers have appeared that call into question the techniques used by the ENCODE consortium. They claim that many of the identified transcribed regions are artifacts. This is especially true of the repetitive regions of the genome that make up more than half of the total content. If any one of these regions is transcribed then the transcript will likely hybridize to the remaining repeats giving a false impression of the amount of DNA that is actually transcribed.

Of course, Wells doesn't mention any of these criticisms in Chapter 3. In fact, he implies that every published paper is completely accurate in spite of the fact that most of them have never been replicated and many have been challenged by subsequent work. The readers of The Myth of Junk DNA will assume, intentionally or otherwise, that if a paper appears in the scientific literature it must be true.

But criticisms of the ENCODE results are so widespread that they can't be ignored, so Wells is forced to deal with them in Chapter 8. (Why not in Chapter 3 when they are first mentioned?) In particular, Wells has to address the van Bakel et al. (2010) paper from Tim Hughes' lab here in Toronto. This paper was widely discussed when it came out last year [see: Junk RNA or Imaginary RNA?]. We'll deal with it when I cover Chapter 9 but, suffice to say, Wells dismisses the criticism.

Criticisms of the Interpretation

The other form of criticism focuses on the interpretation of the data rather than its accuracy. Most of us who teach transcription take pains to point out to our students that RNA polymerase binds non-specifically to DNA and that much of this binding will result in spurious transcription at a very low frequency. This is exactly what we expect from a knowledge of transcription initiation [How RNA Polymerase Binds to DNA]. The ENCODE data shows that most of the genome is "transcribed" at a frequency of once every few generations (or days) and this is exactly what we expect from spurious transcription. The RNAs are non-functional accidents due to the sloppiness of the process [Useful RNAs?].

Wells doesn't mention any of this. I don't know if that's because he's ignorant of the basic biochemistry and hasn't read the papers or whether he is deliberately trying to mislead his readers. It's probably a bit of both.

It's not as if this is some secret known only to the experts. The possibility of spurious transcription has come up frequently in the scientific literature in the past few years. For example, Guttman et al. (2009) write,
Genomic projects over the past decade have used shotgun sequencing and microarray hybridization to obtain evidence for many thousands of additional non-coding transcripts in mammals. Although the number of transcripts has grown, so too have the doubts as to whether most are biologically functional. The main concern was raised by the observation that most of the intergenic transcripts show little to no evolutionary conservation. Strictly speaking, the absence of evolutionary conservation cannot prove the absence of function. But the remarkably low rate of conservation seen in the current catalogues of large non-coding transcripts (less than 5% of cases) is unprecedented and would require that each mammalian clade evolves its own distinct repertoire of non-coding transcripts. Instead, the data suggest that the current catalogues may consist largely of transcriptional noise, with a minority of bona fide functional lincRNAs hidden amid this background.
This paper is in the Wells reference list so we know that he has read it.

What these authors are saying is that the data is consistent with spurious transcription (noise). Part of the evidence is the lack of any sequence conservation among the transcripts. It's as though they were mostly derived from junk DNA.

Sequence Conservation

Recall that the purpose of Chapter 3 is to show that junk DNA is probably functional. The first part of the chapter reportedly shows that most of our genome is transcribed. The second part addresses sequence conservation.

Here's what Wells says about sequence conservation.
Widespread transcription of non-protein-coding DNA suggests that the RNAs produced from such DNA might serve biological functions. Ironically, the suggestion that much non-protein-coding DNA might be functional also comes from evolutionary theory. If two lineages diverge from a common ancestor that possesses regions of non-protein-coding DNA, and these regions are really nonfunctional, then they will accumulate random mutations that are not weeded out by natural selection. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will probably be very different. [Due to fixation by random genetic drift—LAM] On the other hand, if the original non-protein-coding DNA was functional, then natural selection will tend to weed out mutations affecting that function. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will still be similar. (In evolutionary terminology, the sequences will be "conserved.") Turning the logic around, Darwinian theory implies that if evolutionarily divergent organisms share similar non-protein-coding DNA sequences, those sequences are probably functional.
Wells then references a few papers that have detected such conserved sequences, including the Guttman et al. (2009) paper mentioned above. They found "over a thousand highly conserved large non-coding RNAs in mammals." Indeed they did, and this is strong evidence of function.1 Every biochemist and molecular biologist will agree. One thousand lncRNAs represent 0.08% of the genome. The sum total of all other conserved sequences is also less than 1%. Wells forgets to mention this in his book. He also forgets to mention the other point that Guttman et al. make; namely, that the lack of sequence conservation suggests that the vast majority of transcripts are non-functional. (Oops!)
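For readers who want to check the 0.08% figure, the arithmetic looks like this (the average lncRNA gene length and the genome size are round assumptions):

# One thousand conserved large non-coding RNAs as a fraction of the genome
lncRNAs = 1_000
mean_length_bp = 2_500        # assumed average lncRNA gene length
genome_bp = 3.2e9             # approximate size of the human genome
print(f"{lncRNAs * mean_length_bp / genome_bp:.2%} of the genome")   # 0.08%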

There's irony here. We know that the sequences of junk DNA are not conserved and this is taken as evidence (not conclusive) that the DNA is non-functional. The genetic load argument makes the same point. We know that the vast majority of spurious RNA transcripts are also not conserved from species to species and this strongly suggests that those RNAs are not functional. Wells ignores this point entirely—it never comes up anywhere in his book. On the other hand, when a small percentage of DNA (and transcripts) is conserved, this gets prominent mention.

Wells doesn't believe in common ancestry so he doesn't believe that sequences are "conserved." (Presumably they reflect common design or something like that.) Nevertheless, when an evolutionary argument of conservation suits his purpose he's happy to invoke it, while, at the same time, ignoring the far more important argument about lack of conservation of the vast majority of spurious transcripts. Isn't that strange behavior?

The bottom line here is that Jonathan Wells is correct to point to the ENCODE data as a problem for junk DNA proponents. This is part of the ongoing scientific controversy over the amount of junk in our genome. Where I fault Wells is his failure to explain to his readers that this is disputed data and interpretation. There's no slam-dunk case for function here. In fact, the tide seems to be turning more and more against the original interpretation of the data. Most knowledgeable biochemists and molecular biologists do not believe that >90% of our genome is transcribed to produce functional RNAs.

UPDATE: How much of the genome do we expect to be transcribed on a regular basis? Protein-encoding genes account for about 30% of the genome, including introns (mostly junk). They will be transcribed. Other genes produce functional RNAs and together they cover about 3% of the genome. Thus, we expect that roughly a third of the genome will be transcribed at some time during development. We also expect that a lot more of the genome will be transcribed on rare occasions just because of spurious (accidental) transcription initiation. This doesn't count. Some pseudogenes, defective transposons, and endogenous retroviruses have retained the ability to be transcribed on a regular basis. This may account for another 1-2% of the genome. They produce junk RNA.
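Adding up the fractions in that update (using the midpoint of the 1-2% figure) gives the expected amount of regularly transcribed DNA:

# Expected fraction of the genome transcribed on a regular basis
fractions = {
    "protein-coding genes (exons plus mostly-junk introns)": 30.0,
    "genes for functional RNAs": 3.0,
    "transcribed pseudogenes, defective transposons, ERVs": 1.5,
}
total = sum(fractions.values())
print(f"~{total:.0f}% transcribed regularly")   # ~34%, roughly a third of the genome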


1. Conservation is not proof of function. In an effort to test this hypothesis Nóbrega et al. (2004) deleted two large regions of the mouse genome containing large numbers of sequences corresponding to conserved non-coding RNAs. They found that the mice with the deleted regions showed no phenotypic effects, indicating that the DNA was junk. Jonathan Wells forgot to mention this experiment in his book.

Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved non-coding RNAs in mammals. Nature 458:223-227. [NIH Public Access]

Nóbrega, M.A., Zhu, Y., Plajzer-Frick, I., Afzal, V. and Rubin, E.M. (2004) Megabase deletions of gene deserts result in viable mice. Nature 431:988-993. [Nature]

The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PDF]

Saturday, January 06, 2024

Why do Intelligent Design Creationists lie about junk DNA?

A recent post on Evolution News (sic) promotes a new podcast: Casey Luskin on Junk DNA’s “Kuhnian Paradigm Shift”. You can listen to the podcast here but most Sandwalk readers won't bother because they've heard it all before. [see Paradigm shifting.]

Luskin repeats the now familiar refrain of claiming that scientists used to think that all non-coding DNA was junk. Then he goes on to list recent discoveries showing that some of this non-coding DNA is functional. The truth is that no knowledgeable scientist ever claimed that all non-coding DNA was junk. The original idea of junk DNA was based on evidence that only 10% of the genome is functional and these scientists knew that coding regions occupied only a few percent. Thus, right from the beginning, the experts on genome evolution knew about all sorts of functional non-coding DNA such as regulatory sequences, non-coding genes, and other things.
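The arithmetic behind that last point is simple (the 10% and ~1.3% values are the commonly cited approximations, used here only for illustration):

# Functional DNA known to the early proponents of junk DNA
functional_fraction = 0.10       # ~10% of the genome inferred to be functional
coding_fraction = 0.013          # coding exons are only ~1.3% of the genome
noncoding_functional = functional_fraction - coding_fraction
print(f"~{noncoding_functional:.1%} of the genome is functional but non-coding")   # ~8.7%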

Friday, March 27, 2015

Plant biologists are confused about the meanings of junk DNA and genes

A recent issue of Nature contains a report on plant micro-RNAs (Lauressergues et al., 2015). The authors found that certain genes for plant micro-RNAs encoded short peptides in the micro-RNA precursors and those peptides seemed to have a biological function. What this means is that part of the longer precursor RNA that is cleaved to produce the final micro-RNA may have a function that wasn't recognized. If you thought that the part of the precursor that was thought to be discarded as useless junk was, in fact, junk, then you were wrong—at least for some genes.

This is not a big deal and the authors of the paper don't even mention junk DNA.

The paper was reviewed by Peter M. Waterhouse and Roger P. Hellens in the same issue (Waterhouse and Hellens, 2015). They think it's a big deal. Here's what they say,

Friday, November 20, 2015

The truth about ENCODE

A few months ago I highlighted a paper by Casane et al. (2015) where they said ...
In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.
How did we get to this stage where the most publicized result of papers published by leading scientists in the best journals turns out to be wrong, but hardly anyone knows it?

Back in September 2012, the ENCODE Consortium was preparing to publish dozens of papers on their analysis of the human genome. Most of the results were quite boring but that doesn't mean they were useless. The leaders of the Consortium must have been worried that science journalists would not give them the publicity they craved so they came up with a strategy and a publicity campaign to promote their work.

Their leader was Ewan Birney, a scientist with valuable skills as a herder of cats but little experience in evolutionary biology and the history of the junk DNA debate.

The ENCODE Consortium decided to add up all the transcription factor binding sites—spurious or not—and all the chromatin markers—whether or not they meant anything—and all the transcripts—even if they were junk. With a little judicious juggling of numbers they came up with the following summary of their results (Birney et al., 2012) ...
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
See What did the ENCODE Consortium say in 2012? for more details on what the ENCODE Consortium leaders said, and did, when their papers came out.

The bottom line is that these leaders knew exactly what they were doing and why. By saying they have assigned biochemical functions for 80% of the genome they knew that this would be the headline. They knew that journalists and publicists would interpret this to mean the end of junk DNA. Most of ENCODE leaders actually believed it.

That's exactly what happened ... aided and abetted by the ENCODE Consortium, the journals Nature and Science, and gullible science journalists all over the world. (Ryan Gregory has published a list of articles that appeared in the popular press: The ENCODE media hype machine.)

Almost immediately, knowledgeable scientists and science writers tried to expose the publicity campaign hype. The first criticisms appeared on various science blogs and this was followed by a series of papers in the published scientific literature. Ed Yong, an experienced science journalist, interviewed Ewan Birney and blogged about ENCODE on the first day. Yong reported the standard publicity hype that most of our genome is functional, an interpretation confirmed by Ewan Birney and other senior scientists. Two days later, Ed Yong started adding updates to his blog posting after reading the blogs of many scientists, including some who were well-recognized experts on genomes and evolution [ENCODE: the rough guide to the human genome].

Within a few days of publishing their results the ENCODE Consortium was coming under intense criticism from all sides. A few journalists, like John Timmer, recognized right away what the problem was ...
Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.


[Most of what you read was wrong: how press releases rewrote scientific history]
Nature may have begun to realize that it made a mistake in promoting the idea that most of our genome was functional. Two days after the papers appeared, Brendan Maher, a Feature Editor for Nature, tried to get the journal off the hook but only succeeded in making matters worse [see Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco].

Meanwhile, two private for-profit companies, Illumina and Nature, teamed up to promote the ENCODE results in a video. They even hired Tim Minchin to narrate it. This is what hype looks like ...


Soon articles began to appear in the scientific literature challenging the ENCODE Consortium's interpretation of function and explaining the difference between an effect—such as the binding of a transcription factor to a random piece of DNA—and a true biological function.

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC Biology, 11:58. [doi: 10.1186/1741-7007-11-58]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in Biology and Medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

By March 2013—six months after publication of the ENCODE papers—some editors at Nature decided that they had better say something else [see Anonymous Nature Editors Respond to ENCODE Criticism]. Here's the closest thing to an apology that they have ever written ...
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”.
Oops! The importance of junk DNA is still an "important, open and debatable question" in spite of what the video sponsored by Nature might imply.

(To this day, neither Nature nor Science has actually apologized for misleading the public about the ENCODE results. [see Science still doesn't get it])

The ENCODE Consortium leaders responded in April 2014—eighteen months after their original papers were published.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

In that paper they acknowledge that there are multiple meanings of the word "function" and that their choice of "biochemical function" may not have been the best one ...
However, biochemical signatures are often a consequence of function, rather than causal. They are also not always deterministic evidence of function, but can occur stochastically.
This is exactly what many scientists have been telling them. Apparently they did not know this in September 2012.

They also include in their paper a section on "Case for Abundant Junk DNA." It summarizes the evidence for junk DNA, evidence that the ENCODE Consortium did not acknowledge in 2012 and certainly didn't refute.

In answer to the question, "What Fraction of the Human Genome Is Functional?" they now concede that ENCODE hasn't answered that question and that more work is needed. They now claim that the real value of ENCODE is to provide "high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions."
We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.
There you have it, straight from the horse's mouth. The ENCODE Consortium now believes that you should NOT interpret their results to mean that 80% of the genome is functional and therefore not junk DNA. There is good evidence for abundant junk DNA and the issue is still debatable.

I hope everyone pays attention and stops referring to the promotional hype saying that ENCODE has refuted junk DNA. That's not what the ENCODE Consortium leaders now say about their results.


Casane, D., Fumey, J., and Laurenti, P. (2015) L'apophénie d'ENCODE ou Pangloss examine le génome humain [ENCODE's apophenia, or Pangloss examines the human genome]. Med. Sci. (Paris) 31:680-686. [doi: 10.1051/medsci/20153106023]

Friday, May 23, 2008

Fugu, Pharyngula, and Junk

 
PZ Myers writes about Random Acts of Evolution in the latest issue of Seed magazine. The subtitle says it all.
The idea of humankind as a paragon of design is called into question by the puffer fish genome—the smallest, tidiest vertebrate genome of all.
The puffer fish (Takifugu rubripes, also known as Fugu rubripes) has about the same number of genes as other vertebrates (~20,000), but its genome is only about 400 Mb in size [Fugu Genome Project]. This is about 12.5% of the size of a typical mammalian genome.

THEME: Genomes & Junk DNA (total junk so far: 53%)
The Fugu Genome Project was initiated by workers who wanted to sequence a vertebrate genome with as little junk DNA as possible in order to determine which sequences are essential in vertebrate genomes. The small size of the fugu genome suggests that more than 80% of our genome is non-essential junk.
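
The inference from genome sizes is just arithmetic, and it's worth making explicit. Here is a minimal back-of-the-envelope sketch in Python; the ~3,200 Mb figure for a typical mammalian genome is an illustrative assumption, and the script simply reproduces the 12.5% and "more than 80% junk" estimates quoted above.

# Back-of-the-envelope bookkeeping for the fugu comparison.
# All figures are approximate and used only for illustration.

HUMAN_GENOME_MB = 3200.0   # assumed size of a typical mammalian genome (~3,200 Mb)
FUGU_GENOME_MB = 400.0     # fugu genome size cited above

# Fraction of a mammalian genome represented by the fugu genome.
fugu_fraction = FUGU_GENOME_MB / HUMAN_GENOME_MB
print(f"Fugu genome is {fugu_fraction:.1%} the size of a mammalian genome")  # ~12.5%

# If ~20,000 genes plus their regulatory sequences fit into a fugu-sized
# genome, then at most that fraction of our genome needs to be essential;
# the remainder could, in principle, be dispensable junk.
min_dispensable = 1.0 - fugu_fraction
print(f"At least {min_dispensable:.0%} of the human genome may be non-essential")  # ~87-88%

This is only an upper bound on what is essential, which is why the figure quoted above is the more conservative "more than 80%."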

Many of you might recall my Junk DNA Poll from last January. In case you've forgotten the results, I'll post them again. The question was: "How much of our genome could be deleted without having any significant effect on our species?" The question was designed to find out whether Sandwalk readers believed in junk DNA or whether they were being persuaded by some scientists to think that most of our genome was essential. (Modern creationists are also promoting the death of junk DNA.) There was some dispute about the interpretation of the question but most readers took it to be a question about the amount of junk DNA.




Astonishingly, almost half of Sandwalk readers think that we need more than half of our genome to survive. This would be a surprise to a puffer fish.

I began a series of postings in order to explain what our genome actually looks like. So far we've determined that about 2.5% is essential and 53% is junk. Now it's time to finish off this particular theme and have another vote.

PZ points out that most of what we call junk DNA is not controversial. It consists of LINEs and SINEs, which are (mostly) defective transposons. The pufferfish genome has a lot less of this kind of junk DNA than we do. This accounts for a good deal of the reduction in genome size that we see in modern pufferfish.

PZ also points out that we need to think differently about evolution ...
In the world of genomic housekeeping, the puffer fish is a neatnik who keeps the trash under control, while the rest of us are pack rats hoarding junk DNA.

There's a lot of thought these days going into trying to figure out some adaptive reason for such a sorry state of affairs. None of it is particularly convincing. We'd be better off reconciling ourselves to the notion that much of evolution is random, and that nothing prevents nonfunctional complexity from simply accumulating.
Well said PZ!!1

Watch for a few more postings on the remaining 45% of our genome, then get ready to vote again. I'm hoping for a better result next time!


1. I used to know someone named Paul Myers who would never have said such a thing on talk.origins. Any relation?

[Image Credit: The junk DNA icon is from the creationist website Evolution News & Views.]

Sunday, March 27, 2016

Georgi Marinov reviews two books on junk DNA

The December issue of Evolution: Education and Outreach has a review of two books on junk DNA. The reviewer is Georgi Marinov, a name that's familiar to Sandwalk readers. He is currently working with Michael Lynch at Indiana University in Bloomington, Indiana, USA. You can read the review at: A deeper confusion.

The books are ...
The Deeper Genome: Why there is more to the human genome than meets the eye, by John Parrington, (Oxford, United Kingdom: Oxford University Press), 2015. ISBN:978-0-19-968873-9.

Junk DNA: A Journey Through the Dark Matter of the Genome, by Nessa Carey, (New York, United States: Columbia University Press), 2015. ISBN:978-0-23-117084-0.
You really need to read the review for yourselves but here are a few teasers.
If taken uncritically, these texts can be expected to generate even more confusion in a field that already has a serious problem when it comes to communicating the best understanding of the science to the public.
Parrington claims that noncoding DNA was thought to be junk and Georgi replies,
However, no knowledgeable person has ever defended the position that 98 % of the human genome is useless. The 98 % figure corresponds to the fraction of it that lies outside of protein coding genes, but the existence of distal regulatory elements, as nicely narrated by the author himself, has been at this point in time known for four decades, and there have been numerous comparative genomics studies pointing to a several-fold larger than 2% fraction of the genome that is under selective constraint.
I agree. That's a position that I've been trying to advertise for several decades and it needs to be constantly reiterated since there are so many people who have fallen for the myth.

Georgi goes on to explain where Parrington goes wrong about the ENCODE results. This critique is devastating, coming, as it does, from an author of the most relevant papers.1 My only complaint about the review is that Georgi doesn't reveal his credentials. When he quotes from those papers—as he does many times—he should probably have mentioned that he is an author of the work he's quoting.

Georgi goes on to explain four main arguments for junk DNA: genetic load, the C-value paradox, transposons (selfish DNA), and modern evolutionary theory. I like this part since it's similar to the Five Things You Should Know if You Want to Participate in the Junk DNA Debate. The audience of this journal is teachers, and this is important information that they need to know but probably don't.

His critique of Nessa Carey's book is even more devastating. It begins with,
Still, despite a few unfortunate mistakes, The Deeper Genome is well written and gets many of its facts right, even if they are not interpreted properly. This is in stark contrast with Nessa Carey’s Junk DNA: A Journey Through the Dark Matter of the Genome. Nessa Carey has a PhD in virology and has in the past been a Senior Lecturer in Molecular Biology at Imperial College, London. However, Junk DNA is a book not written at an academic level but instead intended for very broad audience, with all the consequences that the danger of dumbing it down for such a purpose entails.
It gets worse. Nessa Carey claims that scientists used to think that all noncoding DNA was junk but recent discoveries have discredited that view. Georgi sets her straight with,
Of course, scientists have had a very good idea why so much of our DNA does not code for proteins, and they have had that understanding for decades, as outlined above. Only by completely ignoring all that knowledge could it have been possible to produce many of the chapters in the book. The following are referred to as junk DNA by Carey, with whole chapters dedicated to each of them (Table 3).


The inclusion of tRNAs and rRNAs in the list of “previously thought to be junk” DNA is particularly baffling given that they have featured prominently as critical components of the protein synthesis machinery in all sorts of basic high school biology textbooks for decades, not to mention the role that rRNAs and some of the other noncoding RNAs on that list play in many “RNA world” scenarios for the origin of life. How could something that has so often been postulated to predate the origin of DNA as the carrier of genetic information (Jeffares et al. 1998; Fox 2010) and that must have been of critical importance both before and after that be referred to as “junk”?
You would think that this is something that doesn't have to be explained to biology teachers but the evidence suggests otherwise. One of those teachers recently reviewed Nessa Carey's book very favorably in the journal The American Biology Teacher and another high school teacher reveals his confusion about the subject in the comments to my post [see Teaching about genomes using Nessa Carey's book: Junk DNA].

It's good that Georgi Marinov makes this point forcibly.

Now I'm going to leave you with an extended quote from Georgi Marinov's review. Coming from a young scientist, this is very potent and it needs to be widely disseminated. I agree 100%.
The reason why scientific results become so distorted on their way from scientists to the public can only be understood in the socioeconomic context in which science is done today. As almost everyone knows at this point, science has existed in a state of insufficient funding and ever increasing competition for limited resources (positions, funding, and the small number of publishing slots in top scientific journals) for a long time now. The best way to win that Darwinian race is to make a big, paradigm shifting finding. But such discoveries are hard to come by, and in many areas might actually never happen again—nothing guarantees that the fundamental discoveries in a given area have not already been made. ... This naturally leads to a publishing environment that pretty much mandates that findings are framed in the most favorable and exciting way, with important caveats and limitations hidden between the lines or missing completely. The author is too young to have directly experienced those times, but has read quite a few papers in top journals from the 1970s and earlier, and has been repeatedly struck by the difference between the open discussion one can find in many of those old articles and the currently dominant practices.

But that same problem is not limited to science itself, it seems to be now prevalent at all steps in the chain of transmission of findings, from the primary literature, through PR departments and press releases, and finally, in the hands of the science journalists and writers who report directly to the lay audience, and who operate under similar pressures to produce eye-catching headlines that can grab the fleeting attention of readers with ever decreasing ability to concentrate on complex and subtle issues. This leads to compound overhyping of results, of which The Deeper Genome is representative, and to truly surreal distortion of the science, such as what one finds in Nessa Carey’s Junk DNA.

The field of functional genomics is especially vulnerable to these trends, as it exists in the hard-to-navigate context of very rapid technological changes, a potential for the generation of truly revolutionary medical technologies, and an often difficult interaction with evolutionary biology, a controversial for a significant portion of society topic. It is not a simple subject to understand and communicate given all these complexities while in the same time the potential and incentives to mislead and misinterpret are great, and the consequences of doing so dire. Failure to properly communicate genomic science can lead to a failure to support and develop the medical breakthroughs it promises to deliver, or what might be even worse, to implement them in such a way that some of the dystopian futures imagined by sci-fi authors become reality. In addition, lending support to anti-evolutionary forces in society by distorting the science in a way that makes it appear to undermine evolutionary theory has profound consequences that given the fundamental importance of evolution for the proper understanding of humanity’s place in nature go far beyond making life even more difficult for teachers and educators of even the general destruction of science education. Writing on these issues should exercise the needed care and make sure that facts and their best interpretations are accurately reported. Instead, books such as The Deeper Genome and Junk DNA are prime examples of the negative trends outlined above, and are guaranteed to only generate even deeper confusion.
It's not easy to explain these things to a general audience, especially an audience that has been inundated with false information and false ideas. I'm going to give it a try but it's taking a lot more effort than I imagined.


1. Georgi Marinov is an author on the original ENCODE paper that claimed 80% of our genome is functional (ENCODE Project Consortium, 2012) and the paper where the ENCODE leaders retreated from that claim (Kellis et al., 2014).

ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi: 10.1038/nature11247]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]