Stephen Jay Gould (1982)
This is my fourth post on the function wars.
The first post in this series covered the various definitions of "function" [Quibbling about the meaning of the word "function"]. In the second post I tried to create a working definition of "function" and I discussed whether active transposons count as functional regions of the genome or junk [The Function Wars: Part II]. I claim that junk DNA is DNA that is nonfunctional and it can be deleted from the genome of an organism without affecting its survival, or the survival of its descendants.
In the third post I discussed a paper by Rands et al. (2014) presenting evidence that about 8% of the human genome is conserved [The Function Wars: Part III]. This is important since many workers equate sequence conservation with function. It suggests that only 8% of our genome is functional and the rest is junk. The paper is confusing and I'm still not sure what they did in spite of the fact that the lead author (Chris Rands) helped us out in the comments. I don't know what level of sequence similarity they counted as "constrained." (Was it something like 35% identity over 100 bp?)
My position is that there's no simple definition of function but sequence conservation is a good proxy. It's theoretically possible to have selection for functional bulk DNA that doesn't depend on sequence but, so far, there are no believable hypotheses that make the case. It is wrong to arbitrarily DEFINE function in terms of selection (for sequence) because that rules out all bulk DNA hypotheses by fiat and that's not a good way to do science.
So, if the Rands et al. results hold up, it looks like more than 90% of our genome is junk.
Let's see how a typical science writer deals with these issues. The article I'm selecting is from Nature. It was published online yesterday (Aug. 6, 2014) (Woolston, 2014). The author is Chris Woolston, a freelance writer with a biology background. Keep in mind that it was Nature that started the modern function wars by falling hook, line, and sinker for the ENCODE publicity hype. As far as I know, the senior editors have not admitted that they, and their reviewers, were duped.
A science writer covers the function wars
It must be very difficult to cover this story, although that hasn't prevented some science writers (e.g. Elizabeth Pennisi) from trying. (She has been spectacularly unsuccessful.)
Here's what Chris Woolston writes.
Just how much of our genome serves a purpose anyway? A recent study reignited the debate on this, particularly on social media. ...
After comparing the genomes of 12 different mammals (including humans, mice and pandas), researchers at the University of Oxford, UK, concluded that only about 8.2% of the human genome is shaped by natural selection. The rest, they argue, is non-functional. Observers noted the large difference between this estimate and a previous claim by the ENCODE (Encyclopedia of DNA Elements) Project that 80% of the genome is biochemically active. Patrik D'haeseleer, a computational biologist at Lawrence Livermore National Laboratory, California, tweeted “only between 8% and 80% of human #genome is functional. Glad we've got that sorted out.” At the heart of the issue are differing definitions of 'function'. Erick Loomis, an epigeneticist at Imperial College London, tweeted: “Maybe we should stop using 'functional' if we can't find a common definition.”
I don't know why it was necessary to quote Patrik D'haeseleer; he clearly doesn't understand the problem. Otherwise, this is a pretty good description of the issue.
I understand the frustration of people like Erick Loomis. (What the heck is an "epigeneticist"?) I imagine that most scientists are pretty tired of reading about the function wars. But avoiding the word "function" isn't going to make the problem go away. Something other than semantics is at stake. For example, we would still have to deal with the question of junk even if we studiously avoided the word "function."
Besides, as any biologist (including epigeneticists?) should know, we can't agree on the definitions of all kinds of things like "gene," "species," "evolution," "epigenetics," and "The Central Dogma of Molecular Biology," and that doesn't prevent us from talking about them.
Chris Woolston continues ...
The attempts to define genome function have been mired in controversy since ENCODE published its '80%' finding in 2012 (Nature 489, 57–74; 2012). A subsequent paper from the same consortium a few months ago also met with derision, partly because it didn't even speculate on the fraction of the genome that might have a purpose (M. Kellis et al. Proc. Natl Acad. Sci. USA 111, 6131–6138; 2014). That paper did, however, argue that evolutionary, genetic and biochemical data need to be taken into account to work out the answer.
I think this is a pretty accurate summary of the problem.
In the latest report, the Oxford researchers responded to that call by focusing on evolutionary data. They looked for parts of the genome that showed low rates of mutation, a sign that those regions were conserved through natural selection. They classified the sequences — and only those sequences — as functional, a definition that is at odds with that used by ENCODE, which equated biochemical activity with functionality.
The shifting definitions confused some readers. "I don't get this paper," tweeted John Greally, an epigeneticist at the Albert Einstein College of Medicine of Yeshiva University in New York City. "Functional=conserved, but discussion acknowledges that function can be in non-conserved sequences?" When reached for further comment, Greally says that he "gets" the paper now, but that he is "still frustrated by the way this debate is causing so much unproductive friction".
(Another "epigeneticist"?) I'm with Greally except that I still don't "get" the Rands et al. paper. I don't understand what they did and how they distinguish between "constrained" and "conserved" sequences. Nevertheless, there are many papers that agree with the general conclusion: about 5-10% of the human genome is conserved.
When Greally says he is frustrated, he is not alone. I, too, regret that there have been so many papers discussing "function." The semantic debate is distracting us from the real issue. As soon as ENCODE opponents started debating the meaning of the word "function" they conceded that there IS a debate and that ENCODE may be right after all.
The paper, Greally says, missed an opportunity to explore why certain sequences — especially those known as transcription factor binding sites — are under such low evolutionary pressure, even though they presumably have important biological roles. Instead, he adds, the authors emphasized the supposed discrepancy with ENCODE. "The paper appears to be in use as a bludgeon with which to hammer the ENCODE project, not necessarily by the authors, but by others," he suggests.
I wasn't aware that transcription factor binding sites are "under low evolutionary pressure." Is this true? Is the consensus binding site for a human transcription factor different from the binding site for the orthologous mouse transcription factor? I didn't think there was a difference for most transcription factors.
In any case, I don't think the Rands et al. study is capable of recognizing conserved transcription factor binding sites unless they are embedded in a fairly large stretch of additional conserved sequence.
And, yes, the paper was intended as a criticism of the ENCODE publicity hype and their ridiculous claim that 80% of our genome is functional. We need more bludgeons because there are still some biologists who don't get it.
One outspoken critic of ENCODE is Dan Graur, who studies molecular evolutionary bioinformatics at the University of Houston, Texas. He publicly celebrated the new paper by tweeting: "What an amazing birthday present." In a follow-up interview, he said that the paper refutes ENCODE's claims, and added that it is "idiotic" to suggest that a part of the genome could be functional if it didn't respond to pressure from natural selection.
Hmmm ... I have suggested that parts of the genome are functional even though their sequences are not conserved by pressure from natural selection. These are spacer sequences such as those required to separate some transcription factor binding sites and intron cleavage recognition sites. I don't think this is necessarily "idiotic" but I'll have to ask Dan what he thinks of my idea. I also don't think that the various bulk DNA hypotheses are necessarily idiotic. True, there are some idiots who advocate a role for bulk DNA, but there are also some very smart people who have contributed to the debate.
ENCODE member Ross Hardison, a molecular biologist at Pennsylvania State University, called the latest paper "elegant" even though it took a different view of functionality. The Oxford group's findings don't contradict those of ENCODE, he says, because the project never estimated the proportion of the genome that would be conserved through natural selection. He added that it will probably take a combination of approaches to determine which parts of the genome we can't live without. "I expect that with more experiments and analyses, estimates of the proportion of the human genome that is functional will approach some convergence, even though they are pretty far apart now."
The Rands et al. paper directly contradicts the claim by the ENCODE consortium that 80% of our genome is functional.
The ENCODE Consortium did look at conservation and the results were reported in the summary paper (ENCODE, 2012). Their Figure 1 looks at the conservation of the elements they mapped. They conclude that a significant percentage of these elements show evidence of selection, especially in primate lineages. They speculate that these "functional" elements arose relatively recently in the human lineage. They conclude (page 71) ...
Importantly, for the first time we have sufficient statistical power to assess the impact of negative selection on primate-specific elements, and all ENCODE classes display evidence of negative selection in these unique-to-primate elements. Furthermore, even with our most conservative estimate of functional elements (8.5% of putative DNA/protein binding regions) and assuming that we have already sampled half of the elements from our transcription factor and cell-type diversity, one would estimate that at a minimum 20% (17% from protein binding and 2.9% protein coding gene exons) of the genome participates in these specific functions, with the likely figure significantly higher.
Their estimate of 20% constrained sequence is contradicted by the Rands et al. paper. Ross Hardison is going to be disappointed. There isn't going to be any "convergence" or middle ground in this debate.
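For readers who want to check the arithmetic in the ENCODE passage quoted above, here is a minimal Python sketch. The constant and function names are mine, not ENCODE's. It assumes that "sampled half of the elements" means the 8.5% figure should be doubled, which is my reading of the quote and matches the 17% they report. The 8.2% figure is the Rands et al. estimate discussed earlier.

```python
# Percentages of the human genome, taken from the passages quoted in this post.
BINDING_SAMPLED_PCT = 8.5    # ENCODE: conservative estimate, putative DNA/protein binding regions
FRACTION_SAMPLED = 0.5       # ENCODE's assumption: half of the elements sampled so far
CODING_EXON_PCT = 2.9        # ENCODE: protein-coding gene exons
RANDS_CONSTRAINED_PCT = 8.2  # Rands et al. (2014): constrained sequence


def extrapolated_minimum(binding_pct, fraction_sampled, exon_pct):
    """Scale the sampled binding regions up to full sampling, then add exons.

    Assumption (my reading of the quote): "sampled half" means dividing by 0.5.
    """
    return binding_pct / fraction_sampled + exon_pct


encode_minimum_pct = extrapolated_minimum(
    BINDING_SAMPLED_PCT, FRACTION_SAMPLED, CODING_EXON_PCT
)
gap_pct = encode_minimum_pct - RANDS_CONSTRAINED_PCT

print(f"ENCODE minimum: {encode_minimum_pct:.1f}% of the genome")
print(f"Rands et al. constrained: {RANDS_CONSTRAINED_PCT:.1f}%")
print(f"Difference: {gap_pct:.1f} percentage points")
```

The extrapolation comes to 19.9%, which ENCODE rounds to 20%, so their stated minimum is more than twice the fraction that Rands et al. report as constrained.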
For those of you who are truly interested, Ross Hardison has a podcast where he defends ENCODE against Dan Graur [Debating ENCODE Part II: Ross Hardison, Penn St.]. The interviewer never asks the big question, "How much of our genome is junk?", but Hardison tells us that he has never been comfortable with the idea of junk DNA.
I think that Chris Woolston did a pretty good job of explaining this controversy to the general audience of Nature readers. My only complaint is that he was a little too even-handed. The ENCODE Consortium is clearly on the losing side of the debate about junk DNA and that is the real story. There is a massive amount of evidence supporting the idea that most of our genome is junk.
ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74. [doi: 10.1038/nature11247]
Gould, S.J. (1982) Darwinism and the expansion of evolutionary theory. Science 216:380-387. [doi: 10.1126/science.7041256]
Rands, C.M., Meader, S., Ponting, C.P. and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics 10:e1004525. [doi: 10.1371/journal.pgen.1004525]
Woolston, C. (2014) Furore over genome function. Nature 512:9. [doi: 10.1038/512009e]