
Thursday, August 07, 2014

The Function Wars: Part IV

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)
This is my fourth post on the function wars.

The first post in this series covered the various definitions of "function" [Quibbling about the meaning of the word "function"]. In the second post I tried to create a working definition of "function" and I discussed whether active transposons count as functional regions of the genome or junk [The Function Wars: Part II]. I claim that junk DNA is DNA that is nonfunctional and it can be deleted from the genome of an organism without affecting its survival, or the survival of its descendants.

In the third post I discussed a paper by Rands et al. (2014) presenting evidence that about 8% of the human genome is conserved [The Function Wars: Part III]. This is important since many workers equate sequence conservation with function. It suggests that only 8% of our genome is functional and the rest is junk. The paper is confusing and I'm still not sure what they did in spite of the fact that the lead author (Chris Rands) helped us out in the comments. I don't know what level of sequence similarity they counted as "constrained." (Was it something like 35% identity over 100 bp?)
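
To make that parenthetical question concrete, here is a minimal sketch of what a threshold like "35% identity over 100 bp" would mean in practice. It is only an illustration of the question, not the method used by Rands et al. The function names are hypothetical, and the code assumes two sequences that are already aligned, with no gaps and equal length.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def windows_above_threshold(seq_a, seq_b, window=100, min_identity=35.0):
    """Start positions of every window whose identity meets the threshold.

    The defaults (100 bp, 35%) are just the numbers from the question above;
    they are assumptions, not values taken from any paper.
    """
    hits = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        if percent_identity(a, b) >= min_identity:
            hits.append(start)
    return hits
```

Two unrelated sequences with equal base frequencies are expected to match at about 25% of positions by chance, so a 35% cutoff would be a fairly low bar. That's why the actual threshold matters.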

My position is that there's no simple definition of function but sequence conservation is a good proxy. It's theoretically possible to have selection for functional bulk DNA that doesn't depend on sequence but, so far, there are no believable hypotheses that make the case. It is wrong to arbitrarily DEFINE function in terms of selection (for sequence) because that rules out all bulk DNA hypotheses by fiat and that's not a good way to do science.

So, if the Rands et al. results hold up, it looks like more than 90% of our genome is junk.

Let's see how a typical science writer deals with these issues. The article I'm selecting is from Nature. It was published online yesterday (Aug. 6, 2014) (Woolston, 2014). The author is Chris Woolston, a freelance writer with a biology background. Keep in mind that it was Nature that started the modern function wars by falling hook, line, and sinker for the ENCODE publicity hype. As far as I know, the senior editors have not admitted that they, and their reviewers, were duped.

A science writer covers the function wars

It must be very difficult to cover this story, although that hasn't prevented some science writers (e.g. Elizabeth Pennisi) from trying. (She has been spectacularly unsuccessful.)

Here's what Chris Woolston writes.
Just how much of our genome serves a purpose anyway? A recent study reignited the debate on this, particularly on social media. ...

After comparing the genomes of 12 different mammals (including humans, mice and pandas), researchers at the University of Oxford, UK, concluded that only about 8.2% of the human genome is shaped by natural selection. The rest, they argue, is non-functional. Observers noted the large difference between this estimate and a previous claim by the ENCODE (Encyclopedia of DNA Elements) Project that 80% of the genome is biochemically active. Patrik D'haeseleer, a computational biologist at Lawrence Livermore National Laboratory, California, tweeted “only between 8% and 80% of human #genome is functional. Glad we've got that sorted out.” At the heart of the issue are differing definitions of 'function'. Erick Loomis, an epigeneticist at Imperial College London, tweeted: “Maybe we should stop using 'functional' if we can't find a common definition.”
I don't know why it was necessary to quote Patrik D'haeseleer; he clearly doesn't understand the problem. Otherwise, this is a pretty good description of the issue.

I understand the frustration of people like Erick Loomis. (What the heck is an "epigeneticist"?) I imagine that most scientists are pretty tired of reading about the function wars. But avoiding the word "function" isn't going to make the problem go away. Something other than semantics is at stake. For example, we would still have to deal with the question of junk even if we studiously avoided the word "function."

Besides, as any biologist (including epigeneticists?) should know, we can't agree on the definitions of all kinds of things like "gene," "species," "evolution," "epigenetics," and "The Central Dogma of Molecular Biology," and that doesn't prevent us from talking about them.

Chris Woolston continues ...
The attempts to define genome function have been mired in controversy since ENCODE published its '80%' finding in 2012 (Nature 489, 57–74; 2012). A subsequent paper from the same consortium a few months ago also met with derision, partly because it didn't even speculate on the fraction of the genome that might have a purpose (M. Kellis et al. Proc. Natl Acad. Sci. USA 111, 6131–6138; 2014). That paper did, however, argue that evolutionary, genetic and biochemical data need to be taken into account to work out the answer.

In the latest report, the Oxford researchers responded to that call by focusing on evolutionary data. They looked for parts of the genome that showed low rates of mutation, a sign that those regions were conserved through natural selection. They classified the sequences — and only those sequences — as functional, a definition that is at odds with that used by ENCODE, which equated biochemical activity with functionality.
I think this is a pretty accurate summary of the problem.
The shifting definitions confused some readers. "I don't get this paper," tweeted John Greally, an epigeneticist at the Albert Einstein College of Medicine of Yeshiva University in New York City. "Functional=conserved, but discussion acknowledges that function can be in non-conserved sequences?" When reached for further comment, Greally says that he "gets" the paper now, but that he is "still frustrated by the way this debate is causing so much unproductive friction".
I'm with Greally (another "epigeneticist"?) except that I still don't "get" the Rands et al. paper. I don't understand what they did and how they distinguish between "constrained" and "conserved" sequences. Nevertheless, there are many papers that agree with the general conclusion: about 5-10% of the human genome is conserved.

When Greally says he is frustrated, he is not alone. I too, regret that there have been so many papers discussing "function." The semantic debate is distracting us from the real issue. As soon as ENCODE opponents started debating the meaning of the word "function" they conceded that there IS a debate and ENCODE may be right after all.
The paper, Greally says, missed an opportunity to explore why certain sequences — especially those known as transcription factor binding sites — are under such low evolutionary pressure, even though they presumably have important biological roles. Instead, he adds, the authors emphasized the supposed discrepancy with ENCODE. "The paper appears to be in use as a bludgeon with which to hammer the ENCODE project, not necessarily by the authors, but by others," he suggests.
I wasn't aware of the fact that transcription factor binding sites are "under low evolutionary pressure." Is this true? Is the consensus binding site for human transcription factors different than the binding site for the orthologous mouse transcription factor? I didn't think there was a difference for most transcription factors.

In any case, I don't think the Rands et al. study is capable of recognizing conserved transcription factor binding sites unless they are embedded in a fairly large stretch of additional conserved sequence.

And, yes, the paper was intended as a criticism of the ENCODE publicity hype and their ridiculous claim that 80% of our genome is functional. We need more bludgeons because there are still some biologists who don't get it.
One outspoken critic of ENCODE is Dan Graur, who studies molecular evolutionary bioinformatics at the University of Houston, Texas. He publicly celebrated the new paper by tweeting: "What an amazing birthday present." In a follow-up interview, he said that the paper refutes ENCODE's claims, and added that it is "idiotic" to suggest that a part of the genome could be functional if it didn't respond to pressure from natural selection.
Hmmm ... I have suggested that parts of the genome are functional even though their sequences are not conserved by pressure from natural selection. These are spacer sequences such as those required to separate some transcription factor binding sites and intron cleavage recognition sites. I don't think this is necessarily "idiotic" but I'll have to ask Dan what he thinks of my idea. I also don't think that the various bulk DNA hypotheses are necessarily idiotic. True, there are some idiots who advocate a role for bulk DNA, but there are also some very smart people who have contributed to the debate.
ENCODE member Ross Hardison, a molecular biologist at Pennsylvania State University, called the latest paper "elegant" even though it took a different view of functionality. The Oxford group's findings don't contradict those of ENCODE, he says, because the project never estimated the proportion of the genome that would be conserved through natural selection. He added that it will probably take a combination of approaches to determine which parts of the genome we can't live without. "I expect that with more experiments and analyses, estimates of the proportion of the human genome that is functional will approach some convergence, even though they are pretty far apart now."
The Rands et al. paper directly contradicts the claim by the ENCODE consortium that 80% of our genome is functional.

The ENCODE Consortium did look at conservation and the results were reported in the summary paper (ENCODE, 2012). Their Figure 1 looks at the conservation of the elements they mapped. They conclude that a significant percentage of these elements show evidence of selection, especially in primate lineages. They speculate that these "functional" elements arose relatively recently in the human lineage. They conclude (page 71) ...
Importantly, for the first time we have sufficient statistical power to assess the impact of negative selection on primate-specific elements, and all ENCODE classes display evidence of negative selection in these unique-to-primate elements. Furthermore, even with our most conservative estimate of functional elements (8.5% of putative DNA/protein binding regions) and assuming that we have already sampled half of the elements from our transcription factor and cell-type diversity, one would estimate that at a minimum 20% (17% from protein binding and 2.9% protein coding gene exons) of the genome participates in these specific functions, with the likely figure significantly higher.
Their estimate of 20% constrained sequence is contradicted by the Rands et al. paper. Ross Hardison is going to be disappointed. There isn't going to be any "convergence" or middle ground in this debate.
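
The arithmetic behind the 20% figure in the ENCODE quotation can be laid out explicitly. The sketch below only restates the numbers given in that passage and compares the total with the 8.2% reported by Rands et al. The variable names are mine, and treating "sampled half of the elements" as a division by 0.5 is my reading of their wording.

```python
# Numbers quoted from the ENCODE summary paper (page 71).
conservative_binding_fraction = 0.085  # "8.5% of putative DNA/protein binding regions"
sampled_share = 0.5                    # assumes half of the elements have been sampled
exon_fraction = 0.029                  # "2.9% protein coding gene exons"

# Extrapolate the binding-region estimate to the full set of elements.
extrapolated_binding = conservative_binding_fraction / sampled_share

# ENCODE's stated minimum: protein binding plus coding exons.
encode_minimum = extrapolated_binding + exon_fraction

# Constrained fraction from the title of Rands et al. (2014).
rands_constrained = 0.082

# How many times larger the ENCODE minimum is than the Rands et al. estimate.
ratio = encode_minimum / rands_constrained

print(f"Extrapolated binding regions: {extrapolated_binding:.1%}")
print(f"ENCODE minimum:               {encode_minimum:.1%}")
print(f"Rands et al. constrained:     {rands_constrained:.1%}")
print(f"ENCODE minimum / Rands:       {ratio:.1f}x")
```

The ENCODE minimum comes to roughly 20%, about 2.4 times the Rands et al. estimate, and that is before taking ENCODE's "likely figure significantly higher" into account.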

For those of you who are truly interested, Ross Hardison has a podcast where he defends ENCODE against Dan Graur [Debating ENCODE Part II: Ross Hardison, Penn St.]. The interviewer never asks the big question, "How much of our genome is junk?", but Hardison tells us that he has never been comfortable with the idea of junk DNA.

I think that Chris Woolston did a pretty good job of explaining this controversy to the general audience of Nature readers. My only complaint is that he was a little too even-handed. The ENCODE Consortium is clearly on the losing side of the debate about junk DNA and that is the real story. There is a massive amount of evidence supporting the idea that most of our genome is junk.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74. [doi: 10.1038/nature11247]

Gould, S.J. (1982) Darwinism and the expansion of evolutionary theory. Science 216:380-387. [doi: 10.1126/science.7041256]

Rands, C.M., Meader, S., Ponting, C.P. and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

Woolston, C. (2014) Furore over genome function. Nature 512:9. [doi: 10.1038/512009e]


roger shrubber said...

" I too, regret that there have been so many papers discussing "function." The semantic debate is distracting us from the real issue."
Hmmm. I think you're loving it. And I think you enjoy holding people's feet to the fire when they say stupid things, perhaps because they have been swept along with some trend or hype, or perhaps because they never really learned to be a critical scientist.
Or if you don't actually enjoy it, maybe you do it from some sense of responsibility as a teacher. Either way, good on ya.

Larry Moran said...

I can regret that it happened but still love doing it. For example, I regret that anyone listens to Intelligent Design Creationists but since they do ....

Georgi Marinov said...

I wasn't aware of the fact that transcription factor binding sites are "under low evolutionary pressure." Is this true? Is the consensus binding site for human transcription factors different than the binding site for the orthologous mouse transcription factor? I didn't think there was a difference for most transcription factors.

Transcription factor binding sites do seem to turn over quite rapidly in vertebrate lineages.

However, this is well explained in a framework in which neutral processes play a major role, see this paper for more details:

Mikkel Rumraket Rasmussen said...

I'm surprised this wasn't known to you Larry, it should be pretty easy to elucidate from the very same principles of transcription you have written about on this blog often. Isn't the main constraint operating on a transcription factor binding site, that it is sufficiently large that it is unlikely to look like another piece of the genome, to prevent transcriptional interference?

It would seem to me that such a binding site would be much* more tolerant of mutation, as long as it doesn't start looking too much like other binding sites, or substantially erode the binding affinity of the transcription factor.

I have even used the fact that some of these binding sites seem to drift weakly over time, to show that they are at some level incompatible with the ID argument of "common design", in arguments with creationists. The argument goes that the designer would not need to slowly change these binding sites in different species, he could have simply designed a large set of binding sites and used the same set in all his different creations. The fact that they aren't the same, but actually slowly evolve, is evidence against the idea that the designer is re-using his "designs" in different organisms. It's a falsification of the "common design" retort.

* At least more tolerant than I think most coding regions are.

Larry Moran said...

I don't have a problem with mutations that affect the binding site or its position. What I was questioning is the idea that the protein evolves rapidly so that it recognizes different consensus sequences in different species. I have difficulty believing that this happens frequently in less than one hundred million years. Does anyone have an example of homologous transcription factors from different mammals that bind to different sequences? How many examples are there?

Unknown said...

Why should sequence conservation, or transcription, be necessary for a segment of DNA to have a function? Theoretically at least, some segments of DNA could be spacers, part of the scaffolding needed to physically construct a particular protein. They would not be transcribed and their sequences would not be conserved (only the length would matter) but they would still have a function.

TheOtherJim said...

I am interested in hearing of any, as well. There is only one example I can think of, and it is a bit of an unusual case.