Sandwalk: Anonymous Nature Editors Respond to ENCODE Criticism

Thursday, March 14, 2013

There have now been four papers in the scientific literature criticizing the way ENCODE leaders hyped their data by claiming that most of our genome is functional [see Ford Doolittle's Critique of ENCODE]. There have been dozens of blog postings on the same topic.

The worst of the papers were published by Nature—this includes the abominable summary that should never have made it past peer review (Encode Consortium, 2012).

The lead editor on the ENCODE story was Brendan Maher and he promoted the idea that the ENCODE results showed that most of our genome has a function [ENCODE: The human encyclopaedia].

The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.

But the very next day (Sept. 6, 2012) Brendan Maher got wind of the controversy and started to defend Nature's decisions. He quoted several bloggers, including me [Fighting about ENCODE and junk]. His main defense was ...

ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large.

In other words, the editors of Nature thought about this for several months and then decided that it was okay to attack junk DNA because that would make a big splash in the media.

A few days ago (March 12, 2013) the editors of Nature published another response to criticism [Form and Function]. These editors don't identify themselves.

Let's see how they do by analyzing each part of the editorial. Let's begin with the subtitle ...

Although debate over scientific definitions is important, it risks obscuring the real issues.

The real issues are whether most of our genome is functional or not and whether the ENCODE leaders understood the concept of noise and chance associations. I hope that Nature realizes that it really screwed up by allowing stupid definitions of function to obscure those issues, giving rise to the idea that junk DNA was debunked. Let's see if they understand where they went wrong.

Science is at the mercy of its language. It can be difficult for researchers to communicate what most excites them about the beauty, intricacy and complexity of the natural world. And when words fail, debates and arguments often arise.

One enduring debate has been resurrected by ENCODE, the Encyclopedia of DNA Elements — an ongoing multimillion-dollar project to catalogue the functional elements of the human genome. A headline-grabbing claim, first made in this publication last September, was that roughly 80% of human DNA had been ascribed some “biochemical function” thanks to the efforts of more than 440 scientists (The ENCODE Project Consortium Nature 489, 57–74; 2012).

That percentage is remarkably high, in part because of a broad definition of ‘function’. The ENCODE team used the term to include binding by a regulatory protein, or transcription into RNA — activities identified as widespread. But almost immediately, other scientists began to take this definition to task, calling it essentially meaningless.

They got that part right. The immediate reaction to the Nature papers is that the journal made a big mistake by using a silly definition of "function"—one that was bound to be misinterpreted by everyone. Many of us thought (and still think) that the authors actually believed that most of the genome is functional in the classic sense. In other words, it's not at all clear that there's a difference between the ENCODE definition of function and the definition used by everyone else, at least in the minds of the ENCODE leaders.

Some background is useful. Genomes vary dramatically in size — sometimes irrespective of the complexity of the organism. Take, for example, the genome of the marbled lungfish (Protopterus aethiopicus), which clocks in at an excessive 133 billion base pairs. That of the pufferfish (Takifugu rubripes), by contrast, sports only 365 million.

For the ENCODE paper to suggest that humans have little genomic redundancy implies that the 3.2-billion-base-pair human genome hits a sweet spot in efficiency. Critics suggested, sometimes sharply, that this was both anthropocentric and ignorant of how evolution shapes the genome. Much of human DNA is non-functional, they insisted. It is a relic of history, garbled by mutation and essentially junk.

The most recent formal critique, published this week, suggests that similar analyses on organisms with very large and very small genomes would probably find the same density of functional elements (W. F. Doolittle Proc. Natl Acad. Sci. USA http://doi.org/kr3; 2013). This investigation has yet to be done.

This is Ford Doolittle's critique but there are many others. Clearly, it's important to bring up this issue (variations in C-value) when discussing the possibility of junk DNA. I don't recall that this discussion took place in the original Nature papers or editorial comments. In fact, I don't recall any substantive discussion of junk or the possibility of "non-function" in any of the papers I read. Can anyone else find a reference?

The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it” (see go.nature.com/8xorge).

Excellent! I'm glad to see that the editors are admitting some responsibility even though they are shifting most of the blame to "big talker" Ewan Birney and not to their reviewers (or themselves). On the other hand, to claim that junk DNA is still an "open and debatable question" seems like a bit of a cop-out. Yes, it's "debatable" but the proponents of junk DNA will probably win any debates. Rumors of the death of junk DNA are not "premature" and they are not "old news." Most of our genome is junk whether the ENCODE leaders believe it or not. It's a fact even if the editors of Nature are skeptical.

Any knowledgeable reviewer would have said the same thing. They would have pointed out that the discussions of function have to include all of the data suggesting that most of these sites are nothing but nonfunctional noise. After all, we went through this same debate in 2007 when the preliminary ENCODE data was published. Ignoring this possibility is not good science. Good scientists think of ways their data could be falsified and they give appropriate credit to other interpretations that disagree with their own. It looks like the ENCODE scientists learned nothing in 2007 [see The ENCODE Data Dump and the Responsibility of Science Journalists for a discussion of what happened in 2007].

We didn't see very much of that kind of good science in Nature last September and I'm still not seeing much of it here.

The ferocity of the criticism has no doubt been fuelled by dissatisfaction over ENCODE’s top-down, big-science approach and the large share of research funds that it has attracted. Many biologists have called the 80% figure more a publicity stunt than a statement of scientific fact. Nevertheless, ENCODE leaders say, the data resources that they have provided have been immensely popular. So far, papers that use the data have outnumbered those that take aim at the definition of function.

If you read Dan Graur's critique you'll see that the data resources are difficult to use and that they are contaminated by an emphasis on function. I think most biologists would be happy if the huge amount of money spent on the project really did yield useful databases. That's by no means certain.

And just because a lot of people might be using the data is no excuse for the publicity hype that misled thousands of scientists and all of the general public. It will take a long time to undo the damage caused by Nature (and Science) and the ENCODE leaders. Most people now believe that our genome is packed with important regulatory features and that junk DNA no longer exists. I would be more impressed with this editorial if they made it clear that such conclusions were not supported by the ENCODE data.

The debate sounds like a matter of definitional differences. But to dismiss it as semantics minimizes the importance of words and definitions, and of how they are used to engage in research and to communicate findings. ENCODE continues to collect data and to characterize what the 3.2 billion base pairs might be doing in our genome and whether that activity is important. If a better word than ‘function’ is needed to describe those activities, so be it. Suggestions on a postcard please.

The editors are correct. This isn't just about semantics. Do they really need help in defining the abundant binding sites and transcripts that ENCODE found? If so, then clearly they haven't learned their lesson.

As Martin Hafner says in the second comment on the Nature website ...

'The English word to describe those activities you mention in your last paragraph is noise.'

Funny that the editors never discuss this possibility, isn't it?

[Photo Credit: The photograph of Brendan Maher was taken by Jason Varney.]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)

59 comments:

Schenck, Thursday, March 14, 2013 3:27:00 PM
It sounds like they're basically doubling down on it, they're saying, "Oy, lots of great debates can be had over semantics, and it can be important. By the way, 80% of the genome is functional, this has generated lots of papers, and only a few disagree, let us know when you turn those thought-experiments into real experiments".
Donald Forsdyke, Thursday, March 14, 2013 6:00:00 PM
On this topic I tried to add a comment, in polite style, to the Nature "Form and Function" editorial, but was told that: "This account has been banned from commenting due to posting of comments classified as inappropriate or other violations of our Terms of Service." This is most mysterious. I rechecked the "Terms of Service," but the mystery remained. Perhaps someone has cracked my account and is submitting in my name. However, for anyone who might be interested, a fellow Canadian, Professor Gregory, has kindly allowed a posting of a version of the comment at his site.

bmaher.sciwriter, Friday, March 15, 2013 12:24:00 AM
Dear Larry,
I'm not sure why you want my grinning mug on your posts, but if you must, could you please credit the photographer, Jason Varney http://www.varneyphoto.com/. He's a wonderful guy from Philly, who gave me permission to use it some time ago. I've tried to credit him in platforms that allow it, but perhaps it got lost. It wouldn't be a bad idea to ask him for permission, either. He's a nice guy and will likely say, yes.

I know Nature's editorial structure is opaque, but I was not the lead editor on the ENCODE story. I work on the news team, and although there is an obvious relationship with the back half editors, we are in practice separated by literal and figurative walls to allow (and even encourage) us on the news team to be tough and dispassionate about the research in our journal and others. I can only do my best to be as objective as possible. I'll continue to follow your blog as I continue to report on whether this behemoth project was worth the time and investment.

Also, I noticed a point mutation has crept into my name. It's Brendan with an 'a'.
Cheers!
Joe Felsenstein, Friday, March 15, 2013 2:31:00 AM
I just noticed that this year's meeting of the Society for Molecular Biology and Evolution (July 7-11 in Chicago) will have a symposium on "Where Did The Junk Go?". Sounds like it would be a lively airing of the controversy. But wait ... the description of the symposium says that

With the completion of the current ENCODE project junk DNA effectively disappeared because there's no useless DNA in the genomes no more [sic].

Oy vey.
TheOtherJim, Friday, March 15, 2013 6:30:00 AM
Interesting podcast on the subject. It interviews Dan Graur and Michael Eisen, and discusses the whole ENCODE media bomb.

http://www.mendelspod.com/podcast/debating-encode-with-dan-graur-and-michael-eisen
Unknown, Friday, March 15, 2013 10:41:00 AM
This is rather unfortunate...

Reading an article posted by JAMA yesterday. http://jama.jamanetwork.com/article.aspx?articleid=1666972#ref-jvp130033-5

Crossing the Omic Chasm
A Time for Omic Ancillary Systems

4th paragraph states,

"Omic data are different. An individual's germline genetic sequence changes little over a lifetime, but understanding of that sequence is changing rapidly. For years the DNA between coding regions was called “junk,” but it is now known that this DNA plays an important role in gene regulation [5-6]. The 1000 Genomes project has identified tens of millions of different genomic variants; the clinical significance of these variants is mostly unknown, but current understanding is rapidly changing. Unlike serum sodium levels, the clinical implications of NGS obtained today will keep changing for years as knowledge evolves."

smh
Larry Moran, Friday, March 15, 2013 10:57:00 AM
@Georgi Marinov

You said,

The truth is that while this is not arguing over semantics, debunking junk DNA was never the goal of the ENCODE consortium, the goal was to find "functional elements" (which, unfortunately, has turned out not to be nearly as straightforward as initially hoped for).

It's true that debunking junk DNA was not a well-defined goal of the consortium. Their goal was to attribute function to as much of the genome as possible. You are correct when you say that this turned out to be difficult—especially in light of abundant evidence for junk.

In your opinion, what's the best paper that discusses this difficulty and addresses the problems of noise and spurious transcripts? There should be at least one paper that tries to distinguish between real biological function and accidental elements, right?

It's only the main paper and the editorials that really feature those statements (and if you know what is meant by "function" there, they would not sound nearly as alarming), the companion papers are mostly focused and often very useful functional genomic studies that do not touch on this subject.

With all due respect, this is nonsense. In light of the 2007 criticism of the pilot project, every single paper should have been aware of the problem of distinguishing between real biological function and nonfunctional effects. In the absence of that critical analysis of their own data, I can only conclude that they meant to convey the idea of real biological function associated with most of their elements.

There were six different articles written by ENCODE Consortium leaders who summarized their results. Every single one of them emphasized the biological importance of their elements and none of them even mentioned junk DNA or the possibility that they could be looking at artifacts.

Do you think they were completely unaware of the criticism that was about to be leveled at their interpretation of their own data? Isn't it more likely that they actually believed that they were annotating true biological function and debunking junk DNA? Some of the leaders have actually said this in interviews.
Anonymous, Friday, March 15, 2013 12:16:00 PM
Just wanna let you all know, that I'm taking screen pictures of all the clever guys' comments here who are so convinced that the ENCODE project is so wrong and that most of our DNA remains non-functional. Doing it just in case in a couple of years or even earlier, it will turn out that it is the other way around. I just wanna have some arrows to shoot at some of the smart asses here.

Are you eligible for a retirement yet Larry?
John Harshman, Friday, March 15, 2013 6:17:00 PM
I have to say that there's never been a paper published under my name (and some of them are what passes for "big science" in systematics) that I haven't at least read and commented on before publication. If such a manuscript had said anything I seriously disagreed with (and some of them have), I would have (and have) mentioned that in my comments and requested it be changed (and almost always it has been). And I would consider this minimal professionalism.

Thoughts?
SPARC, Tuesday, February 06, 2018 8:23:00 AM
Since my comment you mentioned above and all others disappeared from Nature's web pages, I've added it via PubMed Commons to the article's PubMed page.
Martin