After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. Now that phase has come to a close, signalled by the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.

I expect encyclopedias to be much more accurate than this.
As most people know by now, there are many of us who challenge the implication that 80% of the genome has a function (i.e., it's not junk).[1] We think the Consortium was not being very scientific by publicizing such a ridiculous claim.
The main point of Maher's article was that the ENCODE results reveal a huge network of regulatory elements controlling expression of the known genes. This is the same point made by the ENCODE researchers themselves. Here's how Brendan Maher expressed it.
The real fun starts when the various data sets are layered together. Experiments looking at histone modifications, for example, reveal patterns that correspond with the borders of the DNaseI-sensitive sites. Then researchers can add data showing exactly which transcription factors bind where, and when. The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being.

I think that much of this hype comes from a problem I've called The Deflated Ego Problem. It arises because many scientists were disappointed to discover that humans have about the same number of genes as many other species yet we are "obviously" much more complex than a mouse or a pine tree. There are many ways of solving this "problem." One of them is to postulate that humans have a much more sophisticated network of control elements in our genome. Of course, this ignores the fact that the genomes of mice and trees are not smaller than ours.
Brendan Maher became aware of the controversy in the hours following publication of the ENCODE results. He published a follow-up article the next day [Fighting about ENCODE and junk]. This is the article that Ed Yong and others have pointed to as an example of responsible journalism. In fact, Ed Yong refers to it as ...
The ENCODE reactions have come thick and fast, and Brendan Maher has written the best summary of them. I’m not going to duplicate his sterling efforts. Head over to Nature’s blog for more.

Let's look at this "sterling effort."
... several critics have challenged some of the most prominently reported claims in the papers, the way their publication was handled and the indelicate use of the word ‘junk’ on some material promoting the research.

First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors prominently claim that the ENCODE project has thus far assigned “biochemical functions for 80% of the genome”. I had long and thorough discussions with Ewan Birney about this figure and what it actually meant, and it was clear that he was conflicted about reporting it in the paper’s abstract.

We understand. After long and thorough discussions, Brendan Maher decided to report the misleading figure exactly as Ewan Birney intended, without highlighting any of the problems.
It’s a big number, to be sure. The protein-encoding portion of the genome — that which has historically been considered the most important part — represents a little more than 1%, and to imply that they found similarly important and interesting functions for another 79% is an extraordinary claim. Birney had said to me and reiterates in a Q&A-style blog post that it is also a loose interpretation of the word ‘functional’ that encompassed many categories of biochemical activity, from the very broad — such as actively producing or ‘transcribing’ RNA — to being attached to some sort of transcription-factor protein, all the way down to that narrow range of protein-encoding DNA within the 1%.

No knowledgeable scientist ever said that only 1% of our genome was functional. It's extremely annoying that journalists keep repeating stuff like this as though none of us ever knew about all the other functional parts of the genome that had been solidly proven decades before anyone ever dreamed of ENCODE. This is part of the problem.
But what "defense" is Brendan Maher actually mounting here? All he's saying is that Ewan Birney invented a ridiculous definition of function and that many journalists fell for it.
But hold on, said a number of genome experts: most of that activity isn’t particularly specific or interesting and may not have an impact on what makes a human a human (or what makes one human different from another). A blog post by Ed Yong discusses some of these critiques. It was already known, for example, that vast portions of the genome are transcribed into RNA. A small amount of that RNA encodes protein, and some serves a regulatory role, but the rest of it is chock-full of seemingly nonsensical repeats, remnants of past viruses and other weird little bits that shouldn’t serve a purpose.

Exactly. Scientists knew that, and much more. Why didn't science journalists also know that?
The paper does drill down somewhat into what the authors mean by functional elements. And Birney does the same in his blog. Excluding all but the sites where there is very probable active binding by a regulatory protein, “we see a cumulative occupation of 8% of the genome,” he writes. Add to that the 1% of protein-encoding DNA and you get 9%.

Genes make up about 20% of our genome (exons plus introns). There are about 25,000 known genes. Birney is saying that the ENCODE project identified 256,000,000 bp (8%) of sequence that's required for regulating gene expression. That's roughly 10,000 bp of sequence for every gene. Since the typical transcription factor binding site is 6-8 bp, this means at least 1000 transcription factor binding sites are controlling each gene.
I'd like to know of any gene where this kind of complex regulation has been demonstrated. Does it apply to the thousands of genes encoding basic metabolic enzymes like those of the citric acid cycle? Does it apply to all of the genes for ribosomal proteins or all of the known tRNA genes?
This doesn't make sense but I excuse Brendan Maher and other journalists in this case since you have to know a lot about gene regulation to see the absurdity in the ENCODE predictions.
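For readers who want to check the back-of-the-envelope estimate above, here is a minimal sketch of the arithmetic. It uses only the round numbers quoted in this post; the genome size of 3.2 billion bp is an assumption implied by "256,000,000 bp (8%)", and the function and variable names are hypothetical, chosen just for illustration.

```python
# Back-of-the-envelope check of the regulatory-sequence estimate.
# All inputs are round figures from the post; GENOME_BP is an assumption
# implied by the statement that 256,000,000 bp is 8% of the genome.
GENOME_BP = 3_200_000_000      # assumed human genome size in base pairs
REGULATORY_PERCENT = 8         # Birney's "cumulative occupation" figure
GENE_COUNT = 25_000            # approximate number of known genes
SITE_BP_MIN, SITE_BP_MAX = 6, 8  # typical transcription factor binding site


def percent_of(total_bp: int, percent: int) -> int:
    """Return the given whole-number percentage of total_bp, in base pairs."""
    return total_bp * percent // 100


regulatory_bp = percent_of(GENOME_BP, REGULATORY_PERCENT)
bp_per_gene = regulatory_bp / GENE_COUNT

# Longer sites give fewer sites per gene, so the 8 bp case is the lower bound.
sites_low = bp_per_gene / SITE_BP_MAX
sites_high = bp_per_gene / SITE_BP_MIN

print(f"Regulatory sequence: {regulatory_bp:,} bp")
print(f"Per gene: {bp_per_gene:,.0f} bp")
print(f"Binding sites per gene: {sites_low:,.0f} to {sites_high:,.0f}")
```

Even the conservative end of that range is above 1000 sites per gene, which is the figure that looks so implausible. The sketch assumes the regulatory sequence is spread evenly over genes and packed entirely with binding sites, so it is a rough illustration and not a measurement.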
Birney and his colleagues have estimated how complete their sampling is, and suspect that they will find another 11% of the genome with this kind of regulatory activity. That gets them to 20%. So, perhaps the main conclusion should have been that 20% of the genome in some situation can directly influence gene expression and phenotype of at least one human cell type. It’s a far cry from 80%, but a substantial increase from 1%.

If you thought 8% was ridiculous then 19% is even worse. Feel sorry for the poor pufferfish whose genome is only 12% as large as the human genome. Think of all the complex regulation that pufferfish just can't do.
I'm not saying that my estimates are definitive proof that the ENCODE conclusions are wrong and I'm not saying that the size of the pufferfish genome disproves the estimation Birney is making. What I'm saying is that results and conclusions have to be viewed skeptically and put into bigger context before believing that they are true. That's the job science journalists have undertaken; otherwise they are just mouthpieces for the authors.
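The pufferfish comparison can be put in numbers the same way. This is a rough sketch using only the percentages mentioned above; the 3.2 billion bp human genome size is again an assumption implied by the earlier 256,000,000 bp figure, and the names are hypothetical.

```python
# Compare the proposed human regulatory sequence with a whole pufferfish genome.
# GENOME_BP is an assumed round figure; the percentages come from the post.
GENOME_BP = 3_200_000_000     # assumed human genome size in base pairs
PUFFERFISH_PERCENT = 12       # pufferfish genome as a percentage of human


def percent_of(total_bp: int, percent: int) -> int:
    """Return the given whole-number percentage of total_bp, in base pairs."""
    return total_bp * percent // 100


pufferfish_bp = percent_of(GENOME_BP, PUFFERFISH_PERCENT)

# 8% is the observed regulatory figure, 19% adds the projected extra 11%.
human_regulatory_bp = {pct: percent_of(GENOME_BP, pct) for pct in (8, 19)}

print(f"Entire pufferfish genome: {pufferfish_bp:,} bp")
for pct, bp in human_regulatory_bp.items():
    ratio = bp / pufferfish_bp
    print(f"Human regulatory sequence at {pct}%: {bp:,} bp "
          f"({ratio:.2f} x the pufferfish genome)")
```

On these numbers the projected 19% of regulatory sequence in humans would be larger than the entire pufferfish genome, genes included. That is not proof of anything, but it is the kind of context that should prompt questions.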
Some suggest that a majority of the genome does have an active role in biological functions. John Mattick, director of the Garvan Institute of Medical Research in Sydney, Australia, who I spoke to in the run up to the publication of these papers, argued that the ENCODE authors were being far too conservative in their claims about the significance of all that transcription. “We have misunderstood the nature of genetic programming for the past 50 years,” he told me. Having long argued that non-coding RNA has a crucial role in cell regulatory functions, his gentle criticism is that “they’ve reported the elephant in the room then chosen to otherwise ignore it”.

Good reporting. Yes, there are some other scientists who think that all of the human genome is functional. That's why this is a genuine scientific controversy. (I think Mattick is dead wrong [Genome Size, Complexity, and the C-Value Paradox].)
The 80% number may not have been ideal, but it did provide a headline figure that was impressive to the mainstream media. This is at the core of a related critique against the ENCODE researchers and the journals that published their papers. By bandying about this big number, press releases on the project touted the idea that ENCODE had demolished some long-standing notion that much of the genome is ‘junk’. Michael Eisen, an evolutionary biologist at the University of California, Berkeley, said in a blog post that this pushed “a narrative about their results that is, at best, misleading.”

So, what's up, Brendan Maher? Are you saying that publication of an admittedly misleading number (80%) was acceptable because "it did provide a headline figure that was impressive to the mainstream media"? Or are you going to admit that you made a mistake?
That narrative goes something like this: scientists long thought the genome was littered with junk, evolutionary remnants that serve no purpose, but ENCODE has shown that 80% of the genome (and possibly more to come) does serve a purpose. That narrative appeared in many media reports on the publication. Many on Twitter and in online conversations bemoaned the rehashing of a junk-DNA debate that they considered imaginary or at least long-settled. Eisen, perhaps rightfully, puts the blame on press releases that touted the supposed paradigm shift: the one from Nature Publishing Group started thus: “Far from being junk, the vast majority of our DNA participates in at least one biochemical event in at least one cell type.” Eisen says that “the authors undoubtedly know, nobody actually thinks that non-coding DNA is ‘junk’ anymore. It’s an idea that pretty much only appears in the popular press, and then only when someone announces that they have debunked it.”
It is an old argument, but it’s not clear that it is a dead argument. Several researchers took issue with ENCODE’s suggestion that its wobbly 80% number in any way disproves that some DNA is junk. Larry Moran, a biochemist at the University of Toronto in Ontario argued on his blog that claims about disproving the existence of junk gives ammunition to creationists who like a tidy view of every letter in the genome having some sort of divine purpose. “This is going to make my life very complicated,” he writes.
Indeed, the papers have caught the attention of at least some creationists, and of just about everyone else. This was in part designed by the project leaders and editors, who organized a simultaneous release of the publications to maximize their impact. This was a major, time-consuming event that occupied a great deal of time from the scientists involved and from the editors at their respective journals.
(And, for the record, I did not mean that we should pay any attention at all to what the creationists think. When I said that "This is going to make my life very complicated" I was thinking more of how I was going to explain this to scientists and people interested in science. The damage done by this publicity campaign is that it misleads the general public, not just creationists.)
ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large. Similar efforts went into the coordinated publication of the first drafts of the human genome, another resource-building project, more than a decade ago. Although complaints and quibbles will probably linger for some time, the real test is whether scientists will use the data and prove ENCODE’s worth.

I'm sorry, but if good journalists like Ed Yong think this is a "sterling effort" at defending the ENCODE Consortium against scientific criticism then we're in much more trouble than I originally thought.
1. Yes, I know that what the consortium actually said was that 80% has a "biochemical function" and that kind of function may, or may not, indicate a biologically relevant function. The distinction is not appreciated by the average reader and, quite frankly, not by the average science writer either.