In most cases, those articles contained interviews with ENCODE leaders and direct quotes about the presence of large amounts of functional DNA in the human genome.
The second wave of the ENCODE publicity campaign is trying to claim that this was all a misunderstanding. According to this revisionist view of recent history, the actual ENCODE papers never said that most of our genome had to be functional and never implied that junk DNA was dead. It was the media that misinterpreted the papers. Don't blame the scientists.
You can see an example of this version of history in the comments to "How does Nature deal with the ENCODE publicity hype that it created?", where some people are arguing that the ENCODE summary paper has been misrepresented.
Let's look at the summary paper published in Nature on September 6, 2012 in the issue devoted largely to ENCODE papers. The lead author was Ewan Birney, by his own admission, and the paper is usually referenced as Birney et al. (2012).
The scientific summary was accompanied by two articles and one video produced by Nature editors. In the first article, Brendan Maher (Maher, 2012) attempts to explain the purpose of the ENCODE project. He says,
ENCODE was designed to pick up where the Human Genome Project left off. Although that massive effort revealed the blueprint of human biology, it quickly became clear that the instruction manual for reading the blueprint was sketchy at best. Researchers could identify in its 3 billion letters many of the regions that code for proteins, but those make up little more than 1% of the genome, contained in around 20,000 genes — a few familiar objects in an otherwise stark and unrecognizable landscape. Many biologists suspected that the information responsible for the wondrous complexity of humans lay somewhere in the ‘deserts’ between the genes. ENCODE, which started in 2003, is a massive data-collection effort designed to populate this terrain. The aim is to catalogue the ‘functional’ DNA sequences that lurk there, learn when and in which cells they are active and trace their effects on how the genome is packaged, regulated and read.
Presumably this description reflects the view of the ENCODE Consortium, which worked closely with Nature in coordinating the publicity campaign that accompanied the publications. You can't blame anyone for assuming that the goal of ENCODE is to look for function in junk DNA.
Maher goes on to describe the publication of the pilot project in 2007 when 1% of the genome was analyzed.
The pilot projects transformed biologists’ view of the genome. Even though only a small amount of DNA manufactures protein-coding messenger RNA, for example, the researchers found that much of the genome is ‘transcribed’ into non-coding RNA molecules, some of which are now known to be important regulators of gene expression. And although many geneticists had thought that the functional elements would be those that are most conserved across species, they actually found that many important regulatory sequences have evolved rapidly.
The results of the pilot project did NOT transform "biologists’ view of the genome," at least not among knowledgeable biologists. Those results came under heavy criticism for exactly the same reasons the more recent publications have been criticized. Nobody who believed that most of our genome is junk was swayed by the results of the pilot project because the interpretation of those results was flawed by illogical thinking and overinterpretation of data. The same thing happened five years later.
With reference to the results appearing in the Sept. 6, 2012 issue of Nature, Maher says,
The real fun starts when the various data sets are layered together. Experiments looking at histone modifications, for example, reveal patterns that correspond with the borders of the DNaseI-sensitive sites. Then researchers can add data showing exactly which transcription factors bind where, and when. The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being. ENCODE “is much more than the sum of the parts”, says Manolis Kellis, a computational genomicist at the Massachusetts Institute of Technology in Cambridge, who led some of the data-analysis efforts.
Now this sounds to me like a genome teeming with function and very little junk DNA, but maybe that's not what Brendan Maher and the ENCODE Consortium really meant, right? (BTW, many of us think that our understanding of development in organisms like fruit flies shows that a relatively small number of transcription factors working on a conserved set of genes can explain complexity. Apparently the ENCODE Consortium leaders think that something more is needed to explain humans.)
The other editorial paper is by Skipper et al. (2012). The lead author is senior editor Magdalena Skipper who can be seen in this video with Ewan Birney announcing that 80% of the genome has a function, meaning that it is not junk.
The Skipper et al. paper introduces us to five researchers who were invited to participate in the publicity campaign by sharing "their views on what the results mean to them and their work" (Ecker et al., 2012). The first of these experts is Joseph Ecker, who says,
One of the more remarkable findings described in the consortium's 'entrée' paper (page 57) is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA'. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA's transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles.
Now, none of these editors and experts are ENCODE Consortium authors so it's quite possible that they have all misinterpreted the 'entrée' paper. This is the view now being suggested by the Consortium leaders (Kellis et al., 2014). They now argue that Genetic, Evolutionary, and Biochemical descriptions of "function" are all reasonable approaches to understanding junk DNA and genome composition. They now claim that just having the data available is their most important contribution and not claims about how much of the genome is functional.
In contrast to evolutionary and genetic evidence, biochemical data offer clues about both the molecular function served by underlying DNA elements and the cell types in which they act, thus providing a launching point to study differentiation and development, cellular circuitry, and human disease. The major contribution of ENCODE to date has been high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions. We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.
The implication is that they really didn't mean to say that 80% of our genome is functional. It was all a misunderstanding. They are now saying that the goal of the Consortium wasn't to discover function at all but merely to provide maps of places that might be functional, depending on your definition.
With that introduction, let's look at what Birney et al. actually said in their summary paper back in September 2012. You can judge for yourselves whether their statements were misinterpreted.
It's hard to avoid the impression that Birney et al. were attributing some sort of function to every single protein binding site and every single transcribed sequence. The abstract to their paper says,
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
The revisionist view of this history would have us believe that their claim of "new insights" is only tentative depending on whether "biochemical functions" really means anything.
The authors define what they mean by "function" in the introduction.
The Encyclopedia of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome. Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure). Comparative genomic studies suggest that 3–8% of bases are under purifying (negative) selection and therefore may be functional, although other analyses have suggested much higher estimates. In a pilot phase covering 1% of the genome, the ENCODE project annotated 60% of mammalian evolutionarily constrained bases, but also identified many additional putative functional elements without evidence of constraint. The advent of more powerful DNA sequencing technologies now enables whole-genome and more precise analyses with a broad repertoire of functional assays.
Now, I don't know about the rest of you, but that sounds pretty clear to me. The authors really mean it when they say that 80% of our genome is functional.
Here we describe the production and initial analysis of 1,640 data sets designed to annotate functional elements in the entire human genome. We integrate results from diverse experiments within cell types, related experiments involving 147 different cell types, and all ENCODE data with other resources, such as candidate regions from genome-wide association studies (GWAS) and evolutionarily constrained regions. Together, these efforts reveal important features about the organization and function of the human genome, summarized below.
I don't blame media types and other experts for reading this as a claim that junk DNA is only a small percentage of the genome.
• The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction (as assayed by bound ChIP-seq motifs or DNase I footprints), and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.
Many of us recognized right away that this was ridiculous, so we looked in the paper to see where they discussed nonfunctional binding and aberrant transcripts and referenced the papers that raised these issues. I couldn't find any such discussion. Did I miss it? Can anyone give me the page numbers and the reference numbers where they point out the possible limitations of their definition of function?
What we find instead is a description of the assays for function and a summary (page 60) that says ...
Accounting for all these elements, a surprisingly large amount of the human genome, 80.4%, is covered by at least one ENCODE-identified element (detailed in Supplementary Table 1, section Q). The broadest element class represents the different RNA types, covering 62% of the genome (although the majority is inside of introns or near genes). Regions highly enriched for histone modifications form the next largest class (56.1%). Excluding RNA elements and broad histone elements, 44.2% of the genome is covered. Smaller proportions of the genome are occupied by regions of open chromatin (15.2%) or sites of transcription factor binding (8.1%), with 19.4% covered by at least one DHS or transcription factor ChIP-seq peak across all cell lines. Using our most conservative assessment, 8.5% of bases are covered by either a transcription-factor-binding-site motif (4.6%) or a DHS footprint (5.7%). This, however, is still about 4.5-fold higher than the amount of protein-coding exons, and about twofold higher than the estimated amount of pan-mammalian constraint.
What we also find is a criticism of conservation (Evolutionary version of function) because it can't identify human specific functions that have arisen recently. They reference experiments that support such a conclusion and say, "This indicates that an appreciable proportion of the unconstrained elements are lineage-specific elements required for organismal function, consistent with long-standing views of recent evolution and the remainder are probably ‘neutral’ elements that are not currently under selection but may still affect cellular or larger scale phenotypes without an effect on fitness."
Given that the ENCODE project did not assay all cell types, or all transcription factors, and in particular has sampled few specialized or developmentally restricted cell lineages, these proportions must be underestimates of the total amount of functional bases. ... These estimates represent a lower bound, but reinforce the observation that there is more non-coding functional DNA than either coding sequence or mammalian evolutionarily constrained bases.
Such statements reinforce the idea that the ENCODE authors treat biochemical activity as the definitive criterion for function.
The "Concluding Remarks" say,
The unprecedented number of functional elements identified in this study provides a valuable resource to the scientific community as well as significantly enhances our understanding of the human genome....
I wonder what "unprecedented number of functional elements" means? I wonder how we "significantly enhance our understanding of the human genome" if most of those biochemical functional elements are artifacts? Did the authors really mean to say that maybe only 10% of our genome is functional and this does nothing to enhance our understanding of the human genome? I don't think so.
The large spread of coverage—from our highest resolution, most conservative set of bases implicated in GENCODE protein-coding gene exons (2.9%) or specific protein DNA binding (8.5%) to the broadest, most general set of marks covering the genome (approximately 80%), with many gradations in between—represents a spectrum of elements with different functional properties discovered by ENCODE.
The authors then go on to reiterate that all of these estimates of functional elements are likely to be underestimates.
The press picked up on these statements and reported that most of our genome was functional, not junk. As I stated above, those press releases often quoted leaders of the ENCODE Consortium and we know for a fact that many of them actually believed they had debunked junk. In fact, many of them still do believe that most of our genome is functional in spite of the Kellis et al. paper.
Does anyone honestly believe that this whole publicity hype campaign orchestrated by Ewan Birney and Nature with the active collaboration of other ENCODE leaders was all a big misunderstanding? Are some ENCODE Consortium members honestly trying to tell us that the paper was totally misinterpreted through no fault of the authors and that they were really saying something very different from the claim that most of the human genome has a function?
'Cause if that's what they're saying then how come the ENCODE leaders didn't speak up at the time and distance themselves from the misleading stories in the press?
How come they didn't point to the sentences on page 71 of their paper and claim that they didn't really mean to say ...
Importantly, for the first time we have sufficient statistical power to assess the impact of negative selection on primate-specific elements, and all ENCODE classes display evidence of negative selection in these unique-to-primate elements. Furthermore, even with our most conservative estimate of functional elements (8.5% of putative DNA/protein binding regions) and assuming that we have already sampled half of the elements from our transcription factor and cell-type diversity, one would estimate that at a minimum 20% (17% from protein binding and 2.9% protein coding gene exons) of the genome participates in these specific functions, with the likely figure significantly higher.
What they are saying is that they truly believe that every single transcription factor binding site has a function.
That's just nonsense.
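For anyone who wants to check where the "at a minimum 20%" figure on page 71 comes from, here is a minimal sketch of the arithmetic as I read it. The percentages are the ones in the quote above; the variable and function names are my own, and the simple linear scaling is my reading of "assuming that we have already sampled half of the elements," not something the paper spells out.

```python
# Figures quoted from Birney et al. (2012), page 71. Names are hypothetical.
PROTEIN_BINDING_PCT = 8.5  # "most conservative estimate" of DNA/protein binding regions
CODING_EXONS_PCT = 2.9     # protein-coding gene exons
SAMPLED_FRACTION = 0.5     # the authors' assumption: half of the elements sampled so far


def extrapolated_minimum(binding_pct, exons_pct, sampled_fraction):
    """Scale observed binding coverage up to full sampling, then add coding exons.

    Assumes coverage grows linearly with sampling (my assumption).
    """
    full_binding_pct = binding_pct / sampled_fraction
    return full_binding_pct, full_binding_pct + exons_pct


binding, total = extrapolated_minimum(
    PROTEIN_BINDING_PCT, CODING_EXONS_PCT, SAMPLED_FRACTION
)
print(f"protein binding after extrapolation: {binding:.1f}%")
print(f"protein binding plus coding exons:   {total:.1f}%")
```

The sum only reaches the quoted 20% if every one of those binding regions is counted as functional, which is exactly the assumption at issue.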
Birney, E. et al. (The ENCODE Consortium) (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74. [doi: 10.1038/nature11247]
Ecker, J.R., Bickmore, W.A., Barroso, I., Pritchard, J.K., Gilad, Y., and Segal, E. (2012) Genomics: ENCODE explained. Nature 489:52–55. [doi: 10.1038/489052a]
Kellis, M. et al. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) April 24, 2014 published online [doi: 10.1073/pnas.1318948111]
Skipper, M., Dhand, R., and Campbell, P. (2012) Presenting ENCODE. Nature 489:45. [doi: 10.1038/489045a]