
Monday, September 05, 2022

The 10th anniversary of the ENCODE publicity campaign fiasco

On Sept. 5, 2012 ENCODE researchers, in collaboration with the science journal Nature, launched a massive publicity campaign to convince the world that junk DNA was dead. We are still dealing with the fallout from that disaster.

The Encyclopedia of DNA Elements (ENCODE) was originally set up to discover all of the functional elements in the human genome. They carried out a massive number of experiments involving a huge group of researchers from many different countries. The results of this work were published in a series of papers in the September 6th, 2012 issue of Nature. (The papers appeared on Sept. 5th.)

Most of the papers are quite technical, and several of them are almost unreadable, so the consortium leaders published a summary article in order to explain the results to the average reader (Birney et al., 2012). The summary article contained these sentences in the abstract [my emphasis, LAM].

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions.

The idea that 80% of the genome is functional was taken to mean that there's almost no junk DNA in our genome, and this was the main theme of the publicity campaign launched by Nature and promoted in videos, press releases, and guest editorials [What did the ENCODE Consortium say in 2012?]. The death of junk DNA was announced on the same day in newspapers and science websites all over the world. (Science journalists were provided with advance notice under embargo until Sept. 5th.)

There was immediate criticism on blogs and other websites and this prompted an explanation from a senior Nature editor, Brendan Maher [Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco]. Here's how he explained the reason for the publicity campaign on the very next day (Sept. 6).

ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large. Similar efforts went into the coordinated publication of the first drafts of the human genome, another resource-building project, more than a decade ago. Although complaints and quibbles will probably linger for some time, the real test is whether scientists will use the data and prove ENCODE’s worth.

The important point here is that the publicity campaign was deliberate. It was planned over several months in order to make "the biggest splash possible." Apparently the "real test" will be whether researchers use the data but another, fairly important, "real test" is whether they will agree with the conclusions reached by ENCODE researchers.

As the nature of the fiasco became known in the scientific community, there was some attempt to make excuses by claiming that the ENCODE researchers and Nature editors were misunderstood. The revisionist story is that they were using a very particular definition of function and they didn't mean to imply that this refuted junk DNA [How does Nature deal with the ENCODE publicity hype that it created?]. That's nonsense and everybody knows it. There's abundant evidence that the ENCODE researchers really did mean to sound the death knell for junk DNA and Nature supported them [The truth about ENCODE]. One of the best examples is a press release from the Sanger Institute (UK) on Sept. 5, 2012.

The ENCODE Project, today, announces that most of what was previously considered as ‘junk DNA’ in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.

One of the best analyses of the ENCODE publicity campaign fiasco is a paper by Casane et al. (2015). Unfortunately, it is written in French but the English abstract tells you all you need to know [The apophenia of ENCODE or Pangloss looks at the human genome].

In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.

It's now been ten years since publication of the original paper and ENCODE has never again mentioned their 80% functional claim or claimed that most of the genome is functional.

We still have to deal with fallout from the huge success of a massive publicity campaign that spread false information about junk DNA. Ten years later, the majority of scientists and most of the general public still believe that ENCODE refuted junk DNA. It tells us that propaganda is much more effective than scientific evidence.

Birney et al. (The ENCODE Consortium) (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74. [doi: 10.1038/nature11247]

Casane, D., Fumey, J., and Laurenti, P. (2015) L’apophénie d’ENCODE ou Pangloss examine le génome humain. Med. Sci. (Paris) 31:680–686. [doi: 10.1051/medsci/20153106023]


Georgi Marinov said...

Yeah, we knew it was going to happen like that 10 years ago, and here we are.

It's the usual pattern -- the first thing that people hear is what most of them remember as ground immutable truth, especially if it is designed to pander to their biases, then nobody pays attention to the corrections that come after, and thus the damage is done and becomes very hard to undo.

Larry Moran said...


Many editors on Wikipedia see the Kellis et al. paper as a defense of the ENCODE position. Is that what you thought when you were writing your part of the paper?

Did the people in your lab at the time think that they were disproving junk DNA? Did you ever talk about it in group meetings?

Where are you now?

Georgi Marinov said...

As I said, once something has been given a very public platform and lay people see it for the first time, no amount of damage control can help fix things in the short term.

We had a lot of those discussions here at the time if you recall.

The 80% claim was never discussed by anyone prior to it appearing on September 5th 2012, it just appeared in the 2012 paper, and even then nobody in the know would have paid much attention to it (because they know what stood behind that number) if it wasn't for the media publicity (which again is something that came out of the blue for most).

I certainly didn't wake up on that day ten years ago with the expectation that what happened would happen.

Larry Moran said...


Yes, but the Kellis et al paper was eighteen months later and there had been plenty of time to recognize what had happened in September 2012.

The ENCODE leaders could have said in that paper that they disowned the 80% function claim and did not dispute the existence of lots of junk DNA. They didn't say that and I think it's because most of them are very much opposed to junk DNA.

Do you agree?

Joe Felsenstein said...

The state of acceptance of the very-little-junk-DNA delusion is illustrated by an article in today's New York Times section ScienceTimes. A report by Oliver Whang is called "Cracking the Case of the Giant Fern Genome" and starts:
"Humans, like many complex organisms, have large genomes, which contain the codes for our lives. Want to explain your dark hair, thin bones, and existential dread? Look to your 46 chromosomes and three billion nucleotide base pairs. But those numbers are nothing compared with the genomes of another organism, which contains twice as many base pairs and three times as many chromosomes."
He goes on to reveal that this is a Flying Spider Monkey Tree Fern, found in Southeast Asia. (He ignores even more massive genomes in lungfish).
Then he continues:
"What accounts for, or requires, so much DNA is what Fay-Wei Li, a botanist at the Boyce Thompson Institute calls 'the biggest question in fern genomics'"
The rest of the article does not address the question but concerns other ferns that have been found to have similar sized genomes.

Larry Moran said...


The sequences of two other fern genomes were published last week. The Ceratopteris richardii genome is also quite large (9.6 Gb, 7.5 Gb was sequenced) and 85% of it consists of repetitive DNA. Transposon-related sequences account for 75% of the genome. A total of 37,000 protein-coding genes were identified.

There's evidence of two polyploidy events (whole genome duplications, WGD). One was only 60 My ago and the other was 300 My ago. There's no mystery about the large genome. It's due to the whole genome duplications and expansion of repetitive DNA. I assume that a very large percentage of the genome is junk DNA but the authors don't mention junk DNA for some strange reason.

The other paper reports the sequence of the maidenhair fern genome. The complete genome is 5.0 Gb, of which 4.8 Gb was sequenced. 85% of the genome is repetitive DNA and most of this is transposon-related.

There's no evidence of a recent WGD (<300 My) in the maidenhair fern genome. The authors report 31,000 protein-coding genes and 9,000 noncoding genes.

In my opinion, most of the genome is junk DNA due to expansion of repetitive sequences so there's no great mystery about why the genome is larger than the human genome. But if the numbers of protein-coding genes are accurate, then some of the increase in genome size among ferns compared to mammals is due to more genes with large introns. Thus, part of the expansion includes insertion of junk repetitive DNA elements into introns. For some strange reason, these authors also avoid using the term "junk DNA" and avoid any mention of the possibility that much of the genome could be nonfunctional.

The bottom line is that there isn't anything to see in the three fern genome sequences to alter the view that genome expansion is due to polyploidy events and transposon-related sequences giving rise to lots of junk DNA.

There have been at least a dozen popular science articles about this work but none of them have mentioned junk DNA. Almost all of them treat large genomes as an important mystery that's puzzling scientists.

Joe Felsenstein said...

Just wait till they discover lungfish genomes ...

Graham Jones said...

I think ferns can beat lungfish.

"Estimates of fern genome sizes range from 0.77 pg for Azolla microphylla (heterosporous
leptosporangiate) to 65.55 pg for Ophioglossum reticulatum and 72.68 pg for Psilotum nudum (two
eusporangiate ferns; Bennett and Leitch 2001; Obermayer et al. 2002)."

Ophioglossum reticulatum has very high ploidy with 1440 chromosomes.

Possibly the really important question in fern genomics is "How can we get a grant to sequence the really big ones?"

Joe Felsenstein said...

I think Paris japonica, an alpine flower in Japan, beats them all.

John Harshman said...

It's as if nobody has ever heard of the C-value problem and everyone needs to discover it anew, without knowledge of the literature about it. Or without knowledge of onions, fugu, genome size databases, etc.

Larry Moran said...

@John Harshman

The significance of a model can be judged by its explanatory power, which is another way of saying that a good model explains observations and data much better than bad models. The junk DNA view of genomes explains the range of genome sizes much better than any model suggesting that most of the human genome is functional.

This is why the Onion Test is so important in these discussions.

The bigger problem is that a large number of junk DNA skeptics are really bad at seeing the "big picture" and putting their speculations into the broader context of all of life and all of the data. I think this is a failure of critical thinking.

Stewart said...


I wouldn't appeal in general to whole genome duplication to account for large genomes in plants. While it accounts for the large genomes of neopolyploids such as Paris japonica relative to their diploid congeners, plants have gone through repeated cycles of polyploidisation and diploidisation, and any correlation with the number of rounds of ancient polyploidisation is weak: Arabidopsis thaliana was the first plant genome sequenced partly due to its small genome (it was also a model organism for the study of plant development) but still shows evidence of 2 or 3 rounds of whole genome duplication.

Gossypium (cotton) is a relatively young genus, but the random walk in genome sizes has been such that some diploid Australian cotton species have larger genomes than the tetraploid New World/Pacific species; this is not so much because the tetraploids (subgenus Karpas) have lost duplicated genes as that the Australian species (subgenus Sturtia) had acquired genomes twice the size of those of the Afro-Asian (subgenus Gossypium) and American (subgenus Houzingenia) lineages.

My layman's explanation for why plants in general have more genes than vertebrates (which have also gone through 3 rounds of whole genome duplication) is that plants have to synthesise all their own biomolecules - both in the essential bits of the metabolism, and in the secondary metabolites they produce to deter herbivores.