
Showing posts sorted by relevance for query encode project.

Monday, September 05, 2022

The 10th anniversary of the ENCODE publicity campaign fiasco

On Sept. 5, 2012 ENCODE researchers, in collaboration with the science journal Nature, launched a massive publicity campaign to convince the world that junk DNA was dead. We are still dealing with the fallout from that disaster.

The Encyclopedia of DNA Elements (ENCODE) was originally set up to discover all of the functional elements in the human genome. They carried out a massive number of experiments involving a huge group of researchers from many different countries. The results of this work were published in a series of papers in the September 6th, 2012 issue of Nature. (The papers appeared on Sept. 5th.)

Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2022) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

Friday, November 20, 2015

The truth about ENCODE

A few months ago I highlighted a paper by Casane et al. (2015) where they said ...
In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.
How did we get to this stage where the most publicized result of papers published by leading scientists in the best journals turns out to be wrong, but hardly anyone knows it?

Back in September 2012, the ENCODE Consortium was preparing to publish dozens of papers on their analysis of the human genome. Most of the results were quite boring but that doesn't mean they were useless. The leaders of the Consortium must have been worried that science journalists would not give them the publicity they craved so they came up with a strategy and a publicity campaign to promote their work.

Their leader was Ewan Birney, a scientist with valuable skills as a herder of cats but little experience in evolutionary biology and the history of the junk DNA debate.

The ENCODE Consortium decided to add up all the transcription factor binding sites—spurious or not—and all the chromatin markers—whether or not they meant anything—and all the transcripts—even if they were junk. With a little judicious juggling of numbers they came up with the following summary of their results (Birney et al., 2012) ...
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
See What did the ENCODE Consortium say in 2012? for more details on what the ENCODE Consortium leaders said, and did, when their papers came out.

The bottom line is that these leaders knew exactly what they were doing and why. By saying they had assigned biochemical functions to 80% of the genome, they knew that this would be the headline. They knew that journalists and publicists would interpret this to mean the end of junk DNA. Most of the ENCODE leaders actually believed it.

That's exactly what happened ... aided and abetted by the ENCODE Consortium, the journals Nature and Science, and gullible science journalists all over the world. (Ryan Gregory has published a list of articles that appeared in the popular press: The ENCODE media hype machine.)

Almost immediately, knowledgeable scientists and science writers tried to expose the publicity campaign hype. The first criticisms appeared on various science blogs, followed by a series of papers in the published scientific literature. Ed Yong, an experienced science journalist, interviewed Ewan Birney and blogged about ENCODE on the first day. Yong reported the standard publicity claim that most of our genome is functional, an interpretation confirmed by Ewan Birney and other senior scientists. Two days later, Ed Yong started adding updates to his blog post after reading the blogs of many scientists, including some who were well-recognized experts on genomes and evolution [ENCODE: the rough guide to the human genome].

Within a few days of publishing their results the ENCODE Consortium was coming under intense criticism from all sides. A few journalists, like John Timmer, recognized right away what the problem was ...
Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.


[Most of what you read was wrong: how press releases rewrote scientific history]
Nature may have begun to realize that it made a mistake in promoting the idea that most of our genome was functional. Two days after the papers appeared, Brendan Maher, a Feature Editor for Nature, tried to get the journal off the hook but only succeeded in making matters worse [see Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco].

Meanwhile, two private for-profit companies, Illumina and Nature, teamed up to promote the ENCODE results. They even hired Tim Minchin to narrate the video. This is what hype looks like ...


Soon articles began to appear in the scientific literature challenging the ENCODE Consortium's interpretation of function and explaining the difference between an effect—such as the binding of a transcription factor to a random piece of DNA—and a true biological function.

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC biology, 11:58. [doi:10.1186/1741-7007-11-58]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in Biology and Medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

By March 2013—six months after publication of the ENCODE papers—some editors at Nature decided that they had better say something else [see Anonymous Nature Editors Respond to ENCODE Criticism]. Here's the closest thing to an apology that they have ever written ....
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”.
Oops! The importance of junk DNA is still an "important, open and debatable question" in spite of what the video sponsored by Nature might imply.

(To this day, neither Nature nor Science has actually apologized for misleading the public about the ENCODE results. [see Science still doesn't get it ])

The ENCODE Consortium leaders responded in April 2014—eighteen months after their original papers were published.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

In that paper they acknowledge that there are multiple meanings of the word function and their choice of "biochemical" function may not have been the best choice ....
However, biochemical signatures are often a consequence of function, rather than causal. They are also not always deterministic evidence of function, but can occur stochastically.
This is exactly what many scientists have been telling them. Apparently they did not know this in September 2012.

They also include in their paper a section on "Case for Abundant Junk DNA." It summarizes the evidence for junk DNA, evidence that the ENCODE Consortium did not acknowledge in 2012 and certainly didn't refute.

In answer to the question, "What Fraction of the Human Genome Is Functional?" they now conclude that ENCODE hasn't answered that question and more work is needed. They now claim that the real value of ENCODE is to provide "high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions."
We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.
There you have it, straight from the horse's mouth. The ENCODE Consortium now believes that you should NOT interpret their results to mean that 80% of the genome is functional and therefore not junk DNA. There is good evidence for abundant junk DNA and the issue is still debatable.

I hope everyone pays attention and stops referring to the promotional hype saying that ENCODE has refuted junk DNA. That's not what the ENCODE Consortium leaders now say about their results.


Casane, D., Fumey, J., and Laurenti, P. (2015) L'apophénie d'ENCODE ou Pangloss examine le génome humain. Med. Sci. (Paris) 31: 680-686. [doi: 10.1051/medsci/20153106023]

Tuesday, June 25, 2013

"Reasons to Believe" in ENCODE

Fazale "Fuz" Rana is a biochemist at Reasons to Believe. He and his colleagues are Christian apologists who try to make their faith compatible with science. Fuz was very excited about the ENCODE results when they were first published [One of the Most Significant Days in the History of Biochemistry]. That's because Christians of his ilk were very unhappy about junk DNA and the ENCODE Consortium showed that all of our genome is functional.1

Fuz is aware of the fact that some people are skeptical about the ENCODE results. He wrote a series of posts defending ENCODE.
  1. Do ENCODE Skeptics Protest Too Much? Part 1 (of 3)
  2. Do ENCODE Skeptics Protest Too Much? Part 2 (of 3)
  3. Do ENCODE Skeptics Protest Too Much? Part 3 (of 3)
The first post is merely a list of the objections many of us raised.

Tuesday, June 19, 2007

What is a gene, post-ENCODE?

Back in January we had a discussion about the definition of a gene [What is a gene?]. At that time I presented my personal preference for the best definition of a gene.
A gene is a DNA sequence that is transcribed to produce a functional product.
This is a definition that's widely shared among biochemists and molecular biologists but there are competing definitions.

Now, there's a new kid on the block. The recent publication of a slew of papers from the ENCODE project has prompted many of the people involved to proclaim that a revolution is under way. Part of the revolution includes redefining a gene. I'd like to discuss the paper by Mark Gerstein et al. (2007) [What is a gene, post-ENCODE? History and updated definition] to see what this revolution is all about.

The ENCODE project is a large scale attempt to analyze and annotate the human genome. The first results focus on about 1% of the genome spread out over 44 segments. These results have been summarized in an extraordinarily complex Nature paper with massive amounts of supplementary material (The ENCODE Project Consortium, 2007). The Nature paper is supported by dozens of other papers in various journals. Ryan Gregory has a list of blog references to these papers at ENCODE links.

I haven't yet digested the published results. I suspect that like most bloggers there's just too much there to comment on without investing a great deal of time and effort. I'm going to give it a try but it will require a lot of introductory material, beginning with the concept of alternative splicing, which is this week's theme.

The most widely publicized result is that most of the human genome is transcribed. It might be more correct to say that the ENCODE Project detected RNAs that are either complementary to much of the human genome or lead to the inference that much of it is transcribed.

This is not news. We've known about this kind of data for 15 years and it's one of the reasons why many scientists over-estimated the number of human genes in the decade leading up to the publication of the human genome sequence. The importance of the ENCODE project is that a significant fraction of the human genome (1%) has been analyzed in detail and that the group made some serious attempts to find out whether the transcripts really represent functional RNAs.

My initial impression is that they have failed to demonstrate that the rare transcripts of junk DNA are anything other than artifacts or accidents. It's still an open question as far as I'm concerned.

It's not an open question as far as the members of the ENCODE Project are concerned and that brings us to the new definition of a gene. Here's how Gerstein et al. (2007) define the problem.
The ENCODE consortium recently completed its characterization of 1% of the human genome by various high-throughput experimental and computational techniques designed to characterize functional elements (The ENCODE Project Consortium 2007). This project represents a major milestone in the characterization of the human genome, and the current findings show a striking picture of complex molecular activity. While the landmark human genome sequencing surprised many with the small number (relative to simpler organisms) of protein-coding genes that sequence annotators could identify (~21,000, according to the latest estimate [see www.ensembl.org]), ENCODE highlighted the number and complexity of the RNA transcripts that the genome produces. In this regard, ENCODE has changed our view of "what is a gene" considerably more than the sequencing of the Haemophilus influenza and human genomes did (Fleischmann et al. 1995; Lander et al. 2001; Venter et al. 2001). The discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome prompts us to reconsider now what a gene is.
Keep in mind that I personally reject the premise and I don't think I'm alone. As far as I'm concerned, the "extensive transcriptional activity" could be artifact and I haven't had a "protein-centric" view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967. Even if the ENCODE results are correct my preferred definition of a gene is not threatened. So, what's the fuss all about?

Regulatory Sequences
Gerstein et al. are worried because many definitions of a gene include regulatory sequences. Their results suggest that many genes have multiple large regions that control transcription and these may be located at some distance from the transcription start site. This isn't a problem if regulatory sequences are not part of the gene, as in the definition quoted above (a gene is a transcribed region). As a matter of fact, the fuzziness of control regions is one reason why most modern definitions of a gene don't include them.
Overlapping Genes
According to Gerstein et al.
As genes, mRNAs, and eventually complete genomes were sequenced, the simple operon model turned out to be applicable only to genes of prokaryotes and their phages. Eukaryotes were different in many respects, including genetic organization and information flow. The model of genes as hereditary units that are nonoverlapping and continuous was shown to be incorrect by the precise mapping of the coding sequences of genes. In fact, some genes have been found to overlap one another, sharing the same DNA sequence in a different reading frame or on the opposite strand. The discontinuous structure of genes potentially allows one gene to be completely contained inside another one’s intron, or one gene to overlap with another on the same strand without sharing any exons or regulatory elements.
We've known about overlapping genes ever since the sequences of the first bacterial operons and the first phage genomes were published. We've known about all the other problems for 20 years. There's nothing new here. No definition of a gene is perfect—all of them have exceptions that are difficult to squeeze into a one-size-fits-all definition of a gene. The problem with the ENCODE data is not that they've just discovered overlapping genes, it's that their data suggests that overlapping genes in the human genome are more the rule than the exception. We need more information before accepting this conclusion and redefining the concept of a gene based on analysis of the human genome.
Splicing
Splicing was discovered in 1977 (Berget et al. 1977; Chow et al. 1977; Gelinas and Roberts 1977). It soon became clear that the gene was not a simple unit of heredity or function, but rather a series of exons, coding for, in some cases, discrete protein domains, and separated by long noncoding stretches called introns. With alternative splicing, one genetic locus could code for multiple different mRNA transcripts. This discovery complicated the concept of the gene radically.
Perhaps back in 1978 the discovery of splicing prompted a re-evaluation of the concept of a gene. That was almost 30 years ago and we've moved on. Now, many of us think of a gene as a region of DNA that's transcribed and this includes exons and introns. In fact, the modern definition doesn't have anything to do with proteins.

Alternative splicing does present a problem if you want a rigorous definition with no fuzziness. But biology isn't like that. It's messy and you can't get rid of fuzziness. I think of a gene as the region of DNA that includes the longest transcript. Genes can produce multiple protein products by alternative splicing. (The fact that the definition above says "a" functional product shouldn't mislead anyone. That was not meant to exclude multiple products.)

The real problem here is that the ENCODE project predicts that alternative splicing is abundant and complex. They claim to have discovered many examples of splice variants that include exons from adjacent genes, as shown in the figure from their paper. Each of the lines below the genome represents a different kind of transcript. You can see that there are many transcripts that include exons from "gene 1" and "gene 2" and another that includes exons from "gene 1" and "gene 4." The combinations and permutations are extraordinarily complex.

If this represents the true picture of gene expression in the human genome, then it would require a radical rethinking of what we know about molecular biology and evolution. On the other hand, if it's mostly artifact then there's no revolution under way. The issue has been fought out in the scientific literature over the past 20 years and it hasn't been resolved to anyone's satisfaction. As far as I'm concerned the data overwhelmingly suggests that very little of that complexity is real. Alternative splicing exists but not the kind of alternative splicing shown in the figure. In my opinion, that kind of complexity is mostly an artifact due to spurious transcription and splicing errors.
Trans-splicing
Trans-splicing refers to a phenomenon where the transcript from one part of the genome is attached to the transcript from another part of the genome. The phenomenon has been known for over 20 years—it's especially common in C. elegans. It's another exception to the rule. No simple definition of a gene can handle it.
Parasitic and mobile genes
This refers mostly to transposons. Gerstein et al. say, "Transposons have altered our view of the gene by demonstrating that a gene is not fixed in its location." This isn't true. Nobody has claimed that the location of genes is fixed.
The large amount of "junk DNA" under selection
If a large amount of what we now think of as junk DNA turns out to be transcribed to produce functional RNA (or proteins) then that will be a genuine surprise to some of us. It won't change the definition of a gene as far as I can see.
The paper goes on for many more pages but the essential points are covered above. What's the bottom line? The new definition of an ENCODE gene is:
There are three aspects to the definition that we will list below, before providing the succinct definition:
  1. A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
  2. In the case that there are several functional products sharing overlapping regions, one takes the union of all overlapping genomic sequences coding for them.
  3. This union must be coherent—i.e., done separately for final protein and RNA products—but does not require that all products necessarily share a common subsequence.
This can be concisely summarized as:
The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
On the surface this doesn't seem to be much different from the definition of a gene as a transcribed region but there are subtle differences. The authors describe how their new definition works using a hypothetical example.

How the proposed definition of the gene can be applied to a sample case. A genomic region produces three primary transcripts. After alternative splicing, products of two of these encode five protein products, while the third encodes for a noncoding RNA (ncRNA) product. The protein products are encoded by three clusters of DNA sequence segments (A, B, and C; D; and E). In the case of the three-segment cluster (A, B, C), each DNA sequence segment is shared by at least two of the products. Two primary transcripts share a 5' untranslated region, but their translated regions D and E do not overlap. There is also one noncoding RNA product, and because its sequence is of RNA, not protein, the fact that it shares its genomic sequences (X and Y) with the protein-coding genomic segments A and E does not make it a co-product of these protein-coding genes. In summary, there are four genes in this region, and they are the sets of sequences shown inside the orange dashed lines: Gene 1 consists of the sequence segments A, B, and C; gene 2 consists of D; gene 3 of E; and gene 4 of X and Y. In the diagram, for clarity, the exonic and protein sequences A and E have been lined up vertically, so the dashed lines for the spliced transcripts and functional products indicate connectivity between the proteins sequences (ovals) and RNA sequences (boxes). (Solid boxes on transcripts) Untranslated sequences, (open boxes) translated sequences.
This isn't much different from my preferred definition except that I would have called the region containing exons C and D a single gene with two different protein products. Gerstein et al (2007) split it into two different genes.
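The grouping rule in the Gerstein et al. definition is mechanical enough to sketch in code. Here is a minimal illustration, with hypothetical products represented as sets of segment labels taken from the paper's worked example (A through E for proteins, X and Y for the ncRNA): products of the same class that share any segment are merged, and each gene is the union of the segments in one merged group. Because the union is taken "coherently" (separately for protein and RNA products), the ncRNA over X and Y stays a separate gene even though it genomically overlaps segments A and E.

```python
# Sketch of the Gerstein et al. (2007) gene definition: merge functional
# products of the same class (protein vs. RNA) whenever they share genomic
# segments, then call the union of segments in each merged group one "gene".
# Product and segment names are hypothetical, following the paper's example.

def genes_from_products(products):
    """products: list of sets of segment labels, all from one product class.
    Returns one set of segments per gene (connected components under
    segment sharing)."""
    genes = []
    for prod in products:
        prod = set(prod)
        overlapping = [g for g in genes if g & prod]
        merged = prod.union(*overlapping) if overlapping else prod
        # keep the non-overlapping genes, replace the rest with the merger
        genes = [g for g in genes if not (g & prod)] + [merged]
    return genes

# Five protein products: three share segments A/B/C; D and E stand alone.
protein_products = [{"A", "B"}, {"B", "C"}, {"A", "C"}, {"D"}, {"E"}]
# The ncRNA product is grouped separately, so X/Y form their own gene.
rna_products = [{"X", "Y"}]

print(sorted(map(sorted, genes_from_products(protein_products))))
# -> [['A', 'B', 'C'], ['D'], ['E']]  (three protein-coding genes)
print(genes_from_products(rna_products))
# -> [{'X', 'Y'}]  (one ncRNA gene)
```

This reproduces the paper's count of four genes for the example region, and it makes the subtle difference from the transcribed-region definition visible: gene membership is driven by shared segments among products, not by transcript boundaries.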

The bottom line is that in spite of all the rhetoric the "new" definition of a gene isn't much different from the old one that some of us have been using for a couple of decades. It's different from some old definitions that other scientists still prefer but this isn't revolutionary. That discussion has already been going on since 1980.

Let me close by making one further point. The "data" produced by the ENCODE consortium is intriguing but it would be a big mistake to conclude that everything they say is a proven fact. Skepticism about the relevance of those extra transcripts is quite justified as is skepticism about the frequency of alternative splicing.


Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O., Emanuelsson, O., Zhang, Z.D., Weissman, S. and Snyder, M. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17:669-681.

The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PDF]

[Hat Tip: Michael White at Adaptive Complexity]

Tuesday, September 11, 2012

ENCODE/Junk DNA Fiasco: The IDiots Don't Like Me

Casey Luskin has devoted an entire post to discussing my views on junk DNA. I'm flattered. Read it at: What an Evolution Advocate's Response to the ENCODE Project Tells Us about the Evolution Debate.

Let's look at how the IDiots are responding to this publicity fiasco. Casey Luskin begins with ...
University of Toronto biochemistry professor Larry Moran is not happy with the results of the ENCODE project, which report evidence of "biochemical functions for 80% of the genome." Other evolution-defenders are trying to dismiss this paper as mere "hype".

Yes that's right -- we're supposed to ignore the intentionally unambiguous abstract of an 18-page Nature paper, the lead out of 30 other simultaneous papers from this project, co-authored by literally hundreds of leading scientists worldwide, because it's "hype." (Read the last two or so pages of the main Nature paper to see the uncommonly long list of international scientists who were involved with this project, and co-authored this paper.) Larry Moran and other vocal Internet evolution-activists are welcome to disagree and protest these conclusions, but it's clear that the consensus of molecular biologists -- people who actually study how the genome works -- now believe that the idea of "junk DNA" is essentially wrong.

Wednesday, May 14, 2014

What did the ENCODE Consortium say in 2012?

When the ENCODE Consortium published their results in September 2012, the popular press immediately seized upon the idea that most of our genome was functional and the concept of junk DNA was debunked. The "media" in this case includes writers at prestigious journals like Science and Nature and well-known science writers in other respected publications and blogs.

In most cases, those articles contained interviews with ENCODE leaders and direct quotes about the presence of large amounts of functional DNA in the human genome.

The second wave of the ENCODE publicity campaign is trying to claim that this was all a misunderstanding. According to this revisionist view of recent history, the actual ENCODE papers never said that most of our genome had to be functional and never implied that junk DNA was dead. It was the media that misinterpreted the papers. Don't blame the scientists.

You can see an example of this version of history in the comments to How does Nature deal with the ENCODE publicity hype that it created?, where some people are arguing that the ENCODE summary paper has been misrepresented.

Friday, March 15, 2013

On the Meaning of the Word "Function"

A lot of the debate over ENCODE's publicity campaign concerns the meaning of the word "function." In the summary article published in Nature last September the authors said, "These data enabled us to assign biochemical functions for 80% of the genome ...." (The ENCODE Project Consortium, 2012).

Here's how they describe function.
Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).
What, exactly, do the ENCODE scientists mean? Do they think that junk DNA might contain "functional elements"? If so, that doesn't make a lot of sense, does it?

Ewan Birney tried to address this definitional morass on his blog [ENCODE: My own thoughts] where he says ....
It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.
That's about as clear as mud.

We all know what the problem is. It's whether all binding sites have a biological function or whether many of them are just noise arising as a property of DNA binding proteins. It's whether all transcripts have a biological function or whether many of those detected by ENCODE are just spurious transcripts or junk RNA. These questions were debated extensively when the ENCODE pilot project was published in 2007. Every ENCODE scientist should know about this problem so you might expect that they would take steps to distinguish between real biological function and nonfunctional noise.

Their definition of "function" is not helpful. In fact, it seems deliberately designed to obfuscate.

Let's see how other scientists interpret the ENCODE results. In a News & Views article published in Nature last September, Joseph R. Ecker (a Salk Institute scientist) said ...
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA.'
That makes at least one genomics worker who thinks that "biochemical function" and junk DNA are mutually exclusive.

Recently a representative of GENCODE responded to Dan Graur's criticism [On the annotation of functionality in GENCODE (or: our continuing efforts to understand how a television set works)]. This person (JM) says ...
Q1: Does GENCODE believe that 80% of the genome is functional?

As noted, we will only discuss here the portion of the genome that is transcribed. According to the main ENCODE paper, while 80% of the genome appears to have some biological activity, only “62% of genomic bases are reproducibly represented in sequenced long (>200 nucleotides) RNA molecules or GENCODE exons”. In fact, only 5.5% of this transcription overlaps with GENCODE exons. So we have two things here: existing GENCODE models largely based on mRNA / EST evidence, and novel transcripts inferred from RNAseq data. The suggestion, then, is that there is extensive transcription occurring outside of currently annotated GENCODE exons.
There's another scientist who thinks that 80% of the genome has some biological activity in spite of the fact that the ENCODE paper says it has "biochemical function." I don't think "biological activity" is compatible with "junk DNA," but who knows what they think?

Since this person is part of the ENCODE team, we can assume that at least some of the scientists on the team are confused.

The Sanger Institute (Cambridge, UK) was an important player in the ENCODE Consortium. It put out a press release on the day the papers were published [Google Earth of Biomedical Research]. The opening paragraph is ...
The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.
It looks like the Sanger Institute equates "biochemical function" and "biological function" and it looks like neither one is compatible with junk DNA.

I think the ENCODE leaders, including Ewan Birney, knew exactly what they were doing when they defined function. They meant "biological function" even though they equivocated by saying "biochemical function." And they meant for this to be interpreted as "not junk" even though they are attempting to backtrack in the face of criticism.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)

Thursday, March 14, 2013

Anonymous Nature Editors Respond to ENCODE Criticism

There have now been four papers in the scientific literature criticizing the way ENCODE leaders hyped their data by claiming that most of our genome is functional [see Ford Doolittle's Critique of ENCODE]. There have been dozens of blog postings on the same topic.

The worst of the papers were published by Nature—this includes the abominable summary that should never have made it past peer review (ENCODE Consortium, 2012).

The lead editor on the ENCODE story was Brendan Maher and he promoted the idea that the ENCODE results showed that most of our genome has a function [ENCODE: The human encyclopaedia]
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.

Monday, April 06, 2020

The Function Wars Part VII: Function monism vs function pluralism

This post is mostly about a recent paper published in Studies in History and Philosophy of Biol & Biomed Sci where two philosophers present their view of the function wars. They argue that the best definition of function is a weak etiological account (monism) and that pluralistic accounts including causal role (CR) definitions are mostly invalid. Weak etiological monism is the idea that sequence conservation is the best indication of function but that this doesn't necessarily imply that the trait arose by natural selection (adaptation); it could have arisen by neutral processes such as constructive neutral evolution.

The paper makes several dubious claims about ENCODE that I want to discuss but first we need a little background.

Background

The ENCODE publicity campaign created a lot of controversy in 2012 because ENCODE researchers claimed that 80% of the human genome is functional. That claim conflicted with all the evidence that had accumulated up to that point in time. Based on their definition of function, the leading ENCODE researchers announced the death of junk DNA and this position was adopted by leading science writers and leading journals such as Nature and Science.

Let's be very clear about one thing. This was a SCIENTIFIC conflict over how to interpret data and evidence. The ENCODE researchers simply ignored a ton of evidence demonstrating that most of our genome is junk. Instead, they focused on the well-known facts that much of the genome is transcribed and that the genome is full of transcription factor binding sites. Neither of these facts were new and both of them had simple explanations: (1) most of the transcripts are spurious transcripts that have nothing to do with function, and (2) random non-functional transcription factor binding sites are expected from our knowledge of DNA binding proteins. The ENCODE researchers ignored these explanations and attributed function to all transcripts and all transcription factor binding sites. That's why they announced that 80% of the genome is functional.

Friday, May 09, 2014

How does Nature deal with the ENCODE publicity hype that it created?

Let's briefly review what happened in September 2012 when the ENCODE Consortium published their results (mostly in Nature).

Here's the abstract of the original paper published in Nature in September 2012 (Birney et al. 2012). Manolis Kellis (see below) is listed as a principal investigator and member of the steering committee.
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Most people reading this picked up on the idea that 80% of the genome had a function.

Thursday, September 06, 2012

The ENCODE Data Dump and the Responsibility of Science Journalists

ENCODE (ENcyclopedia Of DNA Elements) is a massive consortium of scientists dedicated to finding out what's in the human genome.

They published the results of a pilot study back in July 2007 (ENCODE, 2007) in which they analyzed a specific 1% of the human genome. That result suggested that much of our genome is transcribed at some time or another or in some cell type (pervasive transcription). The consortium also showed that the genome was littered with DNA binding sites that were frequently occupied by DNA binding proteins.

THEME

Genomes & Junk DNA
All of this suggested strongly that most of our genome has a function. However, in the actual paper the group was careful not to draw any firm conclusions.
... we also uncovered some surprises that challenge the current dogma on biological mechanisms. The generation of numerous intercalated transcripts spanning the majority of the genome has been repeatedly suggested, but this phenomenon has been met with mixed opinions about the biological importance of these transcripts. Our analyses of numerous orthogonal data sets firmly establish the presence of these transcripts, and thus the simple view of the genome as having a defined set of isolated loci transcribed independently does not seem to be accurate. Perhaps the genome encodes a network of transcripts, many of which are linked to protein-coding transcripts and to the majority of which we cannot (yet) assign a biological role. Our perspective of transcription and genes may have to evolve and also poses some interesting mechanistic questions. For example, how are splicing signals coordinated and used when there are so many overlapping primary transcripts? Similarly, to what extent does this reflect neutral turnover of reproducible transcripts with no biological role?
This didn't stop the hype. The results were widely interpreted as proof that most of our genome has a function and the result featured prominently in the creationist literature.

Thursday, February 09, 2017

NIH and UCSF ENCODE researchers are on Reddit right now!

Check out Science AMA Series: We’re Drs. Michael Keefer and James Kobie, infectious .... (Thanks to Paul Nelson for alerting me to the discussion.)

Here's part of the introduction ...
Yesterday NIH announced its latest round of ENCODE funding, which includes support for five new collaborative centers focused on using cutting edge techniques to characterize the candidate functional elements in healthy and diseased human cells. For example, when and where does an element function, and what exactly does it do.

UCSF is host to two of these five new centers, where researchers are using CRISPR gene editing, embryonic stem cells, and other new tools that let us rapidly screen hundreds of thousands of genome sequences in many different cell types at a time to learn which sequences are biologically relevant — and in what contexts they matter.

Today’s AMA brings together the leaders of NIH’s ENCODE project and the leaders of UCSF’s partner research centers.

Your hosts today are:

Nadav Ahituv, UCSF professor in the department of bioengineering and therapeutic sciences. Interested in gene regulation and how its alteration leads to morphological differences between organisms and human disease. Loves science and juggling.
Elise Feingold: Lead Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since its start in 2003. I came up with the project’s name, ENCODE!
Dan Gilchrist, Program Director, Computational Genomics and Data Science, NHGRI. I joined the ENCODE Project Management team in 2014. Interests include mechanisms of gene regulation, using informatics to address biological questions, surf fishing.
Mike Pazin, Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since 2011. My background is in chromatin structure and gene regulation. I love science, learning about how things work, and playing music.
Yin Shen: Assistant Professor in Neurology and Institute for Human Genetics, UCSF. I am interested in how genetics and epigenetics contribute to human health and diseases, especial for the human brain and complex neurological diseases. If I am not doing science, I like experimenting in the kitchen.

Sunday, September 04, 2022

Wikipedia: the ENCODE article

The ENCODE article on Wikipedia is a pretty good example of how to write a science article. Unfortunately, there are a few issues that will be very difficult to fix.

When Wikipedia was formed twenty years ago, there were many people who were skeptical about the concept of a free crowdsourced encyclopedia. Most people understood that a reliable source of information was needed for the internet because the traditional encyclopedias were too expensive, but could it be done by relying on volunteers to write articles that could be trusted?

The answer is mostly “yes” although that comes with some qualifications. Many science articles are not good; they contain inaccurate and misleading information and often don’t represent the scientific consensus. They also tend to be disjointed and unreadable. On the other hand, many non-science articles are at least as good as, and often better than, anything in the traditional encyclopedias (e.g., Battle of Waterloo; Toronto, Ontario; The Beach Boys).

By 2008, Wikipedia had expanded enormously and the quality of articles was being compared favorably to those of Encyclopedia Britannica, which had been forced to go online to compete. However, this comparison is a bit unfair since it downplays science articles.

Wednesday, December 14, 2016

The ENCODE publicity campaign of 2007

ENCODE1 published the results of a pilot project in 2007 (Birney et al., 2007). They looked at 1% (30Mb) of the genome with a view to establishing their techniques and dealing with large amounts of data from many different groups. The goal was to "provide a more biologically informative representation of the human genome by using high-throughput methods to identify and catalogue the functional elements encoded."

The most striking result of this preliminary study was the confirmation of pervasive transcription. Here's what the ENCODE Consortium leaders said in the abstract,
Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap with one another.
ENCODE concluded that 93% of the genome is transcribed in one tissue or another. There are two possible explanations that account for pervasive transcription.

Tuesday, July 24, 2007

Junk DNA in New Scientist

I just got my copy of the July 14th issue of New Scientist so I can comment on the article Why 'junk DNA' may be useful after all by Aria Pearson. RPM at evolvgen thinks it's pretty good [Junk on Junk] and so does Ryan Gregory at Genomicron [New Scientist gets it right]. I agree. It's one of the best articles on the subject that I've seen in a long time.

First off, Aria Pearson does not make the common mistake of assuming that junk DNA is equivalent to non-coding DNA. The article makes this very clear by pointing out that we've known about regulatory sequences since the 1970s. The main point of the article is to discuss recent results that reveal new functions for some of the previously unidentified non-coding DNA that was classified as junk.

One such result is that reported by Pennacchio et al. (2006) in Nature last year. They analyzed sequences in the human genome that showed a high degree of identity to sequences in the pufferfish genome. The idea is that these presumably conserved sequences must have a function. Pennacchio et al. (2006) tested them to see if they would help regulate gene expression and they found that 45% of the ones they tested functioned as enhancers. In other words, they stimulated the expression of adjacent genes in a tissue-specific manner. The authors estimate that about half of the "conserved" elements play a role in regulating gene expression.

There are a total of 3,124 conserved elements and their average length is 1,270 bp. This accounts for 3.9 × 10⁶ bp out of a total genome size of 3.2 × 10⁹ bp, or about 0.1% of the genome. The New Scientist article acknowledges, correctly, that more than 95% of the genome could still be junk.
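The arithmetic behind that ~0.1% figure is easy to verify. Here's a quick back-of-the-envelope sketch in Python, using only the numbers quoted above:

```python
# Fraction of the genome covered by the conserved elements in
# Pennacchio et al. (2006): 3,124 elements averaging 1,270 bp,
# in a genome of roughly 3.2 billion bp.
n_elements = 3_124
avg_len_bp = 1_270
genome_bp = 3.2e9

conserved_bp = n_elements * avg_len_bp   # ~3.97e6 bp, i.e. ~3.9 x 10^6
fraction = conserved_bp / genome_bp      # ~0.0012, i.e. about 0.1%

print(f"{conserved_bp:,} bp conserved, {fraction:.2%} of the genome")
```

So even if every one of these conserved elements is a functional enhancer, they account for barely a thousandth of the genome, which is why the article's concession that more than 95% could still be junk follows directly from the data.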

Is this all junk DNA? Unlike most other science journalists, Pearson addresses this question with a certain amount of skepticism and she makes an effort to quote conflicting opinions. For example, Pearson mentions experiments claiming that ~90% of the genome is transcribed. Rather than just repeating the hype of the researchers making this claim, Pearson quotes skeptics who argue that this RNA might be just "noise."

Most articles on junk DNA eventually get around to mentioning John Mattick who has been very vocal about his claim that the Central Dogma has been overturned and most of the genome consists of genes that encode regulatory RNAs (Mattick, 2004; Mattick, 2007). This article quotes a skeptic to provide some sense of balance and demonstrate that the scientific community is not overly supportive of Mattick.
Others are less convinced. Ewan Birney of the European Bioinformatics Institute in Cambridge, UK, has bet Mattick that of the processed RNAs yet to be assigned a function - representing 14 per cent of the entire genome - less than 20 per cent will turn out to be useful. "I'll get a case of vintage champagne if I win," Birney says.
Under the subtitle "Mostly Useless," Pearson correctly summarizes the scientific consensus. (I wish she had used this as the title of the article. The actual title is somewhat misleading. Editors?)
Whatever the answer turns out to be, no one is saying that most of our genome is vital after all. "You could chuck three-quarters of it," Birney speculates. "If you put a gun to my head, I'd say 10 per cent has a function, maybe," says Lunter. "It's very unlikely to be higher than 50 per cent."

Most researchers agree that 50 per cent is the top limit because half of our genome consists of endless copies of parasitic DNA or "transposons", which do nothing except copy and paste themselves all over the genome until they are inactivated by random mutations. A handful are still active in our genome and can cause diseases such as breast cancer if they land in or near vital genes.
The ENCODE project made a big splash in the blogosphere last month (ENCODE Project Consortium, 2007). This study purported to show that much of the human genome was transcribed, leading to the suggestion that most of what we think is junk actually has some function. Aria Pearson interviewed Ewan Birney (see above) who is involved in the ENCODE project.

The real surprise is that ENCODE has identified many non-coding sequences in humans that seem to have a function, yet are not conserved in rats and mice. There seem to be just as many of these non-conserved functional sequences as there are conserved ones. One explanation is that these are the crucial sequences that make humans different from mice. However, Birney thinks this is likely to be true of only a tiny proportion of these non-conserved yet functional sequences. Instead, he thinks most are neutral. "They have appeared by chance and neither hinder nor help the organism."

Put another way, just because a certain piece of DNA can do something doesn't mean we really need it to do whatever it does. Such DNA may be very like computer bloatware: functional in one sense yet useless as far as users are concerned.
This is a perspective you don't often see in popular articles about junk DNA and Pearson is to be commended for taking the time and effort to find the right scientific perspective.

The article concludes by reporting the efforts to delete large amounts of mouse DNA in order to test whether they are junk or not. The results show that many of the conserved bits of DNA can be removed without any harmful effects. Some researchers urge caution by pointing out that very small effects may not be observed in laboratory mice but may be important for evolution in the long term.

ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PubMed Abstract]

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Sci. Am. 291:60-7.

Mattick, J.S. (2007) A new paradigm for developmental biology. J. Exp. Biol. 210:1526-47. [PubMed Abstract].

Pennacchio, L.A., Ahituv, N., Moses, A.M., Prabhakar, S., Nobrega, M.A., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K.D., Plajzer-Frick, I., Akiyama, J., De Val, S., Afzal, V., Black, B.L., Couronne, O., Eisen, M.B., Visel, A., Rubin, E.M. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118):499-502.

Friday, August 26, 2022

ENCODE and their current definition of "function"

ENCODE has mostly abandoned its definition of function based on biochemical activity and replaced it with "candidate" function or "likely" function, but the message isn't getting out.

Back in 2012, the ENCODE Consortium announced that 80% of the human genome was functional and junk DNA was dead [What did the ENCODE Consortium say in 2012?]. This claim was widely disputed, causing the ENCODE Consortium leaders to back down in 2014 and restate their goal (Kellis et al. 2014). The new goal is merely to map all the potential functional elements.

... the Encyclopedia of DNA Elements Project [ENCODE] was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types.

The new goal was repeated when the ENCODE III results were published in 2020, although you had to read carefully to recognize that they were no longer claiming to identify functional elements in the genome and they were raising no objections to junk DNA [ENCODE 3: A lesson in obfuscation and opaqueness].

Thursday, September 06, 2012

The ENCODE Data Dump and the Responsibility of Scientists

A few hours ago I criticized science journalists for getting suckered by the hype surrounding the publication of 30 papers from the ENCODE Consortium on the function of the human genome [The ENCODE Data Dump and the Responsibility of Science Journalists].

They got their information from supposedly reputable scientists but that's not an excuse. It is the duty and responsibility of science journalists to be skeptical of what scientists say about their own work. In this particular case, the scientists are saying the same things that were thoroughly criticized in 2007 when the preliminary results were published.

I'm not letting the science journalists off the hook but I reserve my harshest criticism for the scientists, especially Ewan Birney who is the lead analysis coordinator for the project and who has taken on the role as spokesperson for the consortium. Unless other members of the consortium speak out, I'll assume they agree with Ewan Birney. They bear the same responsibility for what has happened.

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think, given such an introduction, that you were about to learn how much of the genome is functional according to ENCODE 3, but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain, the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things since 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work but the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try to find out if ENCODE stands by its previous claim that most of the genome is functional but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.
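The footnote's "less than 1%" claim is easy to sanity-check with a rough calculation. The gene count comes from the ENCODE 3 summary quoted above; the ~1.3 kb average coding sequence (about 430 codons per protein) is an assumed round figure, not a number from the post:

```python
# Rough estimate of the protein-coding fraction of the human genome.
# 20,225 protein-coding genes is ENCODE 3's count (quoted above);
# avg_cds_bp is an assumption (~430 codons x 3 bp), not from the post.
n_genes = 20_225
avg_cds_bp = 1_300
genome_bp = 3.2e9

coding_fraction = n_genes * avg_cds_bp / genome_bp
print(f"coding fraction ~ {coding_fraction:.2%}")   # well under 1%
```

Even doubling the assumed coding-sequence length leaves the coding fraction under 2%, consistent with the footnote's point that protein-coding DNA was never the whole story of genome function.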

Friday, September 07, 2012

More Expert Opinion on Junk DNA from Scientists

The Nature issue containing the latest ENCODE Consortium papers also has a New & Views article called "Genomics: ENCODE explained" (Ecker et al., 2012). Some of these scientist comment on junk DNA.

For example, here's what Joseph Ecker says,
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA'. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA's transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles.
And here's what Inês Barroso says,
The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of 'useless' DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.
If this were an undergraduate course I would ask for a show of hands in response to the question, "How many of you thought that there did not seem to be 'defined gene-regulatory elements' in noncoding DNA?"

I would also ask, "How many of you have no idea how evolution could retain 'useless' DNA in our genome?" Undergraduates who don't understand evolution should not graduate in a biological science program. It's too bad we don't have similar restrictions on senior scientists who write News & Views articles for Nature.

Jonathan Pritchard and Yoav Gilad write,
One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.

There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage.

However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a 'parts list' of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.
The problem here is the hype. While it's true that the ENCODE project has produced massive amounts of data on transcription factor binding sites etc., it's a bit of an exaggeration to say that "until now there has been little information about which genomic regions have regulatory activity." Twenty-five years ago, my lab published some pretty precise information about the parts of the genome regulating activity of a mouse hsp70 gene. There have been thousands of other papers on the subject of gene regulatory sequences since then. I think we actually have a pretty good understanding of gene regulation in eukaryotes. It's a model that seems to work well for most genes.

The real challenge from the ENCODE Consortium is that they question that understanding. They are proposing that huge amounts of the genome are devoted to fine-tuning the expression of most genes in a vast network of binding sites and small RNAs. That's not the picture we have developed over the past four decades. If true, it would not only mean that a lot less DNA is junk but it would also mean that the regulation of gene expression is fundamentally different than it is in E. coli.



[Image Credit: ScienceDaily: In Massive Genome Analysis ENCODE Data Suggests 'Gene' Redefinition.]

Ecker, J.R., Bickmore, W.A., Barroso, I., Pritchard, J.K., Gilad, Y. and Segal, E. (2012) Genomics: ENCODE explained. Nature 489:52-55. [doi: 10.1038/489052a]