
Showing posts sorted by relevance for query encode.

Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2022) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

Friday, November 20, 2015

The truth about ENCODE

A few months ago I highlighted a paper by Casane et al. (2015) where they said ...
In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.
How did we get to this stage where the most publicized result of papers published by leading scientists in the best journals turns out to be wrong, but hardly anyone knows it?

Back in September 2012, the ENCODE Consortium was preparing to publish dozens of papers on their analysis of the human genome. Most of the results were quite boring but that doesn't mean they were useless. The leaders of the Consortium must have been worried that science journalists would not give them the publicity they craved so they came up with a strategy and a publicity campaign to promote their work.

Their leader was Ewan Birney, a scientist with valuable skills as a herder of cats but little experience in evolutionary biology and the history of the junk DNA debate.

The ENCODE Consortium decided to add up all the transcription factor binding sites—spurious or not—and all the chromatin markers—whether or not they meant anything—and all the transcripts—even if they were junk. With a little judicious juggling of numbers they came up with the following summary of their results (Birney et al., 2012) ...
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
See What did the ENCODE Consortium say in 2012? for more details on what the ENCODE Consortium leaders said, and did, when their papers came out.

The bottom line is that these leaders knew exactly what they were doing and why. By saying they had assigned biochemical functions for 80% of the genome, they knew that this would be the headline. They knew that journalists and publicists would interpret this to mean the end of junk DNA. Most of the ENCODE leaders actually believed it.

That's exactly what happened ... aided and abetted by the ENCODE Consortium, the journals Nature and Science, and gullible science journalists all over the world. (Ryan Gregory has published a list of articles that appeared in the popular press: The ENCODE media hype machine.)

Almost immediately, knowledgeable scientists and science writers tried to expose the publicity campaign hype. The first criticisms appeared on various science blogs, followed by a series of papers in the published scientific literature. Ed Yong, an experienced science journalist, interviewed Ewan Birney and blogged about ENCODE on the first day. Yong reported the standard publicity hype that most of our genome is functional, an interpretation confirmed for him by Ewan Birney and other senior scientists. Two days later, Ed Yong started adding updates to his blog posting after reading the blogs of many scientists, including some who were well-recognized experts on genomes and evolution [ENCODE: the rough guide to the human genome].

Within a few days of publishing their results, the ENCODE Consortium was coming under intense criticism from all sides. A few journalists, like John Timmer, recognized right away what the problem was ...
Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.


[Most of what you read was wrong: how press releases rewrote scientific history]
Nature may have begun to realize that it made a mistake in promoting the idea that most of our genome was functional. Two days after the papers appeared, Brendan Maher, a Feature Editor for Nature, tried to get the journal off the hook but only succeeded in making matters worse [see Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco].

Meanwhile, two private for-profit companies, Illumina and Nature, teamed up to promote the ENCODE results. They even hired Tim Minchin to narrate it. This is what hype looks like ...


Soon articles began to appear in the scientific literature challenging the ENCODE Consortium's interpretation of function and explaining the difference between an effect—such as the binding of a transcription factor to a random piece of DNA—and a true biological function.

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC Biology, 11:58. [doi: 10.1186/1741-7007-11-58]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in Biology and Medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

By March 2013—six months after publication of the ENCODE papers—some editors at Nature decided that they had better say something else [see Anonymous Nature Editors Respond to ENCODE Criticism]. Here's the closest thing to an apology that they have ever written ...
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”.
Oops! The importance of junk DNA is still an "important, open and debatable question" in spite of what the video sponsored by Nature might imply.

(To this day, neither Nature nor Science has actually apologized for misleading the public about the ENCODE results. [see Science still doesn't get it])

The ENCODE Consortium leaders responded in April 2014—eighteen months after their original papers were published.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

In that paper they acknowledge that there are multiple meanings of the word function and that "biochemical" function may not have been the best choice ...
However, biochemical signatures are often a consequence of function, rather than causal. They are also not always deterministic evidence of function, but can occur stochastically.
This is exactly what many scientists have been telling them. Apparently they did not know this in September 2012.

They also include in their paper a section on "Case for Abundant Junk DNA." It summarizes the evidence for junk DNA, evidence that the ENCODE Consortium did not acknowledge in 2012 and certainly didn't refute.

In answer to the question, "What Fraction of the Human Genome Is Functional?" they now conclude that ENCODE hasn't answered that question and more work is needed. They now claim that the real value of ENCODE is to provide "high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions."
We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.
There you have it, straight from the horse's mouth. The ENCODE Consortium now believes that you should NOT interpret their results to mean that 80% of the genome is functional and therefore not junk DNA. There is good evidence for abundant junk DNA and the issue is still debatable.

I hope everyone pays attention and stops referring to the promotional hype saying that ENCODE has refuted junk DNA. That's not what the ENCODE Consortium leaders now say about their results.


Casane, D., Fumey, J., and Laurenti, P. (2015) L'apophénie d'ENCODE ou Pangloss examine le génome humain [ENCODE's apophenia, or Pangloss examines the human genome]. Med. Sci. (Paris) 31: 680-686. [doi: 10.1051/medsci/20153106023]

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012?]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think from such an introduction that you were about to learn how much of the genome is functional according to ENCODE 3, but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain, the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things from 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism, and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work, but the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try and find out if ENCODE stands by its previous claim that most of the genome is functional, but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.
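For perspective, here's the back-of-the-envelope arithmetic behind that "less than 1%" figure, a minimal sketch using round assumed numbers (about 20,000 protein-coding genes averaging roughly 1.3 kb of coding sequence, in a 3.1 Gb genome), not figures from any ENCODE paper:

    # Rough estimate of the protein-coding fraction of the human genome.
    # Round, illustrative numbers; exact values vary by annotation.
    coding_genes = 20_000         # approximate protein-coding gene count
    avg_cds_bp = 1_350            # ~450 codons of coding sequence per gene
    genome_bp = 3_100_000_000     # haploid human genome size in base pairs

    coding_fraction = coding_genes * avg_cds_bp / genome_bp
    print(f"coding sequence: {coding_fraction:.1%} of the genome")  # ~0.9%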

Monday, September 05, 2022

The 10th anniversary of the ENCODE publicity campaign fiasco

On Sept. 5, 2012 ENCODE researchers, in collaboration with the science journal Nature, launched a massive publicity campaign to convince the world that junk DNA was dead. We are still dealing with the fallout from that disaster.

The Encyclopedia of DNA Elements (ENCODE) was originally set up to discover all of the functional elements in the human genome. They carried out a massive number of experiments involving a huge group of researchers from many different countries. The results of this work were published in a series of papers in the September 6th, 2012 issue of Nature. (The papers appeared on Sept. 5th.)

Tuesday, June 25, 2013

"Reasons to Believe" in ENCODE

Fazale "Fuz" Rana is a biochemist at Reasons to Believe". He and his colleagues are Christian apologists who try to make their faith compatible with science. Fuz was very excited about the ENCODE results when they were first published [One of the Most Significant Days in the History of Biochemistry]. That's because Christians of his ilk were very unhappy about junk DNA and the ENCODE Consortium showed that all of our genome is functional.1

Fuz is aware of the fact that some people are skeptical about the ENCODE results. He wrote a series of posts defending ENCODE.
  1. Do ENCODE Skeptics Protest Too Much? Part 1 (of 3)
  2. Do ENCODE Skeptics Protest Too Much? Part 2 (of 3)
  3. Do ENCODE Skeptics Protest Too Much? Part 3 (of 3)
The first post is merely a list of the objections many of us raised.

Thursday, March 14, 2013

Anonymous Nature Editors Respond to ENCODE Criticism

There have now been four papers in the scientific literature criticizing the way ENCODE leaders hyped their data by claiming that most of our genome is functional [see Ford Doolittle's Critique of ENCODE]. There have been dozens of blog postings on the same topic.

The worst of the papers were published by Nature—this includes the abominable summary that should never have made it past peer review (Encode Consortium, 2012).

The lead editor on the ENCODE story was Brendan Maher and he promoted the idea that the ENCODE results showed that most of our genome has a function [ENCODE: The human encyclopaedia]
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.

Friday, March 21, 2014

Science still doesn't get it

The latest issue of Science contains an article by Yudhijit Bhattacharjee about Dan Graur and his critique of the ENCODE publicity disaster of September 2012. The focus of the article is on whether Dan's tone is appropriate when discussing science.

Let me remind you what Science published back on September 7, 2012. Elizabeth Pennisi announced that ENCODE had written the eulogy for junk DNA. She quoted one of the leading researchers ...

Friday, May 09, 2014

How does Nature deal with the ENCODE publicity hype that it created?

Let's briefly review what happened in September 2012 when the ENCODE Consortium published their results (mostly in Nature).

Here's the abstract of the original paper published in Nature in September 2012 (Birney et al. 2012). Manolis Kellis (see below) is listed as a principal investigator and member of the steering committee.
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Most people reading this picked up on the idea that 80% of the genome had a function.

Monday, April 06, 2020

The Function Wars Part VII: Function monism vs function pluralism

This post is mostly about a recent paper published in Studies in History and Philosophy of Biological and Biomedical Sciences where two philosophers present their view of the function wars. They argue that the best definition of function is a weak etiological account (monism) and that pluralistic accounts that include causal role (CR) definitions are mostly invalid. Weak etiological monism is the idea that sequence conservation is the best indication of function but that it doesn't necessarily imply that the trait arose by natural selection (adaptation); it could have arisen by neutral processes such as constructive neutral evolution.

The paper makes several dubious claims about ENCODE that I want to discuss but first we need a little background.

Background

The ENCODE publicity campaign created a lot of controversy in 2012 because ENCODE researchers claimed that 80% of the human genome is functional. That claim conflicted with all the evidence that had accumulated up to that point in time. Based on their definition of function, the leading ENCODE researchers announced the death of junk DNA and this position was adopted by leading science writers and leading journals such as Nature and Science.

Let's be very clear about one thing. This was a SCIENTIFIC conflict over how to interpret data and evidence. The ENCODE researchers simply ignored a ton of evidence demonstrating that most of our genome is junk. Instead, they focused on the well-known facts that much of the genome is transcribed and that the genome is full of transcription factor binding sites. Neither of these facts was new, and both of them had simple explanations: (1) most of the transcripts are spurious transcripts that have nothing to do with function, and (2) random non-functional transcription factor binding sites are expected from our knowledge of DNA binding proteins, as the rough calculation below shows. The ENCODE researchers ignored these explanations and attributed function to all transcripts and all transcription factor binding sites. That's why they announced that 80% of the genome is functional.
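Here's that calculation, a minimal sketch with assumed numbers (an 8 bp recognition motif, equal base frequencies, a 3.2 Gb genome); real motifs and base compositions vary, but the order of magnitude is the point:

    # Expected number of chance matches to a short transcription factor
    # motif in a large genome. Illustrative assumptions: 8 bp motif,
    # equal base frequencies, 3.2 Gb genome scanned on both strands.
    motif_len = 8
    genome_bp = 3_200_000_000
    p_match = 0.25 ** motif_len        # chance a random position matches

    expected_sites = 2 * genome_bp * p_match   # factor of 2: both strands
    print(f"expected chance matches: {expected_sites:,.0f}")   # roughly 98,000

Tens of thousands of chance matches for a single factor means that detecting binding, by itself, is no evidence of biological function.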

Friday, August 26, 2022

ENCODE and their current definition of "function"

ENCODE has mostly abandoned its definition of function based on biochemical activity and replaced it with "candidate" function or "likely" function, but the message isn't getting out.

Back in 2012, the ENCODE Consortium announced that 80% of the human genome was functional and junk DNA was dead [What did the ENCODE Consortium say in 2012?]. This claim was widely disputed, causing the ENCODE Consortium leaders to back down in 2014 and restate their goal (Kellis et al. 2014). The new goal is merely to map all the potential functional elements.

... the Encyclopedia of DNA Elements Project [ENCODE] was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types.

The new goal was repeated when the ENCODE III results were published in 2020, although you had to read carefully to recognize that they were no longer claiming to identify functional elements in the genome and they were raising no objections to junk DNA [ENCODE 3: A lesson in obfuscation and opaqueness].

Wednesday, May 14, 2014

What did the ENCODE Consortium say in 2012?

When the ENCODE Consortium published their results in September 2012, the popular press immediately seized upon the idea that most of our genome was functional and the concept of junk DNA was debunked. The "media" in this case includes writers at prestigious journals like Science and Nature and well-known science writers in other respected publications and blogs.

In most cases, those articles contained interviews with ENCODE leaders and direct quotes about the presence of large amounts of functional DNA in the human genome.

The second wave of the ENCODE publicity campaign is trying to claim that this was all a misunderstanding. According to this revisionist view of recent history, the actual ENCODE papers never said that most of our genome had to be functional and never implied that junk DNA was dead. It was the media that misinterpreted the papers. Don't blame the scientists.

You can see an example of this version of history in the comments to How does Nature deal with the ENCODE publicity hype that it created?, where some people are arguing that the ENCODE summary paper has been misrepresented.

Saturday, February 11, 2017

What did ENCODE researchers say on Reddit?

ENCODE researchers answered a bunch of questions on Reddit a few days ago. I asked them to give their opinion on how much junk DNA is in our genome but they declined to answer that question. However, I think we can get some idea about the current thinking in the leading labs by looking at the questions they did choose to answer. I don't think the picture is very encouraging. It's been almost five years since the ENCODE publicity disaster of September 2012. You'd think the researchers might have learned a thing or two about junk DNA since that fiasco.

The question and answer session on Reddit was prompted by the award of a new grant to ENCODE. They just received 31.5 million dollars to continue their search for functional regions in the human genome. You might have guessed that Dan Graur would have a few words to say about giving ENCODE even more money [Proof that 100% of the Human Genome is Functional & that It Was Created by a Very Intelligent Designer @ENCODE_NIH].

Sunday, September 04, 2022

Wikipedia: the ENCODE article

The ENCODE article on Wikipedia is a pretty good example of how to write a science article. Unfortunately, there are a few issues that will be very difficult to fix.

When Wikipedia was formed twenty years ago, there were many people who were skeptical about the concept of a free crowdsourced encyclopedia. Most people understood that a reliable source of information was needed for the internet because the traditional encyclopedias were too expensive, but could it be done by relying on volunteers to write articles that could be trusted?

The answer is mostly “yes,” although that comes with some qualifications. Many science articles are not good; they contain inaccurate and misleading information and often don’t represent the scientific consensus. They also tend to be disjointed and unreadable. On the other hand, many non-science articles are at least as good as, and often better than, anything in the traditional encyclopedias (e.g., Battle of Waterloo; Toronto, Ontario; The Beach Boys).

By 2008, Wikipedia had expanded enormously and the quality of articles was being compared favorably to those of Encyclopedia Britannica, which had been forced to go online to compete. However, this comparison is a bit unfair since it downplays science articles.

Thursday, December 15, 2016

Nature opposes misinformation (pot, kettle, black)

The lead editorial in last week's issue of Nature (Dec. 8, 2016) urges us to Take the time and effort to correct misinformation. The author (Phil Williamson) is a scientist whose major research interest is climate change and the issue he's addressing is climate change denial. That's a clear example of misinformation but there are other, more subtle, examples that also need attention. I like what he says in the opening paragraphs,

Most researchers who have tried to engage online with ill-informed journalists or pseudoscientists will be familiar with Brandolini’s law (also known as the Bullshit Asymmetry Principle): the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. Is it really worth taking the time and effort to challenge, correct and clarify articles that claim to be about science but in most cases seem to represent a political ideology?

I think it is. Challenging falsehoods and misrepresentation may not seem to have any immediate effect, but someone, somewhere, will hear or read our response. The target is not the peddler of nonsense, but those readers who have an open mind on scientific problems. A lie may be able to travel around the world before the truth has its shoes on, but an unchallenged untruth will never stop.
I've had a bit of experience trying to engage journalists who appear to be ill-informed. I've had little success in convincing them that their reporting leaves a lot to be desired.

I agree with Phil Williamson that challenging falsehoods and misrepresentation is absolutely necessary even if it has no immediate effect. Recently I posted a piece on the misrepresentations of the ENCODE results in 2007 and pointed a finger at Nature and their editors [The ENCODE publicity campaign of 2007]. They are responsible because they did not ensure that the main paper (Birney et al., 2007) was subjected to appropriate peer review. They are responsible because they promoted misrepresentations in their News article and they are responsible because they published a rather silly News & Views article that did little to correct the misrepresentations.

That was nine years ago. Nature never admitted they were partly to blame for misrepresenting the function of the human genome.

Wednesday, December 14, 2016

The ENCODE publicity campaign of 2007

ENCODE1 published the results of a pilot project in 2007 (Birney et al., 2007). They looked at 1% (30Mb) of the genome with a view to establishing their techniques and dealing with large amounts of data from many different groups. The goal was to "provide a more biologically informative representation of the human genome by using high-throughput methods to identify and catalogue the functional elements encoded."

The most striking result of this preliminary study was the confirmation of pervasive transcription. Here's what the ENCODE Consortium leaders said in the abstract,
Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap with one another.
ENCODE concluded that 93% of the genome is transcribed in one tissue or another. There are two possible explanations that account for pervasive transcription: either most of those transcripts have a biological function, or most of them are spurious transcripts produced by accidental transcription.

Friday, March 15, 2013

On the Meaning of the Word "Function"

A lot of the debate over ENCODE's publicity campaign concerns the meaning of the word "function." In the summary article published in Nature last September the authors said, "These data enabled us to assign biochemical functions for 80% of the genome ...." (The ENCODE Project Consortium, 2012).

Here's how they describe function.
Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).
What, exactly, do the ENCODE scientists mean? Do they think that junk DNA might contain "functional elements"? If so, that doesn't make a lot of sense, does it?

Ewan Birney tried to address this definitional morass on his blog [ENCODE: My own thoughts] where he says ...
It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.
That's about as clear as mud.

We all know what the problem is. It's whether all binding sites have a biological function or whether many of them are just noise arising as a property of DNA binding proteins. It's whether all transcripts have a biological function or whether many of those detected by ENCODE are just spurious transcripts or junk RNA. These questions were debated extensively when the ENCODE pilot project was published in 2007. Every ENCODE scientist should know about this problem so you might expect that they would take steps to distinguish between real biological function and nonfunctional noise.

Their definition of "function" is not helpful. In fact, it seems deliberately designed to obfuscate.

Let's see how other scientists interpret the ENCODE results. In a News & Views article published in Nature last September, Joseph R. Ecker (Salk Institute scientist) said ...
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical function, dispatching the widely held view that the human genome is mostly 'junk DNA.'
That makes at least one genomics worker who thinks that "biochemical function" and junk DNA are mutually exclusive.

Recently a representative of GENCODE responded to Dan Graur's criticism [On the annotation of functionality in GENCODE (or: our continuing efforts to understand how a television set works)]. This person (JM) says ...
Q1: Does GENCODE believe that 80% of the genome is functional?

As noted, we will only discuss here the portion of the genome that is transcribed. According to the main ENCODE paper, while 80% of the genome appears to have some biological activity, only “62% of genomic bases are reproducibly represented in sequenced long (>200 nucleotides) RNA molecules or GENCODE exons”. In fact, only 5.5% of this transcription overlaps with GENCODE exons. So we have two things here: existing GENCODE models largely based on mRNA / EST evidence, and novel transcripts inferred from RNAseq data. The suggestion, then, is that there is extensive transcription occurring outside of currently annotated GENCODE exons.
There's another scientist who thinks that 80% of the genome has some biological activity in spite of the fact that the ENCODE paper says it has "biochemical function." I don't think "biological activity" is compatible with "junk DNA," but who knows what they think?

Since this person is part of the ENCODE team, we can assume that at least some of the scientists on the team are confused.

The Sanger Institute (Cambridge, UK) was an important player in the ENCODE Consortium. It put out a press release on the day the papers were published [Google Earth of Biomedical Research]. The opening paragraph is ...
The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.
It looks like the Sanger Institute equates "biochemical function" and "biological function" and it looks like neither one is compatible with junk DNA.

I think the ENCODE leaders, including Ewan Birney, knew exactly what they were doing when they defined function. They meant "biological function" even though they equivocated by saying "biochemical function." And they meant for this to be interpreted as "not junk" even though they are attempting to backtrack in the face of criticism.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)

Thursday, September 20, 2012

Are All IDiots Irony Deficient?

As I'm sure you can imagine, the Intelligent Design Creationists are delighted with the ENCODE publicity. This is a case where some expert scientists support one of their pet beliefs; namely, that there's no such thing as junk DNA. The IDiots tend not to talk about other expert evolutionary biologists who disagree with them—those experts are biased Darwinists or are part of a vast conspiracy to mislead the public.

You might think that distinguishing between these two types of expert scientists would be a real challenge and you would be right. Let's watch how David Klinghoffer manoeuvres through this logical minefield at: ENCODE Results Separate Science Advocates from Propagandists. He begins with ...
"I must say," observes an email correspondent of ours, who is also a biologist, "I'm getting a kick out of watching evolutionary biologists attack molecular biologists for 'hyping' the ENCODE results."

True, and equally enjoyable -- in the sense of confirming something you strongly suspected already -- is seeing the way the ENCODE news has drawn a bright line between voices in the science world that care about science and those that are more focussed on the politics of science, even as they profess otherwise.

Monday, February 05, 2018

ENCODE's false claims about the number of regulatory sites per gene

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

I realize that most of you are tired of seeing criticisms of ENCODE, but it's important to realize that most scientists fell hook, line, and sinker for the ENCODE publicity campaign and they still don't know that most of the claims were ridiculous.

I was reminded of this when I re-read Brendan Maher's summary of the ENCODE results that were published in Nature on Sept. 6, 2012 (Maher, 2012). Maher's article appeared in the front section of the ENCODE issue.1 With respect to regulatory sequences he said ...
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes ... But the job is far from done, says [Ewan] Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished.
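It's worth doing the arithmetic implied by those figures. A minimal sketch; the ~20,000 protein-coding gene count below is a conventional round estimate, not a number from Maher's article:

    # Regulatory regions per gene implied by the quoted figures.
    # The gene count is a round conventional estimate, not an ENCODE number.
    promoters = 70_000
    enhancers = 400_000
    coding_genes = 20_000

    per_gene = (promoters + enhancers) / coding_genes
    print(f"~{per_gene:.1f} regulatory regions per protein-coding gene")  # ~23.5

More than three 'promoter' regions and some twenty enhancers for every gene is exactly the scale of claim that this post disputes.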

Sunday, September 09, 2012

Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco

Brendan Maher is a Feature Editor for Nature. He wrote a lengthy article for Nature when the ENCODE data was published on Sept. 5, 2012 [ENCODE: The human encyclopaedia]. Here's part of what he said,
After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. Now that phase has come to a close, signalled by the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.
I expect encyclopedias to be much more accurate than this.

As most people know by now, there are many of us who challenge the implication that 80% of the genome has a function (i.e., it's not junk).1 We think the Consortium was not being very scientific by publicizing such a ridiculous claim.

The main point of Maher's article was that the ENCODE results reveal a huge network of regulatory elements controlling expression of the known genes. This is the same point made by the ENCODE researchers themselves. Here's how Brendan Maher expressed it.

The real fun starts when the various data sets are layered together. Experiments looking at histone modifications, for example, reveal patterns that correspond with the borders of the DNaseI-sensitive sites. Then researchers can add data showing exactly which transcription factors bind where, and when. The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being.
I think that much of this hype comes from a problem I've called The Deflated Ego Problem. It arises because many scientists were disappointed to discover that humans have about the same number of genes as many other species yet we are "obviously" much more complex than a mouse or a pine tree. There are many ways of solving this "problem." One of them is to postulate that humans have a much more sophisticated network of control elements in our genome. Of course, this ignores the fact that the genomes of mice and trees are not smaller than ours.

Wednesday, March 13, 2013

Ford Doolittle's Critique of ENCODE

Ford Doolittle has never been one to shy away from controversy so it's not surprising that he weighs in against the misleading publicity campaign launched by ENCODE leaders last September (Doolittle, 2013). Recall that Ewan Birney and other prominent members of the consortium promoted the idea that our genome contained an extensive array of regulatory elements and that 80% of our genome was functional [Ewan Birney: Genomics' Big Talker] [ENCODE Leader Says that 80% of Our Genome Is Functional] [The ENCODE Data Dump and the Responsibility of Scientists].

This is the fourth paper that's critical of the ENCODE hype. The first was Sean Eddy's paper in Current Biology (Eddy, 2012). The second was a paper by Niu and Jiang (2012), and the third was a paper by Graur et al. (2013). In my experience this is unusual since the critiques are all directed at how the ENCODE Consortium interpreted their data and how they misled the scientific community (and the general public) by exaggerating their results. Those kinds of criticisms are common in journal clubs and, certainly, in the blogosphere, but scientific journals generally don't publish them. It's okay to refute the data (as in the arsenic affair) but ideas usually get a free pass no matter how stupid they are.

In this case, the ENCODE Consortium did such a bad job of describing their data that journals had to pay attention. (It helps that much of the criticism is directed at Nature and Science because the other journals want to take down the leaders!)