Thursday, March 14, 2013

Anonymous Nature Editors Respond to ENCODE Criticism

There have now been four papers in the scientific literature criticizing the way ENCODE leaders hyped their data by claiming that most of our genome is functional [see Ford Doolittle's Critique of ENCODE]. There have been dozens of blog postings on the same topic.

The worst of the ENCODE papers were published by Nature, including the abominable summary that should never have made it past peer review (ENCODE Consortium, 2012).

The lead editor on the ENCODE story was Brendan Maher and he promoted the idea that the ENCODE results showed that most of our genome has a function [ENCODE: The human encyclopaedia]:
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.
But the very next day (Sept. 6, 2012) Brendan Maher got wind of the controversy and started to defend Nature's decisions. He quoted several bloggers, including me [Fighting about ENCODE and junk]. His main defense was ...
ENCODE was conceived of and practised as a resource-building exercise. In general, such projects have a huge potential impact on the scientific community, but they don’t get much attention in the media. The journal editors and authors at ENCODE collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large.
In other words, the editors of Nature thought about this for several months and then decided that it was okay to attack junk DNA because that would make a big splash in the media.

A few days ago (March 12, 2013) the editors of Nature published another response to criticism [Form and Function]. These editors don't identify themselves.

Let's see how they do by analyzing each part of the editorial. Let's begin with the subtitle ...
Although debate over scientific definitions is important, it risks obscuring the real issues.
The real issues are whether most of our genome is functional or not and whether the ENCODE leaders understood the concept of noise and chance associations. I hope that Nature realizes that it really screwed up by allowing stupid definitions of function to obscure those issues, giving rise to the idea that junk DNA was debunked. Let's see if they understand where they went wrong.
Science is at the mercy of its language. It can be difficult for researchers to communicate what most excites them about the beauty, intricacy and complexity of the natural world. And when words fail, debates and arguments often arise.

One enduring debate has been resurrected by ENCODE, the Encyclopedia of DNA Elements — an ongoing multimillion-dollar project to catalogue the functional elements of the human genome. A headline-grabbing claim, first made in this publication last September, was that roughly 80% of human DNA had been ascribed some “biochemical function” thanks to the efforts of more than 440 scientists (The ENCODE Project Consortium Nature 489, 57–74; 2012).

That percentage is remarkably high, in part because of a broad definition of ‘function’. The ENCODE team used the term to include binding by a regulatory protein, or transcription into RNA — activities identified as widespread. But almost immediately, other scientists began to take this definition to task, calling it essentially meaningless.
They got that part right. The immediate reaction to the Nature papers is that the journal made a big mistake by using a silly definition of "function"—one that was bound to be misinterpreted by everyone. Many of us thought (and still think) that the authors actually believed that most of the genome is functional in the classic sense. In other words, it's not at all clear that there's a difference between the ENCODE definition of function and the definition used by everyone else, at least in the minds of the ENCODE leaders.
Some background is useful. Genomes vary dramatically in size — sometimes irrespective of the complexity of the organism. Take, for example, the genome of the marbled lungfish (Protopterus aethiopicus), which clocks in at an excessive 133 billion base pairs. That of the pufferfish (Takifugu rubripes), by contrast, sports only 365 million.

For the ENCODE paper to suggest that humans have little genomic redundancy implies that the 3.2-billion-base-pair human genome hits a sweet spot in efficiency. Critics suggested, sometimes sharply, that this was both anthropocentric and ignorant of how evolution shapes the genome. Much of human DNA is non-functional, they insisted. It is a relic of history, garbled by mutation and essentially junk.

The most recent formal critique, published this week, suggests that similar analyses on organisms with very large and very small genomes would probably find the same density of functional elements (W. F. Doolittle Proc. Natl Acad. Sci. USA http://doi.org/kr3; 2013). This investigation has yet to be done.
This is Ford Doolittle's critique but there are many others. Clearly, it's important to bring up this issue (variations in C-value) when discussing the possibility of junk DNA. I don't recall that this discussion took place in the original Nature papers or editorial comments. In fact, I don't recall any substantive discussion of junk or the possibility of "non-function" in any of the papers I read. Can anyone else find a reference?
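To get a feel for the scale of the C-value variation mentioned in the editorial, here is a minimal sketch that compares the three genome sizes quoted there. The numbers are the round figures from the editorial text, not independent measurements, and the dictionary and function names are made up for illustration.

```python
# Genome sizes in base pairs, using the round figures quoted in the editorial.
GENOME_SIZES_BP = {
    "marbled lungfish": 133_000_000_000,
    "human": 3_200_000_000,
    "pufferfish": 365_000_000,
}


def fold_difference(larger: str, smaller: str) -> float:
    """Return how many times bigger one quoted genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


if __name__ == "__main__":
    # List genomes from largest to smallest, scaled against the human genome.
    for name, size in sorted(
        GENOME_SIZES_BP.items(), key=lambda item: item[1], reverse=True
    ):
        ratio = size / GENOME_SIZES_BP["human"]
        print(f"{name:>17}: {size:>15,} bp ({ratio:.2f}x human)")
```

On these figures the lungfish genome is roughly forty times the size of ours, and ours is several times the size of the pufferfish genome, which is the disparity any claim about pervasive function has to explain.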
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it” (see go.nature.com/8xorge).
Excellent! I'm glad to see that the editors are admitting some responsibility even though they are shifting most of the blame to "big talker" Ewan Birney and not to their reviewers (or themselves). On the other hand, to claim that junk DNA is still an "open and debatable question" seems like a bit of a cop-out. Yes, it's "debatable" but the proponents of junk DNA will probably win any debates. Rumors of the death of junk DNA are not "premature" and they are not "old news." Most of our genome is junk whether the ENCODE leaders believe it or not. It's a fact even if the editors of Nature are skeptical.

Any knowledgeable reviewer would have said the same thing. They would have pointed out that the discussions of function have to include all of the data suggesting that most of these sites are nothing but nonfunctional noise. After all, we went through this same debate in 2007 when the preliminary ENCODE data was published. Ignoring this possibility is not good science. Good scientists think of ways their data could be falsified and they give appropriate credit to other interpretations that disagree with their own. It looks like the ENCODE scientists learned nothing in 2007 [see The ENCODE Data Dump and the Responsibility of Science Journalists for a discussion of what happened in 2007].

We didn't see very much of that kind of good science in Nature last September and I'm still not seeing much of it here.
The ferocity of the criticism has no doubt been fuelled by dissatisfaction over ENCODE’s top-down, big-science approach and the large share of research funds that it has attracted. Many biologists have called the 80% figure more a publicity stunt than a statement of scientific fact. Nevertheless, ENCODE leaders say, the data resources that they have provided have been immensely popular. So far, papers that use the data have outnumbered those that take aim at the definition of function.
If you read Dan Graur's critique you'll see that the data resources are difficult to use and that they are contaminated by an emphasis on function. I think most biologists would be happy if the huge amount of money spent on the project really did yield useful databases. That's by no means certain.

And just because a lot of people might be using the data is no excuse for the publicity hype that misled thousands of scientists and all of the general public. It will take a long time to undo the damage caused by Nature (and Science) and the ENCODE leaders. Most people now believe that our genome is packed with important regulatory features and that junk DNA no longer exists. I would be more impressed with this editorial if they made it clear that such conclusions were not supported by the ENCODE data.
The debate sounds like a matter of definitional differences. But to dismiss it as semantics minimizes the importance of words and definitions, and of how they are used to engage in research and to communicate findings. ENCODE continues to collect data and to characterize what the 3.2 billion base pairs might be doing in our genome and whether that activity is important. If a better word than ‘function’ is needed to describe those activities, so be it. Suggestions on a postcard please.
The editors are correct. This isn't just about semantics. Do they really need help in defining the abundant binding sites and transcripts that ENCODE found? If so, then clearly they haven't learned their lesson.

As Martin Hafner says in the second comment on the Nature website ...
'The English word to describe those activities you mention in your last paragraph is noise.'
Funny that the editors never discuss this possibility, isn't it?

[Photo Credit: The photograph of Brendan Maher was taken by Jason Varney.]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)

58 comments:

  1. It sounds like they're basically doubling down on it, they're saying, "Oy, lots of great debates can be had over semantics, and it can be important. By the way, 80% of the genome is functional, this has generated lots of papers, and only a few disagree, let us know when you turn those thought-experiments into real experiments".

    1. Ironic, given the megabase deletion mouse was published 9 years ago in Nature.

      http://www.nature.com/nature/journal/v431/n7011/full/nature03022.html

    2. If I recall, the megabase deletion included some ultraconserved elements, which I find a bit disturbing. Am I recalling correctly? And if so, would anyone like to suggest a reason other than selection for such strong sequence conservation? What that shows (again, if I'm remembering this right) is that the deletion doesn't do an adequate job of showing non-functionality. I'd say evolutionary conservation trumps current observation.

    3. Together, the two deleted segments harbour 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Some of the deleted sequences might encode for functions unidentified in our screen; nonetheless, these studies further support the existence of potentially ‘disposable DNA’ in the genomes of mammals.

      From the abstract, no mention of ultraconserved elements being deleted, which is as far as I can get before hitting a paywall.

      If I understand the "conserved between humans and rodents" part correctly, these are noncoding sequences from the human/mouse common ancestor which are still 70% identical.

    4. Well, now, is 70% identity inconsistent with neutral evolution from the common ancestor? If those were birds, I'd say it might be, given that these sequences are selected from around 10 times as much sequence. That's why the ultraconserved elements are interesting (if present).

    5. The truth is that while this is not arguing over semantics, debunking junk DNA was never the goal of the ENCODE consortium, the goal was to find "functional elements" (which, unfortunately, has turned out not to be nearly as straightforward as initially hoped for).

      It's only the main paper and the editorials that really feature those statements (and if you know what is meant by "function" there, they would not sound nearly as alarming), the companion papers are mostly focused and often very useful functional genomic studies that do not touch on this subject.

    6. @ John Harshman,

      The text description of the region from the paper is,

      "We selected two regions for deletion, a 1,817 kilobase (kb) gene desert mapping to mouse chromosome 3, and a second region, 983 kb in length, mapping to mouse chromosome 19 (Fig. 1). Orthologous gene deserts of about the same size are present on human chromosomes 1p31 and 10q23, respectively. No striking sequence signatures such as repeat content or nucleotide composition distinguish these two selected gene deserts from other regions of the genome, except for their lack of annotated genes and lack of evidence of transcription (see Supplementary Information). Together, the two selected regions contain 1,243 human–mouse conserved non-coding elements (more than 100 base pairs (bp), 70% identity), also similar to genome averages, whereas no ultra-conserved elements9 or sequences conserved to fish (more than 100 bp, 70% identity) are present."

    7. I stand corrected. There are no ultraconserved elements. That leaves us to wonder whether the existence of 1243 "conserved" elements in these 3MB is inconsistent with neutral evolution. Maybe that's just the expectation of site similarity in the absence of too many overlapping indels to confuse alignment, and these are the few sequences that haven't accumulated such indels. In other words, perhaps "conserved" is the wrong word here. I throw that out as a hypothesis. Does the paper address that hypothesis?

    8. It was more of a molecular biology paper than a molecular evolution paper. So no, they did not directly test relative to neutral expectation. They just described how "typical" it was. The exact genome co-ordinates were;

      ch19:35033162-35878431
      ch3:151157238-152668920

      But that was for the October 2003 assembly build in UCSC. This is no longer available, so I do not know if the co-ordinates are now accurate.

    9. Re: Ultra Conserved Elements. Please look here for more information. http://ultraconserved.org/

      One of the papers listed on that website has a breakdown of where UCEs occur in the human genome, exons being in the minority. I think the short answer is that we don't really know much about UCEs at this point. They're getting noticed in phylogenetics though. We've recently been working with a set of UCEs that will (hopefully) be informative across tetrapods.

  2. On this topic I tried to add a comment, in polite style, to the Nature "Form and Function" editorial, but was told that: "This account has been banned from commenting due to posting of comments classified as inappropriate or other violations of our Terms of Service." This is most mysterious. I rechecked the "Terms of Service," but the mystery remained. Perhaps someone has cracked my account and is submitting in my name. However, for anyone who might be interested, a fellow Canadian, Professor Gregory, has kindly allowed a posting of a version of the comment at his site.

    1. I am happy to say that, thanks to efforts on my behalf by Brendan Maher, the ban has been lifted, and my comment has been restored to Nature's March 12th editorial on "Form and Function." It seems I was a little too "tech-savvy" in my insertion of links to outside sources.

  3. Dear Larry,
    I'm not sure why you want my grinning mug on your posts, but if you must, could you please credit the photographer, Jason Varney http://www.varneyphoto.com/. He's a wonderful guy from Philly, who gave me permission to use it some time ago. I've tried to credit him in platforms that allow it, but perhaps it got lost. It wouldn't be a bad idea to ask him for permission, either. He's a nice guy and will likely say, yes.

    I know Nature's editorial structure is opaque, but I was not the lead editor on the ENCODE story. I work on the news team, and although there is an obvious relationship with the back half editors, we are in practice separated by literal and figurative walls to allow (and even encourage) us on the news team to be tough and dispassionate about the research in our journal and others. I can only do my best to be as objective as possible. I'll continue to follow your blog as I continue to report on whether this behemoth project has been worth the time and investment.

    Also, I noticed a point mutation has crept into my name. It's Brendan with an 'a'.
    Cheers!

    1. @Brendan Maher

      Thank-you for correcting my spelling and for letting me know that your photograph was taken by a professional named Jason Varney.

      It's true that Nature's editorial structure is "opaque." That wasn't true when John Maddox was editor. I spent 10 minutes on the Nature and NPG websites trying to find the names of editors. I couldn't even find YOUR name!

      Nature is under attack for irresponsible journalism. You were the one who responded to the initial criticism of ENCODE back in September but the editors have been silent since then; correct me if I'm wrong.

      If you weren't involved in writing the recent editorial then who was? And why did they choose to be anonymous? Do they hold senior positions in the Nature hierarchy? Were they involved in the decisions to publish the ENCODE papers or are they covering up for underlings who made mistakes?

      Why didn't they tell us what kind of review the papers were subjected to before publication? After all, as you yourself said, the discussions went on for months.

      What is your personal opinion on the controversy now that you've had a chance to see both sides of the issue? Do you think that most of our genome is junk or do you think that most of it consists of thousands of biologically functional elements that control gene expression?

      If you had it to do over again would you still write what you wrote last September? Let me remind you what you said ...

      The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.

      There's nothing in your original article of Sept. 5th that even hints at a controversy or how the ENCODE Consortium deals with the problem of noisy binding and spurious transcripts. Why?

      You mention the 2007 pilot project in your review but were you aware of the controversy that followed publication of those results? Do you know why the ENCODE authors failed to address those criticisms?

      As you know, other science journalists announced that ENCODE refutes junk DNA. This theme was prominent in most articles written for the general public ("Junk DNA Is Dead!"). There's even an article by Joseph Ecker in the Sept. 5th issue of Nature where he says ...

      One of the most remarkable findings ... is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA.'

      Did you or any other Nature editors write to these other journalists and scientists to point out that they had misinterpreted the Nature papers? Did any of the ENCODE authors complain about this misrepresentation as far as you know?

    2. Brendan,

      I apologize for my crack about Nature. I have cooled down a little.

      But, do you at least agree that these questions are relevant?

    3. Dio, I'm so sorry your father never hugged you, but your case does not seem to be as bad as Larry's with narcissistic traits...

    4. A revised version of my comment:

      Brendan,

      I understand that you are not responsible for everything written at Nature. I understand you may not have written the editorial we're discussing here.

      However, we get little response from Nature and Science regarding their "Death of Junk DNA" hoax, so I was hoping you could answer me a simple question.

      According to the myth of Func DNA, supposedly 3 billion nucleotides in the human genome are now regulatory elements or "switches" as John Stamatoyannopoulos told us, regulating ~25,000 genes including RNA genes. So each gene including RNA genes is regulated by on average 120,000 nucleotides, or "switches" as John Stamatoyannopoulos blurted hysterically. That's the average.

      Name one, just one, well-studied gene known experimentally to be regulated by 120,000 nucleotides. Just one.

      If there were one such gene, it wouldn't prove that most of the genome can suffer deleterious mutations (meaning, non-junk)-- you would need to show that the AVERAGE gene, over 25,000 genes total, is regulated by 120,000 nucleotides.

      Admittedly, it might take longer for scientists to show that the AVERAGE gene, over 25,000 genes total, is regulated by 120,000 nucleotides, on average.

      But can you name just one such gene, right now?

      If you can't name even one gene that is known experimentally to be regulated by 120,000 nucleotides, then how many publication cycles will you go through before we can expect to find one?

      And then the other 24,999 after that?

      Simple question. Do you agree the question is relevant?

      If it's relevant, how come the editors at Nature never asked it?
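A quick sketch of the arithmetic behind the 120,000 figure, for anyone who wants to check it. The inputs are the round numbers used in this comment (3 billion nucleotides, about 25,000 genes including RNA genes), not precise counts, and the function name is made up for illustration.

```python
def nucleotides_per_gene(genome_size_bp: int, gene_count: int) -> float:
    """Average number of nucleotides available per gene if the whole
    genome were divided evenly among the genes."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return genome_size_bp / gene_count


# Round figures taken from the comment above (assumptions, not measurements).
GENOME_SIZE_BP = 3_000_000_000
GENE_COUNT = 25_000

if __name__ == "__main__":
    average = nucleotides_per_gene(GENOME_SIZE_BP, GENE_COUNT)
    print(f"{average:,.0f} nucleotides per gene on average")
```

The result is only an average over the whole genome; it says nothing about how regulatory sequence is actually distributed among genes.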

    5. AB: Delete your comment and resend it without the 'crack', or you can pretty much kiss a response goodbye, I'd wager.

      Oh come on, he's Irish. He ought to have a sense of humor.

    6. Dio, you and I fundamentally disagree about things, but I gotta admit I like ya
      ;)

  4. I just noticed that this year's meeting of the Society for Molecular Biology and Evolution (July 7-11 in Chicago) will have a symposium on "Where Did The Junk Go?". Sounds like it would be a lively airing of the controversy. But wait ... the description of the symposium says that

    With the completion of the current ENCODE project junk DNA effectively disappeared because there's no useless DNA in the genomes no more [sic].

    Oy vey.

    1. What do you expect from the workshop organizer Wojciech Makalowski? Pre-ENCODE, back in 2003, he already published "Not Junk After All" in Science. Let's see what the Special Symposium: Ideas and Thoughts will bring.

    2. Not all is lost, though. I was curious to see what ENCODE would mean to the next generation of textbooks. I recently got a copy of the new edition of Lewin's Genes (XI) that came out a month ago and I'm happy to see that the main reference to the genome evolution part of the book is Lynch's The Origins of Genome Architecture. ENCODE is referred to near the end of the book, in a somewhat positive way, but without making any big claims.

      So there's still some hope that the damage hasn't (and won't) spread too much, at least as far as textbooks go.

    3. Regarding the new edition of Lewin's Genes XI (2014), a sample chapter touching on genome architecture, which I edited, - "The Interrupted Gene" - is freely available from the publisher as a PDF file: http://samples.jbpub.com/9781449659851/59059_CH04_081_099.pdf

    4. Joe,

      I noticed the symposium on "Where Did The Junk Go?" and it almost tempted me to go to the meeting.

      Do you think there will be participants who defend junk DNA or do you think most attendees are adaptationists who dislike the idea of junk?

    5. I notice that they are calling for 15 minute contributed talks, so there may be some of those who defend the existence of junk DNA. I won't be there -- I'm working on nonmolecular data these days and will be at the Evolution2013 meeting in Utah instead.

      One of the invited speakers is Dan Graur, and assuming all the invited speakers have accepted the invitation, he won't be shy about venting his outrage at the premature dismissal of the notion of junk DNA.

      Here is the description, and list of speakers, from Wojciech Makalowski's announcement, which was posted on the evoldir mailing list:

      As a part of the is year SMBE meeting, I'm happy to announce the symposium "Where did the junk go"? We are currently accepting abstracts for contributed talks (15min). Abstract submissions are open until March 18th, and travel awards are available to support graduate and post-doc travel to the conference. Please check the meeting web site http://www.smbe2013.org/ for additional information and registration. For any inquires related to the symposium please contact Wojciech Makalowski at wojmak [at] uni-muenster.de. I expect very exciting discussion on this hot topic.


      Sincerely,

      Wojciech Makalowski, Ph.D.
      Professor and Director
      Institute of Bioinformatics
      University of Muenster
      Niels Stensen Strasse 14
      48149 Muenster, Germany

      On sabbatical at the Department of Medical Genome Sciences University of Tokyo


      Where did "junk" go?

      Late Susumu Ohno once said "So much junk DNA in our genome" and the phrase junk DNA was born. For a long time mainstream scientists avoided these parts of the genomes. However, over the years the picture slowly started to appear suggesting that the junk DNA hides a genomic treasure. With the completion of the current ENCODE project junk DNA effectively disappeared because there's no useless DNA in the genomes no more. This symposium will discuss the current understanding of these not so far ago obscure areas of the genomes with the special attention to transposable elements activities and their evolutionary consequences. The integral part of the symposium will be general discussion of Ohno's idea and its place in todays biology.


      Invited speakers include:

      1. Josefa Gonzalez (Institut de Biologia Evolutiva, Barcelona, Spain)

      2. Valer Gotea (National Human Genome Institute, Bethesda, USA)

      3. Dan Graur (University of Houston, Houston, USA)

      4. Dixie Mager (University of British Columbia, Canada)

      5. Masumi Nozawa (National Genetic Institute, Mishima, Japan)

    6. Joe,

      Too bad they didn't invite Jonathan Wells. He could have given them a summary of "The Myth of Junk DNA."

      I don't know any of those people except Dan Graur. A quick Google search suggests that none of them are going to defend junk DNA.

      Why do you suppose that Wojciech Makalowski thinks that "junk DNA effectively disappeared because there's no useless DNA in the genomes no more"? According to ENCODE leaders, they said no such thing! They were only identifying "biochemical functions" and those didn't refute junk DNA. It's amazing how many "experts" in the field misinterpreted the ENCODE results, isn't it?

    7. I hope one of us can make it to Chicago to ask some pointed questions at the Q & A session.

      Simple question: supposedly 3 billion nucleotides are now regulatory elements regulating ~25,000 genes including RNA genes. So each gene incl. RNA genes is regulated by on average 120,000 nucleotides. That's the average.

      Name one, just one, well-studied gene known experimentally to be regulated by 120,000 nucleotides. Name just one, Makalowski.

      Yes, Makalowski is the genius who wrote "Not Junk After All" declaring Junk DNA dead back in 2003. Then it was dead again in 2007 and again in 2012.

      I am sure after this ENCODE stuff blows over, Makalowski will declare Junk DNA dead again in 2016, then again in 2022, etc. Some memes are immortal.

      With the completion of the current ENCODE project junk DNA effectively disappeared because there's no useless DNA in the genomes no more.

      Oy vey indeed. Genomes, plural. I guess he includes the marbled lungfish, onions, amoeba genomes etc. Nope, no junk in the genomeS no more.

      That should be Casey Luskin's signature. Well, if I'd known earlier I'd've written an abstract for submission, but if the deadline's the 18th...

    8. The title of my talk at Chicago will be "The death of junk DNA will be the death of evolutionary theory" and I'll start my lecture with a quote from an important IDiot, William A. Dembski.

      "Consider the term “junk DNA.” Implicit in this term is the view that because the genome of an organism has been cobbled together through a long, undirected evolutionary process, the genome is a patchwork of which only limited portions are essential to the organism. Thus, on an evolutionary view we expect a lot of useless DNA. If, on the other hand, organisms are designed, we expect DNA, as much as possible, to exhibit function."

  5. Interesting podcast on the subject. It interviews Dan Graur and Michael Eisen, and discusses the whole ENCODE media bomb.

    http://www.mendelspod.com/podcast/debating-encode-with-dan-graur-and-michael-eisen

    1. That is a great podcast. Graur is a hoot.

      I laughed at the part where Graur says, chewing gum sticks to my shoe. But it is not the function of my shoe to collect chewing gum.

    2. I like the part where Dan Graur mentions that Elizabeth Pennisi was misled but she "likes to be misled."

  6. This is rather unfortunate...

    Reading an article posted by JAMA yesterday. http://jama.jamanetwork.com/article.aspx?articleid=1666972#ref-jvp130033-5

    Crossing the Omic Chasm
    A Time for Omic Ancillary Systems

    4th paragraph states,

    "Omic data are different. An individual's germline genetic sequence changes little over a lifetime, but understanding of that sequence is changing rapidly. For years the DNA between coding regions was called “junk,” but it is now known that this DNA plays an important role in gene regulation.5- 6 The 1000 Genomes project has identified tens of millions of different genomic variants; the clinical significance of these variants is mostly unknown, but current understanding is rapidly changing. Unlike serum sodium levels, the clinical implications of NGS obtained today will keep changing for years as knowledge evolves."

    smh

  7. @Georgi Marinov

    You said,

    The truth is that while this is not arguing over semantics, debunking junk DNA was never the goal of the ENCODE consortium, the goal was to find "functional elements" (which, unfortunately, has turned out not to be nearly as straightforward as initially hoped for).

    It's true that debunking junk DNA was not a well-defined goal of the consortium. Their goal was to attribute function to as much of the genome as possible. You are correct when you say that this turned out to be difficult—especially in light of abundant evidence for junk.

    In your opinion, what's the best paper that discusses this difficulty and addresses the problems of noise and spurious transcripts? There should be at least one paper that tries to distinguish between real biological function and accidental elements, right?

    It's only the main paper and the editorials that really feature those statements (and if you know what is meant by "function" there, they would not sound nearly as alarming), the companion papers are mostly focused and often very useful functional genomic studies that do not touch on this subject.

    With all due respect, this is nonsense. In light of the 2007 criticism of the pilot project, every single paper should have been aware of the problem of distinguishing between real biological function and nonfunctional effects. In the absence of that critical analysis of their own data, I can only conclude that they meant to convey the idea of real biological function associated with most of their elements.

    There were six different articles written by ENCODE Consortium leaders who summarized their results. Every single one of them emphasized the biological importance of their elements and none of them even mentioned junk DNA or the possibility that they could be looking at artifacts.

    Do you think they were completely unaware of the criticism that was about to be leveled at their interpretation of their own data? Isn't it more likely that they actually believed that they were annotating true biological function and debunking junk DNA? Some of the leaders have actually said this in interviews.

    Replies
    1. Their goal was to attribute function to as much of the genome as possible. You are correct when you say that this turned out to be difficult—especially in light of abundant evidence for junk.

      That's not correct either - the goal is to find the functional elements that drive gene regulation and to annotate the transcriptome as fully as possible. That's not the same as attributing function to as much of the genome as possible. Now, "function", whatever definition of it you want to adopt, has turned out to be more of a continuously distributed rather than a discrete, binary characteristic of genomic features, and then we start having the old argument over where you draw the threshold or whether there can even be a threshold, and that's where the difficulty arises.

      What I said regarding the companion papers is still true. I clearly said the main paper and the editorials are what the debate is over, most of the companion papers (which, BTW, are not limited to the 30+ papers in GR, GB, and other journals from September, many more papers came out before or after that and many are still to come out) are just focused on different aspects of genome biology and/or functional genomic technology development and do not say anything on the subject of junk DNA, because they don't have to. It is simply not correct to attack them on the same grounds as the main paper and the editorials.

      P.S. The problems with experimental artifacts and uncertainty in the interpretation of data are explicitly mentioned in what I have personally written and published (and there is a lot I have not published as a result of those topics featuring too prominently). They feature in other consortium papers too. Should discussion of them have a much more prominent place in papers - I agree wholeheartedly, but the facts of life are that the days of the papers that explicitly discussed at length what's wrong with the data are gone. It has always amazed me how often old papers, say from the 70s, would indulge in such discussions, and how such a thing is a non-starter in today's publishing climate. It's not a perfect situation but it is not easy to change it.

    2. @Georgi Marinov

      You said,

      That's not correct either - the goal is to find the functional elements that drive gene regulation and to annotate the transcriptome as fully as possible. That's not the same as attributing function to as much of the genome as possible.

      I know that ENCODE characterized pseudogenes and that over 90% of them are not transcribed. They also looked at chromatin interacting regions—I don't know if this includes SARs but I don't think they are transcribed. Some of the groups studied evolution and sequence comparisons to find conserved stretches of DNA. Some studied SNPs and their association with other features of the genome.

      Are you saying that the ENCODE Consortium deliberately excluded origins of replication, centromeres, and telomeres in their attempt to annotate the genome? Why would they do that?

    3. @Georgi Marinov

      You said,

      The problems with experimental artifacts and uncertainty in the interpretation of data are explicitly mentioned in what I have personally written and published ...

      Here's one of the papers with your name on it: Landscape of transcription in human cells. I don't know if you think of it as one of the companion papers.

      Here's part of the abstract ...

      Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

      Now, if your results are so good that we have to redefine a gene then I expect a detailed discussion of possible artifacts and errors in the interpretation of your data. I didn't find that anywhere in the paper. No mention of spurious transcription and no references to the papers that discuss this possibility and criticize the earlier ENCODE results. Why is that?

      I also expected a discussion about aberrant splicing and possible artifacts. There should have been some mention of the possibility that what you were detecting was due to splicing errors. This directly impacts your re-definition of a gene. Did I miss this in your paper?

      I stand by my claim that most of the ENCODE papers focus on establishing the functionality of their discoveries with scant attention to any other possibility. This is not good science and it's inexcusable given the controversies that arose when the pilot project was published.

      Good scientists will go out of their way to address criticism and ways in which their hypotheses and interpretations can be falsified.

      Do you agree?

    4. Are you saying that the ENCODE Consortium deliberately excluded origins of replication, centromeres, and telomeres in their attempt to annotate the genome? Why would they do that?

      Origins of replication were studied by the pilot phase and by modENCODE. For some reason, there wasn't a specific project on this in ENCODE2, though there was RepliSeq data generated for a lot of cell lines.

      Centromeres and telomeres cannot really be studied with current functional genomics tools because the read lengths are too short and/or they're not even present in the genome assembly, plus we mostly already know what they do and what their chromatin state is.

    5. @Georgi Marinov

      You say,

      Should discussion of [problems] have a much more prominent place in papers - I agree wholeheartedly, but the facts of life are that the days of the papers that explicitly discussed at length what's wrong with the data are gone. It has always amazed me how often old papers, say from the 70s, would indulge in such discussions, and how such a thing is a non-starter in today's publishing climate. It's not a perfect situation but it is not easy to change it.

      We all understand what's going on here. The ENCODE Consortium didn't really do anything that's not done routinely by lots of other labs. They are only getting criticized because their lack of attention to the "problems" was so blatant that we can't ignore it. They were given a chance to write a series of papers that explained their results and how they fit in with the abundant evidence for junk DNA. They blew it even when presented with this opportunity.

      It would be nice if they would just admit, as you did, that they made a mistake.

    6. Georgi can handle this, but I think Larry is overgeneralizing. There are over 400 scientists involved in this. We can only demonstrate, with evidence, that a handful twisted the facts: Birney, Stamatoyannopoulos, a couple of others, and some non-scientists like Elizabeth Pennisi, the writers of press releases, etc. You cannot generalize from that to 400 people.

      Moreover, the press releases are the real problem here, not so much the papers.

      I don't agree with Larry's thesis that all authors of all papers should have known this would happen.

      The 2007 kerfuffle was in the blogosphere, and how many scientists follow blogs? Most don't. I myself would not have heard about it if I didn't argue with creationists.

      How could the average scientist of these 400 have known that Birney would pull a trick like that, or Stamatoyannopoulos? How could they see that coming? Most scientists tend to trust each other.

      They didn't all get together in a big room and vote on that!

      Anyway, if 10 nucleotides of an intron are found to be functional, you can't generalize to all nucleotides of all introns.

      Likewise, if 3-4 scientists out of 400 twisted the facts, you can't generalize to all 400. Larry here is pulling a Casey Luskin-style overgeneralization.

    7. The 2007 kerfuffle was NOT confined to the blogosphere. There were several papers published in reputable journals. I discuss some of them in: Pervasive Transcription.

      The most important paper was Kevin Struhl's Nature paper: Transcriptional noise and the fidelity of initiation by RNA polymerase II. Here's the abstract ...

      Eukaryotes transcribe much of their genomes, but little is known about the fidelity of transcriptional initiation by RNA polymerase II in vivo. I suggest that ~90% of Pol II initiation events in yeast represent transcriptional noise, and that the specificity of initiation is comparable to that of DNA-binding proteins and other biological processes. This emphasizes the need to develop criteria that distinguish transcriptional noise from transcription with a biological function.

      You can't tell me that the ENCODE scientists and all the members of their labs were completely unaware of this problem. Do you mean to imply that this never came up in any of their group meetings or journal clubs?

    8. Likewise, if 3-4 scientists out of 400 twisted the facts, you can't generalize to all 400. Larry here is pulling a Casey Luskin-style overgeneralization.

      I'm not saying any such thing. However, it would be nice to find at least one ENCODE paper that gets it right. Do you know of any?

      (All 400+ scientists put their names on several of the papers that made outlandish claims. I can understand why some of them didn't want to challenge their bosses but would you have done that?)

    9. @Larry: The 2007 kerfuffle was NOT confined to the blogosphere. There were several papers published in reputable journals. I discuss some of them in: Pervasive Transcription.

      The most important paper was Kevin Struhl's Nature paper: Transcriptional noise and the fidelity of initiation by RNA polymerase II.


      Eh, OK I concede that point.

    10. Here's one of the papers with your name on it: Landscape of transcription in human cells. I don't know if you think of it as one of the companion papers.

      First, why are you assuming that a paper with a large number of authors has been written collectively by all of them? Such a paper would never get published. It's hard enough to get all collaborators to respond in a timely manner even on small papers that have just a few authors; for something with many dozens or hundreds of authors, it would be practically impossible to get it done in any sort of reasonable time. It would take years to finalize the text. And even that would not be possible, because if all authors had an equal say in the content, with such a large number of them, there would be tons of statements and phrases that someone would not agree with. These things get written by a small number of people while the rest of the authors are included because they contributed data, developed the analysis methodology, did some analysis, or didn't do anything but were associated with the project and got on the paper because in the shuffle nobody realized they did nothing. It's the nature of big consortium science.

      Now, if your results are so good that we have to redefine a gene then I expect a detailed discussion of possible artifacts and errors in the interpretation of your data. I didn't find that anywhere in the paper. No mention of spurious transcription and no references to the papers that discuss this possibility and criticize the earlier ENCODE results. Why is that?

      I also expected a discussion about aberrant splicing and possible artifacts. There should have been some mention of the possibility that what you were detecting was due to splicing errors. This directly impacts your re-definition of a gene. Did I miss this in your paper?


      Actually, the statement that the gene needs redefining is correct - the paper later talks about how the basic unit should be the transcript and not necessarily the gene. That's not that controversial, especially if we're talking about alternative promoters. Of course, a lot of the different isoforms that get expressed from the complex genes are most likely garbage, and that's not that controversial either. BTW, my paper on the subject is not that one - I was leading a separate analysis of a similar nature where I did discuss the issues of noise and artifacts at length, but that paper was never submitted for reasons I don't want to go into.

      Good scientists will go out of their way to address criticism and ways in which their hypotheses and interpretations can be falsified.

      Do you agree?


      There probably will be an official response to criticism at some point.

    11. The most important paper was Kevin Struhl's Nature paper: Transcriptional noise and the fidelity of initiation by RNA polymerase II. Here's the abstract ...

      Kevin Struhl is one of those 400+ scientists on the main papers...

    12. Georgi Marinov said:

      "First, why are you assuming that a paper with a large number of authors has been written collectively by all of those?"

      Why is it unreasonable to assume (or even conclude) that the people listed as "authors" actually authored (wrote) the paper?

      "Such a paper would never get published - it's hard enough to get all collaborators to respond in timely manner even on small paper that have just a few authors, for something with many dozens or hundreds of authors, it would be practically impossible to get it done in any sort of reasonable time, it would take years to finalize the text. Which itself would not be possible for the reason that if all authors had an equal say in the content, with such a large number of them, there will be tons of statements and phrases that someone would not agree with."

      If that's the case the collaborators should write their own individual papers, and include any disagreements they have with anyone else. Either that or not allow themselves to be listed as "authors".

      "These things get written by a small number of people while the rest of the authors are included because they contributed data, developed the analysis methodology, did some analysis, or didn't do anything but were associated with the project and got on the paper because in the shuffle nobody realized they did nothing."

      Then the non-authors should be listed as collaborators, developers, analyzers, contributors, or whatever label actually fits along what they actually did, and if they did nothing they shouldn't be listed at all. Is the janitor who cleans the building the paper is written in listed as an author because he/she 'contributed'?

      "It's the nature of big consortium science."

      That's a cop out and it certainly doesn't make false lists of so-called "authors" acceptable. Giving people credit for something they didn't do, especially if they did nothing, and especially if the false credit is in any way meant to inflate the 'authority' of the paper, just plays into the hands of the science bashing religion pushers, and it damages trust in science by the public.

      Trust must be earned and science needs the trust of all the scientists within it, and the public. Accepting responsibility engenders trust and anyone who is listed as an author should accept responsibility as an author.

      It's apparently past time for scientific papers to have a different, mandatory format. Think of the way that multiple-judge courts do their thing. The judges all read the briefs, hear the case, ask questions and/or make statements, research previous cases and precedents, collaborate/contribute/discuss/argue or whatever, mull it over for awhile, and then state their decisions ('opinions') in writing. Their individual 'opinions', including disagreements, are available to other courts and to the public. When multiple people are listed as "authors" of a scientific paper, the discussion section of the paper should include any disagreements between the listed "authors".

    13. Typo alert:

      or whatever label actually fits along with what they actually did

    14. Georgi Marinov said:

      "There probably will be an official response to criticism at some point."

      If so, who exactly will author it, and will all of the collaborators, developers, contributors, analyzers, or "didn't do anything" associates originally listed as "authors", who disagree with the 'official' claims/position, be allowed to speak in the probable "official response" too?

      I must say that your statement above sounds just like one that would typically come from a politician or their spokesperson.

    15. I must say that your statement above sounds just like one that would typically come from a politician or their spokesperson.

      That's true, but Marinov cannot presume to speak for the consortium when nobody anointed him as spokesperson. It would be presumptuous for him to talk like he's the representative of the consortium.

    16. I'm most definitely not speaking in the name of the consortium - how could anyone think that? I just said there probably will be an official response because people are discussing the critique-of-ENCODE papers and how to respond to them.

    17. I know. I'm agreeing with you, partially.

  8. Just wanna let you all know that I'm taking screen pictures of all the comments from the clever guys here who are so convinced that the ENCODE project is so wrong and that most of our DNA remains non-functional. Doing it just in case, in a couple of years or even earlier, it turns out to be the other way around. I just wanna have some arrows to shoot at some of the smart asses here.

    Are you eligible for retirement yet, Larry?

    Replies
    1. Rofl... care to stick your neck out and make a prediction?

      If junk-DNA hasn't collapsed in 5 years, you will send money to Larry Moran (or me, I'll gladly take your money). How much are you willing to part with?

    2. Oh, also, care to give even a cursory hint that you have a clue about the subject?
      For example, according to your understanding, what function did the ENCODE project discover? Be specific.

    3. Dominic, are you really saying that most of the human genome has function, in the sense of being capable of suffering deleterious mutations? Would you bet on that, or are you another ID pussy?

      I'd ask what evidence you cite for that, but I know ID pussies never answer questions relevant to the assertions they themselves make. They never answer, because Intelligent Design is a fraud.

      Simple question, Dominic: supposedly 3 billion nucleotides are now regulatory elements regulating ~25,000 genes including RNA genes. So each gene incl. RNA genes is regulated by on average 120,000 nucleotides. That's the average.

      Name one, just one, well-studied gene known experimentally to be regulated by 120,000 nucleotides. Name just one, Dominic.

      Dominic will not answer, because Intelligent Design is a fraud.

      Would you care to make a bet, Dominic? I bet you $2,000 that within a year and a half of the publication of the ENCODE papers, there will not be a single gene, not one gene, in the human genome known experimentally to be regulated by 120,000 nucleotides.

      Even if I lost that bet (I'd win), it wouldn't prove most of the genome can suffer deleterious mutations-- you need to show that the AVERAGE gene, over 25,000 genes total, is regulated by 120,000 nucleotides.

      Admittedly, it might take longer for evolutionists to show (only evolutionists make scientific discoveries nowadays) that the AVERAGE gene, over 25,000 genes total, is regulated by 120,000 nucleotides.

      But, if we're going in the right direction, surely within a year and a half of the publication of the ENCODE papers, there will be a single gene, just one gene, out of 25,000 in the human genome known experimentally to be regulated by 120,000 nucleotides. One out of 25,000-- any gene, any function-- you pick! You pick, pussy! YOU PICK, YOU PUSSY!

      So are you willing to bet $2,000 that will happen? Put your money where your mouth is, witch doctor creationist.

    4. You're taking screen caps?!

      Oh no! I'm sure we would have all preferred to be much more circumspect if we'd only known that we were posting to a publicly accessible blog on the internet! Now what shall we do?!

      If you're really taking screen caps, even though it's completely daft, then you can add my message to your little archive.

      Larry has a handy page on this site called "What's in Your Genome?". Perhaps you'd be kind enough to look it over and tell us which genomic elements identified as "junk" actually have selectable, organism-level functions and what they are.

  9. I have to say that there's never been a paper published under my name (and some of them are what passes for "big science" in systematics) that I haven't at least read and commented on before publication. If such a manuscript had said anything I seriously disagreed with (and some of them have), I would have (and have) mentioned that in my comments and requested it be changed (and almost always it has been). And I would consider this minimal professionalism.

    Thoughts?

    Replies
    1. John,

      I can say the same thing but I've never been involved in a mega project with dozens of authors.

      I can see Georgi's point. On many of these papers he's just a cog in a big machine and it's really not possible for him to take the kind of stand that you and I would take.

      But let's talk about "groups." Most of the papers are the work of several groups and each one is under some PI. In Georgi's case, the PI is Barbara Wold at CalTech. I assume that her group discusses genomes and junk DNA from time to time and I assume that the PI has a view on whether most of the genome is functional or not.

      PIs could speak out if their views are being misrepresented in a paper. The fact that none of them have spoken out suggests to me that they all share the opinion of the ENCODE leaders; namely, that all that "function" they've discovered is actually real biological function.

      That's the sad part. This isn't really a debate about the definitions of function—it's a debate about the existence of junk DNA and the fact that many prominent biochemists and molecular biologists dismiss it out of hand without ever bothering to learn about the issue.

      I'm going to Los Angeles in June, maybe I'll visit CalTech and give them my "Junk DNA" talk to bring them up to speed. :-)
