Sunday, September 04, 2022

Wikipedia: the ENCODE article

The ENCODE article on Wikipedia is a pretty good example of how to write a science article. Unfortunately, there are a few issues that will be very difficult to fix.

When Wikipedia was formed twenty years ago, there were many people who were skeptical about the concept of a free crowdsourced encyclopedia. Most people understood that a reliable source of information was needed for the internet because the traditional encyclopedias were too expensive, but could it be done by relying on volunteers to write articles that could be trusted?

The answer is mostly “yes” although that comes with some qualifications. Many science articles are not good; they contain inaccurate and misleading information and often don’t represent the scientific consensus. They also tend to be disjointed and unreadable. On the other hand, many non-science articles are at least as good as, and often better than, anything in the traditional encyclopedias (e.g., Battle of Waterloo; Toronto, Ontario; The Beach Boys).

By 2008, Wikipedia had expanded enormously and the quality of articles was being compared favorably to those of Encyclopedia Britannica, which had been forced to go online to compete. However, this comparison is a bit unfair since it downplays science articles.

Let’s look at articles on ENCODE from Wikipedia and the Encyclopedia Britannica to see how they compare.

Encyclopedia Britannica: ENCODE

The Encyclopedia Britannica article on ENCODE was written by a science journalist named Kara Rogers and it was last updated on June 30, 2016. It contains some information on the ENCODE project up until 2012 but it is by no means comprehensive. There are no references to the scientific literature. It is not a very good article for an encyclopedia.

The important part for Sandwalk readers is how it treats the conclusions that the ENCODE researchers announced in 2012. Here’s the relevant section ...

Production-phase data further revealed that 80 percent of the human genome is biochemically functional as a result of association with RNA or chromatin activities. Since most of the human genome is made up of noncoding DNA (what was previously considered “junk” DNA by some), the data implied that these regions, which do not produce protein and therefore had been presumed to be nonfunctional, are in fact functionally relevant. Although researchers outside the ENCODE project had reached this same conclusion previously, the ENCODE data emphasized its significance. The research performed independently and as part of ENCODE indicated that noncoding regions may play important roles in regulating the production of protein as well as in maintaining the structural integrity of the genome.

This is very misleading. There’s no discussion of the controversy and the fact that this conclusion has been challenged.

Wikipedia: ENCODE

The Wikipedia ENCODE article is ten times better. It covers lots of important material that is missing from the Encyclopedia Britannica article and it is, for the most part, better written. There are lots of references to the scientific literature.

Here’s how the Wikipedia editors treat the controversy.

However the conclusion that most of the genome is "functional" has been criticized on the grounds that ENCODE project used a liberal definition of "functional", namely anything that is transcribed must be functional. This conclusion was arrived at despite the widely accepted view, based on genomic conservation estimates from comparative genomics, that many DNA elements such as pseudogenes that are transcribed are nevertheless non-functional. Furthermore, the ENCODE project has emphasized sensitivity over specificity leading possibly to the detection of many false positives.[45][46][47] Somewhat arbitrary choice of cell lines and transcription factors as well as lack of appropriate control experiments were additional major criticisms of ENCODE as random DNA mimics ENCODE-like 'functional' behavior.[48]

45=Graur et al (2013) paper: 46=my Sandwalk blog: 47=Ryan Gregory’s blog: 48=Mike White’s blog

The next few paragraphs discuss the criticism and offer the standard defenses without discussing whether there is contradictory evidence for junk DNA. The overall tone of the discussion favors the ENCODE position.

The positive feature of this article is that it was written mostly by a single person (or a small number of like-minded people) who impose a consistent narrative and don’t allow the article to be disrupted by irrelevant material and a plethora of “sources.” It sticks to the recommended Wikipedia protocol by having only a few references, often only one at the end of each paragraph.

This is very different from many Wikipedia science articles in my areas of expertise. Those articles are clearly cobbled together by groups of (mostly) amateurs who like to insert one or two sentences about their favorite topic. There’s no consistency and different parts of the articles often contradict each other. These articles tend to justify the fears of the initial skeptics that crowdsourcing is a bad idea (see Human genome, Pseudogene, Gene-centered view of evolution).

Although the ENCODE article reads well, and is very informative in parts, it nevertheless ends up being misleading when it comes to the most important controversy. It represents the Kellis et al. (2014) paper as the last word on function and ends up giving the impression that biochemical functions are important clues while “evolutionary” and “genetic” functions are not.

They concluded that in contrast to evolutionary and genetic evidence, biochemical data offer clues about both the molecular function served by underlying DNA elements and the cell types in which they act and ultimately all three approaches can be used in a complementary way to identify regions that may be functional in human biology and disease. Furthermore, they noted that the biochemical maps provided by ENCODE were the most valuable things from the project since they provide a starting point for testing how these signatures relate to molecular, cellular, and organismal function.

This misses the point; namely, whether it was correct to declare the death of junk DNA. It’s the widely publicized conclusion that’s the problem and the article really needs to address that issue. The only way to do that is to engage in a brief description of junk DNA and explain why there are so many of us who still think that most of our genome is junk. This view directly contradicts that of the ENCODE researchers and it would be a real challenge to edit this article in order to provide the appropriate balance.

In fact, I doubt that it would be possible to make those edits given the strong resistance it would evoke from Wikipedia editors. It’s those same editors who deleted the Wikipedia article on junk DNA on the grounds that it was bad science and had been refuted.

One of the things that seems obvious from looking at good Wikipedia articles is that most of them seem to have been written by fairly knowledgeable people who have imposed some standards and organizational skills. This is somewhat surprising since the culture of Wikipedia editors and administrators is often antagonistic towards experts. They even have specific policies about not trusting experts [Wikipedia: Expert editors].

Expert editors can be very valuable contributors to Wikipedia, but they sometimes have a difficult time realizing that Wikipedia is a different environment from scholarly and scientific publishing.

The mission of Wikipedia is to provide articles that summarize accepted knowledge regarding their subjects, working in a community of editors who can be anonymous if they wish. We generally find "accepted knowledge" in high quality secondary sources like literature reviews and books.

Wikipedia has no formal structure with which to determine whether an editor is a subject-matter expert, and does not grant users privileges based on expertise; what matters in Wikipedia is what you do, not who you are. Previously published reliable sources, not Wikipedia editors, have authority for the content of this encyclopedia.

What this means is that the mission of Wikipedia is to rely on “reliable sources” instead of the experience of experts but this obviously raises the question of how to judge a “reliable source” if you are not an expert. Wikipedia editors have no answer to that question. Instead, they mistrust anyone who tries to remove or edit a statement that had a “reliable source.” You could get banned from Wikipedia if you try to do that.

Some Sandwalk readers have learned how to stay under the radar by making small changes that don’t attract the attention of the Wikipedia police. I tried to do that on the Non-coding DNA article over the past four months. Isn’t it a shame that this is how you have to behave if you want to fix Wikipedia? [Criticism of Wikipedia] [Why Wikipedia is not so great].

After I was blocked from editing on Wikipedia, I challenged the decision based on the fact that the material I deleted was scientifically incorrect and unsourced. I had given what I thought were valid scientific arguments in support of my edits. My challenge was rejected: here’s what I was told.

If you haven't already, please read [[WP:EXPERT]]. It's not required that one be an expert in the topic that they wish to edit about. There are encyclopedia projects with such a requirement, but not this one.

If I were able to edit the ENCODE article I might try temporarily inserting something like ...

“There are many other reliable scientific sources that present different views on the correct meaning of biological function (multiple citations). These scientists and philosophers tend to agree that the best indication of function is that a given sequence is conserved or currently under purifying selection and that biochemical function is only a clue to a possible function (multiple citations). This is why the current ENCODE researchers now refer to these biochemical markers as “candidate” functions (citation).

According to this definition of function, only about 10% of the human genome is functional and the rest is junk DNA (multiple citations). This conflicts with the initial claims of the ENCODE researchers in 2012 who claimed that 80% of our genome is functional (citation). This controversy has not been resolved but it’s important to note that the original ENCODE claims about the death of junk DNA are not universally accepted by all scientific experts (multiple citations).”

The best solution is a separate article on junk DNA that could be linked to from this article. What Wikipedia needs is a few authoritative articles on important topics that can be used as sources in other articles; ENCODE could be one of them and so should JUNK DNA and HUMAN GENOME. Instead, we get a plethora of articles that cover the same topic in an inconsistent and often contradictory manner.


Marc said...

On controversial topics, wikipedia is too biased & therefore unreliable.

Georgi Marinov said...

Wikipedia is basically impossible to correct at this point, I have no idea who it is that controls the content, but it is a very small circle of people and the hoops you have to jump through to get access to be able to edit it look like they are specifically designed to prevent wider input.

Your experience is not unique by any means.

ealloc said...

An attempt at having a wikipedia written by experts can be found in the "scholarpedia" project. "Scholarpedia is inspired by Wikipedia and aims to complement it by providing in-depth scholarly treatment of topics within the fields of mathematics and sciences including physical, biological, behavioral, and social sciences." Unlike wikipedia, the articles are authored by named "experts" and are subject to peer review.

I first heard of it >10 years ago from a well-respected professor of mine who was excited about it (and wrote some articles). I thought the goal was to cover similar topics as wikipedia. But checking it now the articles seem more narrowly focused.

Out of curiosity, I checked what this "scholar's wikipedia" says on genetics. I suspect sandwalk readers might have problems with the "Genetic Variation in Nature" article, on alternative splicing, previous estimates of the number of human genes, and how SNPs are "to a large extent" adaptive.

Scholarpedia also currently has a call to writers to create a "Postgenetics" article, to cover "Genomics beyond Genes, the PostModern (and Post-ENCODE) era of Genomics." The proposed article should describe that (quote) While it used to be believed that the prevailing axioms in Genomics are "the Genes" (1.3%) and the "junk" (98.7%), the $50 M project "ENCODE" (led by NIH with 11 Countries participating) published on the 14th of June, 2007 its "Report", in which it was openly confessed that "the human genome is pervasively transcribed"; the "junk DNA" concept is essentially obsolete."

Perhaps a sandwalk scholar would be welcome to volunteer?