Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2022) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

I'm not interested in discussing their views on function. Suffice it to say that their argument suffers from two fatal flaws: (1) The authors don't seem to understand how purifying selection (maintenance function) affects their arguments against selected effect function. (2) They propose eliminating the notion of 'function' without ever discussing what this might mean for the junk DNA debate.

What I want to discuss is their attempt to defend ENCODE. The authors identify various definitions of function, including classic causal role (CR) function and selected effect (SE) function, where SE refers only to functions that have arisen by natural selection. They point out that several different definitions can be useful in different contexts, including those that only look at causal roles (they call them biological roles, BR). They defend the use of CR function, especially in the field of physiology.

From their perspective, ENCODE was simply using a traditional method of describing function.

Indeed, most of the ‘molecular functions’ assigned to gene products in the Gene Ontology are based on mere similarity to proteins with known functions, and while this is fairly predictive (e.g. of binding domains, catalytic activities, etc.), it could hardly be more remote from functions as selected effects. Of course, the interest in these activities is not random: biologists investigate enzymatic functions because they expect them to be relevant to the organism, but this link was long not at the core of the research programme. It would therefore be erroneous to think that ENCODE and the likes departed in an unorthodox fashion from an established biological tradition unified around natural selection: it is instead the legacy of a different, considerably older tradition, aimed at understanding biological complexity through a decomposition into parts whose properties and actions contribute to higher level phenomena—in other words, Cummins’ functional analysis [CR].

That could only be true if ENCODE researchers completely ignored other possible interpretations of their data. As I've said many times before, the ENCODE scientists should have been aware of the fact that most transcription factor binding sites had to be non-functional. Thus, to ascribe function to all of these sites was wrong. It cannot be excused as just the traditional way of ascribing CR function. They made a stupid biochemistry mistake, not an epistemological choice.

Not only that. The ENCODE researchers must have been aware of the controversy over function and junk DNA. As scientists, they had an obligation to discuss both sides of the issue even if they were strongly promoting one particular point of view. Their failure to address the possible problems with their function claims is inexcusable. The lack of proper peer review is also inexcusable.

Revisiting the ENCODE controversy

The ENCODE project, along with a series of similar projects (e.g. Roadmap epigenomics, FANTOM, Blueprint epigenomics, etc.), are best seen as representing recent steps in this tradition. It is not our aim here to discuss in any detail the ENCODE project or the controversy attached to it (see Germain et al. 2014; Eddy 2012, 2013; Doolittle 2013; Graur et al 2013; Niu and Jiang 2013; Pennisi 2012; Birney 2012; Brunet and Doolittle 2014; Doolittle and Brunet 2017; Doolittle et al 2014). However, since it has become an important locus of discussion about functions in biology, we believe it is important, especially in light of the present discussion, to address an important misconception regarding it.

This could be interesting. What 'misconception' could they mean? They start off on the wrong foot by misrepresenting the objective of ENCODE. Ratti and Germain claim that the goal was to characterize "the regulatory elements of the human genome." That's not quite correct; the actual goal was to identify ALL functional elements, not just regulatory sequences.

The Encyclopedia of DNA Elements (ENCODE) project aims to delineate all functional elements encoded in the human genome. [What did the ENCODE Consortium say in 2012?].

Ratti and Germain note that sequence conservation is a good clue that a given stretch of DNA has a function, but ENCODE decided to ignore that clue.

However, as we have seen in “Introduction” section and argued before (Germain et al. 2014), there are a number of reasons to believe that, especially in our species, non-conserved regions can influence gene expression in a way that matters to biomedicine.

This is a ridiculous argument. Every intelligent biochemist knows that junk DNA can be mutated to cause medical problems. Genetic defects can be caused by gain-of-function mutations that create spurious splice sites or new transcription start sites. You can't assign a 'function' to every site that's responsible for a genetic disease.

Moreover, a major limitation of the conservation signature is that while it offers extremely good indications that a region is relevant to the organism, it offers no clue as to how it might be.

That's correct. But conservation helps us to eliminate false positives, and this is extremely important when you know that your other criteria are unreliable indicators of true function. The properties of DNA binding proteins tell us that there will be about a million binding sites in the human genome just by chance. You need to rely on other criteria, like conservation, to separate the wheat from the chaff.
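
To see where a number like that comes from, here's a rough back-of-the-envelope sketch. The assumptions are mine, for illustration only: a random 3.2 Gb genome and a specific, non-degenerate 6 bp recognition sequence. Real transcription factor motifs are short and degenerate, so the true number of spurious matches is, if anything, larger.

```python
# Illustrative back-of-the-envelope estimate (assumptions: a random 3.2 Gb
# genome and one specific, non-degenerate 6 bp recognition sequence).

GENOME_SIZE = 3.2e9   # haploid human genome size, in base pairs
MOTIF_LENGTH = 6      # length of a typical short recognition site

# Probability that a random 6-mer matches one particular sequence
p_match = (1 / 4) ** MOTIF_LENGTH

# Expected number of chance matches, counting both strands
expected_sites = 2 * GENOME_SIZE * p_match

print(f"expected chance occurrences: ~{expected_sites:,.0f}")  # ~1,562,500
```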

ENCODE therefore sought an alternative approach, building on the observation that the chromatin of regulatory elements tends to be associated with certain sets of biochemical characteristics. The consortium therefore reasoned that genome-wide profiling of an array of such characteristics would yield not only a list of putative regulatory elements, but their classification into various types, leading to more specific hypotheses as to how they can be expected to impact on the regulation of the genome. This is, in a nutshell, the biochemical signature strategy of ENCODE (Stamatoyannopoulos 2012).

The only reasonable approach was to make a list of PUTATIVE regulatory elements based on identifying transcription factor binding sites and their associated epiphenomena and then to winnow those sites by focusing on those that were conserved. ENCODE did not do this. In fact, they did not even identify their binding sites as POTENTIAL regulatory sites, as Ratti and Germain incorrectly imply.

In their defense of ENCODE, Ratti and Germain reference a paper by one of the ENCODE leaders, John Stamatoyannopoulos of the Department of Genome Sciences and Medicine at the University of Washington (Seattle, WA, USA). This was a disingenuous choice since that paper refutes the claim of Ratti and Germain and makes a mockery of their defense of ENCODE. Stamatoyannopoulos makes it clear in his paper that identifying regulatory sequences using biochemical markers trumps sequence conservation. These are not PUTATIVE sites, according to Stamatoyannopoulos; they are actual regulatory sites. He says that "it is not unreasonable to expect that 40% and perhaps more of the genome sequence encodes regulatory information—a number that would have been considered heretical at the outset of the ENCODE project." (Yes, it IS unreasonable.)

You have to read the entire Stamatoyannopoulos paper to appreciate that he never discusses the possibility that some of the so-called regulatory sites might be non-functional, and he never addresses the evidence that most of our genome is junk. With respect to the sequence conservation problem, he has a section titled "The evolution of conservation" where he points out that sequence conservation isn't that important. Here's one paragraph from that section.

On a practical level, the ability to measure function at scale has minimized the role of conservation as a discovery tool. But it has also exposed our ignorance concerning the evolutionary forces shaping the genome, particularly in noncoding regions. The fact that per-nucleotide evolutionary conservation, in combination with nucleotide-level DNA accessibility, can accurately trace a protein–DNA binding interface (Neph et al. 2012b) suggests that the operation of purifying selection is vastly more subtle and complexly structured than had been previously assumed. Moreover, nucleotide-level evolutionary conservation is by itself a poor predictor of functional regulatory variation (Maurano et al. 2012b). However, engrained habits of thought are difficult to escape, and highly conserved noncoding elements are still regularly conflated with regulatory elements (Lowe et al. 2011). Clearly, new models of evolutionary conservation are needed to explain the subtleties of regulatory DNA, and the vast trove of ENCODE data provides an unprecedented opportunity for novel and creative syntheses.

That's a ridiculous statement. What he is saying, in essence, is that the biochemical signatures are the most important way to define a true regulatory sequence and, since many of them aren't conserved, this is a problem for SE function, not CR function. It means, according to Stamatoyannopoulos, that you can no longer trust sequence conservation to be a reliable indicator of function. This is not a sophisticated defense of CR function, as Ratti and Germain imply. It's just plain stupid.

Ratti and Germain try to explain why the ENCODE results were challenged.

Reporting in 2012 the application of this strategy to 147 cell types (to various degrees), the consortium noted that 80.4% of the genome was found to be covered with at least one potential functional element. ENCODE was hailed in mass media with a recurrent theme, claiming that although it was long thought that most of the genome is junk DNA (i.e. DNA not benefiting its bearer), thanks to ENCODE we now know that most of it is functional. Criticisms of ENCODE’s claims by leading biologists in several academic articles shortly followed (Eddy 2012, 2013; Doolittle 2013; Graur et al 2013; Niu and Jiang 2013). The consortium was accused, among other things, of using the wrong notion of function, i.e. something between BR and biological activity, instead of SE. Another criticism, coming in particular from Doolittle (Doolittle 2013) was that ENCODE have wrongly conflated SE and BR, whereby the ‘mere existence’ “of a structure or the occurrence of a process or detectable interaction, is taken as adequate evidence for its being under selection” (pp. 5296–5297).

While not incorrect, this description misses the point. As I've said many times, the initial criticism of ENCODE occurred within days of the papers appearing in September 2012, and it did not focus on epistemology. It focused on ENCODE's misunderstanding of biochemistry and their failure to recognize that many transcripts and many binding sites were not functional. Those initial attacks also pointed out that ENCODE completely ignored all of the evidence for junk DNA. ENCODE was wrong about the science. It was only later, when the criticism appeared in the scientific literature, that the philosophy perspective entered the picture. I thought at the time that this was a major distraction and I have not changed my mind.

An important point to understand is that, as a physiological endeavor of the tradition just described, ENCODE was not about junk DNA—an essentially evolutionary concept. Its findings hardly have any bearing at all on the question of whether most of DNA is junk (Germain et al. 2014), and in fact the expression does not appear even a single time in the main publication (ENCODE Consortium 2012).

Bullsh*t! The ENCODE findings were deliberately slanted in order to refute the concept of junk DNA and demonstrate that the human genome is full of function.

It is really unfortunate that ENCODE’s PIs did not try right away to carefully shape the press-release of their publications in order to avoid misunderstandings around the scope of the project itself, and arguably even contributed to the misconstrual of the project as ‘re-writing evolutionary biology textbooks’.

More bullsh*t! I have carefully documented all the times that ENCODE PIs said that their results refuted junk DNA during the publicity campaign. There's absolutely no doubt that they were completely on board with the anti-junk publicity campaign [What did the ENCODE Consortium say in 2012?] [ENCODE Leader Says that 80% of Our Genome Is Functional].

What emerged from other ENCODE publications (for instance Kellis et al. 2014) aimed at de-escalating the controversy is that ‘biochemical activities’ are merely proxies, and that there is no necessary connection to functional ascriptions understood in an evolutionary sense. In other words, ENCODE ought to be interpreted as a means to understanding the inner workings of an organism, i.e. functional decomposition à la Cummins, and while conservation is an important guide to picking out important parts, it is not equivalent to it.

I don't understand this at all. A few days after the deluge of criticism, ENCODE researchers tried to play down their stupid errors, and so did some of the writers at Nature, but all of their excuses are meaningless. ENCODE thought they were talking about real function and they went out of their way to dismiss the conservation arguments. In spite of what Ratti and Germain say, there is no evidence that the ENCODE researchers were simply talking about "proxies" that may or may not indicate real function. They were all-in on their definition of function and they made no attempt to correct the massive publicity campaign announcing the death of junk DNA. In fact, they participated in it. [Anonymous Nature Editors Respond to ENCODE Criticism] [Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco] [The ENCODE Data Dump and the Responsibility of Scientists] [The ENCODE Data Dump and the Responsibility of Science Journalists]

Here's a publicity video prepared for Nature in order to promote the ENCODE results. It features Magdalena Skipper, who at the time was a Senior Editor at Nature, and Ewan Birney, the ENCODE leader. Check out what Skipper says at 2:28.

The striking overall result that the ENCODE project reports is that they can assign a function, a biochemical function, to 80% of the human genome. The reason why this is striking is because, not such a long time ago, we still considered that the vast proportion of the human genome was simply junk because we know that it's only 3% that encodes proteins.

Where did she get that idea if not from Ewan Birney? Watch Birney's performance to see if he challenges this interpretation or supports the concept that most of the human genome is involved in a vast network of complex controls.

Ratti and Germain continue ...

We want to stress that this succinct reinterpretation of the ENCODE controversy is by no means a defence of ENCODE itself, let alone of its usage of functions. Our aim is merely to point out that the entire controversy is based on flawed communication and a misunderstanding of the scope of the project.

Nope. Sorry. There was no "flawed communication." They intended to announce the death of junk DNA and that's exactly what they did, even though they didn't refer to junk in their main publication. There was no "misunderstanding of the scope of the project." It was meant to define all of the functional elements in the genome. After the publicity fiasco was exposed, the ENCODE researchers tried to pretend that their goal was merely to map all of the spurious binding sites in junk DNA and all of the junk RNA transcripts, but that excuse is laughable.


3 comments:

  1. I happen to have a great respect for the discipline of philosophy, at least properly practiced, as I think careful application of critical thinking and philosophical rigor is a bulwark against so much crankery and pseudoscience. So it is with considerable disappointment I see so much lack of clear thinking among philosophers of biology.

  2. We need t-shirts saying "90% of our genome is junk"

  3. A new nice review you may like: https://www.nature.com/articles/s41576-022-00514-4
