Wednesday, June 22, 2022

The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect

How much of the human genome is functional? This is a problem that will be solved by biochemists, not epistemologists.

What is junk DNA? What is functional DNA? Defining your terms is a key part of any scientific controversy because you can't have a debate if you can't agree on what you are debating. We've been debating the prevalence of junk DNA for more than 50 years and much of that debate has been (deliberately?) muddled by one side or the other in order to score points. For example, how many times have you heard the ridiculous claim that all noncoding DNA was supposed to be junk DNA? And how many times have you heard that all transcripts must have a function merely because they exist?

Most of the really serious scientific debate has been focused on a widely accepted (among knowledgeable scientists) definition of functional DNA as DNA that's currently being maintained in the genome by selection. In other words, it is subject to purifying selection. The definition that I quote in my book is the one used by Dan Graur in the latest edition (2016) of his textbook Molecular and Genome Evolution.

Functional DNA refers to any segment in the genome whose selected-effect function is that for which it was selected and/or by which it is maintained. Most functional sequences in the genome are maintained by purifying selection.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)

The important point here is that functional sequences are currently maintained by purifying selection and it doesn't matter whether they arose historically by natural selection or by some nonadaptive process. [The Function Wars Part VI: The problem with selected effect function] Graur explains this in more detail later on in his book (p. 493).

From an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy it. All functional entities in the universe can be rendered nonfunctional by the ravages of time, entropy, mutation, and what have you. Unless a genomic functionality is actively protected by selection, it will accumulate deleterious mutations and will cease to be functional.

A more colloquial way of putting this is,

Functional DNA is any stretch of DNA that cannot be deleted from the genome without affecting the fitness of the individual.

It follows that junk DNA is any stretch of DNA that CAN be deleted without affecting the fitness of the individual.

Dan Graur says that his definition is a "selected-effect" definition of function but that's not strictly true; at least it's not true if you listen to philosophers (who invented the term in its current usage). They tend to assign selected effect functions exclusively to traits that are known to arise historically by natural selection. That's why I say in my book that we should avoid using the term "selected effect" because it just leads to nitpicking. So let's just say that the current definition is the best operational definition of molecular function and leave it at that. If we have to give it a cute name, we can call it the maintenance function (MF) (see below).

Unfortunately, philosophers aren't very happy with "operational definitions."[1] They want to carry on the function wars for as long as they possibly can. There are many molecular biologists who feel the same way although their motives might be different. Two new papers have recently appeared that illustrate the ongoing function wars. The first one is from Stefan Linquist, a philosophy professor at the University of Guelph (Guelph, ON, Canada). The link is to a preprint of his article that is due to appear in Biology and Philosophy.

Linquist, S. (2022) Causal-role myopia and the functional investigation of junk DNA. [PDF]

The distinction between causal role (CR) and selected effect (SE) functions is typically framed in terms of their respective explanatory roles. Interestingly, most of the controversy over function concepts in genomics surrounds their use in component-driven functional investigation (an activity that precedes explanation). This method begins with the identification of some DNA sequence as a likely candidate for biologically interesting CR-functions. It proceeds to search for the system-level effects of those entities, usually at the organismal level. I argue that this investigative process encounters a problem reminiscent of one that Gould and Lewontin (1979) associated with the adaptationist program. Just as their stereotypical adaptationist was engaged in the myopic pursuit of one selectionist hypothesis after another, so can the investigation of CR functions in genomics lead to an unending series of putative organism-level functions for junk DNA. This is an acute problem for genomics because (1) the genome is littered with transposable elements (TEs) and their deactivated descendants which (2) often masquerade as CR-functional components and (3) it is experimentally onerous to determine whether they lack such a function. I further argue that selectionist reasoning about TE / host coevolutionary dynamics can greatly streamline the search for biologically interesting CR functions. Importantly, this reasoning offers epistemic benefits without placing onerous demands on the investigator, because selectionist hypotheses need not be well confirmed to be illuminating in this context. Indeed, an understanding of the coevolutionary dynamics that structure eucaryotic genomes provides a corrective to the otherwise popular idea that most of our DNA is somehow CR (and possibly SE) functional at the level of the organism.

In order to understand this paper, you have to understand a bit of context. If we step back and look at the big picture, we can ask whether most of the human genome is junk. The evidence pointing to abundant junk DNA is very solid and it includes the fact that most of our genome is evolving neutrally. That's evidence that most of our genome is not being maintained by purifying selection[2] and this explanation fits nicely with the idea that a fully functional genome would subject our species to an intolerable mutation load. Junk DNA also explains a great deal of the data on variable genome sizes in different species (C-value paradox). Junk DNA is also consistent with the fact that a large fraction of our genome consists of broken genes (pseudogenes) and broken fragments of viruses and transposons. And the fact that only 10% of our genome is conserved is difficult to explain unless most of it is junk.
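The mutation load argument can be made concrete with a little arithmetic. The Python sketch below is an illustration only. It assumes the classical mutation-selection balance approximation, in which mean fitness is roughly e^(-U) when U new deleterious mutations arise per genome per generation and their effects combine multiplicatively. All of the inputs (100 new mutations per generation, 4% of mutations in functional DNA being deleterious) are assumed round numbers rather than values from this post, and the function names are hypothetical.

```python
import math


def deleterious_mutations_per_generation(new_mutations, functional_fraction,
                                         deleterious_fraction):
    """U: expected new deleterious mutations per genome per generation.

    Assumes mutations land uniformly across the genome, so the share hitting
    functional DNA equals the functional fraction of the genome.
    """
    return new_mutations * functional_fraction * deleterious_fraction


def mean_fitness(u):
    """Mean fitness relative to a mutation-free genome, approximated as exp(-U)."""
    return math.exp(-u)


def offspring_needed_per_couple(u):
    """Offspring each couple must produce, on average, to keep population size constant."""
    return 2.0 / mean_fitness(u)


# Assumed illustrative inputs (not taken from the post).
NEW_MUTATIONS = 100          # new mutations per genome per generation
DELETERIOUS_FRACTION = 0.04  # share of mutations in functional DNA that are harmful

for functional_fraction in (0.1, 0.5, 1.0):
    u = deleterious_mutations_per_generation(
        NEW_MUTATIONS, functional_fraction, DELETERIOUS_FRACTION)
    print(f"functional fraction {functional_fraction:.0%}: "
          f"U = {u:.2f}, offspring needed per couple = "
          f"{offspring_needed_per_couple(u):.1f}")
```

Because the required number of offspring grows exponentially with U, it rises steeply as the assumed functional fraction approaches 100%. That is the sense in which a fully functional genome implies an intolerable load under these assumptions.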

The concept of abundant junk DNA has enormous explanatory power that no competing concept possesses. That doesn't mean that it's correct but it's a pretty good clue.

I don't think that Linquist (and other philosophers) are discussing this big picture view. They are mostly talking about individual traits or segments of DNA where scientists are trying to decide whether these particular features have a function or not and whether it is a causal role (CR) function or a selected effect (SE) function. They are talking about trees and not about forests. This is interesting but it isn't going to affect the big picture because 90% of our genome is still junk whether or not some small individual segment has an acceptable function. The evidence for junk DNA isn't going away over some quibble about individual sequences.

There's one other issue that's been troubling me for some time. Let's take the example of a stretch of DNA that binds a transcription factor. We've known for half a century that DNA binding proteins will bind to any stretch of DNA that's a close match to their ideal binding site and that because these sites are short there will be many more of these spurious binding sites in a genome than the true biologically relevant sites that control gene expression. If you claim that any transcription factor binding site has a CR function then that's not a philosophical issue with epistemic consequences, it's just stupid. This is a biochemistry issue, not a philosophical one. There's no way that any knowledgeable biochemist could ever assign a function to a sequence that's highly likely to be a spurious binding site and you don't get to use the excuse that this could be some sort of sophisticated causal role (CR) function. That's a cop-out.
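The claim that short binding sites must occur many times by chance is easy to check with a back-of-the-envelope calculation. The Python sketch below estimates how often one fixed sequence of a given length is expected to appear in a random genome. It assumes equal base frequencies, independent positions, exact matches only, and an approximate genome size of 3.1 billion base pairs. These are all simplifying assumptions for illustration, and the function name is hypothetical. Real transcription factors also tolerate mismatches, which would raise the counts further.

```python
def expected_chance_sites(genome_length_bp, site_length_bp, both_strands=True):
    """Expected number of exact matches to one fixed site in random sequence.

    Assumes each base is A, C, G, or T with probability 1/4, independently.
    Counting both strands doubles the estimate; this overcounts for
    palindromic sites, which read the same on both strands.
    """
    positions = genome_length_bp - site_length_bp + 1
    if positions <= 0:
        return 0.0
    match_probability = 0.25 ** site_length_bp
    strands = 2 if both_strands else 1
    return strands * positions * match_probability


GENOME_LENGTH_BP = 3_100_000_000  # assumed approximate haploid human genome size

for site_length in (6, 8, 10, 12):
    count = expected_chance_sites(GENOME_LENGTH_BP, site_length)
    print(f"{site_length}-bp site: about {count:,.0f} chance matches expected")
```

Under these assumptions an 8-bp site is expected to turn up roughly 95,000 times by chance alone, far more than the number of sites likely to be involved in regulating genes.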

The same reasoning applies to transcripts. We know that stretches of DNA can be transcribed by accident to produce junk RNA so, if you assign a function to every transcript simply because it exists, then you are just expressing your ignorance of biochemistry. You don't get to use CR function when you should know full well that this isn't definitive evidence of real biologically relevant function. This is science, not philosophy.

Splice variants are another example. We know that splicing errors are frequent and incorrectly spliced transcripts can be easily detected. If you assume that all splice variants are biologically meaningful examples of alternative splicing then you are just wrong. Don't look to philosophers to excuse your scientific ignorance by defending some obscure meaning of CR function. And philosophers should not fall into the trap of assuming that any observable feature of a genome automatically qualifies as a legitimate CR function just because some ignorant scientist says so.

That doesn't mean that there aren't real issues when it comes to determining function at the level of individual sequences. I'm just trying to point out that knowing something about biochemistry means that the definition of a significant CR function has to be informed by that knowledge. This was the problem with the ENCODE publicity campaign. The ENCODE leaders were not just using "biochemical function" as some weak CR version of function that wasn't meant to be taken seriously. We know for a fact that many of the ENCODE leaders claimed that their results refuted junk DNA so they certainly believed that they were talking about real biologically relevant function. It wasn't philosophers who immediately recognized ENCODE's mistake on September 6, 2012—it was biochemists and molecular biologists who instantly recognized that their claim was stupid. [What did the ENCODE Consortium say in 2012?]

Now let's look at Stefan's take on the issue. His first point is correct and most informed scientists are aware of it. Junk DNA opponents are always looking for some excuse to assign function and if one attempt fails they will switch to another, just like the strategies employed by adaptationists with their just-so stories. In the case of functional DNA vs junk, the behavior is motivated by assuming that a given stretch of DNA must have a function and you just have to find it. This is similar to the behavior of adaptationists who assume that a trait must have arisen by natural selection (as opposed to being neutral) and all you have to do is find the correct adaptationist explanation. Both groups are using the wrong null hypothesis.

You would think that the best way to uncover real biological function would be to begin with conservation. If the given stretch of DNA is conserved in different species then it's very likely to be functional and you can go on to explore the various possible CR-type functions. But that's not how philosophers see it. Here's the problem according to Linquist.

...very different kinds of evidence are required to establish the truth of a CR as opposed to an SE functional explanation. CR explanations are typically supported with direct experimental evidence showing that the modification or removal of some component affects the relevant system-level capacities. SE functional explanations, in contrast, require evidence about historical processes that are often not amenable to direct experiment and involve more inferential arguments.

It's difficult to unpack this unless you're familiar with the philosophy perspective on SE function. It requires that you present evidence showing that the trait/feature evolved by natural selection (i.e., it's an adaptation). This is why it's referred to as a historical process.

Contrast this with the definition that I explained above. The best operational definition of function (IMHO) is that the sequence can't be deleted without affecting the fitness of the individual (MF). Thus, if Linquist's CR definition of function includes the fact that removing it causes a problem then that implies that it's under selection and it's functional by my definition. I don't care if historically it arose by natural selection or by some nonadaptive process. This illustrates the difference between an SE function in philosophy and what we use in biochemistry/molecular biology.

Stefan then explains why the philosophical version of SE function appears to cause a problem and why the problem is exaggerated.

However, this reasonable picture can easily lead to a more misguided doctrine: that selectionist reasoning has no place in the scientific investigation (as opposed to the explanation) of CR functions (e.g. Cummins 1976, Amundson & Lauder 1994, Griffiths 2006, Brandon 2013, Craver 2013). Because reasoning about selective history is often associated with SE functional explanation, other epistemic benefits of this practice, especially pertaining to the investigation of proximate mechanisms, are easily overlooked. The aim of this paper is to show that even when a researcher is not interested in explaining the evolutionary origin or selective maintenance of some trait, informed speculation about its selective history is sometimes useful (perhaps essential) for investigating its biologically interesting causal roles.

Stefan notes that with sufficient imagination you could easily assign some sort of CR function to every bit of DNA sequence in a genome. This is not helpful so researchers need to invoke some constraints on the CR definition. One of them is to prioritize selectionist reasoning in the form of "maintenance function" (= purifying selection) as described in a paper he published with Ford Doolittle and Alex Palazzo a few years ago (Linquist et al. 2020). This means that selectionist reasoning trumps pure CR reasoning, which is the position that most of us prefer.

But Linquist is also interested in showing that selectionist reasoning is important in other situations that constrain CR reasoning. He's not sure what these other constraints might be ...

The one caveat I insist upon is that, in order to do justice to scientific debates over junk DNA, proponents of CR function must identify some constraint on either the kinds of systems or the sorts of capacities that could be considered relevant to a CR-functional investigation. Aside of selective origin or maintenance, I don’t know what those criteria are. I therefore use the phrase “biologically interesting” as a placeholder for the relevant notion, whatever it might be. Hence, if genomics researchers are to traffic in CR functions and avoid the permissiveness problem, then they must restrict their focus to biologically interesting systems. My contention is that selectionist reasoning plays an important epistemic role in the investigation of such functions.

The "biologically interesting" criteria that he's referring to include fundamental biochemical properties of molecules and processes as I described above. That's the knowledge that you must bring to the table when hypothesizing genuine CR-type functions.

Stefan argues that this knowledge also includes some form of selective reasoning such as the evolutionary history of organisms. I understand his point. You don't argue for CR functions in a vacuum because your background knowledge includes an understanding of the history of life. I disagree that the term "selectionist reasoning" is the best way to describe this background knowledge because that sounds too much like adaptationist reasoning. I prefer to call it knowledge of evolution and I argue that this means knowledge of modern population genetics. If you don't understand evolution then your CR hypotheses are likely to be meaningless.

The paper gets a little complicated because the "selectionist reasoning" criterion is also applied to understanding that not all transcripts are functional. I refer to this as fundamental biochemistry facts, not selectionist reasoning. This is the most important difference between my view of a legitimate function hypothesis and Stefan's but it's not very significant. We both agree that ENCODE researchers over-reached in claiming that junk DNA was dead.

Unless a noncoding RNA transcript has some distinctive properties (e.g. its sequence is highly conserved over evolutionary time), it is unreasonable to regard that element as a likely component in some organism-level functional system (Palazzo and Lee 2015).

Therefore, it is quite surprising that ENCODE chose transcription as a proxy for CR functionality. What were they thinking?

Griffiths Paradox

Now we dive deep into the murky waters of philosophy. Paul Griffiths has written about the distinction between CR function and SE function from the perspective of the traditional philosophical definition of an SE function; namely, one that has arisen by adaptation. Paul and I discussed this view in an extended series of emails over the winter and I think I have a pretty good idea of what he means. He objects to the use of "selected-effect" for functions that may have arisen by nonadaptive means because that's not the original sense of the term. Furthermore, Paul objects to using historical reasoning in order to identify legitimate CR functions and he illustrates the problem by pointing out a logical paradox. (I'm quoting from the Linquist paper.)

P1) If considerations about selective history are necessary for investigating the CR functions of components in contemporary organisms, then it must be possible to accurately identify the selective functions of those components in their ancestors.

P2) However, “ascriptions of selected function are generated by causal analysis of the capacities of ancestral organisms to survive and reproduce in ancestral environments” (Griffiths 2009, p 18). In other words, in order to determine whether some component has a SE function, one must first establish that it has a certain CR function.

P3) But according to P1, establishing that some component has a CR function requires accurately identifying the selective functions of those components in some (further back) ancestor. This gives rise to an infinite regress.

This is where my eyes start to glaze over and I lose interest. Stefan tries to challenge the Griffiths Paradox but I had a great deal of difficulty following the argument. It all seems like a waste of time since using MF as a clue to molecular function (instead of SE) makes the paradox irrelevant.[3] Showing that a given sequence is currently under selection is the first step to discovering function; the second step is speculation about what that function might be based on a solid understanding of biochemistry. Those speculations (hypotheses) must be tested experimentally before concluding that you have discovered the real function.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. I'm pretty sure that most philosophers will want to debate whether my preferred definition of function even qualifies as an operational definition. :-)

2. I'm aware of the fact that some functional DNA is spacer DNA where the actual nucleotide sequence is irrelevant but the bulk is maintained by purifying selection.

3. Other kinds of functions (traits) are more of a problem; for example, reasoning about the possible adaptive role of zebra stripes or the two horns on African rhinos.

Linquist, S., Doolittle, W.F. and Palazzo, A.F. (2020) Getting clear about the F-word in genomics. PLOS Genetics 16:e1008702. [doi: 10.1371/journal.pgen.1008702]

1 comment:

  1. Should probably set up an account, but maybe this will work. Given the bare bones definition: “Functional DNA is any stretch DNA that cannot be deleted from the genome without affecting the fitness of the individual.” …how does redundancy fit in? My long ago readings of knockout studies remind me of issues with buffering or compensation. My poor memory is that duplication and divergence could result in some remnant functional overlap. But even slight effects could have fitness consequences. Divergence might not be complete yet taking over for a deleted gene might fall somewhat short. I am asking out of ignorance! But redundancy??? I posted similar notions on t.o. https://groups.google.com/g/talk.origins/c/t3-4ggROvs8

    Also when will your book be available!