Friday, February 07, 2020

The Function Wars Part VI: The problem with selected effect function

The term "Function Wars" refers to the debate over the meaning of 'function,' especially in the context of junk DNA.[1] That debate intensified in 2012 after the ENCODE publicity campaign that tried to redefine function to mean anything they want as long as it refutes junk DNA. This is the sixth in a series of posts exploring the debate and why it's important, or not. Links to the other five posts can be found at the bottom of this post.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)
Much of the discussion seems like quibbling over semantics but I'm reminded of a similar debate over the mode of evolution: is it gradual or punctuated? As Gould pointed out in 1982, there's a serious issue underlying the debate—an issue that shouldn't get lost in bickering over the meaning of 'gradualistic.' The same warning applies here. It's important to determine how much of the human genome is junk and that requires an understanding of what we mean by junk DNA. However, it's easy to get distracted by focusing on the exact meaning of the word 'function' instead of looking at the big picture.

Sean Eddy has made the same point (Eddy, 2013).
Attention focused on the squabbling more than the substance, and probably led some to wonder whether the arguments were just quibbling over the semantics of the word ‘function’.

Trying to conceptualize the forces that act on genome evolution is not just a matter of semantics.
My opinion, for what it's worth, is that we can't possibly come up with a precise definition of 'function' that doesn't have some exceptions. That really shouldn't come as a big surprise because there are many definitions in biology that suffer from the same problem. Think about 'gene,' for example, or 'species.' Entire books have been written about the meaning of 'species' and still there's no definitive answer. Meanwhile, evolution is still a fact and gene flow between populations is still restricted.

I think we should adopt a practical 'working definition' making it clear that there are exceptions and ambiguities. We should avoid proposing a precise definition because such a definition is easy to falsify and that just gives ammunition to opponents of junk DNA. My working definition is from The Function Wars: Part I.
So, we can adopt a working definition of function and junk based on whether or not deleting the DNA in question affects the survivability of the organism or its descendants. (Keeping in mind that there are minor exceptions).
I was stimulated to write this post because I've been thinking about the problem quite a lot recently since I'm writing a book about junk DNA and one of the chapters is "The Function Wars." I've also had some stimulating discussions recently with Ford Doolittle, Stefan Linquist, and Alex Palazzo. In addition, I've been reviewing Dan Graur's chapter from a recent book titled Evolution of the Human Genome I (Graur, 2017). (Dan was kind enough to send me a copy of the chapter because the book is too expensive to buy: it costs the equivalent of 100 large coffees from Tim Hortons.)

Dan tries to sort out the various classes of DNA and he also attempts to define 'function' in a rigorous manner. I'll discuss his classification scheme in another post. For now, let's look at how he defines function. He's a big fan of the selected-effect (SE) definition of function, a definition that relies on whether a given region of DNA is under selection or not. One form of that argument is the way he defines function on page 22 ...
The main advantage of the selected-effect function definition is that it suggests a clear and conservative method for inferring function in a DNA sequence—only sequences that can be shown to be under selection can be claimed with any degree of confidence to be functional. From an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy it (Graur et al. 2013). All functional entities in the universe can be rendered nonfunctional by the ravages of time, entropy, mutation, and what have you. Unless a genomic functionality is actively protected by selection, it will accumulate deleterious mutations and will cease to be functional. The absurd alternative is to assume that function can be assessed independently of selection, i.e., that no deleterious mutation can ever occur in the region that is deemed to be functional.
I agree with those who emphasize selected-effect but you have to be careful because there may be sequences that are under selection but you don't want to count them as functional for the purposes of identifying junk DNA. Ford Doolittle, for example, distinguishes between selected-effect function at different levels (Doolittle et al., 2014: right figure) [see: Restarting the function wars (The Function Wars Part V)]. Furthermore, there are certainly DNA regions that are functional but can accumulate mutations at the neutral rate (e.g. spacer DNA). These examples illustrate the perils of a rigorous definition.

Dan knows this because a few pages later (page 24) he splits functional DNA into two categories: literal DNA and indifferent DNA. Literal DNA refers to genomic regions "whose selected-effect function is that for which it was selected and/or by which it is maintained." Indifferent DNA is defined like this ...
Indifferent DNA includes genomic segments that are functional and needed, but the order of nucleotides in their sequences is of little consequence. In other words, indifferent DNA refers to sequences whose main function is being there but whose exact sequence is not important. They serve as spacers, fillers, and protectors against frameshifts and may possess nucleotypic functions, such as determining nucleus size.
We know for a fact that such functional DNA exists, at least as spacers. We don't know if any of the bulk DNA hypotheses are correct but we can't rule them out. This is why I think it's better to define functional DNA as regions that can't be deleted without harming the organism/species. That's still a selected-effect definition.

But this entire function wars thing is quite troubling because it's a classic example of nitpicking. Every scientist who's involved in this debate has roughly the same understanding of what a selected effect (SE) function means and none of them are going to sacrifice their lives or reputations on some strict interpretation of function that has to apply to every part of the functional genome. For example, Ford Doolittle defined a selected effect (SE) function as ...
... the functions of a trait or feature are all and only those effects of its presence for which it was under positive natural selection in the past and for which it is under (at least) purifying selection now (Doolittle, 2013).
This definition clearly includes the idea that a trait may have arisen by natural selection but it does not restrict the definition to functions that were necessarily under strong natural selection for all of their history. Doolittle (and Graur) make this point very clearly in other publications where they point out that some functional regions of the genome may have arisen by chance; for example, by constructive neutral evolution (e.g. Doolittle et al., 2014; Graur, 2016).

Conversely, a gene or other stretch of DNA may appear to be conserved even though it is not functional. Pseudogenes are a classic example but one could argue that they can be dismissed because they are not undergoing purifying selection within a species at the present time. Another example might be a recent gene duplication common to two or more species where there just hasn't been enough time to accumulate mutations.

The gene for ABO blood types is an ambiguous example. The gene encodes the enzyme N-acetylgalactosaminyltransferase that's responsible for attaching sugar residues to the surface of red blood cells. People carrying defective copies of this gene have O-type blood and they appear to be perfectly healthy. In fact, there are many subpopulations that have lost the A and B alleles entirely [Is the high frequency of blood type O in native Americans due to random genetic drift?]. Is the ABO gene junk?

As I mentioned above, there are many scientists who point out that conservation (i.e. selected effect) doesn't always mean that a particular sequence is under purifying selection. There are well-known examples of spacer DNA, for example, where it is the size and position of a particular part of the genome that's important and not its sequence (Graur, 2016). In this case it's the size and position that are conserved and under active purifying selection but that's still a selected effect phenomenon. Spacer DNA cannot be deleted from the genome without serious consequences for the individual and/or species.

Don't forget the levels of selection. Doolittle and collaborators make a distinction between various levels where a function is applicable. For example, an integrated copy of bacteriophage λ will express the cI gene making λ repressor protein. That's a clear example of a functional gene and a functional protein but it does not apply to the E. coli host since the integrated bacteriophage can be inactivated or deleted without harming the bacterium. The λ gene is functional but only at the level of the bacteriophage according to this view of levels of selection. Similar arguments apply to active transposons in the human genome (Doolittle et al., 2014).

I want to make two points with this post.
  1. There is no strict definition of selected effect function that can be expressed in just a few words. Every knowledgeable scientist knows that evidence of purifying selection is powerful evidence of function but there are always exceptions. Knowledgeable scientists know that most definitions are just approximations to the truth, although one hopes that they are very good approximations. This is important because it often puts scientists in conflict with philosophers whose main goal in life is to come up with strict definitions. We'll see an example of this conflict in my next post.
  2. The selected effect definition of function, with all its ambiguities, is really just a statement of evidence for function. Some piece of DNA is functional BECAUSE it exhibits evidence of purifying selection. It's a well-established correlation that has stood the test of time and it's the best way we have of estimating function in the absence of other criteria. Furthermore, there are very few (perhaps no) examples of functional DNA that do NOT exhibit evidence of purifying selection so we can be confident that there are no other features that can be used to establish function on a global scale. (The main competitor is the causal role (CR) criterion.)
The Function Wars: Part I
The Function Wars: Part II
The Function Wars: Part III
The Function Wars: Part IV
Restarting the function wars (The Function Wars Part V)

Image Credit: The Battle of Trafalgar by J.M.W. Turner

1. The term was coined by my colleague, Alex Palazzo.

Doolittle, W. F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) 110:5294-5300. [doi: 10.1073/pnas.1221376110]

Doolittle, W.F., Brunet, T.D., Linquist, S., and Gregory, T.R. (2014) Distinguishing between “function” and “effect” in genome biology. Genome Biology and Evolution, 6:1234-1237. [doi: 10.1093/gbe/evu098]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Gould, S.J. (1982) Darwinism and the expansion of evolutionary theory. Science, 216:380-387. [doi: 10.1126/science.7041256]

Graur, D. (2016) Molecular and Genome Evolution: Sinauer Associates, Inc. p. 494


  1. I can think of another situational exception to the SE definition of function. I could for example imagine selection against strongly deleterious mutations (they could be mutations that result in transcription of downstream regions that turn out to be toxic or maybe even lethal) in an otherwise useless stretch of DNA.

    In that sense even bona fide junk-DNA would be under some level of purifying selection. Selection not to interfere with other cellular processes.

  2. Of course, and absent selection pressure to minimize genome size, the "junk" would remain. Then one day along comes a rarely encountered unsavory virus. The cell panics and transcribes wildly. Lo and behold, that piece of "junk" just happens to be complementary to a viral nucleic acid segment. The cell lives, as does its host. A marginal selective value has arisen. Just as a cyclist could either carry a puncture-repair kit (the svelte approach), or 100 spare tyres (the "junk" approach), so perhaps our cells do likewise.

    1. "The cell panics and transcribes wildly." Could you explain this part? :-)

  3. "The cell panics and transcribes wildly". This transcribes a piece of "junk" that just happens to be complementary to a vital nucleic acid segment, not just to a viral one, and ...

    1. Is Donald Forsdyke implying that there's some kind of panic response that causes a cell to start transcribing everything in sight whenever it detects a virus? Is there any evidence to support such a claim?

      Or is he implying that pervasive transcription is advantageous because on rare occasions one of the junk RNA transcripts might prove to be beneficial?

      If it's the second implication then wouldn't more and more junk DNA be an advantage? What's stopping the human genome from expanding to the size of a lungfish genome and what's to stop humans from selecting for more and more spurious promoters?

    2. I was trying to make the point that a piece of junk DNA that is transcribed could also have a deleterious effect. Donald Forsdyke seems to assume that having a fortuitous positive effect is much more likely than having a fortuitous negative effect. Why?


      A viral infection is a stress, and stress has long been known to provoke the transcription of repetitive elements (e.g. Alu; Liu et al. 1995), from which run-on transcription can extend into neighboring regions of “junk” DNA (Cristillo et al. 2001).

      Thus, we and others (e.g. Zabolotneva et al. 2010) have suggested an RNA-based "intracellular 'immune system'," where alarms begin ringing when an RNA of viral origin forms dsRNA with an 'RNA antibody' of host origin. Like us, Zabolotneva et al. suspect that this may explain the variable quantities of 'junk DNA' found in genomes:

      "Casual [random] combinations of nucleotides in ... the genome might create new DNA motifs that theoretically, after being transcribed, could be used by the host organism as a tool for recognition and targeting of intracellular pathogen transcripts. Novel transcribed [host] DNA motifs that would target the host genes [i.e. 'self'] would be eliminated from the genome [negative selection], whereas those that complementarily match the pathogen RNAs would be positively selected. Neutral motifs [yet to find a pathogen target but not interacting with 'self'] could be 'stored' in the genomes as ordinary non-coding DNA."

      Others envisage a “retroposon transcriptome” (Faulkner et al. 2009).

      Cristillo et al. (2001) Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load. J. Theor. Biol. 208, 475-491.

      Faulkner et al. (2009) The regulated retroposon transcriptome of mammalian cells. Nature Genetics 41, 563-571.

      Liu et al. (1995) Cell stress and translational inhibitors transiently increase the abundance of mammalian SINE transcripts. Nucleic Acids Res. 23, 1758-1765.

      Zabolotneva et al. (2010) How many antiviral small interfering RNAs may be encoded by the mammalian genome? Biology Direct 5, 62.

    4. Regarding Joe Felsenstein's point (Feb. 12), the evolutionary trade-offs issue is nicely dealt with by Bradde et al. in a forthcoming paper in the Proceedings of the National Academy of Sciences ("The size of the immune repertoire in bacteria").

  4. I very much appreciate your efforts to lead this fight. However, I fear that one half of the other side is so ignorant that they don't even realize that much of what they publish is irrelevant or even wrong. The other half has already shown during the ENCODE drama that they deliberately deny facts and would not even admit that there is justified criticism. Still, I really hope your book will help.