Tuesday, July 01, 2014

The Function Wars: Part II

This is Part II of several "Function Wars" [1] posts. The first one is on Quibbling about the meaning of the word "function" [The Function Wars: Part I].

The ENCODE legacy

I addressed the meaning of "function" in Part I. It is apparent that philosophers and scientists are a long way from agreeing on an acceptable definition. There has been a mini-explosion of papers on this topic in the past few years, stimulated by the ENCODE Consortium publicity campaign where the ENCODE leaders clearly picked a silly definition of "function" in order to attract attention.

Unfortunately, the responses to this mistake have not clarified the issue at all. Indeed, some philosophers have even defended the ENCODE Consortium definition (Germain et al., 2014). Others have opposed the ENCODE definition but come under attack from other scientists and philosophers for using the wrong definition (see Elliott et al., 2014). The net effect has been to lend credence to the ENCODE Consortium’s definition, if only because it becomes one of many viable alternatives.

Ford Doolittle anticipated this debate last year (Doolittle, 2013) when he wrote,
In the end, of course, there is no experimentally ascertainable truth of these definitional matters other than the truth that many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be. However, that the debate is in the end about the meaning of words does not mean that there are not crucial differences in our understanding of the evolutionary process hidden beneath the rhetoric.
My position on the ENCODE publicity campaign is that they did not offer us a definition of function at all. Yes, they used the word "function" but they were completely wrong to use that word. What they were describing is sites that had some property, or exhibited some phenomenon. These might eventually turn out to be functional but all they were describing is data, not conclusions.

Doolittle et al. (2014) refer to these phenomena or properties as "effects" and the paper is devoted to Distinguishing between "Function" and "Effect" in Genome Biology. The ENCODE Consortium was referring to "effects" and not functions. That should be the end of the story, in my opinion, but, unfortunately, there's some confusion among philosophers over the causal-role definition of function and it may encompass the "effects" that the ENCODE Consortium is talking about.

A working definition of function

I don’t think there’s much point in quibbling about the exact meaning of the word "function" and I don’t think that philosophers are going to make a substantive contribution to the debate other than by pointing out that no definition is entirely satisfactory. However, we do need some sort of definition even if it’s only a "working definition" with recognizable flaws and exceptions.

Personally, I think of a given stretch of DNA as having a "function" if deleting it from the genome affects the survivability of the organism or its progeny. Joe Felsenstein points out in the comments to Part I that this has to be seen in a long-term evolutionary context and not just the immediate survival of the organism and the next generation. That’s certainly the sense in which I think about "function" even if it’s not captured in the working definition. There are lots of other exceptions and quibbles about my preferred working definition. If anyone can offer something better then it can be changed.

There have been four recent papers on the Function Wars (Kellis et al., 2014; Germain et al., 2014; Elliott et al., 2014; Doolittle et al., 2014). None of them have proposed a definition of "function" that we can examine. The closest was Doolittle et al., who propose that function should be tied to selective history (selected-effect or SE functionality). Doolittle offered a more precise definition in a paper he wrote on his own last year (Doolittle, 2013).
... the functions of a trait or feature are all and only those effects of its presence for which it was under positive natural selection in the (recent) past and for which it is under (at least) purifying selection now.
This is a selected-effect definition of function. It's a pretty good definition but it's an historical and not an empirical definition. That distinction isn't terribly important because no definition is rigorous enough to withstand close scrutiny but I note, for the record, that newly evolved functional genes would be excluded by Doolittle's strict definition.

There are two ways to tie function to natural selection: with sequence conservation or without it. It’s possible to have selection for bulk DNA or spacers without regard to sequence and any working definition should make clear which kind of selected-effect function is being proposed. Doolittle's definition doesn't rule out selection in the absence of sequence conservation.

The other problem is that there appear to be legitimate examples of sequence conservation that are NOT tied to function in the sense of the junk DNA debate so any definition must come with qualifiers so that readers don’t assume that it’s exclusive. In other words, the definition has to be a pragmatic definition and not a philosophically rigorous one.

Apparently, the selected-effect definition offered by Doolittle differs substantively from the working definition I've been using. Here's what he says about that in the 2013 paper ...
Another way to attribute function is through experimental ablation: whatever organism-level effect E does not occur after deleting or blocking the expression of a region R of DNA is taken to be the latter’s function. This attribution is close to the everyday understanding of function, as in the function of the carburetor is to oxygenate gasoline. The approach embodies what philosophers would call a causal role (CR) definition of function and supposedly eschews evolutionary or historical justifications. Much biological research into function is done this way, but I think that most biologists consider that experimental ablation indirectly points to SE. They believe that effect E could, under suitable conditions, be shown to have contributed to the past fitness of organisms and most importantly, that R exists as it does because of E.
That sounds okay to me. I don't really care if it's called a causal-role definition or a selected-effect definition as long as it works.

Is junk equivalent to nonfunction?

What about the definition of "junk"? The way I see it, DNA is either functional or it is junk so defining "function" is equivalent to defining "junk." Others see it differently, I think. For them, there’s either a third, unspecified, alternative, or a continuum with a fuzzy boundary. We can discuss this.

The clearest statement that I could find offering a contrary opinion comes from Ryan Gregory in his book on The Evolution of the Genome (Gregory, 2005). He says ...
Not only is ‘junk DNA’ an inappropriate moniker for noncoding DNA in general because of the minority status of pseudogenes within genome sequences, but it also has the unfortunate consequence of instilling a strong a priori assumption of nonfunction. As Zuckerkandl and Hennig (1995) pointed out, ‘given a sufficient lack of comprehension, anything (and that includes a quartet of Mozart) can be declared to be ‘junk.’ Indeed, it is becoming increasingly clear that some noncoding sequences play important regulatory or structural roles.
That statement needs a bit of unpacking in order to get at the true meaning. First, it was written when Gregory was attempting to restrict the definition of "junk DNA" to pseudogenes but that’s no longer his position so we can ignore that part. Second, it criticizes the idea that "junk" is equivalent to "nonfunction" but this is presumably on the grounds that what is called junk might actually have a function. Third, it invokes the idea that "junk" is being used as a synonym for "lack of comprehension" and this is inappropriate.

My position is that the term "junk DNA" is, indeed, a synonym for "nonfunctional DNA." That’s pretty much the working definition I prefer. Like most definitions, it is a form of a priori assumption. That’s not a weakness, it’s a strength. Furthermore, I reject the idea that I, and others, are using a working definition of "junk DNA" as a reflection of our ignorance of the field.

Before moving on to a more specific example, let’s look at the criticism raised by Zuckerkandl and Hennig (1995) in the paper quoted by Ryan above. They say in their opening sentences ...
Given a sufficient lack of comprehension, anything (and that includes a quartet of Mozart) can be declared to be junk. The junk DNA concept has exercized such a hold over a large part of the community of molecular biologists that it appears worth while to reiterate a point made five years ago: heterochromatin is, in fact, a collector’s item.
They go on to describe some functions of heterochromatic regions of the genome. But if these regions really are functional then they are, by definition, not junk. The question before us is whether a large proportion of complex eukaryotic genomes are nothing but junk DNA and the evidence for that is very solid. If, from time to time, some new functions are discovered in that part of the genome, that does not mean that the entire concept has been overthrown and there is no such thing as "junk DNA."

Zuckerkandl believes that most of the genome has a function so he’s not a big fan of junk DNA. But he doesn’t make his case by implying that all junk proponents are ignorant (lack comprehension) and comparing us to someone who would think that a Mozart quartet is junk. This is not a semantic argument over inappropriate meanings of the word "junk." It’s a scientific dispute and the case for, and against, junk DNA has to be resolved by data.

Although the most recent papers on the Function Wars don't offer a definition of "junk," we do have a definition from Ford Doolittle's 2013 paper. He says,
... junk DNA—here specifically understood as DNA that does not encode information promoting the survival and reproduction of the organisms that bear it— ...
That looks like the opposite of his definition of function. I think this is the consensus view these days: junk DNA is DNA that has no function.

Are active transposons junk, or not?

Let’s look at a specific example to see how we can define "junk." The Elliott et al. (2014) paper gives us a nice example to debate. The authors address the question of "selfish DNA" (transposons). It relates to whether the papers by Doolittle and Sapienza (1980) and Orgel and Crick (1980) were arguments FOR junk DNA or AGAINST it. If our genome were actually full of active transposons that were acting selfishly as parasites, then surely this is a "function" and it would be wrong to say that our genome was full of junk. In that sense, the 1980 papers can be seen as arguments AGAINST the idea of junk DNA. [2]

Fortunately, that’s not the case. We now know that at least half of our genome consists of defective transposons (pseudogenes) and fragments of transposons. That’s junk by any definition unless it can be shown to have a secondary function. I think this is what Elliott et al. would say but, unfortunately, they didn’t say it, so I’m not sure. Thus, it turns out that transposons can be used to explain the origins of junk DNA because the fate of most transposons is death, turning them into pseudogenes.

Active transposons make up only a tiny fraction (<0.1%) of the genome so deciding whether they are junk or functional, or something else, isn’t going to affect the big picture. What it does is help to clarify the discussion. That was one of the goals of the Elliott et al. paper. I don’t think they succeeded. Here’s how they describe the problem ...

As described in this article, ascribing functions to specific components of the genome is uniquely challenging when the sequences involved are transposable elements. Their capacity for autonomous replication creates several major complications that confound the use of functional assessments typically implemented in studies of genes or regulatory regions
Elliott et al. propose to (partially) solve this problem by distinguishing between different levels of function. This is the same approach used in the Doolittle et al. (2014) paper and it’s described much better there so I’ll quote Doolittle et al. (Stefan Linquist and Ryan Gregory are authors on both papers).
... the trait or its effects could indeed be a product of natural selection, but at a level of organization lower (intragenomic) or higher (population or species) than the usual level of evolutionary explanation, namely organisms and their fitness-determining genes. No one would consider the induction and replication of prophages to be the evolutionary "function" of bacterial cells; instead, it is well understood that there is selection at the level of the viruses themselves as well as among their bacterial hosts, so this would be a function of the prophages, not their hosts. Likewise, it would be odd to consider the harboring of nonviral retroelements to be a function of the human genome. These and other transposable elements are indeed products of selection, but at the intragenomic level rather than the organismal level, at least initially. Similarly, the wide prevalence (though probably not the origin) of sexual reproduction might best be explained by reference to selection above the organism level (i.e., among lineages). At every level at which selection might be said to operate, we imagine that the CR/SE distinction can be applied. Strictly speaking, some traits that are nonfunctional at the organism level might possess intragenomic or supra-organismal selected effects. Since the usual focus of functional discourse is on organisms, features selected positively or negatively at higher or lower levels but neutral (or negative) for organisms are considered to have only causal role functions for the purposes of figure 1.
Their Figure 1 is shown on the right. If I understand these papers correctly, it leads to the following conclusions.

The genome can be divided into three components:
  1. Nonfunctional DNA by any definition: This is presumably equivalent to junk.
  2. True functional DNA: These are regions of DNA that have a function at the organism level and they would count as functional by any reasonable definition.
  3. Functional DNA at some other level: This would include transposons and other forms of selfish or parasitic DNA but it also might include "higher level" functions that are only manifest at the population level.
I've created a diagram to illustrate this and highlight the problem I'm concerned about.

The question I’m proposing to discuss is whether the functional DNA at other levels qualifies as junk DNA or something else. Specifically, are active transposons junk DNA? This is the question I was expecting to see an answer to in the Elliott et al. paper but the authors tell me that I misunderstood their paper. Tyler Elliott (first author on the paper) says he doesn’t like using the word “junk” for active TEs because there are multiple levels of function. Ryan Gregory, third author on the paper, says on his Facebook page that active TEs are "nonfunctional" at the organismal level. He declined to say that this was the same as "junk" so I don't know what his position is on that score.

Here's the exchange on Facebook,
T. Ryan Gregory Sorry, Larry -- you know I appreciate your blog posts, but it's clear that you didn't understand the paper(s) you criticize. As to your question: most active TEs are probably non-functional at the organism level. Some may be functional, and some may simply have beneficial side effects for the organism. This is all laid out in detail in the article.

Laurence A. Moran That's exactly how I interpreted your paper. The only part that was missing was the part where you said that active transposon sequences are junk DNA. Is that what you believe?

T. Ryan Gregory I just answered that question. Help me to understand what you're not following in my response or the paper.

Laurence A. Moran Are the words "non-functional" and "junk" synonyms? If so, then you answered my question. You believe that active transposons have a function at one level, but they are junk at another level. I disagree with you and that's what I wrote in my "muddled" post. I think it's extraordinarily muddled to say that a sequence that is transcribed to produce transposase or reverse transcriptase is "nonfunctional" or "junk" at any level. But the main point of my post is that it's muddled and unproductive to even have this metadiscussion where we quibble about the meaning of the word "function."

T. Ryan Gregory As I said, I really don't think you understood the papers or the arguments. Or perhaps you truly are not interested in dealing with the implications of different concepts of "function" (a bait and switch around which was how ENCODE made their hype campaign) and (apparently) you don't see multi-level selection as relevant, in which case we're really far apart on this point. In either scenario, it doesn't seem useful to go back and forth on our respective blogs. I've published what I think about the topics already.
If active transposons are put in the junk DNA category then these would be examples of DNA with “functions” at some level but junk at another level. Conceptually that’s not much different than the ENCODE proposal (I think). They would also be examples of genes that encode functional enzymes but are still “junk.” In other words, junk coding DNA. I’m not comfortable with that.

If active transposons are neither junk nor functional (at the organismal level) then you can’t define “junk” as just DNA that doesn’t have a function because there would now be a third category of DNA that doesn’t have a function at one level but isn’t junk at that level. In this case, the third category would be “selfish DNA.” I assume there are additional categories such as the population-level category.

I prefer to avoid the discussion about different kinds of function and just say that active transposons have a function and, therefore, they are not junk. When describing the reasons why certain parts of the genome exist, it doesn’t really matter to me whether they were selected for selfish reasons or for survival of the species. The problem is that this conflicts with my working definition of function since these active transposons could be deleted without harming the organism. Transposons also appear to be excluded by Doolittle's selected-effect definition but I'm not certain about that.

I don’t see a way out of this conundrum and I don’t think the paper by Elliott et al. was very helpful in this regard. As a matter of fact, I don't believe that it's possible to resolve these Function Wars by publishing more papers on the meaning of "function" or "junk."

Can anyone help? Do you have a philosophically sound definition of "function" or "junk" that will "clarify" the discussion? Do you think that junk DNA is any stretch of DNA that doesn't have a function?

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. I thank Alex Palazzo for coming up with the term "function wars."

2. I recently re-read the entire collection of Nature papers from 1980 (Doolittle and Sapienza, 1980; Orgel and Crick, 1980; Cavalier-Smith, 1980; Dover, 1980; Dover and Doolittle, 1980; Orgel, Crick and Sapienza, 1980; Jain, 1980). It's amazing how much these papers still remain relevant today. In fact, I venture the opinion that none of the recent Function Wars papers adds anything substantive to the debate from 34 years ago!

Cavalier-Smith, T. (1980) How selfish is DNA? Nature 285, 617-618. [doi: 10.1038/285617a0]

Doolittle, W. F. and Sapienza, C. (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601-603. [PDF]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Doolittle, W.F., Brunet, T.D., Linquist, S., and Gregory, T.R. (2014) Distinguishing between “function” and “effect” in genome biology. Genome Biology and Evolution 6, 1234-1237. [doi: 10.1093/gbe/evu098]

Dover, G. (1980) Ignorant DNA? Nature 285, 618-619.

Dover, G., and Doolittle, W.F. (1980) Modes of genome evolution. Nature 288, 646-647.

Elliott, T. A., Linquist, S. and Gregory, T. R. (2014) Conceptual and empirical challenges of ascribing functions to transposable elements. The American Naturalist 184, 14-24. [doi: 10.1086/676588]

Germain, P.-L., Ratti, E. and Boem, F. (2014) Junk or functional DNA? ENCODE and the function controversy. Biology & Philosophy, 1-25. (published online March 21, 2014) [doi: 10.1007/s10539-014-9441-3]

Gregory, T. R. (2005) Genome Size Evolution in Animals. In The Evolution of the Genome (Gregory, T. R., ed.), pp. 3-87, Elsevier Academic Press, New York, Oxford etc.

Jain, H.K. (1980) Incidental DNA. Nature 288, 647-648.

Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E. and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences 111, 6131-6138. [doi: 10.1073/pnas.1318948111]

Orgel, L. E. and Crick, F. H. (1980) Selfish DNA: the ultimate parasite. Nature 284, 604-607. [doi: 10.1038/284604a0]

Orgel, L.E., Crick, F.H.C., and Sapienza, C. (1980) Selfish DNA. Nature 288, 645-646.

Zuckerkandl, E., and Hennig, W. (1995) Tracking heterochromatin. Chromosoma 104, 75-83.


Claudiu Bandea said...

The idea that most of the genome in species with high C-value, such as humans, has informational roles has been dismissed decades ago for well rationalized reasons, including the C-value paradox, mutational load, and the evolutionary origin of most genomic sequences from transposable elements. Unfortunately, the ENCODE leaders have chosen to conduct and present their massive and expensive project on annotating the functional sequences of the human genome by completely disregarding this fundamental knowledge. Whether this blatant omission represents a case of scientific incompetence or scientific fraud remains to be resolved by the historians of science.

Therefore, the major question remaining in the field of genome evolution and biology is whether most of the human genome and that of other organisms with relatively high C-value has non-informational functions or not. It is highly relevant to emphasize that this question addresses the biological functions of most of the genome (e.g. >90% in the human genome), not of a small fraction of it; as important as a small fraction of the genome might be, it is not relevant in the context of this major remaining question on genome evolution and biology.

Georgi Marinov said...

I don't really see the topic as being as contentious as you do.

First, I am one of those people who do not see function as a binary trait that a piece of DNA either has or does not have - it is a lot more natural to think of it as a continuous variable, especially if we are to adopt the selected effect definition - selection coefficients are not binary after all, quite the opposite.

Second, I have always seen the relationship between genomes and organisms to be essentially the reverse of the more common anthropocentric view of the genome as being there to encode the organism - sure, it does that, but the view that the organism is there to make sure the genome is copied into the future is the more accurate representation of reality. And from that point of view, it is a little bit easier to separate active TEs from the genome and look at them as their own little genomes trying to copy themselves into the future (which is what they are), and their enzymes are surely functional for them, but they have (usually) a negative fitness effect on the larger genome they are parasitizing, and are very much non-functional (or even detrimental) from its perspective - the major reason they persist is because the larger genome is unable to get rid of them efficiently.

Now, it's fine to make that distinction from time to time, but the reality is that what the vast majority of people care about is the perspective of the large genome, not all the other levels (as people care mostly about people and not much else), so if we are to answer the question "are active TEs junk?", it is that point of view that we should answer it from. I'm afraid trying to present finely parsed distinctions between different levels of organization will only sow further confusion and that's not helpful.

Larry Moran said...

I agree with you that bulk DNA speculations are the only way to avoid the conclusions that most of our genome is junk. However, Doolittle (2013) thinks that even if these speculations turn out to be correct, the DNA may still be called "junk."

Larry Moran said...

Are active transposons junk or not?

If the function of a given piece of DNA is a "continuous variable" then can it ever be "junk" with a selection coefficient of either zero or some slightly negative value? If so, then how much of our genome falls into this category, in your opinion?

I'm afraid trying to present finely parsed distinctions between different levels of organization will only sow further confusion and that's not helpful.

I agree with that but I'm not sure that talking about a continuum between functional and junk is much of an improvement.

Georgi Marinov said...

Sure it can - if it's invisible to selection (for which there are clear quantitative criteria) and effectively neutral, it is clear what we would call it. Same if the selection coefficient is large in absolute value but negative.

I agree with that but I'm not sure that talking about a continuum between functional and junk is much of an improvement.

OK, you got me. But it is the reality of the situation.

Claudiu Bandea said...

Laurence A. Moran: However, Doolittle (2013) thinks that even if these speculations turn out to be correct, the DNA may still be called "junk."

I don’t think so. You might want to read my post “Junk DNA is bunk, but not as suggested by ENCODE or Doolittle”. Here is an excerpt from this post outlining Doolittle’s perspective:

Doolittle navigates through deep conceptual gaps left open by decades of neglect in defining even the most basic notions, such as the meaning of biological function, and concludes his epic journey with a sensible prescription: “A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed” and that, by building this theoretical framework, “Much that we now call junk could then become functional”.

Claudiu Bandea said...

Laurence A. Moran: I agree with you that bulk DNA speculations are the only way to avoid the conclusions that most of our genome is junk

If you agree with that (and I think that most scholars in the field of genome evolution and biology would also agree with that; is that correct Ryan?), then it would make sense to focus on this issue, not on putative ‘functions’ or ‘non-functions’ of small fractions of the genome, such as ‘active TEs.’

John Harshman said...

The other problem is that there appear to be legitimate examples of sequence conservation that are NOT tied to function

Such as...?

For what it's worth, I would call transposons and other selfish elements junk. You might consider active retroelement families in your genome to be affected by selection as a population, but individual retroelements aren't; that is, the sequences of individual insertions are not conserved. It's just that those insertions that happen not to have inactivating mutations will continue to propagate. I don't see how the individual insertions can be called anything but junk.

Nor do I see why sequences that encode a functional protein should not be called junk as long as that protein-coding sequence isn't under selection. And what about SINEs? They don't encode proteins but they still propagate, and so are under selection of a sort at the level of insertion family. You may disagree, but I say it's spinach and to hell with it.

Claudiu Bandea said...

John, I agree that, unless there are some unknown mechanisms for averting random mutations from specific regions of the genomic DNA, sequence conservation is tied to function.

un said...

In the following popular lecture about Hype In Science, Ford Doolittle defines junk DNA as follows:

"Junk DNA: DNA that has no informational role* in determining the fitness of the organism that bears it.

* Although it might serve as a sort of 'clean fill'.
(Is "junk" bunk? Panadaptationism and Functional Genomics. Ford Doolittle. Starts at around 1:07:00).

This differs somewhat from your definition of junk since (1) it includes stretches of DNA that you consider to be functional (i.e. bulk DNA hypotheses), and (2) the definition above seems to suggest that Doolittle agrees with the idea that junk should be defined at the organism level.

Anonymous said...

I think all these definitional problems arise because the word 'function' subtly implies the presence of mind. Either a mind that creates the function with intent or at least a mind that observes and ascribes function. The word 'purpose' is often used interchangeably with function and there I think the connection is more obvious. If we use the definition of function that assumes an observer the definition will always be vague. I think it's better to just use the definition that assumes intent in which case we'll have to agree that nothing in biology has function and we'll only use the word informally. It might be useful to come up with a new word for function that describes the complex causally connected systems produced by evolution.

If any of you think I'm guilty of excessive philosophizing I'd say that it should not come as a shock that our language is completely inadequate to describe many of the things we learn from science. In many cases we shouldn't try to pigeon-hole old words- like 'function' -for these new principles. When physicists discover drastically counter-intuitive things about the universe they have mathematics to describe it; biologists need new words.

Claudiu Bandea said...


Thanks for bringing up this statement by Ford Doolittle, which he also made in his paper “Is junk DNA bunk? A critique of ENCODE”, although without the confusing *asterisk* he added to it in his slide.

As I pointed out in my critique of Doolittle’s paper, saying that “junk DNA” (jDNA) was used as a metaphor for genomic DNA that has no *informational* role is misleading. Historically, jDNA was used as a metaphor for genomic DNA that presumably has no biological function (period), whether *informational* or *non-informational*.

As I suggested in the comment, this was somewhat of a “red herring,” as it allowed him to build a conceptual platform suggesting that by adding non-informational functions to genomic DNA (e.g. nucleo-skeletal and nucleotypic functions) and some other hierarchical selected functions, then the concept of jDNA might be bunked.

Pedro A B Pereira said...

Are active transposons junk or not?

For what it's worth, for me they are. The fact that their genes are in perfect working order is irrelevant to me, because the question boils down to the organism level perspective (as far as I'm concerned). If a few get co-opted into doing something "useful" for an organism at some point down the line, then they aren't junk anymore, and therefore "functional". In other words, I see the term "functional" as it relates to the term "junk", and it's the term junk at the organism level that is the real discussion here.

There are tons of things in science in which the meaning of a word depends on context, even within the same science. I find absolutely no problem in using the term "functional" when telling someone that a *particular* transposon is "functional" and therefore not "dead", while reserving a second meaning for the term "functional" when referring to functional vs junk, which is related to the organism level and is a completely different discussion. If we are smart enough to be able to grasp the meaning of terms from context when it comes to everything else, what exactly is the problem now?

Pedro A B Pereira said...

When physicists discover drastically counter-intuitive things about the universe they have mathematics to describe it; biologists need new words.

So what do you suggest? Using random letter generators every time we need a new term?

Anonymous said...

OK, well... maybe suggesting we need a new word, or resurrecting an old one, is unreasonable. But I do think the word function has the connotation of mind behind it. How many centuries has the word been part of the English language? Before about 1850 I don't think it could have meant a complex system that does something without a mind behind it, because no one could conceive of that (and some people still can't). Rather than try to disentangle its current meaning for certain situations, I thought it might be easier to use a different word.

Joe Felsenstein said...

Forgive me for evading the central question of Larry's post, but let me set aside the words "function" and "junk" and just ask why all that extra DNA is there. It seems to me that

1. It might be there because of mutational processes that insert it, including transposons jumping around selfishly. Some of that resulting DNA would then have its presence be neutral, some would have it be deleterious.
2. It might be there because having it is good enough for the organism that natural selection keeps it there.

ENCODE's publicity machine backed #2, in effect. (By the way, judging by responses of some of the people in my department who were part of ENCODE, the whole ENCODE team did not get together and agree on having Birney say what he said).

Perhaps there is some way, without getting lost in defining "function", to address which of #1 and #2 is the reason for the presence of all the extra DNA. (And no, I do not want to hear more about Claudiu's theory, until he calculates the selection coefficients that it would generate).

Georgi Marinov said...

You will have to wait for that a very long time - I've done it for him multiple times, to no effect...

Unknown said...

Joe, you are most certainly right that the whole ENCODE team did not get together and agree on having Birney say what he said. I personally tried for about two years to not use the 80% number. What 80% refers to in my humble opinion is the percentage of bases that reproducibly showed a signal on any of the collection of assays ENCODE used in any of the cell lines assayed.
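To make that reading of the 80% figure concrete, here is a minimal sketch of how a "fraction of bases with signal in any assay" number can be computed: pool the signal regions from every assay, merge the overlaps, and divide the covered bases by the genome length. The assay names, the intervals, and the 100-base toy genome are all made up for illustration; this is not ENCODE's pipeline or data.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of bases covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical signal regions from three assays on a 100-base toy genome.
assay_hits = {
    "assay_a": [(0, 20), (50, 60)],
    "assay_b": [(10, 30)],
    "assay_c": [(55, 80)],
}
all_hits = [iv for hits in assay_hits.values() for iv in hits]
fraction = covered_fraction(all_hits, 100)
print(fraction)
```

Because the union only grows as assays and cell lines are added, a number computed this way says how much of the genome gave a signal somewhere, not how much of it is under selection.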

Many times I asked Ewan at the least to exclude introns. It seems to me that something that is destroyed almost as soon as it is made is very close to most people's definition of junk.

In general I think "junk" is a pretty good term for the vast majority of the genome. I distinguish sharply between "junk" and "garbage." Garbage needs to be thrown away relatively soon or it will stink up your kitchen. Junk can hang around in the attic harmlessly indefinitely. Occasionally you will even wander up there and find a piece of it useful. Some of it has sentimental value; it can remind you of old times. Most of it ultimately does get thrown away, after you die, when your descendants are sorting through your belongings.

Ah, it's so much easier to criticize your big science consortium in public after your funding is cut. I'm sorry for the confusion we propagated in the public. I'm not sorry for all the sweat and hard thinking that went into a project that did enough functional assays to get signal on 80% of the bases of the genome (though to be truthful, it would probably be closer to 70% if we did not inflate the ChIP-seq peaks and DNAse peaks from window size effects). The assays were for the most part done carefully. The data if used cautiously can save the working biologist much time performing the similar assays in their own lab in a one-off low-throughput manner.

A very common problem we face now, and will face more in the future, is determining whether a rare genetic variant observed in a human is causal of some phenotype the human would be happier and healthier if they did not have. Since a typical human contains thousands of rare variants, this is a difficult problem. Judicious use of the ENCODE data can help very much in sifting through these variants, particularly the ones that are not found in coding regions. I think and hope in the end the ENCODE data will be very useful biomedically. I must say I don't think the 80% claim for biochemical function ever was though.

Joe Felsenstein said...

@Jim: It is good to hear from one more researcher in the ENCODE consortium. My own feeling (which is mostly uninformed) is that Dan Graur is wrong when he says that the whole ENCODE project did not produce helpful information. I think most ENCODE researchers are happy with the data produced, and I think that they should be.

But where Dan is right is that in addition ENCODE produced one other product, namely Ewan Birney's declarations about the demise of junk DNA. Which led to a blaze of publicity. Ryan Gregory made this extremely helpful and extremely depressing collection of links to responses. It includes most of the best known, and most professional, science journalists.

It is going to take years to reverse the impression that Birney made on the journalists, and even longer (a decade?) to reverse the impression that they all made on the lay audience.

Most researchers in the ENCODE consortium have not been heard from on the issue of junk DNA, so it is very helpful to have your response. Some other genomicists did respond with public dismay (good examples being here at Sean Eddy's Cryptogenomicon blog).

But most of the negative reaction has come from non-genomicists (Nick Matzke and Larry being outstanding examples). Most genomicists in ENCODE have sat this one out. I congratulate them on the lab work, but I do hope that they will come to rethink that silence. Thanks for weighing in here.

Claudiu Bandea said...

Now that we have defined “The ENCODE legacy” as “a case of scientific incompetence or scientific fraud” driven by a few reckless leaders who have compromised the work of hundreds of ENCODE researchers for personal benefit and fame, and that we have clarified that only a small fraction of the human genome can have informational functions, it is time to move on and address the major enigma remaining in the field of genome evolution and biology:

Does most of the genome in organisms with relatively high C-value have non-informational functions, or is most of it non-functional, metaphorically speaking, junk?

While we should encourage scientists and philosophers to keep refining the concept of biological function, I think that our common sense, combined with our high-end philosophical take on ‘function’ (*we know a function when we see one*), can guide us towards a sensible answer to this major question.

The good news is that we might not need to spend hundreds of millions of dollars to address this question as we might have enough data and observations to build sensible working hypotheses and paradigms. And to do that, we need to systematically reevaluate all the ideas and hypotheses on putative non-informational biological function for genomic DNA.

There is no doubt that some of the researchers in the field of genome biology would prefer to maintain the field in confusion, so they can continue to obtain funds and perform nonsensical research; and, yes, very likely, there is little we can do about that (I already hear them saying: “That’s right, you can’t do anything about it!”). However, by continuing to address the pseudoscientific paradigm that most of our genome *can have informational functions*, we are giving them the ammunition to continue their misleading research.

(Please see also Joe Felsenstein’s comment below: Joe Felsenstein, Sunday, July 06, 2014 8:29:00 AM)

Claudiu Bandea said...

Joe Felsenstein: “My own feeling (which is mostly uninformed) is that Dan Graur is wrong when he says that the whole ENCODE project did not produce helpful information”

I don’t think that Dan Graur has stated that the ENCODE project has not produced *any* helpful information, but that by conducting and presenting their expensive project with blatant disregard of the existing knowledge on the evolution and biology of the human genome, such as the C-value paradox, mutational load, and the evolutionary origin of most genomic sequences from transposable elements, the ENCODE leaders have directed the production of large quantities of nonsensical data, which is an unacceptable waste of public resources, not to mention a disrespect for the hard work of dozens of their colleagues working at the ‘bench’.

Joe Felsenstein: “Most researchers in the ENCODE consortium have not been heard from on the issue of junk DNA, so it is very helpful to have your response”

I think it is relevant to bring forward some of the exchanges I had with Anshul Kundaje, a prominent ENCODE researcher (second author of the ENCODE flagship paper in Nature), about the scientific value of the project and his personal view of the ENCODE fiasco (see Lior Pachter's post):

Anshul Kundaje: “Can you support the hypothesized claims that ENCODE has had a negative impact on Science? Any large project would draw funds away from smaller projects. Does that by itself mean that the project should not be undertaken? There is absolutely no proof that ENCODE has had a negative impact on Science.”

CB: ”Generating data and observations is only half of the scientific process; the other half is their interpretation and integration into the existing body of knowledge. Paradoxically, the leaders of the ENCODE project, which was intended and funded primarily to generate data, decided instead to emphasize and focus, at least in the public arena, on the broad interpretation of the acquired data, which apparently overstepped their expertise, or was deliberately used to mislead the public opinion about the significance of the project…. Obviously, generating any data, as long as it is not artifact, is beneficial to Science, but the question is how beneficial compared to other types of data; and, in this respect, the value of ENCODE data is yet to be fully evaluated. However, it appears that ENCODE’s interpretation of the data is meaningless, which discredits Science.”

Anshul Kundaje: “You are judging an entire body of scientific analysis based on one statement/paragraph in the main ENCODE paper and the massive media hype (which I personally also agree was unnecessary and misdirected … this is my personal view). Criticize the media hype and the way the project was advertised as much as you like. I personally think that is absolutely justified.”

CB: ”…if you and some of your ENCODE colleagues consider this broad interpretation and propaganda as “unnecessary and misdirected” and feel that our critique is “absolutely justified”, it would make sense to publish a clear statement reflecting your discontent with the way some of your reckless leaders compromised your hard work and scientific contribution”

The whole truth said...

Speaking of the human genome, I just came across this:


Georgi Marinov said...

That's protein coding genes only

The whole truth said...

A couple more articles that you guys/gals might find interesting:

John Harshman said...

Naked URLs are a waste of time. If we might find them interesting, tell us why. And tell us what you think about them.

The whole truth said...

And one more:

John Harshman said...

You should also read replies and respond to them.

The whole truth said...

"It might be useful to come up with a new word for function that describes the complex causally connected systems produced by evolution."

How about:

an interaction that results in
a reaction that results in
(or an event that results in)

If natural biology and evolution is the result of natural, non-intelligent, non-directed chemistry, and if chemicals just interact or react, should the terms/phrases used to describe chemical interactions or reactions be used to describe biology and evolution?

I'm not a chemist and I'm wondering if the word 'function' is normally used much by chemists?

The whole truth said...

John, I'll admit that I'm not well versed in molecular biology, genetics, chemistry, and the like but I'm under the impression that this site is intended to facilitate and encourage information, opinions, and discussions about biology and evolution. When I read posts and comments here and then see articles elsewhere that are or may be relevant I sometimes post links to those articles. The way I see it is that even if the articles are not credible or relevant there's no harm in pointing them out.

Some of the things that are topical in Larry's recent posts and/or the comments in response to them are mentioned in the articles that I linked to (genome size, cancer, etc.). I'm also under the impression, maybe mistakenly, that links to new discoveries may be welcomed by the readers and participants here and that more information is a good thing even if it ultimately doesn't specifically pertain to the topic at hand.

I like to think that there are some smart people here who can determine which articles are credible, relevant, or just interesting and those which are not and that they will be kind enough to point out which are which in an informative way. I'm just trying to learn and be helpful and by sometimes posting links to articles that are or may be credible, relevant, or interesting I merely hope to encourage informed discussions. Who knows, maybe some articles about new discoveries will affect the opinions/conclusions of the readers and participants here? Something I know is that even what seems like a tiny or seemingly irrelevant idea or discovery can often lead to more curiosity, discussion, and research and eventually to more knowledge.

John Harshman said...

All I ask is that you not post naked urls but instead give me some reason to look at them. Comment on them, use them to promote some argument, at the very least give me some clue as to what they're about and why you found them interesting. Is that too much?

Unknown said...

Has anyone else noticed that when you press "preview" your message just gets eaten? I'm on Chrome. Aarrgh! Let's see if I can remember it....

I'm playing hooky on the second day of the big three-day annual ENCODE meeting. This seems to be the day that focuses primarily on standardizing things far far too early, with the result of much circular discussion. Will those with the toughest butts win, or those with the loudest voices? It's always a close call on that one, but my money is on the tough butts winning, by default, in mid 2015.

The loud voices lost their champion, Ewan, in 2012, and though there are several others, none has quite the elan to flip a coin, as Ewan did in ENCODE2 to decide on the peak caller. It looks like most of the decisions in ENCODE3 will require a six-sided die. To be safe, I recommend going to the D&D supply store and obtaining a wide collection of rolling randomizing devices, ranging in size from 4 to 20 sided. Well, that's what I'd do if I were in charge, and obsessed with there being _the_one_true_way_. Fortunately I am neither!

Joe F - I sent you a piece of interesting email that I hope you are mulling over, but in case I fat fingered the address, could you send me a note when you see this to let me know I don't need to resend?

Unknown said...

Apologies for the cruelty in the previous message. The Tough Butts do not like that nickname and prefer to be known as Professional Sitters. Also Ewan absolutely did the right thing I think in flipping the coin, but maybe you had to have been there. He's well aware of my opinion on the 80% biochemically functional claim. After giving up on one of two options:
1) 80% of genome has signal in functional genomics assays.
2) 10-15% of the genome appears to be functional in the sense it's under selection.
I offered
3) 100% is functional because:
a) DNA polymerase copies it all, and that's biochemistry!
b) Bigger genomes make bigger cells and bigger animals. Look at Xenopus laevis and tropicalis! That's a nice trait for selection, no?
c) Pretty much any base in the genome could make a difference in phenotype if a misplaced double-stranded-break repair puts it in the wrong context.

Manolis Kellis, who found himself in the curious position of defending the 80% claim, is actually the one who first pointed out to me that population genetics rules out 80% functionality in the traditional evolutionary biological use of the functionality term as well. (Though the "functional" in "functional assay" is a well established usage, and probably more familiar to the genomicist with a non-evolutionary background.)

Joe Felsenstein said...

Jim: I got your email and will reply soon.

I am unclear from this comment what the issues are and which side these various players within ENCODE are on. What "coin" did Ewan Birney flip? Why did he get "lost" in 2012? Is he still missing and are search parties looking for him? What side of what issue are the TB's on?

I think we would all have to be there to know what these references mean.

judmarc said...

Patiently waiting for the various popular science publications that had "Death of Junk DNA" stories a couple of years ago to come out with "Most of it Really *Is* Junk" stories....

Larry Moran said...

@Jim Kent,

Your name is on the Birney et al. paper from 2012 and the Kellis et al. paper from 2014 but your comments suggest that you don't agree with the conclusions in either paper. Now, maybe I'm just being old-fashioned but when I was younger I never would have put my name on a paper if I thought my results were being misinterpreted or misrepresented. Do you think it's ethical to publish a paper when you know that the interpretation is wrong?

Larry Moran said...

I sure hope you're not holding your breath.

The authors of all those stories are patiently waiting for one of two things to happen...

1. Vindication, when other scientists show that ENCODE was correct after all.

2. Obscurity, where everyone forgets that they blew it.

Elizabeth Pennisi is hoping that John Maddox will help her with #1.

Ewan Birney is hoping for #2.

Larry Moran said...

I imagine that one of the hot topics of discussion at the annual ENCODE3 meeting is how much of the genome is worthless junk and how much of their data is just noise. Am I right?

If not, why not?

Unknown said...

If it were a small science paper I would have removed my name. In large science the papers generally are a mix of many things. I'd say that I agree with the _majority_ of what was in the papers. The papers also serve as a marker of the work involved, which for me was a tremendous amount.

But, no, I don't feel entirely ethically clean.

The "no junk" was actually not in any of the papers. It was part of the press communications which I had nothing at all to do with.

Unknown said...

ENCODE has been funded in three rounds: the pilot round covering 1% of the genome; the second round (which we now call ENCODE 2), which covered the full genome and has the papers and press that have the evolutionary biologists and many other biologists justifiably upset; and the third round, which started two years ago and which we call ENCODE 3. Ewan was the head of the Data Analysis Center, which was responsible for the flagship papers on ENCODE 2. He has moved on to other things, and to the best of my knowledge receives no funds and spends little if any time on ENCODE 3.

A problem with big science is that typically many scientists are working on similar things. For instance there are currently over a dozen "peak callers." These are programs which take a file containing alignments of typically 5-50 million short sequence reads to the genome, and convert it to a set of perhaps 5-50 thousand regions of the genome that contain most of the reads. These regions are called peaks. With so many peak finders, including several written by consortium members, it would be nice to choose a single "best" one so that later, downstream analysis, would just need to be run on a single set of peak calls, rather than run repeatedly on the results of each caller. However picking the best is not always straightforward. One may be more sensitive, one more specific, one may do better on factors like CTCF that tend to have a punctate signal, another may do better on marks that tend to have a more diffuse signal. At a certain point some decision needed to be made so that the downstream work could proceed. The process that ended up happening is that by careful comparison there was a general consensus that two of the peak finders were better than the others from most points of view. Ewan, rather than picking the one that was written by his favorite lab, flipped a coin to decide between the two.
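For readers who have never seen one, the input and output of a peak caller as described above can be sketched in a few lines. This is a deliberately naive toy, assuming reads are reduced to their start positions and a "peak" is any run of fixed-size bins holding at least `min_reads` reads; the function name, bin size, and threshold are invented for illustration, and real peak callers use statistical models of background signal rather than a fixed cutoff.

```python
from collections import Counter


def call_peaks(read_starts, bin_size=100, min_reads=5):
    """Toy peak caller: report bins with >= min_reads reads, merging adjacent bins."""
    # Count how many reads start in each fixed-size bin.
    counts = Counter(pos // bin_size for pos in read_starts)
    hot_bins = sorted(b for b, n in counts.items() if n >= min_reads)

    peaks = []
    for b in hot_bins:
        start, end = b * bin_size, (b + 1) * bin_size
        if peaks and peaks[-1][1] == start:
            # This bin touches the previous peak: extend it.
            peaks[-1] = (peaks[-1][0], end)
        else:
            peaks.append((start, end))
    return peaks


# Hypothetical read start positions on one chromosome.
reads = [105, 110, 120, 130, 150, 160, 210, 220, 230, 240, 250, 900, 1500]
peaks = call_peaks(reads)
print(peaks)
```

Even in this toy, changing `bin_size` or `min_reads` changes which regions are reported, which is the sensitivity versus specificity trade-off that makes choosing a single "best" caller hard.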

I am actually not free to talk about the contents of the current meeting. I strongly disagree with the reasoning behind this. However, when working with a group, sometimes you have to go along with group decisions you do not agree with. I'm sure you don't, for instance, agree with all of the laws of your country, but probably at least a good many of these you follow nonetheless. It's really only the truly important issues that are worth fighting for.

I think much of the problem with ENCODE was structural, and it is a structure that I think is pretty pervasive in big science. There's a desire to standardize while the science and technology is still in ferment. It ends up favoring people with political skills, and the ability I seem to lack, of sitting through large committee meetings. In brief it starts to resemble congress. Perhaps it is no accident since the funding comes through congress, and the organizers are based out of the Washington DC region.

I'm happy to talk some more on this, but it looks like there is a fragment of good interesting science about to be discussed. I'm shifting my attention to it. Talk with you later.

Georgi Marinov said...

I think much of the problem with ENCODE was structural, and it is a structure that I think is pretty pervasive in big science. There's a desire to standardize while the science and technology is still in ferment. It ends up favoring people with political skills, and the ability I seem to lack, of sitting through large committee meetings. In brief it starts to resemble congress. Perhaps it is no accident since the funding comes through congress, and the organizers are based out of the Washington DC region.

That problem seems to exist with any human enterprise of such scale, ultimately deriving from the primary drivers of human behavior and the cognitive deficiencies of the species. Do you have counterexamples of doing it right?

Unknown said...

It is very seldom that large projects don't end up this way, with committees in deadlock until out of weariness they make an essentially arbitrary decision. Usually when large scientific projects do avoid committee paralysis it is because there is a very clear common goal, or a clear common threat. Some large groups of scientists worked very fast and effectively in the Manhattan Project because of the clear threat. The International Human Genome Project had its big science problems, but much less so than ENCODE, because of the clear common goal.

I just am not convinced that we need such a big group working on 'functional genomics.' There are some advantages, such as working on common biosamples so as to make assays done in different labs more comparable. I think the disadvantages are starting to outweigh these though. Especially as we move from cell lines to human and mouse tissues, I think the common biosample argument is less compelling.

For the software standards, it seems everyone with compute skills and resources wants to do the whole analysis their way starting with the fastq files anyway really. The algorithms tend to improve over time. The standardization efforts we have in ENCODE on the software side I feel are entirely misplaced. The biggest reason for standardization is to make downstream analysis not confounded by some of the upstream analysis done one way, and some the other. However the analysts will always want to redo the analysis from the start with their latest favorite tool if they are good. Typically by the time the standard is agreed on, the tools it is evaluating are obsolete to one degree or another.

Unknown said...

@Georgi - I agree very much it is a human nature thing driving what I consider the pathologies of much of big science.

@Joe - the bit about the loud talkers and tough butts was an attempt at humor in describing a fairly common way the pathology unfolds in an environment where people are trying to make decisions by consensus of a committee. Consensus works well when the committee is composed entirely of flexible, reasonable, and not entirely selfish people. This is seldom the case, and of course the larger the committee the less likely it is going to be that way. Typically however _most_ of a committee will be flexible and reasonable, and thus, in the end, it is sad to say, irrelevant in the decision process. These are the people who get weary and tend eventually, in a pragmatic effort to just get a decision made, to give in to selfish people who tend to adopt one of two strategies. One strategy is that of the loud talker, who will dominate the talk, not yielding the floor, and repeat their point of view (often, but not so much in the case of Ewan, with considerable negativity towards opposing views). I will not name names on who are the loud talkers in ENCODE. The other strategy is to sit in silence, not expressing one's opinion, but not particularly listening and not allowing oneself to be swayed by argument. I call this the tough butt strategy. It is often accompanied by back room deals and payoffs. Again I will not name names. The tough butts are of course harder to spot since they are silent until an attempt at a decision. They will stay until the end, when everyone else has gotten weary and left, and then be able to claim consensus in their favor because they are the only ones left.

@Larry - I hope you don't mind me using your blog to talk so much about ENCODE internals. I also hope you don't consider me a naive pan-adaptationist for taking the pathologies of large committees as a sign that we humans, who do seem to like to proceed by consensus when possible, evolved largely in smaller groups.

@ENCODE - please don't think I don't value your work, even as I add my voice to those who would seek to prevent an ENCODE 4 in favor of a return to small science. Small science just means that the labs would be funded individually. They could still choose to work together, but it would be in a bottom up rather than a top down fashion.

Joe Felsenstein said...

@Jim: I appreciate that you are respecting ENCODE's policy of not discussing the disagreements at their annual meeting. I would like to think that some of that discussion is about Ewan's dramatic 2012 announcement of the demise of the concept of junk DNA. (Well, one can hope that people in ENCODE have noticed that their name and prestige are now associated with a great setback for understanding of molecular evolution).

As one who has had very poor success in getting grant money lately, I too worry about the growth of Big Science. It is not always a bad thing (the human genome project is one example where it worked reasonably well). But lately individual grants with one or two PIs have not worked well for me. Fortunately I am a theoretician, who needs only a pad of paper and a computer to do research. The departure of my lab employees when the money ran out has also freed me from the absolute necessity of getting money for them. And I still have my teaching salary (and I am part of a couple of recent grant applications by others, so some money may come). But now I can spend my grant-writing time doing actual research. An unexpected bonus.

But I do worry about younger researchers whose careers depend on success in an atmosphere increasingly dominated by ENCODE-like consortia. Life has gotten harder for them. Some very good science is being done, but the funding of it is more and more dependent on the politics of consortia.

Unknown said...

I guess I will talk a little more about this week's meeting. The spirit of the ban on publicly talking about the meeting was to prevent people who were sharing preliminary data from getting scooped by people outside of the consortium. I don't think anything I say here goes against that spirit. Also most of what I speak of here occurred well before this meeting.

Most of the discussion about the broader scientific community's response took place earlier, as the response was unfolding in 2012, and during the preparation of the Kellis et al 2014 manuscript that yes, does have my name on it. For a little while it looked like, as Barbara Wold suggested, we might issue an apology. However soon some sort of consortium pride kicked in, particularly among those I would characterize as alpha males, and eventually weighed against an apology.

Ewan had already left by the time the 2014 manuscript was being prepared. Eventually those who cared decided to defend themselves mostly by pointing out that they had indeed *defined* biochemical functionality as they meant it carefully in the paper, and people simply were not paying attention to how we were using it. It was therefore a semantic issue, no more. The response then shifted into trying to prove how good and useful the data itself was. There was in general, except among Gingeras and Stamatoyannopoulos, bitterness that we were being tarred with the same brush as Ewan Birney for the "no junk" claim, which was made during a press event, and never in the scientific papers.

By the time the meeting that ended yesterday came along, pretty much everyone in the consortium wanted to focus on the day-to-day issues of how we coordinate and improve our assays, and on what the labs and analysts have been doing in the last year. Since I was playing hooky I really can't say what happened Wed, except that I heard the user feedback session was pretty bloody and brutal. I don't think the consortium has really started to feel much of the sting of the response to Kellis et al (2014). It takes a while for things to penetrate through the thick hides of overscheduled PIs. The only ones I heard discuss it were the internal dissenters, whose feelings on it are very complex indeed.

Joe, you raise some very excellent points about the costs scientists outside the consortia are paying for big science. I'm sure these costs are part of the intensity of the response against ENCODE, which though it does involve some semantic issues, surely involves much more than that.

Take care,

Larry Moran said...

Joe Felsenstein said,

Well, one can hope that people in ENCODE have noticed that their name and prestige are now associated with a great setback for understanding of molecular evolution.

I think it's fair to say that some ENCODE leaders have NOT noticed this. They still believe they have discovered that the human genome is full of regulatory sites that control a huge number of genes.

I think it's also fair to speculate that many ENCODE leaders don't really care about the controversy surrounding the 2012 papers. They are convinced that they are on the cutting edge of research and that includes the interpretation of their results.

They aren't going to pay attention to a bunch of old geezers who don't even have active research groups and can't name a single peak finder.