
Wednesday, June 25, 2014

The Function Wars: Part I

This is Part I of the "Function Wars" posts. The second one is on The ENCODE legacy.1

Quibbling about the meaning of the word "function"

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)
The ENCODE Consortium tried to redefine the word “function” to include any biological activity that they could detect using their genome-wide assays. This was not helpful since it included a huge number of sites and sequences that result from spurious (nonfunctional) binding of transcription factors or accidental transcription of random DNA sequences to make junk RNA [see What did the ENCODE Consortium say in 2012?].

I believe that this strange way of redefining biological function was a deliberate attempt to discredit junk DNA. It was quite successful since much of the popular press interpreted the ENCODE results as refuting or disproving junk DNA. I believe that the leaders of the ENCODE Consortium knew what they were doing when they decided to hype their results by announcing that 80% of the human genome is functional [see The Story of You: Encode and the human genome – video, Science Writes Eulogy for Junk DNA].

The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.

[Google Earth of Biomedical Research]

It’s unfortunate that one of the consequences of the ENCODE Consortium publicity campaign is an ongoing debate about the exact meaning of the word “function.” This debate has drawn in several philosophers as well as biologists. In some cases this has led to pointless quibbles that do nothing to settle the controversy over junk DNA. These debates also have the unfortunate consequence of seeming to justify the decision of the ENCODE Consortium leaders. I agree with Sean Eddy when he says (Eddy, 2013) ...
Attention focused on the squabbling more than the substance, and probably led some to wonder whether the arguments were just quibbling over the semantics of the word ‘function’.

Trying to conceptualize the forces that act on genome evolution is not just a matter of semantics.
(This is from the commentary in Current Biology where Eddy criticized Dan Graur’s paper (Graur et al., 2013) as “angry, dogmatic, scattershot, sometimes inaccurate.” I strongly disagree with Sean Eddy on that point even though I am sympathetic to the point he makes about quibbling over the meaning of “function” being a distraction.)

Although I am going to quibble about the word “function” in this lengthy post, my main point is that the function wars are, for the most part, distracting and unproductive. We’re interested in the big picture—whether most of our genome is junk—and that’s not going to be resolved by settling on a definition of “function.” We have enough experience in biology to know that very few terms can be defined unambiguously (e.g. “gene,” “species”).

Biomedically Relevant Function

Let’s look at an example of quibbling over the meaning of “function.” A recent paper by Germain et al. (2014) points out that the purpose of the ENCODE project was to discover functional sequences in the human genome. They are correct to say this in spite of the fact that the ENCODE leaders are now pretending that looking for function was not a very important part of the ENCODE project. According to the latest revisionist account, the most important contribution was just collecting massive amounts of data (Kellis et al., 2014).2

Germain et al. then go on to say that ....
ENCODE’s controversial claim of functionality should be interpreted as saying that 80% of the genome is engaging in relevant biochemical activities and that are likely to have causal roles in phenomena deemed relevant to biomedical research.
This seems to echo the view of the ENCODE Consortium since in their latest attempt at backtracking (Kellis et al. 2014) they emphasize this same point about medical relevance. After pointing out that only 1% of the genome encodes protein, the ENCODE leaders say ...
More recently, genome-wide association studies have indicated that a majority of trait-associated loci, including ones that contribute to human diseases and susceptibility, also lie outside protein-coding regions. These findings suggest that the noncoding regions of the human genome harbor a rich array of functionally significant elements with diverse gene regulatory and other functions.
They are suggesting that there’s a “rich array” of mysterious sequences that affect genetic diseases. I doubt very much that this is true. The mutations that produce genetic defects in humans will almost certainly turn out to be in well-understood parts of the gene or its closely associated regulatory sequences. There’s no reason to assume that mapping of genetic disease mutations in humans is likely to uncover a huge number of new regulatory elements that escaped detection by geneticists, biochemists, and molecular biologists.

The focus on putative functions that are biomedically relevant is just another way of describing the original claim of the ENCODE Consortium and it does nothing to advance our understanding of “function.” The correct way of expressing this idea is to say that 80% of the human genome might possibly have something to do with biomedical research. Using that kind of logic, one is forced to conclude that the most important result of ENCODE is to narrow the target by showing that 20% of the genome has nothing to do with biomedical research. But even that’s not true because most of the ENCODE leaders won’t rule out undiscovered functions in the remaining 20% of the genome.

It’s easy, and correct, to talk about “biochemical activities” as “putative function” or “potential function” and if that’s all that the ENCODE Consortium did then there would have been no headlines about the death of junk DNA. But even saying that 80% of the genome has a “putative function” is misleading since we know for a fact that one of the fundamental properties of DNA binding proteins is nonspecific (nonfunctional) binding and that these nonspecific sites must outnumber specific (functional) sites in a large genome (Yamamoto and Alberts, 1976) [see Slip Slidin' Along - How DNA Binding Proteins Find Their Target, DNA Binding Proteins].
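To get a rough feel for why chance sites pile up in a large genome, here is a back-of-the-envelope sketch. It is only an illustration under loud assumptions: the genome is treated as random sequence with equal base frequencies (real genomes are not), a "site" is one exact sequence of a given length, and palindromic sites are not treated specially. The function name, the 3.2 billion bp genome size, and the site lengths in the loop are my own illustrative choices, not numbers taken from any of the papers cited here.

```python
def expected_chance_sites(genome_bp: float, site_len: int, both_strands: bool = True) -> float:
    """Expected number of exact matches to one specific site_len-bp sequence
    in random DNA of genome_bp base pairs.

    Assumes equal base frequencies (probability 1/4 per base) and treats the
    number of start positions as approximately genome_bp.
    """
    per_position = 0.25 ** site_len        # chance that a given position matches
    strands = 2 if both_strands else 1     # a site can occur on either strand
    return genome_bp * strands * per_position


# Hypothetical round number for the haploid human genome size.
HUMAN_GENOME_BP = 3.2e9

# Short recognition sequences are expected to occur by chance far more often
# than long ones; print the expectation for a few illustrative lengths.
for site_len in (6, 8, 10, 12):
    print(site_len, round(expected_chance_sites(HUMAN_GENOME_BP, site_len)))
```

The expectation falls by a factor of four for every extra base pair of specificity, so under these assumptions a protein that recognizes a short sequence will find a very large number of matches that arose by chance alone.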

Similarly, a great deal of the pervasive transcription detected by ENCODE is confined to a small number of cell types and very low abundance, a fact only reluctantly admitted eighteen months after the original papers were published (Kellis et al., 2014). What this means is that much of that pervasive transcription cannot be functional. So, we know for a fact that most of this “putative function” has to be nonfunctional (Struhl, 2007; van Bakel et al., 2010). Incidentally, one of the best ways to prove that accidental binding and spurious transcription are significant is to employ a negative control like the Random Genome Project (Eddy, 2013).

The best way to express this scientifically is not the statement that Germain et al. propose but something like: “The various sites identified by the ENCODE assays cover as much as 80% of the genome. Most of these sites will not have a biological function by any reasonable definition of ‘function’ but a small percentage of them have important, and well-understood, biological functions. It’s quite possible that an even smaller fraction of these sites have functions that we do not yet know about.” Somehow, that doesn’t seem quite as catchy as saying that 80% of the genome is functional.

In fairness, Germain et al. seem to recognize the limitations of their argument when they admit that “this 80% cannot strictly speaking be called ... functional as ENCODE claimed.” However, they reveal their bias when they say that it is very likely to be functional. But this is the heart of the dispute. I, and many others, claim that most of this 80% is almost certainly nonfunctional and we have evidence and arguments to back up that claim. Germain et al. seem to ignore that evidence.

Other philosophers believe that “function” can have different meanings depending on one’s interests. In Elliott et al. (2014) for example, the authors3 point out that medicine uses different definitions just as Germain et al. (2014) suggest. The example used by Elliott et al. is a mutation that causes cancer; this could be an oncogenic genome rearrangement, for example. Physicians could legitimately say that the mutation functions in causing cancer. This is not helpful.

The Many Meanings of “Function”

The issue of whether a large part of our genome is junk is not just a philosophical debate about the meaning of “function” but a large part of the Germain et al. paper is devoted to just that. The authors discuss two philosophical definitions called the causal role account of function and the selected-effect account. I find their discussion tedious and almost incomprehensible. The distinction between the two definitions is explained much better in Doolittle (2013) and Doolittle et al. (2014) but both discussions suffer from the over-emphasis of a false premise; namely, that it’s possible to define “function” in an unambiguous way that sheds light on the junk DNA debate.

The paper by Graur et al. (2013) suffers from the same problem. Those authors come down firmly on the side of selected-effect functions although they recognize that, “Estimates of functionality based on conservation are likely to be, well, conservative.” The best way to define function, according to Graur et al. is in terms of whether losing it has consequences. This is the best working definition, in my opinion: a sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny. This is the definition I’ve been using for almost two decades [see Junk DNA Poll].

Strictly speaking, this definition does not correspond to either the causal-role definition or the selected-effect definition because it can include functional DNA whose sequence is not conserved. This is the same definition used by Niu and Jiang (2013) for the same reason.

Dan Graur has expanded on this point in: "What is function?" A Section from a Future Textbook Chapter (would greatly appreciate comments) but now he seems to focus exclusively on functions that can be destroyed by mutation. This implies that a functional part of the genome has to have a specific sequence that is required for the function. This rules out spacer DNA and any of the bulk DNA hypotheses that are used by opponents of junk DNA. Even worse, this test of function fits the causal-role (CR) definition (not the selected-effect (SE) definition) according to some philosophers (Elliott et al., 2014).

As I mentioned above, Elliott et al. argue that different branches of biology use the word “function” in different ways. They also argue that biologists who criticize ENCODE often appeal to the distinction between causal-role (CR) function and selected-effect (SE) function but “they do so in a way that many philosophers would find problematic.” It’s worth pointing out that philosophers are sometimes guilty of writing about biology in ways that many biologists would find problematic. The real question here is whether the debate about the amount of junk DNA in our genome is a biological problem, or a philosophical problem.

For the record, here’s the position adopted by Elliott et al. (2014):
Today, perhaps the closest thing to a consensus among philosophers of biology is that each function concept is associated with a distinct type of explanatory goal. On this view, the SE-function concept is appropriate for developing evolutionary or ultimate explanations, while the CR concept is appropriate for explaining proximate mechanisms.
I don’t know about you, but what this tells me is that philosophers aren’t going to make much of a contribution to the debate over junk DNA but they are going to be active participants in the function wars.

It is absolutely safe to say that if you meet somebody who claims not to believe in evolution, that person is ignorant, stupid or insane (or wicked, but I’d rather not consider that).

Richard Dawkins
I don’t think it’s possible to define biological function in a way that can satisfy everyone. This isn’t unusual in biology since there are many important words that resist airtight definitions. I’m thinking of “gene” and “species” but there are many more. I agree with Doolittle et al. (2014) and Graur et al. (2013) in one sense; namely, that defining “function” in terms of evolution and conservation (selected-effect) is vastly superior to defining biological function in terms of something that just does something else (causal-role). I also agree with all critics of the ENCODE Consortium that their attempt to use a causal-role definition of function was just plain silly. (Or, possibly wicked, but I’d rather not consider that.)

The ENCODE leaders now (2014) take a slightly different approach to defining function. They refer to three approaches to the problem: genetic, biochemical, and evolutionary (Kellis et al., 2014).

The genetic approach relies on identifying function by recognizing stretches of DNA where mutations have an observable effect. This is a pretty good way of recognizing function. I prefer to think of the genetic approach in terms of whether or not a given sequence can be deleted without causing any significant effect but the basic idea is the same. Kellis et al. point out the technical limitations of the genetic approach but that’s not very relevant when we’re talking about ways of defining function.

The evolutionary approach looks at sequence conservation as the hallmark of functional regions of the genome. This is a tried-and-true method of recognizing functional regions of the genome but there are some limitations (see discussion below). There can, in theory, be large regions of the genome that are functional but not conserved in terms of sequence. There is no evidence that this possibility is correct although we know for a fact that there are small regions of the genome that fall into this category.

The ENCODE leaders want you to know that it’s not always easy to recognize short conserved (functional) regions of the genome because multiple sequence alignments are a “substantial challenge.” They remind us that secondary structures in RNA might be conserved even though the sequence can change and that you can have substitutions in binding sites that still allow significant binding. (Nevertheless, scientists have been successful at identifying consensus sequences for over three decades.) The ENCODE leaders also want you to know that new functional sequences that have arisen specifically in the human lineage cannot be detected by the evolutionary approach. While true, this is likely to be trivial, as far as I’m concerned, but there are a surprising number of scientists who actually believe that a large fraction of the genome could have evolved new essential functions since humans diverged from chimpanzees. That’s why they keep mentioning this possibility.

The biochemical approach looks at molecules and sequences to determine what they do. It’s an excellent experimental method of determining whether a given DNA sequence has a function. The only limitation is that you have to understand biochemistry and that means understanding that just because you detect a biochemical effect of some sort, does not mean that you have identified a function. For example, human transcription factors will bind to millions of sites in plant genomes but this activity doesn’t mean that they have a function in plants. Similarly, human transcription factors MUST bind to junk DNA, if it exists, because that’s the nature of DNA binding proteins. That’s a biochemical fact that’s described in all the textbooks.

The problem, as I see it, is that while biological function can most often be associated with conservation and selection, it isn’t a sufficient definition and it sometimes misidentifies sequences that don’t really have a significant biological function. In other words, there are both false positives and false negatives.

A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants. Conversely, if the DNA can be removed without consequences then it is probably junk. These are not rigorous definitions because there are all kinds of cases where a gene with a known function can be deleted without harm to the organism.

For example, think of our primitive ancestor who just acquired a mutation in the gene for making vitamin C. That sequence is now junk because it can no longer encode an enzyme but was it junk or was it functional just before it acquired an inactivating mutation? I think we would want to say that the DNA sequence encoding the enzyme (L-glucono-γ-lactone oxidase) has a biological function even if we know that deleting it will have no effect.

An even better example is the gene for the enzyme N-acetylaminogalactosyl-transferase. This is the gene that controls ABO blood types. People with O-type blood are homozygous for alleles that make the gene nonfunctional and no enzyme is produced [Online Mendelian Inheritance in Man (OMIM) 110300]. As a consequence, the protein on the surface of red blood cells is not glycosylated as it is in people with A-type, B-type, and AB-type blood.

There is no evidence that people with the defective gene and O-type blood are any worse off than people that have the glycosylated protein. Does that mean that the ABO gene is junk even though it has a well-defined function? I don’t think that makes a lot of sense. This is a functional gene even though it meets our working definition of junk DNA.

Given examples like these, the working definition of junk DNA is not an airtight, unambiguous, way to identify junk DNA because it includes some DNA that has a clear biological function. Conversely, it may be possible to delete fairly large regions of the genome without immediate consequences as was done in the now-famous mouse genome deletion experiment (Nobrega et al., 2004) but opponents of junk DNA will not accept this as proof that the DNA was junk because they can imagine functions that might go undetected under laboratory conditions. Furthermore, there are those who argue that if we were to delete all the putative junk DNA from our genome there would probably be consequences. Cells might be smaller and cell divisions might be more frequent so that humans with very little junk DNA might look very different. This could be true but it doesn’t mean that the extra DNA in our genome is actually functional. It’s still junk.

What this means is that defining junk DNA as DNA that can be deleted without consequences will always be contested by quibbling. Nevertheless, it’s the best definition we have and it works quite well as long as you ignore the nitpicking and think about the big picture. About 90% of our genome is junk according to the best available biological evidence. Quibbling about the meaning of “function” (or “junk”) isn’t going to change that very much. The gray area, where a given sequence could be “junk” or “functional” represents only a few percent of the genome. (Although it probably takes up 90% of the published literature.)

What about identifying function by relying on sequence conservation? This is an evolutionary definition. It seems to be a pretty good way of identifying functional regions of the genome (Doolittle et al., 2014; Graur et al., 2013) and it’s slightly different from a definition that identifies function by saying that the DNA can’t be deleted without consequences. Looking for sequence conservation is a positive way of recognizing functional regions of the genome, at least in theory. It has worked pretty well in the past 50 years or so.

I agree with most biologists that conserved DNA is a pretty good proxy for functional DNA and that nonconserved DNA is most likely junk. However, even this definition is neither inclusive nor exclusive. There are examples of conserved DNA that look like junk and examples of nonconserved DNA that has a function.

As mentioned above, two large regions of the mouse genome were deleted without effect (Nobrega et al., 2004). Together, those regions covered 1,243 segments of DNA that were 70% identical in mice and humans (100 bp window). This tells us that sequence conservation is not a reliable indication of function.
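For readers who haven't worked with this kind of criterion, here is a minimal sketch of what "70% identical in a 100 bp window" means computationally. It assumes two sequences that have already been aligned position by position with no gaps, which is a big simplification of real genome comparisons. The function names are hypothetical and this is not the pipeline used in the deletion study; it only illustrates the idea of scoring identity in a sliding window.

```python
def window_identity(seq_a: str, seq_b: str, start: int, window: int = 100) -> float:
    """Fraction of identical positions in one window of two pre-aligned sequences."""
    a = seq_a[start:start + window].upper()
    b = seq_b[start:start + window].upper()
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / window


def conserved_windows(seq_a: str, seq_b: str, window: int = 100, threshold: float = 0.70) -> list:
    """Start positions of every full window whose identity meets the threshold.

    Both sequences must have the same length (ungapped alignment assumed).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    last_start = len(seq_a) - window
    return [start for start in range(last_start + 1)
            if window_identity(seq_a, seq_b, start, window) >= threshold]
```

A window that passes this test is "conserved" only in the sense of sequence similarity. Whether it is functional is a separate question, which is the point of the deletion experiment.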

Similarly, Ahituv et al. (2007) detected four “ultraconserved” regions of the mouse genome that were shown to function as enhancers in vitro. Deleting these regions from the mouse genome yielded viable, fertile, mice that were indistinguishable from mice whose genomes contained the ultraconserved regions. The regions were conserved and potentially functional but they appear to be junk DNA.

We also have examples of pseudogenes whose sequences are relatively conserved in closely related species but they are, nevertheless, junk. Bits and pieces of defective transposons are important examples in this discussion since they represent a significant portion of the genome that is conserved between, say, humans and chimpanzees. They are conserved because they descend from an active transposon that inserted into that locus in the common ancestor of chimpanzees and humans. But, today, those sequences are junk.

Speaking of transposons, active transposons have enhancers, a promoter, and at least one open reading frame encoding reverse transcriptase or transposase, depending on the type of transposon. The gene is functional and so are the regulatory regions. Under the right circumstances the gene will be transcribed and the transposon can move to a new location in the genome. Are active transposons junk DNA or are they part of the functional portion of the genome?

The question is analogous to asking whether an integrated copy of bacteriophage lambda in the E. coli genome (prophage) is functional or not. I think we would want to say that it IS functional and so are active transposons. These are not true examples of junk DNA. (Active transposons make up only a tiny proportion of the mammalian genome so the resolution of this semantic problem has no effect on the big picture debate.)

Questions like this can be of immense interest to philosophers and to those interested in the philosophy of biology. The previously mentioned paper by Elliott et al. (2014) addresses just this point: Conceptual and Empirical Challenges of Ascribing Functions to Transposable Elements. They talk about distinguishing between different levels of function such as the organismal level and the transposon level. It’s not clear whether they consider transposons functional at the transposon level and junk at the organismal level because much of the discussion is about whether transposons can affect the survival of the organism. That paper (Elliott et al., 2014) is a good example of the difficulties one can get into when the emphasis is on semantics (or philosophy) rather than the real question of how much of our genome is junk.

So conservation doesn’t necessarily mean that the DNA is functional. But are there examples of nonconserved sequences that are functional? Yes, there are. The best examples are spacer DNAs that separate DNA binding sites that have to form a loop when bound by their respective factors. The classic example is binding of lac repressor to two operator sites upstream of the promoter for the lac operon (Krämer et al. 1987; Krämer et al. 1988). You need the spacer but its sequence is unimportant. It has a function. Similarly, there’s a minimal size of intron because the assembly of the spliceosome requires an RNA loop [Junk in Your Genome: Protein-Encoding Genes] [Junk in Your Genome: Intron Size and Distribution].

These particular exceptions aren’t going to make much of a difference because they don’t involve a large percentage of the genome. That’s why sequence conservation is a good approximation of function and lack of conservation is still a fairly reliable indicator of junk DNA.

However, there are some possible “exceptions” to the rule that may be more important. One of them concerns a different kind of “spacer” DNA based on our understanding of chromosome bands and puffs in Drosophila polytene chromosomes and lampbrush chromosomes in vertebrate oocytes (especially amphibians). The idea is that genes are arranged on long loops of DNA that form compact higher order chromatin structures when the genes are silent but large extended loops when they are active. Emile Zuckerkandl suggested back in 1976 that a certain amount of spacer DNA was necessary to keep genes apart on these loops and to form the complex heterochromatic state required for gene silencing. If more complex species needed more spacer DNA (larger loops), this would explain the C-value paradox (Zuckerkandl, 1976). A similar idea was suggested by Gall (1981).

There’s no evidence to support this hypothesis so it has been ignored in recent years. I mention it only to show that there are “spacer DNA” explanations that can account for a large percentage of the genome. This is DNA that cannot be identified by sequence conservation.

In addition, some people think that bulk DNA serves an important function in protecting against mutation, or in regulating the size of the nucleus. (There are other possibilities.) The point is that these bulk DNA hypotheses, like the one mentioned above, do not require sequence conservation but they do postulate that a lot of DNA has a function—it is not junk. If any of these hypotheses are correct then sequence conservation is not a reliable proxy for function. Fortunately, none of the bulk DNA hypotheses make any sense, so the point is moot.

So, we can adopt a working definition of function and junk based on whether or not deleting the DNA in question affects the survivability of the organism or its descendants. (Keeping in mind that there are minor exceptions).

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. Alex Palazzo suggested that we call these the “function wars.” Thanks, Alex.

2. At a cost of $200,000,000.

3. Only one of them, Linquist, is a card-carrying philosopher.

Ahituv, N., Zhu, Y., Visel, A., Holt, A., Afzal, V., Pennacchio, L. A. and Rubin, E. M. (2007) Deletion of ultraconserved elements yields viable mice. PLoS biology 5, e234.

Doolittle, W. F. (2013) Is junk DNA bunk? A critique of ENCODE. Proceedings of the National Academy of Sciences 110, 5294-5300. [doi: 10.1073/pnas.1221376110 ]

Eddy, S. R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Elliott, T. A., Linquist, S. and Gregory, T. R. (2014) Conceptual and empirical challenges of ascribing functions to transposable elements. The American naturalist 184:14-24. [doi: 10.1086/676588]

Gall, J. G. (1981) Chromosome structure and the C-value paradox. The Journal of cell biology 91, 3s-14s. [PDF]

Germain, P.-L., Ratti, E. and Boem, F. (2014) Junk or functional DNA? ENCODE and the function controversy. Biology & Philosophy, 1-25. (published online March 21, 2014) [doi: 10.1007/s10539-014-9441-3]

Graur, D., Zheng, Y., Price, N., Azevedo, R. B., Zufall, R. A. and Elhaik, E. (2013) On the immortality of television sets:“function” in the human genome according to the evolution-free gospel of ENCODE. Genome biology and evolution 5, 578-590. [doi: 10.1093/gbe/evt028]

Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E. and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences 111, 6131-6138. [doi: 10.1073/pnas.1318948111]

Krämer, H., Niemöller, M., Amouyal, M., Revet, B., von Wilcken-Bergmann, B. and Müller-Hill, B. (1987) lac repressor forms loops with linear DNA carrying two suitably spaced lac operators. The EMBO journal 6:1481-1491. [PDF]

Krämer, H., Amouyal, M., Nordheim, A. and Müller-Hill, B. (1988) DNA supercoiling changes the spacing requirement of two lac operators for DNA loop formation with lac repressor. The EMBO journal 7:547-556. [PDF]

Niu, D.-K. and Jiang, L. (2013) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and biophysical research communications 430, 1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Nobrega, M. A., Zhu, Y., Plajzer-Frick, I., Afzal, V. and Rubin, E. M. (2004) Megabase deletions of gene deserts result in viable mice. Nature 431, 988-993.

Palazzo, A. F. and Gregory, T. R. (2014) The Case for Junk DNA. PLoS genetics 10, e1004351. [doi: 10.1371/journal.pgen.1004351]

Struhl, K. (2007) Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature structural & molecular biology 14:103-105. [doi: 10.1038/nsmb0207-103]

van Bakel, H., Nislow, C., Blencowe, B. J. and Hughes, T. R. (2010) Most “dark matter” transcripts are associated with known genes. PLoS biology 8, e1000371. [doi: 10.1371/journal.pbio.1000371]

Yamamoto, K. and Alberts, B. (1976) Steroid Receptors: Elements for Modulation of Eukaryotic Transcription. Annual review of biochemistry 45, 721-746.

Zuckerkandl, E. (1976) Gene control in eukaryotes and the c-value paradox: “Excess” DNA as an impediment to transcription of coding sequences. Journal of molecular evolution 9, 73-104. [PDF]

113 comments :

DGA said...

2 entries you refer to do not seem to be available (Slip Slidin' Along - How DNA Binding Proteins Find Their Target , DNA Binding Proteins ). I get the message "Sorry, the page you were looking for in this blog does not exist." for both

Georgi Marinov said...

There is really way too much quibbling about things that really don't matter that much in the whole discussion.

The important questions are:

1) Is most of the genome junk, or more specifically and what is really the heart of the "debate", is most of the content of the large genome we observe in certain lineages there in order to and necessary for the specification of their generally (but by no means always) higher organismal complexity.

2) Are the genomes in question shaped largely by adaptive or nonadaptive evolutionary forces?

3) In the case of the human genome, what is the precise identity and function of the relevant to the phenotype segments of the genome?

1 & 2 are important because they affect how we think about the genome and ultimately, ourselves. 3 is important in practical biomedical terms.

A lot of the quibbling about what exactly is junk and how we define function is really deep in the weeds and does not really affect the answers to questions 1 & 2, and with respect to 3, it has to be recognized that "function" is not a binary attribute that a piece of DNA either has or does not have, but is more of a continuously distributed variable - sure, there are examples of DNA that suddenly became functional or (more often) nonfunctional, but IMO the slow drift into and out of functionality and latent functionality is more common, especially for regulatory elements.

Tyler said...

As first author of Elliott et al. 2014, my official position is many TEs are functional at their level (selfish), but not functional at the level of the host. In most vertebrate genomes the vast majority of TE sequences are inactive and fragmented, and no longer capable of selfish replication and are best thought of as junk until sufficient evidence to the contrary comes to light.

John S. Wilkins said...

That's an interesting take on "functional" Tim. It suggests that "functional" is scale and system relative.

Aceofspades said...

Larry,

You give the following definition:

"A sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny."

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

Surely your definition also needs to include sequence specificity as well?

Aceofspades said...

Perhaps a better definition would be:

"A sequence is functional if scrambling it has an effect on the survivability of the organism or its progeny"

Anonymous said...

There could be a more subtle example of function along the lines of aceofspades above and following Larry's definition.
If a particular TF had 50 functional binding sites and 950 non-functional semi-canonical sites, and you went in and somehow managed to genetically engineer all of the non-functional sites out, it would probably have a deleterious effect on the organism. The reason is that the expression level of the TF protein and its binding strength to its functional sites have been 'tuned' to the fact that most of it will be soaked up by non-functional sites. Removing them is the equivalent of whopping over-expression. But this is function in only a roundabout sense.

Piotr Gąsiorowski said...

Shouldn't it be "a deleterious effect", just in case scrambling should accidentally produce something functional?

Joe Felsenstein said...

Larry: very good discussion.

One quibble: when you say that

Similarly, Ahituv et al. (2007) detected four “ultraconserved” regions of the mouse genome that were shown to function as enhancers in vitro. Deleting these regions from the mouse genome yielded viable, fertile, mice that were indistinguishable from mice whose genomes contained the ultraconserved regions.

... then I wonder, indistinguishable by whom? If there were a 1% difference in fitness between those mice and their peers, that would be a very strong natural selection that would be effective in keeping those regions conserved. But in the lab, we would never notice it unless we did massive breeding experiments.

Molecular biologists often make that mistake -- Benjamin Lewin, editor of the journal Cell, once declared that it is a problem that so many genes can be deleted without noticeable effect. He forgot that Benjamin Lewin's ability to "notice" is much poorer than nature's. Fitness is not just either 0 or 1, but there are values in between.
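
To put a rough number on "massive breeding experiments", here is a minimal sketch of a standard sample-size estimate for comparing two proportions. Everything in it is an assumption made for illustration and none of it comes from Ahituv et al.: fitness is reduced to a single survival probability, the baseline survival is set to 50%, the deletion is given a 1% relative cost, and the test is a two-sided z-test at 5% significance with 80% power. The function name `mice_needed_per_group` is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def mice_needed_per_group(p_control, p_deletion, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided two-proportion z-test.

    Uses the usual normal-approximation formula; treating fitness as one
    survival probability is a simplifying assumption.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_deletion * (1 - p_deletion)
    effect = p_control - p_deletion
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Assumed numbers: 50% baseline survival, 1% relative cost of the deletion.
baseline = 0.50
n_per_group = mice_needed_per_group(baseline, baseline * (1 - 0.01))
print(f"mice needed per group: {n_per_group}")
```

Under these assumptions the answer is on the order of a hundred thousand mice per group, which is far beyond what a typical knockout study scores. A different baseline or a different fitness measure would change the figure, but probably not the conclusion that a 1% effect is invisible at ordinary lab scale.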

Faizal Ali said...

But then "scrambling" might produce a new function that is neither advantageous nor deleterious. Which would render the sequence "junk" by that definition.

Larry's definition seems much better. That junk DNA serves as a buffer against mutations does not change the fact that deleterious effects only result from changes in functional sequences.

Corneel said...

@Aceofspades
Larry addressed this point in the second-last paragraph. Bulk DNA hypotheses, such as protection from mutation, aren't very well supported.

Bryan said...

There is no evidence that people with the defective gene and O-type blood are any worse off than people that have the glycosylated protein. Does that mean that the ABO gene is junk even though it has a well-defined function?
I don't think your ABO example was a very good one - if anything, it really highlights the difficulty in defining "function".

There is pretty good evidence that ABO blood types evolved in response to infectious burdens. Non-B blood types appear to have evolved in regions with high levels of malaria, and emerged at roughly the same time as the species of malaria which infect humans. Both A and O type individuals are less likely to suffer severe malarial consequences, and if infected when pregnant, have better pregnancy outcomes (as a rule O's do much better than A's & B's, but both 'O's and 'B's do better than 'A's). The mechanism appears to be the evolution of reduced cytoadherence between infected RBCs (which, when they occlude vessels, causes ischemia, which is responsible for a lot of the pathophysiology of malaria).

So in this example A & B have an obvious biochemical function - glycosylation, but oddly fail your definition of function - e.g. you can delete them - à la the 'O' allele - without reduced fitness. In fact, in malaria-prone regions fitness increases.

The 'O' allele has no biochemical function but has an evolutionary one - in the sense that the 'O' allele provides a survival benefit when the carrier encounters what is probably the most serious pathogen humans have encountered in their recent evolutionary history.

Corneel said...

But this is function in only a roundabout sense
Why? It fits all of the definitions given above. Organisms aren't constructed; they evolve. It is only to be expected that sometimes an organism comes to depend on a certain DNA-sequence by chance. My only doubt is whether Larry's definition allows you to remove all of these sites at once. Removal of any individual site will have negligible effect.

Piotr Gąsiorowski said...

What about dysfunctional sequences (with a negative effect in terms of survival/reproductive success)? Scrambling or deleting them produces a positive effect.

Anonymous said...

I think this is the kind of 'function' that philosophers would get very excited about but biologists would note in passing.
To the extent this discussion on junk is to deprive creationists of the argument that a mostly functional genome is consistent with design, this type of function is certainly not consistent with a designer.

The whole truth said...

So where does that leave me? I have AB negative blood.

Corneel said...

Absolutely! The word "function" is very misleading, as it summons shadows of the watchmaker. Not only creationists suffer from that bias.

Tyler said...

I think it's important to remember that the genome is composed of multiple levels of evolution (selection, drift, etc.) and that it's not necessarily easy to give a single, simple answer to a given question.

Aceofspades said...

@Corneel

I guess I don't understand why he would say:

"Fortunately, none of the bulk DNA hypotheses make any sense, so the point is moot."

I'd appreciate if anybody could help me understand this from a genetic load perspective. If we accumulate 100 new mutations per generation and if about 10% of our genome is functional. Then does that not mean that each new generation accumulates 10 additional deleterious mutations?

Larry Moran said...

Sorry 'bout that. I think I fixed all the broken links.

Bryan said...

Then you have one copy of 'A' and one of 'B' (and none of 'O'). I'd guess you'd end up with an intermediary phenotype, but AFAIK, no one has assessed that in detail.

Personally, I'd recommend avoiding malaria over relying on your bloodtype to protect you...

Larry Moran said...

I agree with you.

But didn't your name just appear on a PNAS paper that devoted three pages to discussing different ways of defining "function"? Correct me if I'm wrong but isn't this what you wrote? ....

Quest to Identify Functional Elements in the Human Genome

... the scale of the ENCODE Project survey of biochemical activity (across many more cell types and assays) led to a significant increase in genome coverage and thus accentuated the discrepancy between biochemical and evolutionary estimates. This discrepancy led to much debate both in the scientific literature and in online forums, resulting in a renewed need to clarify the challenges of defining function in the human genome and to understand the sources of the discrepancy.

To address this need and provide a perspective by ENCODE scientists, we review genetic, evolutionary, and biochemical lines of evidence, discuss their strengths and limitations, and examine apparent discrepancies between the conclusions emanating from the different approaches.

Larry Moran said...

Tyler,

I think it's a shame that you didn't come right out and say that active transposons are functional at the TE-level but junk at the organismal-level. By not saying that, you contributed to the very "lack of clarity" that you were trying to dispel.

Also, as John points out. To declare that the word "function" (and "junk") depends on context and the interest of the researcher is sort of like begging the question.

Finally, your statement that there are multiple mechanisms of evolution (e.g. natural selection and random genetic drift) doesn't make any sense. What did you mean by that?

Larry Moran said...

Aceofspades says,

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

If it were true that this extra DNA was actually selected for its ability to protect against mutations then it would not be junk by my definition.

However, I have never seen a rational defense of such a claim. Would you like to be the first one to explain how all that extra DNA in various species of onion has the function of protecting the genes (and other sequences) against mutation?

Larry Moran said...

Joe,

I understand your argument. You can use it to challenge any attempt to demonstrate that a given stretch of DNA is junk (has no biological function). What you are doing is putting the onus on junk DNA proponents to prove that the DNA is functionless.

However, your argument doesn't work when you try to use it to look at forests instead of trees. If all of the excess DNA has a very tiny evolutionary function then the genetic load would be intolerable, wouldn't it? Also, I don't think your argument passes the onion test when applied at the level of the entire genome and it certainly doesn't explain the C-Value Paradox the way junk DNA does.

Larry Moran said...

Bryan says,

The 'O' allele has no biochemical function but has an evolutionary one - in the sense that the 'O' allele provides a survival benefit when the carrier encounters what is probably the most serious pathogen humans have encountered in their recent evolutionary history.

So, a gene has a function when it makes a protein/enzyme but when you knock out that function it also has a function because it doesn't make a protein?

Do you really think that's a helpful contribution to the function wars? :-)

Larry Moran said...

Aceofspades asks,

I'd appreciate if anybody could help me understand this from a genetic load perspective. If we accumulate 100 new mutations per generation and if about 10% of our genome is functional. Then does that not mean that each new generation accumulates 10 additional deleterious mutations?

No, it does not mean that.

It means that the functional part of our genome acquires about 10 new mutations every generation. Some of these might be lethal so they never show up in newborn babies. Many of them are effectively neutral. Only a couple of them could be deleterious and we can tolerate that without going extinct.
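
The arithmetic behind this answer can be sketched in a few lines. The 100 new mutations per generation and the 10% functional fraction are the figures from the question. The deleterious fraction of 0.2 is only an illustrative assumption (it echoes the 20% figure for exons mentioned further down the thread) and the function name is made up for this example.

```python
def expected_hits(new_mutations, functional_fraction):
    """Expected new mutations landing in functional DNA.

    Assumes mutations fall uniformly across the genome.
    """
    return new_mutations * functional_fraction


new_mutations = 100            # per generation, from the question
functional_fraction = 0.10     # from the question
deleterious_fraction = 0.2     # illustrative assumption, not a measured value

in_functional = expected_hits(new_mutations, functional_fraction)
deleterious = in_functional * deleterious_fraction

print(f"mutations in functional DNA: {in_functional:.1f}")
print(f"of which deleterious:        {deleterious:.1f}")
```

The point of separating the two steps is that "lands in functional DNA" and "is deleterious" are different events. Only the second one contributes to genetic load, and its rate depends on a fraction that has to be estimated, not assumed to be 1.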

Bryan said...

Do you really think that's a helpful contribution to the function wars? :-)
Well, it does somewhat highlight the futility of trying to define functional...plus it does fit your definition!

A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants

Deleting A or B (e.g. making it 'O') affects survival ... opposite of how you seem to be using your definition ;-)

Mikkel Rumraket Rasmussen said...

If Quest is to identify functional elements in the human genome I might as well answer for him: All.

:P

Tyler said...

I can only speak for myself but I distinguish between junk DNA and selfish DNA. Inactive, dead TEs for which we have no evidence of host or element level function are junk until proven otherwise. I dislike using the term junk for active TEs because I find it ignores the fact that there are multiple levels going on in the genome and we need to take that into account.

Sorry, I didn't mean to say that selection, drift, and other evolutionary forces are different levels. I meant it in the sense of levels in the hierarchy (TEs, cells, organisms, populations, etc.) at which those forces can act. People often say levels of selection but that is not the only force going on at multiple levels.

I think because we have this situation of a multi-level entity such as the genome we do have to keep context (levels) in mind. I see the ignorance of this fact in the TE literature all the time and I think it creates problems in interpretation of the information people gather.

Aceofspades said...

Thanks Larry,

With regards to the onion test, I agree that the vast majority of the onion genome will be junk.

Elsewhere on this blog you have a running calculation for the amount of functional DNA in the human genome. Currently the total is sitting at 8.7% - this agrees with the fact that about 9% of our genome is conserved.

What fraction of mutations within this functional region would you say are neutral?

judmarc said...

A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants

Can/should we insert "adversely" between "it" and "affects"? I.e., "...if deleting it adversely affects the survival of the organism or its descendants."

Nick Jeffery said...

I agree with Tyler above that
i) Junk DNA isn't a good term to describe TEs in general. Active TEs that can replicate or move within the genome are better accounted as "selfish" while dead TEs that can no longer replicate and are likely subject to neutral mutation within the genome at this point are best described as "junk"
ii) It's best to think of the genome in terms of multi-level selection. As he mentioned, different forces may be occurring depending on how you look at the genome. Active TEs may be "functional" as defined in Elliott et al. (2014) at their own selfish level, but may not be considered functional for the organism itself (this depends on where they insert in the genome, and may be deleterious, neutral or beneficial). However, the fact that TEs may be functional at their own level and non-functional at the host level does not classify them as "junk".

Corneel said...

@judmarc
Why on earth would we want to do that? The fitness effects of the AB0 locus are simply dependent on environmental and genomic context. This is true for many, if not all, genes.

BTW, there are some indications for balancing selection at this locus, so I agree with Bryan it may not be a good example of a gene without fitness consequences.

Piotr Gąsiorowski said...

Aceofspades:

By this definition then, wouldn't all of our human DNA be functional since we need 90% of our DNA to be non-sequence specific in order to serve as a buffer against mutations. Without this, it would affect the survivability of our progeny.

If we had less junk DNA but the same amount of "functional" DNA, the number of mutations per generation in the functional region would be roughly the same (and the total number of mutations, harmful, neutral or advantageous, would be smaller in a smaller genome). I hope I don't misunderstand genetic load completely, but it seems to me it would only increase if the functional region were larger in absolute terms. It doesn't really matter what fraction of the whole genome it amounts to (which is precisely why the junk fraction varies so wildly from taxon to taxon). Removing junk should not in principle increase the load.
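
A small sketch of this argument, assuming a uniform per-base mutation rate (the very assumption questioned elsewhere in this thread). The genome sizes are round illustrative numbers, not measurements, and the helper `mutation_counts` is hypothetical.

```python
def mutation_counts(functional_bp, junk_bp, rate_per_bp):
    """Expected new mutations per generation: (in functional DNA, in total).

    Assumes every base mutates at the same rate.
    """
    functional_hits = functional_bp * rate_per_bp
    total_hits = (functional_bp + junk_bp) * rate_per_bp
    return functional_hits, total_hits


# Illustrative round numbers: a 3 Gb genome with 10% functional DNA,
# and a per-base rate chosen so that the whole genome gets 100 hits.
functional_bp = 3.0e8
junk_bp = 2.7e9
rate_per_bp = 100 / (functional_bp + junk_bp)

before = mutation_counts(functional_bp, junk_bp, rate_per_bp)
after = mutation_counts(functional_bp, junk_bp / 2, rate_per_bp)  # half the junk removed

print(f"full genome:       {before[0]:.1f} functional hits, {before[1]:.1f} total")
print(f"half junk removed: {after[0]:.1f} functional hits, {after[1]:.1f} total")
```

In this toy model the number of hits in functional DNA depends only on the absolute size of the functional region and the per-base rate. Removing junk lowers the total count but leaves the functional count, and hence the load, unchanged.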

T Ryan Gregory said...

I would encourage anyone with an interest in this issue to actually read our paper for themselves.

Conceptual and Empirical Challenges of Ascribing Functions to Transposable Elements

In case you can't access it but would like to know what our major arguments are, here is the Abstract:

"Media attention and the subsequent scientific backlash engendered by the claim by spokespeople for the Encyclopedia of DNA Elements (ENCODE) project that 80% of the human genome has a biochemical function highlight the need for a clearer understanding of function concepts in biology. This article provides an overview of two major function concepts that have been developed in the philosophy of science—the causal role concept and the selected effects concept—and their relevance to ENCODE. Unlike in some previous critiques, the ENCODE project is not considered problematic here because it employed a causal role definition of function (which is relatively common in genetics) but because of how this concept was misused. In addition, several unique challenges that arise when dealing with transposable elements (TEs) but that were ignored by ENCODE are highlighted. These include issues surrounding TE level versus organism-level selection, the origins versus the persistence of elements, and accidental versus functional organism-level benefits. Finally, some key questions are presented that should be addressed in any study aiming to ascribe functions to major portions of large eukaryotic genomes, the majorities of which are made up of transposable elements."

And the Concluding Remarks:

"The possibility that the majority of noncoding DNA plays an important functional role at the organism level has been actively discussed for many decades. While it is not true that most of the genome was simply dismissed as useless junk, there have long been legitimate debates regarding the percentage of DNA that is biologically important in large eukaryotic genomes. This is a question that will require both empirical data and conceptual clarification to resolve.

For example, the recent claims by the ENCODE project leadership that 80% of the human genome can be assigned a “biochemical function” are highly misleading because of the way in which the concept of “function” was employed. The issue is not simply that ENCODE made use of a causal role definition of function rather than a selected effects definition, as the CR definition is relatively common in genetics. Rather, it is because ENCODE misapplied this definition of function by using criteria that were far too broad. Equivocation between this loose concept of CR function and phenotypically relevant biological functions exacerbated the confusion surrounding the ENCODE results.

As described in this article, ascribing functions to specific components of the genome is uniquely challenging when the sequences involved are transposable elements. Their capacity for autonomous replication creates several major complications that confound the use of functional assessments typically implemented in studies of genes or regulatory regions. These unique challenges were ignored by ENCODE because the entire human genome was treated in the same way, despite the fact that it is made up primarily of TEs. Future work that aims to provide an estimate of the percentage of DNA in the human genome with a biologically meaningful function at the organism level will therefore require a much more sophisticated approach that takes these issues into account."

Larry Moran said...

Ryan and I are discussing this on his Facebook page. facebook.com/tryangregory

He thinks my post is extraordinarily muddled about function concepts. He says, "... it's clear that you didn't understand the paper(s) you criticize."

His recent comment is ...

As I said, I don't think you understood the point or content of the paper, so we'll just have to agree to disagree about the usefulness of our discussion. I also don't see why you are so focused on which TEs count as junk - - junk is an even more nebulous concept than function. Meanwhile, if you're not interested in this topic, just don't read that literature about it. It doesn't mean no one else should care about or work on these issues.

I'm trying to understand where he thinks I've failed to understand his paper. If I find out, I'll post an update with the correct interpretation according to Ryan Gregory.

I think it's possible that I understand Ryan's paper but that I disagree with parts of it. Ryan might be seeing this as a lack of understanding rather than a legitimate difference of opinion. I've re-read the paper and my blog post and I still don't see a problem.

BTW, Ryan confirms that active transposons are junk DNA by his way of defining terms. I wish he and the other authors had specifically stated this in the paper because it would be an example of a coding region that is junk and that's a significant point that should have merited further discussion in the paper. It would mean that junk DNA is not confined to noncoding DNA as most people believe.

I'm also a bit confused about his claim that "junk" is more nebulous than "function." It seems to me that those are the only two choices so that if you define one then you define the other. Maybe there's a third option that I'm not familiar with?

The whole truth said...

Bryan, thanks for your response.

The whole truth said...

Does this affect the debate about junk DNA?

http://www.sciencedaily.com/releases/2014/06/140623131331.htm

Aceofspades said...

Not likely.

1) Are the proteins that they encode functional? This is an open question that is being researched but it seems unlikely. There have been other examples of novel proteins found in humans but their expression levels are so low that they are unlikely to be functional.

2) This is in yeast and has yet to be confirmed whether these are found in mammals.

3) How large is the subset of these lncRNAs that is engaged by the translation machinery and so can produce protein products? It seems like it would probably be insignificant.

Aceofspades said...

Hey Piotr,

That makes sense. After looking into the genetic load argument for junk DNA by A. Palazzo and Ryan Gregory (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1004351), I was mistakenly led to believe that we need a certain ratio of junk to non-junk in order to buffer against mutations but I can see now that that doesn't follow.

What I don't understand is how they can both say "only 1% of the nucleotides in the genome are essential for viability in a strict sequence-specific way" and then also say: "at most 10% of the human genome exhibits detectable organism-level function"

How do we explain that difference?

Larry Moran said...

Aceofspades, read my blog post on "What's in Your Genome?" and let me know if there's anything you still don't understand.

Aceofspades said...

Thanks Larry,

I understand that post, but my question is:

Of those sequences that are functional (the 8.7%), what proportion of the mutations that occur within that region would you say are neutral?

Overall, we should expect to find that only 10% of mutations within this functional region are deleterious if we expect there to be only 1 new deleterious mutation per individual.

I saw another post elsewhere where you argued that only 20% of mutations within exons are deleterious.

So what about all the other stuff? The α-satellite DNA in the centromeres, the regulatory sequences and the other conserved intragenic DNA.

Can they handle many more mutations than the exons?

T Ryan Gregory said...

Larry, by your own supposedly clearer definition of "function", active TEs would be non-functional if they don't contribute to organism survival. And we did not say that all active TEs are necessarily non-functional *at the organism level*. We said, and I told you again in the discussion, that some may be functional, some may have beneficial side-effects for the host, and many are probably not functional *for the organism*. You don't seem to acknowledge that we're discussing this from a multi-level perspective.

I also completely disagree with your opinion that working out concepts of function is mere "quibbling" and is not productive. On the contrary, I consider it a fundamental part of the debate and a necessary step for focusing future research. Just because there isn't a single, easy definition (though you seem to want to present one nonetheless) doesn't mean it is a conceptual free-for-all. That's how we got the equivocation from ENCODE to begin with. It's fine if you have particular views on definitions, but you are not simply offering a different set of ideas, you're suggesting that other people should not even bother working on this issue.

Again, I will simply direct your readers to the actual papers and leave it at that.

Elliott, T.A., S. Linquist, and T.R. Gregory (2014). Conceptual and empirical challenges of ascribing functions to transposable elements. American Naturalist 184: 14-24.

Doolittle, W.F., T.D.P. Brunet, S. Linquist, and T.R. Gregory (2014). The distinction between “function” and “effect” in genome biology. Genome Biology and Evolution 6: 1234-1237.

Aceofspades said...

There seems to be this assumption that mutations have an equal chance of appearing at any point in the genome.

Could it not be that there is some other sort of selection effect going on here?

Perhaps mutations aren't distributed evenly and some regions are just ultrastable while others are just more error prone because somehow the structure of the chromosome at that point makes it so?

John Harshman said...

Larry,

I don't think Joe was intending a defense of the idea that the entire genome is functional. Or claiming that we can't tell whether most of the genome is junk. He's saying quite the opposite, that when we have other good evidence for function -- they're ultra conserved, for FSMsake -- the inability to detect selection in the lab isn't good evidence of non-function. Are you sure you don't agree?

Joe Felsenstein said...

I agree with John -- that is what I intended. I am very far from thinking that most of the genome is "functional" in any meaningful sense.

Larry Moran said...

Larry, by your own supposedly clearer definition of "function", active TEs would be non-functional if they don't contribute to organism survival.

As I point out in my blog post, active TEs don't fit my definition. They can be deleted without effect but I still don't want to call them junk because they have a clear biological function. That's why NO definition works in all cases and it's a waste of time to try and come up with foolproof definitions.

And we did not say that all active TEs are necessarily non-functional *at the organism level*. We said, and I told you again in the discussion, that some may be functional, some may have beneficial side-effects for the host, and many are probably not functional *for the organism*. You don't seem to acknowledge that we're discussing this from a multi-level perspective.

I understand the distinction you are trying to make about having different definitions of "function" for different (ill-defined) levels. That does not mean I have to agree with it.

I also understand that you discuss abstract theoretical cases where transposons can have a "function" at the organismal level that's different from their function at the TE-level. What I was looking for in the paper was a clear statement on whether active TEs are junk DNA in the absence of any of these theoretical functions. You have answered on Facebook. I understand that the answer is "yes," active TEs are junk by your definition unless they have a new and different function unrelated to their selfish DNA function. My opinion is not the same as yours. That does not make you wrong but it also doesn't make you right. Your paper would have benefited enormously from a clear statement on this question along with a discussion about why some people legitimately disagree with you.

As I pointed out on Facebook, such a statement would have made it clear that you consider some coding regions (genes for transposase and reverse transcriptase) to be junk DNA and that would have been a contribution to clarity since many scientists (including you) frequently assume that junk DNA is confined to noncoding DNA.

T Ryan Gregory said...

But Larry, obviously you *do* care about the issue of definitions. You keep saying that you wish we had given a more specific indication of what counts as junk, and you present your own definition of function in this post. (You don't justify it or actually deal with the glaring exceptions, but that's beside the point).

In any case, I invite you to make a contribution to the primary literature in which you clearly lay out, develop, and defend your ideas about how to define "junk" and "function" -- or indeed, why they should not or can not be defined, if that is your position.

Larry Moran said...

I also completely disagree with your opinion that working out concepts of function is mere "quibbling" and is not productive.

I discuss several papers in my blog and there's more to come. How do you think the "productive" thingy is working out so far? Do you think that everyone reading this blog, and the papers, now has a much clearer understanding of the word "function"? I don't.

The Germain et al. paper was a very philosophical approach to the issue and they defended the ENCODE definition. You think that was "productive"?

Just because there isn't a single, easy definition (though you seem to want to present one nonetheless) doesn't mean it is a conceptual free-for-all.

From my perspective, it certainly looks like a conceptual free-for-all where everyone offers their own opinion on the precise meaning of "function" and "junk." Your own definition of "junk" has evolved considerably over the past decade but 90% of the genome is still junk.

It's fine if you have particular views on definitions, but you are not simply offering a different set of ideas, you're suggesting that other people should not even bother working on this issue.

Yes, that's exactly what I'm suggesting. I'm not looking forward to a plethora of papers from scientists and philosophers arguing about the nuances of causal-role and selected-effect definitions and discussing whether DNA can be junk if you look at it from one perspective but not from another.

The net effect of all that will be to lend credence to the position taken by the ENCODE leaders. After all, if scientists and philosophers can't agree on a definition then maybe the ENCODE definition is okay after all.

Ryan, I would rather not have these debates over the precise meaning of "function" and "junk" but you, and others, have chosen to engage in this quibbling exercise. Having made that choice, don't be surprised if people quibble about what you wrote in your paper.

You should be prepared to defend what you wrote.

Larry Moran said...

Ryan Gregory says,

But Larry, obviously you *do* care about the issue of definitions.

Yes, of course I care. We need to have some idea of what we're talking about when we use the words "function" and "junk."

You keep saying that you wish we had given a more specific indication of what counts as junk, ...

That's correct. Neither of your recent papers offers a specific definition of "function" or "junk" and the Elliott et al. paper doesn't really commit to whether transposons are junk DNA or not.

...and you present your own definition of function in this post. (You don't justify it or actually deal with the glaring exceptions, but that's beside the point).

Hmmm ... I called it a "working definition" in order to highlight the fact that it was not intended to be a philosophically defensible definition that would withstand all criticism. We need something. If you have a better "working definition" then please let me know.

Right after presenting this definition as a possibility, I add ...

These are not rigorous definitions because there are all kinds of cases where a gene with a known function can be deleted without harm to the organism.

I then go on to describe some cases where my "working definition" doesn't work. I can find exceptions to every single definition I've ever seen. Why don't we just admit that quibbling about CR and SE isn't getting us anywhere and just settle on some reasonable definition? Then we can get on with the task of looking at real biological data to decide whether most of our genome is junk?

In any case, I invite you to make a contribution to the primary literature in which you clearly lay out, develop, and defend your ideas about how to define "junk" and "function" -- or indeed, why they should not or can not be defined, if that is your position.

I'm doing that on my blog. It's faster and a lot cheaper. (I don't have several thousand dollars to spend on publishing papers in science journals.) You reference your own blog in your "primary literature" publications so now you can reference mine as well! :-)

Georgi Marinov said...

@ Laurence A. Moran

I don't really see where the contradiction is

Georgi Marinov said...

Laurence A. Moran, Saturday, June 28, 2014 8:04:00 AM

I'm doing that on my blog. It's faster and a lot cheaper. (I don't have several thousand dollars to spend on publishing papers in science journals.) You reference your own blog in your "primary literature" publications so now you can reference mine as well! :-)


Yes, but first, based on my personal observations you are the exception among the people from your generation, who generally do not read blogs, and second, publications in the scholarly literature still hold significantly more weight in people's minds than blog postings. It would be to everyone's benefit to have a more official paper trail of people's opinions. I don't think the cost is such an impediment; not all journals charge thousands of dollars for publication.

Larry Moran said...

You probably think the cost is unimportant because you don't have to pay for it out of your own personal bank account. :-)

The most important reasons for preferring blogging over publishing are: (1) instant feedback via comments - I love the debate, (2) the ability to make corrections and updates, (3) photos and images, (4) direct links to other blog posts and publications, (5) speed, I don't have the patience to wait six months before expressing my opinion, (6) ego, I don't have to edit my opinion based on criticisms from reviewers (admittedly, about 5% of reviewers turn out to be helpful), and possibly (7) more people will read a blog post than a published paper.

Georgi Marinov said...

It is not a question of one approach or the other, they are complementary and both are necessary in this case.

Also, there are cheap options like PeerJ (which is $100 per paper although it does not publish opinion and perspective pieces), and now there is bioRxiv too. And if someone invites you to write a perspective on the subject, I would imagine that would be free.

Claudiu Bandea said...

Laurence A. Moran asked Georgi Marinov: “But didn't your name just appear on a PNAS paper that devoted three pages to discussing different ways of defining "function"? Correct me if I'm wrong but isn't this what you wrote? ....

Larry,

Before asking Giorgi more questions, I think it would make sense for him to answer the questions you asked him a couple of weeks ago (http://sandwalk.blogspot.com/2014/05/how-does-nature-deal-with-encode.html#comment-form):

“Are you saying that over a period of five or six years the majority of members of the ENCODE Consortium were just interested in data collection and storage and didn't think much about the implications or whether they were actually cataloguing sites that had biological significance?”

“Are you suggesting that during group meetings nobody wondered whether the pervasive transcription they were recording was real or just spurious transcripts as many had already suggested in the published literature?"

"Are you telling us that nobody in those labs raised any questions about nonspecific binding of transcription factors as described in the textbooks?"

"Is it true that none of the PI's, postdocs, or graduate students gave journal club presentations on the junk DNA controversy and how it impacted the work they were doing on the characterization of the human genome?”

“Is it really true that the people in your lab never talked about whether the transcription factor binding sites they were analyzing were really regulatory sites or artifacts?"

Did you never discuss ways of identifying functional sites from nonfunctional sites or was the goal just to publish the locations of all the sites and let someone else try and figure out which ones were real?”

“Are you saying that it's true that the members of the ENCODE Consortium weren't very interested in making sense of their results and trying to understand the biological functions of the genome?"

I also ask him the following questions, which might be relevant for all ENCODE scientists:

1. Considering that the C-value paradox (also referred to as C-value enigma) has been one of the most fundamental concepts in genome biology for decades, how is it possible to design and conduct a huge project on the human genome, such as ENCODE, without having this fundamental concept at the center of it?

2. Apparently, you just finished graduate school, focusing I presume on studying genome biology; in your studies on genome biology and evolution have you learned about the C-value paradox? More specifically, have you studied the articles written by the scholars in the field such as, for example, those written by our host Larry Moran or Ryan Gregory?

Claudiu Bandea said...

I have a few more questions for Georgi (please don’t take it personally, but at least here at Sandwalk you represent the ENCODE scientists) regarding the PNAS paper by Kellis et al. ( http://www.ncbi.nlm.nih.gov/pubmed/24753594):

1. Why did you write the paper?

2. What new findings, ideas and concepts have you presented?

3. Are you going to revise or retract the paper as suggested at Lior Pachter’s blog “Bits of DNA”( http://liorpachter.wordpress.com/2014/04/30/estimating-number-of-transcripts-from-rna-seq-measurements-and-why-i-believe-in-paywall/)?

Pedro A B Pereira said...

i) Junk DNA isn't a good term to describe TEs in general. Active TEs that can replicate or move within the genome are better accounted as "selfish" while dead TEs that can no longer replicate and are likely subject to neutral mutation within the genome at this point are best described as "junk"

Maybe I'm missing something, but a TE being active or not is irrelevant for the organism itself and has (presumably) no fitness impact on the organism, so both active and inactive TEs will be subjected to neutral mutations. There is no selection for TEs since selection is acting on the organism. They're all neutral, active or not. That's why most of them are inactive: there's no selection acting on them, and their fate is sealed.

Or am I missing something?

Pedro A B Pereira said...

I do have a problem with the following definition, but at a different level:

"A sequence is functional if deleting it from the genome has an effect on the survivability of the organism or its progeny."

Since we all agree that junk can potentially, by molecular evolutionary means, produce genetic novelty and turn into something useful somewhen down a lineage, wouldn't that mean that including the "or its progeny" part would automatically make the genome 100% functional? Not all of it would become functional down the line, obviously, but since the potential is there for any part of it to become functional we cannot at any point say it's junk because it could affect "the progeny". That would invalidate the existence of junk by that definition. Seems to me that leaving out the "progeny" part of the definition would be better, or at least saying "direct progeny" instead.

Am I missing something?

Pedro A B Pereira said...

The reason is the expression level of the TF protein and its binding strength to its functional sites has been 'tuned' to the fact that most of it will be soaked up by non-functional sites. Removing them is the equivalent of whopping over-expression. But this is function in only a roundabout sense

That's a very interesting commentary, thanks for sharing. I still think that wouldn't affect what we consider to be genomic junk in any sensible sense, but it does illustrate how perfect definitions won't be forthcoming. By the way, has anyone compared expression levels among closely related organisms with wildly varying amounts of junk (in the famous "onions" spirit)?

Pedro A B Pereira said...

I also interpreted Joe's comment the same way as John. It seems to me a very valid point. Lab conditions are far from "real world" conditions, and failing to detect effects depends on the function of the gene and associated phenotype, which may or may not be obvious under lab conditions.

Nevertheless, it can be used as an argument to use functionality as the null hypothesis, although I think it would be unreasonable to do so given all the rest. But it is important to keep those issues in mind.

Larry Moran said...

Or am I missing something?

You are missing two things.

1. Whether we should refer to active TEs as "junk"?

2. A better definition of "function" that resolves these issues?

Larry Moran said...

Am I missing something?

What you are missing is a superior definition of "function."

This would be a good place to post it, if you have one. Otherwise, let's just agree that an airtight definition doesn't exist so there's no point in publishing any more papers on that subject.

Larry Moran said...

Also, there are cheap options like PeerJ (which is $100 per paper although it does not publish opinion and perspective pieces), and now there is bioRxiv too.

LOL

Thanks for injecting a bit of humor into this thread.

Wait a minute .... you weren't serious by any chance, were you?

Pedro A B Pereira said...

Maybe I didn't explain myself correctly. My point was only that if one states that dead TEs are subject to neutral mutations (as the above quote refers), then it would be implied that active TEs are not. But as far as we know, they both are, so that has no relevance to one being junk and the other not.

As for active TEs being seen as junk or not, I don't think anyone will be "right" or "wrong" here, since it depends on seeing functionality from the point of view of usefulness to the organism or simply on whether the genes in TEs are themselves functional or not. I know you see them as not being junk DNA, but others consider them so, and I don't think any position is necessarily correct.

Pedro A B Pereira said...

My proposal is the same as above but without the "or its progeny" part. I don't claim it to be perfect, though.

judmarc said...

Have to keep the progeny, Pedro. What if the removal of a sequence results in offspring that are stillborn or sterile? Such a sequence would then not be junk.

Pedro A B Pereira said...

Nice point. Perhaps we could say "or its direct progeny" instead? That way we avoid both potential problems (the one you mention and the one I mentioned originally).

But I agree with Moran: just like the definition of species, no definition will probably ever be 100% satisfactory. We just need to be aware of particular cases/exceptions.

Claudiu Bandea said...

Laurence A. Moran: If all of the excess DNA has a very tiny evolutionary function then the genetic load would be intolerable, wouldn't it?

Larry,

I’m sure you know that genomic sequences can have informational functions (iDNA) or non-informational functions (niDNA) and that the ‘genetic load’ only applies to iDNA; you might want to correct your statement.

Joe Felsenstein: I am very far from thinking that most of the genome is "functional" in any meaningful sense.

Joe,

Did you have the chance to read my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” ( http://biorxiv.org/content/early/2013/11/18/000588) in which I present additional evidence and arguments for a putative biological function (protection against insertion mutagenesis) of so called “junk DNA” (jDNA)?

Georgi Marinov said...

1. I have not written the paper, I have written portions of it, done some of the analysis, and made some of the figures

2. How do you define "new"?

3. 1) Why? 2) What makes you think any individual has that power?

Georgi Marinov said...

I don't see what is so laughable. I mentioned PeerJ because it is cheap and it is PubMed-indexed so it is part of the "official literature". Unfortunately, it does not publish perspectives, so it's obviously not useful in this case, but I used it as an example to illustrate the point that not all journals have exorbitant publication fees.

bioRxiv is not PubMed-indexed, but at least it is officially citable as articles there receive a doi.

Georgi Marinov said...

True, DNA with non-informational function does not directly suffer from point mutations (one would imagine it would suffer from indels though, but let's ignore that right now)

However, I have asked you repeatedly to show us how genomes would end up with such DNA through adaptive means in the first place, and you have never bothered to reply. You know, as Joe Felsenstein once said here, "Show me the selection coefficients", that kind of stuff.

It is not enough to provide verbal arguments about what function something might be playing, you need to have the population genetics to back it up too. And none of the papers promoting such views that I have ever seen have even touched the subject.

Tyler said...

Pedro,

Whether a TE is active or not can have fitness consequences for the host organisms, since active elements can jump into functionally relevant regions of the genome and cause deleterious mutations. Like any other mutations new TE insertions can be neutral, deleterious or beneficial, so some TE insertions are subject to positive selection at the host level, some are subject to negative selection at the host level and most probably just fluctuate in frequency due to drift because they are neutral at the host level.

BUT these forces also occur at the level of the elements, whereby in the element population those elements that are better at surviving and reproducing in the genome will be favoured by intra-genomic selection and will increase in copy number. This force will often be opposed by negative selection at the host level.

In the human genome most TE sequences are dead and degraded copies. I suspect the vast majority of these are of no functional relevance to the host organism and can be considered junk. Some inactive TE sequences happen to be in a beneficial spot or have a beneficial sequence for the host and then they are subject to positive selection at the host level, but this is the minority just as it is with mutations in general.

Whether one sees active, selfish TEs as junk or not for the organism seems to be a matter of perspective. Do many active TEs contribute beneficially to the individual organism? I highly doubt it. There are examples where they do, such as telomere-maintaining TEs in Drosophilids and some other taxa, but these examples appear to be the exception and not the rule.

I distinguish between junk DNA and selfish DNA because I think failing to do so can cause us to ignore the fact that TEs are their own level and that they are interesting and important entities to study in their own right. But I can see how some people would want to label active TEs as junk, because they usually aren't functional for the host. I just think we need to take a more nuanced approach and be careful about which level we are speaking about when discussing function.

Pedro A B Pereira said...

Dear Tyler,

I agree with what you said, generally speaking. As I stated before, it amounts to a matter of perspective, and there are no right or wrong answers here. My perspective is that whatever future benefit comes out of TE activity can't be used to discriminate between being junk or not, since under that criterion any part of the genome could potentially supply something useful down the line and junk as a concept would make no sense. All genomes would be effectively 100% "functional" just because of that future potential.


Do many active TEs contribute beneficially to the individual organism? I highly doubt it. There are examples where they do, such as telomere-maintaining TEs in Drosophilids and some other taxa, but these examples appear to be the exception and not the rule.

At this point, those TEs are not "junk" anymore, and I don't think anyone would claim they aren't functional.

I distinguish between junk DNA and selfish DNA because I think failing to do so can cause us to ignore the fact that TEs are their own level and that they are interesting and important entities to study in their own right. But I can see how some people would want to label active TEs as junk, because they usually aren't functional for the host.

I understand your point. I happen to somewhat favor the other side of the fence, but quite honestly, as long as everyone understands what the real point is, this should be of no practical relevance. Sometimes this seems to me as useful as deciding if viruses are alive or not.

Arno Wouters said...

How about distinguishing the *definition* of function from the *criteria* or *methods* for determining whether that definition applies? Function can be defined as the role of a part of a system in the production of an 'organized' activity, process, ability or property of that system. Biological function can be defined as the role in an organism's ability to maintain the living state. In this view, Larry's genetic, evolutionary and biochemical approaches aren't different ways to *define* function, but different ways to determine whether a part has a function and what that function is.

Claudiu Bandea said...

@Georgi Marinov:

“However, I have asked you repeatedly to show us how genomes would end up with such DNA through adaptive means in the first place, and you have never bothered to reply."

To my recollection, I always ‘bother’ to reply; otherwise the discussion is meaningless or one-sided. I would suggest that if you make this type of statement, you provide a link to the questions you asked, just like I did in my comment above (see my comment on Sunday, June 29, 2014 11:45:00 AM).

“It is not enough to provide verbal arguments about what function something might be playing, you need to have the population genetics to back it up too. And none of the papers promoting such views that I have ever seen have even touched the subject.”

As per your own admission a few weeks ago (http://sandwalk.blogspot.com/2014/05/how-does-nature-deal-with-encode.html#comment-form), I don’t think that you are familiar enough with the fundamental concepts and the scientific literature on these subjects in order to seriously evaluate them:

“But there are things that are absolutely necessary and they just don't even exist -- I am about to officially get my PhD in two weeks and in neither my undergraduate nor my graduate institution did I even have the option to take a serious evolution class...”


Georgi Marinov said...

So basically once again you refuse to answer, but this time you add some cheap (and wrong) ad hominem attacks to the mix ...

Claudiu Bandea said...

Can you please direct us to the questions you mentioned?

Georgi Marinov said...

I want you to show me a convincing population genetics and molecular-biology-based argument for how the non-informational function might have been selected for, i.e., how a small compact genome became a big, apparently bloated genome in which the extra DNA nevertheless plays a non-informational role, as you propose, in an adaptive manner.

Claudiu Bandea said...

You can find this information in my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” ( http://biorxiv.org/content/early/2013/11/18/000588). Please read it and if you have specific questions, I’ll be happy to address them.

Georgi Marinov said...

That paper contains nothing of the sort - all it has is verbal arguments that assume the existence of a big genome, but do not tell us anything about how getting so big in the first place was advantageous according to your hypothesis. I did that exercise for you in detail some time ago, and you ignored it. I won't bother to do it again, but in short, even if we assume that extra DNA is conferring a protective advantage against mutation, there is no way it could have been selected for because even the largest individual insertions would only increase that protection by a tiny amount that would be invisible to selection in a population with low Ne.

So we are back to the much more reasonable explanation of that DNA being there because of the balance between mutational processes and selection strength. You can then take that and claim "But once it's there, it is playing a protective role and is maintained by selection", but:

1) There is no reason for that additional layer of explanatory complexity
2) The argument that that extra DNA is maintained by selection is not convincing at all, because, by the reverse form of the same reasoning I mentioned above, individual small deletions would be invisible to selection and eventually drift to fixation and shrink the genome (if, of course, the indel balance was in favor of deletions).
3) You still fail the onion test (in the general form)

Claudiu Bandea said...

It is hard to understand your narrative. For example, what do you mean when you say: “So we are back to the much more reasonable explanation of that DNA being there because of the balance between mutational processes and selection strength.” What selection are you talking about?

Georgi Marinov said...

Natural selection, naturally

Claudiu Bandea said...

“Natural selection, naturally”, I like that.

So you agree that the genome size, like most if not all organismal features, is the result of the balance between mutational processes and natural selection strength.

Let’s consider, for the sake of this discussion, that the so called ‘junk DNA’ (jDNA) in species with high C-value, such as humans, is the result of other evolutionary forces such as genetic drift and neutral evolution.

So, in this hypothetical example, does this jDNA (e.g. 90% of the genome) confer a protective mechanism against insertion mutagenesis by inserting elements, such as retroviruses, or not?

Georgi Marinov said...

You still don't get it and it's becoming hopeless at this point.

You do talk about balance between the different mutational processes, which is a good first step, and you do mention drift, but you seem unable to get past that initial good start (BTW, "neutral evolution" is not an evolutionary force).

Let me summarize it once again:

Why some genomes are big is sufficiently well explained by the following:

1) An overall balance of individual mutations (small indels and large insertions of TEs) that is in the direction of expanding these genomes
2) Low Ne, which means that the slightly negative selection coefficients of each of these individual mutations are mostly invisible to selection in these lineages and are free to drift to fixation
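Point 2) can be illustrated with a rough numerical sketch. It assumes Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, with census size equal to Ne and additive selection; the function names and the example values of s and Ne are illustrative assumptions, not figures taken from any paper discussed in this thread.

```python
import math

def fixation_probability(s: float, ne: float) -> float:
    """Diffusion approximation for a new mutation starting at frequency
    1/(2*Ne), with selection coefficient s (assumes census N equals Ne)."""
    if s == 0.0:
        return 1.0 / (2.0 * ne)  # neutral limit of the formula below
    exponent = -4.0 * ne * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid float overflow, probability ~ 0
    # (1 - exp(-2s)) / (1 - exp(-4*Ne*s)), using expm1 for accuracy at tiny s
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(s: float, ne: float) -> float:
    """Fixation probability divided by the neutral expectation 1/(2*Ne)."""
    return fixation_probability(s, ne) * 2.0 * ne

if __name__ == "__main__":
    s = -1e-6  # hypothetical, slightly deleterious insertion
    for ne in (1e4, 1e6, 1e8):
        print(f"Ne={ne:.0e}  fixation relative to neutral: {relative_to_neutral(s, ne):.3g}")
```

Under these assumptions, a mutation with s = -1e-6 fixes almost as often as a neutral one when Ne is 10^4 (the ratio is about 0.98), while at Ne = 10^8 the same mutation essentially never fixes.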

Your hypothesis is:

Having a lot of noncoding DNA protects against insertional mutagenesis so the trait of having a lot of noncoding DNA is maintained by selection.

There are multiple severe problems with that hypothesis:

1) It does not at all explain the phylogenetic distribution of genome size values (i.e. the onion test)

2) The population genetics does not work. And it does not work because there is no "large genome" allele: the genome is large because of a very large number of insertions of noncoding DNA, each very small compared to the whole genome, and it is those alleles that evolution works with. That means that:

3) There was no plausible way for "protection against insertional mutagenesis" to be selected for and for the genome to grow as a result of it, as I have explained to you in the past (and that is without even going into the negative effects of those insertions on fitness)

4) There is no plausible way for it to be maintained by selection (by the reverse argument - if the indel balance was in the direction of deletions, the genome would be shrinking over time because each individual small deletion would be decreasing the effectiveness of protection against insertional mutagenesis by such a small amount that you would need very large Ne for it to be selected against, but Ne is low in these lineages).

That's what you need to work out to make your hypothesis worth taking seriously.

And I keep repeating it and you keep ignoring it.

Claudiu Bandea said...

I don’t know why you keep bringing up ‘selection’, when I just said in my previous comment to consider that the so called ‘junk DNA’ (jDNA) is the result of genetic drift and neutral evolution, not selection.

Considering that in humans approximately 90% of the genome consists of the so called jDNA, does it provide a protective mechanism against insertion mutagenesis by inserting elements, such as retroviruses, or not?

John Harshman said...

Claudiu,

The answer to your question is irrelevant. Here's an example to show you why: Do random sequences bind transcription factors or not? Well, yes they do, but does random sequence therefore have the function of binding transcription factors? No.

Georgi Marinov said...

Are you going to come up with a substantial reply or not?

It's completely irrelevant if it protects or does not protect against insertion mutagenesis (a question that can be answered only if all else is equal, and all else is not equal) if what you are claiming is that there is purifying selection maintaining all that extra DNA, and that claim makes no sense. Point 4) above.

Regarding Point 3), you wrote this yourself:

Notably, this model couples the mechanisms and the selective forces responsible for the origin of jDNA with its putative protective biological function, which represents a classic example of ‘fighting fire with fire.’ One of the key tenets of this theory is that in humans and many other species, jDNAs serves as a protective mechanism against insertional oncogenic transformation. As an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.

You cannot claim at the same time that:

1) junk DNA arose as a result of the nonadaptive interaction between mutational balances and the population genetics environment of the lineage
2) It is an adaptive mechanism guarding against insertional mutagenesis
3) It is currently maintained by purifying selection due to this "function" it has

These cannot all be true at the same time.

Finally, once again, explain to me how the process works at the level of an individual transposable element insertion, 6 kb in size, within a 3 Gb genome?

Claudiu Bandea said...

John,

Do random sequences bind transcription factors or not? Yes, some random sequences (but not all) bind transcription factors.

Do these random sequences therefore have the function of binding transcription factors? Yes, those random sequences that bind the right transcription factors and are located at the right position in the genome in order to regulate the expression of a functional gene in a beneficial way for the organism are indeed functional. The other random sequences are not; they are ‘dysfunctional’ :-) for the organism.

BTW, you did not hesitate to answer your ‘irrelevant question.’ Why, then, do you hesitate to answer the question that I’m asking:

Does the so called ‘junk DNA’, which represents approximately 90% of human genome, constitute a protective mechanism against insertion mutagenesis by inserting elements, or not?

Here is my position ( http://biorxiv.org/content/early/2013/11/18/000588):

Whether jDNA has been evolutionary maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion, or because jDNA is under host positive selection (whatever this selection might be), the protective function of jDNA in humans and other eukaryal organisms against insertional mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses, is a bona fide fact.

Claudiu Bandea said...

Giorgi,

Again, for whatever reason, you keep bringing up *selection*. Selection is not essential for my theory that jDNA provides a protective function against insertion mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses.

Please see this excerpt from my paper ( http://biorxiv.org/content/early/2013/11/18/000588):

”Whether jDNA has been evolutionary maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion, or because jDNA is under host positive selection (whatever this selection might be), the protective function of jDNA in humans and other eukaryal organisms against insertional mutagenesis by endogenous and exogenous mobile genetic elements, such as retroviruses, is a bona fide fact.”

Just to clarify, in the phrase below, “adaptive defense mechanism” is used as a conceptual analog to the CRISPR/Cas *adaptive* defense system against viral elements in prokaryotes.

“As an *adaptive* defense mechanism, the amount of protective DNA varies from one species to another based on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.”

Georgi Marinov said...

1) Stop misspelling my name, you do that all the time

2) I don't see how you can argue that something is there for a reason because it plays an important function and then say that selection is not involved.

3) Does the so called ‘junk DNA’, which represents approximately 90% of human genome, constitute a protective mechanism against insertion mutagenesis by inserting elements, or not?

All else equal, it does. Is all else equal? No

John Harshman said...

Claudiu, I'm afraid your blinders are working too well. My irrelevant question was a direct analogy to your irrelevant question and showed why it was irrelevant. You answered my question in a weaselly way that let you say "yes" and mean "no", thus avoiding having to think about what "function" means.

Claudiu Bandea said...

Georgi,

1) Sorry for misspelling your name

2) I say that the so called jDNA, which I increasingly refer to as symbiotic DNA (sDNA), plays a critical function. As a commentator here at Sandwalk, you know that natural selection is not the only force of evolution; just ask Larry :-)

3) You say that because “all else is not equal” (whatever that means) your answer to my question is ‘No.’ That’s fine with me.

Claudiu Bandea said...

John,

I answered your question according to the existing data: indeed, very few random sequences inserted in the genome can bind transcription factors, and even fewer sequences can serve as productive promoter elements. That means that the vast majority of inserted genomic sequences *do not* play a role in gene regulation. If the ENCODE scientists had had the same appreciation of the biology and evolution of the human genome, it would have been great, don't you think?

So, my answer to your question is clear: with few exceptions the random sequences *cannot* be functional as promoter elements.

Now, do you have the courtesy and the ‘courage’ to answer my question?

Georgi Marinov said...

2) I say that the so called jDNA, which I increasingly refer to as symbiotic DNA (sDNA), plays a critical function. As a commentator here at Sandwalk, you know that natural selection is not the only force of evolution; just ask Larry :-)

If it plays a critical function and is important to the organism's fitness, then it is subject to purifying selection by definition. Not purifying selection on the sequence level, but on its length.

So clearly selection has to be involved in maintaining it, otherwise your claims do not even begin to make sense.

However, as I repeatedly point out to you, there is no mechanism to maintain that extra DNA in a situation in which Ne is low and the indel balance is in the direction of deletions. As you yourself wrote, the protective capability of that extra DNA is directly proportional to its quantity. But if I delete a 2.8 kb piece of such DNA from a 3.2 Gb genome that has 200-400 Mb of sequence-constrained DNA, I am decreasing that capacity by about 1x10^(-6). Even greatly inflating the fitness effect and assuming a selection coefficient for that mutation equal to the negative of that number, there is no way such a deletion could be selected against when Ne is around 10^4. So the genome would be shrinking. Note that it would be shrinking too if you had a lot of small individual deletions, of say 3 bp in size, even if Ne was 10^8; I just used the roughly 3 kb region for the sake of the argument.
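The arithmetic in the paragraph above can be checked with a short sketch. It assumes, as the comment does, that protection is proportional to the amount of non-constrained DNA and that the selection coefficient equals the negative of the fraction of that DNA removed (a deliberately generous upper bound). The midpoint of 300 million bases for constrained sequence and the use of |Ne*s| much smaller than 1 as a rule of thumb for effective neutrality are assumptions made here for illustration.

```python
GENOME_BP = 3.2e9        # total genome size quoted in the comment
CONSTRAINED_BP = 0.3e9   # assumed midpoint of the 200-400 million base range

def lost_protection_fraction(deleted_bp: float) -> float:
    """Fraction of the non-constrained ("protective") DNA removed by one deletion."""
    return deleted_bp / (GENOME_BP - CONSTRAINED_BP)

def scaled_selection(deleted_bp: float, ne: float) -> float:
    """Ne * s, taking s as minus the lost fraction (generous upper bound on |s|)."""
    return -lost_protection_fraction(deleted_bp) * ne

if __name__ == "__main__":
    for deleted_bp, ne in ((2800, 1e4), (3, 1e8)):
        frac = lost_protection_fraction(deleted_bp)
        print(f"{deleted_bp} bp deleted: fraction lost = {frac:.2e}, "
              f"Ne*s = {scaled_selection(deleted_bp, ne):.3g} at Ne = {ne:.0e}")
```

With these inputs the 2.8 kb deletion removes just under one millionth of the non-constrained DNA, and in both scenarios |Ne*s| stays well below 1, which is the regime in which drift is expected to dominate over selection.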

So there is no plausible way that extra DNA is maintained by selection.

Now the indel balances are more often than not in the direction of expansion, but if that is the case, then your hypothesis is not needed to explain anything - indel balance and effective population size explain why the genome is so big without any need for postulating a causative role of the proposed protective role of the extra DNA. We are looking for minimal fully explanatory models, after all.

Claudiu Bandea said...

Obviously, the so called jDNA is present in our genome and those of many other species. As I wrote in the conclusion of my paper, it is possible that this jDNA ”has been evolutionary maintained simply because of a mutational imbalance, favoring amplification of parasitic DNA versus deletion”. However, regardless of the forces behind the origin and maintenance of this jDNA, it serves as a protective mechanism against insertional mutagenesis and, therefore, it would make sense to refer to it as protective, symbiotic DNA (sDNA).

BTW, Georgi, what percentage of the human genome do you think is junk DNA?

Larry Moran said...

If all that extra DNA is such good "protection" then,

1. Why don't bacteria bulk up their genomes to get protection?

2. How does your speculation fare in The Onion Test?

3. How does creating a genome with dozens of active transposons that survive a million years of evolution count as "protection"?

Joe Felsenstein said...

Claudiu Bandea may well be right that junk DNA "serves as a protective mechanism against insertional mutagenesis", but before I would rename it I would want evidence that it is maintained by natural selection for this.

The difficulty in concluding that is the selection coefficients involved. I often give Larry a hard time here when he thinks that selection coefficients such as 0.0001 indicate neutrality. But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?

Claudiu Bandea said...

Larry,

These are highly relevant questions. However, I addressed these issues in my papers, so either you did not read them, or did not read them carefully enough.

Please read my paper “On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.” ( http://biorxiv.org/content/early/2013/11/18/000588), including the material in the *Data Supplements*, which is accessible at the same link.

You might also want to read my post entitled “Junk DNA is bunk, but not as suggested by ENCODE or Doolittle” ( http://www.ncbi.nlm.nih.gov/pubmed/23479647#cm23479647_1429), in which I outline my perspective on the *nucleoskeletal* and *nucleotypic* theories, which have dominated the thinking on genome size evolution over the last few decades, and which have been embraced by Doolittle as pillars for his theoretical framework on genome size evolution and biology.

If after reading this material you still want to ask these 3 questions, or if you have additional ones, I’ll be happy to address them.

You might want to become familiar with these so called “DNA-bulk theories” for another reason. As stated in my comment (see first comment) at your post *The Function Wars: Part II* ( http://sandwalk.blogspot.com/2014/07/the-function-wars-part-ii.html), “The idea that most of the genome in species with high C-value, such as humans, has informational roles has been dismissed decades ago for well rationalized reasons, including the C-value paradox, mutational load, and the evolutionary origin of most genomic sequences from transposable elements.” Therefore, in the context of this reality, I suggested that ”the major question remaining in the field of genome evolution and biology is whether most of the human genome and that of other organisms with relatively high C-value has *non-informational* functions or not”.

Apparently, you agree with this: ”I agree with you that bulk DNA speculations are the only way to avoid the conclusions that most of our genome is junk”. So, it would make sense to write some posts on these theories, and becoming familiar with them would help.

Larry Moran said...

I am perfectly familiar with all the bulk DNA speculations. Just because I agreed with you that they are possible ways of explaining large genomes does not mean that they are correct. In fact, most of them are just plain silly and the others are so vague and imprecise that it's difficult to tell what the authors actually mean.

Claudiu, your speculations fall mostly into the first category (silly) but they are also confusing and vague. It's because your ideas are so confusing that you are getting questions whenever you post comments.

Your refusal to answer those questions is becoming both boring and annoying. Many of us have read your manuscripts and we still don't know what you mean. Answer the questions or shut up.

Claudiu Bandea said...

@ Joe Felsenstein,

Thanks for keeping an open mind regarding my theory that the so called ‘junk DNA’ (jDNA) serves as a protective mechanism against deleterious insertion mutagenesis.

As stated in my original and more recent papers, as well as on this thread (please see above), it is possible that the so called ‘junk DNA’ (jDNA) is simply the result of a mutational imbalance, favoring its amplification versus deletion (parenthetically, this might be a genuine example of Masatoshi Nei’s ‘mutation driven evolution’). Therefore, selection is not essential to explain the protective function of the so called jDNA.

That being said, I think there is plenty of selection involved in shaping the genome evolution and the accumulation of the so called jDNA. First of all, as emphasized in my model, there is very strong selection on the location where transposable elements can insert and the overall quantity of so called jDNA.

Evidently, most insertions in the informational DNA (iDNA) are deleterious and, therefore, the hosts carrying them are eliminated by selection. And, the same happens with most insertions in specific non-informational genomic DNA sequences (niDNA) that have constraints on length or overall sequence composition.

However, insertions in niDNA that have no such constraints [i.e. those acting as protective or symbiotic DNA (sDNA)] can survive evolutionarily along with their hosts. Nevertheless, as discussed in these papers, there are limits on how much of this sDNA can accumulate.

For example, in hosts with high ‘metabolic and energetic constraints,’ such as bacteria, only those individual organisms with limited amounts of jDNA can survive evolutionarily; in these organisms, the high selective pressure imposed by insertion mutagenesis has led to the co-evolution of highly efficient protective mechanisms in the form of site specific integration. However, in many eukaryal organisms, including most multicellular species, the costs for maintaining these sequences are small compared to those associated with other organismal features, so the purifying selection against the accumulation of jDNA is relatively weak, at least up to a certain quantity.

Joe Felsenstein: “But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?”

I agree.

Joe Felsenstein said...

Sounds pretty much like the standard theory for the presence of jDNA, at least as far as I can see.

Claudiu Bandea said...

Larry,

Sorry to say it, but I agree with T. Ryan Gregory ( http://sandwalk.blogspot.com/2014/07/the-function-wars-part-ii.html):

”Sorry, Larry -- you know I appreciate your blog posts, but it's clear that you didn't understand the paper(s) you criticize”

However, as promised, here are the answers to your 3 questions:

1. Why don't bacteria bulk up their genomes to get protection?

Due to the high metabolic and energetic constraints associated with increasing their genome size, in the context of a high reproductive rate and large populations, bacteria have co-evolved other protective mechanisms against deleterious insertion mutagenesis, such as specific sites of integration, which consist of relatively short sequences.

2. How does your speculation fare in The Onion Test?

The amount of protective or symbiotic DNA (usually referred to as ‘junk DNA’) as an adaptive defense system (as in *adaptive immunity*; e.g. see the CRISPR/Cas *adaptive immunity* system in bacteria) varies from one species to another (including various species of onions) based on the rate of its origin and deletion, insertional activity, and evolutionary constraints on genome size. More specifically, if a certain species of onion is exposed to high insertional activity by endogenous or exogenous viral elements as compared to other species, including other species of onions, then its genome would increase in size until the insertional activity levels off or until the genome size becomes a metabolic or physiological burden.

3. How does creating a genome with dozens of active transposons that survive a million years of evolution count as "protection"?

Well, you exist don’t you?

I doubt that you agree with these answers, which is fine with me. But I hope you’ll have the confidence to specifically address the following points:

Your stance on the so called ‘junk DNA’ is as clear as it can be: the so called ‘junk DNA’ is the result of genetic drift and neutral evolution, and natural selection has nothing to do with it (yea or nay?).

You are also clear that the products of genetic drift and neutral evolution can have biological functions, can’t they?

So, if we consider that jDNA is the product of genetic drift and neutral evolution, why can’t it provide a protective biological function against deleterious insertional mutagenesis?

Georgi Marinov said...

Sigh...

Let's repeat once again:

As stated in my original and more recent papers, as well as on this thread (please see above), it is possible that the so called ‘junk DNA’ (jDNA) is simply the result of a mutational imbalance, favoring its amplification versus deletion (parenthetically, this might be a genuine example of Masatoshi Nei’s ‘mutation driven evolution’). Therefore, selection is not essential to explain the protective function of the so called jDNA.

If something has a function then it is subject to selection, most of the time purifying.

That being said, I think there is plenty of selection involved in shaping the genome evolution and the accumulation of the so called jDNA. First of all, as emphasized in my model, there is very strong selection on the location where transposable elements can insert and the overall quantity of so called jDNA.

You keep repeating this, without backing it up with anything, and I keep showing you (with numbers) how it is nonsense. And then you repeat it again....

Evidently, most insertions in the informational DNA (iDNA) are deleterious and, therefore, the hosts carrying them are eliminated by selection. And, the same happens with most insertions in specific non-informational genomic DNA sequences (niDNA) that have constraints on length or overall sequence composition.

How many examples of intergenic "non-informational" DNA with very tight constraints on its length can you cite? It's not enough to just posit the existence of a phenomenon that is very important for the theory you like and then use it in support of it; it has to actually exist.

Joe Felsenstein: “But if we try to calculate selection coefficients favoring retaining a particular 100-base piece of jDNA as a result of this protective effect, it will be far smaller. Won't we get a number so small even I will agree that the deletion of that piece of junk DNA is neutral?”

I agree.


You agreed with an argument that invalidates your hypothesis...

Georgi Marinov said...

So, if we consider that jDNA is the product of genetic drift and neutral evolution, why can’t it provide a protective biological function against deleterious insertional mutagenesis?

Let's repeat this one once again too.

If the existence of junk DNA can be explained entirely by drift and neutral evolution, then there is no need to add an additional explanatory layer of complexity.

the products of genetic drift and neutral evolution can have biological functions

There is something called constructive neutral evolution. It happens and it happens a lot. But the products of constructive neutral evolution are locked in their irreducibly complex state and maintained by purifying selection.

We have nothing of the sort here because there is neither a conceivable mechanism through which purifying selection could maintain the size of the human genome nor is there a need to invoke it.

Claudiu Bandea said...

@ Georgi

According to my model on genome evolution in organisms with relatively high C-values, such as humans, the so called “junk DNA” provides a protective mechanism against insertional mutagenesis and, therefore, it fulfils the definition of symbiotic DNA (sDNA). As I said before, if you don’t agree with this theory, that’s fine.

However, since you are a member of the ENCODE project, it would be interesting to know how much of the human genome you believe is junk DNA.

Claudiu Bandea said...

Laurence A. Moran: “Claudiu, your speculations fall mostly into the first category (*silly*) but they are also *confusing and vague*”

Larry,

Let me remind you about Ewan Birney’s answer when asked if his perspective that 80% of the human genome is functional passes the ‘onion test’ or not. He said that the ‘onion test’ is *silly*, and he set it aside as an irrelevant issue ( http://www.genomicron.evolverzone.com/2012/09/birney-thinks-onion-test-silly/). Highly disingenuous, isn’t it?

And why is my paradigm that the vast majority (approximately 90%) of the human genome, which I label symbiotic DNA (sDNA), provides a defense mechanism against deleterious insertional mutagenesis by endogenous and exogenous inserting elements, such as retroviruses, *confusing and vague*?

Is it because you and some of the other commenters here (see above) cannot conceive that sDNA could have originated and been maintained not by natural selection but by other evolutionary forces, such as genetic drift?

Why do you think that an organismal feature that has originated and is being maintained by genetic drift and neutral evolution cannot have a biological function?

Pierre-Luc Germain said...

Dear Prof. Moran,
I hope that despite being late I might get an answer... I'll try to be concise and avoid "tedious and almost incomprehensible" discussions.
You propose that:
"A good working definition of “biological function” is to consider a particular stretch of DNA functional if deleting it affects the survival of the organism or its descendants. Conversely, if the DNA can be removed without consequences then it is probably junk."
Your working definition is a-historical, whereas most evolutionary biologists would probably agree that junk DNA is first and foremost a historical concept, i.e. that this DNA is there NOT because of natural selection (at the level of the organism) acting on it, but for other reasons. But let's leave that aside for the moment. Could you please provide a working definition of "affecting the survival of human beings or their descendants"?
The issue we tried to emphasize in our paper is that any difference in genotype makes a difference to the organism, and that although many such differences are irrelevant, fitness is not a panacea for sorting them out. A clear example is that there are several diseases that do not significantly affect our lifespan, nor our capacity to reproduce. Of course you could claim that the medicalized environment we live in is not our "normal environment", but unless you can provide non-historical reasons for this, there are plenty of potential environments to look at (e.g. does this genetic variation affect 70-year-old men and women's capacity to climb Kilimanjaro?). What matters, we thought, were differences we actually care about, in our current environment.
So to get to the core of the issue: If we deleted the 50% of the human genome you're most certain is non-functional, are you sure you would notice no difference in any phenomena we care about, from drug side effects to aging or risk for ASD? I think we don't know, and that's the substantive part of this debate.
Now, of course none of this has to do with junk DNA historically understood -- for which many authors, such as Gregory, have made a powerful claim. And I think it's precisely a problem of this debate that the two questions systematically tend to get conflated.
The concept of function might be doing more harm than good in biology.

Joe Felsenstein said...

Larry will, I hope, answer for himself. But as an evolutionary biologist I would almost endorse his definition, with one exception. I would call a sequence functional if deleting it reduces the fitness of the organism. Note that

1. This means a transposon insertion may be deemed nonfunctional if deleting it increases the fitness of the organism. If I weld a bunch of junk to the front of my car, it may reduce the speed of the car, and deleting it would increase the speed of my car. Nevertheless that junk is nonfunctional.

2. There seems to be a consensus that conserved sequence is the gold standard for "function". In such cases deleting the sequence would reduce fitness. But not necessarily noticeably. In evolution a mutation that reduces fitness can be effectively selected against if the selection coefficient is greater than 1/N. For most organisms that is so small a number that we would not notice the change in the laboratory. Nature does a longer and bigger experiment than we can.

3. Notice that I wrote "fitness", not "survival". A sequence could affect fertility but not viability, and that seems to have escaped attention here.

4. If you delete all those transposon copies that have a negative effect on the fitness, the resulting increase in fitness might be noticeable. But deleting each one individually might not lead to a noticeable improvement.

Whether the definition of functional that I am backing is of much use is to be doubted, since we will not easily be able to assess it.
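
Point 2 above can be illustrated with a small sketch in Python. It is only a rough illustration: the 1/N threshold is the rule of thumb stated in the comment, while the population size and the laboratory detection limit used below are made-up values chosen for the example, and the function names are hypothetical.

```python
def effectively_selected(s, n):
    """Rule of thumb from the comment: selection acts effectively against
    a deleterious mutation when its selection coefficient exceeds 1/N."""
    return abs(s) > 1.0 / n


def selected_but_unnoticed(s, n, lab_detection_limit):
    """True when nature can select against the deletion but a laboratory
    assay with the given (assumed) detection limit would not see it."""
    return effectively_selected(s, n) and abs(s) < lab_detection_limit


N = 1_000_000        # hypothetical population size
LAB_LIMIT = 0.01     # hypothetical smallest fitness effect measurable in the lab

for s in (1e-7, 1e-4, 0.05):
    print(f"s = {s:g}: effectively selected = {effectively_selected(s, N)}, "
          f"selected but unnoticed in the lab = "
          f"{selected_but_unnoticed(s, N, LAB_LIMIT)}")
```

With these illustrative numbers, a deletion with s = 1e-4 is visible to selection but would go unnoticed in the laboratory, which is the gap the comment describes.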

Larry Moran said...

Thank you for responding. I'm hoping to get back to blogging within a few days and the next post on the Function Wars will address my concerns with your paper.

My point is that quibbling about the exact meanings of terms like "function" or "junk" is unlikely to be productive. I'm making this point by quibbling. :-)

Similarly, when we use terms like "affecting the survival of an organism or its descendants" I'm hoping that people will appreciate the sense of the phrase rather than insist on a precise meaning. Few terms in biology can stand up to quibbling. I doubt very much that you misunderstood my meaning.

The issue we tried to emphasize in our paper is that any difference in genotype makes a difference to the organism, and that although many such differences are irrelevant, fitness is not a panacea for sorting them out.

It cannot possibly be true that "any difference in genotype makes a difference to the organism." At least, not true in any biologically relevant way.

It's also true, in my opinion, that fitness is not a panacea for determining function. I gave some examples in my posts. Nevertheless, the value of a working definition is that it applies in the majority of cases and exceptions are just that, ... exceptions.

That's the best we can do. By the way, you and your colleagues focus on diseases but genetic diseases are not a normal function of the genome. They are mutations.

If we deleted the 50% of the human genome you're most certain is non-functional, are you sure you would notice no difference in any phenomena we care about, from drug side effects to aging or risk for ASD?

Yes, I'm pretty sure that there would be no effect on drug side effects or risk for ASD. But the real issue concerns the burden of proof.

I think we don't know, and that's the substantive part of this debate.

There's a sense in which "we don't know" is meaningful but this isn't one of those cases. The evidence for junk is not something that you can ignore in this debate. The burden of proof is on those who claim that most of our genome is functional. Part of that burden requires that you refute the evidence. Here's a post on Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

Pierre-Luc Germain said...

Thanks for your answer, and I'm very much looking forward to having more feedback.

Our aim with the paper was twofold: a) to show that the SE account is highly problematic and that the CR account shouldn't be dismissed so quickly, and b) that beyond semantics there's also an empirical disagreement at play.

Regarding a), which is probably what you call quibbling (granted, contemporary "professional" philosophy has brought quibbling to quite a level, but in my opinion philosophy should be more a critical than a positive enterprise), I think there are at least 3 important values in such quibbling:
1) from a didactic point of view, something such as this controversy is a most excellent way for students and scientists to think about a range of questions and considerations;
2) challenging concepts and arguments often leads to their improvement (for instance I think a little quibbling could improve your interesting "five-things-you-should-know");
3) it reminds us that some of the concepts we're working with (and your positions on functions are roughly the orthodoxy) are just that: perhaps useful but problematic working concepts (about functions I even have doubts on the "useful" bit).

Concerning b), I'm referring to the question I asked you about deleting. I'd be very curious to see a poll about this, because I'm not sure what proportion of biologists share your intuition. You may be right in saying that the burden of the proof shouldn't be equally distributed, but the allegedly overwhelming evidence you refer to isn't actually solving that question because it's tied to fitness (and my question wasn't). After a few cycles of quibbling-and-reformulation, I wouldn't deny the core of your "five-things-you-should-know", and yet it doesn't answer my question. Genetic load, for instance, tells us that only a small percentage of our genome contains critical information, but does it tell us whether the rest can make a small difference in, say, old age?
I think it'd be constructive to consider for a moment that ENCODE was claiming to provide evidence that much of the rest of DNA did make such a difference, and to assess this claim.
Pierre-Luc

judmarc said...

but does it tell us whether the rest can make a small difference in, say, old age?

These hypothetical sequences wouldn't be retained in the genome on the basis of making a small difference in old age. Any beneficial effect invisible to selection would be accidental, as would any deleterious effect invisible to selection. How interested are we in whether we have some small number of such "happy accidents" to comfort us in our old age (I cannot imagine there would be a lot of them), that will pass randomly out of the genome since selection won't act to retain them?