
Sunday, April 29, 2007

Noncoding DNA and Junk DNA

Scientific American has published another short note on junk DNA [Jumping 'Junk' DNA May Fuel Mammalian Evolution]. RPM noticed that there was no reference to the actual study being quoted in the article, so it wasn't possible to verify the accuracy of the reporting [Junk DNA in Scientific American]. I couldn't find it either when I looked last week, but it has now appeared on the PNAS website [Thousands of human mobile element fragments undergo strong purifying selection near developmental genes]. RPM also complained about the overuse of the term "junk DNA" in the Scientific American article. That's what I want to discuss.

The author of the Scientific American article, JR Minkel, has responded on the Scientific American website [The DNA Formerly Known as Junk]. Minkel is a science writer who has covered a lot of stories in many different fields. As far as I know, Minkel had not written very much about biology before summarizing the work in the PNAS paper. There was a time when all the science in that journal was written by scientists who were experts in the field [The Demise of Scientific American]. Anyway, that's not the main point here. JR Minkel has listened to the critics and made a decision to avoid the term "junk DNA" from now on.

That's a bad decision. RPM never asked anyone to avoid the word "junk." He merely called for appropriate use. Ryan Gregory has serious doubts about the usefulness of the term as he explains in his excellent article A word about "junk DNA". If you want to keep up with the discussion about junk DNA you need to read that article, but you don't need to agree with everything in it. :-)

Gregory has also commented on the Scientific American article by proposing a new term, Junctional DNA, to describe DNA that probably has a function but that function isn't known. According to him, this avoids the confusion between using "junk" DNA to describe DNA that we really know to be junk (pseudogenes) and DNA for which no function has been discovered so we assume it has none.

I think we don't need to go there. It's sufficient to remind people that lots of DNA outside of genes has a function and these functions have been known for decades. Thus, it is highly inappropriate to assume that all non-genic DNA is junk and no scientist should ever do this. Note that I'm avoiding the term "noncoding" DNA here. This is because to me the term "coding DNA" only refers to the coding region of a gene that encodes a protein. Thus, in my mind, there are many genes for RNAs that are not properly called coding regions so they would fall into the noncoding DNA category. Also, introns in eukaryotic genomes would be "noncoding DNA" as far as I'm concerned. I think that Ryan Gregory and others use the term "noncoding DNA" to refer to all DNA that's not part of a gene instead of all DNA that's not part of the coding region of a protein encoding gene. I'm not certain of this.

The importance of the term "junk DNA" is that it highlights the fact that this DNA has not evolved by natural selection. This is a point I made in one of my first blog postings way back in November [Bill Dembski Needs Help, Again] and again a few days later [The IDiots Don't Understand Junk DNA] [Two Kooks in a Pod].

This isn't original. Everyone knows that junk DNA poses a major threat to both Intelligent Design Creationism and adaptationism [Junk DNA Disproves Intelligent Design Creationism] [Evolution by Accident]. Read Gregory's article for the short concise version of this dispute. What it means is that junk DNA threatens the worldviews of both Dembski and Dawkins!

Science writers often get trapped into thinking like an adaptationist when it comes to junk DNA. Remember that according to the adaptationist worldview the existence of huge amounts of truly nonfunctional DNA in a genome must be a problem. It can't be explained if natural selection is a powerful driving force behind most of evolution. You can't propose that all minor changes in behavioral genes, for example, have been selected and then turn around and admit that 95% of the human genome is junk!

Adaptationists celebrate every discovery that some little bit of DNA has found a function. That's because in their heart of hearts they think that almost all of the junk DNA will eventually be found to have a function. This is one of the reasons why papers like the PNAS paper mentioned above get so much attention.

I want to keep the term "junk DNA" to refer to all functionless DNA. That includes DNA for which we have direct and indirect evidence of no function (pseudogenes, most of intron DNA, corrupted transposons etc.) and it also includes the rest of the DNA for which no function has currently been discovered and we think it's junk because it's not conserved (among other reasons). Junk DNA is not noncoding DNA and anyone who claims otherwise just doesn't know what they're talking about.

The term "junk DNA" forces people to think about the underlying causes of evolution. It makes them stop to appreciate the fact that modern organisms could have evolved with useless DNA in their genomes and the only way this could have happened is if there's a lot more to evolution than just natural selection and adaptation. It's a good term. It's an accurate term. It's a useful term. And it makes people think.

10 comments:

T Ryan Gregory said...

Ryan Gregory has serious doubts about the usefulness of the term as he explains in his excellent article A word about "junk DNA".

Just to clarify, I think the term could be useful -- indeed, it was useful when Ohno coined it. The problem is that it is seldom used in an appropriate way. If the meaning were specified explicitly to be "regions strongly suspected of being non-functional with evidence to back it up" (which, incidentally, is not the original definition according to Ohno (1972) or Comings (1972)), and if people used it only in this way, then I would not have a problem with this. But given the difficulty that people seem to have in accepting that some DNA may truly not have a function at the organism level, I don't know if we could ever get it to be used with such precision.

...a new term, Junctional DNA, to describe DNA that probably has a function but that function isn't known... I think we don't need to go there. It's sufficient to remind people that lots of DNA outside of genes has a function and these functions have been known for decades.

That neologism was suggested in response to Minkel's appeal for a term that would "make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience". My main suggestion was to call DNA by what it is known to be, if at all possible, by function ("regulatory DNA", "structural DNA") or by type ("pseudogene", "transposable element", "intron"). Your definition of "junk DNA" is also more precise than most usages, meaning that you specify that the term only be applied to sequences for which there is evidence (not just assumption) of non-function. That leaves us with something in between for journalists to talk about with a catchy buzzword. "Junctional DNA" lets them specify that we're not talking about "junk DNA" or "functional DNA" -- i.e., there is some evidence for function (e.g., being conserved) but no evidence of what that function is. The main utility would be to stop the very frustrating leap that gets made from "this 1% of the genome may have a function, so the whole thing must have this function" kind of reporting. Now they could say "another 1% has moved into the category of 'junctional DNA'". I think that would be considerably less misleading than current wording.

Note that I'm avoiding the term "noncoding" DNA here. This is because to me the term "coding DNA" only refers to the coding region of a gene that encodes a protein ... there are many genes for RNAs that are not properly called coding regions so they would fall into the noncoding DNA category ... introns in eukaryotic genomes would be "noncoding DNA" as far as I'm concerned. I think that Ryan Gregory and others use the term "noncoding DNA" to refer to all DNA that's not part of a gene instead of all DNA that's not part of the coding region of a protein encoding gene. I'm not certain of this.

By definition, non-coding DNA is, and always has been, everything other than exons. The reason this is relevant is that early work in genome biology assumed that there should be a 1 to 1 correspondence between DNA content and protein-coding gene number. This is work that occurred for at least two decades before the discovery of introns, pseudogenes, and other non-coding DNA. Now we have more descriptive names for the categories of DNA that are not the genes, all the genes, and nothing but the genes. I actually don't know of anyone else who would have a problem calling introns, pseudogenes, and regulatory regions "non-coding DNA". Certainly, Ohno, Crick, and many others have historically put introns in the same non-protein-coding grouping as pseudogenes. It's just a category -- you also have more specific subcategories to apply to each of the types of non-coding DNA. Perhaps your objection relates to an undue emphasis on the distinction between exons and everything else -- well, that's the history of the past half century of this field, so it should be no surprise that the terminology reflects this.

Read Gregory's article for the short concise version of this dispute. What it means is that junk DNA threatens the worldviews of both Dembski and Dawkins!

Not quite. What you're leaving out of this is the possibility of multiple levels of selection. In the original edition of The Selfish Gene (1976, p.76), Dawkins argued that "the simplest way to explain the surplus DNA is to suppose that it is a parasite, or at best a harmless but useless passenger, hitching a ride in the survival machines created by the other DNA". Cavalier-Smith (1977) drew a similar conclusion (before he had read Dawkins), and Doolittle and Sapienza (1980) and Orgel and Crick (1980) [yes, that Crick] independently developed the concept of "selfish DNA" a few years later. This is an explicitly multi-level selection approach because it specifies that non-coding DNA can be present due to selection within the genome rather than exclusively on the organism (or gene, in Dawkins's case) (see, e.g., Gregory 2004, 2005). (Incidentally, this idea of parasitic DNA dates back at least to 1945, when Gunnar Östergren characterized B chromosomes in this fashion). Of course, they tended to do what Ohno did and applied this one idea to all non-coding DNA, which is too ambitious. The modern view is more pluralistic (see, e.g., Pagel and Johnstone 1992 vs. Gregory 2003). Some non-coding DNA is just accumulated "junk" (in the definition of evidence-supported non-function that you espouse). Some (perhaps most) is "selfish" or "parasitic" and persists because there is selection within the genome as well as on organisms (in fact, an argument could be, and has been, made that "selfish DNA" would be a much more accurate term than "junk DNA" for most non-coding DNA). Some non-coding DNA is clearly functional at the organism level, including regulatory regions and chromosome structure components. 
Some of these latter functional non-coding DNA sequences are derived from elements that originally were of one of the first two types, most notably transposable elements that take on a regulatory function through co-option (or, in another manner of thinking, that undergo a shift in level of selection).

Junk DNA is not noncoding DNA and anyone who claims otherwise just doesn't know what they're talking about.

I'm afraid I don't follow what you mean here. By your definition, "junk DNA" is any non-functional sequence of DNA, including pseudogenes (i.e., the original meaning). Those sequences do not encode proteins. Hence, your version of junk DNA is non-coding. I think this reflects the confusion that is imposed by the term "junk DNA", which is why I generally think it is more obfuscating than enlightening.


________

References

Cavalier-Smith, T. 1977. Visualising jumping genes. Nature 270: 10-12.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Gregory, T.R. 2003. Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biological Journal of the Linnean Society 79: 329-339.

Gregory, T.R. 2004. Macroevolution, hierarchy theory, and the C-value enigma. Paleobiology 30: 179-202.

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Ohno, S. 1972. So much "junk" DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Östergren, G. 1945. Parasitic nature of extra fragment chromosomes. Botaniska Notiser 2: 157-163.

Pagel, M. and R.A. Johnstone. 1992. Variation across species in the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox. Proceedings of the Royal Society of London, Series B: Biological Sciences 249: 119-124.

SPARC said...

"junctional DNA"
I doubt that it is useful to introduce this term, since "junctional DNA" is sometimes used for certain elements of EBV. In addition, the term "junctional sequences" is occasionally used for the description of parts of rearranged TCR and Ig genes as well as in reference to translocations. Just my 2 cents.

T Ryan Gregory said...

I doubt that it is useful to introduce this term, since "junctional DNA" is sometimes used for certain elements of EBV. In addition, the term "junctional sequences" is occasionally used for the description of parts of rearranged TCR and Ig genes as well as in reference to translocations.

Fair enough, and I did consider that it has (minor) prior uses, but if you do a Google search for "junctional DNA", the top reference is to my post. Therefore I don't think using this term in popular reporting (which is the context in which it was suggested) will be confusing to most people. Lots of terms have taken on broader meanings (for better or for worse), "junk DNA" and "satellite DNA" being notable examples.

Nick (Matzke) said...

That post by Ryan Gregory is great. However, the term "junctional" will probably not work, since to most readers it will imply "junction" or "joining", something different than what is intended.

T Ryan Gregory said...

..."junctional" will probably not work, since to most readers it will imply "junction" or "joining", something different than what is intended.

Actually, "joining" is part of the intended meaning, in that it is "an indication that the sequences so described reside at the crossroads [or 'junction'] between DNA with no evident function and that with a clear function". As Larry argues, if there is evidence of non-function it is "junk" in the sense commonly used. If it has a clear function then it should be labeled according to the function. In the middle, joining these two ends of the spectrum, is something else -- in the present term, "junctional DNA". It also represents a joining of two levels of selection in many instances. And finally, it may involve the joining of former parasites with the rest of the functional portion of the genomes of their hosts.

If "junctional DNA" won't work, then other proposed terms include "funk DNA" (Petsko 2003) or "dark DNA" (Carroll 2005), though both of these seem to be applied to all non-coding DNA and not the "gray area" between non-functional and functional.

Similarly, if "junk DNA" is too loaded or usually used too loosely to be invoked strictly in reference to sequences for which there is convincing evidence of non-function, then there are many alternatives that have been proposed. I listed 16 terms that have been used in the past in my original discussion.

__________

References

Carroll, S.B. 2005. Endless Forms Most Beautiful. W.W. Norton & Co., New York.

Petsko, G.A. 2003. Funky, not junky. Genome Biology 4: 104.

Larry Moran said...

I guess I didn't make myself clear about "noncoding DNA." To me it doesn't cover all the genes for ribosomal RNAs, tRNAs, and the various small RNAs. What term do you use for those genes?

When I said that junk DNA is not noncoding DNA I meant that the term "junk DNA" should not be used as a synonym for "noncoding DNA." We have examples of such misuse in the scientific literature.

By definition, non-coding DNA is, and always has been, everything other than exons.

Not to be picky but do you really equate "exons" and "coding DNA"? I hope not because there are lots of exons that don't encode polypeptides.

To me, it's the term "noncoding DNA" that should be banished from polite conversation. It has a precise meaning but not a very useful one.

I think most intron sequences are junk. Do you agree?

T Ryan Gregory said...

I guess I didn't make myself clear about "noncoding DNA." To me it doesn't cover all the genes for ribosomal RNAs, tRNAs, and the various small RNAs. What term do you use for those genes?

I think "noncoding DNA" could easily be modified to mean anything without a coding function at the organism level (otherwise LINEs would qualify too, for example), which would then include ribosomal RNA genes and other such elements (although not all copies in the rRNA array are transcribed).

When I said that junk DNA is not noncoding DNA I meant that the term "junk DNA" should not be used as a synonym for "noncoding DNA." We have examples of such misuse in the scientific literature.

Agreed, although it's usually a problem that people use "junk DNA" when they mean "noncoding DNA" -- i.e., they assume non-function with no evidence (or worse, they use a demonstration of some function in a small portion of the DNA to overthrow the idea that the majority could be "junk").

Not to be picky but do you really equate "exons" and "coding DNA"? I hope not because there are lots of exons that don't encode polypeptides.

The traditional use of the term is shorthand for "non-protein-coding DNA". The reason that protein-coding sequences (recognized as exons since Gilbert 1978) get top billing is historical rather than biologically realistic, but that's where the term came from.

To me, it's the term "noncoding DNA" that should be banished from polite conversation. It has a precise meaning but not a very useful one.

I would argue that we need the general term that does not carry implications about function or source or mode of insertion/deletion, and then whenever possible we should refer to non-coding sequences with more descriptive terms. "Junk DNA", as you have noted, applies to a subset of non-coding DNA, namely that with no function (or, in Ohno's initial use, pseudogenes). However, for most non-coding DNA we have little conclusive evidence either way as to function/non-function. Non-coding DNA could have a function that is independent of sequence, which would mean that lack of conservation is not evidence against function. If anything, I see too many appeals to function in this way. We do know that total DNA amount has effects (e.g., on cell size, cell division), but whether this qualifies as function is a subject of disagreement in the literature.

I think most intron sequences are junk. Do you agree?

Are all introns non-functional? I don't know. It's possible that some introns play a role in alternative splicing and other processes which could qualify as functions. There is also a question of how much non-coding DNA is intronic, as some authors have suggested that most of it is in animals (but not in plants). Again, I would consider myself agnostic on this because the evidence is not convincing either way yet.

Anonymous said...

What I gather from the conversation -

Shorthand: All junk (non-functional) DNA is non-coding, but not all non-coding DNA is junk (non-functional).

Further shorthand: We are uncertain about the functionality (whether it exists, what it is) of much non-coding DNA.

Gloss on further shorthand: Like many mysteries, the fact that there's uncertainty about the functionality of much non-coding DNA has led to speculation about its origin and purpose, both from those with non-scientific agendas (Dembski, et al.) and those with scientific hypotheses (if one wishes to take a positive view) or scientific agendas (if one wishes to take a less positive view), such as Dawkins.

Anonymous said...

Not sure why you say noncoding DNA is a problem for Dawkins specifically. Wasn't a major criticism of The Selfish Gene that he took an extreme gene's-eye-view of the unit of selection, despite the importance of integrated whole organisms to real-life evolution? If DNA sequences are viewed as the replicators of interest, then parasitic selfish "functionless" "genes" seem perfectly reasonable.
As for adaptationism, this seems something of a straw man. Is anybody really so vehement an adaptationist as to argue that selection must explain every freaking nucleotide sequence in a genome? Most adaptationists I know (and I am probably one, by your definition) are talking about phenotypic traits at least, and integrated organisms (with inherent trade-offs) most of the time.

T Ryan Gregory said...

As for adaptationism, this seems something of a straw man. Is anybody really so vehement an adaptationist as to argue that selection must explain every freaking nucleotide sequence in a genome?

Overall, I encounter far more claims of a universal adaptive function for non-coding DNA than claims of it all being useless junk. Since Comings (1972) there has often been a general assumption that non-coding DNA must be doing something for the organism or it would have been eliminated by natural selection long ago. This is rarely thought of on the scale of individual nucleotides (because most proposed functions are independent of sequence), but rather as an aggregate role in structure or regulation or buffering against mutations or some other such thing. See here and here.