Friday, May 09, 2014

The Case for Junk DNA: The onion test

I draw your attention to a new paper on junk DNA by my friends Alex Palazzo and Ryan Gregory (Palazzo and Gregory, 2014).

You should read this paper if you want a nice summary of the evidence for a high percentage of junk in our genome. They cover genetic load, sequence conservation, and the evidence from the genome sequence itself. There's a brief description of the nearly-neutral theory of molecular evolution and why it's relevant to the debate.1

One of the most important contributions is an explanation of the C-Value Paradox and the Onion Test. The Onion Test was originally published on Ryan's blog (The onion test), but some people won't reference blog posts, so here it is in a peer-reviewed paper.
There are several key points to be understood regarding genome size diversity among eukaryotes and its relationship to the concept of junk DNA. First, genome size varies enormously among species [18], [19]: at least 7,000-fold among animals and 350-fold even within vertebrates. Second, genome size varies independently of intuitive notions of organism complexity or presumed number of protein-coding genes (Figure 1). For example, a human genome contains eight times more DNA than that of a pufferfish but is 40 times smaller than that of a lungfish. Third, organisms that have very large genomes are not few in number or outliers—for example, of the >200 salamander genomes analyzed thus far, all are between four and 35 times larger than the human genome [18]. Fourth, even closely related species with very similar biological properties and the same ploidy level can differ significantly in genome size.

These observations pose an important challenge to any claim that most eukaryotic DNA is functional at the organism level. This logic is perhaps best illustrated by invoking “the onion test” [20]. The domestic onion, Allium cepa, is a diploid plant (2n = 16) with a haploid genome size of roughly 16 billion base pairs (16 Gbp), or about five times larger than humans. Although any number of species with large genomes could be chosen for such a comparison, the onion test simply asks: if most eukaryotic DNA is functional at the organism level, be it for gene regulation, protection against mutations, maintenance of chromosome structure, or any other such role, then why does an onion require five times more of it than a human? Importantly, the comparison is not restricted to onions versus humans. It could as easily be between pufferfish and lungfish, which differ by ~350-fold, or members of the genus Allium, which have more than a 4-fold range in genome size that is not the result of polyploidy [21].

In summary, the notion that the majority of eukaryotic noncoding DNA is functional is very difficult to reconcile with the massive diversity in genome size observed among species, including among some closely related taxa. The onion test is merely a restatement of this issue, which has been well known to genome biologists for many decades [18].
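
As a quick sanity check on the ratios quoted above, here is a minimal Python sketch. It uses only the figures in the quoted passage (human = 8x pufferfish, lungfish = 40x human, onion = about 5x human, onion = 16 Gbp). The dictionary and function names are my own, and the absolute sizes it prints are merely what those rounded ratios imply, not measured values.

```python
# Genome sizes relative to human (= 1.0), taken from the ratios quoted above.
RELATIVE_TO_HUMAN = {
    "pufferfish": 1 / 8,  # human has eight times more DNA than pufferfish
    "human": 1.0,
    "onion": 5.0,         # onion is about five times larger than human
    "lungfish": 40.0,     # human is 40 times smaller than lungfish
}

ONION_GBP = 16.0  # haploid genome size quoted for Allium cepa


def fold_difference(a, b):
    """Return how many times larger genome `a` is than genome `b`."""
    return RELATIVE_TO_HUMAN[a] / RELATIVE_TO_HUMAN[b]


def implied_size_gbp(species):
    """Convert a relative size to Gbp by anchoring on the 16 Gbp onion figure."""
    return ONION_GBP * RELATIVE_TO_HUMAN[species] / RELATIVE_TO_HUMAN["onion"]


if __name__ == "__main__":
    for name in RELATIVE_TO_HUMAN:
        print(f"{name:>10}: ~{implied_size_gbp(name):.1f} Gbp")
    print("lungfish vs pufferfish:", fold_difference("lungfish", "pufferfish"))
```

The rounded ratios give a lungfish-to-pufferfish difference of 320-fold, in line with the "~350-fold" figure in the quote, and an implied human genome of roughly 3.2 Gbp.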


1. A little birdy tells me that there's a "better" paper coming out in a few months.

Palazzo, A. and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics (published May 8, 2014) [doi: 10.1371/journal.pgen.1004351]

24 comments:

  1. It seems to me that even though the ENCODE people partially retracted their careless statements in a recent paper, a good way to really convince the no-junk people would be to take a large section of the lungfish genome and subject it to all the biochemical assays ENCODE used. My guess is that you'd see the same density of TF binding, low-level RNA production, and epigenetic tags that you see throughout the human genome. This would strongly suggest that nonfunctional DNA can have all of these biochemical signatures.
    RodW

    1. You need to sequence that genome first. And that's still impossible.

    2. Why? Because of all the repetitive elements? Even sequencing one large contig would still make the point.

    3. How exactly are you going to do ChIP-seq, RNA-seq and everything else on contigs alone?

      The source of the reads is not the contigs - it's the whole genome.

      Also, in order to interpret something like lungfish, we will need to have a really good grasp of its current ploidy, WGD history, etc.

      It will be sequenced eventually, but it will take a few more years of technology development.

    4. Maybe Drosophila species that have integrated different amounts of Wolbachia, naturally or experimentally, could be a start. What would be the prediction for the complete Wolbachia genome in Drosophila ananassae? Or, as I suggested elsewhere: just take a completely sequenced E. coli genome, put it into one of the cell lines ENCODE used, and re-run the analysis.

    5. Actually an even better experiment was done by Mike White -
      http://www.homolog.us/blogs/blog/2013/07/17/random-dna-sequence-mimics-encode/

  2. "a good way to really convince the no-junk people would be to take a large section of the lungfish genome and subject it to all the biochemical assays ENCODE used."

    Or perhaps onions - more readily available and easier to keep on hand than lungfish, I assume.

  3. I wonder if there are enough genome size measures at this point to use the data to try and construct a 'phylosize' tree?
    It's kind of a silly exercise but it might be useful in an instructional setting to show the results - that even if you get some kind of a tree, the tree makes no sense in terms of the relatedness we see in morphology or sequence similarity. This could be represented as a kind of negative control on phylogenetic trees - emphasizing the important point that one of the great strengths of phylogenetics is that DNA sequence and morphological trees are largely congruent, but that that is an emergent observation, not a necessary consequence of tree building.

    1. There are several studies tracing genome size on phylogenies for various groups, but I don't know of a nice awesome figure...it should be done!

  4. "if most eukaryotic DNA is functional at the organism level, be it for gene regulation, protection against mutations, maintenance of chromosome structure, or any other such role, then why does an onion require five times more of it than a human?"

    The wording of the onion test is unfortunate since it equivocates between "function" and "required". There are many examples of well-understood functional genes that are nevertheless not required in the can't-live-without-it manner implied by the onion test.

    1. If we are thinking of the same things, I don't agree with your interpretation. There are genes that are conserved over evolutionary time, which can be subjected to targeted knocking out in model organisms, in the lab environment.

      However, the fact that these genes were conserved over evolutionary time implies that sequence variations were deleterious and subject to purifying selection. This purifying selection shows that the functional gene product was required. The non-lethality of the "clean" knockout in the lab setting makes no predictions as to how a natural knockout would do outside the lab, over successive generations.

    2. The onion test doesn't talk about 'genes', only DNA. There is no mention of sequence conservation either. Properly phrased, without equivocating between "functional" and "required", the onion test should be something like: "if all DNA is functional, explain why the onion has 5 times more functionality than humans". Of course, once you remove the rhetorical equivocation, it could be postulated that onion DNA functionality is 5 times less efficient than human DNA functionality, as an explanation for why there is 5 times more DNA in onions.

    3. Genes (protein-rRNA-tRNA, etc) are just easier targets. Conservation of control elements or other genomic features would make a similar argument - but it is hard to imagine an element that has no effect upon knockout.

      And you are missing a key part of the onion test - the within onion variation. If your "less efficient" idea holds, how is A. altyncolicum 5x more efficient than A. ursinum, given the immense similarity in most of the existing DNA? The test is actually a good one at weeding out such musings.

    4. Whimple, your reverse logic would mean that bats, hummingbirds and fugu have far "more efficient" DNA than humans. Right. You can postulate anything you want, but can you make that testable?

  5. The onion test is basically an argument from ignorance. If we cannot know the reason for what appears to be an unnecessarily large genome size of the onion, just chalk it up to junk.

    After all, T. Ryan Gregory asks "...but why would the onion have such a large genome?"

    ..... it just has to be junk!

    More probably, it will be chalked up to the onion keeping a diary of all the cyclical mutations it has employed to successfully thwart countless parasite/virus attacks, and all the countless environmental changes over the years.

    What can you say, the onion has had a harder time of it than other organisms.

    Time for tea.

    1. [...] the onion keeping a diary of all the cyclical mutations it has employed [...]

      And how, even if true, is that not junk?

    2. Steve says,

      The onion test is basically an argument from ignorance. If we cannot know the reason for what appears to be an unnecessarily large genome size of the onion, just chalk it up to junk.

      Palazzo and Gregory (2014) say,

      Importantly, the concept of junk DNA was not based on ignorance about genomes. On the contrary, the term reflected KNOWN details about genome size variability, the mechanism of gene duplication and mutational degradation, and population genetics theory. Moreover, each of these observations and theoretical considerations remains valid. In this review, we examine several lines of evidence—both empirical and conceptual—that support the notion that a substantial percentage of the DNA in many eukaryotic genomes lacks an organism-level function and that the junk DNA concept remains viable post-ENCODE.

      Almost all of the recent blog posts, lectures, and publications from junk DNA proponents have emphasized that it's a gross misunderstanding to attribute junk DNA to an argument from ignorance. In fact, these same scientists have pointed out quite clearly that most of the OPPONENTS of junk DNA are much more guilty of arguing from a position of ignorance.

      Steve, you have obviously failed to read any of these papers so you remain ignorant of the positive arguments in favor of junk DNA. Fortunately for you, ignorance is curable.

    3. "The onion test is basically an argument from ignorance. If we cannot know the reason for what appears to be an unnecessarily large genome size of the onion, just chalk it up to junk."

      There's more than one species of onion. Genome sizes vary between them by as much as a factor of 5. The argument is not that onion genomes are "unnecessarily large"; it's that even among closely related onion species there are huge variations in genome size.

      Onions are not unique in this respect, there are other groupings with even larger variations.

      Once again we see another ID-creationist argument about junk-DNA resting on their ignorance about the subject and the history behind the concept. I don't know what kind of propaganda they all read, but there must be some kind of explanation for why multiple different ID-creationists continually turn up here with the same refuted misunderstanding, namely that junk-DNA is an argument from ignorance.

      Steve, what literature on the junk-DNA question did you read that made you think it was an argument from ignorance? Do you frequent Uncommon Descent? Is that where you got your understanding of the concept?

    4. Maybe Steve read Mattick's HUGO paper in which Mattick refers to the c-value paradox and the onion test in the following way:

      "The other substantive argument that bears on the issue, alluded to in the quotes that preface the Graur et al. article, and more explicitly discussed by Doolittle (Doolittle 2013), is the so-called ‘C-value enigma’, which refers to the fact that some organisms (like some amoebae, onions, some arthropods, and amphibians) have much more DNA per cell than humans, but cannot possibly be more developmentally or cognitively complex, implying that eukaryotic genomes can and do carry varying amounts of unnecessary baggage. That may be so, but the extent of such baggage in humans is unknown. However, where data is available, these upward exceptions appear to be due to polyploidy and/or varying transposon loads (of uncertain biological relevance), rather than an absolute increase in genetic complexity (Taft et al. 2007). Moreover, there is a broadly consistent rise in the amount of non-protein-coding intergenic and intronic DNA with developmental complexity, a relationship that proves nothing but which suggests an association that can only be falsified by downward exceptions, of which there are none known (Taft et al. 2007; Liu et al., 2013)."

      Seemingly, he is unaware of the second part of the onion test, which refers to different genome sizes in different onion species. The other funny thing here is that Mattick refers to "polyploidy and/or varying transposon loads (of uncertain biological relevance)" to explain higher C-values. Seemingly, he would accept that transposons do not contribute to complexity when they sit in the genome of an organism he assumes to be less complex but whose genome happens to be bigger than the human one. In addition, he seems to be unaware that the C-value is the size of the haploid genome. Actually, Ewan Birney made the very same mistake back in 2012. Thus, IMO Birney really was on the 80% side of the debate back then.

    5. Many things are amazing about that quote from Mattick. To begin with, he doesn't know what C value means, yet he tries to wave away the C value paradox anyway.

      The claim that "there are no downward exceptions" is quite amazing. Never heard of fugu, bats, hummingbirds, carnivorous bladderworts, etc. Amazing.

    6. What exactly would count as a "downward exception"? What's the standard against which we measure downward vs. upward, and in fact against which we decide what the exceptions are exceptions to?

    7. I guess his standard is the line on a Dog's Ass plot. Mathematically, any fitted line MUST have points scattered above it and below it, unless they're all collinear. If he means a fitted trend line, there MUST be downward exceptions.

    8. Then again, on a trend line, there would on average be as many downward as upward exceptions, and yet he says the downward exceptions are rare.

    9. Steve: “The onion test is basically an argument from ignorance.”

      The ‘onion test’ is a powerful (and cool) expression of the phenomenon called the C-value paradox, or enigma, an expression that most people, both scientists and laymen, can easily relate to; it is not “an argument from ignorance”, as it reflects a phenomenon that scientists (with the apparent exception of ENCODE leaders) have tried to explain for half a century and are still working hard at.

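
The point raised in the replies above about fitted trend lines can be checked numerically. The sketch below assumes "fitted line" means an ordinary least-squares line with an intercept, which is my reading rather than anything Mattick states. The data points are made up purely for illustration and are not genome measurements. For such a fit the residuals sum to zero, so unless every point lies exactly on the line there must be points both above and below it.

```python
# Hypothetical data, invented for illustration only (NOT real genome data).
XS = [1, 2, 3, 4, 5, 6]
YS = [1.0, 2.1, 2.9, 4.2, 5.1, 12.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Vertical distance of each point from the fitted line (positive = above)."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    res = residuals(XS, YS)
    print("sum of residuals:", round(sum(res), 9))
    print("points above the line:", sum(1 for r in res if r > 0))
    print("points below the line:", sum(1 for r in res if r < 0))
```

So if the standard is a fitted trend line, "downward exceptions" exist by construction; claiming there are none only makes sense against some other, unstated baseline.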