Friday, July 24, 2015

John Parrington discusses genome sequence conservation

John Parrington has written a book called The Deeper Genome: Why there is more to the human genome than meets the eye. He claims that most of our genome is functional, not junk. I'm looking at how his arguments compare with Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

There's one post for each of the five issues that informed scientists need to address if they are going to write about the amount of junk in your genome. This is the last one.

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved (this post)
John Parrington discusses genome sequence conservation

5. Most of the genome is not conserved

There are several places in the book where Parrington addresses the issue of sequence conservation. The most detailed discussion is on pages 92-95 where he discusses the criticisms leveled by Dan Graur against ENCODE workers. Parrington notes that 9% of the human genome is conserved and recognizes that this is a strong argument for function. It implies that >90% of our genome is junk.

Here's how Parrington dismisses this argument ...
John Mattick and Marcel Dinger ... wrote an article for the HUGO Journal, official journal of the Human Genome Organisation, entitled "The extent of functionality in the human genome." ... In response to the accusation that the apparent lack of sequence conservation of 90 per cent of the genome means that it has no function, Mattick and Dinger argued that regulatory elements and noncoding RNAs are much more relaxed in their link between structure and function, and therefore much harder to detect by standard measures of function. This could mean that 'conservation is relative', depending on the type of genomic structure being analyzed.
In other words, a large part of our genome (~70%?) could be producing functional regulatory RNAs whose sequence is irrelevant to their biological function. Parrington then writes a full page on Mattick's idea that the genome is full of genes for regulatory RNAs.

The idea that 90% of our genome is not conserved deserves far more serious treatment. In the next chapter (Chapter 7), Parrington discusses the role of RNA in forming a "scaffold" to organize DNA in three dimensions. He notes that ...
That such RNAs, by virtue of their sequence but also their 3D shape, can bind DNA, RNA, and proteins, makes them ideal candidates for such a role.
But if the genes for these RNAs make up a significant part of the genome then that means that some of their sequences are important for function. That has genetic load implications and also implications about conservation.

If it's not a "significant" fraction of the genome then Parrington should make that clear to his readers. He knows that 90% of our genome is not conserved, even between individuals (page 142), and he should know that this is consistent with genetic load arguments. However, almost all of his main arguments against junk DNA require that the extra DNA have a sequence-specific function. Those facts are not compatible. Here's how he justifies his position ...
Those proposing a higher figure [for functional DNA] believe that conservation is an imperfect measure of function for a number of reasons. One is that since many non-coding RNAs act as 3D structures, and because regulatory DNA elements are quite flexible in their sequence constraints, their easy detection by sequence conservation methods will be much more difficult than for protein-coding regions. Using such criteria, John Mattick and colleagues have come up with much higher figures for the amount of functionality in the genome. In addition, many epigenetic mechanisms that may be central for genome function will not be detectable through a DNA sequence comparison since they are mediated by chemical modifications of the DNA and its associated proteins that do not involve changes in DNA sequence. Finally, if genomes operate as 3D entities, then this may not be easily detectable in terms of sequence conservation.
This book would have been much better if Parrington had put some numbers behind his speculations. How much of the genome is responsible for making functional non-coding RNAs and how much of that should be conserved in one way or another? How much of the genome is devoted to regulatory sequences and what kind of sequence conservation is required for functionality? How much of the genome is required for "epigenetic mechanisms" and how do they work if the DNA sequence is irrelevant?

You can't argue this way. More than 90% of our genome is not conserved, not even between individuals. If a good bit of that DNA is, nevertheless, functional, then those functions must not have anything to do with the sequence of the genome at those specific sites. Thus, regions that specify non-coding RNAs, for example, must perform their function even though all the base pairs can be mutated. Same for regulatory sequences: the actual sequence of these regulatory sequences isn't conserved according to John Parrington. This requires a bit more explanation since it flies in the face of what we know about function and regulation.

Finally, if you are going to use bulk DNA arguments to get around the conflict then tell us how much of the genome you are attributing to formation of "3D entities." Is it 90%? 70%? 50%?


  1. "This could mean that 'conservation is relative'" - well, yes, we all know that the degree of conservation varies, and it really is not helpful to pretend that conservation is an on-off sort of thing.

    So Parrington is trotting out the idea that the degree of conservation is something that needs to be quantified more precisely and follows up with, er, not attempting to quantify it at all?

  2. Larry

    We don't know the exact minimum number of deleterious mutations that have to happen per generation in order to cause a problem. It's probably less than two (2). It's probably not as low as 0.5. It should be no more than 1 or 2 deleterious mutations per generation.

    If you and others don't know what genetic load can be tolerated by any given organism, based on what evidence would you criticize others, like John Parrington, for having doubts about "science" that is based on nothing but pure speculation?
    Those are your own words: "We don't know... and probably"

    1. Are you really stupid or do you just play an idiot on the internet?

    2. Here we go again. I would like an answer. This time without insults. Can you handle it?

    3. Larry was trying to say that a conceivable range between 0.5 and 2 is not equivalent to "don't know". I suppose you might have not noticed that qualifier "exact".

      I have faith in you, by the way. I think you're really, sincerely stupid. But I don't think you come by it naturally; I think you work hard at it.

    4. Scientists use wording like "we don't know the exact minimum" and "probably" so that other scientists will know that they're not just talking out of their ass. If you find that scientists are routinely scoffing at your arguments, could it be because you are not using these phrases to appropriately quantify the uncertainty in your own claims? Just a thought.

    5. I would like Larry to answer his own challenge. Is it too much to ask?

    6. No problem. I'm just wondering if you would answer one for me. Do you find it inconvenient when monkeys fly out of your ass? Yes or no, please.

    7. Septic Mind,

      That you would quote and take only those few words, without the slightest effort for understanding what the sentences meant, and the misuse of terms in your "question" make you an obviously ignorant and uneducable imbecile. Again, you have nobody else to blame but yourself for the conclusions we attain about you.

    8. "We don't know" >"probably">scientific fact> if you don't believe it you must be stupid, stupid and an idiot.

    9. We don't know (and can't know) the exact size of the world's population at the moment. It's probably greater than 7 billion and probably smaller than 7.5 billion. It's nevertheless a scientific fact that there are approximately 7,250,000,000 humans on Earth, give or take one or two hundred million.

    10. ""We don't know" >"probably">scientific fact> if you don't believe it you must be stupid, stupid and an idiot."

      And there is your problem. But you don't get it, do you?

    11. Those are your own words: "We don't know... and probably"

      I know you have no way of understanding this stuff, so all you can do is fasten on a couple of phrases you think are weak and harp on them.

      I'm sure anyone who does understand realizes why no exact number is needed (hint: poison can kill you even if the exact fatal dose hasn't been determined).

  3. Obviously, some onions need way more 3-D structuring of their DNA to produce onions than other onions. Duh!

  4. I don't read and I have not read at all anyone's posts that start with an insult. So, don't waste your time writing them.

    1. You say that like anyone gives a flying fuck.

    2. I don't read

      I could have guessed that.

    3. Hey genius who just stupidly posted as KevNick, thus revealing your real identity is Quest...

      We don't write replies to you so you can read them. We write replies for the sake of neutral observers who blunder in here, to prevent them from being deceived by creationists. And it drives you crazy.

      Pro tip: if you sanctimoniously tell people not to insult you, it makes people want to insult you.

    4. I don't read and I have not read at all anyone's posts that start with an insult.

      you must be stupid, stupid and an idiot

      Very consistent of you, SM.

  5. Larry,

    You may be interested to know that the genetic diversity of ~0.1% difference in genomic DNAs is about the optimum level for our species. Anything more than that would mean less fitness and complex diseases as our just published paper on Parkinson's disease has shown. So, a population of patients has greater genetic variation than a population of normal individuals. Is this not clear evidence for no junk DNA in our genome?

    Some quotations relevant to the junk DNA debate from our Parkinson's disease paper "Enrichment of Minor Alleles of Common SNPs and Improved Risk Prediction for Parkinson's Disease" just published in PLoS One:


    Recent studies have begun to show that a much larger than expected portion of the human genome may be functional [24–29].

    An organism can certainly accommodate some limited amounts of random variations within its building parts or DNAs, but too much random errors or mutations may exceed an organism’s maximum level of tolerable disorder or entropy. Thus overall level of randomness or minor allele amounts may be expected to be higher in complex diseases relative to controls.

    In fact, while most bench biologists have thought otherwise, nearly all in the population genetics field still believe that most SNPs are neutral or that most minor alleles are minor because of random drift rather than because of disease-association.

    1. So, a population of patients has greater genetic variation than a population of normal individuals. Is this not clear evidence for no junk DNA in our genome?


    2. Gnomon, WTF is this paragraph?

      An organism can certainly accommodate some limited amounts of random variations within its building parts or DNAs, but too much random errors or mutations may exceed an organism’s maximum level of tolerable disorder or entropy.

      WTF is this!? Did you just use the word "entropy" to describe a non-thermodynamic system, non-Shannon metric? What the $%&* equation are you using for entropy?

      And does this sound like professional scientific writing?

      An organism can certainly accommodate some limited amounts of random variations within its...DNAs

      NO. "Random variation within its DNAs"? Jeepers.


      too much random errors or mutations

      Too much mutations? Ouch.

      Leaving the sophomoric style aside, the scientific blunder is saying random mutations build up "entropy". Where is the $%&*king equation for that?

      Ow ow ow, who would publish that? What editor or reviewer would miss this shit? Firing squad for the reviewer who missed this shit.

    3. Talking of entropy. From the article:

      The findings of higher MAC in PD cases is consistent with our intuitive hypothesis that a highly complex and ordered system such as the human brain must have an optimum limit on the level of randomness or entropy in its building parts or DNAs.


      Negative selection by way of common diseases such as PD may be one of the ways to maintain a maximum or optimum limit on genomic entropy and to render the disease risk alleles minor ones in the population.

      Ouch, ouch, ouch... So much for editorial/review control in PLOS ONE.

    4. We still have the advantage of post-peer-review. PLOS has a system for comments and such.

    5. re: What the $%&* equation are you using for entropy?

      answer: a derivative of Boltzmann's entropy formula
      S=kG ln W  

      kG = Gnomon’s constant: The product of Gnomon’s Idiocy and the total quantity of Hydrogen in universe equals a universal constant!

      W = ”Wahrscheinlichkeit “ i.e. the probability that gratuitous employ of polysyllabic scientific jargon in a paper will overwhelm insecure reviewers unsure of their own scientific acumen until they capitulate and sign off publication in PLOS ONE

      I don’t know what exactly Gnomon’s problem is; but I’ll bet it is hard to pronounce!

    6. Re: entropy

      An arcane and barely relevant factoid not pertinent to Gnomon's blithering blathering:

    7. Pleasepleaseplease - listen to photosynthesis and head over to post (sensible, non-rude) comments on the actual paper over at the PLOS website, rather than here. Post-peer-review can work, but the comments need to be where the readers of the paper will see it!

  6. Parrington might have done himself a favor by not relying so much on Mattick.

  7. The lack of conservation is not necessarily an indication of non functionality but could be evidence of orphan functionality (aka taxonomically restricted features) if we later discover function. If there is disease association in non-conserved regions, this is an indication of not just functionality, but orphan functionality.

    There may be tolerable functional variability in certain organisms. Not all parts of the genome can tolerate the same level of change. Hence the fact some SNPs and other kinds of mutation are tolerated in some regions quite well while other variations are not.

    So, the sequence conservation argument might not be the best argument for functionality of DNA in humans if it turns out the non-conserved regions are also functional!

    1. Put up some numbers. How much of the human genome do you think has a function that's restricted to humans and not found in chimpanzees? How much of the chimpanzee genome is only functional in chimpanzees?

      How much of the onion genome is onion-specific?

      Imagine a region of junk DNA within the intron of an important protein-coding gene. A mutation occurs that creates a new 3' splice site. This leads to aberrant splicing and disease. Does that mean the original sequence had a function?

    2. Let's see. We have up to 10% "established functionality" (with evidence of conservation), and about 90% "possible de novo functionality", right? And the proportion is roughly the same for all mammals. Kangaroos have more non-conserved DNA than humans, bats less, but in all of them there's several times more non-conserved DNA than conserved stuff. What's the reason for this recent explosion of "orphan functionality" in mammals (not to mention other eukaryotes)? Why is much of the non-conserved stuff homologous between different lineages, why do the patterns of correspondence reflect the nested hierarchy of the family tree, and why are they compatible with the expected effects of neutral evolution over tens of millions of years? Can you explain it without a lot of special pleading?

    3. Onion Test, for crying out loud. Why would some species of onion have genomes whose *differences from other onion genomes* are regions of what Gnomon politely calls "orphan functionality" **several times larger than the entire human genome**?

    4. Hey, Diogenes, it was Sal Cordova (a.k.a. "liarsfordarwin"), not Shi Huang (a.k.a. "Gnomon").

  8. Bagging the question, a common practice in evolutionary genetics (junks assumed, then deduced)

    The molecular evolution and popgen field is known to have the most mathematics among all branches of biology. But precisely because of that, it needs many simplifying assumptions or premises, which often lead to the fallacy of bagging the question. I here gave a few examples regarding the concept of the mutation or genetic load and the genetic load argument for junk DNA, based on reading a paper on the genetic load (Lesecque et al 2012). Some have used the genetic load as the best argument for the junk DNA notion (Palazzo and Gregory, 2014; Graur, 2015). I also discuss the assumption of non-conservation equaling non function and the assumption of infinite sites.

    Lesecque et al say: “The mutation load was more formally defined as the proportional reduction in mean fitness of a population relative to that of a mutation-free genotype, brought about by deleterious mutations (Crow 1970):

    L = (Wmax - Wmean)/Wmax

    where Wmean is the mean fitness of the population at equilibrium and Wmax is the mean fitness of a deleterious mutation-free individual.”

    Is there a deleterious mutation-free individual in a real world or even an imagined world? All mutations, as random mistakes, have a deleterious aspect to an ordered system, if not individually, then collectively. Many mutations could be both deleterious and beneficial. For example, they could be beneficial to adaptive immunity that requires genome variation for producing diverse antibody responses but deleterious to innate immunity that requires conserved proteins to recognize conserved sequences shared by a certain class of microorganisms. By failing to recognize the both deleterious and beneficial nature of most mutations and by classifying mutations into two kinds (deleterious and non-deleterious with the latter consisted of mostly neutral ones), the assumption on the concept of deleterious and non-deleterious mutations eventually led to the genetic load argument for the conclusion that most mutations must be neutral. Here, one sees that the neutral conclusion is already embedded in the premise that led to it. The premise does not recognize the fact that most mutations appear neutral or nearly neutral as a result of balancing selection, and the fact that all mutations have a deleterious aspect as noises to a finely tuned system. Of course, that premise works for a junkyard like system.

    Lesecque et al say: “If the fitness effects of deleterious mutations are independent from one another, the mutation load across all loci subject to recurrent mutation is approximately

    L = 1 - e^(-U)

    (Kimura and Maruyama 1966), where U is the overall rate of deleterious mutation per diploid genome per generation. This simple formula is a classic result of evolutionary genetics.”
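As a rough illustration of the two formulas quoted above, here is a minimal Python sketch. It assumes, as the quoted passage does, that the fitness effects of deleterious mutations are independent. The function names `mutation_load` and `load_from_fitness` are hypothetical, and the sample values of U (0.5, 1 and 2) are simply the range mentioned earlier in this thread, not measured rates.

```python
import math


def mutation_load(u: float) -> float:
    """Approximate mutation load L = 1 - e^(-U).

    u is the deleterious mutation rate per diploid genome per generation.
    Assumes independent fitness effects, as in the quoted formula.
    """
    if u < 0:
        raise ValueError("U must be non-negative")
    return 1.0 - math.exp(-u)


def load_from_fitness(w_max: float, w_mean: float) -> float:
    """Load from its definition: L = (Wmax - Wmean) / Wmax."""
    if w_max <= 0:
        raise ValueError("Wmax must be positive")
    return (w_max - w_mean) / w_max


if __name__ == "__main__":
    # Illustrative values only: the 0.5 to 2 range discussed in the comments.
    for u in (0.5, 1.0, 2.0):
        load = mutation_load(u)
        # With Wmax scaled to 1, mean fitness under this model is e^(-U).
        w_mean = math.exp(-u)
        print(f"U = {u:.1f}: L = {load:.3f}, Wmean/Wmax = {w_mean:.3f}")
```

Under these assumptions the load comes out at roughly 0.39, 0.63 and 0.86 for U = 0.5, 1 and 2, which is why the exact threshold matters less than the order of magnitude of U.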

    So, a classic formula for the genetic load argument is based on the assumption that the fitness effects of deleterious mutations are independent from one another. For a junk yard, yes, the consequences of errors in the building parts are independent from one another. However, for a system that is ordered and built by network-like interactions among the building parts, no, the consequences of errors in the building parts are NOT independent from one another. In fact, recent studies in genomics are constantly discovering epistatic interactions among mutations. So, here one sees clearly again, the neutral or junk DNA conclusion is already embedded in the premise that treats an organism more as a junkyard than a highly ordered system with components organized in a network fashion. When you have already assumed an organism to be junk like, why bother showing us the math formula and deduction leading to the junk DNA conclusion? You should just say that most DNAs are junks because I said so.

    1. Many mutations could be both deleterious and beneficial.

      They could be, the way any real number could be both positive and negative, for example 3.5 = (+5.0) + (-1.5). A Kuhnian paradigm shift in arithmetic is nigh!

  9. -continued:
    Finally, none of the premises related to the genetic load concept recognized the fact that a large collection of otherwise harmless mutations within an individual could be deleterious, as our recent papers have shown. Well, again, such a fact certainly does not exist for a junkyard-like system. By not recognizing that fact or being too naïve to see it, the practitioners in the popgen field have again and again assumed biological systems to be junk like before setting out to prove/deduce that they are made of largely junks.

    I also briefly comment on a paper by the Ponting group concluding that human genome is only about 8% functional (Rands et al, 2014). The premise for that deduction is that non-conservation means non-function. Again, building parts for different junk yards are not conserved and nonfunctional. So, non-conservation means non function holds for junk yards. But for organisms relying on mutations to adapt to fast changing environments, recurrent or repeated mutations at the same sites at different time points in their life history are absolutely essential for their survival. Less conserved sequences are more important for adaptation to external environment, while the more conserved ones are important for internal integrity of a system. For bacteria or flu viruses to escape human immunity or medicines, the fast changing or non-conserved parts of their genome are absolutely essential. So, here again, by assuming non-function for the non-conserved parts of the genome, one is assuming an organism to be like a junk yard.

    Other key assumptions like the infinite sites model (means neutral sites) are critical for phylogenetics as it is practiced today and for the absurd Out of Africa model of human origin that uses imagined bottlenecks to explain away the extremely low genetic diversity of humans. Well, a junk yard can certainly have an infinite number of parts and tolerate an infinite number of errors. An organism’s genome is finite in size and essentially nothing compared to infinite size. Within such finite size genomes, the proportion that can be free to change without consequences is even more limited or finite.

    A paradigm shift (or revolutionary science) is, according to Thomas Kuhn, a change in the basic assumptions, or paradigms, within the ruling theory of science. The above analyses show that the assumptions for the popgen and molecular evolutionary field are largely out of touch with reality as more reality becomes known, and must be changed quickly if the field wants to avoid fading into oblivion and stay relevant to mainstream bench biology, genomic medicine, archeology, and paleontology. Those assumptions have produced few useful and definitive deductions that can be independently verified and avoid the fate of constant and endless revisions, like we have seen from 1987 to now for the Out of Africa model or the Neanderthals.

    Lesecque Y, Keightley PD, Eyre-Walker A (2012) A resolution of the mutation load paradox in humans. Genetics 191: 1321–1330.

    Palazzo AF, Gregory TR (2014) The Case for Junk DNA. PLoS Genet 10(5): e1004351. doi:10.1371/journal.pgen.1004351

    Dan Graur (2015) If @ENCODE_NIH is right each of us should have on average from 3 × 10^19 to 5 × 10^35 children. …

    Rands CM, Meader S, Ponting CP, Lunter G (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genet 10(7): e1004525.

    1. That was some weird reasoning there, but I will limit myself to pointing out that the idiom is "begging the question"; "bagging the question" would be something else.

      OK, I'll go a bit farther. The idea that a given mutation could be both beneficial and deleterious in the same environment fails to realize that a selection coefficient is the overall sum of all effects. The idea that balancing selection results in apparent neutrality is interesting but implausible. If the fitness effects of deleterious mutations are not independent, wouldn't that make genetic load worse, and thus even more indicative that most of your DNA is junk?

    2. Yes. Gnomon provides further evidence that basic critical thinking skills (e.g. the ability to follow the logic of an argument, even one's own) are not necessary for acquiring an advanced degree. It's quite a problem.

    3. Gnomon's problem is that some of the things he says "make sense," so he assumes that his whole construct is right. He's a kook who finds validation in being right about 10% of what he says. Hard for him to see where his problem starts. Harder if the right stuff is spread across the whole diatribe.

    4. Where is Gnomon's $%&*king equation for entropy? Entropy is not a $%&*king figure of speech!

      Short story: another pompous boob who assumes we haven't heard of the science of the 1970's and expects us all to be shocked by false accusations of circular logic.

      No response to the Onion Test.

      Gnomon assumes everything is functional, and presents that as evidence that everything is functional.

  10. In an attempt to deconstruct Gnomon's casuistic circular reasoning:

    Does Human-Chimpanzee DNA sequence identity argue for DNA functionality?

  11. This reminds me of Mr. Spock's chess.
    Our linear logic is insufficient to play that game.