Friday, July 31, 2015

For the King - Teaser Trailer

This is the game my son, Gordon Moran, and his friends at IronOak Games are developing. Please send him lots of money when Kickstarter is activated in September.

I'm buying a university and a professor character for the game. The professor will battle the forces of evil and superstition. Ms. Sandwalk is contributing enough for a medieval faire with lots of games where you can win prizes.

Find out more at

Thursday, July 30, 2015

The next step in genomics

The draft sequence of the human genome was published in 2001. The "finished" version was published a few years later but annotation continues.

A massive amount of data on complex genomes has been published, especially on the human genome. The next step is to decide what this data means. Here are the most important questions from my perspective.

An accommodationist defends the science of the Pope in the journal Nature

I don't think scientific journals or scientific organizations should take a position on the conflict between science and religion but that doesn't mean they should stay away from the subject altogether. The journal Nature has just (July 28, 2015) published a defense of accommodationism written by David M. Lodge [Faith and science can find common ground]. Lodge describes himself as a "Protestant ecologist embedded for 30 years in a Roman Catholic university." The Catholic University is Notre Dame [see David M. Lodge].

His main argument is that the current Pope understands the science of the environment and has spoken out in favor of protecting the environment. David Lodge thinks this represents an accommodation between science and religion.

Wednesday, July 29, 2015

Michael Lynch on modern evolutionary theory

Of the Five Things You Should Know if You Want to Participate in the Junk DNA Debate, the most difficult to explain is "Modern Evolutionary Theory." Most scientists think they understand evolution well enough to engage in the debate about junk DNA. However, sooner or later they will mention that junk DNA should have been deleted by selection if it ever existed. You can see that their worldview leads them to believe that everything in biology has an adaptive function.

It's been a few years since I posted Michael Lynch's scathing comments on panadaptationism and how it applies to understanding genomes [Michael Lynch on Adaptationism and A New View of Evolution]. You're in for a treat today.

Here's what you need to know about evolution in order to discuss junk DNA. The first quotation is from the preface to The Origins of Genome Architecture (pages xiii-xiv). The second quotations are from the last chapter (page 366 and pages 368-369).
Contrary to popular belief, evolution is not driven by natural selection alone. Many aspects of evolutionary change are indeed facilitated by natural selection, but all populations are influenced by nonadaptive forces of mutation, recombination, and random genetic drift. These additional forces are not simple embellishments around a primary axis of selection, but are quite the opposite—they dictate what natural selection can and cannot do. Although this basic principle has been known for a long time, it is quite remarkable that most biologists continue to interpret nearly every aspect of biodiversity as an outcome of adaptive processes. This blind acceptance of natural selection as the only force relevant to evolution has led to a lot of sloppy thinking, and is probably the primary reason why evolution is viewed as a soft science by much of society.

A central point to be explained in this book is that most aspects of evolution at the genome level cannot be fully explained in adaptive terms, and moreover, that many features could not have emerged without a near-complete disengagement of the power of natural selection. This contention is supported by a wide array of comparative data, as well as by well-established principles of population genetics. However, even if such support did not exist, there is an important reason for pursuing nonadaptive (neutral) models of evolution. If one wants to confidently invoke a specific adaptive scenario to explain an observed pattern of comparative data, then an ability to reject a hypothesis based entirely on the nonadaptive forces of evolution is critical.

The blind worship of natural selection is not evolutionary biology. It is arguably not even science.

Michael Lynch
Despite the tremendous theoretical and physical resources now available, the field of evolutionary biology continues to be widely perceived as a soft science. Here I am referring not to the problems associated with those pushing the view that life was created by an intelligent designer, but to a more significant internal issue: a subset of academics who consider themselves strong advocates of evolution but who see no compelling reason to probe the substantial knowledge base of the field. Although this is a heavy charge, it is easy to document. For example, in his 2001 presidential address to the Society for the Study of Evolution, Nick Barton presented a survey that demonstrated that about half of the recent literature devoted to evolutionary issues is far removed from mainstream evolutionary biology.

With the possible exception of behavior, evolutionary biology is treated unlike any other science. Philosophers, sociologists, and ethicists expound on the central role of evolutionary theory in understanding our place in the world. Physicists excited about biocomplexity and computer scientists enamored with genetic algorithms promise a bold new understanding of evolution, and similar claims are made in the emerging field of evolutionary psychology (and its derivatives in political science, economics, and even the humanities). Numerous popularizers of evolution, some with careers focused on defending the teaching of evolution in public schools, are entirely satisfied that a blind adherence to the Darwinian concept of natural selection is a license for such activities. A commonality among all these groups is the near-absence of an appreciation of the most fundamental principles of evolution. Unfortunately, this list extends deep within the life sciences.


... the uncritical acceptance of natural selection as an explanatory force for all aspects of biodiversity (without any direct evidence) is not much different than invoking an intelligent designer (without any direct evidence). True, we have actually seen natural selection in action in a number of well-documented cases of phenotypic evolution (Endler 1986; Kingsolver et al. 2001), but it is a leap to assume that selection accounts for all evolutionary change, particularly at the molecular and cellular levels. The blind worship of natural selection is not evolutionary biology. It is arguably not even science. Natural selection is just one of several evolutionary mechanisms, and the failure to realize this is probably the most significant impediment to a fruitful integration of evolutionary theory with molecular, cellular, and developmental biology.

It should be emphasized here that the sins of panselectionism are by no means restricted to developmental biology, but simply follow the tradition embraced by many areas of evolutionary biology itself, including paleontology and evolutionary ecology (as cogently articulated by Gould and Lewontin in 1979). The vast majority of evolutionary biologists studying morphological, physiological, and/or behavioral traits almost always interpret the results in terms of adaptive mechanisms, and they are so convinced of the validity of this approach that virtually no attention is given to the null hypothesis of neutral evolution, despite the availability of methods to do so (Lande 1976; Lynch and Hill 1986; Lynch 1994). For example, in a substantial series of books addressed to the general public, Dawkins (e.g., 1976, 1986, 1996, 2004) has deftly explained a bewildering array of observations in terms of hypothetical selection scenarios. Dawkins's effort to spread the gospel of the awesome power of natural selection has been quite successful, but it has come at the expense of reference to any other mechanisms, and because more people have probably read Dawkins than Darwin, his words have in some ways been profoundly misleading. To his credit, Gould, who is also widely read by the general public, frequently railed against adaptive storytelling, but it can be difficult to understand what alternative mechanisms of evolution Gould had in mind.

Tuesday, July 28, 2015

I never expected this!

David Klinghoffer writes at Evolution News & Views (sic): In The New Yorker, Tom Wolfe Compares Persecution of Intelligent Design Advocates to the "Spanish Inquisition".
Interviewed by The New Yorker earlier this year, the great novelist and journalist Tom Wolfe acknowledged that he's writing a book about evolution -- actually, "a history of the theory of evolution from the nineteenth century to the present." No indication of what his overall thesis might be, but he "invokes the Spanish Inquisition when discussing how academics have cast out proponents of intelligent design for 'not believing in evolution the right way.'"

On the total length of all DNA molecules on the planet

If you were to line up all the DNA molecules from all the individuals in all the species on Earth, how long would it be? This is a kind of "Fermi question" or "Fermi problem." You should be able to estimate an answer based on what you know and reasonable assumptions.

Michael Lynch has a crude estimate in his book The Origins of Genome Architecture. Without reading the book, can you come up with an estimate of your own? Is it larger than the circumference of the Earth? Larger than the distance to Pluto? Longer than the distance to the nearest star (other than the sun) or the center of the galaxy? Would the string of DNA molecules stretch to the nearest large galaxy (Andromeda)? Or, would it be even longer than that?

In case you've forgotten everything you once knew about the structure of DNA, here's a brief refresher: The Three-Dimensional Structure of DNA.

You may assume that all of the DNA molecules are in the standard B-form with the dimensions shown in the figure.

I will not accept any answers in archaic measurement units like leagues, miles, yojana, or cubits.
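
If you want to check your arithmetic, here is a minimal Python sketch of the calculation. It relies only on the B-form geometry (a rise of roughly 0.34 nm per base pair). The function names are mine, and the cell counts and genome sizes passed in at the bottom are placeholder assumptions for illustration, not figures from Lynch's book. Substitute your own guesses.

```python
# Back-of-envelope (Fermi) estimate of the end-to-end length of DNA.
# Every population figure passed in below is an ASSUMPTION for illustration.

RISE_PER_BP_M = 0.34e-9           # B-form DNA: roughly 0.34 nm of length per base pair
METERS_PER_LIGHT_YEAR = 9.461e15  # approximate length of one light-year in meters


def dna_length_m(n_cells: float, bp_per_cell: float) -> float:
    """Total length in meters of the DNA in n_cells cells of bp_per_cell base pairs each."""
    return n_cells * bp_per_cell * RISE_PER_BP_M


def to_light_years(meters: float) -> float:
    """Convert a length in meters to light-years."""
    return meters / METERS_PER_LIGHT_YEAR


if __name__ == "__main__":
    # Sanity check: one diploid human cell, assuming about 6.4e9 base pairs.
    print(f"one human cell: {dna_length_m(1, 6.4e9):.2f} m")

    # Hypothetical inputs: 1e30 prokaryotic cells with 3e6 base pairs each.
    total = dna_length_m(1e30, 3e6)
    print(f"prokaryotes: {total:.1e} m = {to_light_years(total):.1e} light-years")
```

The answer is dominated by whichever group has the largest product of cell number and genome size, so the quality of the estimate depends almost entirely on those two guesses.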

Readings from Trends in Biochemical Sciences on the Central Dogma

I'm re-reading The Inside Story edited by Jan Witkowski, the former editor-in-chief of Trends in Biochemical Sciences (TIBS). The book is a collection of essays that appeared in the journal. The collection centers around "the theme of the Central Dogma of molecular biology." Here's how Jan Witkowski describes the collection in the preface (page xii)...
When I came to look more closely, it was clear that the area the articles covered most comprehensively, where the most interesting selection could be made, was the Central Dogma, that is DNA, RNA, and protein synthesis. And the number of relevant articles was just right for the size of book we had in mind.
This explains the subtitle of the book, "DNA to RNA to Protein."

This is not going to be another complaint about misinterpretations of the Central Dogma. Quite the contrary, as we shall see.

The Foreword was written by Tim Hunt, who was the editor-in-chief from 1992 to 2000. He refers to "The General Idea."
"Jim, you might say, had it first. DNA makes RNA makes protein. That became the general idea." Thus did Francis Crick explain to Horace Judson years later, long after he had written with such clarity and force on the subject of protein synthesis in the 1958 Symposium on "The Biological Replication of Macromolecules" [see Crick, 1959). This article is celebrated for its prediction of the existence of tRNA (although by the time the article appeared in print, tRNA had been discovered), but it is chiefly worth reading and rereading, even today, for its enunciation of the two principles that together constitute the "General Idea." The first principle is the Sequence Hypothesis; the idea that the sequence of amino acids in proteins is specified by the sequence of bases in DNA and RNA. The second principle is the famous "Central Dogma"; not DNA makes RNA makes Protein, but the assertion that "Once information has passed into protein it cannot get out again." It isn't completely clear why one is a hypothesis and the other a dogma and the two together an idea. The Dogma stuck in some throats, mainly because it was called a dogma, with heavy religious overtones.
I quote Tim Hunt to show that there are some knowledgeable scientists who understand the Central Dogma [see The Central Dogma of Molecular Biology].

Hunt continues ...
Crick explains that calling it a dogma was a misunderstanding on his part: he thought the word stood for "an idea for which there was no reasonable evidence," blaming his "curious religious upbringing" for the error. But it probably wasn't that much of a mistake after all, for the Oxford Dictionary allows dogma to mean simply a principle, although the alternative "Arrogant declaration of opinion" is probably how most people who were not molecular biologists took it, considering its never modest author. That is probably how they were meant to take it, too. It was the most important article of faith among the circle of biologists centered on Watson and Crick and remained so for quite a long time until the mechanism of protein synthesis became clear. Crick said that if you did not subscribe to the sequence hypothesis and the central dogma "you generally ended up in the wilderness," although he did not offer alternative scenarios for public consumption, even though they probably played an important part in convincing him of the dogmatic status of the General Idea's second component.
This is the concept that I "grew up" with as a graduate student in the late 1960s. We saw the "General Idea" as an important concept and a way of understanding the data that was coming out of many labs working on DNA replication, transcription, and protein synthesis. We knew, especially after 1970 (Crick, 1970), that RNA could be used as a template to make DNA and that there were many types of RNA other than messenger RNA. We also knew that Francis Crick was a very smart man and it was unwise to disagree with him because he was usually right about big ideas.

Fig. 1. Information flow and the sequence hypothesis. These diagrams of potential information flow were used by Crick (1958) to illustrate all possible transfers of information (left) and those that are permitted (right). The sequence hypothesis refers to the idea that information encoded in the sequence of nucleotides specifies the sequence of amino acids in the protein.
At some point in the last 40 years the "General Idea" has been subverted in two ways.
  1. The Sequence Hypothesis has come to be interpreted as the Central Dogma. This is mostly due to Jim Watson who propagated this misinterpretation in his Molecular Biology of the Gene textbook.
  2. The Central Dogma is taken to mean that the ONLY important information in the genome is that which encodes proteins. It's assumed, incorrectly, that Crick meant to say that the role of all genes is to encode proteins.
One of the essays in The Inside Story is "Forty Years under the Central Dogma," published in 1998. The authors are Denis Thieffry and Sahotra Sarkar (Thieffry and Sarkar, 1998).

Here's how they explain some of the confusion about the Central Dogma ...
The most obvious interpretation of Crick’s original (1958) formulation of the Central Dogma is in negative terms. The Central Dogma only forbids a few types of information transfer, namely, from proteins to proteins and from proteins to nucleic acids. However, after its rapid adoption by most of the biologists interested in protein synthesis, it was most often interpreted or reformulated in a more restrictive way, constricting the flow of information from DNA to RNA and from RNA to protein (Fig. 1).

Figure 1 The Central Dogma as envisioned by Watson in 1965. ‘We should first look at the evidence that DNA itself is not the direct template that orders amino acid sequences. Instead, the genetic information of DNA is transferred to another class of molecules, which then serve as the protein templates. These intermediate templates are molecules of ribonucleic acid (RNA)...Their relation to DNA and protein is usually summarised by the formula (often called the central dogma).'

According to Watson’s autobiography, he had already derived this ‘formula’ (Fig. 1) in 1952. In fact, such schemes were commonly entertained during the early 1950s, at least among the biologists interested in protein synthesis. ... Much more restrictive than Crick’s original statement, Watson’s formula was immediately confronted with a series of possible exceptions, some of which are mentioned below. Crick, meanwhile, remained rather cautious in his interpretation of the Central Dogma. On several occasions, he felt it necessary to come back to his original idea and explicate what he thought to be its correct interpretation. For example, in 1970, Crick devoted a paper specifically to the Central Dogma, including a diagram reportedly conceived (but not published) in 1958. [see the figure at the top of this page]
The authors recognize several challenges to the Central Dogma, at least to the version preferred by Watson. There were two discoveries in the 1960s that seemed to threaten the Central Dogma. The first was the discovery that the genetic material of some viruses (e.g. TMV) was RNA, not DNA. The second was the discovery that RNA could be copied into DNA by reverse transcriptase. This was not a problem for Crick ....
These findings prompted Crick to write his 1970 piece for Nature, in which he explicitly showed how the new facts fitted into his scheme.
It's difficult to evaluate the importance of the Central Dogma in the 21st century because so many scientists don't understand it. The incorrect version seems to mostly serve as a whipping boy to promote "new" ideas that overthrow the strawman version of the Central Dogma.

Back in 1998, the authors of this article asked Crick what he thought of the Central Dogma ...
In a recent answer to a question addressing the relevance of these challenges, Crick stated that he still believes in the value of the Central Dogma today (F.H.C. Crick, pers. commun.). However, he also acknowledges the existence of various exceptions, most of which he regards as minor. For him, the most significant exception is RNA editing. Still, according to Crick, simplifications of the Central Dogma in terms such as ‘DNA makes RNA and RNA makes protein’ were clearly inadequate from the beginning.

Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]

Thieffry, D. and Sarkar, S. (1998) "Forty years under the central dogma." Trends in Biochemical Sciences 23:312–316. [doi: 10.1016/S0968-0004(98)01244-4]

Monday, July 27, 2015

More confusion about the central dogma of molecular biology

I was doing some reading on lncRNAs (long non-coding RNAs) in order to find out how many of them had been assigned real biological functions. My reading was prompted by one of the latest updates to the human genome sequence; namely, assembly GRCh38.p3 from June 2015. The Ensembl website lists 14,889 lncRNA genes but I'm sure that most of these are just speculative [Ensembl Whole Genome].

The latest review by my colleagues here in the biochemistry department at the University of Toronto (Toronto, Canada) concludes that only a small fraction of these putative lncRNAs have a function (Palazzo and Lee, 2015). They point out that in the absence of evidence for function, the null hypothesis is that these RNAs are junk and the genes don't exist. That's not the view that annotators at Ensembl take.

I stumbled across a paper by Ling et al. (2015) that tries to make a case for function. I don't think their case is convincing but that's not what I want to discuss. I want to discuss their view of the Central Dogma of Molecular Biology. Here's the abstract ...
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer.
This is getting to be a familiar refrain. I understand how modern scientists might be confused about the difference between the Watson and the Crick versions of the Central Dogma [see The Central Dogma of Molecular Biology]. Many textbooks perpetuate the myth that Crick's sequence hypothesis is actually the Central Dogma. That's bad enough but lots of researchers seem to think that their false view of the Central Dogma goes even further. They think it means that the ONLY kind of genes in your genome are those that produce mRNA and protein.

I don't understand how such a ridiculous notion could arise but it must be a common misconception, otherwise why would these authors think that non-coding RNAs are a challenge to the Central Dogma? And why would the reviewers and editors think this was okay?

I'm pretty sure that I've interpreted their meaning correctly. Here are the opening sentences of the introduction to their paper ...
The Encyclopedia of DNA Elements (ENCODE) project has revealed that at least 75% of the human genome is transcribed into RNAs, while protein-coding genes comprise only 3% of the human genome. Because of a long-held protein-centered bias, many of the genomic regions that are transcribed into non-coding RNAs (ncRNAs) had been viewed as ‘junk’ in the genome, and the associated transcription had been regarded as transcriptional ‘noise’ lacking biological meaning.
They think that the Central Dogma is a "protein-centered bias." They think the Central Dogma rules out genes that specify noncoding RNAs. (Like tRNA and ribosomal RNA?)

Later on they say ....
The protein-centered dogma had viewed genomic regions not coding for proteins as ‘junk’ DNA. We now understand that many lncRNAs are transcribed from ‘junk’ regions, and even those encompassing transposons, pseudogenes and simple repeats represent important functional regulators with biological relevance.
It's simply not true that scientists in the past viewed all noncoding DNA as junk, at least not knowledgeable scientists [What's in Your Genome?]. Furthermore, no knowledgeable scientists ever interpreted the Central Dogma of Molecular Biology to mean that the only functional genes in a genome were those that encoded proteins.

Apparently Ling, Vincent, Pichler, Fodde, Berindan-Neagoe, Slack, and Calin knew scientists who DID believe such nonsense. Maybe they even believed it themselves.

Judging by the frequency with which such statements appear in the scientific literature, I can only assume that this belief is widespread among biochemists and molecular biologists. How in the world did this happen? How many Sandwalk readers were taught that the Central Dogma rules out all genes for noncoding RNAs? Did you have such a protein-centered bias about the role of genes? Who were your teachers?

Didn't anyone teach you who won the Nobel Prize in 1989? Didn't you learn about snRNAs? What did you think RNA polymerases I and III were doing in the cell?

Ling, H., Vincent, K., Pichler, M., Fodde, R., Berindan-Neagoe, I., Slack, F.J., and Calin, G.A. (2015) Junk DNA and the long non-coding RNA twist in cancer genetics. Oncogene (published online January 26, 2015) [PDF]

Palazzo, A.F. and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics 6:2 (published online January 26, 2015) [Abstract]

Friday, July 24, 2015

John Parrington talks about The Deeper Genome

Here's a video from Oxford Press where you can hear John Parrington describe some of the ideas in his book: The Deeper Genome: Why there is more to the human genome than meets the eye.

John Parrington discusses genome sequence conservation

John Parrington has written a book called The Deeper Genome: Why there is more to the human genome than meets the eye. He claims that most of our genome is functional, not junk. I'm looking at how his arguments compare with Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

There's one post for each of the five issues that informed scientists need to address if they are going to write about the amount of junk in your genome. This is the last one.

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved (this post)
John Parrington discusses genome sequence conservation

5. Most of the genome is not conserved

There are several places in the book where Parrington addresses the issue of sequence conservation. The most detailed discussion is on pages 92-95 where he discusses the criticisms leveled by Dan Graur against ENCODE workers. Parrington notes that 9% of the human genome is conserved and recognizes that this is a strong argument for function. It implies that >90% of our genome is junk.

Here's how Parrington dismisses this argument ...
John Mattick and Marcel Dinger ... wrote an article for the HUGO Journal, official journal of the Human Genome Organisation, entitled "The extent of functionality in the human genome." ... In response to the accusation that the apparent lack of sequence conservation of 90 per cent of the genome means that it has no function, Mattick and Dinger argued that regulatory elements and noncoding RNAs are much more relaxed in their link between structure and function, and therefore much harder to detect by standard measures of function. This could mean that 'conservation is relative', depending on the type of genomic structure being analyzed.
In other words, a large part of our genome (~70%?) could be producing functional regulatory RNAs whose sequence is irrelevant to their biological function. Parrington then writes a full page on Mattick's idea that the genome is full of genes for regulatory RNAs.

The idea that 90% of our genome is not conserved deserves far more serious treatment. In the next chapter (Chapter 7), Parrington discusses the role of RNA in forming a "scaffold" to organize DNA in three dimensions. He notes that ...
That such RNAs, by virtue of their sequence but also their 3D shape, can bind DNA, RNA, and proteins, makes them ideal candidates for such a role.
But if the genes for these RNAs make up a significant part of the genome then that means that some of their sequences are important for function. That has genetic load implications and also implications about conservation.

If it's not a "significant" fraction of the genome then Parrington should make that clear to his readers. He knows that 90% of our genome is not conserved, even between individuals (page 142), and he should know that this is consistent with genetic load arguments. However, almost all of his main arguments against junk DNA require that the extra DNA have a sequence-specific function. Those facts are not compatible. Here's how he justifies his position ...
Those proposing a higher figure [for functional DNA] believe that conservation is an imperfect measure of function for a number of reasons. One is that since many non-coding RNAs act as 3D structures, and because regulatory DNA elements are quite flexible in their sequence constraints, their easy detection by sequence conservation methods will be much more difficult than for protein-coding regions. Using such criteria, John Mattick and colleagues have come up with much higher figures for the amount of functionality in the genome. In addition, many epigenetic mechanisms that may be central for genome function will not be detectable through a DNA sequence comparison since they are mediated by chemical modifications of the DNA and its associated proteins that do not involve changes in DNA sequence. Finally, if genomes operate as 3D entities, then this may not be easily detectable in terms of sequence conservation.
This book would have been much better if Parrington had put some numbers behind his speculations. How much of the genome is responsible for making functional non-coding RNAs and how much of that should be conserved in one way or another? How much of the genome is devoted to regulatory sequences and what kind of sequence conservation is required for functionality? How much of the genome is required for "epigenetic mechanisms" and how do they work if the DNA sequence is irrelevant?

You can't argue this way. More than 90% of our genome is not conserved, not even between individuals. If a good bit of that DNA is, nevertheless, functional, then those functions must not have anything to do with the sequence of the genome at those specific sites. Thus, regions that specify non-coding RNAs, for example, must perform their function even though all the base pairs can be mutated. The same goes for regulatory sequences: the actual sequence of these regulatory sequences isn't conserved according to John Parrington. This requires a bit more explanation since it flies in the face of what we know about function and regulation.

Finally, if you are going to use bulk DNA arguments to get around the conflict then tell us how much of the genome you are attributing to formation of "3D entities." Is it 90%? 70%? 50%?

John Parrington discusses pseudogenes and broken genes

We are discussing Five Things You Should Know if You Want to Participate in the Junk DNA Debate and how they are described in John Parrington's book The Deeper Genome: Why there is more to the human genome than meets the eye. This is the fourth of five posts.

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk (this post)
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved
John Parrington discusses genome sequence conservation

4. Pseudogenes and broken genes are junk

Parrington discusses pseudogenes at several places in the book. For example, he mentions on page 72 that both Richard Dawkins and Ken Miller have used the existence of pseudogenes as an argument against intelligent design. But, as usual, he immediately alerts his readers to other possible explanations ...
However, using the uselessness of so much of the genome for such a purpose is also risky, for what if the so-called junk DNA turns out to have an important function, but one that hasn't yet been identified.
This is a really silly argument. We know what genes look like and we know what broken genes look like. There are about 20,000 former protein-coding pseudogenes in the human genome. Some of them arose recently following a gene duplication or insertion of a cDNA copy. Some of them are ancient and similar pseudogenes are found at the same locations in other species. They accumulate mutations at a rate consistent with neutral theory and random genetic drift. (This is a demonstrated fact.)

It's ridiculous to suggest that a significant proportion of those pseudogenes might have an unknown important function. That doesn't rule out a few exceptions but, as a general rule, if it looks like a broken gene and acts like a broken gene, then chances are pretty high that it's a broken gene.

As usual, Parrington doesn't address the big picture. Instead he resorts to the standard ploy of junk DNA opponents by emphasizing the exceptions. He devotes more than two full pages (pages 143-144) to evidence that some pseudogenes have acquired a secondary function.
The potential pitfalls of writing off elements in the genome as useless or parasitical has been demonstrated by a recent consideration of the role of pseudogenes. ... recent studies are forcing a reappraisal of the functional role of these 'duds.'
Do you think his readers understand that even if every single broken gene acquired a new function that would still only account for less than 2% of the genome?
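The arithmetic behind that "less than 2%" figure can be sketched in a few lines. This is only a rough illustration: the count of 20,000 pseudogenes comes from the post, the genome size is the approximate 3 billion base pairs quoted from Parrington, and the mean pseudogene length is a hypothetical value chosen for the example, not a measured one.

```python
# Rough upper bound on the share of the genome occupied by pseudogenes.
GENOME_BP = 3_000_000_000        # approximate human genome size (~3 billion bp)
PSEUDOGENE_COUNT = 20_000        # figure given in the post
ASSUMED_MEAN_LENGTH_BP = 2_000   # ASSUMPTION: hypothetical average pseudogene length


def pseudogene_fraction(count, mean_length_bp, genome_bp):
    """Fraction of the genome covered if every pseudogene has the mean length."""
    return count * mean_length_bp / genome_bp


def break_even_length(count, genome_bp, limit=0.02):
    """Mean length (bp) at which pseudogenes would reach `limit` of the genome."""
    return limit * genome_bp / count


fraction = pseudogene_fraction(PSEUDOGENE_COUNT, ASSUMED_MEAN_LENGTH_BP, GENOME_BP)
threshold = break_even_length(PSEUDOGENE_COUNT, GENOME_BP)

print(f"Pseudogene share of genome: {fraction:.1%}")
print(f"Mean length needed to reach 2%: {threshold:.0f} bp")
```

Under these assumptions the pseudogenes would have to average about 3,000 bp each before their combined length reached 2% of the genome, so the conclusion is not very sensitive to the exact length chosen.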

There's a whole chapter dedicated to "The Jumping Genes" (Chapter 8). Parrington notes that 45% of our genome is composed of transposons (page 119). What are they doing in our genome? They could just be parasites (selfish DNA), which he equates with junk. However, Parrington prefers the idea that they serve as sources of new regulatory elements and they are important in controlling responses to environmental pressures. They are also important in evolution.

As usual, there's no discussion about what fraction of the genome is functional in this way but the reader is left with the impression that most of that 45% may not be junk or parasites.

Most Sandwalk readers know that almost all of the transposon-related sequences are bits and pieces of transposons that haven't been active for millions of years. They are pseudogenes. They look like broken transposon genes, they act like broken genes, and they evolve like broken transposons. It's safe to assume that's what they are. This is junk DNA and it makes up almost half of our genome.

John Parrington never mentions this nasty little fact. He leaves his readers with the impression that 45% of our genome consists of active transposons jumping around in our genome. I assume that this is what he believes to be true. He has not read the literature.

Chapter 9 is about epigenetics. (You knew it was coming, didn't you?) Apparently, epigenetic changes can make the genome more amenable to transposition. This opens up possible functional roles for transposons.
As we've seen, stress may enhance transposition and, intriguingly, this seems to be linked to changes in the chromatin state of the genome, which permits repressed transposons to become active. It would be very interesting if such a mechanism constituted a way for the environment to make a lasting, genetic mark. This would be in line with recent suggestions that an important mechanism of evolution is 'genome resetting'—the periodic reorganization of the genome by newly mobile DNA elements, which establishes new genetic programs in embryo development. New evidence suggests that such a mechanism may be a key route whereby new species arise, and may have played an important role in the evolution of humans from apes. This is very different from the traditional view of evolution being driven by the gradual accumulation of mutations.
It was at this point, on page 139, that I realized I was dealing with a scientist who was in way over his head.

Parrington returns to this argument several times in his book. For example, in Chapter 10 ("Code, Non-code, Garbage, and Junk") he says ....
These sequences [transposons] are assumed to be useless, and therefore their rate of mutation is taken to represent a 'neutral' reference; however, as John Mattick and his colleague Marcel Dinger of the Garvan Institute have pointed out, a flaw in such reasoning is 'the questionable proposition that transposable elements, which provide the major source of evolutionary plasticity and novelty, are largely non-functional.' In fact, as we saw in Chapter 8, there is increasing evidence that while transposons may start off as molecular parasites, they can also play a role in the creation of new regulatory elements, non-coding RNAs, and other such important functional components of the genome. It is this that has led John Stamatoyannopoulos to conclude that 'far from being an evolutionary dustbin, transposable elements appear to be active and lively members of the genomic regulatory community, deserving of the same level of scrutiny applied to other genic or regulatory features.' In fact, the emerging role for transposition in creating new regulatory mechanisms in the genome challenges the very idea that we can divide the genome into 'useful' and 'junk' components.
Keep in mind that active transposons represent only a tiny percentage of the human genome. About 50% of the genome consists of transposon flotsam and jetsam—bits and pieces of broken transposons. It looks like junk to me.

Why do all opponents of junk DNA argue this way without putting their cards on the table? Why don't they give us numbers? How much of the genome consists of transposon sequences that have a biological function? Is it 50%, 20%, 5%?

John Parrington and modern evolutionary theory

We are continuing our discussion of John Parrington's book The Deeper Genome: Why there is more to the human genome than meets the eye. This is the third of five posts on: Five Things You Should Know if You Want to Participate in the Junk DNA Debate

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory (this post)
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved
John Parrington discusses genome sequence conservation

3. Modern evolutionary theory

You can't understand the junk DNA debate unless you've read Michael Lynch's book The Origins of Genome Architecture. That means you have to understand modern population genetics and the role of random genetic drift in the evolution of genomes. There's no evidence in Parrington's book that he has read The Origins of Genome Architecture and no evidence that he understands modern evolutionary theory. The only evolution he talks about is natural selection (Chapter 1).

Here's an example where he demonstrates adaptationist thinking and the fact that he hasn't read Lynch's book ...
At first glance, the existence of junk DNA seems to pose another problem for Crick's central dogma. If information flows in a one-way direction from DNA to RNA to protein, then there would appear to be no function for such noncoding DNA. But if 'junk DNA' really is useless, then isn't it incredibly wasteful to carry it around in our genome? After all, the reproduction of the genome that takes place during each cell division uses valuable cellular energy. And there is also the issue of packaging the approximately 3 billion base pairs of the human genome into the tiny cell nucleus. So surely natural selection would favor a situation where both genomic energy requirements and packaging needs are reduced fiftyfold?1
Nobody who understands modern evolutionary theory would ask such a question. They would have read all the published work on the issue and they would know about the limits of natural selection and why species can't necessarily get rid of junk DNA even if it seems harmful.

People like that would also understand the central dogma of molecular biology.

1. He goes on to propose a solution to this adaptationist paradox. Apparently, most of our genome consists of parasites (transposons), an idea he mistakenly attributes to Richard Dawkins' concept of The Selfish Gene. Parrington seems to have forgotten that most of the sequence of active transposons consists of protein-coding genes so it doesn't work very well as an explanation for excess noncoding DNA.

John Parrington and the C-value paradox

We are discussing John Parrington's book The Deeper Genome: Why there is more to the human genome than meets the eye. This is the second of five posts on: Five Things You Should Know if You Want to Participate in the Junk DNA Debate

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox (this post)
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved
John Parrington discusses genome sequence conservation

2. C-Value paradox

Parrington addresses this issue on page 63 by describing experiments from the late 1960s showing that there was a great deal of noncoding DNA in our genome and that only a few percent of the genome was devoted to encoding proteins. He also notes that the differences in genome sizes of similar species gave rise to the possibility that most of our genome was junk. Five pages later (page 69) he reports that scientists were surprised to find only 30,000 protein-coding genes when the sequence of the human genome was published—"... the other big surprise was how little of our genomes are devoted to protein-coding sequence."

Contradictory stuff like that makes it very hard to follow his argument. On the one hand, he recognizes that scientists have known for 50 years that only 2% of our genome encodes proteins but, on the other hand, they were "surprised" to find this confirmed when the human genome sequence was published.

He spends a great deal of Chapter 4 explaining the existence of introns and claims that "over 90 per cent of our genes are alternatively spliced" (page 66). This seems to be offered as an explanation for all the excess noncoding DNA but he isn't explicit.

In spite of the fact that genome comparisons are a very important part of this debate, Parrington doesn't return to this point until Chapter 10 ("Code, Non-code, Garbage, and Junk").

We know that the C-Value Paradox isn't really a paradox because most of the excess DNA in various genomes is junk. There isn't any other explanation that makes sense of the data. I don't think Parrington appreciates the significance of this explanation.

The examples quoted in Chapter 10 are the lungfish, with a huge genome, and the pufferfish (Fugu), with a genome much smaller than ours. This requires an explanation if you are going to argue that most of the human genome is functional. Here's Parrington's explanation ...
Yet, despite having a genome only one eighth the size of ours, Fugu possesses a similar number of genes. This disparity raises questions about the wisdom of assigning functionality to the vast majority of the human genome, since, by the same token, this could imply that lungfish are far more complex than us from a genomic perspective, while the smaller amount of non-protein-coding DNA in the Fugu genome suggests the loss of such DNA is perfectly compatible with life in a multicellular organism.

Not everyone is convinced about the value of these examples though, John Mattick, for instance, believes that organisms with a much greater amount of DNA than humans can be dismissed as exceptions because they are 'polyploid', that is, their cells have far more than the normal two copies of each gene, or their genomes contain an unusually high proportion of inactive transposons.
In other words, organisms with larger genomes seem to be perfectly happy carrying around a lot of junk DNA! What kind of an argument is that?
Mattick is also not convinced that Fugu provides a good example of a complex organism with no non-coding DNA. Instead, he points out that 89% of this pufferfish's DNA is still non-protein-coding, so the often-made claim that this is an example of a multicellular organism without such DNA is misleading.
[Mattick has been] a true visionary in his field; he has demonstrated an extraordinary degree of perseverance and ingenuity in gradually proving his hypothesis over the course of 18 years.

HUGO Award Committee
Seriously? That's the best argument he has? He and Mattick misrepresent what scientists say about the pufferfish genome—nobody claims that the entire genome encodes proteins—then they ignore the main point; namely, why do humans need so much more DNA? Is it because we are polyploid?
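To see why the 89% figure doesn't answer the question, it helps to put the quoted numbers side by side. The sketch below uses only figures that appear in these posts (a human genome of roughly 3 billion base pairs with about 2% coding, and a Fugu genome one eighth that size with 89% non-protein-coding); the variable names are mine and the values are approximations, so treat the output as a back-of-the-envelope comparison.

```python
# Compare absolute amounts of non-protein-coding DNA in human and Fugu.
HUMAN_GENOME_BP = 3_000_000_000   # approximate figure quoted from Parrington
HUMAN_CODING_FRACTION = 0.02      # "only 2% of our genome encodes proteins"
FUGU_RELATIVE_SIZE = 1 / 8        # "a genome only one eighth the size of ours"
FUGU_NONCODING_FRACTION = 0.89    # Mattick's figure for the pufferfish

fugu_genome_bp = HUMAN_GENOME_BP * FUGU_RELATIVE_SIZE

human_noncoding_bp = HUMAN_GENOME_BP * (1 - HUMAN_CODING_FRACTION)
fugu_noncoding_bp = fugu_genome_bp * FUGU_NONCODING_FRACTION

# How many times more non-coding DNA humans carry than Fugu
ratio = human_noncoding_bp / fugu_noncoding_bp

print(f"Human non-coding DNA: {human_noncoding_bp / 1e6:.0f} Mb")
print(f"Fugu non-coding DNA:  {fugu_noncoding_bp / 1e6:.0f} Mb")
print(f"Human/Fugu ratio:     {ratio:.1f}x")
```

On these numbers humans carry nearly nine times as much non-coding DNA as the pufferfish for a similar number of genes. Pointing out that Fugu is 89% non-coding leaves that extra DNA unexplained.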

It's safe to say that John Parrington doesn't understand the C-value argument. We already know that Mattick doesn't understand it and neither does Jonathan Wells, who also wrote a book on junk DNA [John Mattick vs. Jonathan Wells]. I suppose John Parrington prefers to quote Mattick instead of Jonathan Wells—even though they use the same arguments—because Mattick has received an award from the Human Genome Organization (HUGO) for his ideas and Wells hasn't [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research].

For further proof that Parrington has not done his homework, I note that the Onion Test [The Case for Junk DNA: The onion test] isn't mentioned anywhere in his book. When people dismiss or ignore the Onion Test, it usually means they don't understand it. (For a spectacular example of such misunderstanding, see: Why the "Onion Test" Fails as an Argument for "Junk DNA").

Five things John Parrington should discuss if he wants to participate in the junk DNA debate

It's frustrating to see active scientists who think that most of our genome could have a biological function but who seem to be completely unaware of the evidence for junk. Most of the positive evidence for junk is decades old so there's no excuse for such ignorance.

I wrote a post in 2013 to help these scientists understand the issues: Five Things You Should Know if You Want to Participate in the Junk DNA Debate. It was based on a talk I gave at the Evolutionary Biology meeting in Chicago that year.1 Let's look at John Parrington's new book to see if he got the message [Hint: he didn't].

There's one post for each of the five issues that informed scientists need to address if they are going to write about the amount of junk in your genome.

1. Genetic load
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved
John Parrington discusses genome sequence conservation

1. It hasn't seemed to help very much.

John Parrington and the genetic load argument

We are discussing John Parrington's book The Deeper Genome: Why there is more to the human genome than meets the eye. This is the first of five posts on: Five Things You Should Know if You Want to Participate in the Junk DNA Debate

1. Genetic load (this post)
John Parrington and the genetic load argument
2. C-Value paradox
John Parrington and the c-value paradox
3. Modern evolutionary theory
John Parrington and modern evolutionary theory
4. Pseudogenes and broken genes are junk
John Parrington discusses pseudogenes and broken genes
5. Most of the genome is not conserved
John Parrington discusses genome sequence conservation

1. Genetic load

The genetic load argument has been around for 50 years. It's why experts did not expect a huge number of genes when the genome sequence was published. It's why the sequence of most of our genome must be irrelevant from an evolutionary perspective.

This argument does not rule out bulk DNA hypotheses but it does rule out all those functions that require specific sequences in order to confer biological function. This includes the speculation that most transcripts have a function and it includes the speculation that there's a vast amount of regulatory sequence in our genome. Chapter 5 of The Deeper Genome is all about the importance of regulatory RNAs.
So, starting from a failed attempt to turn a petunia purple, the discovery of RNA interference has revealed a whole new network of gene regulation mediated by RNAs and operating in parallel to the more established one of protein regulatory factors. ... Studies have revealed that a surprising 60 per cent of miRNAs turn out to be recycled introns, with the remainder being generated from the regions between genes. Yet these were parts of the genome formerly viewed as junk. Does this mean we need a reconsideration of this question? This is an issue we will discuss in Chapter 6, in particular with regard to the ENCODE project ...
The implication here is that a substantial part of the genome is devoted to the production of regulatory RNAs. Presumably, the sequences of those RNAs are important. But this conflicts with the genetic load argument unless we're only talking about an insignificant fraction of the genome.

But that's only one part of Parrington's argument against junk DNA. Here's the summary from the last Chapter ("Conclusion") ...
As we've discussed in this book, a major part of the debate about the ENCODE findings has focused on the question of what proportion of the genome is functional. Given that the two sides of this debate use quite different criteria to assess functionality it is likely that it will be some time before we have a clearer idea about who is the most correct in this debate. Yet, in framing the debate in this quantitative way, there is a danger that we might lose sight of an exciting qualitative shift that has been taking place in biology over the past decade or so. So a previous emphasis on a linear flow of information, from DNA to RNA to protein through a genetic code, is now giving way to a much more complex picture in which multiple codes are superimposed on one another. Such a viewpoint sees the gene as more than just a protein-coding unit; instead it can equally be seen as an accumulation of chemical modifications in the DNA or its associated histones, a site for non-coding RNA synthesis, or a nexus in a 3D network. Moreover, since we now know that multiple sites in the genome outside the protein-coding regions can produce RNAs, and that even many pseudo-genes are turning out to be functional, the very question of what constitutes a gene is now being challenged. Or, as Ed Weiss at the University of Pennsylvania recently put it, 'the concept of a gene is shredding.' Such is the nature of the shift that now we face the challenge of not just recognizing the true scale of this complexity, but explaining how it all comes together to make a living, functioning, human being.
I've already addressed some of the fuzzy thinking in this paragraph [The fuzzy thinking of John Parrington: The Central Dogma and The fuzzy thinking of John Parrington: pervasive transcription]. The point I want to make here is that Parrington's arguments for function in the genome require a great deal of sequence information. They all conflict with the genetic load argument.

Parrington doesn't cover the genetic load argument at all in his book. I don't know why since it seems very relevant. We could not survive as a species if the sequence of most of our genome was important for biological function.
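The logic of the genetic load argument can be illustrated with a toy calculation. Everything numerical here is an assumption for the sake of the example: the figure of 100 new mutations per person per generation is a round illustrative number that does not come from this post, and the function simply counts how many of those mutations would land in DNA whose sequence matters. Only some of those would actually be harmful, so this is a crude sketch and not a population genetics model.

```python
# Toy illustration of the genetic load argument.
# ASSUMPTION: a round, illustrative number of new mutations per generation.
ASSUMED_NEW_MUTATIONS_PER_GENERATION = 100


def mutations_in_functional_dna(functional_fraction,
                                new_mutations=ASSUMED_NEW_MUTATIONS_PER_GENERATION):
    """Expected new mutations per generation that hit sequence-dependent DNA.

    Assumes mutations fall uniformly across the genome (a simplification).
    """
    if not 0.0 <= functional_fraction <= 1.0:
        raise ValueError("functional_fraction must be between 0 and 1")
    return new_mutations * functional_fraction


# Hypothetical scenarios for how much of the genome is sequence-dependent
for fraction in (0.1, 0.5, 0.8):
    hits = mutations_in_functional_dna(fraction)
    print(f"{fraction:.0%} functional: ~{hits:.0f} mutations per generation in functional DNA")
```

The point of the sketch is the scaling: the more of the genome that depends on its specific sequence, the more new mutations every child carries in functional DNA. That is why claims of sequence-dependent function for most of the genome run into the genetic load problem.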