John Mattick has just published a paper dealing with the controversy over the ENCODE results and junk DNA. As you might imagine, Mattick defends the idea that most of our genome is functional. He attempts to explain why most of the critics are wrong.
The title of the paper is "The extent of functionality in the human genome" (Mattick and Dinger, 2013). It's published in the HUGO Journal. Recall that HUGO (Human Genome Organization) gave Mattick a prestigious award for his contributions to genome research. (See The Dark Matter Rises for a discussion of these contributions.)
UPDATE: Mike White also discusses this paper at: Having your cake and eating it: more arguments over human genome function.
Mattick's paper begins by mentioning three of the papers that were critical of ENCODE results: Dan Graur's paper (Graur et al. 2013), Ford Doolittle's paper (Doolittle, 2013), and the paper by Niu and Jiang (2013).
He begins by addressing one of Dan Graur's points about conservation.
Let's cover a bit of background before dealing with Mattick.
Scientists have fifty years of experience looking at sequences. We have repeatedly observed that some sequences are conserved while others are not. Conserved sequences are remarkably correlated with functional regions of proteins, RNAs, and the genome. By contrast, non-conserved sequences almost always correlate with nucleic acid and amino acid sequences that are not essential for function. In the case of genomic sequences (DNA), these non-conserved sequences have often been tested and, with only a few exceptions, no evidence of function has been discovered. (Promoter bashing experiments are a good example.)
In a few cases, large regions of the genome have been deleted with no apparent effect on the organism. In other cases, considerable variation within a population is observed (e.g. humans) and the absence of some stretches of DNA in some individuals does not seem to affect these individuals. Thus, deleting non-conserved DNA doesn't appear to affect fitness suggesting strongly that it is nonfunctional.
Some parts of the human genome resemble functional genes and transposons but all available evidence indicates that these regions no longer function like the genes and transposons they resemble. They appear to be pseudogenes and defective transposons (or fragments of transposons). By looking at well-identified orthologs in different species we can see that these pseudogenes have gained fixed mutations at a rate perfectly consistent with the rate of fixation of neutral alleles by random genetic drift.
Whole-genome comparisons of mammalian genomes also demonstrate that 90% of their sequence is not conserved and is evolving as though the nucleotide sequence were irrelevant. (The rate of fixation equals the mutation rate.) This observation is also consistent with genetic load data showing that about 90% of our genome can't be constrained by negative selection.
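The arithmetic behind "rate of fixation equals the mutation rate" is simple enough to sketch in a few lines (the population sizes and mutation rate below are illustrative values, not data from any of the papers discussed here):

```python
# Neutral theory: in a diploid population of size N, about 2*N*mu new
# neutral mutations arise per generation, and each one has probability
# 1/(2*N) of eventually being fixed by random genetic drift. The N
# terms cancel, so the long-term fixation rate equals the mutation
# rate, independent of population size.
def neutral_fixation_rate(N, mu):
    new_mutations_per_generation = 2 * N * mu
    fixation_probability = 1 / (2 * N)
    return new_mutations_per_generation * fixation_probability

# The result does not depend on N:
for N in (1_000, 10_000, 1_000_000):
    assert abs(neutral_fixation_rate(N, 1e-8) - 1e-8) < 1e-20
```

This is why sequences accumulating substitutions at the mutation rate are the signature of neutral evolution: no plausible population size changes the prediction.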
It is reasonable to conclude that most of the typical mammalian genome is not functional. The only other possibility is that a large percentage of these genomes is functional but the function has nothing to do with the actual sequence of DNA.
John Mattick does not like this line of reasoning. He says ...
the substantive scientific argument of Graur et al. is based primarily on the apparent lack of sequence conservation of the vast majority (~90%) of the human genome, suggesting that this indicates lack of selective constraint (and therefore function). The fundamental flaw, however, in this argument is that conservation is relative, and its estimation in the human genome is largely based on the questionable proposition that transposable elements, which provide the major source of evolutionary plasticity and novelty (Brosius 1999), are largely non-functional. This argument also overlooks a number of other assumptions and considerations that are tacitly embedded in conservation comparisons and their interpretation (Pheasant and Mattick 2007) ...

Mattick raises five objections that, in his opinion, make the Graur et al. argument (and the data) invalid.
1. ... relative conservation implies function; lack of discernible sequence conservation imputes nothing.
This is nonsense. Lack of sequence conservation implies lack of function based on decades of work. It's true that there are some parts of the genome whose function doesn't depend on sequence but the burden of proof is on those who claim that most non-conserved sequences are functional.
2. regulatory sequences have more relaxed structure-function constraints than protein-coding sequences.
As a general rule, regulatory sequences consist of short conserved sequences that bind proteins. They are relatively easy to identify and recognize, and they are well conserved between related species. There may be a few exceptions, but the general rule applies: regulatory sequences are just as well conserved as typical amino acid codons in proteins. Mattick is wrong.
3. regulatory sequences are the main genetic substrates for the exploration of phenotypic diversity in animals.
It's true that phenotypic differences between species can often be explained by differences in otherwise conserved regulatory sequences.
4. the conclusion of lack of conservation of most of the human genome is largely based on a circular comparison with the rate of evolution of pan-mammalian ancient ‘repeats’
Mattick complains that the lack of conservation of genomic sequences is largely based on a circular argument. This is hard to understand. He says ...
... one assumes that a subset of the genome is evolving neutrally and is therefore indicative of the rate of unconstrained divergence, then finds that most of the rest of the genome is behaving similarly, which is therefore concluded to also be non-functional. If the first assumption is incorrect ... the derived conclusion of non-functionality of the rest of the genome is also incorrect.

The logic seems relatively uncontroversial. Mattick is correct. If one assumes that part of the genome is evolving neutrally then the conclusion will be invalid if the assumption is incorrect.
The problem is that we have plenty of evidence that most of the genome is evolving neutrally so it's not an assumption. It's a fact. Maybe I don't understand this argument?
5. even if ancient repeats are neutrally evolving (which we think unlikely), the extant comparison set is restricted to those whose orthology is recognizable ...
This is true. We can only determine that pseudogenes and defective transposons are evolving neutrally if we know that the DNA regions in different species are orthologous. Fortunately, we have plenty of excellent examples. These allow us to deduce the common ancestor and determine the rate of fixation of alleles in each lineage. They serve as good examples of fixation of neutral alleles by random genetic drift.
I'm not sure I understand why this is so important to Mattick.
The C-Value Paradox
Mattick correctly identifies the main argument for junk DNA based on genome size comparisons.
... the so-called ‘C-value enigma’, which refers to the fact that some organisms (like some amoebae, onions, some arthropods, and amphibians) have much more DNA per cell than humans, but cannot possibly be more developmentally or cognitively complex, implying that eukaryotic genomes can and do carry varying amounts of unnecessary baggage.

He argues that, while this may be true, the differences are often due to polyploidy or increases in the amount of defective transposon sequences. It's not clear to me why this invalidates the conclusion that some eukaryotes can carry a lot of junk in their genomes.
He then goes on to say ...
... there is a broadly consistent rise in the amount of non-protein-coding intergenic and intronic DNA with developmental complexity, a relationship that proves nothing but which suggests an association that can only be falsified by downward exceptions, of which there are none known.

That "correlation" only exists in the mind of John Mattick. He mentions "downward exceptions" and says that there are none known. I don't know what he means by this. Does he mean that the minimum size of a vertebrate genome is defined by the pufferfish, with about 27,000 genes and a total genome size of 0.33 × 10^9 bp, or about 1/10 the size of the human genome?
Or does he mean the minimum size of the mammalian genome defined by the Bent-winged bat at approximately half the size of the human genome? Either way, the human genome must contain a lot of junk that isn't required to specify a complex vertebrate.
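For what it's worth, the genome-size comparison can be put in numbers (the ~3.2 × 10^9 bp figure for the human genome is a standard estimate, not a number quoted from Mattick):

```python
# Genome sizes in base pairs. The pufferfish and bat figures are the
# ones cited above; the human total is the standard ~3.2e9 estimate.
human_bp = 3.2e9
pufferfish_bp = 0.33e9        # pufferfish: a fully complex vertebrate
bat_bp = human_bp / 2         # Bent-winged bat: ~half the human genome

# A complex vertebrate fits in about a tenth of the human genome,
# so the difference cannot all be essential sequence.
ratio = pufferfish_bp / human_bp
surplus_bp = human_bp - pufferfish_bp
print(f"pufferfish/human = {ratio:.2f}")
print(f"human DNA not needed to build a vertebrate: {surplus_bp:.2e} bp")
```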
This argument doesn't make any sense.
We now come to the most important part of Mattick's defense of ENCODE. The question is whether pervasive transcription is a reflection of noise or whether the majority of the RNAs produced have a function. Keep in mind that most of these RNAs are complementary to defective transposon sequences and their sequence is not conserved. Also keep in mind that only a small percentage reach a concentration of at least one molecule per cell. (Mattick does NOT mention concentration.)
Mattick's main argument for function is ...
... the vast majority of the mammalian genome is differentially transcribed in precise cell-specific patterns (Mercer et al. 2008) to produce large numbers of intergenic, interlacing, antisense and intronic non-protein-coding RNAs, which show dynamic regulation in embryonal development ...

Let's think about that for a minute.
Let's assume that the human genome is littered by chance with short sequences that resemble transcription factor binding sites. This has to be true unless there is strong negative selection against anything that resembles the binding sites of any transcription factor. There's no conceivable way that this could happen, so it follows logically that there will be spurious binding sites.
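A back-of-the-envelope calculation shows why spurious sites are unavoidable (the 8 bp motif length and the 3.2 × 10^9 bp genome size are illustrative assumptions, not figures from either paper):

```python
# A typical transcription factor recognizes a short motif, assumed here
# to be 8 bp. In random sequence, any particular 8-mer occurs once
# every 4**8 = 65,536 positions on average, so a genome-sized sequence
# is expected to contain tens of thousands of chance matches for every
# single motif.
genome_bp = 3.2e9        # assumed haploid human genome size
motif_bp = 8             # assumed binding-site length

expected_hits = 2 * genome_bp / 4 ** motif_bp   # count both strands
print(f"{expected_hits:,.0f} chance matches expected for one motif")
```

Multiply by the hundreds of different transcription factors in a human cell and, on these assumptions, chance binding sites number in the millions.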
Transcription factors will bind to these spurious nonfunctional sites as long as the DNA is available for binding. In some cases the accidental binding of a transcription factor would lead to spurious, accidental, transcription.
In almost all cases, these spurious transcripts will be extremely rare—their concentration will be less than one transcript per cell. This is important since you can't have a serious discussion of this issue without considering concentration.
If our assumption is correct, there's one other feature of the spurious transcription that must be observed: the transcription will be cell specific or developmentally regulated. This is because different transcription factors are present in different cell types and at different stages of development. It's also because the accessibility of different parts of the genome varies from cell type to cell type and at different stages of development. This is the transition from "open" chromatin to a "closed" version resembling heterochromatin.
We're left with the conclusion that spurious, accidental, transcripts must be differentially expressed as a function of cell type and development. That's exactly what we observe. But Mattick uses this necessary feature as an argument for function. That makes no sense.
He claims that ...
... differential expression (including extensive alternative splicing) of RNAs is a far more accurate guide to the functional content of the human genome than logically circular assessments of sequence conservation, or lack thereof. Assertions that the observed transcription represents random noise (tacitly or explicitly justified by reference to stochastic (‘noisy’) firing of known, legitimate promoters in bacteria and yeast), is more opinion than fact and difficult to reconcile with the exquisite precision of differential cell- and tissue-specific transcription in human cells.

I don't think it's fair to say that spurious transcription is "more opinion than fact." It's a biochemical necessity as long as you understand the properties of DNA binding proteins.
Mattick has one more argument up his sleeve and it's the same argument made by Intelligent Design Creationists.
Moreover, where tested, these noncoding RNAs usually show evidence of biological function in different developmental and disease contexts, with, by our estimate, hundreds of validated cases already published and many more en route, which is a big enough subset to draw broader conclusions about the likely functionality of the rest.

There are over one million different transcripts that have been detected in human cells. In some cases there have been obvious clues that these transcripts have a function. Many of these best candidates have been investigated and it turns out that quite a few have a function.
That's not a surprise. But just because there are functional RNAs does not mean that all RNAs are functional. It does not even mean that a substantial percentage are functional. (Remember that 300 functional RNAs out of one million is 0.03%.)
[Mattick has been] a true visionary in his field; he has demonstrated an extraordinary degree of perseverance and ingenuity in gradually proving his hypothesis over the course of 18 years.

Hugo Award Committee

A Question of Motives
Now we get to the end of the paper and the most astonishing claim. I had to read this several times before I was sure I was interpreting it correctly.
There may also be another factor motivating the Graur et al. and related articles (van Bakel et al. 2010; Scanlan 2012), which is suggested by the sources and selection of quotations used at the beginning of the article, as well as in the use of the phrase “evolution-free gospel” in its title (Graur et al. 2013): the argument of a largely non-functional genome is invoked by some evolutionary theorists in the debate against the proposition of intelligent design of life on earth, particularly with respect to the origin of humanity. In essence, the argument posits that the presence of non-protein-coding or so-called ‘junk DNA’ that comprises >90% of the human genome is evidence for the accumulation of evolutionary debris by blind Darwinian evolution, and argues against intelligent design, as an intelligent designer would presumably not fill the human genetic instruction set with meaningless information (Dawkins 1986; Collins 2006). This argument is threatened in the face of growing functional indices of noncoding regions of the genome, with the latter reciprocally used in support of the notion of intelligent design and to challenge the conception that natural selection accounts for the existence of complex organisms (Behe 2003; Wells 2011).

The last two references are to Michael Behe's paper about functional pseudogenes and to Jonathan Wells' book The Myth of Junk DNA. I don't think I've ever seen a legitimate scientific paper that references that book by Jonathan Wells.
Mattick also uses IDiot terminology when he says that, "...the argument posits that the presence of non-protein-coding or so-called ‘junk DNA’ that comprises >90% of the human genome is evidence for the accumulation of evolutionary debris by blind Darwinian evolution." As you should all know by now, the accumulation of junk DNA is the antithesis of "Darwinian evolution." You should also note that it's mostly IDiots who get confused about the difference between junk DNA and "non-protein-coding" DNA.
I find that very troubling.
Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA 110:5294-5300.
Graur D., Zheng Y., Price N., Azevedo R.B., Zufall R.A., and Elhaik E. (2013) On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 5:578-590.
Mattick, J. S. and Dinger, M. E. (2013) The extent of functionality in the human genome. The HUGO Journal 7:2 [doi: 10.1186/1877-6566-7-2] [Abstract]
Niu, D.K. and Jiang, L. (2013) Can ENCODE tell us how much junk DNA we carry in our genome? Biochem Biophys Res Commun 430:1340-1343.