Wednesday, November 29, 2006

The Three Domain Hypothesis (part 4)

[Part 1][Part 2][Part 3]

Ludwig and Schleifer question the reliability of the SSU tree. They begin by comparing trees constructed from the small ribosomal RNA subunit (SSU) and the large ribosomal RNA subunit (LSU). The example they use is 18 species of Enterococcus and they show that there are significant differences between the two trees. Surprisingly, they dismiss these differences as “minor local differences.” These authors are convinced that “SSU and LSU rRNA genes fulfill the requirements of ideal phylogenetic markers to an extent far greater than do protein coding genes.”

In spite of this bias, they compiled a database of protein trees from conserved genes that are found in all three of the proposed Domains. According to them, the Three Domain Hypothesis is supported by EF-Tu, the large subunits of RNA polymerase, Hsp60, and some aminoacyl-tRNA synthetases (aspartyl, leucyl, tryptophanyl, and tyrosyl).

The Three Domain Hypothesis is refuted by ATPase, DNA gyrase A, DNA gyrase B, Hsp70, RecA, and some aminoacyl-tRNA synthetases. Note the inclusion of ATPase in this list. The phylogeny of ATPase was one of the strongest bits of evidence for the Three Domain Hypothesis back in 1989 but further work has shown that these genes (proteins) now refute the hypothesis.

My own favorite is the HSP70 gene family, arguably the most highly conserved gene in all of biology and therefore an excellent candidate for studies of deep phylogeny. Hsp70 is the main chaperone in all species. It is responsible for the correct folding of proteins as they are synthesized. It forms a complex with DnaJ and GrpE in bacteria and similar proteins in eukaryotes. The complex associates with the translation machinery (ribosomes etc.) during protein synthesis.

The conflict between trees constructed with HSP70 and the ribosomal RNA trees has been known for a long time. The actual pattern of the HSP70 tree can be interpreted in two different ways depending on where you place the root [see 1995] but neither one agrees with the Three Domain Hypothesis.

Here’s an example of an HSP70 tree that I just created using the latest sequences. It’s fairly typical of the trees that do not support the Three Domain Hypothesis. Eukaryotes cluster as a monophyletic group (lower left) and all prokaryotes form another distinct clade. The archaebacteria sequences (black dots) do not form a single clade, let alone a “domain.” Instead, they tend to be dispersed among the other bacterial groups.
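To make the claim about clades concrete, here is a minimal Python sketch of what "forms a single clade" means in practice. The tree below is a made-up toy with hypothetical labels (euk1, arch1, bact1, ...), not the HSP70 tree described above, and the function names are my own. A group is monophyletic if some node in the rooted tree has exactly that group as its set of descendant leaves.

```python
# Toy rooted tree as nested tuples; leaves are strings.
# All labels are hypothetical and chosen only to mimic the pattern described:
# eukaryotes together, archaebacteria scattered among other bacteria.
tree = (
    (("euk1", "euk2"), "euk3"),
    ((("arch1", "bact1"), "bact2"), ("arch2", "bact3")),
)


def leaves(node):
    """Return the set of leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set under every node of the tree, including the root."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node has exactly `group` as its descendant leaves."""
    return frozenset(group) in set(clades(tree))


eukaryotes = {"euk1", "euk2", "euk3"}
archaebacteria = {"arch1", "arch2"}
prokaryotes = {"arch1", "arch2", "bact1", "bact2", "bact3"}

print("eukaryotes:", is_monophyletic(tree, eukaryotes))
print("archaebacteria:", is_monophyletic(tree, archaebacteria))
print("prokaryotes:", is_monophyletic(tree, prokaryotes))
```

In this toy example the eukaryotes and the prokaryotes each form a clade but the archaebacteria do not, which is the same pattern the HSP70 tree is said to show. Note that the answer depends on where the tree is rooted, which is exactly why the rooting question matters.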

Note that this tree, like many others, shows numerous short branches at the bottom of the bacteria tree suggesting that the diversity among bacteria is ancient. Philippe and Forterre (1999) were among the first to document the serious differences between conserved protein trees and rRNA trees in “The Rooting of the Universal Tree of Life Is Not Reliable” (J. Mol. Evol. 49:509-523). It’s worth quoting their abstract in order to emphasize the controversy since Ludwig and Schleifer don’t do a very good job.
Several composite universal trees connected by an ancestral gene duplication have been used to root the universal tree of life. In all cases, this root turned out to be in the eubacterial branch. However, the validity of results obtained from comparative sequence analysis has recently been questioned, in particular, in the case of ancient phylogenies. For example, it has been shown that several eukaryotic groups are misplaced in ribosomal RNA or elongation factor trees because of unequal rates of evolution and mutational saturation. Furthermore, the addition of new sequences to data sets has often turned apparently reasonable phylogenies into confused ones. We have thus revisited all composite protein trees that have been used to root the universal tree of life up to now (elongation factors, ATPases, tRNA synthetases, carbamoyl phosphate synthetases, signal recognition particle proteins) with updated data sets. In general, the two prokaryotic domains were not monophyletic with several aberrant groupings at different levels of the tree. Furthermore, the respective phylogenies contradicted each others, so that various ad hoc scenarios (paralogy or lateral gene transfer) must be proposed in order to obtain the traditional Archaebacteria-Eukaryota sisterhood. More importantly, all of the markers are heavily saturated with respect to amino acid substitutions. As phylogenies inferred from saturated data sets are extremely sensitive to differences in evolutionary rates, present phylogenies used to root the universal tree of life could be biased by the phenomenon of long branch attraction. Since the eubacterial branch was always the longest one, the eubacterial rooting could be explained by an attraction between this branch and the long branch of the outgroup. 
Finally, we suggested that an eukaryotic rooting could be a more fruitful working hypothesis, as it provides, for example, a simple explanation to the high genetic similarity of Archaebacteria and Eubacteria inferred from complete genome analysis.
The problem is obvious. All trees, RNA and protein, have potential problems of saturation and long branch attraction. Although Ludwig and Schleifer argue in favor of the ribosomal RNA tree, there is still serious debate over which sequences are revealing the “true” phylogeny. Are there good reasons for rejecting those trees that refute the Three Domain Hypothesis as its supporters maintain?
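For readers who haven't met saturation before, here is a rough Python sketch of why it is a problem. It uses the simple Poisson correction, d = -ln(1 - p), which estimates the number of substitutions per site from the observed fraction p of sites that differ. This is my own illustration under that one simplifying model (equal rates at all sites), not the method used in any of the papers discussed, and the function name is hypothetical.

```python
import math


def poisson_distance(p):
    """Estimated substitutions per site from the observed fraction p of
    differing sites, using the Poisson correction d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


# As the observed difference approaches 1, the estimate grows without bound,
# so small errors in p become large errors in the inferred distance.
for p in (0.1, 0.5, 0.8, 0.9, 0.95):
    print(f"observed {p:.2f} -> estimated {poisson_distance(p):.2f}")
```

When sequences are only slightly diverged the observed and estimated values are close, but once most sites have changed the observed difference barely moves while the true number of substitutions keeps climbing. That lost signal is what makes deep branches so sensitive to rate differences and long branch attraction.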

Microbial Phylogeny and Evolution: Concepts and Controversies Jan Sapp, ed., Oxford University Press, Oxford UK (2005)

Jan Sapp The Bacterium’s Place in Nature

Norman Pace The Large-Scale Structure of the Tree of Life.

Wolfgang Ludwig and Karl-Heinz Schleifer The Molecular Phylogeny of Bacteria Based on Conserved Genes.

Carl Woese Evolving Biological Organization.

W. Ford Doolittle If the Tree of Life Fell, Would it Make a Sound?.

William Martin Woe Is the Tree of Life.

Radhey Gupta Molecular Sequences and the Early History of Life.

C. G. Kurland Paradigm Lost.


  1. One of the standard views about microbial species is the Core Genome Hypothesis, that there is a core set of genes, mostly "housekeeping" genes, that are highly conserved and very resistant to lateral transfer. What would these genes show? Is HSP70 one of them?

    I'd be very interested to see a supertree analysis of core genome genes.

  2. John,

    When it comes to HSP70, how much more "core" can you get? It is absolutely essential for protein folding inside the cell and, as I mentioned, it is the most highly conserved gene in all of biology. Thus, I just showed you a supertree of one core protein. :-)

    Other "core" genes that refute the Three Domain Hypothesis include ATPase, gyrase, and some aminoacyl tRNA synthetases.

    Analysis of whole genomes reveals that most metabolic genes in archaebacteria are more closely related to other bacteria than to eukaryotes. This is not consistent with the Three Domain Hypothesis.

    Proponents of the hypothesis usually refer to a subset of genes that supports their favorite phylogeny. They pick out genes that are involved in standard information flow pathways (transcription and translation) and claim that these "informational genes" represent the real core that tracks species phylogeny.

    They ignore the "informational genes" (e.g. HSP70, some ribosomal proteins, some aminoacyl-tRNA synthetases) that don't fit with their proposal. They also tend to ignore any evidence of lateral gene transfer in this "informational" group of genes (e.g. EF-Tu: Inagaki et al. (2006)).

    We'll talk about this later in Part 5 or Part 6.

  3. Wouldn't it have been oh so clever to contain your commentary on the three domain hypothesis in three parts?

  4. Look at that tree up there, Mustafa. He's got scores of parts yet...

  5. Oh, and I meant to ask - is HSP70 protected from lateral transfer? Forgive my ignorance, but being a "core" gene in the sense of being extremely important doesn't mean it cannot be laterally swapped. The "Core Gene" hypothesis (ref below) is that these genes are so tightly integrated into the rest of the genome and functioning of the organism that it becomes unlikely they can be swapped with distally related versions.

    Dykhuizen, D. E. (1998), "Santa Rosalia revisited: Why are there so many species of bacteria?" Antonie Van Leeuwenhoek 73 (1):25-33.

    Dykhuizen, D. E., and L. Green (1991), "Recombination in Escherichia coli and the definition of biological species", Journal of Bacteriology 173 (22):7257-7268.

    Coleman, Maureen L., Matthew B. Sullivan, Adam C. Martiny, Claudia Steglich, Kerrie Barry, Edward F. DeLong, and Sallie W. Chisholm (2006), "Genomic Islands and the Ecology and Evolution of Prochlorococcus", Science 311 (5768):1768 - 1770.

    Wertz, J. E., C. Goldstone, D. M. Gordon, and M. A. Riley (2003), "A molecular phylogeny of enteric bacteria and implications for a bacterial species concept", J Evol Biol 16 (6):1236-1248.

  6. No gene is "protected" from lateral gene transfer. They are all candidates. The proponents of the Three Domain Hypothesis would like you to believe that their favorite genes are special but the arguments don't stand up to close scrutiny.

    My favorite examples are the genes for ribosomal proteins. Some of them provide weak support for the Three Domain Hypothesis and some of them refute it. This pretty much trashes the argument for special "core" genes that reveal the one true phylogeny.

    Of course, a nasty little fact like that isn't going to dissuade the faithful .... :-)

  7. Why do you keep using aminoacyl-tRNA synthetases (aaRSs) to make your case? It does not appear that the last common ancestor encoded all 20 aaRSs. When it comes to Gln-tRNA formation, each domain of life utilizes different enzymes. Archaea encode an archaeal-specific tRNA-dependent amidotransferase, GatDE. Most bacteria utilize a different tRNA-dependent amidotransferase, GatCAB. The amidotransferases transamidate Glu-tRNA(Gln). Eukaryotes and some bacteria (like E. coli) use a glutaminyl-tRNA synthetase (GlnRS). The GlnRS, it appears, is related to the glutamyl-tRNA synthetase, and the GlnRS found in bacteria is a result of gene transfer. Most archaea and bacteria utilize GatCAB to form Asn-tRNA(Asn). Two LysRSs have evolved in nature (one a Class I synthetase and the other a Class II synthetase). A few archaea that code for both LysRSs also encode a novel synthetase for pyrrolysine. Some archaea do not code for a CysRS and use an indirect pathway to form Cys-tRNA. The first step is to aminoacylate tRNA(Cys) with O-phosphoserine, which is then modified to Cys. The evidence doesn't argue against the Three Domain Hypothesis; rather, it shows that not all of the essential components of protein synthesis found in weird organisms like eukaryotes and strange bacteria like E. coli are universal.