Tuesday, January 14, 2020

The Three Domain Hypothesis: RIP

The Three Domain Hypothesis died about twenty years ago but most people didn't notice.

The original idea was promoted by Carl Woese and his colleagues in the early 1980s. It was based on the discovery of archaebacteria as a distinct clade that was different from other bacteria (eubacteria). It also became clear that some eukaryotic genes (e.g. ribosomal RNA) were more closely related to archaebacterial genes and the original data indicated that eukaryotes formed another distinct group separate from either the archaebacteria or eubacteria. This gave rise to the Three Domain Hypothesis where each of the groups, bacteria (Eubacteria), archaebacteria (Archaea), and eukaryotes (Eucarya, Eukaryota), formed a separate clade that contained multiple kingdoms. These clades were called Domains.

The two most important features of the Three Domain Hypothesis are: (1) there are three distinct domains, and (2) the eukaryotic domain is more closely related to archaebacteria than to the other domain of bacteria. Both of these claims are wrong. We now know that some of the nuclear genes in eukaryotes arose from within the Archaea domain, so there aren't three domains. We also now know that most of the nuclear genes in eukaryotes are more closely related to genes from eubacteria than to genes from archaebacterial species, so the tree of life shown on the left misrepresents the origin of eukaryotes.

Early skeptics of the Three Domain Hypothesis

The PR machine orchestrated by Carl Woese and Norman Pace was spectacularly successful so that by the early 1990s almost everyone was convinced that the Three Domain Hypothesis was correct and it became part of standard textbook dogma. However, there were a few skeptics. One of the most prominent was Jim Lake who argued that eukaryotes arose from within Archaea. He pointed out that eukaryotic genes appeared to be more similar to the genes from Eocytes than other branches of the Archaea tree suggesting that eukaryotes share a more recent common ancestor with Eocytes than other groups [Jim Lake and the Eocyte tree].

The skeptics were subjected to harsh criticism from the PR machine, and the reputations of scientists like Jim Lake suffered greatly. Over the past thirty years, more and more evidence for Lake's tree has emerged, so that almost all researchers in this field now agree that the tree of life is a Two-Domain tree, with some eukaryotic genes sharing a common ancestor with the Asgard group of archaea from within the Archaea Domain. Here's the latest paper on this subject; it's what prompted me to write this post.
Williams, T.A., Cox, C.J., Foster, P.G., Szöllősi, G.J., and Embley, T.M. (2020) Phylogenomics provides robust support for a two-domains tree of life. Nature ecology & evolution, 4:138-147. [doi: 10.1038/s41559-019-1040-x]

Hypotheses about the origin of eukaryotic cells are classically framed within the context of a universal ‘tree of life’ based on conserved core genes. Vigorous ongoing debate about eukaryote origins is based on assertions that the topology of the tree of life depends on the taxa included and the choice and quality of genomic data analysed. Here we have reanalysed the evidence underpinning those claims and apply more data to the question by using supertree and coalescent methods to interrogate >3,000 gene families in archaea and eukaryotes. We find that eukaryotes consistently originate from within the archaea in a two-domains tree when due consideration is given to the fit between model and data. Our analyses support a close relationship between eukaryotes and Asgard archaea and identify the Heimdallarchaeota as the current best candidate for the closest archaeal relatives of the eukaryotic nuclear lineage.
The figure below is from a review of the paper by Gribaldo and Brochier-Armanet (2020). It illustrates the difference between a three-domain tree and a two-domain tree.
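The difference between the two trees comes down to whether Archaea (without eukaryotes) form a clade. Here's a minimal sketch of that check in Python. The nested-tuple trees, the group names, and the helper functions are my own simplified, hypothetical stand-ins for the published trees, not anything taken from the paper.

```python
# Toy rooted trees written as nested tuples. The leaf names and branching
# order are simplified assumptions for illustration only.
THREE_DOMAIN = ("Bacteria", (("Euryarchaeota", ("TACK", "Asgard")), "Eukaryotes"))
TWO_DOMAIN = ("Bacteria", ("Euryarchaeota", ("TACK", ("Asgard", "Eukaryotes"))))


def leaves(tree):
    """Return the set of leaf names found under this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf set of every subtree (every clade) in the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """A group is monophyletic if some subtree contains exactly its members."""
    return frozenset(group) in clades(tree)


ARCHAEA = {"Euryarchaeota", "TACK", "Asgard"}

print(is_monophyletic(THREE_DOMAIN, ARCHAEA))                  # True
print(is_monophyletic(TWO_DOMAIN, ARCHAEA))                    # False
print(is_monophyletic(TWO_DOMAIN, ARCHAEA | {"Eukaryotes"}))   # True
```

In the toy three-domain tree, Archaea is a clade on its own. In the toy two-domain tree it only becomes a clade once eukaryotes are included, which is the sense in which eukaryotes arise from within Archaea.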

I should draw your attention to the quality of the discussion in the Williams et al. paper. It contains a lengthy introduction concerning the controversy over three-domain and two-domain trees and the data supporting each tree. The experiments in the paper are designed to distinguish between the two possibilities, and the results and conclusion sections contain critical analyses of the results and how they compare to other data. This is the way scientific papers should be written.1

Experienced readers will immediately recognize part of the problem from looking at the figure above. It's the issue of long branch attraction—a phenomenon that can artifactually cluster two long branches such as the bacterial and eukaryotic branches. This will produce a three-domain tree that does not faithfully represent the true tree of life. The authors of the paper try to correct for this (presumed) artifact by using sophisticated phylogenetics software and Bayesian relative rate tests. I'm more than a little skeptical about whether the quality of the underlying data can support such manipulations but, nevertheless, it makes a lot of sense that some eukaryotic genes arose from within the Archaea.2

The ring of life

One of the distinguishing features of eukaryotic cells is that they contain mitochondria that are clearly descendants of an endosymbiotic event where an archaeal cell engulfed a primitive alphaproteobacterium. Over time, a good proportion of the alphaproteobacterial genes migrated to the nucleus so that a typical eukaryotic genome now contains a majority of genes that are proteobacterial in origin (see the figure below from McInerney and O'Connell, 2017).
What this means is that it is extremely misleading to represent the origin of eukaryotes as simply the descendants of a single archaeal species. Both of the genomes involved in the original fusion contributed to the modern eukaryotic genome.

Endosymbiosis is not new but it took some time for scientists to realize that the contribution of alphaproteobacterial genes was very significant and that the original Three Domain Hypothesis was misleading. Eventually this idea came to be known as the Ring of Life (e.g. McInerney et al., 2014 [see figure on the right]; McInerney et al., 2015; Lake, 2015).

Proponents of the Three Domain Hypothesis dismissed the evidence for a ring of life by claiming that some genes were more important than others in constructing ancient phylogenies. It became a major talking point to assume that ribosomal RNA genes and genes for proteins involved in translation and transcription were the only ones that count in determining the origin of eukaryotes (informational genes). Since these genes tend to be more similar to archaea than to eubacteria, it lent support to the Three Domain Hypothesis. Indeed, as late as 2009 Norman Pace was still arguing that the Three Domain Hypothesis was valid because the small subunit RNA genes (SSU) were the only reliable phylogenetic marker (Pace, 2009).

However, by that time there was increasing evidence that these informational genes arose from within Archaea, casting doubt on the existence of a separate Eucarya domain. Furthermore, there were well-informed researchers who pointed out that the great majority of eukaryotic genes (~80%) are more closely related to Eubacteria than to Archaea, and that these are genes involved in fundamental aspects of metabolism. It seems rather silly to trace the origin of eukaryotes by only looking at a biased subset of eukaryotic genes and ignoring the majority that give a different result (see "The tree of one percent," Dagan and Martin, 2006, and "The real 'domains' of life," Walsh and Doolittle, 2005).

What's interesting about this controversy is that the side that's fighting against the established Three-Domain dogma involves many of the same players that we see in other disputes. For example, that's Bill Martin on the right enjoying a cup of coffee and a donut at Tim Hortons. Another prominent critic of Three Domains is Ford Doolittle. I think the reason for this is that scientists who are knowledgeable about molecular evolution tend to recognize misconceptions about evolution and that's why they are more likely to see problems with Three Domains, opposition to junk DNA, alternative splicing, ENCODE etc.

The web of life

The importance of lateral gene transfer (LGT) became apparent in the 1990s, further complicating attempts to construct a universal tree of life. This led to an important article by Ford Doolittle in the February 2000 issue of Scientific American (see figure below). The idea is that LGT may have been so rampant in the early history of life that it's impossible to draw a universal tree that represents all the genes in a major clade. The major divisions such as archaea, eubacteria, and eukaryotes may only have emerged from the gene pool after several hundred million years.

I remember being invited to be an observer at a meeting in Halifax (Nova Scotia, Canada) in 2009 and coming away totally confused about the tree of life. (The photo is from Christina Behme at the workshop in Halifax in July 2009 on "Questioning the Tree of Life." That's me having dinner on the first evening with Ford Doolittle (left), John Dupré (standing), and Andrew Roger (right).) Now it's more than 10 years later and I still don't think there's a clear consensus of what a tree of life should look like at its deepest branches.

The popular press is just as confused as everybody else. The average science writer hasn't grasped the notion that the Three Domain Hypothesis is dead but some of them have clued into the fact that there's controversy about the tree of life. That's what led to the infamous Darwin Was Wrong article in New Scientist back in January 2009. This prompted a critical letter from Daniel Dennett, Jerry Coyne, Richard Dawkins, and PZ Myers [Blunt Talk from Four Evolutionists] but it's worth noting that three of these scientists are adaptationists of various flavors.

The tree of life is in trouble, that's the part that's right, but Darwin never said anything about what a universal tree of life should look like so, in this case, he wasn't wrong.

I'll leave you with the words of Ford Doolittle from a 2015 interview published in PLoS Genetics [The Philosophical Approach: An Interview with Ford Doolittle].
I think there are two groups of prokaryotes: Bacteria and Archaea. They are not well defined, and there are many genes that are derived from lateral gene transfer from bacteria into archaea, somewhat fewer in the other direction. So to really say that “this bug is an archaeon” when the majority of its genes are actually bacterial, what you really mean is that you are privileging the ribosomal RNA as the definer. And that is what people do, so I will let them do that.

And then what people would believe, and I guess what I would believe, is that the eukaryotic transcriptional/translational machinery—the informational machinery in the eukaryotic cell—arose within the Archaea, more recently than the Bacteria and the Archaea diverged from each other. That would be the standard view.

But we think that a tremendous number of genes have been exchanged back and forth between bacteria and between bacteria and archaea, and also between bacteria and eukaryotes after eukaryotes arose from within the Archaea—so much transfer that it is really rather arbitrary to define these lineages by virtue of their transcriptional and translational machinery any more. Had Woese started looking at glycolysis enzymes, rather than ribosomal RNA, we might not even be talking this way.

1. There are quite a few other high-quality papers on the Two-Domain Hypothesis; for example, Williams et al. 2013. This is not meant to be a comprehensive review of all the work on this subject.

2. Another problem in these experiments arises from the use of concatenated data where a number of genes are strung together to produce a single large "gene." This is necessary because the amount of information in a single gene is not sufficient to resolve deep phylogenies. However, the individual gene trees don't usually agree with the concatenated tree suggesting that there's a problem (Thiergart et al., 2014).
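To make the disagreement between gene trees and a concatenated tree concrete, here's a small Python sketch that counts how many gene trees share the topology of a reference tree. The trees, the gene names, and the helper functions are all hypothetical toy values that I made up for illustration; they are not data from Thiergart et al. or from any real analysis.

```python
def leaves(tree):
    """Return the set of leaf names found under this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def internal_clades(tree):
    """Return the leaf sets of all subtrees that contain more than one leaf."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= internal_clades(child)
    return found


def same_topology(tree_a, tree_b):
    """Two rooted trees share a topology if they contain the same clades."""
    return internal_clades(tree_a) == internal_clades(tree_b)


# Hypothetical tree inferred from a concatenated alignment.
CONCATENATED = ("Bacteria", ("Euryarchaeota", ("Asgard", "Eukaryotes")))

# Hypothetical trees inferred from single genes.
GENE_TREES = {
    "gene_A": ("Bacteria", ("Euryarchaeota", ("Asgard", "Eukaryotes"))),
    "gene_B": (("Bacteria", "Eukaryotes"), ("Euryarchaeota", "Asgard")),
    "gene_C": ("Bacteria", (("Euryarchaeota", "Asgard"), "Eukaryotes")),
}

matching = [name for name, tree in GENE_TREES.items()
            if same_topology(tree, CONCATENATED)]
print(f"{len(matching)} of {len(GENE_TREES)} gene trees match: {matching}")
```

Real analyses use more forgiving distance measures than an exact match, but the basic idea is the same: if few of the individual gene trees recover the concatenated topology, the concatenated tree deserves extra scrutiny.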

Dagan, T., and Martin, W. (2006) The tree of one percent. Genome Biol, 7:118. [doi: 10.1186/gb-2006-7-10-118]

Gribaldo, S., and Brochier-Armanet, C. (2020) Evolutionary relationships between Archaea and eukaryotes. Nature ecology & evolution, 4:20-21. [doi: 10.1038/s41559-019-1073-1]

McInerney, J.O., and O'Connell, M.J. (2017) Microbiology: mind the gaps in cellular evolution. Nature, 541:297. [doi: 10.1038/nature21113]

Pace, N.R. (2009) Mapping the tree of life: progress and prospects. Microbiology and Molecular Biology Reviews, 73:565-576. [doi: 10.1128/MMBR.00033-09]

Thiergart, T., Landan, G., and Martin, W.F. (2014) Concatenated alignments and the case of the disappearing tree. BMC evolutionary biology, 14:266. [doi: 10.1186/s12862-014-0266-0]

Walsh, D.A., and Doolittle, W.F. (2005) The real ‘domains’ of life. Current Biology, 15:R237-R240. [doi: 10.1016/j.cub.2005.03.034]

Williams, T.A., Foster, P.G., Cox, C.J., and Embley, T.M. (2013) An archaeal origin of eukaryotes supports only two primary domains of life. Nature, 504:231-236. [doi: 10.1038/nature12779]


Georgi Marinov said...

IMHO it is entirely reasonable to "privilege" the information genes.

It was an endosymbiotic event, i.e. there were a host and an endosymbiont involved, not a fusion event. The partners are still physically compartmentalized to this day, and there is clearly a dominant one (as evidenced by the fact that the endosymbiont genome has been lost on numerous occasions, and in at least one case, the whole mitochondrion has been lost outright). So even though a majority of eukaryotes' nuclear genes derived from the endosymbiont (or from other bacteria through HGT) and there may well have been outright fusion events long before eukaryogenesis, by that point in time I think we can speak of "lineages" at the cellular level without much controversy.

Also, I think it should always also be pointed out that the three-domain model being wrong does not mean that the two-domain model that preceded it is correct.

We are back to a two-domain model but it is two very different domains that feature in it now.

Arlin said...

LMAO at those arrogant self-righteous followers of Carl Woese. I totally agree with your characterization of a propaganda campaign. In some ways it was worse than that, it was a cult. And they had so much influence! A few years ago I used the word "prokaryote" in a sentence and I was corrected by a younger colleague who said something along the lines of "so far as I know, there is no such thing as a prokaryote." The poor sod did not understand that there is no such thing as "Archaea."

Joe Felsenstein said...

Is the issue here what the genealogy looks like (allowing for symbioses and horizontal gene transfers as well)? Or how we classify? It seems to be the latter, with people who might agree on the genealogy firing cannons at each other over the classification. Which to me is inherently less important.

Jonathan Badger said...

Exactly. The modern two-domain model is basically the same thing as the three domain model only more so. When I worked on archaea in the 1990s, the exciting thing was how the standard "prokaryotic" explanations for molecular biology just didn't hold for these organisms, which acted more like eukaryotes. Accepting that eukaryotes are actually derived from archaea isn't really changing anything; the point is that archaea aren't bacteria as the traditional two domains would have it.

Joe Felsenstein said...

To be fair, Larry has presented a two-domain tree and a three-domain tree. The issue of which tree is better is important. It is just the designation of official Domain that adds nothing to that.

João said...

I agree with you when you say that "it is entirely reasonable to "privilege" the information genes".

However, while it's true that "the three-domain model being wrong does not mean that the two-domain model that preceded it is correct", I think that the evidence points to the conclusion that we have only two domains, not three. Even if we now have "two very different domains", they still are only two.

S Johnson said...

I've lost track. Is the issue that there are prokaryote and eukaryote domains, or that there are eukaryote and archaea domains? Or maybe eubacteria and archaea domains, with "eukaryotes" included in archaea? Or are "eukaryotes" included in eocytes? Or is the Ring of Life hypothesis that there are no domains, just a ring? I suppose I can agree Woese is a monster, as I don't know him therefore hurting his feelings doesn't hurt mine.

Joe Felsenstein said...

Ask about the genealogy. Declaring "domains" is, given the genealogy, just an exercise in decoration.

Jonathan Badger said...

But names for clades are how we talk about groups of organisms. And it is clear that in both the three domain tree and the new two-domain tree you can't talk about a "prokaryotic" clade the way you could with the traditional two domain tree that lumped Bacteria and Archaea on one side and gave Eukaryotes their own clade. I think that isn't just "decoration", personally.

Joe Felsenstein said...

Yes, but how we set up these groups is then mostly a matter of taste. Some will insist that all of them must be monophyletic, others will say that groups like "invertebrate" are useful. But with the genealogy one at least is talking about something that is not dependent on some assessment of utility of terminology.

John Harshman said...

The problem lies not with assigning names to clades but with assigning ranks.

Jonathan Badger said...

No, that's a separate issue entirely. Yes, you can argue about what rank a clade should be (or even if ranks above species are even useful), but to the degree we believe that groups of organisms should be based on phylogeny rather than just tradition, then we should avoid names like "invertebrate", "protist", and "prokaryote" because they just don't correspond to any clades on accepted phylogenies.

Rosie Redfield said...

I never understood why Jim Lake was treated as a pariah by people whose judgement I generally respected.

Larry Moran said...

Are humans archaebacteria?

John Harshman said...

Is Archaebacteria even a taxon? I thought the term was "Archaea". The question is complicated by the multiple sources of the eukaryote nuclear genome. But if you go by the portion that's nested within Archaea and suppose that's a kind of core identity, then the answer is yes, eukaryotes are archaeans.

Jonathan Badger said...

I concur with John. Eukaryotes themselves are a clade, but are a subclade of Archaea, much as birds are of reptiles.

S Johnson said...

Reviewing, the thought that counting percentage of genes might not be the most effective way of defining a clade of single-celled organisms niggles at me. If multicellular organisms might be usefully defined as species when they breed with each other, maybe single-celled organisms might be usefully defined as the equivalent when they eat each other?

(But I confess to not being properly gene-centered, so there's that.)

Larry Moran said...

Since most eukaryotic genes are similar to proteobacterial genes, are eukaryotes also a subclade of Eubacteria? If so, how can they be a distinct clade while also being the subclade of two other clades?

Jonathan Badger said...

No. You can't naively claim that the phylogeny of something is the sum of the phylogeny of the individual genes. The first people to do phylogeny, the historical linguists, understood this long ago. English is a Germanic language. And yet the majority of its vocabulary is of Romance origin, due to several factors like the Norman Conquest, and the prestige that Latin had in the Western World. How do we know English is a Germanic language and not a Romance one? Well, yes, we are lucky because we do have written documents in Old English where it is clearer, but even if we didn't have knowledge of Old English, we could figure this out by looking on how English processes its information via its grammar, which is clearly Germanic. That's the essential part of any system whether linguistic or biological. It's why pathogens lose their metabolic genes but hold on to their information ones.

Joe Felsenstein said...

One can think of the "tree of cells", in which case the eukaryotic cells arose from archaeal cells (right?). Saying what clade eukaryotes are in then depends on which genes you look at, or whether you ask about the cells themselves. If eukaryotic cells arose from cells of an archaean lineage then eukaryotes are a clade but non-eukaryote archaeans are not a clade -- the Eukaryote clade is a subclade of the archaean clade, so in that sense we are archaeans too.

Larry Moran said...

@Jonathan Badger
Your example of English as a Germanic language is excellent but I'm not sure you can apply it in such a blanket manner to the question at hand. It requires that you privilege so-called "information genes" in determining the root phylogeny. It also exaggerates the relationship of those "information" genes because the number of such genes that show a clear association between eukaryotes and archaea is much less than most people realize.

I think that basic energy/biosynthetic metabolism is a fundamental property of eukaryotic cells and one could legitimately argue that those genes play an important role in determining the origin of eukaryotes.

Larry Moran said...

@Joe Felsenstein
I think we can all agree that it was an archaeal cell that engulfed a proteobacterial cell and the two different species lived in symbiotic relationship for (probably) millions of years before archaeal genes in the cytoplasm began to be replaced by proteobacterial genes from what will become mitochondria.

This cell argument is a good argument for privileging the archaeal ancestor in determining the phylogeny of eukaryotes. However, I think we can also agree that there's more to the story than that and showing a phylogeny that ignores the eubacterial component of eukaryotic genomes is misleading.

Mitochondria are an extremely important component of eukaryotic cells and ancient archaea contributed very little to modern mitochondria. It's not like Jonathan's example of English as a Germanic language since mitochondria aren't just late replacements of some underlying archaeal mechanism of energy metabolism.

Do you object to the "ring of life" view of eukaryotic origins? If so, why?

Joe Felsenstein said...

It's not as simple as a single "ring". Above, I have tried to use the term "genealogy" of life, rather than tree, for that reason. And once you go around declaring Domains, you necessarily do some violence to that genealogy.

S Johnson said...

Is an endosymbiotic event "descent" in the ordinary sense of the word?

Jmac said...

Has anybody here actually tried to replicate endosymbiosis in the lab?
I guess not, because if anyone had, restriction enzymes would be mentioned, among other now well known problems facing endosymbiosis....

S Johnson said...

You're saying endosymbiosis is not experimentally attested, implying it is not a legitimate explanation? Personally, I'm not a Popperian. And unlike Larry Moran I think history of life is "science." So endosymbiosis is attested by historical analysis, no matter how indirect this is. But this isn't so extreme, is it? Didn't the demonstration of evolution depend very much on fossil history?

Corneel said...

@S Johnson
New endosymbiotic events are happening to this day in modern species. The most famous example is Wolbachia, which is an endosymbiont infecting many species of arthropods. It belongs to alphaproteobacteria, just like the putative ancestor of mitochondria.