Monday, August 31, 2015

The origin of eukaryotes and the ring of life

The latest issue of Philosophical Transactions of the Royal Society B (Sept. 26, 2015) is devoted to Eukaryotic origins: progress and challenges. There are 16 articles and anyone interested in this subject has to read all of them.

Many (most) of you aren't going to do that so let me try and summarize the problem and the best current ideas on how to solve it. We begin with the introduction to the issue by the editors, Tom Williams and Martin Embley (Williams and Embley, 2015). Here's the abstract ...
The origin of eukaryotic cells is one of the most fascinating challenges in biology, and has inspired decades of controversy and debate. Recent work has led to major upheavals in our understanding of eukaryotic origins and has catalysed new debates about the roles of endosymbiosis and gene flow across the tree of life. Improved methods of phylogenetic analysis support scenarios in which the host cell for the mitochondrial endosymbiont was a member of the Archaea, and new technologies for sampling the genomes of environmental prokaryotes have allowed investigators to home in on closer relatives of founding symbiotic partners. The inference and interpretation of phylogenetic trees from genomic data remains at the centre of many of these debates, and there is increasing recognition that trees built using inadequate methods can prove misleading, whether describing the relationship of eukaryotes to other cells or the root of the universal tree. New statistical approaches show promise for addressing these questions but they come with their own computational challenges. The papers in this theme issue discuss recent progress on the origin of eukaryotic cells and genomes, highlight some of the ongoing debates, and suggest possible routes to future progress.
The problem is that most people think the origin of eukaryotes was solved by Carl Woese when he published the Three Domain Hypothesis. According to the ribosomal RNA tree, eukaryotes and Archaea are sister groups that are distantly related to Eubacteria (see "a" below).

The Three Domain Hypothesis
The data doesn't support such a simple interpretation and that's why the Three Domain Hypothesis has been abandoned. We now know that only one-third of the ancient genes in eukaryotes are more closely related to Archaea than to Eubacteria (Bacteria). Most of the genes have closer homologues in Bacteria. That's because eukaryotes arose from a fusion of a primitive archaebacterium and a primitive eubacterium—the Endosymbiotic Hypothesis. The primitive eubacterium became mitochondria and transferred most of its genes to the archaebacterial genome, which became the nuclear genome. (In the beginning, you couldn't tell which genome was going to become the biggest.)

That's the view shown in part "b" of the figure. This is not consistent with the view of eukaryotic origins promoted by Woese and his colleagues.

The other part of the problem has to do with the relationships of the eukaryotic genes that have bacterial homologues. The ones derived from eubacteria map to the alphaproteobacteria branch of the tree, indicating that eukaryotes arose, in part, from within the Eubacterial Domain [see Eukaryotic genes come from alphaproteobacteria, cyanobacteria, and two groups of Archaea]. The genes of Archaeal origin should not come from a species of Archaea if the Three Domain Hypothesis is to be preserved, at least in part. But that's not what the latest results show.

There is growing evidence that the Archaeal ancestor in the fusion event came from a branch within the Archaeal Domain. That branch used to be called "Eocytes" but later on it became known as "Crenarchaeota." As more and more Archaeal genomes were sequenced, it became clear that Crenarchaeota were part of a large superphylum that included Thaumarchaeota, Aigarchaeota, and Korarchaeota. The superphylum is named "TACK" after these four groups. The ancient eukaryotic genes that are related to Archaea seem to come from this group.

The best way to describe the origin of eukaryotes is to use the Ring of Life metaphor and that's the subject of a paper by McInerney et al. (2015). The various phylogenetic trees depicting the origin of eukaryotes are shown in a figure from their paper (below). Right now the data strongly supports a Ring of Life ("d") and not any of the other trees. (The second-best tree, showing the phylogeny of most of the ancient eukaryotic genes, isn't even shown in the figure. It would have eukaryotes clustered with Eubacteria.)

You might wonder why anyone bothers to make a fuss about the shape of the phylogeny. It's because how we think about these things influences the way we write and talk about the origin of eukaryotes. For example, those people who were ~~brainwashed~~ convinced by Woese and his colleagues to adopt the Three Domain view of life will often maintain that the most important eukaryotic genes are Archaeal-related (e.g. "information" genes) and thus, the Three Domain view is still the best way to think of eukaryote origins.

McInerney et al. make a good case for rejecting that view. They advocate a "domain-free" view of life.
To conclude, it is clear that eukaryotes cannot be correctly defined as ‘derived’ Archaebacteria, or as ‘derived’ Eubacteria. Indeed, to view eukaryotes as being from either the archaebacterial or the eubacterial lineages is an over-simplification. Each human is derived equally from both parents. They would not exist without a genetic contribution from both, and it does not matter if they look more like their mother or father, or which surname they carry, if any. The reality is that a human only exists as a consequence of a contribution from both parents. Analogously, eukaryotes are equally eubacterial and archaebacterial. A taxonomic debate exists in the literature on the early evolution of life, whereby hypotheses have been suggested to be characterizable either as three-domains or two-domains based (2D versus 3D hypotheses). This characterization inherently assumes the existence of a tree-like pattern of evolution, which is misleading. Because eukaryotes arose from both Archaebacteria and Eubacteria, there are only two (monophyletic) lineages of life: (i) cellular life and (ii) the eukaryotes. Monophyletic eukaryotes are nested within monophyletic life. Eukaryotes make domain-based classifications obsolete and we therefore advocate dismissing the use of this term (which can easily be replaced by the term lineage, for instance) entirely. That is, we advocate a ‘domain-free’ view of the history of life, as debates about whether there should be two domains or three are essentialist and moot. [my emphasis LAM]

In a pluralistic view of cellular life on the planet, we can see that the merging of eubacterial genes with archaebacterial genes gave rise to the halophiles and indeed it made an enormous contribution to the origins of most of the major groups of Archaebacteria. We see that photosynthesis can only be interpreted as a series of gene flows around the prokaryotic and eukaryotic worlds. We see that eukaryotes have arisen as a consequence of major flows between prokaryotes initially (eukaryogenesis), and later, between a prokaryote group and a eukaryotic group (plastid origins) [84].

Life's history is complex and we should not try to simplify it to suit our need for orderly nomenclatural systems.

McInerney, J., Pisani, D., and O'Connell, M.J. (2015) The ring of life hypothesis for eukaryote origins is supported by multiple kinds of data. Phil. Trans. R. Soc. B 370: published online Aug. 31, 2015. [doi: 10.1098/rstb.2014.0323]

Williams, T.A., and Embley, T.M. (2015) Changing ideas about eukaryotic origins. Phil. Trans. R. Soc. B 370: published online Aug. 31, 2015. [doi: 10.1098/rstb.2014.0318]


  1. I'd love to read all of them, but my institutional access doesn't include the last 12 months [sigh]. The open-access part, though, is enough food for thought to last me a few days.

    1. Let me make a linguistic analogy. According to Wikipedia, only 26% of words in modern English are of Germanic origin, with the majority coming from either French or Latin. In McInerney's view, this would mean that English isn't a Germanic language and we should adopt a "language-family free" view of the history of language. I don't think many linguists would find that a very useful way to think of language evolution, nor do I think it is for biological evolution.

    2. McInerney describes Jonathan Badger's view as the autogenesis or "slow-drip" hypothesis. It postulates an archaeal ancestor that gradually accumulated bacterial genes one-at-a-time over millions of years. That's analogous to a Germanic language like English gradually borrowing French and Latin words over many centuries.

      They reject that hypothesis because it doesn't fit the data. They prefer the symbiogenic ("big-bang") hypothesis whereby the first eukaryotic cell was formed instantly when two different cells with different lineages came together.

      That's much more like two parents, say Stanley Ann Dunham and Barack Obama Sr., producing a child (Barack Obama Jr.). Some people will insist that the child should be labelled as belonging to one particular race while other, more reasonable, people agree that the child has a mixed heritage that should be recognized and celebrated.

      I side with the more reasonable people and that's why I favor a mixed heritage for eukaryotes.

    3. Why would it *matter* in terms of accepting either biological or linguistic domains/families if the acquisition happened either gradually or all at once? And it isn't as if language evolution occurs entirely gradually anyway (the Norman Conquest in 1066 certainly was a significant event in English's evolution, even if it picked up non-Germanic words both before and after it).

    4. With due respect, the analogy is misleading because vocabulary doesn't play the same role for a language that genes play for a life form. English is still (historically) classified as a Germanic language because of the primary sound shift aka Grimm's law, the existence of the weak preterite etc.

    5. It's not a perfect analogy of course, but what you bring up is exactly my point. English is a Germanic language not because of its Germanic vocabulary (or lack thereof), but because of its shared grammar. Likewise, eukaryotes share a common grammar with archaea in that their key molecular biological processes of transcription and translation are clearly related to the exclusion of the bacteria.

  2. I would like to read all of them also but your summary is great. The potential for a monumental paradigm shift.

  3. The three domains are looking increasingly problematic because the eucaryote host cells appear to be nested within archaea. However, I do not find the idea convincing that an endosymbiosis event is necessarily comparable to a "each human is derived equally from both parents" scenario.

    We are not talking sexual reproduction here but instead a bacterial lineage colonising a new habitat, the host cell. So I fail to see why we couldn't say that the eucaryotes are a sublineage of the Archaea, and the mitochondria and chloroplasts are sublineages of the bacteria.

    Yes, the problem is that there are lots of bacterial genes in the nucleus. But the important thing to understand is that that is what might be called an epistemological problem derived simply from the fact that we mostly use molecular data for phylogenetic analysis - it makes it harder to resolve phylogenetic relationships. It is not, however, or at least in my eyes, classification relevant because we are not and have never been interested in classifying gene copies. Systematics is about classifying individuals into species and species into clades. To think that we should stop trying to find out the lineage-relationships because sometimes lateral gene transfer happens between lineages is the tail wagging the dog, conceptually speaking.

    1. "However, I do not find the idea convincing that an endosymbiosis event is necessarily comparable to a "each human is derived equally from both parents" scenario."

      Yes, it's a stupid analogy because in sexual reproduction the whole point is that parents have the same genes as each other (although possibly different alleles), as does the child who gets alleles from the parents. While there are some cases where eukaryotic cells have copies of a given gene from different lineages, this isn't in general the case and so the analogy fails.

      "So I fail to see why we couldn't say that the eucaryotes are a sublineage of the Archaea, and the mitochondria and chloroplasts are sublineages of the bacteria."

      That's basically what most people do these days. But that isn't nearly as dramatic as calling for a paradigm shift to reject the tree of life and what not.

  4. "The genes of Archaeal origin should not come from a species of Archaea if the Three Domain Hypothesis is to be preserved, at least in part."

    Should, right?

    1. No, "should not." According to the Three Domain Hypothesis eukaryotes and archaea share a common ancestor that's neither eukaryote, nor archaea.

    2. ***No, "should not." According to the Three Domain Hypothesis eukaryotes and archaea share a common ancestor that's neither eukaryote, nor archaea***

      I never got that impression. I always assumed eukaryotes evolved from a stem archean....but I guess a stem archean doesn't count as an archean?

    3. @ lantog

      re ***No, "should not." According to the Three Domain Hypothesis eukaryotes and archaea share a common ancestor that's neither eukaryote, nor archaea***

      I never got that impression. I always assumed eukaryotes evolved from a stem archean....but I guess a stem archean doesn't count as an archean?

      That is exactly where I too remain perplexed! Like Jonathan Badger, I cannot get past his excellent linguistics metaphor.

      How about a similar metaphor: Humans and Chimpanzees share a common ancestor that's neither Human nor Chimpanzee. True; but Chimpanzees, Humans and their common ancestor were all nonetheless "Ape".

      So in a sense, it is correct to say that Humans are descended from Apes provided we quickly and precisely define our terms correctly by distinguishing Modern Apes from extinct ancestral Apes.

      If the previous sentence is correct, then it would also be correct to claim that eukaryotes and archaea share a common ancestor that is also archaean in the same generic sense as "Ape" before.

      That is how I always understood (misunderstood?) Woese.

      I (and my students) welcome correction.

      best and grateful regards

    4. I never got that impression. I always assumed eukaryotes evolved from a stem archean....but I guess a stem archean doesn't count as an archean?

      The difference between the Three Domain and the Eocyte hypotheses is that according to the Three Domain hypothesis all modern archaea (known and unknown lineages) are monophyletic while they are paraphyletic in the eocyte scenario.

    5. But that's just a matter of definitions and cladistic wankery, not actual evolutionary events. If you just accept eukaryotes are a type of archaea (which is kind of suggested by the rooted three domain tree anyway), there isn't any problem of paraphyly.

    6. As a taxonomist, I get really frustrated that people doing systematics expect the taxonomic system to do more than it can do. There are three importantly different groups, which we can call Archaea, Bacteria, and Eukaryotes. We often want to talk about these groups and to compare and contrast them with each other. If we treat them at the same taxonomic rank, Archaea is paraphyletic to Eukaryotes, and so are bacteria. The cladists get all twisted up over this. Take a deep breath, realize that although we like our taxonomic categories to reflect ancestry in a simple way, taxonomic categories are also somewhat arbitrary and have to fulfill purely human needs to classify, and especially that these categories cannot reflect a complex phylogeny in a simple way. The world will not collapse if we call Archaea, Bacteria, and Eukaryotes all domains.

      Note the special problem that the initial Eukaryote is in some sense a hybrid, a combination anyway. Treat it like we do taxa that originated as intergeneric hybrids in grasses. Set up a new genus for the taxon that originated as a hybrid, and don't lose sleep over the fact that the genera of its parental taxa are now paraphyletic to it. Be practical and move on. (And teach students that taxonomic categories are human constructs and sometimes inconsistent.)

    7. Hi Barbara

      Thank you so much for your reality check.

      If I may add, the rate at which DNA is transferred from organelles to the nucleus has posed a semantical red herring. Nothing new here - this has been known for 20 years already.

      Those presumably less autonomous organelles are actually far more complex considering the transfer of most of their genome to the nucleus only to have their proteins to be retargeted and transferred back to endosymbionts.

      ITMT - eukaryotes-archaea must represent some sort of common lineage separate from eubacteria given commonalities of eukaryotes-archaea vs. eubacteria. Off the top of my head I can cite ribosomal differences (tetracycline kills mitochondria no differently than eubacteria), the presence of introns, and the differences in DNA synthesis machinery. I cite all three as eukaryote-archaea commonalities of a common lineage as opposed to Eubacteria as the obvious outlier.

      At least that is what I cite in class. Again and as always I welcome correction.

      Writing from Strassburg (a beautiful German city still under French occupation)

      ...writing late in the evening after a few glasses of excellent wine - I hope my missive was not too incoherent.

    8. Jonathan Badger, Wednesday, September 02, 2015 8:35:00 AM:

      "But that's just a matter of definitions and cladistic wankery, not actual evolutionary events...."

      Agree. You might want to take a look at a fusion/anti-fusion model for the origin of cellular and viral domains on pgs. 5-7 of this paper:

      Because of your interest in Bacteriovorax and extreme viruses you might also be interested in this article:

  5. "those people who were -brainwashed- convinced by Woese"

    That hardly seems fair that you're implying brainwashing here. We are all led to believe in the best supported hypothesis at the time.

    1. The Three Domain Hypothesis was promoted very aggressively by its followers in spite of the evidence, not because of it.

  6. The paper probably closest to the truth argued Eukaryotes originated from Eukaryotes. Shazam! What a great insight!

    Carlos Mariscal, W. Ford Doolittle

    "....In such ‘eukaryotes first’ (EF) scenarios, the last universal common ancestor is imagined to have possessed significantly many of the complex characteristics of contemporary eukaryotes,"


    1. One might reasonably speculate that Sal read only the abstract.

    2. "...Eukaryotes originated from Eukaryotes."

      While you're at it, information originated from information and life from life.

    3. liesfor,

      Read the whole thing! Don't be such an obvious straightforward idiot!

    4. We have a winner...

      I linked to that issue a couple of days ago and noted that it would be subject to a lot of quote mining, and that we should expect it to start very soon.

      And here it comes...

  7. Great post. Thank you for reading the intro article & synthesising it.

  8. Now I wish someone would write a totally up-to-date introductory article on the origin and early evolution of life for non-traditional students, non-science majors, i.e., students who have just for the first time heard terms like "eukaryote" and "archaea" and are still iffy about understanding evolution at its basics. Can someone do that? Replace the Three Domains textbook chapters?

    1. This is what Koonin wrote in The Logic of Chance:

      I intended to write the book in the style of the aforementioned excellent popular books in physics, but the story took a life of its own and refused to be written that way. The result is a far more scientific, specialized text than originally intended, although still a largely nontechnical one, with only a few methods described in an oversimplified manner