Thursday, July 18, 2013

Contradictory Phylogenies for Cyanobacteria

The cyanobacteria are interesting for a number of reasons. They have a complex photosynthesis pathway with two separate photosystems and an oxygen-evolving complex. That means they can use water as an electron donor and NADP as an electron acceptor.

Cyanobacteria probably played an important role in creating an atmosphere with significant levels of oxygen but, contrary to some speculation, they almost certainly arose fairly late in the history of life (i.e. after 500 million years). Cyanobacteria make up a significant proportion of life in the ocean. Primitive cyanobacteria gave rise to chloroplasts in modern plants and algae.

For all of these reasons, cyanobacterial phylogeny is important. I've been interested in the literature for 25 years, ever since I realized that my favorite genes (HSP70) had chloroplast versions that were similar to the cyanobacterial homologs.

Two groups have recently published phylogenies based on whole genome sequences of cyanobacteria. The first one out was published in PNAS last January but I wasn't aware of it until Jonathan Eisen mentioned it on his blog [New paper from some in the Eisen lab: phylogeny driven sequencing of cyanobacteria]. The paper is by Shih et al. (2013).

The authors sequenced 54 new species of cyanobacteria. They deliberately selected new species that would represent the diversity of the phylum. They selected 31 "conserved" proteins and constructed a tree of cyanobacteria using the concatenated sequences. You can read what Jonathan Eisen has to say about these 31 genes and the problems with sequence alignments at [Bacteria Phylogeny: Facing Up to the Problems]. Their tree is quite similar to those made using 16S rRNA sequences.

The second paper was just published in Genome Biology and Evolution (Dagan et al., 2013). The senior author is Bill Martin. The only reason I know about this paper is because we were given a free copy of the journal at SMBE2013 [1].

The second group sequenced six new species of cyanobacteria. They selected a set of 324 genes common to all 51 species in their dataset and constructed a tree from the concatenated sequences. They generated a Maximum Likelihood (ML) tree but they report that the Neighbor Joining tree (NJ) is identical and has stronger support. The NJ tree was published. (That's satisfying because I don't trust ML trees.)
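The "genes common to all species, then concatenate" step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline either group used: the function name, the nested-dictionary input format, and the toy sequences are all hypothetical, and the per-gene alignments are assumed to exist already.

```python
def concatenate_shared_genes(alignments):
    """Build a concatenated alignment from genes present in every taxon.

    `alignments` maps gene name -> {taxon name -> aligned sequence}.
    (Hypothetical input format chosen for this sketch.)
    Returns (list of genes kept, {taxon -> concatenated sequence}).
    """
    # Collect every taxon that appears in at least one gene alignment.
    all_taxa = set()
    for seqs in alignments.values():
        all_taxa.update(seqs)

    # Keep only genes that have a sequence for every taxon.
    shared = sorted(g for g, seqs in alignments.items() if set(seqs) == all_taxa)

    # Within one gene, all aligned sequences must have the same length.
    for gene in shared:
        lengths = {len(s) for s in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to one length")

    # Join the per-gene alignments end to end, in a fixed gene order.
    supermatrix = {
        taxon: "".join(alignments[gene][taxon] for gene in shared)
        for taxon in sorted(all_taxa)
    }
    return shared, supermatrix


# Toy data (invented): geneC is missing from sp3, so it is dropped.
toy_alignments = {
    "geneA": {"sp1": "MKT-A", "sp2": "MKTQA", "sp3": "MRTQA"},
    "geneB": {"sp1": "GG", "sp2": "GA", "sp3": "GG"},
    "geneC": {"sp1": "LLL", "sp2": "LIL"},
}
kept_genes, matrix = concatenate_shared_genes(toy_alignments)
```

Requiring a gene in every species is why adding more genomes tends to shrink the shared set, and it is also where lateral gene transfer can sneak into a concatenated dataset if a "shared" gene has a different history in some lineages.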

The two phylogenies don't agree. The two papers agree that Gloeobacter violaceus is the deepest rooting branch followed by two strains of Synechococcus (JA 3-3Ab and JA2-3b). Both groups recognize the main clades such as the grouping of the abundant small marine organisms (Prochlorococcus) and various other Synechococcus species (SynPro). The biggest difference is that Dagan et al. (2013) divide all other cyanobacteria into two deeply divergent clades. One contains the SynPro group and the other contains all remaining species of cyanobacteria.

There are other differences. In general, the Shih et al. tree is more complicated than the Dagan et al. tree, which tends to have larger monophyletic groups. I'm sure that the Martin group would attribute this to errors caused by lateral gene transfer, which they tried to control for.
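One way to make "the two trees don't agree" concrete is to list the monophyletic groups (clades) each rooted tree contains and count the ones found in only one of them, which is a rooted version of the Robinson-Foulds distance. The sketch below assumes both trees cover the same taxa and are written as nested Python tuples; the function names and the two toy topologies (taxa A to F) are invented for illustration and are not the published trees.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a rooted tree.

    A tree is a nested tuple of leaf-name strings, e.g. ("A", ("B", "C")).
    Each clade is a frozenset of the leaf names below one internal node.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is in every tree; ignore it
    return all_leaves, found


def compare_trees(tree_a, tree_b):
    """Return (shared clades, clades only in A, clades only in B)."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same set of taxa")
    return clades_a & clades_b, clades_a - clades_b, clades_b - clades_a


# Two invented toy topologies on the same six taxa.
toy_tree_a = ("A", ("B", (("C", "D"), ("E", "F"))))
toy_tree_b = ("A", ("B", ("C", ("D", ("E", "F")))))

shared, only_a, only_b = compare_trees(toy_tree_a, toy_tree_b)
rf_distance = len(only_a) + len(only_b)
```

In the toy case the trees share three clades and each has one clade the other lacks, so the distance is 2. The two cyanobacterial trees do not contain the same set of species, so in practice both would first have to be pruned down to the taxa they share.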

It's puzzling that two highly respected groups come up with different phylogenies. This has implications when trying to decipher the origin of chloroplast genomes but that's a topic for another post.

Photo Credit: Fischerella sp. from Cyanobacteria. This is one of the new species sequenced by both groups.

1. This makes me realize that I'm relying heavily on news reports and blogs to alert me to important papers. The only journals I read are Science and Nature and I've all but given up scanning the tables of contents of other journals.

Dagan, T., Roettger, M., Stucken, K., Landan, G., Koch, R., Major, P., Gould, S.B., Goremykin, V.V., Rippka, R., de Marsac, N.T., Gugger, M., Lockhart, P.J., Allen, J.F., Brune, I., Maus, I., Pühler, A. and Martin, W.A. (2013) Genomes of stigonematalean cyanobacteria (Subsection V) and the evolution of oxygenic photosynthesis from prokaryotes to plastids. Genome Biology and Evolution 5:31-44. [doi: 10.1093/gbe/evs117]

Shih, P.M., Wu, D., Latifi, A., Axen, S.D., Fewer, D.P., Talla, E., Calteau, A., Cai, F., de Marsac, N.T. and Rippka, R. (2013) Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc. Nat. Acad. Sci. (USA) 110:1053-1058. [doi: 10.1073/pnas.1217107110]


  1. Larry,

    How is it possible to trust neighbor-joining trees but not maximum likelihood trees? NJ is a rough estimate of a least squares fit tree, which is a maximum likelihood tree if the model of evolution you use to correct the distances is itself correct and without error. And NJ trees are known for weird artifacts if there are among-site rate differences and missing data.

    One possible explanation for differences among trees is taxon sampling. It might be biased in some way in one or more of the studies, or -- my personal favorite -- dense taxon sampling commonly gives an improved estimate of phylogeny, making the Shih et al. tree a better candidate. Horizontal transfer might be a problem, and there are ways to test for that; I recommend gene-jackknifing as a start.

  2. Curious. The phylogeny of the Alphaproteobacteria (where the mitochondria originated from) has been highly problematic because many members tend to streamline their genomes and show a high degree of horizontal gene transfer: a streamlined genome may have allowed the protomitochondrion to form its symbiosis, but it is also problematic when genes are hastily concatenated en masse. I wonder if something similar is going on with the phylogeny of the cyanobacteria, the living relatives of the other organelle.