Sandwalk: Bacteria Phylogeny: Facing Up to the Problems

Tuesday, October 14, 2008

Bacteria Phylogeny: Facing Up to the Problems

There are millions of species of bacteria. Sorting out their evolutionary history has been a major challenge for decades. Unlike the much bigger multicellular eukaryotes, bacteria offer few morphological markers to help scientists classify them. The fossil record is mostly silent.

Molecular evolution came to the rescue about thirty years ago, when ribosomal RNAs were first compared across species, and the approach took off once cloning and sequencing became common. Soon there were elaborate and detailed phylogenetic trees based on comparing sequences of conserved genes from many species.

The gene of choice was the one for the small subunit ribosomal RNA (SSU rRNA). This gene is well conserved in bacteria, and sequences are easy to obtain by PCR. (The ends of the SSU rRNA gene are conserved, which means you can design universal primers for PCR.)
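To make the "universal primer" idea concrete, here is a minimal Python sketch that scans a template for a degenerate primer. It assumes the widely cited 27F bacterial primer (AGAGTTTGATCMTGGCTCAG, where M means A or C), checks only exact matches on one strand, and uses made-up toy templates rather than real SSU rRNA genes.

```python
import re

# IUPAC codes for (possibly ambiguous) nucleotide positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer: str) -> str:
    """Translate a degenerate primer into a regex, e.g. 'M' becomes '[AC]'."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

def find_primer_sites(primer: str, template: str) -> list:
    """0-based start positions where the primer matches exactly (one strand only)."""
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    lookahead = f"(?={primer_to_regex(primer)})"
    return [m.start() for m in re.finditer(lookahead, template.upper())]

PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"  # widely cited bacterial 27F primer; M = A or C

# Toy templates (not real SSU rRNA genes) differing only at the degenerate position.
print(find_primer_sites(PRIMER_27F, "TTAGAGTTTGATCATGGCTCAGGA"))  # [2]
print(find_primer_sites(PRIMER_27F, "TTAGAGTTTGATCCTGGCTCAGGA"))  # [2]
print(find_primer_sites(PRIMER_27F, "TTAGAGTTTGATCGTGGCTCAGGA"))  # []
```

The degenerate position is what lets one primer bind the gene in many different species: a single primer "covers" both variants, but an unlisted base (G here) still fails.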

Over the years, the SSU rRNA gene has become what is called the "gold standard" in bacterial phylogeny and taxonomy. Many species have been assigned to taxa based entirely on the sequence of their SSU rRNA gene. Unfortunately, the "gold standard" has become somewhat tarnished lately.
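Assigning a species to a taxon "based entirely" on its SSU rRNA gene often boils down to a similarity cutoff. The sketch below assumes the sequences are already aligned, ignores gap columns, and uses a 97% identity threshold, a commonly used (and much-criticized) convention that is an assumption here, not something taken from Wu and Eisen. The reference sequences and labels are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def assign_taxon(query: str, references: dict, threshold: float = 97.0):
    """Label of the most similar reference, or None if the best hit is below threshold."""
    best_label, best_pid = None, -1.0
    for label, seq in references.items():
        pid = percent_identity(query, seq)
        if pid > best_pid:
            best_label, best_pid = label, pid
    return best_label if best_pid >= threshold else None

# Hypothetical, already-aligned toy sequences.
refs = {"taxon_A": "ACGTACGTACGTACGTACGT", "taxon_B": "TTTTACGTACGTACGTACGT"}
query = "ACGTACGTACGTACGTACGA"
print(assign_taxon(query, refs))                  # None (best hit is 95%)
print(assign_taxon(query, refs, threshold=95.0))  # taxon_A
```

A single gene and a single cutoff decide the label, which is exactly why errors in that one gene propagate straight into the taxonomy.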

Our fellow blogger, Jonathan Eisen of The Tree of Life, has recently published a paper that looks at the problems with bacterial phylogeny (Wu and Eisen, 2008). He posted a brief summary of the paper and commented on why he likes the journal Genome Biology [Happy Open Access Day: Back to Genome Biology for Me].

There is much to like about this paper. The authors face up to the problems with the current bacterial phylogeny, which is based almost entirely on a single gene (SSU rRNA). They point out that this is risky given what we know about molecular phylogenies. Furthermore, in the case of the SSU ribosomal RNA gene we know for a fact that this has led to problems and inconsistencies. In addition to the practical difficulties there are good theoretical reasons for being suspicious of phylogenies constructed from nucleotide sequences.
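One commonly cited theoretical reason is saturation: with only four possible states, a nucleotide site can change several times, and later changes erase the record of earlier ones. The Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), shows the effect: the estimated number of substitutions per site, d, grows without bound as the observed fraction of differing sites, p, approaches 0.75. This is offered as an illustration only; the post does not say which theoretical reasons it has in mind.

```python
import math

def jukes_cantor_distance(p: float) -> float:
    """JC69 distance d = -3/4 * ln(1 - 4p/3) for an observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75 (the sequences look random)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Small increases in observed difference near saturation imply huge changes in d,
# so distances between distantly related sequences become very noisy.
for p in (0.10, 0.30, 0.50, 0.70, 0.74):
    print(f"observed p = {p:.2f} -> estimated substitutions per site d = {jukes_cantor_distance(p):.3f}")
```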

What to do? One possible solution is to abandon SSU rRNA as a "gold standard" and replace it with a highly conserved protein coding gene. Unfortunately, this doesn't get around the problem of relying on a single gene. The way around this is to use an artificial concatenated sequence made up of several different conserved genes laid out end-to-end in one large string of amino acids.
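A concatenated "supermatrix" is easy to build once each gene has been aligned separately. In this sketch the gene and genome names are hypothetical, and a genome lacking a gene is padded with gaps, which is one common convention, assumed here.

```python
def concatenate_alignments(per_gene: dict) -> dict:
    """Join per-gene alignments end-to-end into one long row per genome.

    per_gene maps gene name -> {genome name -> aligned sequence}.
    A genome missing a gene gets a run of gaps as wide as that gene's alignment.
    """
    genomes = sorted({genome for aln in per_gene.values() for genome in aln})
    rows = {genome: [] for genome in genomes}
    for gene in sorted(per_gene):  # same gene order for every genome
        aln = per_gene[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = widths.pop()
        for genome in genomes:
            rows[genome].append(aln.get(genome, "-" * width))
    return {genome: "".join(parts) for genome, parts in rows.items()}

# Hypothetical per-gene protein alignments; genomeC lacks gene1.
per_gene = {
    "gene1": {"genomeA": "MKV-L", "genomeB": "MRVIL"},
    "gene2": {"genomeA": "GAT", "genomeB": "GST", "genomeC": "GAS"},
}
for genome, row in concatenate_alignments(per_gene).items():
    print(genome, row)
```

Sorting the gene names keeps every row in the same gene order, so a given column means the same thing in every genome.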

So why isn't this done? Because, as Wu and Eisen point out, it ain't that easy. The main difficulty in any phylogenetic study is getting a proper alignment. This is a problem that many workers simply ignore when they use automated alignment software like CLUSTALW. These workers assume that the alignments are valid.

They aren't, and this is another example of facing up to the problem. Many scientists agonize over what program to use when constructing their trees—should they use maximum likelihood, parsimony, etc. etc.? In most cases these decisions are a complete waste of time because their alignments aren't good enough to make a difference.

Here's how Wu and Eisen explain it ...

It has been shown that alignment quality can have greater impact on the final tree than does the tree-building method employed [20]. Therefore, preparing high quality sequence alignments is a most critical part of any molecular phylogenetic analysis. This preparation typically involves careful but tedious manual editing and trimming of the generated alignments, and thus remains the biggest challenge to automation. When scaling up this process, the trimming step is often simply ignored. Automated trimming based on the number of gaps in each column or each column's conservation score can be used to select conserved blocks, but still is not satisfactory when a high quality tree is required.
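The kind of automated trimming described in the quote (filtering columns by gap count or conservation score) can be sketched in a few lines of Python. The thresholds are arbitrary assumptions, and dedicated tools such as Gblocks use more elaborate rules; as the quote says, filters like this are no substitute for careful curation when a high-quality tree is required.

```python
from collections import Counter

def trim_columns(alignment, max_gap_frac=0.5, min_conservation=0.6):
    """Drop gappy or poorly conserved columns from a multiple sequence alignment.

    A column is kept only if (a) at most max_gap_frac of its rows are gaps and
    (b) its most common residue occurs in at least min_conservation of all rows.
    """
    n = len(alignment)
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have the same length")
    keep = []
    for col in range(width):
        column = [row[col] for row in alignment]
        if column.count("-") / n > max_gap_frac:
            continue  # too many gaps
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n >= min_conservation:
            keep.append(col)
    return ["".join(row[i] for i in keep) for row in alignment]

example = ["MK-LV", "MR-LI", "MKALV", "MQ-LV"]
print(trim_columns(example))  # ['MLV', 'MLI', 'MLV', 'MLV']
```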

Keep in mind that what is being proposed is a large tree based on concatenated sequences from many genes. You don't want to do multiple sequence alignments for every gene by hand, and yet up until now, that was the only way to get accurate results.

Wu and Eisen have written a program called AMPHORA that hopefully solves this problem. They begin by building "seed alignments" that are curated by hand. Then they use AMPHORA to align all the other sequences to the seed alignments. In this way they hope to overcome the limitations of automated multiple sequence alignment without having to align everything by hand.
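AMPHORA's own alignment procedure is described in the paper and is not reproduced here. As a rough stand-in for the seed-alignment idea, this sketch aligns one new sequence to the consensus of a hand-curated seed using plain Needleman-Wunsch global alignment, then keeps only residues that land in seed columns, so the new row has exactly the seed's width. The scoring values and the toy seed are assumptions.

```python
from collections import Counter

def consensus(seed):
    """Most common non-gap residue in each column of the seed alignment."""
    cols = []
    for column in zip(*seed):
        residues = [c for c in column if c != "-"]
        cols.append(Counter(residues).most_common(1)[0][0] if residues else "X")
    return "".join(cols)

def align_to_seed(query, seed, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch alignment of query to the seed consensus, projected onto seed columns."""
    ref = consensus(seed)
    n, m = len(ref), len(query)
    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # residue fills seed column
                              score[i - 1][j] + gap,    # seed column left empty
                              score[i][j - 1] + gap)    # residue with no seed column
    # Trace back, recording which query residue (if any) fills each seed column.
    row = ["-"] * n
    i, j = n, m
    while i > 0 and j > 0:
        s = match if ref[i - 1] == query[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            row[i - 1] = query[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1  # insertion relative to the seed: dropped
    return "".join(row)

seed = ["MKTAYIAK", "MKSAYIAK", "MRTAY-AK"]  # toy hand-curated seed
print(align_to_seed("MKTAYAK", seed))    # MKTAY-AK
print(align_to_seed("MKTAWYIAK", seed))  # MKTAYIAK (extra W dropped)
```

Dropping residues that fall outside the seed's columns keeps every new sequence in register with the curated alignment; the trade-off is that a genuinely informative insertion in a new sequence is thrown away.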

None of this would be possible, of course, unless there were large numbers of species in which every one of the target genes has been cloned and sequenced. In the 20th century this would have been impossible, but now there are hundreds of completely sequenced bacterial genomes. This means that each one of them has a sequenced copy of the genes required for this kind of analysis.

All that's left is to identify the completely sequenced genomes and pick the set of genes. There are 578 genomes in the database but many of these are close relatives that will not be useful in constructing a large tree of all bacterial sequences. The final set contains 310 genomes with representatives of all the major groups.
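The post doesn't say exactly how 578 genomes were pared down to 310. One simple way to drop close relatives is a greedy filter: walk through the genomes and keep each one only if it is far enough from everything kept so far. The genome names, distances, and cutoff below are made up for illustration.

```python
def pick_representatives(genomes, distance, min_distance):
    """Greedy redundancy filter: keep a genome only if it is at least
    min_distance away from every genome already kept."""
    kept = []
    for genome in genomes:
        if all(distance(genome, other) >= min_distance for other in kept):
            kept.append(genome)
    return kept

# Made-up pairwise distances for four hypothetical genomes; A1 and A2 are near-identical.
TOY_DISTANCES = {
    frozenset(("A1", "A2")): 0.01,
    frozenset(("A1", "B")): 0.60,
    frozenset(("A1", "C")): 0.80,
    frozenset(("A2", "B")): 0.60,
    frozenset(("A2", "C")): 0.80,
    frozenset(("B", "C")): 0.70,
}

def toy_distance(x, y):
    return TOY_DISTANCES[frozenset((x, y))]

print(pick_representatives(["A1", "A2", "B", "C"], toy_distance, min_distance=0.05))
# ['A1', 'B', 'C']
```

The result depends on input order, so in practice one might list the most complete or best-annotated genome of each group first.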

The authors selected 31 genes for their initial proof of principle paper (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf). Those of you who recognize these genes will see that 22 of them are small ribosomal proteins. This was not the best choice, in my opinion, but the authors of the paper note that they are continuing the study by incorporating better genes such as HSP70 (dnaK) and EF-Tu (tufA). You can't just choose any conserved gene because it has to be present in most species, and there are surprisingly few genes that meet that criterion.
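Tallying the 31 marker names by prefix is a quick check on the gene list (rpl and rpm genes encode large-subunit ribosomal proteins, rps genes small-subunit ones), and the "present in most species" requirement can be written as a simple prevalence filter. The presence data and the 0.75 cutoff in the example are hypothetical.

```python
MARKERS = ("dnaG frr infC nusA pgk pyrG rplA rplB rplC rplD rplE rplF rplK "
           "rplL rplM rplN rplP rplS rplT rpmA rpoB rpsB rpsC rpsE rpsI rpsJ "
           "rpsK rpsM rpsS smpB tsf").split()

def is_ribosomal_protein(gene: str) -> bool:
    # rpl*/rpm* = large-subunit proteins, rps* = small-subunit proteins (rpoB is RNA polymerase).
    return gene.startswith(("rpl", "rpm", "rps"))

ribosomal = [g for g in MARKERS if is_ribosomal_protein(g)]
print(len(MARKERS), len(ribosomal))  # 31 22

def prevalent_markers(presence: dict, n_genomes: int, min_fraction: float = 0.9) -> list:
    """Candidate genes found in at least min_fraction of the genomes.

    presence maps gene name -> set of genome names in which the gene was found.
    """
    return sorted(g for g, found in presence.items() if len(found) / n_genomes >= min_fraction)

# Hypothetical presence/absence data for four genomes.
presence = {"geneA": {"g1", "g2", "g3", "g4"}, "geneB": {"g1", "g2"}, "geneC": {"g1", "g2", "g3"}}
print(prevalent_markers(presence, n_genomes=4, min_fraction=0.75))  # ['geneA', 'geneC']
```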

After all that, what's the bottom line? The grand phylogeny is shown at the top of this posting. It resolves many groups that are unresolvable using the SSU rRNA tree. In some cases this tree reveals species that have been incorrectly assigned to higher taxa. These species will have to be reclassified if this result holds up.

The most important finding is that the method works and it yields trees with excellent resolution of the major bacterial taxa.

Wu, M. and Eisen, J. (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biology 9:R151. [doi:10.1186/gb-2008-9-10-r151]

17 comments:

Christopher Taylor said...: Very nice (the DOI link to the paper doesn't work at the moment, by the way, but the other one does). The main problem that I can imagine with using concatenated sequences for prokaryotes (without my being an expert, of course) would be the Horizontal Gene Transfer issue - if different gene sections are giving you different but both perfectly accurate estimates of phylogeny, what effect will that have on your result? I've wondered if prokaryotes might be one group of organisms where a supertree approach might be more effective - making separate trees from separate genes, then combining them for your end phylogeny.; Tuesday, October 14, 2008 9:20:00 PM
Sage said...: Thirty years ago Woese was proposing to spin off the Archaebacteria, but he was not sequencing rRNA genes. He was isolating the rRNA itself and using RNA fingerprinting techniques based on Sanger's 1965 2-D electrophoresis method to compile the oligonucleotides produced by RNase digestion. He was measuring homologies and sequence differences, but not actually producing any full rRNA sequences.

PCR, nominally invented in 1984, doesn't open the doors to easy sequencing until after about 1989.; Tuesday, October 14, 2008 11:51:00 PM
Allen MacNeill said...: What is the point of calling the terminal taxa in prokaryotic phylogenies "species", beyond a wholly spurious analogy with eukaryotes? None of the definitions of "species" that are generally applied to eukaryotes can be applied to prokaryotes for the simple reason that the latter do not couple reproduction with sexual recombination. The Dobzhansky-Mayr "biological species concept" is (like Darwin's) based on reproductive isolation; to be members of the same species, two individuals must be capable of interbreeding and producing fertile offspring under "natural" conditions. Prokaryotes simply don't do this at all.

Lynn Margulis (taking her cue from Sorin Sonea and Lucien Mathieu, of the Université de Montréal) asserts that

"...bacteria do not have species at all (or, which amounts to the same thing, all of them together constitute one single cosmopolitan species). Speciation is a property only of nucleated organisms."

For more on the whole problem of "species" in evolutionary biology, see:
http://evolutionlist.blogspot.com/2006/03/origin-of-specious.html; Wednesday, October 15, 2008 8:55:00 AM
Anonymous said...: There is no problem of "species" in evolutionary biology. None. There are a lot of confused people who think that "species" has to be something definable in absolute terms and applicable in 100% of cases. That obviously isn't always possible, because Nature is more complex than we wish it to be. ANY classification is an oversimplification of reality.

And so yes, taxonomy in prokaryotes is much more difficult but is not fundamentally different from eukaryotes. We still need to call different bacteria different names, right? So why not "species"? There would have to be *some* name anyway. If species don't exist in prokaryotes then what does exist? - Some different *word*, obviously. But it's just a word.; Wednesday, October 15, 2008 9:53:00 AM
Anonymous said...: You could also check out this recent paper on defining species in strictly asexual rotifers for a way to treat speciation where recombination and reproduction have been decoupled.; Wednesday, October 15, 2008 11:52:00 AM
Larry Moran said...: Allen MacNeill,

What is the point of calling the terminal taxa in prokaryotic phylogenies "species", beyond a wholly spurious analogy with eukaryotes?

You have to call them something and "species" is as good as anything on a blog such as this.

I'm well aware of the problems with defining "species."

You must have a better word or you wouldn't have raised the issue. Perhaps you could share it with us?

BTW, this part of your comment is dead wrong.

The Dobzhansky-Mayr "biological species concept" is (like Darwin's) based on reproductive isolation; to be members of the same species, two individuals must be capable of interbreeding and producing fertile offspring under "natural" conditions. Prokaryotes simply don't do this at all.; Wednesday, October 15, 2008 4:31:00 PM
Allen MacNeill said...: How, precisely, is the statement "dead wrong"? Seems to me there are several possibilities:

1) That's not what Darwin, Dobzhansky, and Mayr said

2) The "biological species concept" isn't based on reproductive isolation

3) Reproduction in prokaryotes is indeed coupled with sexual recombination in essentially the same way it is in eukaryotes (especially animals)

4) Prokaryotes can be "fertile" or "infertile" in the same way that eukaryotes can

I'm curious; which of these assertions do you agree with, and on what evidence?; Thursday, October 16, 2008 12:33:00 AM
Allen MacNeill said...: The reason I don't like to use the term "species" to refer to the termini of prokaryote phylogenies is that it conveys all the wrong ideas about what phylogenies are all about. Focusing on reproductive incompatibility has little or no bearing on diversification among prokaryotes, and is hopelessly muddled by the problem of horizontal gene transfer. Indeed, I would go further: I think the whole idea of "species" is a holdover from Platonic typological thinking, and has done little or nothing to advance our understanding of how genetic and phenotypic diversification has proceeded in most phylogenies, including animals (the only group in which reproductive incompatibility plays a crucial role in phylogenetic diversification).

"Species", in other words, are a figment of the human imagination.; Thursday, October 16, 2008 12:40:00 AM
Anonymous said...: Allen MacNeill said:
"Species", in other words, are a figment of the human imagination.

And so are colors. The whole idea of color "green" is a holdover from Platonic typological thinking and has done little or nothing to advance our understanding of how electromagnetic waves are perceived by photoreceptors.

Strangely enough, I still find it useful. :-); Thursday, October 16, 2008 2:42:00 AM
Larry Moran said...: Allen MacNeill asks,

How, precisely, is the statement "dead wrong"? Seems to me there are several possibilities:

... to be members of the same species, two individuals must be capable of interbreeding and producing fertile offspring under "natural" conditions. Prokaryotes simply don't do this at all.

This part is dead wrong. Many bacteria "species" have something akin to sex where individuals can exchange alleles,; Thursday, October 16, 2008 9:42:00 AM
Jonathan Eisen said...: Well #1 thanks for writing about our paper

#2 ... I just want to chime in on the species issue. I think there is no reason why we cannot use the term species for bacterial groups. Sure, they are not quite the same groupings as we would see in eukaryotic species, but there really do seem to be true groupings where on average many of the genes in the genome are more similar/related among members of a group than with other groups. To me, that is good enough to call things a species as it indicates some higher rate of gene flow within the group than between. Sure, lateral transfer messes things up, but it seems to not have eliminated consistent groupings (whether you believe Ford Doolittle's argument about what this means or not).; Thursday, October 16, 2008 7:53:00 PM
Jonathan Eisen said...: Oh, and I note, for the Venter Sargasso Sea paper, I used HSP70, EF-TU, EFG, RpoB and RecA as our protein markers, so maybe you would like those more?; Thursday, October 16, 2008 11:00:00 PM
Anonymous said...: It's wonderful for a layperson like me to be able to read about research like this.

I've looked at the figure reproduced in the post in the PDF version at Genome Biology, at a high enough zoom level for my middle-aged eyes to read text. What I see there leads me to the following question: Do the results tend to show bacteria such as Thermus thermophilus and Deinococcus geothermalis at the base of the phylogenetic tree? If so, this may be something well-understood in the field, but I hadn't known it and find it interesting.; Friday, October 17, 2008 7:09:00 AM
Anonymous said...: Hi Jud, the tree in the figure is an unrooted tree so it does not really say anything about the base; Friday, October 17, 2008 1:16:00 PM
Anonymous said...: anonymous wrote: [T]he tree in the figure is an unrooted tree so it does not really say anything about the base.

Yup, sorry, my fault for incorrectly phrasing the question. An attempt to do better: Does it appear possible or likely that bacteria such as Thermus thermophilus and Deinococcus geothermalis were ancestral to the other bacteria species shown in the figure?; Friday, October 17, 2008 1:41:00 PM
Christopher Taylor said...: Jud - as pointed out, the tree as presented is not meant to be read as rooted, so the base of the tree could just as readily be between Gammaproteobacteria and all other lineages as between Deinococci and other lineages as shown. That said, the arrangement shown has obviously been chosen for its similarity to the arrangement found in many rDNA trees, which find Aquificae, Thermotogae and Deinococci quite low on the tree. Whether or not this arrangement represents the actual evolutionary history is still very debatable.

Ultrastructurally, eubacteria can be divided into two broad groups. Monodermata include mostly Gram-positive bacteria with a single cell membrane inside the cell wall, and would include Firmicutes, Actinobacteria and Thermotogae. Didermata are mostly Gram-negative bacteria with two cell membranes, one on either side of the cell wall, and would include everything to the right of Firmicutes in Wu and Eisen's tree, as well as Cyanobacteria, Aquificae and Deinococci. Either one of these two groups could be paraphyletic with regard to each other. The separation of Deinococci and Aquificae from other didermates might indicate a long-branch effect with those two groups appearing in the wrong part of the tree. Alternatively, Monodermata could have arisen polyphyletically through multiple losses of the outer membrane, or Didermata could be polyphyletic with multiple monodermate ancestors developing an outer membrane.

I suppose my central point is that the higher-level bacterial phylogeny is far from settled (and if researchers like Doolittle are correct, may not even be identifiable if time and LGT have eroded all the reliable signals).; Sunday, October 19, 2008 9:17:00 PM
Joanna said...: This is a very helpful approach for metagenomic and phylogenetic research. I am a master's student in Greece and I would love to try the AMPHORA pipeline in my master's thesis and in an article we are preparing here in our lab at the Biomedical Research Foundation (BRF) of the Academy of Athens. Unfortunately I cannot make it work, even after a lot of effort from friends and myself. The main problem is that I am not sure about the format of the input file I use (a .txt file in FASTA format with 104 bacterial proteomes). But even when I try to run AMPHORA/Phylotyping.pl with a reference file, the application runs but stops without printing or saving the tree. Please help!; Tuesday, April 12, 2011 8:39:00 AM

