Sandwalk: The Core Genome

Friday, October 28, 2011

The Core Genome

Hundreds of genomes have been sequenced. It should be relatively easy to search all these genomes to identify those genes that are found in every single species. This small class of genes should represent the core genome—the genes that were probably present in the first living cell.

Turns out it's not that easy. For one thing, you have to remove parasitic bacteria from your set of genomes because these species could easily be getting by without some essential genes that are supplied by their hosts. Next you have to make sure you have a huge variety of different species that cover all possible forms of life. In practice, this means that you need about 300 different genomes, mostly bacteria.
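To make the idea concrete, here is a minimal Python sketch of the search, assuming the hard part (deciding which genes are orthologs) has already been done. The core genome is then just the intersection of the gene-family sets of every genome you keep. The genome names, the family labels, and the `core_genome` helper are all invented for illustration; they are not real annotations or anyone's published pipeline.

```python
# Toy data, entirely hypothetical: each genome is reduced to the set of
# ortholog families it carries. Real analyses start from sequence data.
genomes = {
    "free_living_A": {"rnap_beta", "ribo_S3", "fam_X", "fam_Z"},
    "free_living_B": {"rnap_beta", "ribo_S3", "fam_Y", "fam_Z"},
    "free_living_C": {"rnap_beta", "ribo_S3", "fam_X"},
    "parasite_D": {"rnap_beta"},  # relies on its host for the rest
}
parasites = {"parasite_D"}


def core_genome(genomes, exclude=frozenset()):
    """Return the families found in every genome not named in `exclude`."""
    kept = [families for name, families in genomes.items() if name not in exclude]
    if not kept:
        return set()
    # The core set is the intersection of all remaining family sets.
    return set.intersection(*kept)


core_all = core_genome(genomes)                    # parasite included
core_free_living = core_genome(genomes, parasites)  # parasite removed
print(sorted(core_all))
print(sorted(core_free_living))
```

In this toy set, leaving the parasite in drags the core down to a single family, which is why the parasitic species have to come out before the comparison means anything.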

I'm reading The Logic of Chance: The Nature and Origin of Biological Evolution, by Eugene Koonin. This is just one of many books that are critical of the most popular views of evolution. Most of these books are written by kooks or religious nutters but some of them are valid scientific critiques of modern evolutionary theory. Koonin's book is one of those and I agree with most of what he has to say. One of his topics is genome evolution.

As Koonin describes it, the first genome comparisons looked at Haemophilus influenzae and Mycoplasma genitalium, two species of bacteria that are distantly related. There were about 240 orthologous genes found in both species.¹ The first surprise was that this core set was missing some very important members that should have been there.

Some essential metabolic reactions must have been catalyzed by enzymes in the very first cells but the Haemophilus enzyme isn't present in Mycoplasma and vice versa. It took a bit of digging but eventually the problem was solved with the discovery of different enzymes that carried out the same reaction. The genes for these enzymes are completely unrelated.

As more and more genomes were sequenced, the size of the core genome set shrank until today it comprises fewer than 100 genes. Most of these are genes for the three ribosomal RNAs, about 30 tRNAs, and a few other essential RNA molecules. There are only about 33 protein-encoding genes in the universal core set. They include genes for the three large RNA polymerase subunits and 30 proteins required for translation (mostly ribosomal proteins).

DNA polymerase isn't in the core set because some species of bacteria have unusual DNA polymerases that replicate DNA just fine but are unrelated to the enzymes found in most cells. There are multiple, unrelated versions of the aminoacyl-tRNA synthetases, the enzymes that attach an amino acid to its cognate tRNA. Some species have one version, other species have a different one, and some species have both. In any case, no single synthetase gene is found in every species so it's not part of the core set.

Koonin refers to this observation as non-orthologous gene displacement (NOGD). He envisages a scenario where a cell with gene X takes up a copy of a non-orthologous gene (gene Y) that catalyzes the same reaction. Over time the newly acquired gene displaces the original version. In this way a non-orthologous version (e.g. gene Y) could have arisen after the formation of the first cell and spread to a variety of different species by horizontal gene transfer. The scenario doesn't rule out the possibility that the two non-orthologous versions could have arisen independently in two separate origins of life but this seems less likely.
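Here is a small Python sketch of why NOGD knocks genes out of the core set even when the reaction itself is universal. It reuses the gene X and gene Y labels from the scenario above; the species names, the `gene_R` family, the reaction labels, and both helper functions are hypothetical, chosen only to illustrate the bookkeeping.

```python
# Toy mapping from gene family to the job it does (all labels hypothetical).
# gene_X and gene_Y are unrelated families that catalyze the same reaction.
family_function = {
    "gene_X": "reaction_1",
    "gene_Y": "reaction_1",
    "gene_R": "translation",
}

genomes = {
    "species_1": {"gene_X", "gene_R"},            # kept the original gene
    "species_2": {"gene_Y", "gene_R"},            # gene Y displaced gene X
    "species_3": {"gene_X", "gene_Y", "gene_R"},  # still carries both
}


def universal_families(genomes):
    """Gene families present in every genome (the gene-level core)."""
    return set.intersection(*genomes.values())


def universal_functions(genomes, family_function):
    """Functions carried out in every genome, whichever family does the job."""
    per_genome = [
        {family_function[family] for family in families}
        for families in genomes.values()
    ]
    return set.intersection(*per_genome)


print(sorted(universal_families(genomes)))
print(sorted(universal_functions(genomes, family_function)))
```

Every species in the toy set can carry out `reaction_1`, yet neither gene X nor gene Y is found in all three, so a search for universal genes misses the reaction entirely.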

Let's look at a couple of examples. Biochemistry textbook writers have known for decades that there are different versions of some common metabolic genes.² The aldolase enzyme in gluconeogenesis and glycolysis is a classic example. Some species have the class I enzyme/gene while others have the class II enzyme/gene. Some species have both.

This is an example of convergent evolution. The enzymes have different mechanisms and, as you can see from the figure, completely different structures. It doesn't seem to matter if a species has a class I enzyme or a class II enzyme since both enzymes are very good at catalyzing the fusion of two three-carbon molecules into a six-carbon fructose molecule or cleaving the six-carbon molecule in the reverse reaction.

The pyruvate dehydrogenase complex (PDC) is a huge enzyme that catalyzes an important metabolic reaction making acetyl-CoA, the substrate for the citric acid cycle. It seemed likely that every single species would have the genes for all of the PDC subunits but many species of bacteria turned out to be missing the entire complex. They have a different enzyme, pyruvate:ferredoxin oxidoreductase, that catalyzes a similar reaction. The enzymes have completely different mechanisms and are unrelated.

In this case, we have reason to believe that the enzyme requiring ferredoxin is more primitive and the more common pyruvate dehydrogenase complex evolved later. The PDC genes displaced the gene for pyruvate:ferredoxin oxidoreductase in many, but not all, species. That's why the genes for neither enzyme are part of the core set.

We don't know whether the existing core set of fewer than 100 genes truly represents genes that were present in the first living cell or whether they completely displaced the original versions. Many of these genes are part of large operons, which might have made it easier for them to spread by horizontal gene transfer. (This is the selfish operon model.)

The bottom line is that attempts to reconstruct the genome of the first cell have failed because of NOGD and we now have to incorporate that concept into our way of thinking about early evolution. The good news is that the evolution of completely new genes seems to be much easier than we first imagined. We even have examples of three or four completely different enzymes carrying out the same reaction.³

1. Koonin refers to conserved genes as Clusters of Orthologous Genes or COGs. The COG approach actually counts conserved domains rather than entire genes but the differences aren't great so I'll just refer to them as genes.

2. That is, those textbook writers that emphasize comparative biochemistry or an evolutionary approach to biochemistry. Some textbooks just cover human (mammalian) biochemistry so they won't even mention whether bacteria do biochemistry.

3. I'm not sure how the Intelligent Design Creationists explain these observations. Maybe there were several different designers who each came up with their ideal solution to the problem? Maybe there was only one designer who just got a kick out of making different versions of the same enzyme activity but got bored at only two or three?

6 comments :

Peter said...: In practice, this means that you need about 300 different genomes, mostly bacteria.
What do you think to the hypothesis that viruses should be included in the mix - i.e. that substantial chunks of the bacterial / archaeal / eukaryotic gene repertoires may ultimately have a viral origin?

e.g.
http://www.pnas.org/content/103/10/3669.full
http://rstb.royalsocietypublishing.org/content/364/1527/2263.full; Friday, October 28, 2011 5:32:00 PM
John S. Wilkins said...: A minor point of clarification: The Core Genome Hypothesis is the view that what makes asexual bacterial species is a shared set of core genes. It is not widely adopted, but this might cause some small confusion.

Wertz, J. E., C. Goldstone, D. M. Gordon, and M. A. Riley. 2003. A molecular phylogeny of enteric bacteria and implications for a bacterial species concept. J Evol Biol 16 (6):1236-1248.; Friday, October 28, 2011 6:15:00 PM
DAK said...: Koonin has one of the strangest accents I've ever heard. This, coupled with his bizarre vocal mannerisms, had me transfixed and mesmerised while watching a video of one of his lectures at the From RNA to Humans Symposium.

The lecture was quite good too.; Friday, October 28, 2011 6:20:00 PM
jaxkayaker said...: It's Mycoplasma genitalium, not Micrococcus genitalium, isn't it?; Friday, October 28, 2011 10:06:00 PM
Larry Moran said...: Yes, it's Mycoplasma.

Thanks.; Saturday, October 29, 2011 5:21:00 AM
Anonymous said...: Larry, is "isozymes" a good term to refer to these convergent non-homologous enzymes that perform the same function?; Thursday, November 03, 2011 11:02:00 PM

