Nematodes are small wormlike creatures that live almost everywhere. Many of them are parasites but there are thousands of species that live in the soil. "... it is said that if everything on the earth were to disappear except the nematodes, the outlines of everything would still be visible: the mountains, lakes and oceans, the plants and the animals would all be outlined by the nematodes living in every habitat."1
The free-living species Caenorhabditis elegans was chosen by Sydney Brenner as a model organism for the study of development [Nobel Laureates: Sydney Brenner, Robert Horvitz, John Sulston]. It turned out to be an excellent choice and by the mid 1990s this small metazoan (multi-cellular animal) was selected as the best metazoan candidate for genome sequencing.
The complete genome sequence was published in 1998. The genome is 100 Mb in size (= 100 million base pairs). This was smaller than the predicted size of the fruit fly genome (165 Mb) or the human genome (3,200 Mb). The first estimates of the number of genes were over 19,000 and at the time this was thought to be a reliable estimate, although there were many, including me, who thought that it was probably too high.
Over the years we have become more skeptical of these initial gene counts because gene prediction has many problems. The location of genes is determined by sophisticated computer programs that are trained to recognize the important characteristics of protein-coding gene sequences. This year marks the tenth anniversary of the publication of the C. elegans genome sequence, and most people will be surprised to learn that the annotation of this sequence is only now nearing completion.
A recent paper by James Thomas summarizes the results so far (Thomas, 2008).
Thomas points out that gene prediction suffers from the presence of false positives. One of the complications is pseudogenes, which are not easy to distinguish from real genes. Another complication is proving that a predicted gene is actually functional and not just a computational artifact. There is no better way to resolve these issues than by having real live people look at every potential gene. This is why annotation takes so long.
The latest estimate is 20,140 protein-coding genes in the Caenorhabditis elegans genome. The coding regions (exons) would take up about 24 Mb of DNA or 24% of the 100 Mb genome. Most of the remainder is junk DNA.
The number of genes is remarkably close to the original prediction although it should be noted that estimates of the number of genes went up after the initial draft sequence was published. Nevertheless, unlike the gene count in humans, the number of genes has held pretty steady.
The number of genes can be compared to the number in the Drosophila melanogaster genome (~15,000) and the human genome (20,500). These are the only two other
There are about 23,000 distinct transcripts from these genes. What that means is that roughly 18,000 genes produce a single transcript and about 2,000 produce two or three different transcripts by alternative splicing.
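The arithmetic behind that breakdown is easy to check. The short Python sketch below uses only the rounded figures quoted above; the function name `total_transcripts`, and the simplification that every alternatively spliced gene yields the same number of transcripts, are my own assumptions for illustration.

```python
# Rounded figures quoted in the text (approximate, not exact counts).
SINGLE_TRANSCRIPT_GENES = 18_000   # genes producing one transcript
SPLICED_GENES = 2_000              # genes producing two or three transcripts


def total_transcripts(transcripts_per_spliced_gene):
    """Total transcripts if every spliced gene made the same number (a simplification)."""
    return SINGLE_TRANSCRIPT_GENES + SPLICED_GENES * transcripts_per_spliced_gene


low = total_transcripts(2)    # every spliced gene makes two transcripts
high = total_transcripts(3)   # every spliced gene makes three transcripts
print(low, high)
```

The quoted total of about 23,000 falls between these two bounds, which corresponds to an average of roughly 2.5 transcripts per alternatively spliced gene.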
The C. elegans genes can be divided into two categories. About 8,000 of them are unique and the remainder belong to gene families. A gene family consists of multiple copies of the same gene in the same genome. The copies (paralogues) may be identical or they may be quite different but still related. Some of the gene families are very large and some have only two members.
There seem to be about 3,000 gene families contributing to the 12,000 genes that are not unique. The bottom line is that there are about 11,000 (8K + 3K) different kinds of gene in C. elegans. Interestingly, only 1,800 of these genes are found in both insects (Drosophila) and primates (humans). The rest are restricted to just insects and nematodes or just nematodes (10,000 are found in other nematode species).
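For readers who want to follow the bookkeeping, here is a minimal Python sketch of those numbers. It uses the rounded values from the text (20,000 total genes rather than 20,140); the variable names are mine, and the average family size is only a derived figure, not one reported by Thomas.

```python
# Rounded values from the text; all are approximations.
TOTAL_GENES = 20_000     # rounded from the estimate of 20,140
UNIQUE_GENES = 8_000     # genes with no paralogues
GENE_FAMILIES = 3_000    # number of distinct gene families

# Genes that belong to some family.
genes_in_families = TOTAL_GENES - UNIQUE_GENES

# Each family counts as one "kind" of gene, as does each unique gene.
kinds_of_gene = UNIQUE_GENES + GENE_FAMILIES

# Derived average; real families range from two members to very many.
average_family_size = genes_in_families / GENE_FAMILIES

print(genes_in_families, kinds_of_gene, average_family_size)
```

An average of about four members per family hides a lot of variation, since some families are very large and some have only two members.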
James Thomas points out that the determination of orthology (same genes in other species) is much more difficult than one might imagine. Many of the online databases, for example, contain erroneous entries based on faulty predictions. These false predictions propagate so that it often isn't reliable to use the database to confirm that a predicted gene actually exists. That's why he restricts his comparisons to well-annotated genomes wherever possible.
Partially annotated genome sequences of Caenorhabditis briggsae and Caenorhabditis remanei are available. Orthologous gene comparisons indicate that the three species are remarkably dissimilar for species within the same genus. They probably diverged at least 20 My ago.
A new nematode genome sequence was published this week. The species is Pristionchus pacificus, a parasite of the oriental beetle Exomala orientalis (Dieterich et al. 2008). The authors note that there is a different species of parasitic nematode associated with almost every species of beetle, which means that there are at least as many nematode species as insect species.
The Pristionchus pacificus genome is 169 Mb in size, which is considerably larger than the size of the Caenorhabditis elegans genome (100 Mb). P. pacificus has 23,500 genes.
Some of the increase in genome size is due to more genes but this is only a minor difference. Some of it is due to the presence of additional copies of repetitive DNA sequences in P. pacificus but the increase doesn't account for the extra 69 Mb of DNA.
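A quick comparison shows why the extra genes cannot explain the larger genome. This Python sketch uses only the genome sizes and gene counts quoted above; the variable names are mine, and the percentages are simple ratios, not figures taken from the paper.

```python
# Figures quoted in the text.
CE_GENOME_MB, CE_GENES = 100, 20_140   # Caenorhabditis elegans
PP_GENOME_MB, PP_GENES = 169, 23_500   # Pristionchus pacificus

# Extra DNA in P. pacificus relative to C. elegans.
extra_dna_mb = PP_GENOME_MB - CE_GENOME_MB

# Relative increases in genome size and in gene number.
genome_ratio = PP_GENOME_MB / CE_GENOME_MB
gene_ratio = PP_GENES / CE_GENES

print(f"extra DNA: {extra_dna_mb} Mb")
print(f"genome is {100 * (genome_ratio - 1):.0f}% larger")
print(f"gene count is {100 * (gene_ratio - 1):.0f}% larger")
```

The genome is 69% larger while the gene count is only about 17% larger, so most of the additional DNA must be something other than genes.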
The differences in gene number are almost entirely due to increases in the members of gene families in the P. pacificus genome. Several specific examples were given, notably 250 extra copies of ribosomal protein genes compared to C. elegans.
Another remarkable difference is in the number of genes involved in detoxification, or removal of poisonous substances. There are about 250 extra copies of gene family members in this category. The authors speculate that this expansion may be selection for detoxifying enzymes in parasites as opposed to the free-living C. elegans.
In addition to the various Caenorhabditis species, we now have a complete genome of the nematode Brugia malayi, the parasite responsible for filariasis in humans. Pristionchus diverged from Caenorhabditis about 350 My (million years) ago and Brugia diverged from the others about 900 My ago according to Dieterich et al. (2008). Thomas (2008) cautions that these divergence times are based on an underestimate of mutation/fixation rates and that nematodes may be evolving more rapidly than other phyla. Nevertheless, it is clear that nematodes are an ancient, diverse, and abundant group of animals.
2. See the discussion in the comments for examples of other well-annotated eukaryotic genomes. Yeast is obvious but what about Arabidopsis?
[Photo Credit: Christina Beck]
Christoph Dieterich, Sandra W Clifton, Lisa N Schuster, Asif Chinwalla, Kimberly Delehaunty, Iris Dinkelacker, Lucinda Fulton, Robert Fulton, Jennifer Godfrey, Pat Minx, Makedonka Mitreva, Waltraud Roeseler, Huiyu Tian, Hanh Witte, Shiaw-Pyng Yang, Richard K Wilson, Ralf J Sommer (2008). The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism. Nature Genetics. DOI: 10.1038/ng.227
J. H. Thomas (2008). Genome evolution in Caenorhabditis. Briefings in Functional Genomics and Proteomics, 7 (3), 211-216. DOI: 10.1093/bfgp/eln022