Thursday, December 21, 2006

Mammalian Gene Families: Humans and Chimps Differ by 6%


The first issue of PLoS ONE has just been published. PLoS ONE publishes peer-reviewed, open-access articles that are freely available on the internet. The journal is supported by the Public Library of Science (PLoS), a non-profit organization.

The article that I've been waiting to see is:
Demuth, J.P., De Bie, T., Stajich, J.E., Cristianini, N., and Hahn, M.W. (2006) The Evolution of Mammalian Gene Families. PLoS ONE 1(1): e85.
Demuth et al. examined gene families in five mammals whose genomes have been sequenced (human, chimpanzee, mouse, rat, and dog). Gene families are normally defined as groups of related genes that have more than one copy in a genome. For example, the globin gene family includes myoglobin, α-globin, β-globin, and other related globin genes. The authors appear to use a broader definition, in which orthologous genes in different species also count as a gene family. Thus, their paper discusses "gene families" that consist of only a single gene in each species.

By scanning the available genome sequences, Demuth et al. clustered all genes into 15,389 groups that they call "gene families." Of these, 3,114 consisted of a single gene found in only one species. These were presumed to be annotation artifacts and were discarded. Not all of the remaining groups were present in all five species: another 2,285 groups were confined to particular lineages on the mammalian tree, indicating that they had been "created" after those lineages diverged from the common ancestor. This leaves 9,990 groups that were probably present in the common ancestor of dogs, humans, chimps, mice, and rats.
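The bookkeeping in this paragraph is simple subtraction. Here's a quick check in Python using only the counts quoted above (the variable names are my own, not the authors'):

```python
# Gene-family counts as quoted from Demuth et al. (2006); names are illustrative.
total_groups = 15_389               # all clusters ("gene families") across five genomes
single_species_singletons = 3_114   # one gene in one species; discarded as annotation artifacts
lineage_specific = 2_285            # groups confined to particular lineages on the tree

# Whatever is left was probably present in the common ancestor.
ancestral_groups = total_groups - single_species_singletons - lineage_specific
print(f"Groups probably present in the common ancestor: {ancestral_groups:,}")
```

This prints 9,990, matching the number in the paper.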

The question is, how many of these gene families have gained or lost members during mammalian evolution? The answer is 5,622, or 56.3% (5,622/9,990). The data are shown in Figure 1 (below). The red section of the pie chart represents groups that lost members in a particular lineage (or lost the entire family). The green section represents groups that gained members.
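The 56.3% figure is just the number of families that changed size divided by the number of ancestral families. A quick check, again with my own variable names and only the numbers quoted above:

```python
# Counts quoted in the text; variable names are illustrative.
ancestral_groups = 9_990   # families probably present in the common ancestor
changed_groups = 5_622     # families that gained or lost members in some lineage

fraction_changed = changed_groups / ancestral_groups
print(f"{fraction_changed:.1%} of ancestral families changed size")
```

The `.1%` format multiplies by 100 and rounds to one decimal place, giving 56.3%.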

Figure 1. Distribution of gene gain and loss among mammalian lineages.
Creative Commons Attribution License

If we focus on the human/chimp comparison, it turns out that the human and chimpanzee genomes differ by 1,418 genes that lack an ortholog in the other species. What this means is that if we look at corresponding (homologous) sections of human and chimp chromosomes, one of them will have a gene that the other lacks at that position. The human genome has 689 genes not present in the chimp, and the chimp has 729 genes not present in humans. If there are 22,000 genes in the genome, then this total of 1,418 differences represents 6.4% of the genes.
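The human/chimp arithmetic can be checked the same way. The 22,000 figure is the approximate gene count assumed above, not a precise census, so the percentage is only as good as that assumption:

```python
# Lineage-specific gene counts quoted in the text; names are illustrative.
human_only = 689      # human genes with no ortholog in the chimp
chimp_only = 729      # chimp genes with no ortholog in the human
total_genes = 22_000  # assumed approximate gene count

differences = human_only + chimp_only
percent = 100 * differences / total_genes
print(f"{differences:,} genes differ -> {percent:.1f}% of {total_genes:,}")
```

This prints "1,418 genes differ -> 6.4% of 22,000", which is where the "6%" in the title comes from.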

It's important to note that this does not mean that entirely new genes have been created or destroyed. Rather, it means that there have been duplication events in which a gene was copied in one of the lineages. For example, suppose that the region of the chromosome containing the α-globin genes was duplicated in the chimpanzee lineage. This would count as a gain in chimps relative to humans.
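To make the idea of a "gain" concrete, here is a toy tally in Python. The family names and copy numbers are hypothetical, not data from Demuth et al., and this pairwise comparison is not the authors' method; their analysis works across all five species on the mammalian tree. With only two species you can't tell whether an extra copy in chimps is a gain in chimps or a loss in humans, so the sketch simply counts, family by family, how many extra copies each species carries.

```python
# HYPOTHETICAL copy numbers per family: (human copies, chimp copies).
# These are made-up values for illustration, not data from the paper.
family_sizes = {
    "family_A": (2, 3),  # like the alpha-globin example: an extra copy in chimps
    "family_B": (4, 4),  # same size in both species: no difference
    "family_C": (3, 1),  # two extra copies in humans
}

def lineage_excess(sizes):
    """Sum, over families, the extra copies each species has relative to the other."""
    human_extra = 0
    chimp_extra = 0
    for human, chimp in sizes.values():
        if human > chimp:
            human_extra += human - chimp
        elif chimp > human:
            chimp_extra += chimp - human
    return human_extra, chimp_extra

human_extra, chimp_extra = lineage_excess(family_sizes)
print(f"human-only: {human_extra}, chimp-only: {chimp_extra}")
```

With these made-up numbers it prints "human-only: 2, chimp-only: 1". Summing the two sides is the same kind of total as the 689 + 729 = 1,418 in the paper, though the real counts come from a much more careful analysis.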

There are several problems with the analysis. One of the most serious is that the chimp genome sequence is incomplete and less well annotated than the human genome. Only 94% of the chimp genome is available, while the human genome is about 99% complete and much more accurate. This means that some human genes will appear to be absent from chimps simply because the corresponding chimp sequence is missing or poorly annotated. Still, it's unlikely that these problems lead to errors of more than 2-fold.

The authors are clearly aware that most of these changes in gene number have no effect on the organism. They are accidental changes due to random genetic drift. The authors also recognize that some of the duplications and losses are variants that are still segregating in the human and chimp populations. In other words, they are not fixed differences.

Nevertheless, Demuth et al. point out that some of the gains and losses of genes could be responsible for the phenotypic differences between chimpanzees and humans. They caution us that the traditional 1% difference in the sequences of orthologous genes may not be the whole story.
