Friday, March 23, 2007
How Many Genes Do We Have?
The number of genes in the human genome fluctuates on a monthly basis as the genome annotators add new genes and remove false positives. It's an ongoing process that's not likely to be complete in the near future.
The original draft sequences of the human genome were reported to have between 25,000 and 30,000 genes, but these numbers were not reliable since they were based entirely on computer predictions. The programs were still in the testing stage for complex genomes when they were used in 2001. They are much better now, but it really takes human intervention to assess whether a prediction is correct or not. The annotation process is tedious.
The latest summary from NCBI is based on the Oct. 17, 2006 genome assembly [NCBI Reference Assembly]. It lists 28,961 genes for the public genome and 26,245 for the private Celera assembly.
The Ensembl site has better data because the curation seems to be more rigorous. It lists 26,720 genes, of which 3,994 have RNA products (mainly ribosomal RNA, tRNAs, and snoRNAs) [Ensembl Homo sapiens]. This is not much different from the NCBI number. It looks like the count is stabilizing at about 27,000 genes in total, of which about 23,000 are protein-encoding genes.
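For anyone who wants to check the arithmetic behind those rounded figures, here is a minimal sketch. It uses only the two Ensembl numbers quoted above and assumes that every gene not listed as having an RNA product is protein-encoding; the variable names are mine, not Ensembl's.

```python
# Ensembl figures quoted in the post (March 2007).
ensembl_total_genes = 26_720
ensembl_rna_genes = 3_994  # mainly rRNA, tRNA, and snoRNA genes

# Assumption: whatever is not an RNA gene is a protein-encoding gene.
protein_coding = ensembl_total_genes - ensembl_rna_genes

# Round to the nearest thousand to compare with the figures in the text.
total_rounded = round(ensembl_total_genes, -3)
protein_rounded = round(protein_coding, -3)

print(f"protein-encoding genes: {protein_coding:,}")
print(f"rounded total: {total_rounded:,}, rounded protein-encoding: {protein_rounded:,}")
```

The subtraction gives 22,726 protein-encoding genes, which is where the "about 23,000" comes from.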
Carl Zimmer recently posted an article about the number of genes in the human genome [You Don't Miss Those 8,000 Genes, Do You?]. He referred to the PANTHER database, which quotes 25,431 genes on its current website [PANTHER pie chart]. This differs considerably from the 18,308 genes shown in Zimmer's original article at this site [PANTHER filtered NP]. The difference arises because the smaller number is the total (25,431) filtered to show only those genes that have a RefSeq entry in the Entrez database. The filtered number is an underestimate since not all genes have been assigned a RefSeq entry, particularly those that produce an RNA product rather than a protein.
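The size of the gap between the two PANTHER numbers is easy to work out. This is just a quick sketch using the two counts quoted above; the variable names are my own, and nothing here queries PANTHER or Entrez.

```python
# PANTHER figures quoted in the post.
panther_all_genes = 25_431       # total shown on the PANTHER pie chart
panther_refseq_genes = 18_308    # genes remaining after the RefSeq filter

# Genes dropped by the filter because they have no RefSeq entry.
dropped_by_filter = panther_all_genes - panther_refseq_genes

# Share of the total that survives the filter, as a percentage.
percent_kept = 100 * panther_refseq_genes / panther_all_genes

print(f"genes without a RefSeq entry: {dropped_by_filter:,}")
print(f"share kept by the filter: {percent_kept:.1f}%")
```

So the RefSeq filter removes 7,123 genes from the PANTHER total.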
[Thanks to Scientia Natura for the cartoon]