This functional data has been collected and assembled in a large database called the Gene Ontology Resource (GO). Unfortunately, it only covers about one third of all human proteins.
Feuermann et al. (2025) developed a program to extend the coverage provided by direct annotation of human genes by making use of annotation records in related genes from other species. They constructed phylogenetic trees for 6,333 gene families representing a total of 17,079 human genes. This allowed them to deduce the functions of many more genes by taking advantage of GO annotations in homologous genes.
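The general idea of borrowing annotations from homologs can be sketched in a few lines of Python. This is only a toy illustration, not the method used by Feuermann et al.: the gene names, the tree, and the rule "a term seen in two or more child subtrees is treated as ancestral" are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy gene family tree (hypothetical structure)."""
    name: str
    children: list["Node"] = field(default_factory=list)
    terms: set[str] = field(default_factory=set)  # direct (experimental) GO terms


def observed(node):
    """Return every term directly annotated anywhere in this subtree."""
    found = set(node.terms)
    for child in node.children:
        found |= observed(child)
    return found


def propagate(node, inherited=frozenset(), out=None):
    """Pass assumed-ancestral terms down to every leaf in the tree."""
    if out is None:
        out = {}
    # Toy rule (an assumption): a term observed in at least two child
    # subtrees is treated as present in the ancestor at this node.
    counts = {}
    for child in node.children:
        for term in observed(child):
            counts[term] = counts.get(term, 0) + 1
    ancestral = {term for term, n in counts.items() if n >= 2}
    passed = set(inherited) | ancestral
    if not node.children:
        # Leaves keep their direct terms; everything else is inferred.
        out[node.name] = {"direct": set(node.terms),
                          "inferred": passed - node.terms}
    for child in node.children:
        propagate(child, passed, out)
    return out


# Hypothetical family: the human gene has no direct annotation, but its
# mouse and yeast homologs are both annotated with GO:0004672.
tree = Node("root", children=[
    Node("clade_1", children=[
        Node("human_A"),
        Node("mouse_A", terms={"GO:0004672"}),
    ]),
    Node("yeast_A", terms={"GO:0004672"}),
])

result = propagate(tree)
print(result["human_A"])
```

In this made-up family the unannotated human gene picks up the term from its homologs, which is the kind of gain in coverage the paper describes.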
The "Venn diagram" (their words) illustrates the advantage of combining phylogenetic data with GO annotation. The yellow part represents the fraction of human proteins that have been directly annotated in the GO database, while the brown and blue sections show how function can be inferred for a much larger set of proteins by looking at annotations in other species.
The results are not surprising. Most of the proteins are involved in basic metabolism such as cellular processes (deep blue), cellular metabolism (blue), and cell structure (dark blue). Most of the gene families involved in these functions are derived from single genes that are also found in bacteria. The important point here is that the increase in the number of genes in complex eukaryotes is mostly due to gene duplication and subspecialization.
Feuermann, M., Mi, H., Gaudet, P., Muruganujan, A., Lewis, S.E., Ebert, D., Mushayahama, T., Gene Ontology Consortium and Thomas, P.D. (2025) A compendium of human gene functions derived from evolutionary modelling. Nature 640:146-154. [doi: 10.1038/s41586-025-08592-0]




