
Monday, May 11, 2026

The functions of human protein-coding genes

There are about 20,000 protein-coding genes in the human genome. We'd like to know what all these genes are doing, but the only way to find out directly is to rely on experiments that explore the function of individual proteins.

This functional data has been collected and assembled in a large database called the Gene Ontology Resource (GO). Unfortunately, these direct annotations cover only about one third of all human proteins.

Feuermann et al. (2025) developed a program to extend the coverage provided by direct annotation of human genes by making use of annotation records in related genes from other species. They constructed phylogenetic trees for 6,333 gene families representing a total of 17,079 human genes. This allowed them to deduce the functions of many more genes by taking advantage of GO annotations in homologous genes.
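To make the idea of borrowing annotations from homologs concrete, here is a deliberately simplified Python sketch. It assumes a toy gene tree with hypothetical gene names (HUMAN_A, MOUSE_B, etc.) and placeholder terms (TERM_1, TERM_2) instead of real GO identifiers. It also uses a simple rule of my own choosing: each term is assigned to the most recent common ancestor (MRCA) of all genes with experimental support for it and is then inherited by every gene descended from that ancestor. This is not the method Feuermann et al. used; their family models were built and reviewed by curators and are more nuanced. The sketch only shows how a human gene with no experiments of its own can still receive an annotation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a gene tree; leaves are individual genes from particular species."""
    name: str
    children: list[Node] = field(default_factory=list)
    # Hypothetical terms with direct experimental support (only read on leaves).
    experimental: set[str] = field(default_factory=set)

    def leaves(self) -> list[Node]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def mrca(node: Node, targets: set[str]) -> Node | None:
    """Deepest node at or below `node` whose leaves include every gene in `targets`."""
    if not targets <= {leaf.name for leaf in node.leaves()}:
        return None
    for child in node.children:
        deeper = mrca(child, targets)
        if deeper is not None:
            return deeper
    return node


def infer_annotations(root: Node) -> dict[str, set[str]]:
    """Assign each term to the MRCA of its annotated genes, then copy it to every leaf below."""
    calls = {leaf.name: set(leaf.experimental) for leaf in root.leaves()}
    terms = set().union(*(leaf.experimental for leaf in root.leaves()))
    for term in terms:
        annotated = {leaf.name for leaf in root.leaves() if term in leaf.experimental}
        ancestor = mrca(root, annotated)
        for leaf in ancestor.leaves():
            calls[leaf.name].add(term)
    return calls


# Hypothetical family: a duplication produced paralogs A and B; no human gene has direct data.
family = Node("ancestor", [
    Node("clade_A", [
        Node("FISH_A", experimental={"TERM_1", "TERM_2"}),
        Node("mammal_A", [Node("HUMAN_A"), Node("MOUSE_A", experimental={"TERM_2"})]),
    ]),
    Node("clade_B", [Node("HUMAN_B"), Node("MOUSE_B", experimental={"TERM_1"})]),
])

calls = infer_annotations(family)
for gene in ("HUMAN_A", "HUMAN_B"):
    print(gene, sorted(calls[gene]))
# HUMAN_A ['TERM_1', 'TERM_2']
# HUMAN_B ['TERM_1']
```

In this toy family, HUMAN_B picks up only TERM_1, the function shared by the whole family, while HUMAN_A also inherits TERM_2 from the clade that appears to have subspecialized after the duplication.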

The "Venn diagram" (their words) illustrates the advantage of combining phylogenetic data with GO annotation. The yellow part represents the fraction of human proteins that have been directly annotated in the GO database, while the brown and blue sections show how function can be inferred for a much larger set of proteins by looking at annotations in other species.

The results are not surprising. Most of the proteins are involved in basic functions such as cellular processes (deep blue), cellular metabolism (blue), and cell structure (dark blue). Most of the gene families involved in these functions are derived from single genes that are also found in bacteria. The important point here is that the increase in the number of genes in complex eukaryotes is mostly due to gene duplication and subspecialization.


Feuermann, M., Mi, H., Gaudet, P., Muruganujan, A., Lewis, S.E., Ebert, D., Mushayahama, T., The Gene Ontology Consortium and Thomas, P.D. (2025) A compendium of human gene functions derived from evolutionary modelling. Nature 640:146-154. [doi: 10.1038/s41586-025-08592-0]

1 comment :

gert korthof said...

Thanks Larry. Remarkable: in the diagram (the second illustration), the red part representing 'Multicellular processes' is so small compared to all the others! But it makes the difference between bacteria and all animals and plants! Amazing! Once you have a single-celled organism, relatively few additional genes are required to create a multicellular organism. So it seems.
Second: given that this protein database exists, what does that imply about exon borders? Could it include (an unknown fraction of) entries with poorly defined exon borders? Would that be possible?