Monday, May 11, 2026

The functions of human protein-coding genes

There are about 20,000 protein-coding genes in the human genome. We'd like to know what all these genes are doing but the only way to find out is to rely on experiments that explore the function of an individual protein.

This functional data has been collected and assembled in a large database called the Gene Ontology Resource (GO). Unfortunately, it only covers about one third of all human proteins.

Feuermann et al. (2025) developed a program to extend the coverage provided by direct annotation of human genes by making use of annotation records in related genes from other species. They constructed phylogenetic trees for 6,333 gene families representing a total of 17,079 human genes. This allowed them to deduce the functions of many more genes by taking advantage of GO annotations in homologous genes.
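The general idea of borrowing annotations from homologs can be illustrated with a toy sketch. This is not the authors' program, which works on phylogenetic trees. It is a deliberately simplified stand-in that treats a gene family as a flat set of relatives. The gene names, the placeholder GO identifiers, the function name `infer_by_homology`, and the conservative rule (keep only the terms shared by every annotated relative) are all assumptions made up for illustration.

```python
# Toy illustration only: gene names, GO IDs, and the transfer rule are
# hypothetical and are NOT taken from Feuermann et al. (2025).
from typing import Dict, Set


def infer_by_homology(family: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return GO terms for `target`, borrowing from relatives if needed.

    `family` maps each gene in one gene family to its directly annotated
    GO terms (an empty set means the gene has no direct annotation).
    """
    # A gene that already has direct annotations keeps them unchanged.
    if family.get(target):
        return set(family[target])

    # Collect the annotation sets of every annotated relative.
    annotated = [terms for gene, terms in family.items()
                 if gene != target and terms]
    if not annotated:
        return set()

    # Conservative assumption: transfer only terms shared by all of them.
    return set.intersection(*annotated)


family = {
    "human_geneA": set(),                         # no direct annotation
    "mouse_geneA": {"GO:0000001", "GO:0000002"},  # placeholder term IDs
    "yeast_geneA": {"GO:0000001"},
}

print(infer_by_homology(family, "human_geneA"))
```

In this sketch the unannotated human gene picks up only the term that both the mouse and yeast relatives share. A tree-based approach can do better than this because it can place the gain or loss of a function on a particular branch instead of treating all relatives alike.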

The "Venn diagram" (their words) illustrates the advantage of combining phylogenetic data with GO annotation. The yellow part represents the fraction of human proteins that have been directly annotated in the GO database while the brown and blue sections show how function can be inferred from a much larger set of proteins by looking at annotations in other species.

The results are not surprising. Most of the proteins are involved in basic metabolism such as cellular processes (deep blue), cellular metabolism (blue), and cell structure (dark blue). Most of the gene families involved in these functions are derived from single genes that are also found in bacteria. The important point here is that the increase in the number of genes in complex eukaryotes is mostly due to gene duplication and subspecialization.


Feuermann, M., Mi, H., Gaudet, P., Muruganujan, A., Lewis, S.E., Ebert, D., Mushayahama, T., Gene Ontology Consortium and Thomas, P.D. (2025) A compendium of human gene functions derived from evolutionary modelling. Nature 640:146-154. [doi: 10.1038/s41586-025-08592-0]

Sunday, May 10, 2026

Why do scientists at "elite" universities dominate scientific discourse?

We all know that scientists at elite universities publish a lot more papers than scientists at other universities. Why is that? Is it because those universities have better labs and equipment? Is it because the scientists at elite universities are smarter than other scientists? Is it because of the reputation of the universities that makes it easier to get papers accepted in the best journals?

A group of scientists at the University of Colorado (Boulder, Colorado, USA) decided to examine the question and they came up with another answer—one that I have long suspected.

Saturday, May 09, 2026

Pervasive transcription = genes + noise

Most of the DNA in the human genome is transcribed at some point in development or in some cell type. This fact has been known since the late 1960s.

There are basically two types of transcripts. Functional transcripts mostly come from genes although there might be a few exceptions (e.g. enhancer RNAs). Non-functional transcripts can be produced by pseudogenes or from virus and transposon fossils. They can also be due to transcriptional noise caused by spurious transcription.

Friday, May 08, 2026

Philosophers talk about junk DNA

There is considerable debate in the scientific literature over the amount of junk DNA in the human genome. The standard model was developed 50 years ago; it postulated that only 10% of the genome is functional and 90% is junk. Most of the evidence since then has supported that model but there are many scientists who reject it.

This would seem to be fertile ground for philosophers of biology and, indeed, there are some philosophers who have made a significant contribution, mostly in sorting out how to define function (Brunet et al., 2021; Linquist et al., 2020). Also, many philosophers are interested in the history of biology and some (e.g. Morange, 2020) have done a good job of describing the history of the junk DNA concept.