It would be almost as interesting to know how many are required for just survival of a particular cell. This set is the group of so-called "housekeeping genes." They are necessary for basic metabolic activity and basic cell structure. Some of these genes are the genes for ribosomal RNA, tRNAs, the RNAs involved in splicing, and many other types of RNA. Some of them are the protein-coding genes for RNA polymerase subunits, ribosomal proteins, enzymes of lipid metabolism, and many other enzymes.
The ability to knock out human genes using CRISPR technology has opened the door to testing for essential genes in tissue culture cells. The idea is to disrupt every gene and screen to see if it's required for cell viability in culture.
Three papers using this approach have appeared recently:
Blomen, V.A., Májek, P., Jae, L.T., Bigenzahn, J.W., Nieuwenhuis, J., Staring, J., Sacco, R., van Diemen, F.R., Olk, N., and Stukalov, A. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science, 350:1092-1096. [doi: 10.1126/science.aac7557]
Wang, T., Birsoy, K., Hughes, N.W., Krupczak, K.M., Post, Y., Wei, J.J., Lander, E.S., and Sabatini, D.M. (2015) Identification and characterization of essential genes in the human genome. Science, 350:1096-1101. [doi: 10.1126/science.aac7041]
Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., and Sun, S. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163:1515-1526. [doi: 10.1016/j.cell.2015.11.015]
Each group identified between 1500 and 2000 protein-coding genes that are essential in their chosen cell lines.
One of the annoying things about all three papers is that they use the words "gene" and "protein-coding gene" as synonyms. The only genes they screened were protein-coding genes but the authors act as though that covers ALL genes. I hope they don't really believe that. I hope it's just sloppy thinking when they say that their 1800 essential "genes" represent 9.2% of all genes in the genome (Wang et al. 2015). What they meant is that they represent 9.2% of protein-coding genes.
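A quick back-of-the-envelope check shows why 9.2% only makes sense with protein-coding genes as the denominator. This is a minimal sketch using the two numbers quoted above. The variable names are mine, and the comparison with roughly 20,000 protein-coding genes is an outside ballpark figure, not something taken from the papers.

```python
# Numbers quoted above from Wang et al. (2015)
essential_genes = 1800      # essential "genes" reported
fraction_of_total = 0.092   # 9.2% "of all genes in the genome"

# Total gene count implied by those two numbers
implied_total = essential_genes / fraction_of_total
print(round(implied_total))  # 19565

# Assumption: the human genome has roughly 20,000 protein-coding genes.
# The implied denominator sits in that range, so it cannot also include
# the genes for ribosomal RNAs, tRNAs, and other functional RNAs.
```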
By looking only at genes that are essential for cell survival, they are ignoring all those genes that are specifically required in other cell types. For example, they will not identify any of the genes for olfactory receptors or any of the genes for keratin or collagen. They won't detect any of the genes required for spermatogenesis or embryonic development.
What they should detect is all of the genes required in core metabolism.
The numbers seem too low to me so I looked for some specific examples.
The HSP70 gene family encodes the major heat shock protein of molecular weight 70,000. The protein functions as a chaperone to help fold other proteins. These genes are among the most highly conserved genes in all of biology and they are essential. The three genes for the normal cellular proteins are HSPA5 (Bip, the ER protein); HSPA8 (the cytoplasmic version); and HSPA9 (mitochondrial version). All three are essential in the Blomen et al. paper. Only HSPA5 and HSPA9 are essential in Hart et al. (This is an error.) (I can't figure out how to identify essential genes in the Wang et al. paper.)
There are two inducible genes, HSPA1A and HSPA1B. These are the genes activated by heat shock and other forms of stress and they churn out a lot of HSP70 chaperone in order to save the cells. They are not essential genes in the Blomen et al. paper and they weren't tested in the Hart et al. paper. This is an example of the kind of gene that will be missed in the screen because the cells were not stressed during the screening.
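The kind of gene-by-gene check described for the HSP70 family can be written down as a small lookup table. This is only a sketch. The calls are the ones reported in the two paragraphs above, the Wang et al. study is left out because I couldn't extract its calls, and the function names are hypothetical helpers of my own.

```python
# Essentiality calls as described above.
# True = essential, False = not essential, None = not tested.
calls = {
    "HSPA5":  {"Blomen": True,  "Hart": True},
    "HSPA8":  {"Blomen": True,  "Hart": False},
    "HSPA9":  {"Blomen": True,  "Hart": True},
    "HSPA1A": {"Blomen": False, "Hart": None},
    "HSPA1B": {"Blomen": False, "Hart": None},
}

def disagreements(calls):
    """Genes that were tested in more than one study with conflicting calls."""
    conflicting = []
    for gene, by_study in calls.items():
        tested_values = {v for v in by_study.values() if v is not None}
        if len(tested_values) > 1:
            conflicting.append(gene)
    return sorted(conflicting)

def untested(calls, study):
    """Genes with no call at all in the named study."""
    return sorted(g for g, by_study in calls.items() if by_study.get(study) is None)

print(disagreements(calls))     # ['HSPA8']
print(untested(calls, "Hart"))  # ['HSPA1A', 'HSPA1B']
```

A table like this for a few well-known gene families is the sort of thing that makes it possible to judge whether a screen is behaving as expected.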
I really don't like these genomics papers because all they do is summarize the results in broad terms. I want to know about specific genes so I can see if the results conform to expectations.
I looked first at the genes encoding the enzymes for gluconeogenesis and glycolysis. The results are from the Blomen et al. paper. In the figure below, the gene names in RED are essential and the ones in blue are not.
As you can see, at least one of the genes for the six core enzymes is essential. But none of the other genes is essential. This is a surprise since I expect both pathways (gluconeogenesis and glycolysis) to be active and essential in those cells. Perhaps the cells can survive for a few days without making these enzymes. It means they can't take up glucose because one of the hexokinase enzymes should be essential.
These results suggest that the Blomen et al. study is overlooking some important essential genes.
Now let's look at the citric acid cycle. All of the enzymes should be essential.
That's very strange. It's hard to imagine that cells in culture can survive without any of the genes for the subunits of the pyruvate dehydrogenase complex or the subunits of the succinyl-CoA synthetase complex. Or malate dehydrogenase, for that matter.
Something is wrong here. The study must be missing some important essential genes. I wish the authors had looked at some specific sets of genes and told us the results for well-known genes. That would allow us to evaluate the results. Perhaps this sort of thing isn't done when you are in "genomics" mode?
The "core fitness" protein-coding genes that were identified are more highly conserved than the other genes and they tend to be more highly expressed. They also show lower levels of variation within the human population. This is consistent with basic housekeeping features.
Each group identified several hundred unannotated genes in their core sample. These are genes with no known function (yet).
The results of the three studies do not overlap precisely but most of the essential genes were common to all three analyses.
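For anyone who wants to check the overlap from the supplementary gene lists, the comparison is just set arithmetic. The sketch below uses made-up placeholder identifiers (G1 to G5), not real results from any of the three papers, and the function name is hypothetical.

```python
def overlap_summary(a, b, c):
    """Return (genes in all three sets, genes in any set, shared fraction)."""
    core = a & b & c     # essential in every study
    union = a | b | c    # essential in at least one study
    fraction = len(core) / len(union) if union else 0.0
    return len(core), len(union), fraction

# Placeholder gene sets for illustration only (NOT data from the papers)
blomen = {"G1", "G2", "G3", "G4"}
wang = {"G1", "G2", "G3", "G5"}
hart = {"G1", "G2", "G3"}

n_core, n_union, shared = overlap_summary(blomen, wang, hart)
print(n_core, n_union, shared)  # 3 5 0.6
```

With the real lists, the genes outside the common core would be the ones worth looking at individually.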