Thursday, December 10, 2015

How many human protein-coding genes are essential for cell survival?

The human genome contains about 20,000 protein-coding genes and about 5,000 genes that specify functional RNAs. We would like to know how many of those genes are essential for the survival of an individual and for long-term survival of the species.

It would be almost as interesting to know how many are required for just survival of a particular cell. This set is the group of so-called "housekeeping genes." They are necessary for basic metabolic activity and basic cell structure. Some of these genes are the genes for ribosomal RNA, tRNAs, the RNAs involved in splicing, and many other types of RNA. Some of them are the protein-coding genes for RNA polymerase subunits, ribosomal proteins, enzymes of lipid metabolism, and many other enzymes.

The ability to knock out human genes using CRISPR technology has opened the door to testing for essential genes in tissue culture cells. The idea is to disrupt every gene and screen to see if it's required for cell viability in culture.
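To make the idea concrete, here is a minimal Python sketch of how such a screen might be scored: compare how often a guide targeting each gene is seen at the start and at the end of the culture period, and flag genes whose guides have dropped out. The gene names, read counts, pseudocount, and cutoff are all made up for illustration. This is not the scoring method of any of the three papers discussed below, and a real analysis would also normalise for sequencing depth and combine several guides per gene.

```python
import math

# Hypothetical guide-RNA read counts per gene: (start of screen, end of screen).
# Gene names and numbers are invented for illustration only.
counts = {
    "GENE_A": (1000, 950),
    "GENE_B": (1000, 60),
    "GENE_C": (800, 820),
    "GENE_D": (1200, 100),
}

PSEUDOCOUNT = 1    # avoids log(0) when a guide disappears completely
THRESHOLD = -2.0   # arbitrary cutoff: at least a four-fold depletion


def log2_fold_change(before: int, after: int) -> float:
    """Log2 ratio of end-of-screen to start-of-screen counts."""
    return math.log2((after + PSEUDOCOUNT) / (before + PSEUDOCOUNT))


scores = {gene: log2_fold_change(b, a) for gene, (b, a) in counts.items()}

# Genes whose guides are strongly depleted are candidate essential genes.
depleted = sorted(gene for gene, score in scores.items() if score <= THRESHOLD)
print(depleted)
```

In this toy data set only GENE_B and GENE_D fall below the cutoff, so they would be the candidate essential genes.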

Three papers using this approach have appeared recently:
Blomen, V.A., Májek, P., Jae, L.T., Bigenzahn, J.W., Nieuwenhuis, J., Staring, J., Sacco, R., van Diemen, F.R., Olk, N., and Stukalov, A. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science, 350:1092-1096. [doi: 10.1126/science.aac7557]

Wang, T., Birsoy, K., Hughes, N.W., Krupczak, K.M., Post, Y., Wei, J.J., Lander, E. S., and Sabatini, D.M. (2015) Identification and characterization of essential genes in the human genome. Science, 350:1096-1101. [doi: 10.1126/science.aac7041]

Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., and Sun, S. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163:1515-1526. [doi: 10.1016/j.cell.2015.11.015]
Each group identified between 1500 and 2000 protein-coding genes that are essential in their chosen cell lines.

One of the annoying things about all three papers is that they use the words "gene" and "protein-coding gene" as synonyms. The only genes they screened were protein-coding genes but the authors act as though that covers ALL genes. I hope they don't really believe that. I hope it's just sloppy thinking when they say that their 1800 essential "genes" represent 9.2% of all genes in the genome (Wang et al. 2015). What they meant is that they represent 9.2% of protein-coding genes.
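The difference is easy to check with the round numbers used at the top of this post. The following is only a sketch of the arithmetic: the 1,800 figure is the approximate count quoted above, and the gene totals are this post's rough estimates, not values taken from the paper.

```python
# Round numbers from this post; none of these are exact values from Wang et al.
ESSENTIAL = 1_800         # approximate number of essential protein-coding genes
PROTEIN_CODING = 20_000   # approximate protein-coding genes in the human genome
RNA_GENES = 5_000         # rough (optimistic) estimate of genes for functional RNAs


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


pct_of_protein_coding = percent(ESSENTIAL, PROTEIN_CODING)
pct_of_all_genes = percent(ESSENTIAL, PROTEIN_CODING + RNA_GENES)

# Denominator that a quoted 9.2% would imply for 1,800 genes.
implied_denominator = round(ESSENTIAL / 0.092)

print(pct_of_protein_coding, pct_of_all_genes, implied_denominator)
```

With these numbers the essential set is about 9% of protein-coding genes but only about 7% of all genes, and a figure of 9.2% implies a denominator of roughly 19,600, which is a count of protein-coding genes.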

By looking only at genes that are essential for cell survival, they are ignoring all those genes that are specifically required in other cell types. For example, they will not identify any of the genes for olfactory receptors or any of the genes for keratin or collagen. They won't detect any of the genes required for spermatogenesis or embryonic development.

What they should detect is all of the genes required in core metabolism.

The numbers seem too low to me so I looked for some specific examples.

The HSP70 gene family encodes the major heat shock proteins of molecular weight 70,000. These proteins function as chaperones that help fold other proteins. The genes are among the most highly conserved in all of biology and they are essential. The three genes for the normal cellular proteins are HSPA5 (BiP, the ER protein); HSPA8 (the cytoplasmic version); and HSPA9 (mitochondrial version). All three are essential in the Blomen et al. paper. Only HSPA5 and HSPA9 are essential in Hart et al. (This is an error.) (I can't figure out how to identify essential genes in the Wang et al. paper.)

There are two inducible genes, HSPA1A and HSPA1B. These are the genes activated by heat shock and other forms of stress and they churn out a lot of HSP70 chaperone in order to save the cells. They are not essential genes in the Blomen et al. paper and they weren't tested in the Hart et al. paper. This is an example of the kind of gene that will be missed in the screen because the cells were not stressed during the screening.
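The HSP70 comparison can be written out as a few set operations. This is just a sketch in Python: the gene lists are the calls described in this post for the Blomen et al. and Hart et al. papers, typed in by hand rather than re-derived from the papers' data files, and the variable names are my own.

```python
# HSP70 family calls as described in this post (entered by hand).
blomen_essential = {"HSPA5", "HSPA8", "HSPA9"}
blomen_nonessential = {"HSPA1A", "HSPA1B"}
hart_essential = {"HSPA5", "HSPA9"}
hart_not_tested = {"HSPA1A", "HSPA1B"}

family = blomen_essential | blomen_nonessential

# Genes that could be compared at all, because both studies tested them.
comparable = family - hart_not_tested

# Genes called essential by both studies, and by Blomen et al. only.
agreed = blomen_essential & hart_essential
blomen_only = (blomen_essential - hart_essential) & comparable

print(sorted(agreed), sorted(blomen_only))
```

The two studies agree on HSPA5 and HSPA9, disagree on HSPA8, and cannot be compared for the two inducible genes.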

I really don't like these genomics papers because all they do is summarize the results in broad terms. I want to know about specific genes so I can see if the results conform to expectations.

I looked first at the genes encoding the enzymes for gluconeogenesis and glycolysis. The results are from the Blomen et al. paper. In the figure below, the gene names in RED are essential and the ones in blue are not.

As you can see, at least one of the genes for the six core enzymes is essential. But none of the other genes is essential. This is a surprise since I expect both pathways (gluconeogenesis and glycolysis) to be active and essential in those cells. Perhaps the cells can survive for a few days without making these enzymes. Taken at face value, the result means the cells can't take up glucose, because at least one of the hexokinase enzymes should be essential for that.

These results suggest that the Blomen et al. study is overlooking some important essential genes.

Now let's look at the citric acid cycle. All of the enzymes should be essential.

That's very strange. It's hard to imagine that cells in culture can survive without any of the genes for the subunits of the pyruvate dehydrogenase complex or the subunits of the succinyl-CoA synthetase complex. Or malate dehydrogenase, for that matter.

Something is wrong here. The study must be missing some important essential genes. I wish the authors had looked at some specific sets of genes and told us the results for well-known genes. That would allow us to evaluate the results. Perhaps this sort of thing isn't done when you are in "genomics" mode?

The "core fitness" protein-coding genes that were identified are more highly conserved than the other genes and they tend to be more highly expressed. They also show lower levels of variation within the human population. This is consistent with basic housekeeping features.

Each group identified several hundred unannotated genes in their core sample. These are genes with no known function (yet).

The results of the three studies do not overlap precisely but most of the essential genes were common to all three analyses.


  1. Lots of organisms survive just fine without many core metabolism genes. Does that mean that they can survive in minimal media? Of course not. But the real world isn't a minimal medium but a quite rich place. Even humans as a complete organism are auxotrophs for all kinds of things -- it shouldn't surprise anyone in the least that particular cell lines can get away with even fewer metabolism genes than that.

    1. Yeah I was thinking the same thing. It's hard to talk essentiality in general terms without supplying an environmental context.

      So heatshock genes aren't essential if the cell isn't heat-shocked. Isn't that just expected?

    2. And the very reason why we have been able to survive for 60 million years without the GULO gene is that we can obtain enough vitamin C from our diet. Likewise, all animals need vitamin B12, but only bacteria and archaea can synthesise it.

    3. At an intuitive level I understand an "essential" gene to be a gene without which a cell simply cannot grow and divide under any of the "normal" circumstances it lives under. If there's a "natural" circumstance under which it can grow and divide without some particular gene, then I would say that gene isn't essential.

  2. Unfortunately immortalised cells in culture don't read the text books and often their metabolism is substantially different from "normal" - especially if they are cells derived from cancer in the first place. Unless one starts labelling metabolites and following them through the cell by means of, for example, NMR metabolomics one doesn't know how the cell is using the metabolites and therefore whether an enzyme is essential or not.

    In my favourite cell line glucose doesn't enter the TCA cycle at all and is converted to lactate even in normoxia. Glutamine feeds the TCA cycle via glutamate and alpha-ketoglutarate. In this specific case, I don't think these cells would be too fussed by deletion of the pyruvate dehydrogenase complex.

    1. Look at the KDH reaction. One of the substrates is acetyl-CoA. Where does it come from? Is it all due to the breakdown of fatty acids? If so, where do the fatty acids come from?

      And in your favorite cell line, if all the glucose is converted to lactate then where do the cells get their carbon for making DNA, RNA, and protein?

    2. Acetyl-CoA is probably coming from ATP-citrate lyase which we see to be expressed at very high levels compared with similar cell lines that appear to undergo normal oxidative phosphorylation - (a nice review:

      I worded my previous post carelessly: not *all* the glucose is converted to lactate but that which gets to the bottom of the glycolytic pathway is largely converted into lactate. The pentose phosphate pathway is active and fed by glucose for the formation of nucleotides. Carbons for amino acid synthesis come from both glutamine and glucose depending on the amino acid in question.

    3. @Ed

      Thanks for the reference. I wasn't aware of the enzyme you call ATP-citrate lyase [= ATP citrate synthase EC].

      It doesn't explain why the cells in this study appear to be missing some important enzymes. (BTW, it's very unlikely that the cells are actually missing those enzymes. The fact that the genes appear to be nonessential is likely an artifact of the screening procedures.)

      But even if it's true that aberrant cells (e.g. cancer cells) have a strange metabolism, this just confirms that the studies aren't really defining all of the cell-essential genes in normal human cells.

  3. I'm not the least bit surprised that some very important genes are not essential in cells grown in culture in the laboratory. The inducible heat shock genes are a good example. It merely illustrates the fact that genes detected in these assays are a subset of the real core essential genes required in a natural environment.

    However, there are some genes that I expect to be essential in these cells. The fact that the cells can survive without producing or metabolizing glucose doesn't seem right to me. The fact that they don't need a complete citric acid cycle doesn't make sense. Where are they getting their energy?

    Maybe these strange human cells don't need these pathways in their artificial environment. That's one possibility. Another possibility is that some essential genes aren't detected in their assay because they only looked at a few generations of survival and the cells had enough protein to keep them alive for that long after the gene was knocked out. (One of the papers discusses this limitation.)

    If the cells really don't need these pathways then that's very curious. You can't just dismiss it by saying that it's not a surprise. If true, it IS a surprise.

  4. Another possibility is that some essential genes aren't detected in their assay because they only looked at a few generations of survival....

    Can't see the papers right now but that sounds like a real experimental possibility if the authors are compelled to raise the point.

  5. Hi Professor Moran,

    You write that the human genome contains about 5,000 genes that specify functional RNAs. It appears that Dr. Richard Sternberg disagrees with you. On an Evolution News and Views post (March 12, 2010), he writes: "While there are ~25,000 protein-coding genes in our DNA, the number of RNA-coding genes is predicted to be much higher, >450,000," although he adds that the latter "range in length from being quite short--only 20 or so genetic letters--to being millions of letters long." To support his claim, he references the following source: Rederstorff M, Bernhart SH, Tanzer A, Zywicki M, Perfler K, Lukasser M, Hofacker IL, Hüttenhofer A. 2010 (In Press). RNPomics: Defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles. Nucleic Acids Research.

    Would you care to comment? Are you saying that only 5,000 of these 450,000 genes are functional?

    1. Vincent, have you read the paper by Rederstorff et al.? I don't think it supports Sternberg's claims. Indeed, it is much closer to Dr. Moran's estimate than to Sternberg's.

    2. Vincent Torley writes,

      It appears that Dr. Richard Sternberg disagrees with you.

      Stop the presses! Imagine that! An Intelligent Design Creationist who disagrees with me! Wonders never cease.

      Vincent, you are reasonably intelligent so it continues to surprise me that you trust the information being spewed by ID proponents. You should know better.

      There are lots and lots of RNAs made by various human cells. We know for certain that some of them are nonfunctional; therefore, they are not specified by "genes."

      What we don't know for certain is how many of them are functional. The onus is on those who claim functionality to prove that the transcripts have a function. So far only a few hundred transcripts, at most, have been shown to have a function. That means only a few hundred "genes" for noncoding RNAs. There's no evidence to suggest that this number will get much larger in the future.

      I'm arbitrarily choosing 5,000 to make sure I'm in the optimistic range of predictions. Ensembl (GRCh38.p5) predicts 25,000 but they're being ridiculous.

      Here's the reference to the paper that Richard Sternberg mentioned.

      Rederstorff, M., Bernhart, S.H., Tanzer, A., Zywicki, M., Perfler, K., Lukasser, M., Hofacker, I. L., and Hüttenhofer, A. (2010) RNPomics: defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles. Nucl. Acid. Res. 38:e113-e113. [doi: 10.1093/nar/gkq057]

      The authors are fully aware of the problem of identifying function. They decided to look at how many of the transcripts formed RNP particles. This is an indication of function. They found several hundred potentially functional RNAs.

      Vincent, you need to learn to be more skeptical of your fellow ID proponents. Not all of them are wrong all of the time but their track record is sufficiently bad that it should make you pause before believing them.

    3. Hi Professor Moran,

      Thank you very much for your reply. I think your figure of 5,000 functional RNA genes is a fair one.

    4. Well, I'm sure your expert opinion means the world to Larry, Vincent Torley. But are you conceding that those other 445,000 "genes" are junk? If so, shouldn't you be trying to convince your fellow creationists of that? Many of them still think all of the human genome is functional.

  6. Larry, I am a bit confused by your criticisms and by the figures you put in this essay. Specifically, you seem to be saying that these studies (at least two of them - the Cell paper is infuriating in the way it hides, or does not make available, most of the information and results most people would like to see) suggest that cultured cells can make do without core metabolic pathways. However, in the two figures you show, it looks like the relevant enzymes are encoded by gene families. I would take home from this the likelihood that purported cell- or tissue-specific genes and enzyme isoforms are not so specific, such that knock-downs or knock-outs of most of the individual genes involved have little effect because other genes can pick up the slack, so to speak.

    Have I missed something here?

    1. Some of them are gene families—the hexokinases are an example. You are correct (my bad!) in saying that two or more of these genes may be expressed in the cells in culture so that no single gene is "essential."

      Some of the others aren't gene families but subunits of a multimeric protein.

      In other cases there are different genes for cytoplasmic and mitochondrial versions of the enzyme. In those cases (e.g. malate dehydrogenase) it's very unlikely that the two enzymes can substitute for one another.

    2. "In other cases there are different genes for cytoplasmic and mitochondrial versions of the enzyme. In those cases (e.g. malate dehydrogenase) it's very unlikely that the two enzymes can substitute for one another."

      Has anyone looked to see if dual targeting, say, of the purported mitochondrial isoforms, can be operative here? For example, if but a few percent of the transcriptional output of the gene encoding the mitochondrial isoform ended up in the form of a cytoplasmic enzyme, would this be enough to overcome a deficit in the expression of the gene encoding the cytoplasmic isoform? Perhaps, akin to splicing errors, import is not so perfect such that some protein/enzyme ends up inappropriately localized.

      Lots of food for thought is buried in these papers, at many different levels. I sort of like how the two studies, their discrepancies and agreements, and surprising observations such as those pointed out in this blog essay, challenge some (or much) of what I often take for granted.