
Sunday, December 18, 2022

Protein concentrations in E. coli are mostly controlled at the level of transcription initiation

The most important step in the regulation of protein-coding genes in E. coli is the rate of binding of RNA polymerase to the promoter region.

A group of scientists at the University of California at San Diego and their European collaborators looked at the concentrations of proteins and mRNAs of about 2000 genes in E. coli. They catalogued these concentrations under several different growth conditions in order to determine whether the level of protein being expressed from each of these genes correlated with transcription rate, translation rate, mRNA stability or other levels of gene expression.

The paper is very difficult to understand because the authors are primarily interested in developing mathematical formulae to describe their results. They expect you to understand equations like,

even though they don't explain the parameters very well. A lot of important information is in the supplements and I couldn't be bothered to download and read them. I don't think the math is anywhere near as important as the data and the conclusions.

The most important correlation is between a protein and its corresponding mRNA. The figure shows a strong correlation over a very large range of different concentrations. This tells you that regulation at the level of translation is not very important: proteins that are present at low concentrations have low-abundance mRNAs, and proteins that are present at high concentrations are translated from mRNAs present at high concentrations.

The authors tested whether the average rates of translation were similar on all mRNAs, as you might expect from the data, and they were. All mRNAs are translated at approximately the same rate.

The concentration of mRNA could be controlled by the rate of degradation, or the half-life, such that the regulation of protein synthesis depends on the stability of the mRNA. The data do not support this hypothesis; all mRNAs are about equally stable once they are made.

The authors then looked at the rate of transcription, and this is where the interpretation of the data becomes tricky. It's well known that the rate of transcription can depend on the strength of the promoter, such that genes with strong promoters are transcribed more frequently than genes with weak promoters. This constitutive level of transcription is not strictly "regulation" since cells don't control the expression of genes by changing the sequences of promoters. However, cells do control the rate of transcription by using repressors to turn down genes with intrinsically strong promoters and activators to turn up genes with intrinsically weak promoters.

The authors reduce this distinction to a single parameter called the "promoter on rate" (k_i), assigned to each gene under each condition. I don't know what this "on rate" stands for, since the strength of a promoter is usually described by an equilibrium association constant (Ka) or dissociation constant (Kd) that tells you something about the probability that an RNA polymerase molecule is going to be sitting at a promoter sequence once it finds it. Promoter strength can also depend on how long it takes to form the transcription bubble, synthesize the RNA primer, and transition from the initiation complex to the elongation complex.

The kinetics of binding are obviously related to the equilibrium constant, since the ratio of the on and off rates determines the equilibrium constant, but the "on rate" by itself isn't sufficient. Furthermore, not all E. coli promoters are the same, since the cell contains a variety of sigma factors (transcription factors) that recognize different promoter sequences. The rate of transcription of many genes also depends on additional activators and repressors. Finally, RNA polymerase finds a promoter by first binding nonspecifically to DNA and then sliding along the DNA until it bumps into a promoter sequence where it can initiate transcription. This is not a simple second-order reaction; in fact, it is 100 times faster than a diffusion-controlled second-order reaction. What is the "on rate" in this case?
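To see why an on rate alone can't tell you how often a promoter is occupied, here is a minimal sketch assuming simple one-step 1:1 binding at equilibrium (real promoters also involve open-complex formation and promoter escape, as noted above). All rate constants and the free RNA polymerase concentration are hypothetical values chosen for illustration, not numbers from the paper.

```python
"""Why an "on rate" alone does not fix promoter occupancy (hypothetical numbers)."""


def association_constant(k_on: float, k_off: float) -> float:
    """Equilibrium association constant Ka = k_on / k_off (units: M^-1)."""
    return k_on / k_off


def fractional_occupancy(rnap_free: float, k_on: float, k_off: float) -> float:
    """Fraction of time a promoter is bound, assuming simple 1:1 binding at equilibrium.

    theta = [RNAP] * Ka / (1 + [RNAP] * Ka)
    """
    x = rnap_free * association_constant(k_on, k_off)
    return x / (1.0 + x)


# Two hypothetical promoters with the SAME on rate but different off rates.
K_ON = 1e8         # M^-1 s^-1, assumed for illustration
RNAP_FREE = 1e-9   # M, assumed concentration of free holoenzyme

for name, k_off in [("promoter A", 1.0), ("promoter B", 0.01)]:
    theta = fractional_occupancy(RNAP_FREE, K_ON, k_off)
    print(f"{name}: Ka = {association_constant(K_ON, k_off):.1e} M^-1, occupancy = {theta:.3f}")
```

With identical on rates, promoter A is occupied about 9% of the time and promoter B about 91% of the time, purely because of the hundredfold difference in off rates. That's the sense in which an "on rate" by itself isn't a measure of promoter strength.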

All of these issues are important, so I don't understand what the authors mean when they assign an "on rate" to an individual gene. I assume that what they are actually measuring is the overall probability that a transcription initiation event will occur on a given gene. The results are shown in the figure on the right. The concentration of proteins correlates with the on rate (whatever that is) and not with the rate of translation initiation or the rate of mRNA degradation.
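A toy steady-state model shows why this result follows if posttranscriptional parameters are roughly the same for all mRNAs. This is a simplified sketch, not the paper's full model: it borrows the supplement's form for the transcription initiation rate (available RNA polymerase times the promoter on rate k_i), assumes one shared mRNA degradation rate and one shared translation rate, treats proteins as stable and diluted only by growth, and ignores gene copy number and the paper's global constraints. The parameter values and gene labels are hypothetical.

```python
"""Toy steady-state model: protein level set by the promoter on rate (hypothetical numbers).

Assumptions (simplifications, not the paper's full model):
  transcription initiation  alpha_i = rnap_av * k_i   (form described in the supplement)
  mRNA steady state         m_i = alpha_i / delta_m   (same degradation rate for all mRNAs)
  protein steady state      p_i = beta * m_i / lam    (same translation rate beta; stable
                                                       proteins diluted only by growth rate lam)
"""


def steady_state(k_i, rnap_av, delta_m, beta, lam):
    alpha = rnap_av * k_i   # transcription initiations per unit time
    m = alpha / delta_m     # steady-state mRNA level
    p = beta * m / lam      # steady-state protein level
    return m, p


# Shared ("global") parameters in arbitrary units, assumed for illustration.
RNAP_AV, DELTA_M, BETA, LAM = 1.0, 0.2, 0.3, 0.01

# Hypothetical on rates spanning three orders of magnitude.
on_rates = {"low-abundance gene": 0.001, "mid-abundance gene": 0.03, "ribosomal-like gene": 1.0}

proteins = {}
for gene, k in on_rates.items():
    m, p = steady_state(k, RNAP_AV, DELTA_M, BETA, LAM)
    proteins[gene] = p
    print(f"{gene:>20}: k_i = {k:g}, mRNA = {m:.3g}, protein = {p:.3g}")

# Because delta_m and beta are shared, protein ratios equal on-rate ratios (~1000 here).
print(proteins["ribosomal-like gene"] / proteins["low-abundance gene"])
```

In this sketch the global parameters cancel when you compare two genes, so the thousandfold spread in k_i becomes a thousandfold spread in protein. If mRNA stability or translation rates differed a lot between genes, that simple proportionality would break down.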

This leads to the following conclusion.

The results revealed two simple rules on promoter and mRNA characteristics, which profoundly shape how E. coli responds to environmental changes while coping with global constraints: (i) Promoter on rates span more than three orders of magnitudes across genes but vary much less (at most approximately fivefold) across conditions for most genes. Thus, each gene is expressed within an innate abundance range across conditions—e.g., with ribosomal genes belonging to the most abundant and DNA replication proteins belonging to one of the least abundant classes. (ii) mRNA characteristics, including translation initiation rate and mRNA degradation rate, vary little (less than twofold for half of the genes) across genes and conditions. The translation initiation rates are sufficiently rapid to maintain a high density of ribosomes on the mRNA (five ribosomes per kilobase; Fig. 2A and fig. S6E), resulting in high protein production despite short mRNA half-lives.
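The claim that a density of five ribosomes per kilobase gives "high protein production despite short mRNA half-lives" can be checked with some back-of-the-envelope arithmetic. Only the ribosome density comes from the paper; the elongation speed (15 amino acids per second) and the 2.5-minute mRNA half-life are assumptions for illustration. The sketch also assumes steady ribosome flux and exponential mRNA decay, with initiation continuing at a constant rate until the mRNA is degraded.

```python
import math

# Ribosome density quoted in the paper: ~5 ribosomes per kilobase of mRNA.
DENSITY_PER_NT = 5 / 1000

# Assumptions for illustration only (not taken from the paper):
ELONGATION_AA_PER_S = 15   # translation elongation speed, amino acids per second
MRNA_HALF_LIFE_S = 150     # a 2.5-minute mRNA half-life

elongation_nt_per_s = 3 * ELONGATION_AA_PER_S             # 3 nucleotides per codon
# At steady state, initiations per second = ribosomes per nt * nt traversed per second.
initiation_rate = DENSITY_PER_NT * elongation_nt_per_s    # per mRNA per second
mean_lifetime_s = MRNA_HALF_LIFE_S / math.log(2)          # mean lifetime = t_half / ln 2
proteins_per_mrna = initiation_rate * mean_lifetime_s

print(f"initiation rate ~ {initiation_rate:.3f} per second")
print(f"proteins per mRNA lifetime ~ {proteins_per_mrna:.0f}")
```

Under these assumptions each mRNA starts a new ribosome roughly every four or five seconds and yields about 49 proteins before it is degraded, so a short-lived message can still be quite productive.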

What does all this mean? It means that gene expression in E. coli is controlled primarily at the level of transcription initiation. This is not a paradigm shift, nor is it any sort of breakthrough. It's just confirmation, with much better data (I think), of the model we've been using for decades.

Recall that Jacques Monod once said that "... anything found to be true of E. coli must also be true of elephants." Does this mean that gene expression in elephants is also controlled primarily at the level of transcription initiation? The authors of this paper caution against such a conclusion:

The results described here are specific to bacteria. Eukaryotic gene expression involves complex posttranscriptional regulation, including protein secretion and degradation through ubiquitination and autophagy. Global constraints are less understood, in particular the extent to which protein density may vary across conditions. Even quantifying the cell volume may be difficult because large portions within a cell may be occupied by subcellular compartments (e.g., vacuoles) that do not contribute to the cytosol. Nonetheless, our study provides a framework to quantitatively explore gene expression in such complex systems.

I understand their caution but I'm inclined to agree with Mark Ptashne when he says,

The thing that nature figured out—it’s kind of amazing, actually—is that once you have all the reading machinery, it’s just a question of recruiting it to the right place. And to do that we have evolved these very simple little factors that get together and attract the RNA polymerase to the gene.

This is a quotation from Kat Arney's interview with Mark Ptashne in her book Herding Hemingway's Cats (p.58). Ptashne is referring to his "recruitment model" (Ptashne and Gann, 1997; Ptashne, 2013) and what he means is that the E. coli and bacteriophage model is probably (mostly) true of eukaryotes as well. Support for the recruitment model is a way of fighting back against the hype of chromatin changes and "epigenetics" (whatever that is) and against the idea that post-transcriptional mechanisms of regulation such as alternative splicing, RNA degradation, and translational control by regulatory RNAs play a significant role in controlling the concentration of proteins in a eukaryotic cell.

The important lessons are: (1) gene expression is mostly controlled at the level of transcription initiation in bacteria, (2) whether or not this is also true in eukaryotes hasn't been decided.

Balakrishnan, R., Mori, M., Segota, I., Zhang, Z., Aebersold, R., Ludwig, C., and Hwa, T. (2022) Principles of gene regulation quantitatively connect DNA to RNA and proteins in bacteria. Science, 378:eabk2066. [doi:10.1126/science.abk2066]

Doing the math on the central dogma

Gene expression can in theory be modulated at the level of transcription or translation, but both of these processes have constraints that complicate prediction of their outputs. To obtain a better quantitative understanding of the control of gene expression in bacteria, Balakrishnan et al. measured promotor on-rates, messenger RNA abundance, and protein abundance for more than 1500 genes in the bacterium Escherichia coli under many different growth conditions. Protein abundance largely reflects gene promoter on-rates and transcription, but has to comply with general constraints that keep the protein concentration constant and limit the number of ribosomes—and thus translational capacity. The authors propose a balancing of transcription with translation through Rsd, a factor that controls the availability of RNA polymerase. Their results may be useful in the design of synthetic circuits in bacteria and the prediction of their behavior in various growth conditions. —LBR

Structured Abstract


The intracellular concentration of a protein depends on the rates of several processes, including transcription, translation, and the degradation and/or dilution of messenger RNAs (mRNAs) and proteins. These rates can be vastly different for different genes and across different growth conditions because of gene-specific regulation. At the systems level, protein concentrations are further affected by the availability of shared gene expression machineries—e.g., RNA polymerases and ribosomes—and are constrained by the approximately invariant cellular mass density. Even in one of the best-characterized model organisms, Escherichia coli, it is unclear how the gene-specific and systems-level effects work together toward setting the cellular proteome. This knowledge gap has not only hindered our efforts in building a predictive framework of gene expression but has also limited our abilities in guiding the rational design of gene circuits.


We undertook a quantitative, genome-scale study, combining experimental and theoretical approaches, to tease apart the contribution of the specific and global effects on cellular protein concentrations in exponentially growing E. coli cells across a variety of growth conditions. We complemented genome-scale proteomic and transcriptomic data with biochemical measurements of total absolute mRNA abundances and synthesis rates. We compared these measurements to gene dosage and the concentrations of ribosomes and RNA polymerases to quantitatively characterize the activity of the gene expression machinery across conditions. This comprehensive dataset allowed us to analyze, in quantitative detail, the interplay between the activity of gene expression machinery, the activity of individual promoters, and the resulting protein concentrations.


We compiled a comprehensive atlas of the determinants of gene expression across conditions—from the concentrations of genes, mRNAs, and proteins to the rates of transcriptional and translational initiation and mRNA degradation for thousands of genes. We were able to determine the on rate of each promoter, a quantity capturing the overall effect of transcriptional regulation that has been elusive through most existing gene expression studies. Unexpectedly, we found that for most genes, the cytosolic protein concentrations were primarily determined by the innate magnitude of their promoter on rates, which spanned more than three orders of magnitude. Changes in protein concentrations resulting from changes in growth conditions were typically much smaller—well within one order of magnitude—and were mostly exerted through changes in transcription initiation.

E. coli’s strategy to implement gene regulation can be summarized by two design principles. First, protein concentrations are predominantly set transcriptionally, with relatively invariant posttranscriptional characteristics (translation efficiencies and degradation rates) for most mRNAs and growth conditions. Second, the overall fluxes of transcription and translation are tightly coordinated: The average density of five ribosomes per kilobase is nearly invariant across mRNA species and across growth conditions, even though the mRNA and ribosome abundances can each vary substantially. We find this coordination to be implemented through the anti-sigma factor Rsd, which modulates the availability of RNA polymerases for transcription across different growth conditions. These two principles lead to a quantitative formulation of the central dogma of bacterial gene expression, connecting mRNA and protein concentrations to the regulatory activities of the corresponding promoters.


These quantitative relationships reveal the unexpectedly simple strategies used by E. coli to attain desired protein concentrations despite the complexity of global physiological constraints: Individual protein concentrations are primarily set by gene-specific transcriptional regulation, with global transcriptional regulation set to cancel the strong growth rate dependence of protein synthesis. These relations provide the basis for understanding the behavior of more complex genetic circuits in different conditions and for the inverse problem of deducing regulatory activities given the observed mRNA and protein levels.

Ptashne, M. and Gann, A. (1997) Transcriptional activation by recruitment. Nature 386:569-577. [doi: 10.1038/386569a0]

Ptashne, M. (2013) Epigenetics: core misconcept. Proceedings of the National Academy of Sciences (USA) 110:7101-7103. [doi: 10.1073/pnas.1305399110]


Graham Jones said...

I found this in the SM.

Assumptions underlying the molecular model of transcription

Central to the quantitative Central Dogma relation is the “molecular” relation between the mRNA transcription initiation rates, the promoter on-rates and the concentration of available RNAP. For each gene i, we model the transcription initiation rate alpha_mi as the product of two terms, the concentration of available RNA polymerase [RNAP]_av and the on-rate of the associated promoter, k_i. Each of these terms lump together several complicated mechanistic factors. The concentration of available polymerases, [RNAP]_av, depends on interactions among core polymerase, sigma and anti-sigma factors; RNAP holoenzymes can be cytosolic, non-specifically bound to the DNA, or engaged in transcription. All these factors affect the number of RNAP complexes available for transcription (see Note S5 for an estimate of these quantities for cells grown in reference condition). On the other hand, k_i lumps in mechanistic aspects of the RNAP-DNA interaction such as DNA unwinding and the dynamics of transcription complex formations at the (possible multiple) promoters upstream of the gene. Hence, the breakdown of transcription rate as the product of [RNAP]_av and k_i represents the simplest assumption that allows us to separate machinery-dependent and promoter-dependent features of transcription.

Larry Moran said...

@Graham Jones

Thanks for taking the time to read the supplemental material. We can ignore the reference to the Central Dogma.

The rest of it confirms my suspicion that what they are really looking at is the probability that a given gene will be transcribed given the strength of the promoter and the availability of various transcription factors in addition to RNA polymerase.

Ted said...

As for implications for "low-value"/junkish DNA, naively one might expect that the cell could accumulate a lot of it at comparatively low metabolic cost if the "junk" was rarely transcribed. Isn't that what ENCODE actually found, as its critics pointed out?