Thursday, June 22, 2017

Are most transcription factor binding sites functional?

The ongoing debate over junk DNA often revolves around data collected by ENCODE and others. The idea that most of our genome is transcribed (pervasive transcription) seems to indicate that genes occupy most of the genome. The opposing view is that most of these transcripts are accidental products of spurious transcription. We see the same opposing views when it comes to transcription factor binding sites. ENCODE and their supporters have mapped millions of binding sites throughout the genome and they believe this represents abundant and exquisite regulation. The opposing view is that most of these binding sites are spurious and non-functional.

This messy view is supported by many studies on the biophysical properties of transcription factor binding. These studies show that any DNA binding protein has a low affinity for random sequence DNA. Such proteins also bind with much higher affinity to sequences that resemble, but do not precisely match, the specific binding site [How RNA Polymerase Binds to DNA; DNA Binding Proteins]. If you take a species with a large genome, like us, then a typical 6 bp protein binding site will be present, by chance alone, at about 800,000 sites. Not all of those sites will be bound by the transcription factor in vivo because some of the DNA will be tightly wrapped up in dense chromatin domains. Nevertheless, an appreciable percentage of the genome will be available for binding, so typical ENCODE assays detect thousands of binding sites for each transcription factor.
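
The chance-occurrence figure above is easy to check with a back-of-the-envelope calculation. The sketch below is only an illustration under simple assumptions that are mine, not taken from any of the cited studies: a haploid genome of roughly 3.2 billion bp, equal frequencies of the four bases, independent positions, and a single strand. The function name and the constant are hypothetical.

```python
def expected_chance_sites(genome_bp: float, site_len: int,
                          both_strands: bool = False) -> float:
    """Expected number of exact matches to one fixed site in random DNA.

    Assumes each base occurs with probability 1/4, independently of its
    neighbours (a simplification; real genomes are not uniform).
    """
    p_match = 0.25 ** site_len            # chance that one position matches
    positions = genome_bp - site_len + 1  # number of possible start positions
    expected = positions * p_match
    return 2 * expected if both_strands else expected


# Assumption: approximate size of the haploid human genome in base pairs.
GENOME_BP = 3.2e9

sites = expected_chance_sites(GENOME_BP, site_len=6)
print(f"Expected chance matches for a 6 bp site: {sites:,.0f}")
```

With these assumptions the result comes out a little under 800,000, in line with the round number quoted above. Counting both strands of a non-palindromic site would double it.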

This information appears in all the best textbooks and it used to be a standard part of undergraduate courses in molecular biology and biochemistry. As far as I can tell, the current generation of new biochemistry researchers wasn't taught this information.

In light of available knowledge of the properties of DNA binding proteins, it makes sense to assume that most of these sites have nothing to do with regulating transcription. They could easily be sitting on junk DNA. That's not what ENCODE researchers conclude.

It seems to me that the onus is on those claiming that a transcription factor binding site is functional. In the absence of evidence for function we should assume that it's just spurious binding, especially since this is the predicted result based on five decades of research on DNA binding proteins.1

Some people have been concerned enough about the controversy to develop global tests for possible function. A recent paper by one of these groups caught my eye ...
Cusanovich, D.A., Pavlovic, B., Pritchard, J.K., and Gilad, Y. (2014) The functional consequences of variation in transcription factor binding. PLoS Genet, 10(3), e1004226. [doi: 10.1371/journal.pgen.1004226]

ABSTRACT: One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. Using this approach, we found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as “active enhancers.”

Author Summary: An important question in genomics is to understand how a class of proteins called “transcription factors” controls the expression level of other genes in the genome in a cell-type-specific manner – a process that is essential to human development. One major approach to this problem is to study where these transcription factors bind in the genome, but this does not tell us about the effect of that binding on gene expression levels and it is generally accepted that much of the binding does not strongly influence gene expression. To address this issue, we artificially reduced the concentration of 59 different transcription factors in the cell and then examined which genes were impacted by the reduced transcription factor level. Our results implicate some attributes that might influence what binding is functional, but they also suggest that a simple model of functional vs. non-functional binding may not suffice.
The authors clearly understand the controversy and they clearly understand that spurious binding is a problem.

What they did was to construct cell lines where production of a given transcription factor was reduced (knockdown). They looked at expression of about 8,000 genes to see which ones, if any, were altered by reducing expression of the transcription factor (TF). The idea is to see whether all TF binding sites are affecting expression of nearby genes or whether only a subset of TF binding sites are actually affecting transcription. The result is that "the regulation of the vast majority of target genes is not affected by perturbations to the expression levels of the TFs." In other words, most transcription factor binding sites don't seem to play a role in regulating expression of nearby genes. This is exactly what is predicted by the known properties of DNA binding proteins and it conflicts with the claims of ENCODE researchers who believe that most TF binding sites are functional.
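
The logic of the experiment, intersecting the genes bound by a factor with the genes whose expression changes when that factor is knocked down, can be sketched in a few lines. This is a toy illustration only, not the authors' pipeline: the gene names and both sets are hypothetical placeholders, and the real analysis involves ChIP-seq/DNase-seq calls and statistical tests for differential expression.

```python
def functional_fraction(bound_genes: set[str], de_genes: set[str]) -> float:
    """Fraction of bound genes that are also differentially expressed
    after knockdown of the factor (0.0 if no genes are bound)."""
    if not bound_genes:
        return 0.0
    return len(bound_genes & de_genes) / len(bound_genes)


# Hypothetical toy data: genes with a binding site for the factor near
# their transcription start site, and genes whose expression changed
# after the factor was knocked down.
bound = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
differentially_expressed = {"GENE_B", "GENE_X"}

frac = functional_fraction(bound, differentially_expressed)
print(f"{frac:.0%} of bound genes responded to the knockdown")
```

In this made-up example only one of the five bound genes responds. The paper's conclusion is that, for real factors, the responding subset is similarly small.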

What makes this paper a cut above the standard publications is the extensive, and critical, discussion of their findings in a lengthy Discussion section. They list several caveats that could challenge their conclusion. It's well worth reading.

ASIDE: I'm a big fan of teaching fundamental principles and concepts. That's why I spent some time on the general properties of DNA binding proteins in my molecular biology courses. I tried to explain these general properties using well-studied examples where the kinetics of binding were known and the equilibrium binding constants had been determined for specific and non-specific binding. They were usually examples from E. coli.

My colleagues and I also taught general concepts of regulation including the well-known fact that many bacterial transcription factors could function as both repressors and activators depending on the circumstances. There were some excellent examples we could use to illustrate this important concept. I incorporated some of them in one of my textbooks from 1994.
We have seen that CRP-cAMP can be both an activator and a repressor, depending on which gene is being controlled. It functions as an activator when its binding site is just upstream of the promoter, but it functions as a repressor when the binding site overlaps the promoter and CRP-cAMP competes with RNA polymerase in binding DNA. There are many similar examples of regulatory proteins that can be both repressors and activators; one well-studied protein is AraC, which regulates genes involved in utilization of arabinose. The regulation of arabinose operons is complex; by binding to different sites on DNA, AraC functions as either a repressor (in the absence of arabinose) or an activator (when arabinose is available).

Finally, MerR is a simpler example of a regulatory protein that is both a repressor and an activator. The protein is required for the regulation of the mer operon, whose genes encode proteins that chelate mercury ions. MerR represses transcription of the mer operon by binding near the promoter. In the presence of mercury a MerR-Hg++ complex forms, and this complex acts directly as an activator at the same promoter.
I don't think these concepts are taught to undergraduates any more. I think most undergraduate courses have eliminated almost all references to non-eukaryotic systems. What this means is that the fundamental concepts that were developed over several decades of work in simple systems are being ignored in undergraduate and graduate courses.

This point was brought home to me while reading the Cusanovich et al. paper. I came across the following statement.
In addition to considering the distinguishing characteristics of functional binding, we also examined the direction of effect that perturbing a transcription factor had on the expression level of its direct targets. We specifically addressed whether knocking down a particular factor tended to drive expression of its putatively direct (namely, bound) targets up or down, which can be used to infer that the factor represses or activates the target, respectively. Transcription factors have traditionally been thought of primarily as activators, and previous work from our group is consistent with that notion. Surprisingly, the most straightforward inference from the present study is that many of the factors function as repressors at least as often as they function as activators.
It's true that most transcription factors in eukaryotes function mostly as activators but the result wouldn't have been a surprise to the authors if they had been taught correctly as undergraduates and graduate students.

I think it's a bad idea that we are ignoring so much of the important work on phage and bacteria from the 1960s and 1970s. I recently asked my students if they knew anything about bacteriophage lambda and was mostly met with blank stares. I didn't dare ask them if they could explain the genetic switch.

Mark Ptashne would not be amused.





1. I started working on DNA binding proteins for my Ph.D. thesis in 1968. That's only 49 years ago. Others had been working on the problem before me. :-)

3 comments :

  1. ***It's true that most transcription factors in eukaryotes function mostly as activators but the result wouldn't have been a surprise to the authors if they had been taught correctly as undergraduates and graduate students.***

    If I recall correctly, repression would have to be by a completely different mechanism in eukaryotes because there is no steric inhibition of Pol2. It would have to be either by recruitment of inhibitory factors when present or, even more indirectly, by activating a repressor gene.
    Seems to me the extreme functionalists could claim a roundabout function for most TF binding sites. Presumably the expression level for any particular TF is tuned to account for non-functional sites. So if one could delete all the 'non-functional' sites it would drastically increase the effective concentration of that TF and throw off gene expression.

  2. ***It's true that most transcription factors in eukaryotes function mostly as activators but the result wouldn't have been a surprise to the authors if they had been taught correctly as undergraduates and graduate students.***

    Is there evidence this is universally true across eukarya, or is this an extrapolation from a couple of mammals? I know of numerous repressors in yeast, for example.
