Sandwalk: Educating an Intelligent Design Creationist: The Specificity of DNA Binding Proteins

Thursday, April 11, 2013

Educating an Intelligent Design Creationist: The Specificity of DNA Binding Proteins

I'm replying to a post by andyjones, "(More and more) Function, the evolution-free gospel of ENCODE." This was the fourth post in a series and I'm working my way through five issues that Intelligent Design Creationists need to understand. The first two were "Pervasive Transcription" and "Rare Transcripts."

Educating an Intelligent Design Creationist: Introduction
Educating an Intelligent Design Creationist: Pervasive Transcription
Educating an Intelligent Design Creationist: Rare Transcripts

The Specificity of DNA Binding Proteins

It is absolutely essential that you understand the basic biochemistry of DNA binding proteins if you want to interpret the ENCODE results and the controversy surrounding junk DNA. You might think this is a given since almost everyone involved in the discussion has had some exposure to biochemistry in undergraduate courses. Unfortunately, most of these courses don't teach that stuff anymore¹ so we've raised a generation of scientists who were never exposed to the facts.

I've blogged about this many times in the past. The model system is the lac repressor since so much work has been done on DNA binding over the past 40 years [DNA Binding Proteins] [Repression of the lac Operon]. We know a great deal about the thermodynamics and kinetics of binding of lac repressor to DNA. It binds specifically to three DNA sequences (operators) near the promoter of the lac operon. The three operators have slightly different sequences and this affects the strength of binding. It was results like this that led to the concept of a "consensus sequence": a DNA sequence that represents the ideal binding site. Regions of DNA that resemble the consensus sequence but don't match it exactly will be bound less tightly. There's a progression in strength of binding that ranges from very strong binding to the consensus sequence all the way down to very weak binding to a DNA sequence that has no resemblance to the consensus.
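
To make the idea of "resemblance to the consensus" concrete, here is a minimal Python sketch that slides a window along a stretch of DNA and ranks each candidate site by how many positions differ from a consensus. The consensus `GATTACA` and the test sequence are made up for illustration (they are not the lac operator), and counting mismatches is only a crude stand-in for binding strength.

```python
# Toy example: rank candidate binding sites by resemblance to a consensus.
# CONSENSUS and SEQUENCE are hypothetical, not real lac operator sequences.
CONSENSUS = "GATTACA"
SEQUENCE = "CCGATTACATTGATTGCACC"


def mismatches(site, consensus):
    """Number of positions where the site differs from the consensus."""
    return sum(1 for a, b in zip(site, consensus) if a != b)


def scan(sequence, consensus):
    """Return (position, site, mismatches) for every window, best match first."""
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        hits.append((start, site, mismatches(site, consensus)))
    # Fewer mismatches means closer to the ideal site; ties keep genome order.
    return sorted(hits, key=lambda hit: (hit[2], hit[0]))


ranked = scan(SEQUENCE, CONSENSUS)
for position, site, n_mismatch in ranked[:3]:
    print(position, site, n_mismatch)
```

The perfect match comes out on top, a one-mismatch site comes next, and everything else trails behind with two or more mismatches. That ordering is the "progression" described above, reduced to its simplest form.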

Why do specific DNA binding proteins also bind to random sequences of DNA? There are two good reasons. First, it is impossible for DNA binding proteins to discriminate absolutely between sequences that resemble the binding site and those that don't. All DNA binding proteins have to recognize the sugar-phosphate backbone of DNA before they can probe the exact sequence of the stacked bases in the interior of the molecule.

The second reason is more important. Here's how I described it in DNA Binding Proteins:

Now, here's the important point: all specific DNA binding proteins also bind DNA non-specifically. In many cases it's part of the search mechanism for the specific binding site. In the case of lac repressor, for example, the protein binds to any old place on the DNA molecule and slides along the DNA searching for a specific binding sequence. After sliding for a second or so it falls off and re-binds to another part of the DNA molecule.

You'll have to read that post for more details. Just keep in mind that all specific DNA binding proteins must also bind non-specifically.
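
If it helps to see the search mechanism in action, here is a toy Python simulation of it: the protein lands at a random position, slides back and forth for a while, falls off, and lands somewhere else. Everything about it is an assumption made for illustration (a 2,000-site circular "genome", 200 sliding steps per binding event, an unbiased one-site-at-a-time walk), so treat it as a cartoon and not as a model of the real kinetics.

```python
import random


def rounds_to_find_target(genome_length, target, slide_steps, rng,
                          max_rounds=100_000):
    """Count binding events until the protein reaches `target`.

    Each round the protein binds at a random position (non-specific binding),
    then takes `slide_steps` random steps along a circular genome.
    Returns None if the target is not reached within `max_rounds`.
    """
    for rounds in range(1, max_rounds + 1):
        position = rng.randrange(genome_length)  # lands on any old sequence
        if position == target:
            return rounds
        for _ in range(slide_steps):
            # Slide one site left or right, wrapping around the circle.
            position = (position + rng.choice((-1, 1))) % genome_length
            if position == target:
                return rounds
    return None


rng = random.Random(2013)  # fixed seed so the run is repeatable
with_sliding = rounds_to_find_target(2000, target=1000, slide_steps=200, rng=rng)
without_sliding = rounds_to_find_target(2000, target=1000, slide_steps=0, rng=rng)
print("binding events with sliding:", with_sliding)
print("binding events without sliding:", without_sliding)
```

With sliding turned on, each binding event samples a neighborhood of sites instead of a single one, so the protein usually needs far fewer binding events to reach the target. Any single run can go either way because the search is random.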

The lac repressor is the very best protein at discriminating between specific and non-specific sites. The non-specific binding constant is K_a ~ 10⁶ M⁻¹. That may not mean very much to most of you but it's pretty high. It means that lac repressor binds pretty tightly to any old sequence of DNA. This value is about the same for all DNA binding proteins, including RNA polymerase. The specific binding constant (equilibrium association constant) represents the strength of binding to the ideal operator site (the consensus sequence). Its value is K_a ~ 10¹³ M⁻¹. That's seven orders of magnitude stronger than non-specific binding. No other DNA binding protein binds so strongly to its target site.
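
For readers who want to play with these numbers, here is a short Python sketch of the arithmetic. The two association constants are the ones quoted above. The 1 nM free protein concentration is an arbitrary value I picked for illustration, and the occupancy formula is the standard one for a single independent site.

```python
import math

K_NONSPECIFIC = 1e6   # M^-1, non-specific association constant from the text
K_SPECIFIC = 1e13     # M^-1, association constant for the ideal operator

# How much stronger is specific binding?
ratio = K_SPECIFIC / K_NONSPECIFIC
orders_of_magnitude = math.log10(ratio)

# The same constants expressed as dissociation constants (K_d = 1 / K_a).
kd_nonspecific = 1 / K_NONSPECIFIC   # molar
kd_specific = 1 / K_SPECIFIC         # molar


def fraction_occupied(k_a, free_protein_molar):
    """Fraction of time one independent site is occupied at equilibrium."""
    x = k_a * free_protein_molar
    return x / (1 + x)


FREE_PROTEIN = 1e-9  # 1 nM, an assumed value chosen only for illustration

print("orders of magnitude:", orders_of_magnitude)
print("K_d non-specific (M):", kd_nonspecific)
print("K_d specific (M):", kd_specific)
print("operator occupancy:", fraction_occupied(K_SPECIFIC, FREE_PROTEIN))
print("random-site occupancy:", fraction_occupied(K_NONSPECIFIC, FREE_PROTEIN))
```

At that assumed concentration a single ideal operator is occupied almost all the time, while any one random site is occupied only about a tenth of a percent of the time. The catch is that there are millions of random sites.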

In spite of this huge difference, most lac repressor molecules inside an E. coli cell will be sitting on sites other than the operators at any one time. That's because there are 4.6 million non-specific binding sites and only three specific binding sites. If the E. coli genome were full of extra DNA, like the mammalian genome, then there would be 6.4 billion non-specific binding sites. That's why eukaryotic cells need so many more molecules of each transcription factor compared to bacteria: most of them are sitting where they're not supposed to be (Yamamoto and Alberts, 1976).
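
Here is a rough Python sketch of that bookkeeping. It treats every site as independent, gives all three operators the ideal binding constant, and solves the mass balance for the free protein by bisection. The site counts and binding constants come from the text. The ten repressor molecules per cell and the one femtoliter cell volume are my assumptions for the example, as is the simplification that each repressor occupies one site at a time.

```python
AVOGADRO = 6.022e23
CELL_VOLUME_L = 1e-15  # assumed cell volume of about one femtoliter


def partition(total_protein, n_specific, n_nonspecific,
              k_specific=1e13, k_nonspecific=1e6, volume_l=CELL_VOLUME_L):
    """Split protein molecules into free, operator-bound and non-specific.

    Binding constants are in M^-1; everything else is in molecules per cell.
    Returns (free, on_specific, on_nonspecific).
    """
    # Molar concentration that corresponds to one molecule per cell.
    molar_per_molecule = 1.0 / (AVOGADRO * volume_l)
    ks = k_specific * molar_per_molecule
    kn = k_nonspecific * molar_per_molecule

    def bound(free):
        on_specific = n_specific * ks * free / (1 + ks * free)
        on_nonspecific = n_nonspecific * kn * free / (1 + kn * free)
        return on_specific, on_nonspecific

    # Total protein rises steadily with free protein, so bisection works.
    low, high = 0.0, float(total_protein)
    for _ in range(200):
        middle = (low + high) / 2
        on_specific, on_nonspecific = bound(middle)
        if middle + on_specific + on_nonspecific > total_protein:
            high = middle
        else:
            low = middle
    free = (low + high) / 2
    on_specific, on_nonspecific = bound(free)
    return free, on_specific, on_nonspecific


REPRESSORS = 10  # assumed copy number per cell

for label, sites in (("E. coli-sized genome", 4.6e6),
                     ("mammalian-sized genome", 6.4e9)):
    free, on_operators, elsewhere = partition(REPRESSORS, 3, sites)
    print(label, "operators:", round(on_operators, 3),
          "non-specific:", round(elsewhere, 3))
```

Under these assumptions roughly seven of the ten molecules end up on non-specific DNA in the E. coli case, simply because three operators can't hold more than three molecules. With 6.4 billion competing sites, well under one percent of the molecules are on an operator and the operators are almost always empty, which is why you would need many more copies of the protein.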

RNA polymerase binds specifically to promoter sequences. It adopts the same binding mechanism as other specific DNA binding proteins; namely, it binds non-specifically then slides along DNA until it finds a promoter sequence. The sequence of a eukaryotic promoter is not very well defined so in most cases eukaryotic RNA polymerase needs help in the form of a nearby transcription factor to bring it to the correct transcription initiation site.

Nevertheless, the basic concept is the same. We know the kinetics and binding constants for E. coli RNA polymerase and they produce the expected distribution [see How RNA Polymerase Binds to DNA for all the references]. A significant percentage of RNA polymerase molecules are sitting at sites other than genes and promoters. The situation is much worse in mammals with large genomes. You need tens of thousands of RNA polymerase molecules in order to ensure that promoters will be occupied. Most of these are sitting at sites that fortuitously resemble real promoters or they are bound to a transcription factor that is also at a non-specific site.

None of this is controversial once you have read the papers and understand the principles of DNA binding. It is straightforward biochemistry/molecular biology at the undergraduate level. We expect the mammalian genome to be covered with non-functional transcription factor binding sites and bound RNA polymerase molecules. Many of these will be in pre-initiation complexes and many will actually be the sites of spurious transcription by accident.

This is exactly what Kevin Struhl (2007) was talking about when the preliminary ENCODE results were published six years ago. He said ...

The issue of transcriptional noise has become increasingly important, because recent studies in a wide range of eukaryotic organisms indicate that there is far more transcription than expected from the classical view of the transcriptome. Here, on the basis of experimental observations, including a recent analysis of genome-wide distribution of Pol II [RNA polymerase II], I estimate that only 10% of the elongating Pol II molecules in the yeast Saccharomyces cerevisiae are engaged in transcription that initiates from conventional promoters and that the remaining 90% of the elongating Pol II molecules represent transcriptional noise. Furthermore, these calculations suggest that the specificity of Pol II initiation (an approx 10⁴-fold difference between an optimal site and an average genomic site) is comparable to that of sequence-specific DNA-binding proteins and other biological processes considered to be specific.

The ENCODE preliminary result confirmed back in 2007 what we knew about the properties of transcription factors and RNA polymerase. The completed project extended this result to the entire genome. The ENCODE experiments detected 636,336 sites where there were bound transcription factors (119 different factors) and tens of thousands of sites where RNA polymerase was bound. It would have been surprising if these sites had NOT been found.

Andyjones asks,

And how do we know that RNA polymerase is meant to bind only promoters? Is it safe to assume that when RNA polymerase binds to other sites, this must be accidental or unintentional? Could it be that these other sites are meant to be transcribed only rarely? I don’t know, but I would like to know. Why don’t we encourage scientists to take a deeper look? Oh great, that’s what ENCODE are doing.

I hope I've answered your question. We have excellent reasons for believing that many of those bound RNA polymerases are sitting at spurious binding sites that aren't real promoters. That doesn't mean that all of them are at non-functional sites but it does mean that most of them have to be at non-promoters or else our understanding of the basic biochemistry of binding is seriously flawed.

Andyjones continues ...

So, Larry thinks that rare transcription (which he believes is due to RNAP binding sites that are not recognised promoters) indicates accidental transcription. That is his argument. Ironically, Larry is using a design heuristic here (assuming that promoter means ‘bind only here’). All I am suggesting (not claiming to know for sure) is that perhaps the correct design heuristic should be that a promoter really means ‘bind more often here’? If so, there would be no reason to assume non-function.

It is quite reasonable now to expect that details of actual function will subsequently be found for much of the genome. Therefore we should keep looking for that function.

I hope that andyjones will continue this conversation if there's still something he doesn't understand about the properties of DNA binding proteins. I recommend that he read an introductory biochemistry/molecular biology textbook if he's still confused. I know of one that I'd recommend.

1. It's not covered on the MCAT!

[Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. [Pearson: Principles of Biochemistry 5/E] © 2012 Pearson Education Inc.]

Struhl, K. (2007) Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Structural & Molecular Biology 14:103-105. [doi: 10.1038/nsmb0207-103]

Yamamoto, K.R. and Alberts, B.M. (1976) Steroid Receptors: Elements for Modulation of Eukaryotic Transcription. Ann. Rev. Biochem. 45:721-746.

18 comments:

Anonymous, Thursday, April 11, 2013 6:21:00 PM

Thanks for these wonderful posts L. You're really going out of your way to explain all this considering that many IDers have already said that even if much of the genome was proven to be junk it still wouldn't be a problem for ID. I guess wars are won by taking one hill at a time.
Peter, Thursday, April 11, 2013 7:32:00 PM
We have excellent reasons for believing that many of those bound RNA polymerases are sitting at spurious binding sites that aren't real promoters.

An obvious way of testing this would be technical replication. If ENCODE do the same ChIP experiment for the same transcription factor in the same tissue type twice (or more), do they find the same binding sites each time? Did they try this?
Georgi Marinov, Thursday, April 11, 2013 9:15:00 PM
And how do we know that RNA polymerase is meant to bind only promoters?

One well understood class of such sites are enhancers - when you do ChIP-seq against Pol2, often it will cross-link quite robustly to enhancers because of the looping of the latter to the promoter. And sometimes it might transcribe them, though I am personally not at all convinced of the functional importance of the whole eRNA story.
whimple, Friday, April 12, 2013 12:09:00 AM
There is no a priori reason why DNA polymerase should have any greater specificity for DNA sequences than RNA polymerase, but the licensing of DNA replication origins is observed to be very strictly controlled so it is NOT the case that spurious biochemical activity is a necessary consequence of the limited site specificity of any single given protein, contrary to the implication in the post. You could rather argue that spurious transcription is of much less consequence than spurious replication, and that therefore transcription doesn't need such stringent licensing and therefore spurious transcription is to be expected, but spurious activity as a necessary consequence of the thermodynamics of protein/DNA binding for higher order protein complexes is false.