Sandwalk: Nils Walter disputes junk DNA: (8) Transcription factors and their binding sites

Thursday, March 14, 2024

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the eighth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth, sixth, and seventh posts address specific arguments in the junk DNA debate.

DNA-binding proteins

I did my PhD in a lab that specialized in DNA-binding proteins so I became very familiar with the properties of these molecules. It was a time when the basic properties of DNA binding were being worked out in many labs and the results of those studies have now been in the textbooks for many decades. Unfortunately, it seems that this kind of hard-core biochemistry is being ignored in most courses these days in favor of much more emphasis on human biochemistry, physiology, and disease.

Several DNA-binding proteins have been studied intensely, beginning with the lac repressor in a classic series of papers by Lin and Riggs (Lin and Riggs, 1972; Riggs, Lin, and Wells, 1972; Lin and Riggs, 1975). I covered the details in previous posts but the take-home lesson is that DNA-binding proteins interact both with DNA in general (non-specific binding) and with small specific DNA binding sites (specific binding). The kinetics of the interactions and the association constants of the equilibria dictate that at any one time most DNA-binding proteins will be bound non-specifically and only a small fraction will be bound to their specific binding site [DNA binding proteins].

The distribution of RNA polymerase in E. coli cells is a little more complex because RNA polymerase remains bound to the gene it's transcribing for a very long time as it moves from the promoter to the termination site. Nevertheless, data from many years ago show that a substantial percentage of RNA polymerase molecules are bound nonspecifically at any one time. I'm including a figure from my textbook to illustrate this distribution.

This has enormous consequences for the binding of transcription factors in eukaryotic cells since the amount of DNA is huge compared to the target sites. This means that a large number of these transcription factors will be bound non-specifically at any one time. This problem was addressed by Lin and Riggs (1975) who calculated the enormous difficulty that a small number of lac repressor molecules would have finding an operator sequence in a human lymphocyte nucleus.
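The flavor of that calculation can be sketched with a toy equilibrium model. This is not the calculation Lin and Riggs published. It is a minimal sketch that assumes every copy of the protein is bound to DNA somewhere, that each base pair starts one non-specific site, and that non-specific sites are far from saturated. The association constants, the copy number, and the genome sizes are illustrative assumptions chosen only to show how the answer scales with the amount of DNA.

```python
import math


def operator_occupancy(k_specific, k_nonspecific, genome_bp, n_proteins):
    """Equilibrium occupancy (0..1) of a single specific site.

    Toy model (an assumption, not the published calculation): all protein
    copies are DNA-bound, each base pair starts one non-specific site, and
    non-specific sites are far from saturated.
    """
    # r weighs the one specific site against all non-specific sites combined.
    r = k_specific / (k_nonspecific * genome_bp)
    # The occupancy theta solves theta / (1 - theta) = r * (n_proteins - theta),
    # i.e. r*theta^2 - b*theta + r*n_proteins = 0. We take the root in [0, 1],
    # written in a form that avoids subtracting two nearly equal numbers.
    b = r * n_proteins + r + 1.0
    disc = b * b - 4.0 * r * r * n_proteins
    return 2.0 * r * n_proteins / (b + math.sqrt(disc))


# Illustrative assumptions only; none of these numbers come from the post.
K_SPECIFIC = 1e13      # association constant for the operator, M^-1
K_NONSPECIFIC = 1e6    # association constant for random DNA, M^-1
N_PROTEINS = 10        # repressor copies per cell or nucleus
E_COLI_BP = 4.6e6      # approximate E. coli genome size
HUMAN_BP = 3.1e9       # approximate haploid human genome size

occ_ecoli = operator_occupancy(K_SPECIFIC, K_NONSPECIFIC, E_COLI_BP, N_PROTEINS)
occ_human = operator_occupancy(K_SPECIFIC, K_NONSPECIFIC, HUMAN_BP, N_PROTEINS)

for label, occ in (("E. coli", occ_ecoli), ("human", occ_human)):
    print(f"{label}: operator occupied {occ:.1%} of the time, "
          f"{1 - occ / N_PROTEINS:.1%} of the protein bound non-specifically")
```

With the same affinities and the same number of molecules, the only thing that changes between the two calls is the amount of competing DNA, and that alone is enough to drop the operator occupancy sharply in the larger genome.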

Similar calculations were performed in computer simulations by my fellow graduate student, Keith Yamamoto, using the estrogen receptor as a model. He calculated that you would need about 10,000 estrogen receptor molecules in each nucleus in order to activate the appropriate genes. That's because most of them would be bound non-specifically or to randomly occurring non-productive sites identical to the specific binding sites. He was purifying the estrogen receptor at the time and the yield he was seeing was consistent with this calculation (Yamamoto and Alberts, 1974; Yamamoto and Alberts, 1975).
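The point about randomly occurring sites is simple arithmetic, and a rough sketch makes it concrete. The function below is hypothetical and is not Yamamoto's simulation. It assumes a random genome with equal base frequencies and exact matches only, and the motif lengths and genome size are illustrative assumptions.

```python
def expected_chance_sites(genome_bp, motif_len, both_strands=True):
    """Expected number of exact matches to one motif in random DNA.

    Assumes independent positions with equal base frequencies (a
    simplification; real genomes are not uniform). Set both_strands=False
    for a palindromic motif, which reads the same on either strand.
    """
    positions = genome_bp - motif_len + 1
    per_position = 0.25 ** motif_len
    strands = 2 if both_strands else 1
    return strands * positions * per_position


HUMAN_BP = 3_100_000_000  # approximate haploid genome size (assumption)

for motif_len in (6, 8, 10, 12):
    n = expected_chance_sites(HUMAN_BP, motif_len)
    print(f"{motif_len} bp motif: about {n:,.0f} chance matches expected")
```

Short recognition sequences are expected to turn up by chance many times in a genome this size, and each chance match is a perfectly good binding site that has nothing to do with regulation.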

This is all covered in Chapter 8 of my book in a section titled "On the importance of DNA-binding proteins." The important properties of DNA-binding proteins have been confirmed time and time again in the 50 years since they were first discovered. Knowledgeable scientists know that large eukaryotic genomes will be littered with transiently bound transcription factors and other DNA-binding proteins such as RNA polymerase. Some of these will trigger transcription at inappropriate sites that are unrelated to biologically relevant promoters, leading to a low level of spurious transcription throughout the genome.

This spurious transcription activity has been documented in several species and so has the transcription of random segments of DNA inserted into cells.

For this reason, many of us were shocked when ENCODE published their results in 2012 claiming that every transcription factor binding site was a true regulatory sequence. That didn't make any sense and the conclusion was immediately challenged. That's why the ENCODE researchers retracted their claim and now refer to these sites as "candidate" sites. They recognize that they need to offer further evidence that these sites are functional and not accidental.

Questioning the biochemistry

Nils Walter addresses this issue in a section of his paper titled "Transcription factors bind to random genome sequences!" I have to quote the entire first paragraph because I'm not sure I could do it justice by paraphrasing.

Transcription factors bind to random genome sequences!

One final argument of geneticists is the suggestion that many random pieces of DNA can promote transcription by recruiting transcription factors locally.[75] However, rarely is the transient binding of a single transcription factor sufficient to recruit an RNA polymerase molecule to a transcription site. Rather, a combinatorial cooperation between cis-regulatory sequence elements in the genome, trans-acting transcription factors and signaling molecules, and gene-distal, but cis-acting ncRNA enhancer transcripts is needed to initiate directional transcription events that govern the tissue-specific, spatiotemporally controlled expression dynamics of genes (Figure 3B).[98] Consequently, the still poorly understood constant spatial reorganization of chromosomes in the densely packed nucleus—guided by a plethora of enhancer lncRNAs (Figure 1)—is both the result of and prerequisite for correct transcriptional programs that allow for the plasticity and adaptability of the semi-autonomous gene expression observed in each individual cell of a multicellular eukaryotic organism.[99]

I don't know whether this is intentional obfuscation or just poorly explained. The issue is how many regulatory sites there are in the human genome. The original ENCODE report in 2012 claimed that 80% of our genome was functional and this included 8.5% devoted to regulatory sites as identified by transcription factor binding or open chromatin domains. They suggested that once they look at more transcription factors, the amount of DNA devoted to regulation could be at least 20%. [What did the ENCODE Consortium say in 2012?] The hype over the past few years has emphasized that there could be one million regulatory sites in the human genome.

The argument against this view is that a huge majority of those sites are not true regulatory sites. Instead, they are spurious binding sites just like the ones predicted 50 years ago. Walter doesn't confront that controversy. He briefly mentions the argument of the "geneticists" (i.e. biochemists) then moves on to a description of complex transcription initiation sites involving enhancer RNAs (eRNAs). That's a different issue. We still want to know how many of these sites are present and what percentage of the genome they occupy.
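The numbers quoted above can be put side by side with a little arithmetic. This is only a back-of-the-envelope sketch. The genome size and the per-site lengths are my assumptions for illustration, the sites are treated as non-overlapping, and nothing here is a figure taken from ENCODE or from Walter's paper.

```python
GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def percent_of_genome(n_sites, bp_per_site, genome_bp=GENOME_BP):
    """Percent of the genome covered by n_sites non-overlapping sites."""
    return 100.0 * n_sites * bp_per_site / genome_bp


def bp_per_site_needed(percent, n_sites, genome_bp=GENOME_BP):
    """Average site length required for n_sites to cover a given percent."""
    return (percent / 100.0) * genome_bp / n_sites


ONE_MILLION = 1_000_000

# How much of the genome would one million sites occupy at various
# hypothetical site lengths?
for bp in (10, 50, 200):
    print(f"{bp} bp per site: {percent_of_genome(ONE_MILLION, bp):.2f}% of the genome")

# How long would each of one million sites have to be to add up to the
# 8.5% and 20% figures?
for pct in (8.5, 20.0):
    print(f"{pct}% of the genome: {bp_per_site_needed(pct, ONE_MILLION):.0f} bp per site")
```

The two functions are inverses of each other, so either one can be used to check whether a claimed number of sites and a claimed fraction of the genome are compatible for a given site length.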

He says there are hundreds of thousands of these sites but he offers no evidence to support this claim and he makes no attempt to deal with the fact that "transcription factors bind to random genome sequences"—the title of this section of his essay.

So, what's the point? It has something to do with his model of transcription regulation as shown in Figure 3 of the paper. I'll reproduce it below.

This is a complicated model but the main point seems to be that humans have many regulatory RNAs that have evolved from transposons. These RNAs are required for complex transcription initiation.

Nils Walter is perfectly entitled to present his personal model of transcription initiation. The next step is to develop tests of that model to see if it is supported by evidence or refuted by evidence. That's normally the way science works. But this seems to be different. It seems like Walter is giving us this model as "evidence" that there must be abundant regulatory RNAs in spite of data that seems to refute that idea. In other words, he seems to be begging the question.

This is how I interpret his closing paragraph (below) in a section that was supposed to review the evidence for abundant regulatory sites in light of data that many of them are artifacts.

In this modern view of eukaryotic gene expression, only those transcription events will occur that are sufficiently robustly proofread by a sequence of kinetically controlled, reversible assembly events that have to enhance each other and outcompete a vast number of possible alternative events.[26] In the resulting holistic model (Figure 3B), the significant number of defined transcripts detected by ENCODE then become a signature of select cellular processes that are allowed to proceed among a much larger number of possible transcripts. While we still do not understand the phenotypic functions of a majority of these primarily non-protein coding RNAs, we have to assume that the likelihood is high for eventually finding many functions that evolution has preserved across the many generations of individuals. Collectively, all these organisms and their cells were exposed to a vast array of rapidly changing environmental conditions, which imprinted on their genome and were inherited by the following generations.

Here's what I think he's saying. He's agreeing that spurious transcription is a problem but he argues that humans (and other mammals?) had to evolve sophisticated mechanisms to overcome this problem so that only the proper genes are transcribed at significant rates. Part of that sophisticated mechanism requires hundreds of thousands of regulatory RNAs.

I don't find this very helpful. The expression of dozens and dozens of human genes has been examined in considerable detail and, as far as I know, with very few exceptions, their transcription can be adequately explained by transcription factors binding upstream of the promoter and interacting with RNA polymerase complexes to form an initiation complex. Scientists have taken these regulatory regions, fused them to reporter genes such as E. coli β-galactosidase and reinserted them into mammalian cells to show that the regulatory sequences behave as predicted. In some cases, the constructs were used to create transgenic mice to show that they worked in whole organisms when inserted randomly in the genome. My colleagues and I did this in 1989 using 550 bp of the HSP70, heat-shock inducible regulatory region (Kothary et al. 1989).

There's no obvious reason to assume that the majority of human genes are regulated in a much more complicated manner involving thousands of regulatory RNAs. (I'm not ruling out the occasional exception.) If you are going to advance and defend such a model, I would have expected more data and more critical thinking in an essay on "Are non-protein coding RNAs junk or treasure?" I still don't know how many transcription start sites Nils Walter envisages and how much of the genome they are supposed to occupy and those seem to be important questions if you are advocating that most of our genome is functional.

Kothary, R., Clapoff, S., Darling, S., Perry, M.D., Moran, L.A. and Rossant, J. (1989) Inducible expression of an hsp68-lacZ hybrid gene in transgenic mice. Development 105:707-714. [PDF]

Lin, S.-y. and Riggs, A.D. (1972) lac repressor binding to non-operator DNA: detailed studies and a comparison of equilibrium and rate competition methods. Journal of Molecular Biology 72:671-690. [doi: 10.1016/0022-2836(72)90184-2]

Lin, S.-y. and Riggs, A.D. (1975) The general affinity of lac repressor for E. coli DNA: implications for gene regulation in procaryotes and eucaryotes. Cell 4:107-111. [doi: 10.1016/0092-8674(75)90116-6]

Riggs, A., Lin, S. and Wells, R. (1972) Lac repressor binding to synthetic DNAs of defined nucleotide sequence. Proceedings of the National Academy of Sciences 69:761-764. [doi: 10.1073/pnas.69.3.761]

Yamamoto, K.R. and Alberts, B. (1974) On the specificity of the binding of the estradiol receptor protein to deoxyribonucleic acid. Journal of Biological Chemistry 249:7076-7086. [PDF]

Yamamoto, K.R. and Alberts, B. (1975) The interaction of estradiol-receptor protein with the genome: an argument for the existence of undetected specific sites. Cell 4:301-310. [doi: 10.1016/0092-8674(75)90150-6]

Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

1 comment :

SPARC said...: If the binding would be as specific and exclusive as Walter and others assume I wonder if they are unaware that EMSAs and footprinting experiments which employ nuclear extracts necessitate the addition of unspecific competitor nucleic acids to reduce unspecific binding of proteins for which the labeled DNA doesn't contain any binding sites. Maybe they just do in vivo cross-linking which just tells you that a protein was bound to the DNA but not if the binding was specific.; Monday, March 18, 2024 2:47:00 AM
