Sandwalk: Even more regulatory elements?

Thursday, January 15, 2026

Even more regulatory elements?

The expression of genes is regulated at many levels, but one of the most important is regulation at the level of transcription. Transcription initiation is controlled by transcription factors that bind to sequences near the promoter and either activate or repress transcription.

A lot of work has been done on transcription regulation in mammals over the past 40 years. The general impression from these detailed studies of individual genes is that regulation usually involves a relatively small number of transcription factors that bind to sequences within 1000 bp or so of the transcription start site.

This model was challenged by the ENCODE studies in 2012. ENCODE researchers claimed to have discovered hundreds of thousands of cis-regulatory elements (CREs) covering a substantial percentage of the genome. If they are correct, then this means that there are dozens of transcription factors controlling the expression of every gene.

All researchers need to realize that the best scientific practice is produced when, like Darwin, they persistently search for flaws in their arguments.

Bruce Alberts et al. (2015)
"Self-correction in science at work"
Science 348: 1420

Many scientists pointed out that what the ENCODE researchers were really looking at was transcription factor binding sites and not CREs. In a genome full of junk DNA, we expect a large number of spurious transcription factor binding sites. These sites are NOT CREs although they may be good candidates for biologically relevant regulatory sites. Later ENCODE researchers seemed to (reluctantly) agree with this criticism so they began to label those sites as "candidate" cis-regulatory elements or cCREs.
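The expectation of spurious binding sites follows from simple probability. Here is a minimal sketch, assuming a genome of about 3.1 billion bp with equal base frequencies and independent positions (both are simplifications), and a transcription factor that recognizes one exact, non-palindromic sequence on either strand. The genome size, the motif lengths, and the function name are illustrative assumptions, not figures taken from the ENCODE papers.

```python
# Expected number of chance matches to a short motif in random sequence.
# Assumptions (illustrative only): uniform base composition, independent
# positions, exact matching, and a non-palindromic motif.

GENOME_BP = 3_100_000_000  # approximate haploid human genome size (assumption)


def expected_chance_matches(motif_length, genome_bp=GENOME_BP, strands=2):
    """Return the expected count of exact matches to one specific motif."""
    p_match = 0.25 ** motif_length          # chance that one position matches
    positions = genome_bp - motif_length + 1  # possible start positions
    return strands * positions * p_match


if __name__ == "__main__":
    for k in (6, 8, 10, 12):
        print(f"{k} bp motif: about {expected_chance_matches(k):,.0f} chance matches")
```

Under these assumptions a 6 bp motif is expected to occur more than a million times by chance and a 12 bp motif a few hundred times. Real transcription factors tolerate mismatches, which raises the count, while chromatin accessibility restricts which of those sites are actually bound, so this is only an order-of-magnitude illustration.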

The controversy continues. I've blogged about it repeatedly in an effort to alert people to the real issue; namely, whether a transcription factor binding site is real or spurious [How many regulatory sites in the human genome?]. Last month I drew your attention to a study of TF binding sites in random DNA sequences inserted into human cells. That study confirmed that you could detect these sites in random DNA, suggesting that the ENCODE data might contain a lot of spurious sites that have nothing to do with regulation [The activity of "random" DNA supports the junk DNA model].

Now we're starting 2026 with another study demonstrating that ENCODE supporters haven't listened to any of the criticism leveled against their interpretation. For reasons that are very unclear to me, this most recent study was published in Nature, one of the most prestigious science journals.

Moore, J.E., Pratt, H.E., Fan, K., Phalke, N., Fisher, J., Elhajjajy, S.I., Andrews, G., Gao, M., Shedd, N. et al. (2026) An expanded registry of candidate cis-regulatory elements. Nature:1-10. [doi: 10.1038/s41586-025-09909-9]

Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE consortium mapped biochemical signals across hundreds of cell types and tissues and integrated these data to develop a registry containing 0.9 million human and 300,000 mouse candidate cis-regulatory elements (cCREs) annotated with potential functions. Here we have expanded the registry to include 2.37 million human and 967,000 mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded registry covers hundreds of unique cell and tissue types, providing a comprehensive understanding of gene regulation. Functional characterization data from assays such as STARR-seq, massively parallel reporter assay, CRISPR perturbation and transgenic mouse assays have profiled more than 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer and silencer roles in different cellular contexts. Integrating the registry with other ENCODE annotations facilitates genetic variation interpretation and trait-associated gene identification, exemplified by the identification of KLF1 as a novel causal gene for red blood cell traits. This expanded registry is a valuable resource for studying the regulatory genome and its impact on health and disease.

So now we have 2.37 million transcription factor binding sites that may or may not be true regulatory elements. They are "candidates" (cCREs) but the authors claim that this study provides "a comprehensive understanding of gene regulation" because 90% of these candidate sites are actually involved in regulation.

Let's think about that. 90% of 2.37 million is still 2.13 million sites. This means an average of 85 regulatory sites per gene if there are 25,000 genes. Does anyone seriously believe that the average human gene is controlled by that many regulatory sites? (Keep in mind that about 10,000 of those genes are housekeeping genes that are transcribed in almost every cell.)
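The arithmetic is easy to check. The short sketch below uses the 2.37 million cCREs from the abstract together with the 90% figure and the 25,000 genes used above; the alternative gene count of 20,000 and the function name are illustrative assumptions added only to show how the average shifts.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
HUMAN_CCRES = 2_370_000   # human cCREs in the expanded registry (from the abstract)
FRACTION = 0.90           # fraction treated as regulatory in the post
GENES = 25_000            # gene count used in the post


def sites_per_gene(total_sites, fraction, genes):
    """Average number of putative regulatory sites per gene."""
    return total_sites * fraction / genes


sites = HUMAN_CCRES * FRACTION
per_gene = sites_per_gene(HUMAN_CCRES, FRACTION, GENES)

if __name__ == "__main__":
    print(f"{sites / 1e6:.2f} million sites")
    print(f"{per_gene:.0f} sites per gene with {GENES:,} genes")
    # Hypothetical alternative gene count, not a figure from the post.
    alt = sites_per_gene(HUMAN_CCRES, FRACTION, 20_000)
    print(f"{alt:.0f} sites per gene with 20,000 genes")
```

A lower gene count only makes the average larger, so the question in the post does not depend on the exact number of genes assumed.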

Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it as well as those that agree with it.

Richard Feynman (1985)
"Cargo Cult Science"
in "Surely You're Joking, Mr. Feynman!"

Apparently the authors are quite comfortable with this conclusion. They note that the current CRE database covers 21% of the genome but there may be more sites that have yet to be discovered. Here's part of their discussion.

With a genomic footprint of 21%, the human registry represents a comprehensive catalogue of the cis-regulatory repertoire, as it integrates data across thousands of biosamples spanning most human organs and tissues. However, we recognize the need for further evaluation using single-cell data to determine whether the registry may miss high-activity CREs specific to numerically rare cell types. Additionally, the potential emergence of novel CREs under disease or stimulation conditions remains an open area for investigation. Our initial assessments using single-cell data (Supplementary Note 1.7) support the overall completeness of the registry, but future work will be necessary to refine and expand its coverage using these more granular datasets.

Note the subtle shift from cCREs to just CREs.

Let me be clear about my critique. I'm not denying that there may be a huge number of biologically relevant regulatory sites hidden in the junk DNA. I'm skeptical, but still trying to keep an open mind.

What I object to most strongly is the fact that Moore et al. don't even consider the possibility that they may be looking at spurious TF binding sites and they don't even discuss the implications of their conclusions.

The fact that this paper was published without acknowledging the controversy tells me that peer review has failed.

10 comments :

Anonymous said...: I always wonder what the authors of such papers think about how many different states of a cell exist. Surely, during development there is a lot of differentiation to form the different cell types, tissues and organs. However, once adulthood is reached, how much regulation is required? More importantly, why would a salamander need more DNA, and multiple times as many regulatory elements as mammals, to meet the demands of life? Or should we believe that the development of a salamander requires more regulation and that adult salamanders are challenged by more environmental changes than human beings, toads or frogs? In addition, isn’t much of the necessary day-to-day regulation happening primarily at the cellular and protein level rather than through transcriptional changes?; Friday, January 16, 2026 4:10:00 AM
gert korthof said...: Larry, I asked Google AI: "how many regulatory sites do human genes have?".
The answer included a reference to a press release, 'UMass Chan scientists annotate largest map yet of human genome’s regulatory switches' (15 Jan 2026),
which is rather surprising because it means that AI answers are apparently updated continuously. A one-day-old publication is listed. So it is not true that AI uses a fixed database.

"This means an average of 85 regulatory sites per gene":
Could 85 be approximately true if something like 10 regulatory sites are used per cell type and the rest are non-functional in that cell type? So each cell type uses a different set of regulatory sites? Add to this a lot of redundancy?; Friday, January 16, 2026 6:20:00 AM
Larry Moran said...: @gert korthof: What exactly are you suggesting? Are you suggesting that the 10,000 housekeeping genes are controlled by different transcription factors in different cell types? Are you thinking that the gene for triose-phosphate isomerase (TPI1) is controlled by 10 specific transcription factors in skin cells but a different 10 transcription factors in liver cells?; Friday, January 16, 2026 11:52:00 AM
gert korthof said...: Larry, I didn't mention 'housekeeping genes'! Is there consensus about the number of regulatory sites for non-housekeeping genes in humans? What is the number according to you?
PS: I found estimates of 3,140 to 6,909 housekeeping genes in humans, the rest would be non-housekeeping genes.; Saturday, January 17, 2026 6:41:00 AM
Larry Moran said...: @gert korthof: True, you didn't mention housekeeping genes but they are important when one is making claims about there being 85 regulatory sites per gene. You proposed that this number could be explained if there were genes that were expressed in multiple cell types but used different transcription factors in each cell type.

I asked the obvious question in order to clarify your proposal. Housekeeping genes are the classic example of genes that are expressed in multiple cell types. Why didn't you answer my question?

Do you believe that there are millions of biologically relevant regulatory sites in the human genome?

Do you agree that the authors of this paper should have mentioned the possible existence of spurious transcription factor binding sites?; Saturday, January 17, 2026 11:41:00 AM
Arthur Hunt said...: Larry, you asked "Are you suggesting that the 10,000 housekeeping genes are controlled by different transcription factors in different cell types? Are you thinking that the gene for triose-phosphate isomerase (TPI1) is controlled by 10 specific transcription factors in skin cells but a different 10 transcription factors in liver cells?"

This doesn't sound too far-fetched to me. Well, maybe 10 completely cell-specific factors is a bit much, but the general idea seems a reasonable suggestion.; Tuesday, January 20, 2026 1:03:00 PM
Larry Moran said...: @Arthur Hunt: Let's think about what we would see if your speculation is correct.

Imagine that the genes for all the enzymes in the Krebs cycle are controlled by transcription factors A, B, and C in liver cells and by factors X, Y, and Z in muscle cells. If we look carefully at muscle cells we should be able to show that the genes for A, B, and C are not expressed in those cells and similarly the genes for X, Y, and Z are not expressed in liver cells.

Is there any evidence that this is true of general transcription factors that are required for a large number of housekeeping genes?

Also, we should see lots of examples of promoter bashing experiments where you get very different results depending on whether you do the experiments in HeLa cells or a neuroblastoma cell line. The scientific literature should be full of such anomalies, right?; Tuesday, January 20, 2026 6:26:00 PM
Anonymous said...: This feels similar to the initial estimates of the number of human genes as the HGP was finishing up, with some put at 100,000+. Only now they can back up their guesses by abusing genome-wide data in their analyses; Thursday, January 22, 2026 3:41:00 PM
Larry Moran said...: Anonymous: For 30 years (from 1970) the knowledgeable experts were predicting about 30,000 genes in the human genome and they weren't far off.

How many protein-coding genes in the human genome? (2); Friday, January 23, 2026 3:48:00 PM
Arthur Hunt said...: Larry, what is true is that there is not a single rule that is at work. One can find general TFs that are expressed everywhere, members of TF families that are tissue and cell type-specific, and most any permutation one can come up with. This is true for housekeeping genes as well as more tightly-regulated genes.

You ask: "Also, we should see lots of examples of promoter bashing experiments where you get very different results depending on whether you do the experiments in HeLa cells or a neuroblastoma cell line. The scientific literature should be full of such anomalies, right?" I don't know, maybe the literature is chock full of such anomalies. Lord knows they are easy enough to find. Maybe someone with lots of time on their hands can do an analysis.; Tuesday, February 03, 2026 11:30:00 PM

Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.

Charles Darwin (c1880)

Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)

Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...

Peter Atkins

The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake. Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory. Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change. Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance. Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change. Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat. Stephen Jay Gould (1999) p.84

My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True

I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000

It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations

Douglas Futuyma

One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick

There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner

An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist

Richard Dawkins

Another curious aspect of the theory of evolution is that everybody thinks he understands it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod

The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.

Richard Lewontin
