The expression of genes is regulated at many levels, but one of the most important is regulation at the level of transcription. Transcription initiation is controlled by transcription factors that bind to sequences near the promoter and either activate or repress transcription.
A lot of work has been done on transcription regulation in mammals over the past 40 years. The general impression from these detailed studies of individual genes is that regulation usually involves a relatively small number of transcription factors that bind to sequences within 1000 bp or so of the transcription start site.
This model was challenged by the ENCODE studies in 2012. ENCODE researchers claimed to have discovered hundreds of thousands of cis-regulatory elements (CREs) covering a substantial percentage of the genome. If they were correct, it would mean that dozens of transcription factors control the expression of every gene.
All researchers need to realize that the best scientific practice is produced when, like Darwin, they persistently search for flaws in their arguments.
Bruce Alberts et al. (2015) "Self-correction in science at work" Science 348: 1420

Many scientists pointed out that what the ENCODE researchers were really looking at was transcription factor binding sites, not CREs. In a genome full of junk DNA, we expect a large number of spurious transcription factor binding sites. These sites are NOT CREs, although they may be good candidates for biologically relevant regulatory sites. Later, ENCODE researchers seemed (reluctantly) to agree with this criticism, so they began to label those sites as "candidate" cis-regulatory elements, or cCREs.
The controversy continues. I've blogged about it repeatedly in an effort to alert people to the real issue, namely whether a transcription factor binding site is real or spurious [How many regulatory sites in the human genome?]. Last month I drew your attention to a study of TF binding sites in random DNA sequences inserted into human cells. That study confirmed that you could detect these sites in random DNA, suggesting that the ENCODE data might contain a lot of spurious sites that have nothing to do with regulation [The activity of "random" DNA supports the junk DNA model].
Now we're starting 2026 with another study demonstrating that ENCODE supporters haven't listened to any of the criticism leveled against their interpretation. For reasons that are very unclear to me, this most recent study was published in Nature, one of the most prestigious science journals.
Moore, J.E., Pratt, H.E., Fan, K., Phalke, N., Fisher, J., Elhajjajy, S.I., Andrews, G., Gao, M., Shedd, N. et al. (2026) An expanded registry of candidate cis-regulatory elements. Nature:1-10. [doi: 10.1038/s41586-025-09909-9]
Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression. Previously, the ENCODE consortium mapped biochemical signals across hundreds of cell types and tissues and integrated these data to develop a registry containing 0.9 million human and 300,000 mouse candidate cis-regulatory elements (cCREs) annotated with potential functions. Here we have expanded the registry to include 2.37 million human and 967,000 mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded registry covers hundreds of unique cell and tissue types, providing a comprehensive understanding of gene regulation. Functional characterization data from assays such as STARR-seq, massively parallel reporter assay, CRISPR perturbation and transgenic mouse assays have profiled more than 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer and silencer roles in different cellular contexts. Integrating the registry with other ENCODE annotations facilitates genetic variation interpretation and trait-associated gene identification, exemplified by the identification of KLF1 as a novel causal gene for red blood cell traits. This expanded registry is a valuable resource for studying the regulatory genome and its impact on health and disease.
So now we have 2.37 million transcription factor binding sites that may or may not be true regulatory elements. They are "candidates" (cCREs), but the authors claim that this study provides "a comprehensive understanding of gene regulation" because 90% of these candidate sites are actually involved in regulation.
Let's think about that. Ninety percent of 2.37 million is still 2.13 million sites. If there are 25,000 genes, that works out to an average of about 85 regulatory sites per gene. Does anyone seriously believe that the average human gene is controlled by that many regulatory sites? (Keep in mind that about 10,000 of those genes are housekeeping genes that are transcribed in almost every cell.)
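For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch in Python. It simply takes the numbers at face value: 2.37 million human cCREs, the 90% figure as read above, and the assumption of roughly 25,000 human genes. It uses integer arithmetic for the site count to avoid floating-point noise.

```python
# Back-of-the-envelope check of the "sites per gene" figure.
# Assumptions: ~25,000 human genes, and the 90% figure taken at face value.
total_ccres = 2_370_000   # human cCREs in the expanded registry (Moore et al. 2026)
percent_claimed = 90      # percent treated here as regulatory
genes = 25_000            # assumed number of human genes

sites = total_ccres * percent_claimed // 100   # integer count of sites
per_gene = sites / genes                       # average sites per gene

print(f"{sites:,} sites, about {per_gene:.0f} per gene")
# prints "2,133,000 sites, about 85 per gene"
```

Changing `genes` shows how sensitive the average is to the assumed gene count; the conclusion that it runs to dozens of sites per gene does not change.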
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it as well as those that agree with it.
Richard Feynman (1985) "Cargo Cult Science" in Surely You're Joking, Mr. Feynman!

Apparently the authors are quite comfortable with this conclusion. They note that the current CRE database covers 21% of the genome, but that there may be more sites yet to be discovered. Here's part of their discussion.
With a genomic footprint of 21%, the human registry represents a comprehensive catalogue of the cis-regulatory repertoire, as it integrates data across thousands of biosamples spanning most human organs and tissues. However, we recognize the need for further evaluation using single-cell data to determine whether the registry may miss high-activity CREs specific to numerically rare cell types. Additionally, the potential emergence of novel CREs under disease or stimulation conditions remains an open area for investigation. Our initial assessments using single-cell data (Supplementary Note 1.7) support the overall completeness of the registry, but future work will be necessary to refine and expand its coverage using these more granular datasets.
Note the subtle shift from cCREs to just CREs.
Let me be clear about my critique. I'm not denying that there may be a huge number of biologically relevant regulatory sites hidden in the junk DNA. I'm skeptical, but still trying to keep an open mind.
What I object to most strongly is that Moore et al. don't even consider the possibility that they may be looking at spurious TF binding sites, nor do they discuss the implications of their conclusions.
The fact that this paper was published without acknowledging the controversy tells me that peer review has failed.
