I've been interested in genome organization for several decades and I've been following the literature on pervasive transcription and transcription factor binding in whole genome studies. I'm reasonably familiar with the techniques although I've never done them myself.
I'm not bragging; I'm just saying that I know a little bit about this stuff so when I saw this paper in one of the latest issues of Nature I decided to look more carefully.
Heinz, S., Romanoski, C., Benner, C., Allison, K., Kaikkonen, M., Orozco, L. and Glass, C. (2013) Effect of natural genetic variation on enhancer selection and function. Nature 503:487-492. [doi: 10.1038/nature12615]
Almost everyone knows about the major problem in this field after the ENCODE publicity fiasco of last year. The "problem" is that it's relatively easy to identify transcription factor binding sites but it's quite another matter to determine if they are functional.[1] The title of the paper suggested to me that these authors had found a way to address this problem by looking at natural variation between inbred mouse strains. If they detect differences in transcription factor binding sites, will these be correlated with differences in the regulation of identifiable (annotated) genes? Or will the evidence suggest that those binding sites are just "noise"? Will the strain differences affect the production of low levels of transcription (i.e. most of "pervasive" transcription)?
As soon as I read the abstract it became apparent that the authors were not addressing the most important issue.[2] They were looking at something else. But what, exactly? Here's the abstract ... see if you can guess.
The mechanisms by which genetic variation affects transcription regulation and phenotypes at the nucleotide level are incompletely understood. Here we use natural genetic variation as an in vivo mutagenesis screen to assess the genome-wide effects of sequence variation on lineage-determining and signal-specific transcription factor binding, epigenomics and transcriptional outcomes in primary macrophages from different mouse strains. We find substantial genetic evidence to support the concept that lineage-determining transcription factors define epigenetic and transcriptomic states by selecting enhancer-like regions in the genome in a collaborative fashion and facilitating binding of signal-dependent factors. This hierarchical model of transcription factor function suggests that limited sets of genomic data for lineage-determining transcription factors and informative histone modifications can be used for the prioritization of disease-associated regulatory variants.
After reading the paper very carefully (three times!) I think I know what they mean, but it sounded a lot like gibberish the first time I read it. I'm curious to know what the rest of you think. Do you understand what this paper is all about after reading the abstract?
Maybe you have to read the introduction to get a better idea. Here it is.
Inter-individual genetic variation is a major cause of diversity in phenotypes and disease susceptibility. Although sequence variants in gene promoters and protein-coding regions provide obvious prioritization of disease-causing variants, most (88%) genome-wide association study (GWAS) loci are in non-coding DNA, suggesting regulatory functions1. Prioritization of functional intergenic variants remains challenging, owing in part to an incomplete understanding of how regulation is achieved at the nucleotide level in different cell types and environmental contexts2,3,4,5,6,7,8,9,10,11. Recent studies have described important roles for lineage-determining transcription factors (LDTFs), also referred to as pioneer factors or master regulators, in selecting cell-type-specific enhancers12,13,14,15, but the sequence determinants that guide their binding are poorly understood. Previous findings in macrophages and B cells suggest a hierarchical model of regulatory function6, in which a relatively small set of LDTFs collaboratively compete with nucleosomes to bind DNA in a cell-type-specific manner (Fig. 1A, a and b). The binding of these factors is proposed to ‘prime’ DNA by initiating deposition of histone modifications that are associated with cis-active regulatory regions (Fig. 1A, b and c) and enable concurrent or subsequent binding of signal-dependent transcription factors that direct regulated gene expression6,13,15,16 (Fig. 1A, c–e). In principle, this model provides a straightforward framework that allows non-coding variants to be classified with respect to their ability to directly perturb LDTF binding and their potential to exert indirect effects on binding of other LDTFs and signal-dependent transcription factors. 
To test the validity of this model and its ability to explain effects of genetic variation on transcription factor binding and function, we exploited the naturally occurring genetic variation between the inbred C57BL/6J and BALB/cJ mouse strains (~4 million single nucleotide polymorphisms (SNPs) and ~750 k indels17) as an ‘in vivo mutagenesis screen’.
Does that help?
I may post an English translation in the comments after a few days but for now I'd like to hear from you. Have we reached the stage where Nature articles are all but incomprehensible to people who aren't actively working in the specific field? (This is a main article, not a "Letter.")
1. The problem is compounded by misuse of the word "enhancer." An enhancer is a region of DNA that's known to be required for gene expression. Those who do whole genome studies tend to use the word to mean ANY site of transcription factor binding. These are "potential" enhancers. They could also be junk binding sites with no biological function. I don't think it's a good idea to misuse the word "enhancer" in this manner. (See the title of the paper.)
2. Turns out there's nothing in the paper to suggest that the authors are the least bit interested in whether their results are biologically relevant.