Sandwalk: On the importance of controls

Thursday, December 31, 2020

When doing an experiment, it's important to keep the number of variables to a minimum and to include scientific controls. There are two types of controls. A negative control covers the possibility that you will get a signal by chance; for example, if you are testing an enzyme to see whether it degrades sugar then the negative control will be a tube with no enzyme. Some of the sugar may degrade spontaneously and you need to know this. A positive control is when you deliberately add something that you know will give a positive result; for example, if you are doing a test to see if your sample contains protein then you want to add an extra sample that contains a known amount of protein to make sure all your reagents are working.
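The two kinds of control can be sketched in a few lines of Python. This is only an illustration: the function names, the 20% tolerance, and all the readings are made up for the example and do not come from any real assay.

```python
def corrected_signal(sample, no_enzyme_control):
    """Sugar degraded by the enzyme alone: subtract spontaneous degradation."""
    return sample - no_enzyme_control


def positive_control_ok(measured, known_amount, tolerance=0.2):
    """True if the known standard reads within `tolerance` (fractional) of its true value."""
    return abs(measured - known_amount) <= tolerance * known_amount


# Hypothetical readings, for illustration only.
degraded_with_enzyme = 42.0  # % sugar degraded in the test tube
degraded_no_enzyme = 5.0     # % degraded spontaneously (negative control)
print(corrected_signal(degraded_with_enzyme, degraded_no_enzyme))  # 37.0

protein_standard_true = 10.0     # amount of protein deliberately added
protein_standard_measured = 9.1  # what the assay reported for that tube
print(positive_control_ok(protein_standard_measured, protein_standard_true))  # True
```

Without the negative control, all 42% would be credited to the enzyme. Without the positive control, a negative result could just mean the reagents had failed.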

Lots of controls are more complicated than the examples I gave but the principle is important. It's true that some experiments don't appear to need the appropriate controls but that may be an illusion. The controls might still be necessary in order to properly interpret the results but they're not done because they are very difficult. This is often true of genomics experiments.

Consider the ENCODE experiments where a great effort was made to map RNA transcripts, transcription factor binding sites, and open chromatin domains. In order to interpret these results correctly, you need both positive and negative controls but the most important is the negative control. Here's how Sean Eddy describes the required control (Eddy, 2013):

To clarify what noise means, I propose the Random Genome Project. Suppose we put a few million bases of entirely random synthetic DNA into a human cell, and do an ENCODE project on it. Will it be reproducibly transcribed into mRNA-like transcripts, reproducibly bound by DNA-binding proteins, and reproducibly wrapped around histones marked by specific chromatin modifications? I think yes.

... Even as a thought experiment, the Random Genome Project states a null hypothesis that has been largely absent from these discussions in genomics. It emphasizes that it is reasonable to expect reproducible biochemical activities ... in random unselected DNA.

This may be a case where creating the control isn't easy but we are reaching the stage where it may become necessary because stamp-collecting will only get you so far. Ford Doolittle has come up with a similar type of control to interpret the functional elements (FE) described by ENCODE (Doolittle, 2013):

Suppose that there had been (and probably, some day, there will be) ENCODE projects aimed at enumerating, by transcriptional and chromatin mapping, factor footprinting, and so forth, all of the FEs in the genomes of Takifugu and a lungfish, some small and large genomed amphibians (including several species of Plethodon), plants, and various protists. There are, I think, two possible general outcomes of this thought experiment, neither of which would give us clear license to abandon junk. The first outcome would be that FEs (estimated to be in the millions in our genome) turn out to be more or less constant in number, regardless of C-value—at least among similarly complex organisms. ... The second likely general outcome of my thought experiment would be that FEs as defined by ENCODE increase in number with C-value, regardless of apparent organismal complexity.

I've been thinking a lot lately about transcripts and alternative splicing. Massive numbers of RNAs are being identified in all kinds of tissues and all kinds of species now that the techniques have become routine. When multiple transcript variants from the same gene are identified they are usually interpreted as genuine examples of alternative splicing. The field needs controls. The negative control is similar to the one proposed by Sean Eddy but it's important to have a positive control, which in this case would be a well-characterized set of genes with real alternative splicing where the function of the splice variants has been demonstrated. If your RNA-Seq experiment fails to detect the known alternatively spliced genes then something is wrong with the experiment.
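A positive-control check of this kind is easy to express in code. The sketch below assumes you already have a list of genes for which your pipeline reported multiple transcript variants, plus a curated gold-standard list. The gene identifiers and the 90% threshold are placeholders, not real data or an accepted standard.

```python
def positive_control_recall(detected_genes, gold_standard_genes):
    """Return (fraction of gold-standard genes detected, sorted list of those missed)."""
    gold = set(gold_standard_genes)
    if not gold:
        raise ValueError("gold-standard set is empty")
    found = gold & set(detected_genes)
    return len(found) / len(gold), sorted(gold - found)


# Placeholder identifiers, for illustration only.
gold_standard = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
detected = ["GENE_A", "GENE_C", "GENE_X", "GENE_Y"]

recall, missed = positive_control_recall(detected, gold_standard)
print(recall)  # 0.5
print(missed)  # ['GENE_B', 'GENE_D']

MIN_RECALL = 0.9  # arbitrary threshold chosen for this sketch
if recall < MIN_RECALL:
    print("Positive control failed: known alternatively spliced genes were missed")
```

Note that the extra genes in `detected` (GENE_X, GENE_Y) say nothing either way. Deciding whether they are real or noise is the job of the negative control.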

It's not easy to identify this set of genes; that's why I admire the effort made by a graduate student (soon to be Ph.D.) at the University of British Columbia, Shams Bhuiyan, who tried very hard to comb the literature to come up with some gold standards to serve as positive controls (Bhuiyan, 2018). His efforts were not very successful because there aren't very many of these genuine examples. This is a problem for the field of alternative splicing but most workers ignore it.

This brings me to a recent paper that caught my eye:

Uebbing, S., Gockley, J., Reilly, S.K., Kocher, A.A., Geller, E., Gandotra, N., Scharfe, C., Cotney, J. and Noonan, J.P. (2021) Massively parallel discovery of human-specific substitutions that alter neurodevelopmental enhancer activity. Proc. Natl. Acad. Sci. (USA) 118: e2007049118. [doi: 10.1073/pnas.2007049118]

Genetic changes that altered the function of gene regulatory elements have been implicated in the evolution of human traits such as the expansion of the cerebral cortex. However, identifying the particular changes that modified regulatory activity during human evolution remain challenging. Here we used massively parallel enhancer assays in neural stem cells to quantify the functional impact of >32,000 human-specific substitutions in >4,300 human accelerated regions (HARs) and human gain enhancers (HGEs), which include enhancers with novel activities in humans. We found that >30% of active HARs and HGEs exhibited differential activity between human and chimpanzee. We isolated the effects of human-specific substitutions from background genetic variation to identify the effects of genetic changes most relevant to human evolution. We found that substitutions interacted in both additive and nonadditive ways to modify enhancer function. Substitutions within HARs, which are highly constrained compared to HGEs, showed smaller effects on enhancer activity, suggesting that the impact of human-specific substitutions is buffered in enhancers with constrained ancestral functions. Our findings yield insight into how human-specific genetic changes altered enhancer function and provide a rich set of candidates for studies of regulatory evolution in humans.

This is a very complicated set of experiments using techniques that I'm not familiar with. I suspect that there are only a few hundred scientists in the entire world who can read this paper and understand exactly what was done and whether the experiments were performed correctly. I imagine that there are even fewer who can evaluate the results in the proper context.

The objective is to identify mutations in the human genome that are responsible for making us different from our ancestors, notably the common ancestor we share with chimps. The authors assume, correctly, that these differences are likely to reside in regulatory sequences. They focused on regions of the genome that have been previously identified as the sites of chromatin modifications and/or transcription factor binding sites. They then narrowed down the search by choosing only those sites that showed either accelerated changes in the human lineage (1,363 HARs) or increased enhancer activities in humans (3,027 HGEs).

All of these sites, plus their chimp counterparts, were linked to reporter genes and the constructs were assayed for their ability to drive transcription of the reporter gene in cultures of human neural stem cells. Those cells were chosen because the authors expect a lot of human-specific changes in brain cells as opposed to other tissues. (That's not a reasonable assumption and, furthermore, it looks like brain cells have a lot more spurious transcription than cells from any other tissue except testes.)

They found that only 12% of their HARs were active in this assay and only 34% of HGEs were active. That's interesting but it doesn't tell us a lot; for example, it doesn't tell us whether any of these sites are biologically significant because we don't have the results of Sean Eddy's Random Genome Project to tell us how many of ENCODE's sites are significant. We know that some small fraction of random DNA sequences have enhancer activity and we know that this fraction increases when you select for stretches of DNA that are known to bind transcription factors. What that means is that many of these sites are not real regulatory sequences but we don't know which ones are real and which ones are spurious.

Next, they focused on those sites that showed differential expression of the reporter genes when you compared the chimp and human versions. About 3% of all HARs and 12% of all HGEs fell into this category. Then they looked at the specific nucleotide differences to see if they were responsible for the differential expression and they found some examples, but most of them were modest changes (less than 2-fold). Here's the conclusion:

We identified 424 HARs and HGEs with human-specific changes in enhancer activity in human neural stem cells, as well as individual sequence changes that contribute to those regulatory innovations. These findings now enable detailed experimental analyses of candidate loci underlying the evolution of the human cortex, including in humanized cellular models and humanized mice. Comprehensive studies of the HARs and HGEs we have uncovered here, both individually and in combination, will provide novel and fundamental insights into uniquely human features of the brain.
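As a rough cross-check, the figures quoted in this post can be run through a few lines of arithmetic. Everything below uses only numbers already given above. Because the percentages are rounded, the derived counts are approximate.

```python
# Counts and percentages as quoted in the post (percentages are rounded).
N_HAR, N_HGE = 1363, 3027
ACTIVE_PCT = {"HAR": 12, "HGE": 34}       # % active in the reporter assay
DIFFERENTIAL_PCT = {"HAR": 3, "HGE": 12}  # % of all elements, human vs chimp
REPORTED_DIFFERENTIAL = 424               # from the authors' conclusion


def approx_count(total, percent):
    """Approximate number of elements implied by a rounded percentage."""
    return round(total * percent / 100)


totals = {"HAR": N_HAR, "HGE": N_HGE}
active = {k: approx_count(totals[k], ACTIVE_PCT[k]) for k in totals}
differential = {k: approx_count(totals[k], DIFFERENTIAL_PCT[k]) for k in totals}

print(active)                      # {'HAR': 164, 'HGE': 1029}
print(differential)                # {'HAR': 41, 'HGE': 363}
print(sum(differential.values()))  # 404, close to the reported 424
share = REPORTED_DIFFERENTIAL / sum(active.values())
print(f"{share:.1%}")              # 35.5%
```

The gap between 404 and 424 is plausibly just the effect of rounded percentages, and 424 out of roughly 1,193 active elements is consistent with the abstract's ">30%". None of this arithmetic says how many of those elements would still look active against a random-DNA baseline.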

This is a typical ENCODE-type conclusion. It leaves all the hard work to others. But here's the rub. How many labs are willing to take one of those 424 candidates and devote money, graduate students, and post-docs to finding out whether it is really a regulatory site? I bet there are very few because, like the rest of us, they are so skeptical of the result that they are unwilling to risk their careers on it.

The experiments conducted by Uebbing et al. lack proper controls. There are times when simple data collection experiments are justified and there are times when additional genomics survey experiments are useful but as we enter 2021 we need to recognize that those times are behind us. The time has come to sort the wheat from the chaff and that means calling a halt to publishing experiments that can't be meaningfully interpreted.

Image Credit: The control flowchart is from ErrantScience.com.

Bhuiyan, S.A., Ly, S., Phan, M., Huntington, B., Hogan, E., Liu, C.C., Liu, J. and Pavlidis, P. (2018) Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics 19: 637. [doi: 10.1186/s12864-018-5013-2]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) 110: 5294-5300. [doi: 10.1073/pnas.1221376110]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology 23: R259-R261. [doi: 10.1016/j.cub.2013.03.023]

1 comment :

William Spearshake said (Friday, January 08, 2021 9:52:00 PM): The importance of controls is not limited to the field of “experimentation”. My field is routine analytical chemistry. Things like lead content in drinking water. In addition to the calibration standards we run, we also run controls such as an independent reference material, blanks and spikes. In addition, we will often measure the analyte on several lines (wavelengths) to rule out background interferences.
