Sandwalk: The Importance of the Null Hypothesis

Friday, May 25, 2012

The Importance of the Null Hypothesis

Jonathan Eisen of The Tree of Life is hosting a series of guest postings by the authors of recently published papers. The latest is a guest post by Josh Weitz on their paper in BMC Genomics: A neutral theory of genome evolution and the frequency distribution of genes. The paper tries to explain the concept of a pan genome, where closely related species, or strains, each have a subset of the total number of genes in the entire collection of species/strains. Why do some strains have some genes and not others?

Josh Weitz makes a point that bears repeating because most people just don't understand it.

So, let me be clear: I do think that genes matter to the fitness of an organism and that if you delete/replace certain genes you will find this can have mild to severe to lethal costs (and occasional benefits). However, our point in developing this model was to try and create a baseline null model, in the spirit of neutral theories of population genetics, that would be able to reproduce as much of the data with as few parameters as possible. Doing so would then help identify what features of gene compositional variation could be used as a means to identify the signatures of adaptation and selection. Perhaps this point does not even need to be stated, but obviously not everyone sees it the same way. In fact, Eugene Koonin has made a similar argument in his nice paper, Are there laws of adaptive evolution: "the null hypothesis is that any observed pattern is first assumed to be the result of non-selective, stochastic processes, and only once this assumption is falsified, should one start to explore adaptive scenarios". I really like this quote, even if I don't always follow this rule (perhaps I should). It's just so tempting to explore adaptive scenarios first, but it doesn't make it right.

37 comments :

Joe Felsenstein said...: ... and if we set the bar high enough for proof that natural selection is involved, our wonderful Null Hypothesis will get rid of natural selection altogether. Wow.; Friday, May 25, 2012 3:04:00 PM
Larry Moran said...: Exactly. If you can't rule out chance then why waste time on just-so stories?

If everyone followed this rule, evolutionary psychology would practically disappear.; Friday, May 25, 2012 4:25:00 PM
konrad said...: Actually, the trend (at the molecular level) is in the opposite direction. Selective pressure is never _exactly_ neutral, and as we get more data and develop better methods we are getting better at identifying small deviations from neutrality. People used to look for positive selection that is strong enough to elevate the rate of nonsynonymous substitution averaged over all sites in a gene and over all time in a phylogeny. Then they started looking at individual sites (while still averaging over time) and found many more examples of positive selection (as you would expect). We recently developed a method (I'd link to the paper but it's still in press at PLoS Genetics) that looks for selection acting at individual sites on only a subset of lineages and (again as you would expect) we again find many more examples of positive selection.

Of course, this does not mean that most phenotypes which rise to fixation are adaptive.; Friday, May 25, 2012 4:57:00 PM
Joe Felsenstein said...: Larry: Exactly. If you can't rule out chance then why waste time on just-so stories?

... or even on hypotheses that have some evidence for them. The all-powerful Null Hypothesis wins every time!

konrad: Selective pressure is never _exactly_ neutral, ... Watch out, "konrad", Larry's invincible Null Hypothesis is that it is exactly zero.; Friday, May 25, 2012 5:07:00 PM
konrad said...: Not a problem - that is the hypothesis being tested. One _could_ argue that setting up a test for a hypothesis that should really get a prior of zero (selective pressure being a real-valued parameter, and 0.999999999999 being unequal to 1) is a Bad Idea (the alternative is just to measure selective pressure and report a confidence interval), but it's a standard thing to do; in a hypothesis testing framework even Larry is accommodated.

(Why are you putting my name in quotes? It's my real name. http://www.cs.sun.ac.za/~kscheffler/ if you _must_ know.); Friday, May 25, 2012 5:50:00 PM
Joe Felsenstein said...: (Why are you putting my name in quotes? It's my real name. http://www.cs.sun.ac.za/~kscheffler/ if you _must_ know.)

Your name was in lower case. So I couldn't know whether it was your real name. I was not intending to imply that you were misrepresenting yourself.; Friday, May 25, 2012 6:08:00 PM
W. Benson said...: Suppose a bright showy butterfly flies through the yard; what experiment should I conduct to falsify the null hypothesis that the color pattern has no adaptive function? Should I design the experiment differently if the butterfly were rather clumsy and flying slowly than if it were agile and evasive? Should I proceed differently if the yard was in Brazil and not Canada? Justify your answer logically, but without referring to post-1858 concepts or making up a just-so story.; Friday, May 25, 2012 9:16:00 PM
Rosie Redfield said...: Boring but ubiquitous process B is known to affect phenotype P, but some researchers claim that process P is caused by a postulated glamorous process G.

A reasonable null hypothesis is that phenotype P is entirely due to process B. We are only justified in citing P as evidence of process G if it cannot be explained by B.; Saturday, May 26, 2012 12:28:00 AM
Steven Jenkins said...: Characterizations like "boring" and "glamorous" have no meaning in statistical inference testing, and nothing about the null hypothesis requires it to be the "no effect" or "boring" hypothesis. Given any test statistic, the null hypothesis is simply a set of assumptions sufficient to calculate a distribution for that statistic, and then reject or fail to reject based on observed data.

"Phenotype P is entirely due to process G" is also in principle a reasonable null hypothesis. If, based on data, we can't reject one and not the other, we have more work to do.

I have no opinion on the biology, just the statistics. :-); Saturday, May 26, 2012 1:14:00 AM
Steven Jenkins said...: Or to be more succinct, the null hypothesis is not necessarily the most scientifically parsimonious.; Saturday, May 26, 2012 1:18:00 AM
qetzal said...: Steven Jenkins,

You've focused on the wrong adjectives. It's not the "boring" vs. "glamorous" that disqualifies "Phenotype P is entirely due to process G" as a reasonable null hypothesis. Process B is "ubiquitous" (i.e. well-established), while Process G is only "postulated" (i.e. not even proven to exist).; Saturday, May 26, 2012 9:24:00 AM
Anonymous said...: "The null hypothesis is that any observed pattern is first assumed to be the result of non-selective, stochastic processes"

How many stochastic processes? Leading to an infinite number of null hypotheses? And selection itself cannot be stochastic?

If one looks at good examples of selection, the argument is based on understanding of how processes work, not on statistics without any biological background.; Saturday, May 26, 2012 10:33:00 AM
DK said...: You will see that a lot of biologists don't understand what exactly "null hypothesis" means. They think that, instead of some default that we chose, it is something written in stone. E.g., Dawkins in comments on this blog claimed that neutrality *must* be null hypothesis:

"It is not MY null hypothesis. That isn't how null hypotheses work. It is THE null hypothesis, that which an experiment seeks to disprove. Perhaps you are unfamiliar with the technical term 'null hypothesis'?"

http://sandwalk.blogspot.com/2011/02/dawkins-darwin-drift-and-neutral-theory.html

I did reply to him that null is formulated based on available knowledge but got no response.; Saturday, May 26, 2012 10:49:00 AM
Steven Jenkins said...: "Reasonable" is the word that matters. Biologically reasonable and statistically reasonable are not the same thing. Clearly, one hypothesis is more parsimonious, and therefore more biologically reasonable. But they are equally valid as statistical null hypotheses, because nothing whatever in the actual decision process hinges on parsimony.; Saturday, May 26, 2012 1:32:00 PM
Jonathan Badger said...: Obviously, statistics should never be applied without thinking about what you are testing. But on the other hand, all statistical models are just that - models - and therefore are of course flawed representations of the world that can be criticized for lack of realism (assuming exactly zero change in fitness and the like). Presumably those who argue against null hypothesis testing have a preferred technique and are not suggesting that we go back to the fuzzy days of qualitative science with no statistical support. So, what is this alternative statistical method to replace null hypothesis testing?; Saturday, May 26, 2012 1:50:00 PM
Richard Edwards said...: I am a big fan of applying statistical tests where possible and it really annoys me when people either forget or (worse) ignore statistics. That said, however, it has to be remembered that statistics are (a) pretty arbitrary and (b) highly dependent on underlying assumptions. There is nothing magic about a 5% probability and a non-significant result does not always mean that the null hypothesis is right, just that you cannot (currently) rule it out with the chosen level of confidence, assuming your underlying model is right. Remember that there are "Type II" statistical errors - incorrectly accepting the null hypothesis.

Also, we should not confuse statistical significance with biological significance. They are not synonymous. Sometimes, it is worth following up a potentially biologically significant result, even when it isn't statistically significant. You may have too much noise in your system (or insufficient data) to get a statistically robust answer and therefore need to look to additional experiments or evidence - or just repeat the analysis with more data. So, yes, don't forget the null hypothesis... but also don't forget the power of the statistical test, and that failure to reject the null hypothesis does not mean that the null hypothesis is right. (The "argument from ignorance" logical fallacy.); Saturday, May 26, 2012 6:14:00 PM
David Winter said...: You think it's "not a problem" to test a hypothesis that no one believes, and then pretend you've got a meaningful result when a hypothesis you didn't believe is rejected.

It might be "standard", but it does seem very odd doesn't it?; Saturday, May 26, 2012 8:18:00 PM
qetzal said...: Sure. From a purely statistical perspective, any hypothesis can be treated as the null.

But we don't do science from a purely statistical perspective. We do it from the perspective of our current knowledge. So when we're talking about using statistics to try to infer new scientific knowledge (as Rosie was), it is NOT reasonable to prefer a postulated process as the null over a ubiquitous one that has the known ability to cause the outcome being studied.; Sunday, May 27, 2012 8:50:00 AM
Joe Felsenstein said...: Well, Bayesian inference, for example. Or a strict likelihoodist view of someone like Anthony Edwards in his book Likelihood. Also maybe a decision theory machinery involving minimizing some loss function. None of those involves having a null hypothesis, just a set of hypotheses.

The issue here was whether neutrality gets to be the null hypothesis. When we are testing drugs to cure cancer, and want to be very cautious about concluding that they work, that seems to be different case than when we try to get a picture of how much natural selection there is. In the natural selection case it is not a disaster if we make occasional mistakes about which places in the genome are under selection, as long as we get the right overall conclusion about how much natural selection there is, since that is what we are trying to infer. But in the drug case we want to get it right for each individual drug, because it will get widely used if it is thought to be effective.; Sunday, May 27, 2012 10:26:00 AM
DK said...: @Jonathan Badger:

In Bayesian world, null hypothesis does not exist.; Sunday, May 27, 2012 11:30:00 AM
Michael M said...: I think that, if someone explained what genetic drift and natural selection are, it would become immediately clear why genetic drift is the null hypothesis.; Sunday, May 27, 2012 1:39:00 PM
Jonathan Badger said...: Okay, but in both AWF Edwards' method (I have the book, but it was fairly tough going last time I tried to work through it) and in Bayesian methods you are still comparing models. Wouldn't one of them still be "no selection"? Are we just talking about doing the stats in a different way or is it more than that? Are there equivalents to the selection detection program PAML already implemented using these alternatives?; Sunday, May 27, 2012 3:09:00 PM
Joe Felsenstein said...: I'm confused by this comment. There are detailed explanations of the math here. Is that what you are asking for? Or do you imagine that all of us in the discussion here don't know about them?; Sunday, May 27, 2012 3:39:00 PM
Michael M said...: I'm saying that, if evolution is a stochastic process (as Larry has said it is before), one is always going to get a distribution of possible allele frequencies, regardless of any bias toward the reproduction of one phenotype. Therefore, claiming that a change in allele frequency is due to natural selection is a bit nonsensical, because the probability of a change in allele frequency is far greater than the probability of no change in allele frequency, even if there is no bias toward the reproduction of a particular phenotype.; Sunday, May 27, 2012 6:08:00 PM
DK said...: Wouldn't one of them still be "no selection"?

It would. Or it can be. Or, for people with certain beliefs about reality, it wouldn't. The simple point is that there is no such thing as THE null hypothesis. Other points are more complex.; Sunday, May 27, 2012 9:26:00 PM
DK said...: one is always going to get a distribution of possible allele frequencies, regardless of any bias toward the reproduction of one phenotype.

With a strong selection, that distribution will be delta function.; Sunday, May 27, 2012 10:00:00 PM
Lou Jost said...: Jonathan, the alternative to null hypothesis testing is parameter estimation, with confidence intervals around your estimate of the parameter. In most cases, nobody cares about the null hypothesis that a character is exactly neutral. Any interesting character is highly unlikely to be exactly neutral, to 10 decimal places. A better question is "How close to neutral is it?" So we find a meaningful parameter (like the selection coefficient s) and try to measure it experimentally or in the field. That parameter estimate (with confidence interval) is what we really want to know, and is much more interesting and informative than a darn p-value on a null hypothesis.; Monday, May 28, 2012 1:14:00 AM
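The estimate-plus-interval approach described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survival counts are invented, the function name `selection_coefficient_ci` is hypothetical, s is taken to be 1 minus the relative survival of genotype B compared with genotype A, and the interval uses the common normal approximation on the log of a ratio of two proportions. None of this comes from the paper or the commenters.

```python
import math

def selection_coefficient_ci(survivors_a, total_a, survivors_b, total_b, z=1.96):
    """Estimate s = 1 - w_B / w_A with an approximate 95% interval.

    w_A and w_B are survival proportions of two genotypes. The interval
    uses the normal approximation on the log of the ratio of proportions.
    """
    relative_fitness = (survivors_b / total_b) / (survivors_a / total_a)
    # Standard error of log(relative_fitness) for a ratio of two proportions.
    se_log = math.sqrt(1 / survivors_b - 1 / total_b
                       + 1 / survivors_a - 1 / total_a)
    log_ratio = math.log(relative_fitness)
    ratio_low = math.exp(log_ratio - z * se_log)
    ratio_high = math.exp(log_ratio + z * se_log)
    # A high fitness ratio means a low s, so the bounds swap places.
    return 1 - relative_fitness, 1 - ratio_high, 1 - ratio_low

# Hypothetical counts: 480 of 1000 A individuals survive, 450 of 1000 B.
s_hat, s_low, s_high = selection_coefficient_ci(480, 1000, 450, 1000)
print(f"s = {s_hat:.3f}, approx. 95% CI [{s_low:.3f}, {s_high:.3f}]")
```

With these made-up counts the interval spans zero, so a test of exact neutrality would not reject. The interval still says something a bare p-value does not: which values of s the data are compatible with.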
Joe Felsenstein said...: If there is strong enough selection, the shift on the mean of the distribution of gene frequencies over time will be strong enough to be detectable. And of course the same is true for quantitative characters.; Monday, May 28, 2012 8:10:00 AM
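The exchange above can be illustrated with a toy Wright-Fisher simulation. It is a minimal sketch, assuming a haploid population of 100, a starting frequency of 0.5, 50 generations, and a selection coefficient of 0.1. All of these values and function names are chosen for illustration and are not taken from the discussion. The aim is to show two things: frequencies change under drift alone, and selection shifts the mean of the distribution.

```python
import random

def next_generation(freq, pop_size, s, rng):
    """One Wright-Fisher generation for a haploid population."""
    # Deterministic shift from selection (s = 0 means pure drift).
    expected = freq * (1.0 + s) / (1.0 + s * freq)
    # Binomial sampling of pop_size offspring supplies the drift.
    copies = sum(1 for _ in range(pop_size) if rng.random() < expected)
    return copies / pop_size

def final_frequencies(s, pop_size=100, generations=50, replicates=100, seed=1):
    """Final allele frequency in each replicate, starting from 0.5."""
    rng = random.Random(seed)
    finals = []
    for _ in range(replicates):
        freq = 0.5
        for _ in range(generations):
            freq = next_generation(freq, pop_size, s, rng)
        finals.append(freq)
    return finals

def mean(values):
    return sum(values) / len(values)

neutral = final_frequencies(s=0.0)
selected = final_frequencies(s=0.1)
changed = sum(1 for f in neutral if f != 0.5) / len(neutral)
print(f"neutral: mean={mean(neutral):.2f}, fraction changed={changed:.2f}")
print(f"s=0.1:   mean={mean(selected):.2f}")
```

In the neutral runs nearly every replicate ends away from 0.5, yet the mean across replicates stays near the starting value. With selection the whole distribution moves, which is the detectable shift described in the comment.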
Larry Moran said...: You could falsify the null hypothesis by showing that the coloration is adaptive.

Here's the problem. If experiment after experiment fails to show that the showiness actually confers a selective advantage then you can't just assume that natural selection is the explanation. Whenever you publish a paper you have to state clearly in the introduction that you have not eliminated the possibility that the showiness is just an accident of evolution.; Monday, May 28, 2012 9:01:00 AM
Larry Moran said...: Joe Felsenstein writes,

The issue here was whether neutrality gets to be the null hypothesis. When we are testing drugs to cure cancer, and want to be very cautious about concluding that they work, that seems to be different case than when we try to get a picture of how much natural selection there is.

I don't see why there should be a difference. You are free to postulate that natural selection is the cause of some phenotype but if you can't demonstrate adaptation then you can't eliminate the possibility that your postulated cause isn't working.

In the natural selection case it is not a disaster if we make occasional mistakes about which places in the genome are under selection, as long as we get the right overall conclusion about how much natural selection there is, since that is what we are trying to infer.

No, actually in most cases scientists are trying to work out the cause of a particular phenotype. It's not a disaster if scientists falsely speculate that the differences between Indian and African rhinos are due to adaptation but it is a scientific disaster if they completely ignore any other possibility.; Monday, May 28, 2012 9:25:00 AM
Hawks said...: No, actually in most cases scientists are trying to work out the cause of a particular phenotype. It's not a disaster if scientists falsely speculate that the differences between Indian and African rhinos are due to adaptation but it is a scientific disaster if they completely ignore any other possibility.

It seems to me that starting by assuming drift rather than starting by assuming selection ignores the other possibility to the same extent. So, why not start by actually "trying to work out the cause of a particular phenotype"? So before comparing what is expected under drift versus what is expected under selection, none of the hypotheses are assumed.; Monday, May 28, 2012 3:51:00 PM
Joe Felsenstein said...: It seems to me that you (Larry) are trying to generalize about how much evolution is due to neutrality and how much to selection. In which case starting with neutrality and calling it the Null Hypothesis and then being very skeptical of evidence for natural selection will give you a biased picture.; Tuesday, May 29, 2012 10:41:00 AM
Joshua Weitz said...: My blog post seems to have, in part, kick-started this lively debate that has gone off in a number of other directions. Hence, I wanted to add one clarifying point: the original blog and BMC Genomics article on which it is based do not claim that failure to reject the neutral model implies therefore that evolution is entirely neutral. From our conclusion:

"In the present case the excellent fits obtained, e.g., for our flexible core model (model D), should not be interpreted as an indication for the validity of its assumptions. Rather, our analysis shows that gene frequency distributions do not contain sufficient information for the inference of evolutionary mechanisms underlying the observed distributions."

Hope this is of some help to this discussion.; Tuesday, May 29, 2012 11:32:00 AM
konrad said...: Not really - rejecting the hypothesis means that there is _significant_ evidence against it. This is meaningful for two reasons: 1) we can be confident that the observed signal is not just due to noise; 2) in practice, if the signal is strong enough to give statistical significance, the effect size is not small. So the sites at which we are able to reject neutrality really are different from the others.

Also bear in mind that we are not so much interested in whether the null hypothesis of neutrality is rejected (which would not tell us whether we're dealing with adaptive or purifying selection, or a mixture of the two), but in finding evidence for a _particular_ type of selection.; Tuesday, May 29, 2012 2:34:00 PM
Richard Edwards said...: You have to be a little careful here. Rejecting the Null Hypothesis means that there is significant evidence that one or more of the underlying assumptions of the Null Hypothesis are wrong. It may not be the one you think, though. If, for example, you also assume uniform mutation across sites, it might be this uniformity that is violated, not the neutrality.

Also, in the modern "post-genomic" age, I am not sure how safe it is to say that "in practice, if the signal is strong enough to give statistical significance, the effect size is not small". If your dataset is very large, the effect can be tiny and still be significant. The bioinformatics literature is riddled with people finding significant but tiny effects that are essentially useless if you want to apply them to individual situations because the effect is so small. If you really trust your model, they might tell you something about nature. Often, though, I suspect they are probably due to small random deviations away from the Null hypothesis assumptions.; Wednesday, May 30, 2012 1:20:00 PM
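The point about very large datasets can be made concrete with a small calculation. This is a sketch only: it assumes a normal-approximation z-test of a proportion, a null value of 0.5, and a fixed observed fraction of 0.501. All numbers are invented for illustration, and the function name `two_sided_p_value` is hypothetical.

```python
import math

def two_sided_p_value(observed_fraction, null_fraction, n):
    """Two-sided p-value from a normal-approximation z-test of a proportion."""
    standard_error = math.sqrt(null_fraction * (1.0 - null_fraction) / n)
    z = (observed_fraction - null_fraction) / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same tiny effect (0.501 versus 0.5) at increasing sample sizes.
for n in (1_000, 100_000, 10_000_000):
    p = two_sided_p_value(0.501, 0.5, n)
    print(f"n={n:>10,}  effect=0.001  p={p:.3g}")
```

The effect size never changes, but the p-value goes from unremarkable to extremely small as n grows. That is why significance alone says little about how large or useful an effect is.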
Richard Edwards said...: Oops! When I wrote "small random deviations away from the Null hypothesis assumptions", I meant "small boring deviations"... Obviously, they need to be consistent to result in significance.; Wednesday, May 30, 2012 1:23:00 PM
konrad said...: Of course you are right about the interpretation - there is always the caveat that the results are reliable only insofar as the model assumptions are believable. That is why we are always striving to make the assumptions more biologically realistic. For instance, a key point of the study I'm referring to is that we are removing the assumption that rates are constant over time. (Not assuming that rates are constant over sites is already pretty standard.)

By "in practice" I meant "in the cases that are actually being analyzed at present or will be in the near future". I completely agree that significance becomes uninteresting when the effect size is too small; that just doesn't happen to be the case in this field at present.; Wednesday, May 30, 2012 2:06:00 PM

Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.

Charles Darwin (c1880)

Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)

Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...

Peter Atkins

Quotations

The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake. Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory. Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change. Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance. Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change. Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat. Stephen Jay Gould (1999) p.84

Quotations

My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True

I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000

It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations

Douglas Futuyma

One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick

There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner

An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist

Richard Dawkins

Another curious aspect of the theory of evolution is that everybody thinks he understands it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod

The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.

Richard Lewontin
