Friday, May 25, 2012

The Importance of the Null Hypothesis

Jonathan Eisen of The Tree of Life is hosting a series of guest postings by the authors of recently published papers. The latest is a guest post by Josh Weitz on their paper in BMC Genomics: A neutral theory of genome evolution and the frequency distribution of genes. The paper tries to explain the concept of a pan genome, where closely related species, or strains, each have a subset of the total number of genes in the entire collection of species/strains. Why do some strains have some genes and not others?

Josh Weitz makes a point that bears repeating because most people just don't understand it.
So, let me be clear: I do think that genes matter to the fitness of an organism and that if you delete/replace certain genes you will find this can have mild to severe to lethal costs (and occasional benefits). However, our point in developing this model was to try and create a baseline null model, in the spirit of neutral theories of population genetics, that would be able to reproduce as much of the data with as few parameters as possible. Doing so would then help identify what features of gene compositional variation could be used as a means to identify the signatures of adaptation and selection. Perhaps this point does not even need to be stated, but obviously not everyone sees it the same way. In fact, Eugene Koonin has made a similar argument in his nice paper, Are there laws of adaptive evolution: "the null hypothesis is that any observed pattern is first assumed to be the result of non-selective, stochastic processes, and only once this assumption is falsified, should one start to explore adaptive scenarios". I really like this quote, even if I don't always follow this rule (perhaps I should). It's just so tempting to explore adaptive scenarios first, but it doesn't make it right.


37 comments:

  1. ... and if we set the bar high enough for proof that natural selection is involved, our wonderful Null Hypothesis will get rid of natural selection altogether. Wow.

    1. Exactly. If you can't rule out chance then why waste time on just-so stories?

      If everyone followed this rule, evolutionary psychology would practically disappear.

    2. Actually, the trend (at the molecular level) is in the opposite direction. Selective pressure is never _exactly_ neutral, and as we get more data and develop better methods we are getting better at identifying small deviations from neutrality. People used to look for positive selection that is strong enough to elevate the rate of nonsynonymous substitution averaged over all sites in a gene and over all time in a phylogeny. Then they started looking at individual sites (while still averaging over time) and found many more examples of positive selection (as you would expect). We recently developed a method (I'd link to the paper but it's still in press at PLoS Genetics) that looks for selection acting at individual sites on only a subset of lineages and (again as you would expect) we again find many more examples of positive selection.

      Of course, this does not mean that most phenotypes which rise to fixation are adaptive.
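The rate comparison konrad describes is usually summarized as the ratio omega = dN/dS of nonsynonymous to synonymous substitutions per site. A minimal sketch, assuming the numbers of sites and observed differences for a pair of sequences have already been counted (the counting step is omitted) and applying the Jukes-Cantor correction; the function `omega` and the counts are hypothetical illustrations, not the method from the paper konrad mentions:

```python
import math

def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def omega(n_sites, s_sites, n_diffs, s_diffs):
    """Return dN/dS from site counts and observed differences (hypothetical inputs)."""
    d_n = jukes_cantor(n_diffs / n_sites)  # nonsynonymous substitutions per site
    d_s = jukes_cantor(s_diffs / s_sites)  # synonymous substitutions per site
    return d_n / d_s

# Under neutrality (and the model's other assumptions) omega is expected near 1;
# omega < 1 suggests purifying selection, omega > 1 positive selection.
print(omega(300, 100, 30, 10))  # 1.0: same proportion differing in both classes
print(omega(300, 100, 15, 10))  # below 1: fewer nonsynonymous differences
```

Averaging over all sites and lineages, as in the older tests konrad mentions, dilutes a few strongly selected sites among many constrained ones, which is why site-level and lineage-level methods find more positive selection.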

  2. Larry: Exactly. If you can't rule out chance then why waste time on just-so stories?

    ... or even on hypotheses that have some evidence for them. The all-powerful Null Hypothesis wins every time!

    konrad: Selective pressure is never _exactly_ neutral, ... Watch out, "konrad", Larry's invincible Null Hypothesis is that it is exactly zero.

    1. Not a problem - that is the hypothesis being tested. One _could_ argue that setting up a test for a hypothesis that should really get a prior of zero (selective pressure being a real-valued parameter, and 0.999999999999 being unequal to 1) is a Bad Idea (the alternative is just to measure selective pressure and report a confidence interval), but it's a standard thing to do; in a hypothesis testing framework even Larry is accommodated.

      (Why are you putting my name in quotes? It's my real name. http://www.cs.sun.ac.za/~kscheffler/ if you _must_ know.)

    2. You think it's "not a problem" to test a hypothesis that no one believes, and then pretend you've got a meaningful result when a hypothesis you didn't believe is rejected.

      It might be "standard", but it does seem very odd, doesn't it?

    3. Not really - rejecting the hypothesis means that there is _significant_ evidence against it. This is meaningful for two reasons: 1) we can be confident that the observed signal is not just due to noise; 2) in practice, if the signal is strong enough to give statistical significance, the effect size is not small. So the sites at which we are able to reject neutrality really are different from the others.

      Also bear in mind that we are not so much interested in whether the null hypothesis of neutrality is rejected (which would not tell us whether we're dealing with adaptive or purifying selection, or a mixture of the two), but in finding evidence for a _particular_ type of selection.

    4. You have to be a little careful here. Rejecting the Null Hypothesis means that there is significant evidence that one or more of the underlying assumptions of the Null Hypothesis are wrong. It may not be the one you think, though. If, for example, you also assume uniform mutation across sites, it might be this uniformity that is violated, not the neutrality.

      Also, in the modern "post-genomic" age, I am not sure how safe it is to say that "in practice, if the signal is strong enough to give statistical significance, the effect size is not small". If your dataset is very large, the effect can be tiny and still be significant. The bioinformatics literature is riddled with people finding significant but tiny effects that are essentially useless if you want to apply them to individual situations because the effect is so small. If you really trust your model, they might tell you something about nature. Often, though, I suspect they are probably due to small random deviations away from the Null hypothesis assumptions.
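The point about large datasets can be made concrete with a one-sample test of a proportion. A minimal sketch, assuming a normal-approximation z-test and made-up counts (nothing here comes from a real genomic dataset); `proportion_z_test` is a hypothetical helper. The observed effect is identical in both cases and only the sample size differs:

```python
import math

def proportion_z_test(k, n, p0=0.5):
    """Two-sided z-test (normal approximation) of H0: proportion == p0.

    Returns (effect, z, p_value), where effect = k/n - p0.
    """
    effect = k / n - p0
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = effect / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return effect, z, p_value

# The same small effect (0.505 vs 0.5) at two sample sizes.
for k, n in [(505, 1_000), (5_050_000, 10_000_000)]:
    effect, z, p = proportion_z_test(k, n)
    print(f"n={n:>10}  effect={effect:.3f}  z={z:6.2f}  p={p:.3g}")
```

With 1,000 observations the deviation is nowhere near significant; with ten million, the same half-percentage-point deviation yields a vanishingly small p-value even though its size has not changed.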

    5. Oops! When I wrote "small random deviations away from the Null hypothesis assumptions", I meant "small boring deviations"... Obviously, they need to be consistent to result in significance.

    6. Of course you are right about the interpretation - there is always the caveat that the results are reliable only insofar as the model assumptions are believable. That is why we are always striving to make the assumptions more biologically realistic. For instance, a key point of the study I'm referring to is that we are removing the assumption that rates are constant over time. (Not assuming that rates are constant over sites is already pretty standard.)

      By "in practice" I meant "in the cases that are actually being analyzed at present or will be in the near future". I completely agree that significance becomes uninteresting when the effect size is too small; that just doesn't happen to be the case in this field at present.

  3. (Why are you putting my name in quotes? It's my real name. http://www.cs.sun.ac.za/~kscheffler/ if you _must_ know.)

    Your name was in lower case, so I couldn't know whether it was your real name. I was not intending to imply that you were misrepresenting yourself.

  4. Suppose a bright showy butterfly flies through the yard; what experiment should I conduct to falsify the null hypothesis that the color pattern has no adaptive function? Should I design the experiment differently if the butterfly were rather clumsy and flying slowly than if it were agile and evasive? Should I proceed differently if the yard was in Brazil and not Canada? Justify your answer logically, but without referring to post-1858 concepts or making up a just-so story.

    1. You could falsify the null hypothesis by showing that the coloration is adaptive.

      Here's the problem. If experiment after experiment fails to show that the showiness actually confers a selective advantage, then you can't just assume that natural selection is the explanation. Whenever you publish a paper, you have to state clearly in the introduction that you have not eliminated the possibility that the showiness is just an accident of evolution.

  5. Boring but ubiquitous process B is known to affect phenotype P, but some researchers claim that phenotype P is caused by a postulated glamorous process G.

    A reasonable null hypothesis is that phenotype P is entirely due to process B. We are only justified in citing P as evidence of process G if it cannot be explained by B.

  6. Characterizations like "boring" and "glamorous" have no meaning in statistical inference testing, and nothing about the null hypothesis requires it to be the "no effect" or "boring" hypothesis. Given any test statistic, the null hypothesis is simply a set of assumptions sufficient to calculate a distribution for that statistic, and then reject or fail to reject based on observed data.

    "Phenotype P is entirely due to process G" is also in principle a reasonable null hypothesis. If, based on data, we can't reject one and not the other, we have more work to do.

    I have no opinion on the biology, just the statistics. :-)
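The statistical point, that a null hypothesis is just a set of assumptions fixing the distribution of a test statistic, can be shown with an exact binomial test. A minimal sketch with hypothetical data (6 "successes" in 10 trials) and a hypothetical helper `binomial_test`: the same observation is tested against two different nulls, p = 0.5 and p = 0.8, and nothing in the calculation depends on which one is more parsimonious.

```python
from math import comb

def binomial_test(k, n, p0):
    """Exact two-sided binomial test of H0: success probability == p0.

    The p-value sums the probabilities of all outcomes no more likely than k.
    """
    probs = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(p for p in probs if p <= observed * (1 + 1e-7))

# Same data, two different null hypotheses (hypothetical numbers).
k, n = 6, 10
for p0 in (0.5, 0.8):
    print(f"H0: p = {p0}  ->  p-value = {binomial_test(k, n, p0):.3f}")
```

Neither null is rejected at the 5% level (p-values of about 0.75 and 0.12), which is the "more work to do" situation: the data alone cannot choose between them.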

    1. You will see that a lot of biologists don't understand what exactly "null hypothesis" means. They think that, instead of being a default that we choose, it is something written in stone. E.g., Dawkins, in comments on this blog, claimed that neutrality *must* be the null hypothesis:

      "It is not MY null hypothesis. That isn't how null hypotheses work. It is THE null hypothesis, that which an experiment seeks to disprove. Perhaps you are unfamiliar with the technical term 'null hypothesis'?"

      http://sandwalk.blogspot.com/2011/02/dawkins-darwin-drift-and-neutral-theory.html

      I did reply to him that the null is formulated based on available knowledge, but got no response.

  7. Or to be more succinct, the null hypothesis is not necessarily the most scientifically parsimonious.

  8. Steven Jenkins,

    You've focused on the wrong adjectives. It's not the "boring" vs. "glamorous" that disqualifies "Phenotype P is entirely due to process G" as a reasonable null hypothesis. Process B is "ubiquitous" (i.e. well-established), while Process G is only "postulated" (i.e. not even proven to exist).

    1. "Reasonable" is the word that matters. Biologically reasonable and statistically reasonable are not the same thing. Clearly, one hypothesis is more parsimonious, and therefore more biologically reasonable. But they are equally valid as statistical null hypotheses, because nothing whatever in the actual decision process hinges on parsimony.

    2. Sure. From a purely statistical perspective, any hypothesis can be treated as the null.

      But we don't do science from a purely statistical perspective. We do it from the perspective of our current knowledge. So when we're talking about using statistics to try to infer new scientific knowledge (as Rosie was), it is NOT reasonable to prefer a postulated process as the null over a ubiquitous one that has the known ability to cause the outcome being studied.

  9. "The null hypothesis is that any observed pattern is first assumed to be the result of non-selective, stochastic processes"

    How many stochastic processes? Leading to an infinite number of null hypotheses? And selection itself cannot be stochastic?

    If one looks at good examples of selection, the argument is based on an understanding of how processes work, not on statistics without any biological background.

    1. Obviously, statistics should never be applied without thinking about what you are testing. But on the other hand, all statistical models are just that, models, and are therefore flawed representations of the world that can be criticized for lack of realism (assuming exactly zero change in fitness and the like). Presumably those who argue against null hypothesis testing have a preferred technique and are not suggesting that we go back to the fuzzy days of qualitative science with no statistical support. So, what is this alternative statistical method to replace null hypothesis testing?

    2. Well, Bayesian inference, for example. Or the strict likelihoodist view of someone like Anthony Edwards in his book Likelihood. Or perhaps decision-theoretic machinery that minimizes some loss function. None of those involves having a null hypothesis, just a set of hypotheses.

      The issue here was whether neutrality gets to be the null hypothesis. When we are testing drugs to cure cancer, and want to be very cautious about concluding that they work, that seems to be a different case than when we try to get a picture of how much natural selection there is. In the natural selection case it is not a disaster if we make occasional mistakes about which places in the genome are under selection, as long as we get the right overall conclusion about how much natural selection there is, since that is what we are trying to infer. But in the drug case we want to get it right for each individual drug, because it will get widely used if it is thought to be effective.
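The Bayesian alternative can be illustrated with the simplest possible comparison. A minimal sketch, assuming binomial data and just two hypothetical hypotheses, a point hypothesis p = 0.5 and a uniform prior on p; neither is singled out as "the" null. Each gets a prior weight (`prior_a`, an assumption of this sketch) and the data update both in the same way:

```python
from math import comb

def marginal_point(k, n, p):
    """Likelihood of k successes in n trials under a point hypothesis p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def marginal_uniform(n):
    """Marginal likelihood of any k when p has a uniform prior on [0, 1].

    Integrating comb(n, k) * p**k * (1 - p)**(n - k) over p gives 1 / (n + 1).
    """
    return 1 / (n + 1)

def posterior_probs(k, n, prior_a=0.5):
    """Posterior probabilities of H_a: p = 0.5 and H_b: p ~ Uniform(0, 1)."""
    weight_a = prior_a * marginal_point(k, n, 0.5)
    weight_b = (1 - prior_a) * marginal_uniform(n)
    total = weight_a + weight_b
    return weight_a / total, weight_b / total

for k, n in [(6, 10), (70, 100)]:
    pa, pb = posterior_probs(k, n)
    print(f"{k}/{n}: P(p = 0.5) = {pa:.2f}, P(p uniform) = {pb:.2f}")
```

With equal prior weights, 6 successes in 10 trials mildly favour p = 0.5 (posterior about 0.69), while 70 in 100 overwhelmingly favour the uniform alternative. The output is a graded weight of evidence for each hypothesis rather than a reject/fail-to-reject decision about one of them.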

    3. @Jonathan Badger:

      In the Bayesian world, there is no null hypothesis.

    4. Okay, but in both AWF Edwards' method (I have the book, but it was fairly tough going last time I tried to work through it) and in Bayesian methods you are still comparing models. Wouldn't one of them still be "no selection"? Are we just talking about doing the stats in a different way or is it more than that? Are there equivalents to the selection detection program PAML already implemented using these alternatives?

    5. Wouldn't one of them still be "no selection"?

      It would. Or it can be. Or, for people with certain beliefs about reality, it wouldn't. The simple point is that there is no such thing as THE null hypothesis. Other points are more complex.

    6. Jonathan, the alternative to null hypothesis testing is parameter estimation, with confidence intervals around your estimate of the parameter. In most cases, nobody cares about the null hypothesis that a character is exactly neutral. Any interesting character is highly unlikely to be exactly neutral, to 10 decimal places. A better question is "How close to neutral is it?" So we find a meaningful parameter (like the selection coefficient s) and try to measure it experimentally or in the field. That parameter estimate (with confidence interval) is what we really want to know, and is much more interesting and informative than a darn p-value on a null hypothesis.
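Estimating s with a confidence interval can be sketched for the simplest case. A minimal sketch, assuming haploid selection in which the odds p/(1 - p) of the focal allele are multiplied by (1 + s) each generation, allele counts sampled at two time points, and genetic drift ignored; `estimate_s` and the counts are hypothetical. The interval is a Wald-type interval on the log odds ratio (Woolf's method), one choice among several:

```python
import math

def estimate_s(a0, n0, a1, n1, generations, z=1.96):
    """Estimate a haploid selection coefficient s with an approximate 95% CI.

    a0 of n0 sampled alleles carry the focal allele at the start, a1 of n1
    after `generations` generations. Assumes the log odds ratio equals
    generations * log(1 + s); drift is ignored, and only binomial sampling
    error in the counts enters the standard error.
    """
    b0, b1 = n0 - a0, n1 - a1
    log_or = math.log((a1 * b0) / (a0 * b1))           # log odds ratio, end vs start
    se = math.sqrt(1 / a0 + 1 / b0 + 1 / a1 + 1 / b1)  # Woolf's standard error

    def to_s(x):
        # Map a log odds ratio back to a per-generation selection coefficient.
        return math.exp(x / generations) - 1

    return to_s(log_or), to_s(log_or - z * se), to_s(log_or + z * se)

# Hypothetical counts: 20% carriers at the start, 26% after 10 generations.
s, lo, hi = estimate_s(200, 1000, 260, 1000, generations=10)
print(f"s = {s:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Here the estimate is s of about 0.035 with an interval of roughly 0.013 to 0.057. That says both that s is probably not zero and how far from zero it plausibly is, which a bare p-value would not.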

    7. Joe Felsenstein writes,

      The issue here was whether neutrality gets to be the null hypothesis. When we are testing drugs to cure cancer, and want to be very cautious about concluding that they work, that seems to be a different case than when we try to get a picture of how much natural selection there is.

      I don't see why there should be a difference. You are free to postulate that natural selection is the cause of some phenotype but if you can't demonstrate adaptation then you can't eliminate the possibility that your postulated cause isn't working.

      In the natural selection case it is not a disaster if we make occasional mistakes about which places in the genome are under selection, as long as we get the right overall conclusion about how much natural selection there is, since that is what we are trying to infer.

      No, actually in most cases scientists are trying to work out the cause of a particular phenotype. It's not a disaster if scientists falsely speculate that the differences between Indian and African rhinos are due to adaptation, but it is a scientific disaster if they completely ignore any other possibility.

    8. It seems to me that you (Larry) are trying to generalize about how much evolution is due to neutrality and how much to selection. In which case starting with neutrality and calling it the Null Hypothesis and then being very skeptical of evidence for natural selection will give you a biased picture.

  10. I am a big fan of applying statistical tests where possible and it really annoys me when people either forget or (worse) ignore statistics. That said, however, it has to be remembered that statistics are (a) pretty arbitrary and (b) highly dependent on underlying assumptions. There is nothing magic about a 5% probability and a non-significant result does not always mean that the null hypothesis is right, just that you cannot (currently) rule it out with the chosen level of confidence, assuming your underlying model is right. Remember that there are "Type II" statistical errors - incorrectly accepting the null hypothesis.

    Also, we should not confuse statistical significance with biological significance. They are not synonymous. Sometimes, it is worth following up a potentially biologically significant result, even when it isn't statistically significant. You may have too much noise in your system (or insufficient data) to get a statistically robust answer and therefore need to look to additional experiments or evidence - or just repeat the analysis with more data. So, yes, don't forget the null hypothesis... but also don't forget the power of the statistical test, and that failure to reject the null hypothesis does not mean that the null hypothesis is right. (The "argument from ignorance" logical fallacy.)
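The power of a test can be quantified directly. A minimal sketch, assuming a single proportion, a normal approximation, and made-up values (a null of 0.5 and a true value of 0.55); `power_two_sided` is a hypothetical helper. It computes the probability that a two-sided test at the 5% level rejects the null when the null is in fact false, that is, one minus the Type II error rate:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: p == p0 when the truth is p1."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se0 = sqrt(p0 * (1 - p0) / n)       # standard error assumed under H0
    se1 = sqrt(p1 * (1 - p1) / n)       # standard error under the true p1
    upper = p0 + z_crit * se0           # rejection boundaries for the sample proportion
    lower = p0 - z_crit * se0
    # Probability that the sample proportion lands outside the acceptance region.
    return (1 - norm.cdf((upper - p1) / se1)) + norm.cdf((lower - p1) / se1)

for n in (100, 1_000):
    print(f"n={n:>5}: power = {power_two_sided(0.5, 0.55, n):.2f}")
```

With a true difference of five percentage points, 100 observations miss it most of the time (power about 0.17), so a non-significant result there says very little about whether the null is true; 1,000 observations detect it roughly 89% of the time.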

  11. I think that, if someone explained what genetic drift and natural selection are, it would become immediately clear why genetic drift is the null hypothesis.

  12. I'm confused by this comment. There are detailed explanations of the math here. Is that what you are asking for? Or do you imagine that all of us in the discussion here don't know about them?

  13. I'm saying that, if evolution is a stochastic process (as Larry has said it is), one is always going to get a distribution of possible allele frequencies, regardless of any bias toward the reproduction of one phenotype. Therefore, claiming that a change in allele frequency is due to natural selection is a bit nonsensical, because the probability of a change in allele frequency is far greater than the probability of no change, even if there is no bias toward the reproduction of a particular phenotype.

    1. one is always going to get a distribution of possible allele frequencies, regardless of any bias toward the reproduction of one phenotype.

      With strong selection, that distribution will be a delta function.

    2. If there is strong enough selection, the shift in the mean of the distribution of gene frequencies over time will be strong enough to be detectable. And of course the same is true for quantitative characters.
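This exchange can be checked by simulation. A minimal sketch, assuming a haploid Wright-Fisher model with made-up parameters (200 individuals, 40 generations, 100 replicate populations, starting frequency 0.5); `wright_fisher` is a hypothetical helper. Without selection the final frequencies scatter widely around the starting value, so allele frequencies do change under drift alone; with s = 0.05 the whole distribution shifts upward, which is the detectable shift in the mean:

```python
import random
import statistics

def wright_fisher(p0, pop_size, generations, s, rng):
    """Final allele frequency after a haploid Wright-Fisher simulation.

    Each generation, selection (relative fitness 1 + s for the focal allele)
    shifts the expected frequency, then binomial sampling of pop_size
    offspring adds drift.
    """
    p = p0
    for _ in range(generations):
        expected = p * (1 + s) / (1 + p * s)                           # after selection
        count = sum(rng.random() < expected for _ in range(pop_size))  # drift
        p = count / pop_size
    return p

rng = random.Random(42)  # fixed seed so runs are reproducible
for s in (0.0, 0.05):
    finals = [wright_fisher(0.5, 200, 40, s, rng) for _ in range(100)]
    print(f"s={s}: mean={statistics.mean(finals):.3f}  sd={statistics.stdev(finals):.3f}")
```

Without drift, s = 0.05 would take the frequency from 0.5 to about 0.88 in 40 generations (the odds grow by a factor of 1.05 each generation), far outside the spread produced by drift alone in a population of this size.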

  14. No, actually in most cases scientists are trying to work out the cause of a particular phenotype. It's not a disaster if scientists falsely speculate that the differences between Indian and African rhinos are due to adaptation, but it is a scientific disaster if they completely ignore any other possibility.

    It seems to me that starting by assuming drift ignores the other possibility to the same extent as starting by assuming selection does. So, why not start by actually "trying to work out the cause of a particular phenotype"? That way, neither hypothesis is assumed before we compare what is expected under drift with what is expected under selection.

  15. My blog post seems to have, in part, kick-started this lively debate, which has gone off in a number of other directions. Hence, I wanted to add one clarifying point: neither the original blog post nor the BMC Genomics article on which it is based claims that failure to reject the neutral model implies that evolution is entirely neutral. From our conclusion:

    "In the present case the excellent fits obtained, e.g., for our flexible core model (model D), should not be interpreted as an indication for the validity of its assumptions. Rather, our analysis shows that gene frequency distributions do not contain sufficient information for the inference of evolutionary mechanisms underlying the observed distributions."

    Hope this is of some help to this discussion.
