## Thursday, February 24, 2011

### The "Null Hypothesis" in Evolution

There's been a lot of discussion about the proper way to engage in thinking about evolution. When faced with a new problem, some people think that it's proper to begin by investigating adaptationist explanations. Others think that the proper way to begin is by assuming that the character in question is mostly influenced by random genetic drift. We are having a lively debate about this at Dawkins, Darwin, Drift, and Neutral Theory.

Part of the discussion boils down to a debate about the proper "null hypothesis" in evolutionary theory.

Here are some explanations from the textbooks that may help explain the "null hypothesis."
The most widely used methods for measuring selection are based on comparisons with the neutral theory, in which variation is shaped by the interaction between mutation and random genetic drift (Chapter 15). The neutral theory serves as a well-understood null hypothesis, and deviations from it may be caused by various kinds of selection. In the following sections, we examine ways of detecting and measuring selection by comparison with neutral theory.

EVOLUTION by Nicholas H. Barton, Derek E.G. Briggs, Jonathan A. Eisen, David B. Goldstein, and Nipam H. Patel, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2007 (p. 530)
The first step in a statistical test is to specify the null hypothesis. This is the hypothesis that there is actually no difference between the groups. In our example, the null hypothesis is that the presence or absence of wing markings does not affect the way jumping spiders respond to flies. According to this hypothesis, the true frequency of attack is the same for flies with markings on their wings as for flies without markings on their wings.

The second step is to calculate a value called a test statistic....

The third step is to determine the probability that chance alone could have made the test statistic as large as it is. In other words, if the null hypothesis were true, and we did the same experiment many times, how often would we get a value for the test statistic that is larger than the one we actually got?

EVOLUTIONARY ANALYSIS by Scott Freeman and Jon C. Herron, Prentice Hall, Upper Saddle River, New Jersey, 1998 (p. 73)
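The three steps in that passage can be sketched in a few lines of Python. The counts below are hypothetical (the excerpt gives no data), and the exact hypergeometric calculation is just one convenient way to get the "chance alone" probability; it is not necessarily the test Freeman and Herron go on to use.

```python
from math import comb

def attack_rate_difference(attacked_a, n_a, attacked_b, n_b):
    """Test statistic: attack frequency in group B minus that in group A."""
    return attacked_b / n_b - attacked_a / n_a

def one_sided_p_value(attacked_a, n_a, attacked_b, n_b):
    """Probability, if the null hypothesis is true, of seeing at least as many
    attacks in group B as were observed, given the total number of attacks."""
    total_attacks = attacked_a + attacked_b
    n_total = n_a + n_b
    denom = comb(n_total, n_b)
    p = 0.0
    # k = number of the attacks that happen to fall in group B
    for k in range(attacked_b, min(total_attacks, n_b) + 1):
        p += comb(total_attacks, k) * comb(n_total - total_attacks, n_b - k) / denom
    return p

# Hypothetical data: 4 of 20 marked flies attacked, 12 of 20 unmarked flies attacked.
statistic = attack_rate_difference(4, 20, 12, 20)
p_value = one_sided_p_value(4, 20, 12, 20)
print(f"difference in attack frequency: {statistic:.2f}")
print(f"one-sided p-value under the null: {p_value:.4f}")
```

With these made-up counts the p-value comes out at about 0.011, so a difference this large would be unusual if wing markings had no effect.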
Genetic drift and natural selection are the two most important causes of allele substitution—that is, of evolutionary change—in populations. Genetic drift occurs in all natural populations because, unlike ideal populations at Hardy Weinberg equilibrium, natural populations are finite in size. Random fluctuations in allele frequencies can result in the replacement of old alleles by new ones, resulting in non-adaptive evolution. That is, while natural selection results in adaptation, genetic drift does not—so this process is not responsible for those anatomical, physiological, and behavioral features of organisms that equip them for survival and reproduction. Genetic drift nevertheless has many important consequences, especially at the molecular genetic level: it appears to account for much of the differences in DNA sequences among species.

Because all populations are finite, alleles at all loci are potentially subject to random genetic drift—but all are not necessarily subject to natural selection. For this reason, and because the expected effects of genetic drift can be mathematically described with some precision, some evolutionary geneticists hold the opinion that genetic drift should be the "null hypothesis" used to explain an evolutionary observation unless there is positive evidence of natural selection or some other factor. This perspective is analogous to the "null hypothesis" in statistics: the hypothesis that the data does not depart from those expected on the basis of chance alone. According to this view, we should not assume that a characteristic, or a difference between populations or species, is adaptive or has evolved by natural selection unless there is evidence for this conclusion.

EVOLUTION by Douglas Futuyma, Sinauer Associates Inc., Sunderland, MA, USA 2009 (p. 256)
Here are some papers from the scientific literature that illustrate how one goes about using the null hypothesis to ask questions about evolution.

Duret, L. and Galtier, N. (2007) Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends in Genetics 23:273-27 [doi:10.1016/j.tig.2007.03.011]

Orr, H.A. (1998) Testing Natural Selection vs. Genetic Drift in Phenotypic Evolution Using Quantitative Trait Locus Data. Genetics 149:2099-2104. [Abstract]

Brown, G.B. and Silk, J.B. (2002) Reconsidering the null hypothesis: Is maternal rank associated with birth sex ratios in primate groups? Proc. Natl. Acad. Sci. (USA) 99:11252-11255. [doi: 10.1073/pnas.162360599]

Nachman, M.W., Boyer, S.N., and Aquadro, C.F. (1994) Nonneutral evolution at the mitochondrial NADH dehydrogenase subunit 3 gene in mice. Proc. Natl. Acad. Sci. (USA) 91:6364-6368. [Abstract]

Fincke, O.M. (1994) Female colour polymorphism in damselflies: failure to reject the null hypothesis. Anim. Behav. 47:1249-1266. [PDF]

Roff, D. (2000) The evolution of the G matrix: selection or drift? Heredity 84:135–142. [doi:10.1046/j.1365-2540.2000.00695.x]

#### 20 comments :

1. These excerpts reflect a rather narrow definition of "null hypothesis" which is not accepted by all statisticians (presumably not by all philosophers of science either, though I know you don't have much respect for them).

Based on DK's posts in the thread you linked to, I'm guessing he/she is at least to some extent an empirical Bayesian. Empirical Bayesians don't accept that "no difference" is the default null hypothesis; they want null hypotheses to be formulated on the basis of what is currently known.

Statistical methods taught in textbooks are usually the most conservative possible advice for the most general situation. A real statistician would be able to provide more specific advice for a specific problem. This even applies to something as basic as formulating null hypotheses.

2. There is a good discussion of "null hypotheses" in Chapter 10 ("testing biological hypotheses") of Pigliucci and Kaplan's book: Making Sense of Evolution. They argue (as have many others) that perhaps null models are not the way to go. Why not just test alternative models against each other using model selection and information theory (e.g., AIC, BIC, etc)? Also, to talk about a "null model" begs the question of what we expect a null model to look like. Does null model = nothing going on? What about using a random walk to model drift, as argued by Pigliucci/Kaplan? Or regarding morphology, it seems that one can simply develop alternative models of the influence of selection versus drift via the tools of diffusion theory, or the myriad "neutral morphological evolution" models out there. Essentially, carefully constructing several alternative models that capture the biological factors we're interested in seems like the best way to go. We can then test these models against each other.

As an example, talking about drift as being the null model of selection can lead to problems, because selection can actually be stronger in very small populations when we treat fitness as a random variable rather than a fixed one (this is not a misphrasing on my part--see Sean Rice's paper in BMC Evolutionary Biology, 2008). How do we properly develop a null model of selection in small populations when we now know that some of our core assumptions (i.e., large N = stronger selection) can be violated if we model fitness as having stochastic variation? It seems that, again, developing and testing alternative models against each other with respect to how they fit the data is a better approach than deeming one model "a null" model.

3. @rich lawler,

Imagine that you are interested in knowing whether allele B confers some selective advantage over allele A.

You take a sample of the population at time zero and find that you have 50% A's and 50% B's. One hundred generations later you have 40% A's and 60% B's. Does that prove selection?

As you attempt to answer this question you will find that the only reasonable way to answer it is to test your selection hypothesis against what might have occurred by chance alone.

If you assume selection as the null hypothesis then your data confirms selection.
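To make "what might have occurred by chance alone" concrete, here is a minimal sketch of the drift-only expectation for the 50/50 to 40/60 example. It assumes a Wright-Fisher population with 100 gene copies (say, 50 diploid individuals); that size is purely hypothetical, since the example does not give one, and the sketch also treats the sampled frequencies as the true population frequencies.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of drawing 0..n copies of an allele whose frequency is p."""
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def drift_distribution(copies, start_count, generations):
    """Exact distribution of allele counts after pure drift (no selection)."""
    # Row i: next-generation counts when the current count is i.
    transition = [binomial_pmf(copies, i / copies) for i in range(copies + 1)]
    dist = [0.0] * (copies + 1)
    dist[start_count] = 1.0
    for _ in range(generations):
        new = [0.0] * (copies + 1)
        for i, weight in enumerate(dist):
            if weight == 0.0:
                continue
            row = transition[i]
            for j in range(copies + 1):
                new[j] += weight * row[j]
        dist = new
    return dist

COPIES = 100  # hypothetical population of 100 gene copies
dist = drift_distribution(COPIES, start_count=50, generations=100)
p_b_up = sum(dist[60:])              # allele B at 60% or more by drift alone
p_either = p_b_up + sum(dist[:41])   # either allele at 60% or more
print(f"P(B >= 60% | drift only): {p_b_up:.3f}")
print(f"P(a shift of 10 points or more in either direction): {p_either:.3f}")
```

The answer depends heavily on the assumed population size: in a small population a shift of this size is unremarkable under drift, while in a very large one it would be hard to explain without something else going on.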

4. Genetic drift is the null hypothesis because it is the result of evolution's occurring in finite populations. All that is needed for genetic drift to occur is a finite population. There need not be any fitness differential (i.e., the requirement for natural selection).

Show me an infinite population and we can discuss a population where it is inappropriate to use drift as a null hypothesis for evolutionary change.

5. You take a sample of the population at time zero and find that you have 50% A's and 50% B's. One hundred generations later you have 40% A's and 60% B's. Does that prove selection?

As you attempt to answer this question you will find that the only reasonable way to answer it is to test your selection hypothesis against what might have occurred by chance alone.

This is not the only way. You can do a monte carlo simulation for your 100 generations with a variety of putative selection coefficients, then determine the probability that each potential coefficient will yield the observed final distribution.
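A rough sketch of that kind of Monte Carlo run, under stated assumptions: a Wright-Fisher population of 100 gene copies, a simple haploid-style selection step, and a handful of arbitrary selection coefficients. None of these choices come from the comment; they are placeholders to show the shape of the calculation.

```python
import random

def simulate_final_count(copies, start_count, s, generations, rng):
    """One Wright-Fisher trajectory; returns the final number of B copies."""
    count = start_count
    for _ in range(generations):
        if count in (0, copies):
            break  # allele B has been lost or fixed
        p = count / copies
        # Deterministic push from selection (relative fitnesses 1+s : 1).
        p_sel = p * (1 + s) / (1 + p * s)
        # Random sampling of the next generation (drift).
        count = sum(1 for _ in range(copies) if rng.random() < p_sel)
    return count

def match_probability(s, observed=60, tolerance=5, copies=100,
                      generations=100, replicates=200, seed=1):
    """Fraction of simulated runs ending within `tolerance` copies of `observed`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        final = simulate_final_count(copies, copies // 2, s, generations, rng)
        if abs(final - observed) <= tolerance:
            hits += 1
    return hits / replicates

# Arbitrary grid of putative selection coefficients; s = 0 is pure drift.
results = {s: match_probability(s) for s in (0.0, 0.005, 0.02, 0.1)}
for s, prob in results.items():
    print(f"s = {s:<5}  P(final frequency near 60%) ~ {prob:.3f}")
```

Note that s = 0 is simply one point on the grid, so the drift-only case is evaluated alongside the selection cases rather than being treated separately.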

6. This is not the only way. You can do a monte carlo simulation for your 100 generations with a variety of putative selection coefficients, then determine the probability that each potential coefficient will yield the observed final distribution.

First, you need to make sure that it makes sense to posit that there is any natural selection occurring. Since drift is the only thing that needs to always occur in a finite population, it makes sense to determine what the possible distribution of traits without selection is.

7. I, too, question the need for such a conservative approach to a null. A better approach is to let the data be the arbiter through an approach not dissimilar to what Anonymous laid out. But I've found frequentist approaches lacking, in general.

8. @Michael M

Genetic drift is the null hypothesis because it is the result of evolution's occurring in finite populations. All that is needed for genetic drift to occur is a finite population. There need not be any fitness differential (i.e., the requirement for natural selection).

I don't really think that's good enough. When people talk of "infinite population" they tend to mean "infinite-and-well-shaken-and-environmentally-homogeneous (-and-sexual)". Only then can you turn off drift and turn on the convergence upon the expected value of fitness that large numbers reveal. But however big the population, it tends to occupy a heterogeneous range which is not covered equally by randomly-assorting members - drift, and selection, can both decrease variation locally.

The fact that drift - an increased role of chance for all alleles, not just neutral ones - may increase with diminishing population size does not necessarily make it the null hypothesis of choice. In the very smallest populations, there is simply no way to tell. Every individual with allele A may have been causally favoured by its possession, but dumb luck still got allele B to fixation against the differential. Whatever experiments you may do show allele A as superior - but B's fixation remains an inconvenient historical fact.

Larry frequently points out that an allele with this or that advantage will still be lost to drift n% of the time. However, adaptation is not a single-strike assessment. If there is a particular challenge to be overcome, many alleles may respond to it. Cold selects for fur and fat and migration and shivering and hibernation and whatever else. That particular cat can be skinned in many biochemical ways. The fact that evolution is stochastic over the full continuum of possible selective differentials means there is not a favoured position wrt drift, at the midpoint or anywhere else. Evolution is marked by persistence. One allele is lost to drift - meh, here's another. And another... Multiple mutations at multiple sites have the effect of increasing sample size above that of the census population at a given moment, as far as the likelihood of getting an adaptation to a challenge fixed is concerned. If 99% of novel beneficial alleles are lost, what happens if we generate 10000 of them, all over the genome and all over the population?
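The closing question can be answered with a line or two of arithmetic, if one is willing to assume that the fates of the alleles are independent of one another (an assumption added here, not stated in the comment).

```python
# Numbers taken from the question above: each new beneficial allele is lost
# 99% of the time, and 10,000 such alleles arise.
p_fix = 0.01
n_alleles = 10_000

# Assumption: the fate of each allele is independent of all the others.
expected_fixed = n_alleles * p_fix
p_at_least_one = 1 - (1 - p_fix) ** n_alleles

print(f"expected number of alleles fixed: {expected_fixed:.0f}")
print(f"P(at least one is fixed): {p_at_least_one:.6f}")
```

Under that assumption about 100 of the 10,000 alleles would be expected to reach fixation, and the chance that none does is negligible.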

9. First, you need to make sure that it makes sense to posit that there is any natural selection occurring.

No you don't, and in fact you shouldn't. The potential selection coefficients are smoothly variable so you simply assess the probability that a given range of coefficients are consistent with generating the observed data with whatever probabilistic cutoff you choose (such as 95%).

You can make a semantic argument about how close to zero the selection coefficient needs to be in order to call it "drift", or equivalently how far away from zero it needs to be to call it "selection". If this is what the "controversy" is all about, it's not a very interesting controversy.

10. @ Larry Moran,

Why not just develop two models, one that captures a directional process (that the allele frequency increased due to a directional force, like selection) and one that captures a stochastic process (the allele frequency increased due to random sampling of alleles from parental generations). In this case, we can test each model's fit to the data--test the models against each other and assess their likelihood (weighted by number of parameters). It's probably true that each model would explain the data with relatively equal likelihood. This is the essence of model selection: if the data are somewhat equivocal, then we can't conclude that one model is better than another, and this would be reflected in the fact that both models have fairly equal AIC values (or another metric of the "fit to data"). If the data are equivocal, our conclusions should be equivocal, even if we've developed rigorous models of the processes that we think are generating the data. I'd rather conclude that both models fit the data equally well, rather than let some arbitrary p = 0.05 dictate my conclusions. (Obviously, I'm glossing over a lot of details that might help us tease out selection from drift, but my general point here relates to multimodel inference versus p-values and null models).
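A minimal sketch of that comparison for the 50/50 to 40/60 example, assuming a Wright-Fisher population of 50 gene copies, a haploid-style selection step, and a coarse grid search in place of a proper optimiser. All of these are illustrative assumptions; the single "observation" is the final allele count.

```python
from math import comb, log

def final_count_likelihood(s, copies=50, start=25, final=30, generations=100):
    """Exact Wright-Fisher probability of ending at `final` copies of allele B."""
    rows = []
    for i in range(copies + 1):
        p = i / copies
        p_sel = p * (1 + s) / (1 + p * s)  # selection step; s = 0 is pure drift
        rows.append([comb(copies, j) * p_sel**j * (1 - p_sel)**(copies - j)
                     for j in range(copies + 1)])
    dist = [0.0] * (copies + 1)
    dist[start] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * rows[i][j] for i in range(copies + 1))
                for j in range(copies + 1)]
    return dist[final]

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Model 1: drift only, no free parameters.
loglik_drift = log(final_count_likelihood(0.0))
aic_drift = aic(loglik_drift, n_params=0)

# Model 2: drift plus selection, one free parameter (s), fitted on a coarse grid.
grid = [i * 0.005 for i in range(11)]  # s = 0.000, 0.005, ..., 0.050
best_s = max(grid, key=final_count_likelihood)
loglik_sel = log(final_count_likelihood(best_s))
aic_sel = aic(loglik_sel, n_params=1)

print(f"drift only:        AIC = {aic_drift:.2f}")
print(f"drift + selection: AIC = {aic_sel:.2f} (best s on grid = {best_s:.3f})")
```

Because the drift model is the selection model with s fixed at zero, the selection model can never fit worse; the only question is whether its gain in likelihood is enough to pay for the extra parameter.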

11. Larry: Rather than the hypothetical you gave Rich Lawler, I wonder if you don't already have a good real world test from the Dawkins, Darwin...Drift thread: human blood types.

There, your conclusion that the available information indicated distribution of blood types was due to drift (and founder effects) was challenged by people saying blood types showed selection for two reasons: (1) conservation of certain alleles for a very long time; and (2) selection for immune response.

In the conservation situation, no mutations have successfully gone to fixation for a very long time. In the situation of selection for immune response, mutations have gone to fixation by reason (it is said) of selection. It's rather wonderful to me that both lack of mutation and mutation apparently indicate selection!

I think you have your answer, at least for the folks who were arguing for selection in the matter of human blood types, as to what their "null hypothesis" is.

12. If you toss a coin ten times, and it comes up 8 heads and 2 tails, does that mean that the coin is biased to coming up heads?

13. Even better, if no one has ever flipped a coin before and then the first coin ever flipped comes up heads, does this mean that flipped coins always come up heads?

If one uses Bayesian reasoning, the answer must be yes, and your prior (i.e. your only test ever) is that the probability of a flipped coin coming up heads is 100%. Then you flip it again, and then modify your calculation of the probability based on your new observation, rinse, repeat...

According to this perspective, how would you answer Larry's 50% A/50% B —> 40% A/60% B question? And what does this perspective say about null hypotheses?
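For comparison, here is a sketch of one standard way to do the updating, assuming a uniform Beta(1, 1) prior on the probability of heads. The choice of prior is an assumption made for illustration; a different prior gives different numbers.

```python
def posterior_beta(heads, tails, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after observing the given flips."""
    return prior_a + heads, prior_b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def prob_biased_to_heads(a, b, steps=100_000):
    """P(theta > 0.5) under Beta(a, b), by midpoint-rule integration."""
    width = 1.0 / steps
    total = 0.0
    upper = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width
        density = theta ** (a - 1) * (1 - theta) ** (b - 1)  # unnormalised
        total += density
        if theta > 0.5:
            upper += density
    return upper / total

# After the very first flip comes up heads:
a1, b1 = posterior_beta(heads=1, tails=0)
print(f"posterior mean after one head: {beta_mean(a1, b1):.3f}")

# After 8 heads and 2 tails:
a10, b10 = posterior_beta(heads=8, tails=2)
print(f"posterior mean after 8H/2T:    {beta_mean(a10, b10):.3f}")
print(f"P(coin favours heads | 8H/2T): {prob_biased_to_heads(a10, b10):.3f}")
```

With a uniform prior, a single head moves the estimate to 2/3 rather than to 100%, and 8 heads in 10 gives a posterior mean of 0.75 with roughly a 97% posterior probability that the coin favours heads.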

14. If you toss a coin ten times, and it comes up 8 heads and 2 tails, does that mean that the coin is biased to coming up heads?

Does it mean that it isn't? A classic situation calling for "more research"!

15. Does it mean that it isn't?

I think you're missing the point here. We have a set of rather basic techniques (i.e., techniques taught in any introductory inferential statistics course) that allow us to make a statement about how confident we can be that the observed proportion of heads was due to an intrinsic bias in the production of heads rather than the observation of that particular sequence of trials.

A classic situation calling for "more research"!

What if you no longer have access to the coin or any information that might allow you to recreate the coin to an arbitrary degree of precision?
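The "rather basic technique" for the 8-heads-in-10 example is an exact binomial test. A short sketch, taking a fair coin as the null hypothesis:

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_exactly_8 = comb(10, 8) * 0.5**10   # exactly 8 heads in 10 tosses
p_at_least_8 = binomial_tail(10, 8)   # 8 or more heads (one-sided p-value)
p_two_sided = 2 * p_at_least_8        # 8 or more heads, or 8 or more tails

print(f"P(exactly 8 heads):       {p_exactly_8:.4f}")
print(f"P(8 or more heads):       {p_at_least_8:.4f}")
print(f"two-sided p-value:        {p_two_sided:.4f}")
```

For a fair coin the probability of exactly 8 heads is about 0.044, the probability of 8 or more heads is about 0.055, and the two-sided p-value is about 0.109, so ten tosses are not enough to reject fairness at the conventional 5% level.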

16. Does it mean that it isn't?

I think you're missing the point here.

No, just joking - nonetheless, for that specific example, there is insufficient information for those "rather basic" statistical techniques to work upon. In an infinite run of tosses of an unbiased coin we expect to see 50/50 heads and tails. But within the infinite group there will be runs of 10, or randomly selected groups of 10, that cluster 8 heads. Binomial probability is <5% for a fair coin. So we might make some "95%" ruling, but I think sample size is too small, hence:

A classic situation calling for "more research"!

What if you no longer have access to the coin or any information that might allow you to recreate the coin to an arbitrary degree of precision?

Then you have no information upon which to base anything, null hypothesis or otherwise. A fixed allele is a fixed allele, regardless of how it came to be so. We can infer more if we have some kind of reference point - neighbouring genes, other alleles at the same locus, or the same locus in other species - but you can't reconstruct the causal path from mutation to fixation, or determine selective differential retrospectively. When fixed, the advantageous, neutral and detrimental alleles A, B and C are equal - they have all achieved s=0. However, any new mutation at the locus is more likely to be on the wrong side of A than it is if C won (by drift). It's not just a matter of getting there, but staying there. And a constant stream of novel alleles all subject to the same basic constraints causes selective differential to stand out as signal against the random noise of drift, much as if we took more and more samples of 10 from our infinite set of coin tosses and were able to see more clearly whether the coin was biased. Effectively, by ongoing mutation we increase sample space and ensure that selective advantage will tend to override the noise in the generality, if not in the particular. For this reason, my default assumption tends to be adaptation. To call a default assumption a null hypothesis would be stretching it, though.

17. No, just joking - nonetheless, for that specific example, there is insufficient information for those "rather basic" statistical techniques to work upon. In an infinite run of tosses of an unbiased coin we expect to see 50/50 heads and tails. But within the infinite group there will be runs of 10, or randomly selected groups of 10, that cluster 8 heads. Binomial probability is <5% for a fair coin. So we might make some "95%" ruling, but I think sample size is too small.

First, you contradict yourself; then, you miss the point yet again. You claimed that "there is insufficient information for those 'rather basic' statistical techniques to work upon" and went on to perform the test that you said there was "insufficient information" to perform. Furthermore, you miss the point that drift is always present in a finite population; therefore, you have to "subtract out" drift in order to determine if selection is at work.

Then you have no information upon which to base anything, null hypothesis or otherwise. A fixed allele is a fixed allele, regardless of how it came to be so. We can infer more if we have some kind of reference point - neighbouring genes, other alleles at the same locus, or the same locus in other species - but you can't reconstruct the causal path from mutation to fixation, or determine selective differential retrospectively. When fixed, the advantageous, neutral and detrimental alleles A, B and C are equal - they have all achieved s=0. However, any new mutation at the locus is more likely to be on the wrong side of A than it is if C won (by drift). It's not just a matter of getting there, but staying there. And a constant stream of novel alleles all subject to the same basic constraints causes selective differential to stand out as signal against the random noise of drift, much as if we took more and more samples of 10 from our infinite set of coin tosses and were able to see more clearly whether the coin was biased. Effectively, by ongoing mutation we increase sample space and ensure that selective advantage will tend to override the noise in the generality, if not in the particular. For this reason, my default assumption tends to be adaptation. To call a default assumption a null hypothesis would be stretching it, though.

But drift occurs because the population is finite. You do not have to make any other assumptions to introduce drift in to the population. Therefore, drift is the null hypothesis.

18. First, you contradict yourself; then, you miss the point yet again. You claimed that "there is insufficient information for those 'rather basic' statistical techniques to work upon" and went on to perform the test that you said there was "insufficient information" to perform. Furthermore, you miss the point that drift is always present in a finite population; therefore, you have to "subtract out" drift in order to determine if selection is at work.

Yes, that was rather contradictory. But when you are dealing with very small sample sizes, it is impossible to untangle chance fluctuations from a definite bias.

But drift occurs because the population is finite.

God, no! Drift occurs because populations are real. This idea that you can simply scale up N, while every individual continues to have an exactly equal chance of mating with any other in a uniform environment with competing alleles defined precisely by random variables ... this idea is bunk. Every increase in N needs to be accompanied by an increase in the amount of work the members of the population have to do to remain panmictic.

Consider this: the largest conceivable real population, covering a globe of (somehow) uniform conditions with a minute, genetically uniform organism. There is no selective differential, until the first mutation occurs. Have we eliminated drift by increasing N to the maximum permitted on this planet? Suppose, instead of a sphere, the organisms occupied an infinite plane?

Anyway, I understand why a numerologist might prefer to assume chance unless proven otherwise, but when we look at the genes that survive, over the long term, we are looking at a filtered pool, with a very effective mechanism, chance notwithstanding, of performing that filtration. So, whatever maths may say, I prefer to assume survivorship from a filtration process, ta!

19. Allan Miller says,

Every increase in N needs to be accompanied by an increase in the amount of work the members of the population have to do to remain panmictic.

I tried being panmictic when I was much younger. It was exhausting.

20. But when you are dealing with very small sample sizes, it is impossible to untangle chance fluctuations from a definite bias.

The same is true when the selective difference is very small (as it is the vast majority of time), which is why drift is the only null hypothesis.

God, no! Drift occurs because populations are real.

Nope, drift occurs because populations are finite. In a finite universe all populations are also finite, so all real populations (i.e., all populations occurring within the aforementioned universe) are finite.

This idea that you can simply scale up N, while every individual continues to have an exactly equal chance of mating with any other in a uniform environment with competing alleles defined precisely by random variables ... this idea is bunk.

It is only bunk because you're redefining "random variable" so that certain things that are normally considered random variables are no longer random variables.

Every increase in N needs to be accompanied by an increase in the amount of work the members of the population have to do to remain panmictic.

While drift may preclude the population from being truly panmictic, panmixis does not preclude drift from occurring in the population.

And modern evolutionary theory recognizes that large populations are often subdivided into smaller subpopulations (demes) that have limited gene flow amongst one another. In other words, you have erected yet another straw man.

Consider this: the largest conceivable real population, covering a globe of (somehow) uniform conditions with a minute, genetically uniform organism. There is no selective differential, until the first mutation occurs. Have we eliminated drift by increasing N to the maximum permitted on this planet?

A sphere is a finite surface, so assuming that your "minute organism" is not infinitesimally small a population covering the entire surface of a sphere is finite.

Suppose, instead of a sphere, the organisms occupied an infinite plane?

This population too could be finite.

Anyway, I understand why a numerologist might prefer to assume chance unless proven otherwise, but when we look at the genes that survive, over the long term, we are looking at a filtered pool, with a very effective mechanism, chance notwithstanding, of performing that filtration.

The people to whom you refer are not numerologists (though I can see how someone who refuses to take the time to gain a passing acquaintance with elementary probability theory and statistics might benefit from labeling them so); they are mathematicians and mathematical biologists. The fact that you reject their work (which is widely accepted in evolutionary biology) says more about your know-nothingism than it does about the unphysicality of their work.

So, whatever maths may say, I prefer to assume survivorship from a filtration process, ta!

Then you are completely ignoring the Modern Synthesis.