Wednesday, June 15, 2016

What does a person's genome reveal about their ethnicity and their appearance?

If you knew someone's complete genome sequence, could you tell where they came from and their ethnic background (race)? Siddhartha Mukherjee gives a confusing answer in his latest book, "The Gene: an intimate history." The answer appears to be "yes," but then Mukherjee denies that knowing where someone came from tells us anything about their genome or their phenotype. He writes the following on page 342.

... the genetic diversity within any racial group dominates the diversity between racial groups. This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so "different" from another man from Namibia that it makes little sense to lump them into the same category.

For race and genetics, then, the genome is strictly a one-way street. You can use the genome to predict where X or Y came from. But knowing where A or B came from, you can predict little about the person's genome. Or: every genome carries a signature of an individual's ancestry—but an individual's racial ancestry predicts little about the person's genome. You can sequence DNA from an African-American man and conclude that his ancestors came from Sierra Leone or Nigeria. But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man. The geneticist goes home happy; the racist returns empty-handed.
I find this view very strange. Imagine that you were an anthropologist who was an expert on humans and human evolution. Imagine you were told that there's a woman in the next room whose eight great-grandparents all came from Japan. According to Mukherjee, such a scientist could not predict anything about the features of that woman. Does that make any sense?

I suspect this is just a convoluted way of reconciling science with political correctness.

Steven Monroe Lipkin has a different view. He's a medical geneticist who recently published a book with Jon R. Luoma titled "The Age of Genomes: tales from the front lines of genetic medicine." Here's how they explain it on page 6.
Many ethnic groups carry distinct signatures. For example, from a genome sequence you can usually tell if an individual is African-American, Caucasian, Asian, Satnami, or Ashkenazi Jew, even if you've never laid eyes on the patient. A well-regarded research scientist whom I had never met made his genome sequence publicly available as part of a research study. I remember scrolling through his genetic variant files and trying, more successfully than I had expected, to guess what he would look like before I peeked at his webpage photo. The personal genome is more than skin deep.
This makes more sense to me. If you know what to look for (and Steven Monroe Lipkin certainly does) then many of the features of a particular person can be deduced from their genome sequence. And if you know which variants are more common in certain ethnic groups, then you can certainly predict what a person might look like just by knowing where their ancestors came from.

What's wrong with that?


79 comments:

  1. There is also an increasing realization that ethnicity is associated with genetic variation linked to disease outcome, which really isn't congruent with the "race is just a social construct" idea. For example, some collaborators of mine have found that variations in the LSAMP locus predominate in prostate tumors from African American men but not in other groups, and this probably contributes to the greater incidence of prostate cancer among African Americans.

    1. There are a variety of problems with this:
      a) Anybody who puts together the phrase "just a social construct" is obviously ignorant of what the term "social construct" means. It's on par with "just a theory". There's no "just" here. Of course race is a social construct. Gravity is a social construct. Social construct isn't the opposite of valid.
      b) "African American men" isn't a useful category. That category is not valid, because it is at best a legal term. In terms of genetic differences there's a far greater diversity among "African Americans" (simply because the migrant populations were subject to the founder effect) and after the institution of the "one drop rule" it became even less meaningful.
      c) Why would you design a study using race as a category, when you could do a GWAS, which would capture genetic variations more clearly and, via mapping, also point to loci of interest? In particular, pooling data in this way means that you will not capture correlations for a substantial amount of human genetic variation, and your statistical analysis can run into Simpson's-paradox-type problems. It's not as if genotyping in humans were prohibitively expensive.

    2. a)"Gravity is a social construct"? Isn't it a "transformative hermeneutic" as per Sokal?
      b) It really, really, *is* a useful category for medicine, though. Yes, AAs are admixtures of European and African ancestry and yes, individuals vary as to their makeup. But while not all AAs with prostate cancer have the LSAMP variation, very few non-AAs have it. So typical treatments for prostate cancer might not work very well for AAs. Part of the problem with medical research in the past is that white European-descended males are treated as the "standard" human. Females and people of other ancestries simply don't respond in the same way to a lot of treatments.
      c) The study is dealing with actual tumors from actual patients, not mere GWAS studies.

    3. a) A social construct is the relationship between a symbol and its meaning. The word gravity and what it refers to is a social construct. So is the fact that F=G*m_1*m_2/r² refers to Newton's law of gravity, because if G were Coulomb's constant and m_1 and m_2 were charges, that would be Coulomb's law for electrostatic forces. By convention we use m to designate mass and G as the gravitational constant, but it's not as if that convention was in any way dictated by nature.
      b and c) I still disagree. Your argument is that race is useful as a proxy for genetic diversity among humans. But I think we are in agreement that it does not resolve most of the genetic diversity that exists. We can also straight up use genotyping to directly access genetic diversity, which makes a proxy superfluous. Now, I tried to identify the study in question and found Petrovics et al (2015). If that's what you are referring to then they did perform a GWAS. And very much to the point they found the LSAMP variation in about 1/4 of African Americans vs. 1/8 in Caucasians. No other data is given. This means that, in the absence of further data, using a treatment that works better for the LSAMP variation as a default for African Americans would improve things for 1/4 of the patients, but make things worse for 3/4 of them. It's not as if genotyping as a diagnostic tool were sci-fi at this point; we can individualize treatment based on the actual case. So while I agree with the point made about treating a subset of humans as the standard, I disagree strongly about the possible remedy: We can connect different responses to treatment directly to genetic differences and we can individualize treatment based on genotyping.

    4. GWAS studies in the normal meaning of the term are dealing with genotyping of individuals. It's important to understand that Petrovics et al are dealing with the genomes of the tumors, which is not the same thing. Variants found on the tumors are much more relevant to potential treatments and couldn't be found by simple genotyping of populations.

    5. Well, if you are running EIGENSOFT you're doing a GWAS. I also wonder what could prompt a difference in the tumors in a heritable fashion if not existing genomic diversity. AFAIK (and I'm no expert on cancer) the main reason particular types of tumors are more or less prevalent is that there are recessive mutations within the population that allow tumors to arise when the dominant copy in heterozygotes mutates.
      If LSAMP variations lead to different treatment, then genotyping tumors seems like a reasonable diagnostic step in all cases of prostate cancer. I still don't understand how race is a useful category here.

    6. It's presumed that the genotype of the individual is responsible for variants on the tumors, but actually finding them on the somatic genome is difficult, which is why normal GWAS has a rather low reputation in cancer biology. It isn't a one-to-one mapping, and so focus has shifted to studying the genomes of the tumors themselves.

      As to the rationale, African Americans suffer from a considerably larger incidence of aggressive prostate cancer and patient groups want to know why.

    7. Aren't somatic GWAS responsible for identifying the BRCAs, for instance?

  2. Honestly, it sounds more like Mukherjee is confused here, rather than trying to be "politically correct." It seems he's struggling with the interpretation of genetic diversity within and between racial categories. I.e. because of the (comparatively high) genetic diversity within some of these categories, category may be poorly predictive at the level of "all known variants." Of course, that broad categories such as white, black, Asian, are poorly predictive at this level, does not mean that one cannot predict with reasonable accuracy the presence or absence of a subset of genetic variants, those common to members of that racial category (assuming that these categories do indeed reflect shared ancestry on some level). Unsurprisingly, category becomes more useful a predictor the more it defines a specific population.

  3. From Mukherjee:
    "the genetic diversity within any racial group dominates the diversity between racial groups. This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense"

    It seems to me this idea that genetic diversity indicates that race is an illusion was started by Lewontin back in the 80s. I think the idea was that misrepresenting the science can be justified for the greater good of mitigating racism, but I doubt anyone in the KKK gave a fig for any of this.
    Here is why I think this is wrong, but if my interpretation is off I of course expect a vigorous correction.

    Consider several loci involved in skin pigmentation. It could easily be the case that if one looks at genetic diversity within a population from Nigeria and within a population from Iceland one would see greater diversity within those populations than between the consensus sequences for both populations. But skin pigmentation is not an illusion: those two populations fall at extreme opposite ends of the phenotypic spectrum. Most of the 'diversity' within the population isn't doing anything and a few key differences between the populations explain most of the differences.
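The point about a few key loci can be illustrated with a toy calculation. The sketch below is only an illustration under made-up assumptions: the allele frequencies, the number of loci, and the helper names (`heterozygosity`, `fst`, `genome_wide_fst`, `expected_allele_count`) are all hypothetical and are not taken from any real data set. It computes F_ST per locus for two equally weighted populations and compares five strongly differentiated loci with a large background of weakly differentiated ones.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """F_ST = (H_T - H_S) / H_T for two equally weighted populations at one locus."""
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    h_total = heterozygosity((p1 + p2) / 2.0)
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def genome_wide_fst(loci):
    """Ratio-of-sums F_ST over a list of (p1, p2) allele-frequency pairs."""
    between = 0.0
    total = 0.0
    for p1, p2 in loci:
        h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
        h_total = heterozygosity((p1 + p2) / 2.0)
        between += h_total - h_within
        total += h_total
    return between / total


def expected_allele_count(loci, pop_index):
    """Expected number of copies of the tracked allele in a diploid individual."""
    return sum(2.0 * freqs[pop_index] for freqs in loci)


# Invented frequencies: 995 background loci that differ only slightly,
# plus 5 "pigmentation" loci that differ a lot between the two populations.
background = [(0.50, 0.60)] * 995
pigment = [(0.05, 0.95)] * 5

print("F_ST at a background locus:", round(fst(0.50, 0.60), 4))
print("F_ST at a pigmentation locus:", round(fst(0.05, 0.95), 4))
print("Genome-wide F_ST:", round(genome_wide_fst(background + pigment), 4))
print("Expected pigmentation allele copies, population 1:", expected_allele_count(pigment, 0))
print("Expected pigmentation allele copies, population 2:", expected_allele_count(pigment, 1))
```

With these invented numbers the genome-wide F_ST stays around 0.014, while each of the five pigmentation loci has an F_ST of 0.81, and the expected number of pigmentation alleles per person is about 0.5 in one population and 9.5 in the other. A low overall value therefore does not rule out large differences at the few loci that drive a trait.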

    1. It seems to me this idea that genetic diversity indicates that race is an illusion was started by Lewontin back in the 80s. I think the idea was that misrepresenting the science can be justified for the greater good of mitigating racism

      Can you be more precise on how Lewontin has "misrepresented science"? Lewontin, based on a small number of genes for which data was available, estimated F_ST for humans at about .15. Now we have results based on a large number of SNPs and the current best estimate of F_ST is about .12 (which one should note is lower than Lewontin's estimate). In the mid 70s anthropologists affirming the existence of human races defined them through an F_ST greater than .25, in the mid 80s they accepted Lewontin's estimate, but shifted their cutoff to .15, and now that the SNP data shows we're lower than that they are down to F_ST>.1. The key question here is how far we can reasonably let the goalposts be moved. Race among humans was first introduced as a concept involving multiple events of creation by god. There's some BS right there. Then we got phylogenetic hypotheses that placed some "races" closer to chimps than to other humans. Again that's easy to refute. So again we got a new definition of race, now as separate divergences from an extinct ancestor, that still for some reason ended up interfertile. Now that of course was bollocks, too. So then we got definitions based on F_ST values and they changed whenever it turned out that human F_ST values were too small to make the accepted definition apply. If we move beyond mapping SNPs and it turns out that aligning full genome assemblies yields an F_ST of 0.09, we'll certainly see "race realists" move the goalpost once again and drop the cutoff to say 0.05.
      We understand human evolution well enough to readily grasp why post-dispersal differentiation can't be that big a deal. Most of human evolution happened within a somewhat confined area with a population that was at least somewhat close to panmictic. And then there was dispersal and maybe 2500 generations of limited interbreeding, but there are still clines that show interbreeding took place during that time. Trying to shoehorn our evolutionary history into a model of races that was conceived before evolutionary biology even existed seems like a patently bad idea.

    2. He misrepresented it by claiming that because race can't be precisely defined genetically, the concept of race is an illusion. Many others who picked up on this, from pop science writers to the Phil Donahue Show, suggested race was the equivalent of defining a group of people based on being left-handed, or bisexual.
      I think the Fst is irrelevant for whether race can have a genetic definition, but even if it's not a real taxonomic category, that doesn't mean it's an illusion.
      I think CBSim in the next post is right. This does more harm than good if the goal is to mitigate racism.

    3. Race is an illusion. Geographically structured genetic variation is not an illusion. Is this so hard to understand?

    4. 'Is this so hard to understand?'

      Not at all. Then certain defined geographically structured genetic variations = race.

    5. Race is a word that has been around a lot longer than "geographically structured genetic variations." In fact, it was around for quite a while before anyone realized genes were made of DNA, let alone how they worked.

      Basically, race was used to indicate a few, large groups, usually defined by skin color-- black, white, red, yellow, and sometimes brown. This classed Australian aborigines and some South Asians in the same category with Africans. No competent scientist today would accept that definition as a description of "geographically structured genetic variation."

      Based on this very inadequate description of genetic variation, elaborate "racial differences" were then "scientifically" described. (If you want to know what the "black personality" was supposed to be like, look at Barack Obama-- and take the exact opposite on pretty much every measure.)

      Obviously, this historic use of the word race has no scientific basis, and virtually no connection to modern genetic studies. Unfortunately, it is still used by some people as a sociological tool to separate and discriminate-- always to the advantage of one's own "race."

      Allowing the word race to be used as a synonym for specific genetic variations found in increased concentrations in specific populations does nothing to further science. It's on the level with creationists claiming that evolution is just another "creation story."

      This is why a lot of people want the word race dropped entirely. As long as it is confounded with old, disproven ideas, it contributes nothing to science, and means nothing but trouble for society.

    6. @hoary puccoon

      Thank-you for describing the politically correct American position.

      Do you think the word "race" should be banned entirely from the scientific literature, including its use to describe large demes in other species?

      What word should we use to describe large subpopulations of humans that are genetically isolated from each other; for example Africans, Asians, and Europeans? Should we just ignore the scientific evidence and pretend those groups don't exist?

      What is the political goal you are trying to achieve? Are you hoping that we can eliminate racism by having a bunch of scientists say there's no scientific evidence for human subpopulations?

    7. But Africans, Asians, and Europeans are not genetically isolated from each other. These groups aren't groups. This isn't about political correctness; it's about reality. What we have is clinal variation in lots of alleles, with not all that much correspondence in clines among different genes.

    8. I wasn't aware that these subpopulations were "genetically isolated" from each other. Generally reproductive isolation gets tested empirically, usually starting about 15 minutes after the boat lands. So far no isolation has been found.

      Or does Larry's term "genetically isolated" just mean differs in multiple gene frequencies enough that, by combining lots of them, we can tell them apart?

    9. What political goal are YOU trying to achieve?

      The way the word race has been used in years past (and is still used in many general discussions) and the way different ethnic groups are studied for specific alleles, usually having to do with medical conditions, have no relationship to each other. As I understand it, there is far more variation among Africans than there is between certain African groups and the entire population of the rest of the world. So when you're using "Africans" as an example of a "large subpopulation of humans that are genetically isolated from each other" you're simply wrong.

      You could, of course, redefine race so that various African groups that show major genetic differences are considered different races. But that would almost certainly lead to confusion with the old definition of race, meaning all Africans, along with some Australians and Indians. Why would you want to do that?

      As far as using "race" to mean Black Angus as opposed to Zebu cattle, for example, I'm reasonably sure that cattle breeds are far more cohesive genetically than the human groups traditionally defined as races.

      I really don't understand why you would want to use a term whose common use has very little to do with any scientific reality. If you mean specific genetic cohorts, why not just say so?

      It also makes me a little uneasy that you're tossing a loaded term like politically correct into the discussion, without considering first what is scientifically correct.

      Please, please tell me you haven't been Trumped.

    10. Joe Felsenstein asks,

      Or does Larry's term "genetically isolated" just mean differs in multiple gene frequencies enough that, by combining lots of them, we can tell them apart?

      Yes, it means that Homo sapiens is not a single panmictic population. It means there is limited gene (allele) flow between different populations such that allele frequencies in these different groups can be very different. Different enough that it's quite possible to identify a member of such a population simply by looking at their genome.

      Many of these allele frequency differences produce visible phenotypes so it's possible to identify members of large subpopulations simply by looking at the physical appearance of individuals.

      Joe, do you agree with what I said or do you deny that such genetically isolated groups of humans actually exist?
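The claim that membership can be read off a genome even when each allele frequency differs only modestly can be checked with a small simulation. The sketch below is a toy model with assumptions that are not from this thread: two hypothetical populations, independent biallelic loci, the same frequency gap `delta` at every locus, and a simple log-likelihood-ratio classifier. The function name `simulate_accuracy` and all parameter values are invented for illustration.

```python
import math
import random


def simulate_accuracy(n_loci, delta=0.1, n_people=500, seed=1):
    """Fraction of simulated diploid individuals assigned to their true population.

    Toy model: at every locus the tracked allele has frequency 0.5 - delta/2
    in population A and 0.5 + delta/2 in population B, and loci are independent.
    """
    rng = random.Random(seed)
    p_a = 0.5 - delta / 2.0
    p_b = 0.5 + delta / 2.0
    # Log-likelihood-ratio contribution of one allele copy (positive favours B).
    llr_tracked = math.log(p_b / p_a)
    llr_other = math.log((1.0 - p_b) / (1.0 - p_a))

    correct = 0
    for i in range(n_people):
        from_b = (i % 2 == 1)          # alternate individuals between A and B
        p = p_b if from_b else p_a
        score = 0.0
        for _ in range(2 * n_loci):    # two allele copies per locus
            score += llr_tracked if rng.random() < p else llr_other
        guessed_b = score > 0          # non-positive scores are assigned to A
        if guessed_b == from_b:
            correct += 1
    return correct / n_people


for n in (1, 10, 100, 1000):
    print(n, "loci -> accuracy", simulate_accuracy(n))
```

In this toy model each locus on its own is nearly uninformative (a frequency gap of 0.1 corresponds to a per-locus F_ST of about 0.01), yet the assignment accuracy climbs toward 1 as more loci are combined. The independence of loci and the two sharply defined source populations are simplifications; sampling along a cline would behave differently.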

    11. hoary puccoon says,

      I really don't understand why you would want to use a term whose common use has very little to do with any scientific reality. If you mean specific genetic cohorts, why not just say so?

      There are two separate issues here. I understand the first one. You, and others, don't like the word "race" because of its political connotations and its history.

      Fine, I can see your point.

      The second issue is about the genetics of human populations. In this case, you seem to be denying that there are any genetic differences between different groups of humans.

      That's simply not true. My "political goal" is to fight against the misrepresentation of science.

      We could quibble about the exact boundaries of the various genetically isolated groups and we could quibble about what to call them but to deny that there are any differences is absurd.

      Like it or not, the average citizen of New Delhi is different from the average citizen of Tokyo. Different enough that you and I could easily sort a mixture of 1000 individuals from each city. Why in the world would you want to pretend that there is no scientific evidence for genetic difference between these two groups?

    12. Why in the world would you want to pretend that there is no scientific evidence for genetic difference between these two groups?

      Nobody is saying there isn't; that's your strawman. Read what people are actually saying. There are no isolated human groups. Sampling two geographically distant populations, as race advocates almost always do, ignores the continuous variation between them. There are no races because the geographic variation is not separable into discrete, sharply divided units. A transect of human populations between Delhi and Tokyo would be instructive here.

    13. Of course by the same argument you could argue that there are no such things as languages because besides people speaking Spanish and French, you have people speaking things like Catalan, which share features from both.

    14. It's true. Languages also are constructs created by arbitrarily dividing a continuum. What we call languages are actually prestige dialects of much more continuous variation. Sometimes that continuity cuts across national borders. Catalan is spoken in both France and Spain, and grades into both languages. I don't think that works in your favor.

    15. @John Harshman

      I very carefully read what you wrote. You said, "There are no isolated human groups." I interpret this to mean there are no genetically isolated human groups, is that correct?

      Your reason for making such a statement is that all humans are members of a single species. That means they can all interbreed successfully. Since there will always be some limited amount of gene flow between human groups, this means that demes don't exist.

      Is that a fair summary of your position?

      Does your objection apply to all species or just to humans? Can you give me an example of a genetically isolated group in another species or do none of them exist as long as there is even a tiny amount of gene flow between them?

    16. You're the one who used the term "genetically isolated". I was just using your terminology. I assumed you just meant something like "lack of significant gene flow" rather than actual speciation. I don't think there are any such groups in the human species. There are such groups in many other species, and we usually refer to them as subspecies, and this is usually because of geographic separation. There are countless arguments over whether populations deserve to be called subspecies, species, or nothing.

      But go ahead. If you think there are human subspecies, can you delimit them? How many are there? Where are they?

    17. @John Harshman

      So, just to be clear. You are saying that the human species cannot be subdivided into groups that have reduced gene flow between them. Is that correct?

      You aren't claiming that Homo sapiens is one large panmictic population, are you?

    18. Larry, you are setting up a false dichotomy here. There are a lot of ways a population can show structure without being divisible into subpopulations with limited gene flow. Clinal variation in allele frequencies is one of them and certainly relevant in humans.
      No one is claiming humans are panmictic, but there are a lot of ways for populations to not be panmictic and still not be readily decomposed into distinct subpopulations. The island model is not universal and we have far too good an understanding of human evolution to just blindly apply it. Look at how lactose-persistence is distributed for instance. We find clines radiating out from various locations and these seem to correspond to different alleles producing lactose-persistence. These clines were further affected by migratory movements and cultural effects, mainly by how important milk based food became in diets, which in turn was influenced by local climate variation. There are a lot of interesting things going on and most of them are obscured rather than elucidated by moving towards an island model for humans.

    19. @Simon Gunkel,

      I'm not sure I understand your position. Are you saying that the differences between the Maasai and the Central African pygmies could be just due to clinal variation and not to restricted gene flow?

    20. It's always a combination of migration patterns, clinal variation, reduced gene flow and in some cases bottlenecks. If you look at Y chromosomes, mtDNA and autosomes across Africa you find evidence for rather intricate patterns of dispersal, numerous loci for which there are clines and plenty of evidence for at least repeated temporary high rates of gene flow. Heck, both groups you mention are interesting cases, because they seem to be very stable as cultural groups, but genetically characterized by repeated large scale introgressions from various sources, indicating that they regularly absorbed culturally distinct groups migrating to their respective regions, without that having a large impact on their language (imagine England getting hit by the Roman conquest, the Anglo-Saxons and the Norman conquest and, rather than ending up with the linguistic hodgepodge that is English, sticking to the Celtic of the Britons).

    21. I'll agree that there are some regions of reduced gene flow. But reduced enough to create subspecies? Probably not. Most animal subspecies are either completely allopatric or have narrow hybrid zones. None of that for humans. Again, if you think there are human subspecies, how many are there and where are they?

    22. @John Harshman

      So we agree that the species Homo sapiens is subdivided into subgroups that are genetically isolated from each other, right? We agree that the differences in allele frequencies are sufficiently large that we can identify members of such groups just by looking at the sequence of their genomes, right?

      Now we're just quibbling over what to call some of those groups, right? You don't think the major divisions, Africans, Asians, and Europeans, should be called "races" or "subspecies," right?

      In addition, you want to defend the position that races don't exist by quibbling over the exact boundaries and by pointing out that in the modern world there is enhanced gene flow due to extensive migration, right?

    23. @Larry: We have to look at what we do when we go from arguing that populations can be distinguished, using enough markers, to focusing on the "major divisions". Are the divisions there sharp-edged and do the differences between them explain most of the genetic variation between individuals?

      In terms of the explanation of variation at a single locus, no. Lewontin's figure of 15% of the variation being due to "races" turns out to be a bit large.

      Note also that what Siddhartha Mukherjee was talking about was that figure. From your original post, he was saying that

      This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so "different" from another man from Namibia that it makes little sense to lump them into the same category.


      I don't see that as denying that there are real genetic differences between Nigerians and Namibians.

    24. Larry, just insert a "no" after every time you say "right?". You just aren't reading what I write, and that's the only way you can see a post in which I say there are no genetically isolated human populations and take that as agreement that there are genetically isolated human populations.

      You seem to be having similar problems parsing what everyone else is saying. I don't understand why.

    25. John Harshman says, "I'll agree that there are some regions of reduced gene flow."

      John Harshman says, "You just aren't reading what I write, and that's the only way you can see a post in which I say there are no genetically isolated human populations and take that as agreement that there are genetically isolated human populations."

      John, I really am trying to understand your position. I'm guessing that you make an important distinction between groups with "reduced gene flow" and "genetically isolated population." Is that correct?

    26. Joe Felsenstein asks, "Are the divisions there sharp-edged and do the differences between them explain most of the genetic variation between individuals?"

      The answer to both questions is "no." When you look at closely related species (e.g. chimps and bonobos) the answer to the first question is closer to "yes" (sharp-edged divisions) but the answer to the second question is probably still no.

      Nobody is questioning the idea that the total number of differences between two unrelated individuals from the same region is some value, say "x," and that the differences between two individuals from different groups is some value "y" where "y" may be only a bit bigger than "x."

      This is true of all recognized subspecies or races in other species, I think. Wouldn't it apply to the Western and Eastern subspecies of gorilla, for example?

      What is your point?

    27. Larry,

      I don't know what you mean by "groups with reduced gene flow". I was talking about areas of reduced gene flow. That is, there are places on the map where gene flow occurs between nearby points at a lower rate than between nearby points at other places. Oceans are good examples of such places of reduced gene flow, as are any areas with no people. In order to have this refer to groups, you would, I suppose, need a complete circle of such places. I don't think there are any human populations that are this isolated. Certainly the broad descriptions "Asian, African, European" don't come at all close to any such ideal. Where are you getting all this?

    28. This is true of all recognized subspecies or races in other species, I think. Wouldn't it apply to the Western and Eastern subspecies of gorilla, for example?

      See, F_ST for Western and Eastern Gorillas is ~.38 (Thalmann et al. 2007, The Complex Evolutionary History of Gorillas: Insights from Genomic Data, Mol. Biol. Evol. 24:146–158), and within conservation biology the .25 value (which Wright mentioned as he defined F_ST) is still being used as a cutoff point for designating subspecies.
      The Thalmann et al paper estimates that Western and Eastern Gorillas had a panmictic ancestral population ~60,000-70,000 generations ago, with limited gene flow ending about 10,000 generations ago, after which there was no gene flow. Compare that to humans where estimates for a panmictic ancestor are ~2,500-5,000 generations ago and limited gene flow never stopped.

    29. This is actually a problem caused by the way that geneticists have quantified and partitioned genetic diversity. Genetic diversity is usually quantified as heterozygosity, and then this is additively partitioned into within- and between-group components. (Fst or Gst is the ratio of this between-group component to the total (pooled) heterozygosity.) So when someone says that most of the "diversity" is within groups, they mean the total (pooled) heterozygosity minus the mean within-group heterozygosity is very small.

      Geneticists then often misinterpret this to mean that genetic differences between groups are small. You can see that this is wrong just by noting two things: (1) heterozygosity is a probability, so it cannot exceed unity, and (2) total heterozygosity >= mean within-group heterozygosity (because heterozygosity is a concave function of allele frequencies). Thus when within-group heterozygosity is high, it approaches unity, and total heterozygosity, squeezed between it and unity, also necessarily approaches unity. Subtracting the former from the latter therefore necessarily gives a "between-group heterozygosity" close to zero, no matter how different the groups are. So the relative size of the "between-group" diversity is irrelevant to population structure, and could be close to zero even if the groups shared no alleles whatsoever. Try it and see for yourself.

      This is also why Fst and Gst do not really measure genetic differentiation between groups; they will always be close to zero when within-group diversity is high, even if the groups share no alleles. Try it and see.

      Lewontin made a similar mathematical mistake in his paper, when he compared within-group entropy to total entropy. The logarithmic nature of entropy means that when within-group entropic diversity is high, the total and within-group entropies will be similar in magnitude, even if all groups are completely different. So the additive "between-group" entropic diversity will be low relative to the total, whenever the within-group diversity is high, even if the groups shared no alleles at all. Even if the groups were different species.

      All this is the result of making poorly-thought-out assumptions about the mathematics of diversity. A more rigorous approach shows that heterozygosity needs to be transformed to Kimura and Crow's effective number of alleles before it can be used in ratio measures of group similarity or differentiation, and one must take the exponential of entropy to use it in such ratio measures. The complete theory connecting diversity and differentiation is given in my 2007 paper in Ecology, "Partitioning diversity into independent alpha and beta components", and my 2008 paper in Molecular Ecology, "Gst and its relatives do not measure differentiation". I treat Lewontin's mistake in the latter paper.
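
      The transformation described above can be tried on a toy case. The following is only a sketch under stated assumptions: two equally sized demes, each with 20 equally common alleles and none shared (the setup is hypothetical and the helper names are made up for illustration, not taken from the cited papers). It compares ratios of raw heterozygosity and entropy with ratios of the corresponding effective numbers.

```python
import math

def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def entropy(freqs):
    """Shannon entropy (natural log) of an allele frequency distribution."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def effective_number(h):
    """Effective number of alleles for heterozygosity h: 1 / (1 - h)."""
    return 1.0 / (1.0 - h)

# Assumed toy case: both demes look the same internally, so one frequency
# list describes the within-group diversity of either deme.
K = 20
within = [1.0 / K] * K                 # 20 equally common alleles in a deme
pooled = [1.0 / (2 * K)] * (2 * K)     # equal-sized demes pooled, no alleles shared

h_within = heterozygosity(within)
h_total = heterozygosity(pooled)

# Ratios of the raw measures stay near 1 even though no allele is shared.
ratio_raw_het = h_within / h_total
ratio_raw_ent = entropy(within) / entropy(pooled)

# Ratios of effective numbers recover the number of distinct groups (2).
ratio_eff_het = effective_number(h_total) / effective_number(h_within)
ratio_eff_ent = math.exp(entropy(pooled)) / math.exp(entropy(within))

print(f"raw heterozygosity ratio: {ratio_raw_het:.3f}")
print(f"raw entropy ratio:        {ratio_raw_ent:.3f}")
print(f"effective-number ratios:  {ratio_eff_het:.3f}, {ratio_eff_ent:.3f}")
```

      In this toy case the raw ratios come out close to one, while both effective-number ratios come out at two, the number of completely distinct groups.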

      Simon: The arbitrary Fst cut-offs mentioned by Wright, which you mention, are complete nonsense. Fst cannot be interpreted without knowing the within-group diversity.

      Lou Jost

    30. @Lou Jost ("unknown"): Without yet having gone over the papers of yours that you cite, I am puzzled. Are you arguing that the amount of heterozygosity in human populations is near-maximum? Humans have less heterozygosity than invertebrates, for example?

    31. @Larry: What is your point?

      That Siddhartha Mukherjee is not wrong in what he says about human "races" and predictability of differences in single traits. That saying that races are arbitrary constructions is not the same as saying that there are no genetic differences among human populations. It is very clear from Mukherjee's point about Nigerians versus Namibians that his definition of "race" places them in the same "race" even though they are of different populations.

    32. A simple though highly artificial case that ought to illustrate Mukherjee's point: suppose there are 4 unlinked loci each with 4 alleles, and two haploid populations both of which have all alleles present. Call the alleles A, a, B, b at the first locus. In each case population 1 has 40% A, 40% B, 10% a, and 10% b, while population 2 has 40% each of a and b, 10% each of A and B. The other loci are similarly distributed with locus two having CcDd, locus three having EeFf, and locus four having GgHh. If you see genotype ACEG, there is a very high probability that the sampled individual belongs to population 1*. But the probability that a randomly sampled individual in population 1 has that genotype is only 0.0256.

      (*Well, this actually depends on how you sample. If you sample people randomly and population 2 is enough larger than population 1, then it's more likely that the individual belongs to population 2. But forget that.)
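
      The arithmetic of this example can be checked with a short script. This is a sketch only: the function names and the prior parameter are made up for illustration, and it assumes unlinked loci so that per-locus probabilities multiply. The prior stands in for the relative sizes of the two populations mentioned in the footnote.

```python
# Allele frequencies from the example: upper-case alleles are common (40%)
# in population 1 and rare (10%) in population 2, and vice versa.
LOCI = ["AaBb", "CcDd", "EeFf", "GgHh"]
pop1 = {al: (0.4 if al.isupper() else 0.1) for locus in LOCI for al in locus}
pop2 = {al: (0.1 if al.isupper() else 0.4) for locus in LOCI for al in locus}

def likelihood(genotype, freqs):
    """Probability of a haploid genotype, assuming unlinked loci."""
    prob = 1.0
    for allele in genotype:
        prob *= freqs[allele]
    return prob

def posterior_pop1(genotype, prior1=0.5):
    """Bayes' rule: probability the individual is from population 1.

    prior1 is the assumed chance of sampling from population 1 before
    looking at the genotype (0.5 means equally sized populations).
    """
    l1 = prior1 * likelihood(genotype, pop1)
    l2 = (1.0 - prior1) * likelihood(genotype, pop2)
    return l1 / (l1 + l2)

print(f"P(ACEG | population 1) = {likelihood('ACEG', pop1):.4f}")
print(f"P(population 1 | ACEG), equal sizes = {posterior_pop1('ACEG'):.4f}")
print(f"P(population 1 | ACEG), prior 0.001 = {posterior_pop1('ACEG', 0.001):.4f}")
```

      With equal population sizes the genotype points to population 1 with better than 99% probability, even though fewer than 3% of population 1 carry it. With a small enough prior for population 1, the conclusion flips, as the footnote says.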

    33. Joe, no, I am not making any empirical claim at all. I am making a mathematical claim. Population geneticists incorrectly partition heterozygosity into within- and between-group components. Their additive between-group component ALWAYS necessarily approaches zero when the within-group "diversity" is high. So just because "most of the diversity is within groups", you can't infer anything about genetic differentiation between groups. Please look at my genetics paper or do some concrete examples. Try a simple case like two demes, each with 20 equally common alleles, none shared between groups. Even though these groups have no alleles in common, the "between-group" component of heterozygosity will be very much smaller than the mean within-group heterozygosity.

      And then to add insult to injury, geneticists take this false partition and divide it by the total heterozygosity to get a measure of genetic differentiation. This doesn't work because heterozygosity is not linear with respect to pooling of equally large, equally diverse demes.
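
      The suggested case is easy to run. A minimal sketch, assuming equal deme weights and the usual additive partition (total minus mean within-group heterozygosity); the function and allele names are hypothetical:

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def additive_partition(deme1, deme2):
    """Return (H_S, H_T, G_ST) for two equally weighted demes.

    Each deme is a dict mapping allele name to frequency.
    """
    alleles = set(deme1) | set(deme2)
    pooled = {a: 0.5 * (deme1.get(a, 0.0) + deme2.get(a, 0.0)) for a in alleles}
    h_s = 0.5 * (heterozygosity(deme1.values()) + heterozygosity(deme2.values()))
    h_t = heterozygosity(pooled.values())
    return h_s, h_t, (h_t - h_s) / h_t

# Two demes, 20 equally common alleles each, no allele shared.
deme1 = {f"a{i}": 1 / 20 for i in range(20)}
deme2 = {f"b{i}": 1 / 20 for i in range(20)}

h_s, h_t, g_st = additive_partition(deme1, deme2)
print(f"H_S = {h_s:.3f}, H_T = {h_t:.3f}, between = {h_t - h_s:.3f}, G_ST = {g_st:.3f}")
```

      Here the mean within-group heterozygosity is about 0.95, the total about 0.975, and the ratio about 0.026, even though the two demes have no alleles in common.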

      Lewontin does something similar in his paper, using the ratio of within-group entropy to total entropy as a measure of genetic similarity between groups. Entropy, like heterozygosity, is nonlinear with respect to pooling demes. So the ratio approaches unity even if the demes share no alleles.

      So my point is that there is a great deal of sloppy talk and incorrect inferences above about within- and between-group diversity. This is just a mathematical point of order, though; the conclusions could still be right even if the reasoning is wrong. But it would be nice to see valid mathematical inferences when people are arguing about this.

      Lou Jost

    34. @Lou Jost: You are basically arguing that using F_ST is subject to a version of long branch attraction. However in this case we know that the branch lengths for the two Gorilla subspecies are longer than for any human populations. In other words, you are basically saying that LBA is a more relevant issue when branch lengths are smaller.

    35. Simon, I am not sure I would put it that way. Whatever the origin of the within-group diversity, Fst will always be low when within-group diversity is high. That's just a mathematical fact.

      It is easy to fix by partitioning heterozygosity correctly (it is not additive). That's what my 2008 genetics paper is about. Hedrick (2005) also saw this and made an empirical correction that is almost right.

      Lou Jost

    36. @Joe Felsenstein

      Mukherjee said, "But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man."

      Do you agree with him?

      Mukherjee also said, "... the genetic diversity within any racial group dominates the diversity between racial groups. "

      Isn't this going to be true of almost all subspecies, races, demes etc? In other words, it's not really an argument against the existence of such genetically isolated subdivisions, is it?

    37. Larry, see my comments above. It depends how you quantify diversity. If you quantify it as heterozygosity, the genetic diversity within any group necessarily dominates the genetic diversity between groups, if within-group diversity is high (as it usually is for genetic markers). This is true even when the groups are completely different species (try it and see for yourself); it is a mathematical property of heterozygosity and that's why heterozygosity cannot be used as a diversity measure.

      So yes, you are right, the second statement you quote from Mukherjee is not relevant to the debate about the existence of races.

      It could be that at a given locus there is no single fixed "Nigerian" allele, but there could be a set of alleles with a recent common ancestor that are more common in Nigerian people than other people. Mukherjee's statement about genetic diversity doesn't contradict this scenario. (He could still be right by accident, though. I only point out that his conclusion doesn't follow from what he said in that quote.)
      Lou Jost

    38. @Larry: Mukherjee said, "But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man."

      Do you agree with him?


      No if we are talking about skin color or type of hair. Yes if we are talking about shape of nose, shape of fingers or parameters of kidney function. The prediction from genetic differences made by statistics like FST would be appropriate if there were no natural selection. Otherwise it would depend on the type of natural selection.

      Mukherjee also said, "... the genetic diversity within any racial group dominates the diversity between racial groups. "

      Isn't this going to be true of almost all subspecies, races, demes etc? In other words, it's not really an argument against the existence of such genetically isolated subdivisions, is it?


      Is it intended to argue that human populations do not differ genetically? Or is it intended to argue that the large entities called "races" are not responsible for the great majority of genetic differences? We have a real what-is-the-question question here.

  4. This is a great example of what can go wrong when you try to explain science to 5-year-olds. I understand the point Mukherjee is making about pairwise variance in his first sentence, but honestly, most people don't have the math. They read this, get some fuzzy general impression of racial equality, and pass it around in a game of Telephone, where it gets totally garbled. I've had people tell me, with a straight face, that I (an American of European descent) am more closely related to Africans than I am to other Europeans.

    What's worse, racists hear this garbled end result and figure "Scientists are so dumb they can't tell races apart. I can, so I must be a lot smarter than them."

  5. Couldn't agree more. When interpreting a clinical genetic test, knowing the patient's ethnicity can be very helpful. Mukherjee should look at the Exome Aggregation Consortium data (ExAC): variants are separated into broad ethnic groups and you can see obvious and sometimes dramatic differences in frequencies between them.

  6. We can use genetic markers, if we have lots of them, to tell apart almost any two populations (say, North Swedes and South Swedes). Does that make them different races?

    I think that we are jumping from acknowledging that there are real genetic differences between populations to making assertions about "race" being real and not being a social construct.

    Replies
    1. If you really could, why not? The French used to speak of being the "Gallic Race" -- the problem was that people like Cavalli-Sforza showed that Europeans were pretty genetically homogenous across borders and that it wouldn't be possible to distinguish the French from Germans, etc.

    2. But there are clines in distribution of all sorts of alleles. They just don't track national borders very closely. Bet you could tell a Breton from a Brandenburger pretty easily, given lots of data.

    3. No doubt. Especially the Breton-speaking Bretons that tend to intermarry among themselves. And there might be important health implications based on that genetic difference that would be worth making the distinction.

    4. So people are using "race" here just to mean different populations?

      One is not saying that, say, a Swede and an Iranian are of "the same race"?

    5. Yes. That's what it means to say races are or are not genetic or are mere "social constructs". It may well be, as often is the case, morphological characters such as skin color are not good markers for the underlying population structure, and that, for example, various light skinned peoples are not particularly genetically related, but that simply means that a traditional taxonomic unit should be revised in favor of ones better supported, not that they are all figments of the imagination.

    6. So how many "races" are there, as you use the word? 1000?

    7. Maybe. Certainly we are beginning to realize that even groups once thought to be purely cultural (like French Canadians) are associated with various genetic variations associated with disease.

  7. 'Nambia'? Doesn't exist.

    Does he mean Namibia?

  8. It must be a miracle when one considers how some monkeys have evolved into different skin colored monkeys now are called intelligent monkeys....
    I guess evolution is not only random. It is also confusing but not to the devout believers...

    Replies
    1. Eric says: evolution is confusing
      Yes, I believe Eric is confused. Not only with respect to the difference between monkeys and the great apes, including man.

      But everything in biology is of course confusing as long as you believe and insist that evolution is wrong.

    2. How are you defining "evolution"? Darwin's and dawkins' version cannot be tested so it isn't even wrong...

    3. It's shit that some people chose to cherish. It's their problem but sanity needs to be verified...

  9. Isn't an 80,000 year separation long enough to get significant differences in the DNA of two populations, or would it take much longer than that? Aren't there many characteristics that would differentiate the population in Japan from the population in Kenya?

  10. Yes, anyone who talks about people groups is under the burden of dealing with the historical issues of RACE. So yes, PC is operative here while not corrupting the science.
    There is indeed no such thing as race.
    What there is IS segregated populations that gain particular details in their bodies.
    This is a flaw in evolutionist thinking.
    For example, a YEC would say all European population groups were first brown skinned and only later, upon migration to Europe, became white. YET this was in already segregated populations with different languages.
    An evolutionist says there was a brown skin group that migrated to Europe, became white, then segregated into groups, with different languages coming soon after.
    The creationist sees traits as from influences from the environment and so our bodies adapt instantly to them.
    So no races exist. The evolutionist must see races as existing. They need the original tribe to evolve its traits before separation.
    Race is not a social construction in evolutionary terms. It's real populations that evolved separately in the past.
    I think a flaw in evolutionary concepts is shown in the problem of how to group mankind.
    It works fine for Genesis believing creationists.
    YEC should jump on this.


  11. I'm not sure the quotes necessarily contradict each other. Mukherjee has two points: 1. It's easy to predict a person's ancestry from their genome, 2. It's hard to predict genetic features in a person's genome from their ancestry.

    Lipkin agrees with the 1st point, but says nothing about the second.

    I know little about human genetics, but here's how I might believe the 2nd point: If a particular "race" had a slight increase in probability of a certain set of snps (say, a 1% increase in frequency of 100000 snps) you would expect a genome of that race to have 1000 (+/-200) more of those snps than someone else: Probably a pretty statistically reliable test for ancestry. On the other hand, knowing a person was of a particular race tells you little about a snp. If the snps have a 50% probability in race 1 and a 51% probability in race 2, if you try to predict the snp based on race you're going to be wrong about half of the time. I base my impression on a rough skim of http://dx.doi.org/10.1038%2Fncomms1104 , hopefully I did not butcher the idea.
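
    The back-of-envelope numbers in this paragraph can be checked with a few lines. This is only a sketch under simplifying assumptions that are not in the comment itself: each snp is treated as an independent coin flip in a haploid genome, and one genome from each group is compared. The variable names are made up for illustration.

```python
import math

N_SNPS = 100_000      # number of snps in the hypothetical set
P1, P2 = 0.50, 0.51   # per-snp frequency in group 1 and group 2

# Expected difference in snp counts between one genome from each group.
expected_diff = N_SNPS * (P2 - P1)

# Standard deviation of that difference (sum of two binomial variances).
sd_diff = math.sqrt(N_SNPS * (P1 * (1 - P1) + P2 * (1 - P2)))

# How many standard deviations separate the two groups on the total count.
z_score = expected_diff / sd_diff

# Best achievable accuracy when guessing one snp in a group-2 genome:
# always guess the more likely state.
per_snp_accuracy = max(P2, 1 - P2)

print(f"expected difference: {expected_diff:.0f} +/- {sd_diff:.0f}")
print(f"separation: {z_score:.1f} standard deviations")
print(f"accuracy guessing a single snp: {per_snp_accuracy:.2f}")
```

    Under these assumptions the totals differ by about 1000 with a spread of a little over 200, so the aggregate separates the groups by several standard deviations, while a guess about any single snp is right only about 51% of the time.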

    If that's Mukherjee's point, I still agree that he is muddying the issue a bit. He is careful to add the caveat "in a genetic sense", but it would probably be clearer to say "genetic feature" instead of just "feature". I'm also not sure I agree that "you can predict little about the person's genome", or that "it makes little sense the lump them into the same category". It depends what he means by "little". The KKK could still argue that while you might not be able to reliably predict particular genetic features given ancestry, you could still predict global genome properties like # of snps.

    Replies
    1. Oops, that doi was supposed to be http://dx.doi.org/10.1534%2Fgenetics.106.067355

  12. I'm late to the party, but:

    ========================
    Continuous geographic structure is real, “discrete races” aren’t

    By Nick Matzke on February 29, 2012 2:38 PM
    http://www.pandasthumb.org/archives/2012/02/continuous-geog.html
    ========================

    Replies
    1. Unfortunately, Larry will read this as "There's no way to distinguish Nigerians from Norwegians".

    2. @John and Nick

      Mukherjee says,

      But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man.

      Do you agree with him?

    3. @Nick

      I understand your point. It depends on a strict interpretation of the word "race" with no allowance for the fact that other scientists have different definitions.

      That's fine by me. You can use whatever definition suits your purpose. What I object to is scientists who write as though the issue was settled and the only scientific conclusion is that there are no human races.

      It would be much better to state that there are differing opinions within the scientific community about the ways we define human populations that differ in allele frequencies.

      There's not much doubt that Norwegians, on average, differ from Nigerians in terms of allele frequencies. Since they are part of the same species, it follows naturally that there will be interbreeding at the edges of the various populations that occupy the space between Norway and Nigeria.

      Does that mean that such populations don't really exist?

      As you point out in your article, Jerry Coyne disagrees with you. Coyne wrote the book on speciation so it's safe to assume he has some expertise in this field.

      Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?

    4. Larry,

      Jerry Coyne did indeed write the book on speciation (with H. Allen Orr), and it's a great book. But it isn't a book on subspecies, so I'm not sure how that's relevant. I don't actually know what Coyne has to say on human subspecies or races. Can you point me to it?

      It isn't a question of a little interbreeding around the edges. As Nick says, it's a more or less continuous geographic distribution. A transect from Nigeria to Norway, perhaps by way of Egypt and the Bosporus, would be appropriate. Where in that transect would you place the division between races?

      I take Mukherjee to be referring to genetic features, not just the handful of obvious physical characteristics. And clearly he's right about that; there are few if any private alleles.

      Can we all agree to ignore Eric?

    5. By the way, Larry, have you read Nick's post? I'm wondering, because none of his points depend on having any sort of strict definition of "race".

    6. @John: Actually Nick's post makes points that have to do with a strict definition of "race", namely that anyone proposing that there are races should damn well have a solid definition of the term. So far Larry has mainly claimed that there are genetically isolated populations, which of course is just wrong. I don't think one can reasonably make the claim that a particular theoretical term is useful for science unless it can be shown to do some work, and that generally requires the term to mean something. If the "race-realists" can't give a functional definition of what race is, I think it's reasonable to dismiss them for that reason alone.

    7. Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?

      As a layperson I'm going to ask what may be a loaded question: How "isolated" is "isolated"? Any actual figures available (did I miss them in the discussion above; is Nick Matzke's illustration in the Panda's Thumb article sufficient)?

  13. Larry says: Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?
    If we look at dog genetics, we don't have much of a problem with identifying specific races - from Chihuahua to Mastiff? Yet they are all dogs. Maybe we need a new word because 'race' is a loaded word when used with respect to the human race?

    Replies
    1. Well, again we could look at the data. There are studies on how divergent dog breeds are from one another (e.g. Parker et al., 2004, "Genetic Structure of the Purebred Domestic Dog", Science, 304:1160-1164), and if we went by F_ST (despite Lou Jost's points above) we find that it's .33 for dog breeds. We know that an island model is valid for dog breeding, because what breeders have been doing is very much setting up a situation in which the island model is valid. That's a case where we do have discrete variation and we have a higher degree of differentiation, mainly because of the founder effect. This is a qualitatively and quantitatively different situation than in humans.
