Wednesday, June 15, 2016

What does a person's genome reveal about their ethnicity and their appearance?

If you knew someone's complete genome sequence, could you tell where they came from and their ethnic background (race)? The answer Siddhartha Mukherjee gives in his latest book, "The Gene: an intimate history," is confusing. The answer appears to be "yes," but then Mukherjee denies that knowing where someone came from tells us anything about their genome or their phenotype. He writes the following on page 342.

... the genetic diversity within any racial group dominates the diversity between racial groups. This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so "different" from another man from Namibia that it makes little sense to lump them into the same category.

For race and genetics, then, the genome is strictly a one-way street. You can use the genome to predict where X or Y came from. But knowing where A or B came from, you can predict little about the person's genome. Or: every genome carries a signature of an individual's ancestry—but an individual's racial ancestry predicts little about the person's genome. You can sequence DNA from an African-American man and conclude that his ancestors came from Sierra Leone or Nigeria. But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man. The geneticist goes home happy; the racist returns empty-handed.
I find this view very strange. Imagine that you were an anthropologist who was an expert on humans and human evolution. Imagine you were told that there's a woman in the next room whose eight great-grandparents all came from Japan. According to Mukherjee, such a scientist could not predict anything about the features of that woman. Does that make any sense?

I suspect this is just a convoluted way of reconciling science with political correctness.

Steven Monroe Lipkin has a different view. He's a medical geneticist who recently published a book with Jon R. Luoma titled "The Age of Genomes: tales from the front lines of genetic medicine." Here's how they explain it on page 6.
Many ethnic groups carry distinct signatures. For example, from a genome sequence you can usually tell if an individual is African-American, Caucasian, Asian, Satnami, or Ashkenazi Jew, even if you've never laid eyes on the patient. A well-regarded research scientist whom I had never met made his genome sequence publically available as part of a research study. I remember scrolling through his genetic variant files and trying, more successfully than I had expected, to guess what he would look like before I peeked at his webpage photo. The personal genome is more than skin deep.
This makes more sense to me. If you know what to look for, and Steven Monroe Lipkin certainly does, then many of the features of a particular person can be deduced from their genome sequence. And if you know which variants are more common in certain ethnic groups then you can certainly predict what a person might look like just by knowing where their ancestors came from.

What's wrong with that?


79 comments :

Jonathan Badger said...

There is also an increasing realization that ethnicity is associated with genetic variation linked to disease outcome, which really isn't congruent with the "race is just a social construct" idea. For example, some collaborators of mine have found that variations in the LSAMP locus predominate in prostate tumors from African American men but not in other groups, and this probably contributes to the greater incidence of prostate cancer among African Americans.

unknowing said...

Honestly, it sounds more like Mukherjee is confused here, rather than trying to be "politically correct." It seems he's struggling with the interpretation of genetic diversity within and between racial categories. I.e. because of the (comparatively high) genetic diversity within some of these categories, category may be poorly predictive at the level of "all known variants." Of course, that broad categories such as white, black, Asian, are poorly predictive at this level, does not mean that one cannot predict with reasonable accuracy the presence or absence of a subset of genetic variants, those common to members of that racial category (assuming that these categories do indeed reflect shared ancestry on some level). Unsurprisingly, category becomes more useful a predictor the more it defines a specific population.

Anonymous said...

From Mukherjee:
"the genetic diversity within any racial group dominates the diversity between racial groups. This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense"

It seems to me this idea that genetic diversity indicates that race is an illusion was started by Lewontin back in the 80s. I think the idea was that misrepresenting the science can be justified for the greater good of mitigating racism, but I doubt anyone in the KKK gave a fig for any of this.
Here is why I think this is wrong, but if my interpretation is off I of course expect a vigorous correction.

Consider several loci involved in skin pigmentation. It could easily be the case that if one looks at genetic diversity within a population from Nigeria and within a population from Iceland, one would see greater diversity within those populations than between the consensus sequences for both populations. But skin pigmentation is not an illusion: those two populations fall at extreme opposite ends of the phenotypic spectrum. Most of the 'diversity' within the population isn't doing anything, and a few key differences between the populations explain most of the differences.
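
The point in this comment can be illustrated with per-locus F_ST, a standard measure of how much of the variation at a locus lies between populations rather than within them. What follows is only a minimal sketch, not anyone's actual analysis: the allele frequencies are invented for illustration, the two populations are assumed to be the same size, and all function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_components(p1, p2):
    """Return (H_T - H_S, H_T) for two equal-sized populations (an assumption)."""
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)  # heterozygosity of the pooled population
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population
    return h_total - h_within, h_total


def fst(p1, p2):
    """Per-locus F_ST = (H_T - H_S) / H_T; returns 0.0 if the locus is fixed."""
    between, h_total = fst_components(p1, p2)
    return between / h_total if h_total > 0 else 0.0


def combined_fst(loci):
    """Combine loci as a ratio of sums (one common convention, assumed here)."""
    parts = [fst_components(p1, p2) for p1, p2 in loci]
    total = sum(h for _, h in parts)
    return sum(b for b, _ in parts) / total if total > 0 else 0.0


# Hypothetical frequencies: 995 loci that barely differ, 5 that differ sharply.
LOCI = [(0.50, 0.52)] * 995 + [(0.05, 0.95)] * 5

if __name__ == "__main__":
    print("near-identical locus:", fst(0.50, 0.52))
    print("sharply different locus:", fst(0.05, 0.95))
    print("all loci combined:", combined_fst(LOCI))
```

With these made-up numbers the combined value stays below 0.01 even though the five sharply differentiated loci each have a per-locus value of about 0.81, which is the pattern the comment describes.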

CherryBombSim said...

This is a great example of what can go wrong when you try to explain science to 5-year-olds. I understand the point Mukherjee is making about pairwise variance in his first sentence, but honestly, most people don't have the math. They read this, get some fuzzy general impression of racial equality, and pass it around in a game of Telephone, where it gets totally garbled. I've had people tell me, with a straight face, that I (an American of European descent) am more closely related to Africans than I am to other Europeans.

What's worse, racists hear this garbled end result and figure "Scientists are so dumb they can't tell races apart. I can, so I must be a lot smarter than them."

Ian Bosdet said...

Couldn't agree more. When interpreting a clinical genetic test, knowing the patient's ethnicity can be very helpful. Mukherjee should look at the Exome Aggregation Consortium (ExAC) data: variants are separated into broad ethnic groups and you can see obvious and sometimes dramatic differences in frequencies between them.

Joe Felsenstein said...

We can use genetic markers, if we have lots of them, to tell apart almost any two populations (say, North Swedes and South Swedes). Does that make them different races?

I think that we are jumping from acknowledging that there are real genetic differences between populations to making assertions about "race" being real and not being a social construct.

CrocodileChuck said...

'Nambia'? Doesn't exist.

Does he mean Namibia?

Jonathan Badger said...

If you really could, why not? The French used to speak of being the "Gallic Race" -- the problem was that people like Cavalli-Sforza showed that Europeans were pretty genetically homogenous across borders and that it wouldn't be possible to distinguish the French from Germans, etc.

John Harshman said...

But there are clines in distribution of all sorts of alleles. They just don't track national borders very closely. Bet you could tell a Breton from a Brandenburger pretty easily, given lots of data.

Jmac said...

It must be a miracle when one considers how some monkeys have evolved into different skin colored monkeys now are called intelligent monkeys....
I guess evolution is not only random. It is also confusing but not to the devout believers...

Jonathan Badger said...

No doubt. Especially the Breton-speaking Bretons that tend to intermarry among themselves. And there might be important health implications based on that genetic difference that would be worth making the distinction.

Unknown said...

It seems to me this idea that genetic diversity indicates that race is an illusion was started by Lewontin back in the 80s. I think the idea was that misrepresenting the science can be justified for the greater good of mitigating racism

Can you be more precise on how Lewontin has "misrepresented science"? Lewontin, based on a small number of genes for which data was available, estimated F_ST for humans at about .15. Now we have results based on a large number of SNPs and the current best estimate of F_ST is about .12 (which one should note is lower than Lewontin's estimate). In the mid 70s anthropologists affirming the existence of human races defined them through an F_ST greater than .25, in the mid 80s they accepted Lewontin's estimate but shifted their cutoff to .15, and now that the SNP data shows we're lower than that they are down to F_ST>.1. The key question here is how far we can reasonably let the goalpost be moved. Race among humans was first introduced as a concept involving multiple events of creation by god. There's some BS right there. Then we got phylogenetic hypotheses that placed some "races" closer to chimps than to other humans. Again that's easy to refute. So again we got a new definition of race, now as separate divergences from an extinct ancestor, that still for some reason ended up interfertile. Now that of course was bollocks, too. So then we got definitions based on F_ST values and they changed whenever it turned out that human F_ST values were too small to make the accepted definition apply. If we move beyond mapping SNPs and it turns out that aligning full genome assemblies yields an F_ST of 0.09, we'll certainly see "race realists" move the goalpost once again and drop the cutoff to say 0.05.
We understand human evolution well enough to readily grasp why post-dispersal differentiation can't be that big a deal. Most of human evolution happened within a somewhat confined area with a population that was at least somewhat close to panmictic. And then there was dispersal and maybe 2500 generations of limited interbreeding, but there are still clines that show interbreeding took place during that time. Trying to shoehorn our evolutionary history into a model of races that was conceived before evolutionary biology even existed seems like a patently bad idea.

Joe Felsenstein said...

So people are using "race" here just to mean different populations?

One is not saying that, say, a Swede and an Iranian are of "the same race"?

Unknown said...

There are a variety of problems with this:
a) Anybody who puts together the phrase "just a social construct" is obviously ignorant of what the term "social construct" means. It's on par with "just a theory". There's no "just" here. Of course race is a social construct. Gravity is a social construct. Social construct isn't the opposite of valid.
b) "African American men" isn't a useful category. That category is not valid, because it is at best a legal term. In terms of genetic differences there's a far greater diversity among "african americans" (simply because the migrant populations were subject to the founder effect) and after the institution of the "one drop rule" it became even less meaningful.
c) Why would you design a study using race as a category, when you could do a GWAS, which would capture genetic variations more clearly and via mapping also points to loci of interest? In particular since pooling data in this way means that you will not capture correlations for a substantial amount of human genetic variation and your statistical analysis can run into Simpson's paradox type problems. It's not as if genotyping in humans was prohibitively expensive.

Unknown said...

Isn't an 80,000 year separation long enough to get significant differences in the DNA of two populations, or would it take much longer than that? Aren't there many characteristics that would differentiate the population in Japan from the population in Kenya?

Jonathan Badger said...

Yes. That's what it means to say races are or are not genetic or are mere "social constructs". It may well be, as often is the case, morphological characters such as skin color are not good markers for the underlying population structure, and that, for example, various light skinned peoples are not particularly genetically related, but that simply means that a traditional taxonomic unit should be revised in favor of ones better supported, not that they are all figments of the imagination.

Larry Moran said...

He wrote "Namibia." My mistake, corrected.

Thanks.

Robert Byers said...

Yes anyone who talks about people groups is under the burden of dealing with the historical issues of RACE. So yes PC is operative here while not corrupting the science.
There is indeed no such thing as race.
What there is IS segregated populations that gain particular details in their bodies.
This is a flaw in evolutionist thinking.
For example a YEC would say all European population groups were first brown skinned and only later, upon migration to Europe, became white. YET this was in already segregated populations with different languages.
An evolutionist says there was a brown skin group that migrated to Europe, became white, then segregated into groups with different languages coming soon after.
The creationist sees traits as from influences from the environment and so our bodies adapt instantly to them.
So no races exist. The evolutionist must see races as existing. They needed the original tribe to evolve its traits before separation.
Race is not a social construction in evolutionary terms. It's real populations that evolved separately in the past.
I think a flaw in evolutionary concepts is shown in the problem of how to group mankind.
It works fine for Genesis believing creationists.
YEC should jump on this.


Rolf Aalberg said...

Eric says: evolution is confusing
Yes, I believe Eric is confused. Not only with respect to the difference between monkeys and the great apes, including man.

But everything in biology is of course confusing as long as you believe and insist that evolution is wrong.

Jonathan Badger said...

a)"Gravity is a social construct"? Isn't it a "transformative hermeneutic" as per Sokal?
b) It really, really, *is* a useful category for medicine, though. Yes, AAs are admixtures of European and African ancestry and yes, individuals vary as to their makeup. But while not all AAs with prostate cancer have the LSAMP variation, very few non-AAs have it. So typical treatments for prostate cancer might not work very well for AAs. Part of the problem with medical research in the past is that white European-descended males are treated as the "standard" human. Females and people of other ancestries simply don't respond in the same way to a lot of treatments.
c) The study is dealing with actual tumors from actual patients, not mere GWAS studies

Joe Felsenstein said...

So how many "races" are there, as you use the word? 1000?

Jonathan Badger said...

Maybe. Certainly we are beginning to realize that even groups once thought to be purely cultural (like French Canadians) are associated with various genetic variations associated with disease.

Anonymous said...

He misrepresented it by claiming that because race can't be precisely defined genetically, the concept of race is an illusion. Many others who picked up on this, from pop science writers to the Phil Donahue Show, suggested race was the equivalent of defining a group of people based on being left-handed, or bisexual.
I think the Fst is irrelevant for whether race can have a genetic definition, but even if it's not a real taxonomic category, that doesn't mean it's an illusion.
I think CBSim in the next post is right. This does more harm than good if the goal is to mitigate racism.

Joe G said...

How are you defining "evolution"? Darwin's and Dawkins' version cannot be tested so it isn't even wrong...

ealloc said...

I'm not sure the quotes necessarily contradict each other. Mukherjee has two points: 1. It's easy to predict a person's ancestry from their genome, 2. It's hard to predict genetic features in a person's genome from their ancestry.

Lipkin agrees with the 1st point, but says nothing about the second.

I know little about human genetics, but here's how I might believe the 2nd point: If a particular "race" had a slight increase in probability of a certain set of snps (say, a 1% increase in frequency of 100000 snps) you would expect a genome of that race to have 1000 (+/-200) more of those snps than someone else: Probably a pretty statistically reliable test for ancestry. On the other hand, knowing a person was of a particular race tells you little about a snp. If the snps have a 50% probability in race 1 and a 51% probability in race 2, if you try to predict the snp based on race you're going to be wrong about half of the time. I base my impression on a rough skim of http://dx.doi.org/10.1038%2Fncomms1104 , hopefully I did not butcher the idea.
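
The arithmetic in this comment can be checked with a short sketch. It assumes, purely for illustration, that the 100000 SNPs are independent, that each genome carries or lacks each SNP with the stated probability (50% in one group, 51% in the other), and that a normal approximation to the binomial is adequate. The function names are hypothetical, and real genomes violate the independence assumption through linkage.

```python
import math


def count_difference(n_snps, p1, p2):
    """Mean and SD of (SNP count in a group-2 genome) - (count in a group-1 genome)."""
    mean = n_snps * (p2 - p1)
    sd = math.sqrt(n_snps * (p1 * (1 - p1) + p2 * (1 - p2)))
    return mean, sd


def ancestry_accuracy(n_snps, p1, p2):
    """Approximate accuracy of guessing the group from one genome's SNP count.

    Uses a threshold halfway between the two expected counts and a normal
    approximation with the pooled per-genome SD (both are simplifying assumptions).
    """
    half_gap = n_snps * abs(p2 - p1) / 2.0
    sd = math.sqrt(n_snps * (p1 * (1 - p1) + p2 * (1 - p2)) / 2.0)
    z = half_gap / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def single_snp_accuracy(p1, p2):
    """Best accuracy of guessing one SNP's presence from the group label,
    assuming both groups are equally likely."""
    return 0.5 * (max(p1, 1 - p1) + max(p2, 1 - p2))


if __name__ == "__main__":
    print(count_difference(100000, 0.50, 0.51))
    print(ancestry_accuracy(100000, 0.50, 0.51))
    print(single_snp_accuracy(0.50, 0.51))
```

Under these assumptions the count difference comes out near 1000 with a standard deviation of roughly 220, the group is guessed correctly well over 99% of the time from the total count, and a single SNP is guessed correctly only about 50.5% of the time.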

If that's Mukherjee's point, I still agree that he is muddying the issue a bit. He is careful to add the caveat "in a genetic sense", but it would probably be clearer to say "genetic feature" instead of just "feature". I'm also not sure I agree that "you can predict little about the person's genome", or that "it makes little sense to lump them into the same category". It depends what he means by "little". The KKK could still argue that while you might not be able to reliably predict particular genetic features given ancestry, you could still predict global genome properties like # of snps.

ealloc said...

Oops, that doi was supposed to be http://dx.doi.org/10.1534%2Fgenetics.106.067355

John Harshman said...

Race is an illusion. Geographically structured genetic variation is not an illusion. Is this so hard to understand?

Anonymous said...

'Is this so hard to understand?'

Not at all. Then certain defined geographically structured genetic variations = race.

Unknown said...

a) A social construct is the relationship between a symbol and its meaning. The word gravity and what it refers to is a social construct. So is the fact that F=G*m_1*m_2/r² refers to Newton's law of gravity, because if G were Coulomb's constant and m_1 and m_2 were charges, that would be Coulomb's law for electrostatic forces. By convention we use m to designate mass and G as the gravitational constant, but it's not as if that convention was in any way predicated by nature.
b and c) I still disagree. Your argument is that race is useful as a proxy for genetic diversity among humans. But I think we are in agreement that it does not resolve most of the genetic diversity that exists. We can also straight up use genotyping to directly access genetic diversity, which makes a proxy superfluous. Now, I tried to identify the study in question and found Petrovics et al (2015). If that's what you are referring to then they did perform a GWAS. And very much to the point they found the LSAMP variation in about 1/4th of African Americans vs. 1/8th in Caucasians. No other data is given. This means that in the absence of further data, using a treatment that works better for the LSAMP variation as a default for African Americans would improve things for 1/4th of the patients, but make things worse for 3/4th of them. It's not as if genotyping as a diagnostic tool was SciFi at this point; we can individualize treatment based on the actual case. So while I agree with the point made about treating a subset of humans as the standard, I disagree strongly about the possible remedy: We can connect different responses to treatment directly to genetic differences and we can individualize treatment based on using genotyping.

hoary puccoon said...

Race is a word that has been around a lot longer than "geographically structured genetic variations." In fact, it was around for quite a while before anyone realized genes were made of DNA, let alone how they worked.

Basically, race was used to indicate a few, large groups, usually defined by skin color-- black, white, red, yellow, and sometimes brown. This classed Australian aborigines and some South Asians in the same category with Africans. No competent scientist today would accept that definition as a description of "geographically structured genetic variation."

Based on this very inadequate description of genetic variation, elaborate "racial differences" were then "scientifically" described. (If you want to know what the "black personality" was supposed to be like, look at Barack Obama-- and take the exact opposite on pretty much every measure.)

Obviously, this historic use of the word race has no scientific basis, and virtually no connection to modern genetic studies. Unfortunately, it is still used by some people as a sociological tool to separate and discriminate-- always to the advantage of one's own "race."

Allowing the word race to be used as a synonym for specific genetic variations found in increased concentrations in specific populations does nothing to further science. It's on the level with creationists claiming that evolution is just another "creation story."

This is why a lot of people want the word race dropped entirely. As long as it is confounded with old, disproven ideas, it contributes nothing to science, and means nothing but trouble for society.

Larry Moran said...

@hoary puccoon

Thank you for describing the politically correct American position.

Do you think the word "race" should be banned entirely from the scientific literature, including its use to describe large demes in other species?

What word should we use to describe large subpopulation of humans that are genetically isolated from each other; for example Africans, Asians, and Europeans? Should we just ignore the scientific evidence and pretend those groups don't exist?

What is the political goal you are trying to achieve? Are you hoping that we can eliminate racism by having a bunch of scientists say there's no scientific evidence for human subpopulations?

John Harshman said...

But Africans, Asians, and Europeans are not genetically isolated from each other. These groups aren't groups. This isn't about political correctness; it's about reality. What we have is clinal variation in lots of alleles, with not all that much correspondence in clines among different genes.

Joe Felsenstein said...

I wasn't aware that these subpopulations were "genetically isolated" from each other. Generally reproductive isolation gets tested empirically, usually starting about 15 minutes after the boat lands. So far no isolation has been found.

Or does Larry's term "genetically isolated" just mean differs in multiple gene frequencies enough that, by combining lots of them, we can tell them apart?

John Harshman said...

Isolation by distance, if anything.

hoary puccoon said...

What political goal are YOU trying to achieve?

The way the word race has been used in years past (and is still used in many general discussions) and the way different ethnic groups are studied for specific alleles, usually having to do with medical conditions, have no relationship to each other. As I understand it, there is far more variation among Africans than there is between certain African groups and the entire population of the rest of the world. So when you're using "Africans" as an example of a "large subpopulation of humans that are genetically isolated from each other" you're simply wrong.

You could, of course, redefine race so that various African groups that show major genetic differences are considered different races. But that would almost certainly lead to confusion with the old definition of race, meaning all Africans, along with some Australians and Indians. Why would you want to do that?

As far as using "race" to mean Black Angus as opposed to Zebu cattle, for example, I'm reasonably sure that cattle breeds are far more cohesive genetically than the human groups traditionally defined as races.

I really don't understand why you would want to use a term whose common use has very little to do with any scientific reality. If you mean specific genetic cohorts, why not just say so?

It also makes me a little uneasy that you're tossing a loaded term like politically correct into the discussion, without considering first what is scientifically correct.

Please, please tell me you haven't been Trumped.

Jonathan Badger said...

GWAS studies in the normal meaning of the term are dealing with genotyping of individuals. It's important to understand that Petrovics et al are dealing with the genomes of the tumors, which is not the same thing. Variants found on the tumors are much more relevant to potential treatments and couldn't be found by simple genotyping of populations.

Larry Moran said...

Joe Felsenstein asks,

Or does Larry's term "genetically isolated" just mean differs in multiple gene frequencies enough that, by combining lots of them, we can tell them apart?

Yes, it means that Homo sapiens is not a single panmictic population. It means there is limited gene (allele) flow between different populations such that allele frequencies in these different groups can be very different. Different enough that it's quite possible to identify a member of such a population simply by looking at their genome.

Many of these allele frequency differences produce visible phenotypes so it's possible to identify members of large subpopulations simply by looking at the physical appearance of individuals.

Joe, do you agree with what I said or do you deny that such genetically isolated groups of humans actually exist?

Larry Moran said...

hoary puccoon says,

I really don't understand why you would want to use a term whose common use has very little to do with any scientific reality. If you mean specific genetic cohorts, why not just say so?

There are two separate issues here. I understand the first one. You, and others, don't like the word "race" because of its political connotations and its history.

Fine, I can see your point.

The second issue is about the genetics of human populations. In this case, you seem to be denying that there are any genetic differences between different groups of humans.

That's simply not true. My "political goal" is to fight against the misrepresentation of science.

We could quibble about the exact boundaries of the various genetically isolated groups and we could quibble about what to call them but to deny that there are any differences is absurd.

Like it or not, the average citizen of New Delhi is different from the average citizen of Tokyo. Different enough that you and I could easily sort a mixture of 1000 individuals from each city. Why in the world would you want to pretend that there is no scientific evidence for genetic difference between these two groups?

John Harshman said...

Why in the world would you want to pretend that there is no scientific evidence for genetic difference between these two groups?

Nobody is saying there isn't; that's your strawman. Read what people are actually saying. There are no isolated human groups. Sampling two geographically distant populations, as race advocates almost always do, ignores the continuous variation between them. There are no races because the geographic variation is not separable into discrete, sharply divided units. A transect of human populations between Delhi and Tokyo would be instructive here.
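
The transect argument can be sketched with a toy cline. The numbers below are invented: a single allele whose frequency changes linearly across eleven evenly spaced populations. Nothing here is measured data, and the function name is hypothetical.

```python
def linear_cline(start_freq, end_freq, n_populations):
    """Allele frequencies for populations evenly spaced along a transect.

    Requires n_populations >= 2.
    """
    step = (end_freq - start_freq) / (n_populations - 1)
    return [start_freq + i * step for i in range(n_populations)]


# Hypothetical cline: frequency rises from 0.1 at one end to 0.9 at the other.
freqs = linear_cline(0.1, 0.9, 11)

# Difference seen when only the two ends of the transect are sampled.
endpoint_gap = abs(freqs[-1] - freqs[0])

# Largest difference between any two neighbouring populations.
largest_neighbour_gap = max(abs(b - a) for a, b in zip(freqs, freqs[1:]))

if __name__ == "__main__":
    print("endpoints differ by:", endpoint_gap)
    print("largest neighbour difference:", largest_neighbour_gap)
```

In this toy model, comparing only the two ends shows a gap of about 0.8, while no two neighbouring populations differ by more than about 0.08, so nothing in the frequencies themselves marks where a boundary would go.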

Jonathan Badger said...

Of course by the same argument you could argue that there are no such things as languages because besides people speaking Spanish and French, you have people speaking things like Catalan, which share features from both.

John Harshman said...

It's true. Languages also are constructs created by arbitrarily dividing a continuum. What we call languages are actually prestige dialects of much more continuous variation. Sometimes that continuity cuts across national borders. Catalan is spoken in both France and Spain, and grades into both languages. I don't think that works in your favor.

Larry Moran said...

@John Harshman

I very carefully read what you wrote. You said, "There are no isolated human groups." I interpret this to mean there are no genetically isolated human groups, is that correct?

Your reason for making such a statement is that all humans are members of a single species. That means they can all interbreed successfully. Since there will always be some limited amount of gene flow between human groups, this means that demes don't exist.

Is that a fair summary of your position?

Does your objection apply to all species or just to humans? Can you give me an example of a genetically isolated group in another species or do none of them exist as long as there is even a tiny amount of gene flow between them?

John Harshman said...

You're the one who used the term "genetically isolated". I was just using your terminology. I assumed you just meant something like "lack of significant gene flow" rather than actual speciation. I don't think there are any such groups in the human species. There are such groups in many other species, and we usually refer to them as subspecies, and this is usually because of geographic separation. There are countless arguments over whether populations deserve to be called subspecies, species, or nothing.

But go ahead. If you think there are human subspecies, can you delimit them? How many are there? Where are they?

Larry Moran said...

@John Harshman

So, just to be clear. You are saying that the human species cannot be subdivided into groups that have reduced gene flow between them. Is that correct?

You aren't claiming that Homo sapiens is one large panmictic population, are you?

Unknown said...

Larry, you are setting up a false dichotomy here. There are a lot of ways a population can show structure without being divisible into subpopulations with limited gene flow. Clinal variation in allele frequencies is one of them and certainly relevant in humans.
No one is claiming humans are panmictic, but there are a lot of ways for populations to not be panmictic and still not be readily decomposed into distinct subpopulations. The island model is not universal and we understand human evolution far too well to just blindly apply it. Look at how lactose-persistence is distributed for instance. We find clines radiating out from various locations and these seem to correspond to different alleles producing lactose-persistence. These clines were further affected by migratory movements and cultural effects, mainly by how important milk based food became in diets, which in turn was influenced by local climate variation. There are a lot of interesting things going on and most of them are obscured rather than elucidated by moving towards an island model for humans.

Unknown said...

Well, If you are running EIGENSOFT you're doing a GWAS. I also wonder what could prompt a difference in the tumors in a heritable fashion if not existing genomic diversity. AFAIK (and I'm no expert on cancer) the main reason particular types of tumors are more or less prevalent is that there are recessive mutations within the population that allow tumors to arise when the dominant copy in heterozygotes mutates.
If LSAMP variations lead to different treatment, then genotyping tumors seems like a reasonable diagnostic step in all cases of prostate cancer. I still don't understand how race is a useful category here.

Larry Moran said...

@Simon Gunkel,

I'm not sure I understand your position. Are you saying that the differences between the Maasai and the Central African pygmies could be just due to clinal variation and not to restricted gene flow?

Jonathan Badger said...

It's presumed that the genotype of the individual is responsible for variants on the tumors, but actually finding them on the somatic genome is difficult, which is why normal GWAS has a rather low reputation in cancer biology. It isn't a one-to-one mapping, and so focus has shifted to studying the genomes of the tumors themselves.

As to the rationale, African Americans suffer from a considerably larger incidence of aggressive prostate cancer and patient groups want to know why.

Unknown said...

Aren't somatic GWAS responsible for identifying the BRCAs, for instance?

Unknown said...

It's always a combination of migration patterns, clinal variation, reduced gene flow and in some cases bottlenecks. If you look at Y chromosomes, mtDNA and autosomes across Africa you find evidence for rather intricate patterns of dispersal, numerous loci for which there are clines and plenty of evidence for at least repeated temporary high rates of gene flow. Heck, both groups you mention are interesting cases, because they seem to be very stable as cultural groups, but genetically characterized by repeated large-scale introgressions from various sources, indicating that they regularly absorbed culturally distinct groups migrating to their respective regions, without that having a large impact on their language (imagine England getting hit by the Roman conquest, the Anglo-Saxons and the Norman conquest and, rather than ending up with the linguistic hodgepodge that is English, sticking to the Celtic of the Britons).

John Harshman said...

I'll agree that there are some regions of reduced gene flow. But reduced enough to create subspecies? Probably not. Most animal subspecies are either completely allopatric or have narrow hybrid zones. None of that for humans. Again, if you think there are human subspecies, how many are there and where are they?

Larry Moran said...

@John Harshman

So we agree that the species Homo sapiens is subdivided into subgroups that are genetically isolated from each other, right? We agree that the differences in allele frequencies are sufficiently large that we can identify members of such groups just by looking at the sequence of their genomes, right?

Now we're just quibbling over what to call some of those groups, right? You don't think the major divisions, Africans, Asians, and Europeans, should be called "races" or "subspecies," right?

In addition, you want to defend the position that races don't exist by quibbling over the exact boundaries and by pointing out that in the modern world there is enhanced gene flow due to extensive migration, right?

Joe Felsenstein said...

@Larry: We have to look at what we do when we go from arguing that populations can be distinguished, using enough markers, to focusing on the "major divisions". Are the divisions there sharp-edged and do the differences between them explain most of the genetic variation between individuals?

In terms of the explanation of variation at a single locus, no. Lewontin's figure of 15% of the variation being due to "races" turns out to be a bit large.

Note also that what Siddhartha Mukherjee was talking about was that figure. From your original post, he was saying that

This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so "different" from another man from Namibia that it makes little sense the lump them into the same category.


I don't see that as denying that there are real genetic differences between Nigerians and Namibians.

John Harshman said...

Larry, just insert a "no" after every time you say "right?". You just aren't reading what I write, and that's the only way you can see a post in which I say there are no genetically isolated human populations and take that as agreement that there are genetically isolated human populations.

You seem to be having similar problems parsing what everyone else is saying. I don't understand why.

Larry Moran said...

John Harshman says, "I'll agree that there are some regions of reduced gene flow."

John Harshman says, "You just aren't reading what I write, and that's the only way you can see a post in which I say there are no genetically isolated human populations and take that as agreement that there are genetically isolated human populations."

John, I really am trying to understand your position. I'm guessing that you make an important distinction between groups with "reduced gene flow" and "genetically isolated population." Is that correct?

Larry Moran said...

Joe Felsenstein asks, "Are the divisions there sharp-edged and do the differences between them explain most of the genetic variation between individuals?"

The answer to both questions is "no." When you look at closely related species (e.g. chimps and bonobos) the answer to the first question is closer to "yes" (sharp-edged divisions) but the answer to the second question is probably still no.

Nobody is questioning the idea that the total number of differences between two unrelated individuals from the same region is some value, say "x," and that the number of differences between two individuals from different groups is some value "y" where "y" may be only a bit bigger than "x."

This is true of all recognized subspecies or races in other species, I think. Wouldn't it apply to the Western and Eastern subspecies of gorilla, for example?

What is your point?

John Harshman said...

Larry,

I don't know what you mean by "groups with reduced gene flow". I was talking about areas of reduced gene flow. That is, there are places on the map where gene flow occurs between nearby points at a lower rate than between nearby points at other places. Oceans are good examples of such places of reduced gene flow, as are any areas with no people. In order to have this refer to groups, you would, I suppose, need a complete circle of such places. I don't think there are any human populations that are this isolated. Certainly the broad descriptions "Asian, African, European" don't come at all close to any such ideal. Where are you getting all this?

Unknown said...

This is true of all recognized subspecies or races in other species, I think. Wouldn't it apply to the Western and Eastern subspecies of gorilla, for example?

See, F_ST for Western and Eastern Gorillas is ~.38 (Thalmann et al. 2007, The Complex Evolutionary History of Gorillas: Insights from Genomic Data, Mol. Biol. Evol. 24:146–158.), and within conservation biology the .25 value (which Wright mentioned as he defined F_ST) is still being used as a cutoff point for designating subspecies.
The Thalmann et al paper estimates that Western and Eastern Gorillas had a panmictic ancestral population ~60,000-70,000 generations ago, with limited gene flow ending about 10,000 generations ago, after which there was no gene flow. Compare that to humans where estimates for a panmictic ancestor are ~2,500-5,000 generations ago and limited gene flow never stopped.

Unknown said...

This is actually a problem caused by the way that geneticists have quantified and partitioned genetic diversity. Genetic diversity is usually quantified as heterozygosity, and then this is additively partitioned into within- and between-group components. (Fst or Gst is the ratio of this between-group component to the total (pooled) heterozygosity.) So when someone says that most of the "diversity" is within groups, they mean the total (pooled) heterozygosity minus the mean within-group heterozygosity is very small.

Geneticists then often misinterpret this to mean that genetic differences between groups are small. You can see that this is wrong just by noting two things: (1) heterozygosity is a probability, so it cannot exceed unity, and (2) total heterozygosity >= mean within-group heterozygosity (because heterozygosity is a concave function). Thus when within-group heterozygosity is high, it approaches unity, and total heterozygosity, squeezed between it and unity, also necessarily approaches unity. Subtracting the former from the latter therefore necessarily gives a "between-group heterozygosity" close to zero, no matter how different the groups are. So the relative size of the "between-group" diversity is irrelevant to population structure, and could be close to zero even if the groups shared no alleles whatsoever. Try it and see for yourself.

This is also why Fst and Gst do not really measure genetic differentiation between groups; it will always be close to zero when within-group diversity is high, even if the groups share no alleles. Try it and see.

Lewontin made a similar mathematical mistake in his paper, when he compared within-group entropy to total entropy. The logarithmic nature of entropy means that when within-group entropic diversity is high, the total and within-group entropies will be similar in magnitude, even if all groups are completely different. So the additive "between-group" entropic diversity will be low relative to the total, whenever the within-group diversity is high, even if the groups shared no alleles at all. Even if the groups were different species.

All this is the result of making poorly-thought-out assumptions about the mathematics of diversity. A more rigorous approach shows that heterozygosity needs to be transformed to Kimura and Crow's effective number of alleles before it can be used in ratio measures of group similarity or differentiation, and one must take the exponential of entropy to use it in such ratio measures. The complete theory connecting diversity and differentiation is given in my 2007 paper in Ecology, "Partitioning diversity into independent alpha and beta components", and my 2008 paper in Molecular Ecology, "Gst and its relatives do not measure differentiation". I treat Lewontin's mistake in the latter paper.

Simon: The arbitrary Fst cut-offs mentioned by Wright, which you mention, are complete nonsense. Fst cannot be interpreted without knowing the within-group diversity.

Lou Jost
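
The "try it and see" invitation above can be sketched in a few lines. This is a rough illustration under assumptions that are not in the comment: two equally sized demes, each with k equally common alleles, none shared, and helper names (`heterozygosity`, `effective_number`, `two_disjoint_demes`) invented here. It computes the additive ratio (H_T - H_S) / H_T alongside the ratio of effective numbers of alleles, 1 / (1 - H), for increasing k.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number(h):
    """Effective number of alleles corresponding to a heterozygosity h."""
    return 1.0 / (1.0 - h)


def two_disjoint_demes(k):
    """Two equally sized demes, each with k equally common alleles, none shared."""
    deme1 = [1.0 / k] * k + [0.0] * k
    deme2 = [0.0] * k + [1.0 / k] * k
    pooled = [(p + q) / 2 for p, q in zip(deme1, deme2)]

    h_s = (heterozygosity(deme1) + heterozygosity(deme2)) / 2  # mean within-group
    h_t = heterozygosity(pooled)                               # total (pooled)
    gst = (h_t - h_s) / h_t                                    # additive ratio
    ratio = effective_number(h_t) / effective_number(h_s)      # effective-number ratio
    return h_s, h_t, gst, ratio


results = {k: two_disjoint_demes(k) for k in (2, 5, 20, 100)}
for k, (h_s, h_t, gst, ratio) in results.items():
    print(f"k={k:3d}  H_S={h_s:.4f}  H_T={h_t:.4f}  Gst={gst:.4f}  ratio={ratio:.2f}")
```

Under these assumptions the additive ratio shrinks toward zero as k grows, even though the demes never share an allele, while the effective-number ratio stays at 2, the number of completely distinct demes.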

Joe Felsenstein said...

@Lou Jost ("unknown"): Without yet having gone over the papers of yours that you cite, I am puzzled. Are you arguing that the amount of heterozygosity in human populations is near-maximum? Humans have less heterozygosity than invertebrates, for example?

Joe Felsenstein said...

@Larry: What is your point?

That Siddhartha Mukherjee is not wrong in what he says about human "races" and predictability of differences in single traits. That saying that races are arbitrary constructions is not the same as saying that there are no genetic differences among human populations. It is very clear from Mukherjee's point about Nigerians versus Namibians that his definition of "race" places them in the same "race" even though they are of different populations.

John Harshman said...

A simple though highly artificial case that ought to illustrate Mukherjee's point: suppose there are 4 unlinked loci each with 4 alleles, and two haploid populations both of which have all alleles present. Call the alleles A, a, B, b at the first locus. In each case population 1 has 40% A, 40% B, 10% a, and 10% b, while population 2 has 40% each of a and b, 10% each of A and B. The other loci are similarly distributed with locus two having CcDd, locus three having EeFf, and locus four having GgHh. If you see genotype ACEG, there is a very high probability that the sampled individual belongs to population 1*. But the probability that a randomly sampled individual in population 1 has that genotype is only 0.0256.

(*Well, this actually depends on how you sample. If you sample people randomly and population 2 is enough larger than population 1, then it's more likely that the individual belongs to population 2. But forget that.)
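
The arithmetic in this example can be checked with a short sketch. It assumes, as the footnote allows, that an individual is equally likely a priori to come from either population (the `prior_pop1` parameter is an illustrative addition, as are the function names), and it uses the stated frequencies: 0.4 per upper-case allele in population 1 and 0.1 in population 2.

```python
from math import prod

# Frequency of the allele carried by genotype ACEG at each of the 4 unlinked loci.
FREQ_POP1 = [0.4, 0.4, 0.4, 0.4]
FREQ_POP2 = [0.1, 0.1, 0.1, 0.1]


def genotype_probability(freqs):
    """Probability of a haploid multilocus genotype when loci are independent."""
    return prod(freqs)


def posterior_pop1(prior_pop1=0.5):
    """Bayes' rule: probability that an ACEG individual comes from population 1."""
    like1 = genotype_probability(FREQ_POP1)
    like2 = genotype_probability(FREQ_POP2)
    return like1 * prior_pop1 / (like1 * prior_pop1 + like2 * (1.0 - prior_pop1))


p_genotype = genotype_probability(FREQ_POP1)
p_pop1 = posterior_pop1()
print(f"P(ACEG | population 1) = {p_genotype:.4f}")
print(f"P(population 1 | ACEG) = {p_pop1:.4f}")
```

With equal priors the genotype points to population 1 with probability above 0.99, yet fewer than 3% of population 1 carry it. Lowering `prior_pop1` shows the sampling caveat in the footnote.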

Unknown said...

Joe, no, I am not making any empirical claim at all. I am making a mathematical claim. Population geneticists incorrectly partition heterozygosity into within- and between-group components. Their additive between-group component ALWAYS necessarily approaches zero when the within-group "diversity" is high. So just because "most of the diversity is within groups", you can't infer anything about genetic differentiation between groups. Please look at my genetics paper or do some concrete examples. Try a simple case like two demes, each with 20 equally common alleles, none shared between groups. Even though these groups have no alleles in common, the "between-group" component of heterozygosity will be very much smaller than the mean within-group heterozygosity.

And then to add insult to injury, geneticists take this false partition and divide it by the total heterozygosity to get a measure of genetic differentiation. This doesn't work because heterozygosity is not linear with respect to pooling of equally large, equally diverse demes.

Lewontin does something similar in his paper, using the ratio of within-group entropy to total entropy as a measure of genetic similarity between groups. Entropy, like heterozygosity, is nonlinear with respect to pooling demes. So the ratio approaches unity even if the demes share no alleles.

So my point is that there is a great deal of sloppy talk and incorrect inferences above about within- and between-group diversity. This is just a mathematical point of order, though; the conclusions could still be right even if the reasoning is wrong. But it would be nice to see valid mathematical inferences when people are arguing about this.

Lou Jost
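
The concrete case suggested above (two demes, 20 equally common alleles each, none shared) works out as follows. This is only a sketch: it assumes the demes are equally sized, so the pooled frequencies are simple averages, and the variable names are illustrative.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


n_alleles = 20

# 40 allele slots in total; each deme uses its own 20 and has frequency 0 for the rest.
deme1 = [1.0 / n_alleles] * n_alleles + [0.0] * n_alleles
deme2 = [0.0] * n_alleles + [1.0 / n_alleles] * n_alleles

# Assumption: equal deme sizes, so pooled frequencies are the plain average.
pooled = [(p + q) / 2 for p, q in zip(deme1, deme2)]

h_within = (heterozygosity(deme1) + heterozygosity(deme2)) / 2
h_total = heterozygosity(pooled)
h_between = h_total - h_within   # additive "between-group" component
gst = h_between / h_total        # the usual ratio measure

print(f"mean within-group heterozygosity: {h_within:.3f}")
print(f"total heterozygosity:             {h_total:.3f}")
print(f"between-group component:          {h_between:.3f}")
print(f"Gst:                              {gst:.4f}")
```

Here the within-group value is 0.95, the total is 0.975, and the additive between-group component is only 0.025, even though the two demes have no alleles in common.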

Unknown said...

@Lou Jost: You are basically arguing that using F_ST is subject to a version of long branch attraction. However in this case we know that the branch lengths for the two Gorilla subspecies are longer than for any human populations. In other words, you are basically saying that LBA is a more relevant issue when branch lengths are smaller.

Unknown said...

Simon, I am not sure I would put it that way. Whatever the origin of the within-group diversity, Fst will always be low when within-group diversity is high. That's just a mathematical fact.

It is easy to fix by partitioning heterozygosity correctly (it is not additive). That's what my 2008 genetics paper is about. Hedrick (2005) also saw this and made an empirical correction that is almost right.

Lou Jost

Larry Moran said...

@Joe Felsenstein

Mukherjee said, "But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man."

Do you agree with him?

Mukherjee also said, "... the genetic diversity within any racial group dominates the diversity between racial groups. "

Isn't this going to be true of almost all subspecies, races, demes etc? In other words, it's not really an argument against the existence of such genetically isolated subdivisions, is it?

Unknown said...

Larry, see my comments above. It depends how you quantify diversity. If you quantify it as heterozygosity, the genetic diversity within any group necessarily dominates the genetic diversity between groups, if within-group diversity is high (as it usually is for genetic markers). This is true even when the groups are completely different species (try it and see for yourself); it is a mathematical property of heterozygosity and that's why heterozygosity cannot be used as a diversity measure.

So yes, you are right, the second statement you quote from Mukherjee is not relevant to the debate about the existence of races.

It could be that at a given locus there is no single fixed "Nigerian" allele, but there could be a set of alleles with a recent common ancestor that are more common in Nigerian people than other people. Mukherjee's statement about genetic diversity doesn't contradict this scenario. (He could still be right by accident, though. I only point out that his conclusion doesn't follow from what he said in that quote.)
Lou Jost

Joe Felsenstein said...

@Larry: Mukherjee said, "But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man."

Do you agree with him?


No if we are talking about skin color or type of hair. Yes if we are talking about shape of nose, shape of fingers or parameters of kidney function. The prediction from genetic differences made by statistics like FST would be appropriate if there were no natural selection. Otherwise it would depend on the type of natural selection.

Mukherjee also said, "... the genetic diversity within any racial group dominates the diversity between racial groups. "

Isn't this going to be true of almost all subspecies, races, demes etc? In other words, it's not really an argument against the existence of such genetically isolated subdivisions, is it?


Is it intended to argue that human populations do not differ genetically? Or is it intended to argue that the large entities called "races" are not responsible for the great majority of genetic differences? We have a real what-is-the-question question here.

Jmac said...

It's shit that some people chose to cherish. It's their problem but sanity needs to be verified...

NickM said...

I'm late to the party, but:

========================
Continuous geographic structure is real, “discrete races” aren’t

By Nick Matzke on February 29, 2012 2:38 PM
http://www.pandasthumb.org/archives/2012/02/continuous-geog.html
========================

John Harshman said...

Unfortunately, Larry will read this as "There's no way to distinguish Nigerians from Norwegians".

Larry Moran said...

@John and Nick

Mukherjee says,

But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man.

Do you agree with him?

Larry Moran said...

@Nick

I understand your point. It depends on a strict interpretation of the word "race" with no allowance for the fact that other scientists have different definitions.

That's fine by me. You can use whatever definition suits your purpose. What I object to is scientists who write as though the issue was settled and the only scientific conclusion is that there are no human races.

It would be much better to state that there are differing opinions within the scientific community about the ways we define human populations that differ in allele frequencies.

There's not much doubt that Norwegians, on average, differ from Nigerians in terms of allele frequencies. Since they are part of the same species, it follows naturally that there will be interbreeding at the edges of the various populations that occupy the space between Norway and Nigeria.

Does that mean that such populations don't really exist?

As you point out in your article, Jerry Coyne disagrees with you. Coyne wrote the book on speciation so it's safe to assume he has some expertise in this field.

Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?

John Harshman said...

Larry,

Jerry Coyne did indeed write the book on speciation (with H. Allen Orr), and it's a great book. But it isn't a book on subspecies, so I'm not sure how that's relevant. I don't actually know what Coyne has to say on human subspecies or races. Can you point me to it?

It isn't a question of a little interbreeding around the edges. As Nick says, it's a more or less continuous geographic distribution. A transect from Nigeria to Norway, perhaps by way of Egypt and the Bosporus, would be appropriate. Where in that transect would you place the division between races?

I take Mukherjee to be referring to genetic features, not just the handful of obvious physical characteristics. And clearly he's right about that; there are few if any private alleles.

Can we all agree to ignore Eric?

John Harshman said...

By the way, Larry, have you read Nick's post? I'm wondering, because none of his points depend on having any sort of strict definition of "race".

Unknown said...

@John: Actually Nick's post makes points that have to do with a strict definition of "race", namely that anyone proposing that there are races should damn well have a solid definition of the term. So far Larry has mainly claimed that there are genetically isolated populations, which of course is just wrong. I don't think one can reasonably make the claim that a particular theoretical term is useful for science, unless it can be shown to do some work and that generally requires the term to mean something. If the "race-realists" can't give a functional definition of what race is, I think it's reasonable to dismiss them for that reason alone.

Rolf Aalberg said...

Larry says: Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?
If we look at dog genetics, we don't have much of a problem with identifying specific races - from Chihuahua to Mastiff? Yet they are all dogs. Maybe we need a new word because 'race' is a loaded word when used with respect to the human race?

Unknown said...

Well, again we could look at the data. There are studies on how divergent dog breeds are from one another (e.g. Parker et al, 2004 "Genetic Structure of the Purebred Domestic Dog", Science, 304:1160-1164.) and if we went by F_ST (despite Lou Jost's points above) we find that it's .33 for dog breeds. We know that an island model is valid for dog breeding, because what breeders have been doing is very much setting up a situation in which the island model is valid. That's a case where we do have discrete variation and we have a higher degree of differentiation, mainly because of the founder effect. This is a qualitatively and quantitatively different situation than in humans.

judmarc said...

Isn't it fair to say that there are differing opinions among scientists about whether there are genetically isolated human populations?

As a layperson I'm going to ask what may be a loaded question: How "isolated" is "isolated"? Any actual figures available (did I miss them in the discussion above; is Nick Matzke's illustration in the Panda's Thumb article sufficient)?