
Wednesday, January 23, 2019

What happens when twins get their DNA tested?

The Canadian Broadcasting Corporation (CBC) has a TV show called Marketplace that promotes itself as an advocate of consumers' rights. It has a history of testing advertisers' claims and usually shows that those claims are misleading or false. Here's what they say on their website:
On air since 1972, Marketplace is Canada’s consumer watchdog. We get the goods to help you shop smarter and protect yourself from slick scams and misleading marketing claims. We investigate the products and services we all use every day and push companies and government for answers. And we expose the truth on stories that matter to you and your family.
Last spring, one of the show's hosts, Charlsie Agro, decided to have her DNA tested by five of the leading ancestry companies to see how accurately they could predict where her ancestors came from. The twist in this story is that she has an identical twin sister, Carly Agro, who submitted her DNA to the same five companies. The results were widely reported on Canadian national news and on social media, and the dominant theme was that the two sisters got very different results, calling into question the claims of companies like Ancestry.com, 23andMe, MyHeritage, Living DNA, and FamilyTreeDNA. The on-air show was also pretty negative about the ancestry results and about the fact that the twins got different results [Twins get some 'mystifying' results when they put 5 ancestry DNA kits to the test]. Let's see whether the negative press was justified.

The results from the two most popular DNA testing companies, Ancestry.com and 23andMe, were pretty accurate, so I'm going to ignore the other companies. I'll concentrate on explaining the Ancestry.com results, since I know more about that service and I've posted a couple of articles showing that my results were quite accurate at predicting where my ancestors came from and who I'm related to [On the accuracy of Ancestry.com DNA predictions] [My DNA story].

Here are the twins' results from Ancestry.com and 23andMe.



There are two issues here and I'll discuss them separately: (1) why are the tests not identical for identical twins, and (2) how accurate is the estimate of where their ancestors come from?

Why are the results not identical?

Adult identical twins do not have identical DNA sequences in every cell of their bodies. That's the first myth we have to dispel. Yes, it's true that they come from the same zygote, so their DNA should be very similar, but it won't be identical because of mutations that occurred after the early embryo split. There's some controversy over the somatic cell mutation rate, with some workers arguing that it's 3-10× higher than the germ line mutation rate of 0.5 mutations per cell division, but let's just use the much more reliable germ line mutation rate to see how different the twins' DNA could be when it's extracted from adult epithelial cells (e.g. cheek cells from inside your mouth) [Somatic cell mutation rate in humans].

Epithelial cells divide fairly rapidly, but I don't know how many cell divisions have occurred between the zygote and the cheek cells of an adult. Let's guess that it's 1000 cell divisions. That means 500 mutations in each twin, so their DNA will differ at 1000 sites.¹ That may not seem like much in a genome of 6.4 billion base pairs, but keep in mind that the DNA testing companies are looking at 700,000 sites covering most of the hotspots where the mutation rates are higher than normal. Chances are pretty good that they'll detect a few of these differences, so the twins' DNA results will not be identical because of somatic cell mutations.
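The arithmetic in the last two paragraphs can be checked with a few lines of Python. This is a minimal sketch, assuming the guessed 1000 divisions, 0.5 mutations per division, and mutations spread uniformly across the 6.4 billion bp diploid genome. The `enrichment` parameter is hypothetical: it stands in for the idea that the assayed sites sit in high-mutation regions, and it is not a figure any company reports.

```python
def expected_assayed_differences(divisions=1000,
                                 mutations_per_division=0.5,
                                 genome_bp=6.4e9,
                                 sites_assayed=700_000,
                                 enrichment=1.0):
    """Expected number of somatic differences between two twins that land
    on sites the chip actually assays.

    Assumes mutations fall uniformly across the genome unless `enrichment`
    (a hypothetical multiplier) says assayed sites are hit more often.
    """
    mutations_per_twin = divisions * mutations_per_division  # 500 with defaults
    # Each twin's mutations are private, so the twins differ at both sets.
    differing_sites = 2 * mutations_per_twin
    fraction_assayed = sites_assayed / genome_bp
    return differing_sites * fraction_assayed * enrichment


if __name__ == "__main__":
    print(f"germ line rate: {expected_assayed_differences():.3f}")
    # Somatic rate 3-10x higher than the germ line rate, as some workers argue.
    for factor in (3, 10):
        rate = 0.5 * factor
        value = expected_assayed_differences(mutations_per_division=rate)
        print(f"somatic x{factor}:    {value:.3f}")
```

With uniformly scattered mutations the expected count is only about 0.11 (about 1.1 at the 10× somatic rate), so the expectation of detecting "a few" somatic differences depends on how strongly the assayed sites are enriched for mutations, which is the point of the hotspot argument.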

But that's probably not the main source of the difference between the twins' DNA results. By far the main problem is the way the tests are done: the customer's DNA is hybridized to DNA on a microchip, and the chip is read to see whether there's a match. (Ancestry.com uses the latest Illumina microchip, which assays 700,000 SNPs.) I think the rate of false positives is quite low, but the rate of false negatives is about 2% according to 23andMe [Ancestry]. The absence of a match where there should be one can be due to bad luck and to differences in the threshold level of binding that counts as a "hit." These "no-reads" make up most of the false negatives. Because of these limitations of the assay, the twins' DNA results could differ at 2-4% of the SNPs being tested.
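The 2-4% range follows from a roughly 2% no-call rate per chip: if the twins' no-calls fell on exactly the same sites they would differ at 2% of sites, and if they never overlapped they would differ at 4%. The sketch below assumes the 2% figure quoted above, counts a site as "different" whenever either chip fails to make a call, and treats no-calls on the two chips as independent. That independence is an assumption; if the same weak probes tend to fail on both chips, the figure moves toward the 2% end.

```python
import random


def discordant_fraction_independent(no_call_rate):
    """Fraction of sites where at least one of two chips has no call,
    assuming no-calls occur independently on each chip."""
    return 1 - (1 - no_call_rate) ** 2


def simulate_discordance(n_sites, no_call_rate, seed=0):
    """Monte Carlo version: two chips run on the same genotype, counting
    sites where either chip failed to make a call."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_sites):
        chip_a_missing = rng.random() < no_call_rate
        chip_b_missing = rng.random() < no_call_rate
        if chip_a_missing or chip_b_missing:
            discordant += 1
    return discordant / n_sites


if __name__ == "__main__":
    rate = 0.02
    print(f"fully overlapping no-calls: {rate:.2%}")
    print(f"independent no-calls:       {discordant_fraction_independent(rate):.2%}")
    print(f"non-overlapping no-calls:   {2 * rate:.2%}")
    print(f"simulated, 664,429 sites:   {simulate_discordance(664_429, rate):.2%}")
```

Under independence the expected discordance is 1 - 0.98² ≈ 3.96%, which is why the estimate sits near the top of the 2-4% range.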

Charlsie Agro visited the lab of Mark Gerstein at Yale University to see if he could explain why the sisters' DNA was not identical. Gerstein says the following in the video:
I have to say that one really shocked us. I mean, we expected two identical twins to have the exact same ancestry, and they should. So the fact that they present different results between you and your sister I find very mystifying ...
His group looked at the raw data and found that the twins' DNA was between 98.4% and 99.7% in agreement, a result they report as statistically identical. In the case of Ancestry.com, for example, the company looked at 664,429 sites and 656,197 (98.8%) were identical. Nevertheless, there were still more than 8,000 sites that differed between the two twins. (This is probably due to no-reads on one or the other of the twins' microchips, and it's a lower frequency than I estimated above.) Gerstein's group doesn't explain why identical twins' DNA wouldn't be identical, and that's a missed opportunity to educate the public about the accuracy of these tests.
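The Ancestry.com figures can be checked directly. This simply redoes the division with the two counts quoted above, so the 1.2% discordance can be set beside the 2-4% estimate from the no-read argument.

```python
sites_compared = 664_429   # sites in the Ancestry.com comparison
sites_identical = 656_197  # sites where the twins matched

sites_differing = sites_compared - sites_identical
concordance = sites_identical / sites_compared

print(f"differing sites: {sites_differing:,}")    # 8,232
print(f"concordance:     {concordance:.1%}")      # 98.8%
print(f"discordance:     {1 - concordance:.1%}")  # 1.2%
```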

Are the ancestry predictions accurate?

In order to predict where your ancestors came from, the testing company needs to compare your haplotypes to a large database of people from different parts of the world. If you have a particular haplotype, say XYZ, and people from Italy have a high frequency of the XYZ haplotype, then chances are good that you have Italian ancestors. The accuracy of this prediction depends to a large extent on the size of the database, and that's why the results from Ancestry.com and 23andMe are bound to be more accurate than the predictions from smaller companies.
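Here is a toy version of that idea: each of a customer's haplotypes is assigned to whichever reference population carries it at the highest frequency, and the assignments are tallied into percentages. The population names, haplotype labels, and frequencies are all made up for illustration, and the "highest frequency wins" rule is a deliberate simplification; the real companies use far larger reference panels and more sophisticated statistical models.

```python
from collections import Counter

# Hypothetical reference panel: frequency of each haplotype in each region.
REFERENCE_FREQS = {
    "XYZ": {"Italian": 0.40, "Eastern European": 0.05, "British": 0.02},
    "ABC": {"Italian": 0.03, "Eastern European": 0.30, "British": 0.04},
    "DEF": {"Italian": 0.10, "Eastern European": 0.12, "British": 0.01},
    "GHI": {"Italian": 0.02, "Eastern European": 0.03, "British": 0.25},
}


def assign_region(haplotype, reference=REFERENCE_FREQS):
    """Return the region where this haplotype is most common, or None if
    the haplotype is not in the reference panel."""
    freqs = reference.get(haplotype)
    if not freqs:
        return None
    return max(freqs, key=freqs.get)


def ancestry_percentages(haplotypes, reference=REFERENCE_FREQS):
    """Tally region assignments for a list of haplotypes as percentages."""
    regions = [assign_region(h, reference) for h in haplotypes]
    counts = Counter(r if r is not None else "Unassigned" for r in regions)
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    customer = ["XYZ", "XYZ", "ABC", "DEF", "GHI", "XYZ", "ABC", "QRS"]
    for region, pct in sorted(ancestry_percentages(customer).items()):
        print(f"{region:18s} {pct:5.1f}%")
```

Haplotype DEF is a near tie (0.12 versus 0.10), and near ties are where a bigger database, with better-estimated frequencies, makes the most difference. A haplotype missing from the panel (QRS here) can't be assigned at all.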

There are many ways of parsing the haplotype data into geographical regions and different ways of labeling those regions. The clustering algorithms are constantly being improved as more data comes in, which is why Ancestry.com recently revised its ancestry predictions. It's not surprising that two different companies give slightly different predictions, and that's why you see different percentages of Italian and Eastern European ancestry when comparing Ancestry.com and 23andMe. The companies agree that almost two thirds of the twins' DNA comes from ancestors who lived in Italy and Eastern Europe, but they apportion it differently. I suspect this is largely due to differences in clustering and labeling; for example, if a haplotype is common in the Trieste region of Northern Italy, do you include it in "Italian" or "Eastern European"? The fact that the percentages are different for each twin is probably due to differences in how the algorithms handled the slight differences in the microchip data caused by false negatives.
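The Trieste question can be made concrete with a small sketch. It assumes a hypothetical person whose DNA has already been split into 100 equal segments, each assigned to a fine-scale cluster; the cluster names and counts are invented for illustration and are not the Agro twins' data. Two hypothetical labeling schemes differ only in where they put the Trieste cluster.

```python
from collections import Counter

# Hypothetical fine-scale cluster for each of 100 equal-length segments.
SEGMENT_CLUSTERS = (
    ["Sicily"] * 30 + ["Northern Italy"] * 8 + ["Trieste"] * 6
    + ["Poland"] * 14 + ["Ukraine"] * 8 + ["Other"] * 34
)

# Two hypothetical ways of labeling the same clusters.
LABELS_A = {
    "Sicily": "Italian",
    "Northern Italy": "Italian",
    "Trieste": "Italian",
    "Poland": "Eastern European",
    "Ukraine": "Eastern European",
    "Other": "Other",
}
LABELS_B = dict(LABELS_A, Trieste="Eastern European")  # only Trieste moves


def report(segments, labels):
    """Percentage of segments under each broad label."""
    counts = Counter(labels[cluster] for cluster in segments)
    total = len(segments)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    for name, labels in (("scheme A", LABELS_A), ("scheme B", LABELS_B)):
        print(name, report(SEGMENT_CLUSTERS, labels))
```

The same underlying data come out as 44% Italian and 22% Eastern European under one scheme and 38% and 28% under the other, while the combined Italian plus Eastern European share stays at 66% either way. That is the pattern described above: agreement on the total, disagreement on how it is apportioned.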

We aren't told very much about the ancestors of the Agro twins beyond the fact that some of them, presumably on their father's side, are from Sicily and some are Polish/Ukrainian. It's too bad they didn't report more about their genealogy, since that would have made it possible to check whether the DNA results were accurate.

I conclude that, for Ancestry.com and 23andMe, the results are consistent with identical twins, given that their DNA is not identical and that the assay has an associated error rate. I also conclude that the ancestry predictions are probably fairly accurate given the current state of the databases and the quality of the clustering algorithms, although I didn't expect such a big difference between Ancestry.com and 23andMe. Nevertheless, it's clear that the Agro twins' immediate ancestors are from Italy/Sicily (father) and Eastern Europe (mother), and that fits with what they said on the show.


1. This will depend on the age of the sisters but if you think I'm going to guess their age then you must think I'm crazy.

4 comments:

Unknown said...

When I started to read your article, I thought any differences between identical twin sisters would be a result of X-chromosome inactivation. When you didn't mention that, I wondered whether X-chromosome inactivation happens before the zygote splits into twins. Would you elaborate on the timing of these events and on whether X-chromosome inactivation would affect genetic ancestry testing more or less than somatic mutations?

Unknown said...

X-chromosome inactivation wouldn't change the presence of the DNA, just the expression of the genes on the chromosome.

whimple said...

You could get your own DNA tested again and see if your results agree with your prior results.

Unknown said...

I design microarrays for a living. Yes, the data isn't going to be 100% the same on each run, even if you start with the same tube of DNA extract. (Our spec is 98% reproducibility; a good run is 99.8%, a solid run 99.6%, and below that we get twitchy.) Keep in mind that there's a lengthy process between DNA and data, such that you're just not going to get usable data for every single spot on the array on every single run. It's more often "No Call" than actually wrong, because we have many layers of checks for bad data, but still.

So, the data isn't going to be the same every time. If there's a quality difference, that can easily be enough to punt a DNA region from "Eastern European" to "Broadly European". Between companies, you'll also have differences in the ancestry databases they use, and in the algorithms they use to determine what is a match.

There are scam artists who will basically collect your sample and fake a report. But these companies are all legit, and the data is as consistent as the technology can manage.