
Friday, March 03, 2023

Do you understand the scientific literature?

I'm finding it increasingly difficult to understand the scientific literature even in subjects that I've been following for decades. Is it just because I'm getting too old to keep up?

Here's an example of a paper that I'd like to understand, but I gave up after reading the abstract and the introduction. I'll quote the first paragraph of the introduction to see if any Sandwalk readers can do better.

I'm not talking about the paper being a complete mystery; I can figure out roughly what it's about. What I'm thinking is that the opening paragraph could have been written in a way that makes the goals of the research much more comprehensible to the average scientifically literate person.

Weiner, D. J., Nadig, A., Jagadeesh, K. A., Dey, K. K., Neale, B. M., Robinson, E. B., ... & O’Connor, L. J. (2023) Polygenic architecture of rare coding variation across 394,783 exomes. Nature 614:492-499. [doi = 10.1038/s41586-022-05684-z]

Genome-wide association studies (GWAS) have identified thousands of common variants that are associated with common diseases and traits. Common variants have small effect sizes individually, but they combine to explain a large fraction of common disease heritability. More recently, sequencing studies have identified hundreds of genes containing rare coding variants, and these variants can have much larger effect sizes. However, it is unclear how much heritability rare variants explain in aggregate, or more generally, how common-variant and rare-variant architecture compare: whether they are equally polygenic; whether they implicate the same genes, cell types and genetically correlated risk factors; and whether rare variants will contribute meaningfully to population risk stratification.

The first question that comes to mind is whether the variant that's associated with a common disease is the cause of that disease or merely linked to the actual cause. In other words, are the associated variants responsible for the "effect size"? It sounds like the answer is "yes" in this case. Has that been firmly established in the GWAS field?


Joe Felsenstein said...

I am not up on the GWAS literature, but I think that
1. Individual SNP loci that are found by GWAS to be correlated with a trait may not be responsible for any variation in that trait, if they happen to be in linkage disequilibrium ("LD") with nearby variants that one can't detect. Genetic drift or admixture can cause that LD.
2. When the data are complete genome sequences, one should be able to either untangle these associations, or at least quantify the uncertainty as to which site is causal.
3. Note that the paper discusses whether effect sizes are larger in rarer alleles. But that can be an artifact. QTLs can be harder to detect if (a) they have smaller effects, or (b) their allele frequencies are more extreme. Thus one will have a harder time seeing smaller effects when the allele frequencies are more extreme, leaving us with a pattern that seems to show that effect sizes are bigger for rarer alleles.
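
The artifact in point 3 can be illustrated with a minimal sketch. It is not taken from the paper; every number in it is an assumption chosen for illustration. It assumes an additive biallelic site under Hardy-Weinberg proportions, so the variance it explains is 2p(1-p)b^2, and it assumes (hypothetically) that a site is "detected" only when that variance passes a fixed threshold. Effect sizes are drawn from the same distribution at every allele frequency, yet the detected effects look larger at rarer alleles.

```python
import math
import random

def min_detectable_effect(p, threshold):
    """Smallest |effect| at allele frequency p whose variance explained,
    2p(1-p)b^2, reaches the (hypothetical) detection threshold."""
    return math.sqrt(threshold / (2.0 * p * (1.0 - p)))

def mean_detected_effect(p, threshold, n_sites, rng, effect_sd=0.1):
    """Draw effects that do NOT depend on p, keep only the detectable ones,
    and return the mean |effect| among those kept."""
    cutoff = min_detectable_effect(p, threshold)
    effects = [abs(rng.gauss(0.0, effect_sd)) for _ in range(n_sites)]
    detected = [b for b in effects if b >= cutoff]
    return sum(detected) / len(detected) if detected else float("nan")

rng = random.Random(0)  # fixed seed so the run is repeatable
for p in (0.5, 0.1, 0.01):
    cutoff = min_detectable_effect(p, 0.001)
    mean_b = mean_detected_effect(p, 0.001, 20000, rng)
    print(f"p={p}: cutoff={cutoff:.3f}, mean detected |effect|={mean_b:.3f}")
```

The mean detected effect rises as p moves toward zero even though the underlying effect distribution is identical at every frequency. A real power calculation would also depend on sample size and the test statistic, which this sketch ignores.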

Larry Moran said...


I thought #1 was true. As for #2, in theory the British 100,000 Genomes Project might be able to answer that question by seeing if the query SNP was linked to any other nearby variant. You would need at least 100,000 genome sequences in order to have any meaningful data, right? I don't know if they confirmed those associations. I wonder how far away the variants have to be to not show significant linkage disequilibrium? I bet you know the answer for a reasonable sample size.

All of this could have been explained in a few sentences in the introduction to the paper. I wonder why the reviewers didn't ask for it.

Joe Felsenstein said...


If you have full sequences, with both copies sequenced and all haplotypes resolved, you don't need to ask about LD. You can simply see whether there is association, and use regression methods to try to infer whether the effect is due to alleles at one locus, the other, or both. And to what extent there is uncertainty.
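
As a rough illustration of the regression idea, here is a small sketch on simulated data. Everything in it is hypothetical: two sites whose alleles match on 90% of haplotypes, a true effect of 0.5 at site 1 only, and unit-variance noise. Tested one at a time, both sites look associated with the trait; fitted jointly, the regression assigns the effect to site 1.

```python
import random

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def marginal_slope(x, y):
    """Least-squares slope of y on a single site."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

def joint_slopes(x1, x2, y):
    """Least-squares slopes of y on both sites at once (2x2 normal equations)."""
    a, b, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n, seed, freq=0.3, copy_prob=0.9, effect=0.5):
    """Simulated diploid genotypes (0/1/2) at two linked sites; only site 1
    affects the trait. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        g1 = g2 = 0
        for _hap in range(2):
            a1 = 1 if rng.random() < freq else 0
            # site 2 copies site 1 on most haplotypes, which creates LD
            a2 = a1 if rng.random() < copy_prob else (1 if rng.random() < freq else 0)
            g1 += a1
            g2 += a2
        x1.append(g1)
        x2.append(g2)
        y.append(effect * g1 + rng.gauss(0.0, 1.0))
    return x1, x2, y

x1, x2, y = simulate(20000, seed=1)
print("marginal slopes:", marginal_slope(x1, y), marginal_slope(x2, y))
print("joint slopes:   ", joint_slopes(x1, x2, y))
```

The tighter the LD between the two sites, the larger the uncertainty in the joint estimates, which is the "to what extent there is uncertainty" part; quantifying it would need standard errors, which this sketch omits.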