
Friday, July 01, 2016

How to read the scientific literature?

Science addressed the problem of "How to (seriously) read a scientific paper" by asking a group of Ph.D. students, post-docs, and scientists how they read the scientific literature. None of the answers will surprise you. The general theme is that you read the abstract to see if the work is relevant, then skim the figures and the conclusions before buckling down to slog through the entire paper.

None of the respondents address the most serious problems, such as trying to figure out what the researchers actually did when you have no clue how they did it. Nor do they address the serious issue of misleading conclusions and faulty logic.

I asked on Facebook whether we could teach undergraduates to read the primary scientific literature. I'm skeptical since I believe it takes a great deal of experience to be able to profitably read recent scientific papers and it takes a great deal of knowledge of fundamental concepts and principles. We know from experience that many professional scientists can be taken in by papers that are published in the scientific literature. Arseniclife is one example and the ENCODE papers published in September 2012 are another. If professional scientists can be fooled, how are we going to teach undergraduates to be skeptical?

I've addressed this issue before. Back in 2013 I wrote about a Nature paper that looked at promoter sites in the human genome. The authors concluded that there may be 500,000 active promoters that are probably "functional and specific" [Transcription Initiation Sites: Do You Think This Is Reasonable?].

This conclusion is almost certainly wrong but there are probably only a handful of scientists in the entire world who can understand the science in this paper and figure out what went wrong. This is a problem. I know something about this subject but I have no idea what the scientists did. The work is completely opaque to most scientists. This paper was subsequently retracted! [Transcription Initiation Sites: Do You Think This Is Reasonable? (revisited)]

A few months later I looked at another Nature paper on transcription. This one describes an attempt to identify the relationship between variation in gene expression levels and genetic differences in mouse strains. It was extremely difficult to understand the paper and, as it turns out, I didn't succeed. Several Sandwalk readers pointed out differing interpretations of the data and the experiments. There was general agreement that the paper was badly written [see: Do you understand this Nature paper on transcription factor binding in different mouse strains?].

So, I decided to re-visit this problem by opening up the latest issue of Nature to see if I could learn anything from reading the primary literature.

Just by chance, there happened to be a paper on the effect of genetic variation in mouse strains on gene expression. This is the same topic I addressed in late 2013. Here's the recent paper with its abstract ...
Chick, J. M., Munger, S. C., Simecek, P., Huttlin, E. L., Choi, K., Gatti, D. M., Raghupathy, N., Svenson, K. L., Churchill, G. A., and Gygi, S. P. (2016) Defining the consequences of genetic variation on a proteome-wide scale. Nature 534:500-505. [doi: 10.1038/nature18270]

Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein–protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of collaborative cross mice.

The first thing I noticed was that the 2013 paper wasn't in the reference list! That's because it was retracted.

The second thing I noticed was that the abstract didn't really tell me anything about the conclusions. How many genes are regulated differently in the various mouse strains because of genetic differences between the strains? How many of those genetic differences are directly due to changes in promoters and enhancers? How do the authors tell the difference between stochastic variation and variation due to sequence differences in the genomes?

The third thing I noticed was the opening paragraph of the introduction.
Regulation of protein abundance is vital to cellular functions and environmental response. According to the central dogma, [ref: Crick, 1970] the coding sequence of DNA is transcribed into mRNA (transcript), which in turn is translated into protein. Although rates of transcription, translation and degradation of both transcript and protein vary, under this simplest model of regulation, the cellular pool of a protein is determined by the abundance of its corresponding transcript. Genetic or environmental perturbations that alter transcription would directly affect protein abundance. In reality, many layers of regulation intervene in this process, and numerous studies have been carried out to determine whether and to what extent transcript abundance is a predictor of protein abundance2, 3, 4, 5, 6. Several studies have reported that there is generally a low correlation between the two. An emerging consensus is that much of the protein constituent of the cell is buffered against transcriptional variation4, 7, but a global perspective of protein buffering has not been put forward.
Once I realized that the authors had not read the very first paper they referenced, I figured this was not going to be a good paper. Nevertheless, I persevered because I'm very interested in the problem.

I started to lose interest on the second page when I read ...
We identified 2,866 pQTL for 2,552 distinct proteins at a genome-wide significance level of P < 0.1 (Fig. 2a). This is the largest set of pQTL identified so far, with tenfold greater numbers than other mass spectrometry (MS)-based approaches. Significant local pQTL were more common than distant pQTL (1,736 local and 1,130 distant pQTL) (Extended Data Fig. 3g). In addition, we identified 4,188 significant eQTL among 3,706 genes, with threefold more local than distant associations at the transcript level (3,211 local and 977 distant eQTL; Fig. 2a, Extended Data Fig. 3h, i). Finally, to examine the replication rate, we analysed a replication set of 192 separate DO mice treated under identical conditions for eQTL (see Methods and Extended Data Fig. 4). To determine whether the same genetic loci acted on transcript and protein abundance, we first compared the QTL maps. We observed a significant overlap of proteins with pQTL and eQTL (n = 1,400; hypergeometric P < 1 × 10−16; Fig. 2a). As expected, genes with concordant QTL had generally higher correlations between protein and transcript abundance compared to those having only pQTL, only eQTL or neither (Fig. 2b). Among local QTL only, we observed a high degree of overlap with 80% of local pQTL having a corresponding local eQTL. The small number of local pQTL that lack corresponding eQTL (n = 344) could result from genetic variation that regulated protein abundance via post-transcriptional mechanisms such as coding variation that affected protein stability without altering transcript levels. In contrast, distant genetic variants that affected both transcript and protein levels seem to be nearly mutually exclusive (Fig. 2a). This observation leads to the intriguing hypothesis that most distant pQTL affected the abundance of a target protein via post-transcriptional mechanism(s).

I suppose I could figure out what they mean if I was willing to download the supplemental information and spend a good deal of time trying to learn the jargon. (What's the difference between "local" and "distant" eQTL?)

It's not worth the effort.
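For what it's worth, the "significant overlap" claim in the excerpt is an ordinary hypergeometric test, and the arithmetic can be sketched from the numbers given. The paper reports 2,552 proteins with a pQTL, 3,706 genes with an eQTL, and an overlap of 1,400; the total number of quantified genes is not stated in the excerpt, so the population size of 8,000 below is a hypothetical value used only to illustrate the calculation.

```python
from math import comb

# Hypergeometric test for the pQTL/eQTL overlap quoted above.
# M (total genes quantified) is an ASSUMPTION, not from the paper.
M = 8000   # total genes quantified (hypothetical)
n = 3706   # genes with an eQTL
N = 2552   # proteins with a pQTL
k = 1400   # observed overlap

# Expected overlap if pQTL and eQTL assorted independently
expected = n * N / M          # ~1,182 under this assumed M

# Exact upper tail: P(overlap >= k) under the hypergeometric null
denominator = comb(M, N)
p = sum(comb(n, i) * comb(M - n, N - i)
        for i in range(k, min(n, N) + 1)) / denominator

print(f"expected overlap ~ {expected:.0f}, observed {k}, P = {p:.3g}")
```

Note that with any plausible population size, an excess of a couple of hundred genes over expectation produces an astronomically small P value, so "hypergeometric P < 1 × 10−16" on its own says very little about how big the effect actually is.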

Here's the conclusion.
This study quantified both protein and transcript abundance in a genetically diverse population of mice, mapping their genetic architecture. We identified the largest catalogue of pQTL so far, which can be attributed to two variables in our experimental design. First, we have improved the accuracy and sensitivity of quantification for both protein and transcript abundance. Second, our experimental population captured genetic diversity far in excess of the human population and standard laboratory mouse strains. Earlier studies reported a disconnect between transcript and protein abundance2, 3, 6, which has also been a conclusion drawn from several recent eQTL–pQTL analyses4, 7, 17, 35. Data here show that local QTL tend to abide by the central dogma as demonstrated by concordant effects on transcripts and proteins, whereas distant pQTL are conferred by post-transcriptional mechanisms. Our mediation analysis provided the ability to identify causal protein intermediates underlying distant pQTL and led to the identification of hundreds of protein–protein associations. Our experimental design provides an advantage over protein interaction maps because genetic mapping is not dependent on physical interactions. This conclusion is further exemplified by the co-regulation of protein complexes or biochemical pathways in this study. Stoichiometric buffering provides one explanation for co-regulation of protein complexes and may account for earlier observations that protein abundances (but not transcript abundances) of orthologues are well-conserved across large evolutionary distances36, 37.

These findings suggest a new predictive genomics framework in which quantitative proteomics and transcriptomics are combined in the analysis of a discovery population like the DO to identify genetic interactions. Next, pathways relevant to the tissue/physiological phenotype of interest are intersected with the list of significant pQTL. Pathways enriched for proteins with significant pQTL should be amenable to manipulation in the founder and CC strains. That is, the founder allele effects inferred at the pQTL can be combined in such a way via crosses of CC strains to tune pathway output. Moreover, as we better understand the types of mutation that can affect protein abundance, we can introduce specific mutations with gene editing into sensitized or robust genetic backgrounds. We foresee this strategy being used to design reproducible rodent models that span a range of human-relevant phenotypes, for example, in drug metabolism or toxicology studies.
In my opinion, the scientific literature is becoming unreachable for most scientists. How many people interested in science can read this paper and understand it, let alone evaluate it? If you are one, then please let me know in the comments.

I can't imagine how any undergraduate could profit by reading this paper without a great deal of help. If the teacher really understands what was done, wouldn't it be far easier to just explain the result to the students?

I blame the journals for this situation. Maybe it's only a problem in genomics and proteomics but even if it's confined to those disciplines, something has to be done.

I can't read the primary scientific literature any more because a lot of it is incomprehensible. Most of the rest is just wrong or misleading.

Image Credits: The first photo is from: Improve Your Reading of Scientific Papers. The second is from: Scientific papers, civil disobedience and personal networks.


  1. I can't imagine how any undergraduate could profit by reading this paper without a great deal of help. If the teacher really understands what was done, wouldn't it be far easier to just explain the result to the students?

    No. The point is that if (at least some) of these students are going to be professional scientists, they will have to learn to read the literature sometime, and it's better if they do so before grad school. I'm not saying freshmen should be forced to read the literature, but by the time I was a junior most classes used the original literature rather than a textbook.

    1. Yes, undergrads need to be introduced to the primary literature because even the most straightforward papers are initially incomprehensible to them, as compared to the words in textbooks. The more they are forced to read, the more they will become familiar with the language and structure used in manuscripts.

      But I also recall as an undergrad being asked to write critiques of papers. Then, as now, I found this an exercise in futility. I (we) simply did not know enough about molecular biology or the techniques used to do this.

      As a teacher now myself, I think it is enough to ask students to understand the paper and be able to communicate their understanding to their peers. I do ask them to at least try assessing the quality of the paper but for the most part I take the lead in pointing out questionable figures and conclusions.

      I try to assign papers that are conceptually simple... not the kind of papers Larry is talking about to be sure.

  2. As a non-scientist with some science training, I find that the vast majority of papers, even in fields that I study, are not accessible.

    There are a lot of very subtle things that not even scientists seem to catch: things like the ones Larry mentioned above, and some consideration of whether the methods are appropriate. It's impossible for a non-specialist to determine if the methods are appropriate, but that, I've found, is where a lot of the mistakes are made.

    It's fairly easy for even non-experts (like me) to look at data and determine that the conclusions are useless. But how do I know that the data is valid?

    It's very frustrating, and it's why I don't do as much science writing as I used to.

  3. I suppose I could figure out what they mean if I was willing to download the supplemental information and spend a good deal of time trying to learn the jargon. (What's the difference between "local" and "distant" eQTL?)

    I'm not a fan of papers that coin their own obscure terminology, but you are being a bit unfair with this point. Those are standard terms and are pretty self-explanatory rather than some bit of Latin or Greek pretentiousness. A local eQTL is one that maps locally to the parent gene and a distant one is one that doesn't.

    1. Thanks. How distant is "distant" and how local is "local"? Assuming there is a distinct transcription start site, how far away does a site have to be in order to qualify as "distant"?

      And do you have to provide evidence that the site actually functions in regulating gene expression or is it sufficient to just identify a sequence variant somewhere in the vicinity of a gene?

    2. By definition an eQTL is a DNA variant that has been shown to affect gene expression. If it doesn't, it's not an eQTL. As for "local" vs "distant", as I said above, it isn't a matter of the number of base pairs away. If a variant within an annotated gene affects its own expression, it's "local". If the variant is located outside of it, it's "distant".
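      Taken at face value, the commenter's definition is just a positional test. This is a minimal sketch with a hypothetical function name; note that many published pipelines instead use a fixed window around the gene (e.g. within ~1 Mb) as the "local" criterion, which the commenter's definition does not.

```python
def classify_qtl(variant_pos, gene_start, gene_end, same_chromosome=True):
    """Label a QTL "local" or "distant" per the definition above.

    Hypothetical helper: "local" means the variant lies inside the
    annotated gene body on the same chromosome; anything else is
    "distant".  (Real pipelines often use a window around the gene,
    e.g. +/- 1 Mb of the TSS, as the "local" criterion instead.)
    """
    if same_chromosome and gene_start <= variant_pos <= gene_end:
        return "local"
    return "distant"

# A variant inside the gene body is "local" ...
assert classify_qtl(12_500, gene_start=10_000, gene_end=15_000) == "local"
# ... one far upstream, or on another chromosome, is "distant".
assert classify_qtl(8_000_000, gene_start=10_000, gene_end=15_000) == "distant"
assert classify_qtl(12_500, 10_000, 15_000, same_chromosome=False) == "distant"
```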

    3. ... eQTL is a DNA variant that has been shown to affect gene expression.

      I'll take your word for it, but it seems strange that they would have tested thousands of variants to prove that they were actually responsible for the differences in gene expression.

      If a variant within an annotated gene affects its own expression, it's "local". If the variant is located outside of it, it's "distant".

      What does it mean to say "within an annotated gene"? Does that mean the sequence variant is within the transcribed region?

    4. It's not about "proving" anything. QTL analysis is not unlike phylogenetic analysis in that it is a statistical analysis of data rather than a direct observation. Just as there are tools like PAUP and PHYLIP to do phylogenetic analysis, so there are QTL tools. The authors appear to be using the DOQTL R package.

  4. Of the thousands, if not tens of thousands, of papers published every month, the fact that a subset are opaque does not lead to the conclusion that you can't read the scientific literature, nor that students can't. Are you assuming teachers randomly pick papers to give to students, such that papers like the one described here are chosen? I teach an entire course for seniors on eukaryotic microbiology that uses the primary literature. I pick papers for specific reasons for each topic we cover, and in some cases I pick a paper that has a clear deficiency so we can talk about it. (And no, I would not pick the paper you mentioned above because there really isn't an interesting or interpretable conclusion in it, nor would I be able to use it as a pedagogical tool.)

    1. How do you explain to your students that some papers published in the scientific literature contain a "clear deficiency" that even they could recognize?

      If you recognize papers in your own field as having serious flaws, don't you think there are just as many in other fields that you don't recognize?

    2. You do it by asking students to find a limitation (data that doesn't support a conclusion, data that was ignored, like a spurious band on a gel, a conclusion with only a single data point to support it). Then have the students discuss their 'limitations' and provide some information or ask additional questions to get them thinking.
      And yes I think there are 'flaws' in every paper, generally those do not rise to the level of undercutting the primary conclusions. Maybe you think otherwise and there are no issues or conceptual problems in most papers. Regardless, I don't read the literature as gospel and think it's important to teach students approaches to read critically. Even for the best papers, in my opinion a critical reader should have some degree of skepticism.

    3. I think that's a very good exercise. Most students don't get that till Journal Club in grad school. Many profs will accumulate a list of papers in their field, some with conspicuous flaws that they want students to find. Any chance you'd be willing to share some of your references with the take-home lesson?

    4. @Iantog It's a bit difficult up front because each topic we cover is its own unit. I plan on posting the papers and topics we cover in my course this fall on my blog but I've said that last year too....

  5. I would always be careful with blanket statements on the lines of "the scientific literature is becoming unreachable" or, as I recently read in a different place "scientific journals hide everything away in online supplements".

    First, there are many different areas of science, each with their own practices. In my field there is the odd modelling-heavy paper that I cannot really assess, but I would guess that upwards of 90% of the stuff that appears in the field is easily and quickly understandable by the average systematist.

    Second, why are Nature and Science always the standard of all things? I'd say they are the exact opposite, they are outliers, and not in a good way. They are two out of hundreds of serious journals publishing natural science. In this large field, they are part of a minuscule minority characterised by

    - ridiculously low word limits, meaning that nothing can be properly explained in the main text;

    - having the methods section at the end, meaning that the reader cannot possibly understand the results if they read the text in order;

    - having the reference list formatted in the most cryptic way possible; and

    - focusing on whether the topic of a manuscript is related to the next big thing instead of whether the study was sound.

    Most other scientific journals take approaches that make papers much easier to read and to understand.

    1. Saying that the scientific literature is becoming unreachable is not the same as saying that every scientific paper is incomprehensible. Many papers are still relatively easy to understand.

      The problem in my field is that some grandiose claims are being made based on evidence and data that can't be interpreted or understood by the average scientist. This is especially true in genomics and proteomics papers where almost nobody knows how the data was obtained and how it was manipulated by various software packages.

      The recent paper I described is an example. It's pushing the idea that mammalian genomes are chock full of regulatory sequences, some of which are at great distances from promoters. I don't believe that's true but I have no way of evaluating their experiments.

      I'm not saying that Nature and Science are the only journals that are guilty, but they do have a mandate to deliver science that's comprehensible to their average reader. They are not living up to their objective.

      There are lots of other journals publishing unreadable papers in my field.

    2. I think the issue is less with the papers than with the field. I don't think the authors necessarily fully understand the full path from taking tissue samples to getting an annotated genome. The main sequencing labs used aren't always the most forthcoming about their "in house" methods. The software used for assemblies is usually the most current stable version of said software, which can differ quite a bit from the last published version of the software in the algorithms used. And there doesn't seem to be a consensus on which tools produce the best results. Then the final assembly is fed into an annotation pipeline, which usually consists of various tools, for all of which the above holds. And of course a lot of these tools use machine learning techniques, which rely on a set of training data that the software uses to construct a model that is then used to make predictions, but that model itself is inaccessible to researchers. And of course there can be issues with the training data set that affect results. One common issue is that some sequence is recognized as a gene, but the closest manually annotated match comes from a distantly related species and has a widely different function, which of course the software tools have no data about. So it gets annotated with the incorrect function and, although the software might tag it as unreliable, as soon as it hits genome databases it gets used to train these annotation pipelines, which will then get another hit tentatively associated with a function and thus find it more reliable.

  6. I have no hope of evaluating that paper. I think one of the issues is that as science advances new tools and techniques are introduced that, if you haven't gotten a solid foundation in them in a lab or classroom, you can't really pick up on your own without tremendous effort. I remember older profs in the early 80s who felt out of their element with the molecular revolution.

  7. Undergrads reading original literature = their professors have absolutely no clue or are lazy to the max. Both of these most of the time, actually.

    The typical pearl is "let's critique this Nature paper next week" (it's *always* a Nature paper, BTW)

    1. That's a bit extreme I would say. Undergrads, by year 3 at the latest, should be increasingly compelled to access the primary literature.

    2. Why should they be "increasingly compelled"? Is it because by year 3 they have learned all the big picture basic concepts and principles and the only way forward is to delve into the nitty-gritty of the primary literature?

      Keep in mind that a large percentage of undergraduates will never need to read the primary scientific literature for the rest of their lives but they will all have to make decisions requiring a knowledge of science.

      Let's make sure we teach basic concepts and critical thinking before we "compel" undergraduates to study our own specific research interests.

    3. This is so dependent on discipline I doubt there can be such general statements. In the geosciences, reading the primary literature is pretty much mandatory in every job one could get (there's a local geological map and a publication that went along with it, discussing how the map was constructed, for instance. If there isn't, you are likely constructing one, in which case you are looking at the primary literature for the region - how do they define geological units, etc.). As a result you generally have to read papers even during your first semester - locally there is a field trip that is part of the introduction to paleontology course, and students have to write a short text on the localities visited. For any field trip there is a selection of literature in the institute library which students are supposed to use. Nobody is asking them to be able to discuss all of it in detail, but if there are two papers that assign ages to a quarry, one based on pollen and one based on radiometric dating, it is expected that when they discuss the quarry they cite one of these papers for the age. If somebody cites both and notes that the radiometric dating supports the earlier dating from the pollen, so much the better. There are quite a few seminars where you have to read the primary literature and then give a short talk. The key here is that you meet with the TA and discuss what you've taken from the literature so that any misconceptions can be cleared up (and usually some people won't meet up with the TA and give crazy talks). But really you teach reading papers by having students read them and then going through what they've missed. Preferably not in public and without imminent grading. I did more of these in my days as an undergrad than I had to, because it was one of the things where I could personally track my advancement.
You go from missing obvious things to missing that size 10 qualifying statement in the caption of supplementary figure 5 to asking why in formula (4) there is a z, while you think it should be a g and get told that it's a typo in the paper. Of course now these types of courses are getting scaled down in favor of more multiple choice tests.