Tuesday, November 07, 2017

Contaminated genome sequences

The authors of the original draft of the human genome sequence claimed that hundreds of genes had been acquired from bacteria by lateral gene transfer (LGT) (Lander et al., 2001). This claim was abandoned when the "finished" sequence was published a few years later (International Human Genome Sequencing Consortium, 2004) because others had shown that the data was easily explained by differential gene loss in other lineages or by bacterial contamination in the draft sequence (see Salzberg, 2017).

Subsequent papers on eukaryotic genome sequences frequently reported the presence of several hundred bacterial genes acquired by LGT. The most extraordinary claim was that 17% of a tardigrade genome was due to LGT (Boothby et al., 2015). This claim led to the creation of a giant tardigrade that controlled the displacement-activated spore hub drive on Star Trek: Discovery. The creature was able to interface with the spore network because it had incorporated mycelium DNA by lateral gene transfer ['Star Trek: Discovery' Mudd-ies Up Tardigrade Science].

Unfortunately, the creators of the new Star Trek series didn't read the paper that came out a few months later showing that most of the bacterial DNA was due to contamination and not LGT (Koutsovoulos et al., 2016).

Bacterial DNA isn't the only contaminant. Longo et al. (2011) documented many cases of genome sequences that were contaminated with human DNA (Alu sequences). They found that 492 of the 2,749 non-primate sequence databases they screened contained detectable amounts of human DNA. Here's what they say in the abstract of their paper:
Using a primate specific SINE, AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and have found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio) with examples found from most phyla. The identification of such extensive contamination of human sequence across databases and sequence types warrants caution among the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss issues this may raise as well as present data that gives insight as to how this may be occurring.
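The basic idea behind this kind of screen is simple: take a sequence that should only occur in primates and look for it in databases from species where it has no business being. Here's a minimal Python sketch of that idea. It assumes a simplified exact-match k-mer comparison (k = 11 by default) instead of the alignment-based searches, such as BLAST, that real screens use, and it is not necessarily the pipeline Longo et al. used. The probe and the records are made-up toy strings, not the real AluY consensus, and the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string made of A, C, G, T and N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def probe_fraction(record: str, probe: str, k: int = 11) -> float:
    """Fraction of the probe's k-mers found in record, on either strand."""
    probe_kmers = kmers(probe, k)
    if not probe_kmers:
        return 0.0
    # Check both strands: a contaminating fragment can be in either orientation.
    record_kmers = kmers(record, k) | kmers(revcomp(record), k)
    return len(probe_kmers & record_kmers) / len(probe_kmers)


def screen(records: dict[str, str], probe: str, k: int = 11,
           threshold: float = 0.5) -> list[str]:
    """Names of records sharing at least `threshold` of the probe's k-mers."""
    # The threshold is an arbitrary, hypothetical cutoff for this sketch.
    return [name for name, seq in records.items()
            if probe_fraction(seq, probe, k) >= threshold]


# Hypothetical toy data: NOT real AluY or genome sequences.
PROBE = "ACGTTGCATGCCTAGGATCCAAGTTCGGA"
RECORDS = {
    "clean": "T" * 40,                            # shares no 11-mers with the probe
    "contaminated": "GGGG" + PROBE + "CCCC",      # probe inserted on the + strand
    "reverse": "AAAA" + revcomp(PROBE) + "AAAA",  # probe inserted on the - strand
}

if __name__ == "__main__":
    for name, seq in RECORDS.items():
        print(name, round(probe_fraction(seq, PROBE), 2))
    print("flagged:", screen(RECORDS, PROBE))
```

Exact k-mer matching keeps the sketch short, but a real screen has to tolerate mismatches because individual Alu copies differ from the consensus sequence.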
The take-home lesson is that draft sequences are often unreliable. Additional analysis (curation/annotation) often reveals numerous examples of contamination from unrelated sequences (e.g. Yoshida et al., 2017).

Boothby, T.C., Tenlen, J.R., Smith, F.W., Wang, J.R., Patanella, K.A., Nishimura, E.O., Tintori, S.C., Li, Q., Jones, C.D., and Yandell, M. (2015) Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc. Natl. Acad. Sci. (USA), 112:15976-15981. [doi: 10.1073/pnas.1510461112]

International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature, 431:931-945. [doi: 10.1038/nature03001]

Koutsovoulos, G., Kumar, S., Laetsch, D.R., Stevens, L., Daub, J., Conlon, C., Maroon, H., Thomas, F., Aboobaker, A.A., and Blaxter, M. (2016) No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proc. Natl. Acad. Sci. (USA), 113:5053-5058. [doi: 10.1073/pnas.1600338113]

Lander, E. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409:860-921. [doi: 10.1038/35057062]

Longo, M.S., O'Neill, M.J., and O'Neill, R.J. (2011) Abundant human DNA contamination identified in non-primate genome databases. PLoS ONE, 6:e16410. [doi: 10.1371/journal.pone.0016410]

Salzberg, S.L. (2017) Horizontal gene transfer is not a hallmark of the human genome. Genome Biology, 18:85. [doi: 10.1186/s13059-017-1214-2]

Yoshida, Y., Koutsovoulos, G., Laetsch, D.R., Stevens, L., Kumar, S., Horikawa, D.D., Ishino, K., Komine, S., Kunieda, T., and Tomita, M. (2017) Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus. PLoS Biology, 15:e2002266. [doi: 10.1371/journal.pbio.2002266]
