Testing the Macaque Genome

We've been looking at the macaque genome for several months, but now that the genome paper is being published, I thought some of you might be interested in how the preliminary data stacks up against what we expect.

I'm interested in a family of genes known as the HSP70 gene family. These genes encode the major cellular chaperone that's responsible for correct protein folding. HSP70 is the most highly conserved gene known [Evolution of the HSP70 Gene Family].

We know how many genes there are in mammalian genomes, so we can search the macaque genome at Rhesus Macaque Genome Resources to see if the expected genes are present. Here are the results.

HSPA1A: not present, probably due to incomplete sequence or annotation

HSPA1B: correct gene/protein

HSPA1L: correct gene/protein + one incorrect isoform that's really a splicing artifact

HSPA2: not present, probably due to incomplete genome or annotation

HSPA5/BiP: correct gene/protein + one incorrect alternatively spliced isoform that's really an artifact

HSPA8: one single correct gene/protein

HSPA9B/mtHSP70: correct gene/protein + three incorrect isoforms generated by EST artifacts

That's not too bad for an initial draft sequence. Two genes are missing and so are several pseudogenes. I assume they'll turn up later when the genome sequence is being finished. Most of the splicing artifacts have been ignored by the annotators, but a few have slipped through. They'll be deleted later on when the annotators are informed that the isoforms don't exist.
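
For anyone who wants to keep score the same way on another genome, the results listed above can be tallied with a few lines of Python. This is only a minimal sketch: the dictionary just transcribes my list by hand, and the names `results` and `summarize` are made up for illustration. They don't come from any annotation tool or database.

```python
# Survey results transcribed by hand from the list above.
# Each entry: gene symbol -> (gene found?, number of incorrect isoforms annotated)
results = {
    "HSPA1A": (False, 0),
    "HSPA1B": (True, 0),
    "HSPA1L": (True, 1),
    "HSPA2": (False, 0),
    "HSPA5": (True, 1),   # BiP
    "HSPA8": (True, 0),
    "HSPA9B": (True, 3),  # mtHSP70
}


def summarize(results):
    """Return the missing genes, the cleanly annotated genes, and the artifact count."""
    missing = sorted(g for g, (found, _) in results.items() if not found)
    clean = sorted(g for g, (found, n) in results.items() if found and n == 0)
    spurious = sum(n for _, n in results.values())
    return {"missing": missing, "clean": clean, "spurious_isoforms": spurious}


summary = summarize(results)
print(summary)
```

Swapping in the results from a different genome survey only means editing the dictionary.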

All in all, this is much better than most genome sequences at this stage. It's a bit better than the chimp genome but still a long way from the quality of the human genome. The mouse genome is almost as good as the human genome. Keep in mind that dozens of labs have been working on the human genome annotation for over six years since the sequence was first published. The cow, dog, frog, and several fish genomes are in much worse shape, and the chicken and sea urchin genomes are practically useless.

Horse, opossum, rat, pig, rabbit, cat, sheep, tree shrew, guinea pig, hedgehog, elephant, and platypus genomes are still at the assembly stage [Ensembl Genome Browser].

