Genome sequencing is becoming so routine that it's difficult to publish your new genome sequence in a top journal. The trick is to find something unique and exciting about your genome so you can attract the attention of the leading journals. The latest success is the seahorse genome published in the Dec. 15, 2016 issue of Nature (Lin et al., 2016). The species is the tiger tail seahorse, Hippocampus comes. The assembled genome is 502 Mb, or about 1/6th the size of the human genome. The seahorse has 23,458 genes (protein-coding?), about the same number as most other vertebrates. About 25% of the genome is junk (transposon-related).¹
So, what's unique about the seahorse? Just look at the photo. The seahorse doesn't look like any other fish. It has all kinds of specific features that make it a very weird fish. We don't know if these derived features are adaptive or not but we do know they have evolved in the seahorse lineage over the past 100 million years.
By itself, this phenotypic uniqueness probably wouldn't be enough to merit publication of the seahorse genome in Nature. Rapid phenotypic evolution is not unusual, and it's hard to pinpoint the responsible changes at the molecular level. However, in this case the authors claim to have discovered a higher overall rate of evolution in seahorses compared to other fish.
They looked at a set of 4,122 orthologous genes and calculated mutation rates. The results are shown in Figure 1 in the paper.
Figure 1 | Adaptations and evolutionary rate of H. comes. a, Schematic diagram of a pregnant male seahorse. b, The phylogenetic tree generated using protein sequences. The values on the branches are the distances (number of substitutions per site) between each of the teleost fishes and the spotted gar (outgroup). Spotted gar, Lepisosteus oculatus; zebrafish, Danio rerio.
The differences in distance are quite small, ranging from 94% to 99% of the seahorse value (0.463) in the major clade. Nevertheless, the authors claim the difference is statistically significant. They also looked specifically at neutral changes and found the same thing: faster evolution in seahorses. The implication is that the strange morphological differences between seahorses and other species of fish can be explained by a faster mutation rate.
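To see what the 94% to 99% range means in absolute terms, a little arithmetic helps. The Python sketch below converts the two endpoints of that range into distances, using only the seahorse value of 0.463 quoted from the figure. The helper name absolute_distances is made up for this illustration, and the individual species' values (which are in the figure) are not reproduced.

```python
# Distance (substitutions per site) from the seahorse to the spotted gar outgroup,
# as given in Figure 1b of Lin et al. (2016).
SEAHORSE_DIST = 0.463


def absolute_distances(reference, fractions):
    """Convert 'fraction of the reference distance' into absolute distances.

    Returns (fraction, other_distance, shortfall) tuples, where shortfall is how
    many fewer substitutions per site the other lineage has accumulated than the
    reference lineage.
    """
    rows = []
    for fraction in fractions:
        other = fraction * reference
        rows.append((fraction, other, reference - other))
    return rows


# Endpoints of the 94-99% range quoted in the text (hypothetical stand-ins for
# the per-species values shown in the figure).
for fraction, other, shortfall in absolute_distances(SEAHORSE_DIST, (0.94, 0.99)):
    print(f"{fraction:.0%} of seahorse: {other:.3f} subs/site, {shortfall:.3f} fewer")
```

At the low end of the range the other lineage is about 0.028 substitutions per site behind the seahorse; at the high end, about 0.005.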
Here's the problem. I have no idea how they came up with these numbers. I can't possibly evaluate the quality of their data to know whether it's believable or not. Clearly the Nature referees thought it was good enough to publish. Those referees must be experts in this kind of analysis. Can someone out there help me understand the quality of this analysis? Here's the description of their method.
We obtained 4,122 one-to-one orthologous genes from the gene family analysis (Supplementary Information, section 4.1). The protein sequences of one-to-one orthologous genes were aligned using MUSCLE (ref. 48) with the default parameters. We then filtered the saturated sites and poorly aligned regions using trimAl (ref. 49) with the parameters “-gt 0.8 -st 0.001 -cons 60”. After trimming the saturated sites and poorly aligned regions in the concatenated alignment, 2,128,000 amino acids were used for the phylogenomic analysis. The trimmed protein alignments were used as a guide to align corresponding coding sequences (CDSs). The aligned protein and the fourfold degenerate sites in the CDSs were each concatenated into a super gene using an in-house Perl script.
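The last step in that paragraph, pulling out fourfold degenerate sites and concatenating them into a "super gene", is easy to sketch. The authors used an in-house Perl script whose exact rules aren't described, so the Python below is only one plausible version, built on an assumption: it keeps the third position of a codon column when every species has an ungapped codon with the same two-base prefix, drawn from the eight fourfold-degenerate families of the standard genetic code (CTN, GTN, TCN, CCN, ACN, GCN, CGN, GGN). The function names and toy sequences are hypothetical.

```python
from __future__ import annotations

# Codon prefixes (first two bases) whose third position is fourfold degenerate in
# the standard genetic code: any base at position 3 gives the same amino acid.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_columns(alignment: dict[str, str]) -> list[int]:
    """Return indices of third codon positions that are fourfold degenerate.

    alignment maps species name -> codon-aligned CDS ('-' for gaps).
    Assumption (not from the paper): a column is kept only if every species has
    an ungapped codon and all species share one fourfold-degenerate prefix.
    """
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs) or length % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    keep = []
    for start in range(0, length, 3):
        codons = [s[start:start + 3] for s in seqs]
        if any("-" in c for c in codons):
            continue  # skip codons with alignment gaps
        prefixes = {c[:2] for c in codons}
        if len(prefixes) == 1 and prefixes.pop() in FOURFOLD_PREFIXES:
            keep.append(start + 2)  # index of the third codon position
    return keep


def concatenate_supergene(genes: list[dict[str, str]]) -> dict[str, str]:
    """Concatenate fourfold-degenerate sites from many gene alignments."""
    species = list(genes[0])
    parts = {sp: [] for sp in species}
    for aln in genes:
        cols = fourfold_columns(aln)
        for sp in species:
            seq = aln[sp].upper()
            parts[sp].append("".join(seq[i] for i in cols))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Two toy "genes" for two species (hypothetical sequences).
toy_genes = [
    {"seahorse": "CTAGGT", "gar": "CTGGGC"},  # Leu + Gly codons: both kept
    {"seahorse": "AAATCT", "gar": "AAGTCA"},  # Lys codon dropped, Ser codon kept
]
print(concatenate_supergene(toy_genes))
```

Other pipelines define fourfold sites less strictly (for example, requiring each codon to be fourfold degenerate without requiring a shared amino acid), which changes how many sites survive.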
The phylogenomic tree was reconstructed using RAxML version 8.1.19 (ref. 50) based on concatenated protein sequences. Specifically, we used the PROTGAMMAAUTO parameter to select the optimal amino acid substitution model, specified spotted gar as the outgroup, and evaluated the robustness of the result using 100 bootstraps. To compare the neutral mutation rate of different species, we also generated a phylogeny based on fourfold degenerate sites. The phylogenomic topology was used as input and the “-f e” option in RAxML was used to optimize the branch lengths of the input tree using the alignment of fourfold degenerate sites under the general time reversible (GTR) model as suggested by ModelGenerator version 0.85 (ref. 51). We calculated the pairwise distances to the outgroup (spotted gar) based on the optimized branch length of the neutral tree using the cophenetic.phylo module in the R-package APE (ref. 52). The Bayesian relaxed-molecular clock (BRMC) method, implemented in the MCMCTree program (ref. 53), was used to estimate the divergence time between different species. The concatenated CDS of one-to-one orthologous genes and the phylogenomics topology were used as inputs. Two calibration time points based on fossil records, O. latipes–T. nigroviridis (~96.9–150.9 million years ago (Mya)), and D. rerio–G. aculeatus (~149.85–165.2 Mya) (http://www.fossilrecord.net/dateaclade/index.html), were used as constraints in the MCMCTree estimation. Specifically, we used the correlated molecular clock and REV substitution model in our calculation. The MCMC process was run for 5,000,000 steps and sampled every 5,000 steps. MCMCTree suggested that H. comes diverged from the common ancestor of stickleback, Nile tilapia, platyfish, fugu, and medaka approximately 103.8 Mya, which corresponds to the Cretaceous period.
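The cophenetic.phylo step mentioned in the methods adds up branch lengths along the path between two tips of the tree (the patristic distance). Here is a minimal Python sketch of that calculation on a made-up tree; the topology, branch lengths, and function names are hypothetical and are not the paper's values or APE's code. It also illustrates why the comparison in Figure 1 works like a relative-rate test: every teleost's path to the gar shares the gar branch and the branches near the root, so a difference between, say, seahorse and stickleback can only come from the branches after those two lineages split.

```python
# Toy rooted tree: node -> (parent, length of the branch leading to node).
# Topology and branch lengths are hypothetical, NOT taken from Lin et al. (2016).
TREE = {
    "gar": ("root", 0.20),
    "teleosts": ("root", 0.10),
    "zebrafish": ("teleosts", 0.30),
    "percomorphs": ("teleosts", 0.05),
    "seahorse": ("percomorphs", 0.18),
    "stickleback": ("percomorphs", 0.15),
}


def ancestors(tree, node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in tree:
        chain.append(tree[chain[-1]][0])
    return chain


def depth(tree, node):
    """Sum of branch lengths from node up to the root."""
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def patristic_distance(tree, a, b):
    """Path length between tips a and b via their most recent common ancestor."""
    anc_b = set(ancestors(tree, b))
    mrca = next(n for n in ancestors(tree, a) if n in anc_b)
    return depth(tree, a) + depth(tree, b) - 2 * depth(tree, mrca)


for fish in ("seahorse", "stickleback", "zebrafish"):
    print(fish, round(patristic_distance(TREE, fish, "gar"), 3))
```

In this toy tree the seahorse-to-gar and stickleback-to-gar distances differ by exactly the difference between their two terminal branches (0.18 vs. 0.15).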
How do the calibration time points figure into the calculation? Does it make a difference if these time points are off by 5% or so?
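One part of the answer is in the quoted methods: the fossil calibrations only enter the MCMCTree step, which produces divergence times such as the ~103.8 Mya estimate. The distances to the gar come from RAxML branch lengths in substitutions per site, so the calibrations play no role in them. For the dates, the simplest case is a strict molecular clock, where the calibration fixes the rate and every other age scales in proportion to the calibration age. The Python sketch below assumes that strict clock and uses hypothetical distances. MCMCTree actually uses a Bayesian relaxed clock with calibrations as bounds, so its response to a 5% shift need not be exactly proportional.

```python
def strict_clock_age(pair_distance, calib_distance, calib_age_mya):
    """Divergence time (Myr) for a pair of lineages under a strict molecular clock.

    pair_distance and calib_distance are pairwise distances in substitutions per
    site. The calibration pair fixes the rate, which is then applied to the
    other pair. This is a simplification, not what MCMCTree does.
    """
    # Substitutions per site per lineage per million years.
    rate = calib_distance / (2.0 * calib_age_mya)
    # Both lineages accumulate changes after the split, hence the factor of 2.
    return pair_distance / (2.0 * rate)


# Hypothetical distances and calibration age, not values from the paper.
CALIB_DISTANCE = 0.40
PAIR_DISTANCE = 0.35
CALIB_AGE = 120.0

baseline = strict_clock_age(PAIR_DISTANCE, CALIB_DISTANCE, CALIB_AGE)
shifted = strict_clock_age(PAIR_DISTANCE, CALIB_DISTANCE, CALIB_AGE * 1.05)
print(f"baseline {baseline:.2f} Mya; calibration 5% older: {shifted:.2f} Mya")
```

Under this simplification, moving the calibration 5% older moves every estimated divergence time 5% older as well (105 to 110.25 Mya here), while the substitution distances stay the same.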
The idea that seahorses evolve faster than other fish will now be incorporated into the scientific literature as a result of this publication. (See the cover of Nature, left.) But is it true?
There was a time when I could read a scientific paper in my field and evaluate the quality of the work and the validity of the conclusions. That time has passed with the reliance on big data and computer programs. Now I have to rely on the (presumably) expert reviewers to evaluate the quality of the work. That's a problem since we have plenty of evidence that the peer review process is seriously flawed.
Photo Credit: Aquariums Vietnam - International
1. Human genes take up about 30% of the total genome, or approximately 960 Mb. This is more DNA than the total genome of the seahorse. I assume the seahorse genes have smaller introns but that's not mentioned in the paper.
Lin, Q., Fan, S., Zhang, Y., Xu, M., Zhang, H., Yang, Y., Lee, A.P., Woltering, J.M., Ravi, V., and Gunter, H.M. (2016) The seahorse genome and the evolution of its specialized morphology. Nature, 540:395-399. [doi: 10.1038/nature20595]