A second version of the grapevine genome was published at PLoS ONE last week (Velasco et al. 2007). As I began to collect information on that paper I learned that another genome sequence of grapevine had been published independently last September in Nature (Jaillon et al. 2007). Before discussing the PLoS ONE paper I decided to write up a report of that first genome sequence, trying not to let the second sequence influence me [The Grapevine Genome].
This gives us an opportunity to evaluate the state of genome biology and genome evolution by comparing two competing analyses of the same genome. Keep in mind that the authors of the second paper were aware of the first study when they published in PLoS ONE so they had an opportunity to correct or modify their own work in light of the previous paper. Thus, the second group is able to point out "errors" in the first sequence and correct "errors" in their own sequence before publication.
Keep this in mind as you read the second paper because it often seems as though the first group to publish did a very sloppy job. What we don't see in the published work is the evidence of sloppiness in the second study that was fixed by referring to the earlier work.
Velasco et al. (2007) also sequenced the Pinot Noir cultivar of Vitis vinifera but unlike the previous study they used a heterozygous strain. Recall that in the September paper the sequencing team used an inbred line in order to reduce the extreme heterogeneity seen in normal wine-making strains.
The genome size is 505 Mb (505 × 10^6 bp). This is larger than the earlier published sequence (487 Mb). The extra DNA is almost entirely due to inclusion of ribosomal RNA clusters. Velasco et al. (2007) identified 29,585 genes, only slightly fewer than the 30,434 genes reported by Jaillon et al. (2007). Both teams used fairly strict criteria for identifying and annotating genes. The number of genes in the grapevine genome is comparable to the number in Arabidopsis (26,819) but fewer than the number in poplar (45,555) and rice (41,046). We can expect this number to fall as false positives are eliminated.
There are 719 tRNA genes (including 163 pseudogenes), 89 snRNA genes, and about 1500 copies of the 18S + 5.8S + 28S ribosomal RNA repeat. There are about 175 copies of the 5S RNA gene.
The authors report 166 copies of snoRNA and 143 copies of microRNAs based on known examples in other plant genomes.
Many plants exhibit very high heterogeneity between homologous chromosomes. Homologous chromosomes in the Pinot Noir cultivar differ by as much as 11% in DNA sequence, including large gaps. This gives rise to regions that are hemizygous: they contain only one copy of a DNA sequence in a diploid genome. An example of this heterogeneity is shown below.
Two almost contiguous regions of chromosome 1 are depicted. The red regions are transposons of various kinds (c=Copia, a=Gypsy/athila, etc.). You can see that many of the deletions/insertions are at transposon positions, indicating that much of the heterogeneity between homologous chromosomes is due to the insertion and excision of active transposons. This level of transposon activity is rare in mammalian genomes but common in flowering plants.
In order to study the evolution of the grapevine genome, Velasco et al. (2007) compared the sequences of paralogous genes. These are genes that belong to a gene family that diverged from a common ancestor. By comparing the differences in sequence between any two genes it is possible to estimate the time of divergence. In order to avoid any bias due to selection, it is preferable to only compare nucleotide substitutions that do not change the amino acid sequence (synonymous substitutions, Ks).
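To make the distinction between synonymous and nonsynonymous substitutions concrete, here is a minimal Python sketch that classifies the differences between two aligned coding sequences. It is only an illustration and not the method used in either paper: the function name and the example sequences are invented, codons that differ at more than one position are simply skipped, and a real Ks estimate would also normalize by the number of synonymous sites and correct for multiple substitutions at the same site.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_single_site_changes(seq_a, seq_b):
    """Count synonymous and nonsynonymous codon differences.

    Hypothetical helper for illustration. Both sequences must be aligned,
    in frame, gap-free, and of equal length. Only codons that differ at
    exactly one position are classified; others are ignored.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diffs = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diffs != 1:
            continue  # identical codon, or too ambiguous for this sketch
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid: nonsynonymous change
    return syn, nonsyn


# Invented example: GCT -> GCC keeps alanine, AAA -> AGA changes lysine to arginine.
print(count_single_site_changes("ATGGCTAAA", "ATGGCCAGA"))
```

Only the first number in the returned pair would feed into a Ks-style comparison, since those are the changes that leave the protein untouched.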
The results are shown in the figure above. Most of the pairs of genes are very similar, with 0 or 0.1 substitutions per synonymous site. These genes arose from a very recent duplication event. There is a secondary peak at about 0.9 substitutions, indicating that a large number of genes were duplicated at some particular time in the past. If this is evidence of a genome-wide duplication event then these pairs of genes should be clustered in syntenic regions (large segments of the chromosome that have the same order of genes).
The insert (E) shows the distribution of those pairs from syntenic regions. It looks like most of the pairs have accumulated similar numbers of substitutions suggesting strongly that there was a genome-wide duplication event.
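The idea of looking for a secondary peak in the Ks distribution can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Ks values are invented, not taken from the paper, the function names are hypothetical, and the "secondary peak" is simply taken to be the fullest bin beyond an arbitrary cutoff that excludes the recent duplicates near zero.

```python
from collections import Counter


def ks_histogram(ks_values, bin_width=0.1):
    """Bin Ks values into bins centred on multiples of bin_width.

    Returns a Counter mapping bin index -> number of gene pairs.
    """
    return Counter(round(ks / bin_width) for ks in ks_values)


def secondary_peak(hist, bin_width=0.1, min_bin=3):
    """Return the centre of the fullest bin at or beyond min_bin.

    min_bin is an arbitrary cutoff (an assumption of this sketch) that
    skips the large peak of very recent duplicates near Ks = 0.
    """
    candidates = {idx: n for idx, n in hist.items() if idx >= min_bin}
    best = max(candidates, key=candidates.get)
    return round(best * bin_width, 10)


# Invented Ks values for paralogous gene pairs (not real data).
example_ks = [0.0, 0.01, 0.02, 0.0, 0.1, 0.12, 0.3, 0.5, 0.7, 0.8,
              0.82, 0.9, 0.88, 0.91, 0.93, 0.9, 1.0, 1.02, 1.2, 1.5]

hist = ks_histogram(example_ks)
print(sorted(hist.items()))
print("secondary peak near Ks =", secondary_peak(hist))
```

Running the same binning on only those pairs that fall in syntenic regions is the kind of check the inset performs: if the pairs pile up in one bin, a single large duplication event is the simplest explanation.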
It is well known that flowering plant genomes have undergone polyploidization and/or hybridization during their evolution from a common ancestor about 200-300 million years ago. In their September paper in Nature, Jaillon et al. (2007) proposed that the grapevine genome was closer to the common ancestor of dicotyledonous plants. Their analysis suggested that all dicots arose from a hexaploid ancestor (three haploid genome equivalents). Further duplications occurred in the lineages leading to poplar and Arabidopsis, according to Jaillon et al. (2007) [The Grapevine Genome].
Velasco et al. (2007) disagree. In the second genome study they claim that the ancestral dicot genome was tetraploid (one round of duplication) and that a second round of duplication (2R) occurred in the grapevine lineage after it diverged from poplar and Arabidopsis (see below). Note that in this study Arabidopsis and poplar are assumed to be more closely related to each other than they are to grapevine, whereas in the previous study grapevine was clustered with poplar.
A third duplication (3R) took place independently in the lineages leading to Arabidopsis and poplar, according to Velasco et al. (2007).
At present, it isn't possible to say who is correct. In fact, they might both be wrong. The significance of these two studies is that they give us some idea of the level of confidence we can place in speculations about genome evolution. How you interpret your data depends very much on how you compare sequences both within a species and between species. Judging by the differing opinions of these two groups, the data do not seem to be good enough to support confident conclusions.
The take-home lesson is that we need to take studies of this sort with a large grain of salt. In most cases we won't be lucky enough to have competing labs analyze the same genome and point out differing interpretations.
Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M.E., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A.F., Weissenbach, J., Quétier, F., Wincker, P.; French-Italian Public Consortium for Grapevine Genome Characterization (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463-467. [PubMed] [Nature]
Velasco, R., Zharkikh, A., Troggio, M., Cartwright, D.A., Cestaro, A., Pruss, D., Pindo, M., Fitzgerald, L.M., Vezzulli, S., Reid, J., Malacarne, G., Iliev, D., Coppola, G., Wardell, B., Micheletti, D., Macalma, T., Facci, M., Mitchell, J.T., Perazzolli, M., Eldredge, G., Gatto, P., Oyzerski, R., Moretto, M., Gutin, N., Stefanini, M., Chen, Y., Segala, C., Davenport, C., Demattè, L., Mraz, A., Battilana, J., Stormo, K., Costa, F., Tao, Q., Si-Ammour, A., Harkins, T., Lackey, A., Perbost, C., Taillon, B., Stella, A., Solovyev, V., Fawcett, J.A., Sterck, L., Vandepoele, K., Grando, S.M., Toppo, S., Moser, C., Lanchbury, J., Bogden, R., Skolnick, M., Sgaramella, V., Bhatnagar, S.K., Fontana, P., Gutin, A., Van de Peer, Y., Salamini, F., Viola, R. (2007) A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS ONE 2(12): e1326. doi:10.1371/journal.pone.0001326 [PubMed] [PLoS]