Sandwalk: The Mite Genome

Tuesday, December 20, 2011

The genome of the two-spotted spider mite, Tetranychus urticae, has been sequenced, and the results were published in Nature last month (Grbic et al. 2011).

Spider mites eat plants. They produce silk-like webs, which is why they're called "spider mites." They belong to the class Arachnida, the same group that contains spiders. The arachnids are in the subphylum Chelicerata, a large group of arthropods distantly related to the insects and crustaceans. This is the first genome sequence of a chelicerate, and that's why it's important.

Genome Size

The genome is only 90 Mb in size. It's the smallest arthropod genome that has been sequenced so far. Contrast this size with the human genome at 3,200 Mb or the genome of another arachnid, the tick Ixodes scapularis, estimated to be 2,100 Mb. (Honeybee = 236 Mb, Drosophila = 140 Mb.) According to Ryan Gregory's animal genome size database, this is the smallest known arachnid genome and the smallest known arthropod genome.

The authors estimate that there are 18,414 protein-encoding genes in the mite genome. This is about the same number of genes as most insects whose genomes have been sequenced and only slightly fewer than the number of genes in the human genome.

About 41% of the mite genome consists of exons (protein-encoding). Recall that less than 2% of our genome encodes proteins and in most insects the exon sequences make up less than 10% of the genome. (Honeybees and Drosophila also have smaller than average genome sizes.)
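The percentages are easier to picture as absolute amounts. Here is a minimal sketch in Python that uses only the figures quoted in this post (90 Mb, 18,414 genes, 41% exons). The constant and function names are made up for illustration, and the inputs are rounded, so treat the results as rough.

```python
# Figures quoted in the post; they are rounded, so the results are approximate.
GENOME_SIZE_MB = 90.0   # T. urticae genome size
GENE_COUNT = 18_414     # estimated protein-encoding genes
EXON_FRACTION = 0.41    # share of the genome made up of exons


def exon_megabases(genome_mb: float, exon_fraction: float) -> float:
    """Total amount of exon sequence in the genome, in Mb."""
    return genome_mb * exon_fraction


def genes_per_megabase(gene_count: int, genome_mb: float) -> float:
    """Average gene density over the whole genome."""
    return gene_count / genome_mb


def mean_exon_bp_per_gene(genome_mb: float, exon_fraction: float,
                          gene_count: int) -> float:
    """Average amount of exon sequence per gene, in base pairs."""
    return exon_megabases(genome_mb, exon_fraction) * 1_000_000 / gene_count


if __name__ == "__main__":
    print(f"exon sequence: {exon_megabases(GENOME_SIZE_MB, EXON_FRACTION):.1f} Mb")
    print(f"gene density:  {genes_per_megabase(GENE_COUNT, GENOME_SIZE_MB):.0f} genes/Mb")
    print(f"exon per gene: "
          f"{mean_exon_bp_per_gene(GENOME_SIZE_MB, EXON_FRACTION, GENE_COUNT):.0f} bp")
```

With the post's numbers this works out to roughly 36.9 Mb of exon sequence, about 205 genes per Mb, and about 2,000 bp of exon per gene.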

Introns

As you might imagine, the mite genome has a lot less junk DNA than the genomes of other animals. This is partially reflected in the number and size of the introns. The average protein-encoding gene has fewer than three introns and the ones that are present are a lot smaller than the introns in species with larger genomes.

The figure on the right is a truncated version of a figure that appears in the supplemental information. It shows that the smallest introns are 40 bp and 70% of all introns are less than 150 bp in length (median = 96 bp). This is close to the smallest possible intron size allowing for splice sites and formation of a loop during splicing.

Transposons and Repetitive Sequences

Transposons (active and degenerate) make up less than 10% of the T. urticae genome and highly repetitive sequences (microsatellites) are almost absent. (The spider mite chromosomes don't have centromeres.)

Transposon sequences and highly repetitive sequences are a major component of the junk DNA found in large genomes so their absence in the mite genome is not a surprise.

Why Is the Mite Genome So Small?

The short answer is, we don't know. The long answer is much more complicated. As Michael Lynch points out (Lynch 2007 p.37), there's a balance between rates of insertion and deletion mutations. In species with small genomes the spontaneous rate of nucleotide deletion exceeds that of insertion so genome sizes shrink over time.
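The insertion/deletion balance can be pictured with a toy calculation. This is only a sketch under loud assumptions: the per-generation rates, the starting size, and the optional floor (sequence that cannot be lost) are invented numbers, not values from Lynch or from the mite paper, and the function name is hypothetical.

```python
def genome_size_after(start_bp: float,
                      inserted_bp_per_gen: float,
                      deleted_bp_per_gen: float,
                      generations: int,
                      floor_bp: float = 0.0) -> float:
    """Deterministic toy model of genome size under an insertion/deletion bias.

    Each generation the genome gains `inserted_bp_per_gen` and loses
    `deleted_bp_per_gen` on average. `floor_bp` is a hypothetical lower
    bound standing in for sequence that cannot be deleted.
    """
    net_change_per_gen = inserted_bp_per_gen - deleted_bp_per_gen
    projected = start_bp + net_change_per_gen * generations
    return max(floor_bp, projected)


if __name__ == "__main__":
    # Invented example: a 100 Mb genome with a slight deletion bias.
    shrinking = genome_size_after(100e6, 1.0, 1.2, 50_000_000)
    # Same genome with the bias reversed.
    growing = genome_size_after(100e6, 1.2, 1.0, 50_000_000)
    print(f"deletion bias:  {shrinking / 1e6:.0f} Mb")
    print(f"insertion bias: {growing / 1e6:.0f} Mb")
```

The point of the sketch is only that a small, persistent bias in either direction adds up over many generations; it says nothing about why the bias exists.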

There may not be a selective advantage to having small or large genomes. It may just be that in some species the repair machinery tends to favor deletions while in closely related species the enzymes don't have this bias. Or maybe large genomes are slightly deleterious but the population size isn't large enough to allow natural selection to act. Some lineages may never have encountered significant bottlenecks so they've maintained a huge population size for millions of years allowing natural selection to operate on slightly deleterious mutations. This leads to smaller genomes.
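To see why population size matters for slightly deleterious mutations, here is a hedged sketch using Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population. The assumptions are mine, not the post's: the effective size equals the census size, selection is additive, and the selection coefficient of -0.00001 is an invented value chosen only for illustration.

```python
import math


def fixation_probability(ne: float, s: float) -> float:
    """Kimura's approximation for a new mutation in a diploid population.

    ne: effective population size (assumed equal to the census size here)
    s:  selection coefficient; negative values are deleterious
    The mutation starts as a single copy, i.e. at frequency 1 / (2 * ne).
    """
    p0 = 1.0 / (2.0 * ne)
    if s == 0:
        return p0  # a neutral mutation fixes with probability equal to its frequency
    exponent = -4.0 * ne * s
    if exponent > 700:
        return 0.0  # math.expm1 would overflow; the probability is effectively zero
    # (1 - e^(-4*ne*s*p0)) / (1 - e^(-4*ne*s)), written with expm1 for precision
    return math.expm1(exponent * p0) / math.expm1(exponent)


if __name__ == "__main__":
    s = -1e-5  # hypothetical cost of carrying a bit of extra DNA
    for ne in (1_000, 1_000_000):
        relative = fixation_probability(ne, s) / fixation_probability(ne, 0.0)
        print(f"Ne = {ne:>9,}: fixation relative to neutral = {relative:.3g}")
```

With these made-up numbers the mutation fixes almost as often as a neutral one when Ne = 1,000 and essentially never when Ne = 1,000,000, which is the sense in which selection can only "see" slightly deleterious mutations in large populations.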

Whatever the explanation, the small genome of mites shows us that most of the junk DNA present in other arthropod genomes is dispensable. That's why it's called "junk."

Grbic, M. et al. (2011) The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature 479:487-492. [doi: 10.1038/nature10640] [PubMed]

Lynch, M. (2007) "The Origins of Genome Architecture" Sinauer Associates, Inc. Publishers, Sunderland, Massachusetts, United States

14 comments:

burntloafer, Tuesday, December 20, 2011 10:53:00 AM
Thanks. As a layman, I appreciate the clarity. Great post!

Grumpy Bob, Tuesday, December 20, 2011 11:10:00 AM
Back when I was part of the team sequencing the Drosophila genome, it was generally accepted that there was about 100Mb of euchromatin, with the rest made up of satellite-rich gene-poor heterochromatin. That would make the comparison with the mite genome a little less extreme.

Pity they didn't sequence the genome of one of the mite species that can infest Drosophila cultures: it's thought that an egg-predating mite may have transferred P-elements from a virilis group fly to melanogaster (P-elements have only been present in melanogaster for a hundred years or so).

Arek W., Tuesday, December 20, 2011 1:52:00 PM
Slightly off topic, but not much.
I recently read an article about lincRNA, and it contained much of what I've read about in your posts.

http://www.sciencenews.org/view/feature/id/336570/title/Missing_Lincs

I'll give some examples:

Lesser-known genetic material helps explain why humans are human

Only now have scientists begun identifying...

That archive [Human Genome Project] contains about 3 billion genetic letters, far more than the genomes of less complex organisms such as roundworms and fruit flies. But the project revealed that people’s roughly 22,000 protein-specifying genes don’t greatly outnumber those found in simpler organisms.
The finding was one of the biggest surprises in biology.

If RNAs can do important work without making proteins, the definition of a gene needs to be expanded...

and there is more.

The article was interesting, but now, thanks to you, I know that I should take such articles with a grain of salt.

Rosie Redfield, Tuesday, December 20, 2011 8:38:00 PM
"That's why it's called 'junk'."

Nice!

Anonymous, Tuesday, December 20, 2011 10:06:00 PM
What are you talking about? The small genome size is because God ... I mean, the intelligent designer made it so!

Allan Miller, Wednesday, December 21, 2011 5:15:00 AM
"Or maybe large genomes are slightly deleterious but the population size isn't large enough to allow natural selection to act."

Wouldn't we expect junk to scale with effective population size if this were the case?

This theory (due to Lynch et al) seems designed to explain prokaryote-eukaryote differences, but other than the significant discontinuity in population size between these groups, offers little else in support.

Lynch: "Eukaryotes have much smaller population sizes compared to bacteria, and we believe this is the main reason junk DNA sequences are still with us"

Eukaryotes have meiosis, syngamy and diploidy, a much more 'compressible' genome, parallel copying mechanisms, more junk-tolerant energy/metabolite-gathering constraints and (in multicellular forms) a shift from 'germline' constraints on genome size to 'somatic' ones.

All of these seem more plausible as reasons for their junkiness than raw population size.

The Other Jim, Wednesday, December 21, 2011 11:49:00 PM
@ Allan Miller

"This theory (due to Lynch et al) seems designed to explain prokaryote-eukaryote differences, but other than the significant discontinuity in population size between these groups, offers little else in support."

But isn't the prok-euk split the only one where Ne is a highly significant difference, and consistent? Also, for the accumulation of junk, the μ would also matter, as mentioned above...

Allan Miller, Friday, December 23, 2011 10:18:00 PM
@The other Jim

I think, in pop-genetic terms, the core distinction between prokaryotes and eukaryotes is meiosis. All eukaryotes either do it, or have ancestors that did. It has a huge impact upon what we can even consider a 'population' to be. So to elevate the Ne differences between these groups to prime cause without even considering the huge mechanistic distinctions seems to miss a vital point, to me.

In pre-meiotic days, each cell is simply competing in an ecology consisting of 'other cells'. The cell is a population of 1. Some clonal lineages will be fitter than others, but it is not clear what delimits the 'effective population' here.

With meiosis, you have recombinant species, and then you have populations - collections of entities not just derived from a common ancestor, but capable of syngamy. Some of 'em are big, and some small ...

It's not just that, of course: prokaryotes are constrained by external-membrane energy generation, and by diffusional acquisition of metabolites as opposed to eukaryotic engulfment and cytoskeletal transfer.

The list goes on. They are just hugely different. Disposable transcripts and stowaway surplus DNA are far less punitive to a eukaryote than a prokaryote.

And they are asking for trouble by performing meiosis - this is how much junk spreads around the population: as a sexually transmitted disease.

The Other Jim, Wednesday, December 28, 2011 2:08:00 PM
By the writing, it seems that you are confusing Ne with some form of census size, but your history in the comments on this blog implies that you do know something about popgen. Maybe you could clear this part up?

I am also not convinced with the "meiosis causes populations" reasoning. Haploids can form populations... Not sure what you are getting at here...

Psi Wavefunction, Thursday, December 29, 2011 12:24:00 AM
@Allan:
"Wouldn't we expect junk to scale with effective population size if this were the case?"

No, because higher Ne --> higher NeS --> more efficient selection --> less junk. Unless I misunderstood you; in which case, yes, it does scale (inversely!) with population size. In terms of genome size vs. Ne, the sore oddballs were the bacterial endosymbionts, which had *smaller* genomes and lower eff pop sizes, but this is due to a deletion bias in prokaryotes, for various reasons.

"This theory (due to Lynch et al) seems designed to explain prokaryote-eukaryote differences, but other than the significant discontinuity in population size between these groups, offers little else in support."

Personally, I find the euk-proke comparison the weaker point because the phylogenetic sample size there is 1 (euks evolved only once), so you can't really rule out any other factors, on a theoretical level.

We have slowly accumulating evidence in the eukaryotic world, however, of multiple consistent inverse correlations between Ne and genome size/junk levels -- this is a painfully slooow and expensive process because the only reliable way to estimate Ne is to measure various site polymorphisms vs. divergences and factor in the species mutation rate. The only way to measure mutation rates is to have some poor fuck (hi!) struggle with constantly-dying mutation accumulation lines (where Ne=1 due to single organism bottlenecks) for years on end, all to ultimately get one, single number that may or may not be interesting. Because we have so few data on Ne, we're only beginning to map out genome sizes over that to actually do this in a phylogenetically mindful way.

Also, speaking of phylogenetically mindful, note that hardly anyone has ever measured mutation rates in protists.

If you'd like more hard data on this topic, feel free to tend 50-100 clonal lines, picking a *single* individual or distinct colony as frequently as possible, and then sequence a bunch of genomes once you finished (and compare with the ancestral genome sequence for those specific lines). No, seriously, we all need help! ;-)

"All of these seem more plausible as reasons for their junkiness than raw population size."

The factors you mentioned most likely do have some effect or another on junk-tolerance (eg. the masking of underdominant mildly deleterious mutations in diploids), but Lynch (and many other evolutionary biologists) argues that Ne is the fundamental factor whose importance and scale overrides that of Haldane's Sieve-like effects of ploidy, multicellularity, sex, etc -- on a primitive level that prob does his argument no justice, both selection and the introduction of variation into a population are directly linked to its size: NeS*somecoefficientsifapplicable, Ne*u*somecoefficientsifapplicable. The relaxed selection on slightly deleterious underdominant mutations in diploids is a factor of dominance*selective effect -- this selective effect of the mutation is directly linked to effective population size. In other words, even h*s is still ultimately h*NeS, in real, finite populations.

Did that make any sense whatsoever? >_> Vacation writing tends to be extra-crappy...

Disclaimer: not a population biologist, please correct away...

Allan Miller, Thursday, December 29, 2011 9:56:00 PM
The Other Jim,

"By the writing, it seems that you are confusing Ne with some form of census size, but your history in the comments on this blog implies that you do know something about popgen. Maybe you could clear this part up?"

The chance of me confusing one thing with another is very high - I think I'm about Flounder level when it comes to popgen! :0) I understand Ne to denote the size of an ideal population (fully mixed, all breeding, constant size) that would generate a given amount of neutral variation. It determines the time taken for fixation or, conversely, the time taken for loss of variation. And the greater Ne, the more likely it is that selective differentials will converge upon their expected value, hence larger effective populations will see more fixation due to selection; smaller will see more fixation by drift.

So if junk is mildly detrimental, it will be eliminated more often in the larger population - all else being equal.

Which is fine as far as it goes, but as noted, there is such a deep mechanistic divide between these organisms, both individually and as dynamic populations, that I would see this as swamping the Ne effect. I guess I am, in part, saying that s is likely to be much higher in prokaryotes for a given amount of surplus DNA. They are under constraint to replicate quickly. Eukaryotes can afford to be much more leisurely about it, for various physiological reasons. Using Kimura’s 4Nes ‘rule of thumb’, (2Nes in the haploid population), a 50-fold lower s in diploids would be equivalent to a 100-fold greater Ne for the same selection/drift ‘pivot point’ – even if the populations behaved in a dynamically equivalent manner, which they do not because of sex and other factors.

"I am also not convinced with the "meiosis causes populations" reasoning. Haploids can form populations... Not sure what you are getting at here..."

Pop genetics was formulated for metazoans and plants – sexual diploids. The ‘effective’ part of a sexual population is delineated by interbreeding within a particular group and limited or no breeding outside it. We effectively ignore ecology – we isolate our group of interest against a common environmental background, and consider the dynamics of alleles assorting within it. Thanks to recombination, an allele can ‘colonise’ a locus, like gas entering a vessel. Meanwhile, syngamy means that the population is stirred by this same factor, sex. The members of a population are united not just by common ancestry, but regularly seeking each other out. So sex has several roles - it allows independent assortment of alleles within a population, helps to mix it, and defines its boundaries.

Now, we can certainly apply the same models to haploid groups. Alternative types can again ‘take over’ by differential fitness. But the competition is purely an ecological one for physical space and resources.

Imagine seeding a large lake with 10 distinct varieties each of prokaryote and sexual eukaryote. Lob ‘em in and give it a good stir.

You started off with 20 populations. But how many are there in the lake? There are certainly 10 sexual populations, and in each a mutation may spread until it is fixed in that population: all instances of that locus descend from the same ancestor. In the prokaryotes, fixation would only take place when every single prokaryote was substituted – not just those of the same type as the originator - they aren't distinguished by the process. In that sense, Ne is even larger for prokaryotes. But only if you keep stirring. I think that prokaryote populations are much less well-mixed than eukaryotes'.

Allan Miller, Thursday, December 29, 2011 10:28:00 PM
Psi:

@Allan:
"Wouldn't we expect junk to scale with effective population size if this were the case?"

No, because higher Ne --> higher NeS --> more efficient selection --> less junk. Unless I misunderstood you; in which case, yes, it does scale (inversely!) with population size.

Sorry, yes, badly expressed! :0) And I have seen a paper confirming the expectation in marine vs freshwater fishes. Comparing like organisms, I have no problem with.

"...offers little else in support."

Ummm, maybe a little hasty!

"Personally, I find the euk-proke comparison the weaker point because the phylogenetic sample size there is 1 (euks evolved only once), so you can't really rule out any other factors, on a theoretical level."

Aye, that's the thing ... though one could perhaps investigate modern proks and euks experimentally. I do have a hunch that meiosis/syngamy, nutrition and DNA packaging/replication modes change the game, for numerous reasons.

"Did that make any sense whatsoever?"

Yep, cheers!

Psi Wavefunction, Friday, December 30, 2011 4:29:00 PM
Ah, microbial population structure! It's a cute topic. We barely know the first thing about it ;-) There are still fierce arguments on whether endemism exists at all in microbes (ie a species being unique to a particular geographic location and not found in the same conditions elsewhere), or whether they're all cosmopolitan ("everything is everywhere"). This barely even touches popgen yet, and already people can get away with arguments so large. Truth is, we still don't know much about microbial population structures, and what their Ne would be, or mean. Do they mix? What are bacterial gene pools anyway, if we can't even devise a proper species concept for them? And if there are no trees of prokes as Ford Doolittle and Eric Bapteste assert, due to rampant LGT (not that I necessarily buy that entirely), how can we even talk about populations? What are they, in the absence of defined communities of regular gene exchange?

We know so little about what microbes do out there in the wild. To study them, we have to remove them and culture them in alien conditions. As hard as field ecology already is with large fluffy mammals and stationary plants, such a thing hardly exists for critters too small to see by eye. To quote my past supervisor without permission, "microbial ecology is like studying the African savannah from the International Space Station, with a long vacuum cleaner" -- a very apt description of the state of that field, I think!

With microbial eukaryotes, at least our species concepts are a little clearer (just a little) because they do occasionally partake in coitus. But even there, sex was unknown in Giardia and Leishmania (and thought non-existent!) until very recent population genomic data suggested otherwise! This suggests sex is very rare in some (many!) protists, but it also appears that for whatever reason, they all seem to do it at least occasionally. As it's not obligatory each generation, recombination rates fall onto a gradient rather than being a sexual/asexual divide. This means that even if eukaryotes did experience a fundamental population genetic shift due to having sex (apart from oft-convenient dis-linkage of alleles from each other and their genomic environment), its effects are not necessarily profound in all euk lineages. What is really cool is that you could find phylogenetically independent groups of organisms that share similar patterns for other popgen factors except frequency of recombination in natural populations -- and investigate whether this actually has any consistent effect on other popgen parameters! Then we'd be able to empirically characterise some effects of sex!

Still, I think that while diploidy, DNA replication and sex do have their own special effects on evolutionary parameters, they're not necessarily as fundamental as change in population size, since their effects are usually connected through it. That said, people are looking at the interplay between eff pop size and replication error rates -- it seems like a reduction in pop size correlates with and facilitates sloppier editing mechanisms. There should be a paper coming out on that in a few months from the Lynch lab...

So these factors *are* being considered and thought of in a few places studying this (esp from the genome evolution standpoint), just slowly and severely hindered by how little we know about microbial ecology and protists, both crucial in their own ways to understanding the very fundamentals of evolutionary biology.

I'm just rambling because I only recently discovered there are 'populations'... it's all very exciting to integrate with protistology and think about these various things!

Plenty of space both in microbial ecology and in protistology, people! Come on over!!!

Allan Miller, Tuesday, January 10, 2012 8:20:00 AM
Psi - missed this, sorry.

"even if eukaryotes did experience a fundamental population genetic shift due to having sex [...] its effects are not necessarily profound in all euk lineages."

No indeed - the multiplicity of 'evolutionary scandals'. Loss of sex can follow a continuum from often to rarely through to never (I know from bitter personal experience!). As such, the organisms' population dynamic varies accordingly - and an important dynamic deriving from sex is the additional vector of mate search and subsequent haploid dispersal. This 'stirs' sexual populations, but with a vigour that diminishes in proportion to its frequency as other vectors - food search, environmental shoving - become more important.

The 'classic' population remains the multicellular eukaryote, with its implicit assumption of efficient stirring to provide panmixis. And I do find that naive extension of this model 'downwards' to incorporate haploid populations, or 'sidewards' to lump together sexuals and secondary asexuals seems to miss that mechanistic point, the relevance of both mate search and meiosis itself to the population's actual dynamics.

The whole eukaryote clade is founded upon sex, which probably (can I say that?) happened only once, as evidenced by, for example, the ubiquity of Spo11 homologues. Many have dropped it, but as long as the world is extensively populated by their sexual relatives, that tend to undergo an elevated rate of both cladogenesis and anagenesis, as well as adaptive flexibility, diversity will be dominated by the sexual forms.

But there continues to be some confusion between ecological and genetic populations. Maynard Smith (yeah, I know, not a current authority, but still influential) offers the treatment "consider a population of herring consisting 50/50 sexual and asexual individuals" ... this kind of thing is common, and sets my teeth on edge! The whole collective of ecologically competitive (because genetically close) individuals is lumped into one, and I don't think it is helpful to do so. There is a mechanistic divide, involving not just the obvious factors such as variation, epistasis and 'parallel processing', but the influence of sex upon the closeness of fit of the populations' real behaviour to the assumptions of the mathematical model. (And of course you never start at 50/50! The sexual has the advantage of tenure, and standing variation).

There are plenty of more subtle treatments, of course, but the basic "twofold cost" one is about as far as many textbooks get.