How many "finished" or permanent draft complete genome sequences have been published?
How many of them are eukaryotes?
Here are the answers from: GOLD
(Apologies for the Three Domain influence.)
Archaea: 181
Bacteria: 3762
Eukaryotes: 183
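To put those counts in proportion, here is a minimal sketch in Python that works out each domain's share of the total. The numbers are the three listed above; the variable names (`counts`, `total`, `shares`) are just illustrative and aren't taken from GOLD.

```python
# Counts as listed above (GOLD: "finished" or permanent draft genomes).
counts = {"Archaea": 181, "Bacteria": 3762, "Eukaryotes": 183}

# Total number of genomes across the three domains.
total = sum(counts.values())

# Fraction of the listed genomes that falls in each domain.
shares = {domain: n / total for domain, n in counts.items()}

for domain, share in shares.items():
    print(f"{domain}: {counts[domain]} of {total} ({share:.1%})")
```

On these figures eukaryotes come out at roughly 4% of the total, about the same share as the archaea.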
Why are there so few eukaryotes? Because many eukaryotic genomes are very large and it takes a lot more work to sequence that much DNA. Furthermore, many eukaryotic genomes are full of junk DNA and it's difficult to sequence and assemble repetitive regions in order to get a complete chromosome. The bottom line is money—for most labs it's too expensive to sequence the genome of their favorite eukaryote but it can be quite cheap these days to sequence a bacterial genome.
[Hat Tip: Jonathan Eisen]
And that's the same reason why so many bacterial genomes are complete.
Shouldn't this actually ask "how many species have been sequenced?"
Haha, good one.
Not actually. Different strains or varieties of the same species may have been sequenced, which is the case for some very well-studied organisms, like Escherichia coli. On the other hand, I find it hard to believe that so few eukaryotes have been sequenced, since fungi are eukaryotes and I know very well that a very large effort has been put into this area. Fungi usually have much less non-coding DNA, and thus they are relatively easy to sequence.
It's true that repetitive sequences make assembly difficult, but I think the stronger reason for the bias against eukaryotes is the sheer size of their genomes. Sequencing used to be a big deal. Small genomes are easier. OK, the size difference is largely due to junk, but it's the size, not the junk per se, that's the major factor.
I didn't mean to imply anything else. Big genomes, lots of junk, harder to sequence and harder to assemble.
Well, now. Sequencing and assembly are two different things. Junk isn't inherently hard to assemble because it's junk. It's hard to assemble because of all the repetitive sequences. Sure, you know all this, but your statement was confusing on multiple counts.
To summarize: eukaryote genomes are hard to sequence because of their size, which happens to be due mostly to junk. They're hard to assemble because of their repetitive sequences, which again happen to be mostly junk.
How about the Archaea then, what's the reason there?
Popularity. Generally (especially until recently), genome sequencing initiatives required promotion by interested parties, who were usually researchers devoted to the species in question. Archaea just never have been studied as extensively as bacteria, for a number of reasons.
This organellar biologist would like to point out the additional 3281 obligate intracellular alpha-proteobacteria + 315 obligate intracellular cyanobacteria.
Relatively tiny genomes, backing up the size/assembly issue claims.
;-)
Larry, different people disagree with the three domain hypothesis for different reasons. If I remember correctly, you don't accept the hypothesis because archaebacteria and eubacteria are interspersed in the same clade, according to some trees. Or maybe because of pervasive HGT between these two groups. Am I right? Why do you offer apologies for the three domain influence?
Say, TheOtherJim, where are the 315 chloroplasts (if that is what you're referring to) cataloged? There are only 109 at the Chloroplast Genome DB.
I was being a bit informal, and included all plastids.
http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid
That was very helpful, thank you.
I used the same source for teaching purposes. I started last year. As of July 2011, 36 eukaryotic genomes had been sequenced. Last August there were over 180. That's an impressive increase! I wonder how many the website will list this coming summer. There must be hundreds of sequencing projects in progress.
I am a bioinformatics student and very much interested in sequencing.
Not about Numbers ..... see: http://www.templeilluminatus.com/profiles/blogs/recipe-for-saving-rain-forests-and-us
ReplyDeleteIt's what you do with those Nomes....ok Genomes!
Hi,
When I checked GOLD, these were the current statistics for sequenced genomes.
Organisms 72,829
Archaea 1,195
Bacteria 53,377
Eukarya 11,691
Viruses 4,472
But where can I find the unique numbers?