Friday, January 25, 2013

How Many Genomes Have Been Sequenced?

There are several types of genome sequence. Some are relatively incomplete and they don't really count. Others have been thoroughly sequenced and we have a good permanent draft sequence. The best ones are the "finished" genome sequences where the preliminary drafts have been checked and gaps have been filled.

How many "finished" or permanent draft complete genome sequences have been published?

How many of them are eukaryotes?

Here are the answers from GOLD (the Genomes OnLine Database):

(Apologies for the Three Domain influence.)

Archaea: 181
Bacteria: 3762
Eukaryotes: 183
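To put those numbers in proportion, here is a small Python sketch that turns the three GOLD counts quoted above into percentages of the 4,126 genomes listed. The dictionary and the `share` helper are illustrative only and are not part of GOLD. Eukaryotes come to roughly 4.4% of the total, about the same as Archaea, while Bacteria account for just over 91%.

```python
# Counts of finished or permanent-draft genomes per domain, as quoted
# from GOLD in the post (January 2013). Names here are illustrative.
counts = {"Archaea": 181, "Bacteria": 3762, "Eukaryotes": 183}

total = sum(counts.values())  # 4126 genomes in all


def share(domain: str) -> float:
    """Percentage of all listed genomes that belong to one domain."""
    return 100 * counts[domain] / total


for domain, n in counts.items():
    print(f"{domain}: {n} ({share(domain):.1f}%)")
```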

Why are there so few eukaryotes? Because many eukaryotic genomes are very large, and it takes a lot more work to sequence that much DNA. Furthermore, many eukaryotic genomes are full of junk DNA, and it's difficult to sequence and assemble repetitive regions in order to get a complete chromosome. The bottom line is money: for most labs it's too expensive to sequence the genome of their favorite eukaryote, but it can be quite cheap these days to sequence a bacterial genome.
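Here is a toy sketch of why repetitive regions make assembly hard. It assumes idealized, error-free reads of a fixed length k, one starting at every position of a made-up sequence. The sequence and the helpers `read_positions` and `ambiguous_reads` are hypothetical and are not taken from any real assembler. Any read that falls entirely within one copy of a repeat matches more than one place, so there is no way to tell where it belongs. In this example, reads longer than the repeat also cover flanking bases that differ between the two copies, so each one can be placed uniquely.

```python
from __future__ import annotations

from collections import defaultdict


def read_positions(genome: str, k: int) -> dict[str, list[int]]:
    """Map each length-k read (error-free, one per start position) to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return dict(positions)


def ambiguous_reads(genome: str, k: int) -> dict[str, list[int]]:
    """Reads matching more than one position, which cannot be placed uniquely."""
    return {read: starts
            for read, starts in read_positions(genome, k).items()
            if len(starts) > 1}


# Hypothetical toy genome: unique flanks around two copies of a 7-base repeat.
REPEAT = "CCTGAGA"
genome = "TTAG" + REPEAT + "GCAT" + REPEAT + "ATTC"

print(ambiguous_reads(genome, 4))  # reads shorter than the repeat
print(ambiguous_reads(genome, 8))  # reads longer than the repeat
```

With 4-base reads, the four reads lying inside the repeat (CCTG, CTGA, TGAG and GAGA) each match two positions. With 8-base reads, every read in this toy sequence is unique. Real genomes are far messier, but the same principle applies: repeats longer than the reads are the hard part.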


[Hat Tip: Jonathan Eisen]

18 comments:

  1. And that's the same reason why so many bacterial genomes are complete.

  2. Shouldn't this actually ask "how many species have been sequenced?"

    1. Not exactly. Different strains or varieties of the same species may have been sequenced, which is the case for some very well-studied organisms, like Escherichia coli. On the other hand, I find it hard to believe that so few eukaryotes have been sequenced, since fungi are eukaryotes and I know very well that a very large effort has been put into this. Fungi usually have much less non-coding DNA, and thus they are relatively easy to sequence.

  3. It's true that repetitive sequences make assembly difficult, but I think the stronger reason for the bias against eukaryotes is the sheer size of their genomes. Sequencing used to be a big deal. Small genomes are easier. OK, the size difference is largely due to junk, but it's the size, not the junk per se, that's the major factor.

    1. I didn't mean to imply anything else. Big genomes, lots of junk, harder to sequence and harder to assemble.

    2. Well, now. Sequencing and assembly are two different things. Junk isn't inherently hard to assemble because it's junk. It's hard to assemble because of all the repetitive sequences. Sure, you know all this, but your statement was confusing on multiple counts.

      To summarize: eukaryote genomes are hard to sequence because of their size, which happens to be due mostly to junk. They're hard to assemble because of their repetitive sequences, which again happen to be mostly junk.

  4. How about the Archaea then, what's the reason there?

    1. Popularity. Generally (especially until recently), genome sequencing initiatives required promotion by interested parties, usually researchers devoted to the species in question. Archaea just have never been studied as extensively as bacteria, for a number of reasons.

  5. This organellar biologist would like to point out the additional 3281 obligate intracellular alpha-proteobacteria + 315 obligate intracellular cyanobacteria.

    Relatively tiny genomes, backing up the size/assembly issue claims.

    ;-)

  6. Larry, different people disagree with the three domain hypothesis for different reasons. If I remember correctly, you don't accept the hypothesis because archaebacteria and eubacteria are interspersed in the same clade, according to some trees. Or maybe because of pervasive HGT between these two groups. Am I right? Why do you offer apologies for the three domain influence?

  7. Say, TheOtherJim, where are the 315 chloroplasts (if that is what you're referring to) cataloged? There are only 109 at the Chloroplast Genome DB.

    1. I was being a bit informal, and included all plastids.

      http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid

    2. That was very helpful, thank you.

  8. I used the same source for teaching purposes. I started last year. As of July 2011, 36 eukaryotic genomes had been sequenced. By August of last year there were over 180. That's an impressive increase! I wonder how many the website will list this coming summer. There must be hundreds of sequencing projects in progress.

  9. I am a bioinformatics student and very much interested in sequencing.

  10. It's not about numbers... see: http://www.templeilluminatus.com/profiles/blogs/recipe-for-saving-rain-forests-and-us

    It's what you do with those Nomes....ok Genomes!

  11. Hi,
    When I checked GOLD, these were the current statistics for sequenced genomes:

    Organisms 72,829
    Archaea 1,195
    Bacteria 53,377
    Eukarya 11,691
    Viruses 4,472

    But where can I find the unique numbers?
