
Friday, January 25, 2013

How Many Genomes Have Been Sequenced?

There are several types of genome sequence. Some are relatively incomplete and they don't really count. Others have been thoroughly sequenced and we have a good permanent draft sequence. The best ones are the "finished" genome sequences where the preliminary drafts have been checked and gaps have been filled.

How many "finished" or permanent draft complete genome sequences have been published?

How many of them are eukaryotes?

Here are the answers from: GOLD

(Apologies for the Three Domain influence.)

Archaea: 181
Bacteria: 3762
Eukaryotes: 183
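To put those three numbers in proportion, here is a minimal Python sketch that turns the counts quoted above into percentage shares. The function name `domain_shares` is just an illustrative choice, and the figures are the ones in this post, not a live query of GOLD.

```python
# Counts quoted above from GOLD (finished or permanent draft genomes).
counts = {"Archaea": 181, "Bacteria": 3762, "Eukaryotes": 183}


def domain_shares(counts):
    """Return each domain's percentage share of the total count."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


shares = domain_shares(counts)
for name, pct in shares.items():
    # One line per domain, rounded to one decimal place.
    print(f"{name}: {pct:.1f}%")
```

On these figures, bacteria make up roughly nine out of every ten entries, with archaea and eukaryotes splitting the rest about evenly.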

Why are there so few eukaryotes? Because many eukaryotic genomes are very large and it takes a lot more work to sequence that much DNA. Furthermore, many eukaryotic genomes are full of junk DNA and it's difficult to sequence and assemble repetitive regions in order to get a complete chromosome. The bottom line is money—for most labs it's too expensive to sequence the genome of their favorite eukaryote but it can be quite cheap these days to sequence a bacterial genome.

[Hat Tip: Jonathan Eisen]


caynazzo said...

And the same reason why so many bacterial genomes are complete

Jonathan said...

Shouldn't this actually ask "how many species have been sequenced?"

John Harshman said...

It's true that repetitive sequences make assembly difficult, but I think the stronger reason for the bias against eukaryotes is the sheer size of their genomes. Sequencing used to be a big deal. Small genomes are easier. OK, the size difference is largely due to junk, but it's the size, not the junk per se, that's the major factor.

Mikkel Rumraket Rasmussen said...

Haha, good one.

Mikkel Rumraket Rasmussen said...

How about the Archaea then, what's the reason there?

Larry Moran said...

I didn't mean to imply anything else. Big genomes, lots of junk, harder to sequence and harder to assemble.

John Harshman said...

Well, now. Sequencing and assembly are two different things. Junk isn't inherently hard to assemble because it's junk. It's hard to assemble because of all the repetitive sequences. Sure, you know all this, but your statement was confusing on multiple counts.

To summarize: eukaryote genomes are hard to sequence because of their size, which happens to be due mostly to junk. They're hard to assemble because of their repetitive sequences, which again happen to be mostly junk.

SRM said...

Popularity. Generally (esp until recent times), genome sequencing initiatives required promotion by interested parties who were usually researchers devoted to the species in question...Archaea just never have been studied as extensively as bacteria for a number of reasons.

TheOtherJim said...

This organellar biologist would like to point out the additional 3281 obligate intracellular alpha-proteobacteria + 315 obligate intracellular cyanobacteria.

Relatively tiny genomes, backing up the size/assembly issue claims.


Sergio A. Muñoz-Gómez said...

Larry, different people disagree with the three domain hypothesis for different reasons. If I remember correctly, you don't accept the hypothesis because archaebacteria and eubacteria are interspersed in the same clade, according to some trees. Or maybe because of pervasive HGT between these two groups. Am I right? Why do you offer apologies for the three domain influence?

Unknown said...

Say, TheOtherJim, where are the 315 chloroplasts (if that is what you're referring to) cataloged? There are only 109 at the Chloroplast Genome DB.

Frisodio said...

Not really. Different strains or varieties of the same species may have been sequenced, which is the case for some very well-studied organisms, like Escherichia coli. On the other hand, I can hardly believe there are so few eukaryotes sequenced, since fungi are eukaryotes and I know very well that a very large effort is being put into this subject. Fungi usually have much less non-coding DNA, and thus they are relatively easy to sequence.

TheOtherJim said...

I was being a bit informal, and included all plastids.

Corneel said...

I used the same source for teaching purposes. I started last year. As of July 2011, 36 eukaryotic genomes had been sequenced. In August last year there were over 180. That's an impressive increase! I wonder how many the website will list this coming summer. There must be hundreds of sequencing projects in progress.

Anonymous said...

I am a bioinformatics student and very much interested in sequencing.

Unknown said...

That was very helpful, thank you.

Anonymous said...

Not about Numbers ..... see:

It's what you do with those Nomes....ok Genomes!

Unknown said...

When I checked GOLD, these were the current statistics for sequenced genomes.

Organisms 72,829
Archaea 1,195
Bacteria 53,377
Eukarya 11,691
Viruses 4,472

But where can I find the unique numbers?