Wednesday, March 08, 2023

A small crustacean with a very big genome

The Antarctic krill genome is the largest animal genome sequenced to date.

Antarctic krill (Euphausia superba) is a species of small crustacean (about 6 cm long) that lives in large swarms in the seas around Antarctica. It is one of the most abundant animals on the planet in terms of biomass and numbers of individuals.

The krill genome was known to be large, with abundant repetitive DNA sequences that made assembling a complete genome very difficult. Recent technological advances have made it possible to sequence very long fragments of DNA that span many of the repetitive regions, allowing assembly of a complete genome (Shao et al. 2023).

The project involved 28 scientists from China (mostly), Australia, Denmark, and Italy. To give you an idea of the effort involved, they listed the sequencing data they collected: 3.06 terabases (Tb) of PacBio long-read sequences, 734.99 gigabases (Gb) of PacBio circular consensus sequences, 4.01 Tb of short reads, and 11.38 Tb of Hi-C reads. The assembled genome is 48.1 Gb, considerably larger than that of the African lungfish (40 Gb), which until now was the largest fully sequenced animal genome.
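
To put those numbers in perspective, here's a quick back-of-the-envelope calculation in Python. It simply adds up the four data sets listed above (treating 1 Tb as 1,000 Gb) and compares the total, and the genome, with the 48.1 Gb and 40 Gb figures. It's only a rough sketch: the variable names are mine, and the "raw data / genome size" figure lumps all the data together, so it is not the usable coverage from any one technology (Hi-C reads, for example, are used mainly to order and orient scaffolds rather than to read the sequence).

```python
# Back-of-the-envelope arithmetic using the figures quoted in the post.
# All amounts are in gigabases (Gb); 1 Tb = 1,000 Gb.
data_gb = {
    "PacBio long reads": 3.06 * 1000,
    "PacBio circular consensus sequences": 734.99,
    "short reads": 4.01 * 1000,
    "Hi-C reads": 11.38 * 1000,
}
krill_genome_gb = 48.1     # assembled Antarctic krill genome
lungfish_genome_gb = 40.0  # African lungfish, the previous record holder

total_gb = sum(data_gb.values())
# Raw data divided by genome size: a crude "how many genomes' worth" figure.
raw_depth = total_gb / krill_genome_gb
size_ratio = krill_genome_gb / lungfish_genome_gb

print(f"total sequence data: {total_gb / 1000:.1f} Tb")
print(f"raw data / genome size: about {raw_depth:.0f}x")
print(f"krill genome / lungfish genome: {size_ratio:.2f}")
```

That works out to roughly 19.2 Tb of data, about 399 times the size of the genome, and a krill genome about 20% larger than the lungfish genome.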

The current draft has 28,834 protein-coding genes and an unknown number of noncoding genes. About 92% of the genome is repetitive DNA, mostly transposon-related sequences. However, there is an unusual amount of highly repetitive DNA organized as long tandem repeats, and this made assembling the complete genome quite challenging.
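
The same kind of rough arithmetic shows how little of the genome is left once the repeats are set aside. This Python sketch assumes the "about 92%" figure applies to the full 48.1 Gb assembly and spreads the 28,834 protein-coding genes evenly across it. Real genes are not evenly spaced, so the last number is only an average, and the variable names are my own.

```python
# Rough composition of the krill genome, using the figures in the post.
genome_gb = 48.1
repeat_fraction = 0.92          # "about 92%" repetitive DNA (approximate)
protein_coding_genes = 28_834

repeat_gb = genome_gb * repeat_fraction
non_repeat_gb = genome_gb - repeat_gb
# Average amount of DNA per protein-coding gene in megabases (Mb).
# Genes are NOT evenly spaced; this is just total size / gene count.
mb_per_gene = genome_gb * 1000 / protein_coding_genes

print(f"repetitive DNA: about {repeat_gb:.1f} Gb")
print(f"non-repetitive DNA: about {non_repeat_gb:.1f} Gb")
print(f"average DNA per protein-coding gene: about {mb_per_gene:.2f} Mb")
```

That's roughly 44 Gb of repeats and less than 4 Gb of everything else, or an average of about 1.7 Mb of DNA for every protein-coding gene.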

Protein-coding genes in Antarctic krill are longer than those in other species because repetitive DNA has been inserted into their introns, but the increase in intron size is smaller than expected from studies of other large genomes, such as those of the lungfish and the Mexican axolotl. It looks as though more of the genome expansion has occurred in intergenic DNA than in these other species.

This study supports the idea that genome expansion is mostly due to the insertion and propagation of repetitive DNA sequences. Some of us think that the repetitive DNA is mostly junk DNA, but in this case it seems odd that there would be so much junk in the genome of a species with such a huge population size (about 350 trillion individuals), since selection against slightly harmful excess DNA is expected to be more efficient in very large populations. The authors were aware of this problem, and because they had sequence data from individuals collected all around Antarctica, they were able to calculate an effective population size. The effective population size (Ne) turned out to be one billion times smaller than the census population size, indicating that the krill population had been much smaller in the recent past. Their data strongly suggest that this smaller population existed only 10 million years ago.
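
The arithmetic behind that comparison is simple, but it's worth spelling out. The Python sketch below divides the census size by the one-billion-fold factor quoted above to get the implied Ne. It then applies a common rule of thumb from population genetics that isn't stated in the post or taken from Shao et al.: selection can't efficiently remove a mildly harmful mutation when its selection coefficient s is smaller than roughly 1/(2Ne), because drift dominates (textbooks differ by a factor of two on the exact threshold). The function name drift_threshold and the 1/(2Ne) cutoff are illustrative assumptions.

```python
# Implied effective population size from the figures quoted in the post.
census_size = 350e12       # about 350 trillion krill
census_to_ne_ratio = 1e9   # Ne is about one billion times smaller

ne = census_size / census_to_ne_ratio


def drift_threshold(effective_size: float) -> float:
    """Rough |s| below which drift, not selection, dominates.

    Assumes the rule-of-thumb cutoff 1/(2N); other conventions use 1/(4N).
    """
    return 1 / (2 * effective_size)


print(f"implied Ne: about {ne:,.0f}")
print(f"threshold using census size: {drift_threshold(census_size):.1e}")
print(f"threshold using Ne:          {drift_threshold(ne):.1e}")
```

With Ne around 350,000, a bit of extra DNA only has to cost the organism less than something on the order of one part in a million to escape efficient selection, rather than one part in 10^15 if the census size were the relevant number.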

The authors don't mention junk DNA. They seem to favor the idea that large genomes are associated with crustaceans that live in polar regions and that large genomes may confer a selective advantage.


Shao, C., Sun, S., Liu, K., Wang, J., Li, S., Liu, Q., Deagle, B.E., Seim, I., Biscontin, A., Wang, Q. et al. (2023) The enormous repetitive Antarctic krill genome reveals environmental adaptations and population insights. Cell 186:1-16. [doi: 10.1016/j.cell.2023.02.005]

10 comments:

  1. I suspect that this repetition in the genome protects against endogenous retroviruses.

  2. Would this make krill the most abundant source of species-specific nucleic material (junk or not) by total mass on the planet? Ants, maybe, but those are spread over many more species?

  3. @Bob D: You'd have to work out whether it was more than provided by Prochlorococcus marinus.

  4. @Joe Felsenstein True...I only think in metazoan!

  5. @Joe

    Yesterday I spent about an hour trying to find out the total biomass of Prochlorococcus, but it was harder than I thought. I'm pretty sure it's several times more than that of Antarctic krill, but its genome is so much smaller that the krill probably account for more DNA. The calculation is complicated by the fact that a much larger fraction of krill biomass is not DNA than in Prochlorococcus (because of cell size). I gave up.

  6. Interesting point about the smaller population of krill ~10 million years ago. Was there some sort of extinction event or something?

    On a related note, Google informs me that krill feed primarily on phytoplankton. Maybe the strongest selective pressure operating on krill is predation rather than resource limitation (I have no idea, so I'm just wondering out loud). Could that help explain why they tolerate the metabolic cost of a colossal genome more easily than some other species?

  7. This comment has been removed by the author.

  8. Oops. I meant, "What could be the selective advantage of a large GENOME size?"

  9. @S. Joshua Swamidass

    I don't think there is a selective advantage to having a large genome. In fact, I think it is slightly disadvantageous.

    There are several speculative answers to your question but none of them stand up to close scrutiny IMHO.

    1. Excess DNA soaks up mutations.
    2. Excess DNA serves as a sink for transposon insertions, thus protecting the functional DNA from disruptive insertions.
    3. Large genomes help organize chromatin, partly by keeping genes apart.
    4. Large genomes lead to large nuclei and to large cells, and this is advantageous in large multicellular organisms.
    5. Large genomes are full of sequences that could evolve into new genes, and this is an advantage for the evolution of the species.
    6. Large genomes full of pseudogenes can help create genetic diversity by recombining with functional genes.

  10. 7. Large genomes are, like, totes functional and are necessary for complex organisms, notably the very mostest complex organism, humans. Ferns and salamanders do not exist, and I have no idea what you mean by "C-value paradox".