Sandwalk: Structure and expression of the SARS-CoV-2 (coronavirus) genome

Thursday, July 09, 2020

Structure and expression of the SARS-CoV-2 (coronavirus) genome

Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January 2020 and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was submitted on Jan. 20th, accepted on Jan. 29th, and published online in Nature on Feb. 3rd (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.¹

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10^-6 per nucleotide per replication). The original isolate has a total length of 29,891 nt not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.
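To get a feel for what that mutation rate means, here is a back-of-the-envelope calculation as a minimal Python sketch. Both numbers come from the paragraph above; the function names are mine and purely illustrative, and the rate itself is only approximate.

```python
# Rough estimate only: both inputs are the approximate values quoted in the post.
GENOME_LENGTH_NT = 29_891   # original isolate, not counting the poly(A) tail
MUTATION_RATE = 1e-6        # mutations per nucleotide per replication (approximate)


def expected_mutations_per_copy(length_nt: int, rate: float) -> float:
    """Expected number of new mutations each time one genome is copied."""
    return length_nt * rate


def copies_per_mutation(length_nt: int, rate: float) -> float:
    """Average number of genome copies made per new mutation."""
    return 1.0 / expected_mutations_per_copy(length_nt, rate)


per_copy = expected_mutations_per_copy(GENOME_LENGTH_NT, MUTATION_RATE)
copies = copies_per_mutation(GENOME_LENGTH_NT, MUTATION_RATE)

print(f"{per_copy:.3f} expected mutations per genome copy")
print(f"roughly one new mutation every {copies:.0f} copies")
```

With these inputs, most copies of the genome carry no new mutation at all, which fits with the isolates being so similar to each other.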

The RNA genome that's inside the virus particle looks very much like a typical eukaryotic mRNA molecule. It has a 5′ cap and a 3′ poly(A) tail of about 40-50 nucleotides. This RNA is translated by the host protein synthesis components as soon as it is injected into the cell.

The genome contains a number of genes where the word "gene" is used to define the open reading frame of the proteins produced by the virus. The initial translation products are two large polyproteins that are subsequently cleaved by proteases to produce smaller proteins. Most of the time the viral RNA is translated to give the 1a polyprotein (~460 kDa) that is subsequently cleaved to produce 11 distinct non-structural proteins (nsps). Sometimes the ribosomes stall near the stop codon when they encounter a frameshift element (FSE) containing a "slippery site" that causes the ribosomes to slip back by one nucleotide (a -1 frameshift). This shifts the reading frame so that the stop codon is no longer read and translation continues into the 1b gene. The large 1ab polyprotein (~780 kDa) produces another five proteins after cleavage.
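The effect of the frameshift is easier to see on a toy example. The sketch below is an illustration under stated assumptions, not a model of the real genome: `TOY_MRNA` is a made-up sequence, the slippery heptamer `UUUAAAC` is the coronavirus slippery sequence as I understand it (it is not given in the text above), and the slip is modelled as the ribosome moving back one nucleotide right after it has read through the slippery site.

```python
from itertools import product

# Standard genetic code, built compactly: codons in UCAG order, "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: slippery heptamer used for illustration.
SLIPPERY_SITE = "UUUAAAC"

# Hypothetical toy mRNA, NOT the real 1a/1b junction. In the normal frame it
# reads AUG GCU UUA AAC GGG UAA (stop).
TOY_MRNA = "AUGGCUUUAAACGGGUAAGUGCAGCCUGA"


def translate(rna, slip_after=None):
    """Translate from position 0 until a stop codon or the end of the RNA.

    If slip_after is given, the ribosome moves back one nucleotide as soon as
    it has finished reading up to that position (a -1 frameshift).
    """
    protein = []
    pos = 0
    while pos + 3 <= len(rna):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        pos += 3
        if pos == slip_after:
            pos -= 1  # re-read the last nucleotide of the slippery site
    return "".join(protein)


# Position just past the slippery site in the toy mRNA.
slip_after = TOY_MRNA.find(SLIPPERY_SITE) + len(SLIPPERY_SITE)

short_product = translate(TOY_MRNA)              # ribosome stops at UAA
long_product = translate(TOY_MRNA, slip_after)   # ribosome slips and reads on

print(short_product)  # MALNG
print(long_product)   # MALNRVSAA
```

Without the slip the toy ribosome stops at the first stop codon (the "1a" outcome); with the slip that stop codon is out of frame and translation carries on (the "1ab" outcome).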

The functions of many (but not all) of these proteins have been discovered. Nsp12 is an RNA-dependent RNA polymerase (RdRp). This is the enzyme that will copy the viral RNA to produce more infectious RNAs but it also produces a number of other transcripts (see below). RdRp is part of a large replication-transcription complex (RTC) that includes a number of accessory proteins (nsp2, nsp4, nsp6, nsp7+nsp8, nsp9, and nsp10). The exact functions of all these accessory proteins haven't been worked out in detail.

Nsp3 is a papain-like protease (PLpro) and nsp5 is a 3C-like cysteine protease (3CLpro). They are responsible for cleaving polyproteins 1a and 1ab.

Nsp13 is a 5′→3′ helicase (Hel) that's required for transcription. Nsp14 is a 3′→5′ exonuclease involved in proofreading. Nsp15 appears to be a uridine-specific endonuclease and nsp16 is an S-adenosylmethionine-dependent methyltransferase.

The open reading frames at the 3′ end of the viral RNA cannot be translated because of the stop codon at the end of the 1ab "gene." Production of these proteins (e.g. S, M, and E) has to wait until later in the life cycle of the virus, after the assembly of the RTC. As we shall see shortly, the synthesis of these late proteins involves a complicated process that requires production of many different transcripts.

The injected virus RNA is a (+) strand so production of new viral RNA requires two rounds of transcription. First, the RTC binds to the 3′ end of the (+) strand and copies it all the way to the 5′ end, producing a (-) strand. This strand is then copied to produce new (+) strands that can be incorporated into new virus particles. The new (+) strands also act as messenger RNA to produce more 1a and 1ab polyproteins.²
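The two rounds of copying can be sketched in a few lines of Python. This is only an illustration: the function name `copy_strand` and the short sequence are made up, and the sketch ignores everything about how the RTC actually starts and finishes.

```python
# Watson-Crick pairing for RNA: A<->U and C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def copy_strand(template: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The polymerase reads the template from its 3' end, so the copy is the
    complement of the template in reverse order.
    """
    return template.translate(COMPLEMENT)[::-1]


plus_strand = "AUGGCUUUAAACGGG"          # hypothetical fragment of a (+) strand
minus_strand = copy_strand(plus_strand)  # first round: (+) -> (-)
new_plus = copy_strand(minus_strand)     # second round: (-) -> (+)

print(minus_strand)              # CCCGUUUAAAGCCAU
print(new_plus == plus_strand)   # True
```

Copying the copy gives back the original sequence, which is why two rounds are needed to make new (+) strands.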

Transcription from the 3′ end of the (+) strand also produces a group of subgenomic RNAs (sgRNAs). The 3′ end contains a number of transcription-regulating sequences (TRS-B) consisting of a 10 nucleotide AU-rich stretch of RNA. There is another TRS (TRS-L) at the 5′ end next to a leader sequence (L). When the RTC encounters a TRS it will pause and this may cause it to switch and continue transcription at TRS-L. This produces an sgRNA consisting of a stretch from the 3′ end (body) joined to the leader sequence at the 5′ end (leader).

The example below shows template switching between a TRS-B located at the 5′ end of the S gene and TRS-L to produce an S sgRNA. This sgRNA is then transcribed to produce an mRNA that can be translated to produce S protein.

Each of the genes at the 3′ end of the virus genome is associated with a TRS-B sequence so transcription from the 3′ end produces 9 different sgRNAs corresponding to the nine functional genes. (Open reading frame 10 is not a functional gene.) The figure on the right is from Kim et al. (2020).
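The outcome of template switching described above can be mimicked with a toy genome. Everything in this sketch is hypothetical: the leader, the 10-nucleotide AU-rich motif standing in for the TRS, and the tiny "genes" are invented for illustration. For simplicity it builds the final (+) sense sgRNAs directly rather than modelling the (-) strand intermediate.

```python
# Hypothetical toy sequences, not taken from the real genome.
LEADER = "GGACCA"
TRS = "AUUAAUUAAU"           # stand-in for a 10 nt AU-rich TRS
ORF1 = "AUGCCCGGGUGA"        # stand-in for the big 1a/1ab region
GENE_S = "AUGUCUUAA"
GENE_M = "AUGAUGUGA"
TAIL = "AAAA"

# Layout: leader, TRS-L, ORF1, then each late gene preceded by its own TRS-B.
TOY_GENOME = LEADER + TRS + ORF1 + TRS + GENE_S + TRS + GENE_M + TAIL


def subgenomic_rnas(genome: str, trs: str) -> list[str]:
    """Join the 5' leader (through TRS-L) to the body downstream of each TRS-B."""
    sites = [i for i in range(len(genome)) if genome.startswith(trs, i)]
    trs_l, trs_b_sites = sites[0], sites[1:]
    leader_part = genome[:trs_l + len(trs)]
    # Each sgRNA runs from just after its TRS-B all the way to the 3' end.
    return [leader_part + genome[site + len(trs):] for site in trs_b_sites]


sgrnas = subgenomic_rnas(TOY_GENOME, TRS)
for sgrna in sgrnas:
    print(len(sgrna), sgrna)
```

In this toy there are two TRS-B sites and therefore two sgRNAs. Both start with the same leader and end at the same 3′ end; they differ only in where the body begins.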

Some of these "late" genes are required for assembly of new virus particles. S is the gene for the trimeric spike protein that mediates attachment of the virus to the ACE2 receptor on the surface of the host cell. M is a membrane glycoprotein; it is the most abundant structural protein. E is the envelope protein. N is the nucleocapsid protein that binds RNA and helps package it into the virus particle.

Reading frame 3 seems to produce two proteins, 3a and 3b. It's likely that 3a is an ion channel protein on the virus surface. Proteins 7 and 8 are additional viral assembly proteins. I don't know the function of protein 6 and I'm not sure if anyone else knows. Many coronaviruses don't make protein 6.

DISCLAIMER: I am not an expert on coronaviruses. Everything in this post is stuff I have learned in the past few days from reading published papers. Feel free to correct all the mistakes I have made.

1. The behavior of these Chinese scientists doesn't match with the conspiracy theory that China engineered this virus—perhaps they weren't in on the conspiracy? :-)

2. I don't know how the transcription complex manages to copy right to the ends of the viral RNA. It seems to involve some complicated RNA secondary structures but I didn't bother reading the relevant papers.

References and Bibliography

Bar-On, Y.M., Flamholz, A., Phillips, R. and Milo, R. (2020) Science Forum: SARS-CoV-2 (COVID-19) by the numbers. eLife 9: e57309. [doi: 10.7554/eLife.57309]

Kim, D., Lee, J-Y., Yang, J-S., Kim, J.W., Kim, V.N., and Chang, H. (2020) The Architecture of SARS-CoV-2 Transcriptome. Cell 181: 914-921. [doi: 10.1016/j.cell.2020.04.011]

Fung, T.S. and Liu, D.X. (2019) Human coronavirus: host-pathogen interaction. Annual Review of Microbiology 73: 529-557. [doi: 10.1146/annurev-micro-020518-115759]

Sawicki, S.G., Sawicki, D.L. and Siddell, S.G. (2007) A Contemporary View of Coronavirus Transcription. Journal of Virology 81(1): 20-29. [doi: 10.1128/JVI.01358-06]

Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B. and Huang, C.-L. (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579(7798): 270-273. [doi: 10.1038/s41586-020-2012-7]

7 comments :

SPARC said...: Wouldn't it be more appropriate to use nt rather than be?; Sunday, July 12, 2020 6:26:00 AM
SPARC said...: Sorry, the autocorrection turned bp into be.; Sunday, July 12, 2020 6:28:00 AM
Larry Moran said...: Yes, nucleotides is better. I thought I had made all the corrections but I clearly missed a lot. Fixed now.; Sunday, July 12, 2020 10:42:00 AM
Mikkel Rumraket Rasmussen said...: Though you state that the Coronavirus currently responsible for this pandemic seems to have a low mutation rate, I still wonder how different have the most divergent strains of the current Coronavirus pandemic become so far?
In absolute numbers, how many mutations separate them from their common ancestor as it was back in December-January, at this point?; Sunday, July 12, 2020 12:05:00 PM
Larry Moran said...: From my Facebook post ...

"SARS-CoV-2 is evolving at a normal rate from the initial isolate first identified and sequenced in China. Tung Phan at the University of Pittsburgh (USA) has compared the sequences of 86 strains from all over the world and he has identified a total of 93 mutations including three deletions (see figure below). Forty-two of these mutations are missense mutations located in the ORF1ab polyprotein gene, the spike surface glycoprotein gene, the matrix protein gene, and the nucleocapsid protein gene. No mutations in the envelope protein genes were found.

Phan, T. (2020). "Genetic diversity and evolution of SARS-CoV-2." Infection, Genetics and Evolution 81: 104260. [doi: 10.1016/j.meegid.2020.104260]"; Monday, July 13, 2020 5:22:00 PM
Roberto Munguia said...: There is an interesting post about the sequence and structure of Covid19 written by Carl Zimmer in the New York Times.

https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html; Monday, July 13, 2020 9:03:00 PM
Jouni Väliaho said...: According to nextstrain.org the most divergent SARS-CoV-2 strain found from Senegal contains 24 mutations https://nextstrain.org/ncov/global?m=div.; Wednesday, August 05, 2020 5:45:00 AM

