Sunday, May 30, 2021

Telomere-to-telomere sequencing of a complete human genome

Here's a paper that has recently been posted on the preprint server bioRxiv.

Nurk et al. (2021) The complete sequence of a human genome. [doi: 10.1101/2021.05.26.445798]

I usually don't like to comment on preprints but this one is surely going to be published somewhere and it's important.

The authors have sequenced the entire chromosomes (telomere-to-telomere) of the 22 autosomes and the X chromosome of the cell line CHM13. The cell line is a complete hydatidiform mole, which means it is derived from a molar pregnancy where a sperm combines with an egg cell that has lost its nucleus. The sperm DNA duplicates giving rise to cells that have two identical copies of each chromosome. The karyotype of the CHM13 cell line is 46,XX. The advantage of sequencing the DNA from such cell lines is that the interpretation of the sequencing results is not complicated by the heterogeneity of normal diploid cell lines. This was important because the focus of this study was on sequencing repetitive regions of the chromosomes and most chromosome pairs have different numbers of repeats.

Wednesday, May 26, 2021

The SARS-CoV-2 reference genome

Chinese scientists isolated virus particles from a patient admitted to hospital on December 26, 2019 in Wuhan, China. The RNA genome was sequenced and the sequence was immediately distributed to interested scientists around the world. It was submitted to GenBank on January 5, 2020 and appeared as entry NC_045512 on January 13, 2020 [Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome].

The original GenBank record was annotated and updated by NIH staff on January 17, 2020. It now appears as updated locus NC_045512, last modified on July 18, 2020, under the name SARS-CoV-2 [Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome].

The sequence was extensively mapped and analyzed by Chinese scientists in Shanghai, Wuhan, and Beijing, and Ed Holmes in Sydney, Australia and the results were submitted to Nature on January 7, 2020 and published on February 3, 2020.

Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-W., Tian, J.-H., Pei, Y.-Y., Yuan, M.-L., Zhang, Y.-L., Dai, F.-H., Liu, Y., Wang, Q.-M., Zheng, J.-J., Xu, L., Holmes, E.C. and Zhang, Y.-Z. (2020) A new coronavirus associated with human respiratory disease in China. Nature 579:265-269. [doi: 10.1038/s41586-020-2008-3]

Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health1–3. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans.

The Nature paper notes that this is a novel coronavirus related to known bat coronaviruses but its exact origin remains unclear. The authors also mention that the origin of other disease-causing coronavirus-like viruses is unknown.

Coronaviruses are associated with a number of infectious disease outbreaks in humans, including SARS in 2002–2003 and Middle East respiratory syndrome (MERS) in 2012. Four other coronaviruses—human coronaviruses HKU1, OC43, NL63 and 229E—are also associated with respiratory disease. Although SARS-like coronaviruses have been widely identified in mammals including bats since 2005 in China, the exact origin of human-infected coronaviruses remains unclear. Here we describe a new coronavirus—WHCV—in the BALF from a patient who experienced severe respiratory disease in Wuhan, China. Phylogenetic analysis suggests that WHCV is a member of the genus Betacoronavirus (subgenus Sarbecovirus) that has some genomic and phylogenetic similarities to SARS-CoV1, particularly in the RBD of the spike protein. These genomic and clinical similarities to SARS, as well as its high abundance in clinical samples, provides evidence for an association between WHCV and the ongoing outbreak of respiratory disease in Wuhan and across the world. Although the isolation of the virus from only a single patient is not sufficient to conclude that it caused these respiratory symptoms, our findings have been independently corroborated in further patients in a separate study.

The identification of multiple SARS-like CoVs in bats have led to the idea that these animals act as hosts of a natural reservoir of these viruses. Although SARS-like viruses have been identified widely in bats in China, viruses identical to SARS-CoV have not yet been documented. Notably, WHCV is most closely related to bat coronaviruses, and shows 100% amino acid similarity to bat SL-CoVZC45 in the nsp7 and E proteins (Supplementary Table 3). Thus, these data suggest that bats are a possible host for the viral reservoir of WHCV. However, as a variety of animal species were for sale in the market when the disease was first reported, further studies are needed to determine the natural reservoir and any intermediate hosts of WHCV.
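The similarity figures quoted above (89.1% nucleotide similarity, 100% amino acid similarity in some proteins) are, at their simplest, the share of aligned positions that match. Here is a minimal sketch of that calculation. It assumes the two sequences have already been aligned, and the toy fragments are made up for illustration; it is not the method used in the Nature paper, which is not described here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same residue.

    Assumes the sequences are already aligned (equal length, '-' marks a gap).
    Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequence).
toy_a = "ATGGCGTACGTTAG-CTA"
toy_b = "ATGGCATACGTTAGGCTA"
print(round(percent_identity(toy_a, toy_b), 1))
```

Real comparisons depend heavily on how the alignment is built and how gaps are treated, so a single percentage like this is only a rough summary.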

Subsequent work suggests that the virus did not originate in the Wuhan market but was circulating in Wuhan in November 2019 among a small number of people who were not associated with the market. It looks like market workers were the source of a superspreader event.

It's important to keep in mind that the exact origin of several other viral diseases has never been determined. This is quite normal so don't be fooled by people who think that the mysterious origin of SARS-CoV-2 demands an immediate explanation. That's likely not going to happen no matter how many outside investigators go snooping around Wuhan looking for clues to support their favorite conspiracy theory.


Monday, May 10, 2021

MIT Professor Rick Young doesn't understand junk DNA

Richard ("Rick") Young is a Professor of Biology at the Massachusetts Institute of Technology and a member of the Whitehead Institute. His area of expertise is the regulation of gene expression in eukaryotes.

He was interviewed by Jorge Conde and Hanne Winarsky on a recent podcast (Feb. 1, 2021) where the main topic was "From Junk DNA to an RNA Revolution." They get just about everything wrong when they talk about junk DNA including the Central Dogma, historical estimates of the number of genes, confusing noncoding DNA with junk, alternative splicing, the number of functional RNAs, the amount of regulatory DNA, and assuming that scientists in the 1970s were idiots.

In this episode, a16z General Partner Jorge Conde and Bio Eats World host Hanne Winarsky talk to Professor Rick Young, Professor of Biology and head of the Young Lab at MIT—all about “junk” DNA, or non-coding DNA.

Which, it turns out—spoiler alert—isn’t junk at all. Much of this so-called junk DNA actually encodes RNA—which we now know has all sorts of incredibly important roles in the cell, many of which were previously thought of as only the domain of proteins. This conversation is all about what we know about what that non-coding genome actually does: how RNA works to regulate all kinds of different gene expression, cell types, and functions; how this has dramatically changed our understanding of how disease arises; and most importantly, what this means we can now do—programming cells, tuning functions up or down, or on or off. What we once thought of as “junk” is now giving us a powerful new tool in intervening in and treating disease—bringing in a whole new category of therapies.

Here's what I don't understand. How could a prominent scientist at one of the best universities in the world be so ignorant of a topic he chooses to discuss on a podcast? Perhaps you could excuse a busy scientist who doesn't have the time to research the topic but what excuse can you offer to explain why the entire culture at MIT and the Whitehead must also be ignorant? Does nobody there ever question their own ideas? Do they only read the papers that support their views and ignore all those that challenge those views?

This is a very serious question. It's the most difficult question I discuss in my book. Why has the false narrative about junk DNA, and many other things, dominated the scientific literature and become accepted dogma among leading scientists? Something is seriously wrong with science.


Saturday, May 08, 2021

World Health Organization (WHO) report on the natural origin theory of SARS-CoV-2

The origin of SARS-CoV-2 is a hot topic these days. As far as I can tell, the consensus view among the experts is that the ancestor is from bats but it evolved in an intermediate host before jumping to humans. However, there's a vocal group who claim that the virus was engineered in a lab in the Wuhan Institute of Virology and accidentally escaped causing a pandemic. A group of scientists from WHO investigated this speculation and decided that it was "extremely unlikely." I posted a summary of their analysis a few days ago [World Health Organization (WHO) report on the lab leak conspiracy theory].

That's not going to put an end to the speculation since proponents of the lab leak hypothesis are now saying that the WHO report, and the opinion of other experts, can't be trusted. They claim that there's a widespread conspiracy to lie and cover up the fact that SARS-CoV-2 was created in a lab and leaked to the Wuhan population.

There's not much point in arguing with people once they go down the conspiracy theory path since they will refute every argument by claiming that it's part of the conspiracy. However, it's worth pointing out that there's a perfectly valid alternative explanation; namely, natural origin. For those who still have an open mind I'm posting the explanation of the WHO scientific team who conclude that this is the most likely explanation [WHO-convened global study of origins of SARS-CoV-2: China Part].

Introduction through intermediate host followed by zoonotic transmission

Explanation of hypothesis

SARS-CoV-2 is transmitted from an animal reservoir to an animal host, followed by subsequent spread within that intermediate host (spillover host), and then transmission to humans. The passage through an intermediate host can be without or with virus adaptation.

Arguments in favour

Although the closest related viruses have been found in bats, the evolutionary distance between these bat viruses and SARS-CoV-2 is estimated to be several decades, suggesting a missing link (either a missing progenitor virus, or evolution of a progenitor virus in an intermediate host). Highly similar viruses have also been found in pangolins, suggesting cross-species transmission from bats at least once, but again with considerable genetic distance. Both these putative hosts are infrequently in contact with humans, and an intermediary step involving an amplifying host has been observed for several other emerging viruses (Henipaviruses, influenza viruses, SARS-CoV and MERS-CoV). SARS-CoV-2 infection and intraspecies spread (including further transmission to humans) has been documented in an increasing number of animal species, particularly mustelids and felids. SARS-CoV-2 adapts relatively rapidly in susceptible animals (such as mink). The increasing number of animals shown to be susceptible to SARS-CoV-2 includes animals that are farmed in sufficient densities to allow potential for enzootic circulation. High-density farming is common in many places across the world and includes many livestock species as well as farmed wildlife. There was a large network of domesticated wild animal farms, supplying farmed wildlife. In high-density farms, there often are connections between farms (for instance, through the workforce and food supply), leading to complex transmission pathways that may be difficult to unravel, as was observed in other zoonotic outbreaks involving farmed animals. Optimized conditions for sustained virus transmission chains in large-scale animal farms may also impact on virus seasonality in favour of a year-round endemic transmission pattern, and thereby increasing the zoonotic risk in winter months.

Arguments against

SARS-CoV-2 has been identified in an increasing number of animal species, but genetic and epidemiological studies have suggested that these were infections introduced from humans, rather than enzootic virus circulation. In addition, since the containment of SARS-CoV-2 in China, new outbreaks have occurred for which genomic sequence data was generated. Based on epidemiological analysis and genetic sequencing of viruses from new cases throughout 2020, there is no evidence of repeated introduction of early SARS-CoV-2 strains of potential animal origins into humans in China. There was no genetic or serological evidence for SARS-CoV-2 in a wide range of domestic and wild animals tested to date. The screening of the major livestock species was done across the country and provided no evidence for circulation of a related virus. The scale of testing in these species was such that widespread circulation is extremely unlikely. Screening of farmed wildlife was limited but did not provide conclusive evidence for the existence of circulation.

Assessment of likelihood

Based on the above arguments, the scenario including introduction through an intermediary host was considered to be likely to very likely.

I should note that it's often very difficult to figure out who's right and who's wrong in a scientific controversy but in general there's one group that appears to be thinking critically and one that's not. Critical thinking is also hard to recognize but when I was teaching it we emphasized one important clue. Critical thinkers usually present both sides of an argument and discuss not only their own opinions but also the views of the other side. That's one of the things that impress me about the WHO report. It doesn't mean that they are necessarily correct but they sure look a lot better than proponents of the lab leak conspiracy theory who seem to dismiss out of hand the possibility of a natural origin.

I'd also like to note that WHO is not a perfect organization. They have made mistakes during this pandemic, as has every single government on the planet (some more than others). I'm not defending everything that WHO has done but I don't see any reason to be overly suspicious of the integrity of the scientists who wrote this report.


Friday, May 07, 2021

More misinformation about junk DNA: this time it's in American Scientist

Emily Mortola and Manyuan Long have just published an article in American Scientist about Turning Junk into Us: How Genes Are Born. The article contains a lot of misinformation about junk DNA that I'll discuss below.

Emily Mortola is a freelance science writer who worked with Manyuan Long when she was an undergraduate (I think). Manyuan Long is the Edna K. Papazian Distinguished Service Professor of Ecology and Evolution in the Department of Ecology and Evolution at the University of Chicago. His main research interest is the origin of new genes. It's reasonable to suspect that he's an expert on genome structure and evolution.

The article is behind a paywall, so most of you can't see anything more than the opening paragraphs. Let's look at those first. The second sentence is ...

As we discovered in 2003 with the conclusion of the Human Genome Project, a monumental 13-year-long research effort to sequence the entire human genome, approximately 98.8 percent of our DNA was categorized as junk.

This is not correct. The paper on the finished version of the human genome sequence was published in October 2004 (Finishing the euchromatic sequence of the human genome) and the authors reported that the coding exons of protein-coding genes covered about 1.2% of the genome. However, the authors also noted that there are many genes for tRNAs, ribosomal RNAs, snoRNAs, microRNAs, and probably other functional RNAs. Although they don't mention it, the authors must also have been aware of regulatory sequences, centromeres, telomeres, origins of replication and possibly other functional elements. They never said that all noncoding DNA (98.8%) was junk because that would be ridiculous. It's even more ridiculous to say it in 2021 [Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means].
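The arithmetic behind this point is simple enough to sketch. The only number taken from the text is the 1.2% figure for coding exons; the `functional_noncoding` argument is a hypothetical placeholder for everything else that has a function (RNA genes, regulatory sequences, centromeres, telomeres, origins of replication), and the value passed to it below is invented purely for illustration.

```python
CODING_FRACTION = 0.012  # coding exons, as reported in the 2004 finishing paper


def noncoding_fraction(coding: float = CODING_FRACTION) -> float:
    """Fraction of the genome that does not code for protein."""
    return 1.0 - coding


def junk_upper_bound(coding: float, functional_noncoding: float) -> float:
    """Largest possible junk fraction once known functional DNA is subtracted.

    functional_noncoding is a hypothetical input: the fraction of the genome
    that is noncoding but functional. It is not a measured value here.
    """
    if coding < 0 or functional_noncoding < 0:
        raise ValueError("fractions must be non-negative")
    if coding + functional_noncoding > 1:
        raise ValueError("fractions cannot sum to more than 1")
    return 1.0 - coding - functional_noncoding


print(f"noncoding: {noncoding_fraction():.1%}")
# 0.05 is an illustrative placeholder, not an estimate from any paper.
print(f"junk at most: {junk_upper_bound(CODING_FRACTION, 0.05):.1%}")
```

"Noncoding" and "junk" only give the same number if `functional_noncoding` is zero, which nobody has ever claimed.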

The part of the article that you can see also lists a few "Quick Takes" and one of them is ...

Close to 99 percent of our genome has been historically classified as noncoding, useless “junk” DNA. Consequently, these sequences were rarely studied.

This is also incorrect as many scientists have pointed out repeatedly over the past fifty years or so. At no time in the past 50 years has any knowledgeable scientist ever claimed that all noncoding DNA is junk. I'm sorely tempted to accuse the authors of this article of lying because they really should know better, especially if they're writing an article about junk DNA in 2021. However, I reluctantly defer to Hanlon's razor.

Mortola and Long claim that mammalian genomes have between 85% and 99% junk DNA and wonder if it could have a function.

To most geneticists, the answer was that it has no function at all. The flow of genetic information—the central dogma of molecular biology—seems to leave no role for all of our intergenic sequences. In the classical view, a gene consists of a sequence of nucleotides of four possible types--adenine, cytosine, guanine, and thymine--represented by the letters A, C, G, and T. Three nucleotides in a row make up a codon, with each codon corresponding to a specific amino acid, or protein subunit, in the final protein product. In active genes, harmful mutations are weeded out by selection and beneficial ones are allowed to persist. But noncoding regions are not expressed in the form of a protein, so mutations in noncoding regions can be neither harmful nor beneficial. In other words, "junk" mutations cannot be steered by natural selection.

Those of you who have read this far will cringe when reading that. There are so many obvious errors in that paragraph that applying Hanlon's razor seems very complimentary. Imagine saying in the 21st century that the Central Dogma leaves no role at all for regulatory sequences or ribosomal RNA genes! But there's more; the authors double down on their incorrect understanding of "gene" in order to fit their misunderstanding of the Central Dogma.

What Is a Gene, Really?

In our de novo gene studies in rice, to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

Five Things You Should Know if You Want to Participate in the Junk DNA Debate

The authors admit in the next paragraph that some pseudogenes may produce functional RNAs that are never translated into proteins but they don't mention any other types of gene. I can understand why you might concentrate on protein-coding genes if you are studying de novo genes but why not just say that there are two types of genes and either one can arise de novo? But there's another problem with their definition: they left out a key property of a gene. It's not sufficient that a given stretch of DNA is transcribed and the RNA is translated to make a protein: the protein has to have a function before you can say that the stretch of DNA is a gene [What Is a Gene?]. We'll see in a minute why this is important.

The main point of the paper is the birth of de novo genes and the authors discuss their work with the rice genome. They say they've discovered 175 de novo genes but they don't say how many have a real biological function. This is an important problem in this field and it would have been fascinating to see a description of how they go about assigning a function to their, mostly small, peptides [The evolution of de novo genes]. I'm guessing that they just assume a function as soon as they recognize an open reading frame in a transcript.
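To see why recognizing an open reading frame is such a weak criterion, here is a minimal sketch of an ORF scan. It assumes the standard start codon (ATG) and the three standard stop codons, looks only at the forward strand, and uses a made-up toy sequence; it is not the authors' pipeline, which the article doesn't describe.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for each ATG...stop stretch.

    Scans the three forward-strand reading frames. 'end' is exclusive and the
    stop codon is included in the returned sequence and in the codon count.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and keep scanning
    return orfs


# Toy sequence (hypothetical) with one short ORF in the third frame.
print(find_orfs("CCATGGCTTGATAA"))
```

Any sufficiently long stretch of DNA will produce hits from a scan like this by chance. Nothing in the output says whether the stretch is transcribed, translated, or does anything useful for the organism.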

As you can see from the title of the article, the emphasis is on the idea that de novo genes can arise from junk DNA—a concept that's not seriously disputed. The one good thing about the article is that the authors do not directly state that the reason for junk DNA is to give rise to new genes but this caption is troubling.

The Human Genome Project was a 13-year-long research effort aimed at mapping the entire human genetic sequence. One of its most intriguing findings was the observation that the number of protein-coding genes estimated to exist in humans--approximately 22,300--represents a mere 1.2 percent of our whole genome, with the other 98.8 percent being categorized as noncoding, useless junk. Analyses of this presumed junk DNA in diverse species are now revealing its role in the creation of genes.

Why do science writers continue to spread misinformation about junk DNA when there's so much correct information out there? All you have to do is look [More misconceptions about junk DNA - what are we doing wrong?].


World Health Organization (WHO) report on the lab leak conspiracy theory

There's been a lot of talk about the possibility that SARS-CoV-2 originated in the Wuhan Institute of Virology and accidentally escaped, causing the COVID-19 pandemic. There's no evidence that directly supports this possibility and plenty of evidence that casts serious doubt on the lab leak hypothesis. In order to discount the evidence against the hypothesis its supporters claim that scientists are lying and covering up the accidental release with the active cooperation of the Chinese government. Thus, an original scientific hypothesis has morphed into a full-blown conspiracy theory.

As with any conspiracy theory, there are all kinds of "facts" that have only been uncovered on Twitter or Reddit but there are also speculations published by the Trump administration. It's very difficult to verify or refute many of these "facts."

However, there's one fact that is widely misinterpreted and that's the report of the WHO scientists who visited the Wuhan Institute of Virology in order to investigate the lab leak hypothesis. They concluded that it was "extremely unlikely" so, as you might expect, the WHO scientists are now part of the conspiracy. Here's a copy of the section on the lab leak hypothesis from the WHO full report issued on March 30, 2021 [WHO-convened global study of origins of SARS-CoV-2: China Part].

Introduction through a laboratory incident

Explanation of hypothesis

SARS-CoV-2 is introduced through a laboratory incident, reflecting an accidental infection of staff from laboratory activities involving the relevant viruses. We did not consider the hypothesis of deliberate release or deliberate bioengineering of SARS-CoV-2 for release, the latter has been ruled out by other scientists following analyses of the genome (3).

Arguments in favour

Although rare, laboratory accidents do happen, and different laboratories around the world are working with bat CoVs. When working in particular with virus cultures, but also with animal inoculations or clinical samples, humans could become infected in laboratories with limited biosafety, poor laboratory management practice, or following negligence. The closest known CoV RaTG13 strain (96.2%) to SARS-CoV-2 detected in bat anal swabs have been sequenced at the Wuhan Institute of Virology. The Wuhan CDC laboratory moved on 2nd December 2019 to a new location near the Huanan market. Such moves can be disruptive for the operations of any laboratory.

Arguments against

The closest relatives of SARS-CoV-2 from bats and pangolin are evolutionarily distant from SARS-CoV-2. There has been speculation regarding the presence of human ACE2 receptor binding and a furin-cleavage site in SARS-CoV-2, but both have been found in animal viruses as well, and elements of the furin-cleavage site are present in RmYN02 and the new Thailand bat SARSr-CoV. There is no record of viruses closely related to SARS-CoV-2 in any laboratory before December 2019, or genomes that in combination could provide a SARS-CoV-2 genome. Regarding accidental culture, prior to December 2019, there is no evidence of circulation of SARS-CoV-2 among people globally and the surveillance programme in place was limited regarding the number of samples processed and therefore the risk of accidental culturing SARS-CoV-2 in the laboratory is extremely low. The three laboratories in Wuhan working with either CoVs diagnostics and/or CoVs isolation and vaccine development all had high quality biosafety level (BSL3 or 4) facilities that were well-managed, with a staff health monitoring programme with no reporting of COVID-19 compatible respiratory illness during the weeks/months prior to December 2019, and no serological evidence of infection in workers through SARS-CoV-2-specific serology-screening. The Wuhan CDC lab which moved on 2nd December 2019 reported no disruptions or incidents caused by the move. They also reported no storage nor laboratory activities on CoVs or other bat viruses preceding the outbreak.

Assessment of likelihood

In view of the above, a laboratory origin of the pandemic was considered to be extremely unlikely.

Please refer to the original report whenever you see the conspiracy theorists making claims about what WHO did or did not report. Those claims are not always accurate; for example, it is widely reported that WHO confirmed that there were COVID-19 cases among lab workers in the autumn of 2019 but, as you can see, WHO refuted that part of the conspiracy theory.


Wednesday, May 05, 2021

Lab leak conspiracy theory rears its ugly head again: this time it's Nicholas Wade of the New York Times

Nicholas Wade used to be a serious science writer but he lost that title many years ago when he proved that he was incapable of distinguishing fact from wishful thinking [Nicholas Wade on the Origin of Life]. Now he's gone completely bonkers by promoting the ridiculous conspiracy theory that the COVID-19 pandemic was started when the SARS-CoV-2 virus leaked from a lab at the Wuhan Institute of Virology (WIV) [Origin of Covid — Following the Clues].

Nicholas Wade claims that the virologists at the WIV, led by Dr. Shi, created the SARS-CoV-2 virus by genetic engineering. Their goal, according to Wade, was to make a virus that was as deadly to humans as possible in order to study its effects in the lab. Unfortunately, the virus escaped from the lab, according to Wade, and started the pandemic.

Shi Zhengli responded to those silly accusations in July 2020 [Wuhan coronavirus hunter Shi Zhengli speaks out].

On 15 July, Shi emailed Science answers to a series of questions about the virus' origin and her research. In them, she hit back at speculation that the virus leaked from WIV. She and her colleagues discovered the virus in late 2019, she says, in samples from patients who had a pneumonia of unknown origin. “Before that, we had never been in contact with or studied this virus, nor did we know of its existence,” Shi wrote.

“U.S. President Trump's claim that SARS-CoV-2 was leaked from our institute totally contradicts the facts,” she added. “It jeopardizes and affects our academic work and personal life. He owes us an apology.”

Why is this a conspiracy theory? Because the speculation has been investigated by WHO scientists who found no evidence to support it. They saw that the lab protocols at the Institute were very good, as you would expect for a world class lab that was studying dangerous viruses that were known to cause pandemics. Furthermore, none of the workers at the lab tested positive for COVID-19 and none of them were studying any virus that resembled SARS-CoV-2. So, in order for the lab leak hypothesis to be true there has to have been a massive coverup by a very large number of people. That's what makes it a conspiracy theory.

Nicholas Wade gets a lot of his information from Richard Ebright who has been promoting the lab leak conspiracy theory for the past year. Ebright thinks the WHO investigators "... were willing—and in at least one case, enthusiastic—participants in disinformation" [An Interview with Richard Ebright: The WHO Investigation Members Were “participants in disinformation”]. This is classic conspiracy theory stuff: everyone who disagrees with you is part of the conspiracy.

If you still think the lab leak conspiracy theory is true then I urge you to watch this video of a talk by Professor Edward ("Eddy") Holmes, the 2020 New South Wales (Australia) scientist of the year and an expert on human viruses, especially the coronaviruses [The Discovery and Origins of SARS-CoV-2]. He explains why the viruses are likely to originate in bats and explains why this particular virus started off in bats but probably passed through an intermediate host before reaching humans. (His preferred intermediate host is raccoon dogs and he explains why he thinks this is likely.) He explains why the sequence of the virus is entirely consistent with a natural origin. He describes his field work in China and Southeast Asia and his collaborations with the expert scientists in China, including those at the Wuhan Institute of Virology.

Holmes addresses the conspiracy theory at 41:45 minutes into the talk so you can skip right to there if you like, although I don't recommend it because there's lots of useful information in the first 40 minutes. Here's why he rejects that conspiracy theory and why you should too. These are the facts, according to Holmes. I agree with him.

  • There's "no evidence that SARS-CoV-2 is engineered (and no reason to bioengineer a random bat virus)." Holmes calls this idea "absolute nonsense." I'm guessing he won't be a fan of Nicholas Wade's article.
  • "Bat virus RaTG13 is not the direct ancestor of SARS-CoV-2: all the components of the virus exist in nature."
  • "No evidence of a secret SARS-CoV-2-like virus kept at the WIV (and no reason to keep it a secret before the pandemic)." The scientists at WIV say that they were not studying such a virus and Holmes says, "Frankly, I believe them." Nicholas Wade thinks they are lying but offers no proof and no reason to justify the lie.
  • The SARS-CoV-2 virus is probably not directly from bats and WIV was only studying bat viruses. Furthermore, the virus is probably not from Yunnan province, where the Wuhan Institute of Virology collected bat viruses such as RaTG13.
  • "SARS-CoV-2 was not perfectly adapted to humans on first emergence and appears to be a "generalist" virus." Nicholas Wade is wrong about this as well.
  • "Cases near WIV only appeared later in the outbreak." The first cases in Wuhan appear in the market, specifically in the area where live animals are sold. This strongly suggests that the virus came from animals in the market and that it originated in those animals somewhere else. There were cases in December 2019 that were not linked to the market but they were nowhere near the WIV.
  • "No evidence of SARS-CoV-2 infection at WIV: staff were PCR/antibody negative." Holmes says that if this is true then that rules out the lab leak hypothesis automatically. He says that either this is the biggest coverup in history and they're all lying or there's no evidence at all that the virus was ever in the lab. He concludes that the virus did not come from the lab but he's sure that the conspiracy theory is not going to go away anytime soon.

Holmes is right. The conspiracy theory is not going away because its proponents think that all Chinese are evil and can't be trusted. Those conspiracy believers are wrong. Please don't spread this ridiculous idea; it makes you no better than QAnon cultists.

If you're really interested in the facts then there are several articles on the origin of SARS-CoV-2 that you should read before falling for the lab leak conspiracy theory. Here's one.

MacLean, O.A., Lytras, S., Weaver, S., Singer, J.B., Boni, M.F., Lemey, P., Pond, S.L.K. and Robertson, D.L. (2021) Natural selection in the evolution of SARS-CoV-2 in bats created a generalist virus and highly capable human pathogen. PLoS Biology 19:e3001115. [doi: 10.1371/journal.pbio.3001115]

Virus host shifts are generally associated with novel adaptations to exploit the cells of the new host species optimally. Surprisingly, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has apparently required little to no significant adaptation to humans since the start of the Coronavirus Disease 2019 (COVID-19) pandemic and to October 2020. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus the early SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects evidence for significant positive episodic diversifying selection acting at the base of the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in these ancestral bat hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing an ancestor about 1976), is a recombinant with a structure that includes differential CpG content in Spike; clear evidence of coinfection and evolution in bats without involvement of other species. While an undiscovered “facilitating” intermediate species cannot be discounted, collectively, our results support the progenitor of SARS-CoV-2 being capable of efficient human–human transmission as a consequence of its adaptive evolutionary history in bats, not humans, which created a relatively generalist virus.
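
For readers wondering how a "depletion in CpG composition" is measured: a common approach is the observed/expected CpG ratio, which compares the number of CG dinucleotides in a sequence to the number expected from its C and G content alone. Here's a minimal sketch of that calculation. It is not the authors' pipeline (their analysis is far more involved), and the function name and toy sequences are illustrative assumptions.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        # The ratio is undefined without both bases; this sketch reports 0.
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    # Toy sequences, not real viral data.
    print(cpg_obs_exp("CCGG"))  # one CG among two C's and two G's
    print(cpg_obs_exp("GGCC"))  # same base composition, no CG
```

A ratio well below 1 means that CG dinucleotides are rarer than the base composition alone would predict.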


Monday, May 03, 2021

More illusions/delusions of James Shapiro and Denis Noble

It was just a few weeks ago that I discussed short articles by Denis Noble and James Shapiro that were published in the journal Biosemiotics [The illusions of Denis Noble] [The illusions of James Shapiro].

Several readers questioned whether Biosemiotics is a real science journal and they were right: it's a kooky journal and that's why it publishes papers by kooks. However, we now have a new paper by Shapiro and Noble that's about to appear in a legitimate scientific journal; albeit, one that has seen better days. This would normally raise red flags concerning peer review but we're long past the time when we can count on peer review to weed out the kooks.

Here's the paper. I'm not going to discuss all the main points because they were covered in my previous posts. I'll just concentrate on the most ridiculous part in order to illustrate the (lack of) quality of this paper.1

Shapiro, J. and Noble, D. (2021) What prevents mainstream evolutionists teaching the whole truth about how genomes evolve? Progress in Biophysics and Molecular Biology. [doi: 10.1016/j.pbiomolbio.2021.04.004]

The common belief that the neo-Darwinian Modern Synthesis (MS) was buttressed by the discoveries of molecular biology is incorrect. On the contrary those discoveries have undermined the MS. This article discusses the many processes revealed by molecular studies and genome sequencing that contribute to evolution but nonetheless lie beyond the strict confines of the MS formulated in the 1940s. The core assumptions of the MS that molecular studies have discredited include the idea that DNA is intrinsically a faithful self-replicator, the one-way transfer of heritable information from nucleic acids to other cell molecules, the myth of “selfish DNA,” and the existence of an impenetrable Weismann Barrier separating somatic and germ line cells. Processes fundamental to modern evolutionary theory include symbiogenesis, biosphere interactions between distant taxa (including viruses), horizontal DNA transfers, natural genetic engineering, organismal stress responses that activate intrinsic genome change operators, and macroevolution by genome restructuring (distinct from the gradual accumulation of local microevolutionary changes in the MS). These 21st Century concepts treat the evolving genome as a highly formatted and integrated Read-Write (RW) database rather than a Read-Only Memory (ROM) collection of independent gene units that change by random copying errors. Most of the discoverers of these macroevolutionary processes have been ignored in mainstream textbooks and popularizations of evolutionary biology, as we document in some detail. Ironically, we show that the active view of evolution that emerges from genomics and molecular biology is much closer to the 19th century ideas of both Darwin and Lamarck. 
The capacity of cells to activate evolutionary genome change under stress can account for some of the most negative clinical results in oncology, especially the sudden appearance of treatment-resistant and more aggressive tumors following therapies intended to eradicate all cancer cells. Knowing that extreme stress can be a trigger for punctuated macroevolutionary change suggests that less lethal therapies may result in longer survival times.

The section on "selfish DNA" is the one that seems to have the highest number of misleading and false statements per paragraph.

1.4. The end of “selfish” or “junk” DNA

A major shortcoming of the MS is that it was based on a “gene-centric” view, which assumed that the genome is basically a collection of “genes” that are the protein-coding units of heredity and heritable variation. As we saw in the quotation from Goldschmidt's 1940 book, this view failed to take the evolutionary importance of chromosome structure into account (Goldschmidt, 1940). It also blinded evolutionary biologists to the importance of McClintock's mid-20th Century discovery of mobile “controlling elements” (McClintock, 1987). Both the ideas of genetic transposition and control of gene expression by these non-coding mobile elements did not fit within the narrow confines of the MS concepts of genome function and variation. A further empirical assault on the limited MS conceptual framework came in the late 1960s when Britten and Kohne discovered that a significant fraction of genomic DNA from complex eukaryotes consists of highly repetitive sequences rather than the unique coding sequences expected to make up the hereditary material (Britten and Kohne, 1968).

  • The title is ridiculous since no respectable scientist ever equated selfish DNA with junk DNA [Selfish genes and transposons].

  • The Modern Synthesis (MS) was not based on a "gene-centric" view.
  • For the past 50 years, no respectable scientist, and no knowledgeable expert in molecular evolution, has restricted the definition of "gene" to just protein-coding genes.
  • For the past 50 years, no expert in molecular evolution has ever thought that the genome is just a collection of protein-coding genes.
  • For the past 50 years, experts in molecular biology have known about transposons and have considered the view that some of them might be "controlling elements." They have concluded that most transposon-related sequences are just fragments of defective transposons with no biological function.
  • Nobody cares whether mobile genetic elements fit within the narrow confines of the Modern Synthesis as described by Huxley and others in the 1940s because no expert in molecular evolution has believed in that view of evolution since the late 1960s.
  • The Britten and Kohne paper established that the genomes of most multicellular eukaryotes contain large amounts of repetitive DNA. This was an attempt to resolve the C-value paradox. Britten and Kohne didn't like the idea that this could be junk DNA so they offered some speculation about function. However, further data established that most of this repetitive DNA is, indeed, junk and Britten and Kohne's speculations have been discredited. Britten and Kohne were attempting to interpret their result within the context of the adaptationist views that characterized the Modern Synthesis back then. The correct interpretation of their results came with the overthrow of the Modern Synthesis and the adoption of a new view of evolutionary theory that focused on Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. Shapiro and Noble missed that revolution so they continue to attack an old-fashioned strawman version of evolutionary theory.

Before continuing, it's important to realize that by the early 1970s selectionist thinking had been abandoned by the experts in genome evolution. By 1978 Gould and Lewontin tried, unsuccessfully, to convince all other biologists to abandon the old selectionist way of thinking [The Spandrels of San Marco and the Panglossian Paradigm]. James Shapiro and Denis Noble are among those other biologists who didn't get the message.

In order to apply selectionist thinking to explain the presence of so much non-coding DNA, evolutionary biologists called this unexpected portion of the genome “junk DNA” (Ohno, 1972) or “selfish DNA” (Orgel and Crick, 1980). Richard Dawkins used an extreme view of these “selfish genes” to erect a whole philosophy of strictly passive evolutionary gradualism (Dawkins, 1976). Today we know that the human genome contains at least 30X as much repetitive non-coding DNA as protein-coding sequences (Lander et al., 2001). Repetitive DNA provides formatting signals for transcription, epigenetic modification and chromosome mechanics and also is the most variable component in the evolutionary diversification of complex genomes (Symonová and Howell, 2018; Subirana et al., 2015; Matsubara et al., 2016; CioffiMde et al., 2015; Chalopin et al., 2015; Shao et al., 2019; Böhne et al., 2008; Li et al., 2016; Oliver et al., 2013). A 2013 plot of organismal complexity against protein-coding and non-coding DNA showed that coding DNA peaked at approximately ∼3 × 10^7 bp, while the non-coding DNA increased linearly with growing complexity up to ∼2–3 × 10^10 bp (Liu et al., 2013). In other words, non-coding DNA tracked organismal complexity better than the protein-coding genes. The “encyclopedia of DNA elements” (ENCODE) project, which largely abandoned the term “gene,” revealed that the large majority of the so-called junk DNA is actively transcribed in a regulated manner, indicating that it is functional (Consortium, 2012; Pennisi, 2012).

  • It is completely, totally, ridiculous to say that the idea of junk DNA was due to selectionist thinking. The first statement in this paragraph is powerful evidence that Shapiro and Noble don't know what they are talking about. The concept of junk DNA is a rejection of selectionist thinking.
  • The use of "noncoding DNA" is what's called a "tell."
  • Again, equating junk DNA with selfish DNA is stupid. If all the excess DNA were selfish then it isn't junk because it has a function.
  • Richard Dawkins' view on evolution is closer to the old-fashioned adaptationist view that was abandoned by the experts by the time he wrote The Selfish Gene. Dawkins' book is not really about "genes," however, as is clear to anyone who has read it. He's talking about any piece of DNA that confers a fitness advantage. The Dawkins strawman is a favorite target of the Third Way types but it's just a strawman.
  • No significant proportion of repetitive DNA has a function in spite of the references quoted above.
  • There is no significant correlation between organismal complexity and noncoding DNA. Lots of very similar species, such as onions, have very different genome sizes.
  • No knowledgeable scientist since the 1980s thinks there should be a significant correlation between the number of genes and organismal complexity. We know that most of the phenotypic differences between multicellular species are due to changes in the timing and amount of expression of a standard set of genes. This is the main discovery of evolutionary-developmental biology (evo-devo), another revolution that Shapiro and Noble missed. They should educate themselves by reading Sean B. Carroll's books.
  • The ENCODE researchers did lots of silly things but they did NOT abandon the term "gene."
  • The idea that most of our genome is functional because of ENCODE is laughable in 2021. The fact that Shapiro and Noble would bring this up is another "tell" and the fact that they would reference Elizabeth Pennisi is even more revealing. These guys are incapable of thinking critically.

Shapiro and Noble then describe a few examples of repetitive DNA sequences that have a known function and they point out that a number of noncoding genes have been identified. They imply that these functional sequences make up a significant fraction of the genome thus calling the concept of junk DNA into question. They close the section with,

Clearly, none of the eminent scientists who wrote about junk or selfish DNA could possibly have imagined the wide range of cellular functionalities that we know today are executed by ncRNA molecules. The idea that a genome was just a collection of protein coding sequences has proved completely inadequate.

  • I don't know about you, dear reader, but I'll match those "eminent scientists" against Shapiro and Noble any day. I'd love to see them try to defend their views in a public debate against some of the leading proponents of junk DNA. I know where my money would be.

Let me close by quoting the last chapter of this paper. I don't intend to comment on it except to say that it gives new meaning to the word "irony."

The campaign to sustain the Modern Synthesis causes real harm in a number of different ways. Among doctors treating bacterial infections, ignorance of real-world evolutionary processes has led to a situation in which the available antibiotics have lost their effectiveness against many life-threatening conditions (CDC et al., 2019). Among the general public, the inability to comprehend the potential all living organisms possess for transferring and reorganizing genomic configurations makes them unprepared to form sound judgements about how society should utilize its growing arsenal of biotechnology tools acquired from our microbial neighbors, like CRISPR (Doudna, 2020). Among oncologists, MS thinking prevents the practitioners treating cancer patients from recognizing the dangers of overtreating tolerable tumors in ways that may provoke a macroevolutionary transition to a far more lethal and untreatable disease (Heng, 2019). Finally, in the battle against obscurantism and anti-evolution prejudice, insistence on an outdated set of assertions about how life can change itself leaves the defenders of rigorous scientific inquiry without satisfactory responses to critics. Clearly, the time has come for the mainstream evolution community to recognize and join the scientific reality of the 21st Century.

Finally, one of the most important properties of kooks is that they find each other and they tend to hang out together, either physically or virtually. I'm not sure why this happens since they often espouse mutually exclusive views. I'm guessing that we can explain it in two different ways: (1) they are all outsiders fighting against a common enemy; namely, real science, and (2) they lack critical thinking skills so they don't see the flaws in each other's arguments.


1. In case you didn't recognize the quality from the title.

Thursday, April 29, 2021

Chromatin organization at promoters in yeast cells

Our genome is very large and very complicated because it is full of junk DNA. It contains thousands of sites where DNA binding proteins can bind just by chance. This leads to the reorganization of nucleosomes in a way that mimics functional sites. It's difficult to distinguish these spurious sites from real functional sites and that has led to much confusion in the scientific literature.1

The yeast genome is much simpler and it's safe to assume that almost all of the sites detected by the standard chromatin assays are genuine, biologically relevant, sites. In that sense, it serves as a model for what functional sites look like. A recent paper in Nature (April 8, 2021) reports on the mapping of most of the sites in the yeast genome where DNA binding proteins are found.

Rossi, M.J., Kuntala, P.K., Lai, W.K., Yamada, N., Badjatia, N., Mittal, C., Kuzu, G., Bocklund, K., Farrell, N.P., Blanda, T.R.M., Joshua D, V, B.A., Mistretta, K.S., Rocco, D.J., Perkinson, E.S., Kellogg, G.D., Mahony, S. and Pugh, B.F. (2021) A high-resolution protein architecture of the budding yeast genome. Nature 592:309-314. [doi: 10.1038/s41586-021-03314-8]

Origins of replication

Origins of replication are also called autonomously replicating sequence consensus sequences (ACS). There are 253 of them in the yeast genome and they are characterized by a 300 bp nucleosome-free region that's occupied by the origin recognition complex (ORC) and the helicase MCM.

Telomeres

Telomeres are bound by a number of proteins including silent information regulators (SIRs). There's a nucleosome-free region of about 300 bp where these proteins are located.

Centromeres

The nucleosome-free region at centromeres covers only 170 bp where a number of centromere binding proteins are located. The absence of nucleosomes at the centromere is a surprise since it was thought that centromere DNA was bound by modified nucleosomes containing a specific histone variant.

Tuesday, April 27, 2021

Asymptomatic and presymptomatic spread of SARS-CoV-2

It is widely believed that a substantial amount of viral spread is due to individuals who are transmitting the virus but have no symptoms (asymptomatic spread). However, there's so much misinformation about COVID-19 out there that I'm having trouble sorting out real science from fake science, so I've become skeptical of just about everything.

I'm not just talking about the kind of fake science being spread on FOX News; I'm also talking about misinformation spread by ordinary people like me and the typical readers of this blog. We might do it inadvertently but it's still wrong.

What's the real data on asymptomatic spread? I don't know, but here's a summary of the topic in a recent issue of Science. It sounds good to me because the authors take steps to address questions that seem obvious.

Rasmussen, A.L. and Popescu, S. V. (2021) SARS-CoV-2 transmission without symptoms. Science 371: 1204-1207. [doi: 10.1126/science.abf9569]

Sunday, April 25, 2021

Happy DNA day 2021!

It was 68 years ago today that the famous Watson and Crick paper was published in Nature along with papers by Franklin & Gosling and Wilkins, Stokes, & Wilson. There's a great deal of misinformation circulating about this discovery so I wrote up a brief history of the events based largely on Horace Freeland Judson's book The Eighth Day of Creation. Every biochemistry and molecular biology student must read this book or they don't qualify to be an informed scientist. However, if you are not a biochemistry student then you might enjoy my short version.

Some practising scientists might also enjoy refreshing their memories so they have an accurate view of what happened in case their students ask questions.

The Story of DNA (Part 1)

Where Rosalind Franklin teaches Jim and Francis something about basic chemistry.

The Story of DNA (Part 2)

Where Jim and Francis discover the secret of life.

Here are some other posts that might interest you on DNA Day.



Wednesday, April 21, 2021

Douglas Axe pretends to be an expert on intelligent design

This is a really interesting video presentation by Douglas Axe, a leading proponent of Intelligent Design Creationism. He's criticizing the argument from poor design; an argument that attempts to refute intelligent design by pointing out examples of poor design that a creator would never create. Axe uses an example from Neil deGrasse Tyson and if you look at this objectively you would say that Axe does a pretty good job of refuting Tyson's claims.

Tyson is not a biologist and he shouldn't pretend to be one, but that's not the most interesting take-home lesson from this video. The most interesting point concerns the comments Douglas Axe makes at the end of the video beginning at 11:30 minutes. He claims that Neil deGrasse Tyson is not an expert on designing life so it's foolish of him to pretend that he knows anything about the subject. When you hear someone making an imperfect design argument he asks his listeners to challenge them by saying, "What have YOU made that you think qualifies you to critique life."

Yep. He actually said that! Someone who promotes intelligent design without any experience in designing life actually tried to use that argument against opponents of intelligent design.

God has an inordinate fondness for beetles.


J.B.S. Haldane

The burden of proof is on Intelligent Design Creationists to demonstrate how their view is compatible with science and with the history of life. They have to demonstrate why it took 3.5 billion years to get where we are today and why the history of life is so compatible with evolution. They have to demonstrate why millions of species of bacteria and almost as many species of beetles can only be explained by the actions of an intelligent designer. They have to explain why all the data shows that modern humans and chimpanzees have descended by gradual fixation of mutations from a common ancestor that lived only a few million years ago. They have to explain why an intelligent designer would design a genome that's 90% junk.

These creationists haven't made anything that qualifies them to be experts on the design of life1 but I'm willing to listen to any ideas they have. So far, all we've seen is criticisms of evolution, which is also a topic where they lack expertise.


The Haldane quotation is accurate. See "A Special Fondness for Beetles" by Stephen J. Gould in Dinosaur in a Haystack.

1. Unless they have some special insight into the mind of god in which case they should be able to tell us exactly how he did it. Why did he create all those strange animals in the Cambrian only to allow most of them to go extinct? And speaking of extinctions, what did he have against most dinosaurs that he decided to kill them by smashing a meteor into the Earth 66 million years ago? Can you explain that, Dr. Axe?

The illusions of James Shapiro

James A. Shapiro is a professor in the Department of Biochemistry and Molecular Biology at the University of Chicago (Chicago, USA). He made significant contributions to our understanding of the function and structure of transposons but in later years he has become a vocal opponent of evolution culminating in his 2011 book Evolution: A View from the 21st Century. He is one of the founding members of The Third Way of Evolution.

I wrote a critical review of Evolution: A View from the 21st Century for the National Center for Science Education (NCSE) Reports but the issue is no longer visible on the web. Shapiro didn't like my review so NCSE published his rebuttal and that's also unavailable. You can see my response at: James Shapiro Responds to My Review of His Book.

Monday, April 19, 2021

The illusions of Denis Noble

Denis Noble was a Professor of Physiology at Oxford University in the United Kingdom until he retired. He had a distinguished career as a physiologist making significant contributions to our understanding of the heart and its relationship to the whole organism.

In recent years, Noble has dabbled in philosophy and evolution. He has become a vocal opponent of modern evolution (sensu Noble) and the way science is currently conducted. Some of his criticisms have made it into two popular books: The Music of Life and Dance to the Tune of Life. He is one of the leading proponents of the "Extended Evolutionary Synthesis" (EES) and he is one of the founders of The Third Way of Evolution, a wishy-washy and scientifically inaccurate way of attacking a strawman version of evolution and providing a safe haven for religious scientists.

Saturday, April 17, 2021

Philosophers argue that scientific conclusions need not be accurate, justified, or believed by their authors

A remarkable paper has just been posted to a philosophy of science preprint website. (It will be published in Synthese.) Like many papers in this field it's difficult to read and the logic is obtuse but the bottom line is that scientists don't really need to be held to the old standards that we scientists used to think are essential.

Dang, Haixin and Bright, Liam Kofi (2021) Scientific Conclusions Need Not Be Accurate, Justified, or Believed by their Authors. PhilSci Archive [PDF]

We argue that the main results of scientific papers may appropriately be published even if they are false, unjustified, and not believed to be true or justified by their author. To defend this claim we draw upon the literature studying the norms of assertion, and consider how they would apply if one attempted to hold claims made in scientific papers to their strictures, as assertions and discovery claims in scientific papers seem naturally analogous. We first use a case study of William H. Bragg’s early 20th century work in physics to demonstrate that successful science has in fact violated these norms. We then argue that features of the social epistemic arrangement of science which are necessary for its long run success require that we do not hold claims of scientific results to their standards. We end by making a suggestion about the norms that it would be appropriate to hold scientific claims to, along with an explanation of why the social epistemology of science—considered as an instance of collective inquiry—would require such apparently lax norms for claims to be put forward.

Tuesday, April 13, 2021

How do you explain evolution to non-experts?

I spent a lot of time explaining evolution in my book. The goal is to educate readers to the level where they can understand the drift-barrier hypothesis and why slightly deleterious mutations can accumulate in species with small populations. This requires some knowledge of random genetic drift and some knowledge of Neutral Theory and Nearly-Neutral Theory. The emphasis is on population genetics as the most important way of understanding evolution.
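
To make the small-population point concrete, here's a minimal sketch (not taken from the book) that uses Kimura's diffusion approximation for the probability that a single new mutation is eventually fixed in a diploid population. The function names, the selection coefficient, and the population sizes are all illustrative assumptions, and the sketch assumes additive selection and an effective population size equal to the census size.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for one new mutation (initial frequency 1/(2n)).

    s: selection coefficient (negative = deleterious), assumed additive.
    n: diploid population size, assumed equal to the effective size.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: fixation by drift alone
    exponent = -4.0 * n * s
    if exponent > 700:  # expm1 would overflow; the probability is effectively 0
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for numerical accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n


if __name__ == "__main__":
    s = -1e-4  # hypothetical slightly deleterious mutation
    for n in (1_000, 100_000, 1_000_000):
        print(n, relative_to_neutral(s, n))
```

With these made-up numbers, the same slightly deleterious mutation behaves almost like a neutral allele in the small population but is effectively never fixed in the large one. That's the intuition behind the drift-barrier idea.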

You can't understand genomes and junk DNA unless you have a firm understanding of evolution. In fact, you can't make sense of anything about genes and gene expression without such knowledge ... what the heck, nothing in all of biology makes sense if you don't know about evolution.

My approach hasn't been copied by popular websites. They usually misrepresent evolution by presenting it as adaptation; natural selection is the only game in town. I'll put in a link to Francis Collins describing evolution in truly bizarre narration but my question for Sandwalk readers is whether this is useful or not. Is it better to dumb down evolution on the NIH: National Human Genome Website [Evolution] or is this a bad idea?


Friday, April 09, 2021

Should we teach genomics and evolution to medical students?

Rama Singh,1 a biology professor at McMaster University in Hamilton (Ontario, Canada) has just published an interesting article on The Conversation website. It's titled "Medical schools need to prepare doctors for revolutionary advances in genetics." You can read the full article yourself but let me highlight the last few paragraphs to start the discussion.

Future physicians will be part of health networks involving medical lab technicians, data analysts, disease specialists and the patients and their family members. The physician would need to be knowledgeable about the basic principles of genetics, genomics and evolution to be able to take part in the chain of communication, information sharing and decision-making process.

This would require a more in-depth knowledge of genomics than generally provided in basic genetics courses.

Much has changed in genetics since the discovery of DNA, but much less has changed how genetics and evolution are taught in medical schools.

In 2013-14 a survey of course curriculums in American and Canadian medical schools showed that while most medical schools taught genetics, most respondents felt the amount of time spent was insufficient preparation for clinical practice as it did not provide them with sufficient knowledge base. The survey showed that only 15 per cent of schools covered evolutionary genetics in their programs.

A simple viable solution may require that all medical applicants entering medical schools have completed rigorous courses in genetics and genomics.

Here's the problem. I've just finished research on a book about modern evolution and genomics so I think I know a little bit about the subject. I'm also on the editorial board of a journal that publishes research on biochemistry and molecular biology education. I've written a biochemistry textbook and I have far too many years of experience trying to teach this material to graduate students and undergraduates at the University of Toronto. I can safely say that we (university teachers) have done a horrible job of teaching evolution and genomics to our students. We have turned out an entire generation of students who don't understand modern molecular evolution and don't understand what's in your genome.

What this means is that there's an extremely small pool of students who have completed "rigorous courses in genetics and genomics." Nobody will be able to apply to medical school. I doubt that we could teach this material to medical students with or without the appropriate background.

But you don't have to take my word for it. Some people have tried to teach this material to health science workers so we can see how it's working at that level. Take a look at the Genomics Education Programme supported by the NHS in the United Kingdom. They have a series of short videos and longer lessons that are designed to educate health care specialists. Here's the blurb that defines their objective.

Rapid advances in technology and understanding mean that genomics is now more relevant than ever before. As genomics increasingly becomes a part of mainstream NHS care, all healthcare professionals, and not just genomics specialists, need to have a good understanding of its relevance and potential to impact the diagnosis, treatment and management of people in our care.

In 2014, Health Education England (HEE) launched a four-year £20 million Genomics Education Programme (GEP) to ensure that our 1.2 million-strong NHS workforce has the knowledge, skills and experience to keep the UK at the heart of the genomics revolution in healthcare.

Funding for the programme has since been extended to enable us to continue our work in providing co-ordinated national direction of education and training in genomics and developing resources for a wide range of professionals.

They describe genes as 'coding' genes that build proteins. There's no mention of noncoding genes. They define a genome as "both genes (coding) and non-coding DNA." They also say that your genome is all of the DNA in your cells (46 chromosomes, 23 pairs). I don't see anything in their education packages that covers modern molecular evolution. In one of the packages they say,

The term ‘junk DNA’ has been used since the 1970s to describe non-coding regions of the genome, but today it is considered inaccurate and misleading. The term ‘junk’ suggests that 98% of the genome has no use, but in recent years, studies and projects have used advances in technology to shed light on these regions and have come to different conclusions about how much of the genome has a biological function.

Here's a link to a short video called What is a genome?. I recommend that you watch it to see the level that these experts think is suitable for health care professionals in the UK and to see the level of expertise of those who made the video. This is what seven years of work by experts and £20 million will get you.

All of this tells me that teaching genomics and evolution to medical students is going to be a lot more difficult than Rama Singh imagines. Not only would we have to counter several years of misinformation but we would have to rely on teachers who probably don't understand either topic.

Let's start by teaching these things correctly to biology and biochemistry majors. That's going to be hard enough for now.


1. Full disclosure: Rama and I shared an NSERC grant in 1981 on genetic variation in Drosophila.

SARS-CoV-2 mRNA vaccines: RNA + lipid nanoparticles

The new mRNA vaccines are the result of extensive research over the past thirty years or so. They are marvels of technological innovation but probably not just for the reasons you imagine. The basics of therapeutic mRNA synthesis have been around for about ten years but the problem was how to get the RNA into cells. That requires specialized lipid nanoparticles and making those has been the most recent technological advance. A lot of this research was done in Canada. I found a nice paper (Buschmann et al., 2021) that covers this research and I'll summarize the important points for those of you who don't have time to read it.

The mRNA

Normal messenger RNA is susceptible to nucleases and is not readily taken up by human cells. In addition, it elicits an innate immune response that results in suppression of translation through phosphorylation of eIF2α. The immune response can be blocked by incorporating modified nucleotides that are not recognized by the various receptors that stimulate the normal response. This was discovered over ten years ago. These modified nucleotides, such as N1-methylpseudouridine, were used to make the SARS-CoV-2 vaccine.

Thursday, April 08, 2021

On the accuracy of genomics in detecting disease variants

Several diseases, such as cancers, are caused by the presence of deleterious alleles that affect the function of a gene. In the case of cancer, most of the mutations are somatic cell mutations—mutations that have occurred after fertilization. These mutations will not be passed on to future generations. However, there are some variants that are present in the germline and these will be inherited. A small percentage of these variants will cause cancer directly but most will just indicate a predisposition to develop cancer.

There are a host of other diseases that have a genetic component and the responsible alleles can also be present in the germline or arise from somatic cell mutations.

Over the past fifty years or so there has been a lot of hype associated with the latest technological advances and the ability to detect deleterious germline mutations. The general public has been repeatedly told that we will soon be able to identify all disease-causing alleles and this will definitely lead to incredible medical advances in treating these diseases. Just yesterday, for example, I posted an article on predictions made by the National Human Genome Research Institute (USA), which predicts that by 2030,

The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.

Similar predictions, in various forms, were made when the Human Genome Project got under way and at various times afterward. First there was the 1000 Genomes Project, then there was the 100,000 Genomes Project and, of course, ENCODE. The problem is that genomics hasn't lived up to these expectations and there's a very good reason for that: the problem is a lot more difficult than it seems.

One of the Facebook groups that I follow (Modern Genetics & Technology)1 alerted me to a recent paper in JAMA that addressed the problem of genomics accuracy and the prediction of pathogenic variants. I'm posting the complete abstract so you can see the extent of the problem.

AlDubayan, S.H., Conway, J.R., Camp, S.Y., Witkowski, L., Kofman, E., Reardon, B., Han, S., Moore, N., Elmarakeby, H. and Salari, K. (2020) Detection of Pathogenic Variants With Germline Genetic Testing Using Deep Learning vs Standard Methods in Patients With Prostate Cancer and Melanoma. JAMA 324:1957-1969. [doi: 10.1001/jama.2020.20457]

Importance Less than 10% of patients with cancer have detectable pathogenic germline alterations, which may be partially due to incomplete pathogenic variant detection.

Objective To evaluate if deep learning approaches identify more germline pathogenic variants in patients with cancer.

Design, Setting, and Participants A cross-sectional study of a standard germline detection method and a deep learning method in 2 convenience cohorts with prostate cancer and melanoma enrolled in the US and Europe between 2010 and 2017. The final date of clinical data collection was December 2017.

Exposures Germline variant detection using standard or deep learning methods.

Main Outcomes and Measures The primary outcomes included pathogenic variant detection performance in 118 cancer-predisposition genes estimated as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The secondary outcomes were pathogenic variant detection performance in 59 genes deemed actionable by the American College of Medical Genetics and Genomics (ACMG) and 5197 clinically relevant mendelian genes. True sensitivity and true specificity could not be calculated due to lack of a criterion reference standard, but were estimated as the proportion of true-positive variants and true-negative variants, respectively, identified by each method in a reference variant set that consisted of all variants judged to be valid from either approach.

Results The prostate cancer cohort included 1072 men (mean [SD] age at diagnosis, 63.7 [7.9] years; 857 [79.9%] with European ancestry) and the melanoma cohort included 1295 patients (mean [SD] age at diagnosis, 59.8 [15.6] years; 488 [37.7%] women; 1060 [81.9%] with European ancestry). The deep learning method identified more patients with pathogenic variants in cancer-predisposition genes than the standard method (prostate cancer: 198 vs 182; melanoma: 93 vs 74); sensitivity (prostate cancer: 94.7% vs 87.1% [difference, 7.6%; 95% CI, 2.2% to 13.1%]; melanoma: 74.4% vs 59.2% [difference, 15.2%; 95% CI, 3.7% to 26.7%]), specificity (prostate cancer: 64.0% vs 36.0% [difference, 28.0%; 95% CI, 1.4% to 54.6%]; melanoma: 63.4% vs 36.6% [difference, 26.8%; 95% CI, 17.6% to 35.9%]), PPV (prostate cancer: 95.7% vs 91.9% [difference, 3.8%; 95% CI, –1.0% to 8.4%]; melanoma: 54.4% vs 35.4% [difference, 19.0%; 95% CI, 9.1% to 28.9%]), and NPV (prostate cancer: 59.3% vs 25.0% [difference, 34.3%; 95% CI, 10.9% to 57.6%]; melanoma: 80.8% vs 60.5% [difference, 20.3%; 95% CI, 10.0% to 30.7%]). For the ACMG genes, the sensitivity of the 2 methods was not significantly different in the prostate cancer cohort (94.9% vs 90.6% [difference, 4.3%; 95% CI, –2.3% to 10.9%]), but the deep learning method had a higher sensitivity in the melanoma cohort (71.6% vs 53.7% [difference, 17.9%; 95% CI, 1.82% to 34.0%]). The deep learning method had higher sensitivity in the mendelian genes (prostate cancer: 99.7% vs 95.1% [difference, 4.6%; 95% CI, 3.0% to 6.3%]; melanoma: 91.7% vs 86.2% [difference, 5.5%; 95% CI, 2.2% to 8.8%]).

Conclusions and Relevance Among a convenience sample of 2 independent cohorts of patients with prostate cancer and melanoma, germline genetic testing using deep learning, compared with the current standard genetic testing method, was associated with higher sensitivity and specificity for detection of pathogenic variants. Further research is needed to understand the relevance of these findings with regard to clinical outcomes.
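For readers who aren't used to the four performance measures named in the abstract, here's a minimal Python sketch of how they are usually computed from counts of true and false calls. It also illustrates, roughly, the workaround described under "Main Outcomes and Measures": with no independent gold standard, the reference set is taken to be every variant judged valid by either method, and each method is scored against that union. All of the names (`Counts`, `metrics`, `estimated_sensitivity`), the counts, and the variant IDs are made up for illustration. None of them come from the paper, and this is not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # pathogenic variants correctly reported
    fp: int  # variants reported as pathogenic that are not
    tn: int  # non-pathogenic variants correctly left unreported
    fn: int  # pathogenic variants that were missed


def metrics(c: Counts) -> dict[str, float]:
    """Standard definitions of the four measures named in the abstract."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of real positives found
        "specificity": c.tn / (c.tn + c.fp),  # share of real negatives cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are right
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are right
    }


def estimated_sensitivity(found: set[str], reference: set[str]) -> float:
    """Fraction of the reference set that one method recovered."""
    return len(found & reference) / len(reference)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = metrics(Counts(tp=90, fp=20, tn=80, fn=10))

# Hypothetical validated variant IDs for each method.
standard_valid = {"v1", "v2", "v3"}
deep_valid = {"v2", "v3", "v4", "v5"}

# With no gold standard, the reference is the union of both validated sets.
reference = standard_valid | deep_valid

standard_sens = estimated_sensitivity(standard_valid, reference)
deep_sens = estimated_sensitivity(deep_valid, reference)
```

Notice that the reference set in this sketch is built from the methods' own output, so a pathogenic variant missed by both methods never counts against either one. That's why the abstract describes these numbers as estimates rather than true sensitivity and specificity.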

It's really difficult to understand this paper since there are many terms that I'd have to research more thoroughly; for example, does "germline whole-exon sequencing" mean that only sperm or egg DNA was sequenced and that every single exon in the entire genome was sequenced? Were exons in noncoding genes also sequenced?

I found it much more useful to look at the accompanying editorial by Gregory Feero.

Feero, W.G. (2020) Bioinformatics, Sequencing Accuracy, and the Credibility of Clinical Genomics. JAMA 324:1945-1947. [doi: 10.1001/jama.2020.19939]

Feero explains that the main problem is distinguishing real pathogenic variants from false positives and this can only be accomplished by first sequencing and assembling the DNA and then using various algorithms to focus on important variants. Then there's the third step.

The third step, which often requires a high level of clinical expertise, sifts through detected potentially deleterious variations to determine if any are relevant to the indication for testing. For example, exome sequencing ordered for a patient with unexplained cardiomyopathy might harbor deleterious variants in the BRCA1 gene which, while a potentially important incidental finding, does not provide a plausible molecular diagnosis for the cardiomyopathy. The complexity of the bioinformatics tools used in these 3 steps is considerable.

It's that third step that's analyzed in the AlDubayan et al. paper and one of the tools used is a deep-learning (AI) algorithm. However, the training of this algorithm requires considerable clinical expertise and testing it requires a gold standard set of variants to serve as an internal control. As you might have guessed, that gold standard doesn't exist because the whole point of genomics is to identify previously unknown deleterious alleles.

Feero warns us that "clinical genome sequencing remains largely unregulated and accuracy is highly dependent on the expertise of individual testing laboratories." He concludes that genomics still has a long way to go.

The genomics community needs to act as a coherent body to ensure reproducibility of outcomes from clinical genome or exome sequencing, or provide transparent quality metrics for individual clinical laboratories. Issues related to achieving accuracy are not new, are not limited to bioinformatics tools, and will not be surmounted easily. However, until analytic and clinical validity are ensured, conversations about the potential value that genome sequencing brings to clinical situations will be challenging for clinical centers, laboratories that provide sequencing services, and consumers. For the foreseeable future, nongeneticist clinicians should be familiar with the quality of their chosen genome-sequencing laboratory and engage expert advice before changing patient management based on a test result.

I'm guessing that Gregory Feero doesn't think that in nine years (2030) "The clinical relevance of all encountered genomic variants will be readily predictable."


1. I do NOT recommend this group. It's full of amateurs who resist learning and one of its main purposes is to post copies of pirated textbooks in its files. The group members get very angry when you tell them that what they are doing is illegal!