Monday, May 16, 2022

Wikipedia editors want to suppress an article on junk DNA

I've been trying to fix the Wikipedia article on Noncoding DNA but it's quite a challenge because the page is controlled by editors who are opposed to junk DNA and I am accused of starting an "edit war" that goes against the consensus. On a parallel track, I have proposed creating a separate Wikipedia article on junk DNA where we can present the evidence for and against junk. This is being discussed under the "Talk" thread on the "Non-coding DNA" article.

Here's an exchange between me [Genome42] and one of the editors who exerts control over the noncoding DNA page. It shows you what we are up against.

Let's get back to the main topic. Is there anyone here who objects to creating a separate page for junk DNA? If you object, please explain why because it seems to me that we really need such a page in order to explain to viewers what the main issues are in the controversy. We need some place to put all the evidence showing that 90% of the human genome is junk and to explain why many scientists reject this evidence. Genome42 (talk) 20:18, 15 May 2022 (UTC)

I looked at pubmed and searched for "junk dna" to see how prominent this topic even is. It seems the term is declining in usage in the scientific literature [7] (see the "results by year"). This is despite all of the abundant media coverage it still gets. I would say that if the usage in the scientific literature was rising then perhaps it would be good a good idea, but the reverse is happening. I see an increasing number of papers calling for abandoning the term altogether too. Just an FYI, one of the original reasons for the merge of the junk DNA to this article was that it was causing too much confusion and edit warring as a separate page. When merged you could have the general article on noncoding DNA without the fireworks and a section isolating the controversies coming from it rather than having 2 pages on the same topic with the Junk DNA article mixing controversy with general information on noncoding DNA. Ramos1990 (talk) 21:28, 15 May 2022 (UTC)

Are you serious? Do you really believe that the debate is over and junk DNA doesn't exist just because the opinions you prefer to read are against junk? You don't seem to be knowledgeable about this topic. I can help you get up to speed. Read these articles on my blog.

Also, you seem to be genuinely confused about the difference between junk DNA and noncoding DNA. Think of it this way. Genomes can be divided into centromeric DNA and non-centromeric DNA and the junk is located in the non-centromeric DNA. Does that mean we should have an article on non-centromeric DNA where we discuss junk? We can also split the genome between regulatory DNA and non-regulatory DNA but I don't see you calling for an article on non-regulatory DNA where we discuss junk DNA.

The only reason why you favor discussing junk DNA in an article on non-coding DNA is because you think that junk DNA was once defined as non-coding DNA and this article will prove that some non-coding DNA has a function - therefore it is not all junk. That's an extremely biased, and incorrect, view. No knowledgeable scientist ever defended the claim that all noncoding DNA was junk. Do you think we didn't know about noncoding genes, regulatory sequences, and origins of replication back in the 1960s?

Genomes can be separated into functional DNA and junk DNA and that's where the debate is. The non-coding DNA fraction is a heterogeneous mixture of functional elements and junk DNA and it's very confusing to mix them. An article on junk DNA will discuss all of the various functional regions of the genome and how common they are in the human genome. We will see that if you add them all up you only get to about 5% of the genome. The article will discuss the evidence for junk DNA and the arguments against claims for abundant function. None of that is appropriate in an article on non-coding DNA.

It's easy for me to see why there was "edit warring" over a junk DNA article. It's because many of the editors here are opposed to junk DNA so they try to suppress the legitimate scientific debate. You need to recognize that what you are doing here is expressing a very personal and biased opinion about the topic of junk DNA and you are using your position to start edit wars in order to censor any views in favor of junk DNA. Genome42 (talk) 14:49, 16 May 2022 (UTC)


Sunday, May 15, 2022

Describing non-coding DNA on the NIH (USA) National Human Genome Research Institute website

Here's a link to a short podcast on non-coding DNA narrated by Shurjo K. Sen, Program Director, Division of Genome Sciences. This is the complete text.

Non-coding DNA. So I could talk about this one forever because it actually happened to be the part of the genome that I did most of my PhD work in. And there used to be an older and derogatory term called junk DNA, which, thankfully, doesn't get used these days much longer. So really, the thing to keep in mind here that human genome is a vast, vast expanse of nucleotides, 3.3 billion almost. And only a very, very small fraction of that, about 2% actually codes for what we know to be proteins. And so the question is, what really happens with the rest? Is it just there doing nothing? Or does it have a function? And for many years, particularly in the earlier stages of genomics as a field, people were not really sure that the non-coding parts of the genome have a purpose for being there. And now, or I would say over the last decade or so maybe, we are only just starting to realize that there are an immense number of ways in which what we think of as non-coding actually might just have a more subtle way of passing its information along. So it may not code in the classical protein-coding sense. But there is a ton of information crucial in many, many ways that is hidden in this part of the genome.

I wish I could tell you that this is some kind of a spoof but it's not. It's an example of the poor state of science these days and of how much work we need to do to fix it. I would start by firing the Program Director of the Division of Genome Sciences.


Saturday, May 14, 2022

Editing the Wikipedia article on non-coding DNA

I decided to edit the Wikipedia article on non-coding DNA by adding new sections on "Noncoding genes," "Promoters and regulatory sequences," "Centromeres," and "Origins of replication." That didn't go over very well with the Wikipedia police so they deleted the sections on "Noncoding genes" and "Origins of replication." (I'm trying to restore them so you may see them come back when you check the link.)

I also decided to re-write the introduction to make it more accurate but my version has been deleted three times in favor of the original version you see now on the website. I have been threatened with being reported to Wikipedia for disruptive edits.

The introduction has been restored to the version that talks about the ENCODE project and references Nessa Carey's book. I tried to move that paragraph to the section on the ENCODE project and I deleted the reference to Carey's book on the grounds that it is not scientifically accurate [see Nessa Carey doesn't understand junk DNA]. The Wikipedia police have restored the original version three times without explaining why they think we should mention the ENCODE results in the introduction to an article on non-coding DNA and without explaining why Nessa Carey's book needs to be referenced.

The group that's objecting includes Ramos1990, Qzd, and Trappist the monk. (I am Genome42.) They seem to be part of a group that is opposed to junk DNA and resists the creation of a separate article for junk DNA. They want junk DNA to be part of the article on non-coding DNA for reasons that they don't/won't explain.

The main problem is the confusion between "noncoding DNA" and "junk DNA." Some parts of the article are reasonably balanced but other parts imply that any function found in noncoding DNA is a blow against junk DNA. The best way to solve this problem is to have two separate articles: one on noncoding DNA and its functions and another on junk DNA. There has been a lot of resistance to this among the current editors and I can only assume that this is because they don't see the distinction. I tried to explain it in the discussion thread on splitting by pointing out that we don't talk about non-regulatory DNA, non-centromeric DNA, non-telomeric DNA, or non-origin DNA and there's no confusion about the distinction between these parts of the genome and junk DNA. So why do we single out noncoding DNA and get confused?

It looks like it's going to be a challenge to fix the current Wikipedia page(s) and even more of a challenge to get a separate entry for junk DNA.

Here is the warning that I have received from Ramos1990.

Your recent editing history shows that you are currently engaged in an edit war; that means that you are repeatedly changing content back to how you think it should be, when you have seen that other editors disagree. To resolve the content dispute, please do not revert or change the edits of others when you are reverted. Instead of reverting, please use the talk page to work toward making a version that represents consensus among editors. The best practice at this stage is to discuss, not edit-war. See the bold, revert, discuss cycle for how this is done. If discussions reach an impasse, you can then post a request for help at a relevant noticeboard or seek dispute resolution. In some cases, you may wish to request temporary page protection.

Being involved in an edit war can result in you being blocked from editing—especially if you violate the three-revert rule, which states that an editor must not perform more than three reverts on a single page within a 24-hour period. Undoing another editor's work—whether in whole or in part, whether involving the same or different material each time—counts as a revert. Also keep in mind that while violating the three-revert rule often leads to a block, you can still be blocked for edit warring—even if you do not violate the three-revert rule—should your behavior indicate that you intend to continue reverting repeatedly.

I guess that's very clear. You can't correct content to the way you think it should be as long as other editors disagree. I explained the reason for all my changes in the "history" but none of the other editors have bothered to explain why they reverted to the old version. Strange.


Friday, April 15, 2022

Most lncRNAs are junk

A hard-hitting review will be published in Annual Review of Genomics and Human Genetics. It shows that the case for large numbers of functional lncRNAs is grossly exaggerated.

A long-time Sandwalk reader (Ole Kristian Tørresen) alerted me to a paper that's coming out next October in Annual Review of Genomics and Human Genetics. (Thank you, Ole.) The authors of the review are Chris Ponting from the University of Edinburgh (Edinburgh, Scotland, UK) and Wilfried Haerty at the Earlham Institute in Norwich, UK. They have been arguing the case for junk DNA for the past two decades but most of their arguments are ignored. This paper won't be so easy to ignore because it makes the case forcefully and critically reviews all the false claims for function. I'm going to quote a few juicy parts because I know that many of you will not be able to access the preprint.

Friday, April 08, 2022

The structures of centromeres

The new complete human genome sequence gives us a first-time look at the structures of human centromeres.

This is my sixth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

The new long-read and ultra-long-read sequencing techniques have revealed the organization of centromeric regions of human chromosomes. The basic structure of these regions has been known for many years [Centromere DNA] but the overall arrangement of the various repeats and the large-scale organization of the centromere was not clear.

The core functional regions of centromeres consist of multiple copies of tandemly repeated alpha-satellite sequences. These are 171 bp AT-rich sequences that serve as attachment sites for kinetochore proteins. The kinetochore proteins interact with spindle fibers that pull the chromosomes to the opposite ends of a dividing cell. The core region is surrounded by pericentromeric regions containing additional repeats (mostly HSat2 and HSat3). The alpha-satellite repeats take up almost 3% of the genome and the pericentromeric repeats occupy an additional 3%.¹ That's why centromeres are a major component of the functional part of the human genome. (Centromeres are classic examples of functional noncoding DNA and knowledgeable scientists have known about them for half a century.²)

Altemose, N., Logsdon, G.A., Bzikadze, A.V., Sidhwani, P., Langley, S.A., Caldas, G.V., Hoyt, S.J., Uralsky, L., Ryabov, F.D., Shew, C.J. et al. (2022) Complete genomic and epigenetic maps of human centromeres. Science 376:56. [doi: 10.1126/science.abl4178]

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
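As a rough consistency check, the two figures quoted in the abstract (6.2% of the genome, 189.9 megabases) together imply a total assembly size. This is a minimal Python sketch of that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract above
repeat_fraction = 0.062    # centromeric + pericentromeric repeats: 6.2% of the genome
repeat_megabases = 189.9   # the same repeats expressed in megabases

# Total genome size implied by those two numbers, in megabases
implied_genome_mb = repeat_megabases / repeat_fraction

print(f"Implied genome size: {implied_genome_mb:.0f} Mb")
```

The result is a little over 3,000 Mb, which is the right order of magnitude for a complete human genome assembly.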

The details of the organization of each centromere aren't important. There's a lot of variation between centromeres on different chromosomes and between specific centromeres in different individuals. The authors looked at the organization of X chromosome centromeres in a variety of different individuals from different parts of the world. As expected, there was considerable variation and, as expected, there was more variation within Africans than in all other populations combined.

It shouldn't come as a surprise to find that the authors want more T2T sequences.

This high degree of satellite DNA polymorphism underlines the need to produce T2T assemblies from genetically diverse individuals, to fully capture the extent of human variation in these regions, and to shed light on their recent evolution.

I really hope the granting agencies don't fall for this. It would be much better to spend the resources on exploring the biological function of splice variants (alternative splicing?) and putative noncoding genes in order to resolve the junk DNA controversy. It would also help to devote some of this money to the proper education of science undergraduates.

The authors claim to have discovered 676 genes and pseudogenes within the centromeres. They claim that this includes 23 protein-coding genes and 141 lncRNA genes. They present evidence that three of these genes might have a function, which means that 161/164 of these "genes" are "putative" genes until we see evidence of function.³
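The 161/164 figure follows directly from the counts given above. Here is a short Python sketch of the arithmetic, using only those counts; the variable names are my own labels.

```python
# Counts reported for the centromeric regions
total_annotations = 676         # genes and pseudogenes combined
protein_coding = 23
lncrna = 141
with_functional_evidence = 3    # genes with some evidence of function

candidate_genes = protein_coding + lncrna                     # the "genes" in question
putative_only = candidate_genes - with_functional_evidence    # still lacking evidence
other_annotations = total_annotations - candidate_genes       # not broken down in the text above

print(f"{putative_only}/{candidate_genes} candidate genes remain putative")
print(f"{other_annotations} other annotations")
```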


1. It's unlikely that most of this 6% is absolutely required for the proper functioning of the centromeres because there are many individuals with much less centromere DNA. That's why I only attribute about 1% of the genome to functional centromere sequence.

2. Unknowledgeable scientists continue to be shocked when they discover that noncoding DNA can have a biological function. This is because they weren't taught properly as undergraduates.

3. I don't understand why so many scientists are unable to see the difference between a putative gene and a real gene.

Wednesday, April 06, 2022

Genetic variation and the complete human genome sequence

The new complete human genome sequence adds an extra 8% of DNA sequence that's a source of variation in the human population. The sequence also corrects some errors in the current standard reference genome.

This is my fifth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

Tuesday, April 05, 2022

Two different views of the history of molecular biology

How can different molecular biologists have such opposite views of the history of their field?

I'm posting links to two papers without comment. One of them is from my friend and colleague Alex Palazzo and the other is from James Shapiro who is not my friend or colleague. Both papers have been published in reputable peer-review journals.

Transcription activity in repeat regions of the human genome

A detailed examination of the new complete human genome reveals that 54% of it consists of various repetitive elements. Some of them are transcribed and some aren't.

This is my fourth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

The fourth paper extends the ENCODE-type analysis of the T2T-CHM13 sequence by focusing on repeats.

Hoyt, S.J., Storer, J.M., Hartley, G.A., Grady, P.G., Gershman, A., de Lima, L.G., Limouse, C., Halabian, R., Wojenski, L., Rodriguez, M. et al. (2022) From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376:57. [doi: 10.1126/science.abk3112]

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

The most useful part of this paper is the complete analysis of all repetitive elements in the T2T-CHM13 genome. This gives us, for the first time, a complete picture of a human genome. The exact values of the various components aren't important because there's considerable variation within the human population but the big picture is informative.

These are the percentages of the human genome occupied by the different classes of repetitive DNA.

  • SINEs 12.8%
  • Retrotransposon 0.15%
  • LINEs 20.7%
  • LTRs 8.8%
  • DNA transposons 3.6%
  • simple repeats 8%

The total comes to 54%. There are other estimates that are higher because of a more lenient cutoff value for sequence similarity but this gives you a pretty good idea of what the genome looks like. Most of the transposon-related sequence consists of fragments of once active transposons so the fraction of the genome consisting of true selfish DNA capable of transposing is a small fraction of this 54%.
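The 54% total can be checked by adding up the listed classes. The Python sketch below does that and also separates out the transposon-related share. Treating every class except "simple repeats" as transposon-related is my assumption for illustration, not a breakdown taken from the paper.

```python
# Percent of the genome occupied by each repeat class, as listed above
repeat_classes = {
    "SINEs": 12.8,
    "Retrotransposon": 0.15,
    "LINEs": 20.7,
    "LTRs": 8.8,
    "DNA transposons": 3.6,
    "simple repeats": 8.0,
}

total = sum(repeat_classes.values())

# Assumption: everything except simple repeats counts as transposon-related
transposon_related = total - repeat_classes["simple repeats"]

print(f"Total repeats: {total:.2f}% of the genome")
print(f"Transposon-related: {transposon_related:.2f}% of the genome")
```

The sum rounds to the 54% quoted in the text, and roughly 46 of those percentage points are transposon-related sequence under this assumption.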

Several lines of evidence developed over the past 50 years give us every reason to believe that most of this DNA is junk, but most of the authors of this paper are reluctant to reach that conclusion, so the paper never mentions that these repetitive sequences might be junk. Instead, the authors concentrate on mapping CpG methylation sites and transcribed regions. They refer to this as "functional annotation" but they don't provide a definition of function.

We provide a high-confidence functional annotation of repeats across the human genome.

As you might expect, the repeat elements that retain vestiges of promoters are often transcribed and this includes adjacent genomic sequences that are found near these promoters (e.g. near LTRs). The long stretches of short tandem repeats (e.g. satellite DNA) do not contain any sequences that resemble promoters so these regions are not transcribed. (The authors seem to be a bit surprised by this result.) Further work is needed to decide how much of this DNA is truly functional and which parts contribute to human uniqueness. Naturally, that will require much more ENCODE-type work and T2T sequencing of other primates.

Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Although we find repeat variants that appear enriched or specific to the human lineage, in the absence of T2T-level assemblies from other primate species, we cannot truly attribute these elements to specific human phenotypes. Thus, the extent of variation described herein highlights the need to expand the effort to create human and nonhuman primate pan-genome references to support exploration of repeats that define the true extent of human variation.

This will cost millions of dollars. I suspect the grant applications have already been sent.



Monday, April 04, 2022

If you were a Harvard freshman you could take a course on the dark matter of the genome.

Check out this freshperson seminar course on Parts Unknown: The Dark Matter of the Genome at Harvard. It is offered by Amanda J. Whipple of the Department of Molecular and Cellular Biology. She works on noncoding RNAs in the brain. Harvard likes to think of itself as one of the top universities in the world so this seminar course must be an example of world-class critical thinking.

Heaven help us if this is what future American leaders are being taught.

Did you know that genes, traditionally defined as DNA encoding protein, only account for two percent of the entire human genome? What is the purpose of the remaining 98% of the genome? Is it simply “junk DNA”? This seminar will explore the large portion of our genome that has been neglected by scientists for many years because its purpose was not known. We will examine research findings which demonstrate non-coding sequences, previously assigned as “junk DNA”, play crucial roles in the development and maintenance of a healthy organism. We will further discuss how these non-coding sequences are promising targets for drug design and disease diagnosis. We will then visit a local research laboratory (either virtually or in person as deemed appropriate) and engage with active scientists regarding the scientific research enterprise.

A thorough understanding of the human genome not only provides a foundation for any student interested in the life sciences, it enables one to engage more deeply in related political and societal debates, which is expected to become even more central as scientists further uncover the dark matter of our genomes.

Setting aside the sarcasm, how did we get to a stage where a prominent researcher at one of the top research universities in the world could write such a course description?



Sunday, April 03, 2022

Karen Miga and the telomere-to-telomere consortium

Karen Miga deserves a lot of the credit for the complete human genome sequence.

Karen Miga is a professor at the University of California, Santa Cruz, and she's been working for several years on sequencing the repetitive regions of the genome. She is a co-founder of the telomere-to-telomere consortium that just published a complete sequence of the human genome. She made a significant contribution to long-read (~20 kb) and ultra-long-read (>100 kb) sequencing and that's a major technological achievement that's worthy of prizes.

Read the interview on CBC (Canada) Quirks & Quarks at Scientists sequence complete, gap-free human genome for the first time and watch the YouTube video.


Miga did her Ph.D. with Huntington Willard at Duke University. Hunt has been working on centromeres for more than 40 years and some of my colleagues may remember him when he was a professor at the University of Toronto in the Department of Medical Genetics.



What do we do with two different human genome reference sequences?

It's going to be extremely difficult, perhaps impossible, to merge the new complete human genome sequence with the current standard reference genome.

The source DNA for the new telomere-to-telomere (T2T) human genome sequence was a cell line derived from a molar pregnancy. This meant that the DNA was essentially haploid, thus avoiding the complications of sequencing diploid DNA which contains two highly similar but different genomes. The cell line, CHM13, lacks a Y chromosome but that's trivial since a complete T2T sequence of a Y chromosome will soon be published and it can be added to the T2T-CHM13 genome sequence [Telomere-to-telomere sequencing of a complete human genome].

Segmental duplications in the human genome

The new completed human genome sequence contains some previously unknown large duplications (segmental duplications).

This is my third post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

Epigenetic markers in the last 8% of the human genome sequence

The newly sequenced part of the human genome contains the same chromatin regions as the rest of the genome and they don't tell us very much about which regions are functional and which ones are junk.

This is my second post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

A complete human genome sequence (2022)

The first complete human genome sequence has finally been published.

This is my first post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

Friday, April 01, 2022

Illuminating dark matter in human DNA?

A few months ago, the press office of the University of California at San Diego issued a press release with a provocative title ...

Illuminating Dark Matter in Human DNA - Unprecedented Atlas of the "Book of Life"

The press release was posted on several prominent science websites and Facebook groups. According to the press release, much of the human genome remains mysterious (dark matter) even 20 years after it was sequenced. According to the senior author of the paper, Bing Ren, we still don't understand how genes are expressed and how they might go awry in genetic diseases. He says,

A major reason is that the majority of the human DNA sequence, more than 98 percent, is non-protein-coding, and we do not yet have a genetic code book to unlock the information embedded in these sequences.

We've heard that story before and it's getting very boring. We know that 90% of our genome is junk, about 1% encodes proteins, and another 9% contains lots of functional DNA sequences, including regulatory elements. We've known about regulatory elements for more than 50 years so there's nothing mysterious about that component of noncoding DNA.

Wednesday, March 30, 2022

John Mattick's new book

John Mattick and Paulo Amaral have written a book that promotes their views on the content of the human genome. It will be available next August. Their main thesis is that the human genome is full of genes for regulatory RNAs and there's very little junk. A secondary theme is that some very smart scientists have been totally wrong about molecular biology and molecular evolution for the past fifty years.

I pretty much know what's going to be in the book [see John Mattick presents his view of genomes]. I also know that most of his claims don't stand up to close scrutiny but that's not going to prevent it from being touted as a true paradigm shift. (It's actually a paradigm shaft.) I suspect it's going to get favorable reviews in Science and Nature.

John Mattick presents his view of genomes

John Mattick has a new book coming out in August where he defends the notion that most of our genome is full of genes for functional noncoding RNAs. We have a pretty good idea what he's going to say. This is a talk he gave at Oxford on May 17, 2019.

Here are a few statements that should pique your interest.

  • (0:57) He says that his upcoming book is tentatively titled "the misunderstandings of molecular biology."
  • (1:11) He says that "the assumption has been very deeply embedded from the time of the lac operon on that genes equated to proteins."
  • (2:30) There have been three "surprises" in molecular biology: (1) introns, (2) eukaryotic genomes are full of 'selfish' DNA, and (3) "gene number does not scale with developmental complexity."
  • (4:30) It is unjustified to assume that transposon-related sequences are junk and that leads to misinterpretation of neutral evolution.
  • (6:00) The view that evolution of regulatory sequences is mostly responsible for developmental complexity (Evo-Devo) has never been justified.
  • (8:45) A lot of obtuse theoretical discussion about how the number of regulatory protein-coding genes increases quadratically as the total number of protein-coding genes increases in a bacterial genome, but at some point there would have to be more protein-coding regulatory genes than total protein-coding genes, so that limits the evolution of bacteria.
  • (13:40) The proportion of noncoding DNA increases with developmental complexity, topping out at humans.
  • (14:00) The vast majority of the genome in complex organisms is differentially transcribed in different cells and different tissues.
  • (14:15) The whole genome is alive on both strands.
  • (14:20) There are two possibilities: junk RNA or abundant functional transcripts and that explains complex organisms.
  • Mattick then takes several minutes to document the fact that there are abundant transcripts, a fact that has been known for the better part of sixty years, but he does not mention that. All of his statements carry the implicit assumption that these transcripts are functional.
  • (20:20) He makes the boring, and largely irrelevant, point that most disease-associated loci are located in noncoding regions (GWAS). He's responding to a critic who asked why, if these things (transcripts) are real, don't we see genetic evidence of it.
  • (24:00) Noncoding RNAs have all of the characteristics of functional RNAs with an emphasis on the fact that their expression is often only detected in specific cell types.
  • (31:50) It has now been shown that everything that protein transcription factors can do can be done by noncoding RNA.
  • (32:15) "I want to say to you that conservation is totally misunderstood." Apparently, lack of conservation imputes nothing about function.
  • (41:00) RNAs control phase separation. There's a whole other level of cell organization that we never dreamed of. (Ironically, he gives nucleoli as an example of something we never dreamed of.)
  • (42:36) "This is called soft metaphysics, and it's just come into biology, and it's spectacular in its implications."
  • (46:25) Almost every lncRNA is alternatively spliced in mice and humans.
  • (46:30) There's more alternative splicing in human protein-coding genes than in mice protein-coding genes but the extra splicing in humans is mostly in the 5' untranslated region. (I'm sure it has nothing to do with the fact that tons more RNA-Seq experiments have been done on human tissues.) "We think this is due to the increased sophistication of the regulation of these genes for the evolution of cognition."
  • (48:00) At least 20% of the human genome is evolutionarily conserved at the level of RNA structure and this does not require any assumptions.
  • (55:00) The talk ends at 55 minutes. That's too bad because I'm sure Mattick had a dozen more slides explaining why all of those transcripts are functional, as opposed to the few selected examples he picked. I'm sure he also had a lot of data refuting all of the evidence in favor of junk DNA but he just ran out of time.

I don't know if there were questions but, if there were, I bet that none of them challenged Mattick's main thesis.


Saturday, March 26, 2022

Science communication in the modern world

Science editors asked young scientists to imagine what kind of course they would have created if they could go back to a time before the pandemic [A pandemic education]. Three of the courses were about science communication.

COM 145: Identification, analysis, and communication of scientific evidence

This course focuses on developing the skills required to translate scientific evidence into accessible information for the general public, especially under circumstances that lead to the intensification of fear and misinformation. Discussions will cover the principles of the scientific method, as well as its theoretical and practical relevance in counteracting the dissemination of pseudoscience, particularly on social media. This course discusses chapters from Carl Sagan’s book The Demon-Haunted World, certain peer-reviewed and retracted papers, and materials related to key science issues, such as the anti-vaccine movement. For the final project, students will comprehensibly communicate a scientific topic to the public.

Camila Fonseca Amorim da Silva University of Sao Paulo, Sao Paulo, Brazil

COM 198: Everyday science communication

As scientific discoveries become increasingly specialized, the lack of understanding by the general public undermines trust in scientists and causes the spread of misinformation. This course will be taught by scientists and communication specialists who will provide students with a toolset to explain scientific concepts, as well as their own research projects, to the general public. Upon completion of this course, students will be able to explain to their grandparents that viruses exist even though they can’t see them, convince their neighbors that vaccines don’t contain tracking devices, and explain the concept of exponential growth to governmental officials.

Anna Uzonyi Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.

COM 232: Introduction to talking to regular people

Communicating science is difficult. Many scientists, having immersed themselves in the language of their field, have completely forgotten how to talk to regular people. This course hones introductory science communication skills, such as how to talk about scary things without generating mass panic, how to calmly discourage the hoarding of paper hygiene products, and how to explain why scientific knowledge changes over time. The final project will include cross examination from law school faculty, who are otherwise completely uninvolved with the course and possess minimal scientific training. Recommended for science majors who are unable to discuss impactful scientific findings without citing a P value.

Joseph Michael Cusimano Bernard J. Dunn School of Pharmacy, Shenandoah University, Winchester, VA, USA.

They sound like interesting courses but my own take on science communication is somewhat different. I think it's very difficult for practicing scientists to communicate effectively with the general public so I tend to view science communication at several different levels. My goal is to communicate with an audience of scientists, science journalists, and people who are already familiar with science. The idea is to make sure that this intermediate group understands the scientific facts in my field and to make sure they are familiar with the major controversies.

My hope is that this intermediate group will disseminate this information to their less-informed friends and relatives and, more importantly, stop the spread of misinformation whenever they hear it.

Take junk DNA for example. It's very difficult to convince the average person that 90% of our genome is junk because the idea is so counter-intuitive and contrary to the popular counter-narratives. However, I have a chance of convincing the intermediate group, including science journalists and other scientists, who can follow the scientific arguments. If I succeed, they will at least stop spreading misinformation and false narratives and start presenting alternatives to their audiences.


Monday, March 14, 2022

Junk DNA

My book manuscript has been reviewed by some outside experts and they seem to have convinced my editor that my book is worth publishing. I hope we can get it finished soon. It would be nice to publish it in September on the 10th anniversary of the ENCODE disaster.

Meanwhile, I keep scanning the literature for mentions of junk DNA to see if scientists are finally coming to their senses. Apparently not, and that's a good thing because it means that my book is still needed. Here's the opening paragraph from a recent review of lncRNAs. The authors are in the Department of Medicine at the Medical College of Georgia, in Augusta, Georgia (USA).

Ghanam, A.R., Bryant, W.B. and Miano, J.M. (2022) Of mice and human-specific long noncoding RNAs. Mammalian Genome:1-12. [doi: 10.1007/s00335-022-09943-2]

Approximately ninety-eight percent of our genome is noncoding. Contrary to initial descriptions of this vast sea of sequence comprising “junk DNA” (Ohno 1972), comparative genomics and various next-generation sequencing studies have revealed millions of transcription factor binding sites (TFBS) (Vierstra et al. 2020) and tens of thousands of noncoding genes, most notably the class of long noncoding RNAs (LncRNAs), defined currently as processed transcripts of length > 200 base pairs with no protein-coding capacity (Rinn and Chang 2020; Statello et al. 2021). The widespread transcription of LncRNAs and abundance of regulatory sequences such as enhancers support the concept of a genome that is largely functional (ENCODE Project Consortium 2012). Such a dynamic genome should not be surprising given the complex nature of gene expression and gene function necessary for embryonic and postnatal development as well as disease processes.

  • No reasonable scientist, especially Susumu Ohno, ever said that all noncoding DNA was junk.
  • There are millions of transcription factor binding sites but most of them are spurious binding sites that have nothing to do with regulation. They simply reflect the expected behavior of typical DNA binding proteins in a large genome full of junk DNA.
  • Nobody has demonstrated that there are tens of thousands of noncoding genes. There may be tens of thousands of transcripts but that's not the same thing since you have to prove that those transcripts are functional before you can say that they come from genes.
  • There is currently no evidence to support the concept of a genome that is largely functional in spite of what the ENCODE researchers might have said ten years ago.
  • Such a genome would be very surprising, if it were true, given what we know about genomes, evolution, and basic biochemistry.

Except for those few minor details—I hope I'm not being too picky—that's a pretty good way to start a review of lncRNAs. :-)


Sunday, February 20, 2022

Jacques Fresco (1928-2021)

Jacques Fresco died last December. I am kind of a scientific grandson of Jacques Fresco since he mentored my Ph.D. supervisor, Bruce Alberts, when he (Bruce) was an undergraduate at Harvard.

While at Harvard, Jacques mentored then-undergraduate Bruce Alberts, who taught at Princeton from 1966 to 1976, served as president of the National Academy of Sciences and wrote the seminal textbook, “The Molecular Biology of the Cell.”

In addition to reassuring Alberts’ parents that they shouldn’t worry about their son’s choice to pursue science instead of medical school — a story Fresco enjoyed telling — he also played a key role in bringing the young scientist to Princeton. “Before I had even completed my Ph.D., he convinced Princeton to offer me an assistant professorship that I did not deserve,” Alberts recalled. “And at Princeton for 10 years, we of course spent an enormous amount of time together. So Jacques was very central to my life as a scientist and a close friend.”

The Fresco lab was right above the Alberts lab when I started graduate school at Princeton in 1968. The main focus of the Fresco lab was the structure of tRNA and in order to isolate different tRNA molecules they needed a very large gel filtration column that was about 4m tall and about as big around as a dinner plate. The column was too tall for their lab so they had to drill a hole through the concrete floor and drop it down into the lab below!

One of my graduate student friends worked in the Fresco lab on hydrogen exchange in tRNA. The idea was to measure the number of hydrogen bonds in the structure by looking at the exchange between hydrogens in the medium and in tRNA. The experiment used a radioactive isotope of hydrogen (tritium) in the medium and each experiment required about one curie of radioactive hydrogen and that's a lot. After a few years my friend decided to become a plastic surgeon instead of a scientist!

I knew Jacques Fresco quite well when I was a graduate student and I always thought he was an excellent scientist.

Many of his students mentioned what an enormous role Fresco played in shaping their careers, in large and small ways. “Jacques treated everybody with the same respect, irreverence and love of life,” said Steven Broitman, a professor of biology at West Chester University in Pennsylvania who completed his Ph.D. with Fresco in 1988. “In addition to all he taught me about science, he also modeled the simple enjoyment in doing science that I have always tried to keep with me and pass on to my own students. He was larger than life, a major figure in the birth of modern molecular biology. He was deeply loved, and he will be missed.”



Sunday, January 09, 2022

Akiko Iwasaki talks about mucosal immunity

Akiko Iwasaki is a Professor of Immunology at Yale and a former student in my department (Dept. of Biochemistry, University of Toronto). She got her undergraduate degree in biochemistry in the mid-1990s¹ and then did her Ph.D. in the Dept. of Immunology under my friend and colleague Brian Barber.

Alex Palazzo is a keen podcast listener and he alerted me to an interview with Akiko Iwasaki on the EMBO podcast channel: The Right Place at the Right time. There are several reasons why listening to this podcast is worthwhile if you are interested in science and immunology. The most important reason is that it gives you a good idea of the depth of knowledge in the field because the level of the interview is pitched at those who have a considerable understanding of immunology. I'm not one of those people but I recognize good science when I hear it.

Another reason is that she discusses COVID-19 and how vaccines work. As you know from earlier posts, the serum antibody levels induced by the current vaccines wane after a few months so that vaccinated people can get infected by the SARS-CoV-2 virus. The secondary response then kicks in protecting you from serious illness. In order to stop the initial infections and prevent the spread of the virus we might have to get booster shots every six months or so and that's not a satisfactory solution.

Iwasaki works on something called mucosal immunity, which is new to me but very familiar to the experts. Here's a brief description from her website and a figure from Wikipedia.

The mucosal surfaces represent major sites of entry for numerous infectious agents. Consequently, the vast mucosal surfaces are intricately lined with cells and lymphoid organs specialized in providing protective antibody and cellular immunity. One of the most fundamental issues in this field concerns how antigens in the mucosa are taken up, processed, and presented by antigen presenting cells. Our laboratory's goal is to understand how immunity is initiated and maintained at the mucosal surfaces, particularly by the dendritic cells (DCs), through natural portals of entry for pathogens that are of significant health concerns in the world.

We focus on understanding how viruses are recognized (innate immunity) and how that information is used to generate protective adaptive immunity.

I hope I understand this well enough to explain it in simple terms. Mucosal immunity means that there are IgA antibodies in the mucosa that surrounds cells in certain parts of the body. For our purposes, the cells in the respiratory tract are important in COVID-19. The memory B-cells and T-cells that respond to the antigen are located right under the mucosa. Imagine that you could produce a vaccine that induced IgA against SARS-CoV-2 in the mucosa. The antibodies would be located right where the virus enters the body and they don't disappear over time like IgG in the blood stream. Furthermore, the secondary response is induced right near the site where the virus is attacking the body.

I think you need a nasal/throat spray vaccine to make this work and such vaccines are under development. They would probably have to be given in conjunction with the intra-muscular mRNA vaccines. I wish I could get Brian Barber to explain this but I can't seem to contact him. He gave a short lesson in immunology on his daughter Jill Barber's Instagram account last year so I know he could do it.

I learned one other thing from listening to Akiko Iwasaki. We know that SARS-CoV-2 is more virulent in cold weather, especially during the winter months. She explains that the mucosal layer needs to be kept moist but during the winter months it can dry up due to the low humidity. The outside air is cold, therefore the humidity is low, and we import that air into our homes and workplaces. This dry air promotes spread of the virus.

Maybe we should be installing extra humidifiers to keep the humidity at higher levels?

It's a bit of a stretch from Akiko Iwasaki to Jill Barber but we've known Jill since she was little and my wife and I are big fans so here's a musical interlude to take your mind off COVID-19.



1. She must have taken my Molecular biology course and that's probably why she knows so much!

Saturday, January 08, 2022

What is the best COVID-19 vaccine?

Take any vaccine you can get whenever you can. Moderna is probably the very best vaccine and Pfizer-BioNTech is a close second. AstraZeneca is very good but Johnson & Johnson not so much.

A brief summary of the COVID-19 vaccines was published in the Dec. 23rd issue of Nature. It doesn't go into a lot of details but I think the overall impressions are valid. The most serious problem with the summary is that it doesn't take into account the Omicron variant.

Mallapaty, S., Callaway, E., Kozlov, M., Ledford, H., Pickrell, J. and Van Noorden, R. (2021) How COVID vaccines shaped 2021 in eight powerful charts. Nature 600:580-583. [PDF]

The extraordinary vaccination of more than four billion people, and the lack of access for many others, were major forces this year — while Omicron’s arrival complicated things further.

The first graph shows the popularity of the major vaccines. It's significant for two reasons. First, people in North America don't realize that the AstraZeneca vaccine has made such an enormous contribution to fighting the pandemic. That's because AstraZeneca wasn't approved in the United States in spite of its effectiveness and it got a bad reputation in Canada.

Second, the Chinese vaccine, CoronaVac (also known as Sinovac), has been widely distributed throughout the world. The CoronaVac vaccine is an inactivated virus vaccine that doesn't require ultracold temperatures for storage and it is relatively cheap to manufacture. China has been vaccinating people everywhere, notably in Brazil and Indonesia. The CoronaVac vaccine was quite effective against the early variants but it doesn't work as well with the Omicron variant.

The distribution data also shows that the Pfizer-BioNTech vaccine, the one developed in Germany, is far more popular than the Moderna vaccine that was developed in the United States. Even Sinopharm, another Chinese vaccine, is more popular than Moderna. As far as most of the world is concerned, it's the German, British, and Chinese vaccines that are going to save them and not the one created in Boston.

Some of the vaccines are more effective than others but unfortunately the Nature article only addresses the vaccines that are widely used in Europe and North America. The data shows that the mRNA vaccines are very effective against all of the variants that arose before Omicron. The mRNA vaccines not only protected against symptoms but also against severe disease (hospitalizations). The AstraZeneca vaccine was also very good but not quite as good as the mRNA vaccines. The Johnson & Johnson vaccine was much less effective.

These data do not address any possible side effects of these vaccines and that's important because it is widely believed in some countries that the AstraZeneca vaccine poses a much higher risk of side effects. That's not true. There may be a slightly increased risk of side effects with AstraZeneca but it's not significant.

The vaccine's ability to block symptoms depends on the antibody levels in the serum while the ability to prevent long-term infections depends on the development of robust memory B-cells and T-cells. As with all vaccines, the initial antibody levels fall after the vaccination so the ability to prevent initial infections by the virus wanes over time [The omicron variant evades vaccine immunity but boosters help] [On the effectiveness of vaccines].

You can see from the above graph that the vaccines' ability to prevent infection by the Delta variant falls off considerably by six months after completing the vaccination schedule. It's important to note that this data is with the Delta variant and it explains why countries that rushed to vaccinate their population as quickly as possible in early 2021 suffered more in the Delta wave. It's why booster shots were promoted in Israel and the United States because both of those countries vaccinated early and waited only the minimal time between doses. (Other countries waited longer between the first and second doses so the waning of initial infection was delayed.)

The waning effect is even more pronounced with the Omicron variant because it arose later in the year when far more people were beyond the six month limit of primary infection protection. What this means, I think, is that the Omicron variant isn't special because it "escapes immunity"—that would have been true of any new variant just as it was true of Delta. In any case, the mRNA vaccines are better because they start with a higher level of protection and if the data is accurate it means that Moderna is better than Pfizer.

I was prompted to post this article because many Canadians are hesitant to get the Moderna vaccine for their booster shot, especially if they had Pfizer first. That's ridiculous. Moderna is probably a bit better and, besides, there's plenty of data showing that mixing vaccines is better than sticking with the same one for all your shots.


Image Credit: The coronavirus figure is from Alexey Solodovnikov and Wikimedia Commons.

Friday, January 07, 2022

Ontario (Canada) hospitals are filling up with fully vaccinated patients

The Omicron wave is surpassing all records for the number of cases in Ontario. The province has given up on testing for most people so the actual case counts are far higher than the reported cases and it's unlikely that the numbers are dropping in spite of what the graph (below) might suggest. Judging by what's going on in other countries, the peak is still a week or two away.

Ontario residents have been very good about getting vaccinated. As of today, 88% of eligible people over the age of 12 have been fully vaccinated and 91% have received at least one dose. The 5-11 age group became eligible about six weeks ago and so far 45% have had one shot. This places Ontario (and the rest of Canada) among the most vaccinated places in the world.

Since the unvaccinated population is only 10% of the total, this means that most of the cases are among the fully vaccinated population and most of those cases are mild or asymptomatic. Fully vaccinated people are also getting infected in other countries but the effect is often masked by a large number of cases among the unvaccinated population. This can deceive people into believing that you don't need to worry if you are fully vaccinated.
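
The base-rate effect behind that claim can be made concrete with a small sketch. It is only an illustration: the 10% unvaccinated share is the figure used above, but the relative-risk values passed to the function are hypothetical, not measured vaccine effectiveness numbers, and the function name is made up for this example.

```python
def vaccinated_share_of_cases(unvaccinated_fraction, relative_risk):
    """Fraction of all infections that occur in vaccinated people.

    relative_risk is the infection risk of a vaccinated person divided by
    that of an unvaccinated person (1.0 means no protection from infection).
    """
    vaccinated_fraction = 1.0 - unvaccinated_fraction
    # Cases are proportional to group size times per-person risk.
    vaccinated_cases = vaccinated_fraction * relative_risk
    unvaccinated_cases = unvaccinated_fraction * 1.0
    return vaccinated_cases / (vaccinated_cases + unvaccinated_cases)


# 10% unvaccinated comes from the text; the relative risks are hypothetical.
for rr in (1.0, 0.5, 0.25):
    share = vaccinated_share_of_cases(0.10, rr)
    print(f"relative risk {rr:.2f}: {share:.0%} of cases are in vaccinated people")
```

Even if vaccination cut the risk of infection to a quarter, about 69% of cases would still be in vaccinated people simply because there are nine times as many of them.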

About 30% of eligible people have received a booster shot and that group is not reporting significant numbers of infections, consistent with the data showing that a recent booster will protect you from getting even mild forms of COVID-19.

It's pretty clear that the Omicron variant is being spread by people who are fully vaccinated with no booster. They may have mild symptoms but they can infect others, including young children and the elderly, who can suffer more severe symptoms.

The number of people in hospital with COVID-19 is rising sharply but so far it's still less than the numbers in the Delta wave last Fall. That's expected to change rapidly over the next few days and there's a great danger that the health care system will be overwhelmed. The best guess so far is that we will just scrape by by cancelling all elective surgeries and restricting the number of non-COVID patients who get admitted to hospital. Other countries may not be so lucky.

Given the high levels of vaccination, you might suspect that most of the people in hospital will have been fully vaccinated and that's exactly what we see. 71% of the COVID-19 patients in the hospitals have been fully vaccinated but this number is slightly misleading since it includes patients who were admitted for other reasons and subsequently tested positive for COVID-19. Those people aren't necessarily being treated for severe COVID symptoms.

It's hard to get an accurate number for the hospitalization rate because we don't know how many cases there are but it looks like that number is below 1%. This means that, on average, fewer than one patient will end up in hospital for every 100 who get COVID-19. This rate is far below the overall rate of 3.9% since the pandemic began and about 2% for the Delta wave when a substantial percentage of the population was vaccinated. It's data like this that suggest that the Omicron variant causes a milder form of COVID-19 but the data is confounded by the fact that fully vaccinated people are now getting infected whereas they still had substantial serum antibody levels during the Delta spike. I'd like to know what the hospitalization rate (and the death rate) was for unvaccinated people last year and what it is now.

The unvaccinated group makes up only 24% of the hospitalized patients but 49% of those in the intensive care units (ICU). This is clear evidence that vaccination offers significant protection against severe forms of the disease—that's exactly what vaccines are supposed to do. However, it's worth noting that 51% of the patients in the ICUs are either fully or partially vaccinated. You can still get a serious case of COVID-19 if you are fully vaccinated. As with the case numbers, this severe outcome will not be obvious in countries with lower vaccination rates and that could be a problem if you are trying to stop the spread.
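
To see how strong that protection looks once group sizes are taken into account, here is a rough worked calculation. It assumes the unvaccinated share of the population is the 10% quoted earlier, lumps fully and partially vaccinated patients together, and ignores age differences and patients admitted for other reasons, so treat the output as a back-of-the-envelope estimate. The function name is made up for this sketch.

```python
def per_capita_ratio(unvax_share_of_patients, unvax_share_of_population):
    """How many times more often an unvaccinated person appears in a patient
    group than a (fully or partially) vaccinated person does."""
    vax_share_of_patients = 1.0 - unvax_share_of_patients
    vax_share_of_population = 1.0 - unvax_share_of_population
    # Patients per unit of population in each group.
    unvax_rate = unvax_share_of_patients / unvax_share_of_population
    vax_rate = vax_share_of_patients / vax_share_of_population
    return unvax_rate / vax_rate


UNVAX_POPULATION = 0.10  # share used earlier in the post; an approximation

print(f"hospital: {per_capita_ratio(0.24, UNVAX_POPULATION):.1f}x")
print(f"ICU:      {per_capita_ratio(0.49, UNVAX_POPULATION):.1f}x")
```

Under those assumptions an unvaccinated person is roughly three times as likely to be in hospital with COVID-19 and nearly nine times as likely to be in the ICU as a vaccinated person.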


Wednesday, January 05, 2022

The effect of spike protein mutations in the Omicron variant

The Omicron variant of SARS-CoV-2 contains a large number of novel mutations in the spike protein. How did these mutations occur and what is their effect on the properties of the variant?

The origin of the novel mutations

The phylogeny of the Omicron variant is unusual—it seems to have appeared without any documented history of steady accumulation of mutations. It looks like it split from the other variants before the summer of 2020. This leads to suggestions that the virus was circulating (and mutating) long before it was first detected.

One idea is that the ancestor of the Omicron variant jumped to another species and evolved in that species for 18 months before jumping back into humans. This would account for the lack of intermediates seen in screening infected patients. Several of the key mutations in the spike protein sequence are similar to variants that have adapted to bind to the mouse version of the receptor (ACE2) (Sun et al. 2021) and the Omicron spike protein binds strongly to mouse ACE2. (The original SARS-CoV-2 variants do not infect mice.) I think it's safe to conclude (tentatively) that Omicron evolved in mice and jumped back to humans in October or November 2021.

Immune evasion

The Omicron variant infects people who have been fully vaccinated or who have been previously infected with one of the other variants (Zhang et al. 2021). This is because the low level of circulating antibodies in these individuals is not sufficient to block Omicron. The level can be boosted with a booster shot (or a recent infection) and this increase protects against infections by Omicron. Fortunately, Omicron is attacked by T-cells and memory B-cells in vaccinated individuals so the infections are mild (Redd et al. 2021).

The mRNA vaccines elicit polyclonal antibodies to the spike protein of the original (Wu) variant. Some of these antibodies will recognize surface antigens that have mutated in Omicron so that's why it requires a higher concentration of antibodies to neutralize Omicron. We're lucky that boosting the overall antibody levels with a booster shot is sufficient to protect us from infection. We're also lucky that the T-cell response is robust—it didn't have to be.

The important point to remember during the Omicron wave is that the virus infects, and is transmitted by, fully vaccinated individuals (no booster). Some of the talking heads on TV seem to forget this when they advocate for keeping the schools open as long as everybody is vaccinated. That's not going to stop the spread and, besides, some of those vaccinated children are going to end up in the hospital. What they need to be telling us is that keeping schools open is going to result in X number of hospitalizations and Y number of deaths and this is an acceptable trade-off. (Parents, teachers, school bus drivers, school administrative staff, and their elderly parents may disagree.)

Transmissibility

The Omicron variant is highly transmissible, meaning that it is more infectious than the other variants. This feature explains the enormous spikes of cases in all countries that are experiencing an Omicron wave. We've not seen anything like that in previous waves.

Some of that peak might be due to the fact that vaccinated individuals are not being as careful as they should be so they are spreading the virus in their communities. That may explain the differences between different countries, or different states within a country, but it's not the full story.

Initially, there was a lot of speculation that the spike protein mutations in Omicron made it bind more strongly to the human ACE2 receptors and that would explain why the virus was more infectious. But most of those studies were based on models and the results from different groups were contradictory. Recently the Chinese scientists who have been at the leading edge of these studies since the beginning have shown that the Omicron spike protein does not bind significantly more tightly to ACE2 than the version from other variants (Zhang et al. 2021).

It looks like the increase in infectivity is due to enhanced entry of the Omicron variant into cells once it has bound to the receptor. Some of this is probably due to a mutation that creates a more favorable furin cleavage site but additional increases in entry might be due to conformational changes in the spike protein (Zhang et al. 2021).

Milder cases?

There's a lot of speculation that the Omicron variant causes less severe forms of COVID-19 but the data is complicated by the fact that vaccinated and convalescent patients are also suffering from COVID-19 and they are partially protected. I don't think it's really known whether naive (unvaccinated and not previously infected) individuals have a milder form of the disease and I know we don't have any data on the long-term effect of Omicron infection. Please let me know of any studies that have been released.

I don't know of any logical connection between the known mutations in Omicron and the severity of COVID-19.


Redd, A. D., Nardin, A., Kared, H., Bloch, E. M., Abel, B., Pekosz, A., Laeyendecker, O., Fehlings, M., Quinn, T. C., and Tobian, A.A. (2021) Minimal cross-over between mutations associated with Omicron variant of SARS-CoV-2 and CD8+ T cell epitopes identified in COVID-19 convalescent individuals. bioRxiv : the preprint server for biology, 2021.12.06.471446. [doi: 10.1101/2021.12.06.471446]

Sun, Y., Lin, W., Dong, W., and Xu, J. (2021) Origin and evolutionary analysis of the SARS-CoV-2 Omicron variant. Journal of Biosafety and Biosecurity. [doi: 10.1016/j.jobb.2021.12.001]

Wei, C., Shan, K. J., Wang, W., Zhang, S., Huan, Q., and Qian, W. (2021) Evidence for a mouse origin of the SARS-CoV-2 Omicron variant. Journal of Genetics and Genomics. [doi: 10.1016/j.jgg.2021.12.003]

Zhang, X., Wu, S., Wu, B. et al. (2021) SARS-CoV-2 Omicron strain exhibits potent capabilities for immune evasion and viral entrance. Sig Transduct Target Ther 6:430. [doi: 10.1038/s41392-021-00852-5]