Friday, August 07, 2020

Alan McHughen defends his views on junk DNA

Alan McHughen is the author of a recently published book titled DNA Demystified. I took issue with his stance on junk DNA [More misconceptions about junk DNA - what are we doing wrong?] and he has kindly replied to my email message. Here's what he said ...

Thursday, August 06, 2020

More misconceptions about junk DNA - what are we doing wrong?

I'm actively following the views of most science writers on junk DNA to see if they are keeping up on the latest results. The latest book is DNA Demystified by Alan McHughen, a molecular geneticist at the University of California, Riverside. It's published by Oxford University Press, the same publisher that published John Parrington's book The Deeper Genome. Parrington's book was full of misleading and incorrect statements about the human genome so I was anxious to see if Oxford had upped its game.1, 2

You would think that any book with a title like DNA Demystified would contain the latest interpretations of DNA and genomes, especially with a subtitle like "Unraveling the Double Helix." Unfortunately, the book falls far short of its objectives. I don't have time to discuss all of its shortcomings so let's just skip right to the few paragraphs that discuss junk DNA (p.46). I want to emphasize that this is not the main focus of the book. I'm selecting it because it's what I'm interested in and because I want to get a feel for how well accurate scientific information is, or is not, being accepted by practicing scientists. Are we falling for fake news?

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think with such an introduction that you would be about to learn how much of the genome is functional according to ENCODE 3 but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain: the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things in 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work but the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try to find out if ENCODE stands by its previous claim that most of the genome is functional but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.

Saturday, July 11, 2020

The coronavirus life cycle

The coronavirus life cycle is depicted in a figure from Fung and Liu (2019). See below for a brief description.
The virus particle attaches to receptors on the cell surface (mostly ACE2 in the case of SARS-CoV-2). It is taken into the cell by endocytosis and then the viral membrane fuses with the host membrane, releasing the viral RNA. The viral RNA is translated to produce the 1a and 1ab polyproteins, which are cleaved to produce 16 nonstructural proteins (nsps). Most of the nsps assemble to form the replication-transcription complex (RTC). [see Structure and expression of the SARS-CoV-2 (coronavirus) genome]

RTC transcribes the original (+) strand creating (-) strands that are subsequently copied to make more viral (+) strands. RTC also produces a cluster of nine (-) strand subgenomic RNAs (sgRNAs) that are transcribed to make (+) sgRNAs that serve as mRNAs for the production of the structural proteins. N protein (nucleocapsid) binds to the viral (+) strand RNAs to help form new viral particles. The other structural proteins are synthesized in the endoplasmic reticulum (ER) where they assemble to form the protein-membrane virus particle that engulfs the viral RNA.

New virus particles are released when the vesicles fuse with the plasma membrane.

The entire life cycle takes about 10-16 hours and about 100 new virus particles are released before the cell commits suicide by apoptosis.


Fung, T.S. and Liu, D.X. (2019) Human coronavirus: host-pathogen interaction. Annual Review of Microbiology 73:529-557. [doi: 10.1146/annurev-micro-020518-115759]


Thursday, July 09, 2020

Structure and expression of the SARS-CoV-2 (coronavirus) genome


Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was accepted on Jan. 20th and it appeared in the Feb. 3rd issue of Nature (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.1

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10⁻⁶ per nucleotide per replication). The original isolate has a total length of 29,891 nt not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.
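To get a feel for why the isolates are so similar, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted above (a genome of 29,891 nt and a mutation rate of about 10⁻⁶ per nucleotide per replication) and assumes that sites mutate independently, which is a simplification. The variable names are mine and purely illustrative.

```python
# Figures taken from the text above; both are approximate.
GENOME_LENGTH_NT = 29_891   # original isolate, not counting the poly(A) tail
MUTATION_RATE = 1e-6        # per nucleotide per replication

# Expected number of new mutations each time the genome is copied once.
expected_mutations = GENOME_LENGTH_NT * MUTATION_RATE

# Probability that a single copy has no new mutation at all,
# assuming every site mutates independently (simplifying assumption).
p_error_free = (1 - MUTATION_RATE) ** GENOME_LENGTH_NT

print(f"expected mutations per replication: {expected_mutations:.3f}")
print(f"fraction of error-free copies:      {p_error_free:.3f}")
```

Under these assumptions a single round of copying introduces roughly 0.03 new mutations per genome, so about 97% of the copies are identical to their template. That is consistent with the observation that isolates from around the world differ at only a handful of positions.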

Wednesday, July 08, 2020

Where did your chicken come from?

Scientists have sequenced the genomes of modern domesticated chickens and compared them to the genomes of various wild pheasants in southern Asia. It has been known for some time that chickens resemble a species of pheasant called red jungle fowl and this led Charles Darwin to speculate that chickens were domesticated in India. Others have suggested Southeast Asia or China as the site of domestication.

The latest results show that modern chickens probably descend from a subspecies of red jungle fowl that inhabits the region around Myanmar (Wang et al., 2020). The subspecies is Gallus gallus spadiceus and the domesticated chicken subspecies is Gallus gallus domesticus. As you might expect, the two subspecies can interbreed.

The authors looked at a total of 863 genomes of domestic chickens, four species of jungle fowl, and all five subspecies of red jungle fowl. They identified a total of 33.4 million SNPs, which were enough to genetically distinguish between the various species AND the subspecies of red jungle fowl. (Contrary to popular belief, it is quite possible to assign a given genome to a subspecies (race) based entirely on genetic differences.)

The sequence data suggest that chickens were domesticated from wild G. g. spadiceus about 10,000 years ago in the northern part of Southeast Asia. The data also suggest that modern domesticated chickens (G. g. domesticus) from India, Pakistan, and Bangladesh interbred with another subspecies of red jungle fowl (G. g. murghi) after the original domestication. These chickens from South Asia contain substantial contributions from G. g. murghi ranging from 8-22%.

Next time you serve chicken, if someone asks you where it came from you won't be lying if you say it came from Myanmar.


Image credits: BBQ chicken, Creative Common License [Chicken BBQ]
Red Jungle Fowl, Creative Commons License [Red_Junglefowl_-Thailand]
Map: Lawler, A. (2020) Dawn of the chicken revealed in Southeast Asia. Science 368:1411.

Wang, M., Thakur, M., Peng, M. et al. (2020) 863 genomes reveal the origin and domestication of chicken. Cell Res. [doi: 10.1038/s41422-020-0349-y]

Monday, July 06, 2020

A storm of cytokines

Cytokines are a diverse group of small signal proteins that act like hormones to turn on genes in blood cells and cells of the immune system. In COVID-19 the production of cytokines can be over-stimulated to produce a cytokine storm that activates immune cells, producing all kinds of severe, sometimes lethal, effects. There are dozens of different cytokines but they all act in a similar manner. Each one binds to a receptor on the membrane of a target cell and this stimulates the cytoplasmic side of the receptor to activate a transcription factor that enters the nucleus and turns on a specific set of genes. The activation step requires phosphorylation just like dozens of other signalling pathways. (See Morris et al. (2018) for a recent review.)

I was curious about the structures of these cytokines so I looked up a few of them on PDB. Here are three fairly representative structures.



Morris, R., Kershaw, N.J., and Babon, J.J. (2018) The molecular details of cytokine signaling via the JAK/STAT pathway. Protein Science 27:1984-2009. [doi: 10.1002/pro.3519]

Saturday, June 13, 2020

What's in Your Genome? Chapter 3: Repetitive DNA and Mobile Genetic Elements

By the end of chapter 3, readers will be familiar with two main lines of evidence for junk DNA: the C-Value Paradox, and the fact that most of our genome is full of bits and pieces of dead transposons and viruses. They will also understand that this is perfectly consistent with modern evolutionary theory.

Chapter 3: Repetitive DNA and Mobile Genetic Elements
  • Centromeres
  • Telomeres
  • Mobile genetic elements
  • Hidden viruses in your genome
  • What the heck is a transposon?
  • LINES and SINES
  • How much of our genome is composed of transposon-related sequences?
  • BOX 3-1: What does the humped bladderwort tell us about junk DNA?
  • Selfish genes and selfish DNA
  • Mitochondria are invading your genome!
  • Selection hypotheses
  • Exaptation and the post hoc fallacy
  • Box 3-2: Natural genetic engineering?
  • If it walks like a duck ...


What's in Your Genome? Chapter 2: The Evolution of Sloppy Genomes

I had to completely reorganize chapter 2 in order to move population genetics closer to the beginning of the book and reduce the number of words.

Chapter 2: The Evolution of Sloppy Genomes
  • Fugu sashimi
  • Variation in genome size
  • The Onion Test
  • Instantaneous genome doubling
  • Modern evolutionary theory
  • Random genetic drift
  • Neutral Theory
  • Nearly-Neutral Theory
  • Box 2-1: Are humans still evolving?
  • Population size and the Drift-Barrier Hypothesis
  • Bacteria have small genomes
  • On the evolution of sloppy genomes



What's in Your Genome? Chapter 1: Introducing Genomes

My book is progressing slowly. The main task is to reduce it to about 120,000 words and that's proving to be a lot more difficult than I imagined.

Here's what's now in Chapter 1: Introducing Genomes
  • The genome war
  • Finishing the human genome sequence
  • What is DNA?
  • The double helix
  • The sequence of all the base pairs was the goal of the human genome project
  • How big is your genome?
  • Packaging DNA: chromatin
  • Transcription
  • Translation
  • The genetic code
  • Introns and exons
  • The history of junk DNA



Thursday, June 11, 2020

Dan Graur proposes a new definition of "gene"

I've thought a lot about how to define the word "gene." It's clear that no definition will capture all the possibilities but that doesn't mean we should abandon the term. Traditionally, the biochemical definition attempts to describe the part of the genome that produces a functional product. Most scientists seem to think that the only possible product is a protein so it's common to see the word "gene" defined as a DNA sequence that produces a protein.

But from the very beginning of molecular biology the textbooks also talked about genes for ribosomal RNAs and tRNAs so there was never a time when knowledgeable scientists restricted their definition of a gene to protein-coding regions. My best molecular definition is described in What Is a Gene?.

A gene is a DNA sequence that is transcribed to produce a functional product.

Dan Graur has also thought about the issue and he comes up with a different definition in a recent blog post: What Is a Gene? A Very Short Answer with a Very Long Footnote

A gene is a sequence of genomic material (DNA or RNA) that has a selected effect function.

This is obviously an attempt to equate "function" with "gene" so that all functional parts of the genome are genes, by definition. You might think this is rather silly because it excludes some obvious functional regions but Dan really does want to count them as genes.
Performance of the function may or may not require the gene to be translated or even transcribed.

Genes can, therefore, be classified into three categories:

(1) protein-coding genes, which are transcribed into RNA and subsequently translated into proteins.

(2) RNA-specifying genes, which are transcribed but not translated

(3) nontranscribed genes.
Really? Is it useful to think of centromeres and telomeres as genes? Is it useful to define an origin of replication as a gene? And what about regulatory sequences? Should each functional binding site for a transcription factor be called a gene?

The definition also leads to some other problems. Genes (my definition) occupy about 30% of the human genome but most of this is introns, which are mostly junk (i.e. no selected effect function). How does that make sense using Dan's definition?


Saturday, April 18, 2020

Three scientists discuss junk DNA

I just found this video that was posted to YouTube in May 2019. It's produced by the University of California and it features three researchers discussing the question, "Is Most of Your DNA Junk?" The three scientists are:
  • Rusty Gage, a neuroscientist at the Salk Institute
  • Alysson Muotri, who studies brain development at the University of California, San Diego
  • Miles Wilkinson, who studies neuronal and germ cell development at the University of California, San Diego
None of them appear to be experts on genomes or junk DNA. One of them (Wilkinson) appears to have some knowledge of the evidence for junk DNA, but many of his explanations are garbled. What's interesting is that they emphasize the fact that some transposon-related sequences are expressed in some cells and they rely on this fact to remain skeptical of junk DNA. They also propose that excess DNA might be present in order to ensure diversity and prepare for future evolution. All three seem to be comfortable with the idea that excess DNA may be protecting the rest of the functional genome.

This is a good example of what we are up against when we try to convince scientists that most of our genome is junk.





Wednesday, April 08, 2020

Alternative splicing: function vs noise

This post is about a recent review of alternative splicing published by my colleague Ben Blencowe in the Dept. of Medical Genetics at the University of Toronto (Toronto, Ontario, Canada). (The other author is Jernej Ule of The Francis Crick Institute in London (UK).) They are strong supporters of the idea that alternative splicing is a common feature of most human genes.

I am a strong supporter of the idea that most splice variants are due to splicing errors and only a few percent of human genes undergo true alternative splicing.

This is a disagreement about the definition of "function." Is the mere existence of multiple splice variants evidence that they are biologically relevant (functional) or should we demand evidence of function—such as conservation—before accepting such a claim?

Monday, April 06, 2020

The Function Wars Part VII: Function monism vs function pluralism

This post is mostly about a recent paper published in Studies in History and Philosophy of Biol & Biomed Sci where two philosophers present their view of the function wars. They argue that the best definition of function is a weak etiological account (monism) and pluralistic accounts that include causal role (CR) definitions are mostly invalid. Weak etiological monism is the idea that sequence conservation is the best indication of function but that doesn't necessarily imply that the trait arose by natural selection (adaptation); it could have arisen by neutral processes such as constructive neutral evolution.

The paper makes several dubious claims about ENCODE that I want to discuss but first we need a little background.

Background

The ENCODE publicity campaign created a lot of controversy in 2012 because ENCODE researchers claimed that 80% of the human genome is functional. That claim conflicted with all the evidence that had accumulated up to that point in time. Based on their definition of function, the leading ENCODE researchers announced the death of junk DNA and this position was adopted by leading science writers and leading journals such as Nature and Science.

Let's be very clear about one thing. This was a SCIENTIFIC conflict over how to interpret data and evidence. The ENCODE researchers simply ignored a ton of evidence demonstrating that most of our genome is junk. Instead, they focused on the well-known facts that much of the genome is transcribed and that the genome is full of transcription factor binding sites. Neither of these facts was new and both of them had simple explanations: (1) most of the transcripts are spurious transcripts that have nothing to do with function, and (2) random non-functional transcription factor binding sites are expected from our knowledge of DNA binding proteins. The ENCODE researchers ignored these explanations and attributed function to all transcripts and all transcription factor binding sites. That's why they announced that 80% of the genome is functional.

Wednesday, February 12, 2020

Happy Darwin Day! 2020

Charles Darwin, the greatest scientist who ever lived, was born on this day in 1809 [Darwin still spurs tributes, debates] [Happy Darwin Day!] [Darwin Day 2017]. Darwin is mostly famous for two things: (1) he described and documented the evidence for evolution and common descent and (2) he provided a plausible scientific explanation of evolution—the theory of natural selection. He put all this in a book, The Origin of Species by Means of Natural Selection published in 1859—a book that spurred a revolution in our understanding of the natural world. (You can still buy a first edition copy of the book but it will cost you several hundred thousand dollars.)

Friday, February 07, 2020

The Function Wars Part VI: The problem with selected effect function

The term "Function Wars" refers to the debate over the meaning of 'function,' especially in the context of junk DNA.1 That debate intensified in 2012 after the ENCODE publicity campaign that tried to redefine function to mean anything they want as long as it refutes junk DNA. This is the sixth in a series of posts exploring the debate and why it's important, or not. Links to the other five posts can be found at the bottom of this post.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)
Much of the discussion seems like quibbling over semantics but I'm reminded of a similar debate over the mode of evolution: is it gradual or punctuated? As Gould pointed out in 1982, there's a serious issue underlying the debate—an issue that shouldn't get lost in bickering over the meaning of 'gradualistic.' The same warning applies here. It's important to determine how much of the human genome is junk and that requires an understanding of what we mean by junk DNA. However, it's easy to get distracted by focusing on the exact meaning of the word 'function' instead of looking at the big picture.

Friday, January 31, 2020

lncRNA nonsense from Los Alamos

A group of scientists at the Los Alamos National Laboratory (Los Alamos, NM, USA) and their collaborators in Vienna (Austria) and Lethbridge (Alberta, Canada) have worked out the structure of Braveheart lncRNA from mice.
Kim, D.N., Thiel, B.C., Mrozowich, T., Hennelly, S.P., Hofacker, I.L., Patel, T.R., Sanbonmatsu, K.Y. (2020) Zinc-finger protein CNBP alters the 3-D structure of lncRNA Braveheart in solution. Nat. Commun. 11:148 [doi: 10.1038/s41467-019-13942-4]
The authors point out in their paper that lncRNAs are difficult to work with and the 3D structures of only a small number have been characterized. There's nothing in the paper about the problems associated with determining the functions of lncRNAs and nothing about the number of lncRNAs except for this brief opening statement: "Long non-coding RNAs (lncRNAs) constitute a significant fraction of the transcriptome ..."

Tuesday, January 14, 2020

The Three Domain Hypothesis: RIP

The Three Domain Hypothesis died about twenty years ago but most people didn't notice.

The original idea was promoted by Carl Woese and his colleagues in the early 1980s. It was based on the discovery of archaebacteria as a distinct clade that was different from other bacteria (eubacteria). It also became clear that some eukaryotic genes (e.g. ribosomal RNA) were more closely related to archaebacterial genes and the original data indicated that eukaryotes formed another distinct group separate from either the archaebacteria or eubacteria. This gave rise to the Three Domain Hypothesis where each of the groups, bacteria (Eubacteria), archaebacteria (Archaea), and eukaryotes (Eucarya, Eukaryota), formed a separate clade that contained multiple kingdoms. These clades were called Domains.

Wednesday, January 08, 2020

Are pseudogenes really pseudogenes?

There are many junk DNA skeptics who claim that most of our genome is functional. Some of them have even questioned whether pseudogenes are mostly junk. The latest challenge comes from a recent review in Nature Reviews Genetics where the authors try to place the burden of proof on those who say that pseudogenes are broken, nonfunctional genes (Cheetham et al., 2019). The authors of the review try to make the case that we should not label a DNA sequence as a pseudogene until we can prove that it is truly nonfunctional junk.

I'm about to refute this ridiculous stance but first we need a little background.

Wednesday, January 01, 2020

Remember MOOCs?

We learned back in 2012 that Massive Open Online Courses (MOOCs) were going to transform higher education. People all over the world, especially in underdeveloped nations, would be able to learn from the best university professors while sitting at home in front of their computers. Several companies entered the market with high expectations of earning enormous profits while altruistically educating students who couldn't afford to go to university.

Tuesday, December 31, 2019

Are introns mostly junk?

There are many reasons for thinking that introns are mostly junk DNA.
  1. The size and sequence of introns in related species are not conserved and almost all of the sequences are evolving at the rate expected for neutral substitutions and fixation by drift.
  2. Many species have lost introns or reduced their lengths drastically suggesting that the presence of large introns can be detrimental in some cases (probably large populations).
  3. After decades of searching, there are very few cases where introns and/or parts of introns have been shown to be essential.
  4. Researchers routinely construct intronless versions of eukaryotic genes and they function normally when re-inserted into the genome.
  5. Intron sequences are often littered with transposon and viral sequences that have inserted into the intron and this is not consistent with the idea that intron sequences are important.
  6. About 98% of the introns in modern yeast (Saccharomyces cerevisiae) have been eliminated during evolution from a common ancestor that probably had about 18,000 introns [Yeast loses its introns]. This suggests that there was no selective pressure to retain those introns over the past 100 million years.
  7. About 245 of the 295 remaining introns in yeast have been artificially removed by researchers who are constructing an artificial yeast genome, suggesting that over 80% of the introns that survived evolutionary loss are also junk [Yeast loses its introns].
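The percentages in the last two points are easy to check. The short Python sketch below uses only the counts quoted in the list (about 18,000 ancestral introns, 295 remaining, 245 removed by researchers); the ancestral count is itself an estimate, so treat the result as approximate. The variable names are just labels I chose for illustration.

```python
# Counts quoted in the list above; the ancestral figure is an estimate.
ANCESTRAL_INTRONS = 18_000     # probable number in the common ancestor
REMAINING_INTRONS = 295        # introns left in modern S. cerevisiae
REMOVED_BY_RESEARCHERS = 245   # introns deleted in the artificial genome project

# Fraction of ancestral introns lost during evolution (point 6).
fraction_lost = 1 - REMAINING_INTRONS / ANCESTRAL_INTRONS

# Fraction of the surviving introns that could be removed artificially (point 7).
fraction_removed = REMOVED_BY_RESEARCHERS / REMAINING_INTRONS

print(f"lost during evolution:  {fraction_lost:.1%}")
print(f"removed by researchers: {fraction_removed:.1%}")
```

The first figure comes out a little over 98% and the second at about 83%, which matches "about 98%" and "over 80%" in the list.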

Sunday, December 15, 2019

The evolution of citrate synthase

Citrate synthase [EC 2.3.3.1] is one of the key enzymes of the citric acid cycle. It catalyzes the joining of acetyl-CoA and oxaloacetate to produce citrate.
acetyl-CoA + H₂O + oxaloacetate → citrate + HS-CoA + H⁺
We usually think of this reaction in terms of energy production since acetyl-CoA is the end product of glycolysis and the citric acid cycle produces substrates that enter the electron transport system leading to production of ATP. However, it's important to keep in mind that the enzyme also catalyzes the reverse reaction.

Friday, December 13, 2019

The "standard" view of junk DNA is completely wrong

I was browsing the table of contents of the latest issue of Cell and I came across this ....
For decades, the miniscule protein-coding portion of the genome was the primary focus of medical research. The sequencing of the human genome showed that only ∼2% of our genes ultimately code for proteins, and many in the scientific community believed that the remaining 98% was simply non-functional “junk” (Mattick and Makunin, 2006; Slack, 2006). However, the ENCODE project revealed that the non-protein coding portion of the genome is copied into thousands of RNA molecules (Djebali et al., 2012; Gerstein et al., 2012) that not only regulate fundamental biological processes such as growth, development, and organ function, but also appear to play a critical role in the whole spectrum of human disease, notably cancer (for recent reviews, see Adams et al., 2017; Deveson et al., 2017; Rupaimoole and Slack, 2017).

Slack, F.J. and Chinnaiyan, A.M. (2019) The Role of Non-coding RNAs in Oncology. Cell 179:1033-1055 [doi: 10.1016/j.cell.2019.10.017]
Cell is a high-impact, refereed journal so we can safely assume that this paper was reviewed by reputable scientists. This means that the view expressed in the paragraph above did not raise any alarm bells when the paper was reviewed. The authors clearly believe that what they are saying is true and so do many other reputable scientists. This seems to be the "standard" view of junk DNA among scientists who do not understand the facts or the debate surrounding junk DNA and pervasive transcription.

Here are some of the obvious errors in the statement.
  1. The sequencing of the human genome did NOT show that only ~2% of our genome consisted of coding region. That fact was known almost 50 years ago and the human genome sequence merely confirmed it.
  2. No knowledgeable scientist ever thought that the remaining 98% of the genome was junk—not in 1970 and not in any of the past fifty years.
  3. The ENCODE project revealed that much of our genome is transcribed at some time or another but it is almost certainly true that the vast majority of these low-abundance, non-conserved, transcripts are junk RNA produced by accidental transcription.
  4. The existence of noncoding RNAs such as ribosomal RNA and tRNA was known in the 1960s, long before ENCODE. The existence of snoRNAs, snRNAs, regulatory RNAs, and various catalytic RNAs was known in the 1980s, long before ENCODE. Other RNAs such as miRNAs, piRNAs, and siRNAs were well known in the 1990s, long before ENCODE.
How did this false view of our genome become so widespread? It's partially because of the now highly discredited ENCODE publicity campaign orchestrated by Nature and Science but that doesn't explain everything. The truth is out there in peer-reviewed scientific publications but scientists aren't reading those papers. They don't even realize that their standard view has been seriously challenged. Why?


Monday, October 21, 2019

The evolution of de novo genes

De novo genes are new genes that arise spontaneously from junk DNA [De novo gene birth]. The frequency of de novo gene creation is important for an understanding of evolution. If it's a frequent event, then species with a large amount of junk DNA might have a selective advantage over species with less junk DNA, especially in a changing environment.

Last week I read a short Nature article on de novo genes [Levy, 2019] and I think the subject deserves more attention. Most new genes in a species appear to arise by gene duplication and subsequent divergence but de novo genes are genes that are unrelated to genes in any other clade so we can assume that they are created from junk DNA that accidentally becomes associated with a promoter causing the DNA to be transcribed. A new gene is formed if the RNA acquires a function. If the transcript contains an open reading frame then it may be translated to produce a polypeptide and if the polypeptide performs a new function then the resulting de novo gene is a new protein-coding gene.

The important question is whether the evolution of de novo genes is a common event or a rare event.

Tuesday, September 24, 2019

How many protein-coding genes in the human genome? (2)

It's difficult to know how many protein-coding genes there are in the human genome because there are several different ways of counting and the counts depend on what criteria are used to identify a gene. Last year I commented on a review by Abascal et al. (2018) that concluded there were somewhere between 19,000 and 20,000 protein-coding genes. Those authors discussed the problems with annotation and pointed out that the major databases don't agree on the number of genes [How many protein-coding genes in the human genome?].

Wednesday, September 11, 2019

Gerald Fink promotes a new definition of a gene

This is the 2019 Killian lecture at MIT, delivered in April 2019 by Gerald Fink. Fink is an eminent scientist who has done excellent work on the molecular biology of yeast. He was director of the prestigious Whitehead Institute at MIT from 1990-2001. With those credentials you would expect to watch a well-informed presentation of the latest discoveries in molecular genetics. Wouldn't you?



Sunday, September 08, 2019

Contingency, selection, and the long-term evolution experiment

I'm a big fan of Richard Lenski's long-term evolution experiment (LTEE) and of Zachary Blount's work in particular. [Strolling around slopes and valleys in the adaptive landscape] [On the unpredictability of evolution and potentiation in Lenski's long-term evolution experiment] [Lenski's long-term evolution experiment: the evolution of bacteria that can use citrate as a carbon source]

The results of the LTEE raise some interesting questions about evolution. The Lenski experiment began with 12 (almost) identical cultures and these have now "evolved" for 31 years and more than 65,000 generations. All of the cultures have diverged to some extent and one of them (and only one) has developed the ability to use citrate as a carbon source. Many of the cultures exhibit identical, or very similar, mutations that have reached significant frequencies, or even fixation, in the cultures.

Several other laboratory evolution experiments have been completed or are underway in various labs around the world. The overall results are relevant to a discussion about the role of contingency and accident in the history of life [see Evolution by Accident]. Is it true that if you replay the tape of life the results will be quite different? [Replaying life's tape].

Friday, August 30, 2019

Evolution by Accident

Evolution by Accident
v1.43 ©2006 Laurence A. Moran

This essay has been transferred here from an old server that has been decommissioned.

Modern concepts of evolutionary change are frequently attacked by those who find the notions of randomness, chance, and accident to be highly distasteful. Some of these critics are intelligent design creationists and their objections have been refuted elsewhere. In this essay I'm more concerned about my fellow evolutionists who go to great lengths to eliminate chance and accident from all discussions about the fundamental causes of evolution. This is my attempt to convince them that evolution is not as predictable as they claim. I was originally stimulated to put my ideas down on paper when I read essays by John Wilkins [Evolution and Chance] and Loren Haarsma [Chance from a Theistic Perspective] on the TalkOrigins Archive.

The privilege of living beings is the possession of a structure and of a mechanism which ensures two things: (i) reproduction true to type of the structure itself, and (ii) reproduction equally true to type, of any accident that occurs in the structure. Once you have that, you have evolution, because you have conservation of accidents. Accidents can then be recombined and offered to natural selection to find out if they are of any meaning or not.
Jacques Monod (1974) p.394
The main conclusion of this essay is that a large part of ongoing evolution is determined by stochastic events that might as well be called "chance" or "random." Furthermore, a good deal of the past history of life on Earth was the product of chance events, or accidents, that could not have been predicted. When I say "evolution by accident" I'm referring to all these events. This phrase is intended solely to distinguish "accidental" evolution from that which is determined by non-random natural selection. I will argue that evolution is fundamentally a random process, although this should not be interpreted to mean that all of evolution is entirely due to chance or accident. The end result of evolution by accident is modern species that do not look designed.

Tuesday, August 27, 2019

First complete sequence of a human chromosome

A paper announcing the first complete sequence of a human chromosome has recently been posted on the bioRxiv server.

Miga, K. H., Koren, S., Rhie, A., Vollger, M. R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G. A., et al. (2019) Telomere-to-telomere assembly of a complete human X chromosome. bioRxiv, 735928. [doi: 10.1101/735928]

Abstract: After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38, along with the first gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the ∼2.8 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequence from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE). This complete chromosome X, combined with the ultra-long nanopore data, also allowed us to map methylation patterns across complex tandem repeats and satellite arrays for the first time. These results demonstrate that finishing the human genome is now within reach and will enable ongoing efforts to complete the remaining human chromosomes.

Sunday, August 25, 2019

How much of the human genome has been sequenced?

It's been more than seven years since I posted information on how much of the human genome has been sequenced [How Much of Our Genome Is Sequenced?]. At that time, the latest version of the human reference genome was GRCh37.p7 (Feb. 3, 2012) and 89.6% of the genome had been sequenced. It's time to update that information.

We have a pretty good idea of the size of the human genome based on quantitative Feulgen staining (1940-1980) and reassociation kinetic experiments from the 1970s (Morton, 1991). We can safely assume that the correct size of the human genome is close to 3,200,000,000 bp (3,200,000 kb, 3,200 Mb, 3.2 Gb) [How Big Is the Human Genome?]. That's the value cited most often in the literature. However, the actual values calculated by Morton (1991) were 3.227 Gb for the haploid female genome and less than that for the haploid male genome. The human reference genome contains all 22 autosomes plus one copy of the X chromosome and one copy of the Y chromosome. This gives a total of 3.286 Gb.
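
As a quick check on the unit arithmetic above, here is a minimal Python sketch. The function and variable names are my own, chosen only for illustration; the two genome sizes are the ones quoted in the paragraph (the commonly cited 3.2 Gb and the 3.286 Gb total given for the reference genome).

```python
# Illustrative sketch only: names are hypothetical, values come from the text.

CITED_GENOME_BP = 3_200_000_000      # commonly cited haploid genome size
REFERENCE_TOTAL_BP = 3_286_000_000   # 22 autosomes + X + Y, as quoted above


def genome_size_units(bp):
    """Express a size in base pairs as bp, kb, Mb, and Gb."""
    return {
        "bp": bp,
        "kb": bp / 1e3,
        "Mb": bp / 1e6,
        "Gb": bp / 1e9,
    }


units = genome_size_units(CITED_GENOME_BP)

# Difference between the reference total (which includes both X and Y)
# and the commonly cited value, computed in whole base pairs.
difference_bp = REFERENCE_TOTAL_BP - CITED_GENOME_BP

for name, value in units.items():
    print(f"{name}: {value:,}")
print(f"difference: {difference_bp / 1e6:.0f} Mb")
```

Doing the subtraction in whole base pairs, rather than in Gb, avoids floating-point rounding in the difference.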

Thursday, August 22, 2019

Reactionary fringe meets mutation-biased adaptation.
7. Going forward

This is the last of a series of posts by Arlin Stoltzfus on the role of mutation as a dispositional factor in evolution. Arlin has established that the role of mutation in evolution is much more important than most people realize. He has also built a strong case for the influence of mutation bias. How should we incorporate these concepts into modern evolutionary theory?

Click on the links in the box (below) to see the other posts in the series.



Reactionary fringe meets mutation-biased adaptation.
7. Going forward

by Arlin Stoltzfus

Haldane (1922) argued that, because mutation is a weak pressure easily overcome by selection, the potential for biases in variation to influence evolution depends on neutral evolution or high mutation rates. This theory, like the Modern Synthesis of 1959, depends on the assumption that evolution begins with pre-existing variation. By contrast, when evolution depends on the introduction of new variants, mutational and developmental biases in variation may impose biases on evolution, without requiring neutral evolution or high mutation rates.