
Saturday, June 13, 2020

What's in Your Genome? Chapter 3: Repetitive DNA and Mobile Genetic Elements

By the end of chapter 3, readers will be familiar with two main lines of evidence for junk DNA: the C-Value Paradox, and the fact that most of our genome is full of bits and pieces of dead transposons and viruses. They will also understand that this is perfectly consistent with modern evolutionary theory.

Chapter 3: Repetitive DNA and Mobile Genetic Elements
  • Centromeres
  • Telomeres
  • Mobile genetic elements
  • Hidden viruses in your genome
  • What the heck is a transposon?
  • LINES and SINES
  • How much of our genome is composed of transposon-related sequences?
  • Box 3-1: What does the humped bladderwort tell us about junk DNA?
  • Selfish genes and selfish DNA
  • Mitochondria are invading your genome!
  • Selection hypotheses
  • Exaptation and the post hoc fallacy
  • Box 3-2: Natural genetic engineering?
  • If it walks like a duck ...


What's in Your Genome? Chapter 2: The Evolution of Sloppy Genomes

I had to completely reorganize chapter 2 in order to move population genetics closer to the beginning of the book and reduce the number of words.

Chapter 2: The Evolution of Sloppy Genomes
  • Fugu sashimi
  • Variation in genome size
  • The Onion Test
  • Instantaneous genome doubling
  • Modern evolutionary theory
  • Random genetic drift
  • Neutral Theory
  • Nearly-Neutral Theory
  • Box 2-1: Are humans still evolving?
  • Population size and the Drift-Barrier Hypothesis
  • Bacteria have small genomes
  • On the evolution of sloppy genomes



What's in Your Genome? Chapter 1: Introducing Genomes

My book is progressing slowly. The main task is to reduce it to about 120,000 words and that's proving to be a lot more difficult than I imagined.

Here's what's now in Chapter 1: Introducing Genomes
  • The genome war
  • Finishing the human genome sequence
  • What is DNA?
  • The double helix
  • The sequence of all the base pairs was the goal of the human genome project
  • How big is your genome?
  • Packaging DNA: chromatin
  • Transcription
  • Translation
  • The genetic code
  • Introns and exons
  • The history of junk DNA



Thursday, June 11, 2020

Dan Graur proposes a new definition of "gene"

I've thought a lot about how to define the word "gene." It's clear that no definition will capture all the possibilities but that doesn't mean we should abandon the term. Traditionally, the biochemical definition attempts to describe the part of the genome that produces a functional product. Most scientists seem to think that the only possible product is a protein so it's common to see the word "gene" defined as a DNA sequence that produces a protein.

But from the very beginning of molecular biology the textbooks also talked about genes for ribosomal RNAs and tRNAs so there was never a time when knowledgeable scientists restricted their definition of a gene to protein-coding regions. My best molecular definition is described in What Is a Gene?.

A gene is a DNA sequence that is transcribed to produce a functional product.

Dan Graur has also thought about the issue and he comes up with a different definition in a recent blog post: What Is a Gene? A Very Short Answer with a Very Long Footnote

A gene is a sequence of genomic material (DNA or RNA) that has a selected effect function.

This is obviously an attempt to equate "function" with "gene" so that all functional parts of the genome are genes, by definition. You might think this is rather silly because it excludes some obvious functional regions but Dan really does want to count them as genes.
Performance of the function may or may not require the gene to be translated or even transcribed.

Genes can, therefore, be classified into three categories:

(1) protein-coding genes, which are transcribed into RNA and subsequently translated into proteins.

(2) RNA-specifying genes, which are transcribed but not translated

(3) nontranscribed genes.
Really? Is it useful to think of centromeres and telomeres as genes? Is it useful to define an origin of replication as a gene? And what about regulatory sequences? Should each functional binding site for a transcription factor be called a gene?

The definition also leads to some other problems. Genes (my definition) occupy about 30% of the human genome but most of this is introns, which are mostly junk (i.e. no selected effect function). How does that make sense using Dan's definition?


Saturday, April 18, 2020

Three scientists discuss junk DNA

I just found this video that was posted to YouTube in May 2019. It's produced by the University of California and it features three researchers discussing the question, "Is Most of Your DNA Junk!" The three scientists are:
  • Rusty Gage, a neuroscientist at the Salk Institute
  • Alysson Muotri, who studies brain development at the University of California, San Diego
  • Miles Wilkinson, who studies neuronal and germ cell development at the University of California, San Diego
None of them appears to be an expert on genomes or junk DNA. One of them (Wilkinson) seems to have some knowledge of the evidence for junk DNA but many of his explanations are garbled. What's interesting is that they emphasize the fact that some transposon-related sequences are expressed in some cells and they rely on this fact to remain skeptical of junk DNA. They also propose that excess DNA might be present in order to ensure diversity and prepare for future evolution. All three seem to be comfortable with the idea that excess DNA may be protecting the rest of the functional genome.

This is a good example of what we are up against when we try to convince scientists that most of our genome is junk.





Wednesday, April 08, 2020

Alternative splicing: function vs noise

This post is about a recent review of alternative splicing published by my colleague Ben Blencowe in the Dept. of Medical Genetics at the University of Toronto (Toronto, Ontario, Canada). The other author is Jernej Ule of The Francis Crick Institute in London (UK). They are strong supporters of the idea that alternative splicing is a common feature of most human genes.

I am a strong supporter of the idea that most splice variants are due to splicing errors and only a few percent of human genes undergo true alternative splicing.

This is a disagreement about the definition of "function." Is the mere existence of multiple splice variants evidence that they are biologically relevant (functional) or should we demand evidence of function—such as conservation—before accepting such a claim?

Monday, April 06, 2020

The Function Wars Part VII: Function monism vs function pluralism

This post is mostly about a recent paper published in Studies in History and Philosophy of Biol & Biomed Sci where two philosophers present their view of the function wars. They argue that the best definition of function is a weak etiological account (monism) and pluralistic accounts that include causal role (CR) definitions are mostly invalid. Weak etiological monism is the idea that sequence conservation is the best indication of function but that doesn't necessarily imply that the trait arose by natural selection (adaptation); it could have arisen by neutral processes such as constructive neutral evolution.

The paper makes several dubious claims about ENCODE that I want to discuss but first we need a little background.

Background

The ENCODE publicity campaign created a lot of controversy in 2012 because ENCODE researchers claimed that 80% of the human genome is functional. That claim conflicted with all the evidence that had accumulated up to that point in time. Based on their definition of function, the leading ENCODE researchers announced the death of junk DNA and this position was adopted by leading science writers and leading journals such as Nature and Science.

Let's be very clear about one thing. This was a SCIENTIFIC conflict over how to interpret data and evidence. The ENCODE researchers simply ignored a ton of evidence demonstrating that most of our genome is junk. Instead, they focused on the well-known facts that much of the genome is transcribed and that the genome is full of transcription factor binding sites. Neither of these facts was new and both of them had simple explanations: (1) most of the transcripts are spurious transcripts that have nothing to do with function, and (2) random non-functional transcription factor binding sites are expected from our knowledge of DNA binding proteins. The ENCODE researchers ignored these explanations and attributed function to all transcripts and all transcription factor binding sites. That's why they announced that 80% of the genome is functional.

Wednesday, February 12, 2020

Happy Darwin Day! 2020

Charles Darwin, the greatest scientist who ever lived, was born on this day in 1809 [Darwin still spurs tributes, debates] [Happy Darwin Day!] [Darwin Day 2017]. Darwin is mostly famous for two things: (1) he described and documented the evidence for evolution and common descent and (2) he provided a plausible scientific explanation of evolution: the theory of natural selection. He put all this in a book, On the Origin of Species by Means of Natural Selection, published in 1859. That book spurred a revolution in our understanding of the natural world. (You can still buy a first edition copy of the book but it will cost you several hundred thousand dollars.)

Friday, February 07, 2020

The Function Wars Part VI: The problem with selected effect function

The term "Function Wars" refers to the debate over the meaning of 'function,' especially in the context of junk DNA.1 That debate intensified in 2012 after the ENCODE publicity campaign that tried to redefine function to mean anything they want as long as it refutes junk DNA. This is the sixth in a series of posts exploring the debate and why it's important, or not. Links to the other five posts can be found at the bottom of this post.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)
Much of the discussion seems like quibbling over semantics but I'm reminded of a similar debate over the mode of evolution: is it gradual or punctuated? As Gould pointed out in 1982, there's a serious issue underlying the debate—an issue that shouldn't get lost in bickering over the meaning of 'gradualistic.' The same warning applies here. It's important to determine how much of the human genome is junk and that requires an understanding of what we mean by junk DNA. However, it's easy to get distracted by focusing on the exact meaning of the word 'function' instead of looking at the big picture.

Friday, January 31, 2020

lncRNA nonsense from Los Alamos

A group of scientists at the Los Alamos National Laboratory (Los Alamos, NM, USA) and their collaborators in Vienna (Austria) and Lethbridge (Alberta, Canada) have worked out the structure of Braveheart lncRNA from mice.
Kim, D.N., Thiel, B.C., Mrozowich, T., Hennelly, S.P., Hofacker, I.L., Patel, T.R., Sanbonmatsu, K.Y. (2020) Zinc-finger protein CNBP alters the 3-D structure of lncRNA Braveheart in solution. Nat. Commun. 11:148 [doi: 10.1038/s41467-019-13942-4]
The authors point out in their paper that lncRNAs are difficult to work with and the 3D structures of only a small number have been characterized. There's nothing in the paper about the problems associated with determining the functions of lncRNAs and nothing about the number of lncRNAs except for this brief opening statement: "Long non-coding RNAs (lncRNAs) constitute a significant fraction of the transcriptome ..."

Tuesday, January 14, 2020

The Three Domain Hypothesis: RIP

The Three Domain Hypothesis died about twenty years ago but most people didn't notice.

The original idea was promoted by Carl Woese and his colleagues in the early 1980s. It was based on the discovery of archaebacteria as a distinct clade that was different from other bacteria (eubacteria). It also became clear that some eukaryotic genes (e.g. ribosomal RNA) were more closely related to archaebacterial genes and the original data indicated that eukaryotes formed another distinct group separate from either the archaebacteria or eubacteria. This gave rise to the Three Domain Hypothesis where each of the groups, bacteria (Eubacteria), archaebacteria (Archaea), and eukaryotes (Eucarya, Eukaryota), formed a separate clade that contained multiple kingdoms. These clades were called Domains.

Wednesday, January 08, 2020

Are pseudogenes really pseudogenes?

There are many junk DNA skeptics who claim that most of our genome is functional. Some of them have even questioned whether pseudogenes are mostly junk. The latest challenge comes from a recent review in Nature Reviews Genetics where the authors try to place the burden of proof on those who say that pseudogenes are broken, nonfunctional genes (Cheetham et al., 2019). The authors of the review try to make the case that we should not label a DNA sequence as a pseudogene until we can prove that it is truly nonfunctional junk.

I'm about to refute this ridiculous stance but first we need a little background.

Wednesday, January 01, 2020

Remember MOOCs?

We learned back in 2012 that Massive Open Online Courses (MOOCs) were going to transform higher education. People all over the world, especially in underdeveloped nations, would be able to learn from the best university professors while sitting at home in front of their computers. Several companies entered the market with high expectations of earning enormous profits while altruistically educating students who couldn't afford to go to university.

Tuesday, December 31, 2019

Are introns mostly junk?

There are many reasons for thinking that introns are mostly junk DNA.
  1. The size and sequence of introns in related species are not conserved and almost all of the sequences are evolving at the rate expected for neutral substitutions and fixation by drift.
  2. Many species have lost introns or reduced their lengths drastically suggesting that the presence of large introns can be detrimental in some cases (probably large populations).
  3. After decades of searching, there are very few cases where introns and/or parts of introns have been shown to be essential.
  4. Researchers routinely construct intronless versions of eukaryotic genes and they function normally when re-inserted into the genome.
  5. Intron sequences are often littered with transposon and viral sequences that have inserted into the intron and this is not consistent with the idea that intron sequences are important.
  6. About 98% of the introns in modern yeast (Saccharomyces cerevisiae) have been eliminated during evolution from a common ancestor that probably had about 18,000 introns [Yeast loses its introns]. This suggests that there was no selective pressure to retain those introns over the past 100 million years.
  7. About 245/295 of the remaining introns in yeast have been artificially removed by researchers who are constructing an artificial yeast genome suggesting that over 80% of the introns that survived evolutionary loss are also junk [Yeast loses its introns].
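
The arithmetic behind points 6 and 7 is easy to check. Here is a minimal sketch in Python that uses only the figures quoted above. The variable names are illustrative, and it assumes that 295 is the total number of introns remaining in modern yeast.

```python
# Figures quoted in points 6 and 7 above; the variable names are illustrative.
ancestral_introns = 18_000      # estimated introns in the common ancestor
remaining_introns = 295         # assumed total left in modern S. cerevisiae
removed_by_researchers = 245    # introns artificially removed so far

# Fraction of ancestral introns eliminated during evolution (point 6).
fraction_lost = 1 - remaining_introns / ancestral_introns

# Fraction of the surviving introns that researchers have removed (point 7).
fraction_removed = removed_by_researchers / remaining_introns

print(f"lost during evolution: {fraction_lost:.1%}")
print(f"surviving introns removed by researchers: {fraction_removed:.1%}")
```

Under those assumptions the first value comes out a little above 98% and the second a little above 80%, which matches the "about 98%" and "over 80%" in the list.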

Sunday, December 15, 2019

The evolution of citrate synthase

Citrate synthase [EC 2.3.3.1] is one of the key enzymes of the citric acid cycle. It catalyzes the joining of acetyl-CoA and oxaloacetate to produce citrate.
acetyl-CoA + H2O + oxaloacetate → citrate + HS-CoA + H+
We usually think of this reaction in terms of energy production since acetyl-CoA is the end product of glycolysis and the citric acid cycle produces substrates that enter the electron transport system leading to production of ATP. However, it's important to keep in mind that the enzyme also catalyzes the reverse reaction.