
Monday, February 12, 2018

Scientists fight back against fake news and pseudoscience

You probably know that climate change is real and humans are a major cause of global warming. You probably know that life has evolved and the Biblical story of creation is false. Scientists have been actively promoting these ideas for decades and they've been relatively successful in most countries. What you may not know is that these are just two of the many controversial claims that scientists are fighting. You may even have been tricked into believing some of the other pseudoscientific claims that are out there.

Dirty bacteria

Did you know that the dirt in your local park is full of bacteria? Each scoop of soil contains millions of bacteria. And it's not just your local park; soil bacteria are everywhere. This is part of the reason why the total mass of bacteria on the planet outweighs all of the eukaryotes combined, including elephants and whales.

There are hundreds of different species of bacteria in your local dirt. They are as different from each other as moose and mushrooms.

Did you ever wonder whether the bacteria in Australian soil are similar to the bacteria in Austrian soil? Delgado-Baquerizo and his colleagues did, so they tested soils from all over the world. The results are published in a recent issue of Science (Delgado-Baquerizo et al., 2018).

The answer is yes ... and no. They looked at 237 locations on all continents except Antarctica. Most samples had about 1000 different species—the authors call them "phylotypes" because it's hard to define what a species is in bacteria. Only a small number of species (phylotypes) were found in all locations (511 out of 25,224 = 2%) but they accounted for almost half of the total mass. Here's how the authors describe their result ...
Together, our results suggest that soil bacterial communities, like plant communities, are typically dominated by a relatively small subset of phylotypes.
Most of those 511 dominant phylotypes fall into two large and diverse clades (phyla?): Proteobacteria and Actinobacteria. The distribution is shown in Figure 1 of the paper (left). It illustrates a little-known fact about bacteria; namely, that they are a very diverse group. Scientists are only beginning to explore this diversity. Only 18% of the 511 dominant phylotypes were previously known to science!
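The proportions quoted above are easy to check. The short Python sketch below only reuses the numbers given in the post (511, 25,224 and 18%); the variable names are made up for illustration.

```python
# Numbers quoted in the post (Delgado-Baquerizo et al., 2018).
total_phylotypes = 25_224
dominant_phylotypes = 511
fraction_previously_known = 0.18

# Share of all phylotypes that are "dominant" (the post rounds this to 2%).
dominant_share = dominant_phylotypes / total_phylotypes

# Rough count of dominant phylotypes that were already known to science.
previously_known = dominant_phylotypes * fraction_previously_known

print(f"dominant share of phylotypes: {dominant_share:.1%}")
print(f"dominant phylotypes previously known: about {previously_known:.0f}")
```

In other words, roughly 2% of the phylotypes are dominant, and only about 92 of those 511 had been described before.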




Image Credit: Bacillus Sp. soil bacteria from The ecology of soil-borne human diseases

Delgado-Baquerizo, M., Oliverio, A.M., Brewer, T.E., Benavent-González, A., Eldridge, D.J., Bardgett, R.D., Maestre, F.T., Singh, B.K., and Fierer, N. (2018) A global atlas of the dominant bacteria found in soil. Science, 359(6373), 320-325. doi: 10.1126/science.aap9516

Happy Darwin Day 2018!

Charles Darwin, the greatest scientist who ever lived, was born on this day in 1809 [Darwin still spurs tributes, debates] [Happy Darwin Day!] [Darwin Day 2017]. Darwin is mostly famous for two things: (1) he described and documented the evidence for evolution and common descent and (2) he provided a plausible scientific explanation of evolution—the theory of natural selection. He put all this in a book, The Origin of Species by Means of Natural Selection published in 1859—a book that spurred a revolution in our understanding of the natural world.

Modern evolutionary theory has advanced well beyond Darwin's theory but he still deserves to be honored for being the first to explain evolution and promote it in a way that convinced others. Here's one passage from the introduction to Origin of Species.
Although much remains obscure, and will long remain obscure, I can entertain no doubt, after the most deliberate and dispassionate study of which I am capable, that the view which most naturalists entertain, and which I formerly entertained—namely, that each species has been independently created—is erroneous. I am fully convinced that species are not immutable; but that those belonging to what are called the same genera are lineal descendants of some other and generally extinct species, in the same manner as the acknowledged varieties of any one species are the descendants of that species. Furthermore, I am convinced that Natural Selection has been the main but not exclusive means of modification.


One philosopher's view of random genetic drift

Random genetic drift is the process whereby some allele frequencies change in a population by chance alone. The alleles are not being fixed or eliminated by natural selection. Most of the alleles affected by drift are neutral or nearly neutral with respect to selection. Some are deleterious, in which case they may be accidentally fixed in spite of being selected against. Modern evolutionary theory incorporates random genetic drift as part of population genetics and modern textbooks contain extensive discussions of drift and the influence of population size. The scientific literature has focused recently on the Drift-Barrier Hypothesis, which emphasizes random genetic drift [Learning about modern evolutionary theory: the drift-barrier hypothesis].

Most of the alleles that become fixed in a population are fixed by random genetic drift and not by natural selection. Thus, in a very real sense, drift is the dominant mechanism of evolution. This is especially true in species with large genomes full of junk DNA (like humans) since the majority of alleles occur in junk DNA where they are, by definition, neutral.1 All of the data documenting drift and confirming its importance was discovered by scientists. All of the hypotheses and theories of modern evolution were, and are, developed by scientists.
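For readers who want to see drift in action, here is a minimal Wright-Fisher simulation in Python. It is a sketch under simple textbook assumptions (constant population size, non-overlapping generations, a strictly neutral allele, random mating) and is not taken from any of the papers discussed here. The function names and the tiny population size are hypothetical choices for illustration.

```python
import random


def drift_to_absorption(pop_size, start_copies, rng):
    """Follow one neutral allele under Wright-Fisher drift until lost or fixed.

    pop_size is the number of diploid individuals, so there are 2 * pop_size
    gene copies. Returns True if the allele was fixed, False if it was lost.
    """
    total_copies = 2 * pop_size
    copies = start_copies
    while 0 < copies < total_copies:
        freq = copies / total_copies
        # Each gene copy in the next generation is drawn at random from the
        # current gene pool. No selection coefficient appears anywhere.
        copies = sum(rng.random() < freq for _ in range(total_copies))
    return copies == total_copies


def fixation_fraction(pop_size, start_copies, replicates, seed=1):
    """Fraction of replicate populations in which the allele becomes fixed."""
    rng = random.Random(seed)
    fixed = sum(
        drift_to_absorption(pop_size, start_copies, rng)
        for _ in range(replicates)
    )
    return fixed / replicates


if __name__ == "__main__":
    n = 10  # hypothetical, deliberately tiny population
    print("new mutation (1 copy):", fixation_fraction(n, 1, 4000))
    print("allele at 50%:", fixation_fraction(n, n, 4000))
```

In this model the chance that a neutral allele is eventually fixed equals its starting frequency, so a brand new mutation should be fixed in about 1/(2N) of the replicates and an allele at 50% in about half of them. No selection is involved at any step.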

Nothing in biology makes sense except in the light of population genetics.

Michael Lynch
You might be wondering why I bother to state the obvious; after all, this is the 21st century and everyone who knows about evolution should know about random genetic drift. Well, as it turns out, there are some people who continue to make silly statements about evolution and I need to set the record straight.

One of those people is Massimo Pigliucci, a former scientist who's currently more interested in the philosophy of science. We've encountered him before on Sandwalk [Massimo Pigliucci tries to defend accommodationism (again): result is predictable] [Does Philosophy Generate Knowledge?] [Proponents of the Extended Evolutionary Synthesis (EES) explain their logic using the Central Dogma as an example]. It looks like Pigliucci doesn't have a firm grip on modern evolutionary theory.

His main beef isn't with evolutionary biology. He's mostly upset about the fact that science as a way of knowing is extraordinarily successful whereas philosophy isn't producing many results. He loves to attack any scientist who points out this obvious fact. He accuses them of "scientism" as though that's all it takes to make up for the lack of success of philosophy. His latest rant appears on the Blog of the American Philosophical Association: The Problem with Scientism.

I'm not going to deal with the main part of his article because it's already been covered many times. However, there was one part that caught my eye. That's the part where he lists questions that science (supposedly) can't answer. The list is interesting. Pigliucci says,
Next to last, comes an attitude that seeks to deploy science to answer questions beyond its scope. It seems to me that it is exceedingly easy to come up with questions that either science is wholly unequipped to answer, or for which it can at best provide a (welcome!) degree of relevant background knowledge. I will leave it to colleagues in other disciplines to arrive at their own list, but as far as philosophy is concerned, the following list is just a start:
  • In metaphysics: what is a cause?
  • In logic: is modus ponens a type of valid inference?
  • In epistemology: is knowledge “justified true belief”?
  • In ethics: is abortion permissible once the fetus begins to feel pain?
  • In aesthetics: is there a meaningful difference between Mill’s “low” and “high” pleasures?
  • In philosophy of science: what role does genetic drift play in the logical structure of evolutionary theory?
  • In philosophy of mathematics: what is the ontological status of mathematical objects, such as numbers?
[my emphasis LAM]
Before getting to random genetic drift, I'll just note that my main problem with Pigliucci's argument is that there are other definitions of science that render his discussion meaningless. For example, I prefer the broad definition of science, the one that encompasses several of Pigliucci's questions [Alan Sokal explains the scientific worldview][Territorial demarcation and the meaning of science]. The second point is that no matter how you define knowledge, philosophers haven't been very successful at adding to our knowledge base. They're good at questions (see above) but not so good at answers. Thus, it's reasonable to claim that science (broad definition) is the only proven method of acquiring knowledge. If that's scientism then I think it's a good working hypothesis.

Now back to random genetic drift. Did you notice that one of the questions that science is "wholly unequipped" to answer is the following: "what role does genetic drift play in the logical structure of evolutionary theory?" Really?

Pigliucci goes on to explain what he means ...
The scientific literature on all the above is basically non-existent, while the philosophical one is huge. None of the above questions admits of answers arising from systematic observations or experiments. While empirical notions may be relevant to some of them (e.g., the one on abortion), it is philosophical arguments that provide the suitable approach.
I hardly know what to say.

How many of you believe that the following statements are true with respect to random genetic drift and evolutionary theory?
  1. The scientific literature on all the above is basically non-existent.
  2. The philosophical literature is huge.
  3. The question does not admit of answers arising from systematic observations or experiments.
  4. It is philosophical arguments that provide the suitable approach.


1. There are some very rare exceptions where a mutation in junk DNA may have detrimental effects.

Saturday, February 10, 2018

We live in the age of bacteria

I'm sad because we now have almost a whole generation of young people who know very little about Stephen Jay Gould. (He died of cancer in 2002.) I was thinking of this yesterday as I was preparing a post on bacteria. Gould's 1996 book, Full House, is about fundamental misconceptions of evolution and progress and it contains the following passage (p. 176) ...

We live now in the "Age of Bacteria." Our planet has always been in the "Age of Bacteria," ever since the first fossils—bacteria, of course—were entombed in rocks more than three and a half billion years ago.

On any possible, reasonable, or fair criterion, bacteria are—and always have been—the dominant forms of life on earth.
Listen to him make this point twenty years ago ...



Friday, February 09, 2018

Junior scientist snowflakes

A recent letter in Nature draws attention to a serious (?) problem in modern society; namely, the persecution of junior scientists by older scientists who ask them tough questions. Anand Kumar Sharma warns us: "Don’t belittle junior researchers in meetings". Here's what he says, ...

The most interesting part of a scientific seminar, colloquium or conference for me is the question and answer session. However, I find it upsetting to witness the unnecessarily hard time that is increasingly given to junior presenters at such meetings. As inquisitive scientists, we do not have the right to undermine or denigrate the efforts of fellow researchers — even when their reply is unconvincing.

It is our responsibility to nurture upcoming researchers. Firing at a speaker from the front row is unlikely to enhance discussions. In my experience, it is more productive to offer positive queries and suggestions, and save negative feedback for more-private settings.

Are splice variants functional or noise?

This is a post about alternative splicing. I've avoided using that term in the title because it's very misleading. Alternative splicing produces a number of different products (RNA or protein) from a single intron-containing gene. The phenomenon has been known for 35 years and there are quite a few very well-studied examples, including several where all of the splice regulatory factors have been characterized.

Wednesday, February 07, 2018

The Salzburg sixty discuss a new paradigm in genetic variation

Sixty evolutionary biologists are going to meet next July in Salzburg (Austria) to discuss "a new paradigmatic understanding of genetic novelty" [Evolution – Genetic Novelty/Genomic Variations by RNA Networks and Viruses]. You probably didn't know that a new paradigm is necessary. That's because you didn't know that the old paradigm of random mutations can't explain genetic diversity. (Not!) Here's how the symposium organizers explain it on their website ...

Tuesday, February 06, 2018

How many lncRNAs are functional?

There's solid evidence that 90% of your genome is junk. Most of it is transcribed at some time but the transcripts are transient and usually confined to the nucleus. They are junk RNA [Functional RNAs?]. This is the view held by many experts but you wouldn't know that from reading the scientific literature and the popular press. The opposition to junk DNA gets much more attention in both venues.

There are prominent voices expressing the view that most of the genome is devoted to producing functional RNAs required for regulating gene expression [John Mattick still claims that most lncRNAs are functional]. Most of these RNAs are long noncoding RNAs known as lncRNAs. Although most of them fail all reasonable criteria for function there are still those who maintain that tens of thousands of them are functional [How many lncRNAs are functional: can sequence comparisons tell us the answer?].

Monday, February 05, 2018

ENCODE's false claims about the number of regulatory sites per gene

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

I realize that most of you are tired of seeing criticisms of ENCODE but it's important to realize that most scientists fell hook-line-and-sinker for the ENCODE publicity campaign and they still don't know that most of the claims were ridiculous.

I was reminded of this when I re-read Brendan Maher's summary of the ENCODE results that were published in Nature on Sept. 6, 2012 (Maher, 2012). Maher's article appeared in the front section of the ENCODE issue.1 With respect to regulatory sequences he said ...
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes ... But the job is far from done, says [Ewan] Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished.

Saturday, February 03, 2018

What's in Your Genome?: Chapter 5: Regulation and Control of Gene Expression

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?]. Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs [What's in Your Genome? Chapter 4: Pervasive Transcription].

Chapter 5 is Regulation and Control of Gene Expression.
Chapter 5: Regulation and Control of Gene Expression

What do we know about regulatory sequences?
The fundamental principles of regulation were worked out in the 1960s and 1970s by studying bacteria and bacteriophage. The initiation of transcription is controlled by activators and repressors that bind to DNA near the 5′ end of a gene. These transcription factors recognize relatively short sequences of DNA (6-10 bp) and their interactions have been well-characterized. Transcriptional regulation in eukaryotes is more complicated for two reasons. First, there are usually more transcription factors and more binding sites per gene. Second, access to binding sites depends on the state of chromatin. Nucleosomes forming higher-order structures create a "closed" domain where DNA binding sites are not accessible. In "open" domains the DNA is more accessible and transcription factors can bind. The transition between open and closed domains is an important addition to regulating gene expression in eukaryotes.
The limitations of genomics
By their very nature, genomics studies look at the big picture. Such studies can tell us a lot about how many transcription factors bind to DNA and how much of the genome is transcribed. They cannot tell you whether the data actually reflects function. For that, you have to take a more reductionist approach and dissect the roles of individual factors on individual genes. But working on single genes can be misleading ... you may miss the forest for the trees. Genomic studies have the opposite problem: they may see a forest where there are no trees.
Regulation and evolution
Much of what we see in evolution, especially when it comes to phenotypic differences between species, is due to differences in the regulation of shared genes. The idea dates back to the 1930s and the mechanisms were worked out mostly in the 1980s. It's the reason why all complex animals should have roughly the same number of genes—a prediction that was confirmed by sequencing the human genome. This is the field known as evo-devo or evolutionary developmental biology.
           Box 5-1: Can complex evolution evolve by accident?
Slightly harmful mutations can become fixed in a small population. This may cause a gene to be transcribed less frequently. Subsequent mutations that restore transcription may involve the binding of an additional factor to enhance transcription initiation. The result is more complex regulation that wasn't directly selected.
Open and closed chromatin domains
Gene expression in eukaryotes is regulated, in part, by changing the structure of chromatin. Genes in domains where nucleosomes are densely packed into compact structures are essentially invisible. Genes in more open domains are easily transcribed. In some species, the shift between open and closed domains is associated with methylation of DNA and modifications of histones but it's not clear whether these associations cause the shift or are merely a consequence of the shift.
           Box 5-2: X-chromosome inactivation
In females, one of the X-chromosomes is preferentially converted to a heterochromatic state where most of the genes are in closed domains. Consequently, many of the genes on the X chromosome are only expressed from one copy as is the case in males. The partial inactivation of an X-chromosome is mediated by a small regulatory RNA molecule and this inactivated state is passed on to all subsequent descendants of the original cell.
           Box 5-3: Regulating gene expression by rearranging the genome

In several cases, the regulation of gene expression is controlled by rearranging the genome to bring a gene under the control of a new promoter region. Such rearrangements also explain some developmental anomalies such as growth of legs on the head of fruit flies instead of antennae. They also account for many cancers.
ENCODE does it again
Genomic studies carried out by the ENCODE Consortium reported that a large percentage of the human genome is devoted to regulation. What the studies actually showed is that there are a large number of binding sites for transcription factors. ENCODE did not present good evidence that these sites were functional.
Does regulation explain junk?
The presence of huge numbers of spurious DNA binding sites is perfectly consistent with the view that 90% of our genome is junk. The idea that a large percentage of our genome is devoted to transcriptional regulation is inconsistent with everything we know from the studies of individual genes.
           Box 5-4: A thought experiment
Ford Doolittle asks us to imagine the following thought experiment. Take the fugu genome, which is very much smaller than the human genome, and the lungfish genome, which is very much larger, and subject them to the same ENCODE analysis that was performed on the human genome. All three genomes have approximately the same number of genes and most of those genes are homologous. Will the number of transcription factor binding sites be similar in all three species or will the number correlate with the size of the genomes and the amount of junk DNA?
Small RNAs—a revolutionary discovery?
Does the human genome contain hundreds of thousands of genes for small non-coding RNAs that are required for the complex regulation of the protein-coding genes?
A “theory” that just won’t die
"... we have refuted the specific claims that most of the observed transcription across the human genome is random and put forward the case over many years that the appearance of a vast layer of RNA-based epigenetic regulation was a necessary prerequisite to the emergence of developmentally and cognitively advanced organisms." (Mattick and Dinger, 2013)
What the heck is epigenetics?
Epigenetics is a confusing term. It refers loosely to the regulation of gene expression by factors other than differences in the DNA. It's generally assumed to cover things like methylation of DNA and modification of histones. Both of these effects can be passed on from one cell to the next following mitosis. That fact has been known for decades. It is not controversial. The controversy is about whether the heritability of epigenetic features plays a significant role in evolution.
           Box 5-5: The Weismann barrier
The Weismann barrier refers to the separation between somatic cells and the germ line in complex multicellular organisms. The "barrier" is the idea that changes (e.g. methylation, histone modification) that occur in somatic cells can be passed on to other somatic cells but in order to affect evolution those changes have to be transferred to the germ line. That's unlikely. It means that Lamarckian evolution is highly improbable in such species.
How should science journalists cover this story?
The question is whether a large part of the human genome is devoted to regulation thus accounting for an unexpectedly large genome. It's an explanation that attempts to refute the evidence for junk DNA. The issue is complex and very few science journalists are sufficiently informed to do it justice. They should, however, be making more of an effort to inform themselves about the controversial nature of the claims made by some scientists and they should be telling their readers that the issue has not yet been resolved.


Thursday, February 01, 2018

Sex isn't as beneficial as you might think

One of the most interesting topics in my molecular evolution class was the discussion over the importance of sex. Most students seem to think the problem is solved. They were taught that sex increases variation in a population and this gives sexual populations an evolutionary advantage. The fact that sex (recombination) breaks up as many linkages as it creates makes the explanation much less viable. The fact that there's very little evidence to support the claim comes as quite a surprise to my students. Sex is still one of the greatest mysteries in evolutionary biology [What did Joe Felsenstein say about sex?] [Everything you thought you knew about sex is probably wrong].

Kevin Laland's view of "modern" evolutionary theory (again)

Kevin Laland has just published another critique of modern evolutionary theory. This one appears in Aeon [Evolution unleashed]. His criticism is based on a naive and outdated view of modern evolutionary biology. That view has been widely criticized in the past but Laland continues to ignore such criticisms [e.g. Kevin Laland's new view of evolution].

Here's how he describes the state of modern evolutionary biology.
If you are not a biologist, you’d be forgiven for being confused about the state of evolutionary science. Modern evolutionary biology dates back to a synthesis that emerged around the 1940s-60s, which married Charles Darwin’s mechanism of natural selection with Gregor Mendel’s discoveries of how genes are inherited. The traditional, and still dominant, view is that adaptations – from the human brain to the peacock’s tail – are fully and satisfactorily explained by natural selection (and subsequent inheritance). Yet as novel ideas flood in from genomics, epigenetics and developmental biology, most evolutionists agree that their field is in flux. Much of the data implies that evolution is more complex than we once assumed.

Wednesday, January 31, 2018

Herding Hemingway's Cats by Kat Arney

Kat Arney has written a very good book on genes and gene expression. She covers all the important controversies in a thorough and thoughtful manner.

Kat Arney is a science writer based in the UK. She has a Ph.D. from the University of Cambridge where she worked on epigenetics and regulation in mice. She also did postdoc work at Imperial College in London. Her experience in the field of molecular biology and gene expression shows up clearly in her book where she demonstrates the appropriate skepticism and critical thinking in her coverage of the major advances in the field.

Friday, November 17, 2017

Calculating time of divergence using genome sequences and mutation rates (humans vs other apes)

There are several ways to report a mutation rate. You can state it as the number of mutations per base pair per year in which case a typical mutation rate for humans is about 5 × 10⁻¹⁰. Or you can express it as the number of mutations per base pair per generation (~1.5 × 10⁻⁸).

You can use the number of mutations per generation or per year if you are only discussing one species. In humans, for example, you can describe the mutation rate as 100 mutations per generation and just assume that everyone knows the number of base pairs (6.4 × 10⁹).
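The different ways of stating the rate are related by simple arithmetic. The Python sketch below converts between them using the figures quoted above. One input is an assumption: a generation time of about 30 years, which is the figure mentioned for the Icelandic trio study elsewhere on this page and is not stated in this post. The variable names are illustrative only.

```python
# Figures quoted in the text.
per_bp_per_generation = 1.5e-8   # mutations per base pair per generation
diploid_genome_bp = 6.4e9        # base pairs in the diploid human genome

# Assumption: roughly 30 years per human generation.
generation_time_years = 30

# Per base pair per year: divide the per-generation rate by the generation time.
per_bp_per_year = per_bp_per_generation / generation_time_years

# Per genome per generation: multiply the per-bp rate by the number of bp.
per_genome_per_generation = per_bp_per_generation * diploid_genome_bp

print(f"per bp per year: {per_bp_per_year:.1e}")
print(f"per genome per generation: {per_genome_per_generation:.0f}")
```

With these inputs the per-year rate comes out at 5 × 10⁻¹⁰ and the per-genome rate at 96, which is where the round figure of 100 mutations per generation comes from.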

Wednesday, November 08, 2017

How much mitochondrial DNA in your genome?

Most mitochondrial genes have been transferred from the ancestral mitochondrial genome to the nuclear genome over the course of 1-2 billion years of evolution. They are no longer present in mitochondria but they are easily recognized because they resemble α-proteobacterial sequences more than the other nuclear genes [see Endosymbiotic Theory].

This process of incorporating mitochondrial DNA into the nuclear genome continues to this day. The latest human reference genome has about 600 examples of nuclear sequences of mitochondrial origin (= numts). Some of them are quite recent while others date back almost 70 million years—the limit of resolution for junk DNA [see Mitochondria are invading your genome!].

Tuesday, November 07, 2017

Lateral gene transfer in eukaryotes - where's the evidence?

Lateral gene transfer (LGT), or horizontal gene transfer (HGT), is widespread in bacteria. It leads to the creation of pangenomes for many bacterial species where different subpopulations contain different subsets of genes that have been incorporated from other species. It also leads to confusing phylogenetic trees such that the history of bacterial evolution looks more like a web of life than a tree [The Web of Life].

Bacterial-like genes are also found in eukaryotes. Many of them are related to genes found in the ancestors of modern mitochondria and chloroplasts and their presence is easily explained by transfer from the organelle to the nucleus. Eukaryotic genomes also contain examples of transposons that have been acquired from bacteria. That's also easy to understand because we know how transposons jump between species.

Contaminated genome sequences

The authors of the original draft of the human genome sequence claimed that hundreds of genes had been acquired from bacteria by lateral gene transfer (LGT) (Lander et al., 2001). This claim was abandoned when the "finished" sequence was published a few years later (International Human Genome Consortium, 2004) because others had shown that the data was easily explained by differential gene loss in other lineages or by bacterial contamination in the draft sequence (see Salzberg, 2017).

Thursday, November 02, 2017

Parental age and the human mutation rate

Mutations are mostly due to errors in DNA replication. We have a pretty good idea of the accuracy of DNA replication: the overall error rate is about 10⁻¹⁰ per bp. There are about 30 cell divisions in females between zygote and formation of all egg cells. In males, there are about 400 mitotic cell divisions between zygote and formation of sperm cells (Ohno, 2019). Using these average values, we can calculate the number of mutations per generation. It works out to about 130 mutations per generation [Estimating the Human Mutation Rate: Biochemical Method].
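Here is a back-of-the-envelope version of that calculation in Python. It reads the error rate as 10^-10 per bp per replication and uses the 30 and 400 cell divisions given above. The haploid genome size of 3.2 × 10⁹ bp is an assumption that is not stated in this post, and the linked post may use slightly different inputs, so treat this as a sketch and not as the exact calculation behind the quoted figure.

```python
# Values taken from the paragraph above.
error_rate_per_bp = 1e-10   # replication errors per base pair per division
divisions_female = 30       # zygote -> egg
divisions_male = 400        # zygote -> sperm

# Assumption: each gamete carries one haploid genome of about 3.2e9 bp.
haploid_genome_bp = 3.2e9

# Expected new mutations each time a haploid genome is copied once.
mutations_per_division = error_rate_per_bp * haploid_genome_bp

# Mutations accumulated along each parental germ line.
from_egg = mutations_per_division * divisions_female
from_sperm = mutations_per_division * divisions_male

# A child inherits one egg genome and one sperm genome.
total_per_generation = from_egg + from_sperm

print(f"from mother: {from_egg:.1f}")
print(f"from father: {from_sperm:.1f}")
print(f"total per generation: {total_per_generation:.1f}")
```

Under these assumptions the total is a bit under 140, in the same ballpark as the figure of about 130 quoted above, and almost all of it comes from the paternal side.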

This value is similar to the estimate from comparing the sequences of different species (e.g. human and chimpanzee) based on the number of differences and the estimated time of divergence. This assumes that most of the genome is evolving at the rate expected for fixation of neutral alleles. This phylogenetic method gives a value of about 112 mutations per generation [Estimating the Human Mutation Rate: Phylogenetic Method].

The third way of measuring the mutation rate is to directly compare the genome sequence of a child and both parents (trios). After making corrections for false positives and false negatives, this method yields values of 60-100 mutations per generation depending on how the data is manipulated [Estimating the Human Mutation Rate: Direct Method]. The lower values from the direct method call into question the dates of the split between the various great ape lineages. This controversy has not been resolved [Human mutation rates] [Human mutation rates - what's the right number?].

It's clear that males contribute more to evolution than females. There's about a ten-fold difference in the number of cell divisions in the male line compared to the female line; therefore, we expect there to be about ten times more mutations inherited from fathers. This difference should depend on the age of the father since the older the father the more cell divisions required to produce sperm.

This effect has been demonstrated in many publications. A maternal age effect has also been postulated but that's been more difficult to prove. The latest study of Icelandic trios helps to nail down the exact effect (Jónsson et al., 2017).

The authors examined 1,548 trios consisting of parents and at least one offspring. They analyzed 2,682 Mb of genome sequence (84% of the total genome) and discovered an average of 70 mutation events per child.1 Scaling up to the whole genome gives an overall mutation rate of 83 mutations per generation with an average generation time of 30 years. This is consistent with previous results.
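The step from 70 observed mutations to 83 per generation is a simple scaling for the part of the genome that wasn't analyzed. The Python sketch below assumes that mutations are spread evenly across the genome, which is a simplification; the variable names are illustrative.

```python
# Figures quoted in the text.
observed_mutations_per_child = 70
fraction_of_genome_analyzed = 0.84

# Assumption: the unanalyzed 16% mutates at the same rate as the analyzed 84%.
whole_genome_mutations = observed_mutations_per_child / fraction_of_genome_analyzed

print(f"estimated mutations per generation: {whole_genome_mutations:.0f}")
```

Dividing 70 by 0.84 gives a little over 83, matching the overall rate quoted above.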

Jónsson et al. looked at 225 cases of three generation data in order to make sure that the mutations were germline mutations and not somatic cell mutations. They plotted the numbers of mutations against the age of the father and mother to produce the following graph from Figure 1 of their paper.


Look at parents who are 30 years old. At this age, females contribute about 10 mutations and males contribute about 50. This is only a five-fold difference, much less than we expect from the number of cell divisions. This suggests that the initial estimates of 400 cell divisions in males might be too high.

An age effect on mutations from the father is quite apparent and expected. A maternal age effect has previously been hypothesized but this is the first solid data that shows such an effect. The authors speculate that oocytes accumulate mutations with age, particularly mutations due to strand breakage.


1. Of these, 93% were single nucleotide changes and 7% were small deletions or insertions.

Jónsson, H., Sulem, P., Kehr, B., Kristmundsdottir, S., Zink, F., Hjartarson, E., Hardarson, M.T., Hjorleifsson, K.E., Eggertsson, H.P., and Gudjonsson, S.A. (2017) Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature, 549:519-522. [doi: 10.1038/nature24018]

Ohno, M. (2019) Spontaneous de novo germline mutations in humans and mice: rates, spectra, causes and consequences. Genes & genetic systems 94:13-22. [doi: 10.1266/ggs.18-00015]

Tuesday, October 31, 2017

The history of DNA sequencing

This year marks the 40th anniversary of DNA sequencing technology (Gilbert and Maxam, 1977; Sanger et al., 1977).1 The Sanger technique soon took over and by the 1990s it was the only technique used to sequence DNA. The development of reliable sequencing machines meant the end of those large polyacrylamide gels that we all hated.

Pyrosequencing was developed in the mid-1990s and by the year 2000 massively parallel sequencing using this technique was becoming quite common. This "NextGen" sequencing technique was behind the massive explosion in sequences in the early part of the 21st century.2

Even newer techniques are available today and there's a debate about whether they should be called Third Generation Sequencing (Heather and Chain, 2015).

Nature has published a nice review of the history of DNA sequencing (Shendure et al., 2017). I recommend it to anyone who's interested in the subject. The figure above is taken from that article.


1. Many labs were using the technology in 1976 before the papers were published.

2. New software and enhanced computer power played an important, and underappreciated, role.

Heather, J.M., and Chain, B. (2015) The sequence of sequencers: The history of sequencing DNA. Genomics, 107:1-8. [doi: 10.1016/j.ygeno.2015.11.003]

Maxam, A.M., and Gilbert, W. (1980) Sequencing end-labeled DNA with base-specific chemical cleavages. Methods in enzymology, 65:499-560. [doi: 10.1016/S0076-6879(80)65059-9]

Sanger, F., Nicklen, S., and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74:5463-5467. [PDF]

Shendure, J., Balasubramanian, S., Church, G.M., Gilbert, W., Rogers, J., Schloss, J.A., and Waterston, R.H. (2017) DNA sequencing at 40: past, present and future. Nature, 550:345-353. [doi: 10.1038/nature24286]


Escape from X chromosome inactivation

Mammals have two sex chromosomes: X and Y. Males have one X chromosome and one Y chromosome and females have two X chromosomes. Since females have two copies of each X chromosome gene, you might expect them to make twice as much gene product as males of the same species. In fact, males and females often make about the same amount of gene product because one of the female X chromosomes is inactivated by a mechanism that causes extensive chromatin condensation.

The mechanism is known as X chromosome inactivation. The phenomenon was originally discovered by Mary Lyon (1925-2014) [see Calico Cats].

Saturday, October 28, 2017

Creationists questioning pseudogenes: the GULO pseudogene

This is the second post discussing creationist1 papers on pseudogenes. The first post addressed a paper by Jeffrey Tomkins on the β-globin pseudogene [Creationists questioning pseudogenes: the beta-globin pseudogene]. This post covers another paper by Tomkins claiming that the GULO pseudogenes in various primate species are not derived from a common ancestor but instead have been deactivated independently in each lineage.

The Tomkins article was published in 2014 in Answers Research Journal, a publication that describes itself like this:
ARJ is a professional, peer-reviewed technical journal for the publication of interdisciplinary scientific and other relevant research from the perspective of the recent Creation and the global Flood within a biblical framework.

Saturday, October 14, 2017

Creationists questioning pseudogenes: the beta-globin pseudogene

Jonathan Kane recently (Oct. 6, 2017) posted an article on The Panda's Thumb where he claimed that Young Earth Creationists often don't get enough credit for raising serious issues about evolution [Five principles for arguing against creationism].

He mentioned some articles about pseudogenes as prime examples. I asked him for references and he responded with two articles by Jeffrey Tomkins that were published on the Answers in Genesis website. The first was on the β-globin pseudogene and the second was on the GULO pseudogene. Both articles claim that these DNA sequences aren't really pseudogenes because they have functions.

I'll deal with the β-globin pseudogene in this post and the GULO pseudogene in a subsequent post.

Wednesday, October 11, 2017

Historical evolution is determined by chance events

Modern evolutionary theory is based on the idea that alleles become fixed in a population over time. They can be fixed by natural selection if they confer selective advantage or they can be fixed by random genetic drift if they are nearly neutral or slightly deleterious [Learning about modern evolutionary theory: the drift-barrier hypothesis]. Alleles arise by mutation and the path that a population follows over time depends on the timing of mutations [Mutation-Driven Evolution]. That's largely a chance event.
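To make the role of chance concrete, here's a rough sketch of a Wright-Fisher simulation, a standard textbook model of drift and selection. It's only an illustration under simplifying assumptions (haploid population, constant size, a single allele, no new mutation), and the function name and parameters are my own.

```python
import random


def wright_fisher(pop_size, start_freq, s=0.0, seed=None, max_generations=100_000):
    """Follow one allele until it is lost or fixed.

    Returns (final_frequency, generations). s is the selective advantage;
    s = 0 means the allele is neutral and changes only by drift.
    """
    rng = random.Random(seed)
    count = round(start_freq * pop_size)
    generations = 0
    while 0 < count < pop_size and generations < max_generations:
        freq = count / pop_size
        # Selection shifts the expected frequency in the next generation.
        expected = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Drift: each of the pop_size offspring draws the allele at random.
        count = sum(rng.random() < expected for _ in range(pop_size))
        generations += 1
    return count / pop_size, generations


# Same starting point, different random histories.
for seed in range(5):
    final, gens = wright_fisher(pop_size=50, start_freq=0.5, seed=seed)
    outcome = "fixed" if final == 1.0 else "lost"
    print(f"run {seed}: allele {outcome} after {gens} generations")
```

Each run starts from the same allele frequency; whether the allele ends up fixed or lost depends only on the sequence of random draws.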

Wednesday, September 13, 2017

Sequencing human diploid genomes

Most eukaryotes are diploid, including humans. They have two copies of each autosome. Thousands of human genomes have been sequenced but in almost all cases the resulting genome sequence is a mixture of sequences from homologous chromosomes. If a site is heterozygous (different alleles on each chromosome) then these are entered as variants.

Monday, September 11, 2017

What's in Your Genome?: Chapter 4: Pervasive Transcription (revised)

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?].

Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs. I've finally got a respectable draft of this chapter. This is an updated summary—the first version is at: What's in Your Genome? Chapter 4: Pervasive Transcription.

Saturday, September 09, 2017

Cold Spring Harbor tells us about the "dark matter" of the genome (Part I)


This is a podcast from Cold Spring Harbor [Dark Matter of the Genome, Pt. 1 (Base Pairs Episode 8)]. The authors try to convince us that most of the genome is mysterious "dark matter," not junk. The main theme is that the genome contains transposons that could play an important role in evolution and disease.

Wednesday, August 30, 2017

Experts meet to discuss non-coding RNAs - fail to answer the important question

The human genome is pervasively transcribed. More than 80% of the genome is complementary to transcripts that have been detected in some tissue or cell type. The important question is whether most of these transcripts have a biological function. How many genes are there that produce functional non-coding RNA?

There's a reason why this question is important. It's because we have every reason to believe that spurious transcription is common in large genomes like ours. Spurious, or accidental, transcription occurs when the transcription initiation complex binds nonspecifically to sites in the genome that are not real promoters. Spurious transcription also occurs when the initiation complex (RNA polymerase plus factors) fires in the wrong direction from real promoters. Binding and inappropriate transcription are aided by the binding of transcription factors to nonpromoter regions of the genome, a well-known feature of all DNA binding proteins [see Are most transcription factor binding sites functional?].

Sunday, August 27, 2017

The Extended Evolutionary Synthesis - papers from the Royal Society meeting

I went to London last November to attend the Royal Society meeting on New trends in evolutionary biology: biological, philosophical and social science perspectives [New Trends in Evolutionary Biology: The Program].

The meeting was a huge disappointment [Kevin Laland's new view of evolution]. It was dominated by talks that were so abstract and obtuse that it was difficult to mount any serious discussion. The one thing that was crystal clear is that almost all of the speakers had an old-fashioned view of the current status of evolutionary theory. Thus, they were for the most part arguing against a strawman version of evolutionary theory.

The Royal Society has now published the papers that were presented at the meeting [Theme issue ‘New trends in evolutionary biology: biological, philosophical and social science perspectives’ organized by Denis Noble, Nancy Cartwright, Patrick Bateson, John Dupré and Kevin Laland]. I'll list the Table of Contents below.

Most of these papers are locked behind a paywall and that's a good thing because you won't be tempted to read them. The overall quality is atrocious—the Royal Society should be embarrassed to publish them.1 The only good thing about the meeting was that I got to meet a few friends and acquaintances who were supporters of evolution. There was also a sizable contingent of Intelligent Design Creationists at the meeting and I enjoyed talking to them as well2 [see Intelligent Design Creationists reveal their top story of 2016].

Friday, August 25, 2017

Niles Eldredge explains punctuated equilibria

Lots of people misunderstand punctuated equilibria. It's a theory about small changes leading to speciation. In many cases the changes are so slight that you and I might not notice the difference. These are not leaps or saltations and there are no intermediates or missing links. The changes may be due to changes in the frequency of one or two alleles.

Punctuated equilibria are when these speciation events take place relatively quickly and are followed by much longer periods of stasis (no change). Niles Eldredge explains how the theory is derived from his studies of thousands of trilobite fossils.



Niles Eldredge explains hierarchy theory

You may not agree but you should at least know what some evolutionary biologists are thinking.



How much of the human genome is devoted to regulation?

All available evidence suggests that about 90% of our genome is junk DNA. Many scientists are reluctant to accept this evidence—some of them are even unaware of the evidence [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. Many opponents of junk DNA suffer from what I call The Deflated Ego Problem. They are reluctant to concede that humans have about the same number of genes as all other mammals and only a few more than insects.

One of the common rationalizations is to speculate that while humans may have "only" 25,000 genes they are regulated and controlled in a much more sophisticated manner than the genes in other species. It's this extra level of control that makes humans special. Such speculations have been around for almost fifty years but they have gained in popularity since publication of the human genome sequence.

In some cases, the extra level of regulation is thought to be due to abundant regulatory RNAs. This means there must be tens of thousands of extra genes expressing these regulatory RNAs. John Mattick is the most vocal proponent of this idea and he won an award from the Human Genome Organization for "proving" that his speculation is correct! [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research]. Knowledgeable scientists know that Mattick is probably wrong. They believe that most of those transcripts are junk RNAs produced by accidental transcription at very low levels from non-conserved sequences.

Monday, August 07, 2017

A philosopher defends agnosticism

Paul Draper is a philosopher at Purdue University (West Lafayette, Indiana, USA). He has just (Aug. 2, 2017) posted an article on Atheism and Agnosticism on the Stanford Encyclopedia of Philosophy website.

Many philosophers use a different definition of atheism than many atheists. Philosophers tend to define atheism as the proposition that god(s) do not exist. Many atheists (I am one) define atheism as the lack of belief in god(s). The distinction is important but for now I want to discuss Draper's defense of agnosticism.

Keep in mind that Draper defines atheism as "god(s) don't exist." He argues, convincingly, that this proposition cannot be proven. He also argues that theism (the proposition that god(s) exist) cannot be proven either. Therefore, the only defensible position for a philosopher like him is agnosticism.

Friday, August 04, 2017

To toss or not to toss?

Now that I'm officially retired I've been cleaning out my office at the university and transferring all the important stuff to my home office. I'm taking advantage of this opportunity to throw out everything that I don't want any more. Eventually I'll have to vacate my university office because it's due to be renovated and transferred to another department.

Some stuff is easy to toss out and some stuff is easy to keep. It's the other stuff that causes a problem. Here's an example ....


These are the manuals that came with my very first PC back in 1981. I know I'll never use them but I'm kinda attached to them. Are they antiques yet?


Thursday, July 27, 2017

talk.origins evolves

The newsgroup talk.origins was created more than 30 years ago. It's been a moderated newsgroup for the past twenty years. The moderator is David Greig and the server, named "Darwin," has been sitting in my office for most of that time. I retired on June 30th and my office is scheduled for renovation so Darwin had to move. Another complication is that the moderator is moving from Toronto to Copenhagen, Denmark.

So talk.origins evolves and the server is moving elsewhere. Goodbye Darwin.



Friday, July 14, 2017

Bastille Day 2017

Today is the Fête Nationale in France known also as "le quatorze juillet" or Bastille Day.

This is the day in 1789 when French citizens stormed and captured the Bastille—a Royalist fortress in Paris. It marks the symbolic beginning of the French revolution although the real beginning is when the Third Estate transformed itself into the National Assembly on June 17, 1789 [Tennis Court Oath].

Ms. Sandwalk and I visited the site of the Bastille (Place de la Bastille) when we were in Paris in 2008. There's nothing left of the former castle but the site still resonates with meaning and history.

One of my wife's ancestors is William Playfair, the inventor of pie charts and bar graphs [Bar Graphs, Pie Charts, and Darwin]. His work attracted the attention of the French King so he moved to Paris in 1787 to set up an engineering business. He is said to have participated in the storming of the Bastille but he has a history of exaggeration and untruths so it's more likely that he just witnessed the event. He definitely lived nearby and was in Paris on the day in question. (His son, my wife's ancestor, was born in Paris in 1790.)

In honor of the French national day I invite you to sing the French national anthem, La Marseillaise. An English translation is provided so you can see that La Marseillaise is truly a revolutionary call to arms. (A much better translation can be found here.)1



1. I wonder if President Trump sang La Marseillaise while he was at the ceremonies today?

Check out Uncertain Principles for another version of La Marseillaise—this is the famous scene in Casablanca.

Reposted and modified from 2016.

Revisiting the genetic load argument with Dan Graur

The genetic load argument is one of the oldest arguments for junk DNA and it's one of the most powerful arguments that most of our genome must be junk. The concept dates back to J.B.S. Haldane in the late 1930s but the modern argument traditionally begins with Hermann Muller's classic paper from 1950. It has been extended and refined by him and many others since then (Muller, 1950; Muller, 1966).

Thursday, July 06, 2017

Scientists say "sloppy science" more serious than fraud

An article on Nature: INDEX reports on a recent survey of scientists: Cutting corners a bigger problem than research fraud. The subtitle says it all: Scientists are more concerned about the impact of sloppy science than outright scientific fraud.

The survey was published on BioMed Central.

Tuesday, July 04, 2017

Another contribution of philosophy: Bernard Lonergan

The discussion about philosophy continues on Facebook. One of my long-time Facebook friends, Jonathan Bernier, took up the challenge. Bernier is a professor of religious studies at St. Francis Xavier University in Nova Scotia, Canada. He is a card-carrying philosopher.1

The challenge is to provide recent (past two decades) examples from philosophy that have led to increased knowledge and understanding of the natural world. Here's what Jonathan Bernier offered.
But to use just one example of advances in philosophical understanding, UofT (specifically Regis College) houses the Lonergan Research Institute, which houses Bernard Lonergan's archives and publishes his collected works. Probably his most significant work is a seven-hundred-page tome called Insight, the first edition of which was published in 1957. It is IMHO the single best account of how humans come to know anything that has ever been written. The tremendous fruits that it has wrought cannot be summarized in a FB commend. Instead, I'd suggest that you walk over and see the friendly people at the LRI. No doubt they could help answer some of your questions.
Here's a Wikipedia link to Bernard Lonergan. He was a Canadian Jesuit priest who died in 1984. Regis College is the Jesuit College associated with the University of Toronto.

Is Jonathan Bernier correct? Is it true that Lonergan's works will eventually change the way we understand learning?


Note: In my response to Bernier on Facebook I said, "I guess I'll just have to take your word for it. I'm not about to walk over to Regis College and consult a bunch of Jesuit priests about the nature of reality." Was I being too harsh? Is this really an example of a significant contribution of philosophy? Is it possible that a philosopher could be very wrong about the existence of supernatural beings but still make a contribution to the nature of knowledge and understanding?

1. Jonathan Bernier tells me on Facebook that he is not a philosopher and never claimed to be a philosopher.

Monday, July 03, 2017

Contributions of philosophy

I've been discussing the contributions of philosophy on Facebook. Somebody linked to a post on the topic: What has philosophy contributed to society in the past 50 years?. Here's one of the contributions ... do you agree?
Philosophers, historians, and sociologists of science such as Thomas Kuhn, Paul Feyerabend, Bruno Latour, Bas van Fraassen, and Ian Hacking have changed the way that we see the purpose of science in everyday life, as well as proper scientific conduct. Kuhn's concept of a paradigm shift is now so commonplace as to be cliche. Meanwhile, areas like philosophy of physics and especially philosophy of biology are sites of active engagement between philosophers and scientists about the interpretation of scientific results.


Sunday, July 02, 2017

Confusion about the number of genes

My last post was about confusion over the sizes of the human and mouse genomes based on a recent paper by Breschi et al. (2017). Their statements about the number of genes in those species are also confusing. Here's what they say about the human genome.
[According to Ensembl86] the human genome encodes 58,037 genes, of which approximately one-third are protein-coding (19,950), and yields 198,093 transcripts. By comparison, the mouse genome encodes 48,709 genes, of which half are protein-coding (22,018 genes), and yields 118,925 transcripts overall.
The very latest Ensembl estimates (April 2017) for Homo sapiens and Mus musculus are similar. The difference in gene numbers between mouse and human is not significant according to the authors ...
The discrepancy in total number of annotated genes between the two species is unlikely to reflect differences in underlying biology, and can be attributed to the less advanced state of the mouse annotation.
This is correct but it doesn't explain the other numbers. There's general agreement on the number of protein-coding genes in mammals. They all have about 20,000 genes. There is no agreement on the number of genes for functional noncoding RNAs. In its latest build, Ensembl says there are 14,727 lncRNA genes, 5,362 genes for small noncoding RNAs, and 2,222 other genes for noncoding RNAs. The total number of non-protein-coding genes is 22,311.

There is no solid evidence to support this claim. It's true there are many transcripts resembling functional noncoding RNAs but claiming these identify true genes requires evidence that they have a biological function. It would be okay to call them "potential" genes or "possible" genes but the annotators are going beyond the data when they decide that these are actually genes.
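The gene tallies quoted above are easy to check. Here's a small sketch that adds up the three noncoding categories from the latest Ensembl build and works out the protein-coding share of the Ensembl86 annotations quoted by Breschi et al. The numbers come straight from the text; the dictionary and variable names are only for illustration.

```python
# Noncoding gene categories from the latest Ensembl build (human).
noncoding = {
    "lncRNA": 14_727,
    "small noncoding RNA": 5_362,
    "other noncoding RNA": 2_222,
}
total_noncoding = sum(noncoding.values())

# Ensembl86 figures quoted by Breschi et al.: (all annotated genes, protein-coding genes).
annotated = {
    "human": (58_037, 19_950),
    "mouse": (48_709, 22_018),
}
protein_coding_share = {
    species: coding / total for species, (total, coding) in annotated.items()
}

print(f"noncoding genes: {total_noncoding:,}")
for species, share in protein_coding_share.items():
    print(f"{species}: {share:.1%} of annotated genes are protein-coding")
```

The three categories do add up to 22,311. The protein-coding share comes out at about 34% for human and about 45% for mouse, which is what the authors describe as "approximately one-third" and "half."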

Breschi et al. mention the number of transcripts. I don't know what method Ensembl uses to identify a functional transcript. Are these splice variants of protein-coding genes?

The rest of the review discusses the similarities between human and mouse genes. They point out, correctly, that about 16,000 protein-coding genes are orthologous. With respect to lncRNAs they discuss all the problems in comparing human and mouse lncRNA and conclude that "... the current catalogues of orthologous lncRNAs are still highly incomplete and inaccurate." There are several studies suggesting that only 1,000-2,000 lncRNAs are orthologous. Unfortunately, there's very little overlap between the two most comprehensive studies (189 lncRNAs in common).

There are two obvious possibilities. First, it's possible that these RNAs are just due to transcriptional noise and that's why the ones in the mouse and human genomes are different. Second, all these RNAs are functional but the genes have arisen separately in the two lineages. This means that about 10,000 genes for biologically functional lncRNAs have arisen in each of the genomes over the past 100 million years.

Breschi et al. don't discuss the first possibility.


Breschi, A., Gingeras, T.R., and Guigó, R. (2017) Comparative transcriptomics in human and mouse. Nature Reviews Genetics [doi: 10.1038/nrg.2017.19]

Genome size confusion

The July 2017 issue of Nature Reviews: Genetics contains an interesting review of a topic that greatly interests me.
Breschi, A., Gingeras, T. R., and Guigó, R. (2017). Comparative transcriptomics in human and mouse. Nature Reviews Genetics [doi: 10.1038/nrg.2017.19]

Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.
I was confused by the comments made by the authors when they started comparing the human and mouse genomes. They said,
The most recent genome assemblies (GRC38) include 3.1 Gb and 2.7 Gb for human and mouse respectively, with the mouse genome being 12% smaller than the human one.
I think this statement is misleading. The size of the human genome isn't known with precision but the best estimate is 3.2 Gb [How Big Is the Human Genome?]. The current "golden path length" according to Ensembl is 3,096,649,726 bp. [Human assembly and gene annotation]. It's not at all clear what this means and I've found it almost impossible to find out; however, I think it approximates the total amount of sequenced DNA in the latest assembly plus an estimate of the size of some of the gaps.

The golden path length for the mouse genome is 2,730,871,774 bp. [Mouse assembly and gene annotation]. As is the case with the human genome, this is NOT the genome size. Not as much mouse DNA sequence has been assembled into a contiguous and accurate assembly as is the case with humans. The total mouse sequence is at about the same stage the human genome assembly was a few years ago.

If you look at the mouse genome assembly data you see that 2,807,715,301 bp have been sequenced and there are 79,356,856 bp in gaps. That's 2.89 Gb which doesn't match the golden path length and doesn't match the past estimates of the mouse genome size.

We don't know the exact size of the mouse genome. It's likely to be similar to that of the human genome but it could be a bit larger or a bit smaller. The point is that it's confusing to say that the mouse genome is 12% smaller than the human one. What the authors could have said is that less of the mouse genome has been sequenced and assembled into accurate contigs.

If you go to the NCBI site for Homo sapiens you'll see that the size of the genome is 3.24 Gb. The comparable size for Mus musculus is 2.81 Gb. That's about 13% smaller than the human genome size. How accurate is that?

There's a problem here. With all this sequence information, and all kinds of other data, it's impossible to get an accurate scientific estimate of the total genome sizes.
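For anyone who wants to check the arithmetic behind these comparisons, here's a short sketch using the figures quoted in this post. The variable names and the helper function are mine.

```python
def percent_smaller(smaller, larger):
    """How much smaller one value is than another, as a percentage of the larger."""
    return 100 * (larger - smaller) / larger


# Ensembl "golden path" lengths in bp.
golden_path = {"human": 3_096_649_726, "mouse": 2_730_871_774}

# Mouse assembly data: sequenced bases plus gaps.
mouse_sequenced = 2_807_715_301
mouse_gaps = 79_356_856
mouse_assembly_total = mouse_sequenced + mouse_gaps

# NCBI genome sizes in Gb.
ncbi_gb = {"human": 3.24, "mouse": 2.81}

golden_path_diff = percent_smaller(golden_path["mouse"], golden_path["human"])
ncbi_diff = percent_smaller(ncbi_gb["mouse"], ncbi_gb["human"])

print(f"mouse sequenced + gaps: {mouse_assembly_total:,} bp")
print(f"golden path: mouse is {golden_path_diff:.1f}% smaller than human")
print(f"NCBI sizes:  mouse is {ncbi_diff:.1f}% smaller than human")
```

The golden path lengths give a difference of about 11.8%, which is presumably where the authors' 12% comes from. The NCBI sizes give about 13.3%, and the mouse assembly data add up to 2,887,072,157 bp.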


[Image Credit: Wikipedia: Creative Commons Attribution 2.0 Generic license]

Tuesday, June 27, 2017

Debating alternative splicing (Part IV)

In Debating alternative splicing (Part III) I discussed a review published in the February 2017 issue of Trends in Biochemical Sciences. The review examined the data on detecting predicted protein isoforms and concluded that there was little evidence they existed.

My colleague at the University of Toronto, Ben Blencowe, is a forceful proponent of massive alternative splicing. He responded in a letter published in the June 2017 issue of Trends in Biochemical Sciences (Blencowe, 2017). It's worth looking at his letter in order to understand the position of alternative splicing proponents. He begins by saying,
It is estimated that approximately 95% of multiexonic human genes give rise to transcripts containing more than 100 000 distinct AS events [3,4]. The majority of these AS events display tissue-dependent variation and 10–30% are subject to pronounced cell, tissue, or condition-specific regulation [4].

Monday, June 26, 2017

Debating alternative splicing (Part III)

Proponents of massive alternative splicing argue that most human genes produce many different protein isoforms. According to these scientists, this means that humans can make about 100,000 different proteins from only ~20,000 protein-coding genes. They tend to believe humans are considerably more complex than other animals even though we have about the same number of genes. They think alternative splicing accounts for this complexity [see The Deflated Ego Problem].

Opponents (I am one) argue that most splice variants are due to splicing errors and most of those predicted protein isoforms don't exist. (We also argue that the differences between humans and other animals can be adequately explained by differential regulation of 20,000 protein-coding genes.) The controversy can only be resolved when proponents of massive alternative splicing provide evidence to support their claim that there are 100,000 functional proteins.