
Monday, February 05, 2018

ENCODE's false claims about the number of regulatory sites per gene

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

I realize that most of you are tired of seeing criticisms of ENCODE but it's important to realize that most scientists fell hook, line, and sinker for the ENCODE publicity campaign and they still don't know that most of the claims were ridiculous.

I was reminded of this when I re-read Brendan Maher's summary of the ENCODE results that were published in Nature on Sept. 6, 2012 (Maher, 2012). Maher's article appeared in the front section of the ENCODE issue.¹ With respect to regulatory sequences he said ...
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes ... But the job is far from done, says [Ewan] Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished.

Saturday, February 03, 2018

What's in Your Genome?: Chapter 5: Regulation and Control of Gene Expression

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?]. Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs [What's in Your Genome? Chapter 4: Pervasive Transcription].

Chapter 5 is Regulation and Control of Gene Expression.
Chapter 5: Regulation and Control of Gene Expression

What do we know about regulatory sequences?
The fundamental principles of regulation were worked out in the 1960s and 1970s by studying bacteria and bacteriophage. The initiation of transcription is controlled by activators and repressors that bind to DNA near the 5′ end of a gene. These transcription factors recognize relatively short sequences of DNA (6-10 bp) and their interactions have been well-characterized. Transcriptional regulation in eukaryotes is more complicated for two reasons. First, there are usually more transcription factors and more binding sites per gene. Second, access to binding sites depends on the state of chromatin. Nucleosomes forming higher-order structures create a "closed" domain where DNA binding sites are not accessible. In "open" domains the DNA is more accessible and transcription factors can bind. The transition between open and closed domains is an important additional layer of regulation in eukaryotes.
The limitations of genomics
By their very nature, genomics studies look at the big picture. Such studies can tell us a lot about how many transcription factors bind to DNA and how much of the genome is transcribed. They cannot tell you whether the data actually reflects function. For that, you have to take a more reductionist approach and dissect the roles of individual factors on individual genes. But working on single genes can be misleading ... you may miss the forest for the trees. Genomic studies have the opposite problem: they may see a forest where there are no trees.
Regulation and evolution
Much of what we see in evolution, especially when it comes to phenotypic differences between species, is due to differences in the regulation of shared genes. The idea dates back to the 1930s and the mechanisms were worked out mostly in the 1980s. It's the reason why all complex animals should have roughly the same number of genes—a prediction that was confirmed by sequencing the human genome. This is the field known as evo-devo or evolutionary developmental biology.
           Box 5-1: Can complex regulation evolve by accident?
Slightly harmful mutations can become fixed in a small population. This may cause a gene to be transcribed less frequently. Subsequent mutations that restore transcription may involve the binding of an additional factor to enhance transcription initiation. The result is more complex regulation that wasn't directly selected.
Open and closed chromatin domains
Gene expression in eukaryotes is regulated, in part, by changing the structure of chromatin. Genes in domains where nucleosomes are densely packed into compact structures are essentially invisible. Genes in more open domains are easily transcribed. In some species, the shift between open and closed domains is associated with methylation of DNA and modifications of histones but it's not clear whether these associations cause the shift or are merely a consequence of the shift.
           Box 5-2: X-chromosome inactivation
In females, one of the X-chromosomes is preferentially converted to a heterochromatic state where most of the genes are in closed domains. Consequently, many of the genes on the X chromosome are only expressed from one copy as is the case in males. The partial inactivation of an X-chromosome is mediated by a small regulatory RNA molecule and this inactivated state is passed on to all subsequent descendants of the original cell.
           Box 5-3: Regulating gene expression by rearranging the genome

In several cases, the regulation of gene expression is controlled by rearranging the genome to bring a gene under the control of a new promoter region. Such rearrangements also explain some developmental anomalies such as the growth of legs instead of antennae on the heads of fruit flies. They also account for many cancers.
ENCODE does it again
Genomic studies carried out by the ENCODE Consortium reported that a large percentage of the human genome is devoted to regulation. What the studies actually showed is that there are a large number of binding sites for transcription factors. ENCODE did not present good evidence that these sites were functional.
Does regulation explain junk?
The presence of huge numbers of spurious DNA binding sites is perfectly consistent with the view that 90% of our genome is junk. The idea that a large percentage of our genome is devoted to transcriptional regulation is inconsistent with everything we know from the studies of individual genes.
           Box 5-4: A thought experiment
Ford Doolittle asks us to imagine the following thought experiment. Take the fugu genome, which is very much smaller than the human genome, and the lungfish genome, which is very much larger, and subject them to the same ENCODE analysis that was performed on the human genome. All three genomes have approximately the same number of genes and most of those genes are homologous. Will the number of transcription factor binding sites be similar in all three species or will the number correlate with the size of the genomes and the amount of junk DNA?
Small RNAs—a revolutionary discovery?
Does the human genome contain hundreds of thousands of genes for small non-coding RNAs that are required for the complex regulation of the protein-coding genes?
A “theory” that just won’t die
"... we have refuted the specific claims that most of the observed transcription across the human genome is random and put forward the case over many years that the appearance of a vast layer of RNA-based epigenetic regulation was a necessary prerequisite to the emergence of developmentally and cognitively advanced organisms." (Mattick and Dinger, 2013)
What the heck is epigenetics?
Epigenetics is a confusing term. It refers loosely to the regulation of gene expression by factors other than differences in the DNA. It's generally assumed to cover things like methylation of DNA and modification of histones. Both of these effects can be passed on from one cell to the next following mitosis. That fact has been known for decades. It is not controversial. The controversy is about whether the heritability of epigenetic features plays a significant role in evolution.
           Box 5-5: The Weismann barrier
The Weismann barrier refers to the separation between somatic cells and the germ line in complex multicellular organisms. The "barrier" is the idea that changes (e.g. methylation, histone modification) that occur in somatic cells can be passed on to other somatic cells but in order to affect evolution those changes have to be transferred to the germ line. That's unlikely. It means that Lamarckian evolution is highly improbable in such species.
How should science journalists cover this story?
The question is whether a large part of the human genome is devoted to regulation, thus accounting for an unexpectedly large genome. It's an explanation that attempts to refute the evidence for junk DNA. The issue is complex and very few science journalists are sufficiently informed to do it justice. They should, however, be making more of an effort to inform themselves about the controversial nature of the claims made by some scientists and they should be telling their readers that the issue has not yet been resolved.

Thursday, February 01, 2018

Sex isn't as beneficial as you might think

One of the most interesting topics in my molecular evolution class was the discussion over the importance of sex. Most students seem to think the problem is solved. They were taught that sex increases variation in a population and this gives sexual populations an evolutionary advantage. The fact that sex (recombination) breaks up as many linkages as it creates makes the explanation much less viable. The fact that there's very little evidence to support the claim comes as quite a surprise to my students. Sex is still one of the greatest mysteries in evolutionary biology [What did Joe Felsenstein say about sex?] [Everything you thought you knew about sex is probably wrong].

Kevin Laland's view of "modern" evolutionary theory (again)

Kevin Laland has just published another critique of modern evolutionary theory. This one appears in Aeon [Evolution unleashed]. His criticism is based on a naive and outdated view of modern evolutionary biology. That view has been widely criticized in the past but Laland continues to ignore such criticisms [e.g. Kevin Laland's new view of evolution].

Here's how he describes the state of modern evolutionary biology.
If you are not a biologist, you’d be forgiven for being confused about the state of evolutionary science. Modern evolutionary biology dates back to a synthesis that emerged around the 1940s-60s, which married Charles Darwin’s mechanism of natural selection with Gregor Mendel’s discoveries of how genes are inherited. The traditional, and still dominant, view is that adaptations – from the human brain to the peacock’s tail – are fully and satisfactorily explained by natural selection (and subsequent inheritance). Yet as novel ideas flood in from genomics, epigenetics and developmental biology, most evolutionists agree that their field is in flux. Much of the data implies that evolution is more complex than we once assumed.

Wednesday, January 31, 2018

Herding Hemingway's Cats by Kat Arney

Kat Arney has written a very good book on genes and gene expression. She covers all the important controversies in a thorough and thoughtful manner.

Kat Arney is a science writer based in the UK. She has a Ph.D. from the University of Cambridge where she worked on epigenetics and regulation in mice. She also did postdoc work at Imperial College in London. Her experience in the field of molecular biology and gene expression shows up clearly in her book where she demonstrates the appropriate skepticism and critical thinking in her coverage of the major advances in the field.

Friday, November 17, 2017

Calculating time of divergence using genome sequences and mutation rates (humans vs other apes)

There are several ways to report a mutation rate. You can state it as the number of mutations per base pair per year in which case a typical mutation rate for humans is about 5 × 10⁻¹⁰. Or you can express it as the number of mutations per base pair per generation (~1.5 × 10⁻⁸).

You can use the number of mutations per generation or per year if you are only discussing one species. In humans, for example, you can describe the mutation rate as 100 mutations per generation and just assume that everyone knows the number of base pairs (6.4 × 10⁹).
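The different ways of stating the rate are related by simple arithmetic. Here's a minimal Python sketch of the conversions, using the approximate figures quoted above. The 30-year generation time is an assumption made for the illustration, and the function names are just labels I've chosen.

```python
# Convert between the ways of reporting the human mutation rate.
# Rates and genome size are the approximate values quoted in the text.
# The 30-year generation time is an assumption for this illustration.

RATE_PER_BP_PER_YEAR = 5e-10
RATE_PER_BP_PER_GENERATION = 1.5e-8
DIPLOID_GENOME_BP = 6.4e9
GENERATION_TIME_YEARS = 30  # assumed


def per_generation_from_per_year(rate_per_year, generation_time_years):
    """Per-bp, per-generation rate from a per-bp, per-year rate."""
    return rate_per_year * generation_time_years


def mutations_per_generation(rate_per_bp_per_generation, genome_bp):
    """Expected number of new mutations in one diploid genome."""
    return rate_per_bp_per_generation * genome_bp


derived_rate = per_generation_from_per_year(
    RATE_PER_BP_PER_YEAR, GENERATION_TIME_YEARS
)
total = mutations_per_generation(RATE_PER_BP_PER_GENERATION, DIPLOID_GENOME_BP)

print(f"{derived_rate:.1e} mutations per bp per generation")
print(f"{total:.0f} mutations per generation")
```

With these inputs the per-year rate converts to the per-generation rate quoted above, and the total comes out a little under the round figure of 100 mutations per generation.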

Wednesday, November 08, 2017

How much mitochondrial DNA in your genome?

Most mitochondrial genes have been transferred from the ancestral mitochondrial genome to the nuclear genome over the course of 1-2 billion years of evolution. They are no longer present in mitochondria but they are easily recognized because they resemble α-proteobacterial sequences more than the other nuclear genes [see Endosymbiotic Theory].

This process of incorporating mitochondrial DNA into the nuclear genome continues to this day. The latest human reference genome has about 600 examples of nuclear sequences of mitochondrial origin (= numts). Some of them are quite recent while others date back almost 70 million years—the limit of resolution for junk DNA [see Mitochondria are invading your genome!].

Tuesday, November 07, 2017

Lateral gene transfer in eukaryotes - where's the evidence?

Lateral gene transfer (LGT), or horizontal gene transfer (HGT), is widespread in bacteria. It leads to the creation of pangenomes for many bacterial species where different subpopulations contain different subsets of genes that have been incorporated from other species. It also leads to confusing phylogenetic trees such that the history of bacterial evolution looks more like a web of life than a tree [The Web of Life].

Bacterial-like genes are also found in eukaryotes. Many of them are related to genes found in the ancestors of modern mitochondria and chloroplasts and their presence is easily explained by transfer from the organelle to the nucleus. Eukaryotic genomes also contain examples of transposons that have been acquired from bacteria. That's also easy to understand because we know how transposons jump between species.

Contaminated genome sequences

The authors of the original draft of the human genome sequence claimed that hundreds of genes had been acquired from bacteria by lateral gene transfer (LGT) (Lander et al., 2001). This claim was abandoned when the "finished" sequence was published a few years later (International Human Genome Consortium, 2004) because others had shown that the data was easily explained by differential gene loss in other lineages or by bacterial contamination in the draft sequence (see Salzberg, 2017).

Thursday, November 02, 2017

Parental age and the human mutation rate




Mutations are mostly due to errors in DNA replication. We have a pretty good idea of the accuracy of DNA replication: the overall error rate is about 10⁻¹⁰ per bp. There are about 30 cell divisions in females between zygote and formation of all egg cells. In males, there are about 400 mitotic cell divisions between zygote and formation of sperm cells. Using these average values, we can calculate the number of mutations per generation. It works out to about 130 mutations per generation [Estimating the Human Mutation Rate: Biochemical Method].
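The biochemical estimate can be sketched in a few lines of Python. The error rate and the numbers of cell divisions come from the paragraph above. The haploid genome size of 3.2 × 10⁹ bp is my assumption for the illustration, so the result only lands in the same range as the figure of about 130, not exactly on it.

```python
# Biochemical estimate of new mutations per generation.
# Error rate and division counts are the averages quoted in the text.
# The haploid genome size is an assumption for this illustration.

ERROR_RATE_PER_BP = 1e-10   # errors per bp per round of replication
FEMALE_DIVISIONS = 30       # zygote to egg
MALE_DIVISIONS = 400        # zygote to sperm
HAPLOID_GENOME_BP = 3.2e9   # assumed


def mutations_per_gamete(divisions,
                         error_rate=ERROR_RATE_PER_BP,
                         genome_bp=HAPLOID_GENOME_BP):
    """Expected replication errors accumulated in one haploid gamete."""
    return divisions * error_rate * genome_bp


from_mother = mutations_per_gamete(FEMALE_DIVISIONS)
from_father = mutations_per_gamete(MALE_DIVISIONS)
total = from_mother + from_father

print(f"from egg:   {from_mother:.1f}")
print(f"from sperm: {from_father:.1f}")
print(f"total:      {total:.1f} mutations per generation")
```

Almost all of the total comes from the sperm because of the much larger number of cell divisions in the male germ line.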

This value is similar to the estimate from comparing the sequences of different species (e.g. human and chimpanzee) based on the number of differences and the estimated time of divergence. This assumes that most of the genome is evolving at the rate expected for fixation of neutral alleles. This phylogenetic method gives a value of about 112 mutations per generation [Estimating the Human Mutation Rate: Phylogenetic Method].

The third way of measuring the mutation rate is to directly compare the genome sequence of a child and both parents (trios). After making corrections for false positives and false negatives, this method yields values of 60-100 mutations per generation depending on how the data is manipulated [Estimating the Human Mutation Rate: Direct Method]. The lower values from the direct method call into question the dates of the split between the various great ape lineages. This controversy has not been resolved [Human mutation rates] [Human mutation rates - what's the right number?].

It's clear that males contribute more to evolution than females. There's about a ten-fold difference in the number of cell divisions in the male line compared to the female line; therefore, we expect there to be about ten times more mutations inherited from fathers. This difference should depend on the age of the father since the older the father the more cell divisions required to produce sperm.

This effect has been demonstrated in many publications. A maternal age effect has also been postulated but that's been more difficult to prove. The latest study of Icelandic trios helps to nail down the exact effect (Jónsson et al., 2017).

The authors examined 1,548 trios consisting of parents and at least one offspring. They analyzed 2,682 Mb of genome sequence (84% of the total genome) and discovered an average of 70 mutation events per child.¹ This gives an overall mutation rate of 83 mutations per generation with an average generation time of 30 years. This is consistent with previous results.
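The step from 70 observed events to 83 mutations per generation can be reproduced by scaling for the part of the genome that wasn't analyzed. This is a minimal sketch of my reading of the numbers, not necessarily the authors' method. It assumes that mutations are spread evenly across the genome, and the function name is hypothetical.

```python
# Scale the observed number of de novo mutations to the whole genome.
# Assumes mutations are evenly distributed, so the count grows in
# proportion to the fraction of the genome that was analyzed.

OBSERVED_MUTATIONS_PER_CHILD = 70
FRACTION_OF_GENOME_ANALYZED = 0.84


def scale_to_whole_genome(observed, fraction_analyzed):
    """Extrapolate an observed count to the full genome."""
    if not 0 < fraction_analyzed <= 1:
        raise ValueError("fraction_analyzed must be in (0, 1]")
    return observed / fraction_analyzed


genome_wide = scale_to_whole_genome(
    OBSERVED_MUTATIONS_PER_CHILD, FRACTION_OF_GENOME_ANALYZED
)
print(f"{genome_wide:.0f} mutations per generation")
```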

Jónsson et al. looked at 225 cases of three generation data in order to make sure that the mutations were germline mutations and not somatic cell mutations. They plotted the numbers of mutations against the age of the father and mother to produce the following graph from Figure 1 of their paper.

Look at parents who are 30 years old. At this age, females contribute about 10 mutations and males contribute about 50. This is only a five-fold difference, much less than we expect from the number of cell divisions. This suggests that the initial estimates of 400 cell divisions in males might be too high.
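The comparison can be made explicit with a short Python sketch. It assumes that the number of mutations scales linearly with the number of cell divisions and that the estimate of 30 divisions in females is correct. Both are simplifications, so the implied number of male divisions is only illustrative.

```python
# Compare the male/female ratio expected from cell divisions with the
# ratio observed at a parental age of 30 (values quoted in the text).
# Assumes mutations scale linearly with the number of cell divisions.

FEMALE_DIVISIONS = 30
MALE_DIVISIONS = 400
MATERNAL_MUTATIONS_AT_30 = 10
PATERNAL_MUTATIONS_AT_30 = 50

expected_ratio = MALE_DIVISIONS / FEMALE_DIVISIONS
observed_ratio = PATERNAL_MUTATIONS_AT_30 / MATERNAL_MUTATIONS_AT_30

# Male divisions that would give the observed ratio if the female
# estimate is right (illustrative only).
implied_male_divisions = observed_ratio * FEMALE_DIVISIONS

print(f"expected ratio: {expected_ratio:.1f}")
print(f"observed ratio: {observed_ratio:.1f}")
print(f"implied male divisions: {implied_male_divisions:.0f}")
```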

An age effect on mutations from the father is quite apparent and expected. A maternal age effect has previously been hypothesized but this is the first solid data that shows such an effect. The authors speculate that oocytes accumulate mutations with age, particularly mutations due to strand breakage.

1. Of these, 93% were single nucleotide changes and 7% were small deletions or insertions.

Jónsson, H., Sulem, P., Kehr, B., Kristmundsdottir, S., Zink, F., Hjartarson, E., Hardarson, M.T., Hjorleifsson, K.E., Eggertsson, H.P., and Gudjonsson, S.A. (2017) Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature, 549:519-522. [doi: 10.1038/nature24018]

Tuesday, October 31, 2017

The history of DNA sequencing

This year marks the 40th anniversary of DNA sequencing technology (Gilbert and Maxam, 1977; Sanger et al., 1977).¹ The Sanger technique soon took over and by the 1990s it was the only technique used to sequence DNA. The development of reliable sequencing machines meant the end of those large polyacrylamide gels that we all hated.

Pyrosequencing was developed in the mid-1990s and by the year 2000 massively parallel sequencing using this technique was becoming quite common. This "NextGen" sequencing technique was behind the massive explosion in sequences in the early part of the 21st century.²

Even newer techniques are available today and there's a debate about whether they should be called Third Generation Sequencing (Heather and Chain, 2015).

Nature has published a nice review of the history of DNA sequencing (Shendure et al., 2017). I recommend it to anyone who's interested in the subject. The figure above is taken from that article.

1. Many labs were using the technology in 1976 before the papers were published.

2. New software and enhanced computer power played an important, and underappreciated, role.

Heather, J.M., and Chain, B. (2015) The sequence of sequencers: The history of sequencing DNA. Genomics, 107:1-8. [doi: 10.1016/j.ygeno.2015.11.003]

Maxam, A.M., and Gilbert, W. (1980) Sequencing end-labeled DNA with base-specific chemical cleavages. Methods in enzymology, 65:499-560. [doi: 10.1016/S0076-6879(80)65059-9]

Sanger, F., Nicklen, S., and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74:5463-5467. [PDF]

Shendure, J., Balasubramanian, S., Church, G.M., Gilbert, W., Rogers, J., Schloss, J.A., and Waterston, R.H. (2017) DNA sequencing at 40: past, present and future. Nature, 550:345-353. [doi: 10.1038/nature24286]

Escape from X chromosome inactivation

Mammals have two sex chromosomes: X and Y. Males have one X chromosome and one Y chromosome and females have two X chromosomes. Since females have two copies of each X chromosome gene, you might expect them to make twice as much gene product as males of the same species. In fact, males and females often make about the same amount of gene product because one of the female X chromosomes is inactivated by a mechanism that causes extensive chromatin condensation.

The mechanism is known as X chromosome inactivation. The phenomenon was originally discovered by Mary Lyon (1925-2014) [see Calico Cats].

Saturday, October 28, 2017

Creationists questioning pseudogenes: the GULO pseudogene

This is the second post discussing creationist¹ papers on pseudogenes. The first post addressed a paper by Jeffrey Tomkins on the β-globin pseudogene [Creationists questioning pseudogenes: the beta-globin pseudogene]. This post covers another paper by Tomkins claiming that the GULO pseudogenes in various primate species are not derived from a common ancestor but instead have been deactivated independently in each lineage.

The Tomkins article was published in 2014 in Answers Research Journal, a publication that describes itself like this:
ARJ is a professional, peer-reviewed technical journal for the publication of interdisciplinary scientific and other relevant research from the perspective of the recent Creation and the global Flood within a biblical framework.

Saturday, October 14, 2017

Creationists questioning pseudogenes: the beta-globin pseudogene

Jonathan Kane recently (Oct. 6, 2017) posted an article on The Panda's Thumb where he claimed that Young Earth Creationists often don't get enough credit for raising serious issues about evolution [Five principles for arguing against creationism].

He mentioned some articles about pseudogenes as prime examples. I asked him for references and he responded with two articles by Jeffrey Tomkins that were published on the Answers in Genesis website. The first was on the β-globin pseudogene and the second was on the GULO pseudogene. Both articles claim that these DNA sequences aren't really pseudogenes because they have functions.

I'll deal with the β-globin pseudogene in this post and the GULO pseudogene in a subsequent post.

Wednesday, October 11, 2017

Historical evolution is determined by chance events

Modern evolutionary theory is based on the idea that alleles become fixed in a population over time. They can be fixed by natural selection if they confer selective advantage or they can be fixed by random genetic drift if they are nearly neutral or slightly deleterious [Learning about modern evolutionary theory: the drift-barrier hypothesis]. Alleles arise by mutation and the path that a population follows over time depends on the timing of mutations [Mutation-Driven Evolution]. That's largely a chance event.