
Thursday, December 22, 2022

Junk DNA, TED talks, and the function of lncRNAs

Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.

The fact that most of our genome is transcribed has been known for 50 years but that fact only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and this label has stuck.

By the end of the 1970s we knew that much of this transcription was due to introns. The latest data shows that protein-coding genes and known noncoding genes occupy about 45% of the genome and most of that is intron sequences that are mostly junk. That leaves 30-40% of the genome that is transcribed at some point, producing something like one million transcripts of unknown function.

Wednesday, December 21, 2022

A University of Chicago history graduate student's perspective on junk DNA

A new master's thesis on the history of junk DNA has been posted. It's from the Department of History at the University of Chicago.

My routine scan for articles on junk DNA turned up the abstract of an M.A. thesis on the history of junk DNA: Requiem for a Gene: The Problem of Junk DNA for the Molecular Paradigm. The supervisor is Professor Emily Kern in the Department of History at the University of Chicago. I've written to her to ask for a copy of the thesis and for permission to ask her, and her student, some questions about the thesis. No reply so far.

Here's the abstract of the thesis.

“Junk DNA” has been at the center of several high-profile scientific controversies over the past four decades, most recently in the disputes over the ENCODE Project. Despite its prominence in these debates, the concept has yet to be properly historicized. In this thesis, I seek to redress this oversight, inaugurating the study of junk DNA as a historical object and establishing the need for an earlier genesis for the concept than scholars have previously recognized. In search of a new origin story for junk, I chronicle developments in the recognition and characterization of noncoding DNA sequences, positioning them within existing historiographical narratives. Ultimately, I trace the origin of junk to 1958, when a series of unexpected findings in bacteria revealed the existence of significant stretches of DNA that did not encode protein. I show that the discovery of noncoding DNA sequences undermined molecular biologists’ vision of a gene as a line of one-dimensional code and, in turn, provoked the first major crisis in their nascent field. It is from this crisis, I argue, that the concept of junk DNA emerged. Moreover, I challenge the received narrative of junk DNA as an uncritical reification of the burgeoning molecular paradigm. By separating the history of junk DNA from its mythology, I demonstrate that the conceptualization of junk DNA reveals not the strength of molecular biological authority but its fragility.

It looks like it might be a history of noncoding DNA but I won't know for certain until I see the entire thesis. It's only available to students and staff at the University of Chicago.


Sunday, December 18, 2022

Protein concentrations in E. coli are mostly controlled at the level of transcription initiation

The most important step in the regulation of protein-coding genes in E. coli is the rate of binding of RNA polymerase to the promoter region.

A group of scientists at the University of California at San Diego and their European collaborators looked at the concentrations of proteins and mRNAs of about 2000 genes in E. coli. They catalogued these concentrations under several different growth conditions in order to determine whether the level of protein being expressed from each of these genes correlated with transcription rate, translation rate, mRNA stability or other levels of gene expression.

The paper is very difficult to understand because the authors are primarily interested in developing mathematical formulae to describe their results. They expect you to understand equations like,

even though they don't explain the parameters very well. A lot of important information is in the supplements and I couldn't be bothered to download and read them. I don't think the math is anywhere near as important as the data and the conclusions.

Friday, December 16, 2022

Publishing a science book - Lesson #1: The publisher is always right about everything

Don't bother trying to reason with a publisher. All of them have different views on proper style and every single one of them is absolutely certain that their style is the only correct one.

I'm in the middle of the copyedit stage of my book. This is the stage where a copyeditor goes through your manuscript and makes any corrections to spelling and grammar. This is a lot of work for any copyeditor having to deal with one of my manuscripts and I greatly appreciate the effort. My book is a lot better now than it was a few weeks ago. (Who knew that there was only one l in canceled?)

It's also the stage where the publisher imposes their particular style on the manuscript and that can be a problem. I'll document some of the issues in subsequent posts but to give you an example, consider the titles of books in the reference list. I wrote it like this: The Selfish Gene and Molecular and Genome Evolution. This is not in line with my publisher's handbook of style so the titles were converted to lowercase as in: The selfish gene and Molecular and genome evolution. I objected, pointing to numerous other science books that used the same titles that are on the covers of the books and suggesting that my readers were more familiar with The Selfish Gene than with The selfish gene.

I was overruled by my publisher who noted that they make their style choices for good reasons—it's for "consistency, clarity, and ease of reading." I assume that publishers, such as Oxford, would make the same argument while insisting that the title should be The Selfish Gene.

In case you ever find yourself in this position, you should keep in mind that your contract will almost certainly say that the publisher has complete control of your book and they can make any changes they want as long as it doesn't affect the meaning of what you wrote.

Here's what it says in my contract, "The Publisher shall publish the Author's work in whatever style and format it thinks most suitable ... While the Publisher may, in its sole discretion, consult the Author with respect to said style and format, the Publisher retains the right to make all final decisions on matters of format, design, selling price and marketing."

I was aware of some issues with inappropriate covers and titles in the past so I had an extra sentence added to the contract that said, "The Publisher and Author will discuss and agree upon the title and cover design." It's a good thing I put that in because the publisher was pressuring me to change the title of the book and I was able to resist.

Authors can't win most fights over style and format. I've been discussing the publishing of science books with a number of other authors over the past few months and several of them told me not to bother trying to argue with a publisher because they will never give in. They have a set style for all books and they won't make an exception for an individual author no matter how good an argument you make.

I didn't listen to those other authors. Silly me.

I'm thinking of trying to write a standard set of guidelines that scientists could put into their contracts to cover the most egregious style restrictions. It might be helpful if all science writers would insist on inserting these guidelines into their contracts.


Can the AI program ChatGPT pass my exam?

There's a lot of talk about ChatGPT and how it can prepare lectures and get good grades on undergraduate exams. However, ChatGPT is only as good as the information that's popular on the internet and that's not always enough to get a good grade on my exam.

ChatGPT is an artificial intelligence (AI) program that's designed to answer questions using a style and language that's very much like the responses you would get from a real person. It was developed by OpenAI, a tech company in San Francisco. You can create an account and log in to ask any question you want.

Several professors have challenged it with exam questions and they report that ChatGPT would easily pass their exams. I was skeptical, especially when it came to answering questions on controversial topics where there was no clear answer. I also suspected that ChatGPT would get its answers from the internet and this means that popular, but incorrect, views would likely be part of ChatGPT's response.

Here are my questions and the AI program's answers. It did quite well in some cases but not so well in others. My main concern is that programs like this might be judged to be reliable sources of information despite the fact that the real source is suspect.

Monday, December 12, 2022

Did molecular biology make any contribution to evolutionary theory?

Some evolutionary biologists think—incorrectly, in my opinion—that molecular biology has made no contributions to our understanding of evolution.

PNAS published a series of articles on Gregor Mendel and one of them caught my eye. Here's what Nicholas Barton wrote in his article The "New Synthesis".

During the 1960s and 1970s, there were further conceptual developments—largely independent of the birth of molecular biology during the previous two decades (15). First, there was an understanding that adaptations cannot be explained simply as being “for the good of the species” (16, 17). One must explain how the genetic system (including sexual reproduction, recombination, and a fair meiosis, with each copy of a gene propagating with the same probability) is maintained through selection on individual genes, and remains stable despite mutations that would disrupt the system (17, 19, 20). Second, and related to this, there was an increased awareness of genetic conflicts that arise through sexual reproduction; selfish elements may spread through biased inheritance, even if they reduce individual fitness (19, 21, 22). In the decade following the discovery that DNA carries genetic information, all the fundamental principles of molecular biology were established: the flow of information from sequences of DNA through RNA to protein, the regulation of genes by binding to specific sequences in promoters, and the importance of allostery in allowing arbitrary regulatory networks (23, 24). Yet, the extraordinary achievements of molecular biology had little effect on the conceptual development of evolutionary biology. Conversely, although evolutionary arguments were crucial in the founding of molecular biology, they have had rather little influence in the half-century since (e.g., ref. 25). Of course, molecular biology has revealed an astonishing range of adaptations that demand explanation—for example, the diversity of biochemical pathways, that allow exploitation of almost any conceivable resource, or the efficiency of molecular machines such as the ribosome, which translates the genetic code. Technical advances have brought an accelerating flood of data, most recently, giving us complete genome sequences and expression patterns from any species. 
Yet, arguably, no fundamentally new principles have been established in molecular biology, and, in evolutionary biology, despite sophisticated theoretical advances and abundant data, we still grapple with the same questions as a century or more ago.

This does not seem fair to me. I think that neutral theory, nearly neutral theory, and the importance of random genetic drift relied heavily on work done by molecular biologists. Similarly, the development of dating techniques using DNA and protein sequences is largely the work of molecular biologists. It wasn't the adaptationists or the paleontologists who discovered that humans and chimpanzees shared a common ancestor 5-7 million years ago and it wasn't either of those groups who discovered the origin of mitochondria.

And some of us are grappling with the idea that most of our genome is junk DNA, a question that never would have occurred to evolutionary biologists from a century ago.

Barton knows all about modern population genetics and the importance of neutral theory because later on he says,

If we consider a single allele, then we can see it as “effectively neutral” if its effect on fitness is less than ∼1/2Ne. This idea was used by Ohta (54) in a modification of the neutral theory, to suggest why larger populations might be less diverse than expected (because a smaller fraction of mutations would be effectively neutral), and why rates of substitution might be constant per year rather than per generation (because species with shorter generation times might tend to have large populations, and have a smaller fraction of effectively neutral mutations that contribute to long-term evolution). Lynch (21) has applied this concept to argue that molecular adaptations that are under weak selection cannot be established or maintained in (relatively) smaller populations, imposing a “drift barrier” to adaptation. Along the same lines, Kondrashov (55) has argued that deleterious mutations with Nes ≈ 1 will accumulate, steadily degrading the population. Both ideas seem problematic if we view adaptation as due to optimization of polygenic traits: Organisms can be well adapted even if drift dominates selection on individual alleles, and, under a model of stabilizing selection on very many traits, any change that degrades fitness can be compensated.

Barton may think that the drift-barrier hypothesis is "problematic" but it certainly seems like a significant advance that owes something to molecular biology.
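To make the threshold in Barton's passage concrete, here is a minimal sketch that assumes the simple rule quoted above: an allele is "effectively neutral" when the size of its fitness effect is below roughly 1/(2Ne). The function name, the selection coefficient, and the two population sizes are hypothetical values chosen for illustration; they do not come from Barton's article.

```python
def is_effectively_neutral(s: float, ne: float) -> bool:
    """Return True if |s| is below the approximate threshold 1/(2*Ne)."""
    return abs(s) < 1.0 / (2.0 * ne)


# Hypothetical selection coefficient for a weakly deleterious mutation.
s = -1e-5

# Hypothetical effective population sizes (illustrative only).
for ne in (1e4, 1e6):
    verdict = is_effectively_neutral(s, ne)
    print(f"Ne = {ne:.0e}: effectively neutral? {verdict}")
```

Under these assumed numbers the same mutation is invisible to selection in the smaller population but not in the larger one, which is the intuition behind the "drift barrier."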

What do you think? Do you agree with Barton that "... the extraordinary achievements of molecular biology had little effect on the conceptual development of evolutionary biology"?


Friday, December 02, 2022

Sequencing both copies of your diploid genome

New techniques are being developed to obtain the complete sequences of both copies (maternal and paternal) of a typical diploid individual.

The first two sequences of the human genome were published twenty years ago by the International Human Genome Project and by a company called Celera Genomics. The published sequences were a consensus using DNA from multiple individuals so the final result didn't represent the sequence of any one person. Furthermore, since each of us has inherited separate genomes from our mother and father, our DNA is actually a mixture of two different haploid genomes. Most published genome sequences are an average of these two separate genomes where the choice of nucleotide at any one position is arbitrary.

The first person to have a complete genome sequence was James Watson in 2007 but that was a composite genome sequence. Craig Venter's genome sequence was published a few months later and it was the first complete genome sequence containing separate sequences of each of his 46 chromosomes. (One chromosome from each of his parents.) In today's language, we refer to this as a diploid sequence.

The current reference sequence is based on the data published by the public consortium (International Human Genome Project); nobody cares about the Celera sequence. Over the years, more and more sequencing data has been published and this has been incorporated into the standard human reference genome in order to close most gaps and improve the accuracy. The current version is called GRCh38.p14 from February 3, 2022. It's only 95% complete because it's missing large stretches of repetitive DNA, especially in the centromere regions and at the ends of each chromosome (telomeric region).

The important point for this discussion is that GRCh38 is not representative of the genomes of most people on Earth because there has been a bias in favor of sequencing European genomes. (Some variants are annotated in the reference genome but this can't continue.) Many scientists are interested in the different kinds of variants present in the human population so they would like to create databases of genomes from diverse populations.

The first complete, telomere-to-telomere (T2T), human genome sequence was published last year [A complete human genome sequence (2022)]. It was made possible by advances in sequencing technology that generated long reads of 10,000 bp and ultra-long reads of up to 1,000,000 bp [Telomere-to-telomere sequencing of a complete human genome]. The DNA is from a CHM13 cell line that has identical copies of each chromosome so there's no ambiguity due to differences in the maternal and paternal copies. The full name of this sequence is CHM13-T2T.

The two genomes (GRCh38 and CHM13) can't be easily merged so right now there are competing reference genomes [What do we do with two different human genome reference sequences?].

The techniques used to sequence the CHM13 genome make it possible to routinely obtain diploid genome sequences from a large number of individuals because overlapping long reads can link markers on the same chromosome and distinguish between the maternal and paternal chromosomes. However, in practice, the error rate of long read sequencing made assembly of separate chromosomes quite difficult. Recent advances in the accuracy of long read sequencing have been developed by PacBio, and this high fidelity sequencing (PacBio HiFi sequencing) promises to change the game.

The Human Pangenome Reference Consortium has tackled the problem by sequencing the genome of an Ashkenazi man (HG002) and his parents (HG003-father and HG004-mother) using the latest sequencing techniques. They then asked the genome community to submit their assemblies using their best software in a kind of "assembly bakeoff." They got 23 responses.

Jarvis, E. D., Formenti, G., Rhie, A., Guarracino, A., Yang, C., Wood, J., et al. (2022) Semi-automated assembly of high-quality diploid human reference genomes. Nature, 611:519-531. [doi: 10.1038/s41586-022-05325-5]

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

We don't need to get into all the details but there are a few observations of interest.

  • All of the attempted assemblies were reasonably good but the best ones had to make use of the parental genomes to resolve discrepancies.
  • Some assemblies began by separating the HG002 (child) sequences into two separate groups based on their similarity to one of the parents. Others generated assemblies without using the parental data then fixed any problems by using the parental genomes and a technique called "graph-based phasing." The second approach was better.
  • All of the final assemblies were contaminated with varying amounts of E. coli and yeast DNA and/or various adaptor DNA sequences that were not removed by filters. All of them were contaminated with mitochondrial DNA that did not belong in the assembled chromosomes.
  • The most common sources of assembly errors were: (1) missing joins where large stretches of DNA should have been brought together, (2) misjoins where two large stretches (contigs) were inappropriately joined, (3) incorrect inversions, and (4) false duplications.
  • The overall accuracy of the best assemblies was one base pair error in 100,000 bp (10^-5).
  • Using the RefSeq database of 27,225 genes, most assemblies captured almost all of these confirmed and probable genes but several hundred were not complete and many were missing.
  • No chromosome was complete telomere-to-telomere (T2T) but most were nearly complete including the complicated centromere and telomere regions.
  • The two genomes (paternal and maternal) differed at 2.6 million SNPs (single nucleotides), 631,000 small structural variations (<50 bp), and 11,600 large structural variations (>50 bp).
  • The consortium used the best assembly algorithm to analyze the genomes of an additional 47 individuals. They began with the same coverage used for HG002; namely, 35X coverage. (Each stretch of DNA was sequenced 35 times on average - about equal amounts in both directions.) This was not successful so they had to increase the coverage to 130X to get good assemblies. They estimate that each additional diploid sequence will require 50-60X coverage. This kind of coverage would have been impossible in the 1990s when the first human genome was assembled but now it's fairly easy as long as you have the computer power and storage to deal with it.
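
To get a feel for the coverage and accuracy numbers in the list above, here is a rough back-of-the-envelope sketch. It assumes a haploid genome size of about 3.1 billion bp (an approximation, not a figure from the paper) and uses the standard Phred scale to restate the error rate; the function names are hypothetical.

```python
import math

# Assumption: haploid human genome size of roughly 3.1 billion bp (approximate).
HAPLOID_GENOME_BP = 3.1e9


def bases_needed(coverage: float, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Total sequenced bases needed to reach a given average coverage."""
    return coverage * genome_bp


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10.0 * math.log10(error_rate)


# Coverage levels mentioned in the list above.
for coverage in (35, 60, 130):
    total = bases_needed(coverage)
    print(f"{coverage}X coverage ~ {total / 1e9:.1f} billion sequenced bases")

# One error in 100,000 bp, restated on the Phred scale.
print(f"Error rate 1e-5 ~ Q{phred_quality(1e-5):.0f}")
```

Even the lower estimate of 50-60X means well over a hundred billion sequenced bases per individual under this assumed genome size, which is why the computing power and storage matter.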


Thursday, December 01, 2022

University of Michigan biochemistry students edit Wikipedia

Students in a special topics course at the University of Michigan were taught how to edit a Wikipedia article in order to promote function in repetitive DNA and downplay junk.

The Wikipedia article on Repeated sequence (DNA) was heavily edited today by students who were taking an undergraduate course at the University of Michigan. One of the student leaders, Rasberry Neuron, left the following message on the "Talk" page.

This page was edited for a course assignment at the University of Michigan. The editing process included peer review by four students, the Chemistry librarian at the University of Michigan, and course instructors. The edits published on 12/01/2022 reflect improvements guided by the original editing team and the peer review feedback. See the article's History page for information about what changes were made from the previous version.

References to junk DNA were removed by the students but quickly added back by Paul Gardner who is currently fixing other errors that the students have made.

I checked out the webpage for the course at CHEM 455_505 Special Topics in Biochemistry - Nucleic Acids Biochemistry. The course description is quite revealing.

We now realize that the human genome contains at least 80,000 non-redundant non-coding RNA genes, outnumbering protein-coding genes by at least 4-fold, a revolutionary insight that has led some researchers to dub the eukaryotic cell an “RNA machine”. How exactly these ncRNAs guide every cellular function – from the maintenance and processing to the regulated expression of all genetic information – lies at the leading edge of the modern biosciences, from stem cell to cancer research. This course will provide an equally broad as deep overview of the structure, function and biology of DNA and particularly RNA. We will explore important examples from the current literature and the course content will evolve accordingly.

The class will be taught from a chemical/molecular perspective and will bring modern interdisciplinary concepts from biochemistry, biophysics and molecular biology to the fore.

Most of you will recognize right away that there are factually incorrect statements (i.e. misinformation) in that description. It is not true that there are at least 80,000 noncoding genes in the human genome. At some point in the future that may turn out to be true but it's highly unlikely. Right now, there are at most 5,000 proven noncoding genes. There are many scientists who claim that the mere existence of a noncoding transcript is proof that a corresponding gene must exist but that's not how science works. Before declaring that a gene exists you must present solid evidence that it produces a biologically relevant product [Most lncRNAs are junk] [Wikipedia blocks any mention of junk DNA in the "Human genome" article] [Editing the Wikipedia article on non-coding DNA] [On the misrepresentation of facts about lncRNAs] [The "standard" view of junk DNA is completely wrong] [What's In Your Genome? - The Pie Chart] [How many lncRNAs are functional?].

I'm going to email a link to this post to the course instructors and some of the students. Let's see if we can get them to discuss junk DNA.


Monday, November 21, 2022

How not to write a Nature abstract

A friend recently posted a figure on Facebook that instructs authors in the correct way to prepare a summary paragraph (abstract) for publication in Nature. It uses a specific example and the advice is excellent [How to construct a Nature summary paragraph].

I thought it might be fun to annotate a different example so I randomly selected a paper on genomics to see how it compared. The one that popped up was An integrated encyclopedia of DNA elements in the human genome.


Saturday, November 19, 2022

How many enhancers in the human genome?

In spite of what you might have read, the human genome does not contain one million functional enhancers.

The Sept. 15, 2022 issue of Nature contains a news article on "Gene regulation" [Two-layer design protects genes from mutations in their enhancers]. It begins with the following sentence.

The human genome contains only about 20,000 protein-coding genes, yet gene expression is controlled by around one million regulatory DNA elements called enhancers.

Sandwalk readers won't need to be told the reference for such an outlandish claim because you all know that it's the ENCODE Consortium summary paper from 2012, the one that kicked off their publicity campaign to convince everyone of the death of junk DNA (ENCODE, 2012). ENCODE identified several hundred thousand transcription factor (TF) binding sites and in 2012 they estimated that the total number of base pairs involved in regulating gene expression could account for 20% of the genome.

How many of those transcription factor binding sites are functional and how many are due to spurious binding to sites that have nothing to do with gene regulation? We don't know the answer to that question but we do know that there will be a huge number of spurious binding sites in a genome of more than three billion base pairs [Are most transcription factor binding sites functional?].
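
A crude calculation shows why spurious sites are expected. The sketch below assumes a genome of 3.1 billion bp, equal base frequencies, independent positions, and a transcription factor that recognizes one exact sequence of length k on either strand. All of these are simplifying assumptions made for illustration (real binding motifs are degenerate and real genomes are not random), and the function name is hypothetical.

```python
# Assumptions (illustrative only): genome size of ~3.1 billion bp, equal base
# frequencies, independent positions, one exact k-bp target, both strands.
GENOME_BP = 3.1e9


def expected_chance_matches(k: int, genome_bp: float = GENOME_BP) -> float:
    """Expected number of exact k-bp matches in random sequence, both strands.

    Ignores edge effects and palindromic motifs, which change the count only
    slightly at this scale.
    """
    return 2 * genome_bp / 4 ** k


for k in (6, 8, 10, 12):
    n = expected_chance_matches(k)
    print(f"{k}-bp motif: ~{n:,.0f} matches expected by chance")
```

Under these assumptions an 8 bp recognition sequence turns up roughly 95,000 times by chance alone, so a large number of binding sites is exactly what you would expect even if most of them do nothing.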

The scientists in the ENCODE Consortium didn't know the answer either but what's surprising is that they didn't even know there was a question. It never occurred to them that some of those transcription factor binding sites have nothing to do with regulation.

Fast forward ten years to 2022. Dozens of papers have been published criticizing the ENCODE Consortium for their lack of knowledge of the basic biochemical properties of DNA binding proteins. Surely nobody who is interested in this topic believes that there are one million functional regulatory elements (enhancers) in the human genome?

Wrong! The authors of this Nature article, Ran Elkon at Tel Aviv University (Israel) and Reuven Agami at the Netherlands Cancer Institute (Amsterdam, Netherlands), didn't get the message. They think it's quite plausible that the expression of every human protein-coding gene is controlled by an average of 50 regulatory sites even though there's not a single known example of any such gene.

Not only that, for some reason they think it's only important to mention protein-coding genes in spite of the fact that the reference they give for 20,000 protein-coding genes (Nurk et al., 2022) also claims there are an additional 40,000 noncoding genes. This is an incorrect claim since Nurk et al. have no proof that all those transcribed regions are actually genes but let's play along and assume that there really are 60,000 genes in the human genome. That reduces the average to "only" 17 enhancers per gene. I don't know of a single gene that has 17 or more proven enhancers, do you?
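
The averages quoted here are simple division, spelled out in the short sketch below. The numbers are the ones from the text (one million enhancers, 20,000 protein-coding genes, 60,000 genes if the additional noncoding genes are counted); the function name is just an illustrative choice.

```python
# Figure quoted in the Nature news article.
ENHANCERS_CLAIMED = 1_000_000


def enhancers_per_gene(n_genes: int, n_enhancers: int = ENHANCERS_CLAIMED) -> float:
    """Average number of claimed enhancers per gene."""
    return n_enhancers / n_genes


gene_counts = (
    ("protein-coding genes only", 20_000),
    ("protein-coding plus claimed noncoding genes", 60_000),
)

for label, n_genes in gene_counts:
    avg = enhancers_per_gene(n_genes)
    print(f"{label}: about {avg:.0f} enhancers per gene")
```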

Why would two researchers who study gene regulation say that the human genome contains one million enhancers when there's no evidence to support such a claim and it doesn't make any sense? Why would Nature publish this paper when surely the editors must be aware of all the criticism that arose out of the 2012 ENCODE publicity fiasco?

I can think of only two answers to the first question. Either Elkon and Agami don't know of any papers challenging the view that most TF binding sites are functional (see below) or they do know of those papers but choose to ignore them. Neither answer is acceptable.

I think that the most important question in human gene regulation is how much of the genome is devoted to regulation. How many potential regulatory sites (enhancers) are functional and how many are spurious non-functional sites? Any paper on regulation that does not mention this problem should not be published. All results have to be interpreted in light of conflicting claims about function.

Here are some examples of papers that raise the issue. The point is not to prove that these authors are correct - although they are correct - but to show that there's a controversy. You can't just state that there are one million regulatory sites as if it were a fact when you know that the results are being challenged.

"The observations in the ENCODE articles can be explained by the fact that biological systems are noisy: transcription factors can interact at many nonfunctional sites, and transcription initiation takes place at different positions corresponding to sequences similar to promoter sequences, simply because biological systems are not tightly controlled." (Morange, 2014)

"... ENCODE had not shown what fraction of these activities play any substantive role in gene regulation, nor was the project designed to show that. There are other well-studied explanations for reproducible biochemical activities besides crucial human gene regulation, including residual activities (pseudogenes), functions in the molecular features that infest eukaryotic genomes (transposons, viruses, and other mobile elements), and noise." (Eddy, 2013)

"Given that experiments performed in a diverse number of eukaryotic systems have found only a small correlation between TF-binding events and mRNA expression, it appears that in most cases only a fraction of TF-binding sites significantly impacts local gene expression." (Palazzo and Gregory, 2014)

One surprising finding from the early genome-wide ChIP studies was that TF binding is widespread, with thousand to tens of thousands of binding events for many TFs. These number do not fit with existing ideas of the regulatory network structure, in which TFs were generally expected to regulate a few hundred genes, at most. Binding is not necessarily equivalent to regulation, and it is likely that only a small fraction of all binding events will have an important impact on gene expression. (Slattery et al., 2014)

Detailed maps of transcription factor (TF)-bound genomic regions are being produced by consortium-driven efforts such as ENCODE, yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood. (White et al., 2013)

One outstanding issue is the fraction of factor binding in the genome that is "functional", which we define here to mean that disturbing the protein-DNA interaction leads to a measurable downstream effect on gene regulation. (Cusanovich et al., 2014)

... we expect, for example, accidental transcription factor-DNA binding to go on at some rate, so assuming that transcription equals function is not good enough. The null hypothesis after all is that most transcription is spurious and alternative transcripts are a consequence of error-prone splicing. (Hurst, 2013)

... as a chemist, let me say that I don't find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that are designed to bind to specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance; a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant. (Jogalekar, 2012)


Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., et al. (2022) The complete sequence of a human genome. Science, 376:44-53. [doi:10.1126/science.abj6987]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi:10.1038/nature11247]

Academic workers on strike at University of California schools

Graduate students, postdocs, and other "academic workers" are on strike for higher wages and better working conditions at University of California schools but it's very difficult to understand what's going on.

Several locals of the United Auto Workers union are on strike. The groups include Academic Researchers, Academic Student Employees (ASEs), Postdocs, and Student Researchers. The list of demands can be found on the UAW website: UAW Bargaining Highlights (All Units).

Here's the problem. At my university, graduate students can make money by getting a position as a TA (teaching assistant). This is a part-time job at an hourly rate. This may be a major source of income for humanities students but for most science students it's just a supplement to their stipend. The press reports on this strike keep referring to a yearly income and they make it sound like part-time employment as a TA should pay a living wage. For example, a recent Los Angeles Times article says,

The workers are demanding a base salary of $54,000 for all graduate student workers, child-care subsidies, enhanced healthcare for dependents, longer family leave, public transit passes and lower tuition costs for international scholars. The union said the workers earn an average current pay of about $24,000 a year.

I don't understand this concept of "base salary." In my experience, most TAs work part time. If they were paid $50 per hour then they would have to work about 30 hours per week over two semesters in order to earn $54,000 per year. That doesn't seem to leave much time for working on a thesis. Perhaps it includes a stipend that doesn't require teaching?
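Here's a minimal sketch of that back-of-the-envelope calculation. The $50 hourly rate and the 30-hour week are the hypothetical figures from the paragraph above, not actual University of California numbers, and the function name is made up for illustration.

```python
def weeks_needed(target_salary, hourly_rate, hours_per_week):
    """Weeks of work required to earn target_salary at the given hourly rate."""
    return target_salary / (hourly_rate * hours_per_week)


# Figures from the text: $54,000 target, a hypothetical $50/hour, 30 hours/week.
weeks = weeks_needed(54_000, 50, 30)
print(f"{weeks:.0f} weeks")  # 36 weeks
```

That works out to 36 weeks of 30-hour weeks, which is roughly the two semesters assumed above.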

Our graduate students are paid a living allowance (currently about $28,000 Cdn) and their tuition and fees are covered by an extra $8,000. Most of them don't do any teaching. Almost all of this money comes from research grants and not directly from the university.

The University of California system seems to be very different from the one I'm accustomed to. Is the work of TAs obvious to most Americans? Do you understand the issues?

I also don't get the situation with postdocs. The union is asking for a $70,000 salary for postdocs and the university is offering an 8% increase in the first year and smaller increases in subsequent years. In Canada, postdocs are mostly paid from research grants and not from university funds. The average postdoc salary at the University of Toronto is $51,000 (Cdn) but the range is quite large ($40K - $100K). I don't think the University of Toronto can dictate to PIs the amount of money that they have to pay a postdoc but it does count them as employees and ensures that postdocs have healthcare and suitable working conditions. These postdocs are members of a union (CUPE 3902) and there is a minimum stipend of $36,000 (Cdn).

Can someone explain the situation at the University of California schools? Are they asking for a minimum salary of $70,000 (US) ($93,700 Cdn)? Will PIs have to pay postdocs more from their research grants if the union wins a wage increase but the postdocs are already earning more than $70,000?

It's all very confusing and the press doesn't seem to have a good handle on the situation.

Note: I know that the union doesn't expect the university to meet its maximum demands. I'm sure they will settle for something less. That's not the point I'm trying to make. I'm just trying to understand how graduate students and postdocs are paid in University of California schools.


Friday, November 18, 2022

Higher education for all?

I discuss a recent editorial in Science that advocates expanding university education in order to prepare students for jobs.

I believe that the primary goal of a university education is to teach students how to think (critical thinking). This goal is usually achieved within the context of an in-depth study in a particular field such as history, English literature, geology, or biochemistry. The best way of achieving that goal is called student-centered learning. It involves, among other things, classes that focus on discussion and debate under the guidance of an experienced teacher.

Universities and colleges also have job preparation programs such as business management, medicine, computer technology, and, to a lesser extent, engineering. These programs may, or may not, teach critical thinking (usually not).

About 85% of students who enter high school will graduate. About 63% of high school graduates go to college or university. The current college graduation rate is 65% (within six years). What this means is that for every 100 students that begin high school roughly 35 will graduate from college.
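The arithmetic behind that estimate can be sketched as follows. It simply multiplies the three rates quoted above, which assumes they can be applied one after another to the same cohort; the function name is made up for illustration.

```python
def college_graduates(cohort, hs_grad_rate, college_entry_rate, college_grad_rate):
    """Expected number of college graduates from a cohort entering high school."""
    return cohort * hs_grad_rate * college_entry_rate * college_grad_rate


# Rates from the text: 85% finish high school, 63% of those go on to college,
# and 65% of those graduate within six years.
grads = college_graduates(100, 0.85, 0.63, 0.65)
print(round(grads))  # 35
```

The unrounded value is just under 35, which is where the "roughly 35" comes from.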

Now let's look at an editorial written by Marcia McNutt, President of the United States National Academy of Sciences. The editorial appears in the Nov. 11 issue of Science [Higher education for all]. She begins by emphasizing the importance of a college degree in terms of new jobs and the wealth of nations.

Currently, 75% of new jobs require a college degree. Yet in the US and Europe, only 40% of young adults attend a 2-year or 4-year college—a percentage that has either not budged or only modestly risen in more than two decades— despite a college education being one of the proven ways to lift the socioeconomic status of underprivileged populations and boost the wealth of nations.

There's no question that well-educated graduates will contribute to society in many ways but there is a question about what "well-educated" really means. Is it teaching specific job skills or is it teaching students how to think? I vote for teaching critical thinking and not for job training. I think that creating productive citizens who can fill a variety of different jobs is a side-benefit of preparing them to cope with a complex society that requires critical thinking. I don't think my view is exactly the same as Marcia McNutt's because she emphasizes training as a main goal of college education.

Universities, without building additional facilities, could expand universal and life-long access to higher education by promoting more courses online and at satellite community-college campuses.

Statements like that raise two issues that don't get enough attention. The first one concerns the number of students who should graduate from college in an ideal world. What is that number and at what stage should it be enhanced? Should there be more high school graduates going to college? If so, does that require lowering the bar for admission or is the cost of college the main impediment? Is there a higher percentage of students entering college in countries with free, or very low, tuition? Should there be more students graduating? If so, one easy way to do that is to make university courses easier. Is that what we want?

The question that's never asked is what percentage of the population is smart enough to get a college degree? Is it much higher than 40%?

The second issue concerns the quality of education. The model that I suggested above is not consistent with online courses and there's a substantial number of papers in the pedagogical literature showing that student-centered education doesn't work very well online. Does that mean that we should adopt a different way of teaching in order to make college education more accessible? If so, at what cost?

McNutt gives us an example of the kind of change she envisages.

At Colorado College, students complete a lab science course in only 4 weeks, attending lectures in the morning and labs in the afternoon. This success suggests that US universities could offer 2-week short courses that include concentrated, hands-on learning and teamwork in the lab and the field for students who already mastered the basics through online lectures. Such an approach is more common in European institutions of higher education and would allow even those with full-time employment elsewhere to advance their skills during vacations or employer-supported sabbaticals for the purpose of improving the skills of the workforce. Opportunities abound for partnerships with industry for life-long learning. The availability of science training in this format could also be a boon for teachers seeking to fill gaps in their science understanding.

This is clearly a form of college education that focuses on job skills and even goes as far as suggesting that industry could be a "partner" in education. (As an aside, it's interesting that government employers, schools, and nonprofits are never asked to be partners in education even though they hire a substantial number of college graduates.)

Do you agree that the USA should be expanding the number of students who graduate from college and do you agree that the goal is to give them the skills needed to get a job?


Tuesday, November 08, 2022

Science education in an age of misinformation

I just read an annoying article in Boston Review: The Inflated Promise of Science Education. It was written by Catarina Dutilh Novaes, a Professor of Philosophy, and Silvia Ivani, a teaching fellow in philosophy. Their main point was that the old-fashioned way of teaching science has failed because the general public mistrusts scientists. This mistrust stems, in part, from "legacies of scientific or medical racism and the commercialization of biomedical science."

The way to fix this, according to the authors, is for scientists to address these "perceived moral failures" by engaging more with society.

"... science should be done with and for society; research and innovation should be the product of the joint efforts of scientists and citizens and should serve societal interests. To advance this goal, Horizon 2020 encouraged the adoption of dialogical engagement practices: those that establish two-way communication between experts and citizens at various stages of the scientific process (including in the design of scientific projects and planning of research priorities)."

Clearly, scientific education ought to mean the implanting of a rational, sceptical, experimental habit of mind. It ought to mean acquiring a method – a method that can be used on any problem that one meets – and not simply piling up a lot of facts.

George Orwell

This is nonsense. It has nothing to do with science education; instead, the authors are focusing on policy decisions such as convincing people to get vaccinations.

The good news is that the Boston Review article links to a report from Stanford University that's much more intelligent: Science Education in an Age of Misinformation. The philosophers think that this report advocates "... a well-meaning but arguably limited approach to the problem along the lines of the deficit model ...." where "deficit model" refers to a mode of science communication where scientists just dispense knowledge to the general public who are supposed to accept it uncritically.

I don't know of very many science educators who think this is the right way to teach. I think the prevailing model is to teach the nature of science (NOS) [The Nature of Science (NOS)]. That requires educating students and the general public about the way science goes about creating knowledge and why evidence-based knowledge is reliable. It's connected to teaching critical thinking, not teaching a bunch of scientific facts. The "deficit model" is not the current standard in science education and it hasn't been seriously defended for decades.

"Appreciating the scientific process can be even more important than knowing scientific facts. People often encounter claims that something is scientifically known. If they understand how science generates and assesses evidence bearing on these claims, they possess analytical methods and critical thinking skills that are relevant to a wide variety of facts and concepts and can be used in a wide variety of contexts.”

National Science Foundation, Science and Technology Indicators, 2008

An important part of the modern approach as described in the Stanford report is teaching students (and the general public) how to gather information and determine whether or not it's reliable. That means you have to learn how to evaluate the reliability of your sources and whether you can trust those who claim to be experts. I taught an undergraduate course on this topic for many years and I learned that it's not easy to teach the nature of science and critical thinking.

The Stanford Report is about the nature of science (NOS) model and how to implement it in the age of social media. Specifically, it's about teaching ways to evaluate your sources when you are inundated with misinformation.

The main part of this approach is going to seem controversial to many because it emphasizes the importance of experts and there's a growing reluctance in our society to trust experts. That's what the Boston Review article was all about. The solutions advocated by the authors of that article are very different than the ones presented in the Stanford report.

The authors of the Stanford report recognize that there's a widespread belief that well-educated people can make wise decisions based entirely on their own knowledge and judgement, in other words, that they can be "intellectually independent." They reject that false belief.

The ideal envisioned by the great American educator and philosopher John Dewey—that it is possible to educate students to be fully intellectually independent—is simply a delusion. We are always dependent on the knowledge of others. Moreover, the idea that education can educate independent critical thinkers ignores the fact that to think critically in any domain you need some expertise in that domain. How then, is education to prepare students for a context where they are faced with knowledge claims based on ideas, evidence, and arguments they do not understand?

The goal of science education is to teach students how to figure out which source of information is supported by the real experts and that's not an easy task. It seems pretty obvious that scientists are the experts but not all scientists are experts so how do you tell the difference between science quacks and expert scientists?

The answer requires some knowledge about how science works and how scientists behave. The Stanford report says that this means acquiring an understanding of "science as a social practice." I think "social practice" is a bad choice of terms and I would have preferred that they stick with "nature of science" but that was their choice.

The mechanism for recognizing the real experts relies on critical thinking but it's not easy. Two of the lead authors1 of the Stanford Report published a short synopsis in Science last month (October 2022) [https://doi.org/10.1126/science.abq8-93]. Their heuristic is shown on the right.

The idea here is that you can judge the quality of scientific information by questioning the credentials of the authors. This is how "outsiders" can make judgements about the quality of the information without being experts themselves. The rules are pretty good but I wish there had been a bit more on "Unbiased scientific information" as a criterion. I think that you can make a judgement based on whether the "experts" take the time to discuss alternative hypotheses and explain the positions of those who disagree with them but this only applies to genuine scientific controversies and if you don't know that there's a controversy then you have no reason to apply this filter.

For example, if a paper is telling you about the wonderful world of regulatory RNAs and points out that there are 100,000 newly discovered genes for these RNAs, you would have no reason to question that if the scientists have no conflict of interest and come from prestigious universities. You would have to rely on the reviewers of the paper, and the journal, to insist that alternative explanations (e.g. junk RNA) were mentioned. That process doesn't always work.

There's no easy way to fix that problem. Scientists are biased all the time but outsiders (i.e. non-experts) have no way of recognizing bias. I used to think that we could rely on science journalists to alert us to these biases and point out that the topic is controversial and no consensus has been reached. That didn't work either.

At least the heuristic works some of the time so we should make sure we teach it to students of all ages. It would be even nicer if we could teach scientists how to be credible and how to recognize their own biases.


1. The third author is my former supervisor, Bruce Alberts, who has been interested in science education for more than fifty years. He did a pretty good job of educating me! :-)

Saturday, November 05, 2022

Nature journalist is confused about noncoding RNAs and junk

Nature Methods is one of the journals in Nature Portfolio published by Springer Nature. Its focus is novel methods in the life sciences.

The latest issue (October, 2022) highlights the issues with identifying functional noncoding RNAs and the editorial, Decoding noncoding RNAs, is quite good—much better than the comments in other journals. Here's the final paragraph.

Despite the increasing prominence of ncRNA, we remind readers that the presence of a ncRNA molecule does not always imply functionality. It is also possible that these transcripts are non-functional or products from, for example, splicing errors. We hope this Focus issue will provide researchers with practical advice for deciphering ncRNA’s roles in biological processes.

However, this praise is mitigated by the appearance of another article in the same journal. Science journalist Vivien Marx has written a commentary with a title that was bound to catch my eye: How noncoding RNAs began to leave the junkyard. Here's the opening paragraph.

Junk. In the view of some, that’s what noncoding RNAs (ncRNAs) are — genes that are transcribed but not translated into proteins. With one of his ncRNA papers, University of Queensland researcher Tim Mercer recalls that two reviewers said, “this is good” and the third said, “this is all junk; noncoding RNAs aren’t functional.” Debates over ncRNAs, in Mercer’s view, have generally moved from ‘it’s all junk’ to ‘which ones are functional?’ and ‘what are they doing?’

This is the classic setup for a paradigm shaft. What you do is create a false history of a field and then reveal how your ground-breaking work has shattered the long-standing paradigm. In this case, the false history is that the standard view among scientists was that ALL noncoding RNAs were junk. That's nonsense. It means that these old scientists must have dismissed ribosomal RNA and tRNA back in the 1960s. But even if you grant that those were exceptions, it means that they knew nothing about Sidney Altman's work on RNase P (Nobel Prize, 1989), or 7SL RNA (Alu elements), or the RNA components of spliceosomes (snRNAs), or PiWiRNAs, or snoRNAs, or microRNAs, or a host of regulatory RNAs that have been known for decades.

Knowledgeable scientists knew full well that there are many functional noncoding RNAs and that includes some that are called lncRNAs. As the editorial says, these knowledgeable scientists are warning about attributing function to all transcripts without evidence. In other words, many of the transcripts found in human cells could be junk RNA in spite of the fact that there are also many functional noncoding RNAs.

So, Tim Mercer is correct, the debate is over which ncRNAs are functional and that's the same debate that's been going on for 50 years. Move along folks, nothing to see here.

The author isn't going to let this go. She decides to interview John Mattick, of all people, to get a "proper" perspective on the field. (Tim Mercer is a former student of Mattick's.) Unfortunately, that perspective contains no information on how many functional ncRNAs are present and on what percentage of the genome their genes occupy. It's gonna take several hundred thousand lncRNA genes to make a significant impact on the amount of junk DNA but nobody wants to say that. With John Mattick you get a twofer: a false history (paradigm strawman) plus no evidence that your discoveries are truly revolutionary.

Nature Methods should be ashamed, not for presenting the views of John Mattick—that's perfectly legitimate—but for not putting them in context and presenting the other side of the controversy. Surely at this point in time (2022) we should all know that Mattick's views are on the fringe and most transcripts really are junk RNA?


Monday, October 17, 2022

University press releases are a major source of science misinformation

Here's an example of a press release that distorts science by promoting incorrect information that is not found in the actual publication.

The problems with press releases are well-known but nobody is doing anything about it. I really like the discussion in Stuart Ritchie's recent (2020) book where he begins with the famous "arsenic affair" in 2010. Sandwalk readers will recall that this started with a press conference by NASA announcing that arsenic replaces phosphorus in the DNA of some bacteria. The announcement was treated with contempt by the blogosphere and eventually the claim was disproved by Rosie Redfield who showed that the experiment was flawed [The Arsenic Affair: No Arsenic in DNA!].

This was a case where the science was wrong and NASA should have known before it called a press conference. Ritchie goes on to document many cases where press releases have distorted the science in the actual publication. He doesn't mention the most egregious example, the ENCODE publicity campaign that successfully convinced most scientists that junk DNA was dead [The 10th anniversary of the ENCODE publicity campaign fiasco].

I like what he says about "churnalism" ...

In an age of 'churnalism', where time-pressed journalists often simply repeat the content of press releases in their articles (science news reports are often worded virtually identically to a press release), scientists have a great deal of power—and a great deal of responsibility. The constraints of peer review, lax as they might be, aren't present at all when engaging with the media, and scientists' biases about the importance of their results can emerge unchecked. Frustratingly, once the hype bubble has been inflated by a press release, it's difficult to burst.

Press releases of all sorts are failing us but university press releases are the most disappointing because we expect universities to be credible sources of information. It's obvious that scientists have to accept the blame for deliberately distorting their findings but surely the information offices at universities are also at fault? I once suggested that every press release has to include a statement, signed by the scientists, saying that the press release accurately reports the results and conclusions that are in the published article and does not contain any additional information or speculation that has not passed peer review.

Let's look at a recent example where the scientists would not have been able to truthfully sign such a statement.

A group of scientists based largely at The University of Sheffield in Sheffield (UK) recently published a paper in Nature on DNA damage in the human genome. They noted that such damage occurs preferentially at promoters and enhancers and is associated with demethylation and transcription activation. They presented evidence that the genome can be partially protected by a protein called "NuMA." I'll show you the abstract below but for now that's all you need to know.

The University of Sheffield decided to promote itself by issuing a press release: Breaks in ‘junk’ DNA give scientists new insight into neurological disorders. This title is a bit of a surprise since the paper only talks about breaks in enhancers and promoters and the word "junk" doesn't appear anywhere in the published report in Nature.

The first paragraph of the press release isn't very helpful.

‘Junk’ DNA could unlock new treatments for neurological disorders as scientists discover how its breaks and repairs affect our protection against neurological disease.

What could this mean? Surely they don't mean to imply that enhancers and promoters are "junk DNA"? That would be really, really, stupid. The rest of the press release should explain what they mean.

The groundbreaking research from the University of Sheffield’s Neuroscience Institute and Healthy Lifespan Institute gives important new insights into so-called junk DNA—or DNA previously thought to be non-essential to the coding of our genome—and how it impacts on neurological disorders such as Motor Neurone Disease (MND) and Alzheimer’s.

Until now, the body’s repair of junk DNA, which can make up 98 per cent of DNA, has been largely overlooked by scientists, but the new study published in Nature found it is much more vulnerable to breaks from oxidative genomic damage than previously thought. This has vital implications on the development of neurological disorders.

Oops! Apparently, they really are that stupid. The scientists who did this work seem to think that 98% of our genome is junk and that includes all the regulatory sequences. It seems like they are completely unaware of decades of work on discovering the function of these regulatory sequences. According to The University of Sheffield, these regulatory sequences have been "largely overlooked by scientists." That will come as a big surprise to many of my colleagues who worked on gene regulation in the 1980s and in all the decades since then. It will probably also be a surprise to biochemistry and molecular biology undergraduates at Sheffield—at least I hope it will be a surprise.

Professor Sherif El-Khamisy, Chair in Molecular Medicine at the University of Sheffield, Co-founder and Deputy Director of the Healthy Lifespan Institute, said: “Until now the repair of what people thought is junk DNA has been mostly overlooked, but our study has shown it may have vital implications on the onset and progression of neurological disease."

I wonder if Professor Sherif El-Khamisy can name a single credible scientist who thinks that regulatory sequences are junk DNA?

There's no excuse for propagating this kind of misinformation about junk DNA. It's completely unnecessary and serves only to discredit the university and its scientists.

Ray, S., Abugable, A.A., Parker, J., Liversidge, K., Palminha, N.M., Liao, C., Acosta-Martin, A.E., Souza, C.D.S., Jurga, M., Sudbery, I. and El-Khamisy, S.F. (2022) A mechanism for oxidative damage repair at gene regulatory elements. Nature, 609:1038-1047. [doi:10.1038/s41586-022-05217-8]

Oxidative genome damage is an unavoidable consequence of cellular metabolism. It arises at gene regulatory elements by epigenetic demethylation during transcriptional activation1,2. Here we show that promoters are protected from oxidative damage via a process mediated by the nuclear mitotic apparatus protein NuMA (also known as NUMA1). NuMA exhibits genomic occupancy approximately 100 bp around transcription start sites. It binds the initiating form of RNA polymerase II, pause-release factors and single-strand break repair (SSBR) components such as TDP1. The binding is increased on chromatin following oxidative damage, and TDP1 enrichment at damaged chromatin is facilitated by NuMA. Depletion of NuMA increases oxidative damage at promoters. NuMA promotes transcription by limiting the polyADP-ribosylation of RNA polymerase II, increasing its availability and release from pausing at promoters. Metabolic labelling of nascent RNA identifies genes that depend on NuMA for transcription including immediate–early response genes. Complementation of NuMA-deficient cells with a mutant that mediates binding to SSBR, or a mitotic separation-of-function mutant, restores SSBR defects. These findings underscore the importance of oxidative DNA damage repair at gene regulatory elements and describe a process that fulfils this function.


Thursday, October 13, 2022

Macroevolution

(This is a copy of an essay that I published in 2006. I made some minor revisions to remove outdated context.)

Overheard at breakfast on the final day of a recent scientific meeting: "Do you believe in macroevolution?" Came the reply: "Well, it depends on how you define it."
                                                                         Roger Lewin (1980)

There is no difference between micro- and macroevolution except that genes between species usually diverge, while genes within species usually combine. The same processes that cause within-species evolution are responsible for above-species evolution.
                                                                         John Wilkins

The minimalist definition of evolution is a change in the hereditary characteristics of a population over the course of many generations. This is a definition that helps us distinguish between changes that are not evolution and changes that meet the minimum criteria. The definition comes from the field of population genetics developed in the early part of the last century. The modern theory of evolution owes much to population genetics and our understanding of how genes work. But is that all there is to evolution?

The central question of the Chicago conference was whether the mechanisms underlying microevolution can be extrapolated to explain the phenomena of macroevolution. At the risk of doing violence to the positions of some of the people at the meeting, the answer can be given as a clear, No.
               Roger Lewin (1980)

No. There's also common descent—the idea that all life has evolved from primitive species over billions of years. Common descent is about the history of life. In this essay I'll describe the main features of how life evolved but keep in mind that this history is a unique event that is accidental, contingent, quirky, and unpredictable. I'll try and point out the most important controversies about common descent.

The complete modern theory of evolution encompasses much more than changes in the genetics of a population. It includes ideas about the causes of speciation, long-term trends, and mass extinctions. This is the domain of macroevolution—loosely defined as evolution above the species level. The kind of evolution that focuses on genes in a population is usually called microevolution.
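To make "genes in a population" concrete, here is a minimal sketch of microevolution by random genetic drift alone. It assumes a textbook Wright-Fisher model (fixed population size, non-overlapping generations, no selection or mutation); the function name and parameter values are my own illustrative choices, not taken from any source cited in this essay.

```python
import random


def wright_fisher(p0, pop_size, generations, rng):
    """Track the frequency of one allele under pure random drift (no selection)."""
    copies = 2 * pop_size  # diploid population: two gene copies per individual
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Every gene copy in the next generation is drawn at random
        # from the current gene pool, so p changes only by sampling error.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


rng = random.Random(1)  # seeded so that a run can be repeated
trajectory = wright_fisher(p0=0.5, pop_size=50, generations=200, rng=rng)
print(f"start {trajectory[0]:.2f}, end {trajectory[-1]:.2f}")
```

The allele frequency wanders from one generation to the next and, once it reaches 0 or 1, stays there. That change in the genetic makeup of a single population is all the minimalist definition requires.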

As a biochemist and a molecular biologist, I tend to view evolution from a molecular perspective. My main interest is molecular evolution and the analysis of sequences of proteins and nucleic acids. One of the goals in writing this essay is to explain this aspect of evolution to the best of my limited ability. However, another important goal is to show how molecular evolution integrates into the bigger picture of evolution as described by all other evolutionary biologists, including paleontologists. When dealing with macroevolution this is very much a learning experience for me since I'm not an expert. Please bear with me while we explore these ideas.

It's difficult to define macroevolution because it's a field of study and not a process. Mark Ridley has one of the best definitions I've seen ...

Macroevolution means evolution on the grand scale, and it is mainly studied in the fossil record. It is contrasted with microevolution, the study of evolution over short time periods, such as that of a human lifetime or less. Microevolution therefore refers to changes in gene frequency within a population .... Macroevolutionary events are more likely to take millions, probably tens of millions of years. Macroevolution refers to things like the trends in horse evolution described by Simpson, and occurring over tens of millions of years, or the origin of major groups, or mass extinctions, or the Cambrian explosion described by Conway Morris. Speciation is the traditional dividing line between micro- and macroevolution.
                                                                         Mark Ridley (1997) p. 227

When we talk about macroevolution we're talking about studies of the history of life on Earth. This takes in all the events that affect the actual historical lineages leading up to today's species. Jeffrey S. Levinton makes this point in his description of the field of macroevolution and it's worth quoting what he says in his book Genetics, Paleontology, and Macroevolution.

Macroevolution must be a field that embraces the ecological theater, including the range of time scales of the ecologist, to the sweeping historical changes available only to paleontological study. It must include the peculiarities of history, which must have had singular effects on the directions that the composition of the world's biota took (e.g., the splitting of continents, the establishment of land and oceanic isthmuses). It must take the entire network of phylogenetic relationships and impose a framework of genetic relationships and appearances of character changes. Then the nature of evolutionary directions and the qualitative transformation of ancestor to descendant over major taxonomic distances must be explained.
                                                                     Jeffrey S. Levinton (2001) p.6

Levinton then goes on to draw a parallel between microevolution and macroevolution on the one hand, and physics and astronomy on the other. He points out that the structure and history of the known universe has to be consistent with modern physics, but that's not sufficient. He gives the big bang as an example of a cosmological hypothesis that doesn't derive directly from fundamental physics. I think this analogy is insightful. Astronomers study the life and death of stars and the interactions of galaxies. Some of them are interested in the formation of planetary systems, especially the unique origin of our own solar system. Explanations of these "macro" phenomena depend on the correctness of the underlying "micro" physics phenomena (e.g., gravity, relativity) but there's more to the field of astronomy than that.

Levinton continues ....

Does the evolutionary biologist differ very much from this scheme of inference? A set of organisms exists today in a partially measurable state of spatial, morphological, and chemical relationships. We have a set of physical and biological laws that might be used to construct predictions about the outcome of the evolutionary process. But, as we all know, we are not very successful, except at solving problems at small scales. We have plausible explanations for the reason why moths living in industrialized areas are rich in dark pigment, but we don't know whether or why life arose more than once or why some groups became extinct (e.g., the dinosaurs) whereas others managed to survive (e.g., horseshoe crabs). Either our laws are inadequate and we have not described the available evidence properly or no such laws can be devised to predict uniquely what should have happened in the history of life. For better or worse, macroevolutionary biology is as much historical as is astronomy, perhaps with looser laws and more diverse objectives....

Indeed, the most profound problem in the study of evolution is to understand how poorly repeatable historical events (e.g., the trapping of an endemic radiation in a lake that dries up) can be distinguished from lawlike repeatable processes. A law that states 'an endemic radiation will become extinct if its structural habitat disappears' has no force because it maps to the singularity of a historical event.
                                                                 Jeffrey S. Levinton (2001) p.6-7

In conclusion, then, macroevolutionary processes are underlain by microevolutionary phenomena and are compatible with microevolutionary theories, but macroevolutionary studies require the formulation of autonomous hypotheses and models (which must be tested using macroevolutionary evidence). In this (epistemologically) very important sense, macroevolution is decoupled from microevolution: macroevolution is an autonomous field of evolutionary study.
     Francisco J. Ayala (1983)

I think it's important to appreciate what macroevolutionary biologists are saying. Most of these scientists are paleontologists and they think of their area of study as an interdisciplinary field that combines geology and biology. According to them, there's an important difference between evolutionary theory and the real history of life. The actual history has to be consistent with modern evolutionary theory (it is) but the unique sequence of historical events doesn't follow directly from application of evolutionary theory. Biological mechanisms such as natural selection and random genetic drift are part of a much larger picture that includes moving continents, asteroid impacts, ice ages, contingency, etc. The field of macroevolution addresses these big picture issues.

Clearly, there are some evolutionary biologists who are only interested in macroevolution. They don't care about microevolution. This is perfectly understandable since they are usually looking at events that take place on a scale of millions of years. They want to understand why some species survive while others perish and why there are some long-term trends in the history of life. (Examples of such trends are the loss of toes during the evolution of horses, the development of elaborate flowers during the evolution of vascular plants, and the tendency of diverse species, such as the marsupial Tasmanian wolf and the common placental wolf, to converge on a similar body plan.)

Nobody denies that macroevolutionary processes involve the fundamental mechanisms of natural selection and random genetic drift, but these microevolutionary processes are not sufficient, by themselves, to explain the history of life. That's why, in the domain of macroevolution, we encounter theories about species sorting and tracking, species selection, and punctuated equilibria.

Micro- and macroevolution are thus different levels of analysis of the same phenomenon: evolution. Macroevolution cannot solely be reduced to microevolution because it encompasses so many other phenomena: adaptive radiation, for example, cannot be reduced only to natural selection, though natural selection helps bring it about.
     Eugenie C. Scott (2004)

As I mentioned earlier, most of macroevolutionary theory is intimately connected with the observed fossil record and, in this sense, it is much more historical than population genetics and evolution within a species. Macroevolution, as a field of study, is the turf of paleontologists and much of the debate about a higher level of evolution (above species and populations) is motivated by the desire of paleontologists to be accepted at the high table of evolutionary theory. It's worth recalling that during the last part of the twentieth century evolutionary theorizing was dominated by population geneticists. Their perspective was described by John Maynard Smith, "... the attitude of population geneticists to any paleontologist rash enough to offer a contribution to evolutionary theory has been to tell him to go away and find another fossil, and not to bother the grownups." (Maynard Smith, 1984)

The distinction between microevolution and macroevolution is often exaggerated, especially by the anti-science crowd. Creationists have gleefully exploited the distinction in order to legitimate their position in the light of clear and obvious examples of evolution that they can't ignore. They claim they can accept microevolution, but they reject macroevolution.

In the real world—the one inhabited by rational human beings—the difference between macroevolution and microevolution is basically a difference in emphasis and level. Some evolutionary biologists are interested in species, trends, and the big picture of evolution, while others are more interested in the mechanics of the underlying mechanisms.

Speciation is critical to conserving the results of both natural selection and genetic drift. Speciation is obviously central to the fate of genetic variation, and a major shaper of patterns of evolutionary change through evolutionary time. It is as if Darwinians—neo- and ultra- most certainly included—care only for the process generating change, and not about its ultimate fate in geological time.
     Niles Eldredge (1995)

The Creationists would have us believe there is some magical barrier separating selection and drift within a species from the evolution of new species and new characteristics. Not only is this imagined barrier invisible to most scientists but, in addition, there is abundant evidence that no such barrier exists. We have numerous examples that show how diverse species are connected by a long series of genetic changes. This is why many scientists claim that macroevolution is just lots of microevolution over a long period of time.

But wait a minute. I just said that many scientists think of macroevolution as simply a scaled-up version of microevolution, but a few paragraphs ago I said there's more to the theory of evolution than just changes in the frequency of alleles within a population. Don't these statements conflict? Yes, they do ... and therein lies a problem.

When the principal tenets of the Modern Synthesis were being worked out in the 1940s, one of the fundamental conclusions was that macroevolution could be explained by changes in the frequency of alleles within a population due, mostly, to natural selection. This gave rise to the commonly accepted notion that macroevolution is just a lot of microevolution. Let's refer to this as the sufficiency of microevolution argument.

At the time of the synthesis, there were several other explanations that attempted to decouple macroevolution from microevolution. One of these was saltation, or the idea that macroevolution was driven by large-scale mutations (macromutations) leading to the formation of new species. This is the famous "hopeful monster" theory of Goldschmidt. Another decoupling hypothesis was called orthogenesis, or the idea that there is some intrinsic driving force that directs evolution along certain pathways. Some macroevolutionary trends, such as the increase in the size of horses, were thought to be the result of this intrinsic force.

Both of these ideas about macroevolutionary change (saltation and orthogenesis) had support from a number of evolutionary biologists. Both were strongly opposed by the group of scientists that produced the Modern Synthesis. One of the key players was the paleontologist George Gaylord Simpson whose books Tempo and Mode in Evolution (1944) and The Major Features of Evolution (1953) attempted to combine paleontology and population genetics. "Tempo" is often praised by evolutionary biologists and many of our classic examples of evolution, such as the bushiness of the horse tree, come from that book. Its influence on paleontologists was profound because it upset the traditional view that macroevolution and the newfangled genetics had nothing in common.

Just as mutation and drift introduce a strong random component into the process of adaptation, mass extinctions introduce chance into the process of diversification. This is because mass extinctions are a sampling process analogous to genetic drift. Instead of sampling allele frequencies, mass extinctions sample species and lineages. ... The punchline? Chance plays a large role in the processes responsible for adaptation and diversity.
        Freeman and Herron (1998)

We see, in context, that the blurring of the distinction between macroevolution and microevolution was part of a counter-attack on the now discredited ideas of saltation and orthogenesis. As usual, when pressing the attack against objectionable ideas, there's a tendency to overrun the objective and inflict collateral damage. In this case, the attack on orthogenesis and the old version of saltation was justified since neither of these ideas offers a viable alternative to natural selection and drift as mechanisms of evolution. Unfortunately, Simpson's attack was so successful that a generation of scientists grew up thinking that macroevolution could be entirely explained by microevolutionary processes. That's why we still see this position being advocated today and that's why many biology textbooks promote the sufficiency of microevolution argument. Gould argues, successfully in my opinion, that the sufficiency of microevolution became dogma during the hardening of the synthesis in the 1950s and 1960s. It was part of an emphasis on the individual as the only real unit of selection.

However, from the beginning of the Modern Synthesis there were other evolutionary biologists who wanted to decouple macroevolution and microevolution—not because they believed in the false doctrines of saltation and orthogenesis, but because they knew of higher level processes that went beyond microevolution. One of these was Ernst Mayr. In his essay "Does Microevolution Explain Macroevolution," Mayr says ...

Among all the claims made during the evolutionary synthesis, perhaps the one that found least acceptance was the assertion that all phenomena of macroevolution can be 'reduced to,' that is, explained by, microevolutionary genetic processes. Not surprisingly, this claim was usually supported by geneticists but was widely rejected by the very biologists who dealt with macroevolution, the morphologists and paleontologists. Many of them insisted that there is more or less complete discontinuity between the processes at the two levels—that what happens at the species level is entirely different from what happens at the level of the higher categories. Now, 50 years later the controversy remains undecided.
                                                                         Ernst Mayr (1988) p.402

Mayr goes on to make several points about the difference between macroevolution and microevolution. In particular, he emphasizes that macroevolution is concerned with phenotypes and not genotypes, "In this respect, indeed, macroevolution as a field of study is completely decoupled from microevolution." (ibid p. 403). This statement reiterates an important point, namely that macroevolution is a "field of study" and, as such, its focus differs from that of other fields of study such as molecular evolution.

If you think of macroevolution as a field of study rather than a process, then it doesn't make much sense to say that macroevolution can be explained by the process of changing alleles within a population. This would be like saying the entire field of paleontology can be explained by microevolution. This is the point about the meaning of the term "macroevolution" that is so often missed by those who dismiss it as just a bunch of microevolution.

The orthodox believers in the hardened synthesis feel threatened by macroevolution since it implies a kind of evolution that goes beyond the natural selection of individuals within a population. The extreme version of this view is called adaptationism and the believers are called Ultra-Darwinians by their critics. This isn't the place to debate adaptationism: for now, let's just assume that the sufficiency of microevolution argument is related to the pluralist-adaptationist controversy and see how our concept of macroevolution as a field of study relates to the issue. Niles Eldredge describes it like this ...

The very term macroevolution is enough to make an ultra-Darwinian snarl. Macroevolution is counterpoised with microevolution—generation by generation selection-mediated change in gene frequencies within populations. The debate is over the question, Are conventional Darwinian microevolutionary processes sufficient to explain the entire history of life? To ultra-Darwinians, the very term macroevolution suggests that the answer is automatically no. To them, macroevolution implies the action of processes—even genetic processes—that are as yet unknown but must be imagined to yield a satisfactory explanation of the history of life.

But macroevolution need not carry such heavy conceptual baggage. In its most basic usage, it simply means evolution on a large-scale. In particular, to some biologists, it suggests the origin of major groups - such as the origin and radiation of mammals, or the derivation of whales and bats from terrestrial mammalian ancestors. Such sorts of events may or may not demand additional theory for their explanation. Traditional Darwinian explanation, of course, insists not.
                                                              Niles Eldredge (1995) p. 126-127

Eldredge sees macroevolution as a field of study that's mostly concerned with evolution on a large scale. Since he's a paleontologist, it's likely that, for him, macroevolution is the study of evolution based on the fossil record. Eldredge is quite comfortable with the idea that one of the underlying causes of evolution can be natural selection—this includes many changes seen over the course of millions of years. In other words, there is no conflict between microevolution and macroevolution in the sense that microevolution stops and is replaced by macroevolution above the level of species. But there is a conflict in the sense that Eldredge, and many other evolutionary biologists, do not buy the sufficiency of microevolution argument. They believe there are additional theories, and mechanisms, needed to explain macroevolution. Gould says it best ....

We do not advance some special theory for long times and large transitions, fundamentally opposed to the processes of microevolution. Rather, we maintain that nature is organized hierarchically and that no smooth continuum leads across levels. We may attain a unified theory of process, but the processes work differently at different levels and we cannot extrapolate from one level to encompass all events at the next. I believe, in fact, that ... speciation by splitting guarantees that macroevolution must be studied at its own level. ... [S]election among species—not an extrapolation of changes in gene frequencies within populations—may be the motor of macroevolutionary trends. If macroevolution is, as I believe, mainly a story of the differential success of certain kinds of species and, if most species change little in the phyletic mode during the course of their existence, then microevolutionary change within populations is not the stuff (by extrapolation) of major transformations.
                                                         Stephen Jay Gould (1980b) p. 170

Naturalists such as Ernst Mayr and paleontologists such as Gould and Eldredge have all argued convincingly that speciation is an important part of evolution. Since speciation is not a direct consequence of changes in the frequencies of alleles in a population, it follows that microevolution is not sufficient to explain all of evolution. Gould and Eldredge (and others) go even further to argue that there are processes such as species sorting that can only take place above the species level. This means there are evolutionary theories that only apply in the domain of macroevolution.

The idea that there's much more to evolution than genes and population genetics was a favorite theme of Stephen Jay Gould. He advocated a pluralist, hierarchical approach to evolution and his last book The Structure of Evolutionary Theory emphasized macroevolutionary theory—although he often avoided using this term. The Structure of Evolutionary Theory is a huge book that has become required reading for anyone interested in evolution. Remarkably, there's hardly anything in the book about population genetics, molecular evolution, and microevolution as popularly defined. What better way of illustrating that macroevolution must be taken seriously!

Macroevolutionary theory tries to identify patterns and trends that help us understand the big picture. In some cases, macroevolutionary biologists have recognized generalities (theories and hypotheses) that only apply to higher level processes. Punctuated equilibria and species sorting are examples of such higher level phenomena. The possible periodicity of mass extinctions might be another.

Remember that macroevolution should not be contrasted with microevolution because macroevolution deals with history. Microevolution and macroevolution are not competing explanations of the history of life any more than astronomy and physics compete for the correct explanation of the history of the known universe. Both types of explanation are required.

I think species sorting is the easiest higher level phenomenon to describe. It illustrates a mechanism that is clearly distinct from changes in the frequencies of alleles within a population. In this sense, it will help explain why microevolution isn't a sufficient explanation for the evolution of life. Of course, one needs to emphasize that macroevolution must be consistent with microevolution.
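A toy calculation may help show what makes species sorting a distinct mechanism. This is only a sketch under assumptions of my own: two hypothetical kinds of species ("small" and "large"), a trait that never changes within any species, and made-up per-step speciation and extinction rates. It tracks expected species counts rather than simulating individual lineages.

```python
def species_sorting(counts, speciation, extinction, steps):
    """Expected number of species of each kind over time.

    No species ever changes its trait; only the numbers of species change,
    through differential speciation and extinction.
    """
    history = [dict(counts)]
    for _ in range(steps):
        counts = {
            kind: n * (1 + speciation[kind] - extinction[kind])
            for kind, n in counts.items()
        }
        history.append(dict(counts))
    return history


def share(counts, kind):
    """Fraction of all species in the clade that are of the given kind."""
    return counts[kind] / sum(counts.values())


# Hypothetical rates: large-bodied species speciate twice as often,
# while both kinds go extinct at the same rate.
history = species_sorting(
    counts={"small": 50.0, "large": 50.0},
    speciation={"small": 0.10, "large": 0.20},
    extinction={"small": 0.10, "large": 0.10},
    steps=10,
)
for step in (0, 5, 10):
    print(step, round(share(history[step], "large"), 2))
```

The proportion of large-bodied species rises at every step, so the clade shows a trend toward larger size even though no allele frequency inside any species was ever modelled as changing. The trend comes entirely from which species multiply and which do not.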

I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory.
        Stephen Jay Gould (2002)

If we could track a single lineage through time, say from a single-cell protist to Homo sapiens, then we would see a long series of mutations and fixations as each ancestral population evolved. It might look as though the entire history could be accounted for by microevolutionary processes. This is an illusion because the track of the single lineage ignores all of the branching and all of the other species that lived and died along the way. That track would not explain why Neanderthals became extinct and Cro-Magnon survived. It would not explain why modern humans arose in Africa. It would not tell us why placental mammals became more successful than the dinosaurs. It would not explain why humans don't have wings and can't breathe underwater. It would not tell us whether replaying the tape of life would automatically lead to humans. All of those things are part of the domain of macroevolution, and microevolution isn't sufficient to help us understand them.