More Recent Comments

Thursday, April 08, 2021

On the accuracy of genomics in detecting disease variants

Several diseases, such as cancers, are caused by the presence of deleterious alleles that affect the function of a gene. In the case of cancer, most of the mutations are somatic cell mutations—mutations that have occurred after fertilization. These mutations will not be passed on to future generations. However, there are some variants that are present in the germline and these will be inherited. A small percentage of these variants will cause cancer directly but most will just indicate a predisposition to develop cancer.

There are a host of other diseases that have a genetic component; the responsible alleles can likewise be present in the germline or arise from somatic cell mutations.

Over the past fifty years or so there has been a lot of hype associated with the latest technological advances and the ability to detect deleterious germline mutations. The general public has been repeatedly told that we will soon be able to identify all disease-causing alleles and that this will definitely lead to incredible medical advances in treating these diseases. Just yesterday, for example, I posted an article on predictions made by the National Human Genome Research Institute (USA), which predicts that by 2030,

The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.

Similar predictions, in various forms, were made when the human genome project got under way and at various times afterward. First there was the 1000 Genomes Project, then the 100,000 Genomes Project and, of course, ENCODE. The problem is that genomics hasn't lived up to these expectations and there's a very good reason for that: the problem is a lot more difficult than it seems.

One of the Facebook groups that I follow (Modern Genetics & Technology)1 alerted me to a recent paper in JAMA that addressed the problem of genomics accuracy and the prediction of pathogenic variants. I'm posting the complete abstract so you can see the extent of the problem.

AlDubayan, S.H., Conway, J.R., Camp, S.Y., Witkowski, L., Kofman, E., Reardon, B., Han, S., Moore, N., Elmarakeby, H. and Salari, K. (2020) Detection of Pathogenic Variants With Germline Genetic Testing Using Deep Learning vs Standard Methods in Patients With Prostate Cancer and Melanoma. JAMA 324:1957-1969. [doi: 10.1001/jama.2020.20457]

Importance Less than 10% of patients with cancer have detectable pathogenic germline alterations, which may be partially due to incomplete pathogenic variant detection.

Objective To evaluate if deep learning approaches identify more germline pathogenic variants in patients with cancer.

Design, Setting, and Participants A cross-sectional study of a standard germline detection method and a deep learning method in 2 convenience cohorts with prostate cancer and melanoma enrolled in the US and Europe between 2010 and 2017. The final date of clinical data collection was December 2017.

Exposures Germline variant detection using standard or deep learning methods.

Main Outcomes and Measures The primary outcomes included pathogenic variant detection performance in 118 cancer-predisposition genes estimated as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The secondary outcomes were pathogenic variant detection performance in 59 genes deemed actionable by the American College of Medical Genetics and Genomics (ACMG) and 5197 clinically relevant mendelian genes. True sensitivity and true specificity could not be calculated due to lack of a criterion reference standard, but were estimated as the proportion of true-positive variants and true-negative variants, respectively, identified by each method in a reference variant set that consisted of all variants judged to be valid from either approach.

Results The prostate cancer cohort included 1072 men (mean [SD] age at diagnosis, 63.7 [7.9] years; 857 [79.9%] with European ancestry) and the melanoma cohort included 1295 patients (mean [SD] age at diagnosis, 59.8 [15.6] years; 488 [37.7%] women; 1060 [81.9%] with European ancestry). The deep learning method identified more patients with pathogenic variants in cancer-predisposition genes than the standard method (prostate cancer: 198 vs 182; melanoma: 93 vs 74); sensitivity (prostate cancer: 94.7% vs 87.1% [difference, 7.6%; 95% CI, 2.2% to 13.1%]; melanoma: 74.4% vs 59.2% [difference, 15.2%; 95% CI, 3.7% to 26.7%]), specificity (prostate cancer: 64.0% vs 36.0% [difference, 28.0%; 95% CI, 1.4% to 54.6%]; melanoma: 63.4% vs 36.6% [difference, 26.8%; 95% CI, 17.6% to 35.9%]), PPV (prostate cancer: 95.7% vs 91.9% [difference, 3.8%; 95% CI, –1.0% to 8.4%]; melanoma: 54.4% vs 35.4% [difference, 19.0%; 95% CI, 9.1% to 28.9%]), and NPV (prostate cancer: 59.3% vs 25.0% [difference, 34.3%; 95% CI, 10.9% to 57.6%]; melanoma: 80.8% vs 60.5% [difference, 20.3%; 95% CI, 10.0% to 30.7%]). For the ACMG genes, the sensitivity of the 2 methods was not significantly different in the prostate cancer cohort (94.9% vs 90.6% [difference, 4.3%; 95% CI, –2.3% to 10.9%]), but the deep learning method had a higher sensitivity in the melanoma cohort (71.6% vs 53.7% [difference, 17.9%; 95% CI, 1.82% to 34.0%]). The deep learning method had higher sensitivity in the mendelian genes (prostate cancer: 99.7% vs 95.1% [difference, 4.6%; 95% CI, 3.0% to 6.3%]; melanoma: 91.7% vs 86.2% [difference, 5.5%; 95% CI, 2.2% to 8.8%]).

Conclusions and Relevance Among a convenience sample of 2 independent cohorts of patients with prostate cancer and melanoma, germline genetic testing using deep learning, compared with the current standard genetic testing method, was associated with higher sensitivity and specificity for detection of pathogenic variants. Further research is needed to understand the relevance of these findings with regard to clinical outcomes.
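Note how those sensitivity and specificity numbers were estimated: there is no independent gold standard, so the reference set is just the union of variants judged valid from either method. Here's a minimal Python sketch of that bookkeeping (the variant labels are invented for illustration):

    def detection_metrics(called, reference_true, reference_all):
        """Estimate sensitivity, specificity, PPV, and NPV for one method
        against a reference set built from variants judged valid by either
        method (a stand-in for a true gold standard)."""
        reference_false = reference_all - reference_true
        tp = len(called & reference_true)    # true variants the method found
        fp = len(called & reference_false)   # calls judged invalid
        fn = len(reference_true - called)    # true variants the method missed
        tn = len(reference_false - called)   # invalid variants correctly not called
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "PPV": tp / (tp + fp),
            "NPV": tn / (tn + fn),
        }

    # Toy usage with invented variant labels:
    ref_true = {"BRCA2:c.5946del", "ATM:c.7271T>G", "CHEK2:c.1100del"}
    ref_all = ref_true | {"TP53:c.215C>G", "MLH1:c.350C>T"}
    print(detection_metrics({"BRCA2:c.5946del", "ATM:c.7271T>G", "TP53:c.215C>G"},
                            ref_true, ref_all))

The circularity is the point: each method is being scored against a reference set that the methods themselves helped to define.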

It's really difficult to understand this paper since there are many terms that I'd have to research more thoroughly; for example, does "germline whole-exome sequencing" mean that only sperm or egg DNA was sequenced and that every single exon in the entire genome was sequenced? Were exons in noncoding genes also sequenced?

I found it much more useful to look at the accompanying editorial by Gregory Feero.

Feero, W.G. (2020) Bioinformatics, Sequencing Accuracy, and the Credibility of Clinical Genomics. JAMA 324:1945-1947. [doi: 10.1001/jama.2020.19939]

Feero explains that the main problem is distinguishing real pathogenic variants from false positives, and this can only be accomplished by first sequencing and assembling the DNA and then using various algorithms to focus on important variants. Then there's the third step.

The third step, which often requires a high level of clinical expertise, sifts through detected potentially deleterious variations to determine if any are relevant to the indication for testing. For example, exome sequencing ordered for a patient with unexplained cardiomyopathy might harbor deleterious variants in the BRCA1 gene which, while a potentially important incidental finding, does not provide a plausible molecular diagnosis for the cardiomyopathy. The complexity of the bioinformatics tools used in these 3 steps is considerable.

It's that third step that's analyzed in the AlDubayan et al. paper, and one of the tools used is a deep-learning (AI) algorithm. However, training this algorithm requires considerable clinical expertise, and testing it requires a gold standard set of variants to serve as an internal control. As you might have guessed, that gold standard doesn't exist because the whole point of genomics is to identify previously unknown deleterious alleles.

Feero warns us that "clinical genome sequencing remains largely unregulated and accuracy is highly dependent on the expertise of individual testing laboratories." He concludes that genomics still has a long way to go.

The genomics community needs to act as a coherent body to ensure reproducibility of outcomes from clinical genome or exome sequencing, or provide transparent quality metrics for individual clinical laboratories. Issues related to achieving accuracy are not new, are not limited to bioinformatics tools, and will not be surmounted easily. However, until analytic and clinical validity are ensured, conversations about the potential value that genome sequencing brings to clinical situations will be challenging for clinical centers, laboratories that provide sequencing services, and consumers. For the foreseeable future, nongeneticist clinicians should be familiar with the quality of their chosen genome-sequencing laboratory and engage expert advice before changing patient management based on a test result.

I'm guessing that Gregory Feero doesn't think that in nine years (2030) "The clinical relevance of all encountered genomic variants will be readily predictable."


1. I do NOT recommend this group. It's full of amateurs who resist learning, and one of its main purposes is to post copies of pirated textbooks in its files. The group members get very angry when you tell them that what they are doing is illegal!

Wednesday, April 07, 2021

Bold predictions for human genomics by 2030

After spending several years working on a book about the human genome I've come to the realization that the field of genomics is not delivering on its promise to help us understand what's in your genome. In fact, genomics researchers have by and large impeded progress by coming up with false claims that need to be debunked.

My view is not widely shared by today's researchers who honestly believe they have made tremendous progress and will make even more as long as they get several billion dollars to continue funding their research. This view is nicely summarized in a Scientific American article from last fall that's really just a precis of an article that first appeared in Nature. The Nature article was written by employees of the National Human Genome Research Institute (NHGRI) at the National Institutes of Health in Bethesda, MD, USA (Green et al., 2020). Its purpose is to promote the work that NHGRI has done in the past and to summarize its strategic vision for the future. At the risk of oversimplifying, the strategic vision is "more of the same."

Green, E.D., Gunter, C., Biesecker, L.G., Di Francesco, V., Easter, C.L., Feingold, E.A., Felsenfeld, A.L., Kaufman, D.J., Ostrander, E.A. and Pavan, W.J. and 20 others (2020) Strategic vision for improving human health at The Forefront of Genomics. Nature 586:683-692. [doi: 10.1038/s41586-020-2817-4]

Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting-edge of human genomics going forward—that is, at ‘The Forefront of Genomics’.

What's interesting are the predictions that the NHGRI makes for 2030—predictions that were highlighted in the Scientific American article. I'm going to post those predictions without comment other than saying that I think they are mostly bovine manure. I'm interested in hearing your comments.

Bold predictions for human genomics by 2030

Some of the most impressive genomics achievements, when viewed in retrospect, could hardly have been imagined ten years earlier. Here are ten bold predictions for human genomics that might come true by 2030. Although most are unlikely to be fully attained, achieving one or more of these would require individuals to strive for something that currently seems out of reach. These predictions were crafted to be both inspirational and aspirational in nature, provoking discussions about what might be possible at The Forefront of Genomics in the coming decade.

  1. Generating and analysing a complete human genome sequence will be routine for any research laboratory, becoming as straightforward as carrying out a DNA purification.
  2. The biological function(s) of every human gene will be known; for non-coding elements in the human genome, such knowledge will be the rule rather than the exception.
  3. The general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the effect of genotype on phenotype.
  4. Research in human genomics will have moved beyond population descriptors based on historic social constructs such as race.
  5. Studies that involve analyses of genome sequences and associated phenotypic information for millions of human participants will be regularly featured at school science fairs.
  6. The regular use of genomic information will have transitioned from boutique to mainstream in all clinical settings, making genomic testing as routine as complete blood counts.
  7. The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.
  8. An individual’s complete genome sequence along with informative annotations will, if desired, be securely and readily accessible on their smartphone.
  9. Individuals from ancestrally diverse backgrounds will benefit equitably from advances in human genomics.
  10. Breakthrough discoveries will lead to curative therapies involving genomic modifications for dozens of genetic diseases.

I predict that nine years from now (2030) we will still be dealing with scientists who think that most of our genome is functional; that most human protein-coding genes produce many different proteins by alternative splicing; that epigenetics is useful; that there are more noncoding genes than protein-coding genes; that the leading scientists in the 1960s and '70s were incredibly stupid to suggest junk DNA; that almost every transcription factor binding site is biologically relevant; that most transposon-related sequences have a mysterious (still unknown) function; that it's still a mystery why humans are so much more complex than chimps; and that genomics will eventually solve all problems by 2040.

Why in the world, you might ask, would we still be dealing with issues like that? Because of genomics.


Saturday, April 03, 2021

"Dark matter" as an argument against junk DNA

Opponents of junk DNA have been largely unsuccessful in demonstrating that most of our genome is functional. Many of them are vaguely aware of the fact that "no function" (i.e. junk) is the default hypothesis and the onus is on them to come up with evidence of function. In order to shift, or obfuscate, this burden of proof they have increasingly begun to talk about the "dark matter" of the genome. The idea is to pretend that most of the genome is a complete mystery so that you can't say for certain whether it is junk or functional.

One of the more recent attempts appears in the "Journal Club" section of Nature Reviews Genetics. It focuses on repetitive DNA.

Before looking at that article, let's begin by summarizing what we already know about repetitive DNA. It includes highly repetitive DNA consisting of multiple tandem repeats of short sequences such as ATATATATAT... or CGACGACGACGA... or even longer repeats. Much of this is located in centromeric regions of the chromosome and I estimate that functional highly repetitive regions make up about 1% of the genome. [see Centromere DNA and Telomeres]
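To see what "multiple tandem repeats of short sequences" means in practice, here's a toy Python sketch. It is nothing like a real repeat-finder (which has to tolerate imperfect copies and higher-order structure); it just counts perfect consecutive copies of a short motif:

    def longest_tandem_run(seq, motif):
        """Return the largest number of consecutive copies of motif in seq."""
        best = 0
        for start in range(len(seq)):
            count = 0
            pos = start
            while seq[pos:pos + len(motif)] == motif:
                count += 1
                pos += len(motif)
            best = max(best, count)
        return best

    seq = "GG" + "AT" * 6 + "CC" + "CGA" * 4 + "TT"
    print(longest_tandem_run(seq, "AT"))   # 6
    print(longest_tandem_run(seq, "CGA"))  # 4

In centromeric DNA the repeat units are longer (the human alpha-satellite unit is about 171 bp) and the runs extend for hundreds of kilobases.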

The other part of repetitive DNA is middle repetitive DNA, which is largely composed of transposons and endogenous viruses, although it includes ribosomal RNA genes and origins of replication. Most of these sequences are dispersed as single copies throughout the genome. It's difficult to determine exactly how much of the genome consists of these middle repetitive sequences but it's certainly more than 50%.

Almost all of the transposon- and virus-related sequences are defective copies of once-active transposons and viruses. Most of them are just fragments of the originals. They are evolving at the neutral rate, so they look like junk and they behave like junk.1 That's not selfish DNA because it doesn't transpose, and it's not "dark matter." These fragments have all the characteristics of nonfunctional junk in our genome.

We know that the C-value paradox is mostly explained by differing amounts of repetitive DNA in different genomes, and this is consistent with the idea that they are junk. We know that less than 10% of our genome is conserved, and this fits with that conclusion. Finally, we know that genetic load arguments indicate that most of our genome must be impervious to mutation. Combined, these are all powerful bits of evidence and logic in favor of repetitive sequences being mostly junk DNA.

Now let's look at what Neil Gemmell says in this article.

Gemmell, N.J. (2021) Repetitive DNA: genomic dark matter matters. Nature Reviews Genetics:1-1. [doi: 10.1038/s41576-021-00354-8]

"Repetitive DNA sequences were found in hundreds of thousands, and sometimes millions, of copies in the genomes of most eukaryotes. while widespread and evolutionarily conserved, the function of these repeats was unknown. Provocatively, Britten and Kohne concluded 'a concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert.'”"

That's from Britten and Kohne (1968) and it's true that more than 50 years ago those workers didn't like the idea of junk DNA. Britten argued that most of this repetitive DNA was likely to be involved in regulation. Gemmell goes on to describe centromeres and telomeres and mentions that most repetitive DNA was thought to be junk.

"... the idea that much of the genome is junk, maintained and perpetuated by random chance, seemed as broadly unsatisfactory to me as it had to the original authors. Enthralled by the mystery of why half our genome is repetitive DNA, I have followed this field ever since."

Gemmell is not alone. In spite of all the evidence for junk DNA, the majority of scientists don't like the fact that most of our genome is junk. Here's how he justifies his continued skepticism.

"But it was not until the 2000s, as full eukaryotic genome sequences emerged, that we discovered that the repetitive non-coding regions of our genome harbour large numbers of promoters, enhancers, transcription factor binding sites and regulatory RNAs that control gene expression. More recently, the importance of repetitive DNA in both structural and regulatory processes has emerged, but much remains to be discovered and understood. It is time to shine further light on this genomic dark matter."

This appears to be the ENCODE publicity campaign legacy rearing its ugly head once more. Most Sandwalk readers know that the presence of transcription factor binding sites, RNA polymerase binding sites, and junk RNA is exactly what one would predict from a genome full of defective transposons. Most of us know that a big fat sloppy genome is bound to contain millions of spurious binding sites for transcription factors so this says nothing about function.

Apparently Gemmell's skepticism doesn't apply to the ENCODE results so he still thinks that all those bits and pieces of transposons are mysterious bits of dark matter that could be several billion base pairs of functional DNA. I don't know what he imagines they could be doing.


Photo Credit: The photo shows human chromosomes labelled with a telomere probe (yellow), from Christopher Counter at Duke University.

1. In my book, I cover this in a section called "If it walks like a duck ..." It's a form of abductive reasoning.

Britten, R. and Kohne, D. (1968) Repeated Sequences in DNA. Science 161:529-540. [doi: 10.1126/science.161.3841.529]

Friday, April 02, 2021

Off to the publisher!

The first draft of my book is ready to be sent to my publisher.

Text by Laurence A. Moran

Cover art and figures by Gordon L. Moran

  • 11 chapters
  • 112,000 words (+ preface and glossary)
  • about 370 pages (estimated)
  • 26 figures
  • 305 notes
  • 400 references

©Laurence A. Moran


Wednesday, March 17, 2021

I think I'll skip this meeting

I just received an invitation to a meeting ...

On behalf of the international organizing committee, we would like to invite you to a conference to be held in Neve Ilan, near Jerusalem, from 4-8 October 2021, entitled ‘Potential and Limitations of Evolutionary Processes’. The main goal of this interdisciplinary, international conference is to bring together scientists and scholars who hold a range of viewpoints on the potential and possible limitations of various undirected chemical and biological processes.

The conference will include presentations from a broad spectrum of disciplines, including chemistry, biochemistry, biology, origin of life, evolution, mathematics, cosmology and philosophy. Open-floor discussion will be geared towards delineating mechanistic details, with a view to discussing in such a way that speakers and participants feel comfortable expressing different opinions and different interpretations of the data, in the spirit of genuine academic inquiry.

I'm pretty sure I got this invite because I attended the Royal Society Meeting on New trends in evolutionary biology: biological, philosophical and social science perspectives back in 2016. That meeting was a big disappointment because the proponents of extending the modern synthesis didn't have much of a case [Kevin Laland's new view of evolution].

I was curious to see what kind of followup the organizers of this new meeting were planning so I checked out the website at: Potential and Limitations of Evolutionary Processes. Warning bells went off immediately when I saw the list of topics.

  • Fine-Tuning of the Universe
  • The Origin of Life
  • Origin & Fine-Tuning of the Genetic Code
  • Origin of Novel Genes
  • Origin of Functional Islands in Protein Sequence Space
  • Origin of Multi-Component Molecular Machines
  • Fine-Tuning of Molecular Systems
  • Fine-Tuning in Complex Biological Systems
  • Evolutionary Waiting Times
  • History of Life & Comparative Genomics

This is a creationist meeting. A little checking shows that three of the four organizers, Russ Carlson, Anthony Futerman, and Siegfried Scherer, are creationists. (I don't know about the other organizer, Joel Sussman, but in this case guilt by association seems appropriate.)

I don't think I'll book a flight to Israel.


Happy St. Patrick's Day!

Happy St. Patrick's Day! These are my great-grandparents Thomas Keys Foster, born in County Tyrone on September 5, 1852 and Eliza Ann Job, born in Fintona, County Tyrone on August 18, 1852. Thomas came to Canada in 1876 to join his older brother, George, on his farm near London, Ontario, Canada. Eliza came the following year and worked on the same farm. Thomas and Eliza decided to move out west where they got married in 1882 in Winnipeg, Manitoba, Canada.

The couple obtained a land grant near Saltcoats, Saskatchewan, a few miles south of Yorkton, where they built a sod house and later on a wood frame house that they named "Fairview" after a hill in Ireland overlooking the house where Eliza was born. That's where my grandmother, Ella, was born.

Other ancestors in this line came from the adjacent counties of Donegal (surname Foster) and Fermanagh (surnames Keys, Emerson, Moore) and possibly Londonderry (surname Job).

One of the cool things about studying your genealogy is that you can find connections to almost everyone. This means you can celebrate dozens of special days. In my case it was easy to find other ancestors from England, Scotland, Netherlands, Germany, France, Spain, Poland, Lithuania, Belgium, Ukraine, Russia, and the United States. Today, we will be celebrating St. Patrick's Day. It's rather hectic keeping up with all the national holidays but somebody has to keep the traditions alive!

It's nice to have an excuse to celebrate, especially when it means you can drink beer. However, I would be remiss if I didn't mention one little (tiny, actually) problem. Since my maternal grandmother is pure Irish, I should be 25% Irish but my DNA results indicate that I'm only 4% Irish. That's probably because my Irish ancestors were Anglicans and were undoubtedly the descendants of settlers from England, Wales, and Scotland who moved to Ireland in the 1600s. This explains why they don't have very Irish-sounding names.

I don't mention this when I'm in an Irish pub.


Monday, March 15, 2021

Is science the only way of knowing?

Most of us learned that science provides good answers to all sorts of questions ranging from whether a certain drug is useful in treating COVID-19 to whether humans evolved from primitive apes. A more interesting question is whether there are any limitations to science or whether there are any other effective ways of knowing. The question is related to the charge of "scientism," which is often used as a pejorative term to describe those of us who think that science is the only way of knowing.

I've discussed these issues many times on this blog so I won't rehash all the arguments. Suffice it to say that there are two definitions of science: the broad definition and the narrow one. The narrow definition says that science is merely the activity carried out by geologists, chemists, physicists, and biologists. Using this definition, it would be silly to say that science is the only way of knowing. The broad definition can be roughly described as: science is a way of knowing that relies on evidence, logic (rationality), and healthy skepticism.

The broad definition is the one preferred by many philosophers and it goes something like this ...

Unfortunately neither "science" nor any other established term in the English language covers all the disciplines that are parts of this community of knowledge disciplines. For lack of a better term, I will call them "science(s) in the broad sense." (The German word "Wissenschaft," the closest translation of "science" into that language, has this wider meaning; that is, it includes all the academic specialties, including the humanities. So does the Latin "scientia.") Science in a broad sense seeks knowledge about nature (natural science), about ourselves (psychology and medicine), about our societies (social science and history), about our physical constructions (technological science), and about our thought construction (linguistics, literary studies, mathematics, and philosophy). (Philosophy, of course, is a science in this broad sense of the word.)

Sven Ove Hansson, "Defining Pseudoscience and Science" in Philosophy of Pseudoscience: Reconsidering the Demarcation Problem.

Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent1 (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisense RNAs, miRNAs, and circular RNAs. The paper goes off the rails in the first paragraph of the Results section where they immediately equate transcripts with genes. They report the following:

  • 19,107 mRNA genes (188 novel)
  • 18,387 lncRNA genes (13,175 novel)
  • 7,309 asRNA genes (2,519 novel)
  • 5,427 miRNAs
  • 5,427 circRNAs

Is science a social construct?

Richard Dawkins has written an essay for The Spectator in which he says,

"[Science is not] a social construct. It’s simply true. Or at least truth is real and science is the best way we have of finding it. ‘Alternative ways of knowing’ may be consoling, they may be sincere, they may be quaint, they may have a poetic or mythic beauty, but the one thing they are not is true. As well as being real, moreover, science has a crystalline, poetic beauty of its own.

The essay is not particularly provocative but it did provoke Jerry Coyne, who pointed out that "The profession of science" can be construed as a social construct. In this sense Jerry is agreeing with his former supervisor, Richard Lewontin,1 who wrote,

"Science is a social institution about which there is a great deal of misunderstanding, even among those who are part of it. We think that science is an institution, a set of methods, a set of people, a great body of knowledge that we call scientific, is somehow apart from the forces that rule our everyday lives and tha goven the structure of our society... The problems that science deals with, the ideas that it uses in investigating those problems, even the so-called scientific results that come out of scientific investigation, are all deeply influenced by predispositions that derive from the society in which we live. Scientists do not begin life as scientists after all, but as social beings immersed in a family, a state, a productive structure, and they view nature through a lens that has been molded by their social structure."

Coincidentally, I just happened to be reading Science Fictions, an excellent book by Stuart Ritchie, who also believes that science is a social construct but has a slightly different take on the matter.

"Science has cured diseases, mapped the brain, forcasted the climate, and split the atom; it's the best method we have of figuring out how the universe works and of bending it to our will. It is, in other words, our best way of moving towards the truth. Of course, we might never get there—a glance at history shows us hubristic it is to claim any facts as absolute or unchanging. For ratcheting our way towards better knowledge about the world, though, the methods of science is as good as it gets.

But we can't make progress with those methods alone. It's not enough to make a solitary observation in your lab; you must also convince other scientists that you've discovered something real. This is where the social part comes in. Philosophers have long discussed how important it is for scientists to show their fellow researchers how they came to their conclusions."

Dawkins, Coyne, Lewontin, and Ritchie are all right in different ways. Dawkins is talking about science as a way of knowing, although he restricts his definition of science to the natural sciences. The others are referring to the practice of science or, as Jerry Coyne puts it, the profession. It's true that the methods of science are the best way we have to get at the truth, and it's true that the way of knowing is not a social construct in any meaningful sense.

Jerry Coyne is right to point out that the methods are employed by human scientists (he's also restricting the practice of science to scientists) and humans are fallible. In that sense, the enterprise of (natural) science is a social construct. Lewontin warns us that scientists have biases and prejudices and that may affect how they do science.

Ritchie makes a different point by emphasizing that (natural) science is a collective endeavor and that "truth" often requires a consensus. That's the sense in which science is social. This is supposed to make science more robust, according to Ritchie, because real knowledge only emerges after careful and skeptical scrutiny by other scientists. His book is mostly about how that process isn't working and why science is in big trouble. He's right about that.

I think it's important to distinguish between science as a way of knowing and the behavior and practice of scientists. The second one is affected by society and its flaws are well-known, but the value of science as a way of knowing can't be so easily dismissed.


1. The book is actually a series of lectures (The Massey Lectures) that Lewontin gave in Toronto (Ontario, Canada) in 1990. I attended those lectures.

Tuesday, February 16, 2021

The 20th anniversary of the human genome sequence:
6. Nature doubles down on ENCODE results

Nature has now published a series of articles celebrating the 20th anniversary of the publication of the draft sequences of the human genome [Genome revolution]. Two of the articles are about free access to information and, unlike a similar article in Science, the Nature editors aren't shy about mentioning an important event from 2001; namely, the fact that Science wasn't committed to open access.

By publishing the Human Genome Project’s first paper, we worked with a publicly funded initiative that was committed to data sharing. But the journal acknowledged there would be challenges to maintaining the free, open flow of information, and that the research community might need to make compromises to these principles, for example when the data came from private companies. Indeed, in 2001, colleagues at Science negotiated publishing the draft genome generated by Celera Corporation in Rockville, Maryland. The research paper was immediately free to access, but there were some restrictions on access to the full data.

Friday, February 12, 2021

The 20th anniversary of the human genome sequence:
5. 90% of our genome is junk

This is the fifth (and last) post in celebration of the 20th anniversary of publishing the draft sequence. The first four posts dealt with: (1) the way Science chose to commemorate the occasion [Access to the data]; (2) finishing the sequence; (3) the number of genes; and (4) the amount of functional DNA in the genome.

Back in 2001, knowledgeable scientists knew that most of the human genome is junk and the sequence confirmed that knowledge. Subsequent work on the human genome over the past 20 years has provided additional evidence of junk DNA so that we can now be confident that something like 90% of our genome is junk DNA. Here's a list of data and arguments that support that claim.

Wednesday, February 10, 2021

The 20th anniversary of the human genome sequence:
4. Functional DNA in our genome

We know a lot more about the human genome than we did when the draft sequences were published 20 years ago. One of the most important discoveries is the recognition and extent of true functional sequences in the genome. Genes are one example of such functional sequence but only a minor component (about 1.4%). Most of the functional regions of the genome are not genes.

Here's a list of functional DNA in our genome other than the functional part of genes.

  • Centromeres: There are 24 different centromeres and the average size is four million base pairs. Most of this is repetitive DNA and it adds up to about 3% of the genome. The total amount of centromeric DNA ranges from 2%-10% in different individuals. It's unlikely that all of the centromeric DNA is essential; about 1% seems to be a good estimate.
  • Telomeres: Telomeres are repetitive DNA sequences at the ends of chromosomes. They are required for the proper replication of DNA and they take up about 0.1% of the genome sequence.
  • Origins of replication: DNA replication begins at origins of replication. The size of each origin has not been established with certainty but it's safe to assume that 100 bp is a good estimate. There are about 100,000 origin sequences but it's unlikely that all of them are functional or necessary. It's reasonable to assume that only 30,000 - 50,000 are real origins and that means about 0.3% of the genome is devoted to origins of replication.
  • Regulatory sequences: The transcription of every gene is controlled by sequences that lie outside of the genes, usually at the 5′ end. The total amount of regulatory sequence is controversial but it seems reasonable to assume about 200 bp per gene for a total of five million bp or less than 0.2% of the genome (0.16%). The most extreme claim is about 2,400 bp per gene or 1.8% of the genome.
  • Scaffold attachment regions (SARs): Human chromatin is organized into about 100,000 large loops. The base of each loop consists of particular proteins bound to specific sequences called anchor loop sequences. The nomenclature is confusing; the original term (SAR) isn't as popular today as it was 35 years ago but that doesn't change the fact that about 0.3% of the genome is required to organize chromatin.
  • Transposons: Most of the transposon-related sequences in our genome are just fragments of defective transposons but there are a few active ones. They account for only a tiny fraction of the genome.
  • Viruses: Functional virus DNA sequences account for less than 0.1% of the genome.

If you add up all the functional DNA from this list, you get to somewhere between 2% and 3% of the genome.
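Here's that arithmetic in a few lines of Python, collapsing each range above to a single illustrative value:

    # Percentages from the list above; the "tiny fraction" entries are set
    # to 0.05% purely for illustration.
    functional_dna = {
        "centromeres (essential fraction)": 1.0,
        "telomeres": 0.1,
        "origins of replication": 0.3,
        "regulatory sequences": 0.2,
        "scaffold attachment regions (SARs)": 0.3,
        "active transposons": 0.05,
        "functional viral sequences": 0.05,
    }
    total = sum(functional_dna.values())
    print(f"non-gene functional DNA: ~{total:.1f}% of the genome")  # ~2.0%
    print(f"plus functional parts of genes: ~{total + 1.4:.1f}%")   # ~3.4%

Taking the high ends of the ranges pushes the list total toward 3%.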


Image credit: Wikipedia.

Monday, February 08, 2021

The 20th anniversary of the human genome sequence: 3. How many genes?

This week marks the 20th anniversary of the publication of the first drafts of the human genome sequence. Science chose to celebrate the achievement with a series of articles that had little to say about the scientific discoveries arising out of the sequencing project; one of the articles praised the openness of sequence data without mentioning that the journal had violated its own policy on openness by publishing the Celera sequence [The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science].

I've decided to post a few articles about the human genome beginning with one on finishing the sequence. In this post I'll summarize the latest data on the number of genes in the human genome.

Saturday, February 06, 2021

The 20th anniversary of the human genome sequence:
2. Finishing the sequence

It's been 20 years since the first drafts of the human genome sequence were published. These first drafts from the International Human Genome Project (IHGP) and Celera were far from complete. The IHGP sequence covered about 82% of the genome and it contained about 250,000 gaps and millions of sequencing errors.

Celera never published an updated sequence but IHGP published a "finished" sequence in October 2004. It covered about 92% of the genome and had "only" 300 gaps. The error rate of the finished sequence was down to 10^-5 (about one error per 100,000 bases).

International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945. [doi: 10.1038/nature03001]

We've known for many decades that the correct size of the human genome is close to 3,200,000 kb or 3.2 Gb. There isn't a more precise number because different individuals have different amounts of DNA. The best average estimate was 3,286 Mb based on the sequence of 22 autosomes, one X chromosome, and one Y chromosome (Morton 1991). The amount of actual nucleotide sequence in the latest version of the reference genome (GRCh38.p13) is 3,110,748,599 bp and the estimated total size is 3,272,116,950 bp based on estimating the size of the remaining gaps. This means that 95% of the genome has been sequenced. [see How much of the human genome has been sequenced? for a discussion of what's missing.]
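The 95% figure is just the ratio of those two numbers:

    sequenced = 3_110_748_599        # bp of actual sequence in GRCh38.p13
    estimated_total = 3_272_116_950  # bp, including estimated gap sizes
    print(f"{sequenced / estimated_total:.1%} of the genome sequenced")  # 95.1%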

Recent advances in sequencing technology have produced sequence data covering the repetitive regions in the gaps, and the first complete sequence of a human chromosome (X) was published in 2020 [First complete sequence of a human chromosome]. It's now possible to complete the human genome reference sequence by sequencing at least one individual but I'm not sure that the effort and the expense are worth it.


Image credit: The figure is from Miga et al. (2020).

Miga, K.H., Koren, S., Rhie, A., Vollger, M.R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G.A. et al. (2020) Telomere-to-telomere assembly of a complete human X chromosome. Nature 585:79-84. [doi: 10.1038/s41586-020-2547-7]

Morton, N.E. (1991) Parameters of the human genome. Proceedings of the National Academy of Sciences 88:7474-7476. [doi: 10.1073/pnas.88.17.7474]

The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science

The first drafts of the human genome sequence were published 20 years ago. The paper from the International Human Genome Project (IHGP) was published in Nature on February 15, 2001 and the paper from Celera was published in Science on February 16, 2001.

The original agreement was to publish both papers in Science but IHGP refused to publish their sequence in that journal when it chose to violate its own policy by allowing Celera to restrict access to its data. I highly recommend James Shreeve's book The Genome War for the history behind these publications. It paints an accurate, but not pretty, picture of science and politics.

Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A. and Sougnez, C. (2001) Initial sequencing and analysis of the human genome. Nature 409:860-921. [doi: 10.1038/35057062]

Venter, J., Adams, M., Myers, E., Li, P., Mural, R., Sutton, G., Smith, H., Yandell, M., Evans, C., Holt, R., Gocayne, J., Amanatides, P., Ballew, R., Huson, D., Wortman, J., Zhang, Q., Kodira, C., Zheng, X., Chen, L., Skupski, M., Subramanian, G., Thomas, P., Zhang, J., Gabor Miklos, G., Nelson, C., Broder, S., Clark, A., Nadeau, J., McKusick, V. and Zinder, N. (2001) The sequence of the human genome. Science 291:1304-1351. [doi: 10.1126/science.1058040]

Thursday, December 31, 2020

On the importance of controls

When doing an experiment, it's important to keep the number of variables to a minimum and it's important to have scientific controls. There are two types of controls. A negative control covers the possibility that you will get a signal by chance; for example, if you are testing an enzyme to see whether it degrades sugar then the negative control will be a tube with no enzyme. Some of the sugar may degrade spontaneously and you need to know this. A positive control is when you deliberately add something that you know will give a positive result; for example, if you are doing a test to see if your sample contains protein then you want to add an extra sample that contains a known amount of protein to make sure all your reagents are working.
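The logic of the negative control is easy to show in a few lines of Python; all the rates below are invented for illustration. The point is that without the no-enzyme tube you would attribute all of the sugar loss to the enzyme:

    def remaining_sugar(start, spontaneous_rate, enzyme_rate, hours):
        """A fixed fraction of the sugar is lost each hour, with or without enzyme."""
        sugar = start
        for _ in range(hours):
            sugar -= sugar * (spontaneous_rate + enzyme_rate)
        return sugar

    start = 100.0
    no_enzyme = remaining_sugar(start, 0.02, 0.00, 24)    # negative control
    with_enzyme = remaining_sugar(start, 0.02, 0.10, 24)  # experimental tube

    spontaneous_loss = start - no_enzyme
    enzyme_loss = (start - with_enzyme) - spontaneous_loss
    print(f"loss in the control tube: {spontaneous_loss:.1f}")
    print(f"loss attributable to the enzyme: {enzyme_loss:.1f}")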

Lots of controls are more complicated than the examples I gave but the principle is important. It's true that some experiments don't appear to need the appropriate controls but that may be an illusion. The controls might still be necessary in order to properly interpret the results but they're not done because they are very difficult. This is often true of genomics experiments.

Saturday, December 19, 2020

What do believers in epigenetics think about junk DNA?

I've been writing some stuff about epigenetics so I've been reading papers on how to define the term [What the heck is epigenetics?]. It turns out there's no universal definition, but I discovered that scientists who write about epigenetics are passionate believers in epigenetics no matter how you define it. Surprisingly (not!), there seems to be a correlation between belief in epigenetics and other misconceptions such as the classic misunderstanding of the Central Dogma of Molecular Biology and rejection of junk DNA [The Extraordinary Human Epigenome].

Here's an illustration of this correlation from the introduction to a special issue on epigenetics in Philosophical Transactions B.

Ganesan, A. (2018) Epigenetics: the first 25 centuries, Philosophical Transactions B. 373: 20170067. [doi: 10.1098/rstb.2017.0067]

Epigenetics is a natural progression of genetics as it aims to understand how genes and other heritable elements are regulated in eukaryotic organisms. The history of epigenetics is briefly reviewed, together with the key issues in the field today. This themed issue brings together a diverse collection of interdisciplinary reviews and research articles that showcase the tremendous recent advances in epigenetic chemical biology and translational research into epigenetic drug discovery.

In addition to the misconceptions, the text (see below) emphasizes the heritable nature of epigenetic phenomena. This idea of heritability seems to be a dominant theme among epigenetic believers.

A central dogma became popular in biology that equates life with the sequence DNA → RNA → protein. While the central dogma is fundamentally correct, it is a reductionist statement and clearly there are additional layers of subtlety in ‘how’ it is accomplished. Not surprisingly, the answers have turned out to be far more complex than originally imagined, and we are discovering that the phenotypic diversity of life on Earth is mirrored by an equal diversity of hereditary processes at the molecular level. This lies at the heart of modern day epigenetics, which is classically defined as the study of heritable changes in phenotype that occur without an underlying change in genome sequence. The central dogma's focus on genes obscures the fact that much of the genome does not code for genes and indeed such regions were derogatively lumped together as ‘junk DNA’. In fact, these non-coding regions increase in proportion as we climb up the evolutionary tree and clearly play a critical role in defining what makes us human compared with other species.

At the risk of beating a dead horse, I'd like to point out that the author is wrong about the Central Dogma and wrong about junk DNA. He's right about the heritability of some epigenetic phenomena, such as methylation of DNA, but that fact has been known for almost five decades and so far it hasn't caused a noticeable paradigm shift, unless I missed it [Restriction, Modification, and Epigenetics].


Saturday, December 05, 2020

Mouse traps Michael Denton

Michael Denton is a New Zealand biochemist, a Senior Fellow at the Discovery Institute, and the author of two Intelligent Design Creationist books: Evolution: A Theory in Crisis (1985) and Nature's Destiny (1998).

He has just read Michael Behe's latest book and he (Denton) is impressed [Praise for Behe’s Latest: “Facts Before Theory”]:

Behe brings out more forcibly than any other author I have recently read just how vacuous and biased are the criticisms of his work and of the ID position in general by so many mainstream academic defenders of Darwinism. And what is so telling about his many wonderfully crafted responses to his Darwinian critics is that it is Behe who is putting the facts before theory while his many detractors — Kenneth Miller, Jerry Coyne, Larry Moran, Richard Lenski, and others — are putting theory before the facts. In short, this volume shows that it is Behe rather than his detractors who is carefully following the evidence.

I don't know what planet Michael Denton is living on—probably the same one as Michael Behe—but let's make one thing clear about facts and evidence. Behe's entire argument is based on the "fact" that he can't see how Darwin's theory of natural selection can account for the evolution of complex features: therefore god(s) must have done it. This is NOT putting facts before theory and it is NOT carefully following the evidence.

It's just a somewhat sophisticated version of god of the gaps based on Behe's lack of understanding of the basic mechanisms of evolution.

(See, Of mice and Michael, where I explain why Michael Behe fails to answer my critique of The Edge of Evolution.)


Tuesday, December 01, 2020

Of mice and Michael

Michael Behe has published a book containing most of his previously published responses to critics. I was anxious to see how he dealt with my criticisms of The Edge of Evolution but I was disappointed to see that, for the most part, he has just copied excerpts from his 2014 blog posts (pp. 335-355).

I think it might be worthwhile to review the main issues so you can see for yourself whether Michael Behe really answered his critics as the title of his most recent book claims. You can check out the dueling blog posts at the end of this summary to see how the discussion evolved in real time more than four years ago.

Many Sandwalk readers participated in the debate back then and some of them are quoted in Behe's book although he usually just identifies them as commentators.

My Summary

Michael Behe has correctly identified an extremely improbable evolutionary event; namely, the development of chloroquine resistance in the malaria parasite. This is an event that is close to the edge of evolution, meaning that more complex events of this type are beyond the edge of evolution and cannot occur naturally. However, several of us have pointed out that his explanation of how that event occurred is incorrect. This is important because he relies on his flawed interpretation of chloroquine resistance to postulate that many observed events in evolution could not possibly have occurred by natural means. Therefore, god(s) must have created them.

In his response to this criticism, he completely misses the point and fails to understand that what is being challenged is his misinterpretation of the mechanisms of evolution and his understanding of mutations.


The main point of The Edge of Evolution is that many of the beneficial features we see could only have evolved by selecting for a number of different mutations where none of the individual mutations confer a benefit by themselves. Behe claims that these mutations had to occur simultaneously or at least close together in time. He argues that this is possible in some cases but in most cases the (relatively) simultaneous occurrence of multiple mutations is beyond the edge of evolution. The only explanation for the creation of these beneficial features is god(s).

Tuesday, November 17, 2020

Using modified nucleotides to make mRNA vaccines

The key features of the mRNA vaccines are the use of modified nucleotides in their synthesis and the use of lipid nanoparticles to deliver them to cells. The main difference between the Pfizer/BioNTech vaccine and the Moderna vaccine is in the delivery system. The lipid vesicles used by Moderna are somewhat more stable and the vaccine doesn't need to be kept constantly at ultra-low temperatures.

Both vaccines use modified RNAs. They synthesize the RNA using modified nucleotides: the uridine variants pseudouridine and N1-methylpseudouridine, and the cytidine variant 5-methylcytidine. (The structures of the nucleosides are from Andries et al., 2015.) The best versions are those that use both 5-methylcytidine and N1-methylpseudouridine.

I'm not an expert on these mRNAs and their delivery systems but the way I understand it is that regular RNA is antigenic—it induces antibodies against it, presumably when it is accidentally released from the lipid vesicles outside of the cell. The modified versions are much less antigenic. As an added bonus, the modified RNA is more stable and more efficiently translated.

Two of the key papers are ...

Andries, O., Mc Cafferty, S., De Smedt, S.C., Weiss, R., Sanders, N.N. and Kitada, T. (2015) "N1-methylpseudouridine-incorporated mRNA outperforms pseudouridine-incorporated mRNA by providing enhanced protein expression and reduced immunogenicity in mammalian cell lines and mice." Journal of Controlled Release 217: 337-344. [doi: 10.1016/j.jconrel.2015.08.051]

Pardi, N., Tuyishime, S., Muramatsu, H., Kariko, K., Mui, B.L., Tam, Y.K., Madden, T.D., Hope, M.J. and Weissman, D. (2015) "Expression kinetics of nucleoside-modified mRNA delivered in lipid nanoparticles to mice by various routes." Journal of Controlled Release 217: 345-351. [doi: 10.1016/j.jconrel.2015.08.007]


Sunday, November 15, 2020

Why is the Central Dogma so hard to understand?

The Central Dogma of molecular biology states ...

... once (sequential) information has passed into protein it cannot get out again (F.H.C. Crick, 1958).

The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid (F.H.C. Crick, 1970).

This is not difficult to understand since Francis Crick made it very clear in his original 1958 paper and again in his 1970 paper in Nature [see Basic Concepts: The Central Dogma of Molecular Biology]. There's nothing particularly complicated about the Central Dogma. It merely states the obvious fact that sequence information can flow from nucleic acid to protein but not the other way around.
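In fact, Crick's entire scheme is simple enough to state as data. Here's a sketch in Python: the general and special transfers are all permitted, and the only transfers the Central Dogma forbids are the ones flowing out of protein.

    # Crick (1970): general transfers occur in all cells, special transfers
    # occur under special circumstances, and transfers out of protein are
    # the ones the Central Dogma says never occur.
    GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
    SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}
    FORBIDDEN = {("protein", "protein"), ("protein", "DNA"), ("protein", "RNA")}

    def allowed(source, target):
        """True if residue-by-residue sequence information may flow source -> target."""
        return (source, target) in GENERAL | SPECIAL

    print(allowed("RNA", "DNA"))      # True: reverse transcription is no violation
    print(allowed("protein", "RNA"))  # False: this is what the Central Dogma forbids

Note that the existence of tRNA, rRNA, or any other noncoding RNA doesn't touch this scheme at all.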

So, why do so many scientists have trouble grasping this simple idea? Why do they continue to misinterpret the Central Dogma while quoting Crick? It seems obvious that they haven't read the paper(s) they are referencing.

I just came across another example of such ignorance and it is so outrageous that I just can't help sharing it with you. Here are a few sentences from a recent review in the 2020 issue of the Annual Review of Genomics and Human Genetics (Zerbino et al., 2020).

Once the role of DNA was proven, genes became physical components. Protein-coding genes could be characterized by the genetic code, which was determined in 1965, and could thus be defined by the open reading frames (ORFs). However, exceptions to Francis Crick's central dogma of genes as blueprints for protein synthesis (Crick, 1958) were already being uncovered: first tRNA and rRNA and then a broad variety of noncoding RNAs.

I can't imagine what the authors were thinking when they wrote this. If the Central Dogma actually said that the only role for genes was to make proteins then surely the discovery of tRNA and rRNA would have refuted the Central Dogma and relegated it to the dustbin of history. So why bother even mentioning it in 2020?


Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227:561-563. [PDF file]

Zerbino, D.R., Frankish, A. and Flicek, P. (2020) "Progress, Challenges, and Surprises in Annotating the Human Genome." Annual Review of Genomics and Human Genetics 21:55-79. [doi: 10.1146/annurev-genom-121119-083418]

Wednesday, November 11, 2020

On the misrepresentation of facts about lncRNAs

I've been complaining for years about how opponents of junk DNA misrepresent and distort the scientific literature. The same complaints apply to the misrepresentation of data on alternative splicing and on the prevalence of noncoding genes. Sometimes the misrepresentation is subtle so you hardly notice it.

I'm going to illustrate subtle misrepresentation by quoting a recent commentary on lncRNAs that's just been published in BioEssays. The main part of the essay deals with ways of determining the function of lncRNAs with an emphasis on the structures of RNA and RNA-protein complexes. The authors don't make any specific claims about the number of functional RNAs in humans but it's clear from the context that they think this number is very large.

Wednesday, October 07, 2020

Undergraduate education in biology: no vision, no change

I was looking at the Vision and Change document the other day and it made me realize that very little has changed in undergraduate education. I really shouldn't be surprised since I reached the same conclusion in 2015—six years after the recommendations were published [Vision and Change] [Why can't we teach properly?].

The main recommendations of Vision and Change are that undergraduate education should adopt the proven methods of student-centered education and should focus on core concepts rather than memorization of facts. Although there has been some progress, it's safe to say that neither of these goals has been achieved in the vast majority of biology classes, including biochemistry and molecular biology classes.

Things are getting even worse in this time of COVID-19 because more and more classes are being taught online and there seems to be general agreement that this is okay. It is not okay. Online didactic lectures go against everything in the Vision and Change document. It may be possible to develop online courses that practice student-centered, concept-based teaching that emphasizes critical thinking, but I've seen very few attempts.

Here are a couple of quotations from Vision and Change that should stimulate your thinking.

Traditionally, introductory biology [and biochemistry] courses have been offered as three lectures a week, with, perhaps, an accompanying two- or three-hour laboratory. This approach relies on lectures and a textbook to convey knowledge to the student and then tests the student's acquisition of that knowledge with midterm and final exams. Although many traditional biology courses include laboratories to provide students with hands-on experiences, too often these "experiences" are not much more than guided exercises in which finding the right answer is stressed while providing students with explicit instructions telling them what to do and when to do it.
"Appreciating the scientific process can be even more important than knowing scientific facts. People often encounter claims that something is scientifically known. If they understand how science generates and assesses evidence bearing on these claims, they possess analytical methods and critical thinking skills that are relevant to a wide variety of facts and concepts and can be used in a wide variety of contexts.”

National Science Foundation, Science and Technology Indicators, 2008

If you are a student and this sounds like your courses, then you should demand better. If you are an instructor and this sounds like one of your courses then you should be ashamed; get some vision and change [The Student-Centered Classroom].

Although the definition of student-centered learning may vary from professor to professor, faculty generally agree that student-centered classrooms tend to be interactive, inquiry-driven, cooperative, collaborative, and relevant. Three critical components are consistent throughout the literature, providing guidelines that faculty can apply when developing a course. Student-centered courses and curricula take into account student knowledge and experience at the start of a course and articulate clear learning outcomes in shaping instructional design. Then they provide opportunities for students to examine and discuss their understanding of the concepts presented, offering frequent and varied feedback as part of the learning process. As a result, student-centered science classrooms and assignments typically involve high levels of student-student and student-faculty interaction; connect the course subject matter to topics students find relevant; minimize didactic presentations; reflect diverse views of scientific inquiry, including data presentation, argumentation, and peer review; provide ongoing feedback to both the student and professor about the student's learning progress; and explicitly address learning how to learn.

This is a critical time for science education since science is under attack all over the world. We need to make sure that university students are prepared to deal with scientific claims and counter-claims for the rest of their lives after they leave university. This means that they have to be skilled at critical thinking and that's a skill that can only be taught in a student-centered classroom where students can practice argumentation and learn the importance of evidence. Memorizing the enzymes of the Krebs Cycle will not help them understand climate change or why they should wear a mask in the middle of a pandemic.


Saturday, October 03, 2020

On the importance of random genetic drift in modern evolutionary theory

The latest issue of New Scientist has a number of articles on evolution. All of them are focused on extending and improving the current theory of evolution, which is described as Darwin's version of natural selection [New Scientist doesn't understand modern evolutionary theory].

Most of the criticisms come from a group who want to extend the evolutionary synthesis (EES proponents). Their main goal is to advertise mechanisms that are presumed to enhance adaptation but that weren't explicitly included in the Modern Synthesis that was put together in the late 1940s.

One of the articles addresses random genetic drift [see Survival of the ... luckiest]. The emphasis in this short article is on the effects of drift in small populations and it gives examples of reduced genetic diversity in small populations.
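To see how quickly drift erodes variation when populations are small, here's a minimal Wright-Fisher sketch in Python (my own illustration; nothing like it appears in the New Scientist article). It follows a single neutral allele and shows that small populations rapidly drift to fixation or loss while large populations hold on to their variation:

import random

def simulate_drift(pop_size, generations=200, start_freq=0.5, seed=42):
    """Return the trajectory of a neutral allele frequency under pure drift."""
    rng = random.Random(seed)
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each of the 2N gene copies in the next generation is drawn at
        # random from the current allele pool (binomial sampling).
        copies = sum(1 for _ in range(2 * pop_size) if rng.random() < freq)
        freq = copies / (2 * pop_size)
        trajectory.append(freq)
    return trajectory

for n in (10, 100, 10000):
    final = simulate_drift(n)[-1]
    print(f"N = {n:>5}: final allele frequency = {final:.3f}")

Run it a few times with different seeds: with N = 10 the allele is usually fixed or lost within a couple of hundred generations, whereas with N = 10,000 the frequency barely moves from 0.5. That's the loss of genetic diversity in small populations in a nutshell.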

Wednesday, September 30, 2020

New Scientist doesn't understand modern evolutionary theory

New Scientist has devoted much of their September 26th issue to evolution, but not in a good way. Their emphasis is on 13 ways that we must rethink evolution. Readers of this blog are familiar with this theme because New Scientist is talking about the Extended Evolutionary Synthesis (EES)—a series of critiques of the Modern Synthesis in an attempt to overthrow or extend it [The Extended Evolutionary Synthesis - papers from the Royal Society meeting].

My main criticism of EES is that its proponents demonstrate a remarkable lack of understanding of modern evolutionary theory and they direct most of their attacks against the old adaptationist version of the Modern Synthesis that was popular in the 1950s. For the most part, EES proponents missed the revolution in evolutionary theory that occurred in the late 1960s with the development of Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. EES proponents have shown time and time again that they have not bothered to read a modern textbook on population genetics.

Tuesday, September 22, 2020

The Function Wars Part VIII: Selected effect function and de novo genes

Discussions about the meaning of the word "function" have been going on for many decades, especially among philosophers who love that sort of thing. The debate intensified following the ENCODE publicity hype disaster in 2012 where ENCODE researchers used the word function in an entirely inappropriate manner in order to prove that there was no junk in our genome. Since then, a cottage industry based on discussing the meaning of function has grown up in the scientific literature and dozens of papers have been published. This may have enhanced a lot of CVs but none of these papers has proposed a rigorous definition of function that we can rely on to distinguish functional DNA from junk DNA.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)

That doesn't mean that all of the papers have been completely useless. The net result has been to focus attention on the one reliable definition of function that most biologists can accept: the selected effect function. The selected effect function is defined as ...

Friday, August 07, 2020

Alan McHughen defends his views on junk DNA

Alan McHughen is the author of a recently published book titled DNA Demystified. I took issue with his stance on junk DNA [More misconceptions about junk DNA - what are we doing wrong?] and he has kindly replied to my email message. Here's what he said ...

Thursday, August 06, 2020

More misconceptions about junk DNA - what are we doing wrong?

I'm actively following the views of most science writers on junk DNA to see if they are keeping up on the latest results. The latest book is DNA Demystified by Alan McHughen, a molecular geneticist at the University of California, Riverside. It's published by Oxford University Press, the same publisher that published John Parrington's book The Deeper Genome. Parrington's book was full of misleading and incorrect statements about the human genome so I was anxious to see if Oxford had upped its game.1, 2

You would think that any book with a title like DNA Demystified would contain the latest interpretations of DNA and genomes, especially with a subtitle like "Unraveling the Double Helix." Unfortunately, the book falls far short of its objectives. I don't have time to discuss all of its shortcomings so let's just skip right to the few paragraphs that discuss junk DNA (p. 46). I want to emphasize that this is not the main focus of the book. I'm selecting it because it's what I'm interested in and because I want to get a feel for how correct and accurate scientific information is, or is not, being accepted by practicing scientists. Are we falling for fake news?

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think with such an introduction that you would be about to learn how much of the genome is functional according to ENCODE 3 but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain: the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things from 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism, and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work but the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try and find out if ENCODE stands by its previous claim that most of the genome is functional but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.
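For what it's worth, the "less than 1%" figure is easy to sanity-check with round numbers. Here's the back-of-envelope arithmetic in Python (the figures are my own widely cited approximations, not numbers from the ENCODE papers):

# ~20,000 protein-coding genes with an average coding sequence of
# ~1,300 bp comes to ~26 Mb of coding DNA in a ~3,100 Mb genome.
coding_genes = 20_000
avg_cds_bp = 1_300          # assumed average coding-sequence length
genome_bp = 3_100_000_000   # approximate haploid human genome size

coding_fraction = coding_genes * avg_cds_bp / genome_bp
print(f"Coding fraction ≈ {coding_fraction:.1%}")  # prints ≈ 0.8%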