Wednesday, April 07, 2021

Bold predictions for human genomics by 2030

After spending several years working on a book about the human genome I've come to the realization that the field of genomics is not delivering on its promise to help us understand what's in your genome. In fact, genomics researchers have by and large impeded progress by coming up with false claims that need to be debunked.

My view is not widely shared by today's researchers who honestly believe they have made tremendous progress and will make even more as long as they get several billion dollars to continue funding their research. This view is nicely summarized in a Scientific American article from last fall that's really just a precis of an article that first appeared in Nature. The Nature article was written by employees of the National Human Genome Research Institute (NHGRI) at the National Institutes of Health in Bethesda, MD, USA (Green et al., 2020). Its purpose is to promote the work that NHGRI has done in the past and to summarize its strategic vision for the future. At the risk of oversimplifying, the strategic vision is "more of the same."

Green, E.D., Gunter, C., Biesecker, L.G., Di Francesco, V., Easter, C.L., Feingold, E.A., Felsenfeld, A.L., Kaufman, D.J., Ostrander, E.A. and Pavan, W.J. and 20 others (2020) Strategic vision for improving human health at The Forefront of Genomics. Nature 586:683-692. [doi: 10.1038/s41586-020-2817-4]

Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting-edge of human genomics going forward—that is, at ‘The Forefront of Genomics’.

What's interesting are the predictions that the NHGRI makes for 2030—predictions that were highlighted in the Scientific American article. I'm going to post those predictions without comment other than saying that I think they are mostly bovine manure. I'm interested in hearing your comments.

Bold predictions for human genomics by 2030

Some of the most impressive genomics achievements, when viewed in retrospect, could hardly have been imagined ten years earlier. Here are ten bold predictions for human genomics that might come true by 2030. Although most are unlikely to be fully attained, achieving one or more of these would require individuals to strive for something that currently seems out of reach. These predictions were crafted to be both inspirational and aspirational in nature, provoking discussions about what might be possible at The Forefront of Genomics in the coming decade.

  1. Generating and analysing a complete human genome sequence will be routine for any research laboratory, becoming as straightforward as carrying out a DNA purification.
  2. The biological function(s) of every human gene will be known; for non-coding elements in the human genome, such knowledge will be the rule rather than the exception.
  3. The general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the effect of genotype on phenotype.
  4. Research in human genomics will have moved beyond population descriptors based on historic social constructs such as race.
  5. Studies that involve analyses of genome sequences and associated phenotypic information for millions of human participants will be regularly featured at school science fairs.
  6. The regular use of genomic information will have transitioned from boutique to mainstream in all clinical settings, making genomic testing as routine as complete blood counts.
  7. The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.
  8. An individual’s complete genome sequence along with informative annotations will, if desired, be securely and readily accessible on their smartphone.
  9. Individuals from ancestrally diverse backgrounds will benefit equitably from advances in human genomics.
  10. Breakthrough discoveries will lead to curative therapies involving genomic modifications for dozens of genetic diseases.

I predict that nine years from now (2030) we will still be dealing with scientists who think that most of our genome is functional; that most human protein-coding genes produce many different proteins by alternative splicing; that epigenetics is useful; that there are more noncoding genes than protein-coding genes; that the leading scientists in the 1960s and 70s were incredibly stupid to suggest junk DNA; that almost every transcription factor binding site is biologically relevant; that most transposon-related sequences have a mysterious (still unknown) function; that it's still a mystery why humans are so much more complex than chimps; and that genomics will eventually solve all problems by 2040.

Why in the world, you might ask, would we still be dealing with issues like that? Because of genomics.


Saturday, April 03, 2021

"Dark matter" as an argument against junk DNA

Opponents of junk DNA have been largely unsuccessful in demonstrating that most of our genome is functional. Many of them are vaguely aware of the fact that "no function" (i.e. junk) is the default hypothesis and the onus is on them to come up with evidence of function. In order to shift, or obfuscate, this burden of proof they have increasingly begun to talk about the "dark matter" of the genome. The idea is to pretend that most of the genome is a complete mystery so that you can't say for certain whether it is junk or functional.

One of the more recent attempts appears in the "Journal Club" section of Nature Reviews Genetics. It focuses on repetitive DNA.

Before looking at that article, let's begin by summarizing what we already know about repetitive DNA. It includes highly repetitive DNA consisting of multiple tandem repeats of short sequences such as ATATATATAT... or CGACGACGACGA ... or even longer repeats. Much of this is located in centromeric regions of the chromosome and I estimate that functional highly repetitive regions make up about 1% of the genome. [see Centromere DNA and Telomeres]

The other part of repetitive DNA is middle repetitive DNA, which is largely composed of transposons and endogenous viruses, although it includes ribosomal RNA genes and origins of replication. Most of these sequences are dispersed as single copies throughout the genome. It's difficult to determine exactly how much of the genome consists of these middle repetitive sequences but it's certainly more than 50%.

Almost all of the transposon- and virus-related sequences are defective copies of once active transposons and viruses. Most of them are just fragments of the originals. They are evolving at the neutral rate so they look like junk and they behave like junk.1 That's not selfish DNA because it doesn't transpose and it's not "dark matter." These fragments have all the characteristics of nonfunctional junk in our genome.

We know that the C-value paradox is mostly explained by differing amounts of repetitive DNA in different genomes and this is consistent with the idea that they are junk. We know that less than 10% of our genome is conserved and this fits in with that conclusion. Finally, we know that genetic load arguments indicate that most of our genome must be impervious to mutation. Combined, these are all powerful bits of evidence and logic in favor of repetitive sequences being mostly junk DNA.

Now let's look at what Neil Gemmell says in this article.

Gemmell, N.J. (2021) Repetitive DNA: genomic dark matter matters. Nature Reviews Genetics:1-1. [doi: 10.1038/s41576-021-00354-8]

"Repetitive DNA sequences were found in hundreds of thousands, and sometimes millions, of copies in the genomes of most eukaryotes. while widespread and evolutionarily conserved, the function of these repeats was unknown. Provocatively, Britten and Kohne concluded 'a concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert.'”"

That's from Britten and Kohne (1968) and it's true that more than 50 years ago those workers didn't like the idea of junk DNA. Britten argued that most of this repetitive DNA was likely to be involved in regulation. Gemmell goes on to describe centromeres and telomeres and mentions that most repetitive DNA was thought to be junk.

"... the idea that much of the genome is junk, maintained and perpetuated by random chance, seemed as broadly unsatisfactory to me as it had to the original authors. Enthralled by the mystery of why half our genome is repetitive DNA, I have followed this field ever since."

Gemmell is not alone. In spite of all the evidence for junk DNA, the majority of scientists don't like the fact that most of our genome is junk. Here's how he justifies his continued skepticism.

"But it was not until the 2000s, as full eukaryotic genome sequences emerged, that we discovered that the repetitive non-coding regions of our genome harbour large numbers of promoters, enhancers, transcription factor binding sites and regulatory RNAs that control gene expression. More recently, the importance of repetitive DNA in both structural and regulatory processes has emerged, but much remains to be discovered and understood. It is time to shine further light on this genomic dark matter."

This appears to be the ENCODE publicity campaign legacy rearing its ugly head once more. Most Sandwalk readers know that the presence of transcription factor binding sites, RNA polymerase binding sites, and junk RNA is exactly what one would predict from a genome full of defective transposons. Most of us know that a big fat sloppy genome is bound to contain millions of spurious binding sites for transcription factors so this says nothing about function.

Apparently Gemmell's skepticism doesn't apply to the ENCODE results so he still thinks that all those bits and pieces of transposons are mysterious bits of dark matter that could be several billion base pairs of functional DNA. I don't know what he imagines they could be doing.


Photo Credit: The photo shows human chromosomes labelled with a telomere probe (yellow), from Christopher Counter at Duke University.

1. In my book, I cover this in a section called "If it walks like a duck ..." It's a form of abductive reasoning.

Britten, R. and Kohne, D. (1968) Repeated Sequences in DNA. Science 161:529-540. [doi: 10.1126/science.161.3841.529]

Friday, April 02, 2021

Off to the publisher!

The first draft of my book is ready to be sent to my publisher.

Text by Laurence A. Moran

Cover art and figures by Gordon L. Moran

  • 11 chapters
  • 112,000 words (+ preface and glossary)
  • about 370 pages (estimated)
  • 26 figures
  • 305 notes
  • 400 references

©Laurence A. Moran


Wednesday, March 17, 2021

I think I'll skip this meeting

I just received an invitation to a meeting ...

On behalf of the international organizing committee, we would like to invite you to a conference to be held in Neve Ilan, near Jerusalem, from 4-8 October 2021, entitled ‘Potential and Limitations of Evolutionary Processes’. The main goal of this interdisciplinary, international conference is to bring together scientists and scholars who hold a range of viewpoints on the potential and possible limitations of various undirected chemical and biological processes.

The conference will include presentations from a broad spectrum of disciplines, including chemistry, biochemistry, biology, origin of life, evolution, mathematics, cosmology and philosophy. Open-floor discussion will be geared towards delineating mechanistic details, with a view to discussing in such a way that speakers and participants feel comfortable expressing different opinions and different interpretations of the data, in the spirit of genuine academic inquiry.

I'm pretty sure I got this invite because I attended the Royal Society Meeting on New trends in evolutionary biology: biological, philosophical and social science perspectives back in 2016. That meeting was a big disappointment because the proponents of extending the modern synthesis didn't have much of a case [Kevin Laland's new view of evolution].

I was curious to see what kind of followup the organizers of this new meeting were planning so I checked out the website at: Potential and Limitations of Evolutionary Processes. Warning bells went off immediately when I saw the list of topics.

  • Fine-Tuning of the Universe
  • The Origin of Life
  • Origin & Fine-Tuning of the Genetic Code
  • Origin of Novel Genes
  • Origin of Functional Islands in Protein Sequence Space
  • Origin of Multi-Component Molecular Machines
  • Fine-Tuning of Molecular Systems
  • Fine-Tuning in Complex Biological Systems
  • Evolutionary Waiting Times
  • History of Life & Comparative Genomics

This is a creationist meeting. A little checking shows that three of the four organizers, Russ Carlson, Anthony Futerman, and Siegfried Scherer, are creationists. (I don't know about the other organizer, Joel Sussman, but in this case guilt by association seems appropriate.)

I don't think I'll book a flight to Israel.


Happy St. Patrick's Day!

Happy St. Patrick's Day! These are my great-grandparents Thomas Keys Foster, born in County Tyrone on September 5, 1852 and Eliza Ann Job, born in Fintona, County Tyrone on August 18, 1852. Thomas came to Canada in 1876 to join his older brother, George, on his farm near London, Ontario, Canada. Eliza came the following year and worked on the same farm. Thomas and Eliza decided to move out west where they got married in 1882 in Winnipeg, Manitoba, Canada.

The couple obtained a land grant near Salcoats, Saskatchewan, a few miles south of Yorkton, where they built a sod house and later on a wood frame house that they named "Fairview" after a hill in Ireland overlooking the house where Eliza was born. That's where my grandmother, Ella, was born.

Other ancestors in this line came from the adjacent counties of Donegal (surname Foster) and Fermanagh (surnames Keys, Emerson, Moore) and possibly Londonderry (surname Job).

One of the cool things about studying your genealogy is that you can find connections to almost everyone. This means you can celebrate dozens of special days. In my case it was easy to find other ancestors from England, Scotland, Netherlands, Germany, France, Spain, Poland, Lithuania, Belgium, Ukraine, Russia, and the United States. Today, we will be celebrating St. Patrick's Day. It's rather hectic keeping up with all the national holidays but somebody has to keep the traditions alive!

It's nice to have an excuse to celebrate, especially when it means you can drink beer. However, I would be remiss if I didn't mention one little (tiny, actually) problem. Since my maternal grandmother is pure Irish, I should be 25% Irish but my DNA results indicate that I'm only 4% Irish. That's probably because my Irish ancestors were Anglicans and were undoubtedly the descendants of settlers from England, Wales, and Scotland who moved to Ireland in the 1600s. This explains why they don't have very Irish-sounding names.

I don't mention this when I'm in an Irish pub.


Monday, March 15, 2021

Is science the only way of knowing?

Most of us learned that science provides good answers to all sorts of questions ranging from whether a certain drug is useful in treating COVID-19 to whether humans evolved from primitive apes. A more interesting question is whether there are any limitations to science or whether there are any other effective ways of knowing. The question is related to the charge of "scientism," which is often used as a pejorative term to describe those of us who think that science is the only way of knowing.

I've discussed these issues many times on this blog so I won't rehash all the arguments. Suffice to say that there are two definitions of science; the broad definition and the narrow one. The narrow definition says that science is merely the activity carried out by geologists, chemists, physicists, and biologists. Using this definition it would be silly to say that science is the only way of knowing. The broad definition can be roughly described as: science is a way of knowing that relies on evidence, logic (rationality), and healthy skepticism.

The broad definition is the one preferred by many philosophers and it goes something like this ...

Unfortunately neither "science" nor any other established term in the English language covers all the disciplines that are parts of this community of knowledge disciplines. For lack of a better term, I will call them "science(s) in the broad sense." (The German word "Wissenschaft," the closest translation of "science" into that language, has this wider meaning; that is, it includes all the academic specialties, including the humanities. So does the Latin "scientia.") Science in a broad sense seeks knowledge about nature (natural science), about ourselves (psychology and medicine), about our societies (social science and history), about our physical constructions (technological science), and about our thought construction (linguistics, literary studies, mathematics, and philosophy). (Philosophy, of course, is a science in this broad sense of the word.)

Sven Ove Hansson "Defining Pseudoscience and Science" in Philosophy of Pseudoscience: Reconsidering the Demarcation Problem.

Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent1 (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisenseRNAs, miRNAs, and circularRNAs. The paper goes off the rails in the first paragraph of the Results section where they immediately equate transcripts with genes. They report the following:

  • 19,107 mRNA genes (188 novel)
  • 18,387 lncRNA genes (13,175 novel)
  • 7,309 asRNA genes (2,519 novel)
  • 5,427 miRNAs
  • 5,427 circRNAs

Is science a social construct?

Richard Dawkins has written an essay for The Spectator in which he says,

"[Science is not] a social construct. It’s simply true. Or at least truth is real and science is the best way we have of finding it. ‘Alternative ways of knowing’ may be consoling, they may be sincere, they may be quaint, they may have a poetic or mythic beauty, but the one thing they are not is true. As well as being real, moreover, science has a crystalline, poetic beauty of its own.

The essay is not particularly provocative but it did provoke Jerry Coyne who pointed out that "the profession of science" can be construed as a social construct. In this sense Jerry is agreeing with his former supervisor, Richard Lewontin1 who wrote,

"Science is a social institution about which there is a great deal of misunderstanding, even among those who are part of it. We think that science is an institution, a set of methods, a set of people, a great body of knowledge that we call scientific, is somehow apart from the forces that rule our everyday lives and tha goven the structure of our society... The problems that science deals with, the ideas that it uses in investigating those problems, even the so-called scientific results that come out of scientific investigation, are all deeply influenced by predispositions that derive from the society in which we live. Scientists do not begin life as scientists after all, but as social beings immersed in a family, a state, a productive structure, and they view nature through a lens that has been molded by their social structure."

Coincidentally, I just happened to be reading Science Fictions, an excellent book by Stuart Ritchie who also believes that science is a social construct but he has a slightly different take on the matter.

"Science has cured diseases, mapped the brain, forcasted the climate, and split the atom; it's the best method we have of figuring out how the universe works and of bending it to our will. It is, in other words, our best way of moving towards the truth. Of course, we might never get there—a glance at history shows us hubristic it is to claim any facts as absolute or unchanging. For ratcheting our way towards better knowledge about the world, though, the methods of science is as good as it gets.

"But we can't make progress with those methods alone. It's not enough to make a solitary observation in your lab; you must also convince other scientists that you've discovered something real. This is where the social part comes in. Philosophers have long discussed how important it is for scientists to show their fellow researchers how they came to their conclusions."

Dawkins, Coyne, Lewontin, and Ritchie are all right in different ways. Dawkins is talking about science as a way of knowing, although he restricts his definition of science to the natural sciences. The others are referring to the practice of science, or as Jerry Coyne puts it, the profession. It's true that the methods of science are the best way we have to get at the truth and it's true that the way of knowing is not a social construct in any meaningful sense.

Jerry Coyne is right to point out that the methods are employed by human scientists (he's also restricting the practice of science to scientists) and humans are fallible. In that sense, the enterprise of (natural) science is a social construct. Lewontin warns us that scientists have biases and prejudices and that may affect how they do science.

Ritchie makes a different point by emphasizing that (natural) science is a collective endeavor and that "truth" often requires a consensus. That's the sense in which science is social. This is supposed to make science more robust, according to Ritchie, because real knowledge only emerges after careful and skeptical scrutiny by other scientists. His book is mostly about how that process isn't working and why science is in big trouble. He's right about that.

I think it's important to distinguish between science as a way of knowing and the behavior and practice of scientists. The second one is affected by society and its flaws are well-known but the value of science as a way of knowing can't be so easily dismissed.


1. The book is actually a series of lectures (The Massey Lectures) that Lewontin gave in Toronto (Ontario, Canada) in 1990. I attended those lectures.

Tuesday, February 16, 2021

The 20th anniversary of the human genome sequence:
6. Nature doubles down on ENCODE results

Nature has now published a series of articles celebrating the 20th anniversary of the publication of the draft sequences of the human genome [Genome revolution]. Two of the articles are about free access to information and, unlike a similar article in Science, the Nature editors aren't shy about mentioning an important event from 2001; namely, the fact that Science wasn't committed to open access.

By publishing the Human Genome Project’s first paper, we worked with a publicly funded initiative that was committed to data sharing. But the journal acknowledged there would be challenges to maintaining the free, open flow of information, and that the research community might need to make compromises to these principles, for example when the data came from private companies. Indeed, in 2001, colleagues at Science negotiated publishing the draft genome generated by Celera Corporation in Rockville, Maryland. The research paper was immediately free to access, but there were some restrictions on access to the full data.

Friday, February 12, 2021

The 20th anniversary of the human genome sequence:
5. 90% of our genome is junk

This is the fifth (and last) post in celebration of the 20th anniversary of publishing the draft sequence. The first four posts dealt with: (1) the way Science chose to commemorate the occasion [Access to the data]; (2) finishing the sequence; (3) the number of genes; and (4) the amount of functional DNA in the genome.

Back in 2001, knowledgeable scientists knew that most of the human genome is junk and the sequence confirmed that knowledge. Subsequent work on the human genome over the past 20 years has provided additional evidence of junk DNA so that we can now be confident that something like 90% of our genome is junk DNA. Here's a list of data and arguments that support that claim.

Wednesday, February 10, 2021

The 20th anniversary of the human genome sequence:
4. Functional DNA in our genome

We know a lot more about the human genome than we did when the draft sequences were published 20 years ago. One of the most important discoveries is the recognition and extent of true functional sequences in the genome. Genes are one example of such functional sequence but only a minor component (about 1.4%). Most of the functional regions of the genome are not genes.

Here's a list of functional DNA in our genome other than the functional part of genes.

  • Centromeres: There are 24 different centromeres and the average size is four million base pairs. Most of this is repetitive DNA and it adds up to about 3% of the genome. The total amount of centromeric DNA ranges from 2%-10% in different individuals. It's unlikely that all of the centromeric DNA is essential; about 1% seems to be a good estimate.
  • Telomeres: Telomeres are repetitive DNA sequences at the ends of chromosomes. They are required for the proper replication of DNA and they take up about 0.1% of the genome sequence.
  • Origins of replication: DNA replication begins at origins of replication. The size of each origin has not been established with certainty but it's safe to assume that 100 bp is a good estimate. There are about 100,000 origin sequences but it's unlikely that all of them are functional or necessary. It's reasonable to assume that only 30,000 - 50,000 are real origins and that means 0.3% of the genome is devoted to origins of replication.
  • Regulatory sequences: The transcription of every gene is controlled by sequences that lie outside of the genes, usually at the 5′ end. The total amount of regulatory sequence is controversial but it seems reasonable to assume about 200 bp per gene for a total of five million bp or less than 0.2% of the genome (0.16%). The most extreme claim is about 2,400 bp per gene or 1.8% of the genome.
  • Scaffold attachment regions (SARs): Human chromatin is organized into about 100,000 large loops. The base of each loop consists of particular proteins bound to specific sequences called anchor loop sequences. The nomenclature is confusing; the original term (SAR) isn't as popular today as it was 35 years ago but that doesn't change the fact that about 0.3% of the genome is required to organize chromatin.
  • Transposons: Most of the transposon-related sequences in our genome are just fragments of defective transposons but there are a few active ones. They account for only a tiny fraction of the genome.
  • Viruses: Functional virus DNA sequences account for less than 0.1% of the genome.

If you add up all the functional DNA from this list, you get to somewhere between 2% and 3% of the genome.
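For readers who want to check the arithmetic, here's a minimal back-of-the-envelope sketch in Python that sums the point estimates quoted in the list above. The numbers are rough; the transposon value is a placeholder for "a tiny fraction."

```python
# Back-of-the-envelope sum of the non-gene functional DNA estimates listed above.
# The percentages are the rough point estimates quoted in the post;
# the transposon value is a placeholder for "a tiny fraction".
functional_dna_percent = {
    "centromeres (essential fraction)": 1.0,
    "telomeres": 0.1,
    "origins of replication": 0.3,
    "regulatory sequences (~200 bp/gene)": 0.16,
    "scaffold attachment regions (SARs)": 0.3,
    "active transposons (placeholder)": 0.05,
    "functional viral sequences": 0.1,
}

total = sum(functional_dna_percent.values())
for name, pct in functional_dna_percent.items():
    print(f"{name:40s} {pct:5.2f}%")
print(f"{'total (non-gene functional DNA)':40s} {total:5.2f}%")   # about 2%
```

The point estimates add up to roughly 2%, the lower end of the range given above; more generous assumptions for individual categories (e.g. the extreme 1.8% regulatory claim) push the total higher.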


Image credit: Wikipedia.

Monday, February 08, 2021

The 20th anniversary of the human genome sequence: 3. How many genes?

This week marks the 20th anniversary of the publication of the first drafts of the human genome sequence. Science chose to celebrate the achievement with a series of articles that had little to say about the scientific discoveries arising out of the sequencing project; one of the articles praised the openness of sequence data without mentioning that the journal had violated its own policy on openness by publishing the Celera sequence [The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science].

I've decided to post a few articles about the human genome beginning with one on finishing the sequence. In this post I'll summarize the latest data on the number of genes in the human genome.

Saturday, February 06, 2021

The 20th anniversary of the human genome sequence:
2. Finishing the sequence

It's been 20 years since the first drafts of the human genome sequence were published. These first drafts from the International Human Genome Project (IHGP) and Celera were far from complete. The IHGP sequence covered about 82% of the genome and it contained about 250,000 gaps and millions of sequencing errors.

Celera never published an updated sequence but IHGP published a "finished" sequence in October 2004. It covered about 92% of the genome and had "only" 300 gaps. The error rate of the finished sequence was down to about one error in 100,000 bases (10⁻⁵).

International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945. doi: 10.1038/nature03001

We've known for many decades that the correct size of the human genome is close to 3,200,000 kb or 3.2 Gb. There isn't a more precise number because different individuals have different amounts of DNA. The best average estimate was 3.286 Gb based on the sequence of 22 autosomes, one X chromosome, and one Y chromosome (Morton 1991). The amount of actual nucleotide sequence in the latest version of the reference genome (GRCh38.p13) is 3,110,748,599 bp and the estimated total size is 3,272,116,950 bp based on estimating the size of the remaining gaps. This means that 95% of the genome has been sequenced. [see How much of the human genome has been sequenced? for a discussion of what's missing.]
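As a quick sanity check on the 95% figure, here's a minimal Python sketch using the two GRCh38.p13 numbers quoted in the paragraph above.

```python
# Coverage of the GRCh38.p13 reference genome, using the figures quoted above.
sequenced_bp = 3_110_748_599        # actual nucleotide sequence in the reference
estimated_total_bp = 3_272_116_950  # estimated total size, gaps included

fraction = sequenced_bp / estimated_total_bp
missing_mb = (estimated_total_bp - sequenced_bp) / 1e6

print(f"fraction sequenced: {fraction:.1%}")   # ~95.1%
print(f"still missing: ~{missing_mb:.0f} Mb")  # ~161 Mb, mostly repetitive DNA in the gaps
```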

Recent advances in sequencing technology have produced sequence data covering the repetitive regions in the gaps and the first complete sequence of a human chromosome (X) was published in 2019 [First complete sequence of a human chromosome]. It's now possible to complete the human genome reference sequence by sequencing at least one individual but I'm not sure that the effort and the expense are worth it.


Image credit: The figure is from Miga et al. (2019).

Miga, K.H., Koren, S., Rhie, A., Vollger, M.R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G.A. et al. (2019) Telomere-to-telomere assembly of a complete human X chromosome. Nature 585:79-84. [doi: 10.1038/s41586-020-2547-7]

Morton, N.E. (1991) Parameters of the human genome. Proceedings of the National Academy of Sciences 88:7474-7476. [doi: 10.1073/pnas.88.17.7474]

The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science

The first drafts of the human genome sequence were published 20 years ago. The paper from the International Human Genome Project (IHGP) was published in Nature on February 15, 2001 and the paper from Celera was published in Science on February 16, 2001.

The original agreement was to publish both papers in Science but IHGP refused to publish their sequence in that journal when it chose to violate its own policy by allowing Celera to restrict access to its data. I highly recommend James Shreeve's book The Genome War for the history behind these publications. It paints an accurate, but not pretty, picture of science and politics.

Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A. and Sougnez, C. (2001) Initial sequencing and analysis of the human genome. Nature 409:860-921. doi: 10.1038/35057062

Venter, J., Adams, M., Myers, E., Li, P., Mural, R., Sutton, G., Smith, H., Yandell, M., Evans, C., Holt, R., Gocayne, J., Amanatides, P., Ballew, R., Huson, D., Wortman, J., Zhang, Q., Kodira, C., Zheng, X., Chen, L., Skupski, M., Subramanian, G., Thomas, P., Zhang, J., Gabor Miklos, G., Nelson, C., Broder, S., Clark, A., Nadeau, J., McKusick, V. and Zinder, N. (2001) The sequence of the human genome. Science 291:1304 - 1351. doi: 10.1126/science.1058040

Thursday, December 31, 2020

On the importance of controls

When doing an experiment, it's important to keep the number of variables to a minimum and it's important to have scientific controls. There are two types of controls. A negative control covers the possibility that you will get a signal by chance; for example, if you are testing an enzyme to see whether it degrades sugar then the negative control will be a tube with no enzyme. Some of the sugar may degrade spontaneously and you need to know this. A positive control is when you deliberately add something that you know will give a positive result; for example, if you are doing a test to see if your sample contains protein then you want to add an extra sample that contains a known amount of protein to make sure all your reagents are working.

Lots of controls are more complicated than the examples I gave but the principle is important. It's true that some experiments don't appear to need the appropriate controls but that may be an illusion. The controls might still be necessary in order to properly interpret the results but they're not done because they are very difficult. This is often true of genomics experiments.

Saturday, December 19, 2020

What do believers in epigenetics think about junk DNA?

I've been writing some stuff about epigenetics so I've been reading papers on how to define the term [What the heck is epigenetics?]. Turns out there's no universal definition but I discovered that scientists who write about epigenetics are passionate believers in epigenetics no matter how you define it. Surprisingly (not!), there seems to be a correlation between belief in epigenetics and other misconceptions such as the classic misunderstanding of the Central Dogma of Molecular Biology and rejection of junk DNA [The Extraordinary Human Epigenome].

Here's an illustration of this correlation from the introduction to a special issue on epigenetics in Philosophical Transactions B.

Ganesan, A. (2018) Epigenetics: the first 25 centuries, Philosophical Transactions B. 373: 20170067. [doi: 10.1098/rstb.2017.0067]

Epigenetics is a natural progression of genetics as it aims to understand how genes and other heritable elements are regulated in eukaryotic organisms. The history of epigenetics is briefly reviewed, together with the key issues in the field today. This themed issue brings together a diverse collection of interdisciplinary reviews and research articles that showcase the tremendous recent advances in epigenetic chemical biology and translational research into epigenetic drug discovery.

In addition to the misconceptions, the text (see below) emphasizes the heritable nature of epigenetic phenomena. This idea of heritability seems to be a dominant theme among epigenetic believers.

A central dogma became popular in biology that equates life with the sequence DNA → RNA → protein. While the central dogma is fundamentally correct, it is a reductionist statement and clearly there are additional layers of subtlety in ‘how’ it is accomplished. Not surprisingly, the answers have turned out to be far more complex than originally imagined, and we are discovering that the phenotypic diversity of life on Earth is mirrored by an equal diversity of hereditary processes at the molecular level. This lies at the heart of modern day epigenetics, which is classically defined as the study of heritable changes in phenotype that occur without an underlying change in genome sequence. The central dogma's focus on genes obscures the fact that much of the genome does not code for genes and indeed such regions were derogatively lumped together as ‘junk DNA’. In fact, these non-coding regions increase in proportion as we climb up the evolutionary tree and clearly play a critical role in defining what makes us human compared with other species.

At the risk of beating a dead horse, I'd like to point out that the author is wrong about the Central Dogma and wrong about junk DNA. He's right about the heritability of some epigenetic phenomena such as methylation of DNA but that fact has been known for almost five decades and so far it hasn't caused a noticeable paradigm shift, unless I missed it [Restriction, Modification, and Epigenetics].


Saturday, December 05, 2020

Mouse traps Michael Denton

Michael Denton is a New Zealand biochemist, a Senior Fellow at the Discovery Institute, and the author of two Intelligent Design Creationist books: Evolution: A Theory in Crisis (1985) and Nature's Destiny (1998).

He has just read Michael Behe's latest book and he (Denton) is impressed [Praise for Behe’s Latest: “Facts Before Theory”]:

Behe brings out more forcibly than any other author I have recently read just how vacuous and biased are the criticisms of his work and of the ID position in general by so many mainstream academic defenders of Darwinism. And what is so telling about his many wonderfully crafted responses to his Darwinian critics is that it is Behe who is putting the facts before theory while his many detractors — Kenneth Miller, Jerry Coyne, Larry Moran, Richard Lenski, and others — are putting theory before the facts. In short, this volume shows that it is Behe rather than his detractors who is carefully following the evidence.

I don't know what planet Michael Denton is living on—probably the same one as Michael Behe—but let's make one thing clear about facts and evidence. Behe's entire argument is based on the "fact" that he can't see how Darwin's theory of natural selection can account for the evolution of complex features: therefore god(s) must have done it. This is NOT putting facts before theory and it is NOT carefully following the evidence.

It's just a somewhat sophisticated version of god of the gaps based on Behe's lack of understanding of the basic mechanisms of evolution.

(See, Of mice and Michael, where I explain why Michael Behe fails to answer my critique of The Edge of Evolution.)


Tuesday, December 01, 2020

Of mice and Michael

Michael Behe has published a book containing most of his previously published responses to critics. I was anxious to see how he dealt with my criticisms of The Edge of Evolution but I was disappointed to see that, for the most part, he has just copied excerpts from his 2014 blog posts (pp. 335-355).

I think it might be worthwhile to review the main issues so you can see for yourself whether Michael Behe really answered his critics as the title of his most recent book claims. You can check out the dueling blog posts at the end of this summary to see how the discussion evolved in real time more than four years ago.

Many Sandwalk readers participated in the debate back then and some of them are quoted in Behe's book although he usually just identifies them as commentators.

My Summary

Michael Behe has correctly identified an extremely improbable evolutionary event; namely, the development of chloroquine resistance in the malaria parasite. This is an event that is close to the edge of evolution, meaning that more complex events of this type are beyond the edge of evolution and cannot occur naturally. However, several of us have pointed out that his explanation of how that event occurred is incorrect. This is important because he relies on his flawed interpretation of chloroquine resistance to postulate that many observed events in evolution could not possibly have occurred by natural means. Therefore, god(s) must have created them.

In his response to this criticism, he completely misses the point and fails to understand that what is being challenged is his misinterpretation of the mechanisms of evolution and his understanding of mutations.


The main point of The Edge of Evolution is that many of the beneficial features we see could only have evolved by selecting for a number of different mutations where none of the individual mutations confer a benefit by themselves. Behe claims that these mutations had to occur simultaneously or at least close together in time. He argues that this is possible in some cases but in most cases the (relatively) simultaneous occurrence of multiple mutations is beyond the edge of evolution. The only explanation for the creation of these beneficial features is god(s).

Tuesday, November 17, 2020

Using modified nucleotides to make mRNA vaccines

The key features of the mRNA vaccines are the use of modified nucleotides in their synthesis and the use of lipid nanoparticles to deliver them to cells. The main difference between the Pfizer/BioNTech vaccine and the Moderna vaccine is in the delivery system. The lipid vesicles used by Moderna are somewhat more stable and the vaccine doesn't need to be kept constantly at ultra-low temperatures.

Both vaccines use modified RNAs. They synthesize the RNA using modified nucleotides; namely, the uridine variants pseudouridine and N1-methylpseudouridine, and the cytidine variant 5-methylcytidine. (The structures of the nucleosides are from Andries et al., 2015.) The best versions are those that use both 5-methylcytidine and N1-methylpseudouridine.

I'm not an expert on these mRNAs and their delivery systems but the way I understand it is that regular RNA is antigenic—it induces antibodies against it, presumably when it is accidentally released from the lipid vesicles outside of the cell. The modified versions are much less antigenic. As an added bonus, the modified RNA is more stable and more efficiently translated.

Two of the key papers are ...

Andries, O., Mc Cafferty, S., De Smedt, S.C., Weiss, R., Sanders, N.N. and Kitada, T. (2015) "N1-methylpseudouridine-incorporated mRNA outperforms pseudouridine-incorporated mRNA by providing enhanced protein expression and reduced immunogenicity in mammalian cell lines and mice." Journal of Controlled Release 217: 337-344. [doi: 10.1016/j.jconrel.2015.08.051]

Pardi, N., Tuyishime, S., Muramatsu, H., Kariko, K., Mui, B.L., Tam, Y.K., Madden, T.D., Hope, M.J. and Weissman, D. (2015) "Expression kinetics of nucleoside-modified mRNA delivered in lipid nanoparticles to mice by various routes." Journal of Controlled Release 217: 345-351. [doi: 10.1016/j.jconrel.2015.08.007]


Sunday, November 15, 2020

Why is the Central Dogma so hard to understand?

The Central Dogma of molecular biology states ...

... once (sequential) information has passed into protein it cannot get out again (F.H.C. Crick, 1958).

The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid (F.H.C. Crick, 1970).

This is not difficult to understand since Francis Crick made it very clear in his original 1958 paper and again in his 1970 paper in Nature [see Basic Concepts: The Central Dogma of Molecular Biology]. There's nothing particularly complicated about the Central Dogma. It merely states the obvious fact that sequence information can flow from nucleic acid to protein but not the other way around.

So, why do so many scientists have trouble grasping this simple idea? Why do they continue to misinterpret the Central Dogma while quoting Crick? It seems obvious that they haven't read the paper(s) they are referencing.

I just came across another example of such ignorance and it is so outrageous that I just can't help sharing it with you. Here's a few sentences from a recent review in the 2020 issue of Annual Reviews of Genomics and Human Genetics (Zerbino et al., 2020).

Once the role of DNA was proven, genes became physical components. Protein-coding genes could be characterized by the genetic code, which was determined in 1965, and could thus be defined by the open reading frames (ORFs). However, exceptions to Francis Crick's central dogma of genes as blueprints for protein synthesis (Crick, 1958) were already being uncovered: first tRNA and rRNA and then a broad variety of noncoding RNAs.

I can't imagine what the authors were thinking when they wrote this. If the Central Dogma actually said that the only role for genes was to make proteins then surely the discovery of tRNA and rRNA would have refuted the Central Dogma and relegated it to the dustbin of history. So why bother even mentioning it in 2020?


Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]

Zerbino, D.R., Frankish, A. and Flicek, P. (2020) "Progress, Challenges, and Surprises in Annotating the Human Genome." Annual review of genomics and human genetics 21:55-79. [doi: 10.1146/annurev-genom-121119-083418]

Wednesday, November 11, 2020

On the misrepresentation of facts about lncRNAs

I've been complaining for years about how opponents of junk DNA misrepresent and distort the scientific literature. The same complaints apply to the misrepresentation of data on alternative splicing and on the prevalence of noncoding genes. Sometimes the misrepresentation is subtle so you hardly notice it.

I'm going to illustrate subtle misrepresentation by quoting a recent commentary on lncRNAs that's just been published in BioEssays. The main part of the essay deals with ways of determining the function of lncRNAs with an emphasis on the structures of RNA and RNA-protein complexes. The authors don't make any specific claims about the number of functional RNAs in humans but it's clear from the context that they think this number is very large.

Wednesday, October 07, 2020

Undergraduate education in biology: no vision, no change

I was looking at the Vision and Change document the other day and it made me realize that very little has changed in undergraduate education. I really shouldn't be surprised since I reached the same conclusion in 2015—six years after the recommendations were published [Vision and Change] [Why can't we teach properly?].

The main recommendations of Vision and Change are that undergraduate education should adopt the proven methods of student-centered education and should focus on core concepts rather than memorization of facts. Although there has been some progress, it's safe to say that neither of these goals have been achieved in the vast majority of biology classes, including biochemistry and molecular biology classes.

Things are getting even worse in this time of COVID-19 because more and more classes are being taught online and there seems to be general agreement that this is okay. It is not okay. Online didactic lectures go against everything in the Vision and Change document. It may be possible to develop online courses that practice student-centered, concept teaching that emphasizes critical thinking but I've seen very few attempts.

Here are a couple of quotations from Vision and Change that should stimulate your thinking.

Traditionally, introductory biology [and biochemistry] courses have been offered as three lectures a week, with, perhaps, an accompanying two- or three-hour laboratory. This approach relies on lectures and a textbook to convey knowledge to the student and then tests the student's acquisition of that knowledge with midterm and final exams. Although many traditional biology courses include laboratories to provide students with hands-on experiences, too often these "experiences" are not much more than guided exercises in which finding the right answer is stressed while providing students with explicit instructions telling them what to do and when to do it.
"Appreciating the scientific process can be even more important than knowing scientific facts. People often encounter claims that something is scientifically known. If they understand how science generates and assesses evidence bearing on these claims, they possess analytical methods and critical thinking skills that are relevant to a wide variety of facts and concepts and can be used in a wide variety of contexts.”

National Science Foundation, Science and Technology Indicators, 2008

If you are a student and this sounds like your courses, then you should demand better. If you are an instructor and this sounds like one of your courses then you should be ashamed; get some vision and change [The Student-Centered Classroom].

Although the definition of student-centered learning may vary from professor to professor, faculty generally agree that student-centered classrooms tend to be interactive, inquiry-driven, cooperative, collaborative, and relevant. Three critical components are consistent throughout the literature, providing guidelines that faculty can apply when developing a course. Student-centered courses and curricula take into account student knowledge and experience at the start of a course and articulate clear learning outcomes in shaping instructional design. Then they provide opportunities for students to examine and discuss their understanding of the concepts presented, offering frequent and varied feedback as part of the learning process. As a result, student-centered science classrooms and assignments typically involve high levels of student-student and student-faculty interaction; connect the course subject matter to topics students find relevant; minimize didactic presentations; reflect diverse views of scientific inquiry, including data presentation, argumentation, and peer review; provide ongoing feedback to both the student and professor about the student's learning progress; and explicitly address learning how to learn.

This is a critical time for science education since science is under attack all over the world. We need to make sure that university students are prepared to deal with scientific claims and counter-claims for the rest of their lives after they leave university. This means that they have to be skilled at critical thinking and that's a skill that can only be taught in a student-centered classroom where students can practice argumentation and learn the importance of evidence. Memorizing the enzymes of the Krebs Cycle will not help them understand climate change or why they should wear a mask in the middle of a pandemic.


Saturday, October 03, 2020

On the importance of random genetic drift in modern evolutionary theory

The latest issue of New Scientist has a number of articles on evolution. All of them are focused on extending and improving the current theory of evolution, which is described as Darwin's version of natural selection [New Scientist doesn't understand modern evolutionary theory].

Most of the criticisms come from a group who want to extend the evolutionary synthesis (EES proponents). Their main goal is to advertise mechanisms that are presumed to enhance adaptation but that weren't explicitly included in the Modern Synthesis that was put together in the late 1940s.

One of the articles addresses random genetic drift [see Survival of the ... luckiest]. The emphasis in this short article is on the effects of drift in small populations and it gives examples of reduced genetic diversity in small populations.

Wednesday, September 30, 2020

New Scientist doesn't understand modern evolutionary theory

New Scientist has devoted much of their September 26th issue to evolution, but not in a good way. Their emphasis is on 13 ways that we must rethink evolution. Readers of this blog are familiar with this theme because New Scientist is talking about the Extended Evolutionary Synthesis (EES)—a series of critiques of the Modern Synthesis in an attempt to overthrow or extend it [The Extended Evolutionary Synthesis - papers from the Royal Society meeting].

My main criticism of EES is that its proponents demonstrate a remarkable lack of understanding of modern evolutionary theory and they direct most of their attacks against the old adaptationist version of the Modern Synthesis that was popular in the 1950s. For the most part, EES proponents missed the revolution in evolutionary theory that occurred in the late 1960s with the development of Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. EES proponents have shown time and time again that they have not bothered to read a modern textbook on population genetics.

Tuesday, September 22, 2020

The Function Wars Part VIII: Selected effect function and de novo genes

Discussions about the meaning of the word "function" have been going on for many decades, especially among philosophers who love that sort of thing. The debate intensified following the ENCODE publicity hype disaster in 2012 where ENCODE researchers used the word function in an entirely inappropriate manner in order to prove that there was no junk in our genome. Since then, a cottage industry based on discussing the meaning of function has grown up in the scientific literature and dozens of papers have been published. This may have enhanced a lot of CVs but none of these papers has proposed a rigorous definition of function that we can rely on to distinguish functional DNA from junk DNA.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)

That doesn't mean that all of the papers have been completely useless. The net result has been to focus attention on the one reliable definition of function that most biologists can accept; the selected effect function. The selected effect function is defined as ...

Friday, August 07, 2020

Alan McHughen defends his views on junk DNA

Alan McHughen is the author of a recently published book titled DNA Demystified. I took issue with his stance on junk DNA [More misconceptions about junk DNA - what are we doing wrong?] and he has kindly replied to my email message. Here's what he said ...

Thursday, August 06, 2020

More misconceptions about junk DNA - what are we doing wrong?

I'm actively following the views of most science writers on junk DNA to see if they are keeping up on the latest results. The latest book is DNA Demystified by Alan McHughen, a molecular geneticist at the University of California, Riverside. It's published by Oxford University Press, the same publisher that published John Parrington's book The Deeper Genome. Parrington's book was full of misleading and incorrect statements about the human genome so I was anxious to see if Oxford had upped its game.1, 2

You would think that any book with a title like DNA Demystified would contain the latest interpretations of DNA and genomes, especially with a subtitle like "Unraveling the Double Helix." Unfortunately, the book falls far short of its objectives. I don't have time to discuss all of its shortcomings so let's just skip right to the few paragraphs that discuss junk DNA (p.46). I want to emphasize that this is not the main focus of the book. I'm selecting it because it's what I'm interested in and because I want to get a feel for how correct and accurate scientific information is, or is not, being accepted by practicing scientists. Are we falling for fake news?

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think from such an introduction that you were about to learn how much of the genome is functional according to ENCODE 3, but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012, when the journal was more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain, the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things from the 2012 affair. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism, and they learned that it's best to ignore their previous claims for the same reason. This is not how science is supposed to work, but then the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try to find out whether ENCODE stands by its previous claim that most of the genome is functional, but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.
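For readers wondering what a figure like "7.9% of the genome" actually measures, here's a minimal sketch (my own illustration, not ENCODE's pipeline): take the genomic intervals of the annotated elements, merge the overlapping ones, and divide the merged length by the genome size. The intervals and genome size below are invented for the example.

```python
# Minimal sketch (not ENCODE's pipeline): what fraction of a genome is covered
# by a set of annotated elements? Merge overlapping intervals, then divide the
# merged length by the genome size. All numbers below are invented.
from collections import defaultdict

def covered_fraction(intervals, genome_size):
    """intervals: iterable of (chrom, start, end); returns fraction of bases covered."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:                 # overlapping (or abutting) element: extend
                cur_end = max(cur_end, end)
            else:                                # gap: bank the previous merged span
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_size

# Toy example: three elements on a 10 kb "genome"
elements = [("chr1", 100, 400), ("chr1", 350, 600), ("chr2", 0, 250)]
print(covered_fraction(elements, genome_size=10_000))    # 0.075, i.e. 7.5%
```

Run over roughly a million candidate cis-regulatory elements and a ~3.1 Gb genome, this kind of calculation is what yields a figure like 7.9%.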


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.

Saturday, July 11, 2020

The coronavirus life cycle

The coronavirus life cycle is depicted in a figure from Fung and Liu (2019). See below for a brief description.
The virus particle attaches to receptors on the cell surface (mostly ACE2 in the case of SARS-CoV-2). It is taken into the cell by endocytosis and then the viral membrane fuses with the host membrane, releasing the viral RNA. The viral RNA is translated to produce the 1a and 1ab polyproteins, which are cleaved to produce 16 nonstructural proteins (nsps). Most of the nsps assemble to form the replication-transcription complex (RTC). [see Structure and expression of the SARS-CoV-2 (coronavirus) genome]

RTC transcribes the original (+) strand creating (-) strands that are subsequently copied to make more viral (+) strands. RTC also produces a cluster of nine (-) strand subgenomic RNAs (sgRNAs) that are transcribed to make (+) sgRNAs that serve as mRNAs for the production of the structural proteins. N protein (nucleocapsid) binds to the viral (+) strand RNAs to help form new viral particles. The other structural proteins are synthesized in the endoplasmic reticulum (ER) where they assemble to form the protein-membrane virus particle that engulfs the viral RNA.

New virus particles are released when the vesicles fuse with the plasma membrane.

The entire life cycle takes about 10-16 hours and about 100 new virus particles are released before the cell commits suicide by apoptosis.


Fung, T.S. and Liu, D.X. (2019) Human coronavirus: host-pathogen interaction. Annual Review of Microbiology 73:529-557. [doi: 10.1146/annurev-micro-020518-115759]


Thursday, July 09, 2020

Structure and expression of the SARS-CoV-2 (coronavirus) genome


Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was accepted on Jan. 20th and it appeared in the Feb. 3rd issue of Nature (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.1

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10⁻⁶ per nucleotide per replication). The original isolate has a total length of 29,891 nt, not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.
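A quick back-of-the-envelope calculation (mine, using the rate and genome length quoted above) shows why the isolates are so similar.

```python
# Rough expectation, using the figures quoted above (illustrative only).
genome_length = 29_891      # nt, excluding the poly(A) tail
mutation_rate = 1e-6        # mutations per nucleotide per replication

expected_mutations = genome_length * mutation_rate
print(expected_mutations)   # ~0.03 mutations per genome copy
```

In other words, most replication events copy the genome perfectly, so isolates collected a few months apart typically differ at only a handful of positions.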

Wednesday, July 08, 2020

Where did your chicken come from?

Scientists have sequenced the genomes of modern domesticated chickens and compared them to the genomes of various wild pheasants in southern Asia. It has been known for some time that chickens resemble a species of pheasant called red jungle fowl and this led Charles Darwin to speculate that chickens were domesticated in India. Others have suggested Southeast Asia or China as the site of domestication.

The latest results show that modern chickens probably descend from a subspecies of red jungle fowl that inhabits the region around Myanmar (Wang et al., 2020). The subspecies is Gallus gallus spadiceus and the domesticated chicken subspecies is Gallus gallus domesticus. As you might expect, the two subspecies can interbreed.

The authors looked at a total of 863 genomes of domestic chickens, four species of jungle fowl, and all five subspecies of red jungle fowl. They identified a total of 33.4 million SNPs, which were enough to genetically distinguish between the various species AND the subspecies of red jungle fowl. (Contrary to popular belief, it is quite possible to assign a given genome to a subspecies (race) based entirely on genetic differences.)
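As a toy illustration of that parenthetical point (this is not the method used by Wang et al., and the SNPs and allele frequencies below are invented), a genome can be assigned to the reference population whose allele frequencies best explain its genotypes.

```python
import numpy as np

# Toy sketch, NOT the Wang et al. pipeline: assign a genotype to the reference
# population whose SNP allele frequencies best explain it. Genotypes are coded
# as 0/1/2 copies of the alternate allele; the frequencies below are invented.
ref_freqs = {
    "G. g. spadiceus": np.array([0.90, 0.10, 0.80, 0.05]),
    "G. g. murghi":    np.array([0.20, 0.85, 0.15, 0.70]),
}

def assign(genotype, ref_freqs):
    """Return the population with the highest (binomial) log-likelihood."""
    best_pop, best_ll = None, -np.inf
    for pop, p in ref_freqs.items():
        ll = np.sum(genotype * np.log(p) + (2 - genotype) * np.log(1 - p))
        if ll > best_ll:
            best_pop, best_ll = pop, ll
    return best_pop

sample = np.array([2, 0, 2, 0])       # alternate-allele counts at the four SNPs
print(assign(sample, ref_freqs))      # -> G. g. spadiceus
```

With 33.4 million SNPs instead of four, the likelihood differences between populations become enormous, which is why assignment at the subspecies level is straightforward.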

The sequence data suggest that chickens were domesticated from wild G. g. spadiceus about 10,000 years ago in the northern part of Southeast Asia. The data also suggest that modern domesticated chickens (G. g. domesticus) from India, Pakistan, and Bangladesh interbred with another subspecies of red jungle fowl (G. g. murghi) after the original domestication. These chickens from South Asia contain substantial contributions from G. g. murghi, ranging from 8% to 22%.

Next time you serve chicken, if someone asks you where it came from you won't be lying if you say it came from Myanmar.


Image credits: BBQ chicken, Creative Commons License [Chicken BBQ]
Red Jungle Fowl, Creative Commons License [Red_Junglefowl_-Thailand]
Map: Lawler, A. (2020) Dawn of the chicken revealed in Southeast Asia. Science 368:1411.

Wang, M., Thakur, M., Peng, M. et al. (2020) 863 genomes reveal the origin and domestication of chicken. Cell Research [doi: 10.1038/s41422-020-0349-y]