Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent¹ (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisense RNAs, miRNAs, and circular RNAs. The paper goes off the rails in the first paragraph of the Results section where they immediately equate transcripts with genes. They report the following:

  • 19,107 mRNA genes (188 novel)
  • 18,387 lncRNA genes (13,175 novel)
  • 7,309 asRNA genes (2,519 novel)
  • 5,427 miRNAs
  • 5,427 circRNAs

Is science a social construct?

Richard Dawkins has written an essay for The Spectator in which he says,

"[Science is not] a social construct. It’s simply true. Or at least truth is real and science is the best way we have of finding it. ‘Alternative ways of knowing’ may be consoling, they may be sincere, they may be quaint, they may have a poetic or mythic beauty, but the one thing they are not is true. As well as being real, moreover, science has a crystalline, poetic beauty of its own."

The essay is not particularly provocative but it did provoke Jerry Coyne who pointed out that "the profession of science" can be construed as a social construct. In this sense Jerry is agreeing with his former supervisor, Richard Lewontin,¹ who wrote,

"Science is a social institution about which there is a great deal of misunderstanding, even among those who are part of it. We think that science is an institution, a set of methods, a set of people, a great body of knowledge that we call scientific, is somehow apart from the forces that rule our everyday lives and that govern the structure of our society... The problems that science deals with, the ideas that it uses in investigating those problems, even the so-called scientific results that come out of scientific investigation, are all deeply influenced by predispositions that derive from the society in which we live. Scientists do not begin life as scientists after all, but as social beings immersed in a family, a state, a productive structure, and they view nature through a lens that has been molded by their social structure."

Coincidentally, I just happened to be reading Science Fictions, an excellent book by Stuart Ritchie, who also believes that science is a social construct but has a slightly different take on the matter.

"Science has cured diseases, mapped the brain, forecasted the climate, and split the atom; it's the best method we have of figuring out how the universe works and of bending it to our will. It is, in other words, our best way of moving towards the truth. Of course, we might never get there—a glance at history shows us how hubristic it is to claim any facts as absolute or unchanging. For ratcheting our way towards better knowledge about the world, though, the methods of science are as good as it gets.

But we can't make progress with those methods alone. It's not enough to make a solitary observation in your lab; you must also convince other scientists that you've discovered something real. This is where the social part comes. Philosophers have long discussed how important it is for scientists to show their fellow researchers how they came to their conclusions."

Dawkins, Coyne, Lewontin, and Ritchie are all right in different ways. Dawkins is talking about science as a way of knowing, although he restricts his definition of science to the natural sciences. The others are referring to the practice of science, or as Jerry Coyne puts it, the profession. It's true that the methods of science are the best way we have to get at the truth and it's true that the way of knowing is not a social construct in any meaningful sense.

Jerry Coyne is right to point out that the methods are employed by human scientists (he's also restricting the practice of science to scientists) and humans are fallible. In that sense, the enterprise of (natural) science is a social construct. Lewontin warns us that scientists have biases and prejudices and that may affect how they do science.

Ritchie makes a different point by emphasizing that (natural) science is a collective endeavor and that "truth" often requires a consensus. That's the sense in which science is social. This is supposed to make science more robust, according to Ritchie, because real knowledge only emerges after careful and skeptical scrutiny by other scientists. His book is mostly about how that process isn't working and why science is in big trouble. He's right about that.

I think it's important to distinguish between science as a way of knowing and the behavior and practice of scientists. The second one is affected by society and its flaws are well-known but the value of science as a way of knowing can't be so easily dismissed.


1. The book is actually a series of lectures (The Massey Lectures) that Lewontin gave in Toronto (Ontario, Canada) in 1990. I attended those lectures.

Tuesday, February 16, 2021

The 20th anniversary of the human genome sequence:
6. Nature doubles down on ENCODE results

Nature has now published a series of articles celebrating the 20th anniversary of the publication of the draft sequences of the human genome [Genome revolution]. Two of the articles are about free access to information and, unlike a similar article in Science, the Nature editors aren't shy about mentioning an important event from 2001; namely, the fact that Science wasn't committed to open access.

By publishing the Human Genome Project’s first paper, we worked with a publicly funded initiative that was committed to data sharing. But the journal acknowledged there would be challenges to maintaining the free, open flow of information, and that the research community might need to make compromises to these principles, for example when the data came from private companies. Indeed, in 2001, colleagues at Science negotiated publishing the draft genome generated by Celera Corporation in Rockville, Maryland. The research paper was immediately free to access, but there were some restrictions on access to the full data.

Friday, February 12, 2021

The 20th anniversary of the human genome sequence:
5. 90% of our genome is junk

This is the fifth (and last) post in celebration of the 20th anniversary of publishing the draft sequence. The first four posts dealt with: (1) the way Science chose to commemorate the occasion [Access to the data]; (2) finishing the sequence; (3) the number of genes; and (4) the amount of functional DNA in the genome.

Back in 2001, knowledgeable scientists knew that most of the human genome is junk and the sequence confirmed that knowledge. Subsequent work on the human genome over the past 20 years has provided additional evidence of junk DNA so that we can now be confident that something like 90% of our genome is junk DNA. Here's a list of data and arguments that support that claim.

Wednesday, February 10, 2021

The 20th anniversary of the human genome sequence:
4. Functional DNA in our genome

We know a lot more about the human genome than we did when the draft sequences were published 20 years ago. One of the most important discoveries is the recognition and extent of true functional sequences in the genome. Genes are one example of such functional sequence but only a minor component (about 1.4%). Most of the functional regions of the genome are not genes.

Here's a list of functional DNA in our genome other than the functional part of genes.

  • Centromeres: There are 24 different centromeres and the average size is four million base pairs. Most of this is repetitive DNA and it adds up to about 3% of the genome. The total amount of centromeric DNA ranges from 2%-10% in different individuals. It's unlikely that all of the centromeric DNA is essential; about 1% seems to be a good estimate.
  • Telomeres: Telomeres are repetitive DNA sequences at the ends of chromosomes. They are required for the proper replication of DNA and they take up about 0.1% of the genome sequence.
  • Origins of replication: DNA replication begins at origins of replication. The size of each origin has not been established with certainty but it's safe to assume that 100 bp is a good estimate. There are about 100,000 origin sequences but it's unlikely that all of them are functional or necessary. It's reasonable to assume that only 30,000 - 50,000 are real origins and that means 0.3% of the genome is devoted to origins of replication.
  • Regulatory sequences: The transcription of every gene is controlled by sequences that lie outside of the genes, usually at the 5′ end. The total amount of regulatory sequence is controversial but it seems reasonable to assume about 200 bp per gene for a total of five million bp or less than 0.2% of the genome (0.16%). The most extreme claim is about 2,400 bp per gene or 1.8% of the genome.
  • Scaffold attachment regions (SARs): Human chromatin is organized into about 100,000 large loops. The base of each loop consists of particular proteins bound to specific sequences called anchor loop sequences. The nomenclature is confusing; the original term (SAR) isn't as popular today as it was 35 years ago but that doesn't change the fact that about 0.3% of the genome is required to organize chromatin.
  • Transposons: Most of the transposon-related sequences in our genome are just fragments of defective transposons but there are a few active ones. They account for only a tiny fraction of the genome.
  • Viruses: Functional virus DNA sequences account for less than 0.1% of the genome.

If you add up all the functional DNA from this list, you get to somewhere between 2% and 3% of the genome.
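Here's a rough sketch of the arithmetic behind the list above. It assumes a genome size of 3.2 Gb (the round number used in these posts) and about 25,000 genes, which is what five million bp at 200 bp per gene implies. The variable and function names are just for illustration, and transposons are left out of the tally because no figure is given for them.

```python
# Back-of-the-envelope check of the percentages in the list above.
# Assumptions (not measurements): a 3.2 Gb genome and ~25,000 genes.
GENOME_BP = 3_200_000_000
GENE_COUNT = 25_000  # implied by 5,000,000 bp / 200 bp per gene


def percent_of_genome(bp: float) -> float:
    """Return the share of the genome, in percent, taken up by `bp` base pairs."""
    return 100.0 * bp / GENOME_BP


# Sizes computed from the counts and lengths quoted in the list.
computed_bp = {
    "centromeres, total (24 x 4 Mb)": 24 * 4_000_000,
    "origins, all candidates (100,000 x 100 bp)": 100_000 * 100,
    "origins, low estimate (30,000 x 100 bp)": 30_000 * 100,
    "origins, high estimate (50,000 x 100 bp)": 50_000 * 100,
    "regulatory (200 bp per gene)": GENE_COUNT * 200,
    "regulatory, extreme claim (2,400 bp per gene)": GENE_COUNT * 2_400,
}

for label, bp in computed_bp.items():
    print(f"{label}: {percent_of_genome(bp):.2f}%")

# Tally of the percentages as stated in the list (viruses taken at the
# 0.1% upper bound; transposons omitted because only "tiny" is given).
stated_percent = {
    "centromeres (essential part)": 1.0,
    "telomeres": 0.1,
    "origins of replication": 0.3,
    "regulatory sequences": 0.16,
    "scaffold attachment regions": 0.3,
    "viruses": 0.1,
}
total_percent = sum(stated_percent.values())
print(f"total from stated figures: {total_percent:.2f}%")
```

With 100 bp per origin, 100,000 origins come to about 0.3% of a 3.2 Gb genome while 30,000 to 50,000 origins come to roughly 0.1% to 0.16%. The stated figures add up to about 2% when the essential 1% is used for centromeres, and to about 4% if all 3% of centromeric DNA is counted.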


Image credit: Wikipedia.

Monday, February 08, 2021

The 20th anniversary of the human genome sequence: 3. How many genes?

This week marks the 20th anniversary of the publication of the first drafts of the human genome sequence. Science chose to celebrate the achievement with a series of articles that had little to say about the scientific discoveries arising out of the sequencing project; one of the articles praised the openness of sequence data without mentioning that the journal had violated its own policy on openness by publishing the Celera sequence [The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science].

I've decided to post a few articles about the human genome beginning with one on finishing the sequence. In this post I'll summarize the latest data on the number of genes in the human genome.

Saturday, February 06, 2021

The 20th anniversary of the human genome sequence:
2. Finishing the sequence

It's been 20 years since the first drafts of the human genome sequence were published. These first drafts from the International Human Genome Project (IHGP) and Celera were far from complete. The IHGP sequence covered about 82% of the genome and it contained about 250,000 gaps and millions of sequencing errors.

Celera never published an updated sequence but IHGP published a "finished" sequence in October 2004. It covered about 92% of the genome and had "only" 300 gaps. The error rate of the finished sequence was down to 10⁻⁵.

International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945. doi: 10.1038/nature03001

We've known for many decades that the correct size of the human genome is close to 3,200,000 kb or 3.2 Gb. There isn't a more precise number because different individuals have different amounts of DNA. The best average estimate was 3.286 Gb based on the sequence of 22 autosomes, one X chromosome, and one Y chromosome (Morton 1991). The amount of actual nucleotide sequence in the latest version of the reference genome (GRCh38.p13) is 3,110,748,599 bp and the estimated total size is 3,272,116,950 bp based on estimating the size of the remaining gaps. This means that 95% of the genome has been sequenced. [see How much of the human genome has been sequenced? for a discussion of what's missing.]
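The 95% figure follows directly from the two GRCh38.p13 numbers quoted above. Here's a minimal sketch of the calculation; the variable names are mine and the only inputs are the two values from the text.

```python
# Fraction of the reference genome that is actual sequence, using the
# GRCh38.p13 figures quoted in the text.
SEQUENCED_BP = 3_110_748_599        # nucleotides of actual sequence
ESTIMATED_TOTAL_BP = 3_272_116_950  # sequence plus estimated gap sizes

fraction_sequenced = SEQUENCED_BP / ESTIMATED_TOTAL_BP
remaining_bp = ESTIMATED_TOTAL_BP - SEQUENCED_BP

print(f"sequenced: {fraction_sequenced:.1%}")
print(f"estimated gaps: {remaining_bp:,} bp")
```

The difference between the two numbers, roughly 161 million bp, is the estimated size of the gaps that remain.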

Recent advances in sequencing technology have produced sequence data covering the repetitive regions in the gaps and the first complete sequence of a human chromosome (X) was published in 2019 [First complete sequence of a human chromosome]. It's now possible to complete the human genome reference sequence by sequencing at least one individual but I'm not sure that the effort and the expense are worth it.


Image credit: the figure is from Miga et al. (2020).

Miga, K.H., Koren, S., Rhie, A., Vollger, M.R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G.A. et al. (2020) Telomere-to-telomere assembly of a complete human X chromosome. Nature 585:79-84. [doi: 10.1038/s41586-020-2547-7]

Morton, N.E. (1991) Parameters of the human genome. Proceedings of the National Academy of Sciences 88:7474-7476. [doi: 10.1073/pnas.88.17.7474]

The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science

The first drafts of the human genome sequence were published 20 years ago. The paper from the International Human Genome Project (IHGP) was published in Nature on February 15, 2001 and the paper from Celera was published in Science on February 16, 2001.

The original agreement was to publish both papers in Science but IHGP refused to publish their sequence in that journal when it chose to violate its own policy by allowing Celera to restrict access to its data. I highly recommend James Shreeve's book The Genome War for the history behind these publications. It paints an accurate, but not pretty, picture of science and politics.

Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A. and Sougnez, C. (2001) Initial sequencing and analysis of the human genome. Nature 409:860-921. doi: 10.1038/35057062

Venter, J., Adams, M., Myers, E., Li, P., Mural, R., Sutton, G., Smith, H., Yandell, M., Evans, C., Holt, R., Gocayne, J., Amanatides, P., Ballew, R., Huson, D., Wortman, J., Zhang, Q., Kodira, C., Zheng, X., Chen, L., Skupski, M., Subramanian, G., Thomas, P., Zhang, J., Gabor Miklos, G., Nelson, C., Broder, S., Clark, A., Nadeau, J., McKusick, V. and Zinder, N. (2001) The sequence of the human genome. Science 291:1304-1351. doi: 10.1126/science.1058040

Thursday, December 31, 2020

On the importance of controls

When doing an experiment, it's important to keep the number of variables to a minimum and it's important to have scientific controls. There are two types of controls. A negative control covers the possibility that you will get a signal by chance; for example, if you are testing an enzyme to see whether it degrades sugar then the negative control will be a tube with no enzyme. Some of the sugar may degrade spontaneously and you need to know this. A positive control is when you deliberately add something that you know will give a positive result; for example, if you are doing a test to see if your sample contains protein then you want to add an extra sample that contains a known amount of protein to make sure all your reagents are working.

Lots of controls are more complicated than the examples I gave but the principle is important. It's true that some experiments don't appear to need the appropriate controls but that may be an illusion. The controls might still be necessary in order to properly interpret the results but they're not done because they are very difficult. This is often true of genomics experiments.

Saturday, December 19, 2020

What do believers in epigenetics think about junk DNA?

I've been writing some stuff about epigenetics so I've been reading papers on how to define the term [What the heck is epigenetics?]. Turns out there's no universal definition but I discovered that scientists who write about epigenetics are passionate believers in epigenetics no matter how you define it. Surprisingly (not!), there seems to be a correlation between belief in epigenetics and other misconceptions such as the classic misunderstanding of the Central Dogma of Molecular Biology and rejection of junk DNA [The Extraordinary Human Epigenome].

Here's an illustration of this correlation from the introduction to a special issue on epigenetics in Philosophical Transactions B.

Ganesan, A. (2018) Epigenetics: the first 25 centuries. Philosophical Transactions B 373:20170067. [doi: 10.1098/rstb.2017.0067]

Epigenetics is a natural progression of genetics as it aims to understand how genes and other heritable elements are regulated in eukaryotic organisms. The history of epigenetics is briefly reviewed, together with the key issues in the field today. This themed issue brings together a diverse collection of interdisciplinary reviews and research articles that showcase the tremendous recent advances in epigenetic chemical biology and translational research into epigenetic drug discovery.

In addition to the misconceptions, the text (see below) emphasizes the heritable nature of epigenetic phenomena. This idea of heritability seems to be a dominant theme among epigenetic believers.

A central dogma became popular in biology that equates life with the sequence DNA → RNA → protein. While the central dogma is fundamentally correct, it is a reductionist statement and clearly there are additional layers of subtlety in ‘how’ it is accomplished. Not surprisingly, the answers have turned out to be far more complex than originally imagined, and we are discovering that the phenotypic diversity of life on Earth is mirrored by an equal diversity of hereditary processes at the molecular level. This lies at the heart of modern day epigenetics, which is classically defined as the study of heritable changes in phenotype that occur without an underlying change in genome sequence. The central dogma's focus on genes obscures the fact that much of the genome does not code for genes and indeed such regions were derogatively lumped together as ‘junk DNA’. In fact, these non-coding regions increase in proportion as we climb up the evolutionary tree and clearly play a critical role in defining what makes us human compared with other species.

At the risk of beating a dead horse, I'd like to point out that the author is wrong about the Central Dogma and wrong about junk DNA. He's right about the heritability of some epigenetic phenomena such as methylation of DNA but that fact has been known for almost five decades and so far it hasn't caused a noticeable paradigm shift, unless I missed it [Restriction, Modification, and Epigenetics].


Saturday, December 05, 2020

Mouse traps Michael Denton

Michael Denton is a New Zealand biochemist, a Senior Fellow at the Discovery Institute, and the author of two Intelligent Design Creationist books: Evolution: A Theory in Crisis (1985) and Nature's Destiny (1998).

He has just read Michael Behe's latest book and he (Denton) is impressed [Praise for Behe’s Latest: “Facts Before Theory”]:

Behe brings out more forcibly than any other author I have recently read just how vacuous and biased are the criticisms of his work and of the ID position in general by so many mainstream academic defenders of Darwinism. And what is so telling about his many wonderfully crafted responses to his Darwinian critics is that it is Behe who is putting the facts before theory while his many detractors — Kenneth Miller, Jerry Coyne, Larry Moran, Richard Lenski, and others — are putting theory before the facts. In short, this volume shows that it is Behe rather than his detractors who is carefully following the evidence.

I don't know what planet Michael Denton is living on—probably the same one as Michael Behe—but let's make one thing clear about facts and evidence. Behe's entire argument is based on the "fact" that he can't see how Darwin's theory of natural selection can account for the evolution of complex features: therefore god(s) must have done it. This is NOT putting facts before theory and it is NOT carefully following the evidence.

It's just a somewhat sophisticated version of god of the gaps based on Behe's lack of understanding of the basic mechanisms of evolution.

(See, Of mice and Michael, where I explain why Michael Behe fails to answer my critique of The Edge of Evolution.)


Tuesday, December 01, 2020

Of mice and Michael

Michael Behe has published a book containing most of his previously published responses to critics. I was anxious to see how he dealt with my criticisms of The Edge of Evolution but I was disappointed to see that, for the most part, he has just copied excerpts from his 2014 blog posts (pp. 335-355).

I think it might be worthwhile to review the main issues so you can see for yourself whether Michael Behe really answered his critics as the title of his most recent book claims. You can check out the dueling blog posts at the end of this summary to see how the discussion evolved in real time more than four years ago.

Many Sandwalk readers participated in the debate back then and some of them are quoted in Behe's book although he usually just identifies them as commentators.

My Summary

Michael Behe has correctly identified an extremely improbable evolutionary event; namely, the development of chloroquine resistance in the malaria parasite. This is an event that is close to the edge of evolution, meaning that more complex events of this type are beyond the edge of evolution and cannot occur naturally. However, several of us have pointed out that his explanation of how that event occurred is incorrect. This is important because he relies on his flawed interpretation of chloroquine resistance to postulate that many observed events in evolution could not possibly have occurred by natural means. Therefore, god(s) must have created them.

In his response to this criticism, he completely misses the point and fails to understand that what is being challenged is his misinterpretation of the mechanisms of evolution and his understanding of mutations.


The main point of The Edge of Evolution is that many of the beneficial features we see could only have evolved by selecting for a number of different mutations where none of the individual mutations confer a benefit by themselves. Behe claims that these mutations had to occur simultaneously or at least close together in time. He argues that this is possible in some cases but in most cases the (relatively) simultaneous occurrence of multiple mutations is beyond the edge of evolution. The only explanation for the creation of these beneficial features is god(s).

Tuesday, November 17, 2020

Using modified nucleotides to make mRNA vaccines

The key features of the mRNA vaccines are the use of modified nucleotides in their synthesis and the use of lipid nanoparticles to deliver them to cells. The main difference between the Pfizer/BioNTech vaccine and the Moderna vaccine is in the delivery system. The lipid vesicles used by Moderna are somewhat more stable and the vaccine doesn't need to be kept constantly at ultra-low temperatures.

Both vaccines use modified RNAs. They synthesize the RNA using modified nucleotides based on variants of uridine and cytidine; namely, pseudouridine, N1-methylpseudouridine and 5-methylcytidine. (The structures of the nucleosides are from Andries et al., 2015.) The best versions are those that use both 5-methylcytidine and N1-methylpseudouridine.

I'm not an expert on these mRNAs and their delivery systems but the way I understand it is that regular RNA is antigenic: it induces antibodies against it, presumably when it is accidentally released from the lipid vesicles outside of the cell. The modified versions are much less antigenic. As an added bonus, the modified RNA is more stable and more efficiently translated.

Two of the key papers are ...

Andries, O., Mc Cafferty, S., De Smedt, S.C., Weiss, R., Sanders, N.N. and Kitada, T. (2015) "N1-methylpseudouridine-incorporated mRNA outperforms pseudouridine-incorporated mRNA by providing enhanced protein expression and reduced immunogenicity in mammalian cell lines and mice." Journal of Controlled Release 217: 337-344. [doi: 10.1016/j.jconrel.2015.08.051]

Pardi, N., Tuyishime, S., Muramatsu, H., Kariko, K., Mui, B.L., Tam, Y.K., Madden, T.D., Hope, M.J. and Weissman, D. (2015) "Expression kinetics of nucleoside-modified mRNA delivered in lipid nanoparticles to mice by various routes." Journal of Controlled Release 217: 345-351. [doi: 10.1016/j.jconrel.2015.08.007]


Sunday, November 15, 2020

Why is the Central Dogma so hard to understand?

The Central Dogma of molecular biology states ...

... once (sequential) information has passed into protein it cannot get out again (F.H.C. Crick, 1958).

The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid (F.H.C. Crick, 1970).

This is not difficult to understand since Francis Crick made it very clear in his original 1958 paper and again in his 1970 paper in Nature [see Basic Concepts: The Central Dogma of Molecular Biology]. There's nothing particularly complicated about the Central Dogma. It merely states the obvious fact that sequence information can flow from nucleic acid to protein but not the other way around.

So, why do so many scientists have trouble grasping this simple idea? Why do they continue to misinterpret the Central Dogma while quoting Crick? It seems obvious that they haven't read the paper(s) they are referencing.

I just came across another example of such ignorance and it is so outrageous that I just can't help sharing it with you. Here are a few sentences from a recent review in the 2020 issue of Annual Review of Genomics and Human Genetics (Zerbino et al., 2020).

Once the role of DNA was proven, genes became physical components. Protein-coding genes could be characterized by the genetic code, which was determined in 1965, and could thus be defined by the open reading frames (ORFs). However, exceptions to Francis Crick's central dogma of genes as blueprints for protein synthesis (Crick, 1958) were already being uncovered: first tRNA and rRNA and then a broad variety of noncoding RNAs.

I can't imagine what the authors were thinking when they wrote this. If the Central Dogma actually said that the only role for genes was to make proteins then surely the discovery of tRNA and rRNA would have refuted the Central Dogma and relegated it to the dustbin of history. So why bother even mentioning it in 2020?
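To make the point concrete, here's a minimal sketch that encodes Crick's 1970 statement as a single rule: a transfer of sequence information violates the Central Dogma only if it starts from protein. The function name and the string labels are hypothetical and chosen just for this illustration.

```python
# A sketch of Crick's statement: sequence information cannot be
# transferred from protein to either protein or nucleic acid.
# The names below are illustrative, not from any library.
NUCLEIC_ACIDS = {"DNA", "RNA"}
MOLECULES = NUCLEIC_ACIDS | {"protein"}


def violates_central_dogma(source: str, target: str) -> bool:
    """Return True if a residue-by-residue transfer of sequence
    information from `source` to `target` is ruled out."""
    if source not in MOLECULES or target not in MOLECULES:
        raise ValueError(f"unknown molecule: {source!r} or {target!r}")
    # The only forbidden transfers are the ones that begin with protein.
    return source == "protein"


# Making tRNA, rRNA, or any other noncoding RNA is a DNA -> RNA transfer.
print(violates_central_dogma("DNA", "RNA"))      # noncoding RNA genes
print(violates_central_dogma("RNA", "protein"))  # translation
print(violates_central_dogma("RNA", "DNA"))      # reverse transcription
print(violates_central_dogma("protein", "DNA"))  # ruled out by the Dogma
```

Under this rule a gene that produces tRNA or rRNA never comes up as an exception because no information leaves protein.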


Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]

Zerbino, D.R., Frankish, A. and Flicek, P. (2020) "Progress, Challenges, and Surprises in Annotating the Human Genome." Annual Review of Genomics and Human Genetics 21:55-79. [doi: 10.1146/annurev-genom-121119-083418]

Wednesday, November 11, 2020

On the misrepresentation of facts about lncRNAs

I've been complaining for years about how opponents of junk DNA misrepresent and distort the scientific literature. The same complaints apply to the misrepresentation of data on alternative splicing and on the prevalence of noncoding genes. Sometimes the misrepresentation is subtle so you hardly notice it.

I'm going to illustrate subtle misrepresentation by quoting a recent commentary on lncRNAs that's just been published in BioEssays. The main part of the essay deals with ways of determining the function of lncRNAs with an emphasis on the structures of RNA and RNA-protein complexes. The authors don't make any specific claims about the number of functional RNAs in humans but it's clear from the context that they think this number is very large.