More Recent Comments

Monday, August 29, 2022

The creationist view of junk DNA

Here's a recent video podcast (Aug. 23, 2022) from the Institute for Creation Research (sic). It features an interview with Dr. Jeff Tomkins of the ICR where he explains the history of junk DNA and why scientists no longer believe in junk DNA.

Most Sandwalk readers will recognize all the lies and distortions but here's the problem: I suspect that the majority of biologists would pretty much agree with the creationist interpretation. They also believe that junk DNA has been refuted and most of our genome is functional.

That's very sad.

Friday, August 26, 2022

ENCODE and their current definition of "function"

ENCODE has mostly abandoned its definition of function based on biochemical activity and replaced it with "candidate" function or "likely" function, but the message isn't getting out.

Back in 2012, the ENCODE Consortium announced that 80% of the human genome was functional and junk DNA was dead [What did the ENCODE Consortium say in 2012?]. This claim was widely disputed, causing the ENCODE Consortium leaders to back down in 2014 and restate their goal (Kellis et al. 2014). The new goal is merely to map all the potential functional elements.

... the Encyclopedia of DNA Elements Project [ENCODE] was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types.

The new goal was repeated when the ENCODE III results were published in 2020, although you had to read carefully to recognize that they were no longer claiming to identify functional elements in the genome and they were raising no objections to junk DNA [ENCODE 3: A lesson in obfuscation and opaqueness].

Wednesday, August 24, 2022

Junk DNA vs noncoding DNA

The Wikipedia article on the Human genome contained a reference that I had not seen before.

"Finally DNA that is deleterious to the organism and is under negative selective pressure is called garbage DNA.[43]"

Reference 43 is a chapter in a book.

Pena S.D. (2021) "An Overview of the Human Genome: Coding DNA and Non-Coding DNA". In Haddad LA (ed.). Human Genome Structure, Function and Clinical Considerations. Cham: Springer Nature. pp. 5–7. ISBN 978-3-03-073151-9.

Sérgio Danilo Junho Pena is a human geneticist and professor in the Dept. of Biochemistry and Immunology at the Federal University of Minas Gerais in Belo Horizonte, Brazil. He is a member of the Human Genome Organization council. If you click on the Wikipedia link, it takes you to an excerpt from the book where S.D.J. Pena discusses "Coding and Non-coding DNA."

There are two quotations from that chapter that caught my eye. The first one is,

"Less than 2% of the human genome corresponds to protein-coding genes. The functional role of the remaining 98%, apart from repetitive sequences (constitutive heterochromatin) that appear to have a structural role in the chromosome, is a matter of controversy. Evolutionary evidence suggests that this noncoding DNA has no function—hence the common name of 'junk DNA.'"

Professor Pena then goes on to discuss the ENCODE results pointing out that there are many scientists who disagree with the conclusion that 80% of our genome is functional. He then says,

"Many evolutionary biologists have stuck to their guns in defense of the traditional and evolutionary view that non-coding DNA is 'junk DNA.'"

This is immediately followed by a quote from Dan Graur, implying that he (Graur) is one of the evolutionary biologists who defend the evolutionary view that noncoding DNA is junk.

I'm very interested in tracking down the reason for equating noncoding DNA and junk DNA, especially in contexts where the claim is obviously wrong. So I wrote to Professor Pena—he got his Ph.D. in Canada—and asked him for a primary source that supports the claim that "evolutionary science suggests that this noncoding DNA has no function."

He was kind enough to reply, saying that there are multiple sources, and he sent me links to two of them. The first one was the Wikipedia article on Non-coding DNA.

I explained that this was somewhat ironic since I had written most of the Wikipedia article on Non-coding DNA and my goal was to refute the idea that noncoding DNA and junk DNA were synonyms. I explained that under the section on 'junk DNA' he would see the following statement that I inserted after writing sections on all those functional noncoding DNA elements.

"Junk DNA is often confused with non-coding DNA[48] but, as documented above, there are substantial fractions of non-coding DNA that have well-defined functions such as regulation, non-coding genes, origins of replication, telomeres, centromeres, and chromatin organizing sites (SARs)."

That's intended to dispel the notion that proponents of junk DNA ever equated noncoding DNA and junk DNA. I suggested that he couldn't use that source as support for his statement.

Here's my response to his second source.

The second reference is to a 2007 article by Wojciech Makalowski,1 a prominent opponent of junk DNA. He says, "In 1972 the late geneticist Susumu Ohno coined the term 'junk DNA' to describe all noncoding sections of a genome" but that is a demonstrably false statement in two respects.

First, Ohno did not coin the term "junk DNA"; it was commonly used in discussions about genomes and even appeared in print many years before Ohno's paper. Second, Ohno specifically addresses regulatory sequences in his paper, so it's clear that he knew about functional noncoding DNA that was not junk. He also mentions centromeres and I think it's safe to assume that he knew about ribosomal RNA genes and tRNA genes.

The only possible conclusion is that Makalowski is wrong on two counts.

I then asked about the second statement in Professor Pena's article and suggested that it might have been much better to say, "Many evolutionary biologists have stuck to their guns and defend the view that most of the human genome is junk." He agreed.

So, what have we learned? Professor Pena is a well-respected scientist and an expert on the human genome. He is on the council of the Human Genome Organization. Yet he propagated the common myth that noncoding DNA is junk and saw nothing wrong with Makalowski's false reference to Susumu Ohno. Professor Pena himself must be well aware of functional noncoding elements such as regulatory sequences and noncoding genes, so it's difficult to explain why he would imagine that prominent defenders of junk DNA don't know this.

I think the explanation is that this connection between noncoding DNA and junk DNA is so entrenched in the popular and scientific literature that it is just repeated as a meme without ever considering whether it makes sense.

1. The pdf appears to be a response to a query in Scientific American on February 12, 2007. It may be connected to a Scientific American paper by Khajavinia and Makalowski (2007).

Khajavinia, A., and Makalowski, W. (2007) What is "junk" DNA, and what is it worth? Scientific American, 296:104. [PubMed]

Tuesday, August 23, 2022

Are synonymous mutations mostly neutral or are they deleterious?

A recent paper in Nature claims that 75% of synonymous mutations reduce fitness in yeast. The results were challenged (refuted?) ten weeks later in a manuscript posted on a preprint server.

The first paper was published in the June 23, 2022 issue of Nature (Shen et al., 2022). The authors looked at mutations in 21 nonessential genes in yeast where mutations are known to lower fitness. They created mutations in the coding regions of these genes using the CRISPR-Cas9 editing technique. A total of 1,866 synonymous mutations were created as well as 6,306 non-synonymous mutations and 169 nonsense mutations.
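The three classes of mutation in that experiment (synonymous, non-synonymous, and nonsense) are defined by the genetic code. Here is a minimal Python sketch that classifies a single codon change using the standard genetic code (NCBI translation table 1). It is only an illustration of the definitions, not the pipeline Shen et al. used; the names `CODON_TABLE` and `classify_substitution` are hypothetical, and the sketch ignores context effects (codon usage, mRNA structure, splicing signals) that could, in principle, make a synonymous change non-neutral.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build a codon -> one-letter amino acid map ('*' marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, nonsynonymous, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop -> stop)
    if alt_aa == "*":
        return "nonsense"        # a sense codon becomes a stop codon
    return "nonsynonymous"       # amino acid changes (this also covers stop -> sense)


# Usage: one example of each class.
for ref, alt in [("CTT", "CTC"), ("GAT", "GAA"), ("TGG", "TGA")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```

The third codon position is where most synonymous changes occur, which is why CTT to CTC keeps leucine while GAT to GAA swaps aspartate for glutamate.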

Monday, August 22, 2022

NPR vs CDC on the new COVID-19 guidelines

NPR tweeted out a summary of the new CDC (United States) guidelines on COVID-19. The figure was posted under the name of Dr. Marcus Plescia, chief medical officer for the Association of State and Territorial Health Officials. I've posted a screenshot of the figure on the right.

Before discussing the four bullet points, I want to emphasize that Marcus Plescia issued a press release on August 11, 2022 when the new guidelines came out and it did not mention the points in the NPR figure. In fact, it seems to me that he would not agree with the NPR summary.

Is every gene associated with cancer?

There has been an enormous expansion of papers on cancer and many of them make a connection with a particular human gene. A recent note in Trends in Genetics revealed that 15,233 human genes have already been mentioned in a cancer paper (de Magalhães, 2022). (I'm pretty sure the author is only referring to protein-coding genes.)

The author notes that this association doesn't necessarily mean that there's a cause-and-effect relationship and also notes that justifying a connection between your favorite gene and a cancer grant application is a factor. However, he concludes that,

In genetics and genomics, literally everything is associated with cancer. If a gene has not been associated with cancer yet, it probably means it has not been studied enough and will most likely be associated with cancer in the future. In a scientific world where everything and every gene can be associated with cancer, the challenge is determining which are the key drivers of cancer and more promising therapeutic targets.

I think he's on to something. I predict that all noncoding genes will eventually be associated with cancer as well. Not only that, I predict that several thousand fake genes will also be associated with cancer. It won't be long before there are 100,000 human genes associated with cancer and then the remaining parts of the genome will also be mentioned in cancer papers.

This will mean the end of junk DNA because anything that causes cancer must be part of a functional genome.

I hope my book comes out before this becomes widely known.

de Magalhães, J.P. (2022) Every gene can (and possibly will) be associated with cancer. TRENDS in Genetics 38:216-217 [doi: 10.1016/j.tig.2021.09.005]

Sunday, August 21, 2022

Splicing errors or alternative splicing?

The most important issue in alternative splicing, in my opinion, is whether splice variants are due to splicing errors (= junk RNA) or whether they reflect real biologically relevant alternative splicing.

Unfortunately, this view is not shared by the majority of scientists who work in this field. They are convinced that the vast majority of splice variant transcripts represent real examples of regulation and the main task is to document the extent of alternative splicing and characterize the various mechanisms.

I've written a lot about this topic over the years (see the list of posts at the bottom of this page). The two most important issues are: (1) the frequency of splicing errors and whether it can account for the splice variants and (2) the number of well-established, genuine, examples of biologically relevant alternative splicing and whether that's consistent with the claims.

I managed to post a summary of the data on the accuracy of splicing on the Intron article on Wikipedia and I urge you to take a look at it before it disappears. The bottom line is that splicing is not terribly accurate, so we expect to detect a fairly high level of incorrectly spliced transcripts whenever we look at a collection of RNAs from a particular cell line. The expected level of mis-spliced transcripts is well within the range of concentrations of 'alternatively spliced' transcripts reported in most studies.
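The argument is quantitative: if each intron has a small chance of being spliced incorrectly, the chance that a transcript carries at least one error grows with the number of introns, following 1 - (1 - p)^n. Here is a minimal Python sketch, assuming every intron is spliced independently with the same per-intron error rate p. The function name is hypothetical, and the error rates and intron count in the usage loop are illustrative placeholders, not values from the Wikipedia summary or from any particular study.

```python
def fraction_with_missplicing(error_rate_per_intron: float, n_introns: int) -> float:
    """Probability that a transcript has at least one splicing error.

    Simplifying assumption: each intron is spliced independently with the
    same per-intron error rate.
    """
    if not 0.0 <= error_rate_per_intron <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    if n_introns < 0:
        raise ValueError("number of introns cannot be negative")
    # Complement of "every intron spliced correctly".
    return 1.0 - (1.0 - error_rate_per_intron) ** n_introns


# Illustrative values only.
for rate in (0.001, 0.005, 0.01):
    frac = fraction_with_missplicing(rate, 8)
    print(f"per-intron error {rate:.3f}, 8 introns -> {frac:.1%} of transcripts mis-spliced")
```

Because the per-transcript error rate compounds across introns, even a modest per-junction error rate can produce a population of aberrant transcripts large enough to be detected in RNA-seq data.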

Saturday, August 20, 2022

Editing the 'Intergenic region' article on Wikipedia

Just before getting banned from Wikipedia, I was about to deal with a claim on the Intergenic region article. I had already fixed most of the other problems but there is still this statement in the subsection labeled "Properties."

According to the ENCODE project's study of the human genome, due to "both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length)"[2]

The source is one of the ENCODE papers published in the September 6 edition of Nature (Djebali et al., 2012). The quotation is accurate. Here's the full quotation.

As a consequence of both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length).

What's interesting about that data is what it reveals about the percentage of the genome devoted to intergenic DNA and the percentage devoted to genes. The authors claim that there are 60,250 intergenic regions, which means that there must be more than 60,000 genes.1 The median length of these intergenic regions is 3,949 bp, and multiplying the two numbers (60,250 × 3,949 bp) gives roughly 238 × 10⁶ bp of intergenic DNA. That's roughly 7-8% of the genome depending on which genome size you use. It doesn't mean that all the rest is genes but it sounds like they're saying that about 90% of the genome is occupied by genes.

In case you doubt that's what they're saying, read the rest of the paragraph in the paper.

Concordantly, we observed an increased overlap of genic regions. As the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene.

It sounds like they are anticipating a time when the discovery of more noncoding genes will eventually lead to a situation where the intergenic regions disappear and all genes will overlap.
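The back-of-the-envelope arithmetic behind that intergenic estimate can be written as a minimal Python sketch. The two inputs are the Djebali et al. (2012) numbers quoted above; the genome sizes (3.0 to 3.2 Gb) are assumptions chosen to show how the answer depends on which genome size you use. Multiplying a count by a median length is only a rough estimate, since the median is not the mean and the real total depends on the full length distribution. The function name is hypothetical.

```python
def intergenic_fraction(n_regions: int, typical_length_bp: float, genome_size_bp: float) -> float:
    """Rough fraction of the genome that is intergenic (count x typical length / genome size)."""
    return n_regions * typical_length_bp / genome_size_bp


N_REGIONS = 60_250   # number of intergenic regions (Djebali et al. 2012)
MEDIAN_BP = 3_949    # median intergenic length in bp (Djebali et al. 2012)

total_bp = N_REGIONS * MEDIAN_BP
print(f"total intergenic DNA ~ {total_bp / 1e6:.1f} Mb")

for genome_gb in (3.0, 3.1, 3.2):  # assumed haploid genome sizes
    frac = intergenic_fraction(N_REGIONS, MEDIAN_BP, genome_gb * 1e9)
    print(f"genome {genome_gb} Gb: intergenic ~ {frac:.1%}, everything else ~ {1 - frac:.1%}")
```

Whatever genome size you pick, the remainder that is not intergenic comes out above 90%, which is why the Djebali et al. numbers imply that genes occupy most of the genome.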

Now, as most of you know, the ENCODE papers have been discredited and hardly any knowledgeable scientist thinks there are 60,000 genes that occupy 90% of the genome. But here's the problem. I probably couldn't delete that sentence from Wikipedia because it meets all the criteria of a reliable source (published in Nature by scientists from reputable universities). Recent experience tells me that the Wikipolice, er, Wikipedia editors would have blocked me from deleting it.

The best I could do would be to balance the claim with one from another "reliable source" such as Piovesan et al. (2019) who list the total number of exons and introns and their average sizes, allowing you to calculate that protein-coding genes occupy about 35% of the genome. Other papers give slightly higher values for protein-coding genes.

It's hard to get a reliable source on the real number of noncoding genes and their average size, but I estimate that there are about 5,000 noncoding genes and, at a generous estimate, they could take up a few percent of the genome. I assume in my upcoming book that genes probably occupy about 45% of the genome because I'm trying to err on the side of function.

An article on Intergenic regions is not really the place to get into a discussion about the number of noncoding genes but in the absence of such a well-sourced explanation the audience will be left with the statement from Djebali et al. and that's extremely misleading. Thus, my preference would be to replace it with a link to some other article where the controversy can be explained, preferably a new article on junk DNA.2

I was going to say,

The total amount of intergenic DNA depends on the size of the genome, the number of genes, and the length of each gene. That can vary widely from species to species. The value for the human genome is controversial because there is no widespread agreement on the number of genes but it's almost certain that intergenic DNA takes up at least 40% of the genome.

I can't supply a specific reference for this statement, so it would never have gotten past the Wikipolice, er, Wikipedia editors. This is a problem that can't be solved because any serious attempt to fix it will probably lead to getting blocked on Wikipedia.

There is one other statement in that section in the article on Intergenic region.

Scientists have now artificially synthesized proteins from intergenic regions.[3]

I would have removed that statement because it's irrelevant. It does not contribute to understanding intergenic regions. It's undoubtedly one of those little factoids that someone has stumbled across and thinks it needs to be on Wikipedia.

Deletion of a statement like that would have met with fierce resistance from the Wikipedia editors because it is properly sourced. The reference is to a 2009 paper in the Journal of Biological Engineering: "Synthesizing non-natural parts from natural genomic template."

1. There are no intergenic regions between the last genes on the end of a chromosome and the telomeres.

2. The Wikipedia editors deleted the Junk DNA article about ten years ago on the grounds that junk DNA had been disproven.

Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A. et al. (2012) Landscape of transcription in human cells. Nature 489:101-108. [doi: 10.1038/nature11233]

Piovesan, A., Antonaros, F., Vitale, L., Strippoli, P., Pelleri, M. C., and Caracausi, M. (2019) Human protein-coding genes and gene feature statistics in 2019. BMC Research Notes 12:315. [doi: 10.1186/s13104-019-4343-8]

Blocked by Wikipedia!

My account on Wikipedia has been blocked by some editor named Bbb23 after receiving a complaint from another editor named Praxidicae. Praxidicae has been blocking my attempts to edit articles on Intergenic region, Allele, and Non-coding DNA on the grounds that I am not obeying the Wikipedia rules. She has no expertise in science but she claims to be an expert on proper sources.

I have been removing unsourced statements and correcting incorrect ones. I have also attempted to make the articles more relevant by removing extraneous material. I have added material that reflects the scientific consensus on these topics.

Here's the complaint against me (Genome42) as stated by Praxidicae.

Persistent edit warring and refusal to provide sources, this user refuses to acknowledge that we require sources, not just an assessment by a self proclaimed SME. Discussions across multiple pages with said user have failed, including here where there has been a slow burning edit war, as well as personal attacks against other editors (which you can see in the discussions and his own talk page.) Instead of providing sources, he is just removing them because they are "outdated", though TNT has provided more up to date sources, which they've now removed as well. They've also expressed a desire to get other editors including myself to purposely engage in edit warring to get other editors blocked.

The complaint was posted this morning. Bbb23 apparently believed every word of this complaint and blocked me indefinitely 38 minutes later because ...

Disruptive editing, including edit-warring, refusal to collaborate with other editors, claiming that scientific articles can only be edited by experts, e.g., the user

The immediate cause of being blocked was my attempt to re-edit the Intergenic region article after an extensive discussion that you can see on Intergenic region: Talk. If you want to see a good example of the irresponsible behavior of Wikipedia editors, that's a good place to look. There's an even better example on Non-coding DNA: Talk where some other scientists have also attempted, unsuccessfully, to convince Praxidicae.

I'm really frustrated by this behavior and I don't know what to do. I could fight the block but I think the cult of Wikipedia editors is pretty tight and my chances are slim. What's really interesting is that I can't even comment on my own 'trial' at (User:Genome42 reported by User:Praxidicae) because I've been blocked!

UPDATE: I appealed the block by saying ...

I have been unfairly accused. I have attempted to debate and discuss the reasons for my edits but the Wikipedia editors refuse to discuss the scientific issues and, instead, make false accusations about a lack of sources and unjustified reasons for removing false and misleading statements from the Wikipedia articles. Check out the Talk section on Non-coding DNA for a good example of other scientists trying to convince Praxidicae to back off.

Another Wikipedia administrator reviewed my appeal and declined it saying, "As you see nothing wrong with your edits, there are no grounds to consider lifting the block."

Any advice? Is Wikipedia worth fighting for?

FURTHER UPDATE: I appealed again ...

I'm confused about the process. Is there no way to have a reasonable discussion about this? It seems like the only way to get unblocked is to admit guilt and apologize. Is that correct?

A new editor named Daniel Case responded ...

Declining since this isn't making an argument for being unblocked.

I think the best thing you could do for yourself right is back off and cool down. I do see where you might have had a point, but you insisted on edit warring when you should have been discussing, and your blog isn't a reliable source unless, say, enough other scientists accept it as one. I admit that it seems Praxdicidae was getting a little too dogmatic, but I haven't had the time to look at the whole argument.

This is still frustrating. It was clearly the editor PRAXIDICAE who started and continued the edit war and who refused to engage in a discussion about the scientific merits of my edits. I discussed, she warred. The only acceptable resolution to this war appears to be that I admit to being wrong and PRAXIDICAE is assumed to be correct. That's what cooperation and consensus means to this group of editors/administrators.

Also, I never suggested using my blog as a reliable source in a Wikipedia article although I did mention a blog post in the discussion (Talk) as a more detailed explanation of my scientific reasons for making an edit.

And isn't it strange that the judge in a "trial" admits to not having the time to look at all the evidence before rendering a verdict? I think that what's going on here is that these Wikipedia administrators tend to stick together and defend each other's actions, but that's really not in line with what Wikipedia is supposed to be about.

Thursday, August 18, 2022

The trouble with Wikipedia

I used to think that Wikipedia was a pretty good source of information even for scientific subjects. It wasn't perfect, but most of the articles could be fixed.

I was wrong. It took me more than two months to make the article on Non-coding DNA acceptable and my changes met with considerable resistance. Along the way, I learned that the old article on Junk DNA had been deleted ten years ago because the general scientific consensus was that junk DNA doesn't exist. So I started to work on a new "Junk DNA" article only to discover that it was going to be very difficult to get it approved. The powerful cult of experienced Wikipedia editors was clearly going to withhold approval of a new article on that subject.

I tried editing some other articles in order to correct misinformation but I ran into the same kind of resistance [see Allele, Gene, Human genome, Evolution, Alternative splicing, Intron]. Frequently, strange editors pop out of the woodwork to revert my edits on the grounds that I was refuting well-sourced information. I even had one editor named tgeorgescu tell me that, "Friend, Wikipedians aren’t interested in what you know. They are interested in what you can cite, i.e. reliable sources."

How can you tell which sources are reliable unless you know something about the subject?

Much of this bad behavior is covered in a Wikipedia article on Why Wikipedia is not so great. Here's the part that concerns me the most.

People revert edits without explaining themselves (Example: an edit on Economics) (a proper explanation usually works better on the talk page than in an edit summary). Then, when somebody reverts, also without an explanation, an edit war often results. There's not enough grounding in Wikiquette to explain that reverts without comments are inconsiderate and almost never justified except for spam and simple vandalism, and even in those cases comments need to be made for tracking purposes.

There's a culture of hostility and conflict rather than of good will and cooperation. Even experienced Wikipedians fail to assume good faith in their collaborators. It seems fighting off perceived intruders and making egotistical reversions are a higher priority than incorporating helpful collaborators into Wikipedia's community. Glaring errors and omissions are completely ignored by veteran Wikiholics (many of whom pose as scientists, for example, but have no verifiable credentials) who have nothing to contribute but egotistical reverts.

In another article on Criticism of Wikipedia the contributors raise a number of issues, including the bad behavior of the cult of long-time Wikipedia editors. It also points out that anonymous editors who refuse to reveal their identity and areas of expertise create a lack of accountability.

This sort of behavior is frustrating and it has an effect. Well-meaning scientists are quickly discouraged from fixing articles because of all the hassle they have to go through.

I now see that the problem can't be easily fixed and Wikipedia science articles are not reliable.

Friday, August 12, 2022

The surprising (?) conservation of noncoding DNA

We've known for more than half-a-century that a lot of noncoding DNA is functional. Why are some people still surprised? It's a puzzlement.

A paper in Trends in Genetics caught my eye as I was looking for something else. The authors review the various functions of noncoding DNA such as regulatory sequences and noncoding genes. There's nothing wrong with that but the context is a bit shocking for a paper that was published in 2021 in a highly respected journal.

Leypold, N.A. and Speicher, M.R. (2021) Evolutionary conservation in noncoding genomic regions. TRENDS in Genetics 37:903-918. [doi: 10.1016/j.tig.2021.06.007]

Humans may share more genomic commonalities with other species than previously thought. According to current estimates, ~5% of the human genome is functionally constrained, which is a much larger fraction than the ~1.5% occupied by annotated protein-coding genes. Hence, ~3.5% of the human genome comprises likely functional conserved noncoding elements (CNEs) preserved among organisms, whose common ancestors existed throughout hundreds of millions of years of evolution. As whole-genome sequencing emerges as a standard procedure in genetic analyses, interpretation of variations in CNEs, including the elucidation of mechanistic and functional roles, becomes a necessity. Here, we discuss the phenomenon of noncoding conservation via four dimensions (sequence, regulatory conservation, spatiotemporal expression, and structure) and the potential significance of CNEs in phenotype variation and disease.

Thursday, August 04, 2022

Identifying functional DNA (and junk) by purifying selection

Functional DNA is best defined as DNA that is currently under purifying selection. In other words, it can't be deleted without affecting the fitness of the individual. This is the "maintenance function" definition and it differs from the "causal role" and "selected effect" definitions [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect].

It has always been difficult to determine whether a given sequence is under purifying selection so sequence conservation is often used as a proxy. This is perfectly justifiable since the two criteria are strongly correlated. As a general rule, sequences that are currently being maintained by selection are ancient enough to show evidence of conservation. The only exceptions are de novo sequences and sequences that have recently become expendable and these are rare.
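The crudest version of that conservation proxy is percent identity between two aligned orthologous sequences. Here is a minimal Python sketch, assuming the sequences are already aligned and gaps are written as '-'. Real studies use phylogenetic methods (for example, phyloP or GERP scores) that compare observed substitutions against a neutral rate, so this only illustrates the idea. The function name and the two sequences are hypothetical, not real data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Fraction of aligned, ungapped columns where two sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip any column where either sequence has a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return matches / len(columns)


# Hypothetical aligned fragments from two species.
seq1 = "ACGTTGCA-TGCA"
seq2 = "ACGTAGCATTGCA"
print(f"identity: {percent_identity(seq1, seq2):.1%}")
```

Skipping gapped columns is a design choice; it keeps insertions and deletions from being counted as mismatches, but it also means identity alone says nothing about how much sequence was gained or lost.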

Sunday, July 31, 2022

Junk DNA causes cancer

This is a story about misleading press releases. The spread of misinformation by press offices is a serious issue that needs to be addressed.

The Institute of Cancer Research in London (UK) published a press release on July 19, 2022 with the provocative title: ‘Junk’ DNA could lead to cancer by stopping copying of DNA. The first three sentences tell most of the story.

Scientists have found that non-coding ‘junk’ DNA, far from being harmless and inert, could potentially contribute to the development of cancer.

Their study has shown how non-coding DNA can get in the way of the replication and repair of our genome, potentially allowing mutations to accumulate.

It has been previously found that non-coding or repetitive patterns of DNA – which make up around half of our genome – could disrupt the replication of the genome.

Nobody ever said that junk DNA was "inert and harmless"; in fact, it is assumed to be slightly deleterious and only gets fixed because it is invisible to natural selection in small populations (Nearly Neutral Theory). And no intelligent scientist equates noncoding DNA and junk DNA, even by implication. But in any case, this article isn't about all junk DNA; it's about certain short stretches of repetitive DNA that interfere with replication so that the resulting errors have to be fixed by repair mechanisms. The sequences most likely to interfere with replication are (CG)n repeats. As the authors point out in the discussion, these repeats are "extremely rare" in all genomes, including the human genome, suggesting that they are under negative selection.

Other, more common, repeats also show detectable in vitro interference with replisomes at replication forks. The errors introduced by replication stalling can be repaired but some of them will escape repair causing mutations. It's not clear to me why mutations in junk DNA are a problem. That's not explained in the paper.
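The Nearly Neutral Theory argument mentioned above (slightly deleterious DNA can be fixed because selection is ineffective in small populations) can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 - e^(-2s)) / (1 - e^(-4Ns)). Here is a minimal Python sketch, assuming a diploid population whose effective size equals N and additive (genic) selection. The function name and the selection coefficient in the usage loop are illustrative assumptions; nothing here comes from the Casas-Delucchi et al. paper.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's fixation probability for a new mutation (initial frequency 1/(2N)).

    u = (1 - exp(-2s)) / (1 - exp(-4Ns)), assuming Ne = N and additive selection.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    try:
        # expm1 keeps precision when the exponents are close to zero.
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # Strongly deleterious: the denominator is astronomically large, so u is ~0.
        return 0.0


s = -1e-4  # slightly deleterious (illustrative value)
for n in (100, 10_000, 1_000_000):
    u = fixation_probability(s, n)
    print(f"N={n:>9,}: P(fix)={u:.3g}, relative to neutral={u * 2 * n:.3g}")
```

In the smallest population the slightly deleterious mutation fixes at nearly the neutral rate, 1/(2N), while in the largest it is effectively never fixed. That contrast is the sense in which such DNA is "invisible" to selection when populations are small.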

Here's the paper.

Casas-Delucchi, C.S., Daza-Martin, M., Williams, S.L. et al. (2022) Mechanism of replication stalling and recovery within repetitive DNA. Nat Commun 13:3953 [doi: 10.1038/s41467-022-31657-x]

Accurate chromosomal DNA replication is essential to maintain genomic stability. Genetic evidence suggests that certain repetitive sequences impair replication, yet the underlying mechanism is poorly defined. Replication could be directly inhibited by the DNA template or indirectly, for example by DNA-bound proteins. Here, we reconstitute replication of mono-, di- and trinucleotide repeats in vitro using eukaryotic replisomes assembled from purified proteins. We find that structure-prone repeats are sufficient to impair replication. Whilst template unwinding is unaffected, leading strand synthesis is inhibited, leading to fork uncoupling. Synthesis through hairpin-forming repeats is rescued by replisome-intrinsic mechanisms, whereas synthesis of quadruplex-forming repeats requires an extrinsic accessory helicase. DNA-induced fork stalling is mechanistically similar to that induced by leading strand DNA lesions, highlighting structure-prone repeats as an important potential source of replication stress. Thus, we propose that our understanding of the cellular response to replication stress may also be applied to DNA-induced replication stalling.

The word "junk" does not appear anywhere in the paper and the word "cancer" appears only once in the text where it refers to a "cancer-associated" mutation in yeast. This makes me wonder why the press release uses both of these words so prominently. Does anybody have any ideas?

Perhaps it has something to do with a quotation from Gideon Coster, who is described as the study leader. He says,

We wanted to understand why it seems more difficult for cells to copy repetitive DNA sequences than other parts of the genome. Our study suggests that so-called junk DNA is actually playing an important and potentially damaging role in cells, by blocking DNA replication and potentially opening the door to cancerous mutations.

I find it strange that he refers to "so-called junk DNA" in the press release but didn't mention it in the peer-reviewed paper. He also didn't emphasize cancerous mutations in the paper.

The press release contains another quotation, this time from Kristian Helin, who is the Chief Executive of The Institute of Cancer Research. He says,

This study helps to unravel the puzzle of junk DNA – showing how these repetitive sequences can block DNA replication and repair. It’s possible that this mechanism could play a role in the development of cancer as a cause of genetic instability – especially as cancer cells start dividing more quickly and so place the process of DNA replication under more stress.

It's unclear to me how studying these mutation-inducing repeats could help "unravel the puzzle of junk DNA" but that's probably why I'm not the chief executive of a cancer research institute. I'm so stupid that I didn't even know there WAS a "puzzle" of junk DNA to be unravelled!

It's time for scientists to speak out against press releases like this one. It misrepresents the results and their interpretation as published after undergoing peer review. Instead, the press release is used as a propaganda exercise to promote the personal views of the scientists, views that they couldn't publish. This is what happened with ENCODE and it's becoming more and more common. The fact that, in this case, the personal views of these scientists are flawed only makes the situation worse.

Saturday, July 30, 2022

Wikipedia blocks any mention of junk DNA in the "Human genome" article

Wikipedia has an article on the Human genome. The introduction includes the following statement,

Human genomes include both protein-coding DNA genes and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly-repetitive sequences.

This is a recent improvement (July 22, 2022) over the original statement that simply said, "Human genomes include both protein-coding DNA genes and noncoding DNA." I noted in the "talk" section that there was no mention of junk DNA in the entire article on the human genome so I added a sentence to the end of the section quoted above. I said,

Some non-coding DNA is junk, such as pseudogenes, but there is no firm consensus over the total amount of junk DNA.1

Thursday, July 28, 2022

Kat Arney defends junk DNA

I'm a big fan of Kat Arney and I loved her 2016 book Herding Hemingway's Cats where she interviews a number of prominent scientists. If you haven't read it you should try and get a copy even if it's just to read the chapters on Mark Ptashne, Dan Graur, and Adrian Bird. The last chapter begins with an attempt to interview Evelyn Fox Keller but don't be put off by that because the rest of the chapter is very scientific.

Kat Arney gets mentioned a couple of times in my book and I quote her opinion of epigenetics from the chapter on Adrian Bird. She has a much better understanding of genes, genomes, and junk DNA than every other person who's ever written a book on those subjects. I especially like what she has to say about her journey of discovery on page 259 near the end of the book.

Things that I thought were solid fact have been exposed as dogma and scientific hearsay, based on little evidence but repeated enough times by researchers, textbooks, and journalists until they feel real.
                                                                                Kat Arney (2016)

Kat Arney has just (July 28, 2022) posted a Genetics Society podcast on Genetics Unzipped. The main title is Does size matter when it comes to your genes and the subsections are "Where have all the genes gone?" "Genes or junk?" and "Are you more special than an onion?" You can listen to the podcast on that site (24 minutes) or read the entire transcript.

I don't entirely agree with everything she says in the podcast but she should be applauded for defending junk DNA in the face of all the scientific hearsay that's out there. Good for her.

Here are three things that I might have said differently.

  • I don't agree with her historical account of estimates of the number of genes in the human genome [False History and the Number of Genes 2010]. The knowledgeable experts in the field were predicting about 30,000 genes and their estimates weren't far off. The figure below is from Hatje et al. (2019). Note the anomalous estimates from the GeneSweep lottery and the EST data. The EST data were known to be highly suspect. This is important because the false narrative promotes the idea that scientists knew very little about the human genome before the sequence was published and it promotes the idea that there's some great mystery (too few genes) that needs to be solved.
  • I disagree with her statement that "actual genes makes up less than 2% of all the DNA in the whole human genome." My disagreement depends somewhat on the definition of a gene but that's not really controversial. We're talking about the molecular gene and that's defined as "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]. There are exceptions but this is the best definition we have. The fact that a great many scientists are confused about this is no excuse. Genes include introns so the typical human gene is quite large. In fact, about 45% of the human genome is devoted to genes. This is a far cry from the small percentage (<2%) that consists only of coding regions.
  • Kat Arney says, "So, given that most of our genome isn’t actually genes, what does the rest of it do? Well, it’s complicated, and there’s still a lot we don’t know." My quibble here is subtle but I think it's important. I think we have a pretty good handle on the functional parts of our genome and I don't expect any surprises. We know that about 10% of our genome is conserved and we can account for most of that functional DNA. The rest is not a mystery. We know that most of it consists of various flotsam and jetsam related to transposons and things like pseudogenes and dead viruses. This is junk DNA by any definition and we should stop pretending that it's a big mystery. When we say that 90% of our genome is junk that's not a reflection of ignorance; it's an evidence-based conclusion.

Hatje, K., Mühlhausen, S., Simm, D., and Kollmar, M. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays, 0(0), 1900066. [doi: 10.1002/bies.201900066]