
Thursday, December 01, 2022

University of Michigan biochemistry students edit Wikipedia

Students in a special topics course at the University of Michigan were taught how to edit a Wikipedia article in order to promote function in repetitive DNA and downplay junk.

The Wikipedia article on Repeated sequence (DNA) was heavily edited today by students who were taking an undergraduate course at the University of Michigan. One of the student leaders, Rasberry Neuron, left the following message on the "Talk" page.

This page was edited for a course assignment at the University of Michigan. The editing process included peer review by four students, the Chemistry librarian at the University of Michigan, and course instructors. The edits published on 12/01/2022 reflect improvements guided by the original editing team and the peer review feedback. See the article's History page for information about what changes were made from the previous version.

References to junk DNA were removed by the students but were quickly added back by Paul Gardner, who is currently fixing other errors that the students have made.

I checked out the webpage for the course, CHEM 455_505 Special Topics in Biochemistry - Nucleic Acids Biochemistry. The course description is quite revealing.

We now realize that the human genome contains at least 80,000 non-redundant non-coding RNA genes, outnumbering protein-coding genes by at least 4-fold, a revolutionary insight that has led some researchers to dub the eukaryotic cell an “RNA machine”. How exactly these ncRNAs guide every cellular function – from the maintenance and processing to the regulated expression of all genetic information – lies at the leading edge of the modern biosciences, from stem cell to cancer research. This course will provide an equally broad as deep overview of the structure, function and biology of DNA and particularly RNA. We will explore important examples from the current literature and the course content will evolve accordingly.

The class will be taught from a chemical/molecular perspective and will bring modern interdisciplinary concepts from biochemistry, biophysics and molecular biology to the fore.

Most of you will recognize right away that there are factually incorrect statements (i.e. misinformation) in that description. It is not true that there are at least 80,000 noncoding genes in the human genome. At some point in the future that may turn out to be true but it's highly unlikely. Right now, there are at most 5,000 proven noncoding genes. There are many scientists who claim that the mere existence of a noncoding transcript is proof that a corresponding gene must exist but that's not how science works. Before declaring that a gene exists you must present solid evidence that it produces a biologically relevant product [Most lncRNAs are junk] [Wikipedia blocks any mention of junk DNA in the "Human genome" article] [Editing the Wikipedia article on non-coding DNA] [On the misrepresentation of facts about lncRNAs] [The "standard" view of junk DNA is completely wrong] [What's In Your Genome? - The Pie Chart] [How many lncRNAs are functional?].

I'm going to email a link to this post to the course instructors and some of the students. Let's see if we can get them to discuss junk DNA.


SPARC said...

This no-junk-in-your-genome movement is more and more reminiscent of a religious endeavor based on erroneous assumptions and misinterpreted results. It is kept alive by the uncritical adoption of these ideas and their self-referential perpetuation by some cult leaders who have acquired positions that they could not give up without losing face. Presumably, however, they are so caught up in their own thinking that they are incapable of doing so. They have built up power structures and hierarchies that stifle critical voices, and not only at Wikipedia. This is done not by the leaders themselves, but by willing followers who parrot the mantra of the movement and are incapable of thinking critically for themselves.

Mark Sturtevant said...

It is also possible that they were not taught that there are differing views on this subject, with strong evidence that ncRNA includes a lot of junk transcription. Very bad form. One can compare their view to a kind of religion, but maybe also to another kind of phlogiston, something that was purported to exist but actually did not.
I've dug out some links from your website, Larry, but they could use references to primary literature explaining the still-existing paradigm that has been around for decades.

doi: 10.1146/annurev-genom-112921-123710

Larry Moran said...

I've been discussing the subject with Nils Walter, one of the course instructors, but he hasn't given me permission to post his comments. His basic argument seems to be that the scientific consensus is against junk DNA and it's time to move on.

He thinks it would be a waste of time to debate the subject because it's no longer controversial. He teaches his students that the idea of junk DNA started with the publication of the human genome but it has now been dismissed by the scientific consensus.

Here's a statement from his website.

"RNA is a magical molecule that both started life and sustains it today. Over 75% of our genome encodes highly conserved non-coding RNA molecules, compared with only <2% that encodes proteins."

Joe Felsenstein said...

I can only think of one person working in the field of molecular evolution who thinks that most of the genome is functional. Or conserved. All the rest firmly disagree with this guy. It is deeply sad that the editors at Wikipedia will not allow at least one sentence into the relevant articles, saying that there is a controversy here.

Argon said...

"Over 75% of our genome encodes highly conserved non-coding RNA molecules, compared with only <2% that encodes proteins."

75% of our genome is 'highly conserved'? I thought perhaps 75% of the genome might appear in lncRNAs, but 'highly conserved'? Not sequence-wise. Perhaps tertiary structure, but that doesn't appear to be characterized in sufficient detail to claim correctness to within an order of magnitude.

DAK said...

If about 50% of the human genome is taken up by genes and 75% (according to Walter) encodes RNAs, would that mean that a substantial chunk of these ncRNAs would have to be encoded in introns? Is that a problem, or is it something that happens?

Mikkel Rumraket Rasmussen said...

It would be quite an understatement to say that calling 75% of the human genome "highly conserved" is to employ an unusually relaxed sense of the word "conserved."

Many parts are quite literally evolving (and have been for tens of millions of years) at the neutral rate. That means it's NOT being conserved.

Michael Tress said...

"Over 75% of our genome encodes highly conserved non-coding RNA molecules, compared with only <2% that encodes proteins."

That's not even remotely true. There are currently 19,379 coding genes annotated in GENCODE, of 62,696 total genes. But there aren't twice as many bases encoding RNA molecules as code for proteins because coding transcripts are generally much longer than non-coding transcripts.

Only a small fraction of these non-coding transcripts could reasonably be considered to be "highly conserved".

Larry Moran said...

@Michael Tress

Protein-coding genes occupy about 40% of the genome because introns take up most of the gene.

Perhaps he was counting highly conserved noncoding genes that are contained within introns?

Michael Tress said...

Yeah, the original comparison between RNA-encoding and protein-coding genes was apples and oranges. The only way to get close to 75% for RNA-encoding genes is to include the introns of RNA-encoding genes while ignoring introns when counting the less than 2% of the genome that is protein-coding.

Is it possible to get to 75% for non-coding RNA-encoding genes if introns are included? If protein-coding genes cover 40% of the genome, only 60% lies between them, so one would have to assume that (a) every region between protein-coding genes is packed full of RNA-encoding transcripts and (b) most protein-coding introns contain non-coding RNA genes.

Neither is true. Even if all genes were packed together (they aren't), you would still have to fit pseudogenes into the remaining space (pseudogenes make up 14,700 of the just over 43,000 genes in GENCODE that are not coding). And although it is possible to find non-coding RNA transcripts inside coding introns, it is relatively rare. Most of those I have seen have just one or two exons, too, so they don't take up much space.

I just had a quick look at a coding-gene-rich section of chromosome 6: 28 coding genes, easily more than 100 introns, and just 5 of these introns contained short non-coding RNA sections. Non-coding RNA transcripts inside coding introns are not common. Unless we haven't found them yet, of course...