Sandwalk: Wikipedia


Wednesday, February 14, 2024

Copilot answers the question, "What is junk DNA?"

The Microsoft browser (Edge) has a built-in function called Copilot. It's an AI assistant based on ChatGPT-4.

I decided to test it by asking "What is junk DNA?" and here's the answer it gave me.

Tuesday, August 01, 2023

Help fix the Wikipedia article on evolution

The Wikipedia article on evolution [Evolution] is a "Featured article," which means two things: (1) it is one of the best articles Wikipedia has to offer, and (2) it was voted a featured article by Wikipedia editors and that means they will resist any changes.

You will be shocked to learn that the article isn't perfect. It could use some serious updating and revision, but my first attempt was reverted by an editor named Efbrazil, who has vowed to revert any edits I make unless I can get his approval. So I thought I'd try to get his approval, and you can see the result on the Talk:Evolution pages. My initial objective is to edit the introductory paragraphs in the lead to eliminate the reference to expression of genes and to introduce the term "allele," which is covered in the main part of the article. Here are the current opening paragraphs of the lead:

In biology, evolution is the change in heritable characteristics of biological populations over successive generations.[1][2] These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Genetic variation tends to exist within any given population as a result of genetic mutation and recombination.[3] Evolution occurs when evolutionary processes such as natural selection (including sexual selection) and genetic drift act on this variation, resulting in certain characteristics becoming more or less common within a population over successive generations.[4] It is this process of evolution that has given rise to biodiversity at every level of biological organisation.[5][6]

The theory of evolution by natural selection was conceived independently by Charles Darwin and Alfred Russel Wallace in the mid-19th century and was set out in detail in Darwin's book On the Origin of Species.[7] Evolution by natural selection is established by observable facts about living organisms: (1) more offspring are often produced than can possibly survive; (2) traits vary among individuals with respect to their morphology, physiology, and behaviour; (3) different traits confer different rates of survival and reproduction (differential fitness); and (4) traits can be passed from generation to generation (heritability of fitness).[8] In successive generations, members of a population are therefore more likely to be replaced by the offspring of parents with favourable characteristics for that environment. In the early 20th century, other competing ideas of evolution were refuted as the modern synthesis concluded Darwinian evolution acts on Mendelian genetic variation.[9]

I'm also thinking that we should modify the following sentences, which don't seem to be appropriate in a "Featured article":

According to the now largely abandoned neutral theory of molecular evolution most evolutionary changes are the result of the fixation of neutral mutations by genetic drift.[101] In this model, most genetic changes in a population are thus the result of constant mutation pressure and genetic drift.[102] This form of the neutral theory is now largely abandoned since it does not seem to fit the genetic variation seen in nature.[103][104]

Editor Efbrazil seems to be the only editor willing to discuss these problems and he is hard to convince. If anyone else is interested in improving this Wikipedia article, I invite you to participate in the discussion on the Talk pages.

Thursday, December 01, 2022

University of Michigan biochemistry students edit Wikipedia

Students in a special topics course at the University of Michigan were taught how to edit a Wikipedia article in order to promote function in repetitive DNA and downplay junk.

The Wikipedia article on Repeated sequence (DNA) was heavily edited today by students who were taking an undergraduate course at the University of Michigan. One of the student leaders, Rasberry Neuron, left the following message on the "Talk" page.

This page was edited for a course assignment at the University of Michigan. The editing process included peer review by four students, the Chemistry librarian at the University of Michigan, and course instructors. The edits published on 12/01/2022 reflect improvements guided by the original editing team and the peer review feedback. See the article's History page for information about what changes were made from the previous version.

References to junk DNA were removed by the students but quickly added back by Paul Gardner, who is currently fixing other errors that the students have made.

I checked out the webpage for the course at CHEM 455_505 Special Topics in Biochemistry - Nucleic Acids Biochemistry. The course description is quite revealing.

We now realize that the human genome contains at least 80,000 non-redundant non-coding RNA genes, outnumbering protein-coding genes by at least 4-fold, a revolutionary insight that has led some researchers to dub the eukaryotic cell an “RNA machine”. How exactly these ncRNAs guide every cellular function – from the maintenance and processing to the regulated expression of all genetic information – lies at the leading edge of the modern biosciences, from stem cell to cancer research. This course will provide an equally broad as deep overview of the structure, function and biology of DNA and particularly RNA. We will explore important examples from the current literature and the course content will evolve accordingly.

The class will be taught from a chemical/molecular perspective and will bring modern interdisciplinary concepts from biochemistry, biophysics and molecular biology to the fore.

Most of you will recognize right away that there are factually incorrect statements (i.e. misinformation) in that description. It is not true that there are at least 80,000 noncoding genes in the human genome. At some point in the future that may turn out to be true but it's highly unlikely. Right now, there are at most 5,000 proven noncoding genes. There are many scientists who claim that the mere existence of a noncoding transcript is proof that a corresponding gene must exist but that's not how science works. Before declaring that a gene exists you must present solid evidence that it produces a biologically relevant product [Most lncRNAs are junk] [Wikipedia blocks any mention of junk DNA in the "Human genome" article] [Editing the Wikipedia article on non-coding DNA] [On the misrepresentation of facts about lncRNAs] [The "standard" view of junk DNA is completely wrong] [What's In Your Genome? - The Pie Chart] [How many lncRNAs are functional?].

I'm going to email a link to this post to the course instructors and some of the students. Let's see if we can get them to discuss junk DNA.

Saturday, September 10, 2022

Wikipedia articles: Quality and importance rankings

Wikipedia has a way of assessing the quality of articles that have been posted and edited. The rankings are somewhat confusing and it’s hard to find the complete list of quality categories so I’m putting a link to Wikipedia: Content assessment here.

There are six categories ranging from FA (featured article) to C.

Sunday, September 04, 2022

Wikipedia: the ENCODE article

The ENCODE article on Wikipedia is a pretty good example of how to write a science article. Unfortunately, there are a few issues that will be very difficult to fix.

When Wikipedia was formed twenty years ago, there were many people who were skeptical about the concept of a free crowdsourced encyclopedia. Most people understood that a reliable source of information was needed for the internet because the traditional encyclopedias were too expensive, but could it be done by relying on volunteers to write articles that could be trusted?

The answer is mostly “yes,” although that comes with some qualifications. Many science articles are not good; they contain inaccurate and misleading information and often don’t represent the scientific consensus. They also tend to be disjointed and unreadable. On the other hand, many non-science articles are at least as good as, and often better than, anything in the traditional encyclopedias (e.g. Battle of Waterloo; Toronto, Ontario; The Beach Boys).

By 2008, Wikipedia had expanded enormously and the quality of articles was being compared favorably to those of Encyclopedia Britannica, which had been forced to go online to compete. However, this comparison is a bit unfair since it downplays science articles.

Saturday, August 20, 2022

Editing the 'Intergenic region' article on Wikipedia

Just before getting banned from Wikipedia, I was about to deal with a claim on the Intergenic region article. I had already fixed most of the other problems but there is still this statement in the subsection labeled "Properties."

According to the ENCODE project's study of the human genome, due to "both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length)"[2]

The source is one of the ENCODE papers published in the September 6 edition of Nature (Djebali et al., 2012). The quotation is accurate. Here's the full quotation.

As a consequence of both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length).

What's interesting about that data is what it reveals about the percentage of the genome devoted to intergenic DNA and the percentage devoted to genes. The authors claim that there are 60,250 intergenic regions, which means that there must be more than 60,000 genes.¹ The median length of these intergenic regions is 3,949 bp, and that means that roughly 238 × 10⁶ bp (60,250 × 3,949 bp, using the median as a rough stand-in for the average) are found in intergenic DNA. That's roughly 7–8% of the genome depending on which genome size you use. It doesn't mean that all the rest is genes, but it sounds like they're saying that about 90% of the genome is occupied by genes.
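The arithmetic in that paragraph can be checked with a few lines of Python. This is only a rough sketch: it multiplies the number of intergenic regions by the median length reported by Djebali et al., which treats the median as if it were the mean. The two genome sizes (3.1 and 3.2 Gb) are assumed round values, not figures from the paper.

```python
# Rough check of the intergenic-DNA arithmetic using the Djebali et al. (2012) figures.
# Assumption: the median intergenic length stands in for the mean, so the
# result is an order-of-magnitude estimate, not an exact total.

N_INTERGENIC_REGIONS = 60_250   # number of intergenic regions reported by ENCODE
MEDIAN_LENGTH_BP = 3_949        # median intergenic length reported by ENCODE


def intergenic_estimate(n_regions: int, typical_length_bp: int) -> int:
    """Approximate total intergenic DNA as (number of regions) x (typical length)."""
    return n_regions * typical_length_bp


def percent_of_genome(bp: float, genome_size_bp: float) -> float:
    """Express a length in base pairs as a percentage of the genome."""
    return 100.0 * bp / genome_size_bp


total_bp = intergenic_estimate(N_INTERGENIC_REGIONS, MEDIAN_LENGTH_BP)
print(f"Estimated intergenic DNA: {total_bp:,} bp")  # prints 237,927,250 bp

# Assumed haploid human genome sizes; the percentage shifts a little with the choice.
for genome_size_bp in (3.1e9, 3.2e9):
    pct = percent_of_genome(total_bp, genome_size_bp)
    print(f"{genome_size_bp / 1e9:.1f} Gb genome: {pct:.1f}% intergenic")
```

With either assumed genome size the intergenic share lands between 7 and 8 percent, which leaves more than 90% of the genome for everything else.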

In case you doubt that's what they're saying, read the rest of the paragraph in the paper.

Concordantly, we observed an increased overlap of genic regions. As the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene.

It sounds like they are anticipating a time when the discovery of more noncoding genes will eventually lead to a situation where the intergenic regions disappear and all genes will overlap.

Now, as most of you know, the ENCODE papers have been discredited and hardly any knowledgeable scientist thinks there are 60,000 genes that occupy 90% of the genome. But here's the problem. I probably couldn't delete that sentence from Wikipedia because it meets all the criteria of a reliable source (published in Nature by scientists from reputable universities). Recent experience tells me that the ~~Wikipolice~~ Wikipedia editors would have blocked me from deleting it.

The best I could do would be to balance the claim with one from another "reliable source" such as Piovesan et al. (2019), who list the total number of exons and introns and their average sizes, allowing you to calculate that protein-coding genes occupy about 35% of the genome. Other papers give slightly higher values for protein-coding genes.
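The kind of calculation that a table of exon and intron counts makes possible can be sketched in Python. This is a minimal illustration, assuming that exons plus introns of protein-coding genes account for the genic span and that no features overlap. The function names are hypothetical, and the numbers in the example are toy placeholders, not the values published by Piovesan et al.

```python
def feature_footprint_bp(count: int, mean_length_bp: float) -> float:
    """Total length covered by one class of feature: count x average length."""
    return count * mean_length_bp


def protein_coding_fraction(n_exons: int, mean_exon_bp: float,
                            n_introns: int, mean_intron_bp: float,
                            genome_size_bp: float) -> float:
    """Fraction of the genome spanned by exons plus introns of protein-coding genes.

    Assumes no overlap between features; overlapping genes or exons shared
    between transcripts would be double-counted and inflate the result.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    genic_bp = (feature_footprint_bp(n_exons, mean_exon_bp)
                + feature_footprint_bp(n_introns, mean_intron_bp))
    return genic_bp / genome_size_bp


# Toy placeholder values only, NOT the counts or averages reported by Piovesan et al. (2019).
print(protein_coding_fraction(n_exons=10, mean_exon_bp=100,
                              n_introns=8, mean_intron_bp=500,
                              genome_size_bp=20_000))  # prints 0.25
```

Substituting the published counts, average lengths, and a genome size would give the kind of percentage estimate described in the paragraph above.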

It's hard to get a reliable source on the real number of noncoding genes and their average size, but I estimate that there are about 5,000 noncoding genes and, as a generous estimate, that they could take up a few percent of the genome. In my upcoming book I assume that genes probably occupy about 45% of the genome because I'm trying to err on the side of function.

An article on Intergenic regions is not really the place to get into a discussion about the number of noncoding genes but in the absence of such a well-sourced explanation the audience will be left with the statement from Djebali et al. and that's extremely misleading. Thus, my preference would be to replace it with a link to some other article where the controversy can be explained, preferably a new article on junk DNA.²

I was going to say,

The total amount of intergenic DNA depends on the size of the genome, the number of genes, and the length of each gene. That can vary widely from species to species. The value for the human genome is controversial because there is no widespread agreement on the number of genes but it's almost certain that intergenic DNA takes up at least 40% of the genome.

I can't supply a specific reference for this statement, so it would never have gotten past the ~~Wikipolice~~ Wikipedia editors. This is a problem that can't be solved because any serious attempt to fix it will probably lead to getting blocked on Wikipedia.
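The dependency described in the proposed sentence (genome size, number of genes, and length of each gene) amounts to a simple subtraction. This is a sketch under the simplifying assumption that genes do not overlap. The function names and the toy genome are hypothetical; the toy genes were chosen to cover 45% of it, the share of the human genome that the post assumes is occupied by genes.

```python
from typing import Iterable


def intergenic_bp(genome_size_bp: int, gene_lengths_bp: Iterable[int]) -> int:
    """Genome size minus the summed length of every gene.

    The number of genes is implicit in how many lengths are supplied.
    Assumes genes don't overlap: overlapping genes are double-counted,
    which would make this an underestimate of intergenic DNA.
    """
    genic = sum(gene_lengths_bp)
    if genic > genome_size_bp:
        raise ValueError("summed gene length exceeds genome size; genes probably overlap")
    return genome_size_bp - genic


def intergenic_fraction(genome_size_bp: int, gene_lengths_bp: Iterable[int]) -> float:
    """Intergenic DNA as a fraction of the genome."""
    return intergenic_bp(genome_size_bp, gene_lengths_bp) / genome_size_bp


# Hypothetical toy genome: two genes covering 45% of 1 Mb.
toy_genes = [250_000, 200_000]
print(intergenic_bp(1_000_000, toy_genes))        # prints 550000
print(intergenic_fraction(1_000_000, toy_genes))  # prints 0.55
```

Under these assumptions, any genic share at or below 60% leaves at least 40% of the genome as intergenic DNA.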

There is one other statement in that section in the article on Intergenic region.

Scientists have now artificially synthesized proteins from intergenic regions.[3]

I would have removed that statement because it's irrelevant. It does not contribute to understanding intergenic regions. It's undoubtedly one of those little factoids that someone has stumbled across and thinks it needs to be on Wikipedia.

Deletion of a statement like that would have met with fierce resistance from the Wikipedia editors because it is properly sourced. The reference is to a 2009 paper in the Journal of Biological Engineering: "Synthesizing non-natural parts from natural genomic template."

1. There are no intergenic regions between the last genes on the end of a chromosome and the telomeres.

2. The Wikipedia editors deleted the Junk DNA article about ten years ago on the grounds that junk DNA had been disproven.

Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A. et al. (2012) Landscape of transcription in human cells. Nature 489:101-108. [doi: 10.1038/nature11233]

Piovesan, A., Antonaros, F., Vitale, L., Strippoli, P., Pelleri, M. C., and Caracausi, M. (2019) Human protein-coding genes and gene feature statistics in 2019. BMC Research Notes 12:315. [doi: 10.1186/s13104-019-4343-8]


Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.

Charles Darwin (c1880)

Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)

Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...

Peter Atkins


The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake. Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory. Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change. Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance. Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change. Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat. Stephen Jay Gould (1999) p.84


My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True

I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000

It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations.

Douglas Futuyma

One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick

There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner

An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist.

Richard Dawkins

Another curious aspect of the theory of evolution is that everybody thinks he understands it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod

The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.

Richard Lewontin


More Recent Comments

Wednesday, February 14, 2024

Copilot answers the question, "What is junk DNA?"

Tuesday, August 01, 2023

Help fix the Wikipedia article on evolution

Thursday, December 01, 2022

University of Michigan biochemistry students edit Wikipedia

Saturday, September 10, 2022

Wikipedia articles: Quality and importance rankings

Sunday, September 04, 2022

Wikipedia: the ENCODE article

Saturday, August 20, 2022

Editing the 'Intergenic region' article on Wikipedia