
Saturday, May 20, 2023

Chapter 10: Turning Genes On and Off

Francis Collins, and many others, believe that the concept of junk DNA is outmoded because recent discoveries have shown that most of the human genome is devoted to regulation. This is part of a clash of worldviews where one side sees the genome as analogous to a finely tuned Swiss watch with no room for junk and the other sees the genome as a sloppy entity that's just good enough to survive.

The ENCODE researchers and their allies claim that the human genome contains more than 600,000 regulatory sites and that means an average of 24 per gene covering about 10,000 bp per gene. I explain why these numbers are unreasonable and why most of the sites they identify have nothing to do with biologically significant regulation.
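As a rough check on the figures quoted above, here is a minimal sketch of the arithmetic. The three input numbers come from the text; the gene count is not stated there, so it is derived from them, and the variable names are only illustrative.

```python
# Figures quoted in the post
regulatory_sites = 600_000   # claimed regulatory sites in the genome
sites_per_gene = 24          # claimed average number of sites per gene
regulatory_bp_per_gene = 10_000  # claimed regulatory sequence per gene (bp)

# Gene count implied by the first two figures (not stated in the post)
implied_gene_count = regulatory_sites // sites_per_gene

# Average length of one site implied by the per-gene figures
implied_bp_per_site = regulatory_bp_per_gene / sites_per_gene

# Total sequence these sites would cover across all genes
implied_total_regulatory_bp = implied_gene_count * regulatory_bp_per_gene

print(f"implied genes: {implied_gene_count:,}")
print(f"implied site length: {implied_bp_per_site:.0f} bp")
print(f"implied total: {implied_total_regulatory_bp:,} bp")
```

Taken at face value, the claim implies about 25,000 genes, sites averaging a little over 400 bp each, and roughly 250 million bp of regulatory sequence in total.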

This chapter also covers the epigenetics hype and restriction/modification.

Click on this link to see more.
Chapter 10: Turning Genes On and Off


Wednesday, May 17, 2023

Chapter 9: The ENCODE Publicity Campaign

In September 2012, the ENCODE researchers published a bunch of papers claiming to show that 80% of the human genome was functional. They helped orchestrate a massive publicity campaign with the help of Nature, a campaign that succeeded in spreading the message that junk DNA had been refuted.

That claim was challenged within 24 hours by numerous scientists on social media. They pointed out that the ENCODE researchers were using a ridiculous definition of function and that they had completely ignored all the evidence for junk DNA. Over the next two years there were numerous scientific papers criticizing the ENCODE claims and the ENCODE researchers were forced to retract the claim that they had proven that 80% of the genome is functional.

I discuss what went wrong and lay the blame mostly on the ENCODE researchers who did not behave as proper scientists when presenting a controversial hypothesis. The editors of Nature share the blame for not doing a proper job of vetting the ENCODE claims and not subjecting the papers to rigorous peer review. Science writers also failed to think critically about the results they were reporting.

Click on this link to see more.
Chapter 9: The ENCODE Publicity Campaign


Monday, May 15, 2023

Chapter 8: Noncoding Genes and Junk RNA

I think there are no more than 5,000 noncoding genes but many scientists claim that there are tens of thousands of newly discovered noncoding genes. I describe the known noncoding genes (fewer than 1,000) and explain why many of the transcripts detected are just junk RNA produced by spurious transcription. The presence of abundant noncoding genes will not solve the Deflated Ego Problem.

This chapter covers the misconceptions about the Central Dogma and how they are incorrectly used to try and discredit junk DNA. The views of John Mattick are explained and refuted. I end the chapter with a plea to adopt a worldview that can accommodate messy biochemistry and a sloppy genome that's full of junk DNA.

Click on this link to see more.

Chapter 8: Noncoding Genes and Junk RNA

Thursday, May 11, 2023

Chapter 7: Gene Families and the Birth & Death of Genes

This chapter describes gene families in the human genome. I explain how new genes are born by gene duplication and how they die by deletion or by becoming pseudogenes. Our genome is littered with pseudogenes: how do they evolve and are they all junk? What are the consequences of whole genome duplications and what does it teach us about junk DNA? How many real ORFan genes are there and why do some people think there are more? Finally, you will learn why dachshunds have short legs and what "The Bridge on the River Kwai" has to do with the accuracy of the human genome sequence.

Click on this link to see more.

Gene Families and the Birth and Death of Genes

Wednesday, May 10, 2023

Chapter 6: How Many Genes? How Many Proteins?

Here's a link to the summary of what's in Chapter 6. The important topics are the correct definition of "gene" and the number of protein-coding genes. I explain the false history concerning the number of genes that were predicted when the human genome sequence was published. This is the chapter that introduces the Deflated Ego Problem.

The last half of the chapter covers introns and why most intron sequences are junk. There's an extensive discussion of alternative splicing and why most genes are NOT alternatively spliced in spite of what you might have been taught.

Chapter 6: How Many Genes? How Many Proteins?

Sunday, May 07, 2023

Chapter 5: The Big Picture

Here's a link to a summary of what's in Chapter 5. It lists the main components of the human genome and concludes that less than 10% of the genome is functional. In other words, 90% of your genome is junk!

Chapter 5: The Big Picture

Tuesday, April 25, 2023

Happy DNA Day 2023!

It was 70 years ago today that the famous Watson and Crick paper was published in Nature along with papers by Franklin & Gosling and Wilkins, Stokes, & Wilson. There's a great deal of misinformation circulating about this discovery so I wrote up a brief history of the events based largely on Horace Freeland Judson's book The Eighth Day of Creation. Every biochemistry and molecular biology student must read this book or they don't qualify to be an informed scientist. However, if you are not a biochemistry student then you might enjoy my short version.

Some practising scientists might also enjoy refreshing their memories so they have an accurate view of what happened in case their students ask questions.

The Story of DNA (Part 1)

Where Rosalind Franklin teaches Jim and Francis something about basic chemistry.

The Story of DNA (Part 2)

Where Jim and Francis discover the secret of life.

Here's the latest version of Rosalind Franklin's contribution written by Matthew Cobb and Nathaniel Comfort: What Rosalind Franklin truly contributed to the discovery of DNA's structure. If you want to know the accurate version of her history then this is a must-read. Cobb is working on a biography of Crick and Comfort is writing a biography of Watson.

Here are some other posts that might interest you on DNA Day.



Saturday, March 25, 2023

ChatGPT lies about junk DNA

I asked ChatGPT some questions about junk DNA and it made up a Francis Crick quotation and misrepresented the view of Susumu Ohno.

We have finally restored the Junk DNA article on Wikipedia. (It was deleted about ten years ago when Wikipedians decided that junk DNA doesn't exist.) One of the issues on Wikipedia is how to deal with misconceptions and misunderstandings while staying within the boundaries of Wikipedia culture. Wikipedians have an aversion to anything that looks like editorializing so you can't just say something like, "Nobody ever said that all non-coding DNA was junk." Instead, you have to find a credible reference to someone else who said that.

I've been trying to figure out how far the misunderstandings of junk DNA have spread so I asked ChatGPT (from OpenAI) again.

Wednesday, March 08, 2023

A small crustacean with a very big genome

The Antarctic krill genome is the largest animal genome sequenced to date.

Antarctic krill (Euphausia superba) is a species of small crustacean (about 6 cm long) that lives in large swarms in the seas around Antarctica. It is one of the most abundant animals on the planet in terms of biomass and numbers of individuals.

It was known to have a large genome with abundant repetitive DNA sequences making assembly of a complete genome very difficult. Recent technological advances have made it possible to sequence very long fragments of DNA that span many of the repetitive regions and allow assembly of a complete genome (Shao et al. 2023).

The project involved 28 scientists from China (mostly), Australia, Denmark, and Italy. To give you an idea of the effort involved, they listed the sequencing data that was collected: 3.06 terabases (Tb) PacBio long read sequences, 734.99 Gb PacBio circular consensus sequences, 4.01 Tb short reads, and 11.38 Tb Hi-C reads. The assembled genome is 48.1 Gb, which is considerably larger than that of the African lungfish (40 Gb), which up until now was the largest fully sequenced animal genome.
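To put those numbers in perspective, the sketch below adds up the four data sets and compares the total with the assembled genome. The read totals and genome sizes are taken from the paragraph above; treating 1 Tb as 1,000 Gb is my assumption about the units, and the ratio is a raw data-to-assembly figure, not a proper coverage estimate.

```python
GB_PER_TB = 1_000  # assumption: decimal units (1 Tb = 1,000 Gb)

# Sequencing data listed in the post, converted to gigabases
reads_gb = {
    "PacBio long reads": 3.06 * GB_PER_TB,
    "PacBio circular consensus": 734.99,
    "short reads": 4.01 * GB_PER_TB,
    "Hi-C reads": 11.38 * GB_PER_TB,
}

assembly_gb = 48.1   # assembled Antarctic krill genome
lungfish_gb = 40.0   # African lungfish genome, the previous record

total_gb = sum(reads_gb.values())
fold_over_assembly = total_gb / assembly_gb
krill_vs_lungfish = assembly_gb / lungfish_gb

print(f"total raw data: {total_gb:,.2f} Gb")
print(f"raw data / assembly: {fold_over_assembly:.0f}x")
print(f"krill / lungfish: {krill_vs_lungfish:.2f}x")
```

That works out to about 19.2 Tb of raw data, roughly 400 times the size of the final assembly, for a genome about 20% larger than the lungfish genome.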

The current draft has 28,834 protein-coding genes and an unknown number of noncoding genes. About 92% of the genome is repetitive DNA that's mostly transposon-related sequences. However, there is an unusual amount of highly repetitive DNA organized as long tandem repeats and this made the assembly of the complete genome quite challenging.
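A quick calculation shows what the 92% figure means in absolute terms. This is only a sketch: it applies the rounded percentage from the text to the 48.1 Gb assembly size, so the results are approximate.

```python
assembly_gb = 48.1           # assembled genome size from the post
repetitive_fraction = 0.92   # "about 92%" from the post (rounded)

repetitive_gb = assembly_gb * repetitive_fraction
non_repetitive_gb = assembly_gb - repetitive_gb

print(f"repetitive DNA: about {repetitive_gb:.1f} Gb")
print(f"everything else: about {non_repetitive_gb:.1f} Gb")
```

So roughly 44 Gb of the genome is repetitive DNA and a bit under 4 Gb is everything else.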

The protein-coding genes in the Antarctic krill are longer than in other species due to the insertion of repetitive DNA into introns but the increase in intron size is less than expected from studies of other large genomes such as lungfish and Mexican axolotl. It looks like more of the genome expansion has occurred in the intergenic DNA compared to these other species.

This study supports the idea that genome expansion is mostly due to the insertion and propagation of repetitive DNA sequences. Some of us think that the repetitive DNA is mostly junk DNA but in this case it seems unusual that there would be so much junk in the genome of a species with such a huge population size (about 350 trillion individuals). The authors were aware of this problem but they were able to calculate an effective population size because they had sequence data from different individuals all around Antarctica. The effective population size (Ne) turned out to be one billion times smaller than the census population size indicating that the population of krill had been much smaller in the recent past. Their data suggests strongly that this smaller population existed only 10 million years ago.
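The gap between the two population sizes is easier to appreciate with the numbers written out. The sketch below uses only the figures quoted above (a census size of about 350 trillion and a factor of one billion); the resulting Ne is what those two rounded figures imply, not a value quoted from the paper.

```python
census_population = 350 * 10**12   # about 350 trillion individuals
reduction_factor = 10**9           # Ne is "one billion times smaller"

# Effective population size implied by the two figures above
implied_ne = census_population // reduction_factor

print(f"census size: {census_population:,}")
print(f"implied Ne:  {implied_ne:,}")
```

In other words, the figures in the text imply an effective population size of a few hundred thousand.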

The authors don't mention junk DNA. They seem to favor the idea that large genomes are associated with crustaceans that live in polar regions and that large genomes may confer a selective advantage.


Shao, C., Sun, S., Liu, K., Wang, J., Li, S., Liu, Q., Deagle, B.E., Seim, I., Biscontin, A., Wang, Q. et al. (2023) The enormous repetitive Antarctic krill genome reveals environmental adaptations and population insights. Cell 186:1-16. [doi: 10.1016/j.cell.2023.02.005]

Friday, March 03, 2023

Do you understand the scientific literature?

I'm finding it increasingly difficult to understand the scientific literature even in subjects that I've been following for decades. Is it just because I'm getting too old to keep up?

Here's an example of a paper that I'd like to understand but after reading the abstract and the introduction I gave up. I'll quote the first paragraph of the introduction to see if any Sandwalk readers can do better.

I'm not talking about the paper being a complete mystery; I can figure out roughly what it's about. What I'm thinking is that the opening paragraph could have been written in a way that makes the goals of the research much more comprehensible to average scientifically-literate people.

Weiner, D. J., Nadig, A., Jagadeesh, K. A., Dey, K. K., Neale, B. M., Robinson, E. B., ... & O’Connor, L. J. (2023) Polygenic architecture of rare coding variation across 394,783 exomes. Nature 614:492-499. [doi: 10.1038/s41586-022-05684-z]

Genome-wide association studies (GWAS) have identified thousands of common variants that are associated with common diseases and traits. Common variants have small effect sizes individually, but they combine to explain a large fraction of common disease heritability. More recently, sequencing studies have identified hundreds of genes containing rare coding variants, and these variants can have much larger effect sizes. However, it is unclear how much heritability rare variants explain in aggregate, or more generally, how common-variant and rare-variant architecture compare: whether they are equally polygenic; whether they implicate the same genes, cell types and genetically correlated risk factors; and whether rare variants will contribute meaningfully to population risk stratification.

The first question that comes to mind is whether the variant that's associated with a common disease is the cause of that disease or merely linked to the actual cause. In other words, are the associated variants responsible for the "effect size"? It sounds like the answer is "yes" in this case. Has that been firmly established in the GWAS field?


Thursday, March 02, 2023

"You like me!"

The endorsements for my book are in.

One of the last steps in publishing a book is to collect endorsements—favorable statements from famous people who urge you to buy the book. These short endorsements will appear in the front of the book and on the book jacket (dust jacket). They may also appear on various websites in order to promote sales.

The trick is to send the book out for review to as many people as possible and hope that one or two will like it well enough to say something nice. I'm pleased to report that there were, indeed, a few people who liked the book well enough to endorse it.



The title of this post is from Sally Field's acceptance speech on winning the Academy Award for best actress in 1985. She said, "I can't deny the fact that you like me. Right now, you like me!"

Wednesday, March 01, 2023

Definition of a gene (again)

The correct definition of a molecular gene isn't difficult but getting it recognized and accepted is a different story.

When writing my book on junk DNA I realized that there was an issue with genes. The average scientist, and consequently the average science writer, has a very confused picture of genes and the proper way to define them. The issue shouldn't be confusing for Sandwalk readers since we've covered that ground many times in the past. I think the best working definition of a gene is, "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]