Thursday, April 29, 2021

Chromatin organization at promoters in yeast cells

Our genome is very large and very complicated because it is full of junk DNA. It contains thousands of sites where DNA binding proteins can bind just by chance. This leads to the reorganization of nucleosomes in a way that mimics functional sites. It's difficult to distinguish these spurious sites from real functional sites and that has led to much confusion in the scientific literature.1

The yeast genome is much simpler and it's safe to assume that almost all of the sites detected by the standard chromatin assays are genuine, biologically relevant sites. In that sense, it serves as a model for what functional sites look like. A recent paper in Nature (April 8, 2021) reports on the mapping of most of the sites in the yeast genome where DNA binding proteins are found.

Rossi, M.J., Kuntala, P.K., Lai, W.K., Yamada, N., Badjatia, N., Mittal, C., Kuzu, G., Bocklund, K., Farrell, N.P., Blanda, T.R.M., Joshua D, V, B.A., Mistretta, K.S., Rocco, D.J., Perkinson, E.S., Kellogg, G.D., Mahony, S. and Pugh, B.F. (2021) A high-resolution protein architecture of the budding yeast genome. Nature 592:309-314. [doi: 10.1038/s41586-021-03314-8]

Origins of replication

Origins of replication are also called autonomously replicating sequence consensus sequences (ACS). There are 253 of them in the yeast genome and they are characterized by a 300 bp nucleosome-free region that's occupied by the origin recognition complex (ORC) and the helicase MCM.

Telomeres

Telomeres are bound by a number of proteins including silent information regulators (SIRs). There's a nucleosome-free region of about 300 bp where these proteins are located.

Centromeres

The nucleosome-free region at centromeres covers only 170 bp where a number of centromere binding proteins are located. The absence of nucleosomes at the centromere is a surprise since it was thought that centromere DNA was bound by modified nucleosomes containing a specific histone variant.

Tuesday, April 27, 2021

Asymptomatic and presymptomatic spread of SARS-CoV-2

It is widely believed that a substantial amount of viral spread is due to individuals who are transmitting the virus but have no symptoms (asymptomatic spread). However, there's so much misinformation about COVID-19 out there that I'm having trouble sorting out real science from fake science, so I've become skeptical of just about everything.

I'm not just talking about the kind of fake science being spread on FOX News; I'm also talking about misinformation spread by ordinary people like me and the typical readers of this blog. We might do it inadvertently but it's still wrong.

What's the real data on asymptomatic spread? I don't know, but here's a summary of the issue in a recent issue of Science. It sounds good to me because the authors take steps to address questions that seem obvious.

Rasmussen, A.L. and Popescu, S. V. (2021) SARS-CoV-2 transmission without symptoms. Science 371: 1204-1207. [doi: 10.1126/science.abf9569]

Sunday, April 25, 2021

Happy DNA day 2021!

It was 68 years ago today that the famous Watson and Crick paper was published in Nature along with papers by Franklin & Gosling and Wilkins, Stokes, & Wilson. There's a great deal of misinformation circulating about this discovery so I wrote up a brief history of the events based largely on Horace Freeland Judson's book The Eighth Day of Creation. Every biochemistry and molecular biology student must read this book or they don't qualify to be an informed scientist. However, if you are not a biochemistry student then you might enjoy my short version.

Some practising scientists might also enjoy refreshing their memories so they have an accurate view of what happened in case their students ask questions.

The Story of DNA (Part 1)

Where Rosalind Franklin teaches Jim and Francis something about basic chemistry.

The Story of DNA (Part 2)

Where Jim and Francis discover the secret of life.

Here are some other posts that might interest you on DNA Day.



Wednesday, April 21, 2021

Douglas Axe pretends to be an expert on intelligent design

This is a really interesting video presentation by Douglas Axe, a leading proponent of Intelligent Design Creationism. He's criticizing the argument from poor design, an argument that attempts to refute intelligent design by pointing out examples of poor design that a creator would never create. Axe uses an example from Neil deGrasse Tyson and if you look at this objectively you would say that Axe does a pretty good job of refuting Tyson's claims.

Tyson is not a biologist and he shouldn't pretend to be one, but that's not the most interesting take-home lesson from this video. The most interesting point concerns the comments Douglas Axe makes at the end of the video beginning at 11:30 minutes. He claims that Neil deGrasse Tyson is not an expert on designing life so it's foolish of him to pretend that he knows anything about the subject. When you hear someone making an imperfect design argument he asks his listeners to challenge them by saying, "What have YOU made that you think qualifies you to critique life."

Yep. He actually said that! Someone who promotes intelligent design without any experience in designing life actually tried to use that argument against opponents of intelligent design.

God has an inordinate fondness for beetles.


J.B.S. Haldane

The burden of proof is on Intelligent Design Creationists to demonstrate how their view is compatible with science and with the history of life. They have to demonstrate why it took 3.5 billion years to get where we are today and why the history of life is so compatible with evolution. They have to demonstrate why millions of species of bacteria and almost as many species of beetles can only be explained by the actions of an intelligent designer. They have to explain why all the data shows that modern humans and chimpanzees have descended by gradual fixation of mutations from a common ancestor that lived only a few million years ago. They have to explain why an intelligent designer would design a genome that's 90% junk.

These creationists haven't made anything that qualifies them to be experts on the design of life1 but I'm willing to listen to any ideas they have. So far, all we've seen is criticisms of evolution, which is also a topic where they lack expertise.


The Haldane quotation is accurate. See "A Special Fondness for Beetles" by Stephen J. Gould in Dinosaur in a Haystack.

1. Unless they have some special insight into the mind of god in which case they should be able to tell us exactly how he did it. Why did he create all those strange animals in the Cambrian only to allow most of them to go extinct? And speaking of extinctions, what did he have against most dinosaurs that he decided to kill them by smashing a meteor into the Earth 66 million years ago? Can you explain that, Dr. Axe?

The illusions of James Shapiro

James A. Shapiro is a professor in the Department of Biochemistry and Molecular Biology at the University of Chicago (Chicago, USA). He made significant contributions to our understanding of the function and structure of transposons but in later years he has become a vocal opponent of evolution culminating in his 2011 book Evolution: A View from the 21st Century. He is one of the founding members of The Third Way of Evolution.

I wrote a critical review of Evolution: A View from the 21st Century for the National Center for Science Education (NCSE) Reports but the issue is no longer visible on the web. Shapiro didn't like my review so NCSE published his rebuttal and that's also unavailable. You can see my response at: James Shapiro Responds to My Review of His Book.

Monday, April 19, 2021

The illusions of Denis Noble

Denis Noble was a Professor of Physiology at Oxford University in the United Kingdom until he retired. He had a distinguished career as a physiologist making significant contributions to our understanding of the heart and its relationship to the whole organism.

In recent years, Noble has dabbled in philosophy and evolution. He has become a vocal opponent of modern evolution (sensu Noble) and the way science is currently conducted. Some of his criticisms have made it into two popular books: The Music of Life and Dance to the Tune of Life. He is one of the leading proponents of the "Extended Evolutionary Synthesis" (EES) and he is one of the founders of The Third Way of Evolution, a wishy-washy and scientifically inaccurate way of attacking a strawman version of evolution and providing a safe haven for religious scientists.

Saturday, April 17, 2021

Philosophers argue that scientific conclusions need not be accurate, justified, or believed by their authors

A remarkable paper has just been posted to a philosophy of science preprint website. (It will be published in Synthese.) Like many papers in this field it's difficult to read and the logic is obtuse but the bottom line is that scientists don't really need to be held to the old standards that we scientists used to think are essential.

Dang, Haixin and Bright, Liam Kofi (2021) Scientific Conclusions Need Not Be Accurate, Justified, or Believed by their Authors. PhilSci Archive [PDF]

We argue that the main results of scientific papers may appropriately be published even if they are false, unjustified, and not believed to be true or justified by their author. To defend this claim we draw upon the literature studying the norms of assertion, and consider how they would apply if one attempted to hold claims made in scientific papers to their strictures, as assertions and discovery claims in scientific papers seem naturally analogous. We first use a case study of William H. Bragg’s early 20th century work in physics to demonstrate that successful science has in fact violated these norms. We then argue that features of the social epistemic arrangement of science which are necessary for its long run success require that we do not hold claims of scientific results to their standards. We end by making a suggestion about the norms that it would be appropriate to hold scientific claims to, along with an explanation of why the social epistemology of science—considered as an instance of collective inquiry—would require such apparently lax norms for claims to be put forward.

Tuesday, April 13, 2021

How do you explain evolution to non-experts?

I spent a lot of time explaining evolution in my book. The goal is to educate readers to the level where they can understand the drift-barrier hypothesis and why slightly deleterious mutations can accumulate in species with small populations. This requires some knowledge of random genetic drift and some knowledge of Neutral Theory and Nearly-Neutral Theory. The emphasis is on population genetics as the most important way of understanding evolution.
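To make the point about small populations concrete, here is a minimal sketch. It is not taken from the book: it assumes Kimura's standard diffusion approximation for the fixation probability of a new mutation in a diploid population, it assumes the effective population size equals the census size, and the function names and the selection coefficient are made up for illustration.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation (1 - e^(-2s)) / (1 - e^(-4ns)) for a new
    mutation with selection coefficient s in a diploid population of size n.
    Assumes effective population size equals n (an idealization)."""
    if s == 0:
        return 1.0 / (2 * n)       # neutral case: the initial allele frequency
    exponent = -4 * n * s
    if exponent > 700:             # e^exponent would overflow a float
        return 0.0                 # fixation is effectively impossible
    # expm1(x) = e^x - 1; the two minus signs cancel in the ratio
    return math.expm1(-2 * s) / math.expm1(exponent)

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n

s = -1e-4  # hypothetical slightly deleterious mutation
for n in (1_000, 1_000_000):
    print(f"N = {n:>9,}: {relative_to_neutral(s, n):.3g} x neutral")
```

With these illustrative numbers the mutation fixes at close to the neutral rate in the small population and is essentially never fixed in the large one, which is the intuition behind the drift-barrier hypothesis.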

You can't understand genomes and junk DNA unless you have a firm understanding of evolution. In fact, you can't make sense of anything about genes and gene expression without such knowledge ... what the heck, nothing in all of biology makes sense if you don't know about evolution.

My approach hasn't been copied by popular websites. They usually misrepresent evolution by presenting it as adaptation; natural selection is the only game in town. I'll put in a link to Francis Collins describing evolution in a truly bizarre narration but my question for Sandwalk readers is whether this is useful or not. Is it better to dumb down evolution on the NIH: National Human Genome Website [Evolution] or is this a bad idea?


Friday, April 09, 2021

Should we teach genomics and evolution to medical students?

Rama Singh,1 a biology professor at McMaster University in Hamilton (Ontario, Canada) has just published an interesting article on The Conversation website. It's titled "Medical schools need to prepare doctors for revolutionary advances in genetics." You can read the full article yourself but let me highlight the last few paragraphs to start the discussion.

Future physicians will be part of health networks involving medical lab technicians, data analysts, disease specialists and the patients and their family members. The physician would need to be knowledgeable about the basic principles of genetics, genomics and evolution to be able to take part in the chain of communication, information sharing and decision-making process.

This would require a more in-depth knowledge of genomics than generally provided in basic genetics courses.

Much has changed in genetics since the discovery of DNA, but much less has changed how genetics and evolution are taught in medical schools.

In 2013-14 a survey of course curriculums in American and Canadian medical schools showed that while most medical schools taught genetics, most respondents felt the amount of time spent was insufficient preparation for clinical practice as it did not provide them with sufficient knowledge base. The survey showed that only 15 per cent of schools covered evolutionary genetics in their programs.

A simple viable solution may require that all medical applicants entering medical schools have completed rigorous courses in genetics and genomics.

Here's the problem. I've just finished research on a book about modern evolution and genomics so I think I know a little bit about the subject. I'm also on the editorial board of a journal that publishes research on biochemistry and molecular biology education. I've written a biochemistry textbook and I have far too many years of experience trying to teach this material to graduate students and undergraduates at the University of Toronto. I can safely say that we (university teachers) have done a horrible job of teaching evolution and genomics to our students. We have turned out an entire generation of students who don't understand modern molecular evolution and don't understand what's in your genome.

What this means is that there's an extremely small pool of students who have completed "rigorous courses in genetics and genomics." Nobody will be able to apply to medical school. I doubt that we could teach this material to medical students with or without the appropriate background.

But you don't have to take my word for it. Some people have tried to teach this material to health science workers so we can see how it's working at that level. Take a look at the The Genomics Education Programme supported by the NHS in the United Kingdom. They have a series of short videos and longer lessons that are designed to educate health care specialists. Here's the blurb that defines their objective.

Rapid advances in technology and understanding mean that genomics is now more relevant than ever before. As genomics increasingly becomes a part of mainstream NHS care, all healthcare professionals, and not just genomics specialists, need to have a good understanding of its relevance and potential to impact the diagnosis, treatment and management of people in our care.

In 2014, Health Education England (HEE) launched a four-year £20 million Genomics Education Programme (GEP) to ensure that our 1.2 million-strong NHS workforce has the knowledge, skills and experience to keep the UK at the heart of the genomics revolution in healthcare.

Funding for the programme has since been extended to enable us to continue our work in providing co-ordinated national direction of education and training in genomics and developing resources for a wide range of professionals.

They describe genes as 'coding' genes that build proteins. There's no mention of noncoding genes. They define a genome as "both genes (coding) and non-coding DNA." They also say that your genome is all of the DNA in your cells (46 chromosomes, 23 pairs). I don't see anything in their education packages that covers modern molecular evolution. In one of the packages they say,

The term ‘junk DNA’ has been used since the 1970s to describe non-coding regions of the genome, but today it is considered inaccurate and misleading. The term ‘junk’ suggests that 98% of the genome has no use, but in recent years, studies and projects have used advances in technology to shed light on these regions and have come to different conclusions about how much of the genome has a biological function.

Here's a link to a short video called What is a genome?. I recommend that you watch it to see the level that these experts think is suitable for health care professionals in the UK and to see the level of expertise of those who made the video. This is what seven years of work by experts and £20 million will get you.

All of this tells me that teaching genomics and evolution to medical students is going to be a lot more difficult than Rama Singh imagines. Not only would we have to counter several years of misinformation but we would have to rely on teachers who probably don't understand either topic.

Let's start by teaching these things correctly to biology and biochemistry majors. That's going to be hard enough for now.


1. Full disclosure: Rama and I shared an NSERC grant in 1981 on genetic variation in Drosophila.

SARS-CoV-2 mRNA vaccines: RNA + lipid nanoparticles

The new mRNA vaccines are the result of extensive research over the past thirty years or so. They are marvels of technological innovation but probably not just for the reasons you imagine. The basics of therapeutic mRNA synthesis have been around for about ten years but the problem was how to get the RNA into cells. That requires specialized lipid nanoparticles and making those has been the most recent technological advance. A lot of this research was done in Canada. I found a nice paper (Buschmann et al., 2021) that covers this research and I'll summarize the important points for those of you who don't have time to read it.

The mRNA

Normal messenger RNA is susceptible to nucleases and is not readily taken up by human cells. In addition, it elicits an innate immune response that results in suppression of translation through phosphorylation of eIF2a. The immune response can be blocked by incorporating modified nucleotides that are not recognized by the various receptors that stimulate the normal response. This was discovered over ten years ago. These modified nucleotides, such as N1-methylpseudouridine, were used to make the SARS-CoV-2 vaccine.

Thursday, April 08, 2021

On the accuracy of genomics in detecting disease variants

Several diseases, such as cancers, are caused by the presence of deleterious alleles that affect the function of a gene. In the case of cancer, most of the mutations are somatic cell mutations—mutations that have occurred after fertilization. These mutations will not be passed on to future generations. However, there are some variants that are present in the germline and these will be inherited. A small percentage of these variants will cause cancer directly but most will just indicate a predisposition to develop cancer.

There are a host of other diseases that have a genetic component and the responsible alleles can also be present in the germline or due to somatic cell mutations.

Over the past fifty years or so there has been a lot of hype associated with the latest technological advances and the ability to detect deleterious germline mutations. The general public has been repeatedly told that we will soon be able to identify all disease-causing alleles and this will definitely lead to incredible medical advances in treating these diseases. Just yesterday, for example, I posted an article on predictions made by the National Human Genome Research Institute (USA), which predicts that by 2030,

The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.

Similar predictions, in various forms, were made when the human genome project got under way and at various times afterward. First there was the 1000 genomes project then there was the 100,000 genome project and, of course, ENCODE. The problem is that genomics hasn't lived up to these expectations and there's a very good reason for that: it's because the problem is a lot more difficult than it seems.

One of the Facebook groups that I follow (Modern Genetics & Technology)1 alerted me to a recent paper in JAMA that addressed the problem of genomics accuracy and the prediction of pathogenic variants. I'm posting the complete abstract so you can see the extent of the problem.

AlDubayan, S.H., Conway, J.R., Camp, S.Y., Witkowski, L., Kofman, E., Reardon, B., Han, S., Moore, N., Elmarakeby, H. and Salari, K. (2020) Detection of Pathogenic Variants With Germline Genetic Testing Using Deep Learning vs Standard Methods in Patients With Prostate Cancer and Melanoma. JAMA 324:1957-1969. [doi: 10.1001/jama.2020.20457]

Importance Less than 10% of patients with cancer have detectable pathogenic germline alterations, which may be partially due to incomplete pathogenic variant detection.

Objective To evaluate if deep learning approaches identify more germline pathogenic variants in patients with cancer.

Design, Setting, and Participants A cross-sectional study of a standard germline detection method and a deep learning method in 2 convenience cohorts with prostate cancer and melanoma enrolled in the US and Europe between 2010 and 2017. The final date of clinical data collection was December 2017.

Exposures Germline variant detection using standard or deep learning methods.

Main Outcomes and Measures The primary outcomes included pathogenic variant detection performance in 118 cancer-predisposition genes estimated as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The secondary outcomes were pathogenic variant detection performance in 59 genes deemed actionable by the American College of Medical Genetics and Genomics (ACMG) and 5197 clinically relevant mendelian genes. True sensitivity and true specificity could not be calculated due to lack of a criterion reference standard, but were estimated as the proportion of true-positive variants and true-negative variants, respectively, identified by each method in a reference variant set that consisted of all variants judged to be valid from either approach.

Results The prostate cancer cohort included 1072 men (mean [SD] age at diagnosis, 63.7 [7.9] years; 857 [79.9%] with European ancestry) and the melanoma cohort included 1295 patients (mean [SD] age at diagnosis, 59.8 [15.6] years; 488 [37.7%] women; 1060 [81.9%] with European ancestry). The deep learning method identified more patients with pathogenic variants in cancer-predisposition genes than the standard method (prostate cancer: 198 vs 182; melanoma: 93 vs 74); sensitivity (prostate cancer: 94.7% vs 87.1% [difference, 7.6%; 95% CI, 2.2% to 13.1%]; melanoma: 74.4% vs 59.2% [difference, 15.2%; 95% CI, 3.7% to 26.7%]), specificity (prostate cancer: 64.0% vs 36.0% [difference, 28.0%; 95% CI, 1.4% to 54.6%]; melanoma: 63.4% vs 36.6% [difference, 26.8%; 95% CI, 17.6% to 35.9%]), PPV (prostate cancer: 95.7% vs 91.9% [difference, 3.8%; 95% CI, –1.0% to 8.4%]; melanoma: 54.4% vs 35.4% [difference, 19.0%; 95% CI, 9.1% to 28.9%]), and NPV (prostate cancer: 59.3% vs 25.0% [difference, 34.3%; 95% CI, 10.9% to 57.6%]; melanoma: 80.8% vs 60.5% [difference, 20.3%; 95% CI, 10.0% to 30.7%]). For the ACMG genes, the sensitivity of the 2 methods was not significantly different in the prostate cancer cohort (94.9% vs 90.6% [difference, 4.3%; 95% CI, –2.3% to 10.9%]), but the deep learning method had a higher sensitivity in the melanoma cohort (71.6% vs 53.7% [difference, 17.9%; 95% CI, 1.82% to 34.0%]). The deep learning method had higher sensitivity in the mendelian genes (prostate cancer: 99.7% vs 95.1% [difference, 4.6%; 95% CI, 3.0% to 6.3%]; melanoma: 91.7% vs 86.2% [difference, 5.5%; 95% CI, 2.2% to 8.8%]).

Conclusions and Relevance Among a convenience sample of 2 independent cohorts of patients with prostate cancer and melanoma, germline genetic testing using deep learning, compared with the current standard genetic testing method, was associated with higher sensitivity and specificity for detection of pathogenic variants. Further research is needed to understand the relevance of these findings with regard to clinical outcomes.
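For readers who don't work with these terms every day, the four performance measures named in the abstract can be sketched as follows. This is only an illustration of the standard textbook definitions: the class name, the function names, and the counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    """Confusion-matrix counts for one variant-detection method."""
    tp: int  # true positives: real pathogenic variants that were reported
    fp: int  # false positives: reported variants that are not real
    tn: int  # true negatives: non-variants correctly left unreported
    fn: int  # false negatives: real pathogenic variants that were missed

def sensitivity(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)   # share of real variants that were found

def specificity(c: Counts) -> float:
    return c.tn / (c.tn + c.fp)   # share of non-variants correctly rejected

def ppv(c: Counts) -> float:
    return c.tp / (c.tp + c.fp)   # share of positive calls that are real

def npv(c: Counts) -> float:
    return c.tn / (c.tn + c.fn)   # share of negative calls that are right

example = Counts(tp=90, fp=10, tn=30, fn=20)  # made-up numbers
for name, fn_ in [("sensitivity", sensitivity), ("specificity", specificity),
                  ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {fn_(example):.1%}")
```

Note that all four measures need the true status of every variant, which is exactly what the abstract says was not available.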

It's really difficult to understand this paper since there are many terms that I'd have to research more thoroughly; for example, does "germline whole-exon sequencing" mean that only sperm or egg DNA was sequenced and that every single exon in the entire genome was sequenced? Were exons in noncoding genes also sequenced?

I found it much more useful to look at the accompanying editorial by Gregory Feero.

Feero, W.G. (2020) Bioinformatics, Sequencing Accuracy, and the Credibility of Clinical Genomics. JAMA 324:1945-1947. [doi: 10.1001/jama.2020.19939]

Feero explains that the main problem is distinguishing real pathogenic variants from false positives and this can only be accomplished by first sequencing and assembling the DNA and then using various algorithms to focus on important variants. Then there's the third step.

The third step, which often requires a high level of clinical expertise, sifts through detected potentially deleterious variations to determine if any are relevant to the indication for testing. For example, exome sequencing ordered for a patient with unexplained cardiomyopathy might harbor deleterious variants in the BRCA1 gene which, while a potentially important incidental finding, does not provide a plausible molecular diagnosis for the cardiomyopathy. The complexity of the bioinformatics tools used in these 3 steps is considerable.

It's that third step that's analyzed in the AlDubayan et al. paper and one of the tools used is a deep-learning (AI) algorithm. However, the training of this algorithm requires considerable clinical expertise and testing it requires a gold standard set of variants to serve as an internal control. As you might have guessed, that gold standard doesn't exist because the whole point of genomics is to identify previously unknown deleterious alleles.

Feero warns us that "clinical genome sequencing remains largely unregulated and accuracy is highly dependent on the expertise of individual testing laboratories." He concludes that genomics still has a long way to go.
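Here is a rough sketch of why the missing gold standard matters. The abstract says sensitivity was estimated against a reference set made up of all variants judged valid from either method. The toy example below mimics that setup; the variant names, the two call sets, and the "truth" set are entirely hypothetical, and the function is my own illustration, not the authors' code.

```python
def estimated_sensitivity(calls: set, reference: set) -> float:
    """Fraction of the reference set that a method detected."""
    return len(calls & reference) / len(reference)

# Hypothetical truth that nobody actually knows in a real study.
truth = {"v1", "v2", "v3", "v4", "v5", "v6"}

# Hypothetical calls from the two methods ("v9" is a false positive).
standard_calls = {"v1", "v2", "v3"}
deep_learning_calls = {"v1", "v2", "v3", "v4", "v9"}

# Stand-in for expert review: keep only the calls judged to be valid.
judged_valid = {"v1", "v2", "v3", "v4"}
reference = (standard_calls | deep_learning_calls) & judged_valid

for name, calls in [("standard", standard_calls),
                    ("deep learning", deep_learning_calls)]:
    est = estimated_sensitivity(calls, reference)
    real = estimated_sensitivity(calls, truth)
    print(f"{name}: estimated {est:.0%}, against hidden truth {real:.0%}")
```

In this toy case both estimates look better than the real values because variants missed by both methods (v5 and v6) never make it into the reference set. The comparison between methods is still informative but the absolute numbers are not.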

The genomics community needs to act as a coherent body to ensure reproducibility of outcomes from clinical genome or exome sequencing, or provide transparent quality metrics for individual clinical laboratories. Issues related to achieving accuracy are not new, are not limited to bioinformatics tools, and will not be surmounted easily. However, until analytic and clinical validity are ensured, conversations about the potential value that genome sequencing brings to clinical situations will be challenging for clinical centers, laboratories that provide sequencing services, and consumers. For the foreseeable future, nongeneticist clinicians should be familiar with the quality of their chosen genome-sequencing laboratory and engage expert advice before changing patient management based on a test result.

I'm guessing that Gregory Feero doesn't think that in nine years (2030) "The clinical relevance of all encountered genomic variants will be readily predictable."


1. I do NOT recommend this group. It's full of amateurs who resist learning and one of its main purposes is to post copies of pirated textbooks in its files. The group members get very angry when you tell them that what they are doing is illegal!

Wednesday, April 07, 2021

Bold predictions for human genomics by 2030

After spending several years working on a book about the human genome I've come to the realization that the field of genomics is not delivering on its promise to help us understand what's in your genome. In fact, genomics researchers have by and large impeded progress by coming up with false claims that need to be debunked.

My view is not widely shared by today's researchers who honestly believe they have made tremendous progress and will make even more as long as they get several billion dollars to continue funding their research. This view is nicely summarized in a Scientific American article from last fall that's really just a precis of an article that first appeared in Nature. The Nature article was written by employees of the National Human Genome Research Institute (NHGRI) at the National Institutes of Health in Bethesda, MD, USA (Green et al., 2020). Its purpose is to promote the work that NHGRI has done in the past and to summarize its strategic vision for the future. At the risk of oversimplifying, the strategic vision is "more of the same."

Green, E.D., Gunter, C., Biesecker, L.G., Di Francesco, V., Easter, C.L., Feingold, E.A., Felsenfeld, A.L., Kaufman, D.J., Ostrander, E.A. and Pavan, W.J. and 20 others (2020) Strategic vision for improving human health at The Forefront of Genomics. Nature 586:683-692. [doi: 10.1038/s41586-020-2817-4]

Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting-edge of human genomics going forward—that is, at ‘The Forefront of Genomics’.

What's interesting are the predictions that the NHGRI makes for 2030—predictions that were highlighted in the Scientific American article. I'm going to post those predictions without comment other than saying that I think they are mostly bovine manure. I'm interested in hearing your comments.

Bold predictions for human genomics by 2030

Some of the most impressive genomics achievements, when viewed in retrospect, could hardly have been imagined ten years earlier. Here are ten bold predictions for human genomics that might come true by 2030. Although most are unlikely to be fully attained, achieving one or more of these would require individuals to strive for something that currently seems out of reach. These predictions were crafted to be both inspirational and aspirational in nature, provoking discussions about what might be possible at The Forefront of Genomics in the coming decade.

  1. Generating and analysing a complete human genome sequence will be routine for any research laboratory, becoming as straightforward as carrying out a DNA purification.
  2. The biological function(s) of every human gene will be known; for non-coding elements in the human genome, such knowledge will be the rule rather than the exception.
  3. The general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the effect of genotype on phenotype.
  4. Research in human genomics will have moved beyond population descriptors based on historic social constructs such as race.
  5. Studies that involve analyses of genome sequences and associated phenotypic information for millions of human participants will be regularly featured at school science fairs.
  6. The regular use of genomic information will have transitioned from boutique to mainstream in all clinical settings, making genomic testing as routine as complete blood counts.
  7. The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.
  8. An individual’s complete genome sequence along with informative annotations will, if desired, be securely and readily accessible on their smartphone.
  9. Individuals from ancestrally diverse backgrounds will benefit equitably from advances in human genomics.
  10. Breakthrough discoveries will lead to curative therapies involving genomic modifications for dozens of genetic diseases.

I predict that nine years from now (2030) we will still be dealing with scientists who think that most of our genome is functional; that most human protein-coding genes produce many different proteins by alternative splicing; that epigenetics is useful; that there are more noncoding genes than protein-coding genes; that the leading scientists in the 1960s and 70s were incredibly stupid to suggest junk DNA; that almost every transcription factor binding site is biologically relevant; that most transposon-related sequences have a mysterious (still unknown) function; that it's still a mystery why humans are so much more complex than chimps; and that genomics will eventually solve all problems by 2040.

Why in the world, you might ask, would we still be dealing with issues like that? Because of genomics.


Saturday, April 03, 2021

"Dark matter" as an argument against junk DNA

Opponents of junk DNA have been largely unsuccessful in demonstrating that most of our genome is functional. Many of them are vaguely aware of the fact that "no function" (i.e. junk) is the default hypothesis and the onus is on them to come up with evidence of function. In order to shift, or obfuscate, this burden of proof they have increasingly begun to talk about the "dark matter" of the genome. The idea is to pretend that most of the genome is a complete mystery so that you can't say for certain whether it is junk or functional.

One of the more recent attempts appears in the "Journal Club" section of Nature Reviews Genetics. It focuses on repetitive DNA.

Before looking at that article, let's begin by summarizing what we already know about repetitive DNA. It includes highly repetitive DNA consisting of multiple tandem repeats of short sequences such as ATATATATAT... or CGACGACGACGA... or even longer repeats. Much of this is located in centromeric regions of the chromosome and I estimate that functional highly repetitive regions make up about 1% of the genome. [see Centromere DNA and Telomeres]

The other part of repetitive DNA is middle repetitive DNA, which is largely composed of transposons and endogenous viruses, although it includes ribosomal RNA genes and origins of replication. Most of these sequences are dispersed as single copies throughout the genome. It's difficult to determine exactly how much of the genome consists of these middle repetitive sequences but it's certainly more than 50%.

Almost all of the transposon- and virus-related sequences are defective copies of once active transposons and viruses. Most of them are just fragments of the originals. They are evolving at the neutral rate so they look like junk and they behave like junk.1 That's not selfish DNA because it doesn't transpose and it's not "dark matter." These fragments have all the characteristics of nonfunctional junk in our genome.

We know that the C-value paradox is mostly explained by differing amounts of repetitive DNA in different genomes and this is consistent with the idea that they are junk. We know that less than 10% of our genome is conserved and this fits in with that conclusion. Finally, we know that genetic load arguments indicate that most of our genome must be impervious to mutation. Combined, these are all powerful bits of evidence and logic in favor of repetitive sequences being mostly junk DNA.

Now let's look at what Neil Gemmell says in this article.

Gemmell, N.J. (2021) Repetitive DNA: genomic dark matter matters. Nature Reviews Genetics:1-1. [doi: 10.1038/s41576-021-00354-8]

"Repetitive DNA sequences were found in hundreds of thousands, and sometimes millions, of copies in the genomes of most eukaryotes. while widespread and evolutionarily conserved, the function of these repeats was unknown. Provocatively, Britten and Kohne concluded 'a concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert.'”"

That's from Britten and Kohne (1968) and it's true that more than 50 years ago those workers didn't like the idea of junk DNA. Britten argued that most of this repetitive DNA was likely to be involved in regulation. Gemmell goes on to describe centromeres and telomeres and mentions that most repetitive DNA was thought to be junk.

"... the idea that much of the genome is junk, maintained and perpetuated by random chance, seemed as broadly unsatisfactory to me as it had to the original authors. Enthralled by the mystery of why half our genome is repetitive DNA, I have followed this field ever since."

Gemmell is not alone. In spite of all the evidence for junk DNA, the majority of scientists don't like the fact that most of our genome is junk. Here's how he justifies his continued skepticism.

"But it was not until the 2000s, as full eukaryotic genome sequences emerged, that we discovered that the repetitive non-coding regions of our genome harbour large numbers of promoters, enhancers, transcription factor binding sites and regulatory RNAs that control gene expression. More recently, the importance of repetitive DNA in both structural and regulatory processes has emerged, but much remains to be discovered and understood. It is time to shine further light on this genomic dark matter."

This appears to be the ENCODE publicity campaign legacy rearing its ugly head once more. Most Sandwalk readers know that the presence of transcription factor binding sites, RNA polymerase binding sites, and junk RNA is exactly what one would predict from a genome full of defective transposons. Most of us know that a big fat sloppy genome is bound to contain millions of spurious binding sites for transcription factors so this says nothing about function.
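To put a rough number on the spurious binding site point, here's a back-of-the-envelope sketch in Python. It is only an illustration under simplifying assumptions that are mine, not from any of the papers discussed: the genome is treated as a random sequence with equal base frequencies, the genome size is taken as roughly 3.2 billion bp, each binding site is treated as one exact motif of fixed length, and both strands are counted. The function name and the motif lengths are hypothetical choices for the example.

```python
def expected_chance_sites(genome_bp: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed motif in random DNA.

    Assumes independent positions with each base at frequency 0.25.
    Palindromic motifs, which would be double-counted, are ignored.
    """
    positions = genome_bp - motif_len + 1      # possible start positions
    per_position = 0.25 ** motif_len           # chance of a match at one position
    strands = 2 if both_strands else 1
    return positions * per_position * strands


# Approximate haploid human genome size (an assumption for this sketch).
HUMAN_GENOME_BP = 3_200_000_000

if __name__ == "__main__":
    for motif_len in (6, 8, 10):
        sites = expected_chance_sites(HUMAN_GENOME_BP, motif_len)
        print(f"{motif_len} bp motif: about {sites:,.0f} chance matches")
```

Under these assumptions a single 8 bp motif is expected to turn up on the order of 100,000 times by chance alone. Real binding sites are degenerate and there are many different transcription factors, so the real total should be considerably higher, which is why the mere presence of binding sites says nothing about function.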

Apparently Gemmell's skepticism doesn't apply to the ENCODE results so he still thinks that all those bits and pieces of transposons are mysterious bits of dark matter that could be several billion base pairs of functional DNA. I don't know what he imagines they could be doing.


Photo Credit: The photo shows human chromosomes labelled with a telomere probe (yellow), from Christopher Counter at Duke University.

1. In my book, I cover this in a section called "If it walks like a duck ..." It's a form of abductive reasoning.

Britten, R. and Kohne, D. (1968) Repeated Sequences in DNA. Science 161:529-540. [doi: 10.1126/science.161.3841.529]

Friday, April 02, 2021

Off to the publisher!

The first draft of my book is ready to be sent to my publisher.

Text by Laurence A. Moran

Cover art and figures by Gordon L. Moran

  • 11 chapters
  • 112,000 words (+ preface and glossary)
  • about 370 pages (estimated)
  • 26 figures
  • 305 notes
  • 400 references

©Laurence A. Moran