Wednesday, June 29, 2022

The Function Wars Part XII: Revising history and defending ENCODE

I'm very disappointed in scientists and philosophers who try to defend ENCODE's behavior on the grounds that they were using a legitimate definition of function. I'm even more annoyed when they deliberately misrepresent ENCODE's motive in launching the massive publicity campaign in 2012.

Here's another new paper on the function wars.

Ratti, E. and Germain, P.-L. (2022) A Relic of Design: Against Proper Functions in Biology. Biology & Philosophy 37:27. [doi: 10.1007/s10539-022-09856-z]

The notion of biological function is fraught with difficulties - intrinsically and irremediably so, we argue. The physiological practice of functional ascription originates from a time when organisms were thought to be designed and remained largely unchanged since. In a secularized worldview, this creates a paradox which accounts of functions as selected effect attempt to resolve. This attempt, we argue, misses its target in physiology and it brings problems of its own. Instead, we propose that a better solution to the conundrum of biological functions is to abandon the notion altogether, a prospect not only less daunting than it appears, but arguably the natural continuation of the naturalisation of biology.

Tuesday, June 28, 2022

The Function Wars Part XI: Stefan Linquist responds to my critique

Stefan Linquist is a philosopher at the University of Guelph (Guelph, Ontario, Canada). He recently published a paper on function that I discussed [The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect]. This is his response.


Hi Larry,

First, thank you for giving my paper a careful read. The intended audience is the community of biologically-minded philosophers who seem largely convinced that:

1) Genes are so passé. More specifically, when it comes to explaining phenotypic development and evolution, such non-genic factors as noncoding RNA, maternally inherited methylation patterns, repetitive elements, etc. are equally if not more significant than genes. It is a short step to the view that most of these elements are somehow functional for the organism. Stated pejoratively, thinkers like John Mattick and Evelyn Fox-Keller have had a significant intellectual founder-effect on my discipline. My paper attempts to push back against this trend.

2) Molecular biology can and should ignore evolution. The idea here is that when it comes to the search for molecular mechanisms, it doesn’t matter if genomes are the product of multi-level evolution or if they had been created by God. When you work on mechanisms, you do experiments, and evolutionary considerations are irrelevant to how those experiments are conducted and interpreted. Or, so the thinking goes.

Many of your blog posts present counter arguments to these ideas with a level of understanding and precision that exceeds my efforts in this paper. Nonetheless, I want to take issue with your one suggestion (if I understand correctly) that biochemists tend to operate with a sophisticated understanding of the genome. My position is that biochemistry might be necessary, but is not sufficient for an informed view of genomics. Without Darwinian reasoning, biochemistry leads down unnecessary blind alleys.

Obvious to whom?

Let me be upfront that I am something of an academic bumpkin in comparison to fancy city-folk like you, or Palazzo, or Graur, or Doolittle, or Haig. My knowledge of molecular genetics is largely self taught. This is partly why I am perplexed by statements like the following. In a special collection of Chromosome Research entitled, “Transposable elements and the multidimensional genome” (2018), P.A. Larson (the collection editor) opens with this doozy:

“There is no such thing as “junk DNA.” Indeed, a suite of discoveries made over the past few decades have put to rest this misnomer and have identified many important roles that so-called junk DNA provides to both genome structure and function…”

Is it me? Or is it him? It’s him, right? My point is simply that it can’t be obvious to everyone within the molecular biological community that not every binding site or repetitive element is somehow functional for the organism. This is to say nothing of the hype surrounding lncRNA.

My argument in the paper is that the missing piece of information is an understanding of where the majority of eukaryotic DNA comes from: a byproduct of coevolutionary interactions between parasitic TEs and the cell. Indeed, I provide evidence in another publication, Transposon dynamics and the epigenetic switch hypothesis, that over the past two decades or so, within the fields of molecular biology and biomedicine, interest in TEs has steadily declined. This trend is surprising given that over the same period we have come to learn just how prevalent TEs are in most genomes. I think that I can show, in another forthcoming paper, that this trend toward ignoring TE coevolutionary dynamics is associated with the increased biomedicalization of molecular biology as a discipline (more on that another time, perhaps). Whether this decline of interest in TEs is responsible for the tendency to interpret junk DNA as somehow functional is a further question.

Another factor that I find perplexing is the trickle of molecular biology majors who attend my philosophy of biology undergraduate seminar. I'm not surprised that they show up at all, rather I'm surprised about their conviction that any biochemically active region of the genome simply must be functional for the organism. "Functional until proven otherwise" seems to be the mantra that one must memorize in order to pass the med-school admissions exam. When I suggest to them that Darwinian reasoning leads to an alternative hypothesis about most of the DNA in eukaryotic genomes, they balk. Some just leave my class: “What does he know, just a philosopher.” Such is the life of an academic bumpkin from the intellectual sticks.

This is all to say that, yes, you are correct that my paper presents no new biological data. In a sense, it is old news. But it is news that many people, even some academic city slickers, seem not to have absorbed.

I like to think of philosophy journals as a clearing house for discussions that are extremely important, but would be unlikely to elbow their way into the pages of most scientific journals. Aside from helpful blogs such as yours, where else are we to debate the theoretical framing and interpretation of junk DNA?

What’s with the philosophical obsession over functions?

It’s true that my article focuses on this longstanding debate over CR vs SE functions. I can imagine that from the perspective of a molecular biologist (with such a rich ontology to draw from, and so many fine grained distinctions at your disposal) this binary must appear ham-fisted.

Let me say two things. First, I repeat that my main audience is the community of biological philosophers. In this context, these basic categories of function and the debates that surround them provide a lingua franca. To have this discussion without connecting it to function concepts would seem odd. Second, I think that you and I would both be happy if the word “function” in genetics were restricted to what I elsewhere call maintenance functions. That is, to elements that have been maintained by purifying selection. However, many of my colleagues are so convinced of point 2 (above) that this proposal is essentially a non-starter. That is, they maintain that since molecular biology investigates causal role functions (a big assumption, but let it go for now), then this discipline can ignore Darwinian reasoning. My argument is that this inference is too quick. A problem with CR functions is their permissiveness: any old strand of DNA can have some CR function or other. What we need is some way to sort the functional wheat from the junky chaff. To do that, thinking about selective history is your best bet. In effect, you can deny entrance to Darwin at the front door if you want, but eventually you’ll have to let him in through the back.

A final note on the term “selective history.” You suggested that I should have instead used “evolution” in order to discourage a Panglossian view of the genome. The issue I see with your suggestion is that “evolution” is too vague –it really just means change over time. My contention is that one needs to do more than just consider historical (e.g. phylogenetic) details in order to take a biologically informed view of the genome. In addition, one needs to think about how the cell coevolves with parasitic TEs. Maybe “coevolutionary dynamics” would have been a better choice.

Finally, a plug. The paper you read is part of a special collection in Biology and Philosophy that I co-edited with Ford Doolittle entitled, “Function, junk and transposable elements: contested issues in the science of genomics.” As I write, I see that three papers have so far appeared and the other two (including mine and one coauthored by Alex Palazzo) should see the light of day soon:

Function, junk and transposable elements: contested issues in the science of genomics

Hopefully some of these will provide additional fodder.
Cheers,
Stefan

Thursday, June 23, 2022

The Function Wars Part X: "Spam DNA"?

The authors of a recent paper think we need a new term "spam DNA" to describe some features of the human genome.

Fagundes, N.J., Bisso-Machado, R., Figueiredo, P.I., Varal, M. and Zani, A.L. (2022) What We Talk About When We Talk About “Junk DNA”. Genome Biology and Evolution 14:evac055. [doi: 10.1093/gbe/evac055]

“Junk DNA” is a popular yet controversial concept that states that organisms carry in their genomes DNA that has no positive impact on their fitness. Nonetheless, biochemical functions have been identified for an increasing fraction of DNA elements traditionally seen as “Junk DNA”. These findings have been interpreted as fundamentally undermining the “Junk DNA” concept. Here, we reinforce previous arguments that this interpretation relies on an inadequate concept of biological function that does not consider the selected effect of a given genomic structure, which is central to the “Junk DNA” concept. Next, we suggest that another (though ignored) confounding factor is that the discussion about biological functions includes two different dimensions: a horizontal, ecological dimension that reflects how a given genomic element affects fitness in a specific time, and a vertical, temporal dimension that reflects how a given genomic element persisted along time. We suggest that “Junk DNA” should be used exclusively relative to the horizontal dimension, while for the vertical dimension, we propose a new term, “Spam DNA”, that reflects the fact that a given genomic element may persist in the genome even if not selected for on their origin. Importantly, these concepts are complementary. An element can be both “Spam DNA” and “Junk DNA”, and “Spam DNA” can also be recruited to perform evolved biological functions, as illustrated in processes of exaptation or constructive neutral evolution.

The authors are scientists at the Federal University of Rio Grande do Sul in Brazil. They are concerned about the origins of junk DNA and whether true selected effect functions (strong selected effect = SSE) conflict with the definition of junk DNA. Here's how they put it:

Paradoxically as it may seem, under the SSE definition, elements that contribute positively to fitness and are maintained by purifying selection would still count as “junk” only because they did not originate as an adaptation.

This is essentially correct according to how many philosophers define selected effect functions but that issue was resolved by focusing on purifying selection as the important criterion and ignoring the history of the trait (= maintenance function, MF). There is only a 'paradox' if you stick to the philosophy definition of function (i.e. SSE) and even then, the paradox only exists if the SSE definition is the only way to identify junk DNA. [see: The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect] The authors recognize this since they include a good discussion of this other definition (MF) and its advantages. Nevertheless, they propose a new term called "spam DNA" to help clarify the problem.

"Spam DNA" represents every genomic element which has not been selected for during its origin in the genome, even if it currently participates in relevant biological functions.

All of the DNA in the light blue box of their figure is spam DNA. Note that it includes DNA that is currently functional as long as it originated from junk DNA as they define it. Also, some junk DNA isn't spam DNA as long as it arose from the inactivation of DNA that used to have a function. Thus, pseudogenes aren't spam and neither are bits and pieces of transposons.

This isn't helpful. The current debate is about how much of our genome is junk, so who cares about the history of individual sequences? A significant amount of what we currently define as junk DNA may have come from once-active transposons but we may never be able to trace the history of each piece of junk DNA. Does it fall into the first category in the figure (functional to junk) or is it spam DNA? Is this really important? No, it is not.
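For readers who want to see how the two labels differ, here is a minimal sketch of the two-axis scheme as I read it from the abstract and the quoted definition. The class name, field names, and example elements are my own assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicElement:
    """Hypothetical record for one stretch of DNA (illustrative only)."""
    name: str
    selected_at_origin: bool    # "vertical" axis: was it selected for when it arose?
    improves_fitness_now: bool  # "horizontal" axis: does it contribute to fitness today?


def labels(element: GenomicElement) -> set[str]:
    """Return the labels implied by the definitions quoted above.

    "spam" = not selected for at its origin (history).
    "junk" = no positive impact on fitness at present.
    The two labels are independent, so an element can carry both or neither.
    """
    result = set()
    if not element.selected_at_origin:
        result.add("spam")
    if not element.improves_fitness_now:
        result.add("junk")
    return result


# Hypothetical examples covering the four combinations.
examples = [
    GenomicElement("conserved_gene", True, True),
    GenomicElement("inactivated_former_gene", True, False),
    GenomicElement("exapted_insertion", False, True),
    GenomicElement("never_selected_insertion", False, False),
]

if __name__ == "__main__":
    for e in examples:
        print(e.name, sorted(labels(e)))
```

Note that the "spam" label needs the origin of each sequence as input, which is exactly the information that may never be recoverable for most of the genome.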

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

Wednesday, June 22, 2022

The Function Wars Part IX: Stefan Linquist on Causal Role vs Selected Effect

How much of the human genome is functional? This is a problem that will be solved by biochemists not epistemologists.

What is junk DNA? What is functional DNA? Defining your terms is a key part of any scientific controversy because you can't have a debate if you can't agree on what you are debating. We've been debating the prevalence of junk DNA for more than 50 years and much of that debate has been (deliberately?) muddled by one side or the other in order to score points. For example, how many times have you heard the ridiculous claim that all noncoding DNA was supposed to be junk DNA? And how many times have you heard that all transcripts must have a function merely because they exist?

Tuesday, June 14, 2022

Distrust simplicity (and turn off your irony meters)

I just stumbled upon an opinion piece published in EMBO Reports on May 22, 2022. The author is Frank Gannon who is identified as the former Director of the QIMR Berghofer Medical Research Institute in Brisbane, Australia and the title of the article is "Seek simplicity and distrust it."

I'm about to quote some excerpts from the article but before doing so I need to warn you to turn off your irony meters, even if you have the latest version with the most recent software updates.

Gannon's main point is that scientists should seek simple explanations but they must be willing to abandon them when better data comes along. He gives us some examples.

However, it seems that there is a collective amnesia among scientists such that we forget to distrust the simplicity that we pursue on our path to insight. The central dogma of molecular biology—that information flows unidirectionally from DNA to RNA to protein—was overturned, at least in part, with the discovery that this linear cascade could be reversed by reverse transcription.

Really? The Central Dogma of Molecular Biology was overturned, "at least in part," by reverse transcriptase? (It wasn't.) If you are going to write about a topic like this then you'd better make sure you know what you're talking about.

The great quote from Jacques Monod “What is true for E. coli is true for the elephant”, held valid only until the discovery of introns in eukaryotes. As I was close to the earliest data that pointed to the existence of split genes, I am well aware of the incredulity of biologists when they realised that genetic material did not have the same simple design irrespective of the organism.

Monod's statement was never supposed to be taken as literally as that.1 He was referring to the unity of biochemistry (Friedman, 2004). This is clear from what he says in Chance and Necessity, "Today we know that from the bacterium to man the chemical machinery is essentially the same, in both its structure and functioning." He meant that all species have DNA, RNA, and protein and that these molecules carry out the same roles in humans as they do in bacteria. The essence of this simple observation is as true today as it was 50 years ago.

The death of “Junk DNA”—a term, coined in 1972 by Susumu Ohno for the non-coding parts of the genome—has been more gradual. The perception that exons are the only useful part of the genome has been proven wrong with the discoveries of noncoding RNA, the controlling roles of intra-genomic areas, the essential interactions between distant genomic regions and peptides encoded by short open frame regions.

Did you turn off your irony meter? Don't say I didn't warn you. Jacques Monod (and Susumu Ohno) would be surprised to learn that in 1972 they knew nothing about noncoding genes and regulatory sequences.

More seriously, how did we ever get to the stage where a prominent scientist who frequently publishes opinion pieces in EMBO Reports could be so ignorant of the junk DNA controversy after all that's been written about it in the past ten years?



1. Besides, introns exist in bacteria.

Friedman, H.C. (2004) From Butyribacterium to E. coli: An Essay on Unity in Biochemistry. Perspectives in Biology and Medicine 47:47-66. [doi: 10.1353/pbm.2004.0007]

Monday, June 13, 2022

Manolis Kellis dismisses junk DNA

Manolis Kellis is a professor of computer science at the Massachusetts Institute of Technology (MIT). Sandwalk readers will remember him as one of the ENCODE leaders who participated in the massive publicity campaign of 2012 where they attempted to prove that most of the human genome is functional, not junk. He is the lead author of the semi-retraction that was published eighteen months later. [What did the ENCODE Consortium say in 2012 and 2014?]

Kellis was interviewed in April 2022 and it's interesting to hear his current views on junk DNA especially since MIT has just been rated the top university in the world for the 11th straight year. [QS ranks MIT the world’s No. 1 university for 2022-23].

His response to a question about junk DNA begins at 58 minutes. Kellis makes three points.

  • He doesn't like the word "junk."
  • Lots of noncoding DNA has known functions such as noncoding genes and regulatory sequences.
  • Half of our genome consists of transposon sequences and their regulatory regions fueled the mammalian radiation following the asteroid impact so that modern mammalian genomes now contain a complex and sophisticated network of regulatory sequences.

As I suspected, Kellis still doesn't recognize any of the evidence for junk DNA that was briefly outlined in the Kellis et al. (2014) paper. I find it surprising that after a decade of being exposed to criticism of his stance on junk DNA he is still not capable of presenting a cogent argument against junk.


Kellis, M. et al. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) April 24, 2014 published online [doi: 10.1073/pnas.1318948111]

Monday, June 06, 2022

My father on D-day

Today is the 78th anniversary of D-Day—the day British, Canadian, and American troops landed on the beaches of Normandy in World War II.1

For us baby boomers it always meant a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. His missions were limited to quick strikes and reconnaissance during the first few days of the invasion because Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th) his squadron landed in Crepon, Normandy and things became very hectic from then on with several close support missions every day [see Hawker Hurricanes and Typhoons in World War II].


Monday, May 16, 2022

Wikipedia editors want to suppress an article on junk DNA

I've been trying to fix the Wikipedia article on Noncoding DNA but it's quite a challenge because the page is controlled by editors who are opposed to junk DNA and I am accused of starting an "edit war" that goes against the consensus. On a parallel track, I have proposed creating a separate Wikipedia article on junk DNA where we can present the evidence for and against junk. This is being discussed under the "Talk" thread on the "Non-coding DNA" article.

Here's an exchange between me [Genome42] and one of the editors who exerts control over the noncoding DNA page. It shows you what we are up against.

Let's get back to the main topic. Is there anyone here who objects to creating a separate page for junk DNA? If you object, please explain why because it seems to me that we really need such a page in order to explain to viewers what the main issues are in the controversy. We need some place to put all the evidence showing that 90% of the human genome is junk and to explain why many scientists reject this evidence. Genome42 (talk) 20:18, 15 May 2022 (UTC)

I looked at pubmed and searched for "junk dna" to see how prominent this topic even is. It seems the term is declining in usage in the scientific literature (see the "results by year"). This is despite all of the abundant media coverage it still gets. I would say that if the usage in the scientific literature was rising then perhaps it would be a good idea, but the reverse is happening. I see an increasing number of papers calling for abandoning the term altogether too. Just an FYI, one of the original reasons for the merge of the junk DNA to this article was that it was causing too much confusion and edit warring as a separate page. When merged you could have the general article on noncoding DNA without the fireworks and a section isolating the controversies coming from it rather than having 2 pages on the same topic with the Junk DNA article mixing controversy with general information on noncoding DNA. Ramos1990 (talk) 21:28, 15 May 2022 (UTC)

Are you serious? Do you really believe that the debate is over and junk DNA doesn't exist just because the opinions you prefer to read are against junk? You don't seem to be knowledgeable about this topic. I can help you get up to speed. Read these articles on my blog.

Also, you seem to be genuinely confused about the difference between junk DNA and noncoding DNA. Think of it this way. Genomes can be divided into centromeric DNA and non-centromeric DNA and the junk is located in the non-centromeric DNA. Does that mean we should have an article on non-centromeric DNA where we discuss junk? We can also split the genome between regulatory DNA and non-regulatory DNA but I don't see you calling for an article on non-regulatory DNA where we discuss junk DNA.

The only reason why you favor discussing junk DNA in a article on non-coding DNA is because you think that junk DNA was once defined as non-coding DNA and this article will prove that some non-coding DNA has a function - therefore it is not all junk. That's an extremely biased, and incorrect, view. No knowledgeable scientist ever defended the claim that all noncoding DNA was junk. Do you think we didn't know about noncoding genes, regulatory sequences, and origins of replication back in the 1960s?

Genomes can be separated into functional DNA and junk DNA and that's where the debate is. The non-coding DNA fraction is a heterogeneous mixture of functional elements and junk DNA and it's very confusing to mix them. An article on junk DNA will discuss all of the various functional regions of the genome and how common they are in the human genome. We will see that if you add them all up you only get to about 10% of the genome. The article will discuss the evidence for junk DNA and the arguments against claims for abundant function. None of that is appropriate in an article on non-coding DNA.

It's easy for me to see why there was "edit warring" over a junk DNA article. It's because many of the editors here are opposed to junk DNA so they try to suppress the legitimate scientific debate. You need to recognize that what you are doing here is expressing a very personal and biased opinion about the topic of junk DNA and you are using your position to start edit wars in order to censure any views in favor of junk DNA. Genome42 (talk) 14:49, 16 May 2022 (UTC)


Sunday, May 15, 2022

Describing non-coding DNA on the NIH (USA) National Human Genome Research Institute website

Here's a link to a short podcast on non-coding DNA narrated by Shurjo K. Sen, Program Director, Division of Genome Sciences. This is the complete text.

Non-coding DNA. So I could talk about this one forever because it actually happened to be the part of the genome that I did most of my PhD work in. And there used to be an older and derogatory term called junk DNA, which, thankfully, doesn't get used these days much longer. So really, the thing to keep in mind here that human genome is a vast, vast expanse of nucleotides, 3.3 billion almost. And only a very, very small fraction of that, about 2% actually codes for what we know to be proteins. And so the question is, what really happens with the rest? Is it just there doing nothing? Or does it have a function? And for many years, particularly in the earlier stages of genomics as a field, people were not really sure that the non-coding parts of the genome have a purpose for being there. And now, or I would say over the last decade or so maybe, we are only just starting to realize that there are an immense number of ways in which what we think of as non-coding actually might just have a more subtle way of passing its information along. So it may not code in the classical protein-coding sense. But there is a ton of information crucial in many, many ways that is hidden in this part of the genome.

I wish I could tell you that this is some kind of a spoof but it's not. It's an example of the poor state of science these days and of how much work we need to do to fix it. I would start by firing the Program Director of the Division of Genome Sciences.


Saturday, May 14, 2022

Editing the Wikipedia article on non-coding DNA

I decided to edit the Wikipedia article on non-coding DNA by adding new sections on "Noncoding genes," "Promoters and regulatory sequences," "Centromeres," and "Origins of replication." That didn't go over very well with the Wikipedia police so they deleted the sections on "Noncoding genes" and "Origins of replication." (I'm trying to restore them so you may see them come back when you check the link.)

I also decided to re-write the introduction to make it more accurate but my version has been deleted three times in favor of the original version you see now on the website. I have been threatened with being reported to Wikipedia for disruptive edits.

The introduction has been restored to the version that talks about the ENCODE project and references Nessa Carey's book. I tried to move that paragraph to the section on the ENCODE project and I deleted the reference to Carey's book on the grounds that it is not scientifically accurate [see Nessa Carey doesn't understand junk DNA]. The Wikipedia police have restored the original version three times without explaining why they think we should mention the ENCODE results in the introduction to an article on non-coding DNA and without explaining why Nessa Carey's book needs to be referenced.

The group that's objecting includes Ramos1990, Qzd, and Trappist the monk. (I am Genome42.) They seem to be part of a group that is opposed to junk DNA and resists the creation of a separate article for junk DNA. They want junk DNA to be part of the article on non-coding DNA for reasons that they don't/won't explain.

The main problem is the confusion between "noncoding DNA" and "junk DNA." Some parts of the article are reasonably balanced but other parts imply that any function found in noncoding DNA is a blow against junk DNA. The best way to solve this problem is to have two separate articles: one on noncoding DNA and its functions and another on junk DNA. There has been a lot of resistance to this among the current editors and I can only assume that this is because they don't see the distinction. I tried to explain it in the discussion thread on splitting by pointing out that we don't talk about non-regulatory DNA, non-centromeric DNA, non-telomeric DNA, or non-origin DNA and there's no confusion about the distinction between these parts of the genome and junk DNA. So why do we single out noncoding DNA and get confused?

It looks like it's going to be a challenge to fix the current Wikipedia page(s) and even more of a challenge to get a separate entry for junk DNA.

Here is the warning that I have received from Ramos1990.

Your recent editing history shows that you are currently engaged in an edit war; that means that you are repeatedly changing content back to how you think it should be, when you have seen that other editors disagree. To resolve the content dispute, please do not revert or change the edits of others when you are reverted. Instead of reverting, please use the talk page to work toward making a version that represents consensus among editors. The best practice at this stage is to discuss, not edit-war. See the bold, revert, discuss cycle for how this is done. If discussions reach an impasse, you can then post a request for help at a relevant noticeboard or seek dispute resolution. In some cases, you may wish to request temporary page protection.

Being involved in an edit war can result in you being blocked from editing—especially if you violate the three-revert rule, which states that an editor must not perform more than three reverts on a single page within a 24-hour period. Undoing another editor's work—whether in whole or in part, whether involving the same or different material each time—counts as a revert. Also keep in mind that while violating the three-revert rule often leads to a block, you can still be blocked for edit warring—even if you do not violate the three-revert rule—should your behavior indicate that you intend to continue reverting repeatedly.

I guess that's very clear. You can't correct content to the way you think it should be as long as other editors disagree. I explained the reason for all my changes in the "history" but none of the other editors have bothered to explain why they reverted to the old version. Strange.


Friday, April 15, 2022

Most lncRNAs are junk

A hard-hitting review will be published in Annual Review of Genomics and Human Genetics. It shows that the case for large numbers of functional lncRNAs is grossly exaggerated.

A long-time Sandwalk reader (Ole Kristian Tørresen) alerted me to a paper that's coming out next October in Annual Review of Genomics and Human Genetics. (Thank you, Ole.) The authors of the review are Chris Ponting from the University of Edinburgh (Edinburgh, Scotland, UK) and Wilfried Haerty at the Earlham Institute in Norwich, UK. They have been arguing the case for junk DNA for the past two decades but most of their arguments are ignored. This paper won't be so easy to ignore because it makes the case forcefully and critically reviews all the false claims for function. I'm going to quote a few juicy parts because I know that many of you will not be able to access the preprint.

Friday, April 08, 2022

The structures of centromeres

The new complete human genome sequence gives us a first-time look at the structures of human centromeres.

This is my sixth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

The new long-read and ultra-long-read sequencing techniques have revealed the organization of centromeric regions of human chromosomes. The basic structure of these regions has been known for many years [Centromere DNA] but the overall arrangement of the various repeats and the large-scale organization of the centromere was not clear.

The core functional regions of centromeres consist of multiple copies of tandemly repeated alpha-satellite sequences. These are 171 bp AT-rich sequences that serve as attachment sites for kinetochore proteins. The kinetochore proteins interact with spindle fibers that pull the chromosomes to the opposite ends of a dividing cell. The core region is surrounded by pericentromeric regions containing additional repeats (mostly HSat2 and HSat3). The alpha-satellite repeats take up almost 3% of the genome and the pericentromeric repeats occupy an additional 3%.1 That's why centromeres are a major component of the functional part of the human genome. (Centromeres are classic examples of functional noncoding DNA and knowledgeable scientists have known about them for half a century.2)

Altemose, N., Logsdon, G.A., Bzikadze, A.V., Sidhwani, P., Langley, S.A., Caldas, G.V., Hoyt, S.J., Uralsky, L., Ryabov, F.D., Shew, C.J. et al. (2021) Complete genomic and epigenetic maps of human centromeres. Science 376:56. [doi: 10.1126/science.abl4178]

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

The details of the organization of each centromere aren't important. There's a lot of variation between centromeres on different chromosomes and between specific centromeres in different individuals. The authors looked at the organization of X chromosome centromeres in a variety of different individuals from different parts of the world. As expected, there was considerable variation and, as expected, there was more variation within Africans than in all other populations combined.

It shouldn't come as a surprise to find that the authors want more T2T sequences.

This high degree of satellite DNA polymorphism underlines the need to produce T2T assemblies from genetically diverse individuals, to fully capture the extent of human variation in these regions, and to shed light on their recent evolution.

I really hope the granting agencies don't fall for this. It would be much better to spend the resources on exploring the biological function of splice variants (alternative splicing?) and putative noncoding genes in order to resolve the junk DNA controversy. It would also help to devote some of this money to the proper education of science undergraduates.

The authors claim to have discovered 676 genes and pseudogenes within the centromeres. They claim that this includes 23 protein-coding genes and 141 lncRNA genes. They present evidence that three of these genes might have a function, which means that 161/164 of these "genes" are "putative" genes until we see evidence of function.3


1. It's unlikely that most of this 6% is absolutely required for the proper functioning of the centromeres because there are many individuals with much less centromere DNA. That's why I only attribute about 1% of the genome to functional centromere sequence.

2. Unknowledgeable scientists continue to be shocked when they discover that noncoding DNA can have a biological function. This is because they weren't taught properly as undergraduates.

3. I don't understand why so many scientists are unable to see the difference between a putative gene and a real gene.

Wednesday, April 06, 2022

Genetic variation and the complete human genome sequence

The new complete human genome sequence adds an extra 8% of DNA sequence that's a source of variation in the human population. The sequence also corrects some errors in the current standard reference genome.

This is my fifth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

Tuesday, April 05, 2022

Two different views of the history of molecular biology

How can different molecular biologists have such opposite views of the history of their field?

I'm posting links to two papers without comment. One of them is from my friend and colleague Alex Palazzo and the other is from James Shapiro who is not my friend or colleague. Both papers have been published in reputable peer-reviewed journals.

Transcription activity in repeat regions of the human genome

A detailed examination of the new complete human genome reveals that 54% of it consists of various repetitive elements. Some of them are transcribed and some aren't.

This is my fourth post on the complete telomere-to-telomere sequence of the human genome in cell line CHM13 (T2T-CHM13). There were six papers in the April 1st edition of Science. My posts on all six papers are listed at the bottom of this post.

The fourth paper extends the ENCODE-type analysis of the T2T-CHM13 sequence by focusing on repeats.

Hoyt, S.J., Storer, J.M., Hartley, G.A., Grady, P.G., Gershman, A., de Lima, L.G., Limouse, C., Halabian, R., Wojenski, L., Rodriguez, M. et al. (2021) From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376:57. [doi: 10.1126/science.abk3112]

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

The most useful part of this paper is the complete analysis of all repetitive elements in the T2T-CHM13 genome. This gives us, for the first time, a complete picture of a human genome. The exact values of the various components aren't important because there's considerable variation within the human population but the big picture is informative.

These are the percentages of the human genome occupied by the different classes of repetitive DNA.

  • SINEs 12.8%
  • Retrotransposon 0.15%
  • LINEs 20.7%
  • LTRs 8.8%
  • DNA transposons 3.6%
  • simple repeats 8%

The total comes to 54%. There are other estimates that are higher because of a more lenient cutoff value for sequence similarity, but this gives you a pretty good idea of what the genome looks like. Most of the transposon-related sequence consists of fragments of once-active transposons, so the true selfish DNA capable of transposing makes up only a small fraction of this 54%.
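For anyone who wants to check the arithmetic, here is a minimal Python sketch that adds up the six values in the list above. The variable names are illustrative only, and the numbers are simply the ones quoted in this post, not a fresh analysis of the T2T-CHM13 data.

```python
# Percentages of the genome by repeat class, copied from the list in this post.
# The dictionary name and keys are illustrative labels, not from the paper.
repeat_percentages = {
    "SINEs": 12.8,
    "Retrotransposon": 0.15,
    "LINEs": 20.7,
    "LTRs": 8.8,
    "DNA transposons": 3.6,
    "simple repeats": 8.0,
}

# Sum of all listed classes.
total = sum(repeat_percentages.values())

# Everything except simple repeats is transposon-related sequence.
transposon_related = total - repeat_percentages["simple repeats"]

print(f"Total repetitive fraction: about {round(total)}%")
print(f"Transposon-related fraction: about {round(transposon_related)}%")
```

The sum is a shade over 54%, which matches the rounded total given in the text. Most of it is transposon-related sequence rather than simple repeats.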

Several lines of evidence developed over the past 50 years give us every reason to believe that most of this DNA is junk. However, most of the authors of this paper are reluctant to reach that conclusion, so the paper never mentions that these repetitive sequences might be junk. Instead, the authors concentrate on mapping CpG methylation sites and transcribed regions. They refer to this as "functional annotation" but they don't provide a definition of function.

We provide a high-confidence functional annotation of repeats across the human genome.

As you might expect, the repeat elements that retain vestiges of promoters are often transcribed and this includes adjacent genomic sequences that are found near these promoters (e.g. near LTRs). The long stretches of short tandem repeats (e.g. satellite DNA) do not contain any sequences that resemble promoters so these regions are not transcribed. (The authors seem to be a bit surprised by this result.) Further work is needed to decide how much of this DNA is truly functional and which parts contribute to human uniqueness. Naturally, that will require much more ENCODE-type work and T2T sequencing of other primates.

Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Although we find repeat variants that appear enriched or specific to the human lineage, in the absence of T2T-level assemblies from other primate species, we cannot truly attribute these elements to specific human phenotypes. Thus, the extent of variation described herein highlights the need to expand the effort to create human and nonhuman primate pan-genome references to support exploration of repeats that define the true extent of human variation.

This will cost millions of dollars. I suspect the grant applications have already been sent.