Sandwalk: What do we do with two different human genome reference sequences?

Sunday, April 03, 2022

It's going to be extremely difficult, perhaps impossible, to merge the new complete human genome sequence with the current standard reference genome.

The source DNA for the new telomere-to-telomere (T2T) human genome sequence was a cell line derived from a molar pregnancy. This meant that the DNA was essentially haploid, thus avoiding the complications of sequencing diploid DNA which contains two highly similar but different genomes. The cell line, CHM13, lacks a Y chromosome but that's trivial since a complete T2T sequence of a Y chromosome will soon be published and it can be added to the T2T-CHM13 genome sequence [Telomere-to-telomere sequencing of a complete human genome].

The current standard reference genome is GRCh38.p13 (Feb. 28, 2019). It is a virtual sequence derived from a number of individuals living near Buffalo, New York. Since publication of the first human genome sequences, there have been thousands of others and none of them matches the standard reference genome because of polymorphic SNPs and various deletions and insertions (indels). Some of these deletions and insertions can be very large (e.g. segmental duplications) so that no two human genome sequences are identical or even the same size (other than those of identical twins).

None of this is new. We all know that the standard reference genome is just that, a reference genome. We all know that there's a huge amount of variation between individuals; it's exactly what you expect for a dynamic genome where most changes, including deletions and insertions, are not restrained by purifying selection. It's good evidence that most of our genome is junk.

A lot of this variation at the level of SNPs and short indels can be handled by annotating the standard reference genome. Larger insertions and deletions, and chromosomal rearrangements, require a supplemental database that can be linked to the standard reference genome. This is one way to deal with the "pangenome"—the complete sequences of every known genome.
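To make the idea of annotating a reference concrete, here is a minimal Python sketch. It assumes a toy ten-base reference and three made-up variant records (position, reference allele, alternate allele), loosely in the spirit of how variant files describe SNPs and short indels. The names REFERENCE, variants, and apply_variants are hypothetical and exist only for this illustration.

```python
# Toy reference sequence (hypothetical, for illustration only).
REFERENCE = "ACGTACGTAC"

# Each record: (0-based position, reference allele, alternate allele).
variants = [
    (2, "G", "T"),    # SNP: G replaced by T
    (5, "C", "CTT"),  # short insertion: TT added after the C
    (8, "AC", "A"),   # short deletion: the final C is lost
]

def apply_variants(reference, variants):
    """Rebuild an individual's sequence from the reference plus its variants."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        # Every record must agree with the reference at its stated position.
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged stretch
        pieces.append(alt)                    # the variant allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # remainder of the reference
    return "".join(pieces)

individual = apply_variants(REFERENCE, variants)
print(individual, len(REFERENCE), len(individual))
```

Even in this tiny case the individual's sequence ends up a different length from the reference. That is fine as long as every record is expressed in reference coordinates, but it only works for changes small enough to be described that way.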

But this isn't as easy as it seems and it's especially complicated with the new complete sequence. A good discussion of the problem with integrating the T2T-CHM13 assembly can be found in a short essay by Deanna Church in the same issue of Science that contains the new sequence papers (Church, 2022). The new assembly corrects some errors in the GRCh38 assembly and adds an extra 8% of the genome. What that means is that the extensive annotation in the standard reference genome can't be easily transferred to the T2T-CHM13 assembly because, for one thing, the numbering of the bases is very different. In addition, there are extra sequences in T2T-CHM13 that have to be annotated. If they are duplications then you have to figure out which copy corresponds to the GRCh38 version and that's going to take time. Church shows some of the issues in a figure.
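The numbering problem can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real liftover tool works: the alignment between an "old" and a "new" assembly is reduced to two hypothetical blocks, with 50 bases of new sequence inserted after old position 100. BLOCKS, lift, lift_interval, and the gene names are all invented for the example.

```python
from bisect import bisect_right

# Hypothetical alignment blocks: (old_start, new_start, length), 0-based.
# The new assembly has 50 extra bases inserted after old position 100.
BLOCKS = [
    (0, 0, 100),      # first 100 bases line up unchanged
    (100, 150, 200),  # everything after is shifted by the 50 new bases
]

def lift(pos, blocks=BLOCKS):
    """Map an old coordinate to the new assembly, or None if it is unaligned."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if pos >= old_start + length:
        return None
    return new_start + (pos - old_start)

def lift_interval(start, end, blocks=BLOCKS):
    """Lift a half-open annotation [start, end); None if either end fails to map."""
    new_start = lift(start, blocks)
    new_last = lift(end - 1, blocks)
    if new_start is None or new_last is None:
        return None
    return (new_start, new_last + 1)

# Hypothetical annotations in old-assembly coordinates.
genes = {"geneA": (10, 60), "geneB": (120, 180), "geneC": (90, 110)}
lifted = {name: lift_interval(*iv) for name, iv in genes.items()}
print(lifted)
```

In this sketch geneA keeps its coordinates, geneB simply shifts by 50, and geneC, which straddles the inserted sequence, grows from 20 bases to 70 and would need a human to look at it. The toy model doesn't even attempt the harder case described above, where the new sequence is a duplication and someone has to decide which copy the old annotation belongs to.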

Keep in mind that the sequence of the standard reference genome is important but the annotation is equally important. A lot of genomics work relies on accurate annotation of coding regions, regulatory sequences, transposons, origins of replication, and a host of other markers. The reference genome is routinely scanned to extract this information. Genome-wide association studies (GWAS) rely on the annotation. This annotation is the product of 22 years of work by hundreds of scientists and it's not going to be easy to extend it to the T2T-CHM13 genome.

Epigenetic markers in the last 8% of the human genome sequence

Segmental duplications in the human genome

Church, D.M. (2022) A next-generation human genome sequence. Science 376:34-35. [doi: 10.1126/science.abo5367]
