Sandwalk: First complete sequence of a human chromosome

Tuesday, August 27, 2019


A paper announcing the first complete sequence of a human chromosome has recently been posted on the bioRxiv server.

Miga, K. H., Koren, S., Rhie, A., Vollger, M. R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G. A., et al. (2019) Telomere-to-telomere assembly of a complete human X chromosome. bioRxiv, 735928. doi: 10.1101/735928

Abstract: After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38, along with the first gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the ∼2.8 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequence from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE). This complete chromosome X, combined with the ultra-long nanopore data, also allowed us to map methylation patterns across complex tandem repeats and satellite arrays for the first time. These results demonstrate that finishing the human genome is now within reach and will enable ongoing efforts to complete the remaining human chromosomes.

The authors focused their efforts on the X chromosome from a cell line that is effectively haploid: its two sets of chromosomes are identical, so there is only one version of each chromosome. This is important because the missing regions of the chromosomes in the current reference genome consist of long stretches of repetitive DNA, and there is considerable variation in the human population at these sites [see How much of the human genome has been sequenced?]. In diploid cells the two homologues will almost certainly differ in these regions, making it difficult to assign sequenced DNA to the correct chromosome.

In the current release of the human reference genome (GRCh38), the assembled sequence of the X chromosome has three large gaps: one at the centromere (CENX in the figure above) and two at large segmental duplications (DMRTC1 and another near the tip of the long arm). In addition, there are 26 smaller gaps.

The authors used new nanopore sequencing technology to generate ultra-long reads of more than 100,000 bp. These reads are less accurate than the shorter reads that were used to generate the reference genome, but that limitation can be overcome by generating many overlapping reads so that the random errors in individual reads are outvoted in the consensus sequence. In this case, they produced a whole-genome assembly from 39x coverage with ultra-long reads, combined with shorter reads from a previous 70x-coverage dataset, to give an overall accuracy of at least 99.99%.
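The idea that many overlapping reads "cancel out" errors can be made concrete with a toy calculation. The sketch below is not the authors' pipeline. It assumes a simple majority vote at each base, a hypothetical raw error rate of 10% per read, errors that are independent between reads, and (as a worst case) that every wrong read reports the same wrong base. Real nanopore errors are partly systematic rather than independent, which is one reason complementary, more accurate data were added. The function names are made up for illustration.

```python
from math import comb, log10

def majority_error_prob(depth: int, read_error: float) -> float:
    """Chance that a simple majority vote at a single base is wrong.

    Toy model (an assumption, not the authors' method): each read is wrong
    independently with probability `read_error`, all wrong reads report the
    same incorrect base, and ties count as failures.
    """
    # Smallest number of wrong reads that is at least half of the depth.
    min_wrong = (depth + 1) // 2
    return sum(
        comb(depth, k) * read_error**k * (1 - read_error) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )

def phred(accuracy: float) -> float:
    """Convert a per-base accuracy (e.g. 0.9999) to a Phred quality score."""
    return -10 * log10(1 - accuracy)

# Hypothetical 10% raw error rate, chosen only for illustration.
for depth in (1, 5, 15, 39):
    print(depth, majority_error_prob(depth, 0.10))

print(phred(0.9999))  # about Q40, i.e. roughly one error per 10,000 bases
```

With a single read the consensus is wrong 10% of the time; at 39-fold depth the chance that most reads share the same error becomes vanishingly small under these idealized assumptions.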

The result was extensive closure of existing gaps with the exception of the centromeric region. In some chromosomes the only missing sequence was at the centromere. In the case of the X chromosome, there were three large contigs, shown in orange and blue in the figure. One of them spanned the centromere region in the diagram, but this is misleading because that region is collapsed in the reference genome; there is actually a large gap in the top orange contig. The two other gaps, at the junctions of the orange and blue contigs, span the segmental duplications. Note that all the other gaps in the GRCh38 reference genome were closed in the initial assembly.

The two gaps at the segmental duplications were closed by manually assembling the ultra-long reads and confirming the result with data from other techniques. (The assembly software couldn't resolve these regions.)

The centromere region was the major challenge because it consists of about 2.8 Mb (2,800 kb) of highly repetitive satellite DNA containing thousands of copies of α-satellite sequences (about 171 bp each) and other AT-rich repeats that are much shorter [Centromere DNA]. The long sequence reads were correctly aligned and assembled by identifying site-specific single-nucleotide variants and using them as anchors to build a contiguous array. This was the same technique used last year to sequence the centromere of the Y chromosome (Jain et al., 2018).
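Why do single-nucleotide variants work as anchors? In a perfectly repeated array, a read matches equally well at every repeat copy, so there is no way to know where it belongs. A read that spans a rare variant matches in only one place. The toy sketch below illustrates that principle only. It is not the method of Jain et al.: it assumes a single random 171-bp monomer, three hypothetical variant positions, error-free reads, and exact string matching, whereas real reads contain errors and real arrays are far larger. The helper names (`build_array`, `placements`) are invented for this example.

```python
import random

MONOMER_LEN = 171  # approximate alpha-satellite monomer length
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}  # always yields a different base

def build_array(n_copies: int, variant_positions: list[int], seed: int = 42) -> str:
    """Toy tandem array: one random monomer repeated, plus a few SNVs."""
    rng = random.Random(seed)
    monomer = "".join(rng.choice("ACGT") for _ in range(MONOMER_LEN))
    seq = list(monomer * n_copies)
    for pos in variant_positions:
        seq[pos] = SWAP[seq[pos]]  # introduce a site-specific variant
    return "".join(seq)

def placements(array: str, read: str) -> list[int]:
    """Every offset where the read matches the array exactly (overlaps allowed)."""
    hits, start = [], array.find(read)
    while start != -1:
        hits.append(start)
        start = array.find(read, start + 1)
    return hits

# Hypothetical array of 100 monomers with variants at three arbitrary positions.
array = build_array(n_copies=100, variant_positions=[3000, 9000, 15000])

plain_read = array[500:1000]      # contains no variant
anchored_read = array[2800:3300]  # spans the variant at position 3000

print(len(placements(array, plain_read)))  # many equally good positions
print(placements(array, anchored_read))    # a single position: [2800]
```

Chaining reads that share the same anchors, and laying them out in order, is what turns a pile of ambiguous reads into a linear assembly of the array.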

The figure below shows the Y-chromosome centromere to illustrate the complexity of the centromeric region.

The central part of the centromere sequence contains 52 higher-order repeats (HORs) of α-satellite sequence. Each HOR contains about 34 monomers (light blue). In addition, there are three stretches of variant HORs that do not match the more common HOR (purple). The central region is flanked by a pericentromeric region consisting of highly diverged α-satellite sequences (AT-rich DNA, dark blue). This is a typical arrangement for human centromeres, except that the pericentromeric regions are often larger and the HORs contain different numbers of α-satellite monomers. (The most common HOR in the X chromosome is a 12-mer.)
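A little arithmetic puts these numbers in perspective. The sketch below assumes every monomer is exactly 171 bp and, for the X chromosome, treats the whole ~2.8 Mb region as if it were made of the common 12-mer HOR. Both are simplifications (the real array includes variant HORs and other repeats), so the X figures are rough upper bounds, not counts from the paper.

```python
MONOMER_BP = 171  # approximate length of one alpha-satellite monomer

def hor_length(monomers_per_hor: int) -> int:
    """Length in bp of one higher-order repeat, assuming equal-length monomers."""
    return monomers_per_hor * MONOMER_BP

# Y chromosome (figures from the text): 52 HORs of about 34 monomers each.
y_hor = hor_length(34)
y_core = 52 * y_hor

# X chromosome: the most common HOR is a 12-mer; the array is about 2.8 Mb.
X_ARRAY_BP = 2_800_000
x_hor = hor_length(12)
x_hor_copies = X_ARRAY_BP // x_hor     # upper bound if the array were all 12-mer HORs
x_monomers = X_ARRAY_BP // MONOMER_BP  # upper bound on the number of monomers

print(y_hor, y_core)        # 5814 302328
print(x_hor, x_hor_copies)  # 2052 1364
print(x_monomers)           # 16374
```

So each Y-chromosome HOR is roughly 5.8 kb and the core Y array roughly 300 kb, while the X-chromosome array is almost ten times larger and holds on the order of 16,000 monomers.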

The first successful assembly of a complete human chromosome is a significant achievement, but the significance lies not so much in assembling the centromeric region as in closing all the other gaps. In fact, the true significance of the paper is in obtaining high-quality ultra-long sequence reads of 100 kb to more than 1,000 kb and in producing enough of them to achieve an average of 39-fold coverage of the entire genome. The authors conclude that, by some quality metrics, their new genome sequence is better than the current reference standard.
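To see why 39-fold coverage with ultra-long reads is a big deal, here is a back-of-the-envelope sketch. It assumes a haploid human genome of about 3.1 billion bp and uses 100 kb, the low end of the read lengths mentioned above, as a hypothetical mean read length; real read lengths form a broad distribution, so the read count is only an order-of-magnitude estimate. The last function uses the classic Lander-Waterman approximation, which assumes reads land uniformly at random across the genome (real data do not).

```python
import math

GENOME_BP = 3.1e9  # assumed haploid human genome size (about 3.1 billion bp)

def bases_required(coverage: float, genome_bp: float = GENOME_BP) -> float:
    """Total sequenced bases needed for a given average fold-coverage."""
    return coverage * genome_bp

def reads_required(coverage: float, mean_read_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Number of reads of a given mean length needed for that coverage."""
    return bases_required(coverage, genome_bp) / mean_read_bp

def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read.

    Assumes reads are placed uniformly at random, an idealization.
    """
    return math.exp(-coverage)

print(bases_required(39) / 1e9)                  # about 121 billion bp of sequence
print(reads_required(39, mean_read_bp=100_000))  # about 1.2 million 100 kb reads
print(fraction_uncovered(39))                    # vanishingly small
```

In other words, the authors had to generate on the order of a hundred billion bases of ultra-long reads; at that depth essentially every base is covered many times, and the remaining difficulty is repeats, not missing data.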

Image Credit: The drawing of a centromere is from Alberts et al. (2002) Figure 4-50.

Jain, M., Olsen, H. E., Turner, D. J., Stoddart, D., Bulazel, K. V., Paten, B., Haussler, D., Willard, H. F., Akeson, M., and Miga, K. H. (2018) Linear assembly of a human centromere on the Y chromosome. Nature Biotechnology, 36:321-327. doi: 10.1038/nbt.4109

2 comments :

John Harshman said...: Could you explain why the cell line is effectively haploid?; Tuesday, August 27, 2019 5:55:00 PM
Larry Moran said...: The cell line is CHM13hTERT derived from a molar pregnancy (hydatidiform mole). This happens when a sperm fertilizes an egg that has no nucleus. The sperm chromosomes are duplicated as the cells divide so the cell line derived from such a tissue contains two sets of identical chromosomes.

The karyotype of the CHM13 line is 46,XX.

https://sites.google.com/ucsc.edu/t2tworkinggroup/chm13-cell-line; Tuesday, August 27, 2019 6:44:00 PM

