Showing posts with label Genome. Show all posts

Wednesday, February 10, 2021

The 20th anniversary of the human genome sequence: 4. Functional DNA in our genome

We know a lot more about the human genome than we did when the draft sequences were published 20 years ago. One of the most important discoveries is the recognition and extent of true functional sequences in the genome. Genes are one example of such functional sequence but only a minor component (about 1.4%). Most of the functional regions of the genome are not genes.

Here's a list of functional DNA in our genome other than the functional part of genes.

  • Centromeres: There are 24 different centromeres and the average size is four million base pairs. Most of this is repetitive DNA and it adds up to about 3% of the genome. The total amount of centromeric DNA ranges from 2%-10% in different individuals. It's unlikely that all of the centromeric DNA is essential; about 1% seems to be a good estimate.
  • Telomeres: Telomeres are repetitive DNA sequences at the ends of chromosomes. They are required for the proper replication of DNA and they take up about 0.1% of the genome sequence.
  • Origins of replication: DNA replication begins at origins of replication. The size of each origin has not been established with certainty but it's safe to assume that 100 bp is a good estimate. There are about 100,000 origin sequences but it's unlikely that all of them are functional or necessary. It's reasonable to assume that only 30,000 - 50,000 are real origins and that means 0.3% of the genome is devoted to origins of replication.
  • Regulatory sequences: The transcription of every gene is controlled by sequences that lie outside of the genes, usually at the 5′ end. The total amount of regulatory sequence is controversial but it seems reasonable to assume about 200 bp per gene for a total of five million bp or less than 0.2% of the genome (0.16%). The most extreme claim is about 2,400 bp per gene or 1.8% of the genome.
  • Scaffold attachment regions (SARs): Human chromatin is organized into about 100,000 large loops. The base of each loop consists of particular proteins bound to specific sequences called anchor loop sequences. The nomenclature is confusing; the original term (SAR) isn't as popular today as it was 35 years ago but that doesn't change the fact that about 0.3% of the genome is required to organize chromatin.
  • Transposons: Most of the transposon-related sequences in our genome are just fragments of defective transposons but there are a few active ones. They account for only a tiny fraction of the genome.
  • Viruses: Functional virus DNA sequences account for less than 0.1% of the genome.

If you add up all the functional DNA from this list, you get to somewhere between 2% and 3% of the genome.
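The arithmetic behind that total can be checked with a short sketch. The percentages are the estimates given in the list above. The genome size of 3.2 Gb and the count of 25,000 genes (the number implied by 200 bp per gene adding up to five million bp) are assumptions made for illustration, and the two categories described only as "tiny" or "less than 0.1%" are entered as rough bounds.

```python
# Rough tally of functional DNA outside of genes, using the estimates in the list.
GENOME_BP = 3_200_000_000   # assumed genome size (3.2 Gb)
GENE_COUNT = 25_000         # assumption: implied by 200 bp/gene totalling ~5 million bp


def percent_of_genome(bp):
    """Convert a number of base pairs into a percentage of the genome."""
    return 100.0 * bp / GENOME_BP


# Regulatory sequences: 200 bp per gene (the estimate preferred in the list).
regulatory = percent_of_genome(200 * GENE_COUNT)

# Percentages taken directly from the list.
other_categories = {
    "centromeres": 1.0,   # "about 1% seems to be a good estimate"
    "telomeres": 0.1,
    "origins": 0.3,
    "SARs": 0.3,
    "transposons": 0.0,   # "tiny fraction", treated here as negligible
    "viruses": 0.1,       # upper bound: the list says "less than 0.1%"
}

total = regulatory + sum(other_categories.values())
print(f"regulatory: {regulatory:.2f}% of the genome")
print(f"total:      {total:.2f}% of the genome")
```

With these inputs the total lands close to 2%, the lower end of the range quoted above. Swapping in larger estimates for any category pushes the total toward the upper end.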


Image credit: Wikipedia.

Monday, February 08, 2021

The 20th anniversary of the human genome sequence: 3. How many genes?

This week marks the 20th anniversary of the publication of the first drafts of the human genome sequence. Science chose to celebrate the achievement with a series of articles that had little to say about the scientific discoveries arising out of the sequencing project; one of the articles praised the openness of sequence data without mentioning that the journal had violated its own policy on openness by publishing the Celera sequence [The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science].

I've decided to post a few articles about the human genome beginning with one on finishing the sequence. In this post I'll summarize the latest data on the number of genes in the human genome.

Saturday, February 06, 2021

The 20th anniversary of the human genome sequence: 2. Finishing the sequence

It's been 20 years since the first drafts of the human genome sequence were published. These first drafts from the International Human Genome Project (IHGP) and Celera were far from complete. The IHGP sequence covered about 82% of the genome and it contained about 250,000 gaps and millions of sequencing errors.

Celera never published an updated sequence but IHGP published a "finished" sequence in October 2004. It covered about 92% of the genome and had "only" 300 gaps. The error rate of the finished sequence was down to 10⁻⁵.

International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931-945. doi: 10.1038/nature03001

We've known for many decades that the correct size of the human genome is close to 3,200,000 kb or 3.2 Gb. There isn't a more precise number because different individuals have different amounts of DNA. The best average estimate was 3.286 Gb based on the sequence of 22 autosomes, one X chromosome, and one Y chromosome (Morton 1991). The amount of actual nucleotide sequence in the latest version of the reference genome (GRCh38.p13) is 3,110,748,599 bp and the estimated total size is 3,272,116,950 bp based on estimating the size of the remaining gaps. This means that 95% of the genome has been sequenced. [see How much of the human genome has been sequenced? for a discussion of what's missing.]
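The 95% figure follows directly from the two GRCh38.p13 numbers quoted above. Here is a minimal sketch of the calculation; the variable names are just illustrative.

```python
# Fraction of the reference genome that has actually been sequenced,
# using the GRCh38.p13 figures quoted in the text.
sequenced_bp = 3_110_748_599        # actual nucleotide sequence
estimated_total_bp = 3_272_116_950  # sequence plus estimated size of the gaps

gap_bp = estimated_total_bp - sequenced_bp
percent_sequenced = 100.0 * sequenced_bp / estimated_total_bp

print(f"estimated gaps: {gap_bp:,} bp")
print(f"sequenced:      {percent_sequenced:.1f}%")
```

The difference between the two numbers, roughly 160 million bp, is the estimated amount of sequence still sitting in gaps.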

Recent advances in sequencing technology have produced sequence data covering the repetitive regions in the gaps and the first complete sequence of a human chromosome (X) was published in 2019 [First complete sequence of a human chromosome]. It's now possible to complete the human genome reference sequence by sequencing at least one individual but I'm not sure that the effort and the expense are worth it.


Image credit: the figure is from Miga et al. (2019)

Miga, K.H., Koren, S., Rhie, A., Vollger, M.R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G.A. et al. (2019) Telomere-to-telomere assembly of a complete human X chromosome. Nature 585:79-84. [doi: 10.1038/s41586-020-2547-7]

Morton, N.E. (1991) Parameters of the human genome. Proceedings of the National Academy of Sciences 88:7474-7476. [doi: 10.1073/pnas.88.17.7474]

The 20th anniversary of the human genome sequence: 1. Access to the data and the complicity of Science

The first drafts of the human genome sequence were published 20 years ago. The paper from the International Human Genome Project (IHGP) was published in Nature on February 15, 2001 and the paper from Celera was published in Science on February 16, 2001.

The original agreement was to publish both papers in Science but IHGP refused to publish their sequence in that journal when it chose to violate its own policy by allowing Celera to restrict access to its data. I highly recommend James Shreeve's book The Genome War for the history behind these publications. It paints an accurate, but not pretty, picture of science and politics.

Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A. and Sougnez, C. (2001) Initial sequencing and analysis of the human genome. Nature 409:860-921. doi: 10.1038/35057062

Venter, J., Adams, M., Myers, E., Li, P., Mural, R., Sutton, G., Smith, H., Yandell, M., Evans, C., Holt, R., Gocayne, J., Amanatides, P., Ballew, R., Huson, D., Wortman, J., Zhang, Q., Kodira, C., Zheng, X., Chen, L., Skupski, M., Subramanian, G., Thomas, P., Zhang, J., Gabor Miklos, G., Nelson, C., Broder, S., Clark, A., Nadeau, J., McKusick, V. and Zinder, N. (2001) The sequence of the human genome. Science 291:1304 - 1351. doi: 10.1126/science.1058040

Friday, August 07, 2020

Alan McHughen defends his views on junk DNA

Alan McHughen is the author of a recently published book titled DNA Demystified. I took issue with his stance on junk DNA [More misconceptions about junk DNA - what are we doing wrong?] and he has kindly replied to my email message. Here's what he said ...

Thursday, August 06, 2020

More misconceptions about junk DNA - what are we doing wrong?

I'm actively following the views of most science writers on junk DNA to see if they are keeping up on the latest results. The latest book is DNA Demystified by Alan McHughen, a molecular geneticist at the University of California, Riverside. It's published by Oxford University Press, the same publisher that published John Parrington's book The Deeper Genome. Parrington's book was full of misleading and incorrect statements about the human genome so I was anxious to see if Oxford had upped its game.1, 2

You would think that any book with a title like DNA Demystified would contain the latest interpretations of DNA and genomes, especially with a subtitle like "Unraveling the double Helix." Unfortunately, the book falls far short of its objectives. I don't have time to discuss all of its shortcomings so let's just skip right to the few paragraphs that discuss junk DNA (p.46). I want to emphasize that this is not the main focus of the book. I'm selecting it because it's what I'm interested in and because I want to get a feel for how correct and accurate scientific information is, or is not, being accepted by practicing scientists. Are we falling for fake news?

Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think with such an introduction that you would be about to learn how much of the genome is functional according to ENCODE 3 but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain: the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things in 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work but the ENCODE Consortium has never been good at showing us how science is supposed to work.

Note: I've looked at some of the papers to try and find out if ENCODE stands by its previous claim that most of the genome is functional but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.

Thursday, July 09, 2020

Structure and expression of the SARS-CoV-2 (coronavirus) genome


Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was accepted on Jan. 20th and it appeared in the Feb. 3rd issue of Nature (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.1

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10⁻⁶ per nucleotide per replication). The original isolate has a total length of 29,891 nt not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.
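To see why the isolates are so similar, it helps to multiply the two numbers given above. This is only a back-of-the-envelope sketch: it assumes that the quoted mutation rate applies uniformly and independently to every nucleotide, which is a simplification.

```python
# Expected number of new mutations per genome per replication,
# using the figures quoted in the text.
mutation_rate = 1e-6   # per nucleotide per replication (approximate)
genome_length = 29_891 # nt, original isolate, not counting the poly(A) tail

# Simplifying assumption: every site mutates independently at the same rate.
expected_mutations = mutation_rate * genome_length
replications_per_mutation = 1 / expected_mutations

print(f"expected mutations per replication: {expected_mutations:.3f}")
print(f"about one mutation every {replications_per_mutation:.0f} replications")
```

Under these assumptions a single replication usually copies the genome without any change, which is consistent with the high similarity among sequenced isolates.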

Wednesday, July 08, 2020

Where did your chicken come from?

Scientists have sequenced the genomes of modern domesticated chickens and compared them to the genomes of various wild pheasants in southern Asia. It has been known for some time that chickens resemble a species of pheasant called red jungle fowl and this led Charles Darwin to speculate that chickens were domesticated in India. Others have suggested Southeast Asia or China as the site of domestication.

The latest results show that modern chickens probably descend from a subspecies of red jungle fowl that inhabits the region around Myanmar (Wang et al., 2020). The subspecies is Gallus gallus spadiceus and the domesticated chicken subspecies is Gallus gallus domesticus. As you might expect, the two subspecies can interbreed.

The authors looked at a total of 863 genomes of domestic chickens, four species of jungle fowl, and all five subspecies of red jungle fowl. They identified a total of 33.4 million SNPs, which were enough to genetically distinguish between the various species AND the subspecies of red jungle fowl. (Contrary to popular belief, it is quite possible to assign a given genome to a subspecies (race) based entirely on genetic differences.)

The sequence data suggest that chickens were domesticated from wild G. g. spadiceus about 10,000 years ago in the northern part of Southeast Asia. The data also suggest that modern domesticated chickens (G. g. domesticus) from India, Pakistan, and Bangladesh interbred with another subspecies of red jungle fowl (G. g. murghi) after the original domestication. These chickens from South Asia contain substantial contributions from G. g. murghi ranging from 8-22%.

Next time you serve chicken, if someone asks you where it came from you won't be lying if you say it came from Myanmar.


Image credits: BBQ chicken, Creative Commons License [Chicken BBQ]
Red Jungle Fowl, Creative Commons License [Red_Junglefowl_-Thailand]
Map: Lawler, A. (2020) Dawn of the chicken revealed in Southeast Asia. Science 368:1411.

Wang, M., Thakur, M., Peng, M. et al. (2020) 863 genomes reveal the origin and domestication of chicken. Cell Res [doi: 10.1038/s41422-020-0349-y]

Saturday, April 18, 2020

Three scientists discuss junk DNA

I just found this video that was posted to YouTube in May 2019. It's produced by the University of California and it features three researchers discussing the question, "Is Most of Your DNA Junk!" The three scientists are:
  • Rusty Gage, a neuroscientist at the Salk Institute
  • Alysson Muotri, who studies brain development at the University of California, San Diego
  • Miles Wilkinson, who studies neuronal and germ cell development at the University of San Diego
None of them appears to be an expert on genomes or junk DNA. One of them (Wilkinson) seems to have some knowledge of the evidence for junk DNA, although many of his explanations are garbled. What's interesting is that they emphasize the fact that some transposon-related sequences are expressed in some cells and they rely on this fact to remain skeptical of junk DNA. They also propose that excess DNA might be present in order to ensure diversity and prepare for future evolution. All three seem to be comfortable with the idea that excess DNA may be protecting the rest of the functional genome.

This is a good example of what we are up against when we try to convince scientists that most of our genome is junk.
Friday, February 07, 2020

The Function Wars Part VI: The problem with selected effect function

The term "Function Wars" refers to the debate over the meaning of 'function,' especially in the context of junk DNA.1 That debate intensified in 2012 after the ENCODE publicity campaign that tried to redefine function to mean anything they want as long as it refutes junk DNA. This is the sixth in a series of posts exploring the debate and why it's important, or not. Links to the other five posts can be found at the bottom of this post.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)
Much of the discussion seems like quibbling over semantics but I'm reminded of a similar debate over the mode of evolution: is it gradual or punctuated? As Gould pointed out in 1982, there's a serious issue underlying the debate—an issue that shouldn't get lost in bickering over the meaning of 'gradualistic.' The same warning applies here. It's important to determine how much of the human genome is junk and that requires an understanding of what we mean by junk DNA. However, it's easy to get distracted by focusing on the exact meaning of the word 'function' instead of looking at the big picture.

Friday, January 31, 2020

lncRNA nonsense from Los Alamos

A group of scientists at the Los Alamos National Laboratory (Los Alamos, NM, USA) and their collaborators in Vienna (Austria) and Lethbridge (Alberta, Canada) have worked out the structure of Braveheart lncRNA from mice.
Kim, D.N., Thiel, B.C., Mrozowich, T., Hennelly, S.P., Hofacker, I.L., Patel, T.R., Sanbonmatsu, K.Y. (2020) Zinc-finger protein CNBP alters the 3-D structure of lncRNA Braveheart in solution. Nat. Commun. 11:148 [doi: 10.1038/s41467-019-13942-4]
The authors point out in their paper that lncRNAs are difficult to work with and the 3D structures of only a small number have been characterized. There's nothing in the paper about the problems associated with determining the functions of lncRNAs and nothing about the number of lncRNAs except for this brief opening statement: "Long non-coding RNAs (lncRNAs) constitute a significant fraction of the transcriptome ..."

Wednesday, January 08, 2020

Are pseudogenes really pseudogenes?

There are many junk DNA skeptics who claim that most of our genome is functional. Some of them have even questioned whether pseudogenes are mostly junk. The latest challenge comes from a recent review in Nature Reviews Genetics where the authors try to place the burden of proof on those who say that pseudogenes are broken, nonfunctional genes (Cheetham et al., 2019). The authors of the review try to make the case that we should not label a DNA sequence as a pseudogene until we can prove that it is truly nonfunctional junk.

I'm about to refute this ridiculous stance but first we need a little background.

Friday, December 13, 2019

The "standard" view of junk DNA is completely wrong

I was browsing the table of contents of the latest issue of Cell and I came across this ....
For decades, the miniscule protein-coding portion of the genome was the primary focus of medical research. The sequencing of the human genome showed that only ∼2% of our genes ultimately code for proteins, and many in the scientific community believed that the remaining 98% was simply non-functional “junk” (Mattick and Makunin, 2006; Slack, 2006). However, the ENCODE project revealed that the non-protein coding portion of the genome is copied into thousands of RNA molecules (Djebali et al., 2012; Gerstein et al., 2012) that not only regulate fundamental biological processes such as growth, development, and organ function, but also appear to play a critical role in the whole spectrum of human disease, notably cancer (for recent reviews, see Adams et al., 2017; Deveson et al., 2017; Rupaimoole and Slack, 2017).

Slack, F.J. and Chinnaiyan, A.M. (2019) The Role of Non-coding RNAs in Oncology. Cell 179:1033-1055 [doi: 10.1016/j.cell.2019.10.017]
Cell is a high-impact, refereed journal so we can safely assume that this paper was reviewed by reputable scientists. This means that the view expressed in the paragraph above did not raise any alarm bells when the paper was reviewed. The authors clearly believe that what they are saying is true and so do many other reputable scientists. This seems to be the "standard" view of junk DNA among scientists who do not understand the facts or the debate surrounding junk DNA and pervasive transcription.

Here are some of the obvious errors in the statement.
  1. The sequencing of the human genome did NOT show that only ~2% of our genome consisted of coding region. That fact was known almost 50 years ago and the human genome sequence merely confirmed it.
  2. No knowledgeable scientist ever thought that the remaining 98% of the genome was junk—not in 1970 and not in any of the past fifty years.
  3. The ENCODE project revealed that much of our genome is transcribed at some time or another but it is almost certainly true that the vast majority of these low-abundance, non-conserved, transcripts are junk RNA produced by accidental transcription.
  4. The existence of noncoding RNAs such as ribosomal RNA and tRNA was known in the 1960s, long before ENCODE. The existence of snoRNAs, snRNAs, regulatory RNAs, and various catalytic RNAs was known in the 1980s, long before ENCODE. Other RNAs such as miRNAs, piRNAs, and siRNAs were well known in the 1990s, long before ENCODE.
How did this false view of our genome become so widespread? It's partially because of the now highly discredited ENCODE publicity campaign orchestrated by Nature and Science but that doesn't explain everything. The truth is out there in peer-reviewed scientific publications but scientists aren't reading those papers. They don't even realize that their standard view has been seriously challenged. Why?


Tuesday, September 24, 2019

How many protein-coding genes in the human genome? (2)

It's difficult to know how many protein-coding genes there are in the human genome because there are several different ways of counting and the counts depend on what criteria are used to identify a gene. Last year I commented on a review by Abascal et al. (2018) that concluded there were somewhere between 19,000 and 20,000 protein-coding genes. Those authors discussed the problems with annotation and pointed out that the major databases don't agree on the number of genes [How many protein-coding genes in the human genome?].

Tuesday, August 27, 2019

First complete sequence of a human chromosome

A paper announcing the first complete sequence of a human chromosome has recently been posted on the bioRxiv server.

Miga, K. H., Koren, S., Rhie, A., Vollger, M. R., Gershman, A., Bzikadze, A., Brooks, S., Howe, E., Porubsky, D., Logsdon, G. A., et al. (2019) Telomere-to-telomere assembly of a complete human X chromosome. bioRxiv, 735928. [doi: 10.1101/735928]

Abstract: After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38, along with the first gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the ∼2.8 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequence from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE). This complete chromosome X, combined with the ultra-long nanopore data, also allowed us to map methylation patterns across complex tandem repeats and satellite arrays for the first time. These results demonstrate that finishing the human genome is now within reach and will enable ongoing efforts to complete the remaining human chromosomes.

Sunday, August 25, 2019

How much of the human genome has been sequenced?

It's been more than seven years since I posted information on how much of the human genome has been sequenced [How Much of Our Genome Is Sequenced?]. At that time, the latest version of the human reference genome was GRCh37.p7 (Feb. 3, 2012) and 89.6% of the genome had been sequenced. It's time to update that information.

We have a pretty good idea of the size of the human genome based on quantitative Feulgen staining (1940-1980) and reassociation kinetic experiments from the 1970s (Morton, 1991). We can safely assume that the correct size of the human genome is close to 3,200,000,000 bp (3,200,000 kb, 3,200 Mb, 3.2 Gb) [How Big Is the Human Genome?]. That's the value cited most often in the literature. However, the actual values calculated by Morton (1991) were 3.227 Gb for the haploid female genome and less than that for the haploid male genome. The human reference genome contains all 22 autosomes plus one copy of the X chromosome and one copy of the Y chromosome. This gives a total of 3.286 Gb.

Monday, February 04, 2019

What is the dominant view of junk DNA?

I think that about 90% of our genome is junk and I know lots of other scientists who feel the same way. I'm pretty sure that this view is not shared by the majority of scientists but I don't know whether they are convinced that most of our genome is functional or whether they just think the question is unanswerable at the present time. I suspect that the latter view is more common but I'd like to hear your opinion.

Sunday, January 27, 2019

Yeast loses its introns

Baker’s yeast (Saccharomyces cerevisiae) is one of the best studied eukaryotes. Its genome is just slightly larger than the largest bacterial genome and it was the first eukaryotic genome to be sequenced (Mewes et al., 1997). It has about 7000 genes in total and 6,604 of these genes are protein-coding genes but only 280 of these genes contain introns.1 The rest have lost their introns over the course of several hundred million years of evolution (Hooks et al., 2014).

We know that introns have been lost in yeast because the genes of related species have lots of introns. The common ancestor of all fungi undoubtedly had genes with multiple introns because the available evidence indicates that introns invaded eukaryotic genes very early in the evolution of eukaryotes. The fact that most introns have been purged from the yeast genome suggests that introns are not essential for gene function. In other words, introns are mostly junk.2

Saturday, December 08, 2018

The persistent myth of alternative splicing

I'm convinced that widespread alternative splicing does not occur in humans or in any other species. It's true that the phenomenon exists but it's restricted to a small number of genes—probably fewer than 1000 genes in humans. Most of the unusual transcripts detected by modern technology are rare and unstable, which is consistent with the idea that they are due to splicing errors. Genome annotators have rejected almost all of those transcripts.

You can see links to my numerous posts on this topic at: Alternative splicing and the gene concept and Are splice variants functional or noise?.