Sunday, February 10, 2008
A Case of Plagiarism
The blogosphere is all atwitter about the publication of a paper titled "Mitochondria, the missing link between body and soul: Proteomic prospective evidence". This is the train wreck of a paper that PZ Myers blogged about a few days ago [What Happened to the "Peers" on this Paper?].
Everyone needs to know that the contents of this paper were not only stupid but also plagiarized. The authors couldn't even come up with their own words to explain their silly ideas. For the latest additions to a long list of stolen paragraphs see Commentary: Neither buried nor treasure.
The guilty journal is Proteomics. The editors are not blameless.
The Streisand Effect in Action
I mentioned the Streisand Effect a few days ago. Here's a perfect example of how it works.
ThePolitic.com is one of those Canadian blogs written by someone (Matthew) who embarrasses my country. Matthew writes,
It’s no secret that many of us liberty and/or family-minded folks are great fans of The National Post which officially only competes with the Globe and Mail but realistically also occupies reality that the Toronto Star and Toronto Sun covet. I personally began subscribing to the Post after graduation not because it had a host of right-wing commentators (the Toronto Sun can also claim this), but because the paper took the mission of presenting all view points seriously by often welcoming guest columnists who would attack its editorials, or by presenting series like the one they did two weeks ago on abortion, where a dozen commentators would weigh in on the issue with intelligent, but different viewpoints.
This led me to great sadness today when I went onto their website to read the digital version of the paper. The front cover was just a large cartoon title that said “The Love & Sex Issue” which is tastefully questionable in itself for a national newspaper, but if you look at the picture itself, it also contains the drawing of two nude people behind the “x” which those of us familiar with Japanese pop-culture would classify as hentai. Half of the main section contained articles which were more at home in a Penthouse issue and the Post’s website contains video content that I dare not look at but is clearly part of the above-mentioned theme.
I have since called the Post’s office and dealt with a nice young chap who will be passing along my complaints (the Post is good at responding to these), but in the mean time, I invite everyone else who is disturbed by this extremely poor lack in judgment to write or phone to the Post’s editorial staff:
For more information about the Love & Sex issue go to the National Post website [Love & Sex Issue].
Warning: There may be
Matthew should just be thankful that this topic wasn't covered in one of the left-wing newspapers like the Toronto Star. He probably would have had to leave the country for a few days to avoid the newspaper boxes.
[Hat Tip: Canadian Cynic]
God Only Knows
I found a new Canadian blog called Stony_Curtis: Pop Culture, Guys, Food, Montreal. With a title like that, don't you just have to click on the link?
One of the first things I stumbled across was the video below of the Beach Boys singing "God Only Knows" from the Pet Sounds album. I don't know who Stony Curtis is but I like him already. (P.S. Somehow the song doesn't seem quite as innovative and moving as it was 40 years ago. I wonder why.)
The Meaning of Consensus
What does Stephen Harper mean when he uses the word "consensus" as in,
The Conservative government will not extend Canada's combat mission in Afghanistan beyond February 2009 without a consensus in Parliament, Prime Minister Stephen Harper said Friday.
"I will want to see some degree of consensus among Canadians on how we move forward on that," Harper told reporters Friday in Ottawa.
Canadian Cynic has the answer. [Hint: Harper doesn't mean what you think he means.]
Saturday, February 09, 2008
Junk in Your Genome: Intron Size and Distribution
In the comments to Junk in Your Genome: Protein-Encoding Genes martinc asks,
Larry, if the amount of necessary sequences within introns are as small as you suggest wouldn't this allow us to make a prediction. Couldn't we predict that due to drift there should be very little similarity in intron lengths between different species. If, by any chance, there is similarity then what would your explanation be?
There have been quite a few studies of average intron size in various species. I selected a number for the average size of introns from Hong et al. (2006). The average intron size, according to them, is 3,479 bp in coding regions. This value is a little deceptive since there are a small number of huge introns that make the average quite large. The median value is 1,334 bp, or less than half the average value.
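To see why a handful of huge introns drags the average so far above the median, here is a minimal sketch in Python. The intron lengths are made up for illustration (they are not data from Hong et al.); the only point is the shape of the distribution, with many short introns and a few very long ones.

```python
from statistics import mean, median

# Hypothetical intron lengths in bp (illustrative only, NOT data from Hong et al.).
# Most introns are short; two are huge.
intron_lengths = [90, 110, 150, 200, 300, 400, 1200, 2500, 15000, 40000]

avg = mean(intron_lengths)    # pulled upward by the two huge introns
med = median(intron_lengths)  # barely affected by them

print(f"mean   = {avg:.0f} bp")
print(f"median = {med:.0f} bp")
```

In this toy data set the two largest introns account for most of the total length, so the mean says very little about what a typical intron looks like.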
I suggested that much of the intron sequences were junk. Martinc's question is quite reasonable but in order to get an answer we need to look more closely at the distribution of introns.
The figure shows the distribution of intron sizes in four species: the flowering plant Arabidopsis thaliana; the fruit fly Drosophila melanogaster; human, and mouse. The data is from Hong et al. (2006, Fig.1).
Note that the distribution in Arabidopsis and Drosophila is very tight. Both of these species have relatively compact genomes compared to mammals. The data strongly suggests that the minimum intron size is about 80 bp.
The distributions in the human and mouse genomes are very different. There is a strong peak at 100 bp—this is similar to the peaks in other species. But unlike other species, mammalian introns can be extremely large, giving rise to a long tail of the distribution extending to 10,000 bp or more. The key question is whether this distribution of long introns is noise or an artifact of gene prediction algorithms, or whether it represents a real phenomenon.
Returning to martinc's question, if we look at well-conserved genes in different species what we find is some variation in intron length but only around a mean of about 100-400 bp. In other words, in genes that have been closely examined, where the protein product is known, the distribution of intron sizes looks a lot more like the distribution in Arabidopsis and Drosophila.
Let's look at the hsp90 genes. These are the genes that encode Hsp90, the protein that SciPhu was blogging about [Hsp90 and Evolution].
I've picked the zebrafish gene and four mammalian genes to illustrate the variation in intron length. (Blue exons are 5′ and 3′ UTRs.) Most of the introns are between 80 and 400 bp in size but there are a few exceptions. In this case the human gene is the exception; it has two huge introns at the 5′ end of the gene.
What we see is a narrow distribution of intron lengths in most cases and a few huge introns. It isn't surprising that the lengths of introns in different species are quite similar.
Let's look at my favorite gene. HSPA8 is the cytoplasmic version of the chaperone HSP70 multigene family.
We see a similar pattern. Most intron lengths are very similar in different species, suggesting selection for introns in the 100-400 bp range. There are exceptions, as we see in the chimpanzee, monkey and dog genes. All three have large introns at either the 5′ or 3′ ends. The large monkey introns are 10,253 bp and 1,007 bp. The large chimpanzee intron is 13,257 bp in length. This is typical. I think it's very likely that the large introns separating noncoding exons are artifacts.
So here's the complete answer to the question posed at the top of the page. I think there's selection to maintain intron sizes within a fairly narrow range of 100-400 bp. Because of this, we expect to see similar intron sizes in different species. On occasion we discover a huge intron that is peculiar to one species. This intron could be a transient expansion that hasn't been reduced yet, or it could be an artifact.
Incidentally, while retrieving these sequences from Entrez Gene I noticed that the annotators have eliminated all splice variants for HSP90 and HSPA8 genes with a few exceptions.
The dog sequences all have many splice variants for every gene and some of the variants have been retained in the Entrez Gene entry for dog HSPA8. Look carefully at the two predicted variants in the second and third lines. These alternative splice variants are supposed to produce Hsc70 proteins that are missing several highly conserved regions encoded by exons 7 and 8. Recall that this is the most highly conserved protein in biology.
These cannot be biologically relevant protein variants that are only produced in dogs. The annotators are right to remove similar artifacts from the other genomes and they should remove these as well. Alternative splice variants are mostly artifacts, in my opinion, but that's a fight for another day.
Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]
Friday, February 08, 2008
Junk in Your Genome: Protein-Encoding Genes
The typical human gene has eight exons and seven introns (the actual average number of introns is 7.2). These values are based on analysis of 5236 well-characterized human genes with full-length cDNAs (Hong et al. 2006). There are lots of conflicting results in the literature. Most claim there are more introns but the data is based largely on a computational assessment of introns and exons. It includes a number of introns of extraordinary length lying between exons of dubious existence (often non-coding). I'll assume for the time being that there are 7.2 introns per gene, on average, and the average length is 3750 bp (Hong et al. 2006).
Each gene is transcribed from a 5′ promoter (P) and the primary transcript terminates at a polyadenylation site (t).
THEME
Genomes & Junk DNA
Total Junk so far
55%
The exons contain coding regions (blue) that encode the sequence of the protein product. A typical protein has a molecular weight of 70,000 daltons and this corresponds to about 635 amino acid residues. The coding region is 1905 bp but we'll round up to 2 kb. Each gene has a region of the mRNA at the 5′ end called the 5′ untranslated region (UTR). This is required for translation. It averages 200 bp in size, with considerable variation. The 3′ end of the gene has a similar untranslated region that we'll assume to be essential.
Thus, total essential exons comprise 2200 bp on average per gene. Since there are 20,500 protein-encoding genes, this means 20,500 × 2.2 kb = 45.1 Mb or 1.4% of the genome (about 1.3% coding and 0.1% UTRs).
The minimum size of a eukaryotic intron is less than 50 bp. For a typical mammalian intron, the essential sequences in the introns are: the 5′ splice site (~10 bp); the 3′ splice site (~30 bp); the branch site (~10 bp); and enough additional RNA to form a loop (~30 bp). This gives a total of 80 bp of essential sequence per intron or 20,500 × 7.2 × 80 = 11.8 Mb. Thus, 0.37% of the genome is essential because it contains sequences for processing RNA.
The total of essential sequences in the transcribed part of a gene is about 1.8% of the genome.
The rest of the intron sequence is non-essential junk. Much of it is littered with transposable elements that have inserted haphazardly. If we subtract the essential intron sequence then the average size of the remaining DNA is 3650 bp. The total amount of this sequence is 20,500 × 7.2 × 3650 = 538.7 Mb or 17% of the genome. (Most estimates are somewhat higher.)
Assuming that 44% of this is repetitive transposable elements, this leaves 9.6% of the genome. That's an additional 9.6% of non-essential DNA, or junk, bringing our current total to 55% junk.
The transcription of every gene is controlled by sequences beyond the 5′ end. There are two classes of sequence: promoters and regulatory sequences. The actual binding sites for RNA polymerase II and various regulatory proteins make up only about 100 bp of essential sequence but the various bound proteins have to form loops of DNA in order to come into contact. It's reasonable to assume that the average gene may need as much as 1000 bp of essential regulatory sequence. (A generous estimate.)
This means 20,500 × 1000 bp = 20.5 Mb or 0.6% of the genome is essential for regulation.
The grand totals for protein-encoding genes are:
essential 2.4%
junk 9.6% (not counting sequences that were included in other calculations)
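The arithmetic in this post can be rechecked with a short Python sketch. The per-gene figures are the ones used above; the genome size of 3,200 Mb is my assumption (the post never states it, but 45.1 Mb being 1.4% implies something close to that). Because of that assumption and of rounding, the results land within a few tenths of a percentage point of the numbers quoted above rather than matching them exactly.

```python
# Back-of-the-envelope recalculation of the estimates in this post.
GENOME_BP = 3.2e9           # ASSUMPTION: genome size, not stated in the post
GENES = 20_500              # protein-encoding genes
INTRONS_PER_GENE = 7.2
ESSENTIAL_EXON_BP = 2_200   # coding region (~2 kb) plus UTR (~200 bp)
ESSENTIAL_INTRON_BP = 80    # splice sites, branch site and loop
JUNK_INTRON_BP = 3_650      # remaining intron sequence, as used in the post
REGULATORY_BP = 1_000       # generous estimate of essential regulatory sequence
TE_FRACTION = 0.44          # share of intron junk already counted as transposons


def percent(bp: float) -> float:
    """Express a number of base pairs as a percentage of the genome."""
    return 100 * bp / GENOME_BP


exon_bp = GENES * ESSENTIAL_EXON_BP
intron_essential_bp = GENES * INTRONS_PER_GENE * ESSENTIAL_INTRON_BP
regulatory_bp = GENES * REGULATORY_BP
essential_total_bp = exon_bp + intron_essential_bp + regulatory_bp

intron_junk_bp = GENES * INTRONS_PER_GENE * JUNK_INTRON_BP
# Only the part not already counted as transposable elements is new junk.
new_junk_bp = intron_junk_bp * (1 - TE_FRACTION)

for label, bp in [
    ("essential exons", exon_bp),
    ("essential intron sequence", intron_essential_bp),
    ("essential regulatory sequence", regulatory_bp),
    ("total essential", essential_total_bp),
    ("non-essential intron sequence", intron_junk_bp),
    ("intron junk not already counted", new_junk_bp),
]:
    print(f"{label:32s} {bp / 1e6:7.1f} Mb  {percent(bp):5.2f}%")
```

Changing `GENOME_BP` or `JUNK_INTRON_BP` shows how sensitive the final junk percentage is to each assumption.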
Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]
Hsp90 and Evolution
The SciPhu blog has an interesting series of posts on the chaperone Hsp90 and its effect on evolution. Here's the list of articles with the links.
My contribution to JustScience 2008 will be a review on a protein with the potential to transform evolution theory as we know it today. The review will be divided into 5 separate blog posts:
I don't agree with the main conclusion that Hsp90 has an important role in evolution but the case is well presented. Anyone who wants to know more about hopeful monsters should read these articles.
UPDATE: My comment above may be misinterpreted. Sciphu may be right about "hopeful monsters" but he's dead wrong to confuse punctuated equilibria and macromutations. He makes the same mistake that others routinely make [Macromutations and Punctuated Equilibria].
We often criticize the creationists for misunderstanding punctuated equilibria and confusing it with the lack of transitional fossils. I think we should be just as hard on evolutionists who make the same mistake.
The figure shows the structures of Hsp90 from yeast, dog, human and E. coli [HSP90 Structure]
Stupidity Exists in Canada as well!
Friday's Urban Legend: FALSE
The following email message is going the rounds in Canada. Of course it's a favorite of the right-wing bigot crowd; here's an example [Kinda like Woodstock . . .for terrorists, see comments]. But the message has also suckered lots of otherwise intelligent people. It was sent to me by a friend who has finally learned to check with the urban legends website before spamming his email list. (A small victory for rationalism!)
CANADA PENSION - A Must Read: only in Canada.
Do not apply for your old age pension.
Apply to be a refugee. It is interesting that the federal government provides a single refugee with a monthly allowance of $1,890.00 and each can get an additional $580.00 in social assistance for a total of $2,470.00.
This compares very well to a single pensioner who, after contributing to the growth and development of Canada for 40 or 50 years, can only receive a monthly maximum of $1,012.00 in old age pension and Guaranteed Income Supplement.
Maybe our pensioners should apply as refugees!
Let's send this thought to as many Canadians as we can and maybe we can get the refugees cut back to $1,012.00 and the pensioners up to $2,470.00, so they can enjoy the money they were forced to submit to the Canadian government for those 40 to 50 years.
Please forward this to every Canadian you know.
As if to prove that people can be really stupid, the Canadian letter has morphed into an American one by merely substituting "America" for Canada. See the snopes.com site for an example of similar email letters [Refugee Whiz].
As usual, snopes.com has done the homework. They have outlined the history of this urban legend and traced it to the source. They have posted a letter from Citizenship and Immigration Canada that explains the real situation.
Refugees don't receive more financial assistance from the federal government than Canadian pensioners. In [a letter to the Toronto Star], a one-time, start-up payment provided to some refugees in Canada was mistaken for an ongoing, monthly payment. Unfortunately, although the newspaper published a clarification, the misleading information had already spread widely over
In truth, about three quarters of refugees receive financial assistance from the federal government, for a limited time, and at levels lower than Canadian pensioners. They are known as government-assisted refugees.
We have to remember that many of these people are fleeing from unimaginable hardship, and have lived in refugee camps for several years. Others are victims of trauma or torture in their home countries. Many arrive with little more than a few personal belongings, if that. Canada has a humanitarian role to accept refugees and help them start their new lives here.
For this reason, government-assisted refugees get a one-time payment of up to $1,095 from the federal government to cover essentials: basic, start-up needs like food, furniture and clothing. They also receive a temporary monthly allowance for food and shelter that is based on provincial social assistance rates. In Ontario, for example, a single refugee would receive $592 per month. This assistance is temporary, lasting only for one year or until they can find a job, whichever comes first.
This short-term support for refugees is a far cry from the lifetime benefits for Canada's seniors. The Old Age Security (OAS) program, for example, provides people who have lived in Canada for at least 10 years with a pension at age 65. The Guaranteed Income Supplement (GIS) is an additional monthly benefit for low-income pensioners. The Canada Pension Plan (CPP), or Quebec Pension Plan (QPP) for people in Quebec, pays a monthly retirement pension to people who have worked and contributed to the plan over their career. In July 2006, Canadian seniors received an average of $463.20 in OAS benefits and $472.79 in CPP retirement benefits ($388.94 in QPP). Lower income OAS recipients also qualified for an average of an additional $361.94 in GIS benefits.
The map depicts the important settlements of American refugees who came to Canada in the 1800s. Most of them traveled north via secret routes with the aid of American sympathizers. The route came to be known as [The Underground Railroad].
Labels: Canada, Urban Legend
Thursday, February 07, 2008
Junk in Your Genome: Pseudogenes
Pseudogenes are non-functional DNA sequences that resemble genes. Much of the DNA related to transposable elements falls into this category. There are ribosomal RNA and tRNA pseudogenes but the term usually refers to sequences that resemble protein-encoding genes.
THEME
Genomes & Junk DNA
Total Junk so far
46%
There are two kinds of pseudogenes derived from protein-encoding genes. Those derived from reverse transcription of mRNA and the re-integration of double-stranded DNA into the genome are called "processed" pseudogenes because the mRNA precursor was processed to give mature mRNA before being copied. Consequently, processed pseudogenes do not have introns. They also don't have promoters so they cannot be transcribed.
The other kind of pseudogene arises following a gene duplication event. One of the copies acquires a mutation that inactivates it. This is usually not harmful because the other copy remains intact. It is the fate of most duplicated genes to become a pseudogene by inactivation.
The original meaning of "junk" DNA referred to pseudogenes (reviewed in Gregory 2005) but the term is now used frequently to mean any non-functional DNA. That's the definition I use here.
Ensembl lists 2,081 pseudogenes in the human genome but that's very low compared to other studies [Human Genome]. Estimates of the number of processed pseudogenes range from several thousand up to 17 thousand (Drouin 2006). The ENCODE project found 118 pseudogenes in their detailed analysis of 1% of the genome (Solovyev et al. 2006). This suggests that there are about 11,800 pseudogenes in the entire genome.
A number of studies suggest that the number of processed pseudogenes is approximately the same as the number of inactivated duplicated genes (reviewed in Taylor and Raes 2005). In the case of processed pseudogenes, there are many copies of a relatively small subset of the total number of genes. In other words, lots of genes do not spawn pseudogenes and those that do have many offspring. This is because there is a bias in favor of genes that are highly expressed in the germ line.
The total number of pseudogenes in the genome is likely to be close to the number of genes based on extrapolations from detailed analyses of small segments of the genome or single chromosomes.
If we assume that there are 10,000 processed pseudogenes averaging 2 kb each then this represents 20 Mb or 0.6% of the genome. If there are an equal number of other pseudogenes, each spanning about 60 kb with its introns included, then this is 10,000 × 60 kb = 600 Mb or 18% of the genome. This is all junk DNA but it overlaps extensively with the junk DNA from transposable elements. It is further evidence that substantial parts of the genome are non-functional but since most of that sequence would be introns in an active gene, it would count as junk DNA even if the gene were active. It's best to just count the inactive exons in order to avoid double counting.
Thus, pseudogenes are about 1.2% of the genome and all of it is junk.1,2
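The pseudogene estimate can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the genome size of 3,200 Mb is my assumption, and I am reading "count the inactive exons" as about 2 kb per duplicated pseudogene, the same size used for processed pseudogenes. With those assumptions, 20 Mb works out to roughly 0.6% of the genome and the two classes together to roughly 1.2%.

```python
GENOME_BP = 3.2e9  # ASSUMPTION: genome size, not stated in the post

# ENCODE found 118 pseudogenes in 1% of the genome; scale up to 100%.
encode_count = 118
encode_sampled_percent = 1
estimated_total = encode_count * 100 // encode_sampled_percent

# Processed pseudogenes: 10,000 copies of about 2 kb each.
processed_bp = 10_000 * 2_000

# Duplicated pseudogenes counted in full (about 60 kb each, introns included).
duplicated_full_bp = 10_000 * 60_000

# Duplicated pseudogenes counting only inactive exons.
# ASSUMPTION: about 2 kb of exon sequence each, as for processed pseudogenes.
duplicated_exons_bp = 10_000 * 2_000


def percent(bp: float) -> float:
    """Express a number of base pairs as a percentage of the genome."""
    return 100 * bp / GENOME_BP


print(f"extrapolated pseudogene count: {estimated_total}")
print(f"processed pseudogenes:         {percent(processed_bp):.2f}%")
print(f"duplicated, full length:       {percent(duplicated_full_bp):.1f}%")
print(f"pseudogene exons, both kinds:  {percent(processed_bp + duplicated_exons_bp):.2f}%")
```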
1. A small number of former pseudogenes have been reactivated. They are no longer pseudogenes so they don't count as junk. A small number of pseudogenes have acquired a separate function so they don't count as junk. There do not appear to be very many examples.
2. There are many scientists who have tried to make the case for pseudogenes having some sort of function. The most common speculation is that they serve as an important reservoir of sequence information that can be accessed by recombination and/or re-activation (e.g., Balakirev and Ayala 2003).
Balakirev, E.S. and Ayala, F.J. (2003) PSEUDOGENES: Are They “Junk” or Functional DNA? Ann. Rev. Genet. 37:123-151. [doi:10.1146/annurev.genet.37.040103.103949]
Drouin, G. (2006) Processed pseudogenes are more abundant in human and mouse X chromosomes than in autosomes. Mol. Biol. Evol. 23:1652-1655. [PubMed]
Gregory, T.R. (2005) "Genome Size Evolution in Animals" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).
Solovyev, V., Kosarev, P., Seledsov, I. and Vorobyev, D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12. [PubMed]
Taylor, J.S. and Raes, J. (2005) "Small-Scale Gene Duplications" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).
Junk in Your Genome: SINEs
In a previous posting I talked about Long Interspersed Elements or LINEs [Junk in Your Genome: LINEs]. These are retrotransposons that make up a significant percentage of the junk DNA in your genome. Most of them are completely defective: they are incapable of transposing and they usually don't encode any functional proteins.
THEME
Genomes & Junk DNA
A minority of LINEs are still active. Their genes for reverse transcriptase and endonuclease are still functional and the transposons still retain the end sequences necessary for insertion.
Today I want to discuss Short Interspersed Elements or SINEs. These pieces of DNA tend to be only 100-400 bp in length but they contain all the features of transposons at their ends. The most important of these features is a short repeat of genomic DNA.
Most SINEs are related to the genes for small RNAs and, more specifically, to genes that are transcribed by RNA polymerase III [Transcription of the 7SL Gene]. Recall that one of the characteristics of Class III genes is that many of them have internal promoters. What this means is that the promoter lies entirely within the DNA that's transcribed.
SINEs look like this:
The blue line represents the transcribed region of the SINE and the black line is the genomic DNA flanking the insert. At each end there is a short (about 5 bp) direct repeat representing the remnants of the insertion event. The 3′ end of the SINE has a short stretch of adenylate residues (poly A) that is required for mobility.
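The two hallmarks just described, flanking direct repeats and a 3′ poly A stretch, are easy to express as a toy check in Python. This is only an illustrative sketch, not how real repeat-finding software works: the function name, the sequences and the `min_poly_a` threshold are all hypothetical, and real elements have degraded repeats and tails that an exact-match test would miss.

```python
def looks_like_sine_insert(genome: str, start: int, end: int,
                           repeat_len: int = 5, min_poly_a: int = 4) -> bool:
    """Toy test for the two hallmarks of a SINE insertion.

    genome[start:end] is the candidate insert. The check passes when the
    repeat_len bases just before the insert are identical to the repeat_len
    bases just after it (a direct repeat) and the insert ends in a run of
    at least min_poly_a adenylate residues. Both thresholds are hypothetical.
    """
    if start < repeat_len or end + repeat_len > len(genome) or start >= end:
        return False
    left_flank = genome[start - repeat_len:start]
    right_flank = genome[end:end + repeat_len]
    insert = genome[start:end]
    return left_flank == right_flank and insert.endswith("A" * min_poly_a)


# Made-up sequences: a 5 bp direct repeat on both sides of the insert.
upstream, repeat, downstream = "TTGCA", "GATTC", "CCGTA"
insert = "GGCCGGGCGCGGTGGCTCAAAAAA"  # hypothetical insert ending in poly A
genome = upstream + repeat + insert + repeat + downstream

start = len(upstream) + len(repeat)
end = start + len(insert)
print(looks_like_sine_insert(genome, start, end))
```

Shifting the candidate boundaries by even one base breaks the match between the two flanks, which is why the direct repeats are such a useful marker of where an insertion begins and ends.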
A typical SINE is only about 100-400 bp long. As mentioned above, one of the key features of SINEs is the presence of an internal promoter to which RNA polymerase III binds. Class III promoters generally have two separate binding regions designated Box A and Box B. All SINEs are derived from genes encoding cellular RNAs such as tRNA, 7SL RNA, U RNAs, etc. These genes are transcribed by RNA polymerase III.
The SINE is transcribed because of the presence of the internal promoter. The transcript may be copied by reverse transcriptase produced from active LINEs in the genome. The DNA:RNA hybrid can be converted to double-stranded DNA and integrated into the genome as a transposable element using the LINE endonuclease. The process is similar to the mechanism that produces processed pseudogenes derived from mRNA but the difference is that the SINEs can still be transcribed when they have integrated into the genome whereas the mRNA pseudogenes have been separated from their promoter.
In the mouse genome there are two large families of SINEs. The B1 family is derived from a truncated and rearranged 7SL RNA. (Recall that 7SL RNA is the RNA component of signal recognition particle.) The B2 family comes from a tRNA that has acquired a terminal extension (Dewannieux and Heidmann 2005).
Each mouse family has about one million copies and together they make up about 20% of the mouse genome. Most of these transposable elements are defective because they have acquired mutations. They are not mobile and many are not transcribed.
In humans, the largest family of SINEs is called Alu elements after the fact that the sequence is cleaved by the restriction endonuclease AluI. These SINEs are also derived from 7SL RNA but the rearrangement is different from that in mouse. (They have a common ancestor.) There are about one million Alu elements in the human genome.
SINEs make up about 13% of the human genome. The largest proportion, by far, is Alu elements but there are small numbers of SINEs derived from other cellular RNAs such as the U RNAs required for splicing and snoRNAs (Garcia-Perez et al. 2007).
SINEs are parasites (selfish DNA). They are not essential for human survival and reproduction, especially the huge majority of SINEs that are defective. Thus, at least 13% of the human genome is clearly junk. The total amount of junk DNA contributed by all transposable elements is 44% of the genome (Kidwell 2005).
A minority of LINEs are still active. Their genes for reverse transcriptase and endonuclease are still functional and the transposons still retain the end sequences necessary for insertion.
Today I want to discuss Short Interspersed Elements or SINEs. These pieces of DNA tend to be only 100-400 bp in length but they contain all the features of transposons at their ends. The most important of these features is a short repeat of genomic DNA.
Most SINEs are related to the genes for small RNAs and, more specifically, to genes that are transcribed by RNA polymerase III [Transcription of the 7SL Gene]. Recall that one of the characteristics of Class III genes is that many of them have internal promoters. What this means is that the promoter, the site where the transcription machinery assembles, lies within the DNA that's transcribed.
SINEs look like this:
The blue line represents the transcribed region of the SINE and the black line is the genomic DNA flanking the insert. At each end there is a short (about 5 bp) direct repeat representing the remnants of the insertion event. The 3′ end of the SINE has a short stretch of adenylate residues (poly A) that is required for mobility.
A typical SINE is only about 100-400 bp long. As mentioned above, one of the key features of SINEs is the presence of an internal promoter to which RNA polymerase III binds. Class III promoters generally have two separate binding regions designated Box A and Box B. All SINEs are derived from genes encoding cellular RNAs such as tRNA, 7SL RNA, U RNAs, etc. These genes are transcribed by RNA polymerase III.
The SINE is transcribed because of the presence of the internal promoter. The transcript may be copied by reverse transcriptase produced from active LINEs in the genome. The DNA:RNA hybrid can be converted to double-stranded DNA and integrated into the genome as a transposable element using the LINE endonuclease. The process is similar to the mechanism that produces processed pseudogenes derived from mRNA but the difference is that the SINEs can still be transcribed when they have integrated into the genome whereas the mRNA pseudogenes have been separated from their promoter.
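A toy sketch of the integration step may help show where the short direct repeats come from. It assumes the usual staggered-cut picture of insertion, in which a few bases of target DNA end up copied on both sides of the new element. The sequences, the `insert_sine` helper, and the fixed 5 bp repeat length are made up for illustration and are not real genomic data.

```python
def insert_sine(genome: str, pos: int, sine: str, repeat_len: int = 5) -> str:
    """Insert `sine` at `pos`, duplicating `repeat_len` bases of target DNA.

    Hypothetical helper: it models the staggered cut by letting the target
    site genome[pos:pos + repeat_len] appear on both sides of the insert.
    """
    target_site = genome[pos:pos + repeat_len]
    if len(target_site) < repeat_len:
        raise ValueError("insertion site too close to the end of the sequence")
    upstream = genome[:pos + repeat_len]   # ends with the target site
    downstream = genome[pos:]              # starts with the same target site
    return upstream + sine + downstream


# Made-up sequences: a short "SINE" body ending in a poly A tail.
genome = "GGCCTTAGCATCGGATTCCA"
sine = "GCCGGGCGTGGTGGCTCACG" + "A" * 8
result = insert_sine(genome, pos=6, sine=sine)

# The same 5 bp of target DNA now flank the insert as direct repeats.
left_repeat = result[6:11]
right_start = 6 + 5 + len(sine)
right_repeat = result[right_start:right_start + 5]
print(left_repeat, right_repeat)  # AGCAT AGCAT
```

The genome grows by the length of the element plus the length of the repeat, which is why the repeat is a reliable footprint of an insertion event.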
In the mouse genome there are two large families of SINEs. The B1 family is derived from a truncated and rearranged 7SL RNA. (Recall that 7SL RNA is the RNA component of signal recognition particle.) The B2 family comes from a tRNA that has acquired a terminal extension (Dewannieux and Heidmann 2005).
Each mouse family has about one million copies and together they make up about 20% of the mouse genome. Most of these transposable elements are defective because they have acquired mutations. They are not mobile and many are not transcribed.
In humans, the largest family of SINEs is the Alu elements, so named because the sequence is cleaved by the restriction endonuclease AluI. These SINEs are also derived from 7SL RNA but the rearrangement is different from that in mouse. (They have a common ancestor.) There are about one million Alu elements in the human genome.
SINEs make up about 13% of the human genome. The largest proportion, by far, is Alu elements but there are small numbers of SINEs derived from other cellular RNAs such as the U RNAs required for splicing and snoRNAs (Garcia-Perez et al. 2007).
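As a rough check on these numbers, the Alu contribution can be estimated from the copy number. The sketch below assumes an average Alu length of about 300 bp and a haploid genome of about 3.2 billion bp. Neither figure is given above, so treat the result as a ballpark only.

```python
ALU_COPIES = 1_000_000          # "about one million" (from the text)
ALU_LENGTH_BP = 300             # assumption: approximate length of a full Alu
GENOME_SIZE_BP = 3_200_000_000  # assumption: approximate haploid human genome

# Fraction of the genome occupied if every copy were full length.
alu_fraction = ALU_COPIES * ALU_LENGTH_BP / GENOME_SIZE_BP
print(f"Alu elements: ~{alu_fraction:.1%} of the genome")
```

Under those assumptions the estimate comes out a little under 10%, which fits with Alu elements being the largest share of the 13% attributed to SINEs.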
SINEs are parasites (selfish DNA). They are not essential for human survival and reproduction, especially the huge majority of SINEs that are defective. Thus, at least 13% of the human genome is clearly junk. The total amount of junk DNA contributed by all transposable elements is 44% of the genome (Kidwell 2005).
Dewannieux, M. and Heidmann, T. (2005) L1-mediated retrotransposition of murine B1 and B2 SINEs recapitulated in cultured cells. J. Mol. Biol. 349:241-247. [PubMed]
Garcia-Perez, J.L., Doucet, A.J., Bucheton, A., Moran, J.V. and Gilbert, N. (2007) Distinct mechanisms for trans-mediated mobilization of cellular RNAs by the LINE-1 reverse transcriptase. Genome Res. 17:602-611. [PubMed] [Genome Research]
Kidwell, M. (2005) "Transposable Elements" in The Evolution of the Genome, T.R. Gregory ed. Elsevier Academic Press, New York (USA).
Regulation of Transcription
From Horton et al. (2006), pp. 663-665.
Many genes are expressed in every cell. The expression of these housekeeping genes is said to be constitutive. In general, such genes have strong promoters and are transcribed efficiently and continuously. Genes whose products are required at low levels usually have weak promoters and are transcribed infrequently. In addition to constitutively expressed genes, cells contain genes that are expressed at high levels in some circumstances and not at all in others. Such genes are said to be regulated.
Regulation of gene expression can occur at any point in the flow of biological information but occurs most often at the level of transcription. Various mechanisms have evolved that allow cells to program gene expression during differentiation and development and to respond to environmental stimuli.
The initiation of transcription of regulated genes is controlled by regulatory proteins that bind to specific DNA sequences. Transcriptional regulation can be negative or positive. Transcription of a negatively regulated gene is prevented by a regulatory protein called a repressor. A negatively regulated gene can be transcribed only in the absence of active repressor. Transcription of a positively regulated gene can be activated by a regulatory protein called an activator. A positively regulated gene is transcribed poorly or not at all in the absence of the activator.
Repressors and activators are often allosteric proteins whose function is modified by ligand binding. In general, a ligand alters the conformation of the protein and affects its ability to bind to specific DNA sequences. For example, some repressors control the synthesis of enzymes for a catabolic pathway. In the absence of substrate for these enzymes, the genes are repressed. When substrate is present, it binds to the repressor, causing the repressor to dissociate from the DNA and allowing the genes to be transcribed. Ligands that bind to and inactivate repressors are called inducers because they induce transcription of the genes controlled by the repressors. In contrast, some repressors that control the synthesis of enzymes for a biosynthetic pathway bind to DNA only when associated with a ligand. The ligand is often the end product of the biosynthetic pathway. This regulatory mechanism ensures that the genes are turned off as product accumulates. Ligands that bind to and activate repressors are called corepressors. The DNA-binding activity of allosteric activators can also be affected in two ways by ligand binding. Four general strategies for regulating transcription are illustrated in the figures. Examples of all four strategies have been identified.
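The four strategies can be written out as a small truth table. This is only a logical sketch of the paragraph above, not a model of any particular operon: the function names are hypothetical, and it treats DNA binding as all-or-nothing, which real regulators are not.

```python
from itertools import product


def regulator_bound(ligand_present: bool, ligand_promotes_binding: bool) -> bool:
    """Is the allosteric regulator sitting on its DNA site?

    Either the ligand is needed for DNA binding (corepressor-style) or the
    ligand releases the protein from DNA (inducer-style).
    """
    if ligand_promotes_binding:
        return ligand_present
    return not ligand_present


def is_transcribed(regulator: str, ligand_present: bool,
                   ligand_promotes_binding: bool) -> bool:
    """Negative regulation: transcribed only when the repressor is off the DNA.
    Positive regulation: transcribed only when the activator is on the DNA."""
    bound = regulator_bound(ligand_present, ligand_promotes_binding)
    if regulator == "repressor":
        return not bound
    if regulator == "activator":
        return bound
    raise ValueError(f"unknown regulator type: {regulator!r}")


# Print all four strategies, each with and without ligand.
for regulator, promotes, ligand in product(
        ("repressor", "activator"), (False, True), (False, True)):
    mode = "binds DNA only with ligand" if promotes else "released by ligand"
    state = "ON" if is_transcribed(regulator, ligand, promotes) else "OFF"
    print(f"{regulator:9} | {mode:26} | ligand={ligand!s:5} | {state}")
```

In this table the inducer case is the repressor that is released by its ligand, and the corepressor case is the repressor that binds DNA only with its ligand.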
Few regulatory systems are as simple as those described above. For example, the transcription of many genes is regulated by a combination of repressors and activators or by multiple activators. Elaborate mechanisms for regulating transcription have evolved to meet the specific requirements of individual organisms. When transcription is regulated by a host of mechanisms acting together, a greater range of cellular responses is possible. By examining how the transcription of a few particular genes is controlled, we can begin to understand how positive and negative mechanisms can be combined to produce the remarkably sensitive regulation seen in bacterial cells.
©Laurence A. Moran and Pearson Prentice Hall
Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River N.J. (USA)
Theme: Genomes & Junk DNA
Junk in Your Genome
Transposable Elements: (44% junk)
  DNA transposons:
    active (functional): <0.1%
    defective (nonfunctional): 3%
  retrotransposons:
    active (functional): <0.1%
    defective transposons (full-length, nonfunctional): 8%
    L1 LINEs (fragments, nonfunctional): 16%
    other LINEs: 4%
    SINEs (small pseudogene fragments): 13%
    co-opted transposons/fragments: <0.1% [a]
  [a] Co-opted transposons and transposon fragments are those that have secondarily acquired a new function.
Viruses: (9% junk)
  DNA viruses:
    active (functional): <0.1%
    defective DNA viruses: ~1%
  RNA viruses:
    active (functional): <0.1%
    defective (nonfunctional): 8%
    co-opted RNA viruses: <0.1% [b]
  [b] Co-opted RNA viruses are defective integrated virus genomes that have secondarily acquired a new function.
Pseudogenes: (1.2% junk)
  (from protein-encoding genes): 1.2% junk
  co-opted pseudogenes: <0.1% [c]
  [c] Co-opted pseudogenes are formerly defective pseudogenes that have secondarily acquired a new function.
Ribosomal RNA genes:
  essential: 0.22%
  junk: 0.19%
Other RNA encoding genes:
  tRNA genes: <0.1% (essential)
  known small RNA genes: <0.1% (essential)
  putative regulatory RNAs: ~2% (essential)
Protein-encoding genes: (9.6% junk)
  transcribed region:
    essential: 1.8%
    intron junk (not included above): 9.6% [d]
  [d] Intron sequences account for about 30% of the genome. Most of these sequences qualify as junk but they are littered with defective transposable elements that are already included in the calculation of junk DNA.
Regulatory sequences:
  essential: 0.6%
Origins of DNA replication:
  <0.1% (essential)
Scaffold attachment regions (SARs):
  <0.1% (essential)
Highly Repetitive DNA: (1% junk)
  α-satellite DNA (centromeres):
    essential: 2.0%
    non-essential: 1.0%
  telomeres:
    essential (less than 1000 kb, insignificant)
Intergenic DNA (not included above):
  conserved: 2% (essential)
  non-conserved: 26.3% (unknown but probably junk)

Total Essential/Functional (so far) = 8.7%
Total Junk (so far) = 65%
Unknown (probably mostly junk) = 26.3%
For references and further information click on the "Genomes & Junk DNA" link in the box
LAST UPDATE: May 10, 2011 (fixed totals, and ribosomal RNA calculations)
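The totals can be checked by adding up the category figures from the list above. In this sketch the entries given only as "<0.1%" are left out, which is my assumption about how the totals were rounded; the dictionary labels are shorthand for the list headings.

```python
# Percent of the genome counted as junk, by category (from the list above).
junk = {
    "transposable elements": 44.0,
    "viruses": 9.0,
    "pseudogenes": 1.2,
    "ribosomal RNA genes": 0.19,
    "intron junk": 9.6,
    "highly repetitive DNA": 1.0,
}

# Percent counted as essential; "<0.1%" entries are omitted (assumption).
essential = {
    "ribosomal RNA genes": 0.22,
    "putative regulatory RNAs": 2.0,
    "protein-encoding, transcribed region": 1.8,
    "regulatory sequences": 0.6,
    "alpha-satellite DNA": 2.0,
    "conserved intergenic DNA": 2.0,
}

unknown = 26.3  # non-conserved intergenic DNA

junk_total = sum(junk.values())
essential_total = sum(essential.values())

print(f"junk:      {junk_total:.2f}%")
print(f"essential: {essential_total:.2f}%")
print(f"unknown:   {unknown:.2f}%")
print(f"sum:       {junk_total + essential_total + unknown:.2f}%")
```

The junk figure rounds to the stated 65%. The essential figure falls slightly short of 8.7%, and the gap is plausibly the handful of "<0.1%" items that the sketch leaves out.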
November 11, 2006
Sea Urchin Genome Sequenced
The sea urchin genome is 814,000 kb or about 1/4 the size of a typical mammalian genome. Like mammalian genomes, the sea urchin genome contains a lot of junk DNA, especially repetitive DNA. The preliminary count of the number of genes is 23,300. This is about the same number that we have in our genomes. Only about 10,000 of these genes have been annotated by the sea urchin sequencing team.
Wednesday, February 06, 2008
How to Be a Grown-up Scientist
Janet Stemwedel has put her finger on an important issue. Read her blog and find out about The project of being a grown-up scientist (part 1).
What percentage of academic scientists are grown-ups? I think it's pretty high in my department but it's not 100%.
TV Ontario's Best Lecturers
It's that time of year again. TV Ontario (TVO) has chosen its ten finalists for best university lecturer. You can see the list on the Best Lecturer website.
Some of you might recall that Michael Persinger of the magic motorcycle helmet was one of the finalists last year and he went on to win the $10,000 prize [TV Ontario's Best Lecturers]. I was a bit peeved at this. I wrote,
This is a popularity contest. The last one was very disappointing because some of the most important aspects of being a good university lecturer were ignored.
I'm talking about accuracy and rigour. It's not good enough to just please the students. What you are saying has to be pitched at the right level and it has to be correct. Too many of the lectures were superficial, first-year introductions that offered no challenge to the students. (One, for example, was an overview of Greek and Roman architecture by an engineering Professor.) The students loved it, of course, and so did the TV producers because they could understand the material. Lecturers in upper-level courses need not apply.
Some of last year's lectures were inaccurate. The material was either misleading or false, and the concepts being taught were flawed. Neither students nor TV audiences were in any position to evaluate the material so accuracy was not a criterion in selecting the best lecturer of 2006.
I wrote to the producers about this, suggesting that the lecturers be pre-screened by experts in the discipline. TV Ontario promised to do a better job this year. I'm looking forward to seeing if they keep their promise.
Are you wondering how they did? They chose Michael Persinger, a "fringe" scientist, to put it politely.
How are they doing this year? Here's the list of judges.
Zanana L. Akande (born 1937 in Toronto, Ontario) is a former Canadian politician. She was the first black woman elected to the Legislative Assembly of Ontario, and the first black woman to serve as a cabinet minister in Canada.
Barry Callaghan has done work in journalism, television, and filmmaking in addition to his own writing. He began his career as a part-time reporter for Canadian Broadcasting Corp (CBC) television news and gave weekly book reviews on the CBC radio program Audio.
Tony Nardi is an actor/writer/producer. His acting experience has been diverse and prolific, in live theater, television and film. As an actor he received his training in Montreal at the Actor's Studio, The Banff School of Fine Arts, The Stratford Festival, and Italy.
Isn't that interesting. The best people to judge whether a university Professor is delivering a good lecture are a politician, a writer, and an actor.
Silly me. I thought that Professors might be on the panel of judges. I guess they're all too busy serving on juries that evaluate politicians, writers, and actors.
The top three criteria for evaluating university lectures are: (1) accuracy, (2) accuracy, and (3) accuracy. The only people who can judge whether those criteria are being met are other academics in the same discipline. If the lectures aren't accurate then nothing else matters. If the lecture material is accurate then you can start looking at other things, such as style.
Who Put the Cephalopod in SEED Magazine?
SEED magazine is usually a pretty good magazine in spite of the fact that they get a few things wrong and in spite of the fact that they sponsor ScienceBlogs™.
But enough is enough. Imagine my surprise when I opened the current issue (January/February 2008) to page 21 and saw this ugly, squishy creature. As a (fairly) loyal reader, I've been tolerant of their graphics and images even though most of them don't make much sense. But this is way over the top. Did the editors forget that this magazine is displayed on newsstands where young children might see it?
Who is responsible for this? And what can we do about it?
[Hint: The disgusting image seems to be associated with an article titled Eyeing the Evolutionary Past by Paul Z. Mierz.]