More Recent Comments

Tuesday, February 12, 2008

The Lac Operon

The lac operon in E. coli consists of three genes1 (lacZ, lacY and lacA) transcribed from a single promoter. The lacZ gene encodes β-galactosidase, an enzyme that cleaves β-galactosides. Lactose is a typical β-galactoside; the enzyme cleaves this disaccharide into separate molecules of glucose and galactose. These monosaccharides can enter the metabolic pool of the cell, where they can serve as the sole source of carbon.

Thus, when the lac operon is active and β-galactosidase is present, E. coli can grow on lactose as its only source of carbon. Outside the laboratory, E. coli rarely encountered lactose until recently, but there are many plant β-galactosides that are substrates for the enzyme.

The lacY gene encodes a famous transporter called lactose permease. It is responsible for importing β-galactosides. The lacA gene encodes a transacetylase that is responsible for detoxifying the cell when it takes up poisonous β-galactosides.


Transcription begins at the Plac promoter and ends at a terminator at the 3′ end of the operon. Each of the three reading frames is translated separately from the polycistronic mRNA.

Upstream of the lac operon is the lacI gene. It encodes the lac repressor, one of the proteins that controls expression of the lac operon. The lacI gene is transcribed from its own promoter and it has its own terminator. (It is not necessary for the lacI gene to be linked to the operon.)

Expression of β-galactosidase, lac permease, and the transacetylase is regulated at the level of transcription. RNA polymerase binds to the lac promoter, but this is a weak σ70 promoter.2 The promoter sequence is a poor match to the consensus sequence for this type of promoter, so the operon is transcribed infrequently in the absence of additional activators. Transcription of the operon is activated by cAMP regulatory (or receptor) protein (CRP).3

In the absence of any β-galactoside, the operon is not transcribed and no enzyme is synthesized. Transcription is prevented by lac repressor, which binds to two operator sequences called O1 and O2. When β-galactosides are present, repression is relieved and the operon is transcribed at a low level in order to take advantage of the carbon source. When there is no other carbon source available, the operon is activated by CRP and the rate of transcription, and hence of enzyme production, increases considerably.
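The regulatory logic in the last paragraph can be summarized as a small truth table. The sketch below is a deliberately simplified, qualitative model written only for illustration: the function name and the three output levels are hypothetical labels, and, like the text, it treats "is another carbon source available?" as the only signal controlling CRP activation, ignoring cAMP levels and any basal leakage.

```python
from enum import Enum


class Transcription(Enum):
    """Qualitative transcription levels for the lac operon (illustrative labels)."""
    OFF = "not transcribed (repressor bound to O1/O2)"
    LOW = "low level (repression relieved, weak promoter)"
    HIGH = "high level (repression relieved and CRP-activated)"


def lac_transcription(beta_galactoside_present: bool, other_carbon_source: bool) -> Transcription:
    """Simplified model of lac operon control as described in the post."""
    if not beta_galactoside_present:
        # lac repressor stays bound to the operators and blocks transcription
        return Transcription.OFF
    if other_carbon_source:
        # repression is relieved, but the weak Plac promoter is not activated by CRP
        return Transcription.LOW
    # no other carbon source: CRP activates the promoter
    return Transcription.HIGH


# Print the full truth table
for inducer in (False, True):
    for other in (False, True):
        level = lac_transcription(inducer, other)
        print(f"β-galactoside={inducer!s:5}  other carbon={other!s:5}  -> {level.value}")
```

The point of writing it this way is that the repressor check comes first: without an inducer, the availability of other carbon sources is irrelevant.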



1. This is one of the exceptions to the standard definition of a gene [What Is a Gene?]. In this case we are using the word "gene" to mean the coding region for a particular protein.

2. There are many different promoters in the E. coli genome. They are recognized by various RNA polymerase complexes containing different bound activators. One set of common activators is called σ factors: σ70 is the most common σ factor. Most genes have a σ70 promoter.

3. CRP is also known as catabolite activator protein (CAP).

Monday, February 11, 2008

More Billboards in Chambersburg PA

 
Read all about it at Battle of the Chambersburg billboards.



Monday's Molecule #62

 
Today's molecule is a cartoon depicting the action of several molecules. Your task is to identify all the molecules in the diagram and explain what's going on. Even if you're not interested in a free lunch, I'd appreciate hearing from you. I'd like to know how many of you understand the diagram. In fact, I'll put a poll in the sidebar to see how many recognize the process that's depicted here.

There's an indirect connection between this molecule and Wednesday's Nobel Laureate(s). Your task is to figure out the significance of today's diagram and identify the Nobel Laureate(s) who is associated with discovering the underlying process. (Be sure to check previous Laureates.)

The reward goes to the person who correctly identifies the molecule and the Nobel Laureate(s). Previous winners are ineligible for one month from the time they first collected the prize. There are three ineligible candidates for this week's reward. The prize is a free lunch at the Faculty Club.

THEME:

Nobel Laureates
Send your guess to Sandwalk (sandwalk(at)bioinfo.med.utoronto.ca) and I'll pick the first email message that correctly identifies the molecules, the process, and the Nobel Laureate(s). Note that I'm not going to repeat Nobel Laureates so you might want to check the list of previous Sandwalk postings.

Correct responses will be posted tomorrow along with the time that the message was received on my server. I may select multiple winners if several people get it right.

Comments will be blocked for 24 hours.





Sunday, February 10, 2008

Stu Kauffman in Toronto

 
I went to hear Stu Kauffman on Friday night [see Reinventing the Sacred].

Before the talk we had a little chat about blogging and some other topics. He wondered what the bloggers were saying about him and I told him that many don't understand what he's trying to say. I explained that I fell into that same category. I can't figure out what it is that he's trying to promote. He promised to try and explain in his talk.

It didn't work. I'm not much further ahead than I was before I heard him talk. Here's a brief summary of some things he said. I'm sorry if I can't put it all together into one big picture but I just can't.

The New Atheists: Kauffman thinks that Dawkins and his "New Atheist" friends are preaching to the converted. According to Kauffman, they will never convince the believers. Kauffman describes himself as a secular humanist and a non-believer. He thinks we should try to reach out to the religious community by adopting spiritual language. Hence the title of his talk. I don't really know what he means by this. He gave one example of having a reverence for some trees growing on a hill top near his house but I'm not sure if this is relevant. (See photograph. Is that the hill top?)

I don't agree with his position on the so-called New Atheists and I don't agree with his proposal that it's the atheists who need to move towards the theists by adopting the sacred.

Reductionism: Kauffman is very much opposed to reductionism. He spent some time describing how the laws of physics just don't work when you try to predict the structure of complex things. This does not mean that complex things don't obey the laws of physics and chemistry; it means those laws aren't sufficient. This is because of emergent properties.

The discussion about reductionism and emergent properties is interesting but Kauffman makes it too complicated, for me, by going off on all kinds of tangents. In talking about it with him afterwards, he seems to be thinking that life is somehow special. It's different than the physical world. He takes pains to point out that he's not talking about vitalism but it sure sounds like that to me.

The other interesting thing about his anti-reductionism is that it doesn't apply in the same sense that Lewontin means when he talks about gene-centric biology. Before the lecture we were discussing the reason why human siblings don't mate and Kauffman was quite eager to offer an evolutionary psychology explanation. He suggested there was selection for an anti-incest gene in our ancestors to prevent inbreeding. That's the worst kind of reductionism but it's not the sort of reductionism that Kauffman disputes.

Determinism: Kauffman doesn't like determinism. He pointed out that quantum mechanics has ruled out the Laplace version of determinism. I don't think this is particularly controversial but I do think there are versions of determinism that don't require strict predictability. I kept waiting for the other shoe to drop. I don't think Kauffman was trying to make a case for free will and I don't think he was using his anti-determinism to argue against materialism, but I'm not sure.



Somehow these topics, and several others, were supposed to weave together to form a new way of looking at science. And a new way of reaching out to theists. That's the part I didn't get. A lot of what he was saying was true, but hardly profound. What was supposed to be profound didn't seem to be true.

Stu Kauffman took down the URL for Sandwalk and he promised to read my comments on his lectures. I hope he will respond in the comments. He seems like a pretty cool guy even if he's a bit baffling.

The dominant impression I have from talking to members of the audience—there were 65 people at the talk—is that people think he's saying something important but they just can't put their finger on what it is. At least I'm not the only one.


Where does disbelief in Darwin lead?

 
You probably think the answer to the question is obvious. The rejection of science leads to irrational behavior, right?

Of course it's right. DaveScot sets out to prove it over on Uncommon Descent with a posting that has the same title as this one [Where does disbelief in Darwin lead?]. As you read it, remember that the person who is writing the article is a disbeliever in evolution. Let's see where that kind of thinking leads ....
Be that as it may I’m a results oriented guy. Instead of presuming that “poorer” science education leads to poorer scientific output I instead look at what America actually produces in the way of science and engineering. Without question America’s output in science and engineering leads the world. Not just a little but a lot. We don’t steal nuclear technology secrets from China, they steal ours. We don’t use European GPS satellites for navigation, they use ours. The list can go on and on. We put a man on the moon 40 years ago while to this day no one else has. America has almost 3 times the number of Nobel prize winners as the next closest nation. That doesn’t support the notion that disbelief in Darwin is causing any problems. In fact it supports just the opposite. Disbelief in evolution makes a country into a superpower - militarily, economically, and yes even scientifically.

Education in America is working just fine, thank you, judging by the fruits of American science and engineering. Disbelief in Darwinian evolution, if anything, leads to greater technological achievements not lesser. If it isn’t broken, don’t try to fix it.
Well, there you have it. If only those successful scientists, engineers, and Nobel Laureates1 would stop believing in evolution there's no limit to what America could achieve. Just look at how far America has come when it's only the ignorant who disbelieve in evolution!

You know, you simply can't make this stuff up.


1. America is pretty much in the middle of the pack in terms of Nobel Laureates per capita [Nobel Prizes by Country]. It takes a bit of intelligence and simple math to recognize that point.

What Freedom of Speech Really Means

 
Read the amazing story on Friendly Atheist [Atheist Billboard Taken Down].

The Freedom from Religion Foundation contracted with Kegerreis Outdoor Advertising LLC to put up the following billboard in Chambersburg, Pennsylvania (USA).


That billboard has now been taken down and replaced with,


Read what the company has to say about their decision. It should not be necessary to point out what freedom of speech means, and it is not proper for an advertising company to publicly state their moral values. Do all employees of Kegerreis Outdoor Advertising LLC agree with the statement? If not, are they going to make their views known at company headquarters? Would you?


A Case of Plagiarism

 
The blogosphere is all atwitter about the publication of a paper titled "Mitochondria, the missing link between body and soul: Proteomic prospective evidence". This is the train wreck of a paper that PZ Myers blogged about a few days ago [What Happened to the "Peers" on this Paper?].

Everyone needs to know that the contents of this paper were not only stupid but also plagiarized. The authors couldn't even come up with their own words to explain their silly ideas. For the latest additions to a long list of stolen paragraphs see Commentary: Neither buried nor treasure.

The guilty journal is Proteomics. The editors are not blameless.


The Streisand Effect in Action

 
I mentioned the Streisand Effect a few days ago. Here's a perfect example of how it works.

ThePolitic.com is one of those Canadian blogs written by someone (Matthew) who embarrasses my country. Matthew writes,
It’s no secret that many of us liberty and/or family-minded folks are great fans of The National Post which officially only competes with the Globe and Mail but realistically also occupies reality that the Toronto Star and Toronto Sun covet. I personally began subscribing to the Post after graduation not because it had a host of right-wing commentators (the Toronto Sun can also claim this), but because the paper took the mission of presenting all view points seriously by often welcoming guest columnists who would attack its editorials, or by presenting series like the one they did two weeks ago on abortion, where a dozen commentators would weigh in on the issue with intelligent, but different viewpoints.

This led me to great sadness today when I went onto their website to read the digital version of the paper. The front cover was just a large cartoon title that said “The Love & Sex Issue” which is tastefully questionable in itself for a national newspaper, but if you look at the picture itself, it also contains the drawing of two nude people behind the “x” which those of us familiar with Japanese pop-culture would classify as hentai. Half of the main section contained articles which were more at home in a Penthouse issue and the Post’s website contains video content that I dare not look at but is clearly part of the above-mentioned theme.

I have since called the Post’s office and dealt with a nice young chap who will be passing along my complaints (the Post is good at responding to these), but in the mean time, I invite everyone else who is disturbed by this extremely poor lack in judgment to write or phone to the Post’s editorial staff:
For more information about the Love & Sex issue go to the National Post website [Love & Sex Issue].

Warning: There may be sexy hentai cartoons there and you will certainly find some discussions about another H-word. Don't go there if you're in kindergarten or you're a prude. Don't go there if, like Matthew, you dare not look at pictures of nude bodies.

Matthew should just be thankful that this topic wasn't covered in one of the left-wing newspapers like the Toronto Star. He probably would have had to leave the country for a few days to avoid the newspaper boxes.


[Hat Tip: Canadian Cynic]

God Only Knows

 
I found a new Canadian blog called Stony_Curtis: Pop Culture, Guys, Food, Montreal. With a title like that, don't you just have to click on the link?

One of the first things I stumbled across was the video below of the Beach Boys singing "God Only Knows" from the Pet Sounds album. I don't know who Stony Curtis is but I like him already. (P.S. Somehow the song doesn't seem quite as innovative and moving as it was 40 years ago. I wonder why.)




The Meaning of Consensus

 
What does Stephen Harper mean when he uses the word "consensus" as in,
The Conservative government will not extend Canada's combat mission in Afghanistan beyond February 2009 without a consensus in Parliament, Prime Minister Stephen Harper said Friday.

"I will want to see some degree of consensus among Canadians on how we move forward on that," Harper told reporters Friday in Ottawa.
Canadian Cynic has the answer. [Hint: Harper doesn't mean what you think he means.]


Saturday, February 09, 2008

Junk in Your Genome: Intron Size and Distribution

In the comments to Junk in Your Genome: Protein-Encoding Genes martinc asks,
Larry, if the amount of necessary sequences within introns are as small as you suggest wouldn't this allow us to make a prediction. Couldn't we predict that due to drift there should be very little similarity in intron lengths between different species. If, by any chance, there is similarity then what would your explanation be?
There have been quite a few studies of average intron size in various species. I selected a number for the average size of introns from Hong et al. (2006). The average intron size, according to them, is 3,479 bp in coding regions. This value is a little deceptive since there are a small number of huge introns that make the average quite large. The median value is 1334 bp or less than half the average value.
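The gap between the mean (3,479 bp) and the median (1,334 bp) is what you expect from a right-skewed distribution: a few very long introns pull the mean up but barely move the median. The toy example below illustrates the effect with invented intron lengths for a hypothetical gene; these numbers are not data from Hong et al. (2006).

```python
import statistics

# Invented intron lengths (bp) for one hypothetical gene: most introns are
# short, but one is huge. Illustration only, not real data.
intron_lengths = [90, 110, 150, 200, 250, 300, 400, 12_000]

mean_length = statistics.mean(intron_lengths)
median_length = statistics.median(intron_lengths)

print(f"mean   = {mean_length:.1f} bp")    # dominated by the single huge intron
print(f"median = {median_length:.1f} bp")  # reflects the typical short intron
```

A single 12 kb intron raises the mean to several times the median, which is why the median is the better summary of a "typical" intron here.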

I suggested that much of the intron sequences were junk. Martinc's question is quite reasonable but in order to get an answer we need to look more closely at the distribution of introns.

The figure shows the distribution of intron sizes in four species: the flowering plant Arabidopsis thaliana; the fruit fly Drosophila melanogaster; human, and mouse. The data is from Hong et al. (2006, Fig.1).

Note that the distribution in Arabidopsis and Drosophila is very tight. Both of these species have relatively compact genomes compared to mammals. The data strongly suggests that the minimum intron size is about 80 bp.

The distributions in the human and mouse genomes are very different. There is a strong peak at 100 bp—this is similar to the peaks in other species. But unlike other species, mammalian introns can be extremely large, giving rise to a long tail of the distribution extending to 10,000 bp or more. The key question is whether this distribution of long introns is noise or an artifact of gene prediction algorithms, or whether it represents a real phenomenon.

Returning to martinc's question. If we look at well-conserved genes in different species, what we find is some variation in intron length, but mostly within a range of about 100-400 bp. In other words, in genes that have been closely examined, where the protein product is known, the distribution of intron sizes looks a lot more like the distribution in Arabidopsis and Drosophila.

Let's look at the hsp90 genes. These are the genes that encode Hsp90, the protein that SciPhu was blogging about [Hsp90 and Evolution].

I've picked the zebrafish gene and four mammalian genes to illustrate the variation in intron length. (Blue exons are 5′ and 3′ UTR's.) Most of the introns are between 80 and 400 bp in size but there are a few exceptions. In this case the human gene is the exception; it has two huge introns at the 5′ end of the gene.

What we see is a narrow distribution of intron lengths in most cases and a few huge introns. It isn't surprising that the lengths of introns in different species are quite similar.

Let's look at my favorite gene. HSPA8 encodes the cytoplasmic member of the HSP70 chaperone multigene family.

We see a similar pattern. Most intron lengths are very similar in different species, suggesting selection for introns in the 100-400 bp range. There are exceptions, as we see in the chimpanzee, monkey and dog genes. All three have large introns at either the 5′ or 3′ end. The large monkey introns are 10,253 bp and 1,007 bp. The large chimpanzee intron is 13,257 bp in length. This is typical. I think it's very likely that the large introns associated with noncoding exons are artifacts.

So here's the complete answer to the question posed at the top of the page. I think there's selection to keep intron sizes within a fairly narrow range of 100-400 bp. Because of this, we expect to see similar intron sizes in different species. On occasion we discover a huge intron that is peculiar to one species. This intron could be a transient expansion that hasn't been reduced yet, or it could be an artifact.

Incidentally, while retrieving these sequences from Entrez Gene I noticed that the annotators have eliminated all splice variants for the HSP90 and HSPA8 genes, with a few exceptions.

The dog sequences all have many splice variants for every gene, and some of the variants have been retained in the Entrez Gene entry for dog HSPA8. Look carefully at the two predicted variants in the second and third lines. These alternative splice variants are supposed to produce Hsc70 proteins that are missing several highly conserved regions encoded by exons 7 and 8. Recall that this is the most highly conserved protein in biology.

These cannot be biologically relevant protein variants that are only produced in dogs. The annotators are right to remove similar artifacts from the other genomes and they should remove these as well. Alternative splice variants are mostly artifacts, in my opinion, but that's a fight for another day.


Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]

Friday, February 08, 2008

Junk in Your Genome: Protein-Encoding Genes

The typical human gene has eight exons and seven introns (the actual average number of introns is 7.2). These values are based on analysis of 5236 well-characterized human genes with full-length cDNA's (Hong et al. 2006). There are lots of conflicting results in the literature. Most claim there are more introns, but those data are based largely on a computational assessment of introns and exons. They include a number of introns of extraordinary length lying between exons of dubious existence (often non-coding). I'll assume for the time being that there are 7.2 introns per gene, on average, and that the average length is 3750 bp (Hong et al. 2006).

Each gene is transcribed from a 5′ promoter (P) and the primary transcript terminates at a polyadenylation site (t).

THEME

Genomes & Junk DNA

Total Junk so far

    55%
The exons contain coding regions (blue) that encode the sequence of the protein product. A typical protein has a molecular weight of 70,000 daltons and this corresponds to about 635 amino acid residues. The coding region is 1905 bp but we'll round up to 2 kb. Each gene has a region of the mRNA at the 5′ end called the 5′ untranslated region (UTR). This is required for translation. It averages 200 bp in size, with considerable variation. The 3′ end of the gene has a similar untranslated region that we'll assume to be essential.

Thus, total essential exons comprise 2200 bp on average per gene. Since there are 20,500 protein-encoding genes, this means 20,500 × 2.2 kb = 45.1 Mb or 1.4% of the genome (about 1.3% coding and 0.1% UTRs).
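The exon arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that uses the post's figures plus two assumptions that are not stated in the post: an average amino acid residue mass of about 110 Da, and a genome of about 3,200 Mb, which is roughly the size implied by the post's percentages.

```python
# Back-of-the-envelope budget for the essential exon sequence of human
# protein-coding genes, using the post's figures.
# ASSUMPTIONS (not from the post): ~110 Da per residue, ~3,200 Mb genome.
AVG_RESIDUE_DA = 110
GENOME_MB = 3_200
N_GENES = 20_500

protein_mass_da = 70_000
residues = round(protein_mass_da / AVG_RESIDUE_DA)  # 636 here; the post uses 635
coding_bp = 635 * 3                                  # 1,905 bp of coding sequence
essential_exon_bp = 2_000 + 200                      # coding rounded to 2 kb, plus 200 bp of UTR

total_mb = N_GENES * essential_exon_bp / 1e6         # Mb of essential exon sequence
percent_of_genome = 100 * total_mb / GENOME_MB

print(f"{residues} residues, {coding_bp} bp coding")
print(f"{total_mb:.1f} Mb = {percent_of_genome:.1f}% of the genome")
```

With these assumptions the result is 45.1 Mb, or about 1.4% of the genome, matching the figure in the text.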

The minimum size of a eukaryotic intron is less than 50 bp. For a typical mammalian intron, the essential sequences are: the 5′ splice site (~10 bp); the 3′ splice site (~30 bp); the branch site (~10 bp); and enough additional RNA to form a loop (~30 bp). This gives a total of 80 bp of essential sequence per intron, or 20,500 × 7.2 × 80 = 11.8 Mb. Thus, 0.37% of the genome is essential because it contains sequences for processing RNA.
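The intron calculation can be reproduced the same way. This minimal sketch sums the per-intron essential elements listed above and scales them by the number of genes and introns per gene; the genome size of about 3,200 Mb is an assumption, not a number given in the post.

```python
GENOME_MB = 3_200          # ASSUMED genome size (Mb), not stated in the post
N_GENES = 20_500
INTRONS_PER_GENE = 7.2

# Essential sequence in a typical mammalian intron (bp), as listed in the post.
essential_parts_bp = {
    "5' splice site": 10,
    "3' splice site": 30,
    "branch site": 10,
    "loop-forming RNA": 30,
}
per_intron_bp = sum(essential_parts_bp.values())  # 80 bp

essential_mb = N_GENES * INTRONS_PER_GENE * per_intron_bp / 1e6
percent_of_genome = 100 * essential_mb / GENOME_MB

print(f"{per_intron_bp} bp per intron -> {essential_mb:.1f} Mb = {percent_of_genome:.2f}% of the genome")
```

This gives 11.8 Mb, or about 0.37% of the genome.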

The total of essential sequences in the transcribed part of a gene is about 1.8% of the genome.

The rest of the intron sequence is non-essential junk. Much of it is littered with transposable elements that have inserted haphazardly. If we subtract the essential intron sequence then the average size of the remaining DNA is 3650 bp. The total amount of this sequence is 20,500 × 7.2 × 3650 = 538.7 Mb or 17% of the genome. (Most estimates are somewhat higher.)

Assuming that 44% of this is repetitive transposable elements, this leaves 9.6% of the genome. That's an additional 9.6% of non-essential DNA, or junk, bringing our current total to 55% junk.
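The non-essential intron estimate can also be sketched in code. The 3,650 bp per intron and the 44% transposon share are the post's figures; the 3,200 Mb genome size is an assumption. With that genome size the calculation comes out at about 9.4%, a little below the 9.6% quoted above; the difference reflects rounding and the genome size chosen.

```python
GENOME_MB = 3_200                   # ASSUMED genome size (Mb)
N_GENES = 20_500
INTRONS_PER_GENE = 7.2
NONESSENTIAL_BP_PER_INTRON = 3_650  # post's figure after removing the 80 bp of signals
TE_FRACTION = 0.44                  # share already counted as transposable elements

intron_junk_mb = N_GENES * INTRONS_PER_GENE * NONESSENTIAL_BP_PER_INTRON / 1e6
intron_junk_pct = 100 * intron_junk_mb / GENOME_MB
# Only the non-transposon part is new junk; the transposon part was counted earlier.
additional_junk_pct = intron_junk_pct * (1 - TE_FRACTION)

print(f"non-essential intron DNA: {intron_junk_mb:.1f} Mb = {intron_junk_pct:.1f}%")
print(f"additional junk after removing transposons: {additional_junk_pct:.1f}%")
```

Multiplying by (1 - 0.44) is what avoids double-counting the transposable elements that were already included in the earlier total.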

The transcription of every gene is controlled by sequences beyond the 5′ end. There are two classes of sequence: promoters and regulatory sequences. The actual binding sites for RNA polymerase II and various regulatory proteins make up only about 100 bp of essential sequence, but the various bound proteins have to form loops of DNA in order to come into contact. It's reasonable to assume that the average gene may need as much as 1000 bp of essential regulatory sequence. (A generous estimate.)

This means 20,500 × 1000 bp = 20.5 Mb or 0.6% of the genome is essential for regulation.

The grand totals for protein-encoding genes are:

essential 2.4%

junk 9.6% (not counting sequences that were included in other calculations)
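The grand total for essential sequence is just the sum of the three essential components. This short sketch computes the regulatory share from the post's 1,000 bp per gene (assuming, again, a genome of about 3,200 Mb) and adds it to the exon and intron figures quoted earlier in the post.

```python
GENOME_MB = 3_200                # ASSUMED genome size (Mb)
N_GENES = 20_500
REGULATORY_BP_PER_GENE = 1_000   # the post's generous estimate

regulatory_pct = 100 * N_GENES * REGULATORY_BP_PER_GENE / 1e6 / GENOME_MB

# Essential fractions (% of genome): the first two are the post's figures.
essential_pct = {
    "exons (coding + UTRs)": 1.4,
    "intron processing signals": 0.37,
    "regulatory sequences": round(regulatory_pct, 1),
}
total_essential_pct = sum(essential_pct.values())

for part, pct in essential_pct.items():
    print(f"{part:28s} {pct:5.2f}%")
print(f"{'total essential':28s} {total_essential_pct:5.2f}%")
```

The sum is 2.37%, which rounds to the 2.4% given above.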


Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]

Hsp90 and Evolution

 
The SciPhu blog has an interesting series of posts on the chaperone Hsp90 and its effect on evolution. Here's the list of articles with the links.
My contribution to JustScience 2008 will be a review on a protein with the potential to transform evolution theory as we know it today. The review will be divided into 5 separate blog posts:
  1. Introduction to Hsp90 and evolution (this post)
  2. Presenting the Hsp90 protein
  3. How can chaperones act in evolution
  4. Evidence for Hsp90 involvement in rapid evolution of new traits
  5. Summary
I don't agree with the main conclusion that Hsp90 has an important role in evolution but the case is well presented. Anyone who wants to know more about hopeful monsters should read these articles.

UPDATE: My comment above may be misinterpreted. SciPhu may be right about "hopeful monsters" but he's dead wrong to confuse punctuated equilibria and macromutations. He makes the same mistake that others routinely make [Macromutations and Punctuated Equilibria].

We often criticize the creationists for misunderstanding punctuated equilibria and confusing it with the lack of transitional fossils. I think we should be just as hard on evolutionists who make the same mistake.


The figure shows the structures of Hsp90 from yeast, dog, human and E. coli [HSP90 Structure]

Stupidity Exists in Canada as well!

 
Friday's Urban Legend: FALSE

The following email message is going the rounds in Canada. Of course it's a favorite of the right-wing bigot crowd: here's an example [Kinda like Woodstock . . .for terrorists, see comments]. But the message has also suckered lots of otherwise intelligent people. It was sent to me by a friend who has finally learned to check with the urban legends website before spamming his email list. (A small victory for rationalism!)

CANADA PENSION - A Must Read: only in Canada.

Do not apply for your old age pension.

Apply to be a refugee. It is interesting that the federal government provides a single refugee with a monthly allowance of $1,890.00 and each can get an additional $580.00 in social assistance for a total of $2,470.00.

This compares very well to a single pensioner who, after contributing to the growth and development of Canada for 40 or 50 years, can only receive a monthly maximum of $1,012.00 in old age pension and Guaranteed Income Supplement.

Maybe our pensioners should apply as refugees!

Let's send this thought to as many Canadians as we can and maybe we can get the refugees cut back to $1,012.00 and the pensioners up to $2,470.00, so they can enjoy the money they were forced to submit to the Canadian government for those 40 to 50 years.

Please forward this to every Canadian you know.
As if to prove that people can be really stupid, the Canadian letter has morphed into an American one by merely substituting "America" for Canada. See the snopes.com site for an example of similar email letters [Refugee Whiz].

As usual, snopes.com has done the homework. They have outlined the history of this urban legend and traced it to the source. They have posted a letter from Citizenship and Immigration Canada that explains the real situation.
Refugees don't receive more financial assistance from the federal government than Canadian pensioners. In [a letter to the Toronto Star], a one-time, start-up payment provided to some refugees in Canada was mistaken for an ongoing, monthly payment. Unfortunately, although the newspaper published a clarification, the misleading information had already spread widely over e-mail and the internet.

In truth, about three quarters of refugees receive financial assistance from the federal government, for a limited time, and at levels lower than Canadian pensioners. They are known as government-assisted refugees.

We have to remember that many of these people are fleeing from unimaginable hardship, and have lived in refugee camps for several years. Others are victims of trauma or torture in their home countries. Many arrive with little more than a few personal belongings, if that. Canada has a humanitarian role to accept refugees and help them start their new lives here.

For this reason, government-assisted refugees get a one-time payment of up to $1,095 from the federal government to cover essentials — basic, start-up needs like food, furniture and clothing. They also receive a temporary monthly allowance for food and shelter that is based on provincial social assistance rates. In Ontario, for example, a single refugee would receive $592 per month. This assistance is temporary — lasting only for one year or until they can find a job, whichever comes first.

This short-term support for refugees is a far cry from the lifetime benefits for Canada's seniors. The Old Age Security (OAS) program, for example, provides people who have lived in Canada for at least 10 years with a pension at age 65. The Guaranteed Income Supplement (GIS) is an additional monthly benefit for low-income pensioners. The Canada Pension Plan (CPP), or Quebec Pension Plan (QPP) for people in Quebec, pays a monthly retirement pension to people who have worked and contributed to the plan over their career. In July 2006, Canadian seniors received an average of $463.20 in OAS benefits and $472.79 in CPP retirement benefits ($388.94 in QPP). Lower income OAS recipients also qualified for an average of an additional $361.94 in GIS benefits.


The map depicts the important settlements of American refugees who came to Canada in the 1800's. Most of them traveled north via secret routes with the aid of American sympathizers. The route came to be known as [The Underground Railroad].

Thursday, February 07, 2008

Junk in Your Genome: Pseudogenes

 
Pseudogenes are non-functional DNA sequences that resemble genes. Much of the DNA related to transposable elements falls into this category. There are ribosomal RNA and tRNA pseudogenes but the term usually refers to sequences that resemble protein-encoding genes.

THEME

Genomes & Junk DNA

Total Junk so far

    46%
There are two kinds of pseudogenes derived from protein-encoding genes. Those derived from reverse transcription of mRNA and the re-integration of double-stranded DNA into the genome are called "processed" pseudogenes because the mRNA precursor was processed to give mature mRNA before being copied. Consequently, processed pseudogenes do not have introns. They also don't have promoters so they cannot be transcribed.

The other kind of pseudogene arises following a gene duplication event. One of the copies acquires a mutation that inactivates it. This is usually not harmful because the other copy remains intact. It is the fate of most duplicated genes to become a pseudogene by inactivation.

The original meaning of "junk" DNA referred to pseudogenes (reviewed in Gregory 2005) but the term is now used frequently to mean any non-functional DNA. That's the definition I use here.

Ensembl lists 2,081 pseudogenes in the human genome, but that's very low compared to other studies [Human Genome]. The number of processed pseudogenes ranges from several thousand up to 17 thousand (Drouin 2006). The ENCODE project found 118 pseudogenes in their detailed analysis of 1% of the genome (Solovyev et al. 2006). This suggests that there are 11,800 pseudogenes in the entire genome.

A number of studies suggest that the number of processed pseudogenes is approximately the same as the number of inactivated duplicated genes (reviewed in Taylor and Raes 2005). In the case of processed pseudogenes, there are many copies of a relatively small subset of the total number of genes. In other words, lots of genes do not spawn pseudogenes and those that do have many offspring. This is because there is a bias in favor of genes that are highly expressed in the germ line.

The total number of pseudogenes in the genome is likely to be close to the number of genes based on extrapolations from detailed analyses of small segments of the genome or single chromosomes.

If we assume that there are 10,000 processed pseudogenes averaging 2 kb each, then this represents 20 Mb or 0.6% of the genome. If there are an equal number of other pseudogenes, then this is 10,000 × 60 kb = 600 Mb or 18% of the genome. This is all junk DNA, but it overlaps extensively with the junk DNA from transposable elements. It is further evidence that substantial parts of the genome are non-functional, but since most of that sequence would be introns in an active gene, it would count as junk DNA even if the gene were active. It's best to just count the inactive exons in order to avoid double counting: 10,000 duplicated pseudogenes with about 2 kb of exon sequence each add another 20 Mb, or 0.6% of the genome.

Thus, pseudogenes are about 1.2% of the genome and all of it is junk.1,2
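The pseudogene numbers can be reproduced with a short sketch. The counts, the 2 kb of exon sequence per pseudogene, and the 60 kb full-length gene size are the post's assumptions; the 3,200 Mb genome size is an additional assumption, which is why the full-length figure comes out at 18.75% rather than the rounder 18% in the text.

```python
GENOME_MB = 3_200             # ASSUMED genome size (Mb)

# Extrapolating the ENCODE count from ~1% of the genome to the whole genome
ENCODE_PSEUDOGENES = 118
whole_genome_estimate = ENCODE_PSEUDOGENES * 100

PROCESSED = 10_000            # assumed number of processed pseudogenes
DUPLICATED = 10_000           # assumed equal number of duplicated pseudogenes
EXON_KB = 2                   # exon sequence per pseudogene (kb)
FULL_LENGTH_KB = 60           # whole duplicated gene, introns included (kb)


def pct_of_genome(n: int, kb_each: float) -> float:
    """Percent of the genome occupied by n copies of kb_each kilobases."""
    return 100 * n * kb_each / 1_000 / GENOME_MB


processed_pct = pct_of_genome(PROCESSED, EXON_KB)              # 20 Mb
duplicated_exon_pct = pct_of_genome(DUPLICATED, EXON_KB)       # exons only, 20 Mb
full_length_pct = pct_of_genome(DUPLICATED, FULL_LENGTH_KB)    # 600 Mb, overlaps other junk

# Count only exons so that intron and transposon DNA is not counted twice
pseudogene_junk_pct = processed_pct + duplicated_exon_pct

print(f"ENCODE extrapolation: {whole_genome_estimate} pseudogenes")
print(f"processed: {processed_pct:.3f}%  duplicated exons: {duplicated_exon_pct:.3f}%")
print(f"full-length duplicated genes: {full_length_pct:.2f}% (not added, to avoid double counting)")
print(f"pseudogene junk: {pseudogene_junk_pct:.2f}%")
```

The total of 1.25% is what the text rounds to "about 1.2%".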


1. A small number of former pseudogenes have been reactivated. They are no longer pseudogenes so they don't count as junk. A small number of pseudogenes have acquired a separate function so they don't count as junk. There do not appear to be very many examples.

2. There are many scientists who have tried to make the case for pseudogenes having some sort of function. The most common speculation is that they serve as an important reservoir of sequence information that can be accessed by recombination and/or re-activation (e.g., Balakirev and Ayala 2003).

Balakirev, E.S. and Ayala, F.J. (2003) PSEUDOGENES: Are They “Junk” or Functional DNA? Ann. Rev. Genet. 37:123-151. [doi:10.1146/annurev.genet.37.040103.103949]

Drouin, G. (2006) Processed pseudogenes are more abundant in human and mouse X chromosomes than in autosomes. Mol. Biol. Evol. 23:1652-1655. [PubMed]

Gregory, T.R. (2005) "Genome Size Evolution in Animals" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).

Solovyev, V., Kosarev, P., Seledsov, I. and Vorobyev, D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12. [PubMed]

Taylor, J.S. and Raes, J. (2005) "Small-Scale Gene Duplications" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).