
Tuesday, February 12, 2008

Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

 
A newly published paper in PLoS Biology (Li et al. 2008) finds that many transcription factors are non-specifically bound to DNA and that they may not be involved in regulating gene expression at most binding sites. For an explanation of why this should not be a surprise see Repression of the lac Operon.

Here's the Author Summary of the paper.
One of the largest classes of regulatory proteins in animals, sequence-specific DNA binding transcription factors determine in which cells genes will be expressed and so control the development of an animal from a single cell to a morphologically complex adult. Understanding how this process is coordinated depends on knowing the number and types of genes that each transcription factor binds and regulates. Using immunoprecipitation of in vivo crosslinked chromatin coupled with DNA microarray hybridization (ChIP/chip), we have determined the genomic binding sites in early embryos of six transcription factors that play a crucial role in early development of the fruit fly Drosophila melanogaster. We find that these proteins bind to several thousand genomic regions that lie close to approximately half the protein coding genes. Although this is a much larger number of genes than these factors are generally thought to regulate, we go on to show that whereas the more highly bound genes generally look to be functional targets, many of the genes bound at lower levels do not appear to be regulated by these factors. Our conclusions differ from those of other groups who have not distinguished between different levels of DNA binding in vivo using similar assays and who have generally assumed that all detected binding is functional.


Li et al. (2008) Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm. [PLoS Biology]

Goodbye Toronto

 
It's too damn cold and there's way too much snow in Toronto. I'm getting out of town for a bit. Blogging may be intermittent, especially if I'm having fun.




This photograph was taken yesterday (Feb. 11, 2008) just outside my building. That's the Ontario Legislature.

Happy Birthday Charles Darwin

 
Charles Robert Darwin was born on this day in 1809. Darwin was the greatest scientist who ever lived.

In honor of his birthday, and given that this is a year of politics in America, I thought it would be fun to post something about Darwin's interactions with politicians. The historical account is from Janet Browne's excellent biography (Browne 2002).

William Gladstone (photo below) was an orthodox Christian. He was not a fan of evolution. In March 1877 Gladstone was leader of the Liberal party and a former Prime Minister of the most powerful country in the world. He was spending the weekend with John Lubbock, a well-known liberal, and a few other friends, including Thomas Huxley.

They decided to walk over to Darwin's house in Downe. This was 18 years after the publication of the Origin of Species and Darwin was a famous guy. The guests were cordially received by Darwin and his wife Emma. Darwin and Emma were life-long liberals and they were honored by Gladstone's visit. A few days later, Darwin wrote a note to his friend saying,

Our quiet, however, was broken a couple of days ago by Gladstone calling here.—I never saw him before & was much pleased with him: I expected a stern, overwhelming sort of man, but found him as soft & smooth as butter, & very pleasant. He asked me whether I thought that the United States would hereafter play a much greater part in the history of the world than Europe. I said that I thought it would, but why he asked me, I cannot conceive & I said that he ought to be able to form a far better opinion,—but what that was he did not at all let out.
A few years later Gladstone sent Darwin one of his essays on Homer. Darwin gratefully acknowledged the gesture.

In 1881, when Gladstone was Prime Minister again, Darwin and some of his friends petitioned Gladstone to award a pension to Alfred Russel Wallace, who was in dire financial straits at the time. Gladstone granted the request. Two months later Gladstone offered Darwin a position as trustee of the British Museum but Darwin declined. (Remember, Gladstone did not agree with Darwin about evolution, or religion.)

When Darwin died, Gladstone was instrumental in arranging for him to be buried in Westminster Abbey. The funeral was held on April 26, 1882. William Gladstone was too busy to attend. He went to a dinner at Windsor.


Browne, J. (2002) Charles Darwin: The Power of Place (Vol. II). Alfred A. Knopf, New York (USA)

Repression of the lac Operon

There are many lessons to be learned from understanding the regulation of transcription of a well-studied system like the E. coli lac operon. Some of those lessons have consequences when we think about the problems of having large eukaryotic genomes. Read the description below and the implications that follow.

From Horton et al. (2006) p. 666



lac repressor binds simultaneously to two sites near the promoter of the lac operon. Repressor-binding sites are called operators. One operator (O1) is adjacent to the promoter, and the other (O2) is within the coding region of lacZ. When bound to both operators, the repressor causes the DNA to form a stable loop that can be seen in electron micrographs of the complex formed between lac repressor and DNA (bottom figure). The interaction of lac repressor with the operator sequences may block transcription by preventing the binding of RNA polymerase to the lac promoter. However, it is now known that, in some cases, both lac repressor and RNA polymerase can bind to the promoter at the same time. Thus, the repressor may also block transcription initiation by preventing formation of the open complex and promoter clearance. A schematic diagram of lac repressor bound to DNA in the presence of RNA polymerase is shown in the figure on the right. [See Monday's Molecule #61 for another view.] The diagram illustrates the relationship between the operators and the promoter and the DNA loop that forms when the repressor binds to DNA.

The repressor locates an operator by binding nonspecifically to DNA and searching in one dimension. (Recall from Section 21.3C that RNA polymerase also uses this kind of searching mechanism.) The equilibrium association constant for the binding of lac repressor to O1 in vitro is very high. As a result, the repressor blocks transcription very effectively. (lac repressor binds to the O2 site with lower affinity.) A bacterial cell contains only about 10 molecules of lac repressor, but the repressor searches for and finds an operator so rapidly that when a repressor dissociates spontaneously from the operator, another occupies the site within a very short time. However, during this brief interval, one transcript of the operon can be made since RNA polymerase is poised at the promoter. This low level of transcription, called escape synthesis, ensures that small amounts of lactose permease and β-galactosidase are present in the cell.

In the absence of lactose, lac repressor blocks expression of the lac operon, but when β-galactosides are available as potential carbon sources, the genes are transcribed. Several β-galactosides can act as inducers. If lactose is the available carbon source, the inducer is allolactose, which is produced from lactose by the action of β-galactosidase (Figure 21.18). Allolactose binds tightly to lac repressor and causes a conformational change that reduces the affinity of the repressor for the operators. [see Regulation of Transcription] In the presence of the inducer, lac repressor dissociates from the DNA, allowing RNA polymerase to initiate transcription. (Note that because of escape synthesis, lactose can be taken up and converted to allolactose even when the genes are repressed.)

Electron micrographs of DNA loops. These loops were formed by mixing lac repressor with a fragment of DNA bearing two synthetic lac repressor–binding sites. One binding site is located at one end of the DNA fragment, and the other is 535 bp away. DNA loops 535 bp in length form when the tetrameric repressor binds simultaneously to the two sites.
The strength of binding between a protein and a ligand is measured by an equilibrium binding constant (KB). In the case of lac repressor binding to its specific strong binding site (O1), KB = 10¹³ M⁻¹. This is very high; in fact, it is one of the tightest DNA-binding interactions known in biology. What this means is that lac repressor will sit on the operator and repress transcription for at least 20 minutes under normal conditions.

However, the repressor will eventually fall off (dissociation rate constant k₋₁ = 6 × 10⁻⁴ s⁻¹) and, as described above, the operon will be transcribed once (escape synthesis). A new repressor molecule finds the operator sequences very quickly because lac repressor binds non-specifically to DNA (KB = 4 × 10⁴) and slides along the DNA searching for the operator in a process called one-dimensional diffusion (association rate constant k₁ = 10¹⁰ M⁻¹ s⁻¹). Even though the lac repressor only remains bound non-specifically for a few seconds, it is able to search about 2000 bp looking for a specific binding site.
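
The rate constants quoted above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration. It assumes simple first-order dissociation, so that the mean bound time is 1/k-1 and the half-life is ln 2/k-1, and it assumes that the equilibrium binding constant is the ratio k1/k-1. The function and variable names are mine, not from the textbook.

```python
import math

# Rate constants quoted in the text for lac repressor binding to O1.
K_OFF = 6e-4   # dissociation rate constant k-1, in s^-1
K_ON = 1e10    # association rate constant k1, in M^-1 s^-1


def mean_bound_time(k_off):
    """Mean lifetime (seconds) of a complex that decays with first-order rate k_off."""
    return 1.0 / k_off


def half_life(k_off):
    """Time (seconds) for half of the complexes to dissociate."""
    return math.log(2) / k_off


def association_constant(k_on, k_off):
    """Equilibrium binding constant (M^-1), assuming K_B = k1 / k-1."""
    return k_on / k_off


if __name__ == "__main__":
    print(f"mean bound time: {mean_bound_time(K_OFF) / 60:.0f} min")
    print(f"half-life:       {half_life(K_OFF) / 60:.0f} min")
    print(f"K_B:             {association_constant(K_ON, K_OFF):.1e} M^-1")
```

Under these assumptions the half-life is roughly 19 minutes and the mean bound time is about 28 minutes, which is in line with the 20 minute figure. The ratio of the two rate constants comes out within a factor of two of the quoted binding constant.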

Given the huge difference between the specific and non-specific binding constants, the cell only needs about ten molecules of lac repressor to ensure that the operator sequences are bound almost all of the time. At any given time nine of these molecules will be bound to random pieces of DNA in the genome and the other one will be bound to the lac operon.

Similar repressors and activators work in eukaryotic cells to regulate transcription. But in eukaryotic cells we have a much bigger problem. First, there are very few regulatory proteins that have as strong a specific binding constant as lac repressor. Second, there is much more DNA in a eukaryotic cell. The consequences of having a large genome are: (a) it takes these DNA binding proteins much longer to find their specific binding site, and (b) at any one time, many more of the regulatory proteins are soaked up in non-specific binding to DNA. In eukaryotic cells with an abundance of junk DNA a typical regulatory protein has to be present at about 20,000 copies per cell in order to have a decent chance of binding to its specific regulatory site for a significant length of time. (Recall that only ten molecules of lac repressor are needed in E. coli.)

Given the properties of DNA binding that we have discovered and characterized in bacteria and bacteriophage, we can calculate that escape synthesis in eukaryotic cells is likely to be much more of a problem than in bacterial cells. Furthermore, accidental transcription of random bits of DNA is almost certainly going to be common in a cell with a large bloated genome. This is because RNA polymerase also binds non-specifically to DNA and also because the larger the genome, the more likely you are to encounter promoter and regulatory sequences that just by chance happen to be close matches to real functional sequences. This is a very important concept and one that is not widely appreciated. Based on our knowledge of basic biochemistry we expect that there will be random, infrequent transcription of a large percentage of the genome. These transcripts are merely a consequence of the properties of DNA binding proteins and they have no biological significance.
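
The point about chance matches can be made concrete with a back-of-the-envelope calculation. The Python sketch below is an illustration under stated assumptions, not a model of any real promoter. It assumes a random genome with equal frequencies of the four bases, counts both strands, and uses a hypothetical 10 bp binding site. The genome sizes are round figures (about 4.6 million bp for E. coli and about 3.2 billion bp for the human haploid genome).

```python
from math import comb


def match_probability(site_bp, mismatches_allowed=0):
    """Probability that a random window of site_bp bases matches a given site,
    allowing up to `mismatches_allowed` substitutions (equal base frequencies)."""
    total = 0.0
    for m in range(mismatches_allowed + 1):
        # Choose m mismatched positions; each mismatch is one of 3 wrong bases out of 4.
        total += comb(site_bp, m) * (0.75 ** m) * (0.25 ** (site_bp - m))
    return total


def expected_chance_matches(genome_bp, site_bp, mismatches_allowed=0):
    """Expected number of chance matches in a random genome, counting both strands."""
    windows = 2 * (genome_bp - site_bp + 1)
    return windows * match_probability(site_bp, mismatches_allowed)


E_COLI_BP = 4_600_000      # approximate E. coli genome size
HUMAN_BP = 3_200_000_000   # approximate human haploid genome size
SITE_BP = 10               # hypothetical binding-site length (an assumption)

for name, size in [("E. coli", E_COLI_BP), ("human", HUMAN_BP)]:
    exact = expected_chance_matches(size, SITE_BP)
    near = expected_chance_matches(size, SITE_BP, mismatches_allowed=1)
    print(f"{name}: ~{exact:.0f} exact matches, ~{near:.0f} with up to one mismatch")
```

With these assumptions a 10 bp site is expected to turn up by chance about nine times in a genome the size of E. coli's and about six thousand times in one the size of the human genome. Allowing a single mismatch multiplies both numbers by 31. Real genomes are not random sequence, so treat these as order-of-magnitude figures only.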

Some of these problems in eukaryotes are mitigated by a separate level of regulation at the level of chromatin structure. Large regions of the chromosome can be masked from DNA binding proteins by formation of a tight heterochromatic complex of nucleosomes and DNA. Less compact complexes are formed in non-active regions of the genome where the DNA is less accessible but not invisible. When genes in a region are transcribed, the chromatin opens out into an open complex where the DNA is easily accessible to regulatory proteins. This solves some of the problems discussed above but it is only a partial solution. We know for a fact that the concentrations of regulatory proteins are high (20,000 copies) and a growing amount of evidence points to frequent accidental transcription.

©Laurence A. Moran and Pearson Prentice Hall


Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River N.J. (USA)

The Lac Operon

The lac operon in E. coli consists of three genes¹ (lacZ, lacY and lacA) transcribed from a single promoter. The lacZ gene encodes the enzyme β-galactosidase, an enzyme that cleaves β-galactosides. Lactose is a typical β-galactoside and the enzyme cleaves the disaccharide, converting it to separate molecules of glucose and galactose. These monosaccharides can enter into the metabolic pool of the cell where they can serve as the sole source of carbon.

Thus, when the lac operon is active and β-galactosidase is present, E. coli can grow on lactose as its only source of carbon. Outside of the laboratory, E. coli rarely encountered lactose (until recently) but there are many plant β-galactosides that are substrates for the enzyme.

The lacY gene encodes a famous transporter called lactose permease. It is responsible for importing β-galactosides. The lacA gene encodes a transacetylase that is responsible for detoxifying the cell when it takes up poisonous β-galactosides.


Transcription begins at the Plac promoter and ends at a terminator at the 3′ end of the operon. Each of the three reading frames is translated separately from the polycistronic mRNA.

Upstream of the lac operon is the lacI gene. It encodes the lac repressor, one of the proteins that controls expression of the lac operon. The lacI gene is transcribed from its own promoter and it has its own terminator. (It is not necessary for the lacI gene to be linked to the operon.)

Expression of β-galactosidase, lac permease, and the transacetylase is regulated at the level of transcription. RNA polymerase binds to the lac promoter but this is a weak σ70 promoter.² The promoter sequence is a poor match to the consensus sequence for these types of promoters so the operon is transcribed infrequently in the absence of additional activators. Transcription of the operon is activated by cAMP regulatory (or receptor) protein (CRP).³

In the absence of any β-galactoside, the operon is not transcribed and no enzyme is synthesized. Transcription is prevented by lac repressor, which binds to two operator sequences called O1 and O2. When β-galactosides are present, repression is relieved and the operon is transcribed at a low level in order to take advantage of the carbon source. When there is no other carbon source available, the operon is activated by CRP and the rate of transcription, and with it enzyme production, increases considerably.



1. This is one of the exceptions to the standard definition of a gene [What Is a Gene?]. In this case we are using the word "gene" to mean the coding region for a particular protein.

2. There are many different promoters in the E. coli genome. They are recognized by various RNA polymerase complexes containing different bound activators. One set of common activators is called σ factors: σ70 is the most common σ factor. Most genes have a σ70 promoter.

3. CRP is also known as catabolite activator protein (CAP).

Monday, February 11, 2008

More Billboards in Chambersburg PA

 
Read all about it at Battle of the Chambersburg billboards.



Monday's Molecule #62

 
Today's molecule is a cartoon depicting the action of several molecules. Your task is to identify all the molecules in the diagram and explain what's going on. Even if you're not interested in a free lunch, I'd appreciate hearing from you. I'd like to know how many of you understand the diagram. In fact, I'll put a poll in the sidebar to see how many recognize the process that's depicted here.

There's an indirect connection between this molecule and Wednesday's Nobel Laureate(s). Your task is to figure out the significance of today's diagram and identify the Nobel Laureate(s) who is associated with discovering the underlying process. (Be sure to check previous Laureates.)

The reward goes to the person who correctly identifies the molecule and the Nobel Laureate(s). Previous winners are ineligible for one month from the time they first collected the prize. There are three ineligible candidates for this week's reward. The prize is a free lunch at the Faculty Club.

THEME:

Nobel Laureates
Send your guess to Sandwalk (sandwalk(at)bioinfo.med.utoronto.ca) and I'll pick the first email message that correctly identifies the molecules, the process, and the Nobel Laureate(s). Note that I'm not going to repeat Nobel Laureates so you might want to check the list of previous Sandwalk postings.

Correct responses will be posted tomorrow along with the time that the message was received on my server. I may select multiple winners if several people get it right.

Comments will be blocked for 24 hours.





Sunday, February 10, 2008

Stu Kauffman in Toronto

 
I went to hear Stu Kauffman on Friday night [see Reinventing the Sacred].

Before the talk we had a little chat about blogging and some other topics. He wondered what the bloggers were saying about him and I told him that many don't understand what he's trying to say. I explained that I fell into that same category. I can't figure out what it is that he's trying to promote. He promised to try and explain in his talk.

It didn't work. I'm not much further ahead than I was before I heard him talk. Here's a brief summary of some things he said. I'm sorry if I can't put it all together into one big picture but I just can't.

The New Atheists: Kauffman thinks that Dawkins and his "New Atheist" friends are preaching to the converted. According to Kauffman, they will never convince the believers. Kauffman describes himself as a secular humanist and a non-believer. He thinks we should try to reach out to the religious community by adopting spiritual language. Hence the title of his talk. I don't really know what he means by this. He gave one example of having a reverence for some trees growing on a hill top near his house but I'm not sure if this is relevant. (See photograph, is that the hill top?)

I don't agree with his position on the so-called New Atheists and I don't agree with his proposal that it's the atheists who need to move towards the theists by adopting the sacred.

Reductionism: Kauffman is very much opposed to reductionism. He spent some time describing how the laws of physics just don't work when you try to predict the structure of complex things. This does not mean they don't obey the laws of physics and chemistry; it means those laws aren't sufficient. This is because of emergent properties.

The discussion about reductionism and emergent properties is interesting but Kauffman makes it too complicated, for me, by going off on all kinds of tangents. In talking about it with him afterwards, he seems to be thinking that life is somehow special. It's different than the physical world. He takes pains to point out that he's not talking about vitalism but it sure sounds like that to me.

The other interesting thing about his anti-reductionism is that it doesn't apply in the same sense that Lewontin means when he talks about gene-centric biology. Before the lecture we were discussing the reason why human siblings don't mate and Kauffman was quite eager to offer an evolutionary psychology explanation. He suggested there was selection for an anti-incest gene in our ancestors to prevent inbreeding. That's the worst kind of reductionism but it's not the sort of reductionism that Kauffman disputes.

Determinism: Kauffman doesn't like determinism. He pointed out that quantum mechanics has ruled out the Laplace version of determinism. I don't think this is particularly controversial but I do think there are versions of determinism that don't require strict predictability. I kept waiting for the other shoe to drop. I don't think Kauffman was trying to make a case for free will and I don't think he was using his anti-determinism to argue against materialism, but I'm not sure.



Somehow these topics, and several others, were supposed to weave together to form a new way of looking at science. And a new way of reaching out to theists. That's the part I didn't get. A lot of what he was saying was true, but hardly profound. What was supposed to be profound didn't seem to be true.

Stu Kauffman took down the URL for Sandwalk and he promised to read my comments on his lectures. I hope he will respond in the comments. He seems like a pretty cool guy even if he's a bit baffling.

The dominant impression I have from talking to members of the audience—there were 65 people at the talk—is that people think he's saying something important but they just can't put their finger on what it is. At least I'm not the only one.


Where does disbelief in Darwin lead?

 
You probably think the answer to the question is obvious. The rejection of science leads to irrational behavior, right?

Of course it's right. DaveScot sets out to prove it over on Uncommon Descent with a posting that has the same title as this one [Where does disbelief in Darwin lead?]. As you read it, remember that the person who is writing the article is a disbeliever in evolution. Let's see where that kind of thinking leads ....
Be that as it may I’m a results oriented guy. Instead of presuming that “poorer” science education leads to poorer scientific output I instead look at what America actually produces in the way of science and engineering. Without question America’s output in science and engineering leads the world. Not just a little but a lot. We don’t steal nuclear technology secrets from China, they steal ours. We don’t use European GPS satellites for navigation, they use ours. The list can go on and on. We put a man on the moon 40 years ago while to this day no one else has. America has almost 3 times the number of Nobel prize winners as the next closest nation. That doesn’t support the notion that disbelief in Darwin is causing any problems. In fact it supports just the opposite. Disbelief in evolution makes a country into a superpower - militarily, economically, and yes even scientifically.

Education in America is working just fine, thank you, judging by the fruits of American science and engineering. Disbelief in Darwinian evolution, if anything, leads to greater technological achievements not lesser. If it isn’t broken, don’t try to fix it.
Well, there you have it. If only those successful scientists, engineers, and Nobel Laureates¹ would stop believing in evolution there's no limit to what America could achieve. Just look at how far America has come when it's only the ignorant who disbelieve in evolution!

You know, you simply can't make this stuff up.


1. America is pretty much in the middle of the pack in terms of Nobel Laureates per capita [Nobel Prizes by Country]. It takes a bit of intelligence and simple math to recognize that point.

What Freedom of Speech Really Means

 
Read the amazing story on Friendly Atheist [Atheist Billboard Taken Down].

The Freedom from Religion Foundation contracted with Kegerreis Outdoor Advertising LLC to put up the following billboard in Chambersburg, Pennsylvania (USA).


That billboard has now been taken down and replaced with,


Read what the company has to say about their decision. It should not be necessary to point out what freedom of speech means and it is not proper for an advertising company to publicly state their moral values. Do all employees of Kegerreis Outdoor Advertising LLC agree with the statement? If not, are they going to make their views known at company headquarters? Would you?


A Case of Plagiarism

 
The blogosphere is all atwitter about the publication of a paper titled "Mitochondria, the missing link between body and soul: Proteomic prospective evidence". This is the train wreck of a paper that PZ Myers blogged about a few days ago [What Happened to the "Peers" on this Paper?].

Everyone needs to know that the contents of this paper were not only stupid but also plagiarized. The authors couldn't even come up with their own words to explain their silly ideas. For the latest additions to a long list of stolen paragraphs see Commentary: Neither buried nor treasure.

The guilty journal is Proteomics. The editors are not blameless.


The Streisand Effect in Action

 
I mentioned the Streisand Effect a few days ago. Here's a perfect example of how it works.

ThePolitic.com is one of those Canadian blogs written by someone (Matthew) who embarrasses my country. Matthew writes,
It’s no secret that many of us liberty and/or family-minded folks are great fans of The National Post which officially only competes with the Globe and Mail but realistically also occupies reality that the Toronto Star and Toronto Sun covet. I personally began subscribing to the Post after graduation not because it had a host of right-wing commentators (the Toronto Sun can also claim this), but because the paper took the mission of presenting all view points seriously by often welcoming guest columnists who would attack its editorials, or by presenting series like the one they did two weeks ago on abortion, where a dozen commentators would weigh in on the issue with intelligent, but different viewpoints.

This led me to great sadness today when I went onto their website to read the digital version of the paper. The front cover was just a large cartoon title that said “The Love & Sex Issue” which is tastefully questionable in itself for a national newspaper, but if you look at the picture itself, it also contains the drawing of two nude people behind the “x” which those of us familiar with Japanese pop-culture would classify as hentai. Half of the main section contained articles which were more at home in a Penthouse issue and the Post’s website contains video content that I dare not look at but is clearly part of the above-mentioned theme.

I have since called the Post’s office and dealt with a nice young chap who will be passing along my complaints (the Post is good at responding to these), but in the mean time, I invite everyone else who is disturbed by this extremely poor lack in judgment to write or phone to the Post’s editorial staff:
For more information about the Love & Sex issue go to the National Post website [Love & Sex Issue].

Warning: There may be sexy hentai cartoons there and you will certainly find some discussions about another H-word. Don't go there if you're in kindergarten or you're a prude. Don't go there if, like Matthew, you dare not look at pictures of nude bodies.

Matthew should just be thankful that this topic wasn't covered in one of the left-wing newspapers like the Toronto Star. He probably would have had to leave the country for a few days to avoid the newspaper boxes.


[Hat Tip: Canadian Cynic]

God Only Knows

 
I found a new Canadian blog called Stony_Curtis: Pop Culture, Guys, Food, Montreal. With a title like that, don't you just have to click on the link?

One of the first things I stumbled across was the video below of the Beach Boys singing "God Only Knows" from the Pet Sounds album. I don't know who Stony Curtis is but I like him already. (P.S. Somehow the song doesn't seem quite as innovative and moving as it was 40 years ago. I wonder why.)




The Meaning of Consensus

 
What does Stephen Harper mean when he uses the word "consensus" as in,
The Conservative government will not extend Canada's combat mission in Afghanistan beyond February 2009 without a consensus in Parliament, Prime Minister Stephen Harper said Friday.

"I will want to see some degree of consensus among Canadians on how we move forward on that," Harper told reporters Friday in Ottawa.
Canadian Cynic has the answer. [Hint: Harper doesn't mean what you think he means.]


Saturday, February 09, 2008

Junk in Your Genome: Intron Size and Distribution

In the comments to Junk in Your Genome: Protein-Encoding Genes martinc asks,
Larry, if the amount of necessary sequences within introns are as small as you suggest wouldn't this allow us to make a prediction. Couldn't we predict that due to drift there should be very little similarity in intron lengths between different species. If, by any chance, there is similarity then what would your explanation be?
There have been quite a few studies of average intron size in various species. I selected a number for the average size of introns from Hong et al. (2006). The average intron size, according to them, is 3,479 bp in coding regions. This value is a little deceptive since there are a small number of huge introns that make the average quite large. The median value is 1,334 bp, or less than half the average value.
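
A gap like this between the mean and the median is what you expect from a distribution with a long right tail. The Python sketch below uses made-up intron lengths (they are not data from Hong et al.) to show how a couple of very large introns can pull the mean far above the median.

```python
from statistics import mean, median

# Hypothetical intron lengths in bp. These are NOT data from Hong et al. (2006).
# Most values are short and two are very large.
intron_lengths = [90, 100, 110, 120, 150, 200, 300, 400, 12000, 25000]

avg = mean(intron_lengths)
mid = median(intron_lengths)

print(f"mean:   {avg:.0f} bp")
print(f"median: {mid:.0f} bp")

# Dropping the two largest introns shows how much of the mean they account for.
trimmed = sorted(intron_lengths)[:-2]
print(f"mean without the two largest: {mean(trimmed):.0f} bp")
```

In this toy example the two large introns make up most of the total length, so the mean says very little about the size of a typical intron. The median is the more informative summary for this kind of distribution.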

I suggested that much of the intron sequences were junk. Martinc's question is quite reasonable but in order to get an answer we need to look more closely at the distribution of introns.

The figure shows the distribution of intron sizes in four species: the flowering plant Arabidopsis thaliana; the fruit fly Drosophila melanogaster; human; and mouse. The data is from Hong et al. (2006, Fig. 1).

Note that the distribution in Arabidopsis and Drosophila is very tight. Both of these species have relatively compact genomes compared to mammals. The data strongly suggests that the minimum intron size is about 80 bp.

The distributions in the human and mouse genomes are very different. There is a strong peak at 100 bp—this is similar to the peaks in other species. But unlike other species, mammalian introns can be extremely large, giving rise to a long tail of the distribution extending to 10,000 bp or more. The key question is whether this distribution of long introns is noise or an artifact of gene prediction algorithms, or whether it represents a real phenomenon.

Returning to martinc's question. If we look at well-conserved genes in different species what we find is some variation in intron length but only around a mean of about 100-400 bp. In other words, in genes that have been closely examined, where the protein product is known, the distribution of intron sizes looks a lot more like the distribution in Arabidopsis and Drosophila.

Let's look at the hsp90 genes. These are the genes that encode Hsp90, the protein that SciPhu was blogging about [Hsp90 and Evolution].

I've picked the zebrafish gene and four mammalian genes to illustrate the variation in intron length. (Blue exons are 5′ and 3′ UTRs.) Most of the introns are between 80 and 400 bp in size but there are a few exceptions. In this case the human gene is the exception; it has two huge introns at the 5′ end of the gene.

What we see is a narrow distribution of intron lengths in most cases and a few huge introns. It isn't surprising that the lengths of introns in different species are quite similar.

Let's look at my favorite gene. HSPA8 is the cytoplasmic version of the chaperone HSP70 multigene family.

We see a similar pattern. Most intron lengths are very similar in different species, suggesting selection for introns in the 100-400 bp range. There are exceptions, as we see in the chimpanzee, monkey and dog genes. All three have large introns at either the 5′ or 3′ ends. The large monkey introns are 10,253 bp and 1,007 bp. The large chimpanzee intron is 13,257 bp in length. This is typical. I think it's very likely that the large introns in noncoding exons are artifacts.

So here's the complete answer to the question posed at the top of the page. I think there's selection to maintain intron sizes within a fairly narrow range of 100-400 bp. Because of this, we expect to see similar intron sizes in different species. On occasion we discover a huge intron that is peculiar to one species. This intron could be a transient expansion that hasn't been reduced yet, or it could be an artifact.

Incidentally, while retrieving these sequences from Entrez Gene I noticed that the annotators have eliminated all splice variants for HSP90 and HSPA8 genes with a few exceptions.

The dog sequences all have many splice variants for every gene and some of the variants have been retained in the Entrez Gene entry for dog HSPA8. Look carefully at the two predicted variants in the second and third lines. These alternative splice variants are supposed to produce Hsc70 proteins that are missing several highly conserved regions encoded by exons 7 and 8. Recall that this is the most highly conserved protein in biology.

These cannot be biologically relevant protein variants that are only produced in dogs. The annotators are right to remove similar artifacts from the other genomes and they should remove these as well. Alternative splice variants are mostly artifacts, in my opinion, but that's a fight for another day.


Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]