
Tuesday, February 12, 2008

Repression of the lac Operon

There are many lessons to be learned from understanding the regulation of transcription of a well-studied system like the E. coli lac operon. Some of those lessons have consequences when we think about the problems of having large eukaryotic genomes. Read the description below and the implications that follow.

From Horton et al. (2006) p. 666



lac repressor binds simultaneously to two sites near the promoter of the lac operon. Repressor-binding sites are called operators. One operator (O1) is adjacent to the promoter, and the other (O2) is within the coding region of lacZ. When bound to both operators, the repressor causes the DNA to form a stable loop that can be seen in electron micrographs of the complex formed between lac repressor and DNA (bottom figure). The interaction of lac repressor with the operator sequences may block transcription by preventing the binding of RNA polymerase to the lac promoter. However, it is now known that, in some cases, both lac repressor and RNA polymerase can bind to the promoter at the same time. Thus, the repressor may also block transcription initiation by preventing formation of the open complex and promoter clearance. A schematic diagram of lac repressor bound to DNA in the presence of RNA polymerase is shown in the figure on the right. [See Monday's Molecule #61 for another view.] The diagram illustrates the relationship between the operators and the promoter and the DNA loop that forms when the repressor binds to DNA.

The repressor locates an operator by binding nonspecifically to DNA and searching in one dimension. (Recall from Section 21.3C that RNA polymerase also uses this kind of searching mechanism.) The equilibrium association constant for the binding of lac repressor to O1 in vitro is very high. As a result, the repressor blocks transcription very effectively. (lac repressor binds to the O2 site with lower affinity.) A bacterial cell contains only about 10 molecules of lac repressor, but the repressor searches for and finds an operator so rapidly that when a repressor dissociates spontaneously from the operator, another occupies the site within a very short time. However, during this brief interval, one transcript of the operon can be made since RNA polymerase is poised at the promoter. This low level of transcription, called escape synthesis, ensures that small amounts of lactose permease and β-galactosidase are present in the cell.

In the absence of lactose, lac repressor blocks expression of the lac operon, but when β-galactosides are available as potential carbon sources, the genes are transcribed. Several β-galactosides can act as inducers. If lactose is the available carbon source, the inducer is allolactose, which is produced from lactose by the action of β-galactosidase (Figure 21.18). Allolactose binds tightly to lac repressor and causes a conformational change that reduces the affinity of the repressor for the operators. [see Regulation of Transcription] In the presence of the inducer, lac repressor dissociates from the DNA, allowing RNA polymerase to initiate transcription. (Note that because of escape synthesis, lactose can be taken up and converted to allolactose even when the genes are repressed.)

Electron micrographs of DNA loops. These loops were formed by mixing lac repressor with a fragment of DNA bearing two synthetic lac repressor–binding sites. One binding site is located at one end of the DNA fragment, and the other is 535 bp away. DNA loops 535 bp in length form when the tetrameric repressor binds simultaneously to the two sites.
The strength of binding between a protein and a ligand is measured by an equilibrium binding constant (KB). In the case of lac repressor binding to its specific strong binding site (O1), KB = 10¹³ M⁻¹. This is very high; in fact, it is one of the tightest DNA-binding interactions known in biology. What this means is that lac repressor will sit on the operator and repress transcription for at least 20 minutes under normal conditions.

However, the repressor will eventually fall off (dissociation rate constant k₋₁ = 6 × 10⁻⁴ s⁻¹) and, as described above, the operon will be transcribed once (escape synthesis). A new repressor molecule finds the operator sequences very quickly because lac repressor binds nonspecifically to DNA (KB = 4 × 10⁴ M⁻¹) and slides along the DNA searching for the operator in a process called one-dimensional diffusion (association rate constant k₁ = 10¹⁰ M⁻¹ s⁻¹). Even though the lac repressor remains bound nonspecifically for only a few seconds, it is able to search about 2000 bp looking for a specific binding site.
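The two rate constants are consistent with the equilibrium constant quoted for O1. A minimal sketch, assuming simple one-step binding (so that KB = k₁/k₋₁) and first-order dissociation, checks this and converts k₋₁ into a residence time on the operator:

```python
import math

# Rate constants quoted in the text for lac repressor and O1
k_on = 1e10    # association rate constant k1 (M^-1 s^-1)
k_off = 6e-4   # dissociation rate constant k-1 (s^-1)

# Simple two-state binding: the equilibrium constant is the ratio of rate constants
K_B = k_on / k_off

# First-order dissociation: half-life = ln 2 / k-1, mean residence time = 1 / k-1
half_life_min = math.log(2) / k_off / 60
mean_residence_min = 1 / k_off / 60

print(f"K_B = {K_B:.1e} M^-1")
print(f"half-life = {half_life_min:.0f} min; mean residence = {mean_residence_min:.0f} min")
```

KB works out to about 1.7 × 10¹³ M⁻¹, in line with the 10¹³ M⁻¹ quoted, and a half-life of about 19 minutes fits the roughly 20-minute occupancy described above.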

Given the huge difference between the specific and non-specific binding constants, the cell only needs about ten molecules of lac repressor to ensure that the operator sequences are bound almost all of the time. At any given time nine of these molecules will be bound to random pieces of DNA in the genome and the other one will be bound to the lac operon.
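The claim that about ten repressor molecules are enough can be checked with a crude equilibrium model. This sketch is not from the textbook: it assumes a well-mixed cell of about 1 fL, an E. coli genome of about 4.6 Mb in which every base pair counts as one nonspecific site, a single operator, and the binding constants quoted above. The function name `lac_occupancy` and its parameters are hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def to_molar(count, volume_litres):
    """Convert a number of molecules (or sites) per cell into a molar concentration."""
    return count / (AVOGADRO * volume_litres)


def lac_occupancy(n_repressor, genome_bp, K_specific, K_nonspecific, volume_litres):
    """Hypothetical equilibrium model of one operator competing with nonspecific DNA.

    Assumptions: one operator per cell, every base pair is one nonspecific site,
    all DNA is accessible, and the system is at equilibrium.
    Returns (fraction of time the operator is occupied,
             fraction of repressor bound nonspecifically).
    """
    R_tot = to_molar(n_repressor, volume_litres)
    O_tot = to_molar(1, volume_litres)
    S_ns = to_molar(genome_bp, volume_litres)

    def surplus(R_free):
        # Total repressor implied by a trial free concentration, minus the real total.
        bound_ns = R_free * K_nonspecific * S_ns
        bound_op = O_tot * K_specific * R_free / (1 + K_specific * R_free)
        return R_free + bound_ns + bound_op - R_tot

    # surplus() rises monotonically with R_free, so bisection finds its root.
    lo, hi = 0.0, R_tot
    for _ in range(200):
        mid = (lo + hi) / 2
        if surplus(mid) > 0:
            hi = mid
        else:
            lo = mid
    R_free = (lo + hi) / 2

    occupancy = K_specific * R_free / (1 + K_specific * R_free)
    frac_nonspecific = R_free * K_nonspecific * S_ns / R_tot
    return occupancy, frac_nonspecific


# E. coli-like case: 10 repressors, ~4.6 Mb genome, ~1 fL cell (assumed values)
occ, frac_ns = lac_occupancy(10, 4.6e6, 1e13, 4e4, 1e-15)
print(f"operator occupied {occ:.1%} of the time; {frac_ns:.0%} of repressor on nonspecific DNA")
```

Under these assumptions the operator is occupied about 99.8% of the time and roughly 90% of the repressor (about nine of the ten molecules) sits on nonspecific DNA, in line with the description above. Raising `genome_bp` or lowering `K_specific` in the call lowers the occupancy.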

Similar repressors and activators work in eukaryotic cells to regulate transcription. But in eukaryotic cells we have a much bigger problem. First, there are very few regulatory proteins that have as strong a specific binding constant as lac repressor. Second, there is much more DNA in a eukaryotic cell. The consequences of having a large genome are: (a) it takes these DNA binding proteins much longer to find their specific binding site, and (b) at any one time, many more of the regulatory proteins are soaked up in non-specific binding to DNA. In eukaryotic cells with an abundance of junk DNA a typical regulatory protein has to be present at about 20,000 copies per cell in order to have a decent chance of binding to its specific regulatory site for a significant length of time. (Recall that only ten molecules of lac repressor are needed in E. coli.)

Given the properties of DNA binding that we have discovered and characterized in bacteria and bacteriophage, we can calculate that escape synthesis in eukaryotic cells is likely to be much more of a problem than in bacterial cells. Furthermore, accidental transcription of random bits of DNA is almost certainly going to be common in a cell with a large bloated genome. This is because RNA polymerase also binds non-specifically to DNA and also because the larger the genome, the more likely you are to encounter promoter and regulatory sequences that just by chance happen to be close matches to real functional sequences. This is a very important concept and one that is not widely appreciated. Based on our knowledge of basic biochemistry we expect that there will be random, infrequent transcription of a large percentage of the genome. These transcripts are merely a consequence of the properties of DNA binding proteins and they have no biological significance.
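The point about chance matches can be made quantitative with a back-of-the-envelope estimate. The sketch assumes random sequence with equal base frequencies and counts only exact matches on both strands; the 12-bp site length and the genome sizes (about 4.6 Mb for E. coli, about 3.2 Gb for a haploid human genome) are illustrative assumptions, and the function name is hypothetical.

```python
def expected_chance_matches(genome_bp, site_length):
    """Expected exact matches to one fixed site in random DNA, counting both strands.

    Assumes equal base frequencies (probability 1/4 per position) and ignores
    near-matches, which would make chance sites even more common.
    """
    return 2 * genome_bp / 4 ** site_length


for name, size_bp in [("E. coli, ~4.6 Mb", 4.6e6), ("human, ~3.2 Gb", 3.2e9)]:
    print(f"{name}: {expected_chance_matches(size_bp, 12):.1f} chance 12-bp matches")
```

For a 12-bp site this gives about 0.5 chance matches in the E. coli genome and about 380 in the human genome. The count scales linearly with genome size, while each extra base pair of site length cuts it fourfold.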

Some of these problems in eukaryotes are mitigated by a separate level of regulation at the level of chromatin structure. Large regions of the chromosome can be masked from DNA binding proteins by formation of a tight heterochromatic complex of nucleosomes and DNA. Less compact complexes are formed in non-active regions of the genome where the DNA is less accessible but not invisible. When genes in a region are transcribed, the chromatin opens out into an open complex where the DNA is easily accessible to regulatory proteins. This solves some of the problems discussed above but it is only a partial solution. We know for a fact that the concentrations of regulatory proteins are high (20,000 copies) and a growing amount of evidence points to frequent accidental transcription.

©Laurence A. Moran and Pearson Prentice Hall


Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River, NJ (USA).

The Lac Operon

The lac operon in E. coli consists of three genes1 (lacZ, lacY and lacA) transcribed from a single promoter. The lacZ gene encodes β-galactosidase, an enzyme that cleaves β-galactosides. Lactose is a typical β-galactoside; the enzyme cleaves the disaccharide into separate molecules of glucose and galactose. These monosaccharides can enter the metabolic pool of the cell, where they can serve as the sole source of carbon.

Thus, when the lac operon is active and β-galactosidase is present, E. coli can grow on lactose as its only source of carbon. Outside the laboratory, E. coli rarely encountered lactose (until recently), but there are many plant β-galactosides that are substrates for the enzyme.

The lacY gene encodes a famous transporter called lactose permease. It is responsible for importing β-galactosides. The lacA gene encodes a transacetylase that is responsible for detoxifying the cell when it takes up poisonous β-galactosides.


Transcription begins at the Plac promoter and ends at a terminator at the 3′ end of the operon. Each of the three reading frames is translated separately from the polycistronic mRNA.

Upstream of the lac operon is the lacI gene. It encodes the lac repressor, one of the proteins that controls expression of the lac operon. The lacI gene is transcribed from its own promoter and it has its own terminator. (It is not necessary for the lacI gene to be linked to the operon.)

Expression of β-galactosidase, lac permease, and the transacetylase is regulated at the level of transcription. RNA polymerase binds to the lac promoter, but this is a weak σ70 promoter.2 The promoter sequence is a poor match to the consensus sequence for these types of promoters, so the operon is transcribed infrequently in the absence of additional activators. Transcription of the operon is activated by cAMP regulatory (or receptor) protein (CRP).3

In the absence of any β-galactoside, the operon is not transcribed and no enzyme is synthesized. Transcription is prevented by lac repressor, which binds to two operator sequences called O1 and O2. When β-galactosides are present, repression is relieved and the operon is transcribed at a low level in order to take advantage of the carbon source. When there is no other carbon source available, the operon is activated by CRP and the rate of transcription, and with it enzyme production, increases considerably.
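The regulatory logic in the paragraph above can be summarized as a small truth table. This sketch is a deliberate simplification: the function name and its two inputs are hypothetical, "other carbon source" stands in for the condition under which CRP does not activate the operon, and the three outputs are qualitative labels rather than measured rates.

```python
def lac_operon_state(beta_galactoside: bool, other_carbon_source: bool) -> str:
    """Qualitative transcription level of the lac operon (simplified)."""
    if not beta_galactoside:
        # Repressor bound to O1 and O2; only rare escape synthesis occurs.
        return "repressed"
    if other_carbon_source:
        # Inducer releases the repressor, but without CRP the weak promoter fires rarely.
        return "low"
    # Repressor released and CRP activates transcription.
    return "high"


for inducer in (False, True):
    for other in (False, True):
        print(f"beta-galactoside={inducer!s:5} other carbon={other!s:5} -> "
              f"{lac_operon_state(inducer, other)}")
```

Note that the repressor acts as a gate: without an inducer the CRP input makes no difference.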



1. This is one of the exceptions to the standard definition of a gene [What Is a Gene?]. In this case we are using the word "gene" to mean the coding region for a particular protein.

2. There are many different promoters in the E. coli genome. They are recognized by various RNA polymerase complexes containing different bound activators. One set of common activators is called σ factors: σ70 is the most common σ factor. Most genes have a σ70 promoter.

3. CRP is also known as catabolite activator protein (CAP).

Monday, February 11, 2008

More Billboards in Chambersburg PA

 
Read all about it at Battle of the Chambersburg billboards.



Monday's Molecule #62

 
Today's molecule is a cartoon depicting the action of several molecules. Your task is to identify all the molecules in the diagram and explain what's going on. Even if you're not interested in a free lunch, I'd appreciate hearing from you. I'd like to know how many of you understand the diagram. In fact, I'll put a poll in the sidebar to see how many recognize the process that's depicted here.

There's an indirect connection between this molecule and Wednesday's Nobel Laureate(s). Your task is to figure out the significance of today's diagram and identify the Nobel Laureate(s) who is associated with discovering the underlying process. (Be sure to check previous Laureates.)

The reward goes to the person who correctly identifies the molecule and the Nobel Laureate(s). Previous winners are ineligible for one month from the time they first collected the prize. There are three ineligible candidates for this week's reward. The prize is a free lunch at the Faculty Club.

THEME:

Nobel Laureates
Send your guess to Sandwalk (sandwalk(at)bioinfo.med.utoronto.ca) and I'll pick the first email message that correctly identifies the molecules, the process, and the Nobel Laureate(s). Note that I'm not going to repeat Nobel Laureates so you might want to check the list of previous Sandwalk postings.

Correct responses will be posted tomorrow along with the time that the message was received on my server. I may select multiple winners if several people get it right.

Comments will be blocked for 24 hours.





Sunday, February 10, 2008

Stu Kauffman in Toronto

 
I went to hear Stu Kauffman on Friday night [see Reinventing the Sacred].

Before the talk we had a little chat about blogging and some other topics. He wondered what the bloggers were saying about him and I told him that many don't understand what he's trying to say. I explained that I fell into that same category. I can't figure out what it is that he's trying to promote. He promised to try and explain in his talk.

It didn't work. I'm not much further ahead than I was before I heard him talk. Here's a brief summary of some things he said. I'm sorry if I can't put it all together into one big picture but I just can't.

The New Atheists: Kauffman thinks that Dawkins and his "New Atheist" friends are preaching to the converted. According to Kauffman, they will never convince the believers. Kauffman describes himself as a secular humanist and a non-believer. He thinks we should try to reach out to the religious community by adopting spiritual language. Hence the title of his talk. I don't really know what he means by this. He gave one example of having a reverence for some trees growing on a hill top near his house but I'm not sure if this is relevant. (See photograph, is that the hill top?)

I don't agree with his position on the so-called New Atheists and I don't agree with his proposal that it's the atheists who need to move towards the theists by adopting the sacred.

Reductionism: Kauffman is very much opposed to reductionism. He spent some time describing how the laws of physics just don't work when you try to predict the structure of complex things. This does not mean that complex things don't obey the laws of physics and chemistry; it means those laws aren't sufficient. This is because of emergent properties.

The discussion about reductionism and emergent properties is interesting but Kauffman makes it too complicated, for me, by going off on all kinds of tangents. In talking about it with him afterwards, he seems to be thinking that life is somehow special. It's different than the physical world. He takes pains to point out that he's not talking about vitalism but it sure sounds like that to me.

The other interesting thing about his anti-reductionism is that it doesn't apply in the same sense that Lewontin means when he talks about gene-centric biology. Before the lecture we were discussing the reason why human siblings don't mate and Kauffman was quite eager to offer an evolutionary psychology explanation. He suggested there was selection for an anti-incest gene in our ancestors to prevent inbreeding. That's the worst kind of reductionism but it's not the sort of reductionism that Kauffman disputes.

Determinism: Kauffman doesn't like determinism. He pointed out that quantum mechanics has ruled out the Laplace version of determinism. I don't think this is particularly controversial but I do think there are versions of determinism that don't require strict predictability. I kept waiting for the other shoe to drop. I don't think Kauffman was trying to make a case for free will and I don't think he was using his anti-determinism to argue against materialism, but I'm not sure.



Somehow these topics, and several others, were supposed to weave together to form a new way of looking at science. And a new way of reaching out to theists. That's the part I didn't get. A lot of what he was saying was true, but hardly profound. What was supposed to be profound didn't seem to be true.

Stu Kauffman took down the URL for Sandwalk and he promised to read my comments on his lectures. I hope he will respond in the comments. He seems like a pretty cool guy even if he's a bit baffling.

The dominant impression I have from talking to members of the audience—there were 65 people at the talk—is that people think he's saying something important but they just can't put their finger on what it is. At least I'm not the only one.


Where does disbelief in Darwin lead?

 
You probably think the answer to the question is obvious. The rejection of science leads to irrational behavior, right?

Of course it's right. DaveScot sets out to prove it over on Uncommon Descent with a posting that has the same title as this one [Where does disbelief in Darwin lead?]. As you read it, remember that the person who is writing the article is a disbeliever in evolution. Let's see where that kind of thinking leads ....
Be that as it may I’m a results oriented guy. Instead of presuming that “poorer” science education leads to poorer scientific output I instead look at what America actually produces in the way of science and engineering. Without question America’s output in science and engineering leads the world. Not just a little but a lot. We don’t steal nuclear technology secrets from China, they steal ours. We don’t use European GPS satellites for navigation, they use ours. The list can go on and on. We put a man on the moon 40 years ago while to this day no one else has. America has almost 3 times the number of Nobel prize winners as the next closest nation. That doesn’t support the notion that disbelief in Darwin is causing any problems. In fact it supports just the opposite. Disbelief in evolution makes a country into a superpower - militarily, economically, and yes even scientifically.

Education in America is working just fine, thank you, judging by the fruits of American science and engineering. Disbelief in Darwinian evolution, if anything, leads to greater technological achievements not lesser. If it isn’t broken, don’t try to fix it.
Well, there you have it. If only those successful scientists, engineers, and Nobel Laureates1 would stop believing in evolution there's no limit to what America could achieve. Just look at how far America has come when it's only the ignorant who disbelieve in evolution!

You know, you simply can't make this stuff up.


1. America is pretty much in the middle of the pack in terms of Nobel Laureates per capita [Nobel Prizes by Country]. It takes a bit of intelligence and simple math to recognize that point.

What Freedom of Speech Really Means

 
Read the amazing story on Friendly Atheist [Atheist Billboard Taken Down].

The Freedom from Religion Foundation contracted with Kegerreis Outdoor Advertising LLC to put up the following billboard in Chambersburg, Pennsylvania (USA).


That billboard has now been taken down and replaced with,


Read what the company has to say about their decision. It should not be necessary to point out what freedom of speech means and it is not proper for an advertising company to publicly state their moral values. Do all employees of Kegerreis Outdoor Advertising LLC agree with the statement? If not, are they going to make their views known at company headquarters? Would you?


A Case of Plagiarism

 
The blogosphere is all atwitter about the publication of a paper titled "Mitochondria, the missing link between body and soul: Proteomic prospective evidence". This is the train wreck of a paper that PZ Myers blogged about a few days ago [What Happened to the "Peers" on this Paper?].

Everyone needs to know that the contents of this paper were not only stupid but also plagiarized. The authors couldn't even come up with their own words to explain their silly ideas. For the latest additions to a long list of stolen paragraphs see Commentary: Neither buried nor treasure.

The guilty journal is Proteomics. The editors are not blameless.


The Streisand Effect in Action

 
I mentioned the Streisand Effect a few days ago. Here's a perfect example of how it works.

ThePolitic.com is one of those Canadian blogs written by someone (Matthew) who embarrasses my country. Matthew writes,
It’s no secret that many of us liberty and/or family-minded folks are great fans of The National Post which officially only competes with the Globe and Mail but realistically also occupies reality that the Toronto Star and Toronto Sun covet. I personally began subscribing to the Post after graduation not because it had a host of right-wing commentators (the Toronto Sun can also claim this), but because the paper took the mission of presenting all view points seriously by often welcoming guest columnists who would attack its editorials, or by presenting series like the one they did two weeks ago on abortion, where a dozen commentators would weigh in on the issue with intelligent, but different viewpoints.

This led me to great sadness today when I went onto their website to read the digital version of the paper. The front cover was just a large cartoon title that said “The Love & Sex Issue” which is tastefully questionable in itself for a national newspaper, but if you look at the picture itself, it also contains the drawing of two nude people behind the “x” which those of us familiar with Japanese pop-culture would classify as hentai. Half of the main section contained articles which were more at home in a Penthouse issue and the Post’s website contains video content that I dare not look at but is clearly part of the above-mentioned theme.

I have since called the Post’s office and dealt with a nice young chap who will be passing along my complaints (the Post is good at responding to these), but in the mean time, I invite everyone else who is disturbed by this extremely poor lack in judgment to write or phone to the Post’s editorial staff:
For more information about the Love & Sex issue go to the National Post website [Love & Sex Issue].

Warning: There may be sexy hentai cartoons there and you will certainly find some discussions about another H-word. Don't go there if you're in kindergarten or you're a prude. Don't go there if, like Matthew, you dare not look at pictures of nude bodies.

Matthew should just be thankful that this topic wasn't covered in one of the left-wing newspapers like the Toronto Star. He probably would have had to leave the country for a few days to avoid the newspaper boxes.


[Hat Tip: Canadian Cynic]

God Only Knows

 
I found a new Canadian blog called Stony_Curtis: Pop Culture, Guys, Food, Montreal. With a title like that, don't you just have to click on the link?

One of the first things I stumbled across was the video below of the Beach Boys singing "God Only Knows" from the Pet Sounds album. I don't know who Stony Curtis is but I like him already. (P.S. Somehow the song doesn't seem quite as innovative and moving as it was 40 years ago. I wonder why.)




The Meaning of Consensus

 
What does Stephen Harper mean when he uses the word "consensus" as in,
The Conservative government will not extend Canada's combat mission in Afghanistan beyond February 2009 without a consensus in Parliament, Prime Minister Stephen Harper said Friday.

"I will want to see some degree of consensus among Canadians on how we move forward on that," Harper told reporters Friday in Ottawa.
Canadian Cynic has the answer. [Hint: Harper doesn't mean what you think he means.]


Saturday, February 09, 2008

Junk in Your Genome: Intron Size and Distribution

In the comments to Junk in Your Genome: Protein-Encoding Genes martinc asks,
Larry, if the amount of necessary sequences within introns are as small as you suggest wouldn't this allow us to make a prediction. Couldn't we predict that due to drift there should be very little similarity in intron lengths between different species. If, by any chance, there is similarity then what would your explanation be?
There have been quite a few studies of average intron size in various species. I selected a number for the average size of introns from Hong et al. (2006). The average intron size, according to them, is 3,479 bp in coding regions. This value is a little deceptive since there are a small number of huge introns that make the average quite large. The median value is 1334 bp or less than half the average value.
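Why a few huge introns inflate the average can be seen with a toy example. The intron lengths below are made up for illustration; they are not data from Hong et al. (2006).

```python
import statistics

# Made-up intron lengths (bp): mostly short introns plus a couple of very long ones
intron_lengths = [90, 110, 150, 200, 300, 400, 1200, 15000]

mean_len = statistics.mean(intron_lengths)
median_len = statistics.median(intron_lengths)

print(f"mean = {mean_len:.0f} bp, median = {median_len:.0f} bp")
```

A single 15-kb intron pulls the mean above 2 kb while the median stays at 250 bp, which is why the median says more about a typical intron in a skewed distribution like this.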

I suggested that much of the intron sequences were junk. Martinc's question is quite reasonable but in order to get an answer we need to look more closely at the distribution of introns.

The figure shows the distribution of intron sizes in four species: the flowering plant Arabidopsis thaliana; the fruit fly Drosophila melanogaster; human, and mouse. The data is from Hong et al. (2006, Fig.1).

Note that the distribution in Arabidopsis and Drosophila is very tight. Both of these species have relatively compact genomes compared to mammals. The data strongly suggests that the minimum intron size is about 80 bp.

The distributions in the human and mouse genomes are very different. There is a strong peak at 100 bp—this is similar to the peaks in other species. But unlike other species, mammalian introns can be extremely large, giving rise to a long tail of the distribution extending to 10,000 bp or more. The key question is whether this distribution of long introns is noise or an artifact of gene prediction algorithms, or whether it represents a real phenomenon.

Returning to martinc's question. If we look at well-conserved genes in different species what we find is some variation in intron length but only around a mean of about 100-400 bp. In other words, in genes that have been closely examined, where the protein product is known, the distribution of intron sizes looks a lot more like the distribution in Arabidopsis and Drosophila.

Let's look at the hsp90 genes. These are the genes that encode Hsp90, the protein that SciPhu was blogging about [Hsp90 and Evolution].

I've picked the zebrafish gene and four mammalian genes to illustrate the variation in intron length. (Blue exons are 5′ and 3′ UTRs.) Most of the introns are between 80 and 400 bp in size but there are a few exceptions. In this case the human gene is the exception; it has two huge introns at the 5′ end of the gene.

What we see is a narrow distribution of intron lengths in most cases and a few huge introns. It isn't surprising that the lengths of introns in different species are quite similar.

Let's look at my favorite gene. HSPA8 is the cytoplasmic version of the chaperone HSP70 multigene family.

We see a similar pattern. Most intron lengths are very similar in different species, suggesting selection for introns in the 100-400 bp range. There are exceptions, as we see in the chimpanzee, monkey and dog genes. All three have large introns at either the 5′ or 3′ ends. The large monkey introns are 10,253 bp and 1,007 bp. The large chimpanzee intron is 13,257 bp in length. This is typical. I think it's very likely that the large introns in noncoding exons are artifacts.

So here's the complete answer to the question posed at the top of the page. I think there's selection to keep intron sizes within a fairly narrow range of 100-400 bp. Because of this, we expect to see similar intron sizes in different species. On occasion we discover a huge intron that is peculiar to one species. This intron could be a transient expansion that hasn't been reduced yet, or it could be an artifact.

Incidentally, while retrieving these sequences from Entrez Gene I noticed that the annotators have eliminated all splice variants for HSP90 and HSPA8 genes with a few exceptions.

The dog sequences all have many splice variants for every gene and some of the variants have been retained in the Entrez Gene entry for dog HSPA8. Look carefully at the two predicted variants in the second and third lines. These alternative splice variants are supposed to produce Hsc70 proteins that are missing several highly conserved regions encoded by exons 7 and 8. Recall that this is the most highly conserved protein in biology.

These cannot be biologically relevant protein variants that are only produced in dogs. The annotators are right to remove similar artifacts from the other genomes and they should remove these as well. Alternative splice variants are mostly artifacts, in my opinion, but that's a fight for another day.


Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]

Friday, February 08, 2008

Junk in Your Genome: Protein-Encoding Genes

The typical human gene has eight exons and seven introns (the actual average number of introns is 7.2). These values are based on analysis of 5236 well-characterized human genes with full-length cDNAs (Hong et al. 2006). There are lots of conflicting results in the literature. Most claim there are more introns but the data is based largely on a computational assessment of introns and exons. It includes a number of introns of extraordinary length lying between exons of dubious existence (often non-coding). I'll assume for the time being that there are 7.2 introns per gene, on average, and the average length is 3750 bp (Hong et al. 2006).

Each gene is transcribed from a 5′ promoter (P) and the primary transcript terminates at a polyadenylation site (t).

THEME

Genomes & Junk DNA

Total Junk so far

    55%
The exons contain coding regions (blue) that encode the sequence of the protein product. A typical protein has a molecular weight of 70,000 daltons and this corresponds to about 635 amino acid residues. The coding region is 1905 bp but we'll round up to 2 kb. Each gene has a region of the mRNA at the 5′ end called the 5′ untranslated region (UTR). This is required for translation. It averages 200 bp in size, with considerable variation. The 3′ end of the gene has a similar untranslated region that we'll assume to be essential.
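The conversion from protein size to coding length can be written out explicitly. The sketch assumes an average residue mass of about 110 daltons, a common rule of thumb that the text does not state.

```python
AVG_RESIDUE_DA = 110  # assumed average mass of one amino acid residue (daltons)

protein_mass_da = 70_000
residues_estimate = protein_mass_da / AVG_RESIDUE_DA  # roughly 636 residues

residues = 635                             # value used in the text
coding_bp = residues * 3                   # three nucleotides per codon
coding_bp_rounded = round(coding_bp, -3)   # round to the nearest kb

print(round(residues_estimate), coding_bp, coding_bp_rounded)
```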

Thus, total essential exons comprise 2200 bp on average per gene. Since there are 20,500 protein-encoding genes, this means 20,500 × 2.2 kb = 45.1 Mb or 1.4% of the genome (about 1.3% coding and 0.1% UTRs).

The minimum size of a eukaryotic intron is less than 50 bp. For a typical mammalian intron, the essential sequences in the introns are: the 5′ splice site (~10 bp); the 3′ splice site (~30 bp); the branch site (~10 bp); and enough additional RNA to form a loop (~30 bp). This gives a total of 80 bp of essential sequence per intron, or 20,500 × 7.2 × 80 = 11.8 Mb. Thus, 0.37% of the genome is essential because it contains sequences for processing RNA.

The total of essential sequences in the transcribed part of a gene is about 1.8% of the genome.

The rest of the intron sequence is non-essential junk. Much of it is littered with transposable elements that have inserted haphazardly. If we subtract the essential intron sequence then the average size of the remaining DNA is 3650 bp. The total amount of this sequence is 20,500 × 7.2 × 3650 = 538.7 Mb or 17% of the genome. (Most estimates are somewhat higher.)

Assuming that 44% of this is repetitive transposable elements, this leaves 9.6% of the genome. That's an additional 9.6% of non-essential DNA, or junk, bringing our current total to 55% junk.

The transcription of every gene is controlled by sequences beyond the 5′ end. There are two classes of sequence: promoters and regulatory sequences. The actual binding sites for RNA polymerase II and various regulatory proteins make up only about 100 bp of essential sequence but the various bound proteins have to form loops of DNA in order to come into contact. It's reasonable to assume that the average gene may need as much as 1000 bp of essential regulatory sequence. (A generous estimate.)

This means 20,500 × 1000 bp = 20.5 Mb or 0.6% of the genome is essential for regulation.

The grand totals for protein-encoding genes are:

essential 2.4%

junk 9.6% (not counting sequences that were included in other calculations)
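The arithmetic in this post can be collected into one script so the percentages are easy to recheck or to redo with different assumptions. The genome size of 3.2 Gb is an assumption inferred from the statement that 45.1 Mb is 1.4% of the genome; the other inputs are the per-gene estimates given above.

```python
GENOME_BP = 3.2e9        # assumed haploid genome size (implied by 45.1 Mb = 1.4%)
GENES = 20_500           # protein-encoding genes
INTRONS_PER_GENE = 7.2   # average from Hong et al. (2006)


def pct(bp):
    """Express a number of base pairs as a percentage of the genome."""
    return 100 * bp / GENOME_BP


exon_essential = GENES * 2_200                       # ~2 kb coding + ~200 bp UTR
intron_essential = GENES * INTRONS_PER_GENE * 80     # splice sites, branch site, loop
regulatory = GENES * 1_000                           # generous promoter/regulatory estimate
essential_total = exon_essential + intron_essential + regulatory

intron_remainder = GENES * INTRONS_PER_GENE * 3_650  # non-essential intron DNA
intron_junk_non_te = 0.56 * intron_remainder         # left after removing the 44% transposons

rows = [
    ("essential exon sequence", exon_essential),
    ("essential intron sequence", intron_essential),
    ("regulatory sequence", regulatory),
    ("total essential", essential_total),
    ("non-essential intron DNA", intron_remainder),
    ("  of which not transposons", intron_junk_non_te),
]
for label, bp in rows:
    print(f"{label:28s}{bp / 1e6:8.1f} Mb{pct(bp):7.2f}%")
```

With these inputs the script gives about 1.41%, 0.37%, 0.64% and 2.42% for the essential categories and about 16.8% for non-essential intron DNA, matching the rounded figures above; the non-transposon share comes out near 9.4%, slightly below the 9.6% quoted.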


Hong X, Scofield DG, Lynch M (2006) Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23:2392-404. [PubMed]

Hsp90 and Evolution

 
The SciPhu blog has an interesting series of posts on the chaperone Hsp90 and its effect on evolution. Here's the list of articles with the links.
My contribution to JustScience 2008 will be a review on a protein with the potential to transform evolution theory as we know it today. The review will be divided into 5 separate blog posts:
  1. Introduction to Hsp90 and evolution (this post)
  2. Presenting the Hsp90 protein
  3. How can chaperones act in evolution
  4. Evidence for Hsp90 involvement in rapid evolution of new traits
  5. Summary
I don't agree with the main conclusion that Hsp90 has an important role in evolution but the case is well presented. Anyone who wants to know more about hopeful monsters should read these articles.

UPDATE: My comment above may be misinterpreted. SciPhu may be right about "hopeful monsters" but he's dead wrong to confuse punctuated equilibria and macromutations. He makes the same mistake that others routinely make [Macromutations and Punctuated Equilibria].

We often criticize the creationists for misunderstanding punctuated equilibria and confusing it with the lack of transitional fossils. I think we should be just as hard on evolutionists who make the same mistake.


The figure shows the structures of Hsp90 from yeast, dog, human and E. coli [HSP90 Structure].

Stupidity Exists in Canada as well!

 
Friday's Urban Legend: FALSE

The following email message is going the rounds in Canada. Of course it's a favorite of the right-wing bigot crowd; here's an example [Kinda like Woodstock . . .for terrorists, see comments]. But the message has also suckered lots of otherwise intelligent people. It was sent to me by a friend who has finally learned to check with the urban legends website before spamming his email list. (A small victory for rationalism!)

CANADA PENSION - A Must Read: only in Canada.

Do not apply for your old age pension.

Apply to be a refugee. It is interesting that the federal government provides a single refugee with a monthly allowance of $1,890.00 and each can get an additional $580.00 in social assistance for a total of $2,470.00.

This compares very well to a single pensioner who, after contributing to the growth and development of Canada for 40 or 50 years, can only receive a monthly maximum of $1,012.00 in old age pension and Guaranteed Income Supplement.

Maybe our pensioners should apply as refugees!

Let's send this thought to as many Canadians as we can and maybe we can get the refugees cut back to $1,012.00 and the pensioners up to $2,470.00, so they can enjoy the money they were forced to submit to the Canadian government for those 40 to 50 years.

Please forward this to every Canadian you know.
As if to prove that people can be really stupid, the Canadian letter has morphed into an American one by merely substituting "America" for Canada. See the snopes.com site for an example of similar email letters [Refugee Whiz].

As usual, snopes.com has done the homework. They have outlined the history of this urban legend and traced it to the source. They have posted a letter from Citizenship and Immigration Canada that explains the real situation.
Refugees don't receive more financial assistance from the federal government than Canadian pensioners. In [a letter to the Toronto Star], a one-time, start-up payment provided to some refugees in Canada was mistaken for an ongoing, monthly payment. Unfortunately, although the newspaper published a clarification, the misleading information had already spread widely over e-mail and the internet.

In truth, about three quarters of refugees receive financial assistance from the federal government, for a limited time, and at levels lower than Canadian pensioners. They are known as government-assisted refugees.

We have to remember that many of these people are fleeing from unimaginable hardship, and have lived in refugee camps for several years. Others are victims of trauma or torture in their home countries. Many arrive with little more than a few personal belongings, if that. Canada has a humanitarian role to accept refugees and help them start their new lives here.

For this reason, government-assisted refugees get a one-time payment of up to $1,095 from the federal government to cover essentials — basic, start-up needs like food, furniture and clothing. They also receive a temporary monthly allowance for food and shelter that is based on provincial social assistance rates. In Ontario, for example, a single refugee would receive $592 per month. This assistance is temporary — lasting only for one year or until they can find a job, whichever comes first.

This short-term support for refugees is a far cry from the lifetime benefits for Canada's seniors. The Old Age Security (OAS) program, for example, provides people who have lived in Canada for at least 10 years with a pension at age 65. The Guaranteed Income Supplement (GIS) is an additional monthly benefit for low-income pensioners. The Canada Pension Plan (CPP), or Quebec Pension Plan (QPP) for people in Quebec, pays a monthly retirement pension to people who have worked and contributed to the plan over their career. In July 2006, Canadian seniors received an average of $463.20 in OAS benefits and $472.79 in CPP retirement benefits ($388.94 in QPP). Lower income OAS recipients also qualified for an average of an additional $361.94 in GIS benefits.
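The monthly figures in the hoax and in the Citizenship and Immigration Canada letter can be lined up in a few lines of Python. This is only a rough sketch: it simply adds average benefits, which overstates what a senior who doesn't qualify for every program would receive. It uses `Decimal` so the dollar amounts add exactly.

```python
from decimal import Decimal

# Monthly amounts claimed in the hoax email for a single refugee.
hoax_refugee = Decimal("1890.00") + Decimal("580.00")

# Monthly amounts from the Citizenship and Immigration Canada letter.
ontario_refugee_allowance = Decimal("592")   # temporary, one year at most
avg_oas = Decimal("463.20")                  # average OAS, July 2006
avg_cpp = Decimal("472.79")                  # average CPP retirement benefit
avg_gis = Decimal("361.94")                  # average GIS, lower-income OAS recipients only

senior_oas_cpp = avg_oas + avg_cpp
low_income_senior = senior_oas_cpp + avg_gis

print(hoax_refugee)                                   # 2470.00
print(senior_oas_cpp)                                 # 935.99
print(low_income_senior)                              # 1297.93
print(senior_oas_cpp > ontario_refugee_allowance)     # True
```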


[The map depicts the important settlements of American refugees who came to Canada in the 1800s. Most of them traveled north via secret routes with the aid of American sympathizers. The route came to be known as [The Underground Railroad].]

Thursday, February 07, 2008

Junk in Your Genome: Pseudogenes

 
Pseudogenes are non-functional DNA sequences that resemble genes. Much of the DNA related to transposable elements falls into this category. There are ribosomal RNA and tRNA pseudogenes but the term usually refers to sequences that resemble protein-encoding genes.

THEME

Genomes & Junk DNA

Total Junk so far: 46%
There are two kinds of pseudogenes derived from protein-encoding genes. Those derived from reverse transcription of mRNA and the re-integration of double-stranded DNA into the genome are called "processed" pseudogenes because the mRNA precursor was processed to give mature mRNA before being copied. Consequently, processed pseudogenes do not have introns. They also don't have promoters so they cannot be transcribed.

The other kind of pseudogene arises following a gene duplication event. One of the copies acquires a mutation that inactivates it. This is usually not harmful because the other copy remains intact. It is the fate of most duplicated genes to become a pseudogene by inactivation.

The original meaning of "junk" DNA referred to pseudogenes (reviewed in Gregory 2005) but the term is now used frequently to mean any non-functional DNA. That's the definition I use here.

Ensembl lists 2,081 pseudogenes in the human genome but that's very low compared to other studies [Human Genome]. The number of processed pseudogenes ranges from several thousand up to 17 thousand (Drouin 2006). The ENCODE project found 118 pseudogenes in their detailed analysis of 1% of the genome (Solovyev et al. 2006). This suggests that there are 11,800 pseudogenes in the entire genome.
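The ENCODE extrapolation is a single scaling step, sketched here in Python. It assumes that the ENCODE regions are representative of the whole genome; if pseudogenes are unevenly distributed, the true number could be higher or lower. The comparison with the Ensembl count shows how far apart the two estimates are.

```python
ENCODE_PSEUDOGENES = 118        # pseudogenes found by ENCODE (Solovyev et al. 2006)
ENCODE_PERCENT_OF_GENOME = 1    # the ENCODE regions cover about 1% of the genome
ENSEMBL_PSEUDOGENES = 2_081     # pseudogenes listed in Ensembl

# Scale up, assuming the ENCODE regions are typical of the genome.
estimated_pseudogenes = ENCODE_PSEUDOGENES * 100 // ENCODE_PERCENT_OF_GENOME

print(estimated_pseudogenes)                                   # 11800
print(round(estimated_pseudogenes / ENSEMBL_PSEUDOGENES, 1))   # 5.7
```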

A number of studies suggest that the number of processed pseudogenes is approximately the same as the number of inactivated duplicated genes (reviewed in Taylor and Raes 2005). In the case of processed pseudogenes, there are many copies of a relatively small subset of the total number of genes. In other words, lots of genes do not spawn pseudogenes and those that do have many offspring. This is because there is a bias in favor of genes that are highly expressed in the germ line.

The total number of pseudogenes in the genome is likely to be close to the number of genes based on extrapolations from detailed analyses of small segments of the genome or single chromosomes.

If we assume that there are 10,000 processed pseudogenes averaging 2 kb each then this represents 20 Mb or 0.6% of the genome. If there is an equal number of other pseudogenes then this is 10,000 × 60 kb = 600 Mb or 18% of the genome. This is all junk DNA but it overlaps extensively with the junk DNA from transposable elements. It is further evidence that substantial parts of the genome are non-functional but since most of that sequence would be introns in an active gene, it would count as junk DNA even if the gene were active. It's best to just count the inactive exons in order to avoid double counting.

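Thus, if the inactive exons of the other pseudogenes add up to about the same amount as the processed pseudogenes (another 0.6%), pseudogenes are about 1.2% of the genome and all of it is junk.1,2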
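One way to reproduce the 1.2% figure is sketched below in Python. It assumes a 3,200 Mb genome and, as a labeled assumption, that the exons of each duplication-derived pseudogene total about 2 kb (roughly the length of a processed pseudogene, which is essentially a copy of the mRNA). Only exons are counted for the duplicated pseudogenes, which avoids double counting intron DNA.

```python
GENOME_MB = 3_200            # assumption: haploid genome size in Mb
PROCESSED = 10_000           # processed pseudogenes (no introns)
PROCESSED_KB = 2             # average length in kb
DUPLICATED = 10_000          # equal number of duplication-derived pseudogenes
DUPLICATED_EXON_KB = 2       # assumption: exon content of each is about 2 kb


def pct_of_genome(copies: int, kb_each: float) -> float:
    """Percentage of the genome occupied by `copies` sequences of `kb_each` kb."""
    return 100 * copies * kb_each / 1_000 / GENOME_MB


processed_pct = pct_of_genome(PROCESSED, PROCESSED_KB)
duplicated_exon_pct = pct_of_genome(DUPLICATED, DUPLICATED_EXON_KB)
total_pct = processed_pct + duplicated_exon_pct

print(processed_pct)  # 0.625
print(total_pct)      # 1.25, i.e. about 1.2%
```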


1. A small number of former pseudogenes have been reactivated. They are no longer pseudogenes so they don't count as junk. A small number of pseudogenes have acquired a separate function so they don't count as junk. There do not appear to be very many examples.

2. There are many scientists who have tried to make the case for pseudogenes having some sort of function. The most common speculation is that they serve as an important reservoir of sequence information that can be accessed by recombination and/or re-activation (e.g., Balakirev and Ayala 2003).

Balakirev, E.S. and Ayala, F.J. (2003) PSEUDOGENES: Are They “Junk” or Functional DNA? Ann. Rev. Genet. 37:123-151. [doi:10.1146/annurev.genet.37.040103.103949]

Drouin, G. (2006) Processed pseudogenes are more abundant in human and mouse X chromosomes than in autosomes. Mol. Biol. Evol. 23:1652-1655 [PubMed]

Gregory, T.R. (2005) "Genome Size Evolution in Animals" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).

Solovyev, V., Kosarev, P., Seledsov, I. and Vorobyev, D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12 [PubMed]

Taylor, J.S. and Raes, J. (2005) "Small-Scale Gene Duplications" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).

Junk in Your Genome: SINEs

In a previous posting I talked about Long Interspersed Elements or LINEs [Junk in Your Genome: LINEs]. These are retrotransposons that make up a significant percentage of the junk DNA in your genome. Most of them are completely defective: they are incapable of transposing and they usually don't encode any functional proteins.

THEME

Genomes & Junk DNA

A minority of LINEs are still active. Their genes for reverse transcriptase and endonuclease are still functional and the transposons still retain the end sequences necessary for insertion.

Today I want to discuss Short Interspersed Elements or SINEs. These pieces of DNA tend to be only 100-400 bp in length but they contain all the features of transposons at their ends. The most important of these features is a short repeat of genomic DNA.

Most SINEs are related to the genes for small RNAs and, more specifically, to genes that are transcribed by RNA polymerase III [Transcription of the 7SL Gene]. Recall that one of the characteristics of Class III genes is that many of them have internal promoters. What this means is that the promoter sequences lie entirely within the DNA that's transcribed.

SINEs look like this:

The blue line represents the transcribed region of the SINE and the black line is the genomic DNA flanking the insert. At each end there is a short (about 5 bp) direct repeat representing the remnants of the insertion event. The 3′ end of the SINE has a short stretch of adenylate residues (poly A) that is required for mobility.
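The two landmarks described above, a short direct repeat on either side of the insert and a poly A stretch at the 3′ end, can be illustrated with a toy Python sketch. The function names and the example sequences are hypothetical, and the matching is exact; real annotation tools have to tolerate mutations that have accumulated since the insertion.

```python
def target_site_duplication(left_flank: str, right_flank: str, max_len: int = 20) -> str:
    """Longest sequence that ends the left flank and also starts the right flank.

    Toy version: requires an exact match of the direct repeat.
    """
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


def has_poly_a_tail(element: str, min_run: int = 8) -> bool:
    """True if the element's 3' end is a run of at least `min_run` A residues."""
    return element.upper().endswith("A" * min_run)


# Hypothetical example: genomic flank, SINE, genomic flank, with a 5 bp repeat (AGAAG).
left = "GGCTTAGAAG"
sine = "GGCCGGGCGCGGTGGCTCA" + "A" * 12
right = "AGAAGCTCCA"

print(target_site_duplication(left, right))  # AGAAG
print(has_poly_a_tail(sine))                 # True
```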

A typical SINE is only about 100-400 bp long. As mentioned above, one of the key features of SINEs is the presence of an internal promoter to which RNA polymerase III binds. Class III promoters generally have two separate binding regions designated Box A and Box B. All SINEs are derived from genes encoding cellular RNAs such as tRNA, 7SL RNA, U RNAs, etc. These genes are transcribed by RNA polymerase III.

The SINE is transcribed because of the presence of the internal promoter. The transcript may be copied by reverse transcriptase produced from active LINEs in the genome. The DNA:RNA hybrid can be converted to double-stranded DNA and integrated into the genome as a transposable element using the LINE endonuclease. The process is similar to the mechanism that produces processed pseudogenes derived from mRNA but the difference is that the SINEs can still be transcribed when they have integrated into the genome whereas the mRNA pseudogenes have been separated from their promoter.

In the mouse genome there are two large families of SINEs. The B1 family is derived from a truncated and rearranged 7SL RNA. (Recall that 7SL RNA is the RNA component of signal recognition particle.) The B2 family comes from a tRNA that has acquired a terminal extension (Dewannieux and Heidmann 2005).

Each mouse family has about one million copies and together they make up about 20% of the mouse genome. Most of these transposable elements are defective because they have acquired mutations. They are not mobile and many are not transcribed.

In humans, the largest family of SINEs is called Alu elements because the sequence contains a site that is cleaved by the restriction endonuclease AluI. These SINEs are also derived from 7SL RNA but the rearrangement is different from that in mouse. (They have a common ancestor.) There are about one million Alu elements in the human genome.

SINEs make up about 13% of the human genome. The largest proportion, by far, is Alu elements but there are small numbers of SINEs derived from other cellular RNAs such as the U RNAs required for splicing and snoRNAs (Garcia-Perez et al. 2007).
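A rough Python check on how much of that 13% the Alu elements could account for. Both the ~300 bp length for a full-length Alu and the 3,200 Mb genome size are assumptions, and many copies are truncated, so this is an upper-end estimate. It comes out at a bit over 9%, consistent with Alu elements being the largest part of the SINE total.

```python
ALU_COPIES = 1_000_000   # "about one million" (from the text)
ALU_LENGTH_BP = 300      # assumption: typical full-length Alu is ~300 bp
GENOME_MB = 3_200        # assumption: haploid genome size in Mb

alu_mb = ALU_COPIES * ALU_LENGTH_BP / 1e6
alu_pct = 100 * alu_mb / GENOME_MB

print(alu_mb, round(alu_pct, 1))  # 300.0 9.4
```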

SINEs are parasites (selfish DNA). They are not essential for human survival and reproduction, especially the huge majority of SINEs that are defective. Thus, at least 13% of the human genome is clearly junk. The total amount of junk DNA contributed by all transposable elements is 44% of the genome (Kidwell 2005).


Dewannieux, M. and Heidmann, T. (2005) L1-mediated retrotransposition of murine B1 and B2 SINEs recapitulated in cultured cells. J. Mol. Biol. 349:241-7 [PubMed]

Garcia-Perez, J.L., Doucet, A.J., Bucheton, A., Moran, J.V. and Gilbert, N. (2007) Distinct mechanisms for trans-mediated mobilization of cellular RNAs by the LINE-1 reverse transcriptase. Genome Res. 17:602-11. [PubMed] [Genome Research]

Kidwell, M. (2005) "Transposable Elements" in The Evolution of the Genome, T.R. Gregory, ed. Elsevier Academic Press, New York (USA).

Regulation of Transcription

 
From Horton et al. (2006), pp. 663-665.



Many genes are expressed in every cell. The expression of these housekeeping genes is said to be constitutive. In general, such genes have strong promoters and are transcribed efficiently and continuously. Genes whose products are required at low levels usually have weak promoters and are transcribed infrequently. In addition to constitutively expressed genes, cells contain genes that are expressed at high levels in some circumstances and not at all in others. Such genes are said to be regulated.

Regulation of gene expression can occur at any point in the flow of biological information but occurs most often at the level of transcription. Various mechanisms have evolved that allow cells to program gene expression during differentiation and development and to respond to environmental stimuli.

The initiation of transcription of regulated genes is controlled by regulatory proteins that bind to specific DNA sequences. Transcriptional regulation can be negative or positive. Transcription of a negatively regulated gene is prevented by a regulatory protein called a repressor. A negatively regulated gene can be transcribed only in the absence of active repressor. Transcription of a positively regulated gene can be activated by a regulatory protein called an activator. A positively regulated gene is transcribed poorly or not at all in the absence of the activator.


Repressors and activators are often allosteric proteins whose function is modified by ligand binding. In general, a ligand alters the conformation of the protein and affects its ability to bind to specific DNA sequences. For example, some repressors control the synthesis of enzymes for a catabolic pathway. In the absence of substrate for these enzymes, the genes are repressed. When substrate is present, it binds to the repressor, causing the repressor to dissociate from the DNA and allowing the genes to be transcribed. Ligands that bind to and inactivate repressors are called inducers because they induce transcription of the genes controlled by the repressors. In contrast, some repressors that control the synthesis of enzymes for a biosynthetic pathway bind to DNA only when associated with a ligand. The ligand is often the end product of the biosynthetic pathway. This regulatory mechanism ensures that the genes are turned off as product accumulates. Ligands that bind to and activate repressors are called corepressors. The DNA-binding activity of allosteric activators can also be affected in two ways by ligand binding. Four general strategies for regulating transcription are illustrated in the figures. Examples of all four strategies have been identified.
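The four strategies can be summarized as a toy Boolean model in Python. This is an abstraction, not a description of any particular operon: transcription is treated as simply on or off, basal levels are ignored, and the names are made up for illustration. A ligand either promotes DNA binding (a corepressor, or a ligand that activates an activator) or prevents it (an inducer, or a ligand that inactivates an activator).

```python
from enum import Enum


class Regulator(Enum):
    REPRESSOR = "repressor"
    ACTIVATOR = "activator"


def bound_to_dna(ligand_present: bool, ligand_promotes_binding: bool) -> bool:
    """Allosteric switch: the protein binds DNA when the ligand state favors binding."""
    return ligand_present == ligand_promotes_binding


def is_transcribed(regulator: Regulator, ligand_present: bool,
                   ligand_promotes_binding: bool) -> bool:
    bound = bound_to_dna(ligand_present, ligand_promotes_binding)
    if regulator is Regulator.REPRESSOR:
        return not bound  # a bound repressor blocks transcription
    return bound          # an activator must be bound for efficient transcription


# The four strategies: two regulator types times two ligand effects.
for regulator in Regulator:
    for promotes in (False, True):
        effect = "ligand promotes binding" if promotes else "ligand prevents binding"
        states = {present: is_transcribed(regulator, present, promotes)
                  for present in (False, True)}
        print(regulator.value, "|", effect,
              "| no ligand:", states[False], "| ligand:", states[True])
```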


Few regulatory systems are as simple as those described above. For example, the transcription of many genes is regulated by a combination of repressors and activators or by multiple activators. Elaborate mechanisms for regulating transcription have evolved to meet the specific requirements of individual organisms. When transcription is regulated by a host of mechanisms acting together, a greater range of cellular responses is possible. By examining how the transcription of a few particular genes is controlled, we can begin to understand how positive and negative mechanisms can be combined to produce the remarkably sensitive regulation seen in bacterial cells.

©Laurence A. Moran and Pearson Prentice Hall


Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River N.J. (USA)

Theme: Genomes & Junk DNA

Junk in Your Genome

Transposable Elements: (44% junk)

      DNA transposons:
         active (functional): <0.1%
         defective (nonfunctional): 3%
      retrotransposons:
         active (functional): <0.1%
         defective transposons
            (full-length, nonfunctional): 8%
            L1 LINEs (fragments, nonfunctional): 16%
            other LINEs: 4%
            SINEs (small pseudogene fragments): 13%
            co-opted transposons/fragments: <0.1% (a)
(a) Co-opted transposons and transposon fragments are those that have secondarily acquired a new function.
Viruses (9% junk)

      DNA viruses
         active (functional): <0.1%
         defective DNA viruses: ~1%
      RNA viruses
         active (functional): <0.1%
         defective (nonfunctional): 8%
         co-opted RNA viruses: <0.1% (b)
(b) Co-opted RNA viruses are defective integrated virus genomes that have secondarily acquired a new function.
Pseudogenes (1.2% junk)
      (from protein-encoding genes): 1.2% junk
      co-opted pseudogenes: <0.1% (c)
(c) Co-opted pseudogenes are formerly defective pseudogenes that have secondarily acquired a new function.
Ribosomal RNA genes:
      essential 0.22%
      junk 0.19%

Other RNA encoding genes
      tRNA genes: <0.1% (essential)
      known small RNA genes: <0.1% (essential)
      putative regulatory RNAs: ~2% (essential)

Protein-encoding genes: (9.6% junk)
      transcribed region:  
            essential 1.8%  
            intron junk (not included above) 9.6% (d)
(d) Intron sequences account for about 30% of the genome. Most of these sequences qualify as junk but they are littered with defective transposable elements that are already included in the calculation of junk DNA.
Regulatory sequences:
      essential 0.6%

Origins of DNA replication
      <0.1% (essential)

Scaffold attachment regions (SARs)
      <0.1% (essential)

Highly Repetitive DNA (1% junk)
      α-satellite DNA (centromeres)
            essential 2.0%
            non-essential 1.0%
      telomeres
            essential (less than 1000 kb, insignificant)

Intergenic DNA (not included above)
      conserved 2% (essential)
      non-conserved 26.3% (unknown but probably junk)

Total Essential/Functional (so far) = 8.7%
Total Junk (so far) = 65%
Unknown (probably mostly junk) = 26.3%
For references and further information click on the "Genomes & Junk DNA" link in the box

LAST UPDATE: May 10, 2011 (fixed totals, and ribosomal RNA calculations)
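As a consistency check, the numeric entries in the box can be summed in a few lines of Python. This sketch leaves out the "<0.1%" items, so the essential entries add up to about 8.6% rather than the listed 8.7%; the junk entries come to about 65%, and the three totals together come to about 99.9%.

```python
# Percent of the genome, taken from the list above; "<0.1%" entries are omitted.
essential = {
    "ribosomal RNA genes": 0.22,
    "putative regulatory RNAs": 2.0,
    "protein-coding transcribed region": 1.8,
    "regulatory sequences": 0.6,
    "alpha-satellite DNA": 2.0,
    "conserved intergenic DNA": 2.0,
}
junk = {
    "transposable elements": 44.0,
    "viruses": 9.0,
    "pseudogenes": 1.2,
    "ribosomal RNA genes": 0.19,
    "intron junk": 9.6,
    "highly repetitive DNA": 1.0,
}
unknown = 26.3

essential_total = sum(essential.values())
junk_total = sum(junk.values())

print(round(essential_total, 2), round(junk_total, 2))   # 8.62 64.99
print(round(essential_total + junk_total + unknown, 1))  # 99.9
```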





November 11, 2006
Sea Urchin Genome Sequenced

The sea urchin genome is 814,000 kb or about 1/4 the size of a typical mammalian genome. Like mammalian genomes, the sea urchin genome contains a lot of junk DNA, especially repetitive DNA. The preliminary count of the number of genes is 23,300. This is about the same number that we have in our genomes. Only about 10,000 of these genes have been annotated by the sea urchin sequencing team.

Wednesday, February 06, 2008

How to Be a Grown-up Scientist

 
Janet Stemwedel has put her finger on an important issue. Read her blog and find out about The project of being a grown-up scientist (part 1).

What percentage of academic scientists are grown-ups? I think it's pretty high in my department but it's not 100%.


TV Ontario's Best Lecturers

 
It's that time of year again. TV Ontario (TVO) has chosen its ten finalists for best university lecturer. You can see the list on the Best Lecturer website.

Some of you might recall that Michael Persinger of the magic motorcycle helmet was one of the finalists last year and he went on to win the $10,000 prize [TV Ontario's Best Lecturers]. I was a bit peeved at this. I wrote,
This is a popularity contest. The last one was very disappointing because some of the most important aspects of being a good university lecturer were ignored.

I'm talking about accuracy and rigour. It's not good enough to just please the students. What you are saying has to be pitched at the right level and it has to be correct. Too many of the lectures were superficial, first-year introductions that offered no challenge to the students. (One, for example, was an overview of Greek and Roman architecture by an engineering Professor.) The students loved it, of course, and so did the TV producers because they could understand the material. Lecturers in upper level courses need not apply.

Some of last year's lectures were inaccurate. The material was either misleading or false, and the concepts being taught were flawed. Neither students nor TV audiences were in any position to evaluate the material so accuracy was not a criterion in selecting the best lecturer of 2006.

I wrote to the producers about this, suggesting that the lecturers be pre-screened by experts in the discipline. TV Ontario promised to do a better job this year. I'm looking forward to seeing if they keep their promise.
Are you wondering how they did? They chose Michael Persinger, a "fringe" scientist, to put it politely.

How are they doing this year? Here's the list of judges.
Zanana L. Akande (born 1937 in Toronto, Ontario) is a former Canadian politician. She was the first black woman elected to the Legislative Assembly of Ontario, and the first black woman to serve as a cabinet minister in Canada.

Barry Callaghan has done work in journalism, television, and filmmaking in addition to his own writing. He began his career as a part-time reporter for Canadian Broadcasting Corp (CBC) television news and gave weekly book reviews on the CBC radio program Audio.

Tony Nardi is an actor/writer/producer. His acting experience has been diverse and prolific, in live theater, television and film. As an actor he received his training in Montreal at the Actor's Studio, The Banff School of Fine Arts, The Stratford Festival, and Italy.
Isn't that interesting. The best people to judge whether a university Professor is delivering a good lecture are a politician, a writer, and an actor.

Silly me. I thought that Professors might be on the panel of judges. I guess they're all too busy serving on juries that evaluate politicians, writers, and actors.

The top three criteria for evaluating university lectures are: (1) accuracy, (2) accuracy, and (3) accuracy. The only people who can judge whether those criteria are being met are other academics in the same discipline. If the lectures aren't accurate then nothing else matters. If the lecture material is accurate then you can start looking at other things, such as style.


Who Put the Cephalopod in SEED Magazine?

 
SEED magazine is usually a pretty good magazine in spite of the fact that they get a few things wrong and in spite of the fact that they sponsor ScienceBlogs™.

But enough is enough. Imagine my surprise when I opened the current issue (January/February 2008) to page 21 and saw this ugly, squishy, creature. As a (fairly) loyal reader, I've been tolerant of their graphics and images even though most of them don't make much sense. But this is way over the top. Did the editors forget that this magazine is displayed on news stands where young children might see it?

Who is responsible for this? And what can we do about it?


[Hint: The disgusting image seems to be associated with an article titled Eyeing the Evolutionary Past by Paul Z. Mierz.]

Nobel Laureate: François Jacob

 

The Nobel Prize in Physiology or Medicine 1965.
"for their discoveries concerning genetic control of enzyme and virus synthesis"


François Jacob (1920 - ) received the Nobel Prize in Physiology or Medicine for his work on gene expression. He shared the prize with André Lwoff and Jacques Monod. The three men worked together at the Institut Pasteur in Paris, France, at a time when it was one of the leading centers of research in this field.

Jacob made major contributions to the discovery of messenger RNA and the regulation of transcription when these processes were just beginning to be understood. His name, and Monod's, are mostly associated with the lac operon in E. coli but the prize was also given for work with bacteriophage. The concepts of operons, operators, and repressors all come from the work of Jacob and Monod.

THEME:

Nobel Laureates
The presentation speech was given by Professor Sven Gard, member of the Nobel Committee for Physiology or Medicine of the Royal Caroline Institute. As you read it, note how much they knew in 1965 after only a few years of intense work in deciphering the genetic code and working out how genes are transcribed. This is only 12 years after Watson & Crick's paper on the structure of DNA. That's the same amount of time that has elapsed between 1996, when Dolly the sheep was cloned, and today.
Your Majesties, Royal Highnesses, Ladies and Gentlemen.

The 1965 Nobel Prize in Physiology or Medicine is shared by Professors Jacob, Lwoff and Monod for «discoveries concerning the genetic regulation of enzyme and virus synthesis».

This particular sphere of research is by no means easy. I heard one of the prize winners, Professor Jacob, forewarn an audience of specialists more or less as follows: «In describing genetic mechanisms, there is a choice between being inexact and incomprehensible». In making this presentation, I shall try to be as inexact as conscience permits.

It has become progressively more apparent that the answer to what has hitherto been romantically termed the secret of life must be sought in the mechanism of action and in the structure of the hereditary material, the genes. This central field of research has naturally been approached from the periphery and in stages. Only in recent years has it been possible to make a serious attack on these fundamental problems.

Several previous Nobel Prize holders: Beadle, Tatum, Crick, Watson, Wilkins, Kornberg and Ochoa have worked in this sphere of research and have formulated certain basic proposals which have enabled the French scholars to continue their efforts. It has been established that one of the principal functions of genes must be to determine the nature and number of enzymes within the cell, the chemical apparatus which controls all the reactions by which the cellular material is formed and the energy necessary for various life processes is released. There is thus a particular gene for each specific enzyme.

In addition, some light has been thrown on the chemical structure of genes. In principle, they have the form of a long double chain consisting of four different components, which can be designated by the letters a, c, g, and t, and with the property of forming pairs with each other. An «a» in one of the chains has to be matched by a «t» in the other, a «g» only by a «c». However, they can be linked along the length of the chain in any order whatsoever, so that the number of possible combinations is virtually unlimited. A chain of genes contains from several hundreds to many thousands of units; such structures can easily carry the specific patterns for the million or more genes which it is estimated that a cell may have.

This model of the genes represents a coded message containing two types of information. If the double chain of a gene is split lengthwise and each half acquires a new partner, then the final result is two double chains identical to the original gene. The model thus contains information relative to the actual structure of the gene, which permits multiplication, in its turn a condition of heredity. When a cell divides, each daughter cell receives an exact copy of the parent gene. The structure of the double chain ensures the stability and permanence required by hereditary material.

But the model can also be read in another way. Along the length of the chain, the letters are grouped in threes in coded words. An alphabet of four letters allows the formation of more than 30 different words and the sequence in the gene of such words provides the structural information for an enzyme or some other protein. Proteins are also chain molecules built up from twenty or so different types of building blocks. To each of these building blocks there corresponds a chemical code word of three letters. The gene thus contains information on the number, nature, and order of the building blocks in a particular protein.

Thus it was already clear that the hereditary blueprint contained the collective structural information for all substances necessary for the functions of the living cell. It was not known how the genetic information was put into effect or transformed into chemical activity. As to the function of the genes, it was thought that they participated in a sort of procreative act when the new cell came into being, producing new substances necessary for the life of the cell, but subsequently lying dormant until the next cell division. It was presumed that the structure and formation of the chemical apparatus determined in this way defined all the regulatory mechanisms necessary for the cell's ability to adapt to changes in the environment and to respond in an adequate manner to stimuli of different types.

To begin with, the group of French workers were able to demonstrate how the structural information of the genes was used chemically. During a process resembling gene multiplication an exact copy of the genetic code is produced, termed a messenger. The latter is then incorporated into the chemical «workshop» of the cell and wound like magnetic tape onto a spool. For each word arriving on the spool, a constructional unit is attracted, which carries a complement to this word and attaches itself there just like a piece of jigsaw puzzle. The building blocks of a protein are selected in this way one by one, aligned, and joined together to form a protein with the appropriate structure.

The messenger substance is, however, short-lived. The tape lasts only for a few recordings. The enzymes are also used up in a similar way. For the cell to maintain its activity, it is thus necessary to have an uninterrupted production of the messenger material, that is to say continuous activity of the corresponding gene.

However, cells can adapt themselves to different external conditions. Thus there must exist some mechanisms controlling the activity of the genes. The research into the nature of these mechanisms is a remarkable achievement which has opened the way for the possible explanation of a series of hitherto mysterious biological phenomena. The discovery of a previously unknown class, the operator genes, which control the structural genes, marks a major breakthrough.

There are two types of operator genes. One type releases chemical signals, which are perceived by a second, receptor, type. The latter controls in its turn one or more structural genes. As long as the signals are being received the receptor remains blocked and the structural genes are inactive. Certain substances coming from outside or formed within the cell can, however, influence the chemical signals in a specific manner, changing their character so that they can no longer influence the receptor. The latter is unblocked and activates the structural genes; messenger material is produced and the synthesis of enzymes or another protein commences.

Control of gene activity is thus of a negative nature; the structural genes are only active if the repressor signals do not arrive. One can speak here of chemical control circuits similar in many ways to electrical circuits, for example in a television set. In the same way, they can be interconnected or arranged in a series to form complicated systems.
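The negative logic of this control circuit can be written as a single Boolean rule. In this sketch the function and argument names are hypothetical labels, and the model is deliberately reduced to two inputs: whether the repressor signal is made at all, and whether an inducing substance has altered it so that it no longer acts on the receptor.

```python
def structural_genes_active(repressor_made: bool, inducer_present: bool) -> bool:
    """Negative control: the genes are on unless an active repressor signal reaches the receptor."""
    # An inducer changes the repressor so it can no longer block the receptor (operator).
    signal_reaches_receptor = repressor_made and not inducer_present
    return not signal_reaches_receptor


for repressor_made in (True, False):
    for inducer_present in (False, True):
        state = "on" if structural_genes_active(repressor_made, inducer_present) else "off"
        print(f"repressor made={repressor_made!s:5}  inducer={inducer_present!s:5}  -> genes {state}")
```

Only one combination, repressor made and no inducer, leaves the structural genes off; every other case switches them on, which is what "negative" control means here.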

With the aid of such control circuits, the free-living monocellular organism can produce enzymes when required, or interrupt chemical reactions if they are likely to cause damage; an excitatory stimulus can provoke movement, flight or attack, depending on the nature of the excitation. With such mechanisms it is possible to direct the development of cells into more complicated structures. It is particularly interesting to note that the activity of viruses is controlled, in principle, in the same manner.

Bacteriophages contain a genetic control circuit complete with emitter, receptor, and structural genes. While chemical signals are being sent and received, the virus remains inactive. When incorporated into a cell, it behaves like a normal component of the cell, and can confer on it new properties which may improve its chances of survival in the struggle for existence. However, if the signals are interrupted, the virus is activated, starts to grow rapidly and soon kills the host cell. There is considerable evidence for the view that certain types of tumour virus are incorporated into a normal cell in the same way, thus transforming it into a tumour cell.

We are easily inclined to hold an exaggerated opinion of ourselves in this era of advanced technology. Thus, we are justified in having a great admiration for the achievements in electronics, where, for example, the attempts at miniaturization to reduce component size, to lower the weight, and reduce the volume of apparatus have enabled a rapid development of space science. However, we should bear in mind that, millions of years ago, nature perfected systems far surpassing all that the inventive genius of man has been able to conceive hitherto. A single living cell, measuring several thousandths of a millimetre, contains hundreds of thousands of chemical control circuits, exactly harmonized and functioning infallibly. It is hardly possible to improve on miniaturization further; we are dealing here with a level where the components are single molecules. The group of French workers has opened up a field of research which in the truest sense of the word can be described as molecular biology.

Lwoff represents microbiology, Monod biochemistry, and Jacob cellular genetics. Their decisive discovery would not have been possible without competence and technical knowledge in all these fields, nor without intimate cooperation between the three researchers. But the mystery of life is not resolved simply with knowledge and technical skill. One must also have a gift for observation, a logical intellect, a faculty for the synthesis of ideas, a degree of imagination, and scientific intuition, qualities with which the three workers are liberally endowed.

Research in this field has not yet yielded results that can be used in practice. However, the discoveries have given a strong impetus to research in all domains of biology with far-reaching effects spreading out like ripples in the water. Now that we know the nature of such mechanisms, we have the possibility of learning to master them, with all the consequences which that will surely entail for practical medicine.

François Jacob, André Lwoff, Jacques Monod. Thanks to your technically unimpeachable experiments and your ingenious and logical deductions, you have gained a more intimate familiarity with the nature of vital functions than anyone before you has done. Action, coordination, adaptation, variation - these are the most striking manifestations of living matter. By placing more emphasis on dynamic activity and mechanisms than on structure, you have laid the foundations for the science of molecular biology in the true sense of the term. In the name of the Caroline Institute, I ask you to accept our admiration and our most sincere congratulations. Finally, I invite you to come down from the platform to receive the prize from His Majesty the King.



Gods Behaving Badly

 
Gods Behaving Badly is a new book by Marie Phillips. It was just reviewed in the New York Times [The House of Myth]. Here's a teaser,
Americans have long delighted in movies like "It's a Wonderful Life," "Heaven Can Wait" (both the 1943 and 1978 versions) and "Bruce Almighty" -- "divine comedies," to borrow the marketing shtick of the day, in which a benevolent male Judeo-Christian God and sometimes his demonic counterpart are represented by stock imagery like billowing clouds, bolts of lightning, bumbling plainclothes angels and horned creatures thumping pitchforks. The humor may be irreverent, but it's always delivered with a basic attitude of respect.

Such deference, a holdover perhaps from the days of the Hays Code, is entirely lacking in Marie Phillips's first novel, "Gods Behaving Badly," in which the 12 major deities of ancient Greece uneasily cohabit in a dilapidated town house in 21st-century London, dwelling just above the city's "greasy tide" of human flesh. It's like Hesiod's "Theogony" meets MTV's "Real World."

In the author's affectionate telling, Zeus, the fading patriarch, is squirreled away on the top floor; Apollo is a horny and malcontented television psychic; and Aphrodite is a phone-sex worker whose buttocks, when she mounts a staircase, resemble "two hard-boiled eggs dancing a tango" -- maybe the most original description of the female posterior since Jerry approvingly deemed Sugar Kane's "Jell-O on springs" in "Some Like It Hot." Apollo's virginal, pragmatic twin sister, Artemis, walks dogs for a living and jogs compulsively in her spare time. Dionysus owns a nightclub called Bacchanalia and is constantly plugged in to a music player. Meanwhile, Athena has been cast as an efficient boardroom type who distributes handouts to her bored family as she subjects them to streams of corporate gobbledygook.
This sounds like a terrific book. I don't normally read fiction—other than creationist books—but this will be an exception. Has anyone read it?


What Happened to the "Peers" on this Paper?

 

Quite a few science bloggers were shocked at a paper that appeared recently in the journal Proteomics, a respectable journal until now.

PZ Myers had the stomach to blog about this train wreck of a paper. Read his article at A baffling failure of peer review.


[Photo Credit: Train Wreck at Gare Montparnasse, Paris, France, 1895 from Answers.com]

Joshua Lederberg

 

Joshua Lederberg died last Saturday (Feb. 2, 2008). In his honor, John Dennehy has selected one of Lederberg's famous papers as This Week's Citation Classic: Joshua Lederberg.

I think it's too bad that our current generation of students is growing up without being sufficiently aware of the fundamental principles of biochemistry and molecular biology that were worked out in bacteria and bacteriophage.

UPDATE: [Loss of a giant: Joshua Lederberg]


Tangled Bank #98

 
The latest issue of Tangled Bank is #98. It's hosted by Steve Matheson at Quintessence of Dust [Tangled Bank #98].
Hey! Welcome to Tangled Bank #98, and thanks for stopping by. If you've never been to Quintessence of Dust, the lobby is below and to the right. I hope you'll poke around a little.

PZ didn't give me a budget for refreshments, but if you come to the house I'll make sure we at least have plenty of guacamole. Chips are here, and beer is over there. Our city was once used by Anne Lamott as a metaphor for plainness, but it's much cooler than most people think. You can get to our house on a nice bus system, and after the carnival we can pick one of two Ethiopian restaurants. My day job is at Calvin College, but right now I'm on sabbatical in the lab of a friend and collaborator at the Van Andel Institute in downtown Grand Rapids.


If you want to submit an article to Tangled Bank send an email message to host@tangledbank.net. Be sure to include the words "Tangled Bank" in the subject line. Remember that this carnival only accepts one submission per week from each blogger. For some of you that's going to be a serious problem. You have to pick your best article on biology.