
Friday, January 18, 2008

Soybean Genome

 
A preliminary draft of the soybean (Glycine max) genome has been released on the Phytozome website [Glycine max Genome].

The reported size of the genome is 950 Mb (950 × 10^6 base pairs). This is considerably larger than the genomes of grape (505 Mb), Arabidopsis (157 Mb), rice (389 Mb), and poplar (485 Mb).

The larger size suggests a recent polyploidization event in the lineage leading to soybean. The number of genes in the draft sequence is 51,320. This also suggests that many genes are duplicated. (Grape has about 30,000 genes, poplar has about 45,000, rice has 38,000 and Arabidopsis has only 27,029.) Keep in mind that the total number of genes is likely to drop by a considerable amount once detailed annotation gets underway. Nevertheless, it looks like the soybean has a lot more genes than the other flowering plants.
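To put those numbers side by side, here is a minimal Python sketch that computes gene density for each genome. It uses only the sizes and gene counts quoted above; the grape, poplar, and rice gene counts are the rounded "about" figures, and the helper name `genes_per_mb` is made up for illustration.

```python
# Genome sizes (Mb) and gene counts as quoted in this post.
# Grape, poplar, and rice gene counts are the rounded "about" figures.
genomes = {
    "soybean": (950, 51_320),
    "grape": (505, 30_000),
    "Arabidopsis": (157, 27_029),
    "rice": (389, 38_000),
    "poplar": (485, 45_000),
}


def genes_per_mb(size_mb, gene_count):
    """Average number of annotated genes per megabase (hypothetical helper)."""
    return gene_count / size_mb


soy_size, _ = genomes["soybean"]
for name, (size_mb, gene_count) in genomes.items():
    print(
        f"{name:12s} {size_mb:4d} Mb  {gene_count:6d} genes  "
        f"{genes_per_mb(size_mb, gene_count):6.1f} genes/Mb  "
        f"soybean genome is {soy_size / size_mb:.1f}x this size"
    )
```

On these figures soybean comes out at roughly 54 genes per Mb against roughly 172 for Arabidopsis, so the extra genes do not come close to filling the extra DNA. The draft gene count is expected to drop, which would lower the soybean density further.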


[Photo Credit: [Photograph]. Retrieved January 18, 2008, from Encyclopædia Britannica Online: bean: soybean]

Brampton Prude

 
Brampton is a city west of Toronto and north of Mississauga, where I live. Heart Lake United Church is trying to attract customers so they put up the sign shown here. I think it's funny.

Nicole Cedrone doesn't agree. She thought it was offensive when she drove by on her way home from the doctor [Church Strips Saucy Sign]. She complained and the sign was removed.
"I have to admit, it is funny, but it's not appropriate for where it is," Cedrone said. "I just think it's offensive."

She said she is glad her 11-year-old wasn't in the car with her to ask, "Mom, what does that mean?"
Well, the photograph of the "offensive" sign is now prominently featured in The Toronto Star where, hopefully, her 11-year-old son will read it and ask questions like, "Mom, what does 'saucy' mean?" By being such a prude, Nicole Cedrone has ensured that the sign will be viewed by millions and not just a small number of people driving along Sandalwood Parkway. Way to go, Nicole.


Bobby Fischer

 
Bobby Fischer died yesterday in Reykjavik, Iceland, where he had been living for the past several years [Bobby Fischer, 64: Former chess champion].

Back in 1972, Fischer beat Boris Spassky of the USSR to become world chess champion. The event has been glorified as part of the cold war competition between the USA and the USSR but this was only part of the story. Some of us were just interested in it as a major sporting event featuring a radical new hero who didn't always play by the rules.

I remember following the games live on television. Yes, that's right: the moves in each game were broadcast live on a large chessboard, with plenty of color commentary. As an amateur chess player, I found it a real insight into the world of high level play.

Go to World Chess Championship 1972 for a brief summary of this extraordinary event. We'll never see anything like it again.

Here's the position at adjournment in the final (21st) game [Spassky vs Fischer Game #21]. Fischer (black) has just played h5. After thinking about the position all night Spassky phoned in the next morning to resign and concede the championship. Can you see why he gave up?



Why I Like Richard Dawkins

 
Richard Dawkins doesn't pull punches and he doesn't beat around the bush. You always know where he stands on any given issue. This is what I admire about Richard Dawkins.

I don't agree with him on lots of things but whenever you engage him you know you've got a fight on your hands. It's the combination of intelligence and forthrightness that makes him such a powerful voice in science. We need more scientists who are both smart and willing to stand up for their ideas. We need more open controversy in science these days. Scientists need to speak up when they encounter silly ideas in the scientific literature. It is not a scientific virtue to be polite in such cases; in fact, it can be detrimental to science to clog up the scientific literature with scientific nonsense on the grounds that one shouldn't criticize fellow scientists in public. Richard Dawkins does not make that mistake.

Dawkins does not like group selection because it conflicts with his adaptationist, gene-centric, worldview. He's been very clear about this over the years. I admire him for sticking to his guns and standing by the original dismissal of group selection by George Williams.1

David Sloan Wilson and E.O. Wilson have recently been pushing for a revival of group selection. They published a short summary of their new book in the Nov. 3 edition of New Scientist [Evolution: Survival of the selfless] where they said,
The concept of genes as "replicators" and "the fundamental unit of selection" averages the fitness of genes across all contexts to predict what evolves in the total population. The whole point of multilevel selection theory, however, is to ask whether genes can evolve on the strength of between-group selection, despite a selective disadvantage within each group. When this happens, the gene favoured by between-group selection is more fit overall than the gene favoured by within-group selection in the total population.

It is bizarre (in retrospect) to interpret this as an argument against group selection. Both Williams and Dawkins eventually acknowledged their error, but it is still common to find the "gene's-eye view" of evolution presented as a drop-dead argument against group selection.

The old arguments against group selection have all failed. It is theoretically plausible, it happens in reality, and the so-called alternatives actually include the logic of multilevel selection. Had this been known in the 1960s, sociobiology would have taken a very different direction. It is this branch point that must be revisited to put sociobiology back on a firm theoretical foundation.
Dawkins responds to this in a letter published in the Dec. 15 issue [Genes Still Central].
Genes still central

David Sloan Wilson's lifelong quest to redefine "group selection" in such a way as to sow maximum confusion - and even to confuse the normally wise and sensible Edward O. Wilson into joining him - is of no more scientific interest than semantic doubletalk ever is. What goes beyond semantics, however, is his statement (it is safe to assume that E. O. Wilson is blameless) that "Both Williams and Dawkins eventually acknowledged their error..." (3 November, p 42).

I cannot speak for George Williams but, as far as I am concerned, the statement is false: not a semantic confusion; not an exaggeration of a half-truth; not a distortion of a quarter-truth; but a total, unmitigated, barefaced lie. Like many scientists, I am delighted to acknowledge occasions when I have changed my mind, but this is not one of them.

D. S. Wilson should apologise. E. O. Wilson, being the gentleman he is, probably will.

* Richard Dawkins, Oxford UK
Does anyone have any doubts about where Dawkins stands on the issue of group selection?

David Sloan Wilson and E.O. Wilson responded to Dawkins' letter by claiming that they were only referring to one minor aspect of the argument against group selection but I don't think anyone is going to be fooled by that. In their article, they clearly imply that Dawkins has acknowledged his "error" in opposing group selection. This is a case where a simple apology would have worked better.


1. Ironically, Dawkins is a huge fan of kin selection, which, in my opinion, is just about as weak as group selection.

Thursday, January 17, 2008

Gerty Cori Biochemist on USA Stamp

Biochemist Gerty Cori is going to be on a new USA stamp to be issued in March. Cori and her husband won the Nobel Prize in 1947 for their work on glycogen metabolism [Nobel Laureates: Carl Ferdinand Cori and Gerty Theresa Cori].

One of the key intermediates in this pathway is the Cori ester [Monday's Molecule #25]. That's the molecule pictured on the stamp. Unfortunately, there's a mistake in the structure. How many can spot it? Why didn't they ask a biochemist to check the design?

UPDATE: Here's the correct structure.
The error was first discovered by a reader of Chemical & Engineering News [Going postal over structural errors]. Here's how C&EN describes the mistake ...
It is a sad state of affairs, because it was precisely the isolation of glucose-1-phosphate, and discovery of the so-called Cori ester, that garnered Cori the Nobel Prize. "Long-dead carbohydrate chemists would roll over in their graves to see this structure after all the effort they made to get it right," one sugar chemist wrote in an e-mail to Newscripts.

The glitch made us rather glum, despondent even, as we considered the squandered opportunity to serve some first-class carbohydrates to the American public. For alas, the suboptimal stamps have already been printed and are still scheduled for release in early March, despite the error.


[Hat Tip: Living the Scientific Life]

A Junk DNA Quiz

 
Take the junk DNA quiz in the left sidebar to let me know what you think of your genome. How much of it could be removed without affecting our species in any significant1 way in terms of viability and reproduction? Or even in terms of significant ability to evolve in the future? In other words, how much is junk?


1. I did not choose the word "significant" in order to be obtuse. I picked it in order to eliminate some trivial possibilities that really don't make any difference. For example, no matter how little DNA you delete you would be able to detect some change, even if it's just a reduction in the time to replicate the genome or the amount of energy used. If you think that such changes are "significant" then you should answer "none" to the question in the quiz.

The Plausibility of Life

The Plausibility of Life is an evo-devo book by Mark Kirschner and John Gerhart. I read it a long time ago and had pretty much put it out of my mind except for the occasional potshot [Evo-Devo: Innovation and Robustness in Evolution] [Animal Chauvinism].

A couple of days ago I was shocked into taking another look at the book. The shocker was Alex Palazzo of The Daily Transcript who wrote [Today's rant - the biggest story never covered].
The most insightful book on biology written in the past decade was Gerhart & Kirschner's book, The Plausibility of Life.
That's so far out of line with my opinion that I began to wonder if I had missed something important. Alex is a smart guy and he's not likely to be way off base.

Alas, in this case Alex got it wrong. This is certainly not one of the most insightful books on biology in this decade, or any other decade. It's pretty much adaptationist, animal chauvinistic, evo-devo-centric gobbledygook from a pair of scientists who don't understand evolution.

The central question of the book is, "... how can small, random genetic changes be converted into complex and useful innovations?" (p. ix). Apparently this is a puzzlement to evolutionary developmental biologists. One that has eluded them until the last decade of the twentieth century.

Kirschner and Gerhart's effort is a valuable update of some of these ideas, but it hardly constitutes "a new theory" to complement the Modern Synthesis.

Massimo Pigliucci
Have We Solved Darwin's Dilemma?
Kirschner and Gerhart have the solution.
In this book we propose a major new scientific theory: facilitated variation that deals with the means of producing useful variation.
Since the essence of their theory is "facilitated variation" you would think that this concept would be clearly explained in the book. It isn't. There's a lot of beating around the bush and hand waving about a fundamental new process of evolution but little in the way of concrete facts and evidence.

When I read the book for the very first time I was astonished to realize that by the end of the book I still didn't know what the authors meant by "facilitated variation." So, I looked it up in the glossary ....
Facilitated Variation: An explanation of the organism's generation of complex phenotypic change from a small number of random changes of the genotype. We posit that the conserved components greatly facilitate evolutionary change by reducing the amount of genetic change required to generate phenotypic novelty, principally through their reuse in new combinations and in different parts of their adaptive ranges of performance.
Is that clear? Of course it isn't. You have to go back and read very carefully to even begin to understand what they're talking about.

... Kirschner and Gerhart do not present any detailed examples of how the properties of developmental systems have actually contributed to the evolution of a major evolutionary novelty. Nor have they shown that alternative properties would have prevented such evolution. Although The Plausibility of Life contains many interesting facts and arguments, its major thesis is only weakly supported by the evidence.

Brian Charlesworth
On the Origins of Novelty and Variation
Here's how I see it. Animals have a number of conserved core processes like transcription, membrane trafficking, formation of eyes, compartments, etc. that contribute to the success of the organism. You can't mess with these core processes because that would be lethal. However, new phenotypes can arise when existing core processes are expressed in new combinations or at different times during development. The core processes have been selected for adaptability. In particular, they have evolved in a way that facilitates phenotypic variation by encouraging the evolution of new combinations and new timing while, at the same time discouraging evolution of the core processes themselves.

Over time, animal species have been selected for the ability to evolve by taking advantage of facilitated variation without threatening the core processes. This is one of the definitions of evolvability.
In summary, we believe that evolvability—the capacity for organisms to evolve—is a real phenomenon. We believe that facilitated variation explains the variation side of evolvability, through the reuse of a limited set of conserved processes in new combinations and in different parts of their adaptive ranges due to genetic modifications of nonconserved regulatory components ....

Facilitated variation has arisen and increased by selection, we say. Since it facilitates the generation of innumerable complex, selectable heritable traits with only a small investment of random genetic variation, it is indeed the greatest adaptation of all, at least for animals since the Cambrian.
I'm not buying any of this. I'd like to see real evidence that the process of evolution in animals is different than evolution in prokaryotes. Whether you're dealing with the genetic switch in bacteriophage λ, sporulation in Bacillus subtilis, or the formation of heterocysts in cyanobacteria, it is always the case that the processes are controlled by a cascade of regulatory factors and drastic changes in the timing and extent of these regulatory genes can lead to significant phenotypic effects.

What's so special about animals that we have to develop a "major new scientific theory" to explain similar observations?


Wednesday, January 16, 2008

Nobel Laureate: Sidney Altman

 

The Nobel Prize in Chemistry 1989.

"for their discovery of catalytic properties of RNA"



In 1989, Sidney Altman (1939 - ) was awarded the Nobel Prize in Chemistry for discovering that the RNA component of RNase P was the catalytic component of the enzyme [Transfer RNA Processing: RNase P]. He shared the prize with Thomas Cech who worked on self-splicing ribosomal RNA precursors.

The presentation speech was delivered by Professor Bertil Andersson of the Royal Swedish Academy of Sciences.
Your Majesties, Your Royal Highnesses, Ladies and Gentlemen,

The cells making up such living organisms as bacteria, plants, animals and human beings can be looked upon as chemical miracles. Simultaneously occurring in each and every one of these units of life, invisible to the naked eye, are thousands of different chemical reactions, necessary to the maintenance of biological processes. Among the large number of components responsible for cell functions, two groups of molecules are outstandingly important. They are the nucleic acids - carriers of genetic information - and the proteins, which catalyze the metabolism of cells through their ability to act as enzymes.

Genetic information is programmed like a chemical code in deoxyribonucleic acid, better known by its abbreviated name of DNA. The cell, however, cannot decipher the genetic code of the DNA molecule directly. Only when the code has been transferred, with the aid of enzymes, to another type of nucleic acid, ribonucleic acid or RNA, can it be interpreted by the cell and used as a template for producing protein. Genetic information, in other words, flows from the genetic code of DNA to RNA and finally to the proteins, which in turn build up cells and organisms having various functions. This is the molecular reason for a frog looking different from a chaffinch and a hare being able to run faster than a hedgehog.

Life would be impossible without enzymes, the task of which is to catalyze the diversity of chemical reactions which take place in biological cells. What is a catalyst and what makes catalysis such a pivotal concept in chemistry? The actual concept is not new. It was minted as early as 1835 by the famous Swedish scientist Jöns Jacob Berzelius, who described a catalyst as a molecule capable of putting life into dormant chemical reactions. Berzelius had observed that chemical processes, in addition to the reagents, often needed an auxiliary substance - a catalyst - to occur. Let us consider ordinary water, which consists of oxygen and hydrogen. These two substances do not react very easily with one another. Instead, small quantities of the metal platinum are needed to accelerate or catalyze the formation of water. Today, perhaps, the term catalyst is most often heard in connection with purification of vehicle exhausts, a process in which the metals platinum and rhodium catalyze the degradation of the contaminant nitrous oxides.

As I said earlier, living cells also require catalysis. A certain enzyme, for example, is needed to catalyze the breakdown of starch into glucose and then other enzymes are needed to burn the glucose and supply the cell with necessary energy. In green plants, enzymes are needed which can convert atmospheric carbon dioxide into complicated carbon compounds such as starch and cellulose.

As recently as the early 1980s, the generally accepted view among scientists was that enzymes were proteins. The idea of proteins having a monopole of biocatalytic capacity has been deeply rooted, and created a fundamental dogma of biochemistry. This is the very basic perspective in which we have to regard the discovery today being rewarded with the Nobel Prize for Chemistry. When Sidney Altman showed that the enzyme denoted RNaseP only needed RNA in order to function, and when Thomas Cech discovered self-catalytic splicing of a nucleic acid fragment from an immature RNA molecule, this dogma was well and truly holed below the waterline. They had shown that RNA can have catalytic capacity and can function as an enzyme. The discovery of catalytic RNA came as a great surprise and was indeed met with a certain amount of scepticism. Who could ever have suspected that scientists, as recently as in our own decade, were missing such a fundamental component in their understanding of the molecular prerequisites of life? Altman's and Cech's discoveries not only mean that the introductory chapters of our chemistry and biology textbooks will have to be rewritten, they also herald a new way of thinking and are a call to new biochemical research.

The discovery of catalytic properties in RNA also gives us a new insight into the way in which biological processes once began on this earth, billions of years ago. Researchers have wondered which were the first biological molecules. How could life begin if the DNA molecules of the genetic code can only be reproduced and deciphered with the aid of protein enzymes, and proteins can only be produced by means of genetic information from DNA? Which came first, the chicken or the egg? Altman and Cech have now found the missing link. Probably it was the RNA molecule that came first. This molecule has the properties needed by an original biomolecule, because it is capable of being both genetic code and enzyme at one and the same time.

Professor Altman, Professor Cech, you have made the unexpected discovery that RNA is not only a molecule of heredity in living cells, but also can serve as a biocatalyst. This finding, which went against the most basic dogma in biochemistry, was initially met with scepticism by the scientific community. However, your personal determination and experimental skills have overcome all resistance, and today your discovery of catalytic RNA opens up new and exciting possibilities for future basic and applied chemical research.

In recognition of your important contributions to chemistry, the Royal Swedish Academy of Sciences has decided to confer upon you this year's Nobel Prize for Chemistry. It is a privilege and pleasure for me to convey to you the warmest congratulations of the Academy and to ask you to receive your prizes from the hands of His Majesty the King.


Transfer RNA Processing: RNase P

 

RNase P is one of the key enzymes in the processing of tRNA primary transcripts [Transfer RNA: Synthesis].

RNase P is a ribozyme. Most of the enzyme consists of an RNA molecule called RNA P and the rest is composed of small proteins. In bacteria there is a single protein subunit while in eukaryotes there are up to eight small proteins bound to the RNA component.

RNA P, by itself, can catalyze the cleavage reaction [Monday's Molecule #58]. The role of the protein is simply to facilitate the reaction.1

The structure of the RNA component from two different species has recently been published. The one shown here is RNA P from Thermotoga maritima (reviewed in Baird et al. 2007). This catalytic RNA is found in all species and it's the classic example of an RNA that can catalyze a reaction in the absence of protein. Sidney Altman received the Nobel Prize in 1989 for demonstrating that the activity was confined to the RNA part of the holoenzyme.

The exact structure of the complete holoenzyme (RNA + protein) is not known but the evidence suggests a model such as the one shown on the left (Smith et al. 2007). The RNA is blue, the protein subunit is red, and the bound tRNA precursor is brown. Note that the protein subunit is positioned at the site of the cleavage near the 5′ end of the mature tRNA.

Part of the RNA ribozyme is interacting with the TΨC loop of the tRNA molecule. This loop is present in all tRNAs, which explains why the RNase P enzyme can cleave all tRNA precursors no matter which particular tRNA is going to be produced.

There are two different types of RNase P depending on the species. Although both of them have similar catalytic RNAs, they differ in the size of the RNA and in the proteins that are bound to it.


1. When the reaction is carried out under in vivo conditions of ionic strength, temperature, etc., the protein component is absolutely required in order to get significant activity.

Baird, N.J., Fang, X.W., Srividya, N., Pan, T. and Sosnick, T.R. (2007) Folding of a universal ribozyme: the ribonuclease P RNA. Quarterly Rev. Biophys. 40:113-161. [doi:10.1017/S0033583507004623] [PubMed]

Smith, J.K., Hsieh, J. and Fierke, C.A. (2007) Importance of RNA-protein interactions in bacterial ribonuclease P structure and catalysis. Biopolymers 87:329-38. [PubMed]

Transfer RNA: Synthesis

 
Transfer RNA's are produced by transcribing a tRNA gene to produce a single-stranded tRNA precursor molecule. tRNA genes are just one of the many examples of genes that don't encode proteins. It's worth keeping this in mind when you read discussions about how genes are defined and the role of "noncoding" DNA in the genome.

tRNA genes can be individual isolated genes or they can be linked to other genes in a larger transcriptional unit. A common example of the latter situation occurs in ribosomal RNA operons where tRNA genes are located in the regions between the large and small ribosomal RNA genes. In bacteria, the tRNA genes can be part of a co-transcribed operon containing protein-encoding genes. In eukaryotes the tRNA genes are transcribed by RNA polymerase III [Eukaryotic RNA Polymerases].

No matter how the tRNA genes are arranged, the primary transcriptional product is larger than the functional tRNA and it contains no modified bases. This primary transcript has to be processed to: (a) reduce it to the proper length, (b) remove any introns and (c) convert the standard nucleotides into modified nucleotides like dihydrouridylate (D) or pseudouridylate (Ψ) [Transfer RNA: Structure].

The trimming steps involve a number of specific RNA cleavage enzymes. RNase P specifically cuts the precursor at the 5′ end of the mature tRNA. Other endonucleases cut the precursor near the 3′ end of the mature molecule.

The 3′ end must then be trimmed back to the proper position. This step is carried out by an exonuclease called RNase D in bacteria. Finally, the nucleotides CCA are added to the 3′ end by tRNA nucleotidyl transferase. (All tRNA's have the same 3′ nucleotides; this is where the amino acid is attached later on.) Some tRNA genes already have the sequence CCA at the 3′ end of the mature molecule so the last step isn't always required.
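The trimming steps can be pictured as simple string operations. The following is a toy Python sketch, not a model of the real enzymes: the function name, the cut positions, and the example sequences are all invented for illustration, and intron removal and base modification are left out entirely.

```python
def process_trna_precursor(precursor, five_prime_cut, three_prime_cut):
    """Toy model of tRNA end processing (hypothetical helper).

    precursor       -- RNA sequence of the primary transcript
    five_prime_cut  -- index of the first nucleotide of the mature tRNA
                       (where RNase P cuts)
    three_prime_cut -- index just past the last nucleotide that survives
                       endonuclease cleavage and exonuclease trimming
    """
    # RNase P removes the 5' leader; the 3' nucleases remove the trailer.
    body = precursor[five_prime_cut:three_prime_cut]
    # CCA is added only if the trimmed molecule does not already end in CCA.
    if not body.endswith("CCA"):
        body += "CCA"
    return body


# Made-up sequences: a 5' leader, a tRNA body without CCA, and a 3' trailer.
leader, core, trailer = "GGAUC", "GCGGAUUUAGCUCAGAAUUCGCA", "UUUUAG"
precursor = leader + core + trailer
mature = process_trna_precursor(precursor, len(leader), len(leader) + len(core))
print(mature)
```

Checking for a terminal CCA is a simplification: it stands in for the case where the gene already encodes the CCA and the last step is skipped.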


Transfer RNA: Structure

 
Transfer RNA (tRNA) is an essential component of the protein synthesis reaction. There are at least twenty different kinds of tRNA in the cell1 and each one serves as the carrier of a specific amino acid to the site of translation.

tRNA's are L-shaped molecules. The amino acid is attached to one end and the other end consists of three anticodon nucleotides. The anticodon pairs with a codon in messenger RNA (mRNA) ensuring that the correct amino acid is incorporated into the growing polypeptide chain.
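Codon-anticodon pairing is antiparallel, which is easy to get backwards. Here is a minimal Python sketch that derives the anticodon for a codon using strict Watson-Crick pairing only. It ignores wobble pairing and modified bases, so it is a simplification, and the function names are hypothetical.

```python
# Strict Watson-Crick pairing in RNA (no wobble, no modified bases).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the complementary anticodon, written 5' to 3'."""
    # The two strands pair antiparallel, so reverse before complementing.
    return "".join(PAIR[base] for base in reversed(codon))


def pairs_with(anticodon, codon):
    """True if the anticodon is the exact Watson-Crick partner of the codon."""
    return anticodon_for(codon) == anticodon


print(anticodon_for("AUG"))  # CAU
```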

The L-shaped tRNA is formed from a small single-stranded RNA molecule that folds into the proper conformation. Four different regions of double-stranded RNA are formed during the folding process.

The two ends of the molecule form the acceptor stem region where the amino acid is attached. The anticodon is an exposed single-stranded region in a loop at the end of the anticodon arm.

The two other stem/loop structures are named after the modified nucleotides that are found in those parts of the molecule. The D arm contains dihydrouridylate residues while the TΨC arm contains a ribothymidylate residue (T), a pseudouridylate residue (Ψ) and a cytidylate (C) residue in that order. All tRNA's have a similar TΨC sequence. The variable arm is variable, just as you would expect. In some tRNA's it is barely noticeable while in others it is the largest arm.

tRNA's are usually drawn in the "cloverleaf" form (below) to emphasize the base-pairs in the secondary structure.


1. Most genomes contain 40-80 different tRNA genes. While there are only 20 common amino acids, there are 61 different codons. Many codons are recognized by more than one different tRNA—the classic example is the codon AUG that can be recognized by methionyl-tRNA and initiator tRNA.
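The figure of 61 codons is just the 64 possible triplets minus the three stop codons of the standard genetic code. A short Python sketch of that count:

```python
from itertools import product

# Stop codons in the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every possible triplet of the four RNA bases.
all_codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons), len(sense_codons))  # 64 61
```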

First Rule of Holes

 
Greg Laden has responded to criticism of his views on junk DNA [Moran, Gregory, Give me a Break!].

In the comments to Greg's post, Steve LaBonne brings up the First Rule of Holes. This is an excellent example. In case there are some people who are not familiar with the First Rule of Holes, here it is ....
FIRST RULE OF HOLES

If you're in one, stop digging.


Tuesday, January 15, 2008

Greg Laden Gets Suckered by John Mattick

 
Oh dear. Greg Laden reviews a paper from John Mattick's group and he falls for the hype, hook, line, and sinker. Here's what Greg says [Genes are only part of the story: ncRNA does stuff].
The "Junk DNA" story is largely a myth, as you probably already know. DNA does not have to code for one of the few tens of thousands of proteins or enzymes known for any given animal, for example, to have a function. We know that. But we actually don't know a lot more than that, or more exactly, there is not a widely accepted dogma for the role of "non-coding DNA." It does really seem that scientists assumed for too long that there was no function in the DNA.
I hate to break it to you Greg, but junk DNA is not a myth. It really is true that a huge amount of our genome is junk. It's mostly defective transposons like SINEs and LINEs [Junk in your Genome: LINEs]. It's a lie that we don't know what most non-coding DNA is doing. We do know. It's not doing anything because it's mostly screwed up transposons and pseudogenes like Alu's.

Mattick may have found a few bits of DNA that encode regulatory RNAs but that's only a small part of the total genome. He, and you, have fallen for excuse #5 of The Deflated Ego Problem.

Ryan Gregory has already tried to teach Greg some real science about junk DNA so I won't pile on any more than I have [Signs of function in non-coding RNAs in mouse brain.].

UPDATE: RPM chimes in to expose the flawed thinking of Greg Laden [How Easy is it to Write About Junk DNA?]


Humans Have Only 20,500 Protein-Encoding Genes

The first drafts of the human genome indicated about 30,000 genes, a number that was very much in line with many predictions that had been made over the years by scientists who were studying the topic. (Other scientists, and most science writers, thought there were about 100,000 genes [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome]).

Since the publication of the first draft, the number of genes has been dropping as annotators eliminate sequences that were falsely attributed to protein-encoding genes. Current estimates suggest there are about 28,000 different genes altogether with about 4,000 of them encoding RNA products such as ribosomal RNA, tRNA, and the small RNAs involved in a number of metabolic processes [Ensembl: Homo sapiens].

A gene encoding a protein will have an open reading frame (ORF) consisting of multiple codons, usually more than 100. Some of these potential protein-encoding genes appear to be unique to humans. They weren't found in the other mammalian genomes that had been sequenced (e.g., mouse, dog). Quite a few scientists took this as evidence for genes that distinguish humans from other mammals. According to them, these unique genes arose during the recent evolution of Homo sapiens and that's why there are no homologues in the other mammalian genomes.
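To make the idea of an ORF concrete, here is a minimal Python sketch of a forward-strand ORF scan. It is an illustration under simple assumptions, not the method used by any annotation pipeline: it treats an ORF as an ATG followed by in-frame codons up to the first stop, ignores the reverse strand and introns, and uses a made-up function name and a made-up test sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, codon_count) for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    codon_count includes the ATG but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # close it and look for the next ATG
    return orfs


# Made-up example: a 5-codon ORF that is far too short to count as a gene.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
print(find_orfs(toy))  # nothing passes the default 100-codon cutoff
```

Short reading frames like the toy one turn up by chance all the time, which is why a length cutoff alone can't separate real genes from spurious ORFs.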

Other scientists looked at the data in a different light. They suspected that these "unique" or "orphan" genes were more likely to be artifacts because they were not conserved. In other words, they reached exactly the opposite conclusion based on their understanding of evolution. Their prediction was that these orphan genes resulted from spurious ORF's and not real genes.

This problem has been examined by Eric Lander's group in Boston, MA (USA) and the results were published in PNAS (Clamp et al., 2007). Their careful analysis has eliminated most of the orphan genes and the new gene count for protein-encoding genes is now 20,488.

Here's how the authors describe the purpose of their study,

The purpose of this article is to test whether the nonconserved human ORFs represent bona fide human protein-coding genes or whether they are simply spurious occurrences in cDNAs. Although it is broadly accepted that ORFs with strong cross-species conservation to mouse or dog are valid protein-coding genes (7), no work has addressed the crucial issue of whether nonconserved human ORFs are invalid. Specifically, one must reject the alternative hypothesis that the nonconserved ORFs represent (i) ancestral genes that are present in our common mammalian ancestor but were lost in mouse and dog or (ii) novel genes that arose in the human lineage after divergence from mouse and dog.
To begin the study, they chose to analyze the 21,895 protein-encoding genes in the Ensembl database. They looked for genes that were related to similar sequences in the mouse and dog genomes. (These are the only two well-characterized non-human, mammalian genomes.) After visual inspection of low-scoring sequences they were able to eliminate about 1600 potential genes because they were pseudogenes, transposons, or artifacts of various sorts.

They were left with 19,108 verified genes and 1177 orphan "genes": human ORFs that were not similar to any gene in the mouse and dog genomes. These orphans could be newly evolved genes in the human/primate lineage or ancient genes that had been lost in mice and dogs.

The next step was to categorize the orphan "genes" to see if they looked like real protein-encoding genes. The results indicated that in terms of sequence similarity to the same regions in the mouse and dog genomes, the orphan ORFs were indistinguishable from random sequences. Similarly, the characteristics of the presumed codons of these genes were very different from conserved genes and very similar to random sequences with short accidental reading frames. Thus, the orphan sequences look like artifacts.

To confirm this conclusion, the authors compared the sequences to the macaque and chimpanzee genomes. They were not found in those genomes either.
If the orphans represent valid human protein-coding genes, we would have to conclude that the vast majority of the orphans were born after the divergence from chimpanzee. Such a model would require a prodigious rate of gene birth in mammalian lineages and a ferocious rate of gene death erasing the huge number of genes born before the divergence from chimpanzee. We reject such a model as wholly implausible. We thus conclude that the vast majority of orphans are simply randomly occurring ORFs that do not represent protein-coding genes.
This analysis was extended to the other gene catalogs (Vega and RefSeq) as well as an updated version of the Ensembl catalog (v38). This resulted in the identification of an additional 1271 valid genes. Adding in the genes in the mitochondrial genome (13) and the Y chromosome (78) gives a total of 20,470 genes.

Finally, reanalysis of the transposons and pseudogenes revealed 18 cases where a real gene had evolved from an inactive pseudogene. This gives a grand total of 20,488 protein-encoding genes in the human genome.
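The figures quoted in this post appear to add up as follows. This is just a quick check of the arithmetic using the numbers given above. The variable names are my own labels, not terms from the paper.

```python
# Numbers as quoted in this post; variable names are illustrative labels.
verified_ensembl = 19_108    # genes left after removing orphans and artifacts
additional_valid = 1_271     # valid genes from Vega, RefSeq, and Ensembl v38
mitochondrial = 13           # genes in the mitochondrial genome
y_chromosome = 78            # genes on the Y chromosome
from_pseudogenes = 18        # real genes evolved from inactive pseudogenes

subtotal = verified_ensembl + additional_valid + mitochondrial + y_chromosome
grand_total = subtotal + from_pseudogenes

print(subtotal)     # 20470
print(grand_total)  # 20488
```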

There are several conclusions that can be drawn from this excellent study.
We show that the vast majority of ORFs without cross-species counterparts are simply random occurrences. The exceptions appear to represent a sufficiently small fraction that the best course would be to consider such ORFs as noncoding in the absence of direct experimental evidence.
This is going to be a major challenge for many workers who prefer to see evolution in a different manner. There are a number of papers that view these orphan sequences as direct evidence that human-specific genes had arisen in the recent past. Clamp et al. (2007) are saying that if the sequences aren't present in the macaque and chimpanzee then one should conclude that they are artifacts.

Remember, many of the artifactual genes are supported by EST/cDNA data suggesting that they are transcribed. This study calls that evidence into question—correctly in my opinion—indicating that we should be skeptical of the EST data.
One important biological implication of our results is that truly novel protein-coding genes (encoding at least 100 amino acids) arise only rarely in mammalian lineages. With the current gene catalogs, there are only 168 "human-specific" genes (<1% of the total; only 11 are manually reviewed entries in RefSeq; see SI Table 4). These genes lack clear orthologs or paralogs in mouse and dog, but are recognizable because they belong to small paralogous families within the human genome (2 to 9 members) or contain Pfam domains homologous to other proteins. These paralogous families show a range of nucleotide identities, consistent with their having arisen over the course of ~75 million years since the divergence from the mouse lineage.
This is an important conclusion and I think it is accurate. There are very few "new" genes in the human genome, and, by implication, in other mammalian genomes. This conclusion is consistent with what we know about evolution but it contradicts studies that purport to show rapid evolution of novel genes and novel regulatory mechanisms in humans.


[Image Credit: The human karyotype is from the Ensembl website.]

Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M.F., Kellis, M., Lindblad-Toh, K. and Lander, E.S. (2007) Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. (USA) 104:19428-19433. [DOI 10.1073/pnas.0709013104]

Digital Object Identifier (DOI)

 
The digital object identifier, or DOI, is a unique identifier that's given to electronic documents. The idea is that it serves as a permalink to the item. An item can be moved to a different webpage but the DOI will always point to it as long as the DOI is updated when the item is moved.

We often encounter these DOIs in online journal articles. For example, a recently published PNAS article has the following DOI: 10.1073/pnas.0709013104. I usually forget how to resolve those DOIs. In case I'm not the only one, I thought I'd post the information.

The resolver is located at http://dx.doi.org/. So if you want to see the PNAS article you type in the following URL: http://dx.doi.org/10.1073/pnas.0709013104. Try it.
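If you'd rather not type it by hand, the same rule (resolver address followed by the DOI) can be written as a few lines of Python. This is only a sketch: the function name `doi_to_url` is my own, and the percent-encoding step is a precaution I've added for DOIs that contain unusual characters.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"

def doi_to_url(doi):
    """Build a resolver URL by appending the DOI to the resolver address.

    Hypothetical helper: characters that are not URL-safe are
    percent-encoded, while the slash between prefix and suffix is kept.
    """
    return RESOLVER + quote(doi.strip(), safe="/")

print(doi_to_url("10.1073/pnas.0709013104"))
# http://dx.doi.org/10.1073/pnas.0709013104
```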