Wednesday, February 04, 2009

Genomics, Proteomics and Mass Spectrometry

The explosion of sequence information from various genome projects has had many unexpected payoffs. One of them has to do with the identification of tiny amounts of unknown protein.

Many experiments in biochemistry and molecular biology lead to the recognition of a novel protein that hasn't been identified. For example, one could go fishing for proteins that bind to other proteins or look at the protein composition of various complexes.

Often the only thing one knows about the protein is its molecular weight on an SDS gel. You can cut out the band containing your protein of interest and extract the protein but that only gives you a tiny amount of denatured protein.

With the development of protein mass spectrometry it became possible to determine an accurate molecular weight of the protein [Biochemistry and Mass Spectrometry]. In theory, one could then compare this molecular weight to the calculated molecular weights of all the proteins encoded in the genome. These calculated molecular weights can be determined from the genome sequence, if you're lucky enough to be working with an organism whose genome has been completely sequenced.

Unfortunately, there are many proteins with similar molecular weights so this straightforward technique doesn't work. However, if you digest the protein with enzymes that cut it several times at specific sites, you create a group of peptide fragments. The molecular weights of the peptides can be determined by mass spec and the "fingerprint" of your unknown protein can be compared to calculated fingerprints of every protein in the proteome.
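To make the idea of a calculated fingerprint concrete, here is a minimal Python sketch. It assumes the enzyme is trypsin and uses the common simplified rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P). The function names and the toy sequence are hypothetical, chosen only for illustration; real software also handles missed cleavages and modified residues.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint(sequence):
    """Sorted list of peptide masses predicted for a protein sequence."""
    return sorted(round(peptide_mass(p), 4) for p in tryptic_digest(sequence))


# Hypothetical toy sequence, not a real protein.
toy = "AGKRPLMKGG"
print(tryptic_digest(toy))  # ['AGK', 'RPLMK', 'GG']
print(fingerprint(toy))
```

Running `fingerprint` over every predicted protein in a genome gives the library of calculated fingerprints that an observed spectrum is compared against.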

Here's an example of a tryptic digest of an unknown human protein of Mr = 90,000. The sizes of the various fragments can be measured accurately and compared to the predicted fragment sizes based on the known DNA sequence of the gene. If you're lucky, there is only one protein that will give rise to the observed peptides. Thus, the unknown protein can be unambiguously identified from the mass of its peptides.
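The comparison step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names and all the masses below are invented toy numbers, the 50 ppm tolerance is an arbitrary choice, and the score is a simple count of matching peptides rather than the statistical scoring that real search programs use.

```python
def count_matches(observed, theoretical, tol_ppm=50.0):
    """Count observed masses that lie within tol_ppm of any predicted mass."""
    hits = 0
    for mass in observed:
        tolerance = mass * tol_ppm / 1e6  # ppm converted to daltons
        if any(abs(mass - t) <= tolerance for t in theoretical):
            hits += 1
    return hits


def rank_candidates(observed, database, tol_ppm=50.0):
    """Return (score, name) pairs, best-matching protein first."""
    scores = [
        (count_matches(observed, peptides, tol_ppm), name)
        for name, peptides in database.items()
    ]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical predicted fingerprints (toy values, not real proteins).
database = {
    "protein_A": [842.51, 1045.56, 1479.80, 2211.10],
    "protein_B": [842.51, 1179.60, 1638.86],
    "protein_C": [927.49, 1305.72],
}

# Hypothetical measured peptide masses from the unknown band.
observed = [842.512, 1045.565, 2211.08]

print(rank_candidates(observed, database))
# [(3, 'protein_A'), (1, 'protein_B'), (0, 'protein_C')]
```

Notice that protein_B still picks up one hit because it shares a peptide mass with protein_A. With only a few measured peptides, or a looser tolerance, that kind of overlap is what turns a unique answer into a list of candidates.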

In this case, the protein is Hsp90. As you might have guessed, the success of this technique owes almost as much to the development of efficient software and databases as it does to the advances in mass spectrometry.

The technique is powerful but the equipment is expensive and requires well-trained technicians.

There are many different kinds of mass specs and every lab will have its own customized setup. The one shown here belongs to Joseph Loo of Chemistry & Biochemistry, UCLA (Los Angeles, CA, USA). I "borrowed" it from his website [Joseph Loo].

Modern research facilities will have access to special labs where protein fingerprinting is routinely performed. In some cases, a major facility will serve as a regional center for analyses and charge a fee ($50-150) for each sample.

The image of the tryptic peptides of Hsp90, above, is from the website of such a facility in the Department of Biochemistry at the University of Buffalo (Buffalo, NY, USA) [Proteomic Capabilities]. Now that you know how the technique works, the description on their website will look much less intimidating.

"The MALDI-TOF facility housed in the Department of Biochemistry provides access to mass spectrometric fingerprinting of unknown proteins. MALDI-TOF (Matrix-assisted, Laser-Desorption-Ionization/Time of flight) mass spectrometry is presently the method of choice for identification of unknown proteins via mass analysis of proteolytic peptides, and for characterization of post-translational modifications. This technique is rapid, highly sensitive, and applicable to a wide variety of research problems. Applications include direct characterization of mutated proteins, estimating the extent of protein derivatization (e.g., biotinylation), and identification of unknown proteins isolated from polyacrylamide gels. Depending on the specific application and complexity of the system, reliable data can be obtained in the fmol-pmol range."

In practice, the identification of a protein from its predicted fingerprint doesn't always work. The measured molecular weights aren't precise enough to unambiguously identify the protein and some peptides don't "fly." In addition, post-translational modifications of the protein will shift the peptide masses away from the molecular weights calculated from the gene sequence.

In most cases when you send out your sample you get back a list of possibilities that has to be narrowed down by other means (e.g., another protease digest).

This limitation has led to the development of coupled mass specs where the peptides from one are fragmented and fed into another. What this gives you is the sequence of each peptide by a technique called MS/MS. With sequence information you can search all the databases for sequence similarity and identify proteins even if the gene for that particular species hasn't been cloned and sequenced.
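Here is a rough Python sketch of how sequence can be read from MS/MS data. It assumes an idealized, complete ladder of singly charged b-ions (N-terminal fragments that each grow by one residue), so the mass difference between neighbouring ions is the mass of one amino acid. The function names, the 0.01 Da tolerance, and the test peptide are hypothetical; real spectra have missing ions, noise, and several ion series mixed together.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ion_ladder(peptide):
    """Simulate singly charged b-ion m/z values for a known peptide."""
    ladder, running = [], PROTON
    for aa in peptide:
        running += RESIDUE_MASS[aa]
        ladder.append(running)
    return ladder


def read_ladder(ion_masses, tol=0.01):
    """Assign a residue to each gap between consecutive fragment ions."""
    ions = sorted(ion_masses)
    calls = []
    for low, high in zip(ions, ions[1:]):
        delta = high - low
        matches = sorted(
            aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol
        )
        calls.append("/".join(matches) if matches else "?")
    return calls


# Hypothetical peptide used only to generate an ideal ladder.
ladder = b_ion_ladder("SAMPLE")
print(read_ladder(ladder))  # ['A', 'M', 'P', 'I/L', 'E']
```

Two details are worth noting. The first residue isn't recovered from the gaps alone, and leucine and isoleucine have identical masses so the sketch reports them as "I/L". Even a partial sequence like this is usually enough to run a similarity search against the databases.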

1 comment :

  1. I've made use of mass spec. The protein supply I was getting from a collaborator was yielding crappy crystals. I revamped the purification procedure, whipped up my own batch of good clean protein, then got no crystals at all. This result was repeatable.

    It occurred to me: the new protein was too clean. So I sent some of the old crappy crystals off for mass spec, and sure enough the protein in them was about 30 residues shorter than expected. I had a stop codon inserted into the expression construct in order to get good, clean, consistently shorter protein, and voilà! the crystallization yields got much better.