Improvement of sequence coverage in peptide mass fingerprinting (Proteomics)

1. Introduction

Detailed information regarding the presence of isoforms, splice variants, and posttranslational modifications (PTMs) is essential for the complete understanding of the biological function, location, and properties of a specific protein.

Traditionally, proteomic studies are based on protein separation using two-dimensional (2D) gel electrophoresis followed by identification by mass spectrometry (MS). An alternative strategy based on separation of peptides derived by enzymatic digestion of complex protein mixtures using 2D liquid chromatography systems coupled to mass spectrometry has gained extensive use because it allows automation and high throughput. In this approach, each protein is identified on the basis of only a few peptides. However, studies of different isoforms and changes in protein modification require high coverage of the protein sequence and are still best performed after separation of the proteins, for example, by 2D gels.

Several strategies have been developed for the specific detection of PTMs such as phosphorylation, nitrosylation, glycosylation, and acetylation (Mann and Jensen, 2003; see also Article 61, Posttranslational modification of proteins, Volume 6). Most of these are designed to identify one specific type of modification by selective enrichment of the modified peptides (for review, see Jensen, 2004).

However, if the aim is to fully characterize a protein and its modifications, a more global strategy is needed. This requires that signals covering all or the majority of the protein sequence are observed, that is, that high sequence coverage is obtained. Although sequence coverage close to 100% might be obtained in peptide mass fingerprinting by Matrix-Assisted Laser Desorption/Ionization (MALDI) MS, protein identification is often performed with sequence coverage as low as 10%, and essential peptides may fail to be identified for different reasons. The main cause is selective ion suppression during the mass spectrometric analysis of the complex peptide mixtures. Additional reasons for low sequence coverage are loss of peptides during sample handling and the fact that modified peptides are often present at low stoichiometry. Consequently, the normal fingerprint procedure has to be carefully optimized before (if ever) a full characterization of the protein can be obtained.
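As a rough illustration of what sequence coverage means in practice, the following Python sketch computes the fraction of residues covered by a set of matched peptides. The function name and the toy sequence are hypothetical, and locating peptides by exact substring search is a simplifying assumption (in a real fingerprint, peptides are assigned by mass).

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide.

    Peptides are located by exact substring search (an assumption made
    for illustration); overlapping peptides are counted only once.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "AAAKBBBKCC"  # hypothetical sequence, not a real protein
    print(sequence_coverage(toy_protein, ["AAAK"]))
```

Counting residues rather than peptides is what makes the measure robust to overlapping peptides, such as those containing missed cleavage sites.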


In the following sections, we discuss strategies for enhancing the sequence coverage by combining complementary information gained from different enzymes for protein cleavage, different methods for enrichment, different matrices, and/or different ionization methods. The strategies described all assume that the proteins are separated, for example, by 2D-PAGE, prior to characterization.

2. Enzymatic digestion

Traditionally, trypsin is the enzyme of choice for protein and peptide mass fingerprinting. The masses of tryptic peptides are typically within the range of 700-2500 Dalton (Da), which is optimal for protein identification by peptide mass fingerprinting. However, digestion with a single enzyme will often not provide complete sequence coverage. This is partly because some of the peptides generated are too large or too small to be detected under the chosen conditions.

In particular, lysine-terminated peptides with masses below 1000 Da will often be lost in the purification procedure or be missing from the spectra owing to suppression effects (see below). However, by using suboptimal digestion conditions, these peptides might be observed as part of larger peptides containing one or more missed cleavage sites.

A more efficient strategy to increase sequence coverage is to combine data from the tryptic digestion with those obtained by digestion with proteolytic enzymes of different specificity. Several examples of successful improvement of the sequence coverage using multiple enzymatic digests are given in the literature (Larsen et al., 2001; Choudhary et al., 2003; MacCoss et al., 2002). Examples of proteases with high specificity are endoproteinases Lys-C, Asp-N, and Glu-C, all of which can be used for in-gel as well as in-solution digestion. The use of specific enzymes can be combined with digestion using less specific enzymes such as subtilisin and chymotrypsin, which generally produce smaller peptides. Inclusion of these enzymes might be an advantage when the specific enzymes produce peptides that are too large, either because of a lack of appropriate cleavage sites or because of large modifying groups, which might cause steric hindrance for the enzyme action.
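The benefit of combining digests can be sketched with a small in silico experiment. The Python example below is a minimal sketch under stated assumptions: cleavage rules are simplified (trypsin after K/R with no proline exception, Lys-C after K, Glu-C after E only), no missed cleavages are generated, and a peptide counts as "observable" simply when its monoisotopic mass falls inside a mass window. All function names are hypothetical.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Simplified rules (assumption): cleave C-terminal to the listed residues.
ENZYMES = {"trypsin": "KR", "Lys-C": "K", "Glu-C": "E"}


def digest(protein, cleave_after):
    """Return (start, end) index pairs of fully cleaved peptides."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cleave_after:
            pieces.append((start, i + 1))
            start = i + 1
    if start < len(protein):
        pieces.append((start, len(protein)))
    return pieces


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def combined_coverage(protein, enzyme_names, lo=700.0, hi=2500.0):
    """Fraction of residues covered by in-window peptides of any digest."""
    covered = set()
    for name in enzyme_names:
        for s, e in digest(protein, ENZYMES[name]):
            if lo <= peptide_mass(protein[s:e]) <= hi:
                covered.update(range(s, e))
    return len(covered) / len(protein)
```

Calling `combined_coverage(seq, ["trypsin"])` and then `combined_coverage(seq, ["trypsin", "Glu-C"])` on the same sequence shows how much of the sequence the second enzyme adds; because the covered positions are merged as a set, adding an enzyme can never lower the result.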

3. Signal suppression effects

Ion suppression is not a fully characterized phenomenon. It is influenced by a combination of the presence of impurities, the gas-phase affinity of the peptides, the choice of matrix in MALDI, and the choice of ionization and acquisition method.

3.1. Removal of impurities

The presence of impurities such as salts, buffers, and detergents can result in adduct formation and thereby in reduced signal intensities, or can lead to complete suppression of peptide signals. Therefore, desalting/cleanup and concentration of the digest prior to mass spectrometric analysis is crucial, especially when analyzing low amounts of starting material (e.g., low-abundance protein spots). This can be achieved with commercial microcolumns (e.g., ZipTips from Millipore (Billerica, MA, USA)) or by using small reversed-phase Poros R2 columns made in gel loader tips (Gobom et al., 1999). The peptides are bound to the reversed-phase material, the contaminants are removed by a washing step, and the peptides are subsequently eluted directly onto the MALDI target using a matrix solution (Gobom et al., 1999).

We have observed that some small hydrophilic and large hydrophobic peptides are lost in this procedure because they are either not retained on the reversed-phase column or not eluted by the matrix solution, respectively (Larsen et al., 2002; Laugesen and Roepstorff, 2003). The small hydrophilic peptides can be trapped by passing the flow-through from the reversed-phase columns onto a column containing graphite instead of the Poros R2 material, followed again by elution with matrix solution (Larsen et al., 2002).

The graphite columns have also been demonstrated to efficiently retain hydrophilic modified peptides such as phosphorylated and glycosylated peptides and thereby improve the chance to identify these modified peptides (Larsen et al., 2004).

3.2. Ion suppression/Preferential ionization in MALDI

It is found that tryptic peptides having a C-terminal arginine give higher signal intensity than lysine-terminated peptides, most likely owing to the high proton affinity of the guanidinium group in arginine (Krause et al., 1999). Conversion of the lysine residues to the more basic homoarginine by reaction with O-methylisourea prior to analysis enhances the signals for lysine-containing peptides, resulting in increased sequence coverage (Brancia et al., 2000; Hale et al., 2000; Beardsley et al., 2000). A recent study by the group of Krause indicates that the presence of phenylalanine, leucine, and proline in the peptide also seems to enhance the desorption/ionization process, resulting in higher signal intensities in a MALDI spectrum, demonstrating that ionization efficiency is a complex and not fully understood phenomenon (Baumgart et al., 2004).
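When a guanidinated digest is matched against theoretical masses, the mass shift caused by the lysine-to-homoarginine conversion has to be taken into account. The Python sketch below assumes that every lysine side chain is converted and that the N-terminal amine does not react; each conversion adds CH2N2, that is, 42.021798 Da (monoisotopic). The function name is hypothetical.

```python
# Monoisotopic mass added per lysine converted to homoarginine (CH2N2).
GUANIDINATION_SHIFT = 42.021798


def guanidinated_mass(neutral_mass, sequence):
    """Neutral mass after converting every lysine to homoarginine.

    Assumes complete derivatization of lysine side chains only; partial
    reaction or N-terminal modification is not modeled.
    """
    return neutral_mass + GUANIDINATION_SHIFT * sequence.count("K")


if __name__ == "__main__":
    # Hypothetical peptide with two lysines and an assumed mass of 1000 Da.
    print(round(guanidinated_mass(1000.0, "AKLK"), 4))
```

Arginine-terminated peptides without internal lysines keep their original masses, which gives a simple internal check that the shift has been applied to the right signals.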

3.3. Matrix selection

The choice of matrix also influences the sequence coverage. Thus, different sequence coverages can be obtained when the same digest is analyzed with the three commonly used matrices for peptide/protein fingerprinting: 2,5-dihydroxybenzoic acid (DHB), α-cyano-4-hydroxycinnamic acid (CHCA), and sinapinic acid (SA) (see Article 14, Sample preparation for MALDI and electrospray, Volume 5). As a consequence, improved sequence coverage can be obtained by combining the results from analysis with different matrices (Gonnet et al., 2003; Gobom et al., 1999). The use of matrix mixtures has also been reported to increase the sequence coverage compared to the use of a single matrix (Laugesen and Roepstorff, 2003). In addition, these matrix mixtures seem to be more tolerant toward the presence of impurities and thus reduce the need for sample purification.

3.4. Choice of acquisition and ionization method

For a simple protein identification, the MALDI time-of-flight instrument will typically be optimized to yield maximal sensitivity and resolution in the mass range between 700 and 3500 Da, with the highest resolution around 2000 Da. However, by tuning the mass spectrometer (grid voltage and delay time), improved sensitivity and resolution for the detection of larger peptides can be obtained at the cost of the smaller ones. Repeated acquisitions with the instrument optimized for different mass regions, therefore, often provide better sequence coverage.

The choice of ionization method also influences which peptides are observed. Thus, our experience, as well as the experience of Kast et al. (2003), tells us that only between one-third and one-half of the peptides in a mixture, or even less, are observed in common for Electrospray Ionization (ESI) and MALDI MS. By combining the results from the two ionization methods, nano-ESI and MALDI, considerable improvements of the sequence coverage in protein fingerprinting can be obtained.

3.5. Liquid Chromatography Mass Spectrometry, LCMS

The use of a separation step prior to mass spectrometric analysis overcomes many of the above-mentioned shortcomings when analyzing peptide mixtures. The prepurification step is integrated in the LCMS (Liquid Chromatography Mass Spectrometry) procedure. By separating the individual peptides before reaching the mass spectrometer, suppression effects are reduced or nonexistent and, consequently, sequence coverage close to 100% should be obtainable provided that all the peptides are retained on and subsequently eluted from the chromatographic column, which is not always the case (see above). Until recently, LCMS was only available on ESI instruments. However, off-line LC-MALDI-MS has gained increasing interest, and unpublished data from several groups indicate that this method results in even better sequence coverage than LC-ESI-MS.

4. Interactive data handling

From the description above, it is obvious that complete sequence coverage, including observation of peptides present in substoichiometric amounts, is highly dependent on the complexity of the peptide mixture. Consequently, a prerequisite for complete coverage of the protein sequence, including minor variants, is that the proteins of interest are isolated prior to analysis (e.g., by 2D gels), that contaminating proteins such as keratins are eliminated, and that the enzyme used is of high quality with minimal autodigestion. Alternatively, an exclusion list containing all the masses of known contaminating peptides has to be developed before or during the experiments (using either the program PeakErazor (http://welcome.to/GPMAW) or a similar software package) to avoid wasting sample material on identifying peptides belonging to contaminants or peptides already identified.
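The idea of an exclusion list can be sketched in a few lines of Python. This is only an illustration of the principle and not the algorithm used by PeakErazor; the function name, the default tolerance of 50 ppm, and the example masses are assumptions chosen for the sketch.

```python
def filter_contaminants(peaks, exclusion_list, tol_ppm=50.0):
    """Return the peaks that do not match any entry in the exclusion list.

    A peak matches an excluded mass when their relative difference is
    within `tol_ppm` parts per million (tolerance is an assumed default).
    """
    kept = []
    for mz in peaks:
        excluded = any(
            abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in exclusion_list
        )
        if not excluded:
            kept.append(mz)
    return kept


if __name__ == "__main__":
    # Illustrative values only: two masses standing in for trypsin
    # autolysis signals; replace with the list measured for your enzyme.
    exclusion = [842.5094, 2211.1040]
    measured = [842.51, 1045.56, 2211.10, 1479.80]
    print(filter_contaminants(measured, exclusion))
```

Masses of peptides that have already been assigned can be appended to the same list during the experiment, so that later acquisitions concentrate on the signals that remain unexplained.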

As soon as the protein is identified, interactive data handling needs to be performed. By making an in silico digest of the protein after identification, the nondetected peptides can be examined for their chemical/physical properties such as hydrophobicity/hydrophilicity, acidity/basicity, predicted modifications, and m/z values. On the basis of this information, a strategy for the detection of these peptides or their potentially modified forms can be designed using the approaches for improving sequence coverage described above.
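One way to examine the nondetected peptides is to rank them by hydropathy. The Python sketch below uses the Kyte-Doolittle scale and the mean hydropathy (GRAVY) of each peptide; the choice of this particular scale and the function names are assumptions made for illustration, and other properties such as basicity could be added in the same way.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Mean Kyte-Doolittle hydropathy of a peptide."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def rank_missing(theoretical, observed):
    """Nondetected peptides sorted from most hydrophilic to most hydrophobic."""
    seen = set(observed)
    missing = [pep for pep in theoretical if pep not in seen]
    return sorted(missing, key=gravy)


if __name__ == "__main__":
    # Hypothetical peptide lists for illustration.
    predicted = ["IIII", "KKKK", "AAAA"]
    detected = ["AAAA"]
    for pep in rank_missing(predicted, detected):
        print(pep, round(gravy(pep), 2))
```

Peptides at the hydrophilic end of the ranking are candidates for loss in the flow-through of a reversed-phase microcolumn, whereas those at the hydrophobic end may have been retained but not eluted.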

The interactive data handling can be combined with the concept of hypothesis-driven multistage MS introduced by Kalkum et al. (2003) in which they calculate the mass for peptides predicted to be present, but only in trace amounts, for example, possible modified peptides or peptides representing isoforms and splice variants. By performing MS/MS or MS3 in a MALDI-ion-trap after selecting the appropriate mass-to-charge value, they have been able to identify peptides for which the signals in the mass spectra were suppressed or hidden in the noise.
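The mass calculation behind such a hypothesis can be sketched as follows. This is not the software of Kalkum et al.; it is a minimal Python illustration in which the function name and the small modification table are assumptions. It takes the neutral monoisotopic mass of the unmodified peptide, adds the mass shifts of the hypothesized modifications, and returns the m/z value to select for fragmentation.

```python
PROTON = 1.007276  # mass of a proton in Da

# Monoisotopic mass shifts of two common modifications (Da).
MOD_SHIFT = {
    "phospho": 79.96633,  # HPO3
    "acetyl": 42.01057,   # C2H2O
}


def predicted_mz(neutral_mass, charge=1, mods=()):
    """m/z of a peptide carrying the given modifications at `charge`.

    `mods` is a sequence of keys from MOD_SHIFT; repeat a key for
    multiple occurrences of the same modification.
    """
    total = neutral_mass + sum(MOD_SHIFT[m] for m in mods)
    return (total + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical peptide of 1000 Da tested for a single phosphorylation.
    print(round(predicted_mz(1000.0, charge=1, mods=["phospho"]), 4))
```

With singly charged MALDI ions, `charge=1` is the usual case; the parameter is kept so that the same calculation can be reused for multiply charged ESI ions.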

Off-line LC-MALDI, in contrast to on-line LC-ESI, offers the possibility to “freeze” the sample “in time”, which again allows evaluation of the data, use of interactive data handling, and hypothesis-driven multistage MS. However, even without an LC system, a considerably increased sequence coverage can be obtained using interactive data handling combined with the optimization strategies described in this chapter.

5. Conclusion

Here we have presented some approaches that can enhance sequence coverage. The strategy described is time consuming and not easy to automate. Even with this strategy, complete characterization of a protein might not be possible at a proteomics sensitivity level, especially because modified peptides of low stoichiometry might escape detection. The challenge in the future will be to quantify the degree of modification for a specific peptide. This will necessitate strategies for quantitative comparison between two peptides with totally different properties, the modified and the nonmodified peptide. We expect that it will also be possible to solve this difficult task in the future.
