Protein Sequencing (Molecular Biology)

The introduction of protein sequence determination started the era of macromolecular characterization. It established the homogeneity of proteins, permitted analytical automation, initiated the path to increasing speed and sensitivity, and was instrumental also in the interpretation of three-dimensional protein structure analysis and, later, DNA sequence analysis. Both the speed and sensitivity of chemical protein sequence determination have increased enormously. Initially, each analytical step required days or more in time and millimoles in amounts, but now only minutes and picomoles or less are necessary. At the same time, the major chemical method, the Edman Degradation, has stayed more or less unchanged. It now constitutes one of the oldest molecular methods that is still much in use. Recently, however, even this method has been challenged by other approaches, mass spectrometry in particular, but further chemical sequencers are also still being launched. Probably, both chemical and mass spectrometric protein sequence analysis will remain for a considerable time. However, protein sequence determination is not just a question of the degradative analysis; it also involves sample preparation, correlation with DNA sequences, including genome sequencing projects, the screening of databases, NMR and X-ray crystallography, as well as epitope analysis and protein structure predictions, all of which constitute different aspects of integrated, modern molecular biology. In short, sequence determination is part of the era of the proteome that is now emerging at the protein identification stage of the genomic sequencing breakthrough. The various stages of protein analysis using chemical degradation and mass spectrometry are presented here, together with the current stage and perspectives of future methodology.


1. Development of sequence determination

Sequence determination of proteins has been possible for about 50 years (1-5). During the second half of the twentieth century, development has been rapid, bringing structural analysis from something requiring the effort of a whole department, a small protein, large amounts, several years of work, and still some uncertainty, to something requiring just a single person, any size of protein, minute amounts, a limited time, and great reliability. This progress has been paralleled by methodological and instrumental developments in essentially nine steps (Table 1).

Table 1. Important Steps in the Development of Methods and Instruments in Protein Sequence Analysis

1. The first analysis and recognition of defined structures (1940s to 1950s)

2. The Edman reaction (~1950)

3. Manual methods: "Dansyl-Edman" and later follow-ups (DABITC) (from ~1960 on)

4. Automation: the protein sequenator or sequencer (1967 on)

5. Development of HPLC (~1977)

6. Polybrene, membranes, valves, conversion, on-line HPLC: second-generation sequencers (end of 1970s)

7. Mass spectrometry (1970s)

8. Third-generation sequencers, including C-terminal degradation (end of 1990s)

9. Electrospray, nanospray, present-generation "Protein mass spectrometers" (end of 1990s)

Each one of these steps has meant a corresponding leap in analytical speed, reproducibility, sensitivity, and ease of analysis. The successive development is clearly traceable in the scientific literature, and in this case especially in the Proceedings volumes of the series of the special methods conferences, MPSA (for Methods in Protein Sequence Analysis, 1975-1992, and Methods in Protein Structure Analysis from 1994 to the present) that originated from still earlier meetings that roughly coincided with the introduction of automatic sequencers (6, 7). MPSA continued for a long time as the leading methodological meeting, but other conferences also covering the subject now compete in progress reports, in particular the Protein Society (which started in 1986) and its journal, Protein Science (started in 1992), and the ABRF (Association of Biomolecular Resource Facilities, from 1988; The Journal of Biomolecular Techniques, electronic since 1997). ABRF and MPSA still concentrate on the methodological developments, with ABRF being the larger source. Details of the MPSA meetings and their proceedings are given in Table 2. In retrospect, most of the major advances can be followed in these proceedings.

Table 2. MPSA Proceedings

MPSA Number, Year     Proceedings Volume: Publisher (Editor), Year of Publication

I, 1975               Pierce (Laursen), 1975
II, 1977              Elsevier (Previero and Coletti-Previero), 1977
III, 1979             Elsevier (Birr), 1980
IV, 1981              Humana (Elzinga), 1982
V, 1984               Not published (Walker)
VI, 1986              Humana (Walsh), 1987
VII, 1988             Springer (Wittmann-Liebold), 1989
VIII, 1990            Birkhauser (Jornvall et al.), 1991
IX, 1992              Plenum (Imahori and Sakiyama), 1993
X, 1994               Plenum (Atassi and Appella), 1995
XI, 1996              J. Prot. Chem. 16 (van der Rest and Vandekerckhove), 1997
XII, 1998             To be held (Kyriakidis and Choli-Papadopoulou)

2. Methodology and instrumentation

2.1. Initial Analysis of a Protein Sequence and Definition of the Concept

During the end of the 19th century and early parts of the 20th century, proteins were gradually purified, giving insight into the fact that they constitute defined molecules of exact structure. During the same time, the constituent amino acids were characterized, culminating with threonine in the mid-1930s (8). By the use of many methods, including partial acid hydrolysis, and reaction with a labeling reagent that is stable during hydrolysis and specific for the protein amino group, fluorodinitrobenzene (FDNB) (9), the first protein primary structure, that of insulin, was determined in the late 1940s and early 1950s (1, 2), for which Frederick Sanger received the 1958 Nobel Prize in Chemistry. With this sequence determination, proteins were recognized as defined molecules, marking the start of the modern era of structural analysis.

2.2. Edman Degradation and Protein Sequencers

The Edman degradation and its use in automatic protein sequencers is described in the entry Edman Degradation.

2.3. Mass Spectrometry

The nonvolatility of peptides was the prime limitation to the use of mass spectrometry (MS) with proteins. Several advances have recently changed this. One especially important step was the introduction of triple quadrupole instruments (10), essentially tandem mass spectrometers (MS/MS). Here a first quadrupole, the first "mass spectrometer" or MS1, performs peptide mass selection; a second quadrupole acts as a collision gas cell where peptide fragmentation can occur; and the third quadrupole, the second mass spectrometer or MS2, analyzes by mass the peptide fragments generated in the collision cell, thus allowing peptide sequence analysis by collision-induced dissociation or collision-activated dissociation mass spectrometry (11).
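The way MS2 fragment masses encode sequence can be illustrated with a minimal sketch. It assumes the common b/y fragment-ion nomenclature and singly charged ions, neither of which is specified in the text above, and uses standard monoisotopic residue masses; the function name fragment_ions and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values for the 20 common amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the intact peptide and to y ions
PROTON = 1.00728   # charge carrier for singly protonated ions


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for b1..b(n-1) and y1..y(n-1).

    Assumption: fragmentation happens only at peptide bonds, giving
    N-terminal (b) and C-terminal (y) fragments.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    b_ions = []
    running = 0.0
    for m in masses[:-1]:            # N-terminal fragments, growing from residue 1
        running += m
        b_ions.append(running + PROTON)

    y_ions = []
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments, growing from the last residue
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


# Hypothetical example peptide, used only for illustration.
b, y = fragment_ions("GASK")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Successive b ions (or successive y ions) differ by exactly one residue mass, which is what allows the sequence to be read from the MS2 spectrum. Leucine and isoleucine have identical masses and cannot be distinguished this way.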

Another crucial step in the MS sequencing approach was the introduction of novel ion-producing techniques, to make MS accessible to proteins in general (see Mass Spectrometry). Desorption methods of ionization such as fast atom bombardment (12), plasma desorption (13), and matrix-assisted laser desorption ionization (MALDI) (14), made it possible to transfer peptides and proteins into the gas phase, making them available to mass spectrometry. MALDI, coupled to time-of-flight (TOF) instruments (14), made mass spectrometers easy to handle and brought the technique within the economic means of many protein laboratories. Hence, MALDI-TOF instruments are now used in many laboratories, but mostly for peptide mass measurements, for peptide identification and screening, rather than sequence analysis. With further developments, however, such as the introduction of reflectrons (15), delayed extraction (16), and post-source decay (17), these instruments can also give some sequence information via peptide cleavages. However, more important for sequence analysis, and the real breakthrough via MS/MS instruments, was the introduction of electrospray ionization (ESI) (18). This opened the way to sample introduction via solvents, and hence to on-line applications, with the MS analysis immediately following a chromatography separation. Similarly, further mass analysis systems, including ion traps (19), made additional progress in some MS/MS techniques. Finally, full-scale computerization and on-line databank screenings allowed further speed and interpretations (20). It is possible that MS may become the method that eventually replaces the Edman method.

2.4. Present-Generation Chemical Sequencers

Recent chemical developments of automatic protein sequencers have involved primarily further miniaturization. The newest chemical sequencers now have phenylthiohydantoin (PTH) detection columns in the sub-millimeter diameter range, UV detectors with volumes in the nanoliter range, application possibilities in the sub-picomole range, and overall sensitivities at that level. An important complement has also been developed on the side of sample preparation, with instruments now blotting proteins and peptides in nanoliter volumes from chromatography separations directly onto blotting matrices for subsequent sequencer analysis (21).

2.5. C-Terminal Sequencing

Other chemical advancements have involved development of sequence determination from the C-terminus. The principle has long been known, using isothiocyanate degradation (22). The conditions were harsh, however, using strong acids, acetic anhydride, and repeated activation of the C-terminal carboxyl group in each step. Consequently, the yields were low, side-reactions plentiful, and erroneous peptide bond cleavages frequent. In addition, the secondary amine of proline residues could not react at all, and yields for carboxyl and hydroxyl residues were especially poor. Hence, degradative methods from the C-terminus have not been used much, and commercial C-terminal sequencers did not exist, in spite of all the progress with N-terminal degradation (see Edman Degradation). Instead, C-terminal analysis long relied on hydrazinolysis (now outdated) (23), reductive methods to get the corresponding alcohol (also outdated) (24), and, in particular, enzymatic approaches, utilizing carboxypeptidases (25). Initially, the available carboxypeptidases had too strict substrate specificities to be useful for protein analysis in general. Lately, however, carboxypeptidases have appeared with wide specificities, and hence good applicability to proteins in general (26-28). In conjunction with mass spectrometric analysis in MALDI-TOF instruments, this has opened a new route to small-scale C-terminal analysis in the "ladder sequencing mode" (see below) (29, 30).
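The idea behind reading a C-terminal "ladder" can be sketched as follows. This is a minimal illustration, assuming that carboxypeptidase digestion yields a series of peptides each one residue shorter than the last, and that their masses have been measured accurately enough to match the differences against standard monoisotopic residue masses. The function name read_ladder, the tolerance value, and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values for the 20 common amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def read_ladder(masses, tol=0.02):
    """Read residues from a ladder of truncated-peptide masses.

    masses: measured masses (Da) of the intact peptide and its truncated forms.
    tol:    mass tolerance in Da (an assumed value; real instruments differ).
    Returns residue calls in order of removal, i.e. C-terminal residue first.
    Ambiguous calls are joined with "/", unmatched differences become "?".
    """
    ordered = sorted(masses, reverse=True)           # heaviest (intact) first
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter                    # mass of the residue removed
        matches = sorted(
            aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol
        )
        calls.append("/".join(matches) if matches else "?")
    return calls


# Hypothetical peptide GASFK losing K, then F, then S from its C-terminus.
intact = sum(RESIDUE_MASS[aa] for aa in "GASFK") + WATER
ladder = [intact]
for aa in "KFS":
    ladder.append(ladder[-1] - RESIDUE_MASS[aa])

print(read_ladder(ladder))
```

With a wider tolerance, lysine and glutamine (which differ by about 0.036 Da) become indistinguishable, and leucine and isoleucine are always reported together because their masses are identical.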

Progress has also been substantial in chemical aspects of C-terminal degradations of proteins. Several adjustments allowed the method to work for a few cycles in most cases (31). Recently, further modifications allow, in one step, simultaneous cleavage of the C-terminal residue and activation of the next, by the use of an additional chemical modification involving S-alkylation of the thiohydantoin. This has now made it possible to degrade several proteins in reasonable yield, and in several cases to follow the C-terminal sequence for up to 10 cycles (32). This new approach has just become available in commercial instruments. Although C-terminal degradation is still less sensitive than N-terminal degradation and less reliable, does not reach as far, has difficulties with some residues (carboxylic and hydroxylic), and is still not feasible at all with proline residues, C-terminal analysis is now becoming practical. Just a few cycles of C-terminal sequence information are sometimes sufficient, especially for identification of the correct recombinant proteins and corresponding gene constructs; for this purpose, there are now both C- and N-terminal instruments.

In conclusion, present-day chemical sequencers have reached the sub-picomole range for N-terminal analyses, and degradations from both ends on a larger scale. The chemistry for C-terminal degradations has started to develop. For N-terminal degradations, miniaturization has progressed substantially, with both sensitivity and speed increased by orders of magnitude in ~50 years, using the same basic chemistry of Edman degradation.

2.6. Present-Generation "Protein Mass Spectrometers"

Substantial progress has also been made in mass spectrometry. Electrospray ionization has been miniaturized. The use of small capillaries in the "microspray" mode (33) or "nanospray" mode (34) transfers ions into the gas phase more efficiently and increases the sensitivity, as do the infusion of small volumes at low flow rates (10 nL/min) and signal averaging (33, 34).

Similarly, introduction of ion traps has allowed the more efficient collection of ions, increasing the sensitivity. Furthermore, use of TOF mass analyzers as the second MS of MS/MS instruments (35) has increased both the speed and the accuracy (mass accuracy of 0.1 Da and sensitivity at the attomole scale) to new levels in instruments just released. Such instruments for sequence analysis, coordinated with MALDI-TOF instruments for peptide mass determination and screening, are easy to run and require less work than ordinary mass spectrometers. Together, they bring MS to a stage beyond that of chemical protein sequencers, and may one day take over the whole sequencing market. For the moment, TOF combination instruments on the mass spectrometry side, plus the current chemical sequencers, have brought protein sequence determination to a new stage of perfection, speed, and sensitivity.

3. Perspectives and further methodology

Three levels of future protein sequencing may be predicted: "conventional" approaches, tissue characterizations, and further correlations.

3.1. "Conventional" Approaches, with Protein Purification and a Column End-Step

For this approach, MS/MS mass spectrometers are now in routine use at the femtomole scale, and miniaturized modern chemical sequencers work in the sub- or low picomole range. Regarding mass spectrometry, the nanospray approach to electrospray ionization has made it routine to analyze proteins in solution. Similar approaches with chemical sequencers and with sample preparation using capillary zone electrophoresis (36) or micro-HPLC (18) mean that sample preparation will continue to make progress along with both MS and chemical sequencer instrumentation. This combination makes it possible to analyze virtually any protein prepared by column chromatography. This development is expected to continue, with further miniaturization, plus on-line shortcuts. Regarding sensitivity, however, we may already have passed half or even more of the major leaps in this "conventional" mode of analysis. Half a century of progress has seen a sensitivity increase of about 10⁸, down to about 10⁻¹⁴ moles, which is about half-way down through the orders of magnitude to one molecule (the inverse of Avogadro’s number, 0.16 × 10⁻²³ moles). In other words, we might expect to reach the ultimate one-molecule sensitivity level before or just after another similarly large leap. Also, in some special approaches of protein detection and analysis, using fluorescence correlation spectroscopy, science can already approach the one-molecule level of analysis (37). Although much still remains to make it all routine, to increase speed, and to lower the cost, it is possible that something like half of the major leaps have already been seen in the "conventional" approach of sequence analysis via column preparations of individually purified proteins.
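The "about half-way" estimate can be checked with a short calculation. This sketch takes the figures in the paragraph above as a 10⁸-fold sensitivity gain down to a present limit of about 10⁻¹⁴ moles, and uses the standard value of Avogadro's number; the variable names are illustrative only.

```python
import math

AVOGADRO = 6.022e23                  # molecules per mole

gain_so_far = 1e8                    # sensitivity increase over ~50 years (from the text)
current_limit_mol = 1e-14            # approximate present detection limit (from the text)
one_molecule_mol = 1 / AVOGADRO      # one molecule expressed in moles, ~1.66e-24

# Orders of magnitude already covered, and still separating us from one molecule.
orders_done = math.log10(gain_so_far)
orders_remaining = math.log10(current_limit_mol / one_molecule_mol)
fraction_done = orders_done / (orders_done + orders_remaining)

print(f"one molecule      = {one_molecule_mol:.2e} mol")
print(f"orders covered    = {orders_done:.1f}")
print(f"orders remaining  = {orders_remaining:.1f}")
print(f"fraction of path  = {fraction_done:.2f}")
```

On these assumptions, roughly eight orders of magnitude have been covered and a little under ten remain, which is consistent with the statement that one more leap of similar size would bring the method close to single-molecule sensitivity.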

3.2. Tissue Characterizations Using Two-Dimensional Gel Separations for Sample Preparation

This is one of two other types of analysis that offer great possibilities. With proper care, two-dimensional gel electrophoresis can now separate thousands of proteins into distinct positions (38). The patterns obtained can be stored, compared, and analyzed for both relative and absolute changes. In this manner, all major proteins of any tissue can be rapidly screened. Comparisons between patterns from normal and diseased organs allow direct detection of tumor markers, signal proteins, and other special forms. Similarly, comparison of gene expression in different tissues, and at different ages, allows conclusions to be drawn about developmental and differentiation patterns. In all cases, the corresponding protein spots, including all those that can be detected by ordinary or silver-staining methods (see Silver Stain), can be recovered and identified by sequence analysis.

The present procedure involves recovery of the corresponding gel piece, proper washing to allow subsequent analysis, drying, addition and penetration of a proteinase (usually trypsin), digestion of the protein in the gel (39), and recovery of the fragments, most conveniently via subsequent mass spectrometry. The MS can be performed simply by accurate mass determination of all the peptide fragments obtained from the action of the proteinase; this is usually sufficient to identify a known protein from a sequence database. If this tryptic fragment mass analysis is not sufficient for identification, the MS/MS instrument can simply be set instead to analyze directly for sequence (via use of the collision cell) all those fragments that are not identified by their masses alone. Notably, this entire analysis is theoretically possible from just one spot recovered from a two-dimensional gel. Several such approaches are listed in the latest MPSA proceedings volumes (e.g., 40, 41), and are within reach of most protein analysis centers (42). It is to be expected that all major proteins in most tissues will soon have been identified as to their nature and function.
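Identification of a protein from its tryptic fragment masses can be sketched as below. This is a simplified illustration rather than any particular search program: it assumes complete cleavage after lysine or arginine except before proline (the usual simplified trypsin rule), unmodified peptides, standard monoisotopic residue masses, and a fixed mass tolerance. The function names and the two-entry toy "database" are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values for the 20 common amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):        # C-terminal peptide without K/R at its end
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify(observed, database, tol=0.01):
    """Score each database entry by how many observed masses it explains."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - t) <= tol for t in theoretical) for obs in observed
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy database with made-up sequences, for illustration only.
database = {"toyA": "GASKPLRFMK", "toyB": "AAVKLLER"}
observed = [peptide_mass(p) for p in tryptic_digest(database["toyA"])]
print(identify(observed, database))
```

Real searches also have to handle missed cleavages, modifications, and measurement error, which is why fragments left unidentified by mass alone are passed on to MS/MS sequencing as described above.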

3.3. Further Correlations of Databank Technology, Protein Analyses, and Separation Modes

The great variety of post-translational modifications can also be identified through protein sequencing methods. Two-dimensional gel separations are also excellent for separating multiple forms of the same protein, whether the multiplicity derives from additions/deletions (size differences, noticeable in one of the two directions) or from changes in charge (the other direction). It should also be noted that mass spectrometry can detect noncovalent associations as well, simply by using appropriate energy levels in the ionization step.

4. Summary

In short, much of the proteome era to come will use the separation, analysis, and computer methods now available to characterize all cellular proteins functionally through sequence analysis. Because of the large equipment involved, much of this era may be expected to occur in "protein analysis centers" rather than in individual laboratories or groups. Still, continuous miniaturization and small-scale approaches like the "ladder" modes mentioned above (28, 29) may also keep individual laboratories and groups genuinely in-scale and fully contributing.
