Protein Folding In Vitro (Molecular Biology)

Proteins are biologically active only after they have adopted their native, three-dimensional folded conformations (see Protein Structure). Yet the genetic information used in protein biosynthesis specifies only the linear sequence of amino acid residues, the primary structure. Natural proteins can often be denatured, or unfolded, and then renatured, or refolded, to the original conformation. When they cannot, it is usually because the unfolded protein has precipitated, aggregated, or been subjected to covalent modification. Therefore, the information for the secondary structure and the tertiary structure resides in the primary structure. Consequently, it should be possible to predict the three-dimensional structure of a protein from just its primary structure, if the process of protein folding were understood. Moreover, many proteins produced for pharmaceutical or industrial uses are generated initially in insoluble, unfolded, and inactive forms in inclusion bodies, and they must be folded before they can be used.

Protein folding is simply a conformational change, an isomerization (unless disulfide bonds are formed), but its unusual aspect is the enormous number of conformations that an unfolded protein can adopt. If each amino acid residue adopts an average of j conformations, a polypeptide chain with (N + 1) residues (N peptide units between them to define the conformation) could adopt up to $j^{N}$ different conformations. The value of j is believed to be approximately 8, so a relatively small polypeptide chain of 100 amino acid residues should be able to sample some $10^{89}$ different conformations. If the rate constant for an unfolded conformation to change is k, the average time to sample all of these conformations is given by


$$t = \frac{j^{N}}{k}$$

Unfolded conformations cannot change more rapidly than $10^{13}$ times per second (and probably do so some $10^{9}$ to $10^{10}$ times per second), so it would require, on average, more than $10^{66}$ years to sample $10^{89}$ conformations. Even if there were only two conformations possible per residue, there would still be $10^{30}$ conformations for N = 100, and $10^{9}$ years would be required for random searching. Nevertheless, many proteins refold in vitro within seconds or minutes, some within a millisecond. Clearly, protein folding does not occur by a random searching of all possible conformations to find the unique native conformation, and there are likely to be pathways of folding.
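The arithmetic behind this estimate can be checked with a short sketch. It assumes that the search time is simply the number of conformations divided by the rate of conformational change, and it works in logarithms to keep the numbers manageable. The function name, the rate of 10^13 per second, and the length of a year are illustrative choices rather than values taken from a specific experiment.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # approximate length of one year in seconds


def log10_search_years(j, n_units, rate_per_s):
    """Return log10 of the years needed to visit j**n_units conformations.

    Assumes conformations are visited one at a time at rate_per_s.
    """
    log10_conformations = n_units * math.log10(j)
    log10_seconds = log10_conformations - math.log10(rate_per_s)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


# 100 residues -> 99 peptide units, 8 conformations each, fastest plausible rate
print(round(log10_search_years(8, 99, 1e13), 1))
# Only 2 conformations per unit, N = 100
print(round(log10_search_years(2, 100, 1e13), 1))
```

Under these assumptions the first case lies far beyond any biological timescale, and even the two-conformation case is on the order of billions of years.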

A further complication is that unfolded proteins under denaturing conditions (e.g., 8 M urea or 6 M guanidinium chloride (GdmCl)) approximate random coils, so that each of the $10^{15}$ to $10^{18}$ molecules in a typical experimental sample would be expected to have a different conformation at each instant (and a slightly different one some $10^{-10}$ seconds later). Therefore, each molecule is initiating folding from a different starting point.

1. Refolding of Small, Single-Domain Proteins

The native conformational states of proteins may often be unfolded reversibly by adding denaturants, increasing or decreasing the temperature, varying the pH, applying high pressures, or cleaving disulfide bonds (see Denaturation, Protein). At equilibrium, the unfolding transitions of single-domain proteins are usually two-state, and only the fully folded, native (N) and unfolded (U) states are populated. In this case, unfolding of the native conformation is cooperative, and partly folded molecules are unstable relative to the U or N states under all conditions. The most common exception is the molten globule conformation, which predominates under intermediate denaturing conditions with some proteins.

Experimental in vitro studies of protein refolding kinetics generally start with the fully unfolded protein under unfolding conditions and, to initiate refolding, change the conditions abruptly to favor the folded state. Then changes in the conformational properties of the protein molecule population are monitored as a function of time. Studies of a number of small, model proteins have given the following general picture of how proteins fold (although there are always exceptions, and there is no generally accepted view of the protein folding mechanism).

Generally, the fully folded state N begins to appear immediately without a detectable lag period and at a rate depending on both the identity of the protein and the refolding conditions. The rate is independent of the unfolding conditions used. The more physiological the refolding conditions, the greater the rate of folding (Fig. 1). Unfolding is observed by the reverse process of abruptly changing conditions from folding to those favoring unfolding, for example, by rapidly adding a denaturant or by changing the pH or temperature. The more denaturing the conditions, the more rapid the rate of unfolding.

In both unfolding and refolding, all of the molecules have the same probability of undergoing the transition, and a single rate constant is observed for the population, unless the molecules differ covalently or conformationally in a way that is only slowly interconverted. The most apparent instances of the latter are when there are both cis and trans isomers of peptide bonds (see Cis/Trans Isomerization). The trans form is intrinsically more stable, and the cis form is usually not present substantially, except when the bond precedes a proline residue. A cis peptide bond can occur in a folded protein, in which case that bond is usually cis in all of the molecules. In the unfolded state, however, there is an equilibrium mixture of both isomers. Consequently, some unfolded molecules have one or more incorrect isomers. Such isomerization is intrinsically slow, and the folding of these molecules is slowed or prevented by the incorrect isomer. The isomerization can be rate-limiting for their folding. The following discussion is limited to those small model proteins that show no such intrinsically slow process in folding.

Figure 1. Typical temperature dependence of the rates and equilibria of protein folding transitions not involving intrinsically slow isomerizations. The natural logarithms of the rate constants for unfolding and refolding are plotted as a function of (temperature)$^{-1}$ in an Arrhenius plot. A similar plot of the equilibrium constant $K_{eq}$ between the folded (N) and unfolded (U) states is a van't Hoff plot. The curvature of the van't Hoff plot results from the greater apparent heat capacity of U than N. The linear Arrhenius plot for the rate of unfolding indicates that the folding transition state has the same heat capacity as N. The greater heat capacity of U is reflected entirely in the curvature of the Arrhenius plot for the rate of refolding, because the equilibrium constant is the ratio of the two rate constants. The data used to construct this diagram are for hen egg-white lysozyme at pH 3 (2, 3), extrapolated to the absence of GdmCl. Although the two rate constants are necessarily equal at the temperature where $K_{eq}$ = 1, it is a coincidence that the rate constants also have the value 1 s$^{-1}$ at this temperature, so that all three curves intersect at a common point.


The ratio of the rate constants for unfolding and refolding generally agrees with the equilibrium constant for the transition, so microscopic reversibility applies. There appears to be a classical transition state for the unfolding/refolding transition. That all of the many conformationally diverse molecules of a population of unfolded protein refold with the same rate constant, independent of how the protein was unfolded, indicates that all of the molecules must equilibrate rapidly and reversibly before undergoing the same rate-limiting step.
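The link between the two rate constants and the equilibrium constant can be made concrete with a minimal two-state sketch. It assumes a single reversible step between U and N, defines the equilibrium constant as [N]/[U], and uses hypothetical rate constants; none of these numbers come from the lysozyme data.

```python
import math


def two_state_relaxation(k_fold, k_unfold, frac_native_0, t):
    """Fraction of native molecules at time t for a reversible U <-> N step."""
    k_obs = k_fold + k_unfold                 # single observed rate constant
    frac_native_eq = k_fold / k_obs           # equilibrium fraction of N
    return frac_native_eq + (frac_native_0 - frac_native_eq) * math.exp(-k_obs * t)


k_fold, k_unfold = 10.0, 0.1                  # hypothetical values, s^-1
K_eq = k_fold / k_unfold                      # [N]/[U] at equilibrium (assumed definition)
for t in (0.0, 0.1, 1.0, 10.0):
    print(t, round(two_state_relaxation(k_fold, k_unfold, 0.0, t), 4))
```

In this model the population relaxes with one exponential whose rate constant is the sum of the folding and unfolding rate constants, and the final fraction of N is fixed by their ratio.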

Upon transferring the unfolded protein to refolding conditions, some proteins adopt partly folded or molten globule conformations very rapidly, more rapidly than the native conformation appears. Such partly folded species are often considered responsible for the rapidity of folding, and much effort has gone into characterizing them. This is difficult because they are populated only transiently and are converted to N. These intermediate species, however, are usually in rapid equilibrium with the unfolded state, and it is generally not possible to distinguish between the two possible kinetic models, in which they are either on- or off-pathway intermediates I:

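$$\text{on-pathway: } U \rightleftharpoons I \rightarrow N \qquad\qquad \text{off-pathway: } I \rightleftharpoons U \rightarrow N$$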
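The difficulty in telling the two models apart can be illustrated numerically. The sketch below assumes that U and I equilibrate much faster than N is formed, with a hypothetical pre-equilibrium constant K_ui = [I]/[U]. All function names and values are illustrative.

```python
def apparent_rate_on_pathway(k_i_to_n, K_ui):
    """U <-> I -> N: only the fraction present as I can proceed to N."""
    return k_i_to_n * K_ui / (1.0 + K_ui)


def apparent_rate_off_pathway(k_u_to_n, K_ui):
    """I <-> U -> N: only the fraction remaining as U can proceed to N."""
    return k_u_to_n / (1.0 + K_ui)


K_ui = 4.0            # hypothetical [I]/[U] in the rapid pre-equilibrium
k_observed = 2.0      # hypothetical observed rate of N formation, s^-1

# Microscopic rate constants that each model needs to reproduce k_observed
k_on = k_observed * (1.0 + K_ui) / K_ui
k_off = k_observed * (1.0 + K_ui)

print(apparent_rate_on_pathway(k_on, K_ui), apparent_rate_off_pathway(k_off, K_ui))
```

Both models reproduce the same observed rate constant with a suitable choice of the unobservable microscopic step, so measuring the appearance of N alone cannot discriminate between them.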

Many proteins do not adopt such partly folded species but remain unfolded until converting to N in an apparently all-or-nothing transition. These proteins also refold more rapidly than those that adopt partly folded conformations, although they are also the smaller proteins. Therefore, the presence of stable, partly folded intermediates is not necessary for rapid protein folding.

Such partly folded species are generally not detected as intermediates during unfolding, which is almost always an all-or-nothing transition, even with proteins that adopt partly folded intermediates in refolding. Some exceptions have, however, been reported (1).

The transition state for the folding transition is characterized indirectly by measuring the rate of unfolding or refolding as a function of changing the conditions (Fig. 1) or by altering the covalent structure of the protein. Plots of the logarithm of the rate constant versus the denaturant concentration are generally closely linear, which suggests that the nature of the transition state and the pathway remains constant. The particular data of Figure 1 indicate that the transition state in that case has the same heat capacity as the native state but somewhat lower enthalpy (estimated by the slopes of the Arrhenius plots). Similar observations are found with other proteins, usually by varying the denaturant concentration rather than the temperature. Studies using protein engineering to alter the structure of the protein systematically indicate that many, if not all, of the stabilizing interactions of the native state have been disrupted in the transition state. The transition state is close to the final conformation structurally but lacks the cooperativity that gives net stability to the fully folded conformation.
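The linear dependence of the logarithm of each rate constant on denaturant concentration can be sketched as follows. The parameter names (kf_water, ku_water, m_f, m_u) and their values are hypothetical, chosen only to show the shape of the relationship.

```python
import math


def rate_constants(denaturant_molar, kf_water, ku_water, m_f, m_u):
    """Return (k_fold, k_unfold) assuming ln k varies linearly with denaturant."""
    k_fold = kf_water * math.exp(-m_f * denaturant_molar)    # refolding slows
    k_unfold = ku_water * math.exp(m_u * denaturant_molar)   # unfolding speeds up
    return k_fold, k_unfold


# Hypothetical parameters: rate constants in s^-1, slopes in M^-1
params = dict(kf_water=100.0, ku_water=1e-4, m_f=2.0, m_u=1.0)

for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
    k_f, k_u = rate_constants(conc, **params)
    # ln of the observed relaxation rate, and ln of the ratio k_fold/k_unfold
    print(conc, round(math.log(k_f + k_u), 2), round(math.log(k_f / k_u), 2))
```

Plotting the logarithm of the observed rate constant against denaturant concentration in such a model gives two nearly linear limbs that meet near the transition midpoint, where the two rate constants are equal.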

Ligands that bind tightly to the native protein generally do not increase the refolding rate of small proteins. Instead, they usually decrease the unfolding rate, and this is why they increase the stability of the folded conformation. These observations indicate that the transition state does not bind the ligand tightly.
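A simple linkage model shows how this works quantitatively. The sketch assumes that the ligand binds only to N, that binding is in rapid equilibrium, and that neither U nor the transition state binds the ligand. The dissociation constant, ligand concentration, and rate constant are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def ligand_effect(k_unfold, ligand_molar, k_d, temperature_k=298.15):
    """Return (apparent unfolding rate, added stability in kJ/mol).

    Assumes the ligand binds only N, in rapid equilibrium, and that
    neither U nor the transition state binds it.
    """
    binding_factor = 1.0 + ligand_molar / k_d
    k_unfold_app = k_unfold / binding_factor
    extra_stability = R * temperature_k * math.log(binding_factor) / 1000.0
    return k_unfold_app, extra_stability


print(ligand_effect(k_unfold=1e-3, ligand_molar=1e-3, k_d=1e-6))
```

In this model the refolding rate is unchanged, and the whole of the added stability appears as a slower unfolding rate.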

Folded proteins are often cleaved by proteinases at specific sites on their surfaces, and the folded conformation is maintained. Such cleaved proteins can be unfolded and dissociated into two or three polypeptide fragments. Very often these fragments recombine and regenerate the folded conformation.

The general kinetic scheme suggested by the experimental observations for folding a single protein domain is illustrated in Figure 2.

Figure 2. A general kinetic model for protein folding indicated by the experimental observations, in the absence of intrinsically slow conformational isomerizations. $U_i$ are various unfolded molecules with different conformations at the start of folding, $I_i$ are partially folded molecules, and N is the fully folded protein. All kinetic steps indicated by arrows are rapid, except for that labeled "slow." "‡" indicates the occurrence of the overall transition state. All steps are reversible, except for that indicated with a single-headed arrow, which occurs only in the indicated direction under conditions strongly favoring the folded state. As indicated, all of the unfolded molecules rapidly equilibrate under refolding conditions with a few partly folded species, which are also in rapid equilibrium. All of the molecules pass through a common slow step, which involves going through a transition state that is a distorted form of the native-like conformation. The intermediates I might be stable or unstable and therefore populated transiently or not, but all intermediates that occur after the rate-determining step are very unstable relative to N.


2. Kinetic Determination of Folding

If proteins cannot fold randomly and nonrandom folding pathways are crucial, the resulting folded state may not be the most stable conformation possible, but could be instead the form most kinetically accessible. If a kinetic pathway of folding is so vital, it should be possible to block folding by interfering with that pathway, and a protein might fold normally, solely for kinetic reasons, to a metastable state that is not the most stable thermodynamically. Examples are known, but only relatively few.

A number of bacterial proteinases, such as subtilisin and α-lytic protease, are synthesized as inactive precursors that have amino-terminal pro segments, which are subsequently removed proteolytically to generate the active, native proteinase. These proproteinases unfold and refold in vitro, but the mature forms do not refold. They refold only when the pro segment is added. The negligible rate of refolding of the mature protein, when the native protein is very stable, indicates a kinetic block to folding that is alleviated by the pro segment.

Within the serpin proteinase inhibitor family, plasminogen activator inhibitor-1 is synthesized in a form that is active as an inhibitor but relatively unstable. The active form slowly converts to a more stable form that is inactive as an inhibitor and is known as the latent form. If this latent form is unfolded and then refolded, the active metastable form is regenerated before again undergoing the slow conversion to the inactive but stable form. Therefore, folding does not produce the most stable folded conformation directly but only through a metastable intermediate. The difference between the two forms involves a large change in a β-sheet of the protein.

2.1. Folding Coupled to Disulfide Formation

Many proteins that contain disulfide bonds between the thiol groups of cysteine residues become unfolded if these bonds are reduced, even in the absence of denaturants. The reduced protein remains unfolded even under physiological conditions. If the disulfides are permitted to form again, the protein can regenerate the native disulfide bonds and conformation. Then folding is coupled to disulfide formation. The great advantages of this are that disulfide bond formation and breakage can be controlled experimentally (using thiol-disulfide exchange with a disulfide reagent) and that any disulfide bonds in a protein can be trapped in a form that can be stable indefinitely. Cysteine residues can also be replaced or their thiol groups blocked irreversibly to decrease the number of disulfide possibilities, and the effect on the folding process can be used to dissect the pathway. Therefore, all the disulfide intermediates can be identified and characterized, and their roles in the folding process can be determined, usually unambiguously.

The best characterized disulfide folding pathway is that of bovine pancreatic trypsin inhibitor, BPTI (Fig. 3). Reduced BPTI is a very unfolded polypeptide chain, approximating a random coil, but with weak local interactions between residues close in the primary structure. Consequently, the initial formation of disulfide bonds is approximately random (after correcting for any differences in thiol group reactivity). The one-disulfide intermediates are not random, however, because the one with the Cys30-Cys51 disulfide bond adopts a stable, partly folded conformation; this restricts which disulfide bonds can be formed subsequently. There is a kinetic block in forming either the 30-51 or 5-55 disulfide bond if the other is already present, a step that would generate the native-like (30-51, 5-55) intermediate. Instead, this intermediate is normally formed most readily by intramolecular disulfide rearrangements of two intermediates with nonnative second disulfide bonds (Fig. 3).
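To give a sense of how many disulfide species are possible in principle, the following sketch counts the ways of pairing cysteine residues. It takes six cysteines, the number needed for three native disulfide bonds, as the working assumption; the function name is illustrative.

```python
from math import factorial


def disulfide_species(n_cys, n_bonds):
    """Number of ways to pair 2*n_bonds of n_cys cysteines into disulfides."""
    if 2 * n_bonds > n_cys:
        return 0
    return factorial(n_cys) // (
        factorial(n_cys - 2 * n_bonds) * factorial(n_bonds) * 2 ** n_bonds
    )


# Six cysteines, as implied by three native disulfide bonds
for bonds in range(4):
    print(bonds, disulfide_species(6, bonds))
```

With six cysteines there are 15 possible one-disulfide species, 45 two-disulfide species, and 15 three-disulfide species, only one of which is native. This is why trapping and identifying the intermediates that actually accumulate is so informative.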

Figure 3. The productive disulfide folding pathway of BPTI. R is the fully reduced protein. Intermediates are indicated b[...] paired. The major disulfide intermediates are depicted, and their conformations determined by NMR analysis are indicate[...] backbone are unfolded or very flexible. The relative rates of the intramolecular step in forming each disulfide bond are i[...] appropriate arrowhead. The wider the arrowhead, the greater the rate in that direction. The brackets indicate that the one-[...] equilibrium. The "+" between two species indicates that they have the same kinetic roles.


The disulfide pathways elucidated to varying extents with a few other small proteins show similar properties but have variations. Initial disulfide formation in the reduced protein is approximately random until a stable folded conformation is adopted, which can either favor or disfavor formation of further disulfide bonds. Some proteins, such as BPTI (Fig. 3), adopt partly folded conformations. Some, such as α-lactalbumin, adopt the molten globule conformation, and others remain unfolded until the entire folded conformation appears, such as ribonuclease A. In the absence of a folded conformation, further disulfide bonds are formed more slowly and are less stable. There is a kinetic block in forming a disulfide bond if that bond will become buried in a resulting stable folded conformation. Such kinetic blocks are caused by the high energy barrier that is most apparent in the reverse direction, upon reducing a buried disulfide bond. This kinetic barrier is usually overcome most readily in disulfide formation by intramolecular protein disulfide rearrangements in place of the intermolecular process of direct protein disulfide formation. The two processes involve, however, the same energy barrier and the same conformational transitions.

When the protein has all but one or two of the native disulfide bonds, it can adopt the stable native conformation. For example, the very same native conformation is observed with BPTI in which any one of the three native disulfide bonds is missing. These quasi-native species indicate that disulfide bonds merely stabilize the native conformation and do not determine it. The stability of the quasi-native conformation varies, depending on which disulfide bond is missing. The quasi-native conformation inhibits formation of native disulfide bonds that will be buried but favors formation of those on the surface.

3. Multiple Domains

Large proteins are divided into domains, each of which is usually comparable to a small single-domain protein, although they vary in the extent to which the domains interact with and stabilize each other. When such interactions are not very great, a single domain excised from such a protein usually maintains the same conformation and refolds to it after being unfolded. The same principles are expected to apply to the folding of a single domain within a multidomain protein, and it is believed that multidomain proteins fold modularly by their individual domains folding and then associating. The final step, in which the domains interact, is often rate-limiting. Very often, the isolated domain folds more rapidly than when in the intact protein, suggesting that the unfolded domains interact and interfere with each other during folding.

The folded domains and subunits present during refolding of these complex proteins often recognize and bind their specific ligands. Consequently, the presence of such a ligand in this instance increases the rate of refolding and assembly.

Multidomain proteins also appear to be especially susceptible to aggregation during refolding. This is thought to be caused by specific complementary interactions between the domains of different molecules, similar to those that should occur between the domains of the same polypeptide chain.

3.1. Multiple Subunits

Proteins with quaternary structure, consisting of multiple polypeptide chains, are often dissociated and unfolded under denaturing conditions. Upon transfer to folding conditions, they can refold and reassemble to regenerate the original tertiary and quaternary structures. The folding of the individual domains and their reassembly are often observed as separate transitions. Assembly can be followed by covalent cross-linking at different times. Whether folding or assembly is rate-limiting depends on the protein concentration, because assembly is more rapid at higher protein concentrations. Kinetic results with a number of proteins are consistent with individual subunits folding and then reassembling, but the assembly process also involves some changes in the tertiary structure of the individual subunits. The large size and complexity of multisubunit proteins make it difficult to elucidate the details. The analysis is usually further complicated by the tendency of the protein to aggregate during refolding, especially at high protein concentrations. The interactions responsible for the aggregation are specific, because other proteins do not usually have an effect, unless very closely related.
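The concentration dependence of the rate-limiting step can be illustrated with a deliberately simplified comparison. It treats subunit folding as first order and assembly as second order, so that the effective assembly rate scales with the concentration of folded monomer. The rate constants and the function name are hypothetical.

```python
def rate_limiting_step(k_fold, k_assoc, monomer_molar):
    """Compare first-order folding with the pseudo-first-order assembly rate."""
    assembly_rate = k_assoc * monomer_molar   # s^-1 at this concentration
    return "assembly" if assembly_rate < k_fold else "folding"


k_fold = 0.01     # hypothetical subunit folding rate constant, s^-1
k_assoc = 1e5     # hypothetical association rate constant, M^-1 s^-1

for conc in (1e-9, 1e-6, 1e-3):
    print(conc, rate_limiting_step(k_fold, k_assoc, conc))
```

At low protein concentrations the bimolecular assembly step is the slower one in this sketch, whereas at high concentrations folding of the individual subunits becomes limiting. The crossover lies near the concentration equal to k_fold / k_assoc.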

In some dimeric proteins, the two polypeptide chains are highly entwined in their native conformations, and it is impossible for the same conformation to be stable with a single chain in the absence of the other subunit. Nevertheless, such proteins refold and reassemble. This process is probably similar to that in which proteolytic fragments of a single polypeptide chain reassociate and refold (see previous discussion). The two processes of folding and assembly must be coordinated and occur in a single, concerted step.
