Protein Folding In Vivo (Molecular Biology)

Protein folding is the process by which the linear information contained in the amino acid sequence of a polypeptide chain gives rise to the unique three-dimensional conformation of the functional protein structure. How folding is achieved with high efficiency constitutes one of the basic problems in biology. The discovery that unfolded polypeptides can refold spontaneously in vitro (1) (see Protein Folding In Vitro) suggested that the folding (acquisition of tertiary structure) and assembly (formation of quaternary structure) of newly synthesized polypeptides in vivo also occur spontaneously, without the involvement of further components. It is now clear, however, that in many cases efficient folding in the cell depends on a machinery of preexisting proteins, the molecular chaperones, whose primary task, often carried out in an energy-dependent manner, is to prevent protein misfolding and aggregation (2, 3). Thus, in vivo, the principle of self-assembly of proteins is replaced by a process of assisted self-assembly. Another important difference between protein folding in vitro and in vivo is that in the cell folding occurs in the context of translation during protein biosynthesis. As a result, folding can initiate cotranslationally, and this mechanism is important for the successful folding of large modular polypeptides with multiple domains.

1. Protein Aggregation

Many proteins refold via compact globular intermediates that contain varying amounts of secondary structure but lack the stable tertiary structure that defines the native state (see Molten Globule). These intermediates expose to solvent some hydrophobic amino acid residues (most of which will be buried upon correct folding) and via these residues tend to associate with one another to form aggregates. Aggregate formation is usually an irreversible off-pathway step. In vitro, the extent of aggregation can often be controlled by lowering the protein concentration and the temperature of the refolding mixture and by adjusting other physical parameters of the reaction, such as pH and ionic strength. In contrast, cellular conditions dictate that, without the intervention of molecular chaperones, aggregation would outcompete correct folding, at least for a significant fraction of newly synthesized polypeptides. The concentration of unfolded polypeptides emerging from ribosomes (i.e., "nascent" polypeptide chains) in the cytosol is very high; it reaches ~35 µM (~1 mg/mL for a 30-kDa chain) in Escherichia coli, assuming a uniform distribution of ribosomes. The local concentration of nascent chains is significantly greater, however, because of macromolecular crowding (see Excluded Volume) and because translating ribosomes are organized in polyribosomes. Macromolecular crowding refers to the fact that a large fraction of the cellular volume is occupied by proteins and other macromolecules at a total concentration of ~300 g/L and is therefore not available to other macromolecules (4, 5). Crowding is predicted to result in an increase by several orders of magnitude in association constants for unfolded polypeptides over those in dilute solution.
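
The conversion between the molar and mass concentrations quoted above can be checked with a short calculation. The following Python sketch is illustrative only; the function name is hypothetical, and the only inputs are the ~35 µM and 30-kDa figures given in the text.

```python
def molar_to_mass_conc(molar_conc_uM: float, mass_kDa: float) -> float:
    """Convert a molar concentration (µM) to a mass concentration (mg/mL).

    g/L and mg/mL are numerically identical, so no further factor is needed.
    """
    mol_per_litre = molar_conc_uM * 1e-6   # µM -> mol/L
    grams_per_mol = mass_kDa * 1000.0      # kDa -> g/mol
    return mol_per_litre * grams_per_mol   # g/L == mg/mL


# Values quoted in the text: ~35 µM nascent chains, 30-kDa average chain.
nascent_mg_per_mL = molar_to_mass_conc(35, 30)
print(f"{nascent_mg_per_mL:.2f} mg/mL")
```

The result, about 1 mg/mL, is small compared with the ~300 g/L of total macromolecules, which is why crowding and polyribosome organization, rather than the bulk concentration of nascent chains alone, are invoked to explain the aggregation risk.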


The risk of newly synthesized polypeptides aggregating is enhanced by the inability of nascent chains to fold into stable tertiary structures, at least during the early phase of translation. Stable folding requires the presence of a complete protein domain (usually ~100-300 amino acid residues in length) that can fold independently (see Protein Structure). As the C-terminal ~30 amino acid residues of a translating polypeptide are tethered to the ribosome, and are thus topologically restricted, nascent chains remain unfolded until an entire domain has emerged from the ribosome. Aggregation of these nascent chains is thought to be prevented by the cotranslational binding of molecular chaperones, including members of the heat-shock protein 70 (Hsp70) and Hsp40 (DnaJ) families (3).

2. Molecular Chaperones

Molecular chaperones were originally defined as proteins that mediate the correct assembly of other proteins, but are not themselves components of the final functional structures (2). Chaperones occur ubiquitously, and many of them are classified as stress response proteins, although their functions are essential under normal growth conditions (3, 6-9). Most chaperones function by stabilizing an otherwise unstable conformer of another protein and, by controlled binding and release, may facilitate its correct fate in vivo, whether this is folding, oligomeric assembly, transport to a particular subcellular compartment, or disposal by degradation (9). Molecular chaperones do not contribute steric information for correct folding, but rather prevent incorrect interactions between (and perhaps also within) nonnative polypeptides, thus typically increasing the yield, but not the rate, of folding reactions. These properties distinguish the chaperones from the so-called folding catalysts, the protein disulfide isomerases and peptidylprolyl cis-trans isomerases, which accelerate intrinsically slow steps in the folding of some proteins, namely, the rearrangement of disulfide bonds in secretory proteins (see Protein Secretion) and the cis-trans isomerization of peptide bonds preceding proline residues, respectively (10, 11).

Molecular chaperones fall into several structurally unrelated families of proteins, including the members of the Hsp70, Hsp40 (DnaJ), and Hsp90 families, as well as the chaperonins (Hsp60) and the so-called small heat-shock proteins. Most of these proteins are soluble, but membrane-bound chaperones, such as calnexin in the endoplasmic reticulum (ER), exist as well. A table summarizing the main classes of molecular chaperones is found in the entry Molecular chaperone. Chaperones may be expressed constitutively or on exposure of cells to stresses, such as high temperature. Those chaperones with a broad spectrum of protein substrates generally suppress protein aggregation by recognizing hydrophobic amino acid residues or accessible surfaces that are exposed by unfolded proteins or incompletely folded polypeptide chains. The need for an increase in cellular chaperone capacity at elevated growth temperatures (or upon exposure of cells to other stresses) is explained by the tendency of preexisting proteins to unfold under these conditions. Proper folding is usually achieved by the controlled dissociation of the complexes between unfolded polypeptides and chaperones, which often occurs in an ATP- and cofactor-dependent mechanism.

Of the chaperones with a well-documented role in assisting the folding of newly synthesized polypeptides (de novo folding), the members of the Hsp70 and chaperonin classes have been studied most extensively. They represent two basic paradigms of ATP-dependent chaperone action.

3. Hsp70s

The Hsp70s are a family of highly conserved ATPases of relative molecular mass ~70,000 found in prokaryotes (see DnaK/DnaJ Proteins) as well as in eukaryotes, where they occur in the cytosol, mitochondria, chloroplasts, and the ER (12) (see BiP (Hsp70)). Protection of nascent chains from aggregation is thought to be the main role of Hsp70 in de novo protein folding (13-15). In addition, Hsp70s have other important functions in protein metabolism under both stress and nonstress conditions, including functions in membrane translocation of proteins and the degradation of misfolded proteins. This versatility of the Hsp70s results from their basic function, which is to bind and release hydrophobic peptide segments that are generally exposed in unfolded polypeptides. These peptide segments are on average seven amino acid residues in length and must be enriched in hydrophobic amino acids, such as leucine (Fig. 1a) (16-18). These segments are bound in an extended conformation within a binding cleft of the ~18-kDa C-terminal domain of Hsp70, whose high-resolution X-ray crystal structure is known (19). Peptide binding and release by the C-terminal domain is regulated via ATP-dependent conformational changes in an N-terminal ATPase domain of ~45 kDa, whose three-dimensional structure resembles that of actin (20). Peptide binds to the ATP state of Hsp70, in which the peptide binding cleft in the C-terminal domain is open (Fig. 1b). In the ADP state, the peptide cleft is closed, preventing the peptide from dissociating. Efficient peptide binding depends on essential cofactors of the Hsp40 (DnaJ) family. E. coli DnaJ and several of its eukaryotic homologues are chaperones in their own right, as well as activators of the Hsp70 ATPase (21-23). DnaJ can present an unfolded polypeptide to ATP-bound Hsp70 (DnaK in E. coli). DnaJ-stimulated hydrolysis of ATP results in stable peptide binding. In E. coli (and in mitochondria) an additional cofactor, GrpE, catalyzes the exchange of Hsp70-bound ADP for ATP and thereby facilitates polypeptide release (21, 24). It is noteworthy that the Hsp70 reaction cycle resembles the regulation of certain GTP-binding proteins (3, 23) (Fig. 1b).
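
The nucleotide-dependent cycle described above can be summarized as a two-state machine. The Python sketch below is a deliberately simplified toy model, not a kinetic description: the class and method names are hypothetical, binding and release are treated as all-or-none events, and the leucine-rich heptamer is an invented placeholder rather than a sequence taken from the literature.

```python
from enum import Enum


class Nucleotide(Enum):
    ATP = "ATP"
    ADP = "ADP"


class Hsp70:
    """Toy two-state model of a single Hsp70 (DnaK) molecule."""

    def __init__(self) -> None:
        self.nucleotide = Nucleotide.ATP  # start in the ATP state
        self.bound_peptide = None

    @property
    def cleft_open(self) -> bool:
        # Per the text: cleft open in the ATP state, closed in the ADP state.
        return self.nucleotide is Nucleotide.ATP

    def present_substrate(self, peptide: str) -> None:
        """DnaJ presents an unfolded segment to ATP-bound Hsp70."""
        if not self.cleft_open:
            raise RuntimeError("cleft is closed in the ADP state")
        if self.bound_peptide is not None:
            raise RuntimeError("a peptide is already bound")
        self.bound_peptide = peptide

    def hydrolyze_atp(self) -> None:
        """DnaJ-stimulated ATP hydrolysis closes the cleft (stable binding)."""
        self.nucleotide = Nucleotide.ADP

    def exchange_nucleotide(self):
        """GrpE-catalyzed ADP->ATP exchange reopens the cleft and releases
        the peptide (simplification: release is treated as immediate)."""
        self.nucleotide = Nucleotide.ATP
        released, self.bound_peptide = self.bound_peptide, None
        return released


chaperone = Hsp70()
segment = "LLLVLLA"  # hypothetical hydrophobic heptamer, for illustration only
chaperone.present_substrate(segment)
chaperone.hydrolyze_atp()
print(chaperone.nucleotide.value, "state, cleft open:", chaperone.cleft_open)
released = chaperone.exchange_nucleotide()
print("released:", released, "| cleft open:", chaperone.cleft_open)
```

After release, the model says nothing about the fate of the segment; in the cell it may fold, rebind Hsp70, or be passed to another chaperone.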

Figure 1. The Hsp70 chaperone system. (a) Example of a peptide with high affinity for Hsp70. This peptide was cocrystallized with the peptide binding domain of DnaK (19). Peptide segments with similar properties occur on average at a distance of 50-10[...] (b) Mechanism of the E. coli Hsp70 system DnaK (Hsp70), DnaJ, and GrpE. Note that E. coli DnaJ binds to the unfolded p[...] chaperones, but this may not be the case for other Hsp40 homologs, such as mammalian Hsp40. Whenever U is released, [...] provided all structural elements necessary for folding are available (e.g., on release of a nascent chain from the ribosome). [...] of interaction with the Hsp70 system.

Hsp70 functions essentially as a buffer for unfolded and incompletely folded polypeptides, thus reducing the concentration of aggregation-sensitive folding intermediates. The net result of Hsp70 action is the binding and release of the polypeptide chain in an unfolded conformation. On release, the unfolded polypeptide may fold spontaneously to the native state, provided all structural elements necessary for folding are available, or it may be transferred to another chaperone (see text below) or rebind to Hsp70. On the basis of these properties, the Hsp70 chaperone system is ideally suited to protect nascent polypeptide chains against aggregation and to assist in their folding. As long as a polypeptide chain (or a domain) is not yet completely synthesized, release of nascent polypeptide from Hsp70 will not result in folding but in rebinding to Hsp70 as the chain continues to expose hydrophobic residues (see Fig. 1). Indeed, a large fraction of nascent polypeptide chains interact with Hsp70 in vivo (15), but whether this interaction is generally required for folding remains to be demonstrated.

4. Chaperonins

In contrast to the Hsp70s, the chaperonins (also classified as Hsp60s) are large cylindrical protein complexes consisting of two rings of ~60-kDa subunits that are stacked back to back. In the case of eubacterial chaperonins, such as GroEL of E. coli, there are seven identical subunits per ring, whereas the chaperonins of archaebacteria and the eukaryotic cytosol are heterooligomeric and may contain up to eight different subunits (3, 25). The salient structural feature of these ~800-kDa complexes is a central cavity in which a single polypeptide molecule can fold and avoid aggregation with other unfolded polypeptides. An unfolded polypeptide chain binds to hydrophobic patches at the inner wall of the chaperonin cavity. In the case of GroEL, binding of the ring-shaped cofactor GroES to the opening of the GroEL cylinder then displaces the polypeptide from these binding sites into an enclosed folding cage (26-29). The ATPase activity of the chaperonin regulates the closing and opening of the cage. Binding to chaperonin may also unfold polypeptides that have been kinetically trapped in misfolded states, thereby giving them another chance to fold upon release into the cavity. (See also Chaperonin.)

Chaperonins are essential for cell viability under all growth conditions (30-32), because they are required for the folding of a subset of newly synthesized polypeptides. In E. coli, GroEL interacts with approximately 15% of total newly synthesized cytosolic proteins at growth temperatures of 30-37°C, and with up to 30% or more under heat stress at 42°C (33). In addition to chaperonin-dependent proteins that interact more or less quantitatively with GroEL, other proteins transit GroEL with only a few percent of the total population of molecules. These latter proteins do not depend on GroEL for folding, at least under nonstress conditions. GroEL acts predominantly post-translationally, and most of its substrates fall into the size range of 10 to 55 kDa, the upper size limit of the folding cage (33). According to in vitro studies, GroEL-dependent proteins have relatively slow folding rates and therefore are highly aggregation-sensitive. When expressed in cells containing insufficient or dysfunctional chaperonin, these proteins aggregate or are degraded proteolytically. The mechanisms of folding for polypeptides larger than ~55 kDa will be discussed below.

5. Chaperone Pathways in Folding

The Hsp70 and chaperonin systems can act sequentially. This sequential action has been demonstrated for protein folding in the cytosolic compartment and in mitochondria and chloroplasts (3). Considering the origin of mitochondria from endosymbiotic bacteria, the matrix of mitochondria—specifically, the space surrounded by the inner mitochondrial membrane—is evolutionarily related to the bacterial cytosol. This compartment contains homologues of all the major bacterial chaperones, DnaK (Hsp70), DnaJ, GrpE, GroEL, and GroES. Although mitochondria still synthesize a small number of proteins in the matrix, most mitochondrial proteins are imported from the cytosol (34). A prerequisite for effective membrane translocation is that these polypeptides be maintained in an unfolded state by cytosolic chaperones (see Translocation). On import, they interact first with mitochondrial Hsp70 that is bound to the inner surface of the inner membrane and then with soluble Hsp70 in the matrix (Fig. 2a). Membrane-bound Hsp70 participates actively in the translocation process itself, whereas soluble Hsp70 assists in folding. A subset of imported proteins has to be transferred from Hsp70 to the mitochondrial GroEL homologue, Hsp60, for successful folding (35, 36). In vitro reconstitution experiments using the E. coli chaperones established the mechanistic significance of this chaperone relay (22). The Hsp70 system prevents aggregation of unfolded polypeptides, preserving their folding competence, whereas the chaperonin mediates folding to the native state. This pathway is not necessarily unidirectional; polypeptides that cannot fold on the chaperonin may be transferred back to Hsp70 and may eventually be degraded (3).

Figure 2. Sequential action of the Hsp70 and chaperonin systems in folding. (a) Protein folding in mitochondria. OM and IM, outer and inner mitochondrial membranes, respectively. Hsp60, mitochondrial GroEL; Hsp10, mitochondrial GroES. Mitochondrial Hsp70 interacts first with the incoming polypeptide because of its ability to recognize extended peptide motifs and its specific association with the translocation machinery of the inner mitochondrial membrane. Release of newly translocated polypeptide from Hsp70 is dependent on ATP and the action of mitochondrial Hsp40 (DnaJ) and GrpE (see Fig. 1). The Hsp70 and Hsp60 (chaperonin) systems are functionally distinct in that only Hsp60 can release the substrate protein in a fully folded state. Small, rapidly folding proteins will either not interact with Hsp60 or will do so only with low efficiency. (b) Folding of newly synthesized polypeptides in the eukaryotic cytosol. TRiC, the cytosolic chaperonin. Cytosolic Hsp70 interacts with the nascent polypeptide because of its ability to recognize extended peptide motifs. Binding of NAC very close to the peptidyl-transferase center may precede that of Hsp70 for most cytosolic proteins. Release of newly translated polypeptide from Hsp70 is dependent on ATP and the action of Hsp40 (DnaJ). This step is probably GrpE-independent in eukaryotes. Most proteins fold upon release from Hsp70, but a subset of proteins needs assistance by the chaperonin for folding. Although folding of these proteins occurs in the central cavity of TRiC (as in the case of GroEL), TRiC is independent of a GroES cofactor. The function of GroES to form a lid on the opening of the chaperonin cylinder appears to be integrated into the structure of the TRiC subunits (50).

There is increasing evidence from in vivo studies that the Hsp70/chaperonin pathway plays an important role in the prokaryotic and eukaryotic cytosol, both for the folding of newly synthesized polypeptides and during the refolding of stress-denatured polypeptides (3, 37-41). Hsp70 interacts with a wide array of nascent chains in eukaryotes (15). While the majority of proteins probably do not have to transit further chaperones to complete folding, a subset of proteins, presumably those that fold slowly, must be transferred from Hsp70 to chaperonin to reach their native state (39, 40) (Fig. 2b). The major substrate proteins in eukaryotes that follow such a pathway are the cytoskeletal proteins actin and tubulin, which are critically dependent on the cytosolic chaperonin (CCT, or TRiC) for folding (32, 40, 42, 43). The situation is similar in bacteria in that only a fraction of all newly formed polypeptide chains bind to chaperonin (33). For example, folding of certain bacterial forms of ribulose bisphosphate carboxylase expressed in E. coli requires interaction with DnaK (Hsp70) and with GroEL, whereby the two systems must act sequentially and not in parallel (41).

Although the basic mechanistic principles of chaperone action are now well understood, the complexity of chaperone-assisted folding pathways in vivo is only beginning to be appreciated. For example, in the E. coli cytosol a significant fraction of nascent chains interact with trigger factor, a 48-kDa chaperone that has an affinity for ribosomes and possesses both chaperone and peptidylprolyl cis-trans isomerase activities (44). In eukaryotes many nascent chains interact first with the nascent-chain-associated complex (NAC) before they form a complex with Hsp70 (45) (Fig. 2b). NAC is a heterodimer of 33-kDa and 21-kDa subunits. NAC also binds to ribosomes and prevents their association with the membrane of the endoplasmic reticulum, except when a secretory precursor polypeptide is being synthesized. For cytosolic proteins, NAC binding may help to recruit Hsp70.

A multiplicity of chaperone components cooperate in mediating proper folding, disulfide bond formation, and glycosylation of secretory proteins in the lumen of the endoplasmic reticulum (ER) (see Protein Secretion). Correct folding and assembly is a prerequisite for the packaging of proteins into vesicles that travel from the ER via the Golgi apparatus to the cell surface, and the ER lumen is effectively a highly concentrated solution of chaperones and protein folding catalysts. These include, among others, the Hsp70 homologue BiP, various Hsp40s, the Hsp90 homologue Grp94, protein disulfide isomerase, calnexin, and calreticulin. The latter two chaperones recognize certain carbohydrate modifications that are typical of incompletely folded polypeptides and retain these polypeptides in the ER until folding is completed (46). It is noteworthy that the ER does not contain a chaperonin homologue.

6. Cotranslational Folding of Multidomain Proteins

Analysis of the size distribution of proteins in several completely sequenced genomes indicates that eukaryotes have a proportionally larger number of modular polypeptides consisting of multiple protein domains than do bacteria (47, 48). For example, in the yeast Saccharomyces cerevisiae the average protein has a length of 496 amino acids (~55 kDa), and ~38% of all yeast proteins are larger than 55 kDa, including ~1450 soluble proteins (48). In contrast, the average length of an E. coli protein is only 317 residues (~35 kDa), and only 13% of all E. coli proteins exceed 55 kDa, the size cutoff of the GroEL/GroES folding cage. The size distribution of protein domains (the "folding units") is uniform across all three kingdoms of life (bacteria, archaea, eukarya), in the range of 100-300 residues. Thus a genome encoding proportionally longer polypeptides must encode more and/or longer multidomain polypeptides. Since these proteins frequently do not refold efficiently in vitro, it would be expected that their folding in vivo is particularly chaperonin-dependent. However, the volume capacity of the central cavity of the eukaryotic cytosolic chaperonin, TRiC, is probably not much greater than that of GroEL. Moreover, TRiC is of low abundance in many eukaryotic cells and is thought to interact with only a restricted subset of polypeptides (see text above). Neither do other abundant chaperone proteins, including the Hsp90 system (49), play a general role in de novo protein folding. How do large modular polypeptides manage to fold efficiently in vivo?
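
The correspondence between chain length and molecular mass used above can be reproduced with a rough estimate. The sketch below assumes an average residue mass of 110 Da, a commonly used approximation that is not stated in the text; the function name and constants are illustrative.

```python
# Assumption (not from the text): average mass of an amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0

# Size cutoff of the GroEL/GroES folding cage quoted in the text.
CAGE_LIMIT_KDA = 55.0


def approx_mass_kDa(n_residues: int) -> float:
    """Estimate polypeptide mass (kDa) from its length in residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Average protein lengths quoted in the text.
average_lengths = {"S. cerevisiae": 496, "E. coli": 317}

for organism, length in average_lengths.items():
    mass = approx_mass_kDa(length)
    fits = "at or below" if mass <= CAGE_LIMIT_KDA else "above"
    print(f"{organism}: {length} residues ~ {mass:.1f} kDa ({fits} the cage limit)")
```

Under this assumption the average yeast protein sits almost exactly at the ~55-kDa cage limit, whereas the average E. coli protein is well below it, consistent with the percentages given above.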

In modular polypeptides, domains are often joined by flexible linker segments. Such proteins are able to fold cotranslationally as their domains emerge sequentially from the ribosome (3, 47). This mechanism allows a high efficiency of folding by reducing unproductive intramolecular interactions between concurrently folding domains. Such interactions may occur during the collapse of the unfolded chain into a disorganized globule and may explain the tendency of multidomain proteins to misfold in vitro (see Protein Folding In Vitro). Mechanistically, cotranslational folding reduces the problem of folding a large polypeptide to the folding of its independent modules or domains, those structural units most able to fold spontaneously. Sequential domain folding probably relies predominantly on the protection of nascent chains by Hsp70 until a complete domain has been synthesized and emerged from the ribosome (Fig. 3). ATP-dependent release of Hsp70 may then allow domain folding. The ATPase activity of Hsp70 in the eukaryotic cytosol (~1 ATP hydrolyzed per molecule per minute) seems to be adjusted to the speed of translation (about two to three amino acids per second) such that Hsp70 would bind and release the nascent chain once during the synthesis of a polypeptide domain of average length. Transfer from Hsp70 to the chaperonin may be necessary only for proteins that are unable to fold into stable structures during translation. This transfer is the case for some multidomain proteins in which the domains are constructed of discontinuous sequence segments of the polypeptide chain. Here a continuous chain forms part of a domain, then leaves the compact region to form part or all of another domain, after which it returns to complete the previous domain. Actin, one of the main substrate proteins of the eukaryotic cytosolic chaperonin, is composed of discontinuous domains and forms stable tertiary structure only post-translationally. Similarly, proteins whose domains are structurally unstable in isolation (and are ultimately stabilized by interactions with other domains or subunits) may also require sequestration in the chaperonin folding cage for post-translational folding. Thus a combination of chaperonin-independent (cotranslational) and chaperonin-dependent (post-translational) mechanisms is operative in the eukaryotic cytosol (48).
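
The claimed match between the Hsp70 ATPase rate and the speed of translation can be made explicit with a back-of-the-envelope calculation. The sketch below uses only the rates quoted in the text; the function name is hypothetical, and equating one ATP hydrolyzed with one bind-and-release cycle is a simplifying assumption.

```python
def residues_per_hsp70_cycle(translation_rate_aa_per_s: float,
                             atp_turnover_per_min: float) -> float:
    """Residues added to the nascent chain during one Hsp70 ATPase cycle.

    Assumes one bind-and-release cycle per ATP hydrolyzed.
    """
    cycle_duration_s = 60.0 / atp_turnover_per_min
    return translation_rate_aa_per_s * cycle_duration_s


# Eukaryotic cytosol, values from the text:
# ~1 ATP per Hsp70 per minute, 2-3 amino acids per second.
for rate in (2.0, 3.0):
    n = residues_per_hsp70_cycle(rate, atp_turnover_per_min=1.0)
    print(f"{rate:.0f} aa/s -> {n:.0f} residues per cycle")
```

The resulting 120 to 180 residues per cycle fall within the 100-300 residue range given for protein domains, which is the sense in which the two rates appear to be matched.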

Figure 3. Protein folding pathways in the eukaryotic cytosol. Model for the cotranslational folding of a multidomain protein in the eukaryotic cytosol. Folding is assisted by the Hsp70 system independently of the chaperonin. Folding of a completed domain occurs as Hsp70 dissociates from the nascent chain in an ATP-dependent manner.

Evidence has been presented that the bacterial translation-folding machinery has a reduced capacity to support the cotranslational and sequential folding of multidomain proteins, when domain folding is slow compared to the rapid speed of bacterial translation (15-20 amino acids per second) (47). It has been proposed that bacterial proteins are generally selected for efficient post-translational folding. Inefficiency of cotranslational folding in bacteria would help to explain not only why many eukaryotic multidomain proteins misfold upon bacterial expression, but also perhaps why the bacterial protein complement is structurally less complex than that of eukaryotic cells. Modular polypeptides are believed to have evolved by random gene fusion events. Thus, if cotranslational folding in bacteria were even partially constrained, the evolution of multidomain proteins by domain shuffling would be less frequent than in organisms more generally able to support sequential domain folding. Future research will have to explore to what extent differences in folding mechanism between bacterial and eukaryotic cells may be responsible for the explosive evolution of modular polypeptides in eukaryotes.
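
The difference in time scales behind this argument can be illustrated numerically. The sketch below compares how long a single domain takes to be synthesized at the bacterial rate quoted above (15-20 amino acids per second) and at a eukaryotic rate of two to three amino acids per second; the 200-residue domain length is an assumed midpoint of the 100-300 residue range, and the function name is hypothetical.

```python
def synthesis_time_s(domain_length_aa: int, rate_aa_per_s: float) -> float:
    """Time (s) needed to translate a domain at a constant elongation rate."""
    return domain_length_aa / rate_aa_per_s


DOMAIN_LENGTH = 200  # assumption: midpoint of the 100-300 residue domain range

# Elongation rates (amino acids per second); the eukaryotic range is the
# "two to three amino acids per second" figure used for the eukaryotic cytosol.
rates = {
    "eukaryote (slow)": 2.0,
    "eukaryote (fast)": 3.0,
    "bacterium (slow)": 15.0,
    "bacterium (fast)": 20.0,
}

for label, rate in rates.items():
    t = synthesis_time_s(DOMAIN_LENGTH, rate)
    print(f"{label}: {t:.1f} s per {DOMAIN_LENGTH}-residue domain")
```

Under these assumptions a bacterial ribosome finishes a domain in roughly 10 to 13 seconds, compared with about one to two minutes in eukaryotes, so a slowly folding domain has far less time to fold before the next domain emerges.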
