DNA:Protein Interaction Thermodynamics Part 1 (Molecular Biology)

1. Specific Interactions

The precise recognition of a defined DNA sequence by a DNA-binding protein requires an optimal shape complementarity between the interacting species. As a result, a large number of noncovalent interactions form at the interface between the DNA and the protein. Individual interactions often contribute only a small amount to the overall stability of the complex. Nevertheless, all of these interactions are important for the preferred binding of the protein to the specific DNA site (or for the discrimination against nonspecific sites).

Early concepts about the determination of specificity in DNA-protein complexes relied on "direct readout," which is based on the interaction between the functional groups of the protein and the DNA bases (1, 2). Structural studies of DNA-protein complexes showed that specific interactions are mediated by hydrogen bonds, electrostatic interactions between charged groups, and van der Waals interactions. Constrained hydrogen-bonding interactions can be made between the polypeptide backbone or short side chains and the nucleobases. Longer side chains make more flexible contacts with the DNA. They are, however, often oriented in space by other amino acid side chains or by making two contacts to the same base, to adjacent bases, or to a base and a phosphodiester group. In these cases, the enthalpic gain of the interaction is often reduced by a negative entropic contribution stemming from the reduction of conformational flexibility of the amino acid side chain. The X-ray crystal structures of the DNA complexes of the basic helix-loop-helix (BHLH) proteins MyoD (3) and E47 (4) provide examples of bidentate and bridging interactions. Arg(111) of MyoD, which contacts both N7 of guanine and an oxygen of the adjacent phosphodiester, provides an example of a bridging interaction (Fig. 1). A hydrogen bond between Nε of Arg(111) and the hydroxyl group of Thr(115) further restricts the mobility of Arg(111). Arg(111) also forms a bidentate interaction with guanine through hydrogen bonds with N7 and O6, the latter being mediated by a water molecule. Another example is provided by the carboxyl group of Glu(345) of E47, which makes hydrogen bonds to N6 of an adenine and to N4 of the adjacent cytosine, thereby specifying the first two bases of the E47 binding site (Fig. 2). A neighboring arginine residue, Arg(348), orients the glutamate in space through the formation of a "clamp" between the phosphate backbone and the peptide.
An important consequence of the formation of such hydrogen bonding networks that severely restrict the conformational flexibility is that the specificity is achieved, at least in part, through selection of the smallest number of destabilizing interactions (5). The restriction enzymes EcoRI, EcoRV, and BamHI rely for specificity to a large extent on discrimination against noncognate sites (Ref. 6 and references therein).


Figure 1. Diagram showing the structure of the BHLH domain of MyoD from amino acid Asp(109) to Thr(115) and the key contacts of Arg(111) with guanine and the adjacent phosphodiester of the E-box sequence CANNTG (60). Note that the contact of Arg(111) to O6 of guanine is mediated by a water molecule and that the arginine side chain is stabilized through a hydrogen bond with the hydroxyl group of Thr(115). This display was created from the coordinates of the DNA complex of MyoD (3).

Figure 2. Networked hydrogen-bonding interactions between glutamate(345) and arginine(348) of E47 and the CpA dinucleotide of the E-box DNA sequence (5). E(345) is hydrogen-bonded to N4 of cytosine and to N6 of adenine. Note the "clamp" function of R(348), which connects the phosphate backbone to E(345), thereby locking the conformation of the side chain of E(345). The β- and γ-CH2 groups of E(345) make van der Waals contacts to the methyl group of thymine. The program MacMoMo (183) was used to create this display from the coordinates of the DNA complex of E47 (4).

To the best of our knowledge, there is no site-specific DNA-binding protein for which a complete thermodynamic description of the association reaction with DNA is available, and only relatively few data are available for eukaryotic proteins. However, thermodynamic studies have been carried out with the prokaryotic Lac, Mnt, Trp, λ Cro, Arc, and λ cI repressors (7-16), the restriction endonucleases EcoRI (17-22), EcoRV (6, 23), BamHI (6), and RsrI (20), the cyclic AMP receptor protein (CRP)/catabolite activator protein (CAP) (24), and the DNA-binding domain of glucocorticoid receptor (25). In these studies, the consequences of altered interactions between amino acid side chains and bases have been addressed through site-directed mutation of either the protein or the DNA. The results from these experiments are difficult to interpret, because of a multitude of subtle energetic changes resulting from the altered bond, perturbation of the local structure of the DNA and/or the protein, and changes in solvation. Nevertheless, a study by the group of Jen-Jacobson of the interaction of EcoRI with a series of base analogues in the absence of Mg2+ indicated that each hydrogen bond between the protein and the DNA stabilizes the complex by approximately 1.4 kcal/mol (19, 21). For the DNA complex of Trp repressor, the replacement of N7 of individual purines with carbon made the free energy of binding less favorable by approximately 1 kcal/mol when the particular nitrogen atom formed a hydrogen bond to the repressor in the wild-type complex (26). Several other studies confirmed a value of 1 to 1.5 kcal/mol of stabilization for every hydrogen bond formed (23, 27). On the other hand, replacing an adenine, the N6 of which forms a hydrogen bond with a glutamate of the basic helix-loop-helix (BHLH) protein E12 (Fig. 2), with a purine reduced the stability of the E12 DNA complex by only approximately 0.8 kcal/mol in electrophoretic mobility shift assay experiments (5).
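To put these per-contact free energies in perspective, the following minimal Python sketch converts a free energy difference into the corresponding ratio of binding constants through the standard relation ΔΔG = RT ln(K1/K2). The function name and the choice of 25°C are illustrative assumptions and are not taken from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def affinity_ratio(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of binding constants that corresponds to a free energy difference."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Free energy increments quoted in the text (kcal/mol).
    for ddg in (0.8, 1.0, 1.4, 1.5):
        print(f"{ddg:.1f} kcal/mol -> {affinity_ratio(ddg):.1f}-fold change in K")
```

Under these assumptions, a single hydrogen bond worth 1.4 kcal/mol corresponds to roughly a tenfold change in the binding constant.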

Van der Waals interactions between the protein and the methyl group of thymine, which projects into the major groove in B-DNA, provide another possibility for the formation of specific interactions. A careful thermodynamic study of the interaction between the methyl group of thymine(-5) of the OR3 operator and a methylene group of Lys32 of the Cro repressor from phage λ indicated that the van der Waals interaction stabilizes the complex by 1.6 kcal/mol (7). When this methyl group was removed by replacing thymine with uracil, the binding enthalpy became less favorable, with no observable change in the binding entropy, as would be expected for such an interaction, especially because the conformation of Lys32 in Cro is stabilized by a bidentate interaction with N7 and O6 of guanine(-4) of OR3. However, the creation of a cavity by the removal of the methyl group could potentially lead to the trapping of a water molecule. Such a case has been observed for the DNA complex of the DNA-binding domain of the glucocorticoid receptor, where a favorable enthalpic contribution compensated for the unfavorable entropic effect of approximately 1 kcal/mol when a methyl group was removed from a thymine (28) that did not make specific contacts to the protein (29). Other studies confirmed that van der Waals interactions between a protein and a thymine methyl group stabilize the complex by between 0.5 and 2 kcal/mol (7, 19, 23, 24, 27, 30-33).

DNA-binding proteins make extensive contacts to the phosphodiester groups of the DNA backbone through charged and uncharged side chains. It is not easy to estimate the contribution from these interactions to the overall stability of the complex. One approach is to determine the contribution of the polyelectrolyte effect from the dependence of the reaction free energy on the concentration of univalent salts and to add a contribution for the interactions between the DNA phosphates and uncharged amino acid residues. It is important to keep in mind that the polyelectrolyte effect is a consequence of the fact that DNA is a polyanion of high axial charge density (reviewed in Refs. 11, 34, and 35). The association reaction is therefore driven by the release of monovalent cations (at least at low salt concentrations) (36-38). The salt-dependence of binding to various specific and nonspecific DNA sequences is often different, indicating that protein-phosphate contacts are involved in determining the specificity of DNA binding (39-44).
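The salt-dependence analysis mentioned above can be sketched as a log-log fit. In the polyelectrolyte framework, the slope of log K_obs against log [salt] is commonly read as a measure of the net number of ions thermodynamically released on binding. The data below are hypothetical and were generated from an assumed power law only to illustrate the fit; the function name is likewise an assumption.

```python
import math


def loglog_slope(salt_molar, k_obs):
    """Least-squares slope of log10(K_obs) against log10([salt])."""
    xs = [math.log10(c) for c in salt_molar]
    ys = [math.log10(k) for k in k_obs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical titration: association constants (M^-1) at four salt concentrations (M),
# generated from K_obs = 1e3 * [salt]^-6 purely for illustration.
salt = [0.10, 0.15, 0.20, 0.30]
k_obs = [1e3 * c ** -6 for c in salt]

if __name__ == "__main__":
    print(f"d log K_obs / d log [salt] = {loglog_slope(salt, k_obs):.2f}")
```

Comparing such slopes for a specific and a nonspecific site is one way to see whether the two complexes differ in the number of ionic contacts.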

Anions bind to proteins with relatively small binding constants (45), and their effects are most likely not of electrostatic nature (see Hofmeister Series and Salting In, Salting Out). Anions seem to affect the dissociation constants of DNA-protein complexes, especially at high anion concentrations (46, 47). The nature of the anion is important, and replacing chloride with glutamate can increase the sequence-specific DNA-binding affinity by as much as 80-fold (43). At 25°C in a low salt buffer, the leucine zipper protein GCN4 binds to an ATF/CREB site (ATGACGTCAT) and to an AP-1 site (ATGACTCAT) with the same affinity. In a buffer containing 250 mM potassium glutamate, however, the dissociation constant of the AP-1 complex was approximately one order of magnitude smaller than for the ATF/CREB complex (48).

In order to understand fully the DNA-binding specificities displayed by proteins, it is necessary to define the energetic differences between the interactions in specific and nonspecific complexes (see DNA:Protein Binding Specificity). Model studies of the DNA-binding reactions of oligopeptides suggest that the nonspecific complex is a loose association held together by Coulombic interactions between positively charged residues of the protein and the negatively charged phosphate backbone of the DNA (49, 50). In these studies, the number of DNA phosphates contacted, as well as the number of monovalent cations thermodynamically released on binding, was found to be approximately equal to the number of positively charged side chains in the peptides (49-56). Thermodynamic studies indicate that the number of phosphate contacts is often greater in the nonspecific complexes than in the specific ones (6, 11, 57).

Only two structures of nonspecific complexes are currently known, namely those involving the restriction endonuclease EcoRV (58, 59) and the DNA-binding domain of glucocorticoid receptor (29). Glucocorticoid receptor binds to DNA sequences containing inverted repeats separated by three base pairs. In the X-ray crystal structure analysis of the DNA complex of glucocorticoid receptor, an oligonucleotide was used in which the spacing had been increased to four base pairs. Therefore, while one protein subunit bound specifically to one repeat, the other faced a nonspecific binding site. Both subunits inserted an α-helix into the major groove of the oligonucleotide, which was opened up by approximately 2 Å on the specific side. This distortion is most likely achieved through the higher number of interactions in the specific complex. Comparison of the specific and the nonspecific complexes of EcoRV revealed that a loop that penetrates the major groove is partially disordered in the nonspecific case and not well buried in the groove. The buried surface area (vide infra) is more than 1570 Å² larger in the specific complex than in the nonspecific one (58).

An interesting case is provided by the BHLH protein MASH-1. The DNA-binding specificity of MASH-1 was found to be very low under all conditions studied (39, 60). While no three-dimensional structure for a "nonspecific" complex of MASH-1 or any other BHLH protein has been determined, the available data suggest that the specific and "nonspecific" complexes are very similar (5, 39, 60, 61), and it appears that BHLH proteins might have only one binding mode. This is most probably a consequence of the exposed recognition α-helix, which adopts its helical conformation only on DNA binding (39, 60). In many other proteins that rely on an α-helix for DNA recognition, the conformation of the recognition helix is stabilized through interactions with other parts of the DNA-binding domain.

2. Water-Mediated Contacts

Specific interactions between a protein and DNA are often mediated by bound water molecules. General thermodynamic arguments show that the entropic cost of transferring a water molecule from bulk water to a specific site within a protein or at the interface between the protein and DNA is in the range of 0 to 7 cal mol-1 K-1, corresponding to a free energy change of between 0 and 2 kcal mol-1 at ambient temperature (62). The free energy cost thus varies with the polarity of the environment of the bound water.
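The conversion between the entropic cost and the free energy cost quoted above is simply -TΔS. The short Python sketch below makes the arithmetic explicit; the function name and the default temperature of 298.15 K are assumptions chosen for illustration.

```python
def entropic_free_energy_cost(delta_s_cal_per_mol_k, temperature_k=298.15):
    """Return -T*dS in kcal/mol for an entropy change given in cal mol^-1 K^-1."""
    return -temperature_k * delta_s_cal_per_mol_k / 1000.0


if __name__ == "__main__":
    # Upper limit quoted in the text: a loss of 7 cal mol^-1 K^-1 per bound water.
    print(f"{entropic_free_energy_cost(-7.0):.2f} kcal/mol")
```

An entropy loss of 7 cal mol-1 K-1 therefore corresponds to a cost of about 2 kcal mol-1 near room temperature, in line with the upper limit given in the text.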

An example of a specific contact that is mediated by a water molecule is shown in Figure 1. The interaction between Nη of Arg(111) of MyoD and the O6 of guanine is mediated by a well-ordered water molecule (3). Other examples of specific interactions that are mediated by well-resolved water molecules have been found in the DNA complexes of cI repressor (63), Hin recombinase (64), the restriction endonuclease BamHI (65), the DNA-binding domains of estrogen receptor (66), papillomavirus-1 E2 (67), and the Antennapedia (C39S) homeodomain (68). The DNA complex of the transcription factor GATA-1 is devoid of water molecules that mediate contacts between the protein and the nucleobases, but several water molecules were found between the protein and the phosphate backbone (69).

The 1.9 Å resolution crystal structure of the complex between the Trp repressor and its operator revealed that water-mediated contacts are the principal means for site-specific recognition (70, 71). Of the 90 water molecules that are intrinsic to the chemistry of the complex, 26 are located within the protein-DNA interface (13 in each half-site). Three of these 13 mediate four contacts between the nucleobases and the protein. Interestingly, these three hydration sites are already fully occupied in the free DNA (72). The water molecules should therefore be considered as intrinsic parts of the DNA structure and as noncovalent extensions of the DNA molecule that can be used for the stereospecific recognition of the trp operator (see TRP Operon). Similarly, the water molecule that mediates the contact between the backbone NH of Asn(79) and the N7 of guanine is present in all of the refined crystal structures of the uncomplexed repressor, and the water between Asn(80) and the DNA is seen in most of the refined structures of the free protein (73). These water molecules can therefore be considered functional extensions of the protein surface.

While water molecules are clearly important for both affinity and specificity in many DNA complexes of transcription factors, it must be stressed that some DNA-protein interfaces are completely devoid of water. For example, the 3100 Å² interface between TBP and TATA-box-containing DNA sequences is characterized by a perfect complementarity and complete exclusion of water (74-77).

3. Dehydration Effects

Thermodynamic studies demonstrated that site-specific DNA-binding reactions are characterized by negative and relatively large changes in the heat capacity. As a consequence, the enthalpic (ΔH) and the entropic (TΔS) contributions to the free energy change (ΔG) of the binding reactions vary with temperature in an almost parallel manner, making ΔG nearly independent of temperature. Again, many more data are available for prokaryotic DNA-binding proteins; only recently have good thermodynamic data for the interaction between eukaryotic transcription factors and DNA become available. The reactions studied include the association reactions between histones and DNA (78), RNA polymerase Eσ and the PR promoter of phage λ (79), Lac repressor and the lac O+ operator (80), the Mnt repressor and the mnt operator site Omnt (81), the headpiece of Lac repressor and the lac operator (82), the restriction endonuclease EcoRI and its cognate binding site (82), Cro repressor and the OR3 operator site (7), the DNA-binding domain of glucocorticoid receptor and a glucocorticoid response element (25, 28), Trp repressor and operator (8, 9), λ cI repressor with various combinations of the three operator sites OR1, OR2, and OR3 (10), the transcriptional activator cyclic AMP receptor protein (CRP) complexed with two molecules of cAMP and its consensus binding site (83, 84), the transcription factor GCN4 and both an ATF/CREB and an AP-1 binding site (48), and the DNA-binding domain of the basic helix-loop-helix (BHLH) protein MASH-1 and E-box-containing and heterologous DNA sequences (39).
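The compensation between ΔH and TΔS can be illustrated with the standard expressions for a temperature-independent heat capacity change: ΔH(T) = ΔH(Tref) + ΔCp (T - Tref) and ΔS(T) = ΔS(Tref) + ΔCp ln(T/Tref). The Python sketch below uses hypothetical reference values that are not taken from any of the cited studies; they were chosen only to give a binding free energy of a realistic magnitude.

```python
import math

T_REF = 298.15  # reference temperature in K


def binding_thermodynamics(temp_k, dh_ref, ds_ref, dcp):
    """Return (dH, T*dS, dG) in cal/mol at temp_k for a temperature-independent dCp."""
    dh = dh_ref + dcp * (temp_k - T_REF)
    ds = ds_ref + dcp * math.log(temp_k / T_REF)
    return dh, temp_k * ds, dh - temp_k * ds


# Hypothetical parameters, chosen only to illustrate the compensation.
DH_REF = -30000.0  # cal/mol at T_REF
DS_REF = -67.0     # cal mol^-1 K^-1 at T_REF
DCP = -700.0       # cal mol^-1 K^-1

if __name__ == "__main__":
    for temp in (278.15, 288.15, 298.15, 308.15, 318.15):
        dh, tds, dg = binding_thermodynamics(temp, DH_REF, DS_REF, DCP)
        print(f"{temp:6.2f} K  dH={dh / 1000:7.2f}  TdS={tds / 1000:7.2f}  "
              f"dG={dg / 1000:7.2f} kcal/mol")
```

With these illustrative numbers, ΔH changes by 28 kcal/mol between 5 and 45°C, whereas ΔG changes by less than 3 kcal/mol over the same range.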

It is now generally accepted that large negative values of the heat capacity change, ΔCp, are the hallmark of biological reactions that form large, highly complementary interfaces, irrespective of the overall stability of the complexes formed (9, 82). The areas buried in specific protein-DNA complexes range from ~1000 to ~5500 Å² (29, 39, 48, 75, 84, 85), and the heat capacity changes are caused by the removal of large amounts of nonpolar surface area from water on complex formation, accompanied by release of water. Although the change in water-accessible nonpolar surface area makes the dominant contribution to ΔCp, the changes in water accessibility of the polar surface areas (mainly due to the burial of the peptide backbone) also make a smaller contribution of opposite sign (86) (see Hydration).
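The opposing nonpolar and polar contributions can be sketched with an empirical area-based estimate of the form ΔCp = a ΔA_np + b ΔA_p. The coefficients used below (0.32 and -0.14 cal mol-1 K-1 per Å²) are literature-style values that are assumed here rather than taken from this article, and the buried areas are hypothetical.

```python
# Empirical coefficients in cal mol^-1 K^-1 per square angstrom (assumed values).
COEFF_NONPOLAR = 0.32
COEFF_POLAR = -0.14


def heat_capacity_change(delta_area_nonpolar, delta_area_polar):
    """Estimate dCp (cal mol^-1 K^-1) from changes in water-accessible area (A^2).

    Areas that become buried on complex formation enter as negative changes.
    """
    return COEFF_NONPOLAR * delta_area_nonpolar + COEFF_POLAR * delta_area_polar


if __name__ == "__main__":
    # Hypothetical interface burying 2000 A^2 of nonpolar and 1500 A^2 of polar surface.
    print(heat_capacity_change(-2000.0, -1500.0))
```

In this sketch, burial of nonpolar surface makes ΔCp more negative, while burial of polar surface partly offsets it, which mirrors the qualitative statement in the text.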

Because the observed values of ΔCp for the DNA-binding reactions of some proteins are too large to be accounted for solely by the amount of buried surface area in a "rigid body" association, conformational changes of both the protein and the DNA appear to occur (see below) (84, 87, 88).

Binding of proteins to DNA in a nonspecific fashion involves almost no changes in heat capacity (7, 9, 48), indicating that the formation of nonspecific complexes does not involve major dehydration. An interesting exception is provided by the DNA-binding domain of the transcription factor MASH-1. The association reaction between MASH-1 and an E-box-containing oligonucleotide, the natural target of MASH-1, is characterized by a heat capacity change of -733 (±99) cal mol-1 K-1, while the formation of a complex with heterologous DNA results in a ΔCp of -575 cal mol-1 K-1. X-ray crystallography studies of the specific complexes of BHLH proteins showed that the DNA is contacted by an α-helix that fits snugly into the major groove (3, 4). Circular dichroism spectroscopy suggested that the protein conformation of MASH-1 was rather similar in the "specific" and the "nonspecific" complexes (39, 60). Unlike other DNA-binding proteins, all DNA complexes of MASH-1 appear to show the thermodynamic characteristics of specific complexes, irrespective of the particular DNA sequence.

The number of water molecules released on formation of a complex between a protein and DNA can be estimated by measuring the dissociation constant as a function of the osmotic pressure, which is altered through the addition of neutral solutes (89-91). Such osmotic-stress methods have so far been applied to only a few DNA-binding reactions, namely those of the restriction endonucleases EcoRI (92, 93), EcoRV, and PvuII (94), the gal repressor (95), Hin recombinase (96), the cyclic AMP receptor protein from E. coli (97), and the homeodomain-containing transcriptional activators Ultrabithorax and Deformed (98).

Formation of the complex between CRP and the C1 site in the lac promoter is accompanied by the release of 79 (±11) water molecules, while 56 (±10) water molecules are taken up when CRP is transferred from the C1 site to a nonspecific site (97). Depending on the neutral solute used, between 100 and 180 water molecules are released when gal repressor binds to the OI operator site (95). An additional 6 (±3) waters are released when the repressor is transferred from OI to OE, to which it binds with enhanced affinity. Interestingly, despite the close sequence similarity of their homeodomains, water activity differentially affects the DNA binding of Ultrabithorax and Deformed (98). Between 22 and 27 water molecules were released for DNA binding by Ultrabithorax, while only 5 water molecules were released when Deformed bound to its optimal sequence. On the other hand, the DNA sequence did not exert a strong effect on the magnitude of the water release associated with DNA binding by Ultrabithorax.

The osmotic-stress methods yield only the net number of water molecules released, that is, the difference between the water molecules released and those taken up, for example as a consequence of the exposure of additional surface area due to unfolding.
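An osmotic-stress analysis of this kind can be sketched as a linear fit of ln K_d against solution osmolality, using the commonly quoted relation d ln K_d / d[osmolal] = -ΔN_w / 55.5, where ΔN_w is the net number of waters released. The data set below is hypothetical and was generated for a net release of 79 waters only to show that the fit recovers the input; the function name is an assumption.

```python
import math

WATER_MOLALITY = 55.5  # mol of water per kg of water


def waters_released(osmolal, k_d):
    """Net waters released on binding, from the slope of ln(K_d) against osmolality."""
    ys = [math.log(k) for k in k_d]
    n = len(osmolal)
    x_mean = sum(osmolal) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(osmolal, ys))
    sxx = sum((x - x_mean) ** 2 for x in osmolal)
    slope = sxy / sxx
    # Binding that releases water is favored by osmotic stress,
    # so K_d falls with osmolality and the slope is negative.
    return -WATER_MOLALITY * slope


# Hypothetical data generated for a net release of 79 waters.
osmolal = [0.0, 0.25, 0.5, 0.75, 1.0]
k_d = [1e-9 * math.exp(-79.0 / WATER_MOLALITY * m) for m in osmolal]

if __name__ == "__main__":
    print(f"net waters released: {waters_released(osmolal, k_d):.1f}")
```

Because only the slope enters, the result is a net value: water taken up elsewhere in the complex lowers the apparent number released.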

4. Conformational Changes of the Protein on DNA Binding

Even though in some specific complexes the tightly packed interfaces between the protein and the DNA result from the docking of well-ordered, preexisting surfaces, in an increasing number of cases the conformations of both the protein and the DNA are found to change markedly in the complex. The DNA-binding reaction of the BHLH protein MASH-1, for instance, is characterized by a transition of the peptide from a largely unfolded to a mainly α-helical conformation (60). Even at concentrations well above the dimerization constant for the MASH-BHLH domain, where the HLH domain is stably folded, the basic region adopts an ordered conformation only upon binding DNA (39). A similar transition was observed with the DNA-binding domains of GCN4, Fos, and Jun, where the basic region undergoes a transition to an α-helical structure upon binding to DNA (99-101). In these cases, the association does not result from a simple alignment of rigid, complementary surfaces, but rather follows what is generally known as an "induced fit" mechanism (102).

Because of a reduction in water-accessible surface area, folding transitions that occur on DNA binding result in a negative heat capacity change ΔCp (see text above). Because experimental values for ΔCp are often too large to be accounted for simply by the reduction in water-accessible surface area in a rigid-body association, Spolar and Record (84) have suggested how to dissect the various contributions to the entropy change. Consider as an example the DNA-binding reaction of the BHLH domain of MASH-1. This reaction shows a strong temperature dependence for both the measured ΔH and TΔS, which compensate to make ΔG almost insensitive to temperature. A notable consequence is the existence of a temperature T_S at which TΔS changes sign, so that the total entropy change is zero. Therefore, the following equation holds at T_S:

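$$\Delta S(T_S) = \Delta S_{\mathrm{HE}}(T_S) + \Delta S_{\mathrm{rt}} + \Delta S_{\mathrm{PE}} + \Delta S_{\mathrm{other}} = 0$$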

The total change in entropy consists of a contribution ΔS_HE from the hydrophobic effect, the unfavorable entropic term ΔS_rt, due to the reduction in rotational and translational degrees of freedom on association, the contribution from the polyelectrolyte effect ΔS_PE, and ΔS_other, which results primarily from conformational changes in the protein and/or the DNA (84). ΔS_HE(T_S) can be calculated from measured thermodynamic data according to the equation

$$\Delta S_{\mathrm{HE}}(T_S) = 1.35\,\Delta C_p \ln\left(\frac{T_S}{386\ \mathrm{K}}\right)$$

For MASH-1, T_S was determined as 271 K, and a value of 357 cal mol-1 K-1 was calculated for ΔS_HE(T_S) (39). The polyelectrolyte effect could be estimated from the salt dependence as 50 cal mol-1 K-1 ((39); Meierhans, unpublished results), while ΔS_rt for a bimolecular reaction was taken to be -50 cal mol-1 K-1 (103, 104). The change in entropy resulting from local folding transitions coupled to DNA binding, ΔS_other, could therefore be calculated as -357 cal mol-1 K-1. This was interpreted to indicate that approximately 54 amino acid residues of the basic region are involved in the folding reaction, or 27 residues per BHLH subunit; this interpretation is supported by CD studies of the DNA-binding reaction of MASH-BHLH and nuclear magnetic resonance (NMR) studies of the BHLH protein E47 (39, 105).
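The entropy dissection for MASH-1 can be followed numerically with the short Python sketch below. It assumes the sign convention in which the hydrophobic term is favorable (positive) and the folding term is unfavorable (negative), and it assumes the empirical relation ΔS_HE = 1.35 ΔCp ln(T/386 K) for the hydrophobic term. The function names are illustrative and are not part of the cited analysis.

```python
import math


def hydrophobic_entropy(dcp, temp_k, t_ref=386.0):
    """dS_HE in cal mol^-1 K^-1 from the assumed relation 1.35 * dCp * ln(T / 386 K)."""
    return 1.35 * dcp * math.log(temp_k / t_ref)


def other_entropy(ds_he, ds_rt, ds_pe):
    """dS_other from the condition that all entropy terms sum to zero at T_S."""
    return -(ds_he + ds_rt + ds_pe)


if __name__ == "__main__":
    # MASH-1 values quoted in the text: dCp = -733 cal mol^-1 K^-1, T_S = 271 K.
    print(f"dS_HE estimated from dCp and T_S: {hydrophobic_entropy(-733.0, 271.0):.0f}")
    ds_other = other_entropy(ds_he=357.0, ds_rt=-50.0, ds_pe=50.0)
    print(f"dS_other: {ds_other:.0f}")
    # Entropy per residue implied by the quoted estimate of 54 residues.
    print(f"per residue: {ds_other / 54:.1f}")
```

The assumed relation returns about 350 cal mol-1 K-1 for these inputs, close to the quoted 357 cal mol-1 K-1; the small difference may reflect rounding of the input values. Because ΔS_rt and ΔS_PE cancel here, ΔS_other is equal in magnitude and opposite in sign to ΔS_HE.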

Comparison of the experimentally determined and the calculated ΔCp for the association reaction of the transcription factor GCN4 and DNA suggested that approximately 7 amino acid residues of the basic region of this basic-zipper protein underwent a transition from a random to an α-helical conformation (48). The high-resolution structures of uncomplexed Trp repressor and its specific DNA complex indicate that 16 residues of helix D are disordered in the free repressor and α-helical in the DNA complex (106-109) [it should be pointed out that this interpretation of the structural data has been challenged (35)]. The same value for the number of residues that change conformation upon DNA binding was obtained from thermodynamic data, from which ΔS_other was determined.

[Equation image tmp1E3-200 giving the value of ΔS_other is not available.]

Structural data suggest that conformational changes also occur for the DNA-binding reaction of the DNA-binding domain of glucocorticoid receptor (110-113). These results are in good agreement with thermodynamic measurements and indicate a folding transition in approximately 18 residues per protein subunit (25). Another example of a conformational change is provided by the Antennapedia homeodomain. An N-terminal extension that is flexible in the free protein becomes ordered on DNA binding and contacts the minor groove of the DNA (114-116). Unfortunately, there are no thermodynamic data available for this system.

Thermodynamic studies suggest conformational changes for the DNA-binding reactions of the following proteins: the Lac (117), Gal (118), and Mnt (84) repressors binding to their operator sequences, and RNA polymerase binding to the λ PR promoter.

For a number of DNA-binding proteins, complex formation is accompanied by changes in the tertiary or quaternary structure. For example, thermodynamic analysis of the DNA-binding reaction of the λ Cro protein, under conditions where it exists as a stably folded dimer in solution, indicated a relatively small ΔS_other of 18 cal mol-1 K-1 (7). Information from the crystal structures of both free and complexed Cro suggests that this entropy change may reflect changes in quaternary structure of the Cro dimer on DNA binding (119, 120).

Other proteins for which structural evidence exists for coupled structural changes on binding to DNA include the endonuclease EcoRV and the transcriptional activator CRP. The DNA-binding pocket of EcoRV is not accessible in the free protein, and the cleft between the two protein subunits is too narrow to accommodate the DNA. Consequently, the pocket must be opened up through a combination of tertiary and quaternary structural transitions (58). For CRP, the relative orientations of the amino- and the carboxy-terminal domains change significantly on DNA binding. However, the structures of the DNA complexes of CRP and EcoRV revealed that optimal shape complementarity between these proteins and DNA is achieved not only through conformational adaptations of the protein, but also through changes in the structure of the DNA.
