Proteins that bind DNA in a sequence-specific manner have evolved to deal with a problem not normally encountered by enzymes: the substrate, a specific segment of DNA, is immersed in a sea of other DNA sequences on the same molecule that are chemically and structurally very similar to it. This problem is fundamentally different from the discrimination of small-molecule substrates by enzymes, which can be based at least in part on the size and shape of the substrates. More than 20 years ago, it was pointed out that the nonspecific DNA-binding sites may be poor substrates compared to the specific site, but they nevertheless significantly reduce the concentration of free protein, simply because they are present in huge molar excess (1, 2). All known site-specific DNA-binding proteins also have a finite affinity for nonspecific sites (3, 4). In this context, nonspecific DNA binding is defined as binding that is equiprobable at any particular point along the DNA. In practice, specific and nonspecific binding constants are often determined by measuring the apparent dissociation constants, Kd(S) and Kd(NS), of the complex of a protein with DNA containing either a specific DNA site (S) or a sequence that is completely heterologous (NS).
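Under the usual mass-action convention, these two apparent constants can be written out explicitly. The bracket symbols for the equilibrium concentrations of free protein (P), free DNA site (S or NS), and complex are assumed here for illustration and are not taken from the text:

```latex
K_d(\mathrm{S}) = \frac{[\mathrm{P}][\mathrm{S}]}{[\mathrm{P}\cdot\mathrm{S}]},
\qquad
K_d(\mathrm{NS}) = \frac{[\mathrm{P}][\mathrm{NS}]}{[\mathrm{P}\cdot\mathrm{NS}]}
```

A smaller dissociation constant corresponds to a more stable complex.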
Table 1 compares the dissociation constants for specific and nonspecific binding for a number of different types of DNA-binding proteins. When referring to Table 1, it must be kept in mind that the apparent dissociation constants depend significantly on the conditions of the measurement, such as temperature (5-15), pH (15-20), and the concentration and type of cations and anions (13, 14, 16, 20, 22). A second complication is that the stability of the specific complex depends in some cases on the DNA sequences flanking the specific binding site. For example, a 17-fold difference in the dissociation constants is seen for complexes of the restriction enzyme EcoRV with DNA sequences containing a cognate binding site with different flanking sequences (20). In addition, the apparent dissociation constants of protein complexes with nonspecific DNA depend on the length of the DNA probe. All values for Kd(NS) in Table 1 were corrected for the length of the DNA probe by dividing the measured Kd(NS) by twice the difference between the length of the probe and the length of the DNA-binding site of the specific complex.
Table 1. Specificities of Prokaryotic and Eukaryotic DNA-Binding Proteins
Protein | Specific Site | -ΔΔG (kcal/mol) | Reference
--- | --- | --- | ---
Repressors | | |
lac | | 10.1 | 27
λ cI | | 8.7 | 28
 | | 7.7 |
 | | 6.4 |
λ Cro | | 6.6 | 29
 | | 5.1 |
Table 1 shows that the dissociation constants of the specific complexes span a range of approximately six orders of magnitude. The tightest complexes have Kd(S) values that lie in the picomolar concentration range. Interestingly, the specific DNA complexes of bacterial proteins are generally more stable than the complexes of eukaryotic DNA-binding proteins, most probably due to the longer DNA-binding sites in the prokaryotic complexes.
The dissociation constants of the nonspecific complexes listed in Table 1, on the other hand, span only approximately four orders of magnitude. For the nonspecific complexes, the eukaryotic proteins bind to DNA more tightly than the prokaryotic ones. Therefore, the DNA-binding specificity (defined as Kd(NS)/Kd(S)) of prokaryotic DNA-binding proteins is, in most cases, significantly greater than that of eukaryotic transcription factors.
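The specificity ratio defined above can be related to the free-energy column of Table 1. The sketch below assumes the standard thermodynamic relation -ΔΔG = RT ln(Kd(NS)/Kd(S)) and a temperature of 298.15 K; neither assumption is stated in the text, and the function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def specificity_from_neg_ddg(neg_ddg_kcal, temp_k=298.15):
    """Specificity Kd(NS)/Kd(S) implied by a -ΔΔG value in kcal/mol.

    Assumes -ΔΔG = RT * ln(Kd(NS)/Kd(S)); the temperature is an assumption.
    """
    return math.exp(neg_ddg_kcal / (R_KCAL * temp_k))


def neg_ddg_from_specificity(specificity, temp_k=298.15):
    """Inverse relation: -ΔΔG (kcal/mol) for a given Kd(NS)/Kd(S) ratio."""
    return R_KCAL * temp_k * math.log(specificity)


if __name__ == "__main__":
    # -ΔΔG values (kcal/mol) as they appear in Table 1
    for label, value in [("lac", 10.1), ("lambda cI", 8.7), ("lambda Cro", 6.6)]:
        ratio = specificity_from_neg_ddg(value)
        print(f"{label}: Kd(NS)/Kd(S) ~ {ratio:.1e}")
```

Because the ratio depends exponentially on -ΔΔG, a difference of roughly 1.4 kcal/mol at this temperature corresponds to about one order of magnitude in specificity.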
It is interesting to consider these observations in the context of the size of both the bacterial and the mammalian genomes. The E. coli genome consists of [...]. Of the proteins listed in Table 1, the restriction enzyme EcoRI displays the highest DNA-binding specificity (Kd(NS)/Kd(S)). The specific DNA-binding site of EcoRI has the sequence GAATTC. Such a hexamer sequence would occur statistically approximately 1000 times in the E. coli genome. As a consequence, approximately 1.1 x 10^4 times more protein is bound to the specific DNA sites than to the nonspecific sites.
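The counting argument in this paragraph can be sketched in a few lines. The example assumes a random genome with equal base frequencies, an E. coli genome size of about 4.6 x 10^6 bp (a commonly quoted figure, not taken from the text), and binding far from saturation; the specificity value used in the demonstration is hypothetical and is not the EcoRI value from Table 1.

```python
def expected_site_count(genome_bp, site_len):
    """Expected occurrences of one fixed sequence of length site_len in a
    random genome with equal base frequencies."""
    return genome_bp / 4 ** site_len


def bound_ratio(specificity, n_specific, n_nonspecific):
    """Ratio of protein bound at specific sites to protein bound at
    nonspecific sites, assuming all sites are far from saturation.

    The amount bound to each class scales as (number of sites) / Kd, so the
    ratio is (n_specific / Kd(S)) / (n_nonspecific / Kd(NS)), which equals
    specificity * n_specific / n_nonspecific.
    """
    return specificity * n_specific / n_nonspecific


if __name__ == "__main__":
    ecoli_genome_bp = 4.6e6  # assumed genome size, not taken from the text
    n_sites = expected_site_count(ecoli_genome_bp, 6)  # GAATTC is a hexamer
    print(f"expected GAATTC sites: {n_sites:.0f}")

    hypothetical_specificity = 1e7  # illustrative value only
    ratio = bound_ratio(hypothetical_specificity, n_sites, ecoli_genome_bp)
    print(f"specific/nonspecific occupancy: {ratio:.1e}")
```

The expected hexamer count of roughly 1100 under these assumptions is consistent with the "approximately 1000" quoted above.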
On the other hand, to bind 50% of the time to a unique binding site in the mammalian genome would require that the specificity of a transcription factor be > 3 x 10^9 (the size of the mammalian genome in base pairs). Statistically, a minimal length of 16 bp is required to ensure that a given binding site is unique in the mammalian genome. Most transcription factors bind, however, to DNA sites that are too short to be unique. Proteins containing the basic helix-loop-helix (BHLH) motif, for example, bind to the sequence CAGGTG, which occurs approximately 7 x 10^5 times in a mammalian genome. The expression of MyoD, which recognizes DNA through a BHLH domain, can activate myogenesis in a wide variety of cell types including myoblasts and fibroblasts (24, 25), while the BHLH protein MASH-1 promotes the differentiation of committed neuronal precursor cells (26). BHLH proteins need to bind to DNA with a specificity of approximately 4 x 10^3 in order to bind with equal probability to a nonspecific site and to one of the approximately 700,000 specific sites in the mammalian genome. But even then, MASH-1 would still activate transcription from MyoD target promoters and vice versa. Such arguments may be part of the explanation of why transcriptional regulation in higher organisms relies on multiprotein complexes with the potential for combinatorial interactions.
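The figures quoted in this paragraph follow from simple counting, as the sketch below illustrates. It assumes a random genome with equal base frequencies and treats every base pair as the start of one nonspecific site; the function names are illustrative only.

```python
def min_unique_site_length(genome_bp):
    """Smallest site length n for which 4**n is at least genome_bp, so that
    a given n-mer is expected to occur at most once in a random genome."""
    n = 1
    while 4 ** n < genome_bp:
        n += 1
    return n


def equal_occupancy_specificity(n_nonspecific, n_specific):
    """Specificity Kd(NS)/Kd(S) at which the protein is as likely to be
    bound at a nonspecific site as at one of the specific sites."""
    return n_nonspecific / n_specific


if __name__ == "__main__":
    genome_bp = 3e9  # mammalian genome size used in the text
    print("minimal unique site length (bp):", min_unique_site_length(genome_bp))

    hexamer_sites = genome_bp / 4 ** 6  # expected CAGGTG occurrences
    print(f"expected hexamer sites: {hexamer_sites:.1e}")

    # Using the rounded site count quoted in the text (about 700,000)
    needed = equal_occupancy_specificity(genome_bp, 7e5)
    print(f"specificity for equal occupancy: {needed:.1e}")
```

For a single unique site the same function reduces to the genome size itself, which is the > 3 x 10^9 threshold given at the start of the paragraph.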