DNA:Protein Binding Specificity (Molecular Biology)

DNA-binding proteins that bind to DNA in a sequence-specific manner have evolved to deal with a problem not normally encountered by enzymes, namely that the substrate, a specific segment of DNA, is immersed in a sea of other DNA sequences on the same molecule that are chemically and structurally very similar to the specific substrate. This is a problem that is fundamentally different from the problem of the discrimination of various small molecular substrates by enzymes, which can be based at least in part on the size and shape of the substrates. More than 20 years ago, it was pointed out that the nonspecific DNA-binding sites might be poor substrates when compared to the specific site, but they nevertheless significantly reduce the concentration of the free protein, simply because they are present in huge molar excess (1, 2). All known site-specific DNA-binding proteins also have a finite affinity for the nonspecific sites (3, 4). In this context, nonspecific DNA-binding is defined as binding that is equiprobable at any particular point along the DNA. In practice, specific and nonspecific binding constants are often determined by measuring the apparent dissociation constants of the complex of a protein with DNA containing either a specific DNA site (S) or a sequence that is completely heterologous (NS):

Kd(S) = [P][S] / [P·S]          Kd(NS) = [P][NS] / [P·NS]


Table 1 compares the dissociation constants for specific and nonspecific binding for a number of different types of DNA-binding proteins. When referring to Table 1, it must be kept in mind that the apparent dissociation constants depend significantly on the conditions of the measurement, such as the temperature (5-15), pH (15-20), and the concentration and type of cations and anions (13, 14, 16, 20, 22). A second complication is that the stability of the specific complex in some cases depends on the DNA sequences flanking the specific binding site. For example, a 17-fold difference in dissociation constants is seen for complexes of the restriction enzyme EcoRV with DNA sequences that contain a cognate binding site but differ in their flanking sequences (20). In addition, the apparent dissociation constants of the protein complexes with nonspecific DNA depend on the length of the DNA probe. All values for Kd(NS) in Table 1 were corrected for the length of the DNA probe by dividing the measured Kd(NS) by twice the difference between the length of the probe and the length of the DNA-binding site of the specific complex.

Table 1. Specificities of Prokaryotic and Eukaryotic DNA-Binding Proteins

Protein                    Specific site   Kd(S)     Kd(NS)    −ΔΔG (kcal/mol)   Reference

Repressors
  lac                                      [image]   [image]   10.1              27
  λ cI                                     [image]   [image]   8.7               28
                                           [image]   [image]   7.7
                                           [image]   [image]   6.4
  λ Cro                                    [image]   [image]   6.6               29
                                           [image]   [image]   5.1               10
                           OR3             [image]   [image]   7.3
  trp                      Wild-type op.   [image]   [image]   5.8               8
  P22 Arc                  Left half-site  [image]   [image]   4.3               30
  MetJ                     MetBox          [image]   [image]   3.4               9

Restriction Endonucleases
  EcoRI                    GAATTC          [image]   [image]   10.3              23
  EcoRV                    GATATC          [image]   [image]   6.0               20

Transcription Factors
  CAP                      lac Promoter    [image]   [image]   6.8               31
  IHF                      λ attP H′       [image]   [image]   4.4               32
  TBP                      TATAAAAG        [image]   [image]   >4.6              33
  MEF-2C                   TATAAATA        [image]   [image]   >2.7              34, 35
  GCN4                     ATGACTCAT       [image]   [image]   2.0               13
                           ATGACGTCAT      [image]   [image]   2.2
  MASH-1                   CAGGTG          [image]   [image]   0.7               15, 36
  E12                      CAGGTG          [image]   [image]   1.5               36
  MyoD/E12                 CAGGTG          [image]   [image]   1.4               37

Table 1 shows that the dissociation constants of the specific complexes span a range of approximately six orders of magnitude. The tightest complexes have Kd(S) values that lie in the picomolar concentration range. Interestingly, the specific DNA complexes of bacterial proteins are generally more stable than the complexes of eukaryotic DNA-binding proteins, most probably due to the longer DNA-binding sites in the prokaryotic complexes.

The dissociation constants of the nonspecific complexes listed in Table 1, on the other hand, span only approximately four orders of magnitude. For the nonspecific complexes, the eukaryotic proteins bind to DNA more tightly than the prokaryotic ones. Therefore, the DNA-binding specificity (defined as Kd(NS)/Kd(S)) of prokaryotic DNA-binding proteins is, in most cases, significantly greater than that of eukaryotic transcription factors.
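The specificity ratio Kd(NS)/Kd(S) and the −ΔΔG column of Table 1 can be related to each other if one assumes the standard thermodynamic relation −ΔΔG = RT ln[Kd(NS)/Kd(S)]. The following minimal sketch makes that assumption explicit. The temperature of 298.15 K is an assumption, not a value given in the text (the measurements behind Table 1 were made under varying conditions), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def specificity_from_ddg(neg_ddg_kcal, temp_k=298.15):
    """Return Kd(NS)/Kd(S) for a given -ddG in kcal/mol.

    Assumes -ddG = R*T*ln(Kd(NS)/Kd(S)); temp_k is an assumed temperature.
    """
    return math.exp(neg_ddg_kcal / (R_KCAL * temp_k))


def ddg_from_specificity(ratio, temp_k=298.15):
    """Inverse conversion: return -ddG in kcal/mol for a given Kd(NS)/Kd(S)."""
    return R_KCAL * temp_k * math.log(ratio)


# -ddG values taken from Table 1
for name, neg_ddg in [("EcoRI", 10.3), ("lac repressor", 10.1), ("MASH-1", 0.7)]:
    ratio = specificity_from_ddg(neg_ddg)
    print(f"{name}: Kd(NS)/Kd(S) ~ {ratio:.1e}")
```

Under these assumptions, a −ΔΔG of about 10 kcal/mol corresponds to a specificity on the order of 10^7, whereas a value below 1 kcal/mol corresponds to only a few-fold preference for the specific site.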

It is interesting to consider these observations in the context of the size of both the bacterial and the mammalian genomes. The E. coli genome consists of approximately 4.6 × 10^6 base pairs. Of the proteins listed in Table 1, the restriction enzyme EcoRI displays the highest DNA-binding specificity (Kd(NS)/Kd(S)), with −ΔΔG = 10.3 kcal/mol. The specific DNA-binding site of EcoRI has the sequence GAATTC. Such a hexamer sequence would occur statistically approximately 1000 times in the E. coli genome. As a consequence, approximately 1.1 × 10^4 times more protein is bound to the specific DNA sites than to the nonspecific sites:

[P·S] / [P·NS] = (Kd(NS) / Kd(S)) × ([S] / [NS]) ≈ 1.1 × 10^4
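The arithmetic behind this estimate can be sketched as follows. The sketch assumes random DNA with equal base frequencies, one nonspecific site per base pair, and an E. coli genome size of 4.6 × 10^6 bp (a commonly quoted figure, used here as an assumption). The function names are hypothetical, and the specificity is back-calculated from the ratio of 1.1 × 10^4 quoted in the text rather than taken from a measurement.

```python
def expected_site_count(genome_bp, site_len):
    """Expected occurrences of one fixed sequence in random DNA
    with equal base frequencies (one strand, one orientation)."""
    return genome_bp / 4 ** site_len


def bound_ratio(specificity, n_specific, n_nonspecific):
    """Protein bound at specific sites relative to nonspecific sites.

    Assumes binding sites are in excess over protein, so that the amount
    bound to each class of site scales as (number of sites) / Kd.
    """
    return specificity * n_specific / n_nonspecific


GENOME_BP = 4.6e6  # assumed E. coli genome size in base pairs

n_gaattc = expected_site_count(GENOME_BP, 6)
print(f"expected GAATTC sites: {n_gaattc:.0f}")

# Specificity Kd(NS)/Kd(S) that would reproduce the quoted ratio of 1.1e4
implied_specificity = 1.1e4 * GENOME_BP / n_gaattc
print(f"implied Kd(NS)/Kd(S): {implied_specificity:.1e}")
print(f"check: {bound_ratio(implied_specificity, n_gaattc, GENOME_BP):.1e}")
```

With these inputs the expected number of GAATTC sites is on the order of 1000, in line with the estimate in the text.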

On the other hand, binding 50% of the time to a unique site in the mammalian genome would require a transcription factor with a specificity greater than 3 × 10^9 (the size of the mammalian genome in base pairs). Statistically, a minimal length of 16 bp is required to ensure that a given binding site is unique in the mammalian genome. Most transcription factors, however, bind to DNA sites that are too short to be unique. Proteins containing the basic helix-loop-helix (BHLH) motif, for example, bind to the sequence CAGGTG, which occurs approximately 7 × 10^5 times in a mammalian genome. The expression of MyoD, which recognizes DNA through a BHLH domain, can activate myogenesis in a wide variety of cell types, including myoblasts and fibroblasts (24, 25), while the BHLH protein MASH-1 promotes the differentiation of committed neuronal precursor cells (26). BHLH proteins need to bind to DNA with a specificity of approximately 4 × 10^3 in order to bind with equal probability to a nonspecific site and to one of the approximately 700,000 specific sites in the mammalian genome. But even then, MASH-1 would still activate transcription from MyoD target promoters and vice versa. Such arguments may be part of the explanation of why transcriptional regulation in higher organisms relies on multiprotein complexes with the potential for combinatorial interactions.
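The three numerical estimates in this paragraph follow from simple counting, sketched below under the assumption of random DNA with equal base frequencies and one nonspecific site per base pair. The function names are hypothetical; the genome size of 3 × 10^9 bp is the figure used in the text.

```python
GENOME_BP = 3e9  # mammalian genome size used in the text


def min_unique_length(genome_bp):
    """Smallest site length n with 4**n >= genome_bp, i.e. a sequence
    expected to occur at most once in random DNA of that size."""
    n = 1
    while 4 ** n < genome_bp:
        n += 1
    return n


def expected_site_count(genome_bp, site_len):
    """Expected occurrences of one fixed sequence of length site_len."""
    return genome_bp / 4 ** site_len


n_min = min_unique_length(GENOME_BP)
n_ebox = expected_site_count(GENOME_BP, 6)

# Specificity at which all CAGGTG sites together bind as much protein
# as all nonspecific sites together (equal probability of either outcome)
break_even_specificity = GENOME_BP / n_ebox

print(f"minimal unique site length: {n_min} bp")
print(f"expected CAGGTG sites: {n_ebox:.1e}")
print(f"break-even specificity: {break_even_specificity:.1e}")
```

The break-even specificity reduces to 4^6, independent of genome size, because both the number of hexamer sites and the number of nonspecific sites scale with the length of the genome.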
