High Mobility Group (HMG) Proteins (Molecular Biology)

The major proteins of chromosomes and chromatin are the histones, but other, nonhistone chromosomal components are also present, in much lower amounts. The most abundant and best characterized of these is the class of high mobility group (HMG) proteins. (The name is an operational definition coined when the proteins were first isolated in the 1970s as proteins that were soluble in 2% trichloroacetic acid or 5% perchloric acid and migrated with a high mobility in polyacrylamide gel electrophoresis.) HMG proteins in metazoans fall into three structurally and functionally distinct classes: the HMG 1 and 2 family, HMG 14 and 17, and the HMG I(Y) family. (The numbering is also a legacy from the past!) They appear to be present in all cells of higher eukaryotes. They are relatively abundant, being present in the nucleus on average at about 1 molecule per 10 to 15 nucleosomes [considerably less for HMG I(Y)]. All have intriguing properties, and their cellular roles are not fully understood, but they are likely to play important roles in gene expression. None of the three classes shows specificity for particular DNA sequences; instead, they bind to particular structures in DNA or to chromatin. Both the HMG 1 and 2 family and the HMG I(Y) family have "architectural" roles, and both bind primarily to DNA in the minor groove, although they are wholly unrelated in amino acid sequence and structure.

1. The HMG 1 and 2 Family

In vertebrates this class consists of HMG 1 (molecular mass ~25,000 Da) and two closely related forms of HMG 2, which are a few residues shorter. They are the products of homologous genes and show a high degree of evolutionary conservation. They have a tripartite structure consisting of two tandem homologous regions of about 80 amino acid residues (the two DNA-binding "HMG boxes") and a long acidic tail of about 30 (HMG 1) or 20 (HMG 2) consecutive aspartic and glutamic acid residues linked to the second box by a short basic region. HMG 1 and 2 bind to DNA with a "structure preference" rather than a sequence preference. They bind to bent or bulged DNA and to DNA kinked by the antitumor drug cisplatin, in preference to linear DNA; they also prefer four-way junctions, supercoiled DNA, and DNA minicircles, and they stabilize DNA loops (the juxtaposed ends probably resemble crossovers in supercoiled DNA). They bend linear DNA, as shown by the ability to cause circularization (in the presence of DNA ligase to join the ends) of short DNA fragments (~80 bp) that will not circularize unaided, and they constrain negative supercoils in relaxed circular DNA. The ability both to distort DNA and to recognize DNA distortions, which these properties reflect, is likely to underlie the in vivo role(s) of HMG 1 and 2.
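In sequence terms, the acidic tail described above is simply a long uninterrupted run of Asp (D) and Glu (E) residues. As a minimal sketch, the following Python helper finds the longest such run in a one-letter protein sequence. The function name and the toy sequence are hypothetical illustrations; the toy sequence is not the real HMG 1 sequence.

```python
import re

def longest_acidic_run(seq):
    """Return (start, length) of the longest run of consecutive D/E residues.

    start is a 0-based index into seq; returns (-1, 0) if seq has no D or E.
    """
    best_start, best_len = -1, 0
    for m in re.finditer(r"[DE]+", seq.upper()):
        length = m.end() - m.start()
        if length > best_len:
            best_start, best_len = m.start(), length
    return best_start, best_len

# Toy sequence (hypothetical, NOT real HMG 1): a 21-residue, mostly basic
# stretch followed by a 30-residue acidic tail, mimicking the layout in the text.
toy = "MGKGDPKKPRGKMSSYAKKKK" + "DE" * 15
start, length = longest_acidic_run(toy)
print(start, length)  # prints "21 30"
```

Applied to a real sequence, a run of roughly 30 residues near the C-terminus would be consistent with an HMG 1-like tail and roughly 20 with an HMG 2-like tail, although the run length alone is not a classifier.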


Relatively abundant HMG-box proteins that bind to DNA without sequence specificity also occur in yeast (NHP6A and B), insects (eg, HMG-D and HMG-Z in Drosophila; HMG1a and b in Chironomus), and plants (HMGa, etc.). In all these cases, the proteins contain only one HMG box and the acidic tail is often shorter than in vertebrate HMG1,2 (eg, 10 residues in HMG-D). All these proteins bind preferentially to distorted DNA substrates (although the Chironomus HMG 1a and 1b bind A-T-rich linear and four-way junction DNA with equal affinity). The HMG-box motif also occurs in a large and growing number of sequence-specific transcription factors, as a single copy, with no acidic tail, and embedded in unrelated sequence (5, 6). The first examples were the product of the male sex-determining gene on the mouse Y chromosome (SRY) and the lymphocyte enhancer binding protein (LEF-1). The specificity of these proteins for their DNA targets resides in the HMG box. They also bind to four-way DNA junctions, and they bend linear DNA containing their DNA binding sites (7). The estimated bend angles are ~130° for LEF-1 and ~90° for mouse and human SRY. Proteins with two or more HMG boxes bind DNA with relatively low sequence specificity. These include the RNA polymerase I transcription factor UBF (upstream binding factor; 4-6 HMG boxes, depending on source, and a very long acidic tail) and the mitochondrial factors mtTF1 and ABF2.

This diverse group of sequence-specific and non-sequence-specific HMG-box DNA-binding proteins is united by the ability to distort and bend DNA, which is probably central to their biological roles. They are regarded as "architectural elements in the assembly of nucleoprotein structures" (7). Their properties are analogous to those of the bacterial IHF (integration host factor) and HU proteins. Indeed, HMG-D, NHP6A, and HM can all functionally replace HU in Escherichia coli, and HMG 1 and 2 can functionally replace HU in facilitating assembly of the invertasome during Hin-mediated recombination.

Protein structures have been determined by NMR for the A and B HMG boxes of HMG 1, the HMG boxes of Drosophila HMG-D and the transcription factor Sox4, and for the HMG boxes of SRY and LEF-1 complexed with oligonucleotides. Although there are slight differences that might be significant, the protein folds are all generally very similar, showing that there is no gross change in the HMG-box structure on binding to DNA. The HMG box is L-shaped, the relative positions of the two arms being fixed by a tightly packed hydrophobic cluster; Figure 1 shows the A-box of HMG 1 (8). The "long" arm contains an N-terminal extended β-strand packed against α-helix III in the C-terminal half of the molecule; the "short" arm consists of α-helices I and II. DNA binding occurs on the concave face of the protein, which interacts in the minor groove of the DNA. The SRY/DNA and LEF-1/DNA structures (9, 10) show that the protein partially intercalates into the minor groove and causes the DNA to bend toward the major groove, away from the protein (Fig. 2). The minor groove is also opened up (becomes wider and shallower), forming a larger surface for interaction with the protein, and the DNA is locally untwisted. The protein thus sits on the outside of the bend, which is very reminiscent of the way in which the TATA box-binding protein TBP interacts with DNA. In both structures a bulky hydrophobic residue (Ile in SRY, Met in LEF-1) intercalates between base pairs near the center of the DNA to cause bending. Binding is stabilized by interaction of a basic region C-terminal to the box with the major groove opposite the widened minor groove. All HMG boxes are likely to bind to DNA in essentially the same way, except that the role of intercalation in the non-sequence-specific proteins is less clear. Sequence-specific and non-sequence-specific HMG boxes also show characteristic differences in residues at key positions in the N-terminal β-strand.

Figure 1. Structure of the HMG box in the A-domain of HMG1 (residues 11-83 of the protein) determined by NMR spectroscopy (8).

Figure 2. Structure of the LEF-1 HMG box complexed with DNA, determined by NMR spectroscopy (10).

The isolated A and B boxes of HMG 1 bend and distort DNA like the entire protein; there are slight differences between them that might be related to the small differences in structure between the two boxes (8). Tandem boxes have a higher affinity for DNA than single boxes and are more effective in DNA binding and bending; the basic region linking the boxes to the acidic tail further enhances the affinity (eg, Ref. 11). The acidic tail works in the opposite direction on most DNA substrates in vitro, lowering the affinity of the HMG boxes for DNA, possibly by general charge repulsion and possibly by interacting electrostatically with the DNA-binding face of the box(es). The acidic tail of HMG-D, however, increases the affinity and selectivity of the protein for minicircles (<100 bp), which are highly constrained and effectively present a "pre-bent" substrate (12); the same is true for HMG 1. Perhaps the tail stabilizes a conformation of the protein that is suited to recognition of a highly predistorted substrate.

Surprisingly little is known for certain about the binding of HMG 1 and 2 to chromatin. Early observations on micrococcal nuclease (Staphylococcal nuclease) digestion products of chromatin (3, 4) suggested that HMG 1 and 2 were associated with linker DNA, and that there was a population of HMG 1-containing nucleosomes lacking histone H1. This led to the suggestion that HMG 1 binds to sites normally occupied by H1, but the evidence for this is still circumstantial. However, binding of H1 and HMG 1 to nucleosomes reconstituted onto the Xenopus 5S rRNA gene resulted in similar protection of chromatosome-length DNA against micrococcal nuclease digestion (see Chromatin and Nucleosome), which was interpreted in terms of a shared structural role (13). The finding that, like HMG 1, H1 and its globular domain would also bind to synthetic four-way junctions (14) has been taken to reinforce this (the four-way junction is assumed to mimic the crossover of the entering and exiting duplexes around the dyad of the nucleosome). However, it is not clear that the two proteins are recognizing the same features of the four-way junction. Moreover, given the nature of H1 binding to the nucleosome (see Nucleosome or Chromatin), even if H1 and HMG 1 bind in the same general vicinity, the details of their interactions in chromatin would be expected to be very different. In vivo evidence for a shared functional role in at least some circumstances is more compelling. In Drosophila early embryos, where cell division is rapid, H1 is absent and the condensed chromosomes contain HMG-D (which has only one HMG box). At later stages, the chromosomes contain H1 rather than HMG-D, and gene transcription, which was previously absent, starts (15). This apparently reciprocal relationship suggests that H1 and HMG-D may have a common function at different stages of embryogenesis in Drosophila, although not necessarily an identical binding site on the nucleosome. Rigorous reconstitution studies are needed to settle the question of how HMG 1 and 2 bind to chromatin. It is, of course, possible that they function only at promoters or enhancers, where a nucleosome is absent. For example, by bending DNA they might facilitate an interaction, required for some functional purpose, between two proteins bound on either side of the kink; or they could facilitate the binding of a protein that binds best to a DNA bend, by pre-bending the DNA in an essentially catalytic role; or they might stabilize a DNA loop, such as might be formed to bring enhancer and promoter elements into proximity.

HMG 1 and 2 have been variously implicated in DNA replication, transcription, DNA repair, and recombination, and, given their versatility in DNA distortion and recognition of distorted DNA, this is not altogether surprising. In recent in vitro studies of V(D)J recombination of immunoglobulin genes (see Gene Rearrangement), HMG 1 and 2 were found to stimulate V(D)J cleavage (16) and to be components of a stable post-cleavage complex between synapsed recombination signals (17). An obvious question is whether HMG 1 and 2 have roles in transcriptional activation; distinct roles for the HMG boxes and the acidic tails might be envisaged. In vitro studies (for references see Ref. 2) have given conflicting answers. It has been suggested that activation might result from stabilization of an activated conformation of the TFIID-TFIIA initiation complex on the promoters of activated genes; or, based on in vitro binding assays, that HMG 1 and 2 might act indirectly to promote transcription by promoting the binding of various transcription factors to their cognate DNA binding sites, probably by bending the DNA. In vivo results (transfection experiments) show enhancement of transcription by HMG 1 and suggest that the acidic tail acts as a transactivation domain, although this is not the case for the heterologous mammalian HMG 1 expressed in yeast (2). However, HMG 1 and 2 have also been reported to repress transcription by RNA polymerase II in vitro, either by interacting with TBP in the presence of a TATA-box-containing oligonucleotide and thus preventing binding of TFIIB and formation of a preinitiation complex, or by acting later, after the assembly of the TBP-TFII promoter complex. There are now several examples of facilitation of transcription factor binding to DNA by HMG 1 and 2, and this seems likely to be important. For example, binding of the progesterone receptor to an oligonucleotide containing the progesterone response element is enhanced ~10-fold by HMG 1 and 2, and HMG 2 stimulates the sequence-specific binding of the octamer transcription factors Oct 1 and Oct 2 by interacting with the POU domains (2). HMG 1 stimulates DNA binding of human HOXD9, by interaction with the homeobox domain (18), and of the tumor suppressor p53 (19). Ternary complexes between the transcription factor, HMG protein, and DNA are likely, but in at least some cases the HMG protein must be only weakly bound, because such complexes cannot be detected by gel electrophoresis.

2. The HMG I(Y) Family

HMG I and Y are isoforms generated by alternative splicing, differing only in HMG Y (96 amino acid residues) having an 11-residue deletion compared with HMG I (107 residues). HMG C (105 residues) is the third member of this family and is related (~50% sequence identity) to HMG I(Y). Like HMG 1 and 2, they appear to function as "architectural" transcription factors, but the two classes are structurally unrelated. The proteins are subject to post-translational modifications, notably cell cycle-dependent reversible phosphorylation, probably by p34cdc2/cyclin ("cdc2 kinase"). They are expressed at higher levels (~15 to 50 times) in transformed and undifferentiated cells than in differentiated somatic cells, relatively independent of cellular growth rate; there is also a correlation with increased metastatic tumor potential. Further background and references may be found in Ref. 2.

The HMG I(Y) proteins have a distinctive primary structure containing three regions designated "A-T hooks," separated by flexible stretches of chain. HMG C is similar in the A-T hook regions, but much less so elsewhere. The A-T hook binds to the minor groove of A-T-rich sequences, occupying 5 or 6 bp; three tandem A-T hooks would therefore occupy 15 to 18 bp of continuous A-T DNA. The consensus sequence of the A-T hook motif is -Pro-Arg-Gly-Arg-Pro- flanked by basic residues; the motif also occurs in a number of unrelated DNA-binding proteins in a range of species. The polypeptide backbone structure is predicted to be similar in shape to the drugs distamycin A, netropsin, and Hoechst 33258, which bind to the minor groove of B-DNA and can displace HMG I(Y) from A-T-rich DNA. The structural similarity is borne out by NMR spectroscopy (see text below). In addition to minor groove binding (although in completely different ways), HMG I(Y) also shares with HMG 1 and 2 the property of binding to A-rich non-B-form DNA, such as four-way junctions, in preference to the corresponding duplexes (20). HMG I(Y) also bends DNA (but in this case bending is attributed simply to asymmetric charge neutralization on one face of the DNA duplex by the basic amino acid side chains in the A-T hook DNA-binding domains) and supercoils relaxed circular DNA in the presence of topoisomerase I, probably through strand unwinding, the mechanism of which is unclear. Removal of the negatively charged C-terminal domain increases supercoiling 8- to 10-fold, apparently without affecting the affinity of HMG I(Y) for A-T-rich DNA (2).
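The consensus and the footprint arithmetic above lend themselves to a short illustration. The following Python sketch scans a one-letter protein sequence for the Pro-Arg-Gly-Arg-Pro core and computes the expected footprint range for a given number of hooks. The function names, the toy sequence, and in particular the rule that "flanked by basic residues" means at least one Lys or Arg within three residues on each side are assumptions made for this example, not a published definition.

```python
import re

CORE = "PRGRP"  # Pro-Arg-Gly-Arg-Pro in one-letter code

def find_at_hooks(seq, flank=3):
    """Return 0-based start positions of PRGRP cores with basic flanks.

    Assumption of this sketch: a core counts only if at least one K or R
    lies within `flank` residues on each side of it.
    """
    seq = seq.upper()
    hits = []
    # Lookahead so that overlapping cores would also be reported.
    for m in re.finditer(r"(?=PRGRP)", seq):
        i = m.start()
        left = seq[max(0, i - flank):i]
        right = seq[i + len(CORE):i + len(CORE) + flank]
        if re.search(r"[KR]", left) and re.search(r"[KR]", right):
            hits.append(i)
    return hits

def footprint_bp(n_hooks, bp_per_hook=(5, 6)):
    """Return (min, max) base pairs covered by n_hooks tandem A-T hooks."""
    return n_hooks * bp_per_hook[0], n_hooks * bp_per_hook[1]

# Toy sequence (hypothetical, NOT a real HMG I(Y) sequence): the first core
# has basic flanks, the second is flanked only by alanines.
toy = "MSESKRPRGRPKGSAAAAPRGRPAAA"
print(find_at_hooks(toy))   # prints [6]
print(footprint_bp(3))      # prints (15, 18)
```

The second call reproduces the 15 to 18 bp figure quoted in the text for three tandem hooks.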

Earlier studies suggested a variety of roles for HMG I(Y) proteins (2). More recently, more direct experiments point to a role in transcriptional regulation (negative or positive) of a number of mammalian genes with A-T-rich promoter or enhancer sequences. In positive regulation, HMG I(Y) has been proposed to act as an "architectural" transcription factor that bends DNA and contacts components of a multiprotein complex, as well as promoting contacts between other members. A well-documented example (21) is the involvement of HMG I(Y) in the virus-induced expression of the human β-interferon gene (IFN-β). HMG I(Y) appears, through a combination of DNA bending and protein-protein interactions, to mediate the assembly of a complex containing HMG I(Y) and the transcription factors NF-κB and ATF-2/c-Jun, all bound simultaneously to two separate "positive regulatory domains" (PRDII and IV) of DNA in the 5′ promoter/enhancer region of the β-interferon gene. NF-κB binds to the major groove and HMG I(Y) to the minor groove. The structure of a 39-residue peptide containing the second and third DNA-binding domains complexed with a 12-bp oligonucleotide containing a 5-bp A-T tract from the PRDII element of the β-interferon enhancer has recently been determined by NMR spectroscopy (22) and reveals extensive hydrophobic and polar contacts in the minor groove centered around an Arg-Gly-Arg motif. In contrast to the HMG box-DNA complexes (see text above), the minor groove is not widened and the DNA is not bent. However, this is a short piece of DNA and, although the protein does not directly cause bending at its binding site, as the HMG box does, bending might well be induced outside the binding site and would be apparent only with a longer DNA segment. Principles similar to those involved at the β-interferon promoter are likely to prove a common feature of the role of HMG I(Y) in many systems, for example in the function of HIV-1 pre-integration complexes (23).

The situation with respect to the binding of HMG I(Y) to chromatin is not clear. The proteins will bind to nucleosome core particles in vitro, at up to four molecules per nucleosome, in a noncooperative manner. However, they bind to A-T-rich DNA in preference to mixed-sequence nucleosomes and, as pointed out (2), if A-T sequences were present in the linker DNA (or in a nucleosome-free region), that would probably be the preferred binding site for HMG I(Y) in chromatin. There is an intriguing possible connection between HMG I(Y) and chromatin structure through histone H1 (see Histones). In vitro, H1 will bind preferentially to A-T-rich fragments of naked DNA, possibly through "-SPKK-" (Ser-Pro-Lys-Lys) motifs in the C-terminal tail, and can be displaced by HMG I(Y). In the nucleus, scaffold attachment regions of chromatin are A-T-rich and have been suggested to be focal points for displacement of H1 by HMG I(Y), perhaps leading to "open chromatin" domains (24); in neoplastic cells, greatly increased levels of HMG I(Y) might contribute to altered patterns of gene expression. This is still speculation, but it is highly suggestive that HMG I(Y) co-localizes (as shown by immunofluorescence) with A-T-rich scaffold regions in metaphase chromosomes, which contain the interphase scaffold attachment regions brought together (25).
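The preference for A-T-rich DNA can be made concrete with a simple sliding-window scan. The Python sketch below reports every window of a DNA sequence whose A+T fraction meets a threshold. The function name, the default window of 6 bp (chosen to match the 5 to 6 bp occupied by one A-T hook), the threshold, and the toy sequence are all assumptions for illustration; real candidate sites would need experimental support.

```python
def at_rich_windows(dna, window=6, min_fraction=1.0):
    """Return 0-based starts of windows whose A+T fraction >= min_fraction."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - window + 1):
        segment = dna[i:i + window]
        fraction = sum(base in "AT" for base in segment) / window
        if fraction >= min_fraction:
            hits.append(i)
    return hits

# Toy duplex strand (hypothetical): a 6-bp A-T tract embedded in G-C sequence.
toy_dna = "GCGCAATTTTGCGC"
print(at_rich_windows(toy_dna))             # prints [4]
print(at_rich_windows(toy_dna, window=5))   # prints [4, 5]
```

Lowering `min_fraction` below 1.0 would admit windows that are merely enriched in A-T rather than pure A-T tracts.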

3. HMG 14 and 17

These evolutionarily related proteins are present in all tissues of most higher organisms. Their precise cellular roles are not understood, although various lines of evidence suggest that they facilitate transcription of chromatin, and it now appears that this is at the level of higher-order structure. Early work reported a correlation between DNase I sensitivity of chromatin, which itself correlates with transcriptional competence (see Chromatin), and the presence of HMG 14 and 17 (see Refs. 2 and 3 for background). HMG 14 and 17 contain, respectively, 98 and 89 amino acid residues and have a high content of lysine, alanine, and proline and a negatively charged C-terminal region (although there is no continuous run of acidic residues, as in HMG 1 and 2). They bind to nucleosomes in preference to free DNA through a 30-residue basic sequence that is conserved within the HMG 14 and HMG 17 classes, but differs in detail between the two (2). HMG 14 and 17, despite their common ancestry, are only 60% identical in sequence and are likely to perform different roles in the cell.
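A figure such as "60% identical" depends on how identity is counted. As a minimal sketch, the Python function below computes percent identity from two pre-aligned sequences of equal length, counting only columns in which neither sequence has a gap. That choice of denominator is an assumption of this example (other conventions divide by the shorter sequence or by the full alignment length), and the toy alignment is hypothetical, not taken from HMG 14 or HMG 17.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

# Toy alignment (hypothetical): 9 ungapped columns, 7 of them identical.
seq_a = "PKRKA-EGDA"
seq_b = "PKRKSAEGAA"
print(round(percent_identity(seq_a, seq_b), 1))  # prints 77.8
```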

Two copies of HMG 14 and 17 bind cooperatively to 146-bp nucleosome core particles (see Nucleosome) at roughly physiological ionic strength, presumably by recognizing particular structural features. In native chromatin, nucleosomes containing only HMG 14 or HMG 17 are clustered in runs of, on average, six nucleosomes, potentially giving functionally distinct domains along the chromatin fiber, the significance of which is not yet clear (26). The free proteins appear to be unstructured in solution but could well become structured upon interaction with the nucleosome core. Binding of HMG 14 is weakened by phosphorylation of Ser 6, which is an early event in the induction of immediate-early genes on mitogenic stimulation (27). Footprinting and cross-linking data have led to a model for the location of the two HMG 14 or 17 molecules on the core particle, in which the HMG proteins are bound by their basic N-terminal regions to sites 20 or 30 base pairs from the end of the core particle DNA and then loop under one of the DNA ends to contact, in each case, a major groove adjacent to the dyad on the central turn of DNA (2). This could explain why HMG 14 and 17 stabilize nucleosome core particles, by preventing charge repulsion that would lead to unraveling of the ends. In contrast, HMG 14 and 17 appear to destabilize chromatin higher-order structure in some way, although they do not prevent its formation (2), consistent with the facilitating effect of HMG 14 and 17 on transcription. When tested for activator function as fusions to DNA-binding proteins in yeast, HMG 14 and 17 did not act as classical transcription factors, despite an acidic C-terminal region. However, transcription from minichromosomes assembled in Xenopus or Drosophila embryo extracts was stimulated by HMG 14 and 17, in contrast with transcription from naked DNA, consistent with a role for HMG 14 and 17 in destabilizing chromatin structure (2). Enhancement of transcription from assembled SV40 minichromosomes by HMG 14 was interpreted as relief of H1-mediated repression, resulting in unfolding, for which the acidic C-terminal region of HMG 14 is needed (28). Specific interactions with the N-terminal tail of H3 may be involved (29), yet another role for the N-terminal tails (see Chromatin).
