Bithorax Complex (Molecular Biology)

The homeotic genes of the bithorax complex (BX-C) have been the subject of many landmark discoveries in the field of developmental biology. Homeotic genes were first identified in Drosophila through mutations that affect their expression. These mutations have spectacular effects on the morphology of the fly: they cause a specific body structure (ie, segment, antenna, leg, wing) to develop in the place where another structure normally develops. The first homeotic mutations, bithorax (bx) and bithoraxoid (bxd), were described in 1923 by Bridges and Morgan (1). For example, in flies homozygous for bxd mutations, the first abdominal segment (A1) develops like a copy of the segment immediately anterior to it, namely the third thoracic segment (T3). Thus, the role of the bxd+ function is to assign the identity of A1. In the late 1940s, Ed Lewis pioneered the field of developmental genetics by discovering additional homeotic mutations near bx and bxd. Although these mutations affect different body segments of the thorax and abdomen, their complementation patterns turned out to be rather complex. Because they all mapped to the same genetic and cytological position, Lewis named the locus the bithorax complex (BX-C). Lewis's work was compiled in a milestone article in 1978 (2) describing a series of BX-C homeotic mutations that affect the identities of the third thoracic segment and all of the abdominal segments. A mutation in any of them transforms the affected segment into a copy of the segment immediately anterior to it. Remarkably, genetic mapping revealed that these mutations are arranged on the chromosome in the same order as the segments they affect along the anteroposterior axis. By combining three mutations, Lewis produced flies with four wings instead of two (transformation of T3 into T2).
Because such animals resemble more ancestral insect forms, Lewis proposed that homeotic genes played an instrumental role in evolution. The correspondence between the order of the genes on the chromosome and the order of the segments along the body of the fly has since attained an almost mystical status, and it has turned out to hold for the vertebrate homologues of BX-C as well. The Nobel Prize Committee recognized this pioneering work by awarding the 1995 Nobel Prize in Physiology or Medicine to Ed Lewis and two other Drosophila geneticists (C. Nüsslein-Volhard and E. Wieschaus).


In 1978, BX-C was the first Drosophila locus to be cloned directly from the chromosome without any prior knowledge of its products. To clone what turned out to be a large complex, Bender, Spierer, and Hogness developed the method of chromosome walking (3). Finally, in 1983, the molecular characterization of the BX-C led to the discovery of the homeobox in the laboratories of W. Gehring and M. Scott (4, 5).

1. Molecular Genetics of BX-C

Figure 1 summarizes the molecular genetics of the BX-C. The complex covers 300 kbp of DNA, represented by the thin horizontal line marked off in kb (6, 7). Above the DNA line are the sites of the homeotic mutations that affect the identity of each of the segments under the control of the BX-C. The vertical arrows represent the sites of chromosomal rearrangement breaks, the triangles the sites of transposon insertions, and the horizontal lines the extent of deletions. Expression studies and analysis of the homeotic phenotypes in embryos have revealed that the unit transformed in each of these nine classes of mutations does not correspond to a body segment; instead, it is composed of the posterior compartment of one segment and the anterior compartment of the next. These units are named parasegments (PSs) (8). For example, bxd mutations transform the posterior part of T3 (pT3) and the anterior part of A1 (aA1) into pT2 and aT3, which corresponds to the transformation of parasegment 6 (PS6) into PS5. The mutations affecting parasegment identity form nine discrete entities that, as predicted by Lewis, are aligned on the chromosome in the same order as the parasegments in which they act (abx/bx, bxd/pbx, iab-2 through iab-8). These mutations define nine PS-specific functions; the arrows in Fig. 1 point toward the parasegments of the adult fly that are most affected in each class of mutations. For reasons that will become clear later, it is worth noting that all the mutations affecting the PS-specific functions are due to chromosomal rearrangements (more than 100 have been mapped). Thus, these functions apparently cannot be disrupted by point mutations, and they are unlikely to correspond to individual genes coding for distinct proteins.
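The parasegment bookkeeping described above can be made concrete in a short sketch. It assumes only what the text states: PS5 = pT2 + aT3 and PS6 = pT3 + aA1, each parasegment spanning the posterior compartment of one segment and the anterior compartment of the next, and the nine PS-specific functions lie on the chromosome in parasegment order. The names `parasegment_compartments`, `PS_FUNCTIONS`, `is_colinear`, and `loss_of_function` are hypothetical, and `loss_of_function` encodes only the simplified one-step rule (PS n develops like PS n-1).

```python
from __future__ import annotations

# Illustrative model only: function and variable names are hypothetical.

# Thoracic and abdominal segments, anterior to posterior.
SEGMENTS = ["T1", "T2", "T3"] + [f"A{i}" for i in range(1, 10)]


def parasegment_compartments(ps: int) -> tuple[str, str]:
    """Return the (posterior, anterior) compartments that make up parasegment ps.

    Anchored on PS5 = pT2 + aT3 and PS6 = pT3 + aA1, as in the text;
    this sketch only covers PS4 to PS14.
    """
    if not 4 <= ps <= 14:
        raise ValueError("this sketch only covers PS4 to PS14")
    return "p" + SEGMENTS[ps - 4], "a" + SEGMENTS[ps - 3]


# The nine PS-specific functions in chromosomal order, with the PS each one controls.
PS_FUNCTIONS = [
    ("abx/bx", 5), ("bxd/pbx", 6), ("iab-2", 7), ("iab-3", 8), ("iab-4", 9),
    ("iab-5", 10), ("iab-6", 11), ("iab-7", 12), ("iab-8", 13),
]


def is_colinear(functions: list[tuple[str, int]]) -> bool:
    """True if chromosomal order matches the anteroposterior order of the parasegments."""
    ps = [p for _, p in functions]
    return all(a < b for a, b in zip(ps, ps[1:]))


def loss_of_function(region: str) -> tuple[int, int]:
    """Simplified rule: the affected PS develops like the PS immediately anterior to it."""
    ps = dict(PS_FUNCTIONS)[region]
    return ps, ps - 1
```

For example, `loss_of_function("bxd/pbx")` returns `(6, 5)`, and `parasegment_compartments(6)` returns `("pT3", "aA1")`, matching the bxd example in the text.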

Figure 1. Genetic map of the bithorax complex (BX-C). The 300 kilobases of DNA that comprise the complex are indicated by the bottom line, which is marked off in units of 10^4 base pairs. Above the DNA line are represented the sites of the homeotic mutations that affect the identities of each of the segments under control of BX-C. The vertical arrows represent the sites of chromosomal rearrangement breaks, the triangles the sites of insertion of transposons, and the horizontal lines the extent of deletions. The parasegments (PS5 to PS13, and PS14) that each part of the complex controls are indicated above, in the adult fly. The three transcription units (Ubx, abd-A, and Abd-B) are indicated below the DNA; in each case, transcription occurs from right to left. The exons are indicated by the thick horizontal lines, the introns by the thin V's connecting them. H indicates the homeobox domain. The alternative promoters alpha and gamma are indicated for Abd-B.

Northern blots and the isolation and sequencing of complementary DNA revealed the existence of only three transcription units: Ubx, abd-A, and Abd-B. All three are transcribed from right to left (in Fig. 1) and contain a homeobox sequence at their 3′ end. The Ubx transcription unit covers 70 kb of DNA and can generate 12 different transcripts by alternative splicing and polyadenylation. Translation of these messenger RNAs yields a family of six proteins characterized by constant amino- and carboxy-proximal regions of 247 and 99 amino acid residues, respectively. The homeobox sequence, which lies in the carboxy-proximal region, is encoded by the 3′-terminal common exon. The members of this family are distinguished by a short variable region that links the constant regions and consists of different combinations of three optional elements of 9, 17, and 17 residues (9-11). The spectrum of RNA products changes with time and tissue. It has been demonstrated that the Ubx protein containing the portion encoded by the second microexon is not expressed in the nervous system (12). However, the different isoforms of the Ubx product are not all essential, since a mutation that eliminates the second microexon (and thus four isoforms) has no effect on fly viability and development (13).
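The residue counts above fix the possible sizes of the Ubx isoforms, and a few lines of arithmetic make this explicit. The sketch enumerates every way of including or omitting the three optional elements (2^3 = 8 combinations) and adds the 247- and 99-residue constant regions. The text reports six proteins but does not say which two combinations are unused, so all eight are listed; the element labels `opt9`, `opt17a`, and `opt17b` are hypothetical placeholders, not the published names of the exons.

```python
from __future__ import annotations

from itertools import combinations

# Residue counts stated in the text.
N_CONSTANT = 247  # constant amino-proximal region
C_CONSTANT = 99   # constant carboxy-proximal region (contains the homeobox-encoded part)

# Hypothetical labels for the three optional elements of the variable region.
OPTIONAL = {"opt9": 9, "opt17a": 17, "opt17b": 17}


def variable_regions() -> list[tuple[str, ...]]:
    """All 2**3 = 8 ways of including or omitting the three optional elements."""
    names = list(OPTIONAL)
    return [c for k in range(len(names) + 1) for c in combinations(names, k)]


def isoform_length(elements: tuple[str, ...]) -> int:
    """Length in residues of an isoform carrying the given optional elements."""
    return N_CONSTANT + C_CONSTANT + sum(OPTIONAL[e] for e in elements)


# Possible isoform lengths, shortest to longest.
lengths = sorted(isoform_length(c) for c in variable_regions())
```

The resulting lengths range from 346 residues (no optional element) to 389 residues (all three), so any of the six reported isoforms must fall within that range.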

The abd-A transcription unit is spread over a 20-kb region of DNA, and the mature abd-A transcript is composed of at least eight exons (14). As in Ubx, the abd-A transcription unit contains a microexon, but no alternative splicing generating multiple forms of the abd-A product has been detected thus far. The homeobox homology is found near the middle of the 330-residue protein. Analysis of the sequence of the whole BX-C (15) indicates that the open reading frame extends further upstream from the published ATG initiation codon, raising the possibility that the abd-A protein is larger than originally thought. Inasmuch as this upstream open reading frame is conserved in Tribolium, the abd-A protein likely consists of 590 residues (16).

Finally, the Abd-B transcription unit produces three classes of transcripts, generated by the use of three alternative promoters and differential splicing (Fig. 1). While the alpha class of transcripts produces a 55-kDa protein, the beta and gamma classes produce a 30-kDa product that is truncated at the N-terminus (17, 18).

Mutations that affect the Ubx, abd-A, and Abd-B transcription units (shown below the DNA line in Fig. 1) are all lethal at the embryonic stage, and cuticle analysis of the dead embryos reveals homeotic transformations. In Ubx mutants, PS5 and PS6 are transformed into PS4. If such an embryo could survive, it would give rise to a fly with T3 and A1 transformed into T2 (ie, a fly with three pairs of wings). In homozygous abd-A embryos, PS7, 8, and 9 are transformed into PS6 (in the adult, this would correspond to a transformation of A2, A3, and A4 into A1). Finally, in Abd-B mutants, PS10, 11, 12, and 13 are transformed into PS9 (in the adult, A5-A8 transformed into A4) (2, 7, 19, 20).
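The three null phenotypes just described can be tabulated to show a pattern implicit in the data: the transformed domains tile PS5 to PS13 without overlap, and each domain takes on the identity of the parasegment immediately anterior to it. The sketch below encodes only the transformations listed in the text; the dictionary layout and the helper names (`transformed_identities`, `domains_tile_ps5_to_ps13`, `targets_just_anterior`) are hypothetical.

```python
from __future__ import annotations

# Parasegments transformed in homozygous mutant embryos and the parasegment
# they come to resemble (data from the text; the structure is illustrative).
NULL_PHENOTYPES = {
    "Ubx": (range(5, 7), 4),      # PS5-PS6 develop like PS4
    "abd-A": (range(7, 10), 6),   # PS7-PS9 develop like PS6
    "Abd-B": (range(10, 14), 9),  # PS10-PS13 develop like PS9
}


def transformed_identities(gene: str) -> dict[int, int]:
    """Map each affected parasegment to the parasegment it develops like."""
    domain, target = NULL_PHENOTYPES[gene]
    return {ps: target for ps in domain}


def domains_tile_ps5_to_ps13() -> bool:
    """True if the three mutant domains together cover PS5-PS13 exactly once each."""
    covered = sorted(ps for domain, _ in NULL_PHENOTYPES.values() for ps in domain)
    return covered == list(range(5, 14))


def targets_just_anterior() -> bool:
    """True if every domain is transformed into the PS immediately anterior to it."""
    return all(target == domain[0] - 1 for domain, target in NULL_PHENOTYPES.values())
```

For instance, `transformed_identities("abd-A")` gives `{7: 6, 8: 6, 9: 6}`.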

2. Expression

The genetic and molecular data described thus far appear to conflict. On the one hand, genetic analysis reveals nine parasegment-specific functions responsible for the identities of PS5 to 13, which form the posterior thorax and the abdominal segments of the adult fly. On the other hand, molecular studies indicate that BX-C encodes only three protein products. This apparent discrepancy was resolved when antibodies directed against the Ubx, abd-A, and Abd-B products became available, making it possible to determine in which parts of the embryo these genes are expressed. A first observation from these studies is that each of the Ubx, abd-A, and Abd-B genes is expressed in a domain composed of several parasegments. Ubx is expressed from PS5 to 13 (9, 21), abd-A from PS7 to 12 (14, 22), and Abd-B from PS10 to 14 (23, 24). Second, the expression patterns are complex, intricate, and dynamic. Comparisons of the expression patterns in wild-type embryos and in embryos carrying mutations in the PS-specific functions revealed that these functions correspond to large cis-regulatory regions that build the complex expression patterns of Ubx, abd-A, and Abd-B. The abx/bx and bxd/pbx cis-regulatory regions are responsible for UBX expression in PS5 and 6, respectively (21, 25, 26). The iab-2, iab-3, and iab-4 cis-regulatory regions control expression of ABD-A in PS7, 8, and 9, respectively, while iab-5, iab-6, iab-7, and iab-8 are responsible for the pattern of ABD-B expression (initiated from the alpha promoter) in PS10 to 13 (14, 24, 27). PS14 expresses a truncated form of ABD-B, resulting from transcription initiated from the beta and gamma promoters.
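The resolution of the apparent conflict can be summarized as two lookup tables: one for the expression domain of each gene and one assigning each PS-specific cis-regulatory region to a target gene and parasegment. The sketch below uses only the domains and assignments given in the paragraph above; the table and function names are hypothetical. Note that PS14 falls inside the Abd-B domain but is controlled by none of the listed regions; the text attributes its truncated ABD-B to the beta and gamma promoters.

```python
from __future__ import annotations

# Expression domains as half-open parasegment ranges (PS5-13, PS7-12, PS10-14).
EXPRESSION = {
    "Ubx": range(5, 14),
    "abd-A": range(7, 13),
    "Abd-B": range(10, 15),
}

# PS-specific cis-regulatory region -> (target gene, parasegment), per the text.
CIS_REGIONS = {
    "abx/bx": ("Ubx", 5), "bxd/pbx": ("Ubx", 6),
    "iab-2": ("abd-A", 7), "iab-3": ("abd-A", 8), "iab-4": ("abd-A", 9),
    "iab-5": ("Abd-B", 10), "iab-6": ("Abd-B", 11),
    "iab-7": ("Abd-B", 12), "iab-8": ("Abd-B", 13),
}


def genes_expressed_in(ps: int) -> set[str]:
    """BX-C genes whose expression domain includes parasegment ps."""
    return {gene for gene, domain in EXPRESSION.items() if ps in domain}


def regions_for(gene: str) -> list[str]:
    """Cis-regulatory regions that drive a given gene, in chromosomal order."""
    return [region for region, (target, _) in CIS_REGIONS.items() if target == gene]


# Each region's parasegment should lie within its target gene's expression domain.
consistent = all(ps in EXPRESSION[gene] for gene, ps in CIS_REGIONS.values())
```

Querying `genes_expressed_in(10)` returns all three genes, which illustrates why each parasegment is a mosaic of cells expressing different homeotic products.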

3. BX-C Regulation

BX-C gene regulation can be divided into two phases, initiation and maintenance. During the early phases of embryogenesis, when parasegment identity is initially selected, the PS-specific cis-regulatory regions are the targets of the gap gene and pair-rule gene products (28-31). These gap and pair-rule proteins activate the cis-regulatory regions in successively more posterior parasegments.

The gap and pair-rule gene products are present only transiently during early development. Because homeotic genes are expressed throughout development, there must be a mechanism that maintains the activity state of each cis-regulatory region. This maintenance system requires the Polycomb group (Pc-G) and trithorax group (trx-G) genes (32-34). The Pc-G products function as negative regulators, whereas the trx-G products act as positive regulators. The Pc-G products exert their regulatory effects by interacting with specific elements in each of the cis-regulatory domains, called Polycomb response elements (35-39). There may be equivalent or overlapping trx response elements for the trx-G proteins (36, 40). Although their precise mode of action is unknown, the Pc-G and trx-G products are thought to stabilize the expression pattern in each parasegment by imprinting an inactive or active chromatin conformation on the PS-specific cis-regulatory subregions (33, 41, 42).
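The division into initiation and maintenance can be illustrated with a toy Boolean model of a single cis-regulatory region. This is an illustrative assumption, not a published model. `initiate` assumes that the gap and pair-rule products switch the region on in its own parasegment and in all more posterior ones. `maintain` assumes that, once those products have disappeared, trx-G proteins keep active states on and Pc-G proteins keep inactive states off. Both function names are hypothetical.

```python
from __future__ import annotations


def initiate(threshold: int, parasegments: range) -> dict[int, bool]:
    """Initiation (toy assumption): the region is switched on in every PS >= threshold."""
    return {ps: ps >= threshold for ps in parasegments}


def maintain(initial: dict[int, bool], pc_g: bool = True, trx_g: bool = True) -> dict[int, bool]:
    """Maintenance after the gap and pair-rule products have disappeared.

    Toy rule: trx-G keeps active states on and Pc-G keeps inactive states off.
    Without trx-G, active states are lost; without Pc-G, inactive states are
    derepressed.
    """
    return {ps: (trx_g if active else not pc_g) for ps, active in initial.items()}
```

With both groups present, `maintain(initiate(7, range(5, 14)))` simply preserves the initial pattern (a cellular memory). Setting `pc_g=False` switches the region on in every parasegment, a crude analogue of the anterior expansion of expression seen when repression is not maintained.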

4. Regulatory Elements of BX-C

Molecular studies using reporter gene constructs have revealed elements within the PS-specific cis-regulatory units that appear to be responsible for the initiation and maintenance phases of BX-C regulation. Some DNA fragments can initiate expression of a Ubx-lacZ reporter gene in the proper parasegments during early embryonic development (28, 30, 43, 44). In most cases, however, these patterns are not maintained, and expression expands into more anterior parasegments around the time when BX-C regulation would switch to the maintenance mode. Other BX-C DNA fragments retain the appropriate parasegmental restrictions of lacZ expression after the gap and pair-rule gene products disappear. These fragments contain "maintenance elements," also known as Pc-G response elements because their activity depends on Pc-G gene products (35-39, 43, 44). Finally, a third type of regulatory element identified in experiments with Ubx-lacZ reporter constructs consists of tissue- or cell-type-specific enhancers. These induce lacZ expression in specific tissues or cell types, with no restriction along the anteroposterior axis.

Many observations suggest that the PS-specific cis-regulatory units are organized into functionally independent domains. This is best illustrated by the expression patterns of "enhancer trap" transposons integrated in different domains of the complex (45, 46). These enhancer traps are subject to regulatory elements located within the same domain, but they are insensitive to regulatory elements in adjacent domains. The autonomy of each domain is ensured by elements that are believed to function as boundaries. Two such regulatory elements, Mcp and Fab-7, have been identified. Mcp is located between the iab-4 and iab-5 cis-regulatory units or domains, while Fab-7 is located between iab-6 and iab-7 (46-50).
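The insulating role attributed to boundary elements can be pictured with a toy one-dimensional model: an enhancer trap responds to every regulatory element it can reach along the chromosome without crossing a boundary. Only the names Mcp and Fab-7 come from the text. The element labels, the layout of the iab-4/iab-5 region, and the function `visible_elements` are hypothetical placeholders, and real boundary function is certainly not this simple.

```python
from __future__ import annotations

# Boundary elements named in the text.
BOUNDARIES = {"Mcp", "Fab-7"}


def visible_elements(layout: list[str], trap: int) -> list[str]:
    """Regulatory elements that an enhancer trap at index `trap` can respond to.

    Toy rule: scan outward in both directions and stop at the first boundary,
    so only elements in the trap's own domain are counted.
    """
    seen = []
    for step in (-1, 1):
        i = trap + step
        while 0 <= i < len(layout) and layout[i] not in BOUNDARIES:
            seen.append(layout[i])
            i += step
    return sorted(seen)


# Hypothetical layout of the iab-4/iab-5 region; element labels are placeholders.
LAYOUT = ["iab-4:enhancer", "iab-4:PRE", "Mcp", "iab-5:enhancer", "TRAP", "iab-5:PRE"]
TRAP = LAYOUT.index("TRAP")
```

With Mcp in place, `visible_elements(LAYOUT, TRAP)` returns only the two iab-5 elements. Removing "Mcp" from the layout lets the trap also see the iab-4 elements, which is the kind of loss of domain autonomy a boundary is thought to prevent.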

5. Concluding Remarks

Molecular analysis of the BX-C has confirmed most of the predictions of Lewis's 1978 model. He had initially envisioned the activation of a new gene product in each parasegment. It is now clearly established that BX-C encodes only three major groups of related protein products (UBX, ABD-A, and ABD-B). Discrete genetic units do exist, however, and they are sequentially activated in each parasegment. These units function as transcriptional regulatory regions (the PS-specific cis-regulatory regions). The complex cis-regulation they mediate results in a very intricate pattern of expression, both between and within parasegments. Each parasegment is a mosaic of cells expressing different homeotic products. Under the direction of these proteins, different cells adopt different fates, yielding the complex array of pattern elements that characterizes a given parasegment (or segment). The PS-specific cis-regulatory regions are large (bxd is spread over more than 30 kb; see Fig. 1) and can act on their target promoters from a distance (iab-5 is located 50 kb away from its target Abd-B promoter). These properties suggest that chromatin structure plays an important role in allowing such long-distance interactions. The properties of the Pc-G genes also point to a role for chromatin structure. The Pc-G products function as a cellular memory that maintains the repressed state of the homeotic genes in body regions where they were not activated during early development. There are analogies between Pc-G repression and mating-type silencing in yeast or heterochromatic position-effect variegation in Drosophila. Although little is known at the molecular level, these analogies suggest that Pc-G repression involves the formation of a complex of Pc-G proteins that produces a chromatin structure refractory to transcription.
The finding of boundary elements insulating adjacent PS-specific cis-regulatory regions has led to a model in which the sequential activation of the cis-regulatory regions results from the stepwise opening of chromosomal domains (45, 46, 49, 51). Although no molecular evidence yet supports such a model, it provides a rationale for the remarkable correspondence between the genomic organization of the BX-C and the anteroposterior axis of the fly. A similar model has been discussed recently for the clusters of homeotic genes in mice, the Hox clusters (52).
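One way to read the stepwise-opening model is sketched below, under the hypothetical assumption that in parasegment n every cis-regulatory domain acting at or anterior to n is open. Under that reading, the set of open domains is always a contiguous block starting at one end of the complex and growing by one domain per parasegment, which is exactly what would tie chromosomal order to anteroposterior order. The region order comes from the text; `open_domains` and `opens_one_domain_per_step` are illustrative names, not part of any published model.

```python
from __future__ import annotations

# Chromosomal order of the PS-specific cis-regulatory regions (from the text),
# each paired with the parasegment in which it acts.
REGIONS = [
    ("abx/bx", 5), ("bxd/pbx", 6), ("iab-2", 7), ("iab-3", 8), ("iab-4", 9),
    ("iab-5", 10), ("iab-6", 11), ("iab-7", 12), ("iab-8", 13),
]


def open_domains(ps: int) -> list[str]:
    """Toy reading of stepwise opening: in PS n, every domain acting at or anterior to n is open."""
    return [name for name, first_ps in REGIONS if first_ps <= ps]


def opens_one_domain_per_step(start: int = 4, stop: int = 13) -> bool:
    """True if each step from PS n to PS n+1 (for n in [start, stop)) opens exactly
    the next domain along the chromosome."""
    names = [name for name, _ in REGIONS]
    for ps in range(start, stop):
        before, after = open_domains(ps), open_domains(ps + 1)
        if after != names[: len(before) + 1]:
            return False
    return True
```

In this sketch, PS4 has no open BX-C domain, PS6 has abx/bx and bxd/pbx open, and PS13 has all nine.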
