Homeobox gene repertoires: implications for the evolution of diversity (Genetics)

1. Introduction

With the sequencing of numerous entire genomes now complete, a comprehensive analysis of gene repertoires in evolutionary perspective becomes possible. Of particular interest are those types of genes that are thought to be responsible for, or to contribute to, the evolutionary changes that manifest themselves in the various distinct species. One prominent class of such genes is the homeobox genes, developmental control genes that play a role in the formation of, and cell differentiation in, multiple tissues in organisms as diverse as plants, yeasts, and animals (Akam et al., 1994; Banerjee-Basu and Baxevanis, 2001; Kappen, 1995; Kenyon, 1994).

Homeobox genes encode transcription factors with a DNA binding domain (the so-called homeodomain) (Gehring et al., 1990), and have been shown to control the patterning and development of multiple tissue systems in animals (Tautz, 1996), including axial patterning in embryos, and of derivatives of all germ layers: nervous system in worms, flies, and vertebrates; mesodermal derivatives, such as heart and muscle in flies and vertebrates; and skeleton and skin in mammals, as well as endodermal derivatives, such as lung, pancreas, and gut in vertebrates. Furthermore, tissue differentiation in the hematopoietic system, kidney, and skeleton of vertebrates is controlled by homeobox genes, and they have been shown to be involved in cancer (Abate-Shen, 2002, 2003; Cillo, 1994; Cillo et al., 1999). Experimental as well as genetic changes in the function of homeodomain transcription factors alter embryonic development, tissue function, and cellular differentiation, thus making them excellent candidates for substrates of changes in the evolution of species with differences in these processes.


In plants, homeobox genes are also involved in patterning during development, but the divergence of gene sequences and function makes it difficult to construct a conceptual equivalence of plant and animal homeobox genes. In fact, it is now well established that, while transcription factors encompass from 2 to 5% of the genome, the repertoires of homeobox genes in plants and animals are essentially distinct (Meyerowitz, 2002; Riechmann et al., 2000; Kappen, unpublished data).

For the purpose of this article, I will therefore focus on animal homeobox genes, and most specifically on those genomes for which complete sequence with appropriate annotation quality is available (Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Rattus norvegicus, and Homo sapiens). The goal of this article is to determine how the complement and composition of homeobox genes in diverse species (which together make up the repertoire) can inform us about the evolutionary basis of diversity between species.

To this end, I will extend my previous analysis from one completed genome (Kappen, 2000a) to multiple genomes. The earlier results strongly favored a model of “intercalation” of genes within the limits of a given repertoire of homeobox sequences. However, with only one completed genome (worm) and one genome that was semicomplete at the time (fly) analyzed, several important questions remained unanswered: (1) Does the complement of homeobox genes in a species relate to its complexity in body patterning? (2) Is there an identifiable pattern of gene multiplication as species with increased homeobox gene number arise? (3) Does the diversification of the homeobox gene repertoire follow a common trend along the evolutionary trajectory or, in other words, are homeobox genes generally under common evolutionary constraints? (4) Can the patterns of history in homeobox gene evolution inform us about possible future trends? I will attempt to answer these questions on the basis of qualitative and quantitative analyses of homeodomain repertoires across relatively large evolutionary distances.

This focus necessarily leaves out the exciting recent evidence for rapid homeobox gene evolution within shorter evolutionary time frames (Chow et al., 2001; Maiti et al., 1996; Schmid and Tautz, 1997; Sutton and Wilkinson, 1997; Ting et al., 1998, 2001). However, these reports are largely restricted to individual genes or individual evolutionary characters. While such investigations into individual homeobox gene function are indispensable for precisely deciphering the relationship of gene regulation, gene function, and phenotypic character evolution (Averof and Patel, 1997), they cannot inform us about patterns of evolution at the whole-genome level. In the same spirit, I refer the reader to excellent recent reviews on the evolution of individual subgroups of homeobox genes such as the Hox, Pax, En, Cut, Meis, and Knox genes (Burglin, 1998; Burglin and Cassata, 2002; Galliot et al., 1999; Gibert, 2002; Holland and Garcia-Fernandez, 1996; Kourakis and Martindale, 2000; Reiser et al., 2000; Steelman et al., 1997; Zhang and Nei, 1996), and instead consider entire repertoires of homeodomains in selected model organisms. The expectation is that genome analyses will uncover trends of evolution at the gene as well as systems level, and in this way contribute to knowledge that may allow us to derive predictions about future trajectories of evolutionary change.

2. The evolution of homeobox gene repertoires in animals

2.1. Content and classification of homeodomains in major animal species

From the earliest invention of the homeodomain in a single-celled ancestral organism, expansion of the repertoire to the order of 100 homeodomains in invertebrates and even more in vertebrates was achieved by duplication and diversification (Banerjee-Basu and Baxevanis, 2001; Kappen, 2000a). Traces of ancestry may still be evident from the relationships of homeodomains within a given genome. The information that can be derived from analyzing genomic repertoires has implications for speciation as well as evolution of diversity. There is growing recognition that the gene repertoires within the animal kingdom are largely conserved (Rubin et al., 2000). Indeed, for many vertebrate homeobox genes, orthologs have been identified in worm and fly, and vice versa. From whole-genome repertoires, we can now establish the extent of overlap, determine whether species-specific (or, in a broader perspective, clade-specific) homeobox genes exist, and assess degrees of conservation and divergence.

To this end, I have supplemented prior collections (Kappen et al., 1989, 1993; Kappen, 2000a) of homeodomain sequences with recently completed genomes by using the search functions “homeobox”, “homeodomain”, “homeo box”, “homeo domain”, and “homeo” in the NCBI Genome Browser. Nomenclature discrepancies were resolved according to sequence identity or on the basis of experimental literature. Zebrafish and Xenopus were omitted from any analyses performed here to avoid complications from tetraploidy. Classification was done as described previously (Kappen et al., 1993) and is in concordance with existing accepted classification schemes (Burglin, 1995). Content of homeodomain sequences from respective species within each subgroup was determined from the sequence compilation by hand. The criteria for clade-specific (unique) genes were: (1) lack of an ortholog in other invertebrate or vertebrate species, (2) selective presence only in vertebrates, and (3) minimum distance of the “unique” sequence from any other sequence class in the same genome by at least seven residues (Kappen et al., 1993).
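The third criterion, a minimum distance of seven residues, can be illustrated with a minimal sketch. It assumes that "distance" means the number of differing positions between two aligned homeodomain sequences of equal length; the function names and the short toy sequences are hypothetical and are not taken from the original compilation, which was done by hand.

```python
from typing import Iterable


def residue_differences(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def is_distinct_class(candidate: str, others: Iterable[str], min_diff: int = 7) -> bool:
    """True if the candidate differs from every other sequence by at least min_diff residues."""
    return all(residue_differences(candidate, s) >= min_diff for s in others)


# Hypothetical 12-residue toy sequences, NOT real homeodomains
# (real homeodomains are about 60 residues long).
reference = "RKRGRQTYTRYQ"
close_variant = "RKRGRQTYSRYQ"    # differs from reference at 1 position
distant_variant = "AAAGAATATAYQ"  # differs from reference at 7 positions

print(residue_differences(reference, close_variant))
print(is_distinct_class(distant_variant, [reference]))
print(is_distinct_class(close_variant, [reference]))
```

In this sketch, only the distant variant would qualify as a separate class relative to the reference; the close variant would be grouped with it.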

The schematic summary of these data is shown in Figure 1. Of the total of 154 homeodomain sequence classes, 80 are unique to one clade, with 53 of those found only in vertebrates. The latter include a number of proteins that contain both zinc-finger domains and homeodomains, often with multiple homeodomains in the same protein that each belong to a separate class. Under the restrictive definition of class membership used here, 26 classes are shared between two clades and 48 classes are shared between all three. The number of shared sequence classes clearly exceeds a random distribution: under a Poisson distribution, one would expect 154 units to distribute into 57 single clades, 28 into two clades, and only 15 into the triple-clade category. The much higher number of sequence classes shared between all clades demonstrates strong evolutionary conservation. This makes it possible to analyze in detail the evolution of each of the classes that are present in all clades.
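The tally behind such a summary can be sketched as follows. This is only an illustration under stated assumptions: the clade labels ("worm", "fly", "vertebrate"), the class names, and the presence table are hypothetical toy data, not the data behind Figure 1.

```python
from collections import Counter
from typing import Dict, Set


def tally_by_clade_count(presence: Dict[str, Set[str]]) -> Dict[int, int]:
    """Count how many sequence classes occur in exactly 1, 2, or 3 clades."""
    counts = Counter(len(clades) for clades in presence.values())
    return {k: counts.get(k, 0) for k in (1, 2, 3)}


def unique_to(presence: Dict[str, Set[str]], clade: str) -> int:
    """Count classes found in the given clade and in no other."""
    return sum(1 for clades in presence.values() if clades == {clade})


# Hypothetical presence table: class name -> clades in which it is found.
presence = {
    "classA": {"worm", "fly", "vertebrate"},
    "classB": {"fly", "vertebrate"},
    "classC": {"vertebrate"},
    "classD": {"vertebrate"},
    "classE": {"worm"},
}

tally = tally_by_clade_count(presence)
print(tally)
print(unique_to(presence, "vertebrate"))

# Every class falls into exactly one category, so the three
# counts must add up to the total number of classes.
assert sum(tally.values()) == len(presence)
```

The final check reflects a general property of this kind of partition: the single-clade, two-clade, and three-clade counts together account for every class.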

2.2. Sequence subclasses and their contribution to the repertoire

Within each genome (Figure 2), subclasses of closely related sequences exist, such as the Dlx and Pbx subclasses. The subclasses are often of different size in different clades, reflecting either clade- or species-specific duplication events (as in the case of the Dlx class) or strong conservation of genes known to have been duplicated in an ancestral organism, such as in the case of the Hox genes. Conserved sequence classes appear in all genomes analyzed here, as highlighted by same-color shading in Figure 2. In addition, clade-specific subclasses exist, such as a subclass of zinc-finger homeobox genes that is restricted to the vertebrate lineage (Bayarsaihan et al., 2003). Genomes in separate clades
