O-Glycosylation (Molecular Biology)

In eukaryotes, carbohydrate side-chains are added to certain amino acid residues in some protein sequences to form glycoproteins. In N-glycosylation, the sugars are attached in N-linkage to asparagine residues, and this type of glycosylation, which is initiated in the endoplasmic reticulum (ER), has been well studied. Glycans may also be added to serine and threonine residues in a protein, and because the hydroxyl group of these amino acids forms the link with the sugar, this type of glycosylation has been termed O-glycosylation. The enzymes that initiate glycosylation and add the individual sugars to extend or terminate the chain are glycosyltransferases, which catalyze the basic reaction:

Nucleotide-sugar + R → Sugar-R + Nucleotide

where R is a serine or threonine residue or a sugar already attached to the acceptor protein. In most cases, the nucleotide may be UDP, as for galactose and the hexosamines, or GDP, as for fucose; these donor substances are made in the cytoplasm. An exception is sialic acid, which is added from cytidine monophosphate-sialic acid (CMP-sialic acid) that is made in the nucleus. There are three main types of O-glycosylation in eukaryotes, namely: (1) mucin-type O-glycosylation, in which the proteins normally carry multiple O-linked oligosaccharide chains; (2) O-glycosylation leading to the formation of proteoglycans; and (3) O-GlcNAc glycosylation of cytoplasmic and nuclear proteins. This article will focus on O-glycosylation in mammalian cells, especially relating to mucin-type O-glycosylation. Proteins are O-glycosylated, however, even in fungal cells, and these pathways will also be discussed.
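
As a bookkeeping aid, the donor rule just described can be written as a small lookup. This is only a sketch: the sugar abbreviations, the `DONOR_NUCLEOTIDE` table, and the `transfer` function are hypothetical names introduced here for illustration, and the table lists only the donors named in the text above.

```python
# Donor nucleotides as stated in the text: UDP for galactose and the
# hexosamines, GDP for fucose, CMP for sialic acid.
# The abbreviations used as keys are an assumption made for this sketch.
DONOR_NUCLEOTIDE = {
    "Gal": "UDP",
    "GalNAc": "UDP",
    "GlcNAc": "UDP",
    "Fuc": "GDP",
    "Sia": "CMP",
}


def transfer(sugar: str, acceptor: str) -> tuple[str, str]:
    """Model one glycosyltransferase step.

    Nucleotide-sugar + R -> Sugar-R + Nucleotide, where R (the acceptor)
    is a Ser/Thr residue or a sugar already attached to the protein.
    Returns the product label and the released nucleotide.
    """
    nucleotide = DONOR_NUCLEOTIDE[sugar]
    return f"{sugar}-{acceptor}", nucleotide


# Initiation on a serine, then extension of the resulting chain.
first, released_1 = transfer("GalNAc", "Ser")
second, released_2 = transfer("Gal", first)
print(first, released_1)
print(second, released_2)
```

The sketch tracks only which nucleotide is released at each step; it says nothing about linkage, enzyme specificity, or where in the cell the step takes place.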

1. Mucin-Type O-Glycosylation

This type of glycosylation refers to the covalent attachment of O-glycans to serines and threonines in mucin core proteins, whereby the sugars are added individually and sequentially in the Golgi apparatus. The nucleotide sugar donors are transported into the lumen of the Golgi pathway where the glycosyltransferases are positioned. The genes coding for many of the relevant glycosyltransferases have recently been isolated, cloned, and expressed, and it is becoming clear from their specificities that several different enzymes may catalyze the same reaction. Moreover, the existence of enzymes that can add different sugars to the same substrate means that competition can occur for the substrate, providing the locations of the different enzymes overlap in the Golgi pathway. Thus, the locations of the enzymes involved in mucin-type O-glycosylation are as important as their level of activity in determining the final composition of O-glycans added.
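
The interplay of location and activity can be illustrated with a deliberately simple toy model. It assumes, purely for illustration, that a substrate is shared among the enzymes present in the same compartment in proportion to their relative activities; this proportional rule, the function name `product_fractions`, and the enzyme labels are assumptions of the sketch, not measured behavior.

```python
def product_fractions(
    activities: dict[str, float], colocalized: set[str]
) -> dict[str, float]:
    """Split one substrate among competing enzymes.

    activities  -- relative activity of each enzyme toward the substrate
    colocalized -- enzymes present where the substrate is encountered

    Assumption: only co-localized enzymes compete, and each captures a
    share of the substrate proportional to its relative activity.
    """
    reachable = {
        name: activity
        for name, activity in activities.items()
        if name in colocalized and activity > 0
    }
    total = sum(reachable.values())
    if total == 0:
        return {}
    return {name: activity / total for name, activity in reachable.items()}


activities = {"extending enzyme": 3.0, "terminating enzyme": 1.0}

# Both enzymes overlap in the pathway: they compete for the substrate.
print(product_fractions(activities, {"extending enzyme", "terminating enzyme"}))

# No overlap: the activity of the absent enzyme is irrelevant.
print(product_fractions(activities, {"extending enzyme"}))
```

In this toy picture, changing either the activity values or the set of co-localized enzymes changes the mix of products, which is the point made in the paragraph above.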

1.1. Initiation of Glycosylation

In mammalian cells and as far down the evolutionary scale as Caenorhabditis elegans, the first sugar to be added to serine or threonine in alpha linkage is N-acetylgalactosamine (GalNAc). Identifying a specific sequence that defines a glycosylation site has been difficult. Recent developments suggest that this may be largely because not one enzyme but a family of enzymes (UDP-N-acetyl-α-D-galactosamine:polypeptide N-acetylgalactosaminyltransferases) is involved in this reaction. These enzymes, referred to here as ppGalNAcTs, catalyze the transfer of GalNAc from UDP-GalNAc to serine and threonine residues (1). The different ppGalNAcTs have distinct but overlapping specificities for the peptide sequence, and the same enzyme can add GalNAc to both serine and threonine residues (2). Thus, the sequences flanking the serine or threonine residue influence whether or not it is glycosylated, but this cannot be inferred merely from a database analysis of in vivo substrates (3).

Knockout strains of the ppGalNAcTs are only just being prepared (4), and some redundancy is to be expected. However, the ppGalNAcTs show well-defined tissue and cell specificity of expression (5), suggesting a differential dependence of each cell type on specific enzymes. The tissue and cell specificity of the ppGalNAcTs also suggests that differences may exist in the sites of glycosylation of the same protein expressed in different cells. Identification of sites that are glycosylated in vivo has been difficult, but some data are now beginning to emerge.

The location in the ER/Golgi pathway where mucin-type O-glycosylation is initiated has been controversial. There has been general agreement that GalNAc is not added in the ER, but different methods of analysis have indicated different locations (6, 7). Recent studies using immunoelectron microscopy to localize tagged enzymes in transfected cells suggest that three of the ppGalNAcTs (T1, T2, and T3) are found throughout the Golgi, with the profile of expression being different for each enzyme (8). With mucin proteins, in which hundreds of sugar side chains may be added, it may be that the addition of GalNAc does indeed occur throughout the Golgi. If some chains are initiated later in the pathway, this could contribute to the heterogeneity of composition of the O-glycans that are found in mucin preparations.

In fungi, serine and threonine residues are glycosylated via dolichyl phosphate-D-mannose (Dol-P-Man) as an intermediate, a reaction that occurs in the ER and has not been observed thus far in higher eukaryotes. The reactions involved have been established for Saccharomyces cerevisiae. The protein molecules that are glycosylated may be secreted or located in the cell wall. Although yeast O-glycosylation may not fall into any of the categories defined for higher eukaryotes, multiple genes code for enzymes catalyzing the transfer of mannose from Dol-P-Man to serine or threonine residues (Dol-P-Man:protein mannosyltransferases, or PMTs). In this system, the possible redundancy of the enzymes has been investigated by producing multiple mutants, and only a triple disruption is lethal, although some of the double mutants are unable to grow without osmotic stabilization (8).

1.2. Chain Extension in Mucin-Type O-Glycan Synthesis

After the addition of GalNAc to the mucin protein, various core structures are formed by the addition of different sugars (see Fig. 1 of O-Linked Oligosaccharides). Chains are extended from these cores by the alternate addition of N-acetylglucosamine (GlcNAc) and galactose, to give polylactosamine side chains that may be straight or branched (see Figs. 2 and 3 in O-Linked Oligosaccharides). Formation of the different cores varies with the tissue, but a pathway through core 1 and core 2 is commonly used. Two genes encoding enzymes that can add GlcNAc in β1,6 linkage to core 1 (to form core 2) have been identified (9). One of these enzymes can also catalyze the addition of GlcNAc to form the internal branch (shown in Fig. 2 of O-Linked Oligosaccharides), and the other (the core 2 enzyme) can only catalyze the formation of core 2 from core 1. Where biosynthesis is through core 2, the addition of the GlcNAc is crucial for chain extension.

1.3. Chain Termination

The O-glycans carried on mucins are usually terminated by the addition of sialic acid or fucose, and this may be subject to blood group A- or B-dependent transferase activity (see Fig. 3A and 3B of O-Linked Oligosaccharides). The number of enzymes that catalyze the addition of sialic acid is steadily increasing and presently is at least ten (10). Not all of these enzymes act on O-glycans, but those acting at the early stages of O-glycosylation are specific for this type of glycosylation. Important enzymes in this category are those adding sialic acid in α2,6 linkage to the first sugar GalNAc, or in α2,3 linkage to galactose in core 1 (see Fig. 2 of O-Linked Oligosaccharides). The sialyltransferase that adds sialic acid in α2,6 linkage to galactose at the end of N-glycans does not add the sugar to O-glycan chains that terminate with sialic acid in α2,3 linkage. Two candidate enzymes exist (ST3Gal III and ST3Gal IV) that could add sialic acid to galactose at the end of O-glycans, which have been reported to be selective for type 1 and type 2 chains, respectively. (See Tsuji et al (1996) for further discussion on nomenclature.)

1.4. Localization of Glycosyltransferases Involved in the Biosynthesis of O-Glycans in the Golgi Pathway

Organization of the Golgi apparatus can be separated into three major components: (1) the cis-Golgi network, (2) the Golgi stacks (cis, medial, and trans), and (3) the trans-Golgi network, or TGN. The sizes of the various components of the pathway vary with cell type (see Golgi Apparatus). Movement of proteins through the pathway is vectorial, progressing from cis to trans via nonselective budding transport. Glycosylated proteins leave the trans face of the stack to be sorted to their various destinations in the TGN, where further glycosylation may also occur (11).

The glycosyltransferases are themselves type-II membrane glycoproteins, with a short amino-terminal cytoplasmic domain, a single transmembrane domain (typically 17 residues), a loosely folded putative stem region (50-100 residues), and a tightly folded globular catalytic domain of more than 325 residues, which extends into the lumen of the Golgi apparatus. Early work involving the localization of these enzymes using subcellular fractionation or detection of specific oligosaccharide structures with lectins suggested an orderly compartmentalization of enzymes corresponding to the sequential addition of sugars. More recent studies show that glycosyltransferase organization is more complex, however, and that overlap can occur even between enzymes such as the core 2 enzyme responsible for chain extension and the α2,3 sialyltransferase ST3Gal I that effects termination (12). These enzymes both use Galβ1,3-GalNAc-R as a substrate, and changes in their levels of activity affect the final composition of the O-glycans synthesized. Definitive mapping has only been done for a few of the many enzymes involved in O-glycan initiation and biosynthesis. However, with the production of the recombinant enzymes and, in particular, specific monoclonal antibodies directed against them, the relative positions of these enzymes should be established.

The mechanisms involved in sorting the glycosyltransferases, whether these be involved in N- or O-glycosylation, are not fully understood. Two models have been suggested to explain the distribution of these enzymes in the Golgi pathway (13, 14) (see Golgi Apparatus).

1.5. Molecules Carrying Mucin-Type O-Glycans

Although some molecules (eg, erythropoietin) may carry only one O-glycan, most of the molecules carrying carbohydrate side chains attached in this way carry multiple O-glycans and are classified as mucins; the final composition of the molecule is more than 50% carbohydrate. These molecules have a common feature in containing a tandem repeat domain, rich in serine and threonine residues, to which the O-glycans are attached. Of the eight human epithelial mucin genes that have been identified by gene cloning, all but the first (MUC1) are extracellular mucins. MUC1 is a type I transmembrane protein and resembles the membrane-associated selectin ligands (15). It is much larger, however, and the sequence in the repeats is much more conserved. The addition of sugars, even only GalNAc, results in the molecule becoming highly extended, reaching above the glycocalyx, allowing the membrane-associated mucins to influence cell-cell interactions.

The multiplicity of enzymes that can initiate mucin-type O-glycosylation and the sequential nature of the biosynthesis of the O-glycans allow for an almost infinite number of possible glycoforms to be produced from the same core protein, the final structure depending on the profile of expression of the glycosyltransferases. Studies with specific antibodies to the ppGalNAcTs are showing a marked tissue and cell specificity of expression (1, 5), and this also holds true for many of the enzymes involved in biosynthesis of the oligosaccharides, for which studies thus far have depended on looking at messenger RNA expression. Studies on the promoters governing expression of the transferases are only now beginning, but strong indications exist that tissue-specific expression may in some cases depend on the use of alternative promoters and alternative splicing, resulting in the production of various mRNAs with different 5′-untranslated regions (16).
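
To give a feel for how quickly the number of possible glycoforms grows, the following sketch counts them under a strong simplifying assumption: every potential site is independently either unoccupied or carries one of a fixed set of glycan structures. The function name `glycoform_count` and the example numbers are hypothetical and are not taken from any particular mucin.

```python
def glycoform_count(sites: int, structures: int) -> int:
    """Count glycoforms of one core protein.

    Assumption: each of `sites` Ser/Thr positions is independently either
    unoccupied or carries one of `structures` different O-glycans, giving
    (structures + 1) choices per site.
    """
    return (structures + 1) ** sites


# Hypothetical examples: 4 alternative glycans at 5 sites and at 20 sites.
print(glycoform_count(5, 4))   # 5**5
print(glycoform_count(20, 4))  # 5**20
```

Real sites are not independent, because occupancy and chain structure depend on the enzymes expressed in the cell, so the count is best read as an upper bound on the combinatorial space rather than as a prediction.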

1.6. Functions of Mucin-Type Glycoproteins

An obvious function of the extracellular mucins, which form an important component of the mucous layer covering some epithelial cells (eg, lining the gastrointestinal and respiratory tracts), is protection of the underlying epithelium from insult. The O-glycans play a major role in this protective function, not only in the formation of large oligomers, which makes the mucous layer viscous, but also in serving as receptors for invading microorganisms. The membrane-associated MUC1 mucin can also have a protective function but, like the selectin ligands, can also affect cell-cell adhesion.

The O-glycans on the membrane-associated mucins are generally heavily sialylated, and the resulting negative charge inhibits cell-cell interactions unless a specific epitope in the oligosaccharide component can act as a ligand for a receptor on another cell, in which case cell-cell adhesion is enhanced. Study of the interaction of selectins with their glycoprotein ligands has led to detailed definition of the interacting oligosaccharide structure (see O-Linked Oligosaccharides). However, the specific oligosaccharide must be carried on a particular core protein, suggesting an involvement of the protein sequence or structure, either directly or indirectly by allowing clustering of the O-glycans.

1.7. Glycosylation Patterns, Differentiation, and Malignancy

The composition of the side chains added to a mucin core protein can vary not only in different tissues, but also with the differentiation state of a particular cell phenotype and with the change to malignancy (see Neoplastic Transformation). Although the changes in malignancy may or may not relate to disease progression, those observed in differentiation are likely to be functional and relate to cell adhesion.

Changes in the expression of the core 2 enzyme have been characterized both in differentiation of T cells and in cancer. Leukosialin is a major glycoprotein expressed on leukocytes, and changes in its composition of added O-glycans occur on activation of human T cells. In resting cells, the dominant O-glycan is a tetrasaccharide (disialylated core 1), whereas in activated T cells the expression of the core 2 enzyme is increased and core 2-based structures are added (17). A similar increase in the activity of the core 2 enzyme is seen in leukocytes from patients with immunodeficiency, such as that seen in Wiskott-Aldrich syndrome (9). Furthermore, the core 2 enzyme is differentially expressed in the thymus, where expression is high in the subcapsular and cortical thymocytes and low in the medullary thymocytes. In this case, the differential expression of core 2 structures correlates with the ability of the cells to interact with a lectin (galectin) synthesized by thymic epithelial cells.

Emphasizing the importance of cell phenotype in defining profiles of glycosylation, the changes in levels of enzymes in malignancy differ in various cancers. In leukemias, the core 2 enzyme can be increased, whereas it may be decreased in breast cancers (18). In breast cancer, there is also an increase in the α2,3 sialyltransferase that adds sialic acid to core 1, resulting in an increase in the major epitope for sialoadhesin expressed on macrophages. In the colon, where the O-glycans added to the mucins are very large and complex, changes occur in carcinomas that result in increased expression of the sialylated Lewis a (Lea) and Lewis x (Lex) structures, the ligands for P and E selectins, respectively.

2. O-Glycosylation Leading to the Formation of Proteoglycans

Proteoglycans are composed of glycosaminoglycan chains (GAGs) bound to serine residues in a protein core, via a xylose-galactose-galactose bridge. Unlike mucin-type O-glycosylation, threonine residues do not seem to act as acceptors. All GAGs (chondroitin sulfate, dermatan sulfate, heparin and heparan sulfate) except hyaluronic acid are secreted as components of proteoglycans. The post-translational processing of the core protein occurs in the Golgi apparatus (19), where, after chain initiation, monosaccharides are added stepwise from the appropriate UDP-sugars. To attain their final shape, chains are then carried through a series of modifications, including sulfation (see O-Linked Oligosaccharides). There has been more progress in isolating the genes coding for and defining the core proteins of proteoglycans than in isolating the genes coding for the enzymes involved in the biosynthesis of the carbohydrate moieties added to these proteins. The GAGs are large, and where many chains are added there may be only 10% protein in the proteoglycan.

2.1. Initiation of Glycosylation

The initial, rate-limiting step in the synthesis of GAGs is the transfer of xylose from UDP-xylose to a serine residue. This step is catalyzed by UDP-D-xylose:proteoglycan core protein β-D-xylosyltransferase (xyloseT). Attempts to define a sequence in the core protein that will determine whether xylose will be added suggest that, although some limitations can be placed on the sequence flanking the serine (glycine must follow toward the carboxyl end), protein secondary structure may play an important role (20). The linking galactose moieties are then added, and the type of GAG to be synthesized is determined by the first hexosamine to be added, which begins the biosynthesis of the main chain.
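
The one sequence constraint mentioned above can be turned into a simple filter. The sketch below lists serines that are immediately followed by glycine in a one-letter protein sequence; reading "glycine must follow" as "the very next residue" is an assumption of this example, and the function name and test peptide are hypothetical. Because secondary structure also matters, a hit is only a candidate, not a predicted site.

```python
def candidate_xylosylation_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Ser residues directly followed by Gly.

    This applies only the flanking-sequence limitation described in the
    text; it does not model secondary structure or enzyme accessibility.
    """
    seq = sequence.upper()
    return [
        index + 1
        for index in range(len(seq) - 1)
        if seq[index] == "S" and seq[index + 1] == "G"
    ]


# Hypothetical peptide: Ser3 and Ser5 are followed by Gly, Ser8 is not.
print(candidate_xylosylation_sites("AESGSGDSK"))
```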

2.2. Chain Extension

The GAGs consist of hexosamines and either hexuronic acid (D-glucuronic or L-iduronic acid) or galactose units added alternately in unbranched sequence (see O-Linked Oligosaccharides). It seems likely that the enzymes that add the first hexosamine (GalN or GlcN) are different from those acting on the more peripheral regions of the chain. Which hexosamine is added determines which GAG is subsequently synthesized, and sequences in the core protein may direct the choice, as different GAGs can be added to different core proteins in the same cell.

2.3. Functions of Proteoglycans

Proteoglycans may occur intracellularly in secretory granules, at the cell surface, and in the extracellular matrix. The functions of proteoglycans are highly diverse, ranging from mechanical functions essential for maintaining the structural integrity of connective tissue, to effects on cell adhesion, motility, proliferation, differentiation, and morphogenesis. Many of these effects depend on binding of proteins to the GAG chains. As with the mucin molecules, these interactions can depend on the charge and are then relatively nonspecific and of low affinity, whereas others, involving a particular oligosaccharide with defined structure, are highly specific. The core protein, in addition to serving as a scaffold for the GAGs, may be involved in anchoring the molecule to the membrane.

3. O-GlcNAc Glycosylation of Cytoplasmic and Nuclear Proteins

O-linked N-acetylglucosamine linked to serine or threonine residues was discovered by Torres and Hart (21) and has since been found to be ubiquitous and abundant on nuclear and cytoskeletal proteins in virtually all eukaryotes, including fungi (22). This type of glycosylation occurs in the cytoplasm, not in the Golgi apparatus (the site of post-translational modification of the core proteins of mucins and proteoglycans), and it may be extremely important in modifying the activity of intracellular proteins with a wide diversity of functions. Table 1 lists some of the classes of proteins that have been shown to be glycosylated in this way.

Table 1. Some Proteins Shown to be Subject to O-GlcNAcylation

Proteins involved in transcription (Pol II and transcription factors)

Cytoskeletal proteins (intermediate filaments, bridging proteins)

Tumor suppressors, oncogenes (p53)

Nuclear pore proteins

3.1. O-GlcNAc and Phosphorylation

O-GlcNAc glycosylation is dynamic; the sugar turns over much more rapidly than does the protein backbone. Most of the proteins undergoing this type of glycosylation are phosphorylated, in some instances on the same amino acid residue, so it is suggested that the addition of O-GlcNAc is a regulatory modification analogous to phosphorylation. Many of the sites are similar to those used by some kinases, namely the glycogen synthase kinase and the MAP kinases, and in the myc oncogene the site has been mapped to Thr58 in the transactivation domain (23), which is also a major site of phosphorylation and a hot spot for mutagenesis in lymphomas. The glycosylation of RNA polymerase II at the mucin-like sequence at the C-terminal domain may indicate a role in transcriptional initiation.

3.2. O-GlcNAc and Protein Interactions

Another function of O-GlcNAc appears to be in mediating cytoskeletal assembly and organization, and a defect in the function has been implicated in Alzheimer’s disease (24). Where the GlcNAc is thought to be an alternative to phosphorylation, it is accessible to modification (by a glycosyltransferase that transfers galactose to the hexosamine). When O-GlcNAc is functioning in protein-protein interactions, the sugars appear to be buried in the native molecules and are only accessible after denaturation or proteolysis.

Whereas about 50% of the sites that are glycosylated are at or near a PVS (Pro-Val-Ser) type of sequence, the other half have no apparent consensus sequence. It is not yet clear how many enzymes are involved in this type of glycosylation. However, a gene coding for an O-GlcNAc transferase has now been isolated and characterized. (See Hart et al (1996) for further discussion.)
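
Locating PVS-type sequences in a protein is a plain motif search. The sketch below reports the serine of every exact Pro-Val-Ser tripeptide in a one-letter sequence; the function name and the test peptide are hypothetical, and the exact-match rule is narrower than the "at or near a PVS type of sequence" description above. Since about half of the known sites lack any consensus, such a search can at best flag candidates.

```python
import re


def pvs_serine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the Ser in each exact P-V-S tripeptide."""
    seq = sequence.upper()
    # match.start() is the 0-based index of Pro, so the Ser two residues
    # later is at 1-based position start + 3.
    return [match.start() + 3 for match in re.finditer("PVS", seq)]


# Hypothetical peptide with two PVS tripeptides (Ser4 and Ser9).
print(pvs_serine_positions("MPVSAAPVSTK"))
```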

3.3. Concluding Comments

Addition of sugars in O-linkage to proteins represents a post-translational modification that has far-reaching implications in affecting protein structure and function. The use of recombinant DNA technology to identify and catalog the families of enzymes involved in effecting the modifications shows that the number of genes devoted to this activity in higher eukaryotes is very large indeed. The functions affected by these modifications are also very wide ranging. The relation between structure and function of O-glycosylated proteins is therefore complex, but important tools are being developed, and the challenge is becoming less daunting. One might predict that there will be more entries dealing with the subject in any future issues of this work.