O-Linked Oligosaccharides (Molecular Biology)

There are three major types of O-glycosylation: (1) mucin-type O-glycosylation; (2) glycosylation leading to the formation of proteoglycans; and (3) O-GlcNAc glycosylation of cytoplasmic and nuclear proteins. O-GlcNAc glycosylation involves only the addition of a single hexosamine, but the other two types of glycosylation, although using only a limited number of sugars, lead to the formation of O-glycans with a wide variety of structures.

1. O-glycans Added to Mucin Core Proteins

The sugars commonly found in the O-glycans added to mucins in higher eukaryotes are the hexosamines N-acetylgalactosamine (GalNAc) and N-acetylglucosamine (GlcNAc), the monosaccharides galactose (Gal) and fucose (Fuc), and sialic acid (SA). Some of these sugars, as well as tyrosine residues in the core protein, may be sulfated. Sugars are added individually and sequentially in the Golgi apparatus, and the structure of the final O-glycan is generally thought of as being made up of three elements: (1) a core structure built on the linkage sugar, GalNAc; (2) a chain extending from the core, generally made up of repeating lactosamine units; and (3) a terminal region often containing epitopes recognized by naturally occurring antibodies.

O-glycans may be released from glycoproteins by hydrazine or by β-elimination in alkaline borohydride solution, which yields the reduced form containing terminal GalNAcol residues. After purification, the released O-glycans have been analyzed by mass spectrometry, chemical methylation analysis, and nuclear magnetic resonance (NMR) techniques (1). When only a few O-glycans are added to a core protein, their structure is usually simpler than that of the O-glycans found on the mucins, which carry multiple side chains. Analysis of the O-glycans from mucins shows a high degree of structural heterogeneity, but whether the heterogeneity is between or within individual mucin molecules is not clear. These preparations have been derived from tissues that contain different cell phenotypes, which could exhibit different profiles of expression of the glycosyltransferases involved in the synthesis of the O-glycans, resulting in a population of different glycoforms in which an individual molecule carries only one kind of structure. On the other hand, the addition of multiple O-glycans during transit through the Golgi apparatus may necessarily result in heterogeneity of both chain initiation and extension. The core proteins are extremely large and, after the addition of the first sugar, must be quite extended. It is easy to imagine how not all the sites on the protein and on the extending chain might be found by the relevant transferase(s).

In addition to a complete analysis of the O-glycan structure, components may be detected with specific antibodies, such as those reacting with the common blood group antigens, which recognize terminal sugars, or those reacting with the branched and unbranched polylactosamine chains (I and i antigens, respectively).

1.1. Core Structures

It is possible for an O-glycan to consist only of the initial sugar GalNAc, and this has been referred to as the Tn antigen. More often, core structures are formed by the addition of GlcNAc and/or Gal units, or the chain is terminated by the addition of sialic acid, forming the sialyl Tn epitope that is specifically expressed in carcinomas. Figure 1 shows the main core structures that have been identified to date as being present on mucins. Core 1 and core 2 are by far the most common structures.

Figure 1. Main core structures found on mucins.

Core 1, also referred to as the T antigen, may be found as such, particularly on mucins produced by carcinomas, but it is often sialylated in normal cells to give the mono- or disialylated T antigen. The reaction catalyzed by the core 1 β3-Gal-T (UDP-Gal:GalNAc-R β1,3-Gal-transferase) is:

UDP-Gal + GalNAc-R → Galβ1-3GalNAc-R + UDP

A gene coding for the mouse enzyme has only recently been cloned, and it remains to be seen how many enzymes catalyze this reaction, and whether the peptide sequence flanking the glycosylation site affects the specificity.

An important point is that core 1 can be a substrate not only for sialyltransferases that add sialic acid in α-linkages and thus terminate the chain, but also for the core 2 enzyme, which is crucial for chain extension to proceed. Studies with cells transfected with these enzymes indicate that the two enzymes can compete for the core 1 substrate and therefore overlap in their location in the Golgi apparatus. Differences in the activities of these enzymes, operating at an early stage of O-glycan synthesis, can therefore dramatically affect the final structure of the O-glycan and can explain the differences in the structure of the O-glycans added to MUC1 in breast cancer (1) or to leukosialin in T cell activation (see ref. 17 of O-Glycosylation).

Core 2 is generated from core 1 by enzymes catalyzing the reaction:

UDP-GlcNAc + Galβ1-3GalNAc-R → Galβ1-3(GlcNAcβ1-6)GalNAc-R + UDP

Two enzymes have been shown to catalyze the formation of core 2: the L enzyme, which catalyzes only this reaction and for which the gene has been cloned (2), and the M enzyme, which has a wider substrate specificity and can also catalyze the formation of core 4 and internal chain branching. Changes in the level of activity of the enzyme(s) catalyzing the synthesis of core 2 have a profound effect on O-glycan structure in a wide variety of cell types.

Core 3 synthesis precedes core 4 synthesis (see Fig. 1), and the enzymes catalyzing the synthesis of these structures have been found only in mucin-secreting tissues, such as those of the respiratory tract and the colon. Core 4 structures are more prevalent than core 3 structures, and because the transfer of GlcNAc to core 3 that forms core 4 proceeds at a much faster rate than the β3-GlcNAc transferase reaction that makes core 3, the activity and tissue distribution of the core 3 β3-GlcNAcT are the limiting factors in the synthesis of core 4. The enzyme catalyzing the synthesis of core 4 shows limited distribution, but it can also catalyze the synthesis of core 2 from core 1.
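The dependence of the core structures on the set of expressed transferases can be sketched as a small reachability search. This is only an illustration: the enzyme labels are informal names chosen for this sketch, and the linear structure strings for cores 2, 3, and 4 follow common textbook notation rather than anything stated in this section.

```python
from collections import deque

# (substrate, enzyme, product). Enzyme labels are hypothetical shorthand
# for the transferases discussed in the text, not official gene names.
CORE_STEPS = [
    ("Tn", "core1_b3GalT", "core 1"),
    ("Tn", "core3_b3GlcNAcT", "core 3"),
    ("Tn", "a6_sialylT", "sialyl Tn"),
    ("core 1", "core2_b6GlcNAcT_L", "core 2"),
    ("core 1", "core2_b6GlcNAcT_M", "core 2"),
    ("core 1", "a3_sialylT", "sialyl T"),
    ("core 3", "core2_b6GlcNAcT_M", "core 4"),
]

# Linear notation for each structure (assumed textbook conventions).
STRUCTURES = {
    "Tn": "GalNAc-R",
    "sialyl Tn": "SAα2-6GalNAc-R",
    "core 1": "Galβ1-3GalNAc-R",
    "sialyl T": "SAα2-3Galβ1-3GalNAc-R",
    "core 2": "Galβ1-3(GlcNAcβ1-6)GalNAc-R",
    "core 3": "GlcNAcβ1-3GalNAc-R",
    "core 4": "GlcNAcβ1-6(GlcNAcβ1-3)GalNAc-R",
}


def reachable_cores(expressed, start="Tn"):
    """Return every structure that can be built from `start` when only
    the enzymes named in `expressed` are available."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for substrate, enzyme, product in CORE_STEPS:
            if substrate == current and enzyme in expressed and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


if __name__ == "__main__":
    # A cell with the core 1 enzyme and an α3-sialyltransferase but no core 2 enzyme.
    for name in sorted(reachable_cores({"core1_b3GalT", "a3_sialylT"})):
        print(name, STRUCTURES[name])
```

The sketch ignores enzyme kinetics and Golgi localization, so it shows only which structures are possible, not which ones predominate when enzymes compete for the same substrate.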

1.2. Backbone of O-Glycans

The elongation of O-glycans involves the addition of GlcNAc residues to Gal in β-1,3 or β-1,6 linkage and of Gal to GlcNAc in β-1,3 or β-1,4 linkage, to form linear and branched poly-N-lactosamine structures, as illustrated in Figure 2. Extension of the O-glycans based on core 2 can occur by the addition of sugars to either the Gal or the GlcNAc moieties. When extension is from core 4, galactose can be added to either glucosamine.

Figure 2. Elongation pathways forming the backbone of O-glycans.

(1) The elongation β3-GlcNAc transferase catalyzes the addition of GlcNAc in β-1,3 linkage to Gal on core 1 and core 2. (See Fig. 1 for the structures of core 1 and core 2.) The enzyme shows limited distribution and may be reduced in cancers of the colon and breast. Elongation from core 1 prevents the formation of core 2, but it seems likely that core 2 is usually formed before elongation from Gal occurs.

(2) The i β3-GlcNAc transferase adds GlcNAc to Gal in chains that have already been extended from cores. This enzyme is found ubiquitously and catalyzes the addition of GlcNAc alternately with Gal, producing a linear chain that is the blood group i antigen.

(3) The I β6-GlcNAc transferases are responsible for the formation of the blood group I antigen found on adult human erythrocytes, where it replaces the i antigen found in the fetal cells. The I antigen represents the branch initiated by these enzymes either immediately after the action of the β3-GlcNAc transferase or after the addition of lactosamines (see Fig. 2), and such branched O-glycans are found on many mucin-type molecules.

(4) The β-1,4 and β-1,3 galactosyltransferases add galactose from a UDP-Gal donor to GlcNAc in the growing polylactosamine chain in either β-1,4 or β-1,3 linkage. When the linkage is β-1,4, the chain is referred to as type 2, whereas the β-1,3 linkage is type 1. Both types of chain may be linear or branched.
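The alternating addition of Gal and GlcNAc described in steps (2) and (4) can be written out as a short string-building sketch. The function name and its parameters are hypothetical, and only linear (i-type) chains are modeled; β-1,6 branching is left out.

```python
def polylactosamine(n_units, chain_type=2):
    """Build the linear notation for an unbranched polylactosamine chain.

    n_units    -- number of Gal-GlcNAc repeats (at least 1)
    chain_type -- 1 for Galβ1-3GlcNAc repeats, 2 for Galβ1-4GlcNAc repeats
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    if chain_type not in (1, 2):
        raise ValueError("chain_type must be 1 or 2")
    gal_linkage = "β1-3" if chain_type == 1 else "β1-4"
    unit = f"Gal{gal_linkage}GlcNAc"
    # Successive units are joined by the β-1,3 GlcNAc-to-Gal linkage
    # made by the i β3-GlcNAc transferase.
    return "β1-3".join([unit] * n_units)


if __name__ == "__main__":
    print(polylactosamine(2))                # type 2 chain, two repeats
    print(polylactosamine(1, chain_type=1))  # single type 1 unit
```

With two type 2 repeats, the function returns Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAc.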

1.3. Terminal Glycosylation

The terminal epitopes of the O-glycans on mucins are probably the most important in determining whether the molecule plays a role in cell adhesion phenomena. The epitopes recognized by antibodies related to the ABO and the Lewis blood group antigens are also found in this terminal region. Terminal sugars added in α linkage include sialic acid, Fuc, Gal, GalNAc, and GlcNAc; Table 1 lists some of the more important structures. Some sulfation of the sugars in terminal structures may also occur.

Table 1. Terminal Epitopes in Mucin O-glycans

Antigen	Structure
Blood group A
Blood group B
H (masked by A and B)
Le^a
Le^b
Le^x
Le^y
Sialylated Le^a
Sialylated Le^x

Sialyltransferases that are clearly specific for O-glycans are those that add sialic acid in α2-6 linkage to GalNAc (Tn) or in α2-3 linkage to core 1 (Galβ1-3GalNAc). O-glycans with an SA in α2-6 linkage to GalNAc (Tn) cannot be acted on by any known transferase. This link can be formed, however, after the addition of SA in α2-3 linkage to Gal. The shorter O-glycans (i.e., Tn, T, and their sialylated derivatives) are found on mucins expressed by some carcinomas, and the change from the normal glycosylation pathway has been analyzed in greatest detail in breast cancer (1). Several genes coding for sialyltransferases responsible for terminating the short O-glycans have been cloned (3) and show differing substrate specificities: some can synthesize the same linkage in glycolipids. Because of the multiplicity of glycosyltransferases, it is only by cloning the individual genes, thereby allowing work with the recombinant enzymes, that the biosynthetic pathways will be unambiguously clarified.

The position of glycosyltransferases in the Golgi apparatus also plays a significant role in whether the enzymes can act on a particular substrate. One of the sialyltransferases that add SA in α2-3 linkage to Gal in core 1 has been localized to the medial/trans Golgi stacks, with some found in the trans Golgi network (TGN). This is a relatively early position and allows competition with the core 2 enzyme and possibly the elongating enzymes (see ref. 12 of O-Glycosylation). It is assumed that the sialyltransferase that adds SA to Gal at the end of the extended chains may be located further down the Golgi pathway in the trans Golgi or TGN, but this has not been clarified. The location probably also relates to whether the same enzyme can add SA to lactosamine chains in both N- and O-glycans. The sialylated derivatives of the Le^a and Le^x antigens terminating lactosamine chains are proving to be of great interest, as they appear with the change to malignancy in some tissues (4) and constitute the epitopes on the selectin ligands expressed by normal cells.

Fucosyltransferases form a large group of enzymes, and genes for several of these transferases have been isolated (5). They are involved in chain termination and are of particular interest in the synthesis of blood group antigens and in the sialylated derivatives of Le^x and Le^a that form the epitopes on selectin ligands. At least four α3-fucosyltransferases exist (α3 Fuc-T-III to -VI) that add fucose in α1-3 linkage to form the Le^x epitope illustrated in Table 1. Of these, Fuc-T-III has the broadest substrate specificity, because it can act on type 1 or type 2 structures and also add fucose in α1-4 linkage, thus synthesizing several human blood group epitopes (Le^a, Le^b, Le^x, and Le^y, as well as sialyl Le^a and sialyl Le^x). When added in this position, fucose terminates the chain and needs to be added after sialic acid to create the sialylated Lewis epitopes.
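The relationship between chain type, fucosylation, and the resulting Lewis epitope can be summarized in a small lookup sketch. The function and argument names are hypothetical. The rule that Le^b and Le^y carry a second fucose in α1-2 linkage on Gal comes from standard blood group biochemistry, not from this section, and combinations outside the six epitopes named above are simply returned as None.

```python
def lewis_epitope(chain_type, fuc_on_glcnac, fuc_on_gal=False, sialylated=False):
    """Name the Lewis-type epitope on a terminal lactosamine unit.

    chain_type    -- 1 (Galβ1-3GlcNAc) or 2 (Galβ1-4GlcNAc)
    fuc_on_glcnac -- fucose on GlcNAc (α1-4 on type 1, α1-3 on type 2)
    fuc_on_gal    -- fucose in α1-2 linkage on the terminal Gal (assumed rule)
    sialylated    -- sialic acid in α2-3 linkage on the terminal Gal
    """
    if chain_type not in (1, 2):
        raise ValueError("chain_type must be 1 or 2")
    if not fuc_on_glcnac:
        return None  # no Lewis epitope without the GlcNAc-linked fucose
    if sialylated and fuc_on_gal:
        return None  # combination not covered by this sketch
    if sialylated:
        return "sialyl Le^a" if chain_type == 1 else "sialyl Le^x"
    if fuc_on_gal:
        return "Le^b" if chain_type == 1 else "Le^y"
    return "Le^a" if chain_type == 1 else "Le^x"


if __name__ == "__main__":
    print(lewis_epitope(2, True, sialylated=True))
    print(lewis_epitope(1, True, fuc_on_gal=True))
```

The sketch treats the epitope as a static combination of features; it does not model the order of addition, in which sialic acid must be in place before the chain-terminating fucose.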

1.4. Functions of O-Glycans

Extracellular mucins such as MUC2 form large oligomers, and although dimerization and some oligomerization occur within the cell, the interactions that continue after secretion into the mucous layer depend to a large extent on the presence of the O-glycans. Although the detailed structure of the O-glycans may not be so important for this function, it is relevant that the mucins produced in the gastrointestinal and respiratory tracts carry large, complex O-glycans, which probably relate to the protective function that is crucial in these tissues (6). In glandular epithelia such as that in the breast, the O-glycans are shorter and simpler but still extended (7, 8). The carbohydrate side chains also serve to bind invading micro-organisms, and heterogeneity in their structure would serve to allow interactions with a variety of receptors. Specific interactions of defined structures present in the O-glycans involved in cell-cell adhesion, however, are now becoming clarified and are of great interest.

Selectin ligands are mucin-like glycoproteins that interact with the selectins expressed on endothelial cells, leukocytes, and platelets (see ref. 15 of O-Glycosylation). The interaction mediates rolling of leukocytes on blood vessels during inflammation, and, at the molecular level, the O-glycans expressed on the selectin ligand play a major role in determining the specificity of the interaction. The selectins show weak binding of sialylated fucosylated oligosaccharides, such as sialyl Le^x, but bind much more strongly to glycoproteins carrying these O-glycans, which may be sulfated.

P-selectin glycoprotein ligand-1 (PSGL-1) is expressed on leukocytes and interacts with both E-selectin found on endothelial cells and P-selectin on platelets. The specificity of the ligand interaction has been studied by transfecting complementary DNAs coding for glycosyltransferases and PSGL-1 into CHO cells, which lack both the core 2 and the α1-3 fucosyltransferases required for chain extension and for the formation of the Le^x epitope. Only CHO cells expressing both transferases were able to bind to P- and E-selectins (9). Because sialidase treatment also eliminates the binding of PSGL-1, sialyl Le^x has been identified as the specific epitope required for interaction with the selectins. Binding to P- but not E-selectin also requires sulfation of tyrosine residues in the core protein. Clearly, the core protein plays a role in the presentation of the O-glycan, possibly by specifying the clustering and conformation, as well as by providing amino acid residues for sulfation.
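The requirements for selectin binding described in this paragraph reduce to a pair of boolean conditions, sketched below. The class and field names are hypothetical, and the model is deliberately all-or-nothing: it ignores binding strength, clustering, and conformation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psgl1Cell:
    """A PSGL-1-expressing cell, described by the features named in the text."""
    core2_transferase: bool       # core 2 enzyme present (chain extension)
    a3_fucosyltransferase: bool   # α1-3 fucosyltransferase present (Le^x)
    sialylated: bool = True       # False models sialidase treatment
    tyrosine_sulfated: bool = True


def has_sialyl_lex(cell):
    # Sialyl Le^x needs chain extension, α1-3 fucose, and intact sialic acid.
    return cell.core2_transferase and cell.a3_fucosyltransferase and cell.sialylated


def binds_e_selectin(cell):
    return has_sialyl_lex(cell)


def binds_p_selectin(cell):
    # P-selectin additionally requires sulfated tyrosines on the core protein.
    return has_sialyl_lex(cell) and cell.tyrosine_sulfated


if __name__ == "__main__":
    untransfected_cho = Psgl1Cell(core2_transferase=False, a3_fucosyltransferase=False)
    double_transfectant = Psgl1Cell(core2_transferase=True, a3_fucosyltransferase=True)
    print(binds_p_selectin(untransfected_cho), binds_p_selectin(double_transfectant))
```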

The appearance of sialylated Le^a in mucins expressed in colon cancer cells suggests that the interaction of the carbohydrate may influence the metastatic process by enhancing binding of the cancer cells to endothelial cells (10).

Sialoadhesin is a molecule expressed at high levels by macrophages (11) that interacts specifically with monosialylated core 1 (sialyl T). Its normal function is thought to relate to interactions with leukocytes. Because this structure is overexpressed on mucins expressed by cancer cells, however, the possibility exists of some interaction between the tumor cells in carcinomas and the infiltrating macrophages.

Membrane mucin MUC1 is unusual among the epithelial mucins in that it is a transmembrane molecule and, as such, resembles the selectin ligands. Being widely expressed on glandular epithelial cells from which carcinomas develop, it is highly expressed by these cancers and aberrantly glycosylated, with the O-glycans being based mainly on core 1 rather than on core 2. This makes the molecule antigenically distinct, and both humoral and cellular responses to the cancer-associated mucin have been seen in breast and ovarian cancer patients. In addition to responses being generated by the whole molecule, it is becoming clear that glycopeptides can be presented by major histocompatibility complex (MHC) molecules. Moreover, peptides carrying larger core 2-based structures are not presented as well as glycopeptides carrying Tn or T (12). The role of O-glycosylation in the immune response is of interest in the wider context, as many antigens from infectious agents are glycoproteins carrying O-glycans, including HIV, in which the env protein carries a sialyl Tn epitope (13).

2. The O-glycans of Proteoglycans

Proteoglycans are proteins that carry glycosaminoglycan (GAG) side chains, which can range from a simple linear chain of sugars to highly charged, sulfated polysaccharides. Proteoglycans can carry from one to more than 100 GAGs, which consist of alternating hexosamine and hexuronic acid or galactose units and carry sulfate substitutions at various positions.

2.1. Linkage Regions

The GAGs are linked to the protein core via a four-sugar bridge: glucuronic acid β1-3 galactose β1-3 galactose β1-4 xylose-O-serine. The attachment of xylose to the serine residue of the protein core is catalyzed by xylosyltransferase (see O-Glycosylation). The linkage region is basically the same for most proteoglycans, and in both chondroitin sulfate and heparan sulfate (see text below), a proportion of the xylose has been shown to be phosphorylated (14). In chondroitin proteoglycans, the galactose residues have also been shown to be sulfated. Skeletal keratan sulfate is unlike the other GAGs, as it is O-linked to serine or threonine residues via a GalNAc residue (15), as in mucin-type O-glycans.

2.2. The Glycosaminoglycan Chains

The hexosamines in the GAGs can be either D-glucosamine (GlcN) or D-galactosamine (GalN) and the hexuronic acid either D-glucuronic acid (GlcA) or L-iduronic acid (IdoA). These sugars and galactose are arranged in an alternating unbranched sequence and can carry sulfate substitutions at various positions. Table 2 shows the compositions of the common GAGs.

Table 2. Composition of the Common GAG^a Chains of Proteoglycans

Name	Hexosamine	Hexuronic acid	Galactose
Chondroitin sulfate	Galactosamine	Glucuronic acid	—
Dermatan sulfate	Galactosamine	Glucuronic acid and iduronic acid	—
Keratan sulfate	Glucosamine	—	Galactose
Heparan sulfate	Glucosamine	Glucuronic acid and iduronic acid	—

^a Key: GAG, glycosaminoglycan.

The alternation of two types of monosaccharides would be expected to give simple polysaccharide units. Considerable heterogeneity exists within and among the individual chains, however, due to modification of the repeating units that is often incomplete. This includes sulfate substitutions at differing positions and epimerization of carbon 5 of GlcA to form IdoA. For example, the biosynthesis of heparan sulfate and heparin is initiated by the formation of [GlcAβ1-4GlcNAcα1-4]n, which is then N-deacetylated and N-sulfated and undergoes C5 epimerization of GlcA to yield IdoA. The IdoA units and GlcN can then be O-sulfated. The final product is therefore very diverse, and four distinct HexA and six GlcN units have been identified, allowing 17 different HexA-GlcN and 10 different GlcN-HexA disaccharide units.
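The disaccharide counts quoted above can be set against the simple combinatorial ceiling. The sketch below uses placeholder labels (HexA_1, GlcN_1, and so on) because the individual variants are not listed in this section; only the counts 4, 6, 17, and 10 come from the text.

```python
from itertools import product

# Placeholder labels for the four HexA and six GlcN variants (hypothetical names).
HEXA_VARIANTS = [f"HexA_{i}" for i in range(1, 5)]
GLCN_VARIANTS = [f"GlcN_{j}" for j in range(1, 7)]

# Counts reported in the text.
OBSERVED_HEXA_GLCN = 17
OBSERVED_GLCN_HEXA = 10


def theoretical_pairs(first, second):
    """All ordered pairings of one unit from `first` with one from `second`."""
    return list(product(first, second))


def observed_fraction(observed, first, second):
    """Fraction of the theoretical pairings that is actually observed."""
    return observed / len(theoretical_pairs(first, second))


if __name__ == "__main__":
    ceiling = len(theoretical_pairs(HEXA_VARIANTS, GLCN_VARIANTS))
    print("theoretical ceiling per orientation:", ceiling)
    print("HexA-GlcN fraction:",
          round(observed_fraction(OBSERVED_HEXA_GLCN, HEXA_VARIANTS, GLCN_VARIANTS), 2))
    print("GlcN-HexA fraction:",
          round(observed_fraction(OBSERVED_GLCN_HEXA, GLCN_VARIANTS, HEXA_VARIANTS), 2))
```

Both observed counts fall below the ceiling of 24 pairings per orientation, which suggests that not every combination of modified units occurs.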

Which GAG is attached to which protein depends on the protein core and the cell type. For example, CD44, which can mediate cell adhesion, trafficking, and motility, can be expressed as a chondroitin sulfate or heparan sulfate proteoglycan in a cell-type specific manner (16).

2.3. Function of the GAGs

The biological roles of proteoglycans are many and diverse, ranging from simple mechanical support to playing an important part in cellular recognition, adhesion, motility, and proliferation. Most of these effects depend on the binding of macromolecules to the GAGs; in fact, the anticoagulant and antiproliferative activities of heparin and heparan sulfate are mediated by the free GAG. Most biological activities, however, depend on the GAG in association with the protein core of the proteoglycan. Binding of GAGs to proteins can be highly specific, as in the binding of heparin to antithrombin, or, as is often the case, the interaction can be less specific and usually electrostatic in nature. The biological activities expressed by a single GAG can often be attributed to specific carbohydrate structures. For example, the interaction between heparin and the proteinase inhibitor antithrombin is based on the occurrence in the GAG of a specific pentasaccharide sequence. This sequence is composed of three GlcN units, one GlcA unit, and one IdoA unit, with O-sulfate groups at various positions. The key feature of this structure is an O-sulfate group on the internal GlcN, which is essential for the high-affinity binding of the proteinase inhibitor. Another specific sequence that has been defined on GAGs is that involved in the interaction between dermatan sulfate and heparin cofactor II. As many of the genes encoding the core proteins carrying GAGs have been or are being cloned, and as the methodology improves for the sequence analysis of the GAGs, it is likely that many interactions involving highly specific recognition/binding sequences in the GAGs will be found.

3. Concluding Remarks

The structure of O-glycans, synthesized from relatively few building blocks, is made extremely diverse by both the order of the addition of the sugars and the large number of covalent linkages that are possible. Thus the conformation of the O-glycan differs markedly when the same sugar is added to the same substrate in a different linkage (e.g., Le^a vs. Le^x). These differences in shape can be recognized both by B cells producing specific antibodies and by cell surface receptors such as the selectins. The demonstration of such specificity has generated interest in the role of these interactions in differentiation and in cell-cell adhesion. Moreover, the very large number of genes identified that encode the enzymes catalyzing the synthesis of O-glycans emphasizes the importance of their fine structure to the organism. The development of knockout mice defective in these genes will go some way toward identifying the crucial functions they control.