Protein domains in eukaryotic signal transduction systems (Bioinformatics)

1. Functional domains

Originally, the concept of protein domains was derived from the analysis of three-dimensional structures (see Article 68, Protein domains, Volume 7). While small proteins typically have a “monolithic” structure consisting of a single fold, larger proteins can follow two different architectural principles. Some of them just form larger monolithic structures, while the majority of large proteins consist of several smaller folding units, the so-called domains. Structural domains can fold independently from the rest of the protein, and each one has its own hydrophobic core region (see Article 69, Complexity in biological structures and systems, Volume 7). As a consequence of their autonomous folding capabilities, domains can often be excised from their host protein and pasted into a different protein context without major changes in fold or function. Evolutionary processes such as exon shuffling and gene duplication, fusion, or fission have created the multidomain “mosaic” structure found in many protein classes, a prime example being proteins involved in eukaryotic signal transduction pathways (Bork et al., 1997; Ponting et al., 1999; see also Article 9, Modeling protein evolution, Volume 1).

In the course of domain evolution, the structural fold and key functional aspects have been kept largely intact. Some domain types have proven to be very successful as mediators of a particular unit function, and variants are found over and over again in extant proteins. By a combination of bioinformatical and experimental approaches, it has been possible to detect protein domains even in the absence of a three-dimensional structure (Copley et al., 2002). Here, functional protein domains appear as “homology domains”, that is, as regions of local sequence similarity in proteins that are otherwise unrelated. The exhaustive detection of homology domains in signal transduction proteins has been – and continues to be – an important prerequisite for understanding the physiological importance of these proteins (Attwood, 2000; Hofmann, 1998; Hofmann, 2000).


As many homology domains require sophisticated sequence analysis methods for their detection, a number of tools, databases, and WWW servers have been set up to help the nonspecialist with that important task. Information on homology domains can be stored in the form of so-called Hidden Markov Models (HMMs) or generalized profiles (GPs) (Hofmann, 2000; see also Article 91, Classification of proteins by sequence signatures, Volume 6). Relevant databases, which maintain information on all domain types found in signal transduction proteins, are discussed in more detail elsewhere in this topic (see Article 83, InterPro, Volume 6, Article 84, Functionally and structurally relevant residues in PROSITE motif descriptors, Volume 6, and Article 86, Pfam: the protein families database, Volume 6); an overview of the most important resources is given in Table 1. InterPro is a meta-database combining information stored in several domain databases (Mulder et al., 2003). For this reason, the signaling domains described in the remainder of the text are referenced by their InterPro accession numbers, which are listed in Tables 2 to 5.
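To illustrate how a profile can capture a homology domain, the following minimal Python sketch builds a gap-free log-odds matrix from a handful of aligned motif instances and slides it along a query sequence. It is a deliberately reduced stand-in for the HMMs and generalized profiles mentioned above, which additionally model insertions and deletions. The toy alignment, the uniform background frequencies, the pseudocount, and the function names build_profile and best_hit are assumptions made for this example and are not taken from any of the cited databases.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one log-odds column (residue -> score) per alignment position."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best score, 0-based start)."""
    width = len(profile)
    hits = [
        (sum(col[aa] for col, aa in zip(profile, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


# Hypothetical aligned instances of a short motif (not real domain data).
toy_alignment = ["GKSTL", "GKTTL", "GKSSL", "GRSTL"]
profile = build_profile(toy_alignment)
score, start = best_hit(profile, "MAAPLGKSTLVDE")
print(f"best window starts at index {start} with score {score:.2f}")
```

In practice, a score threshold calibrated on known family members decides whether such a hit is reported as a domain; the sketch only shows the scoring step.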

Table 1 Web-based resources for domain detection

Database    URL of search page
InterPro    http://www.ebi.ac.uk/InterProScan
PROSITE     http://hits.isb-sib.ch/cgi-bin/PFSCAN
Pfam        http://www.sanger.ac.uk/Software/Pfam/search.shtml
SMART       http://smart.embl-heidelberg.de

Table 2 Functional domains in phospho-protein signaling. Only the most relevant domain types are shown, together with their InterPro accession numbers.

Role                  InterPro       Subtype
Protein kinases       IPR000719      Classical type
                      IPR000403 (a)  Lipid-kinase like
                      IPR000687      Rio-type
                      IPR004166      alpha-kinase
Protein phosphatases  IPR001932      Ser/Thr, PP2C type
                      IPR004843 (a)  Ser/Thr, calcineurin-type
                      IPR000106      Tyr, LMW-type
                      IPR000242      Tyr, PTP-type
                      IPR001763 (a)  Tyr, Cdc25-type
                      IPR000340      dual specificity
Recognition           IPR000980      SH2 (pTyr-recognition)
                      IPR006020 (a)  PTB/PID (pTyr)
                      IPR000253      FHA (pSer/pThr)

(a) Domain family also includes members with different activities.

2. Eukaryotic signaling systems

Eukaryotic cells must respond to a large number of external stimuli, which are typically sensed by means of specific receptors localized on the cell surface. As a result of such stimulation, cells have a large repertoire of possible responses, including increased or decreased proliferation, differentiation, or apoptosis. On the molecular level, these responses can be mediated by increased or decreased expression of specific genes, and also by changes in the translation level of particular mRNAs or in the stability of specific protein products. Eukaryotic cells have developed a number of pathways for conveying signals from the cell surface to the transcription-, translation-, or degradation machinery (Pawson, 2004). In some instances, there is a simple and straightforward pathway leading from a cell surface receptor to a transcriptional regulator. More frequently, however, signal transduction pathways look – at first sight – unnecessarily complicated and involve a multitude of intermediary components. The reason for this complexity lies most probably in the requirement for cells to integrate the signals coming from different sources before mounting the most appropriate response. The components of signal transduction pathways have evolved to allow for a well-balanced amount of cross talk between pathways. In many instances, the whole concept of an isolated “signal transduction pathway” is not really tenable, and it appears more appropriate to talk of a “signal transduction network” instead.

Table 3 Functional domains in small G-protein signaling

Role               InterPro       Subtype
Small GTPase       IPR003577      Ras
                   IPR003578      Rho
                   IPR003579      Rab
                   IPR002041      Ran
Exchange factor    IPR001895      For Ras and similar
                   IPR001331      For Rho/Rac/Cdc42
                   IPR007515      For Rab proteins
                   IPR001194      For Rab proteins
                   IPR000408 (a)  For Ran
GTPase activating  IPR001936      For Ras and similar
                   IPR000198      For Rho/Rac/Cdc42
                   IPR000195      For Rab proteins
                   IPR000331      For Ran
Recognition        IPR003116      RAFRBD for Ras
                   IPR000159      RA domain for Ras
                   IPR000095      Crib domain for Rho etc.
                   IPR003315 (b)  RABPH for Rab proteins
                   IPR004012 (b)  RABRAP for Rab and Rap
                   IPR000156      RanBP for Ran

(a) Domain family also includes members with different activities.

(b) Two out of many Rab-binding domains that recognize only selected Rab proteins.

Table 4 Functional domains in ubiquitin signaling

Role                InterPro       Subtype
Modifier            IPR000626 (a)  Ubiquitin-like
E1, E2, E3 factors  IPR000011      Ubiquitin activating (E1)
                    IPR000608      Ubiquitin conjugating (E2)
                    IPR000569      HECT-type (E3)
                    IPR001841      RING-finger (E3)
Deubiquitinating    IPR001394      USP type (for ubiquitin)
                    IPR001578      UCH type (for ubiquitin)
                    IPR003653      ULP type (for SUMO etc.)
                    IPR003323      OTU type (for ubiquitin?)
                    IPR006155      Josephin type (for ubiquitin?)
Recognition         IPR000449      UBA (for ubiquitin, NEDD8)
                    IPR003903      UIM (for ubiquitin)
                    IPR003892 (a)  CUE (for ubiquitin)

Table 5 Functional domains in signaling adaptors

Role                              InterPro   Subtype
Proline-based motif recognition   IPR001452  SH3
                                  IPR001202  WW
Death domain like 6-helix bundle  IPR000488  Death domain (DD)
                                  IPR001875  Death effector domain (DED)
                                  IPR001315  Caspase recruitment domain (CARD)
                                  IPR004020  Pyrin domain (PYD)

2.1. Modularity in signal transduction

A hallmark of eukaryotic signal transduction proteins, starting from the cell surface receptor down to the final effector component, is their modular architecture (Bork et al., 1997). As a common mode of action, the receptors transform the external signal (e.g., the binding of a ligand) into a different, intracellular signal, which is then recognized by a downstream component and potentially converted into a third signal and so forth. Canonical signal transduction proteins comprise several receptor and effector functionalities, sometimes augmented by additional regulatory components, all of which are encoded by different domains. Besides the generation and the detection of a signal, the termination of the signal can be of crucial importance. Thus, most signal transduction systems also include dedicated components for signal termination, which are also encoded by specific domain types. It would be outside the scope of this chapter to mention, let alone discuss, all signal transduction pathways employed by eukaryotic cells. Figure 1 shows three well-studied signaling paradigms that will be used as examples in the following paragraphs.

Growth factor signaling, shown in Figure 1(a), was the first signaling pathway to be understood in detail and is still among the most complex pathways known (Pawson, 2004; see also Article 117, EGFR network, Volume 6). This pathway makes use of three widely used signaling paradigms: proximity activation of enzymes, protein phosphorylation, and GTPase signaling. In brief, the cell surface receptor binds the growth factor by its extracellular domain, leading to a receptor dimerization that activates the tyrosine kinase domains found in the cytoplasmic portion of the receptor. The resulting tyrosine phosphorylation of the receptor is sensed by the SH2-domains of a downstream component, which transduces the signal via a GDP/GTP exchange protein to the small GTPase Ras. The GTP-bound Ras in turn transduces the signal to a downstream cascade of kinases, eventually resulting in MAP-kinase activation. Finally, MAP-kinase phosphorylates a large number of target proteins, including transcriptional regulators. Each of the generated downstream signals has a specialized system for signal termination.

Cytokine signaling, shown in Figure 1(b), employs a simpler pathway from receptor dimerization to transcriptional regulation (Shuai and Liu, 2003). Here, the receptor does not contain a kinase domain but rather binds noncovalently to Jak-kinases and activates them upon receptor dimerization. The resulting auto-phosphorylation of Jak-kinases recruits factors of the STAT family via an SH2 domain. Subsequently, the STAT proteins themselves become phosphorylated, which allows them to translocate into the nucleus where they work as transcription factors by virtue of their DNA-binding and transactivation domains. Important regulators of this pathway work by selectively ubiquitinating, and thus downregulating, proteins of the cytokine signaling pathway. Similar modes of regulation by ubiquitination and selective protein degradation are known in many other signaling pathways as well.
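The reuse of the same domain types in different pathway components can be made explicit with a small lookup, sketched here in Python. The architectures are simplified to the domains named in the two pathway descriptions above; real proteins carry additional domains, and the component labels and the helper proteins_by_domain are assumptions chosen for this illustration rather than database identifiers.

```python
from collections import defaultdict

# Simplified domain architectures taken from the pathway descriptions above.
# Component names are descriptive labels for this sketch only.
architectures = {
    "growth factor receptor": ["ligand-binding", "protein kinase"],
    "SH2-containing adaptor": ["SH2"],
    "Ras exchange factor": ["RasGEF"],
    "Raf": ["RBD", "protein kinase"],
    "Jak": ["protein kinase"],
    "STAT": ["SH2", "DNA-binding", "transactivation"],
}


def proteins_by_domain(archs):
    """Invert protein -> domains into domain -> proteins."""
    index = defaultdict(list)
    for protein, domains in archs.items():
        for domain in dict.fromkeys(domains):  # count repeated domains once
            index[domain].append(protein)
    return dict(index)


index = proteins_by_domain(architectures)
reused = {d: p for d, p in index.items() if len(p) > 1}
for domain, proteins in reused.items():
    print(f"{domain}: {', '.join(proteins)}")
```

In this reduced picture, the protein kinase domain and the SH2 domain each appear in more than one component and in both pathways, which is the "mosaic" pattern described in Section 1.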

Figure 1 Simplified representation of three example pathways discussed in Section 2.1. (a) Growth factor signaling. The growth factor EGF and the membrane-associated Ras proteins are indicated by colored bubbles. The other signal transduction proteins are represented by their differently colored domains. The names of the corresponding proteins are given at the bottom of the figure. Interacting domains are shown juxtaposed to each other. Full domain names are given whenever possible. Abbreviated domain names are “RasGEF” for Ras GDP/GTP exchange domain, and “RBD” for the Ras-binding domain in the Raf kinase. (b) Cytokine signaling. Representation analogous to Figure 1(a). (c) Death receptor signaling. The trimeric death-ligand (e.g., FasL) and the mitochondrial proteins Bid and Cytochrome C are indicated by colored bubbles. The signal transduction proteins are represented by their differently colored domains, where DD, DED, and CARD mean “death domain”, “death effector domain”, and “caspase recruitment domain”, respectively. Interacting domains are shown juxtaposed to each other. Caspase-8 cleaves Bid and Caspase-9 cleaves proCaspase-3; all other interactions are binding events.

Finally, the signaling pathway from death receptors to apoptosis is unusual in that it does not involve protein phosphorylation (Figure 1c) (Wajant, 2003). Central to this pathway is the activation of a protease class, the caspases, by induced proximity (Salvesen and Dixit, 1999). Caspases do not bind directly to the receptors, but rather use a cascade of unique adaptor proteins containing domains of the death-domain 6-helix bundle superfamily (Aravind et al., 2001; Hofmann, 1999; Hofmann, 2003). Another unusual feature is the liberation of cytochrome C from the mitochondrion as an intermediate signal, which is sensed by the multidomain protein APAF-1. This binding event triggers the downstream branch of apoptosis signaling, which again involves caspases and members of the death-domain superfamily.

2.2. Proximity activation of enzymes

One of the recurrent motifs in eukaryotic signal transduction pathways is the activation of enzymes by induced proximity. Enzymatic activities that can be activated by this mechanism include protein kinases and proteases such as caspases. Frequently, these activities are encoded by functional domains, which are brought into close contact by the enzymatically inactive remainder of the protein. As a prerequisite for being susceptible to proximity activation, the enzyme must be inactive in its isolated form, or must be highly specific for a particular substrate to which it normally does not have access. Signaling processes such as receptor dimerization bring two enzyme molecules or two enzymatic domains into close contact. In the case of caspases, which are synthesized as virtually inactive proenzymes, this close proximity is sufficient for the mutual proteolytic removal of the inactivating prodomain by the very weak residual activity of the two proenzyme molecules (Salvesen and Dixit, 1999). As the caspase prodomains are also responsible for their dimerization, the activated caspases are liberated from the complex. In the case of receptor tyrosine kinases, the two associating kinase domains phosphorylate each other at a position that is not accessible for intramolecular auto-phosphorylation.

2.3. Protein phosphorylation

Protein phosphorylation is probably the most widely used mechanism in eukaryotic signal transduction (Pawson, 2004; see also Article 63, Protein phosphorylation analysis by mass spectrometry, Volume 6). The signal itself consists of a phosphate group that becomes attached to the hydroxyl group of a tyrosine, serine, or threonine residue. A large variety of different signals exist, which are distinguished by the nature of the phosphorylated protein and by the site of phosphorylation. Many signaling components can become phosphorylated at multiple positions, sometimes with opposing effects. The phosphorylation signal is generated by protein kinases, whose enzymatic activities are typically encoded by functional kinase domains (see Table 2). Most known kinase domains are evolutionarily related. There are subtle differences between kinases with a preference for tyrosine and those preferring either serine or threonine. One group, the so-called dual specificity kinases, is able to modify both tyrosine and serine/threonine residues; their catalytic domains closely resemble those of the pure tyrosine kinases. Besides the large number of “classical” protein kinases, there also exist a number of different protein domains associated with protein kinase activity (Table 2). Despite their heterogeneity, bioinformatical analyses suggest that all of those domains are distantly related to the canonical protein kinase domains.

Whereas kinase activity is encoded by recognizable domains, the phosphorylatable residues themselves do not seem to show a comparable preference for particular domain types. Most kinases have a preference for certain residues flanking the modified residue, and these conditions appear sufficient to observe phosphorylation in vitro. However, these preferences are far too weak to explain the high phosphorylation specificity observed in vivo. In most cases, specific substrate recognition by the noncatalytic part of the kinase molecule is required for efficient in vivo phosphorylation. Phosphorylation signals are not necessarily permanent, but can be removed by enzymes belonging to the class of protein phosphatases. In terms of protein relationship, phosphatases are a much more heterogeneous protein class than kinases. A number of different phosphatase domains have been described and are summarized in Table 2.

Besides the mechanisms for generating and terminating phosphorylation signals, cells must have a sophisticated system for the specific recognition of protein phosphorylation; this is typically performed by specialized phosphopeptide-recognition domains (Pawson, 2004; Tsai, 2002). The best known of these domains is the SH2 domain, which recognizes tyrosine phosphorylation. The PTB/PID domain has a similar specificity, but is less common. The FHA domain seems to perform an analogous task for the recognition of phosphorylated serine/threonine residues (Table 2).

2.4. Small G-protein signaling

Another important signaling system uses the nucleotide association status of small GTPases of the Ras superfamily (Geyer and Wittinghofer, 1997; Sprang, 1997). This system appears to be optimally tuned for reversibility and hence is frequently referred to as a “molecular switch”. Small G-proteins, such as Ras, Rho, Rab, Ran, and others, can associate with either GTP or GDP. In many cellular systems, the GTP-associated form has an activating role, while the GDP-bound form is either inactive or even opposes the role of the GTP-form. Small G-proteins have a very low intrinsic GTPase activity, and without external influences both forms can be stable over a prolonged period of time. An activating signal can be generated by the GDP/GTP exchange factors, which load the small G-proteins with GTP. This activity is typically encoded by specialized domain classes, each of them responsible for one type of G-protein (see Table 3). Conversely, the activating signal can be terminated by a number of GTPase activating proteins (GAPs). GAP domains specifically bind to their cognate G-proteins and stimulate their intrinsic GTPase activity. Similar to the exchange factors, a particular class of GAP domains acts on each subclass of small G-proteins.
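The switch behavior described above can be sketched as a two-state object in Python. The accession numbers are those listed for Ras, Rho, and Ran in Table 3; the class SmallGTPase and its method names are hypothetical, and the model assumes that the very low intrinsic GTPase activity can be ignored, so that only a cognate exchange factor or GAP domain changes the state.

```python
from dataclasses import dataclass

# Cognate regulator domains per G-protein family (accessions from Table 3).
GEF_DOMAIN = {"Ras": "IPR001895", "Rho": "IPR001331", "Ran": "IPR000408"}
GAP_DOMAIN = {"Ras": "IPR001936", "Rho": "IPR000198", "Ran": "IPR000331"}


@dataclass
class SmallGTPase:
    family: str
    nucleotide: str = "GDP"  # resting state in this sketch

    @property
    def active(self) -> bool:
        return self.nucleotide == "GTP"

    def meet_exchange_factor(self, domain: str) -> None:
        """A cognate exchange factor domain loads the protein with GTP."""
        if domain == GEF_DOMAIN[self.family]:
            self.nucleotide = "GTP"

    def meet_gap(self, domain: str) -> None:
        """A cognate GAP domain stimulates hydrolysis, leaving GDP bound."""
        if self.active and domain == GAP_DOMAIN[self.family]:
            self.nucleotide = "GDP"


ras = SmallGTPase("Ras")
ras.meet_exchange_factor("IPR001331")  # Rho-specific exchange factor: no effect
print(ras.nucleotide)                  # GDP
ras.meet_exchange_factor("IPR001895")  # Ras exchange factor: switch on
print(ras.nucleotide)                  # GTP
ras.meet_gap("IPR000198")              # Rho-specific GAP: no effect
print(ras.nucleotide)                  # GTP
ras.meet_gap("IPR001936")              # Ras GAP: switch off
print(ras.nucleotide)                  # GDP
```

The strict one-to-one matching is a simplification; the table entries "For Ras and similar" indicate that real regulator domains can act on more than one closely related G-protein.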

Owing to the large diversity of small G-proteins, there exist a large number of different systems for sensing their GTP/GDP status (Table 3). GTP-associated Ras can be recognized by the RAFRBD domain found in the Raf kinase of the growth factor signaling pathway (Wittinghofer and Nassar, 1996). The RA domain is distantly related to the RAFRBD and occurs in a large number of Ras effectors. Similar systems exist for other G-proteins, including the Crib domain for Rho/Rac recognition and the RanBP domain for Ran recognition (Table 3).

2.5. Ubiquitination

The covalent modification of proteins with ubiquitin was initially interpreted exclusively as an earmark for protein degradation by the proteasome. During recent years, however, it has become increasingly clear that protein ubiquitination is used for a variety of signaling purposes, including, but not limited to, different mechanisms for tightly regulated protein degradation (Pickart, 2004). Ubiquitination comes close to, or even surpasses, protein phosphorylation in terms of the flexibility of the generated signal and also in the number and complexity of signaling components involved (Di Fiore et al., 2003) (Table 4). While protein phosphorylation can modify serine, threonine, or tyrosine side chains, ubiquitination is restricted to the ε-amino groups of lysine, and possibly to the protein N-terminus. In contrast to the simple phosphorylation signal, there exists a multitude of ubiquitin-based signals, as ubiquitin itself can be ubiquitinated at various lysine residues, resulting in the formation of poly-ubiquitin chains of different lengths and linkage topologies. There is accumulating evidence that mono-ubiquitin and the different poly-ubiquitin chains all have different signaling capabilities (Schnell and Hicke, 2003). An additional layer of complexity is provided by several ubiquitin-like modifiers that are distinct from ubiquitin but have similar capabilities for protein modification (Schwartz and Hochstrasser, 2003).
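The lysine residues at which ubiquitin itself can be modified can be read directly from its sequence, as the short Python sketch below shows. The 76-residue sequence of mature human ubiquitin is supplied here as an addition to the text, and the variable names are arbitrary; the example only enumerates candidate linkage sites and makes no statement about which chain types carry which signal.

```python
# Mature human ubiquitin, 76 residues (added for this example).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQL"
    "EDGRTLSDYNIQKESTLHLVLRLRGG"
)

# 1-based positions of all lysines, the possible attachment points
# for a further ubiquitin moiety in a poly-ubiquitin chain.
lysine_positions = [i for i, aa in enumerate(UBIQUITIN, start=1) if aa == "K"]

print(len(UBIQUITIN))    # 76
print(lysine_positions)  # [6, 11, 27, 29, 33, 48, 63]
```

Seven lysines give seven possible lysine-linked chain types for chains with a single linkage, before mixed or branched topologies and different chain lengths are taken into account.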

The transfer of ubiquitin and its relatives onto a protein requires a cascade of three enzyme classes termed “ubiquitin activating enzymes” (E1), “ubiquitin conjugating enzymes” (E2) and “ubiquitin ligases” (E3). There are two different classes of E3 enzymes, one of them based on HECT domains, the other one on RING fingers, which are a specific form of complex Zn-finger domains. E3 components clearly harbor most of the diversity, as these components confer specificity to the ubiquitination reaction. Analogous to the phosphatases in kinase signaling, there are a large number of proteases that specifically remove ubiquitin from target proteins (Wing, 2003). Similar to the ubiquitination enzymes, the deubiquitinating activity is typically encoded by dedicated domains, which are surrounded by protein interaction domains conferring target specificity. The ubiquitin signal can be recognized by a number of ubiquitin-sensing domains, the most prominent examples being the UBA, UIM, and CUE domains (Di Fiore et al., 2003).

2.6. Signaling adaptors

Not all intermediary steps in signal transduction involve the modification of proteins. As exemplified by the death receptor signaling pathway described above, “adaptor proteins” can play a crucial role, in particular, for mediating a controlled amount of pathway cross talk (Hofmann, 1999; Hofmann, 2003). A typical adaptor protein contains two different protein interaction domains, one of them binding to the receptor or another upstream component, and the other one recruiting a downstream signaling component into a “signaling complex”.

A prime example is the DISC (for “death inducing signaling complex”), which forms at the cytoplasmic face of death receptors and is crucial for apoptosis signaling (Wajant, 2003). The cytoplasmic tails of death receptors like Fas and TNF-R55 contain a specific six-helix bundle domain termed “death domain” (DD), which has a high propensity to bind to other death domains. This property is used by FADD, a typical adaptor protein that has a death domain binding to the Fas receptor, combined with a “death effector domain” (DED) used for downstream signaling. The DED is distantly related to the death domain, but specifically recognizes other DEDs and not DDs. The next downstream component, Caspase-8, contains two N-terminal DEDs, which recruit the enzyme to the receptor/adaptor complex, resulting in proximity activation of the caspase (Figure 1c). A third domain class belonging to the same six-helix fold is the “caspase recruitment domain” (CARD), which is found in other adaptor proteins and caspases that work further downstream in apoptosis signaling. Recently, a fourth class of six-helix interaction domain has been described, the “pyrin domain” (PYD). All four domain classes share the propensity to interact with multiple other domains belonging to the same class (Table 5).
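The like-with-like binding rule of the death-fold domains is enough to reproduce the order of assembly in the DISC, as the following Python sketch illustrates. Only the cytoplasmic domains named in the text are listed, the function names are hypothetical, and sharing a domain class is treated as sufficient for binding, which is an assumption of the sketch: in the cell, not every DD binds every other DD.

```python
# The four death-fold interaction classes named in the text (Table 5).
DEATH_FOLD_CLASSES = {"DD", "DED", "CARD", "PYD"}

# Domain content as described in the text; other regions are omitted.
PROTEINS = {
    "Fas": ["DD"],
    "FADD": ["DD", "DED"],
    "Caspase-8": ["DED", "DED", "caspase"],
}


def homotypic_contacts(a, b):
    """Death-fold classes present in both proteins (possible contacts)."""
    return sorted(DEATH_FOLD_CLASSES & set(PROTEINS[a]) & set(PROTEINS[b]))


def can_assemble(chain):
    """True if every neighbouring pair in the chain shares a death-fold class."""
    return all(homotypic_contacts(a, b) for a, b in zip(chain, chain[1:]))


print(homotypic_contacts("Fas", "FADD"))           # ['DD']
print(homotypic_contacts("FADD", "Caspase-8"))     # ['DED']
print(homotypic_contacts("Fas", "Caspase-8"))      # []
print(can_assemble(["Fas", "FADD", "Caspase-8"]))  # True
```

The empty result for Fas and Caspase-8 mirrors the observation that caspases do not bind directly to the receptor and depend on the adaptor.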

Specific protein interactions are also crucial for other signaling events, including the growth factor pathway. For reasons that are not entirely clear, multiple signaling interactions make use of the specific recognition of short proline-based motifs by dedicated domain classes, including the intensively studied SH3 and WW domains (Macias et al., 2002; Zarrinpar et al., 2003) (Table 5). Although both domain types have similar recognition capabilities, the SH3 domain is prevalent in phosphorylation-based pathways, while the WW domain is more frequent in ubiquitination-based processes.

3. Signaling specificity and complexity

The previous paragraphs have briefly discussed some of the most important domain types found in eukaryotic signaling proteins. Obviously, this domain list is far from complete. Important signaling systems, such as the heterotrimeric G-proteins and the protein domains involved in their regulation, have been omitted because of space constraints. The same is true for domains that generate or sense small-molecule second messengers. Important examples are adenylate and guanylate cyclases generating cAMP and cGMP, respectively, and phospholipase C enzymes involved in the generation of inositol and lipid messengers. Analogous to protein signals, the small-molecule messengers are sensed by effector proteins, which typically also employ modular domains for that task. Several signaling pathways rely on the localization of their components in particular subcellular compartments, such as the plasma membrane, the nucleus, or even some kind of nuclear body. Protein domains that mediate subcellular targeting, such as the C2 domain, have also not been discussed.

Finally, one important question should be briefly addressed: If all members belonging to one domain class perform the same unit function, how is the necessary signaling specificity provided? There does not appear to be a general answer applying to all instances. In most cases, the unit function of a domain class is only weakly defined, for example, “phosphorylation on a tyrosine residue” or “binding to a death domain protein”. There are marked differences between the various tyrosine kinase domains, not only in their sequence but also in their molecular properties. Thus, it is quite possible to encode the specificity within the nonessential part of the homology domain. A second source of specificity comes from the use of specific targeting domains. Catalytic domains, which may be quite promiscuous in isolated form, are frequently found in a protein context next to a specificity-conferring protein interaction domain. The possibilities offered by this modular architecture are probably the major reason why the evolutionary shuffling of functional domains in signal transduction systems has turned out to be such a success story.
