Bioinformatics

Sequence complexity of proteins and its significance in annotation (Bioinformatics)

1. Introduction The concept of complexity of protein sequences originated from treating sequences as strings of symbols that can be studied with linguistic methods. Simple descriptors such as amino acid compositional properties attracted early attention. These studies suggested that small globular proteins can be classified according to amino acid composition and this […]

Protein domains in eukaryotic signal transduction systems (Bioinformatics)

1. Functional domains Originally, the concept of protein domains was derived from the analysis of three-dimensional structures (see Article 68, Protein domains, Volume 7). While small proteins typically have a “monolithic” structure consisting of a single fold, larger proteins can follow two different architectural principles. Some of them just form larger monolithic structures, while the […]

Protein repeats (Bioinformatics)

1. Definition Some protein domains are composed of units of similar structure (see Figure 1). Often, but not always, these units are also similar in sequence, which explains why they have a similar structure. Therefore, they can be considered protein repeats that originated by duplications from a single ancestral sequence. These small units are large […]

Large-scale protein annotation (Bioinformatics)

1. Introduction A bacterial genome encodes anywhere between 500 and 5000 proteins, whereas a large eukaryotic genome encodes as many as 25 000 proteins. Making sense of a whole-genome set of amino acid sequences may sound like a daunting task; if your approach were to consider each protein in turn through literature review and traditional […]

Evolutionary constraints as protein properties reflecting underlying mechanisms (Bioinformatics)

1. Introduction Certain protein families are very highly conserved across the major divisions of life. This is illustrated, for example, by the alignment in Figure 1(a) of Ran GTPases, which demonstrate a remarkable degree of sequence conservation across metazoans, fungi, plants, and protozoans. Ran is a member of the Ras superfamily of small GTPases (Hall, […]

Large-scale, classification-driven, rule-based functional annotation of proteins (Bioinformatics)

1. Introduction High-throughput genome projects have resulted in a rapid accumulation of predicted protein sequences for a large number of organisms. Meanwhile, scientists have begun to tackle protein functions and other complex regulatory processes using global-scale data generated at various levels of biological organization, ranging from genomes and proteomes to metabolomes (metabolites synthesized by […]

Signal peptides and protein localization prediction (Bioinformatics)

1. Introduction In 1999, the Nobel Prize in Physiology or Medicine was awarded to Günter Blobel “for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell”. Since the subcellular localization of a protein is an important clue to its function, the characterization and prediction of these intrinsic signals […]

Transmembrane topology prediction (Bioinformatics)

1. Introduction Transmembrane (TM) proteins make up about 20% of all known protein sequences, yet less than 1% of all known structures. This discrepancy arises because TM proteins are hard to overexpress and crystallize, and are therefore difficult to examine with X-ray diffraction or NMR. It is, however, much easier to […]

IMPALA/RPS-BLAST/PSI-BLAST in protein sequence analysis (Bioinformatics)

1. Introduction: philosophy of profile-based analysis Sequence evolution is a largely stochastic process. Random, undirected mutations occur during DNA replication. Depending on the effect of these mutations on the structure and function of the protein, they may become fixed in the population. According to the neutral theory of molecular evolution (Kimura, 1983), the majority […]

The domains of life and their evolutionary implications (Bioinformatics)

1. Introduction Over a century ago, Darwin wrote: “The time will come I believe, though I shall not live to see it, when we shall have very fairly true genealogical trees of each great kingdom of nature” (Burkhardt and Smith, 1990). From his phrasing, it is clear that Darwin had more in mind than just […]