1 Introduction
Proteins are building blocks of every cell in living organisms of all kingdoms.
Proteins are produced by the process of translation. In this process, a transcribed
gene sequence (mRNA) is translated into a linear chain of amino acids, which
constitutes the protein. To characterize the structural topology of proteins, primary,
secondary, tertiary and quaternary structure levels have been proposed. In this
hierarchy, protein secondary structure (PSS) plays an important role in modeling
protein structures because it represents the local conformation of amino acids into
regular structures. There are three basic secondary structure elements (SSEs):
alpha-helices, beta-strands and coils. Alpha-helices are corkscrew-shaped
conformations in which the amino acids are packed tightly together. Beta-strands are
extended conformations in which the amino acids are stretched out as far from each
other as possible; two or more adjacent strands connected to each other by hydrogen
bonds form a beta-sheet. There are two main categories of beta-sheet: if the strands
run in the same direction, the sheet is called parallel, whereas if they run in
opposite directions, it is called anti-parallel. Several approaches have been taken to
devise tools for predicting the secondary structure from the protein sequence alone.
Moreover, secondary structure itself may be sufficient for accurate prediction of a
protein's tertiary structure (Przytycka et al. 1999). Therefore, many researchers
employ PSS as a feature to predict the tertiary structure (Gong and Rose 2005),
function (Lisewski and Lichtarge 2006) and sub-cellular localization of proteins
(Nair and Rost 2003, 2005; Su et al. 2007).
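As a minimal sketch of how the three SSE classes are often handled computationally, a secondary structure can be written as a string of per-residue labels aligned to the amino acid sequence. The one-letter labels (H, E, C), the function names and the example strings below are assumptions made for illustration; they are not taken from the chapter or from any particular tool.

```python
from itertools import groupby

# Assumed one-letter labels for the three basic SSEs (illustrative convention).
SSE_NAMES = {"H": "alpha-helix", "E": "beta-strand", "C": "coil"}


def segments(ss):
    """Split a label string into runs: (label, start, end), end exclusive."""
    out = []
    pos = 0
    for label, run in groupby(ss):
        length = len(list(run))
        out.append((label, pos, pos + length))
        pos += length
    return out


def composition(ss):
    """Fraction of residues assigned to each of the three states."""
    return {label: ss.count(label) / len(ss) for label in SSE_NAMES}


# Made-up sequence and labels, for illustration only (not a real protein).
sequence = "MKTAYIAKQRQISFVKSHFS"
labels = "CCHHHHHHHCCEEEEECCCC"

for label, start, end in segments(labels):
    # Print each element with the stretch of sequence it covers.
    print(f"{SSE_NAMES[label]:<12} {start:>2}-{end - 1:<2} {sequence[start:end]}")
```

In this representation, a sequence-based predictor would take only `sequence` as input and output a string such as `labels`; the segment list is one simple way to recover individual helices and strands from it.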
Proteins have a precise tertiary structure that directs their function. Determining
the structures of various proteins would aid in our understanding of the mechanisms
of protein functions in biological systems. Prediction of protein structure from
amino acid sequences has been one of the most challenging tasks in computational
biology/bioinformatics for many years (Baker and Sali 2001; Skolnick et al. 2000).
Currently, only biophysical experimental techniques such as X-ray crystallography
and nuclear magnetic resonance (NMR) are able to provide precise protein tertiary
structures. There were 17,473,872,940 protein sequences in the release of the
Universal Protein Resource Knowledgebase (UniProtKB)/Translated European Molecular
Biology Laboratory (TrEMBL) as of 22 April 2014, whereas the Protein Data Bank (PDB)
contained only 99,624 protein structures at that time. This gap results from the
growth of large-scale genomic sequencing projects and from the failure of many
proteins to crystallize or of their crystals to diffract well. The gap has widened
considerably over the last decade, despite the development of dedicated
high-throughput X-ray crystallography pipelines (Berman et al. 2000). Solving a
protein structure by NMR is limited to small, soluble proteins. Moreover, X-ray
crystallography and NMR are costly and time-consuming methods for solving protein
structures. Table 1 lists the number of molecules of each type in the PDB and the
experimental methods used to determine their structures. Therefore, the computational
prediction of structure of proteins is