the proteins under study share <30 % sequence identity. The problem of detecting
homologous proteins with relatively low (<30 %) sequence identity is called remote
homology detection. Remote homology detection is related to protein fold
recognition, which aims to infer whether proteins sharing low sequence identity have
similar structural folds. In this topic, we do not explicitly distinguish remote
homology detection from fold recognition since many remote homology detection
methods including those presented in this topic also apply to fold recognition.
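As a rough illustration of the <30 % threshold mentioned above, the following sketch computes percent sequence identity from two sequences that have already been aligned. The function names `percent_identity` and `below_identity_threshold`, the gap character, and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for this example; other conventions for the denominator exist and yield different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # no ungapped columns to compare
    return 100.0 * matches / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 30.0) -> bool:
    """True if the pair falls below the identity threshold (default 30 %).

    This only checks the identity value; it does not decide whether
    the two proteins are actually homologous.
    """
    return percent_identity(aligned_a, aligned_b) < threshold
```

For example, `percent_identity("ACDE-F", "ACGEHF")` compares five ungapped columns, of which four are identical.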
In the following sections, we will describe the state-of-the-art methods of
homology detection and fold recognition, which can be roughly grouped into two
main categories: alignment-dependent and alignment-free methods. Sequence
information is essential to homology detection and fold recognition. Nevertheless,
some methods also use predicted structure information. In this topic, we mainly
focus on sequence-based, alignment-dependent methods for remote homology
detection and fold recognition that detect remote homologs based on protein
alignment using mainly sequence information.
1.2 Related Work
Protein homology detection and fold recognition have been extensively studied and
good progress has been made. More than 5,000 research articles indexed in PubMed
(http://www.ncbi.nlm.nih.gov/sites/entrez) show relevance to fold recognition or
remote homology detection. See Fariselli et al. [9], Wan and Xu [10], Lindahl and
Elofsson [11], and Jones et al. [12] for reviews of some widely used computational
methods and tools for remote homology detection and fold recognition.
This section reviews existing methods for remote homology detection and fold
recognition. We will describe alignment-free approaches such as kernel-based fold
recognition methods that classify a protein sequence into a specific fold class
without aligning proteins. We will also describe alignment-dependent approaches that
conduct homology detection and fold recognition based upon protein alignments. All the
methods employ a few basic information sources, including protein amino acid
sequence and sequence profile. A sequence profile encodes evolutionary
information of a protein and is usually derived from a multiple sequence alignment
(MSA) of close sequence homologs. In addition, some methods also make use of
predicted structure information or even native structure information when available.
See Fig. 1.1 for the classification of existing homology detection methods.
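To make the notion of a sequence profile derived from an MSA more concrete, the sketch below builds a simple position-specific frequency table from a list of aligned sequences. It is a simplified illustration, not the procedure of any particular method discussed here: the function name `build_profile`, the uniform pseudocount, and the treatment of gaps (ignored when counting) are all assumptions. Practical tools typically also apply sequence weighting and more elaborate pseudocount schemes, which are omitted.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping amino acid -> frequency.

    Assumptions for this sketch: `msa` is a list of equal-length aligned
    sequences, gaps and non-standard symbols are ignored, and a uniform
    pseudocount is added to every amino acid in every column.
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have equal length")

    profile = []
    for col in range(length):
        # Count standard amino acids observed in this column.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile
```

Calling `build_profile(["ACD", "ACE", "A-D"])` returns three column distributions, each summing to 1, in which conserved residues such as the `A` in the first column receive the highest frequency.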
1.3 Alignment-Free Methods for Homology Detection and Fold Recognition
Alignment-free methods for homology detection and fold recognition do not
explicitly build protein alignments. In particular, alignment-free methods represent
a protein sequence or profile as a feature vector and then identify proteins of similar