recognition accuracy to 62 % on the test datasets used by Ding and Dubchak [ 32 ].
Dong et al. [ 33 ] developed a method called ACCFold with an overall accuracy of
70.1 %. ACCFold employs an autocross-covariance (ACC) transformation to convert a
PSI-BLAST sequence profile (i.e., position-specific scoring matrix) into a series of
fixed-length vectors, which are then input to an SVM classifier for fold recognition.
Ghanty and Pal [ 34 ] developed a fold recognition method that uses a bi-gram
histogram to represent a protein sequence.
Recently Zakeri et al. [ 30 ] developed a method called GEOMEAN and
reported that it achieves a fold recognition success rate above 80 %.
GEOMEAN does this by taking a geometry-inspired mean of different kernel
matrices (functions) instead of a linear combination. SVM-Pairwise [ 18 ] is
another sensitive method that combines an SVM with pairwise protein alignment to
achieve superior performance. SVM-Pairwise is not strictly an alignment-free method,
since it uses a kernel function defined on protein sequence alignments. SVM-Pairwise
was tested with both dynamic-programming-based alignment and BLAST
alignment. SVM-Pairwise is among the most accurate methods, but it is
slow because building alignments for large proteins takes time. In addition, false
positives in alignment (i.e., two unrelated residues being aligned) may lower the
homology detection rate of SVM-Pairwise.
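The ACC step in ACCFold can be sketched in pure Python. This follows the usual auto- and cross-covariance definitions applied to the columns of a profile; the column-mean centring, the 1/(L − g) normalization, the maximum lag `max_lag`, and the helper name `acc_features` are assumptions for illustration, and ACCFold's exact settings may differ. The point it shows is how a profile of any length L becomes a vector of fixed length.

```python
from typing import List

def acc_features(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Autocross-covariance (ACC) features of an L x K profile (K = 20 for a PSSM).

    For every lag g in 1..max_lag and every ordered pair of profile columns
    (a, b), average the product of mean-centred scores for column a at
    position j and column b at position j + g over the L - g valid positions.
    a == b gives the auto-covariance terms, a != b the cross-covariance terms.
    The result has K * K * max_lag entries whatever the sequence length L.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be in [1, L - 1]")
    # Mean of each profile column over all L positions.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    centred = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]
    features: List[float] = []
    # Feature order: lag g, then column a, then column b.
    for g in range(1, max_lag + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(centred[j][a] * centred[j + g][b] for j in range(length - g))
                features.append(total / (length - g))
    return features
```

With a 20-column PSSM and `max_lag = 3`, every protein longer than three residues maps to 20 × 20 × 3 = 1,200 features, which is what lets a standard SVM consume proteins of different lengths.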
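A bi-gram histogram of the kind used by Ghanty and Pal can be read, in its simplest form, as the normalized counts of the 400 possible adjacent residue pairs. The sketch below assumes that reading (standard 20-letter alphabet, pairs with non-standard residues skipped, normalization by the number of counted pairs); the names `bigram_histogram` and `BIGRAM_INDEX` are hypothetical, and the features in their paper may be defined differently.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
BIGRAMS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
BIGRAM_INDEX: Dict[str, int] = {bg: i for i, bg in enumerate(BIGRAMS)}

def bigram_histogram(seq: str) -> List[float]:
    """Normalized counts of adjacent residue pairs (400-dimensional)."""
    counts = [0.0] * len(BIGRAMS)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 1):
        idx = BIGRAM_INDEX.get(seq[i:i + 2])
        if idx is None:  # skip pairs containing non-standard residues such as X
            continue
        counts[idx] += 1
        total += 1
    # Divide by the number of counted pairs so proteins of different length are comparable.
    return [c / total for c in counts] if total else counts
```

For example, `"ACAC"` contains the pairs AC, CA, AC, so the AC bin holds 2/3 and the CA bin 1/3.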
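The idea behind SVM-Pairwise is that each protein is represented by the vector of its pairwise alignment scores against the training proteins, and the SVM then works on those vectors. The sketch below shows only that representation step. It assumes a toy Smith-Waterman scoring scheme (match +2, mismatch −1, linear gap −2) in place of a real substitution matrix with affine gaps, uses a plain linear kernel as the simplest choice, and omits SVM training; all function names are hypothetical.

```python
from typing import List

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def pairwise_features(seq: str, training: List[str]) -> List[float]:
    """SVM-Pairwise-style representation: one alignment score per training protein."""
    return [float(smith_waterman(seq, t)) for t in training]

def linear_kernel(x: List[float], y: List[float]) -> float:
    """Dot product of two score vectors; an RBF or other kernel could be used instead."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

The cost noted above is visible here: building one feature vector needs an alignment against every training protein, and each Smith-Waterman run is quadratic in the sequence lengths.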
1.4 Alignment-Based Methods for Homology Detection and Fold Recognition
Alignment-based methods detect homologs by first aligning a query protein to each
of the subject proteins in the database and then ranking and selecting homologs based
upon alignment quality, which is evaluated by a scoring function. Alignment-based
homology detection faces two major challenges. One is to design a good scoring
function that yields accurate protein alignments. The other is to select homologs
based upon the alignments, which is usually done by evaluating the statistical significance
(e.g., calculating an E-value) of a raw alignment score or by machine learning.
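One common way to turn raw local alignment scores into E-values is Karlin-Altschul statistics, E = K · m · n · e^(−λS), where S is the raw score, m the query length, n the total number of residues in the database, and λ and K depend on the scoring matrix and gap penalties (the theory is exact for ungapped alignments; for gapped alignments λ and K are usually estimated empirically). The sketch below assumes raw scores have already been computed and uses placeholder λ and K values; it ignores the effective-length corrections that tools such as BLAST apply, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def evalue(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value: expected number of chance hits scoring >= raw_score.

    lam and k depend on the scoring system; db_len is the total number of
    residues in the database. No edge-effect (effective length) correction.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

def rank_hits(scores: Dict[str, float], query_len: int, db_len: int,
              lam: float, k: float, cutoff: float = 1e-3) -> List[Tuple[str, float]]:
    """Convert raw alignment scores to E-values, sort ascending, keep those <= cutoff."""
    hits = [(name, evalue(s, query_len, db_len, lam, k)) for name, s in scores.items()]
    hits.sort(key=lambda hit: hit[1])
    return [hit for hit in hits if hit[1] <= cutoff]
```

Because E decays exponentially in S, a modest increase in raw score can move a subject from "probably chance" to "likely homolog", which is why the choice of scoring function and its λ and K matter so much for selection.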
According to the information used for the proteins under study, alignment-based
methods can be grouped into three categories: sequence-sequence (i.e., primary
sequence information for both proteins under comparison), sequence-profile
(i.e., primary sequence information for one protein and a sequence profile for the
other) and profile-profile (i.e., sequence profiles for both proteins), as shown in
Fig. 1.2. Generally speaking, sequence-sequence methods are less sensitive than
sequence-profile methods, which in turn are less sensitive than profile-profile
methods. However, sequence-sequence methods are more specific than sequence-profile
methods, which in turn are more specific than profile-profile methods. See a
review by Wan and Xu [ 10 ] for a list of sequence-sequence, sequence-profile, and
profile-profile alignment methods developed in the past few years.
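To make the three categories concrete, the sketch below scores a single aligned position in each setting. The toy matrices in the usage are hypothetical; real tools use matrices such as BLOSUM62 and PSI-BLAST PSSMs, and profile-profile methods use many different column-scoring functions. The expected-score (frequency-weighted PSSM) form here is just one simple option, and the function names are illustrative.

```python
from typing import Dict

Column = Dict[str, float]  # amino acid -> value at one profile position

def seq_seq(a: str, b: str, subst: Dict[str, Dict[str, float]]) -> float:
    """Residue vs residue: look up a substitution matrix such as BLOSUM62."""
    return subst[a][b]

def seq_profile(a: str, pssm_col: Column) -> float:
    """Residue vs profile column: the PSSM score for residue a at this position."""
    return pssm_col.get(a, 0.0)

def profile_profile(freq_col: Column, pssm_col: Column) -> float:
    """Profile column vs profile column: PSSM score averaged over the query
    column's residue frequencies (one simple choice among many)."""
    return sum(p * pssm_col.get(aa, 0.0) for aa, p in freq_col.items())
```

When the query column collapses to a single residue with frequency 1, `profile_profile` reduces to `seq_profile`, and a PSSM built from one sequence plays the role of a substitution-matrix row. Each category therefore adds information about the family on one or both sides, which is one intuition for the sensitivity ordering described above.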