Chapter 4
Experiments and Results
Abstract This chapter describes the experimental results of the MRF-based
method for homology detection and fold recognition including alignment accuracy,
success rate, running time and contribution of some important features. This chapter
also compares the MRF-based method with currently popular PSSM- and
HMM-based methods such as HHpred, HHblits and FFAS, in terms of alignment
accuracy and success rate of homology detection and fold recognition.
Keywords Alignment accuracy · Homology detection success rate · Fold recognition rate · HHpred · HHblits · FFAS
4.1 Training and Validation Data
To train the node alignment potential, we constructed the training and validation
data from SCOP70, in which any two proteins share less than 70% sequence identity.
In total we use a set of 1,400 protein pairs as the training and validation data,
which covers 458 SCOP folds [1-3]. The sequence identities of the training and
validation protein pairs are uniformly distributed between 20 and 70%. Further, the
two proteins in a pair are similar at the superfamily or fold level. Each training or
validation protein has fewer than 400 residues, and fewer than 10% of its residues
lack 3D coordinates. The reference alignment for a protein pair is generated by the
structure alignment tool DeepAlign. Each reference alignment has fewer than 50
middle gap positions, and its terminal gaps make up less than 20% of the
alignment length. Five-fold cross validation is used to determine the
hyperparameters of our machine learning model. In particular, each time we choose
1,000 of the 1,400 protein pairs as the training data and the remaining 400 pairs as
the validation data, such that there is no fold-level redundancy between the training
and validation data.
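As an illustration only, the selection criteria and the fold-disjoint split described above could be sketched in Python as follows. The record layout (`PairRecord`), the function names, and the convention that terminal gaps are the alignment columns before the first and after the last aligned residue pair are assumptions made for this sketch; they are not taken from the implementation used in this chapter. The sketch also assumes that both proteins of a pair belong to the same SCOP fold, which follows from the pairs being similar at the superfamily or fold level.

```python
import random
from dataclasses import dataclass


@dataclass
class PairRecord:
    """Hypothetical record for one candidate protein pair."""
    len_a: int       # number of residues in protein A
    len_b: int       # number of residues in protein B
    missing_a: int   # residues of A without 3D coordinates
    missing_b: int   # residues of B without 3D coordinates
    fold: str        # SCOP fold shared by both proteins (assumed)
    aln_a: str       # gapped reference alignment row for A ('-' = gap)
    aln_b: str       # gapped reference alignment row for B, same length


def count_gaps(aln_a, aln_b):
    """Return (middle_gaps, terminal_gaps, alignment_length).

    Assumed convention: columns before the first and after the last
    residue-residue match are terminal gaps; gap columns in between
    are middle gaps.
    """
    n = len(aln_a)
    matched = [i for i in range(n) if aln_a[i] != "-" and aln_b[i] != "-"]
    if not matched:
        return 0, n, n
    first, last = matched[0], matched[-1]
    terminal = first + (n - 1 - last)
    middle = sum(
        1 for i in range(first, last + 1)
        if aln_a[i] == "-" or aln_b[i] == "-"
    )
    return middle, terminal, n


def passes_filters(pair):
    """Apply the length, missing-coordinate, and gap thresholds from the text."""
    for length, missing in ((pair.len_a, pair.missing_a),
                            (pair.len_b, pair.missing_b)):
        if length >= 400 or missing >= 0.10 * length:
            return False
    middle, terminal, n = count_gaps(pair.aln_a, pair.aln_b)
    return middle < 50 and terminal < 0.20 * n


def split_by_fold(pairs, n_validation, seed=0):
    """Greedy split that keeps every SCOP fold entirely on one side.

    Whole folds are assigned to the validation set while they still fit
    under n_validation; all remaining folds go to the training set.
    """
    by_fold = {}
    for pair in pairs:
        by_fold.setdefault(pair.fold, []).append(pair)
    folds = sorted(by_fold)
    random.Random(seed).shuffle(folds)
    train, validation = [], []
    for fold in folds:
        group = by_fold[fold]
        if len(validation) + len(group) <= n_validation:
            validation.extend(group)
        else:
            train.extend(group)
    return train, validation
```

Calling `split_by_fold(pairs, n_validation=400)` on the 1,400 filtered pairs would give a validation set of at most 400 pairs; because whole folds are moved at once, the greedy assignment only approximates the exact 1,000/400 partition reported above, and repeating it with different seeds is one possible way to obtain several training/validation splits.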