Table 3. Effect of evolutionary information on the performance of random forest classifiers
constructed using PDNA-62 with the ASA-based definition of DNA-binding residues

Evolutionary information      Accuracy (%)  Sensitivity (%)  Specificity (%)  Strength (%)  MCC   ROC AUC
None                          70.64         70.40            70.71            70.55         0.35  0.77
B                             72.44         73.09            72.26            72.67         0.39  0.81
PSSM                          78.51         76.51            79.06            77.78         0.48  0.86
H_m, H_d, K_m, K_d            76.38         76.82            76.26            76.54         0.46  0.84
PSSM, H_m, H_d, K_m, K_d      78.55         77.92            78.72            78.32         0.49  0.87
The effect of evolutionary information on classifier performance was further
examined using ROC analysis. The ROC curves shown in Fig. 1 were generated
by varying the output threshold of the RF classifiers, and each point on a ROC curve
represents a trade-off between sensitivity and specificity. When comparing classifiers,
the ROC curve of the more accurate classifier lies closer to the left-hand and
top borders of the plot. As shown in Fig. 1, the RF classifier trained with the two
types of evolutionary information (HKM+EI) is clearly better than the classifier
constructed using only biochemical features (HKM).
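The threshold sweep described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the RF classifier emits a score in [0, 1] for each residue (for example, the fraction of trees voting "binding"), and the scores and labels are made-up toy values. The function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold and return (1 - specificity, sensitivity) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Start above every score so the curve begins at (0, 0),
    # then lower the threshold to each distinct score in turn.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (assumed, not from the study): classifier scores and true labels
# (1 = DNA-binding residue, 0 = non-binding residue).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

pts = roc_points(scores, labels)
roc_auc = auc(pts)
print(pts)
print(roc_auc)
```

Each point corresponds to one threshold setting; a curve that hugs the left-hand and top borders encloses a larger area, which is why ROC AUC serves as a single-number summary of the curve.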
Next, we investigated the effect of evolutionary information on RF classifiers
constructed using the PDNA-62 dataset with the ASA-based definition of DNA-binding
residues. The ASA-based definition gave rise to more DNA-binding residues than the
atom distance-based definition. The ASA-based set of DNA-binding residues
included 97.21% of the atom distance-based set of positive data instances (1,082
shared positive data instances), but it also contained 553 positive data instances
that were designated as non-binding residues by the atom distance-based definition. In
other words, 33.82% of the DNA-binding residues defined by the ASA-based criterion
were not included in the atom distance-based set of positive data instances.
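The percentages in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that 1,082 is the number of residues labeled positive under both definitions, a reading that reproduces the stated 33.82%. The total size of the atom distance-based set is not given in the text, so the value computed here is only an estimate derived from the 97.21% figure.

```python
# Counts taken from the text.
shared_positives = 1082   # positive under both definitions (assumed reading)
asa_only_positives = 553  # positive under the ASA-based definition only

# Size of the ASA-based positive set and the share unique to it.
asa_total = shared_positives + asa_only_positives
asa_only_pct = 100.0 * asa_only_positives / asa_total

# Derived estimate (not stated in the text): size of the atom
# distance-based positive set implied by the 97.21% coverage figure.
distance_total_est = round(shared_positives / 0.9721)
coverage_pct = 100.0 * shared_positives / distance_total_est

print(asa_total)
print(round(asa_only_pct, 2))
print(distance_total_est, round(coverage_pct, 2))
```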
As shown in Table 3, the classifier constructed without evolutionary information
achieved 70.55% prediction strength with MCC = 0.35 and ROC AUC = 0.77,
comparable to the levels of performance measures shown in Table 2. Adding the different
descriptors of evolutionary information (B, PSSM, H_m, H_d, K_m and K_d) to the input
encoding improved the performance of RF classifiers. Furthermore, the best classifier in Table
3 was also obtained by combining the different descriptors of evolutionary information
(PSSM, H_m, H_d, K_m and K_d) with the three biochemical features. This RF classifier achieved
a prediction strength of 78.32% with 77.92% sensitivity and 78.72% specificity, MCC
= 0.49 and ROC AUC = 0.87 (Table 3). The performance improvement from
using evolutionary information was further confirmed by the ROC analysis (Fig. 2).
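The threshold-dependent measures reported in Table 3 can be computed from a confusion matrix as sketched below. Prediction strength is taken here to be the mean of sensitivity and specificity. That definition is an assumption, since this section does not state it, but it reproduces every strength value in Table 3 to within rounding. The function name `metrics` and the confusion-matrix counts are hypothetical.

```python
import math


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, strength and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumed definition: strength = mean of sensitivity and specificity.
    strength = (sensitivity + specificity) / 2.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "strength": strength,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = metrics(tp=80, fp=30, tn=70, fn=20)
print(m)

# Consistency check against Table 3:
# (sensitivity %, specificity %, reported strength %).
table3 = {
    "None": (70.40, 70.71, 70.55),
    "B": (73.09, 72.26, 72.67),
    "PSSM": (76.51, 79.06, 77.78),
    "H_m, H_d, K_m, K_d": (76.82, 76.26, 76.54),
    "PSSM, H_m, H_d, K_m, K_d": (77.92, 78.72, 78.32),
}
max_gap = max(abs((se + sp) / 2.0 - st) for se, sp, st in table3.values())
print(max_gap)
```

Because DNA-binding residues are typically a minority class, accuracy alone can be misleading; strength and MCC both account for performance on the positive and the negative class.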
Therefore, with the ASA-based definition of DNA-binding residues, the use of
evolutionary information was also found to significantly improve the performance of RF
classifiers. In the previous study by Yan et al. [5], Naïve Bayes classifiers were
constructed for predicting DNA-binding residues defined by the ASA-based criterion, and