The ROC curves of the three classifiers (BindN-RF, BindN and DBS-PSSM) are
shown in Fig. 3. On the PDC25t test dataset, the random forest (RF) classifier
BindN-RF achieves the best performance at almost every trade-off between sensitivity
and specificity; that is, its ROC curve lies above those of BindN and DBS-PSSM over
nearly the whole range of thresholds. These results suggest that the RF-based approach
predicts DNA-binding residues in protein sequences more accurately than the previous
methods.
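For readers less familiar with ROC analysis, the sketch below shows how such a curve is obtained from per-residue prediction scores: the decision threshold is lowered step by step, and at each step sensitivity (the true positive rate) is recorded against 1 - specificity (the false positive rate). The area under the curve (AUC) is then computed with the trapezoidal rule. This is a minimal standard-library Python sketch; the function names `roc_points` and `auc` and the toy scores are hypothetical and are not taken from BindN-RF, BindN or DBS-PSSM.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct score threshold.

    labels: 1 for a DNA-binding residue, 0 for a non-binding residue.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both binding and non-binding residues")
    # Rank residues from the most to the least confident binding prediction
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every residue sharing this score (ties move together)
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # x = FP / N = 1 - specificity, y = TP / P = sensitivity
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier outputs
labels = [1, 1, 0, 1, 0, 0]               # 1 = binding residue
print(round(auc(roc_points(scores, labels)), 4))  # 0.8889
```

For these toy scores the AUC is 8/9, which equals the fraction of (binding, non-binding) residue pairs in which the binding residue receives the higher score. A curve that lies above another at almost every threshold, as described for BindN-RF, will usually also have the larger AUC.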
4 Conclusions
Sequence-based prediction of DNA-binding residues can provide useful information
for protein function annotation, protein-DNA docking and biological experiments. To
improve prediction accuracy, a random forest-based approach has been developed that
combines relevant biochemical features with several descriptors of evolutionary
information for input encoding. The new evolutionary descriptors have been shown to
enhance classifier performance when used together with the biochemical features and
position-specific scoring matrices (PSSMs). This indicates that the new descriptors
capture evolutionary information not contained in the PSSMs previously used for
DNA-binding site prediction. This study has also shown that evolutionary information
enhances classifier performance for predicting DNA-binding residues defined by both
the atom distance-based and the accessible surface area (ASA)-based criteria. Overall,
the random forest-based approach predicts DNA-binding residues more accurately than
previously published methods. On a separate test dataset, the best random forest
classifier achieved 80.00% overall accuracy, with 73.08% sensitivity and 80.63%
specificity. This classifier is currently being used to upgrade our web server, BindN
(http://bioinfo.ggc.org/bindn/), which is frequently accessed for biological research.
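The three reported figures are related through the confusion matrix of the per-residue predictions. The minimal sketch below (standard-library Python; the class `Confusion`, the helper `implied_positive_fraction` and the example counts are hypothetical) computes sensitivity, specificity and overall accuracy, and uses the identity accuracy = p * sensitivity + (1 - p) * specificity, where p is the fraction of binding residues in the test set, to back out p from the three reported values. This assumes all three values were computed from the same set of predictions; because they are rounded, the result is only approximate.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Per-residue confusion-matrix counts for a binary binding/non-binding predictor."""
    tp: int  # binding residues predicted as binding
    fn: int  # binding residues predicted as non-binding
    tn: int  # non-binding residues predicted as non-binding
    fp: int  # non-binding residues predicted as binding

    def sensitivity(self) -> float:
        # Fraction of true binding residues that are recovered
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-binding residues correctly rejected
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all residues predicted correctly
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def implied_positive_fraction(acc: float, sens: float, spec: float) -> float:
    """Solve acc = p * sens + (1 - p) * spec for p, the fraction of binding residues.

    Assumes all three values come from the same predictions and that sens != spec.
    """
    if sens == spec:
        raise ValueError("p is undetermined when sensitivity equals specificity")
    return (spec - acc) / (spec - sens)


# Hypothetical counts, only to illustrate the definitions
c = Confusion(tp=30, fn=10, tn=360, fp=40)
print(c.sensitivity(), c.specificity(), round(c.accuracy(), 4))  # 0.75 0.9 0.8864

# Values reported in the conclusions (rounded percentages)
print(round(implied_positive_fraction(0.8000, 0.7308, 0.8063), 3))  # 0.083
```

Under that assumption, roughly 8% of the test residues would be binding residues. That would explain why the overall accuracy (80.00%) lies much closer to the specificity (80.63%) than to the sensitivity (73.08%): non-binding residues dominate the test set, so accuracy is weighted toward specificity.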