Experiments and Results - Protein Homology Detection Through Alignment of Markov Random Fields

Chapter 4

Experiments and Results

Abstract This chapter describes the experimental results of the MRF-based method for homology detection and fold recognition, including alignment accuracy, success rate, running time, and the contribution of some important features. It also compares the MRF-based method with currently popular PSSM- and HMM-based methods such as HHpred, HHblits, and FFAS in terms of alignment accuracy and success rate of homology detection and fold recognition.

Keywords Alignment accuracy · Homology detection success rate · Fold recognition rate · HHpred · HHblits · FFAS

4.1 Training and Validation Data

To train the node alignment potential, we constructed the training and validation data from SCOP70, in which any two proteins share <70 % sequence identity. In total, we use a set of 1,400 protein pairs as the training and validation data, which cover 458 SCOP folds [1-3]. The sequence identity of the training and validation protein pairs is uniformly distributed between 20 and 70 %. Further, the two proteins in each pair are similar at the superfamily or fold level. Each training or validation protein has fewer than 400 residues, and fewer than 10 % of its residues lack 3D coordinates. The reference alignment for a protein pair is generated by the structure alignment tool DeepAlign. Each reference alignment has fewer than 50 middle gap positions, and the number of terminal gaps is less than 20 % of the alignment length. Five-fold cross validation is used to determine the hyperparameters of our machine learning model. In particular, each time we choose 1,000 of the 1,400 protein pairs as the training data and the remaining 400 pairs as the validation data, such that there is no fold-level redundancy between the training and validation data.
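The selection thresholds and the fold-disjoint split described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `PairRecord` fields, the function names, and the greedy assignment of whole folds to the validation set are assumptions made for illustration, and the sketch further assumes that each pair carries a single SCOP fold label.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRecord:
    """Hypothetical summary of one candidate protein pair (not from the source)."""
    pair_id: str
    fold: str                 # SCOP fold label assumed to be shared by both proteins
    lengths: tuple            # residue counts of the two proteins
    missing_fractions: tuple  # fraction of residues without 3D coordinates, per protein
    middle_gaps: int          # gap positions inside the reference alignment
    terminal_gaps: int        # gap positions at the ends of the reference alignment
    alignment_length: int


def passes_filters(p):
    """Apply the size, coordinate, and gap thresholds quoted in the text."""
    return (
        all(n < 400 for n in p.lengths)
        and all(f < 0.10 for f in p.missing_fractions)
        and p.middle_gaps < 50
        and p.terminal_gaps < 0.20 * p.alignment_length
    )


def fold_disjoint_split(pairs, n_validation, seed=0):
    """Greedy split (an assumption): move whole folds into the validation set
    while it stays at or below n_validation pairs; everything else is training.
    No fold can then appear on both sides of the split."""
    by_fold = defaultdict(list)
    for p in pairs:
        by_fold[p.fold].append(p)
    folds = sorted(by_fold)
    random.Random(seed).shuffle(folds)
    train, validation = [], []
    for fold in folds:
        group = by_fold[fold]
        if len(validation) + len(group) <= n_validation:
            validation.extend(group)
        else:
            train.extend(group)
    return train, validation
```

Calling `fold_disjoint_split(filtered_pairs, 400, seed=k)` with a different seed `k` for each of five rounds would give five training and validation splits in the spirit of the text. Because folds are moved as whole groups, the validation set may end up slightly smaller than the requested size.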
