Experiments and Results - Protein Homology Detection Through Alignment of Markov Random Fields


Table 4.16 Contribution of the edge alignment potential and mutual information (MI), measured by alignment recall improvement on proteins with at least 256 non-redundant sequence homologs in two benchmarks, Set3.6K and Set2.6K

                                 391 pairs in Set3.6K             509 pairs in Set2.6K
                                 Exact match (%)   4-offset (%)   Exact match (%)   4-offset (%)
Only node potential              59.5              63.4           71.3              75.8
Node + edge potential, no MI     62.1              66.7           73.5              78.1
Node + edge potential with MI    65.2              69.8           76.6              81.0

The structure alignments generated by DeepAlign are used as reference alignments.

information is mainly useful for proteins with many sequence homologs since it is close to 0 for proteins with few sequence homologs. As shown in Tables 4.15 and 4.16, if only the proteins with at least 256 non-redundant sequence homologs are considered, the improvement resulting from mutual information is about 3%.
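The size of the mutual information effect can be checked directly against Table 4.16. The short sketch below subtracts the "no MI" row from the "with MI" row; the variable names are illustrative only, and the differences are read here as absolute percentage points, which is an assumption about how the improvement is meant.

```python
# Recall values (%) copied from Table 4.16. Column order:
# Set3.6K exact match, Set3.6K 4-offset, Set2.6K exact match, Set2.6K 4-offset.
node_edge_no_mi = [62.1, 66.7, 73.5, 78.1]
node_edge_with_mi = [65.2, 69.8, 76.6, 81.0]

# Gain attributable to mutual information, in percentage points.
mi_gain = [round(with_mi - no_mi, 1)
           for no_mi, with_mi in zip(node_edge_no_mi, node_edge_with_mi)]
mean_mi_gain = sum(mi_gain) / len(mi_gain)

print(mi_gain)
print(mean_mi_gain)
```

The four gains are 3.1, 3.1, 3.1, and 2.9 points, which agrees with the figure of about 3% quoted in the text.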

4.7 Running Time

Figure 4.1 shows the running time of MRFalign with respect to protein length. As a control, we also show the running time of the Viterbi algorithm, which is used by our ADMM algorithm to generate an alignment at each iteration. As shown in this figure, MRFalign is no more than 10 times slower than the Viterbi algorithm. To speed up homology detection, we may use the Viterbi algorithm to perform an initial search without considering the edge alignment potential, and keep only the top 10% of proteins for further examination. Then we run MRFalign to search for homologs among the top 10%. Therefore, although MRFalign may be slow compared to the Viterbi algorithm, empirically we can do homology search only slightly slower than a search with the Viterbi algorithm alone.
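The two-stage search described above can be sketched as a generic prefilter-then-rescore routine. This is a minimal illustration, not the MRFalign implementation: `two_stage_search`, `fast_score`, `full_score`, and `relative_cost` are hypothetical names, with `fast_score` standing in for a Viterbi alignment using node potentials only and `full_score` standing in for a full MRFalign alignment. The cost model assumes every protein costs the same to align, which is a simplification.

```python
import math
from typing import Callable, List, Sequence, Tuple


def two_stage_search(
    query: str,
    database: Sequence[str],
    fast_score: Callable[[str, str], float],
    full_score: Callable[[str, str], float],
    keep_fraction: float = 0.10,
) -> List[Tuple[str, float]]:
    """Rank candidates with a cheap prefilter, then rescore the survivors."""
    if not database:
        return []
    # Stage 1: score every database entry with the cheap scorer
    # (hypothetical stand-in for Viterbi without edge potentials).
    prelim = sorted(database, key=lambda t: fast_score(query, t), reverse=True)
    # Keep the top fraction, but never fewer than one candidate.
    n_keep = max(1, math.ceil(keep_fraction * len(prelim)))
    shortlist = prelim[:n_keep]
    # Stage 2: rescore only the shortlist with the expensive scorer
    # (hypothetical stand-in for the full node + edge alignment).
    rescored = [(t, full_score(query, t)) for t in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def relative_cost(keep_fraction: float = 0.10, slowdown: float = 10.0) -> float:
    """Two-stage cost relative to one fast pass over the whole database."""
    return 1.0 + keep_fraction * slowdown


# Toy usage with made-up scorers, for illustration only.
hits = two_stage_search(
    "AAAA",
    ["AAAA", "AAAB", "BB", "B"],
    fast_score=lambda q, t: -abs(len(q) - len(t)),
    full_score=lambda q, t: float(sum(a == b for a, b in zip(q, t))),
    keep_fraction=0.5,
)
```

Under the stated assumptions, `relative_cost(0.10, 10.0)` evaluates to 2.0, so the two-stage search costs at most about two Viterbi passes even if the tenfold worst case applied to every shortlisted protein; the overhead shrinks when MRFalign's actual slowdown is below that bound.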

4.8 Is Our MRFalign Method Overtrained?

We conducted two experiments to show that MRFalign is not overtrained. In the first experiment, we used 36 CASP10 hard targets as the test data. Since our training set was built before CASP10 started, we can be confident that there is no redundancy between the CASP10 hard targets and our training data. Using MRFalign and HHpred, respectively, we search each of these 36 test targets against
