Method - Protein Homology Detection Through Alignment of Markov Random Fields


2.6 Scoring Similarity of One Markov Random Field and One Template

This MRF-based alignment method can also be applied to protein threading. In this scenario, one of the two proteins under alignment has a solved 3D structure. We can, of course, simply use the node and edge alignment potentials described in previous sections to align one MRF to one solved structure. To exploit the native structure information of the protein with a solved 3D structure, however, we may revise the alignment potentials as follows.

1. Instead of using predicted secondary structure and solvent accessibility, we may use their native information for the protein with solved 3D structure, which can be generated by DSSP [22].

2. Let T denote the protein with solved 3D structure. We can directly calculate the inter-residue distance for any residue pair in T. That is, $p(d_{ik} \mid c_i, c_k, m_{ik})$ reduces to a simple distribution that has probability 1 for the native distance between residues i and k and 0 otherwise. So, the edge alignment potential can be simplified as follows.

[Eq. (2.7): the simplified edge alignment potential $\theta_{i,k,j,l}$, expressed through $P(d_{ij} \mid c_i, c_j)$ and the conditional probability between the two aligned pairwise distances.]

where $d_{ij}$ represents the distance between the two sequence residues at the two aligned positions, and $P(d_{ij} \mid c_i, c_j)$ is the conditional probability of $d_{ij}$ estimated from the contexts (denoted $c_i$ and $c_j$) of the two sequence residues. In Eq. (2.7), the second conditional probability is that of one aligned pairwise distance on the other, $P_{\mathrm{ref}}$ is the background probability, and the joint probability of the pairwise distances of two aligned residue pairs can be calculated by simple statistics using a set of non-redundant protein structure alignments generated by a structure alignment tool such as DeepAlign.
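The native-distance distribution from item 2 and the counting-based probability estimates mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it assumes distances are discretized into bins (the bin edges are hypothetical, since the text gives no discretization), it pools both sides of each aligned pair to estimate the background, and the function names and toy data are illustrative only.

```python
from bisect import bisect_right
from collections import Counter
from math import dist

# Hypothetical bin edges in angstroms; the text does not specify a discretization.
BIN_EDGES = [4.0, 6.0, 8.0, 10.0, 12.0, 15.0]
N_BINS = len(BIN_EDGES) + 1  # the last bin collects every distance >= 15


def distance_bin(d):
    """Map a distance to the index of the bin that contains it."""
    return bisect_right(BIN_EDGES, d)


def native_distance_distribution(coord_i, coord_k):
    """Distribution over bins for a residue pair with known 3D coordinates:
    probability 1 for the bin of the native distance and 0 otherwise."""
    probs = [0.0] * N_BINS
    probs[distance_bin(dist(coord_i, coord_k))] = 1.0
    return probs


def estimate_joint_and_background(aligned_pairs):
    """Estimate the joint and background probabilities by simple counting.

    aligned_pairs: iterable of (d_a, d_b), the distances of two aligned
    residue pairs taken from a set of structure alignments.
    Assumption: the background pools the distances from both sides."""
    joint_counts = Counter()
    background_counts = Counter()
    total = 0
    for d_a, d_b in aligned_pairs:
        a, b = distance_bin(d_a), distance_bin(d_b)
        joint_counts[(a, b)] += 1
        background_counts[a] += 1
        background_counts[b] += 1
        total += 1
    joint = {key: count / total for key, count in joint_counts.items()}
    background = {key: count / (2 * total)
                  for key, count in background_counts.items()}
    return joint, background


# Toy data standing in for distances harvested from real structure alignments.
toy_pairs = [(5.1, 5.4), (5.3, 7.2), (9.0, 9.5), (13.0, 12.5)]
joint, background = estimate_joint_and_background(toy_pairs)
native = native_distance_distribution((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

In practice the counts would come from a large non-redundant alignment set, and sparse bins would likely need smoothing, which this sketch omits.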

2.7 Algorithms for Aligning Two Markov Random Fields

As mentioned before, an alignment can be represented as a path in the alignment matrix, which encodes an exponential number of paths. We can use a set of $3N_1N_2$ binary variables $z^{u}_{i,j}$ to define a path, where $N_1$ and $N_2$ are the lengths of the two MSAs, $(i, j)$ is an entry in the alignment matrix and $u$ is the associated state. Meanwhile, $z^{u}_{i,j}$ is equal to 1 if the alignment path passes $(i, j)$ with state $u$. Therefore, the
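The binary-variable encoding of a path can be made concrete with a minimal sketch. The three state labels below are hypothetical (this section only says that each matrix entry has an associated state and that there are 3 N1 N2 variables in total), and the function name is illustrative. The sketch only builds the indicator variables; it does not check that the entries form a valid, connected path.

```python
from itertools import product

# Hypothetical labels for the three alignment states; the section does not
# name them, it only implies three states per matrix entry.
STATES = ("M", "I1", "I2")


def path_to_binary_variables(path, n1, n2):
    """Encode an alignment path as binary variables z[(i, j, u)].

    path: list of (i, j, u) triples, the matrix entries the path passes
    through together with the state at each entry (1-based indices).
    Returns a dict holding one 0/1 value per (entry, state) combination,
    i.e. 3 * n1 * n2 variables in total."""
    z = {(i, j, u): 0
         for i, j, u in product(range(1, n1 + 1), range(1, n2 + 1), STATES)}
    for i, j, u in path:
        if (i, j, u) not in z:
            raise ValueError(f"({i}, {j}, {u}) is outside the alignment matrix")
        z[(i, j, u)] = 1
    return z


# Toy example: two MSAs of lengths 3 and 2.
example_path = [(1, 1, "M"), (2, 1, "I1"), (3, 2, "M")]
z = path_to_binary_variables(example_path, n1=3, n2=2)
```

A dictionary keyed by (i, j, u) keeps the sketch readable; for real MSA lengths a dense 3-dimensional array would be the more natural container.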
