Method - Protein Homology Detection Through Alignment of Markov Random Fields


$$
h(i,k,j,l) = \sum_{d_{ik},\,d_{jl}} p(d_{ik} \mid c_i, c_k, m_{ik})\, p(d_{jl} \mid c_j, c_l, m_{jl}) \log \frac{P(d_{ik}, d_{jl})}{P_{\mathrm{ref}}(d_{ik})\, P_{\mathrm{ref}}(d_{jl})} \tag{2.6}
$$

where $p(d_{ik} \mid c_i, c_k, m_{ik})$ is the probability of two nodes $i$ and $k$ in $T$ interacting at distance $d_{ik}$; $p(d_{jl} \mid c_j, c_l, m_{jl})$ is the probability of two nodes $j$ and $l$ in $S$ interacting at distance $d_{jl}$; $P(d_{ik}, d_{jl})$ is the probability of one distance $d_{ik}$ being aligned to another distance $d_{jl}$ in reference alignments; and $P_{\mathrm{ref}}(d_{ik})$ and $P_{\mathrm{ref}}(d_{jl})$ are the background probabilities of observing $d_{ik}$ and $d_{jl}$, respectively, in a protein structure. Meanwhile, $c_i$ and $c_k$ are position-specific features centered at the $i$th and $k$th residues, respectively, and $m_{ik}$ represents the mutual information between the $i$th and $k$th columns in the multiple sequence alignment.
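To make Eq. (2.6) concrete, here is a minimal Python sketch that evaluates the edge alignment potential for one pair of edges, reading the equation as an expected log-odds score over discretized distance bins. It assumes that the two predicted distance distributions, the joint table $P(d_{ik}, d_{jl})$ and the background $P_{\mathrm{ref}}$ are already available as plain lists over the same bins, and that the tables are smoothed so that no ratio involves a zero entry. The function name `edge_alignment_potential` and the toy numbers are hypothetical.

```python
import math


def edge_alignment_potential(p_ik, p_jl, p_joint, p_ref):
    """Eq. (2.6) read as an expectation over distance bins (illustrative sketch).

    p_ik:    predicted p(d_ik | c_i, c_k, m_ik) over bins, for nodes i, k in T
    p_jl:    predicted p(d_jl | c_j, c_l, m_jl) over bins, for nodes j, l in S
    p_joint: P(d_ik, d_jl), indexed [bin of d_ik][bin of d_jl]
    p_ref:   background P_ref(d) over the same bins
    Assumes p_joint and p_ref are smoothed (no zero entries).
    """
    h = 0.0
    for a, w_a in enumerate(p_ik):
        for b, w_b in enumerate(p_jl):
            w = w_a * w_b
            if w == 0.0:
                continue  # a bin pair with zero predicted weight adds nothing
            h += w * math.log(p_joint[a][b] / (p_ref[a] * p_ref[b]))
    return h


# Toy 2-bin example with made-up numbers: aligned edges tend to share a bin.
p_ref = [0.5, 0.5]
p_joint = [[0.4, 0.1], [0.1, 0.4]]
print(edge_alignment_potential([1.0, 0.0], [1.0, 0.0], p_joint, p_ref))  # log(1.6), about 0.470
print(edge_alignment_potential([1.0, 0.0], [0.0, 1.0], p_joint, p_ref))  # log(0.4), about -0.916
```

Because the score is a weighted average of log-odds, it is positive when the predicted distances of the two edges fall mostly in bin pairs that are over-represented in reference alignments, and it is zero when $P(d_{ik}, d_{jl})$ equals the product of the background probabilities.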

Unlike contact-based potentials, which only record whether two residues are in contact, here we model the probability of interaction at each given distance to obtain a higher-resolution description of the residue interaction pattern, as shown in Fig. 2.5. This edge alignment potential is therefore more informative and thus may lead to better alignment accuracy and a higher homology detection rate.

Now we explain how to calculate each term in Eq. (2.6). $P_{\mathrm{ref}}(d_{ik})$ and $P_{\mathrm{ref}}(d_{jl})$ can be calculated by simple counting on a set of non-redundant protein structures, e.g., PDB25. Similar to $P_{\mathrm{ref}}(d_{ik})$, $P(d_{ik}, d_{jl})$ can also be calculated by simple counting, here on a set of non-redundant reference alignments. That is, we randomly choose a set of protein pairs such that the two proteins in each pair are similar at least at the fold level. Then we generate their reference alignments (i.e., structure alignments) using the structure alignment tool DeepAlign [15] and finally estimate $P(d_{ik}, d_{jl})$ by simple counting. To use simple counting, we discretize the inter-residue distance into 12 intervals: <4, 4–5, 5–6, …, 14–15, and >15 Å.
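The counting procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the bin edges are our reading of the interval list above (a distance exactly on an edge goes to the upper bin), the pseudocount is our addition to avoid zero probabilities, and `aligned_distance_pairs` simply takes every two aligned positions from a structure alignment (for example one produced by DeepAlign) supplied as a list of matched residue indices. All names here are hypothetical.

```python
import bisect

# Hypothetical bin edges in Å, read off the intervals in the text
# (<4, 4-5, 5-6, ..., 14-15, >15); adjust to the exact discretization used.
BIN_EDGES = [float(x) for x in range(4, 16)]  # 4.0, 5.0, ..., 15.0
N_BINS = len(BIN_EDGES) + 1


def distance_bin(d):
    """Bin index of a distance in Å: 0 for d < 4, N_BINS - 1 for d >= 15.
    A value exactly on an edge goes to the upper bin (an assumption)."""
    return bisect.bisect_right(BIN_EDGES, d)


def estimate_reference(distances, pseudocount=1.0):
    """P_ref(d): smoothed frequency of each distance bin over inter-residue
    distances collected from non-redundant structures (e.g. PDB25)."""
    counts = [pseudocount] * N_BINS
    for d in distances:
        counts[distance_bin(d)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def aligned_distance_pairs(dist_t, dist_s, alignment):
    """(d_ik, d_jl) for every two aligned position pairs (i<->j, k<->l).
    dist_t, dist_s: square distance matrices of the two structures;
    alignment: list of matched (i, j) residue indices."""
    pairs = []
    for a in range(len(alignment)):
        i, j = alignment[a]
        for b in range(a + 1, len(alignment)):
            k, l = alignment[b]
            pairs.append((dist_t[i][k], dist_s[j][l]))
    return pairs


def estimate_joint(pairs, pseudocount=1.0):
    """P(d_ik, d_jl): smoothed joint frequency of distance-bin pairs
    observed in reference alignments."""
    counts = [[pseudocount] * N_BINS for _ in range(N_BINS)]
    for d_ik, d_jl in pairs:
        counts[distance_bin(d_ik)][distance_bin(d_jl)] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]
```

In use, `estimate_reference` would be fed all inter-residue distances from the structure set, and `estimate_joint` the pooled output of `aligned_distance_pairs` over every reference alignment.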

As explained in the previous section, we predict $p(d_{ik} \mid c_i, c_k, m_{ik})$ using a probabilistic neural network (PNN) implemented in our context-specific distance-dependent statistical potential package EPAD. EPAD takes as input sequence profile contexts and mutual information and outputs a probability distribution over inter-residue distances. See the EPAD paper [5] for the technical details.
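One of EPAD's inputs is the mutual information $m_{ik}$ between two alignment columns. This section does not say how it is computed, so the Python sketch below uses the plain plug-in estimator on raw column frequencies, which is an assumption for illustration; practical implementations often add sequence weighting, pseudocounts, or other corrections that are omitted here. Gap characters are treated as an ordinary symbol, and `column_mutual_information` is a hypothetical name.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_k):
    """Plug-in mutual information (in nats) between two MSA columns,
    given as equal-length strings (one character per aligned sequence)."""
    if len(col_i) != len(col_k):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i = Counter(col_i)
    count_k = Counter(col_k)
    count_ik = Counter(zip(col_i, col_k))
    mi = 0.0
    for (a, b), n_ab in count_ik.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_i[a] * count_k[b]))
    return mi


# Perfectly co-varying columns give log(2); independent ones give 0.
print(column_mutual_information("AACC", "DDEE"))
print(column_mutual_information("AACC", "DEDE"))
```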

The EPAD package was blindly tested in CASP10 for template-free modeling. The CASP10 results show that EPAD can successfully fold some targets with unusual folds (according to the CASP10 Free Modeling assessor, Dr. BK Lee). Our large-scale experimental test also indicates that EPAD is much better than context-independent distance-based pairwise potentials such as DOPE [19], RW [20] and DFIRE [21] at ranking protein decoys [5].
