Chapter 2
Method
Abstract This chapter describes a novel MRF-based method for homology detection and fold recognition. In particular, it covers how to build an MRF model for a protein sequence, how to score the similarity between two MRF models and between an MRF model and a native structure, and, finally, an alternating direction method of multipliers (ADMM) algorithm that optimizes the scoring function.
Keywords Markov random fields (MRF) · Hidden Markov models (HMM) · Alternating direction method of multipliers (ADMM) · Mutual information · Residue co-evolution
2.1 Modeling a Protein Family Using Markov Random Fields
Given a protein sequence, we run PSI-BLAST [ 1 ] with 5 iterations and E-value
cutoff 0.001 to find its sequence homologs and then build their multiple
sequence alignment (MSA). We use a multivariate random variable $X = (X_1, X_2, \ldots, X_N)$, where $N$ is the number of columns (the MSA length), to model the MSA. Each $X_i$ is a finite discrete random variable representing the amino acid at column $i$ of the MSA. It takes values from 1 to 21, corresponding to the 20 amino acids and the gap.

The probability of observing the whole MSA can be modeled by a Markov random field (MRF), which is a function of $X$. An MRF is an undirected graph that models a set of correlated random variables. As shown in Fig. 2.1, each MRF node represents one column of the MSA and each edge represents the correlation between two columns. We ignore very short-range residue correlations because they carry little information. An MRF consists of two types of functions, $\phi(X_i)$ and $\psi(X_i, X_k)$. Here $\phi(X_i)$ is an amino acid preference function for node $i$, and $\psi(X_i, X_k)$ is a pairwise amino acid preference function for edge $(i, k)$.
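The following minimal Python sketch illustrates the MSA encoding and the two kinds of MRF functions. It rests on several assumptions that do not come from the chapter:

- The alphabet order is given by `ALPHABET`.
- Letters outside that alphabet are mapped to the gap state.
- Short-range edges are dropped using a hypothetical minimum sequence separation, `min_sep`.
- A sequence is scored with the standard pairwise (Potts-style) log-linear form $\log P(X) = \sum_i \phi(X_i) + \sum_{(i,k)} \psi(X_i, X_k) - \log Z$. Here $\phi$ is a per-column (unary) function, $\psi$ is a per-edge (pairwise) function, and the normalizer $Z$ is omitted.

The helpers `encode_msa`, `build_edges`, and `log_potential` are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations

# Assumed state order: the 20 standard amino acids, then the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
# States are numbered 1..21 to match the text (20 amino acids + gap).
STATE = {symbol: index + 1 for index, symbol in enumerate(ALPHABET)}
GAP_STATE = STATE["-"]


def encode_msa(msa):
    """Encode aligned sequences (equal-length strings) as lists of states in 1..21.

    Letters outside ALPHABET (e.g. X, B, Z, '.') are mapped to the gap state,
    which is an assumption of this sketch.
    """
    if not msa:
        raise ValueError("the MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [[STATE.get(ch.upper(), GAP_STATE) for ch in seq] for seq in msa]


def build_edges(n_columns, min_sep=6):
    """List MRF edges (i, k), i < k, dropping short-range pairs with k - i < min_sep.

    min_sep=6 is a hypothetical default, not a value from the chapter.
    """
    return [(i, k) for i, k in combinations(range(n_columns), 2) if k - i >= min_sep]


def log_potential(x, phi, psi, edges):
    """Unnormalized log-probability of one encoded row x under a pairwise MRF.

    phi[i][a]         : unary preference of column i for state a + 1
    psi[(i, k)][a][b] : pairwise preference of edge (i, k) for states (a + 1, b + 1)
    States in x are 1-based, so 1 is subtracted before indexing the tables.
    The partition function Z is omitted.
    """
    score = sum(phi[i][x[i] - 1] for i in range(len(x)))
    score += sum(psi[(i, k)][x[i] - 1][x[k] - 1] for i, k in edges)
    return score


if __name__ == "__main__":
    rows = encode_msa(["AC-D", "AC-E"])   # [[1, 2, 21, 3], [1, 2, 21, 4]]
    edges = build_edges(4, min_sep=2)     # [(0, 2), (0, 3), (1, 3)]
    phi = [[0.0] * 21 for _ in range(4)]
    phi[0][0] = 0.5                       # column 0 mildly prefers A
    psi = {e: [[0.0] * 21 for _ in range(21)] for e in edges}
    psi[(0, 3)][0][2] = 1.0               # A at column 0 pairs with D at column 3
    for row in rows:
        print(row, log_potential(row, phi, psi, edges))  # 1.5, then 0.5
```

In this toy run, the first row scores higher because it contains the rewarded A/D pair on edge (0, 3). The second row earns only the unary bonus for A at column 0. Filtering edges by sequence separation is one simple way to express "ignore very short-range correlations"; the cutoff actually used for the model in Fig. 2.1 is not stated here.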