(e.g., residue co-evolution) and thus encodes global information in a protein. In
particular, an MRF is a graphical model that encodes a probability distribution over
the MSA by a graph and a set of preset statistical functions. A node in the MRF
corresponds to one column in the MSA, and an edge specifies the correlation between
two columns. Each node is associated with a function describing the position-specific
amino acid mutation pattern. Similarly, each edge is associated with a function
describing correlated mutation statistics between two columns. With the profile MRF
representation, the alignment of two proteins or protein families becomes that of two
MRFs. To align two MRFs, a scoring function, or alignment potential, is needed
to measure their similarity. We use a scoring function consisting of both a
node alignment potential and an edge alignment potential, which measure the node (i.e.,
amino acid) similarity and edge (i.e., interaction pattern) similarity, respectively.
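The structure described above can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than the MRFalign implementation: the class name `ProfileMRF`, the dictionary encoding of node and edge functions, the dot-product node potential, the product-of-strengths edge potential, and the `edge_weight` parameter are all hypothetical stand-ins, since the text does not give the actual functional forms.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass
class ProfileMRF:
    """Toy profile MRF: one node per MSA column, one edge per correlated column pair."""

    # node_profiles[i][res] = frequency of residue type `res` at column i
    # (hypothetical encoding of the position-specific mutation pattern)
    node_profiles: List[Dict[str, float]]
    # edge_strength[(i, k)] with i < k = strength of the correlation between
    # columns i and k (hypothetical summary of the correlated mutation statistics)
    edge_strength: Dict[Tuple[int, int], float] = field(default_factory=dict)


def node_potential(a: ProfileMRF, i: int, b: ProfileMRF, j: int) -> float:
    """Illustrative node similarity: dot product of the two column profiles."""
    profile_a, profile_b = a.node_profiles[i], b.node_profiles[j]
    return sum(freq * profile_b.get(res, 0.0) for res, freq in profile_a.items())


def edge_potential(a: ProfileMRF, i: int, k: int,
                   b: ProfileMRF, j: int, l: int) -> float:
    """Illustrative edge similarity: product of the two correlation strengths."""
    return a.edge_strength.get((i, k), 0.0) * b.edge_strength.get((j, l), 0.0)


def alignment_score(a: ProfileMRF, b: ProfileMRF,
                    matches: List[Tuple[int, int]],
                    edge_weight: float = 1.0) -> float:
    """Score an alignment given as matched column pairs (i, j).

    `matches` must be sorted and order-preserving, so that for any two
    matches (i, j) and (k, l) appearing in that order, i < k and j < l.
    """
    node_term = sum(node_potential(a, i, b, j) for i, j in matches)
    # Every pair of matched columns contributes an edge term, which is what
    # couples the alignment decisions and makes the optimization hard.
    edge_term = sum(
        edge_potential(a, i, k, b, j, l)
        for (i, j), (k, l) in combinations(matches, 2)
    )
    return node_term + edge_weight * edge_term


# Two three-column toy MRFs, each with one correlated column pair (0, 2).
mrf_a = ProfileMRF(
    node_profiles=[{"A": 1.0}, {"G": 0.5, "S": 0.5}, {"L": 1.0}],
    edge_strength={(0, 2): 0.8},
)
mrf_b = ProfileMRF(
    node_profiles=[{"A": 1.0}, {"G": 1.0}, {"L": 1.0}],
    edge_strength={(0, 2): 0.5},
)
score = alignment_score(mrf_a, mrf_b, [(0, 0), (1, 1), (2, 2)])
```

In this toy scoring, the node term depends on each matched pair independently, whereas the edge term depends on pairs of matches at once; dropping the edge term would leave a score that standard dynamic programming can optimize.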
It is computationally challenging to optimize a scoring function containing an edge
alignment potential. To deal with this, we formulate MRF-MRF alignment as an
integer programming problem and then develop an Alternating Direction Method of
Multipliers (ADMM) [59] algorithm to solve it efficiently to a suboptimal solution.
ADMM divides the MRF alignment problem into two tractable sub-problems and
then iteratively solves them until they reach consistent solutions. Experiments show
that this MRF-MRF alignment method, denoted as MRFalign, can generate more
accurate alignments and is also much more sensitive than other methods in detecting
remote homologs. MRFalign works particularly well on mainly-beta proteins.
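For orientation, the generic ADMM template that a scheme of this kind follows can be written as below. This is the textbook consensus form, not the specific MRFalign formulation: the symbols f, g, x, z, λ, and ρ are illustrative assumptions, with f and g standing for the objectives of the two sub-problems (for example, negated scores, so that maximizing a score becomes a minimization) and the constraint x = z expressing the requirement that the two sub-problems agree.

```latex
\begin{align}
  &\min_{x,\,z}\; f(x) + g(z) \quad \text{subject to } x = z, \\
  &L_\rho(x, z, \lambda) = f(x) + g(z) + \lambda^{\top}(x - z)
      + \frac{\rho}{2}\,\lVert x - z \rVert_2^2, \\
  &x^{t+1} = \arg\min_{x}\; L_\rho\bigl(x, z^{t}, \lambda^{t}\bigr), \\
  &z^{t+1} = \arg\min_{z}\; L_\rho\bigl(x^{t+1}, z, \lambda^{t}\bigr), \\
  &\lambda^{t+1} = \lambda^{t} + \rho\,\bigl(x^{t+1} - z^{t+1}\bigr).
\end{align}
```

The iteration stops once x and z agree, which corresponds to the two sub-problems reaching consistent solutions. For an integer program, updates of this form carry no general guarantee of global optimality, which is in line with the statement that the algorithm returns a suboptimal solution.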
The most relevant work is Cowen's MRFy/SMURF methods for fold recognition
of beta proteins [60, 61]. Nevertheless, our MRFalign method is significantly
different from MRFy/SMURF in a few aspects: (1) MRFy/SMURF builds an MRF
based upon multiple structure alignment instead of multiple sequence alignment
(MSA). As such, it cannot be applied to sequence-based homology detection in the
absence of native structures. In contrast, our method builds MRFs purely based
upon MSA and thus applies to sequence-based protein alignment and homology
detection; (2) MRFy/SMURF can only align a single primary sequence to an MRF,
while our method aligns two MRFs to yield higher sensitivity; and (3) MRFy/SMURF
does not take residue co-evolution information into consideration. These
differences require us to develop totally new methods to build MRFs from MSA,
measure the similarity of two MRFs, and optimize the MRF-MRF alignment potential.
References
1. Brent, M.R.: Steady progress and recent breakthroughs in the accuracy of automated genome
annotation. Nat. Rev. Genet. 9(1), 62-73 (2008)
2. Consortium, G.O.: The gene ontology project in 2008. Nucleic Acids Res. 36(suppl 1),
D440-D444 (2008)
3. Watson, J.D., Laskowski, R.A., Thornton, J.M.: Predicting protein function from sequence
and structural data. Curr. Opin. Struct. Biol. 15(3), 275-284 (2005)
4. Ginalski, K.: Comparative modeling for protein structure prediction. Curr. Opin. Struct. Biol.
16(2), 172-177 (2006)