Introduction - Protein Homology Detection Through Alignment of Markov Random Fields

(e.g., residue co-evolution) and thus, encodes global information in a protein. In

particular, an MRF is a graphical model encoding a probability distribution over the

MSA by a graph and a set of preset statistical functions. A node in the MRF

corresponds to one column in the MSA, and one edge specifies the correlation between

two columns. Each node is associated with a function describing the position-specific

amino acid mutation pattern. Similarly, each edge is associated with a function

describing correlated mutation statistics between two columns. With the profile MRF

representation, alignment of two proteins or protein families becomes that of two

MRFs. To align two MRFs, a scoring function or alignment potential is needed

to measure the similarity of two MRFs. We use a scoring function consisting of both a

node alignment potential and an edge alignment potential, which measure the node (i.e.,

amino acid) similarity and edge (i.e., interaction pattern) similarity, respectively.
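To make the two-part scoring function concrete, the following is a minimal sketch, not the potentials actually used by MRFalign. It assumes an alignment is given as a list of matched column pairs, and the names `alignment_score`, `node_sim`, `edge_a`, `edge_b`, and `edge_weight`, as well as the product form of the edge term, are hypothetical choices made only for illustration.

```python
from itertools import combinations


def alignment_score(matches, node_sim, edge_a, edge_b, edge_weight=1.0):
    """Score an alignment given as a list of (i, j) matched column pairs.

    All inputs are illustrative assumptions:
      node_sim[i][j] : similarity between node i of MRF A and node j of MRF B
      edge_a[i][k]   : strength of the edge between columns i and k in MRF A
      edge_b[j][l]   : strength of the edge between columns j and l in MRF B
    """
    # Node term: sum of similarities over aligned columns.
    node_term = sum(node_sim[i][j] for i, j in matches)

    # Edge term: for every two matched pairs (i, j) and (k, l), reward
    # agreement between edge (i, k) in A and edge (j, l) in B.
    edge_term = sum(
        edge_a[i][k] * edge_b[j][l]
        for (i, j), (k, l) in combinations(matches, 2)
    )
    return node_term + edge_weight * edge_term


# Toy example with two columns per MRF.
node_sim = [[1.0, 0.0], [0.0, 2.0]]
edge_a = [[0.0, 0.5], [0.5, 0.0]]
edge_b = [[0.0, 0.4], [0.4, 0.0]]
score = alignment_score([(0, 0), (1, 1)], node_sim, edge_a, edge_b)
```

In this sketch the node term decomposes over individual matches, whereas the edge term couples every pair of matched columns.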

It is computationally challenging to optimize a scoring function containing edge

alignment potential. To deal with this, we formulate MRF-MRF alignment as an

integer programming problem and then develop an Alternating Direction Method of

Multipliers (ADMM) [59] algorithm to solve it efficiently to a suboptimal solution.

ADMM divides the MRF alignment problem into two tractable sub-problems and

then iteratively solves them until they reach consistent solutions. Experiments show

that this MRF-MRF alignment method, denoted as MRFalign, can generate more

accurate alignments and is also much more sensitive than other methods in detecting remote

homologs. MRFalign works particularly well on mainly-beta proteins.
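For readers unfamiliar with ADMM, the splitting into two sub-problems mentioned above follows a generic two-block template, sketched here in its standard textbook form. This is not the specific MRFalign formulation: the objective terms $f$ and $g$, the variables $x$ and $z$, the multiplier $\lambda$, and the penalty parameter $\rho$ are generic symbols assumed for illustration.

```latex
% Generic two-block ADMM (standard form, not the MRFalign-specific model)
\begin{aligned}
&\min_{x,\,z}\; f(x) + g(z) \quad \text{subject to } x = z, \\[4pt]
&L_\rho(x, z, \lambda) = f(x) + g(z) + \lambda^{\top}(x - z) + \frac{\rho}{2}\,\lVert x - z \rVert_2^2, \\[4pt]
&x^{k+1} = \arg\min_{x}\; L_\rho(x, z^{k}, \lambda^{k}), \\
&z^{k+1} = \arg\min_{z}\; L_\rho(x^{k+1}, z, \lambda^{k}), \\
&\lambda^{k+1} = \lambda^{k} + \rho\,\bigl(x^{k+1} - z^{k+1}\bigr).
\end{aligned}
```

Each of the two minimizations is one sub-problem. The multiplier update penalizes disagreement between $x$ and $z$, which pushes the two sub-problems toward consistent solutions.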

The most relevant work is Cowen's MRFy/SMURF methods for fold recognition

of beta proteins [60, 61]. Nevertheless, our MRFalign method is significantly

different from MRFy/SMURF in a few aspects: (1) MRFy/SMURF builds an MRF

based upon multiple structure alignment instead of multiple sequence alignment

(MSA). As such, it cannot be applied to sequence-based homology detection in the

absence of native structures. In contrast, our method builds MRFs purely based

upon MSA and thus, applies to sequence-based protein alignment and homology

detection; (2) MRFy/SMURF can only align a single primary sequence to an MRF,

while our method aligns two MRFs to yield higher sensitivity; and (3) MRFy/SMURF

does not take residue co-evolution information into consideration. These

differences require us to develop totally new methods to build MRFs from MSA,

measure similarity of two MRFs, and optimize the MRF-MRF alignment potential.

References

1. Brent, M.R.: Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat. Rev. Genet. 9(1), 62-73 (2008)

2. Gene Ontology Consortium: The gene ontology project in 2008. Nucleic Acids Res. 36(suppl 1), D440-D444 (2008)

3. Watson, J.D., Laskowski, R.A., Thornton, J.M.: Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol. 15(3), 275-284 (2005)

4. Ginalski, K.: Comparative modeling for protein structure prediction. Curr. Opin. Struct. Biol. 16(2), 172-177 (2006)
