Introduction - Protein Homology Detection Through Alignment of Markov Random Fields


amino acid substitution matrix such as BLOSUM62, profile-based alignment needs a different scoring function. Nevertheless, some scoring functions for primary sequence-based homology detection can be generalized to profile-based methods. In the following subsections, we briefly review the scoring functions for (1) the alignment between one primary sequence and one PSI-BLAST sequence profile (i.e., PSFM or PSSM), and (2) the alignment between two PSI-BLAST sequence profiles. With a slight change, these scoring functions also apply when a profile is represented as an HMM.

1.4.4 Scoring Function for Sequence-Profile Alignment and Comparison

Given a primary sequence and a sequence profile, one strategy for determining their similarity is to estimate how likely the primary sequence is to be a sample from the probability distribution encoded by the sequence profile. Let $a$ denote the probability distribution of the 20 amino acids at a specific MSA or profile column. Supposing that this profile column is aligned to amino acid $j$ in the other protein, the following score can be used to determine whether amino acid $j$ is a sample from $a$.

$$\mathrm{score}(a, j) = \log \frac{a_j}{p_j} \tag{1.3}$$

where $a_j$ is the probability of amino acid $j$ at this specific profile column and $p_j$ the background probability of $j$. That is, Eq. (1.3) calculates the log-odds ratio of amino acid $j$ being observed at this specific profile column. The larger the score, the more likely amino acid $j$ is generated from the distribution $a$ instead of the background distribution. Summing up Eq. (1.3) over all aligned positions yields a score for the whole alignment between one primary sequence and one sequence profile. The larger the alignment score, the more likely that the primary sequence is a sample from the probability distribution encoded by the sequence profile. Therefore, the alignment score quantifies the similarity between the primary sequence and the sequence profile.
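The log-odds score of Eq. (1.3) and its sum over aligned positions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary representation of a profile column, the uniform background, and the toy column are all hypothetical; the natural logarithm is used because the text does not fix a base; every probability is assumed to be strictly positive; and gaps are not handled.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def log_odds_score(column, background, residue):
    """Eq. (1.3): log(a_j / p_j) for residue j at one profile column.

    `column` and `background` map each amino acid letter to a probability.
    Assumes both probabilities are strictly positive (no pseudocounts here).
    """
    return math.log(column[residue] / background[residue])


def alignment_score(profile_columns, background, aligned_residues):
    """Sum Eq. (1.3) over all aligned (profile column, residue) pairs.

    Assumes an ungapped alignment: the i-th column is aligned to the
    i-th residue.
    """
    return sum(
        log_odds_score(column, background, residue)
        for column, residue in zip(profile_columns, aligned_residues)
    )


# Toy data (hypothetical): uniform background, one column favoring alanine.
background = {aa: 0.05 for aa in AMINO_ACIDS}
column_favoring_a = {aa: 0.02 for aa in AMINO_ACIDS}
column_favoring_a["A"] = 0.62  # 0.62 + 19 * 0.02 = 1.0

score_a = log_odds_score(column_favoring_a, background, "A")  # positive
score_c = log_odds_score(column_favoring_a, background, "C")  # negative
total = alignment_score([column_favoring_a, column_favoring_a], background, "AC")
```

In this toy case, alanine is enriched relative to the background and receives a positive score, while cysteine is depleted and receives a negative one; the alignment score is simply the sum of the per-position scores.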

Another scoring method is to generalize the sequence-sequence scoring in Eq. (1.1) to sequence-profile scoring. Let $P(i \to j) = P_{\mathrm{rel}}(i, j) / p_i$ denote the probability of amino acid $i$ mutating to amino acid $j$, where $P_{\mathrm{rel}}(i, j)$ is the probability that two amino acids $i$ and $j$ are evolutionarily related and $p_i$ is the background probability of amino acid $i$. Summing over the 20 possible amino acids for $i$ according to the probability distribution $a$, we can calculate the probability of $j$ mutating from an amino acid distribution $a$ as follows.
