amino acid substitution matrix such as BLOSUM62, profile-based alignment needs a different scoring function. Nevertheless, some scoring functions for primary sequence-based homology detection can be generalized to profile-based methods. In the following subsections, we briefly review the scoring functions for (1) the alignment between one primary sequence and one PSI-BLAST sequence profile (i.e., PSFM or PSSM), and (2) the alignment between two PSI-BLAST sequence profiles. With a slight change, these scoring functions also apply when a profile is represented as an HMM.
1.4.4 Scoring Function for Sequence-Profile Alignment and Comparison
Given a primary sequence and a sequence profile, one strategy for determining their similarity is to estimate how likely the primary sequence is to be a sample from the probability distribution encoded by the sequence profile. Let $a$ denote the probability distribution of the 20 amino acids at a specific MSA or profile column. Suppose that this profile column is aligned to amino acid $j$ in the other protein. To determine whether amino acid $j$ is a sample from $a$, the following score can be used.
$$\mathrm{score}(a, j) = \log \frac{a_j}{p_j} \qquad (1.3)$$
where $a_j$ is the probability of amino acid $j$ at this specific profile column and $p_j$ the background probability of $j$. That is, Eq. (1.3) calculates the log-odds ratio of amino acid $j$ being observed at this specific profile column. The larger the score, the more likely amino acid $j$ is generated from the distribution $a$ instead of the background distribution. Summing up Eq. (1.3) over all aligned positions yields a score for the whole alignment between one primary sequence and one sequence profile. The larger the alignment score, the more likely that the primary sequence is a sample from the probability distribution encoded by the sequence profile. Therefore, the alignment score quantifies the similarity between the primary sequence and the sequence profile.
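The log-odds scoring of Eq. (1.3) and its sum over aligned positions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `column_score` and `alignment_score` are hypothetical, the natural logarithm is assumed because the text does not state a base, the alignment is assumed to be ungapped (column k aligned to residue k), and every column probability is assumed to be nonzero (for example after pseudocounts).

```python
import math


def column_score(column_probs, residue, background):
    """Eq. (1.3): log(a_j / p_j) for one profile column and one residue.

    column_probs: dict mapping amino acid -> probability a_j in this column
    background:   dict mapping amino acid -> background probability p_j
    Assumes a_j > 0 and p_j > 0; the log base (natural) is an assumption.
    """
    a_j = column_probs[residue]
    p_j = background[residue]
    return math.log(a_j / p_j)


def alignment_score(profile, sequence, background):
    """Sum Eq. (1.3) over all aligned positions of an ungapped alignment.

    profile:  list of per-column probability dicts
    sequence: string of residues; sequence[k] is aligned to profile[k]
    """
    if len(profile) != len(sequence):
        raise ValueError("ungapped sketch: profile and sequence lengths differ")
    return sum(
        column_score(column, residue, background)
        for column, residue in zip(profile, sequence)
    )
```

A residue that is more frequent in the column than in the background gets a positive score, and one that is less frequent gets a negative score, so the summed score rises with the number of positions where the sequence agrees with the profile.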
Another scoring method is to generalize the sequence-sequence scoring in Eq. (1.1) to sequence-profile scoring. Let $P(i \to j) = P_{\mathrm{rel}}(i, j) / p_i$ denote the probability of amino acid $i$ mutating to amino acid $j$, where $P_{\mathrm{rel}}(i, j)$ is the probability that two amino acids $i$ and $j$ are evolutionarily related and $p_i$ is the background probability of amino acid $i$. Summing over the 20 possible amino acids for $i$ according to the probability distribution $a$, we can calculate the probability of $j$ mutating from an amino acid distribution $a$ as follows.