evolution), then the amino acid substitution score or mutation potential between
i and j is defined as follows.
$$M(i,j) = \log \frac{p_{\mathrm{rel}}(i,j)}{p_i \, p_j} \tag{1.1}$$
The higher $M(i,j)$ is, the more likely the two aligned residues are related.
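As a minimal sketch of Eq. (1.1), the score can be computed directly from frequencies. The frequency values below are hypothetical and chosen only for illustration, and the natural logarithm is an assumption, since the text does not state the base of the log.

```python
import math


def substitution_score(p_rel, p_i, p_j):
    """Log-odds score M(i, j) = log(p_rel(i, j) / (p_i * p_j)), as in Eq. (1.1)."""
    return math.log(p_rel / (p_i * p_j))


# Hypothetical background frequencies p_i (not taken from any real matrix).
background = {"L": 0.10, "I": 0.06, "D": 0.05}

# Hypothetical frequencies p_rel(i, j) of seeing i aligned with j
# in evolutionarily related proteins.
p_rel = {("L", "I"): 0.012, ("L", "D"): 0.002}

# L/I is aligned more often than chance predicts, so the score is positive.
m_li = substitution_score(p_rel[("L", "I")], background["L"], background["I"])

# L/D is aligned less often than chance predicts, so the score is negative.
m_ld = substitution_score(p_rel[("L", "D")], background["L"], background["D"])
```

A positive score means the pair is observed among related proteins more often than independent sampling would give; a negative score means it is observed less often.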
Summing $M(i,j)$ over all aligned positions and deducting the gap penalty yields
a quality score for an alignment. $M(i,j)$ can also be interpreted in a different way.
By Bayes' rule, we have the following equation.
$$P(\mathrm{related} \mid X=i, Y=j) = P(\mathrm{related}) \, \frac{P(X=i, Y=j \mid \mathrm{related})}{P(X=i, Y=j)} = P(\mathrm{related}) \, \frac{p_{\mathrm{rel}}(i,j)}{p_i \, p_j} \tag{1.2}$$
If we assume that the prior probability $P(\mathrm{related})$ is a constant, Eq. (1.2) implies
that the log-likelihood of two aligned residues being evolutionarily related
equals their mutation potential plus a constant: taking logarithms of Eq. (1.2) gives
$\log P(\mathrm{related} \mid X=i, Y=j) = M(i,j) + \log P(\mathrm{related})$.
Given a scoring function, a dynamic programming algorithm can be used to
generate an alignment between two protein sequences that is guaranteed to
optimize the scoring function. However, the running time of dynamic programming
is proportional to the product of the lengths of the two proteins being
aligned, which may be too slow for many homology search applications,
especially when a large database of subject proteins is searched for homologs.
To speed up the search, heuristic methods such as BLAST have been developed to generate
suboptimal alignments and detect close homologs much more efficiently.
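The dynamic programming idea can be sketched as follows. The text does not say which variant is meant, so this example assumes a Needleman-Wunsch-style global alignment with a linear gap penalty; the function names, the toy scoring function, and the gap penalty value are hypothetical. The two nested loops make the cost proportional to the product of the two sequence lengths.

```python
def global_alignment_score(a, b, score, gap_penalty):
    """Best global alignment score of sequences a and b.

    score(x, y) plays the role of a substitution score for residues x and y;
    gap_penalty is deducted once per gapped position (linear gaps, an assumption).
    """
    m, n = len(a), len(b)
    # dp[r][c] holds the best score for aligning a[:r] with b[:c].
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for r in range(1, m + 1):
        dp[r][0] = -gap_penalty * r  # a[:r] aligned entirely against gaps
    for c in range(1, n + 1):
        dp[0][c] = -gap_penalty * c  # b[:c] aligned entirely against gaps
    for r in range(1, m + 1):
        for c in range(1, n + 1):
            dp[r][c] = max(
                dp[r - 1][c - 1] + score(a[r - 1], b[c - 1]),  # align two residues
                dp[r - 1][c] - gap_penalty,                    # gap in b
                dp[r][c - 1] - gap_penalty,                    # gap in a
            )
    return dp[m][n]


def toy_score(x, y):
    """Hypothetical stand-in for a real substitution matrix."""
    return 2.0 if x == y else -1.0


best = global_alignment_score("HEAGAWGHEE", "PAWHEAE", toy_score, gap_penalty=2.0)
```

In practice `toy_score` would be replaced by a lookup into a substitution matrix built from scores such as those in Eq. (1.1), and a traceback over `dp` would recover the alignment itself rather than only its score.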
1.4.2 Profile-Based Alignment for Homology Detection and Fold Recognition
To improve remote homology detection, profile-based protein comparison was
developed. The sequence profile of a protein encodes its evolutionary information
and is built from a set of close homologs. That is, instead of aligning primary
sequences, we may align or compare two protein sets, each containing close homologs
of a protein in question. To make comparison or alignment easy, a set of protein
homologs is usually represented as a sequence profile. The use of sequence
profiles has increased the sensitivity of homology detection threefold over pure
sequence comparison [47].
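To make the idea of a sequence profile concrete, here is a simplified sketch that turns a set of aligned homologs into per-column amino acid frequencies. It is an illustration only: the toy alignment, the function name `build_profile`, and the uniform pseudocount are assumptions, and real profile builders typically also weight sequences and use more elaborate pseudocounts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping amino acid -> frequency."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        # A uniform pseudocount (an assumption) keeps unseen residues above zero.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical alignment of four close homologs; "-" marks a gap.
toy_msa = ["MKV-A", "MRVLA", "MKILA", "MKVLG"]
profile = build_profile(toy_msa)
```

Each column of `profile` summarizes which residues the homologs tolerate at that position, which is the information that a single primary sequence cannot convey.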
PSI-BLAST can be used to generate the sequence profile of a protein. PSI-BLAST
finds close homologs of this protein in a large sequence database such as the NCBI
non-redundant (NR) database [48], builds a multiple sequence alignment (MSA) of
these homologs, and then converts the MSA to a sequence profile. For example,