profile HMM. Other sequence-profile alignment tools include DIALIGN [42] and
FPS [43]. HHpred [44], FORTE [45], and PICASSO [46] are some tools that use
profile-profile alignment for homology detection. They have shown better
performance than sequence-sequence or sequence-profile methods for remote homology
detection.
In the following sections, we will present more details on the state-of-the-art
methods for alignment-based homology detection and fold recognition.
1.4.1 Sequence Alignment for Homology Detection and Fold Recognition
Sequence alignment is a basic method for homology detection. The underlying
reason is that two homologous proteins should align well to each other (i.e., their
alignment has many conserved residues and few gaps). As such, we can infer
the relationship between two proteins from the quality of their sequence alignment.
A simple rule is that if the sequence identity of two proteins (i.e., the percentage
of identical residues in their alignment) is high (e.g., >40%), then they are very
likely homologous. The limitation of sequence alignment is that it cannot reliably
detect homology when the proteins under study are not closely related. In
particular, homology detection by sequence alignment is not very reliable
when the similarity of two proteins falls into the twilight zone, i.e., when the
sequence identity of the two proteins is less than 25%. However, in many cases two
proteins sharing low sequence identity may still be homologous and share some
important structural and functional properties.
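The identity rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, identity is computed over columns in which both sequences have a residue (one of several conventions in use), and the "uncertain" label for the 25-40% range is an assumption, since the text only characterizes the two ends.

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both sequences have a residue.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def classify_by_identity(identity_pct: float) -> str:
    """Apply the rule-of-thumb thresholds quoted in the text (40% and 25%)."""
    if identity_pct > 40.0:
        return "likely homologous"
    if identity_pct < 25.0:
        return "twilight zone"
    return "uncertain"  # label chosen for this sketch, not from the text


# Two toy aligned sequences: 9 gap-free columns, 8 of them identical.
identity = sequence_identity("MKT-AYIAKQR", "MKTLAYLA-QR")
print(round(identity, 1), classify_by_identity(identity))
```

A "twilight zone" result here does not mean the proteins are unrelated; as the text notes, low-identity pairs may still be homologous, which is why more sensitive scores are needed.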
The quality of an alignment cannot be accurately judged by sequence identity alone.
Instead, a more sophisticated score is needed to quantify protein similarity based on
the given alignment. A typical scoring function calculates the ratio between the
likelihood of two proteins being homologous (or evolutionarily related) and that of
being non-homologous (or evolutionarily unrelated). We can use two amino acid
substitution models to estimate the probability of two proteins being homologous
and non-homologous, respectively. The probability model for the non-homologous
case is also called the null model; it describes the case in which two aligned
residues are evolutionarily unrelated. Let X and Y denote the amino acid types of
two aligned residues. The null model calculates the probability of X and Y being
aligned as $P(X = i, Y = j) = P_i P_j$, where $P_i$ and $P_j$ are the background
probabilities of amino acids i and j, respectively. A few probability models such as
PAM [39] and BLOSUM [39] have been developed to estimate how likely two aligned
residues are evolutionarily related. PAM estimates the relatedness of two aligned
residues starting from single point mutations. BLOSUM derives its amino acid
substitution model from blocks of multiple sequence alignments.
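As a rough sketch of the ratio-based score described above, the following Python compares a "related" pair probability against the null model's product of background probabilities. Everything concrete here is an assumption made for illustration: the two-letter alphabet, the numbers in `BACKGROUND` and `P_RELATED` (a hypothetical stand-in for a substitution model such as PAM or BLOSUM), the function names, the base-2 logarithm, and the choice to sum per-column scores while ignoring gaps. Taking the logarithm of the ratio is a common convention for substitution scores rather than something stated in the passage.

```python
import math

# Toy two-letter alphabet with made-up background probabilities (illustrative only).
BACKGROUND = {"A": 0.6, "B": 0.4}

# Hypothetical joint probabilities of aligned pairs under the "related" model.
# The table is symmetric, sums to 1, and its marginals match BACKGROUND.
P_RELATED = {
    ("A", "A"): 0.50, ("A", "B"): 0.10,
    ("B", "A"): 0.10, ("B", "B"): 0.30,
}


def null_probability(i: str, j: str) -> float:
    """Null model: the two residues are unrelated, so P(X=i, Y=j) = P_i * P_j."""
    return BACKGROUND[i] * BACKGROUND[j]


def log_odds(i: str, j: str) -> float:
    """log2 of (related probability / null probability) for one aligned pair."""
    return math.log2(P_RELATED[(i, j)] / null_probability(i, j))


def alignment_score(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Sum of per-column log-odds; gapped columns are skipped (no gap penalty)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        total += log_odds(a, b)
    return total


print(log_odds("A", "A"), log_odds("A", "B"), alignment_score("AB-A", "ABBB"))
```

With these toy numbers, identical pairs score above zero (they are more likely under the related model than by chance) and the mismatched pair scores below zero. A practical scoring scheme would also penalize gaps, which this sketch leaves out.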
Whereas the null model assigns $P(X = i, Y = j) = P_i P_j$ to an unrelated pair, let
$P(X = i, Y = j \mid \text{related}) = P_{rel}(i, j)$ denote the probability of two aligned
residues being evolutionarily related (i.e., one substitutes the other during