In the field of protein fold recognition, the most powerful tools
currently available are based almost exclusively on profile-profile
or HMM-HMM comparison (the two terms are used nearly synonymously) [ 67 ,
94 - 96 ]. Such profiles or HMMs are first constructed by using an
iterative homology search program, typically PSI-BLAST [ 75 ]. This
process is often called homology extension. In addition to sequence
profiles, the most sensitive tools also incorporate predicted 1-D
structural information, typically secondary structure [ 97 ] and sol-
vent accessibility [ 98 ] propensities. Thus, it is natural to incorporate
homology-extended sequence profiles and predicted 1-D structural
information into the objective function of MSA to improve its
quality. Earlier attempts are reviewed in [ 63 ]. PRALINE [ 99 ] and
SPEM [ 100 ] are representatives of more systematic approaches.
MAFFT [ 65 ] also has an option for homology extension, although
no 1-D structural information appears to be considered. PCMA
[ 101 ] is the first program that combines profile-profile comparison
with consistency transformation. The same group has extended
their approach to incorporate wider structure-related knowledge
to yield MUMMALS [ 27 ] and PROMALS [ 102 ].
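As a rough illustration of how a column score might combine sequence profiles with predicted 1-D structural information, the sketch below scores one pair of alignment columns. The scoring form (a dot product of residue frequency vectors plus a weighted dot product of predicted secondary-structure probabilities), the weight `ss_weight`, and the function names are all assumptions made for illustration; none of the programs cited above is claimed to use this exact formula.

```python
from typing import Dict

def dot(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Dot product of two sparse probability vectors keyed by state."""
    return sum(value * q.get(key, 0.0) for key, value in p.items())

def column_score(freq_a: Dict[str, float], freq_b: Dict[str, float],
                 ss_a: Dict[str, float], ss_b: Dict[str, float],
                 ss_weight: float = 0.5) -> float:
    """Hypothetical profile-profile column score.

    freq_*: residue frequencies of a profile column (e.g. after homology extension).
    ss_*:   predicted secondary-structure probabilities (H, E, C) for that column.
    ss_weight: assumed mixing weight for the structural term.
    """
    return dot(freq_a, freq_b) + ss_weight * dot(ss_a, ss_b)

# Two toy columns: similar residue content, partly agreeing structure prediction.
score = column_score({"A": 0.5, "G": 0.5}, {"A": 1.0},
                     {"H": 1.0}, {"H": 0.5, "C": 0.5})
```

In practice the residue term is usually a log-odds or substitution-matrix-based score rather than a plain dot product; the point here is only that the structural term enters as an additional weighted component of the objective function.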
It is noteworthy that the development of the sequence pair weight $w_{p,q}$
shown in Eq. 4 [ 74 ] was coupled with homology extension. As a
result of a homology search, a varying number of homologous
sequences with varying degrees of similarity to the query and to
one another are obtained. The pair weights are designed to exert
two effects: (1) correction for over/under representations of dif-
ferent groups of sequences, and (2) more favorable treatment of
more closely related sequence pairs than distant pairs. The results of
examinations of the influence of sequence weights or pair weights
on MSA quality are somewhat controversial; Wheeler and Kececioglu
[ 43 ] reported that several weighting schemes they examined,
including the unweighted case ($w_{p,q} = 1$), performed similarly in
accuracy, whereas several other groups [ 28 , 41 , 52 , 74 ] suggested
positive effects of weighting. The fact that the experiments of
[ 52 , 74 ] were done after applying homology extension to the members
of the structural alignment dataset might explain, at least in part,
the discrepancy.
2.6 Other Heuristic Methods
Incorporating external knowledge, such as linear motifs [ 103 ]
and protein tertiary structures [ 104 , 105 ], promises to improve
the accuracy of MSA. However, it is unclear where to draw the
boundary of the category of “sequence alignment,” and these topics
are not covered in this chapter.
3 Notes
1. Although some algorithms are formulated to minimize dis-
tance rather than maximize similarity, these two approaches
are proven to be mathematically equivalent [ 106 ].
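The equivalence in Note 1 can be checked numerically for the simple case of global alignment with a linear gap penalty. The sketch below assumes a commonly used conversion: d(a, b) = s_max - s(a, b) for residue pairs and a per-residue gap cost of s_max/2 plus the similarity gap penalty. Under that assumption every alignment satisfies similarity + distance = s_max (n + m) / 2, so the alignment that maximizes similarity also minimizes distance. The toy scoring values and all function names are hypothetical.

```python
def align_score(x, y, pair, gap, best):
    """Global alignment by dynamic programming.

    pair: function scoring a residue pair.
    gap:  value added per gapped residue (negative for similarity,
          positive for distance).
    best: max for similarity, min for distance.
    """
    n, m = len(x), len(y)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = best(prev[j - 1] + pair(x[i - 1], y[j - 1]),  # align x[i-1] with y[j-1]
                          prev[j] + gap,                           # gap in y
                          cur[j - 1] + gap)                        # gap in x
        prev = cur
    return prev[m]

# Toy similarity scheme (assumed values, for illustration only).
S_MATCH, S_MISMATCH, GAP_S = 2.0, -1.0, 1.5
S_MAX = S_MATCH

def sim(a, b):
    return S_MATCH if a == b else S_MISMATCH

def dist(a, b):
    return S_MAX - sim(a, b)        # 0 for a match, positive otherwise

GAP_D = S_MAX / 2 + GAP_S           # distance cost per gapped residue

def check(x, y):
    """Return (max similarity, min distance, predicted constant sum)."""
    s = align_score(x, y, sim, -GAP_S, max)
    d = align_score(x, y, dist, GAP_D, min)
    return s, d, S_MAX * (len(x) + len(y)) / 2
```

For any pair of sequences, the first two values returned by `check` should add up to the third, which depends only on the sequence lengths and not on the alignment chosen.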