The preprocessing strategy can be further optimized by means
of an iterative protocol. Each iteration is based upon the consis-
tency of a preceding MSA. Consistency is defined here as the
agreement between matched amino acids in the MSA and those
in associated pairwise alignments. PRALINE calculates a consistency
score for each amino acid in the MSA. These scores are then used as
position-specific weights in the subsequent alignment. As a result,
alignments in later iterations tend to preserve consistently
aligned regions, while less consistent regions are more likely to
be aligned differently. Iteration terminates when convergence or a
limit cycle is reached. A limit cycle means that the current MSA
has already been encountered in an iteration earlier than the
preceding round. The user must specify a maximum number of
iterations for cases where neither convergence nor a limit cycle
is reached.
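The iterative protocol described above can be illustrated with a minimal Python sketch. It is not PRALINE's implementation: the scoring rule (the fraction of other sequences for which a residue has the same partner in the MSA and in the pairwise alignment, with a gap in both counted as agreement) is an assumption, and `matched_partners`, `consistency_scores`, `iterate_until_stable`, and the `realign` callback are hypothetical names introduced for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

MSA = Tuple[str, ...]  # one gapped string per sequence; hashable so it can be remembered


def matched_partners(row_a: str, row_b: str, gap: str = "-") -> Dict[int, Optional[int]]:
    """Map each residue index of row_a to the residue index of row_b that
    shares its column, or to None when row_b has a gap in that column."""
    partners: Dict[int, Optional[int]] = {}
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap:
            partners[i] = j if b != gap else None
            i += 1
        if b != gap:
            j += 1
    return partners


def consistency_scores(msa: MSA, pairwise: Dict[Tuple[int, int], Tuple[str, str]]) -> List[List[float]]:
    """Assumed scoring rule: per residue, the fraction of other sequences whose
    MSA partner equals the pairwise-alignment partner.
    pairwise[(s, t)] holds the gapped rows (row_s, row_t) for s < t."""
    n = len(msa)
    scores: List[List[float]] = []
    for s in range(n):
        length = sum(c != "-" for c in msa[s])
        agree = [0] * length
        for t in range(n):
            if t == s:
                continue
            in_msa = matched_partners(msa[s], msa[t])
            if s < t:
                row_s, row_t = pairwise[(s, t)]
            else:
                row_t, row_s = pairwise[(t, s)]
            in_pair = matched_partners(row_s, row_t)
            for i in range(length):
                if in_msa[i] == in_pair[i]:
                    agree[i] += 1
        scores.append([count / (n - 1) for count in agree])
    return scores


def iterate_until_stable(initial: MSA, realign: Callable[[MSA], MSA], max_iterations: int) -> Tuple[MSA, str, int]:
    """Apply realign (a hypothetical consistency-weighted realignment step)
    until convergence, a limit cycle, or the user-specified maximum."""
    seen = {initial}
    current = initial
    for round_no in range(1, max_iterations + 1):
        nxt = realign(current)
        if nxt == current:  # same MSA as the preceding round
            return nxt, "converged", round_no
        if nxt in seen:  # same MSA as some earlier round
            return nxt, "limit cycle", round_no
        seen.add(nxt)
        current = nxt
    return current, "max iterations", max_iterations
```

In this sketch an MSA is stored as a tuple of gapped strings, so every alignment produced so far can be kept in a set and compared cheaply against each new one.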
2.3 Homology-Extended Alignment
Protein sequences accumulate varying degrees of mutation during
evolution. This has an important bearing on the quality of
alignment methods that use generic amino acid scoring matrices,
since these matrices are mostly derived from a specific set of
carefully curated alignments. Such generalization implies a
standardized evolutionary model, which might lead to inconsistencies
in the alignments. Although the quality of alignments of closely
related proteins is hardly affected, alignments of distant protein
sequences (<30 % sequence identity) are much more sensitive to this
issue, because evolutionary traces become largely obscured in
divergent cases [ 16 ].
Two main approaches have led to improvements in distant
protein alignment. In the first approach, the generic substitution
matrix is readjusted to the evolutionary relation observed in the
input sequence set [ 17 ]. The second approach attempts to identify
the distant relation between the sequences through the incorpora-
tion of additional structural or homologous sequence information.
The homology-extended alignment strategy in PRALINE attempts
to address the problem of distant protein sequence alignment by
enriching the information content of each input sequence
with homologous sequences collected using PSI-BLAST. In this
strategy, a PSI-BLAST search is performed for each input sequence
against a chosen sequence database; the default is the
nonredundant (NR) database. The user can set the initial E-value
threshold and the number of PSI-BLAST iterations. To filter out
redundant sequences, all PSI-BLAST hits with 100 % sequence
identity are discarded. If no hits, or only redundant hits, are
found, the PSI-BLAST search is rerun with an E-value threshold
that is ten times higher, i.e., ten times less stringent, than the
previous one. This process is repeated until each input sequence
has at least one homologous sequence. The final local PSI-BLAST
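The threshold-relaxation loop described above might be sketched in Python as follows. This is an illustration under stated assumptions, not PRALINE's code: `collect_homologs` is a hypothetical name, the `search` callback is a hypothetical stand-in for a PSI-BLAST run that returns (hit identifier, percent identity) pairs, and the `max_rounds` guard is an added safety limit that the text does not mention.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (hit identifier, percent identity to the query)


def collect_homologs(
    query: str,
    search: Callable[[str, float], List[Hit]],  # hypothetical PSI-BLAST wrapper
    initial_evalue: float,
    max_rounds: int = 10,  # assumed guard against an endless search
) -> Tuple[List[Hit], float]:
    """Return the non-redundant hits and the last E-value threshold tried."""
    tried = initial_evalue
    for attempt in range(max_rounds):
        # Each rerun uses a threshold ten times higher (less stringent).
        tried = initial_evalue * 10 ** attempt
        # Hits with 100 % sequence identity are treated as redundant.
        hits = [hit for hit in search(query, tried) if hit[1] < 100.0]
        if hits:
            return hits, tried
    return [], tried
```

Calling `collect_homologs` once per input sequence, each with the same user-chosen initial threshold, mirrors the per-sequence search described in the text.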