two significant alignments align over fewer than 50 amino acids (see footnote 7).
Thus, by using a scoring matrix that is appropriate for the average
protein identity between human and chicken, it is much easier to
identify orthologous sequences—sequences that differ because of
the mammal:reptile speciation event. Closely related (orthologous)
sequences are more reliably identified using a scoring matrix that
reflects the evolutionary distance of the organisms being compared;
in general, searches between organisms that have diverged over the
past 400 million years should certainly use “shallower” (less
evolutionarily distant) scoring matrices.
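The link between matrix depth and alignment length can be made concrete with a small calculation. The sketch below assumes the standard relationship between bit score and expectation, E = m × n × 2^(−S'), and ignores edge-effect and composition corrections. The query length, database size, and the two bits-per-position values are hypothetical placeholders chosen for illustration; they are not measured properties of any particular matrix.

```python
import math


def min_bit_score(query_len: int, db_len: int, evalue: float) -> float:
    """Bit score S' at which E = m * n * 2**(-S') equals `evalue`."""
    return math.log2(query_len * db_len / evalue)


def min_alignment_length(bits_needed: float, bits_per_position: float) -> int:
    """Aligned positions needed if each contributes `bits_per_position` on average."""
    return math.ceil(bits_needed / bits_per_position)


# Hypothetical search: 200-residue query, 5,000,000-residue database, E <= 0.001.
needed = min_bit_score(200, 5_000_000, 1e-3)

# Illustrative information content per aligned position (assumed values).
deep = min_alignment_length(needed, 0.4)     # a "deep" matrix
shallow = min_alignment_length(needed, 1.2)  # a "shallower" matrix

print(f"bits needed: {needed:.1f}")
print(f"minimum alignment length, deep matrix: {deep}")
print(f"minimum alignment length, shallow matrix: {shallow}")
```

Under these assumptions, the shallower matrix reaches the same significance threshold with roughly one third as many aligned positions, which is why short domains and closely related sequences benefit from it.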
Changing gap penalties: Like the default scoring matrices, the gap
penalties used by default by BLAST and FASTA are designed to find
distant evolutionary relationships. The BLAST program provides a
limited range of gap penalties for the scoring matrices it supports
and most alternative penalties are more stringent. Increasing the
gap penalty, like choosing a shallower scoring matrix, will improve
the statistical significance of shorter, or more closely related,
alignments. The FASTA programs do not limit the gap-penalty choices,
but each scoring matrix has matched default gap penalties that are
appropriate for the matrix's target percent identity [22]. In general,
the FASTA defaults should not be reduced (made less stringent);
increasing the penalties can sometimes improve significance for
shorter alignments. While strong arguments can be made for
adjusting scoring matrices to match short domain/exon lengths
and short evolutionary distances, current theory does not provide a
rationale for changing the default gap penalties for blastp/blastx
and fasta/fastx/ssearch similarity searches.
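In practice, this advice amounts to changing the matrix and leaving the gap penalties alone. The following is a minimal sketch of how a blastp command line might be assembled that way, using the BLAST+ options `-matrix`, `-gapopen`, `-gapextend`, and `-evalue`. The helper name `blastp_command`, the query file, and the database name are hypothetical. The command is only built, not run.

```python
from typing import List, Optional


def blastp_command(query: str, db: str, matrix: str = "PAM70",
                   gapopen: Optional[int] = None,
                   gapextend: Optional[int] = None,
                   evalue: float = 1e-3) -> List[str]:
    """Build a blastp argument list; gap penalties are left at the
    matrix defaults unless both are given explicitly."""
    cmd = ["blastp", "-query", query, "-db", db,
           "-matrix", matrix, "-evalue", str(evalue)]
    # Design choice of this sketch (not a BLAST requirement):
    # the two gap penalties are set together or not at all.
    if (gapopen is None) != (gapextend is None):
        raise ValueError("set gapopen and gapextend together, or neither")
    if gapopen is not None:
        cmd += ["-gapopen", str(gapopen), "-gapextend", str(gapextend)]
    return cmd


# Hypothetical file and database names.
cmd = blastp_command("human_query.fasta", "chicken_proteins")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run` on a system where BLAST+ is installed. Because BLAST accepts only certain gap-penalty pairs for each matrix, any explicit values should be checked against the combinations the installed version supports.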
Summary: Protein sequence databases provide the most sensitive
searches for homologs, but modern protein databases are so large
that it is often more efficient, both statistically and computationally,
to search a smaller, representative database, such as the complete protein
sets from model organisms. The FASTA programs offer an option
to align against a larger set of sequences selected by “expanding”
the original set of hits. In general, BLAST and FASTA search
parameters are set to find very distant relationships for long
sequences (but blastn uses the much less sensitive, rapid megablast
option by default). Searches with short queries, for short
domains, or over relatively short evolutionary distances (<500 My)
should use shallower scoring matrices.
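The statistical benefit of a smaller database can be illustrated with a rough calculation. The sketch below assumes that the expectation value scales linearly with database length for a fixed alignment score (E = m × n × 2^(−S')) and ignores edge-effect corrections. The function name `rescale_evalue` and the database sizes are hypothetical.

```python
def rescale_evalue(evalue: float, old_db_len: int, new_db_len: int) -> float:
    """E-value the same alignment score would receive in a database
    of a different total length, assuming E is proportional to length."""
    return evalue * new_db_len / old_db_len


# Hypothetical case: a hit with E = 0.5 in a 10-billion-residue
# comprehensive database, re-evaluated against a 10-million-residue
# proteome from a single model organism.
e_small = rescale_evalue(0.5, 10_000_000_000, 10_000_000)
print(e_small)
```

Under this assumption, the same alignment is 1000-fold more significant in the smaller database, so a borderline hit in a comprehensive search can become clearly significant when the search is restricted to a representative proteome.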
Footnote 7: Similar results are found using blastp with the PAM70 matrix, though the less stringent gap penalties used by
blastp produce longer alignments.