Biology Reference
In-Depth Information
particularly for protein query sequences with biased amino-acid
composition [10, 12, 13].
The FASTA package has grown as well; in addition to heuristic
strategies, it now offers accelerated optimal algorithms for
Smith-Waterman local protein alignment, global-global and
global-local alignment, and specialized algorithms for short
sequences. For general-purpose protein and translated DNA
sequence local similarity searches, the programs in Table 1 will
give very similar results; both provide statistical significance
estimates as expectation values (E()-values), and both provide
comparable scaled “bit” scores for comparing results over different
searches and database sizes.
Most performance differences between BLAST and FASTA
reflect the different scoring matrices and the gap-open,
gap-extension, and frameshift penalties used by default. The BLAST
family of programs typically uses the BLOSUM62 [14] matrix with a
gap-open penalty of -11 and a gap-extension penalty of -1 (a cost
of -12 for a one-residue gap); the FASTA programs use BLOSUM50
with lower effective gap penalties. The FASTA parameters allow
higher sensitivity for very distantly related sequences but require
longer alignments.³ By default the fastx and fasty programs allow
frameshifts in alignments, just as they allow gaps; blastx can
allow frameshifts with the -frame_shift_penalty option.
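The gap penalties above are affine: opening a gap is charged once, and each gapped residue is charged an extension. As a minimal sketch, assuming the BLAST-style convention in which the first gapped residue pays both the open and the extension penalty (open 11, extension 1, so a one-residue gap costs 12), the total cost of a gap can be computed as follows. The function and variable names are illustrative and are not part of either package.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Return the total penalty (as a positive cost) for one gap of `length` residues.

    Assumes the convention described in the text: the first gapped residue
    is charged gap_open + gap_extend, and every further residue gap_extend.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length


# BLAST protein defaults quoted in the text (BLOSUM62, open 11, extend 1).
BLAST_DEFAULT = {"gap_open": 11, "gap_extend": 1}

for n in (1, 2, 5, 10):
    print(f"gap of {n:2d} residues costs {affine_gap_cost(n, **BLAST_DEFAULT)}")
```

Because the extension charge is small relative to the open charge, one long gap is much cheaper than several short ones of the same total length, which is why the choice of default penalties affects which alignments each program reports.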
3 Inferring Homology: Interpreting Results
3.1 Use Expect or Bit Scores, Not Percent Identity, to Infer Homology
BLAST and FASTA provide a variety of similarity measurements
from which one can infer homology. BLAST provides a bit score,
the E-value or Expect, the percent identity, percent positives,
and the alignment length. The FASTA programs provide a bit
score, E()-value, percent identity, and percent similarity.⁴ In
addition, the FASTA programs provide a variety of “raw” similarity
scores that reflect the various stages of the heuristic FASTA
algorithm (e.g., init1, initn, opt), or the single optimal “raw”
score (s-w) for ssearch. The bit scores and Expect/E() values of
BLAST and FASTA are comparable; the Expect/E() value describes the
number of times an alignment score at least that high would be
expected by chance in the search. Thus, of all the different scores
provided by BLAST and FASTA, the Expect/E-value is the one score
that unambiguously reports the statistical significance of the match.
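To make the link between bit scores, database size, and expectation values concrete, here is a minimal sketch that assumes the standard Karlin-Altschul relationship E = m · n · 2^(-S'), where S' is the bit score, m the query length, and n the total database length in residues. It ignores the effective-length corrections that real programs apply, and it is not a description of how either BLAST or FASTA computes its reported values. All names are illustrative.

```python
import math


def evalue_from_bits(bit_score, query_len, db_len):
    """Approximate E-value for a bit score, assuming E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_for_evalue(evalue, query_len, db_len):
    """Bit score needed to reach a target E-value under the same assumption."""
    return math.log2(query_len * db_len / evalue)


# Hypothetical search: a 300-residue query against two database sizes.
query_len = 300
for db_len in (1e8, 1e9):
    e = evalue_from_bits(50.0, query_len, db_len)
    print(f"database of {db_len:.0e} residues: 50 bits gives E = {e:.2g}")
```

Under this approximation the same bit score yields a tenfold larger E-value in a tenfold larger database, which is why the bit score is convenient for comparing alignments across searches while the E-value reports significance within one search.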
3 The FASTA programs provide a variable scoring matrix option that shifts the scoring matrix for shorter query
sequences. The BLAST programs provide the -task blastp-short or -task blastn-short options for
short protein:protein and DNA:DNA searches.
4 BLAST's percent positive counts aligned residues with a score > 0; FASTA's fraction similar includes aligned
residues with scores ≥ 0.
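The difference between the two thresholds in footnote 4 can be illustrated with a small sketch. It assumes the per-position substitution scores of the aligned residue pairs are already available and takes the fraction over those pairs only; the scores below are hypothetical, and the denominators the programs actually use may differ.

```python
def positive_and_similar_fractions(pair_scores):
    """Return (fraction of pairs scoring > 0, fraction of pairs scoring >= 0)."""
    if not pair_scores:
        raise ValueError("need at least one aligned residue pair")
    total = len(pair_scores)
    positives = sum(1 for s in pair_scores if s > 0)   # strict threshold
    similar = sum(1 for s in pair_scores if s >= 0)    # zero scores included
    return positives / total, similar / total


# Hypothetical substitution scores for ten aligned residue pairs.
example_scores = [4, 0, -1, 5, 0, 2, -3, 6, 1, 0]
pos, sim = positive_and_similar_fractions(example_scores)
print(f"score > 0: {pos:.0%}   score >= 0: {sim:.0%}")
```

Pairs that score exactly zero are counted only under the second threshold, so the two percentages for the same alignment need not agree.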
 