Table 7. Scoring matrices, target identity, and alignment lengths

Scoring matrix   Target identity (%)   Bits per position   50-bit alignment length
VT10             91.1                  3.45                14
VT20             83.2                  2.92                17
VT40             69.8                  2.27                22
PAM30 (a)        53.3                  1.47                34
VT80             50.2                  1.39                36
PAM70 (a)        41.7                  0.966               52
BLOSUM80 (a)     41.7                  1.04                48
VT120            39.3                  1.06                47
BLOSUM62 (a)     28.6                  0.439               114
BLOSUM50         25.0                  0.216               231
VT160            24.4                  0.288               174

(a) Using default BLASTP gap penalties.
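The last column of Table 7 is consistent with dividing 50 bits by the bits per position and rounding, which is presumably the number of aligned positions a matrix needs to reach a 50-bit score at its target identity. The sketch below reproduces that column. As a second illustration, it converts a bit score into an approximate expectation value using the standard Karlin-Altschul relation E ≈ D·m·n·2^(-S′), where D is the number of database sequences, m and n are the query and subject lengths, and S′ is the bit score. The function names, the 400-residue lengths, and the database sizes are hypothetical. Real BLAST and FASTA statistics also apply length corrections, which this sketch ignores.

```python
# Bits per aligned position for each scoring matrix, copied from Table 7.
BITS_PER_POSITION = {
    "VT10": 3.45,
    "VT20": 2.92,
    "VT40": 2.27,
    "PAM30": 1.47,
    "VT80": 1.39,
    "PAM70": 0.966,
    "BLOSUM80": 1.04,
    "VT120": 1.06,
    "BLOSUM62": 0.439,
    "BLOSUM50": 0.216,
    "VT160": 0.288,
}


def length_for_bits(bits_per_position: float, target_bits: float = 50.0) -> int:
    """Aligned positions needed to accumulate target_bits, rounded to the nearest residue."""
    return round(target_bits / bits_per_position)


def expected_count(bits: float, query_len: int, subject_len: int, n_sequences: int) -> float:
    """Approximate number of chance alignments scoring >= bits in a database search.

    Uses E ~= D * m * n * 2**(-bits) for D database sequences of length n and a
    query of length m; edge-effect (effective length) corrections are ignored.
    """
    per_pair = query_len * subject_len * 2.0 ** (-bits)
    return n_sequences * per_pair


if __name__ == "__main__":
    # Reproduce the 50-bit alignment length column of Table 7.
    for matrix, bpp in BITS_PER_POSITION.items():
        print(f"{matrix:<9} {length_for_bits(bpp):>4}")
    # Hypothetical 400-residue query and subjects; small vs. large database.
    for bits in (40, 50):
        for n_seqs in (10_000, 10_000_000):
            print(bits, n_seqs, f"{expected_count(bits, 400, 400, n_seqs):.2g}")
```

Under these assumptions, a 40-bit score corresponds to E ≈ 0.0015 in a 10,000-sequence database but E ≈ 1.5 in a 10-million-sequence database, while a 50-bit score gives E ≈ 0.0014 even in the larger database.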
very distantly related domains but does not extend over the full
length of the homology—is equally likely to occur. Homologous
under-extension can sometimes be recognized by identifying intermediate-distance homologs, just as transitive similarity can be used to recognize distant homologs. If domain A aligns to domain B over 200 amino acids using the appropriate scoring matrix, and domain B aligns to domain C over those same 200 amino acids, then it makes sense to include all 200 amino acids of all three proteins in a multiple sequence alignment, even if domain A aligns directly to only 100 amino acids of domain C.
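Transitive reasoning of this kind can be sketched by composing residue-level alignments through the intermediate sequence. This is a minimal illustration, not how any particular multiple alignment program works: alignments are represented as hypothetical dictionaries mapping positions in one sequence to aligned positions in the other, and the coordinates are invented for the example. Real alignments contain gaps and would not be simple offsets.

```python
from typing import Dict

# A pairwise alignment represented residue by residue:
# position in the first sequence -> aligned position in the second (0-based, gaps omitted).
Alignment = Dict[int, int]


def compose(ab: Alignment, bc: Alignment) -> Alignment:
    """Infer A->C residue correspondences through the intermediate sequence B."""
    return {a: bc[b] for a, b in ab.items() if b in bc}


# Hypothetical ungapped example: A residues 0-199 align to B residues 10-209,
# and B residues 10-209 align to C residues 0-199.
a_to_b = {i: i + 10 for i in range(200)}
b_to_c = {j: j - 10 for j in range(10, 210)}

# Hypothetical direct A-C alignment that stops after 100 residues (under-extension).
a_to_c_direct = {i: i for i in range(100)}

a_to_c_transitive = compose(a_to_b, b_to_c)
print(len(a_to_c_direct), len(a_to_c_transitive))  # 100 200
```

The composed map recovers A-C correspondences for all 200 positions even though the hypothetical direct alignment covers only 100, and it agrees with the direct alignment wherever the two overlap.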
Summary: BLAST and FASTA produce accurate expectation values for sequence alignments. Expectation values < 0.001 can be used to reliably infer homology in a single search, while lower (more stringent) thresholds are required when multiple searches are performed. Expectation values capture the effect of database size: larger databases produce larger (worse) expectation values for the same alignment score. Bit scores do not depend on database size, so the bit score can be used to roughly characterize the significance of an alignment independent of the algorithm or scoring parameters. Alignments scoring more than 50 bits are almost always significant; scores of 40 to 50 bits are significant when small databases are searched; scores below 40 bits are never significant. The significance of very surprising, but weakly significant, alignments can be confirmed using shuffled sequence