5 Summary
Multiple Sequence Alignment requires homologous sequences; the
BLAST and FASTA programs calculate both alignment scores
and accurate estimates of their statistical significance that can
be reliably used to infer homology. Protein alignment scores with
E() < 0.001 in a single search reliably reflect homology, meaning that
the sequences have descended from a common ancestor. Searches for homologs
are far more sensitive at the protein level. Protein sequences change
more slowly than DNA sequences, providing greater evolutionary
look-back time; protein alignments have more accurate statistical
estimates; and protein databases are dramatically smaller than DNA
databases. Today, most sequences are determined as DNA, but
blastx and fastx can automatically translate those sequences and
compare them to protein databases. Search sensitivity can be increased
by searching smaller, representative databases, and using scoring matrices
that are targeted to the length and evolutionary distance of the
sequences of interest. Because protein sequence databases have
become so diverse, it is rare that a query sequence does not find
homologs; the most common reason for failing to find homologs is a
query sequence that is largely low-complexity or has a strongly biased
amino-acid composition. It is routine to find homologs between
human and bacterial proteins that last shared a common ancestor
more than 2.5 billion years ago. Sequence comparison has improved
dramatically since it became generally available more than 25 years ago;
sequence databases are far more comprehensive, and statistical estimates
are far more reliable. It has become much easier to identify
homologs, which can provide more data for Multiple Sequence
Alignments.
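The link between database size and search sensitivity can be illustrated with a short calculation. The sketch below assumes the standard bit-score relation E = m * n * 2^(-S'), where m is the query length, n is the total database length in residues, and S' is the bit score. It ignores the effective-length (edge-effect) corrections that real programs apply, and the FASTA programs estimate E() differently, so the values are only approximate. The function names, the query length, and the two database sizes are hypothetical and chosen only for illustration.

```python
import math


def evalue_from_bits(bit_score: float, query_len: float, db_len: float) -> float:
    """Expected number of chance alignments with at least this bit score.

    Uses E = m * n * 2**(-S'), without effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(query_len: float, db_len: float, threshold: float = 0.001) -> float:
    """Bit score required for E() to fall to the given threshold."""
    return math.log2(query_len * db_len / threshold)


if __name__ == "__main__":
    query_len = 300              # hypothetical protein query, in residues
    small_db = 200_000_000       # hypothetical representative database
    large_db = 100_000_000_000   # hypothetical comprehensive database

    for name, db_len in (("small", small_db), ("large", large_db)):
        e = evalue_from_bits(50.0, query_len, db_len)
        need = bits_needed(query_len, db_len)
        print(f"{name} database: E(50 bits) = {e:.2g}, "
              f"bits needed for E() < 0.001 = {need:.1f}")
```

Under these assumptions, the same 50-bit alignment falls below the 0.001 threshold in the smaller database but not in the larger one, which is why searching a smaller, representative database can reveal more distant homologs.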