However, the transitive inference of homology between
sequences that do not share significant similarity requires that the
same regions (domains) of the sequences align. If protein B contains
domains B1 and B2, and the B1 domain shares significant similarity
with A, C, and D, and B2 with E, F, and G, then there is no reason
to believe that A, C, and D are homologous to E, F, and G.
However, if all seven full-length proteins A-G, rather than the
individual domains, were included in a Multiple Sequence Alignment,
many programs would align the unrelated residues, because many
Multiple Sequence Alignment programs assume that the sequences
being aligned are globally homologous.
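The domain requirement above can be checked mechanically before sequences are pooled for alignment. The following is a minimal sketch, not a method from this chapter: the `Hit` record, the `can_link` helper, the 50% overlap cutoff, and all coordinates are hypothetical and chosen only to illustrate the A/B/E example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A significant local alignment between protein B and another protein."""
    partner: str   # name of the other protein, e.g. "A"
    b_start: int   # first aligned residue in B (1-based, inclusive)
    b_end: int     # last aligned residue in B (1-based, inclusive)


def overlap_fraction(x: Hit, y: Hit) -> float:
    """Fraction of the shorter aligned region of B that both hits cover."""
    shared = min(x.b_end, y.b_end) - max(x.b_start, y.b_start) + 1
    shorter = min(x.b_end - x.b_start, y.b_end - y.b_start) + 1
    return max(shared, 0) / shorter


def can_link(x: Hit, y: Hit, min_overlap: float = 0.5) -> bool:
    """True if both hits align to largely the same region of B.

    The 0.5 cutoff is an illustrative assumption, not a published value.
    """
    return overlap_fraction(x, y) >= min_overlap


# Hypothetical coordinates: domain B1 spans residues 1-150, B2 spans 200-380.
hit_a = Hit("A", 5, 148)
hit_c = Hit("C", 10, 150)
hit_e = Hit("E", 205, 370)

print(can_link(hit_a, hit_c))  # A and C both align to B1
print(can_link(hit_a, hit_e))  # A aligns to B1, E aligns to B2
```

With these made-up coordinates, A and C can be linked through B, while A and E cannot, because their alignments cover different regions of B.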
To ensure that Multiple Sequence Alignments are built from
homologous sequences, BLAST, FASTA, and other pairwise similarity
searching programs have two central goals: (1) to identify
sequences sharing excess similarity; (2) to ensure that statistical
estimates are accurate. In this chapter, we describe programs and
search strategies to perform sensitive searches that reliably identify
homologs.
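As a rough illustration of building an alignment set only from statistically supported hits, the sketch below filters BLAST tabular output (`-outfmt 6`) by E-value. The `significant_hits` helper, the `1e-3` cutoff, and the two example rows are assumptions made for this illustration; they are not taken from the chapter.

```python
import csv
import io

# Column names of the default BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-3):
    """Return the subject ids with at least one hit at or below max_evalue.

    The default cutoff is an illustrative assumption; choose one that
    suits the database size and the purpose of the search.
    """
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    return {row["sseqid"] for row in reader
            if float(row["evalue"]) <= max_evalue}


# Two invented result rows, joined with tabs as in real tabular output.
example = "\n".join("\t".join(fields) for fields in [
    ["query1", "protA", "45.2", "210", "110", "3",
     "1", "208", "5", "214", "2e-40", "150"],
    ["query1", "protX", "28.0", "60", "40", "2",
     "30", "88", "100", "158", "4.5", "28.1"],
])

print(sorted(significant_hits(example)))
```

Only the first invented hit passes the cutoff; the second, with an E-value of 4.5, would be left out of the set passed to an alignment program.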
2 Using BLAST and FASTA
2.1 Selecting a Search Program
The most effective similarity searches perform sequence comparisons
at the protein sequence level, either by comparing a protein
sequence to a protein sequence database with blastp, fasta, or
ssearch, or by using blastx or fastx (Table 1) to translate
DNA query sequences "on-the-fly" and compare them to a protein
sequence database at the protein level. Protein sequence comparison
is 5-10-fold more sensitive than DNA:DNA comparison, and
protein sequence databases are considerably less redundant than
DNA sequence databases. Together, this means that protein or
Table 1
BLAST and FASTA programs

BLAST program   FASTA(a) program   Query     Library   Comments
blastp          fasta              Protein   Protein   Fast, sensitive protein comparison [6, 8, 23, 24]
blastn          fasta              DNA       DNA       Only for non-protein coding sequences. blastn -task blastn is required for sensitive searches
blastx          fastx / fasty      DNA       Protein   Performs 6-frame translation of query with frameshifts [25]
tblastn         tfastx / tfasty    Protein   DNA

(a) The names of the FASTA programs are typically followed by a major version number, e.g., fasta36 or ssearch36. These numbers are not shown.
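The choice in Table 1 amounts to a lookup on the query and library types. The sketch below encodes the table as a dictionary and builds a BLAST+ argument list from it. The `blast_command` helper and the file and database names are hypothetical; `-query`, `-db`, and `-task` are standard BLAST+ options.

```python
# (query type, library type) -> (BLAST program, FASTA programs), from Table 1.
PROGRAMS = {
    ("protein", "protein"): ("blastp", ["fasta"]),
    ("dna", "dna"): ("blastn", ["fasta"]),
    ("dna", "protein"): ("blastx", ["fastx", "fasty"]),
    ("protein", "dna"): ("tblastn", ["tfastx", "tfasty"]),
}


def blast_command(query_type, library_type, query_file, database):
    """Build a BLAST+ argument list for the given query and library types."""
    program, _fasta_programs = PROGRAMS[(query_type.lower(), library_type.lower())]
    args = [program, "-query", query_file, "-db", database]
    if program == "blastn":
        # Table 1: "-task blastn" is required for sensitive DNA:DNA searches.
        args += ["-task", "blastn"]
    return args


# Hypothetical file and database names, for illustration only.
print(blast_command("DNA", "protein", "query.fa", "swissprot"))
```

The returned list could be passed to `subprocess.run` on a system where BLAST+ is installed. The FASTA program names are kept in the table so that the same lookup can serve either package.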