Word Methods
BLAST and FASTA are called word methods of sequence alignment because these algorithms work at
the level of words (short runs of consecutive amino acids or nucleotides) instead of with individual
amino acids or nucleotides. Both methods are fast enough to support searching a query sequence
against an entire nucleotide or protein database.
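To make the notion of a word concrete, the following minimal Python sketch splits a sequence into
every overlapping word of length k (FASTA calls this word length ktup). The function name `words`
is hypothetical and chosen only for illustration.

```python
def words(seq: str, k: int) -> list[str]:
    """Return every overlapping word (k-tuple) of length k in seq, in order."""
    # A sequence of length n contains n - k + 1 overlapping words of length k.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(words("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Because consecutive words overlap by k - 1 characters, every position in the sequence (except the
last k - 1) starts exactly one word.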
The high-level flow of the FASTA algorithm, which predates BLAST, is shown in Figure 8-12. The first
step in the FASTA algorithm is to create a hash table of words from the query sequence. A hash
function maps each word to an integer drawn from a smaller, fixed range of values, and that integer
serves as a direct index, so a word can be located without searching through every stored word. A
hash table, such as the one in Figure 8-13, uses those integers to map words to array positions. For
proteins, the word length is typically one or two amino acids. For nucleic acid sequences, it is
usually four to six nucleotides. In either case, the longer the word, the fewer chance matches there
are to examine, so the search is faster but less thorough and can miss weaker similarities.
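The lookup step can be sketched in Python as follows. This is a minimal illustration, assuming a
Python `dict` stands in for the hash table; the function names are hypothetical and this is not
FASTA's actual implementation. Each query word is stored with every position where it occurs, so
each word of a database sequence can be checked with a single lookup instead of a scan of the whole
query. Hits are then tallied by diagonal (database position minus query position), because several
word hits on the same diagonal mark a candidate region of similarity, which is the kind of region
FASTA goes on to examine.

```python
from collections import defaultdict


def build_word_table(query: str, k: int) -> dict[str, list[int]]:
    """Map each length-k word in the query to every position where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return dict(table)


def diagonal_hits(query: str, subject: str, k: int) -> dict[int, int]:
    """Count shared words on each diagonal (subject position - query position)."""
    table = build_word_table(query, k)
    counts: dict[int, int] = defaultdict(int)
    for j in range(len(subject) - k + 1):
        # One hash lookup per subject word replaces a scan of the whole query.
        for i in table.get(subject[j:j + k], []):
            counts[j - i] += 1
    return dict(counts)
```

For example, `diagonal_hits("GATTACA", "CGATTACAT", 3)` returns `{1: 5}`: all five words of the query
reappear in the database sequence, each shifted by one position, so they all fall on diagonal 1.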
Figure 8-12. FASTA Algorithm Flowchart.
Figure 8-13. Hash Table for FASTA. The possible words are keyed to index
numbers (right), which are used to represent words in the hash table.
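Figure 8-13 keys each possible word to an index number. One common way to do this for nucleic acids,
shown here as an assumption rather than as the figure's exact numbering, is to read each word as a
base-4 number (A=0, C=1, G=2, T=3), so that every word of length k gets its own slot in an array of
4^k entries. The names `BASE`, `word_index`, and `build_index_table` are hypothetical.

```python
# Assumed nucleotide-to-digit mapping; input must contain only uppercase A, C, G, T.
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


def word_index(word: str) -> int:
    """Encode a DNA word as a base-4 integer in the range 0 .. 4**len(word) - 1."""
    index = 0
    for base in word:
        index = index * 4 + BASE[base]
    return index


def build_index_table(query: str, k: int) -> list[list[int]]:
    """Array-based table: slot word_index(w) lists every query position of word w."""
    table: list[list[int]] = [[] for _ in range(4 ** k)]
    for i in range(len(query) - k + 1):
        table[word_index(query[i:i + k])].append(i)
    return table
```

For example, `word_index("GAT")` is 2*16 + 0*4 + 3 = 35. Because every possible word has its own
slot, this array needs no collision handling, at the cost of allocating 4^k slots whether or not
each word actually occurs in the query.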
 
 