Pattern Matching - Bioinformatics Computing

Word Methods

BLAST and FASTA are called word methods of sequence alignment because these algorithms work at the level of words (short runs of consecutive amino acids or nucleotides) instead of with individual amino acids or nucleotides. Both methods are fast enough to support searching for alignments of query sequences against entire nucleotide or protein databases.

The high-level flow of the FASTA algorithm, which predates BLAST, is shown in Figure 8-12. The first step in the FASTA algorithm is to create a hash table of words from the query sequence. A hash function maps each word to an integer, giving a compact set of values that can be looked up quickly. A hash table, such as the one in Figure 8-13, maps words to array positions based on the hash function. For proteins, the word length is typically one or two amino acids. For nucleic acid sequences, the word length is usually four to six nucleotides. In either case, the longer the word, the faster but less thorough the search, since longer words produce fewer matches to examine.
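The first step described above can be sketched in a few lines of Python. This is a minimal illustration, not the FASTA implementation: it assumes a plain dictionary stands in for the hash table, and the function name `build_word_table`, the parameter `k` (the word length), and the example sequence are all hypothetical.

```python
from collections import defaultdict


def build_word_table(query, k):
    """Map each length-k word in the query to the positions where it starts."""
    if k < 1:
        raise ValueError("word length k must be at least 1")
    table = defaultdict(list)
    # Slide a window of width k across the query, one character at a time.
    for start in range(len(query) - k + 1):
        table[query[start:start + k]].append(start)
    return dict(table)


# Hypothetical nucleotide query with a word length of four.
table = build_word_table("ACGTACGT", 4)
```

In this example the word ACGT starts at positions 0 and 4, so both positions are stored under the same key. A larger `k` yields fewer, more specific words, which fits the trade-off between speed and thoroughness noted above.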

Figure 8-12. FASTA Algorithm Flowchart.

Figure 8-13. Hash Table for FASTA. The possible words are keyed to index numbers (right), which are used to represent words in the hash table.
