Pattern Matching - Bioinformatics Computing

Next, the query words are compared to those in the database, which has previously been processed into words of the same length. FASTA uses the BLOSUM50 substitution matrix to score the top 10 ungapped alignments that contain the most similar words. These alignments are then merged into a gapped alignment, which is scored, producing an "optimized score." FASTA also reports an expectation score, E, which represents the expected number of random alignments with z-scores greater than or equal to the value observed, thereby providing an estimate of the statistical significance of the results.
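
The word-lookup step can be illustrated with a minimal sketch. It is not FASTA's actual implementation: the function names, the toy sequences, and the word length of 3 are assumptions chosen for illustration, and scoring with a substitution matrix is left out. The sketch pre-processes a target sequence into a table of words, then counts how many query words fall on each diagonal (target offset minus query offset). Diagonals with many shared words are the candidates for ungapped alignments.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every overlapping k-letter word to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def count_diagonal_hits(query, target_index, k):
    """Count shared words per diagonal (target offset minus query offset)."""
    diagonals = defaultdict(int)
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        # .get() avoids adding empty entries to the defaultdict index.
        for t_pos in target_index.get(word, ()):
            diagonals[t_pos - q_pos] += 1
    return dict(diagonals)


# Toy data (hypothetical): the target stands in for a pre-processed database entry.
target = "GATTACAGATTACA"
query = "TTACAG"

index = build_word_index(target, k=3)
hits = count_diagonal_hits(query, index, k=3)
best_diagonal = max(hits, key=hits.get)
print(best_diagonal, hits[best_diagonal])
```

Because the table is built once per database sequence, each query word costs a single dictionary lookup, which is where the speed of the word-based approach comes from.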

Although FASTA was the first widely used program for sequence alignment against genome-length sequences, and is still actively supported in both Web and workstation versions, BLAST is by far the more popular of the word-based algorithms for sequence alignment. Like FASTA, BLAST is a heuristic approach to sequence alignment that provides speed through a hashing technique. BLAST differs from FASTA in that words are typically 3 characters long for proteins and 11 characters long for nucleotide sequences.

Like FASTA, BLAST searches a pre-computed hash table of sequences in the protein or DNA database. Where BLAST excels, however, is that each matching word is then extended to the maximum length possible, as indicated by an alignment score. The top-scoring alignments in a sequence, called maximal segment pairs (MSPs), are combined if possible into local alignments. The latest version of BLAST can attempt gapped alignment; however, this tends to extend computational time significantly compared to ungapped alignment. One of the major issues with both BLAST and FASTA is how to interpret the significance of the results. An individual score depends on a number of variables, including the lengths of the sequences being aligned, the gap penalties, and the alignment scoring system used.
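
The extension step and the dependence of significance on sequence length can be sketched as follows. This is a simplified illustration under stated assumptions, not BLAST's implementation: the match and mismatch scores, the drop-off cutoff `x_drop`, the 4-letter seed, and the toy sequences are all hypothetical. The significance function uses the Karlin-Altschul form E = K * m * n * exp(-lambda * S), which is commonly used for ungapped local alignment scores. The text above does not give a formula, so it is supplied here as an assumption, and the `lam` and `k_const` values in the example are arbitrary placeholders rather than fitted parameters.

```python
import math


def extend_seed(query, target, q_start, t_start, word_len,
                match=1, mismatch=-3, x_drop=5):
    """Extend an exact word hit in both directions without gaps.

    Returns (score, q_begin, q_end, t_begin); q_end is exclusive.
    The scoring values and x_drop cutoff are illustrative assumptions.
    """
    best = running = word_len * match
    grow_right = grow_left = 0

    # Extend to the right of the seed, remembering the best-scoring endpoint.
    q, t = q_start + word_len, t_start + word_len
    step = 0
    while q + step < len(query) and t + step < len(target):
        running += match if query[q + step] == target[t + step] else mismatch
        step += 1
        if running > best:
            best, grow_right = running, step
        elif best - running > x_drop:
            break  # score fell too far below the best seen so far

    # Extend to the left, starting again from the best score so far.
    running = best
    step = 0
    while q_start - 1 - step >= 0 and t_start - 1 - step >= 0:
        same = query[q_start - 1 - step] == target[t_start - 1 - step]
        running += match if same else mismatch
        step += 1
        if running > best:
            best, grow_left = running, step
        elif best - running > x_drop:
            break

    return best, q_start - grow_left, q + grow_right, t_start - grow_left


def expected_chance_hits(score, query_len, db_len, lam, k_const):
    """Karlin-Altschul style estimate: E = K * m * n * exp(-lambda * S)."""
    return k_const * query_len * db_len * math.exp(-lam * score)


# Toy data (hypothetical): the word "GTTG" starts at query[3] and target[4].
query = "TACGTTGCAT"
target = "GGACGTTGCAAA"
score, q_begin, q_end, t_begin = extend_seed(query, target, 3, 4, 4)
print(score, query[q_begin:q_end])

# Placeholder parameters: the same score is less significant in a larger database.
e_small = expected_chance_hits(score, len(query), 1_000, lam=0.5, k_const=0.1)
e_large = expected_chance_hits(score, len(query), 2_000, lam=0.5, k_const=0.1)
print(e_small, e_large)
```

In this form the expected number of chance hits grows linearly with the size of the search space, so the same raw score becomes less significant as the database grows. The parameters `lam` and `k_const` depend on the scoring system, which is why scores produced under different matrices or gap penalties cannot be compared directly.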
