Biomedical Engineering Reference
In-Depth Information
is a set of sequences of different lengths. In essence, in a typical coarse-grained
parallel implementation, one of the processors acts as a “master”, dispatching blocks
of sequences to the “slaves” which, in turn, perform the algorithm calculations. When
the slaves report results for one block, the master sends a new block. This strategy is
possible because results from the comparison between query and database sequences
are independent of the previous results deriving from the comparison of the query
with other sequences.
However, the time required in the processing of any given sequence depends not
only on the length of the sequence but also on its composition. Therefore, the use of a
dynamic load balancing strategy is necessary. The simplest way is to modify the way in
which the master processor distributes the load on demand from the slaves. Obviously,
sending one-sequence messages introduces additional expensive time overhead due
to the high number of messages interchanged. Thus, rather than distributing messages
sequence-by-sequence, better results are achieved by dispatching blocks of sequences.
10.4 FASTA
FASTA [7] finds homologous sequences using a four-step process. First, sequences
that have sub-sequences of at least k adjacent residues, which match sub-sequences
in the query sequence, are identified. The recommended value of k for a protein
sequence alignment is 2 and for a DNA sequence alignment is between 4 and 6. The
second step combines groups of these matching sub-sequences into longer matching
regions called initial regions. Each initial region consists of one or more matching
sub-sequence separated by mismatching regions of residues. Gaps are not allowed
within the mismatching regions. That is, the number of residues between two con-
secutive matching sub-sequences within the same initial region has to be the same
in the query sequence and the target sequence. These initial regions are scored and
the best 10 initial regions are selected. During the third step, dynamic program-
ming is used to combine nearby initial regions and new scores are assigned to the
combined regions. This is an example of how dynamic programming is used in a
limited context (i.e., only for nearby initial regions) within heuristics. In this third
step, mismatching regions between the initial regions may contain gaps. The scores
generated in the third step are used to rank the database sequences. During the fourth
step, the Smith-Waterman [6] algorithm is applied to the top ranking sequences from
the previous step. Specifically, this algorithm is used to align the query sequence
and the database sequences within the selected initial regions and their neighboring
residues. The fourth step in FASTA is another example of the use of dynamic pro-
gramming in a limited context within a heuristic based alignment algorithm. The
Smith-Waterman [6] algorithm is only used for top ranking sequences and only
within selected regions of these sequences. Limiting the use of dynamic program-
ming increases the computational efficiency of the alignment algorithm. However, it
also means that the generated solution is only a sub-optimal solution rather than an
optimal one.
Search WWH ::




Custom Search