Biology Reference
In-Depth Information
M
2 matrices holding S R ( i , j ) scores is called to sum up
the pairwise scores for the coordinate pairs of c
ð
M
þ
1
Þ=
. This
approximation can be extended from pair scores to score combina-
tions of multiple alignment N -tuples ( N
ð
k 1
;
k 2
; ...;
k M
Þ
<
M ). While estimating
P
frommultiple alignment scores is a better approx-
imation, calculating multiple alignment combinations of N
sequences might be costly. Nevertheless, using triplet scores is
reported to be more efficient than the using pairwise alignments
when the overall procedure is considered for a few sequences [ 26 ].
The threshold determining the search subset can be done by itera-
tive deepening as in the case of pairwise algorithm, or a fast heur-
istics approximating the multiple alignment (e.g., progressive
alignment) can be used [ 27 ].
Further reduction in space complexity is achieved by combin-
ing bounded dynamic programming with Hirschberg's insight.
The corresponding method is usually referred to as the divide-
and-conquer frontier search [ 26 ]. The frontier is the active region
of the search space that is used in the computations. Since the
grandparents of the cells are not used in the score calculation, a
cell is not kept in the memory after the scores of its children are
computed. Obviously, the memory to be used in the backtrace
phase is lost. Hirschberg's algorithm keeps the ancestors of each
active cell located in a selected median hyperplane of the grid. The
ancestor list is updated at each cell computation. Finally, the ances-
tor of the target cell is determined to be the point where the
optimal path crosses the hyperplane. The problem is then split
into two subproblems as in the pairwise application, and the opti-
mal path is found recursively.
ð
k 1 ;
k 2 ; ...;
k M Þ
Divide-and-conquer methods have the similar subproblem splitting
insight with Hirschberg's algorithm, but they differ in how the split
points are found. The sequences are parsed with fast methods
avoiding dynamic programming in multidimensional grid. Once
a number of subproblems are defined, each problem is solved by a
multiple sequence alignment algorithm of choice.
Some methods determine the binary split points directly using
pairwise alignments [ 28 ]. Combining the forward and the back-
ward alignment matrices S F , and S R the optimal split of the prefix of
the suffix of a sequence be found as in pairwise Hirschberg's
algorithm. An approximate cut point is found for each sequence
by summing up the score matrices for each pairwise alignment
associated. The overall maximum score determines an average
split point. This procedure is repeated a few times and the
corresponding subproblems are solved conventionally. The final
multiple sequence alignment is simply the concatenation of all
subsolutions.
4.2 Divide-and-
Conquer Methods
in Multiple Sequence
Alignment
Search WWH ::




Custom Search