Biology Reference
In-Depth Information
updated at each iteration with the similar edge-updating procedure
in the global alignment algorithm. The current highest scoring cell
and its root is saved during the process. Once the forward score
computation is finished, the highest scoring cell and its root is
already known. As the second phase, calling the global alignment
procedure with Hirschberg's algorithm assigning the root as the
top-left corner and the highest scoring cell as the bottom-right
corner is sufficient to find the local alignment. This linear-space
local alignment algorithm runs in
'
is the time required for aligning the local segment by the
quadratic space Needleman-Wunsch algorithm. More sophisti-
cated strategies based on Hirschberg's divide-and-conquer princi-
ple have also been applied to the problem of finding near-optimal
alignments [ 16 ].
S รพ
2
'
3
time where
S
The quadratic-time solution offered by dynamic programming is
the optimal solution to the sequence alignment problem when an
exact solution is required. As a result, the entire set of alignment
candidates are searched in a smart way. Several tasks of computa-
tional biology require alignment of high number of sequences or of
relatively long sequences as a natural result of the length of the
molecular sequences found in life. The required computational
burden is high when exact dynamic programming solutions are
considered. For that reason, researches have proposed fast approxi-
mate solutions with lower costs. The most popular approach to an
approximate solution is defining heuristics to constrain the search
space and employing alignment algorithms in a reduced search
space. One might think that skipping potential alignment candi-
dates would bring a great statistical risk of finding solutions far away
from a set of desired solutions. While this is true, in a sense that the
optimal solution cannot be guaranteed with approximating heur-
istics, in practice such approaches are quite successful. Usually
biological sequences are not random with a great statistical com-
plexity, and they are expected to share significant homologies with
other sequences. Therefore, some regions of the dynamic program-
ming matrices are more likely to contain a near-optimal solution. As
an example, for two highly similar sequences, as they share large
regions of similarity, not a large number of gaps are expected in
total. Because of that, the optimal path cannot visit the cells too far
away from the diagonal. In case, the dynamic programming graph
can be reduced to a smaller subgraph. Here we are going to
introduce some heuristics approaches applied on dynamic program-
ming in order to achieve fast sequence alignments.
3.3 Heuristical
Improvements on
Sequence Alignment
Banded dynamic programming is originated from the idea that if
two aligned sequences are similar, the number of insertions and
deletions is below some upper bound. In the alignment graph, this
situation corresponds to a diagonal band centered around the
3.3.1 Banded Dynamic
Programming
Search WWH ::




Custom Search