Biology Reference
In-Depth Information
evolution scenario when the conserved sites are aligned together.
According to this, conserved residue pairings are rewarded with
positive scores and less likely substitutions, insertions and deletions
are usually punished with negative scores. In this naive form of
scoring alignments, the model of scoring is applied independently
to each pairing and summed up together. Therefore the problem
reduces to a maximum path finding problem on the alignment
graph and it can be solved using dynamic programming.
3
Pairwise Sequence Alignment Algorithms
The application of the optimal path finding algorithm to the
sequence alignment problem provides polynomial time solution
for finding the optimal alignments with the defined scoring
schemes. Various versions of these algorithms have been proposed
to date, with different objectives of alignment preferences and per-
formance specifications. Yet, themain idea behind is preserved and it
is based on the algorithm proposed by Needleman and Wunsch [ 4 ].
Although in its original article it is not referenced that 1970
Needleman-Wunsch algorithm is based on dynamic programming,
it is an exact application of the optimum path finding solution
we have mentioned. The algorithm states the methodology of
finding the similarities between two protein sequences. With appro-
priate scoring modifications, this can be generalized to DNA/RNA
sequences.
Needleman-Wunsch algorithm assigns positive scores to the
pairing residues of the same kind and zero to the residues of
different kinds. The penalties introduced by gaps depend on the
length of a gap. This is because a novel insertion or detection is a
significant event and consecutive insertions and deletions are more
likely to happen once the initial event occurs. The penalty for a
consecutive gap is usually defined as a concave function of the
length of the gap [ 5 ]. Concave scoring reflects the relatively
decreasing importance of a new adjacent gap introduced. As a
result, variable length of gap introductions has to be considered
individually in the computation of the alignment scores. This
corresponds to a modification in the graphical structure in Fig. 1 .
In the graph, only the transitions of single gaps have directed edges,
and they are represented by the horizontal and the vertical edges.
The consecutive gaps of length k can be represented as jumping
horizontal (vertical) edges between the cells c ( i , j ) and c ( i + k , j )
( c ( i , j ) and c ( i , j + k )). This modification enables the representation
of any alignment as a path and its corresponding score as the
additive score of this path, so the equivalence of the optimal path
and the sequence alignment problems is preserved. Note that this
network structure is in the class of the multilayer networks we have
covered, which satisfies the recursion relation in Eq. 5 . In this form,
Search WWH ::




Custom Search