3.2 Multiple Sequence Alignment
Sequence alignment places identical symbols from different sequences at the same
position as often as possible by inserting gaps between symbols. Its purpose is to
measure how similar the given sequences are. It is a frequently occurring and
much-studied problem in text processing and computational biology [5]. For an
example, see figure 2 [13]. More formally, a pair of aligned sequences is assigned a
score, which is the sum of the scores of every pair of symbols that occupy the same
position in the two sequences. The objective of aligning two sequences is to maximize
this score, and the objective of aligning multiple sequences is to maximize the sum of
the scores of all pairs of sequences.
Two symbols at the same position are usually assigned a positive score if they are
the same, and a negative score if they differ or if one is a symbol and the other is a
gap. For example, the score of two symbols a and b can be defined as follows:
$$
s(a, b) = +1 \ \text{if } a = b, \qquad
s(a, b) = -1 \ \text{if } a \neq b, \qquad
s(a, \_) = -2, \qquad s(\_, a) = -2, \qquad s(\_, \_) = 0,
$$
where _ represents a gap.
With the above scoring scheme, the score of the fourth and the fifth strings in figure
2 is +1-1-1-1-1+1-2-2-2+0+0 = -8. The score of the multiple alignment of all five
strings is the sum of the scores of all 10 pairs of strings.
Fig. 2. Multiple sequence alignment
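The scoring scheme above can be sketched in Python as follows. This is only an illustration: the names `symbol_score`, `pair_score` and `sum_of_pairs` are hypothetical, and the two example rows are made up to reproduce the same column pattern as the arithmetic in the text (they are not the strings of figure 2).

```python
from itertools import combinations

GAP = "_"  # gap symbol, as in the text

def symbol_score(a: str, b: str) -> int:
    """Score of two symbols at the same position."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def pair_score(x: str, y: str) -> int:
    """Score of two aligned rows: sum over all positions."""
    if len(x) != len(y):
        raise ValueError("aligned rows must have the same length")
    return sum(symbol_score(a, b) for a, b in zip(x, y))

def sum_of_pairs(rows: list[str]) -> int:
    """Score of a multiple alignment: sum over all pairs of rows."""
    return sum(pair_score(x, y) for x, y in combinations(rows, 2))

# Made-up rows: 1 match, 4 mismatches, 1 match, 3 symbol/gap, 2 gap/gap columns.
print(pair_score("AAAAAAAAA__", "ABBBBA_____"))  # +1-1-1-1-1+1-2-2-2+0+0 = -8
```

For five rows, `sum_of_pairs` visits the 10 pairs mentioned in the text.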
To align a pair of sequences there is a well-known quadratic-time dynamic
programming algorithm [1]. Unfortunately, if there are k strings of length n, the
natural generalization of the algorithm for pairwise alignment takes $\Theta(n^k)$
time, and furthermore the exact alignment problem has been proven to be NP-complete
[5]. Hence we used the approximation algorithm called the center star method [13].
The center star method has an error ratio of 2; in other words, when alignment is
scored as a cost to be minimized under a distance that satisfies the triangle
inequality, it produces an alignment whose cost is guaranteed to be less than twice
the cost of the optimal alignment [5].
The center star method proceeds in three stages as follows:
1. For each string s, perform pairwise alignment between s and each of the other
strings and sum up the scores. Take the string c that maximizes the sum as the
center.
2. Perform pairwise alignment between c and each of the other strings.
3. Perform multiple alignment starting from c and adding the other strings one by one
using the result of the pairwise alignment in step 2, keeping the principle that
“once a gap, always a gap”.
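The three stages can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`score`, `align`, `merge`, `center_star`) are hypothetical, the pairwise step is assumed to be a standard global dynamic programming alignment using the scoring scheme from the text, and input strings are assumed not to contain the gap symbol `_`.

```python
GAP = "_"

def score(a: str, b: str) -> int:
    """Symbol-pair score: +1 match, -1 mismatch, -2 symbol/gap, 0 gap/gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def align(x: str, y: str) -> tuple[int, str, str]:
    """Quadratic-time global alignment; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + score(x[i - 1], GAP)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + score(GAP, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                           dp[i - 1][j] + score(x[i - 1], GAP),
                           dp[i][j - 1] + score(GAP, y[j - 1]))
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace one optimal path back to the origin
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + score(x[i - 1], GAP):
            ax.append(x[i - 1])
            ay.append(GAP)
            i -= 1
        else:
            ax.append(GAP)
            ay.append(y[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(ax)), "".join(reversed(ay))

def merge(msa: list[str], ac: str, aj: str) -> list[str]:
    """Add row aj (pairwise-aligned to center row ac) to msa; msa[0] is the center row."""
    old = msa[0]
    out = [[] for _ in msa]
    new = []
    p = q = 0
    while p < len(old) or q < len(ac):
        old_gap = p < len(old) and old[p] == GAP
        new_gap = q < len(ac) and ac[q] == GAP
        if old_gap and not new_gap:      # existing gap column: keep it, pad the new row
            for r, row in zip(out, msa):
                r.append(row[p])
            new.append(GAP)
            p += 1
        elif new_gap and not old_gap:    # new gap in the center: pad every existing row
            for r in out:
                r.append(GAP)
            new.append(aj[q])
            q += 1
        else:                            # same center symbol (or a gap in both)
            for r, row in zip(out, msa):
                r.append(row[p])
            new.append(aj[q])
            p, q = p + 1, q + 1
    return ["".join(r) for r in out] + ["".join(new)]

def center_star(seqs: list[str]) -> tuple[int, list[str]]:
    """Returns (index of the center, aligned rows in input order)."""
    k = len(seqs)
    # Stage 1: all pairwise alignments; the center maximizes the summed score.
    pair = {(i, j): align(seqs[i], seqs[j]) for i in range(k) for j in range(k) if i != j}
    c = max(range(k), key=lambda i: sum(pair[i, j][0] for j in range(k) if j != i))
    # Stages 2 and 3: reuse the center's pairwise alignments and merge them one by one.
    order = [c] + [j for j in range(k) if j != c]
    msa = [seqs[c]]
    for j in order[1:]:
        _, ac, aj = pair[c, j]
        msa = merge(msa, ac, aj)
    rows = [""] * k
    for idx, row in zip(order, msa):
        rows[idx] = row
    return c, rows
```

For simplicity the sketch aligns every ordered pair in stage 1, although the score is symmetric and half of these alignments could be skipped. In `merge`, gaps already present in the center row are never removed, which is how the "once a gap, always a gap" principle is kept.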
Figures 3 through 6 illustrate the center star method by showing the intermediate
steps of the multiple sequence alignment in figure 2 [13].