Automatic Generation of Finite State Automata for Detecting Intrusions Using System Call Sequences - Computer Network Security

3.2 Multiple Sequence Alignment

Sequence alignment places identical symbols from different sequences at the same position wherever possible by inserting gaps between symbols. Its purpose is to measure how similar the given sequences are. It is a frequently occurring and much-studied problem in text processing and computational biology [5]; see figure 2 for an example [13]. More formally, a pair of aligned sequences is assigned a score, which is the sum of the scores of every pair of symbols that occupy the same position in the two sequences. The objective when aligning two sequences is to maximize this score, and the objective when aligning multiple sequences is to maximize the sum of the scores over all pairs of sequences.

Two symbols at the same position are usually assigned a positive score if they are the same, and a negative score if they are different or if one is a symbol and the other is a gap. For example, the score of two symbols a and b can be defined as follows:

$$s(a, b) = +1 \ \text{if } a = b, \qquad s(a, b) = -1 \ \text{if } a \neq b, \qquad s(a, \_) = -2, \qquad s(\_, a) = -2, \qquad s(\_, \_) = 0,$$

where _ represents a gap.

With the above scoring scheme, the score of the fourth and fifth strings in figure 2 is +1-1-1-1-1+1-2-2-2+0+0 = -8. The score of the multiple alignment of all five strings is the sum of the scores of all 10 pairs of strings.

Fig. 2. Multiple sequence alignment
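
As an illustration, the scoring scheme above can be sketched in a few lines of Python. The function names (symbol_score, pair_score, sum_of_pairs) and the example strings are assumptions made for this sketch and are not the strings of figure 2. The first example pair is chosen only so that its columns follow the same pattern of matches, mismatches, and gaps as the worked arithmetic above.

```python
from itertools import combinations
from typing import Sequence

GAP = "_"  # gap symbol, as in the scoring scheme above


def symbol_score(a: str, b: str) -> int:
    """Score of one aligned column: s(a, b)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def pair_score(x: str, y: str) -> int:
    """Score of two aligned sequences: sum of column scores."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    return sum(symbol_score(a, b) for a, b in zip(x, y))


def sum_of_pairs(alignment: Sequence[str]) -> int:
    """Score of a multiple alignment: sum over all pairs of rows."""
    return sum(pair_score(x, y) for x, y in combinations(alignment, 2))


# Hypothetical pair with columns: match, 4 mismatches, match,
# 3 symbol/gap columns, 2 gap/gap columns.
print(pair_score("ABCDEFGHI__", "AXXXXF_____"))  # -8

# Hypothetical three-row alignment: pair scores are 2, 1 and -1.
print(sum_of_pairs(["AB_CD", "ABECD", "A__CD"]))  # 2
```

For k aligned rows, sum_of_pairs evaluates k(k-1)/2 pairs, which gives the 10 pairs mentioned above when k = 5.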

To align a pair of sequences, there is a well-known quadratic-time dynamic programming algorithm [1]. Unfortunately, if there are k strings of length n, the natural generalization of the pairwise algorithm takes $\Theta(n^k)$ time, and furthermore the exact alignment problem has been proven to be NP-complete [5]. Hence we used an approximation algorithm called the center star method [13]. The center star method has an error ratio of 2; in other words, it produces an alignment that is guaranteed to be within a factor of two of the optimal alignment [5].

The center star method proceeds in three stages as follows:

1. For each string s, perform pairwise alignment between s and each of the other strings and sum up the scores. Take the string c that maximizes the sum as the center.

2. Perform pairwise alignment between c and each of the other strings.

3. Perform multiple alignment starting from c and adding the other strings one by one, using the results of the pairwise alignments in step 2 and keeping to the principle of “once a gap, always a gap”.

Figures 3 through 6 illustrate the center star method by showing the intermediate steps of the multiple sequence alignment in figure 2 [13].
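
The three stages can also be sketched in code. The following Python is a minimal illustration under stated assumptions, not the implementation used in this work: the names align, merge, and center_star are hypothetical, the pairwise step is a standard global-alignment dynamic program using the scoring scheme of this section, ties are broken arbitrarily, and the input strings are assumed not to contain the gap symbol "_".

```python
GAP = "_"  # assumption: "_" never occurs in the input strings


def score(a, b):
    """Column score s(a, b) from the scoring scheme in this section."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def align(x, y):
    """Quadratic-time global alignment; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # dp[i][j] is the best score for aligning x[:i] with y[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + score(x[i - 1], GAP)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + score(GAP, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                           dp[i - 1][j] + score(x[i - 1], GAP),
                           dp[i][j - 1] + score(GAP, y[j - 1]))
    # Trace one optimal path back from (n, m) to (0, 0).
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + score(x[i - 1], GAP):
            ax.append(x[i - 1])
            ay.append(GAP)
            i -= 1
        else:
            ax.append(GAP)
            ay.append(y[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def merge(msa, a_center, a_other):
    """Add a_other to msa (row 0 is the gapped center); existing gaps are never removed."""
    old, rows, new = msa[0], [[] for _ in msa], []
    p = q = 0
    while p < len(old) or q < len(a_center):
        old_gap = p < len(old) and old[p] == GAP
        new_gap = q < len(a_center) and a_center[q] == GAP
        if new_gap and not old_gap:
            # The new pairwise alignment needs a gap: open a gap column in every row.
            for r in rows:
                r.append(GAP)
            new.append(a_other[q])
            q += 1
        else:
            # Keep the existing column ("once a gap, always a gap").
            for r, src in zip(rows, msa):
                r.append(src[p])
            if old_gap and not new_gap:
                new.append(GAP)  # the new string gets a gap under an old gap column
            else:
                new.append(a_other[q])
                q += 1
            p += 1
    return ["".join(r) for r in rows] + ["".join(new)]


def center_star(strings):
    """Return (index of the center, aligned rows in input order)."""
    k = len(strings)
    # Stage 1: the center maximizes the sum of pairwise alignment scores.
    totals = [sum(align(strings[i], strings[j])[0] for j in range(k) if j != i)
              for i in range(k)]
    c = max(range(k), key=lambda i: totals[i])
    # Stages 2 and 3: align each string with the center and merge it in.
    order = [c] + [i for i in range(k) if i != c]
    msa = [strings[c]]
    for i in order[1:]:
        _, a_center, a_other = align(strings[c], strings[i])
        msa = merge(msa, a_center, a_other)
    result = [None] * k
    for row, i in zip(msa, order):
        result[i] = row
    return c, result


# Hypothetical input strings, chosen only for illustration.
center, rows = center_star(["ABCD", "ABD", "ACD", "ABCE"])
for row in rows:
    print(row)
```

Every returned row has the same length, and removing the gaps from a row gives back the corresponding input string. Stage 1 runs a pairwise alignment for every ordered pair of strings, so for k strings of length n this sketch performs on the order of k^2 quadratic-time alignments.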
