call sequences generated by the sendmail program and lpr program [18]. We also
implemented Forrest's N-gram method with a short-sequence length of 10 and a
threshold value of 0.2, and compared its performance with that of our finite
automaton method.
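For orientation, the N-gram baseline can be sketched roughly as follows. This is
a minimal illustration, not the implementation used in the experiments: the text
gives only the window length (10) and the threshold (0.2), so the decision rule
below (flag a trace when the fraction of windows unseen in normal data exceeds
the threshold) and all function names are assumptions.

```python
def build_ngram_db(normal_sequences, n=10):
    """Collect every length-n window that occurs in the normal traces."""
    db = set()
    for seq in normal_sequences:
        for i in range(len(seq) - n + 1):
            db.add(tuple(seq[i:i + n]))
    return db


def mismatch_rate(seq, db, n=10):
    """Fraction of length-n windows of seq that are absent from db."""
    windows = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    if not windows:
        # Trace shorter than the window: nothing to compare.
        return 0.0
    misses = sum(1 for w in windows if w not in db)
    return misses / len(windows)


def is_anomalous(seq, db, n=10, threshold=0.2):
    """Assumed decision rule: anomalous if the mismatch rate exceeds threshold."""
    return mismatch_rate(seq, db, n) > threshold
```

A trace made up entirely of windows seen during training has a mismatch rate of
0 and is accepted; other threshold semantics (for example, counting mismatches
within a local region) are equally plausible readings of the text.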
4.1 Sendmail Program
The sendmail data set provides 147 sequences of normal behavior and 34 sequences of
abnormal behavior. We used 100 of the 147 normal sequences to model the normal
behavior of the program, the remaining 47 normal sequences to test the false
positive rate, and the 34 abnormal sequences to test the detection rate.
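The evaluation protocol above can be sketched as follows. The text does not say
how the 100 training sequences were chosen, so taking the first 100 is an
assumption, and `detector` is a hypothetical callable that returns True when it
flags a sequence as abnormal.

```python
def split_normal(normal_sequences, n_train=100):
    """Split normal traces into training and test parts (assumed: first n_train)."""
    return normal_sequences[:n_train], normal_sequences[n_train:]


def evaluate(detector, normal_test, abnormal_test):
    """Return (false positive rate, detection rate) for a detector callable."""
    # A false positive is a normal sequence that the detector flags.
    false_positives = sum(1 for seq in normal_test if detector(seq))
    # A detection is an abnormal sequence that the detector flags.
    detections = sum(1 for seq in abnormal_test if detector(seq))
    return false_positives / len(normal_test), detections / len(abnormal_test)
```

With the counts given here, `normal_test` would hold 47 sequences and
`abnormal_test` 34.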
The data consist of pairs of a process ID and the number of the system call that the
process made. We first separate these pairs by process and collect the system calls
of each process into a sequence, then append 0 to each sequence to mark its end.
Figure 8 shows a sample of system call sequences. In many cases, different processes
have identical system call sequences; such duplicates are eliminated, leaving only
one copy of each sequence. The remaining sequences can be divided into three groups
by their prefixes, as Figure 9 shows.
105 104 104 106 105 104 104 106 … 0
1 5 5 5 5 5 5 0
105 104 104 106 105 104 104 106 1 … 0
4 2 66 66 4 138 66 5 23 45 4 27 66 … 0
Fig. 8. System call sequences of processes
G1 : 105 104 104 106 105 104 104 …
G2 : 1 5 5 5 5 5 …
G3 : 4 2 66 66 4 138 66 …
Fig. 9. Three groups
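The preprocessing steps described above can be sketched as follows. This is an
illustration under assumptions: the input is taken to be an iterable of
(process id, system call number) pairs in trace order, and because the text does
not say how long a prefix is used for grouping, `prefix_len` is a hypothetical
parameter that defaults to the first system call.

```python
from collections import defaultdict


def build_sequences(pairs):
    """Turn (pid, syscall) pairs into unique per-process sequences ending in 0."""
    per_process = defaultdict(list)
    for pid, syscall in pairs:
        per_process[pid].append(syscall)

    unique, seen = [], set()
    for calls in per_process.values():
        seq = tuple(calls) + (0,)  # 0 marks the end of a sequence
        if seq not in seen:        # keep one copy of identical sequences
            seen.add(seq)
            unique.append(seq)
    return unique


def group_by_prefix(sequences, prefix_len=1):
    """Group sequences that share the same first prefix_len system calls."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[seq[:prefix_len]].append(seq)
    return dict(groups)
```

Applied to the sample in Figure 8, grouping on the first system call would
separate the sequences starting with 105, 1, and 4, which matches the three
groups of Figure 9.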
Then the macro selection algorithm described in Section 3.1 is applied. Substrings
are selected as macros according to the amount of reduction they bring about when
each occurrence is replaced by a single macro symbol. In this experiment we also
imposed the restriction that a substring must occur at least as many times as the
number of processes created by the sendmail program in order to be selected as a
macro. Figure 10 shows the result of the macro selection on group G1. Letters
represent macros and numbers represent system calls. Figure 8 shows the system call
sequences after the macros have been applied. If any symbol or number repeats four
or more times in consecutive positions, the repetition is replaced by the symbol
followed by a + sign, as is common in extended regular expressions.
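The two ideas in this paragraph can be sketched as follows. The algorithm of
Section 3.1 is not reproduced here, so the scoring rule is an assumption: a
substring of length k with f non-overlapping occurrences is scored f * (k - 1),
the number of symbols saved by replacing each occurrence with one macro symbol.
The function names and the `min_len`, `max_len`, and `min_freq` parameters are
hypothetical.

```python
from itertools import groupby


def count_nonoverlapping(seq, sub):
    """Count non-overlapping occurrences of sub in seq, scanning left to right."""
    count, i, k = 0, 0, len(sub)
    while i + k <= len(seq):
        if tuple(seq[i:i + k]) == tuple(sub):
            count += 1
            i += k
        else:
            i += 1
    return count


def macro_candidates(sequences, min_len, max_len, min_freq):
    """Rank substrings by assumed reduction, keeping those seen >= min_freq times."""
    candidates = set()
    for seq in sequences:
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                candidates.add(tuple(seq[i:i + k]))
    scores = {}
    for sub in candidates:
        freq = sum(count_nonoverlapping(seq, sub) for seq in sequences)
        if freq >= min_freq:
            scores[sub] = freq * (len(sub) - 1)  # symbols saved by the macro
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def collapse_repeats(symbols, min_run=4):
    """Replace runs of min_run or more identical symbols by 'symbol+'."""
    out = []
    for symbol, run in groupby(symbols):
        count = len(list(run))
        if count >= min_run:
            out.append(f"{symbol}+")
        else:
            out.extend([str(symbol)] * count)
    return out
```

For example, the second sequence of Figure 8, `1 5 5 5 5 5 5 0`, contains six
consecutive 5s and would collapse to `1 5+ 0`, while a run of only two 104s is
left unchanged.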
The sequences then go through the multiple sequence alignment stage described in
Section 3.2. Figure 8b shows the result of multiple alignment of the sequences in Figure