be changed to a different value, while the length of the training window equals 4
times the length of the testing window. Fig. 2 shows an example for a normal user.
The Hellinger distance plot shows the distance between the training and testing sets,
plotted over the entire email history of the user. For example, if a user has 2500
outbound emails, the plot starts at the 500th record and measures the distance between
the recipient frequencies in the first 400 records and those in the next 100 records.
These two windows, of 400 and 100 records respectively, are then rolled forward over
the entire email history of the user in steps of one record. At each step, a Hellinger
distance is calculated between the current training window of 400 records and the
corresponding testing window of 100 records.
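The rolling-window procedure described above can be sketched as follows. This is only an illustrative sketch, not EMT's implementation: it assumes each record carries a single recipient, and it uses the standard normalized Hellinger distance (values in [0, 1]), which may differ by a constant factor from the definition used elsewhere in this text. The function names and the `ratio` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def recipient_frequencies(recipients):
    """Relative frequency of each recipient within one window."""
    counts = Counter(recipients)
    total = len(recipients)
    return {r: c / total for r, c in counts.items()}


def hellinger(p, q):
    """Standard Hellinger distance between two discrete distributions.

    Assumption: normalized form, so identical distributions give 0.0
    and distributions with no recipient in common give 1.0.
    """
    keys = set(p) | set(q)
    s = sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return sqrt(s / 2.0)


def rolling_hellinger(recipients, test_len=100, ratio=4):
    """Distance between training and testing windows, rolled by one record.

    The training window is `ratio` times as long as the testing window
    (400 and 100 records in the example from the text).
    """
    train_len = ratio * test_len
    window = train_len + test_len
    distances = []
    # The first point corresponds to record number `window` (the 500th
    # record in the example); each step moves both windows by one record.
    for end in range(window, len(recipients) + 1):
        train = recipients[end - window:end - test_len]
        test = recipients[end - test_len:end]
        distances.append(
            hellinger(recipient_frequencies(train), recipient_frequencies(test))
        )
    return distances
```

With a history of 2500 records and the default window lengths, this sketch yields one distance per record from the 500th onward, that is, 2001 values.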
Fig. 2. Normal User
What this plot tells us is that when a burst occurs, the recipient frequencies have
changed significantly. This can be either a normal event, as we know from the
previous section, or an intrusion. Thus the plot can be included in our list of detection
tools. A formal evaluation of its efficiency, as well as of some of the metrics presented
in section 3.4, is described in the next section.
2.4.4 Tests of Simulated Virii
As data with real intrusions are very difficult to obtain [19], EMT's menu includes the
creation of simulated email records. Using a database of archived emails, EMT can
generate the arrival of a “dummy” virus, and insert corrupted records into a real email
log file. A set of parameters introduces randomness in the process, in order to mimic
real conditions: the time at which the virus starts, the number of corrupted emails sent
by the virus and its propagation rate. Each recipient of such a “dummy” corrupted
email is picked randomly from the address list of the selected user. These recipients
can be set to be all distinct, since most virii, though not all, send only one email to
each target recipient account.
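The generation of simulated records described above might look like the following sketch. It is not EMT's actual code: the `EmailRecord` type, the `simulate_virus` function, and all parameter names are hypothetical, and the propagation rate is interpreted here, as an assumption, as the average number of emails per unit of time with exponentially distributed gaps between them.

```python
import random
from dataclasses import dataclass


@dataclass
class EmailRecord:
    """Hypothetical minimal email log record."""
    sender: str
    recipient: str
    timestamp: float          # time units since an arbitrary origin
    simulated: bool = False   # True for records injected by the dummy virus


def simulate_virus(log, user, address_list, start_time, num_emails,
                   rate, distinct=True, rng=None):
    """Insert `num_emails` dummy virus records into a copy of a real log.

    Assumption: `rate` is the mean number of emails per time unit, and
    gaps between consecutive virus emails are drawn from an exponential
    distribution to add randomness.
    """
    if rng is None:
        rng = random.Random()
    if distinct:
        # Each target receives at most one email from the virus.
        if num_emails > len(address_list):
            raise ValueError("not enough distinct recipients")
        targets = rng.sample(address_list, num_emails)
    else:
        targets = [rng.choice(address_list) for _ in range(num_emails)]

    t = start_time
    injected = []
    for recipient in targets:
        t += rng.expovariate(rate)
        injected.append(EmailRecord(user, recipient, t, simulated=True))

    # Merge the injected records with the real ones in time order.
    return sorted(log + injected, key=lambda rec: rec.timestamp)
```

Passing a seeded `random.Random` instance as `rng` makes a simulated run reproducible, which is convenient when comparing detection methods on the same injected records.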
By design, each email generated by a simulated virus contains one attachment, but
no information about the attachment is provided or used, other than the fact that there
is one attachment. Our assumption is that virii propagate themselves by emitting
emails with one attachment, which may or may not be a copy of themselves, so that
the tests encompass polymorphic virii as well. Virii can also propagate through