A Behavior-Based Approach to Securing Email Systems - Computer Network Security - page 68


HTML content, and testing such a scenario would follow the same logic as this test. The results are expected to be very similar, as most calculations would be identical: for the most part, the new test would consist of replacing one Boolean random variable (“attachment”) with another (“html”).

Our experiment used a combination of three plots, the “Hellinger distance” plot, the “# distinct recipients” plot, and the “# attachment” plot, as detailed in the above sections. Our intuition is that when a virus infiltrates a system, it causes each plot to burst. We tested three types of thresholds to determine when a burst occurs: a simple moving average (MA), a threshold proportional to the standard deviation of the plots (TH), and a heuristic formula evaluating when a change of trend occurs (HE).
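The two simpler thresholds can be sketched in Python as below, together with a standard Hellinger distance between two discrete distributions (the kind of quantity tracked by the “Hellinger distance” plot). This is a minimal sketch under stated assumptions, not the authors' implementation: the 1/√2 normalization, the function names, the window size, and the `factor` and `k` multipliers are hypothetical choices, and the HE heuristic is omitted because its formula is not given here.

```python
import math
from statistics import mean, pstdev


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q map outcomes (e.g. recipient addresses) to probabilities that
    each sum to 1. The 1/sqrt(2) factor (an assumed convention) keeps the
    result in [0, 1].
    """
    keys = set(p) | set(q)
    total = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
                for k in keys)
    return math.sqrt(total / 2.0)


def ma_bursts(series, window, factor=1.5):
    """MA-style rule (hypothetical parameters): flag index i when the value
    exceeds `factor` times the mean of the `window` preceding values."""
    flags = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if series[i] > factor * baseline:
            flags.append(i)
    return flags


def th_bursts(series, k=2.0):
    """TH-style rule (hypothetical k): flag values lying more than k
    population standard deviations above the mean of the whole series."""
    mu, sigma = mean(series), pstdev(series)
    return [i for i, x in enumerate(series) if x > mu + k * sigma]
```

For example, on a daily “# distinct recipients” series such as `[4, 4, 4, 4, 4, 4, 4, 20, 4, 4]`, both `ma_bursts(series, window=3)` and `th_bursts(series)` flag only index 7, the day of the jump.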

The dataset for the test was an archive of 16 users, totaling 20,301 emails. The parameters that were randomly generated at each simulation were the time of the intrusion and the list of recipients (taken from the address list of each selected user). The parameters that were controlled were the propagation rate (ranging from 0.5 to 24 messages per day) and the number of corrupted emails sent (ranging from 20 to 100). In total, about 500,000 simulations were run, and for each threshold type we determined the false alarm rate and the missing alarm rate. The results are summarized in Table 1.
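The simulation protocol can be sketched as follows. This is a hedged illustration, not the authors' simulator: the function names are hypothetical, corrupted emails are assumed to leave at a constant rate starting at a uniformly random intrusion time, and the false alarm rate is assumed to be the share of virus-free evaluation periods in which an alarm fires (the text does not spell out these definitions).

```python
import random


def inject_virus(n_days, rate_per_day, n_corrupted, address_list, rng):
    """Return (time_in_days, recipient) pairs for one simulated outbreak.

    Assumptions (not from the source): emails leave at a constant rate, the
    intrusion time is uniform over the archive, and each recipient is drawn
    uniformly from the infected user's address list.
    """
    duration = n_corrupted / rate_per_day                   # days the outbreak lasts
    start = rng.uniform(0.0, max(0.0, n_days - duration))   # random intrusion time
    return [(start + i / rate_per_day, rng.choice(address_list))
            for i in range(n_corrupted)]


def alarm_rates(outcomes):
    """Compute (missing_alarm_rate, false_alarm_rate) in percent.

    `outcomes` is a list of (virus_present, alarm_raised) booleans, one per
    evaluation period; both kinds of period must occur at least once.
    """
    infected = [alarm for virus, alarm in outcomes if virus]
    clean = [alarm for virus, alarm in outcomes if not virus]
    missing_rate = 100.0 * sum(not a for a in infected) / len(infected)
    false_rate = 100.0 * sum(clean) / len(clean)
    return missing_rate, false_rate
```

In the notation of Table 1, a return value of `(25.0, 20.0)` would be reported as 25/20. Sweeping `rate_per_day` over 24, 2, 1 and 0.5 and `n_corrupted` over 20, 50 and 100 covers the same grid of conditions as the table.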

No optimization was attempted, and we ran the experiment only once, as we did not want to appear to report the best of several runs. In summary, we achieved very reasonable results, given the lack of optimization and the simplicity of the threshold formulae. We therefore expect this method to be quite efficient once optimized and used in combination with other tools. As expected, a slower propagation rate makes detection harder, because each corrupted email then becomes less “noticeable” within the overall email flow (as the table shows, each method's results worsen as the propagation rate decreases). A smaller number of emails sent by the virus also makes it harder to detect (this can be seen by comparing the results for 20 and 50 corrupted emails; with 100 emails sent, the results degrade for heuristic reasons only).
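A back-of-the-envelope calculation illustrates the dilution argument. Assume, purely for illustration, that the infected user normally sends $n$ legitimate emails per day and the virus adds $r$ corrupted emails per day; the corrupted share of the daily flow is then

```latex
\[
  s(r) = \frac{r}{n + r}, \qquad
  n = 10:\quad s(24) = \frac{24}{34} \approx 0.71, \qquad
  s(0.5) = \frac{0.5}{10.5} \approx 0.048 .
\]
```

so at the slowest rate in Table 1 only about one email in twenty would be corrupted, against roughly seven in ten at the fastest. The value $n = 10$ is a hypothetical figure, not taken from the 16-user archive.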

Table 1. Each cell contains the missing alarm rate and the false alarm rate (for example, 27/9 means that the missing alarm rate is 27% and the false alarm rate is 9%). The results for the three methods (MA, TH, HE) are shown for various propagation rates and numbers of emails emitted by the virus.

Propagation rate (emails/day)  24                 2                  1                  0.5
# corrupted emails             20    50    100    20    50    100    20    50    100    20    50    100
MA                             41/9  27/9  39/9   56/10 51/11 59/12  60/10 54/12 61/12  63/11 53/12 55/12
TH                             49/4  41/4  60/4   69/4  67/5  73/6   73/5  69/6  73/6   76/5  65/5  67/6
HE                             25/8  21/8  48/8   48/9  54/11 63/12  59/10 63/13 67/13  69/11 73/12 67/13

2.5 Group Communication Models: Cliques

In order to study the email flows between groups of users, EMT provides a feature that computes the set of cliques in an email archive.
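As a rough illustration of what such a feature involves, the sketch below builds an undirected graph over users and enumerates its maximal cliques with the Bron-Kerbosch algorithm with pivoting. It assumes, hypothetically, that two users are linked whenever at least one email passed between them and that an email clique is a maximal clique of that graph; EMT's own definition and algorithm may differ, and the function names are illustrative.

```python
from collections import defaultdict


def build_graph(emails):
    """Build an undirected adjacency map from (sender, recipient) pairs.

    Assumption: users are linked if at least one email went between them,
    in either direction; self-addressed emails are ignored.
    """
    adj = defaultdict(set)
    for sender, recipient in emails:
        if sender != recipient:
            adj[sender].add(recipient)
            adj[recipient].add(sender)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    if not adj:
        return []
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

For the pairs alice to bob, bob to carol, carol to alice and carol to dave, it returns the two cliques {alice, bob, carol} and {carol, dave}.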
