HTML content, and testing such a scenario would follow the same logic as this test.
The results are expected to be very similar, as most calculations would be identical.
For the most part, this new test would consist of simply replacing one boolean random
variable (“attachment”) with another (“html”).
Our experiment used a combination of three plots: the “Hellinger distance” plot,
the “# distinct recipients” plot, and the “# attachment” plot, as detailed in the above
sections. Our intuition is that when a virus infiltrates a user's account, it causes each
plot to burst. We tested three types of thresholds to determine when a burst occurs:
a simple moving average (MA), a threshold proportional to the standard deviation of
the plots (TH), and a heuristic formula evaluating when a change of trend occurs
(HE).
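
The first two threshold types can be illustrated with a minimal Python sketch. The window size, the multiplying factors, the function names, and the sample series are all assumptions chosen for illustration; the exact formulae used in the experiment are not reproduced here, and the heuristic (HE) threshold is omitted because its definition is not given in this section.

```python
from statistics import mean, pstdev


def moving_average_bursts(series, window=5, factor=2.0):
    """Flag index t when series[t] exceeds `factor` times the mean of the
    previous `window` values (assumed form of an MA threshold)."""
    bursts = []
    for t in range(window, len(series)):
        baseline = mean(series[t - window:t])
        if series[t] > factor * baseline:
            bursts.append(t)
    return bursts


def stddev_bursts(series, k=2.0):
    """Flag index t when series[t] exceeds mean + k * standard deviation of
    the whole series (assumed form of a TH threshold)."""
    mu = mean(series)
    sigma = pstdev(series)
    return [t for t, x in enumerate(series) if x > mu + k * sigma]


# Hypothetical "# distinct recipients" values per day; day 7 holds a spike.
daily_recipients = [3, 4, 2, 5, 3, 4, 3, 21, 4, 3]
ma_bursts = moving_average_bursts(daily_recipients)
th_bursts = stddev_bursts(daily_recipients)
```

In this sketch, the same function would be applied to each of the three plots, and an alarm could be raised when the plots burst together.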
The dataset for the test was an archive of 16 users, totaling 20,301 emails. The
parameters that were randomly generated at each simulation were the time of the
intrusion and the list of receiving recipients (taken from the address list of each
selected user). The parameters that were controlled were the propagation rate (ranging
from 0.5 to 24 messages per day) and the number of corrupted emails sent (ranging from
20 to 100). In total, about 500,000 simulations were run, and for
each type, we determined the false alarm rate and the missing alarm rate. The results
are summarized in Table 1.
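
As a rough illustration of how the two rates could be computed from simulation outcomes, consider the sketch below. The definitions are assumptions, since the text does not spell them out: a run counts as a missed alarm when no alarm falls on an infected day, and the false alarm rate is the share of clean days on which an alarm was raised. The tuple layout and the sample runs are hypothetical.

```python
def alarm_rates(runs):
    """Return (missing_alarm_rate, false_alarm_rate) for a list of runs.

    Each run is a tuple (alarm_days, infected_days, n_days), where the first
    two items are sets of day indices. Both definitions are assumptions made
    for this sketch.
    """
    missed = 0
    false_alarms = 0
    clean_days = 0
    for alarm_days, infected_days, n_days in runs:
        if not (alarm_days & infected_days):
            missed += 1                      # intrusion never triggered an alarm
        false_alarms += len(alarm_days - infected_days)
        clean_days += n_days - len(infected_days)
    return missed / len(runs), false_alarms / clean_days


# Four hypothetical 30-day simulation runs.
runs = [
    ({7, 8}, {7, 8, 9}, 30),     # detected, no false alarm
    ({3}, {12, 13}, 30),         # missed, one false alarm
    ({5, 20}, {20, 21}, 30),     # detected, one false alarm
    (set(), {10}, 30),           # missed, no alarm at all
]
missing_rate, false_rate = alarm_rates(runs)
```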
No optimization was attempted, and we ran the experiment only once, as we did
not want to appear to use the best out of several runs. In summary, we achieved very
reasonable results, given the lack of optimization, and the simplicity of the threshold
formulae. Thus, this method can be expected to be quite efficient once optimized and
used in combination with other tools. As expected, a slower propagation rate makes
detection harder, as in such a case, each corrupted email becomes less “noticeable”
among the entire email flow (as can be seen in the table, each method gets worse
results as the propagation rate decreases). A smaller number of emails sent by the
virus also makes it harder to detect (this can be seen by comparing the
results between 20 and 50 corrupted emails; in the case of 100 emails sent, the results
degrade for heuristic reasons only).
Table 1. Each cell contains the missing alarm rate and the false alarm rate (for example 27/9
means that the missing alarm rate is 27% and the false alarm rate is 9%). The results for the
three methods (MA, TH, HE) are shown for various propagation rates and number of emails
emitted by the virus.
Propagation rate  24                   2                    1                    0.5
(messages/day)
#corrupted emails 20     50     100    20     50     100    20     50     100    20     50     100
Method
MA                41/9   27/9   39/9   56/10  51/11  59/12  60/10  54/12  61/12  63/11  53/12  55/12
TH                49/4   41/4   60/4   69/4   67/5   73/6   73/5   69/6   73/6   76/5   65/5   67/6
HE                25/8   21/8   48/8   48/9   54/11  63/12  59/10  63/13  67/13  69/11  73/12  67/13
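
For readers who want to work with these figures, the following Python sketch transcribes the table cells and reads them as (missing alarm %, false alarm %) pairs. The variable and function names are illustrative assumptions, and the comparison helper is only one possible way to summarize the rows.

```python
# Values transcribed from Table 1: "missing alarm % / false alarm %".
RATES = [24, 2, 1, 0.5]      # propagation rates, messages per day
COUNTS = [20, 50, 100]       # number of corrupted emails sent
ROWS = {
    "MA": "41/9 27/9 39/9 56/10 51/11 59/12 60/10 54/12 61/12 63/11 53/12 55/12",
    "TH": "49/4 41/4 60/4 69/4 67/5 73/6 73/5 69/6 73/6 76/5 65/5 67/6",
    "HE": "25/8 21/8 48/8 48/9 54/11 63/12 59/10 63/13 67/13 69/11 73/12 67/13",
}


def parse_row(row):
    """Map (rate, count) -> (missing_alarm_pct, false_alarm_pct)."""
    cells = iter(tuple(int(x) for x in cell.split("/")) for cell in row.split())
    return {(rate, count): next(cells) for rate in RATES for count in COUNTS}


table = {method: parse_row(row) for method, row in ROWS.items()}


def lowest_missing(rate):
    """Method with the smallest total missing alarm rate at a given rate."""
    return min(table, key=lambda m: sum(table[m][(rate, c)][0] for c in COUNTS))
```

For example, `table["MA"][(24, 50)]` returns the pair for the MA method at 24 messages per day with 50 corrupted emails.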
2.5 Group Communication Models: Cliques
In order to study the email flows between groups of users, EMT provides a feature
that computes the set of cliques in an email archive.