Information Technology Reference
In-Depth Information
email address and their frequency of receiving messages. It is used as a reference to
the Recipient Frequency Histogram, as the email address corresponding to each bar of
the histogram can be found in the table in the same order.
“Total Recipient Address List size over time” is the chart on the lower left-hand
side of Fig. 5. The address list is the list of all recipients who received an email from
the selected user between two dates. Its size grows cumulatively as more emails are
analyzed and new distinct recipients are added. The list starts empty; each time the
selected user sends an email to a distinct email address not yet in the list, that
address is added. The list size is the number of elements in the list at a given point
in the set of bulk emails analyzed. (Note that the address book is the list of email
addresses a user maintains in their mail client program for easy reference, unlike the
address list, which is the list of all recipients a user may send to, e.g., in reply to
messages from accounts not recorded in the user's address book.)
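The growth of the address list can be illustrated with a minimal Python sketch. It is
not the toolkit's own code: the function name address_list_sizes and the case-insensitive
comparison of addresses are assumptions made for illustration.

```python
from typing import Iterable, List


def address_list_sizes(recipients_per_email: Iterable[Iterable[str]]) -> List[int]:
    """Return the address list size after each analyzed email.

    Each element of `recipients_per_email` holds the recipient addresses of
    one email sent by the selected user, in chronological order.
    """
    seen = set()   # the address list: starts empty
    sizes = []
    for recipients in recipients_per_email:
        for address in recipients:
            # Assumption: addresses are compared case-insensitively.
            seen.add(address.strip().lower())
        sizes.append(len(seen))  # list size at this point in the email set
    return sizes
```

Plotting the returned sizes against the email index gives a non-decreasing curve: flat
while the user writes to known recipients, with a step for each new distinct address.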
The plots labeled “# distinct rcpts & # attach per email blocks” appear in the chart
on the lower right-hand side of Fig. 11. The chart contains five plots that visualize the
variability of the user's emission of emails. The green plot is the number of distinct
recipients per block of 50 emails sent, calculated over a rolling window of 50 emails.
The higher its value (and thus the closer to 50), the wider the range of recipients the
selected user sends emails to over time. Conversely, a low value means that the user
predominantly sends messages to a small group of people. The moving average of this
metric (using 100 records) is plotted as a blue overlapping curve, indicating the trend.
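A minimal Python sketch of the rolling metric and its moving average follows. It assumes
one recipient per record (so that the metric saturates at the window size); the function
names rolling_distinct and moving_average are hypothetical and not taken from the program
described here.

```python
from collections import Counter, deque
from typing import List, Sequence


def rolling_distinct(recipients: Sequence[str], window: int = 50) -> List[int]:
    """Number of distinct recipients in each full rolling window."""
    counts = Counter()   # occurrences of each recipient inside the window
    recent = deque()     # the recipients currently inside the window
    out = []
    for rcpt in recipients:
        recent.append(rcpt)
        counts[rcpt] += 1
        if len(recent) > window:
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        if len(recent) == window:
            out.append(len(counts))
    return out


def moving_average(values: Sequence[float], span: int = 100) -> List[float]:
    """Trailing moving average; early points average whatever is available."""
    out = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= span:
            total -= values[i - span]
        out.append(total / min(i + 1, span))
    return out
```

With window=50 this corresponds to the green plot and with window=20 to the shorter-term
one; applying moving_average with span=100 to either series gives the trend curve.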
The plot in yellow is based on the same criterion, but with a window of 20 emails
instead of 50. The shorter window is chosen so that the metric reacts faster to anomalous
behavior, while the previous one using blocks of 50 shows the longer-term behavior.
The short-term profile can be used as the first level of alert, with the longer-term one
acting to confirm it. The 100-record average of this metric is displayed in blue as well.
Finally, the plot in red is the number of messages with attachment(s) per block of
50 emails. It reflects the proportion of emails with attachment(s) versus emails without
attachments, and any sudden spike of emails sent with attachment(s) will be visible
on the plot.
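The attachment metric can be sketched in the same way. The snippet below is illustrative
only; the name rolling_attachment_count and the boolean per-email input are assumptions.

```python
from typing import List, Sequence


def rolling_attachment_count(has_attachment: Sequence[bool], window: int = 50) -> List[int]:
    """Number of emails carrying at least one attachment in each full window."""
    out = []
    running = 0
    for i, flag in enumerate(has_attachment):
        running += int(flag)
        if i >= window:
            # Drop the email that just fell out of the window.
            running -= int(has_attachment[i - window])
        if i >= window - 1:
            out.append(running)
    return out
```

Dividing each value by the window size gives the fraction of emails with attachments in
that block; a sudden run of values near the window size corresponds to a spike.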
The profile window displays a fingerprint of the selected user's email frequency
behavior. The most common malicious intrusions can be detected very quickly by these
metrics. For instance, a Melissa-type virus would be detected because the green, yellow
and red plots would jump up to 50, 20 and 50 respectively (the red plot will jump up if
the virus sends an attachment of itself, polymorphic or not). See section 3.4.4 for more
details on virus detection simulations.
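The two-level alert idea (short window raises the alert, long window confirms it) can be
sketched as below. This is an assumption-laden illustration, not the detection logic of
the system described: the function name profile_alerts, the one-recipient-per-record
input, and the 0.9 saturation threshold are all hypothetical.

```python
from typing import Dict, Sequence


def profile_alerts(
    recipients: Sequence[str],
    has_attachment: Sequence[bool],
    long_window: int = 50,
    short_window: int = 20,
    saturation: float = 0.9,  # hypothetical threshold, not from the text
) -> Dict[str, bool]:
    """Check the most recent records against near-saturated metric values."""
    enough_history = len(recipients) >= long_window
    distinct_short = len(set(recipients[-short_window:]))
    distinct_long = len(set(recipients[-long_window:]))
    attachments_long = sum(has_attachment[-long_window:])

    # First level: the fast-reacting 20-email metric is close to 20.
    short_alert = enough_history and distinct_short >= saturation * short_window
    # Second level: the 50-email metric confirms the short-term alert.
    long_confirmed = short_alert and distinct_long >= saturation * long_window
    # Optional signal: nearly every recent email carries an attachment.
    attachment_spike = enough_history and attachments_long >= saturation * long_window
    return {
        "short_term_alert": short_alert,
        "long_term_confirmed": long_confirmed,
        "attachment_spike": attachment_spike,
    }
```

For a user who normally writes to a handful of contacts, all three flags stay false; a
burst of emails to many new addresses first raises the short-term flag and, if it
continues, the long-term one.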
Spammers too have a typical profile, easily distinguishable from a normal user's
profile. The metrics in Fig. 11 will be very high most of the time, as a SPAMbot
generates emails to a very wide range of recipients. The address list size will likely
grow much faster than a typical user's. If a spammer sends many emails with attachments,
the red metric will be close to 50 most of the time, unlike the typical case of
normal user behavior.
2.4.2 Chi Square Test of User Histograms
The purpose of the Chi Square window shown in Fig. 6 is to test the hypothesis that
the recipient frequencies are identical (for a given user) over two different time