from intuition, it is easier to detect a potential anomaly if the size of the recipient list
of the attack email is large.
Table 3. Simulation of user cliques, with 5 attack strategies.

Attack strategy                                         Detection rate
Send to all addresses, one at a time                    0 %
Send many emails, each containing 2 random addresses    13 %
Send many emails, each containing 3 random addresses    49 %
Send many emails, each containing 5 random addresses    96 %
Send 1 email, containing all addresses                  100 %
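The trend in Table 3 can be reproduced qualitatively with a small simulation. The Python sketch below assumes a simplified clique rule (an email is flagged when its recipients are not all contained in a single known clique) and uses hypothetical toy cliques; it is not EMT's implementation, and it does not reproduce the numbers in Table 3.

```python
import random
from typing import FrozenSet, List, Sequence, Set

# Hypothetical toy cliques: groups of addresses that co-occur in a user's
# past outgoing email (NOT the data behind Table 3).
CLIQUES: List[FrozenSet[str]] = [
    frozenset({"a@x.com", "b@x.com", "c@x.com"}),
    frozenset({"d@y.org", "e@y.org"}),
    frozenset({"f@z.net", "g@z.net", "a@x.com"}),
]


def violates_cliques(recipients: Set[str], cliques: Sequence[FrozenSet[str]]) -> bool:
    """Assumed rule: flag an email whose recipients do not all fall inside one clique."""
    return not any(recipients <= clique for clique in cliques)


def detection_rate(cliques: Sequence[FrozenSet[str]], per_email: int,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated attack emails, each sent to `per_email` random
    addresses from the victim's address book, that violate the clique model."""
    rng = random.Random(seed)
    address_book = sorted(set().union(*cliques))
    flagged = 0
    for _ in range(trials):
        recipients = set(rng.sample(address_book, per_email))
        flagged += violates_cliques(recipients, cliques)  # True counts as 1
    return flagged / trials


if __name__ == "__main__":
    n = len(set().union(*CLIQUES))
    for k in (1, 2, 3, 5, n):
        print(f"{k} recipient(s) per email: {detection_rate(CLIQUES, k):.0%}")
```

With these toy cliques, single-recipient emails are never flagged, while any email with more recipients than the largest clique always is, mirroring the first and last rows of the table.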
3 Supervised Machine Learning Models
In addition to the attachment and account frequency models, EMT includes an integrated supervised learning feature akin to that implemented in the MEF system previously reported in [12].
3.1 Modeling Malicious Attachments
MEF is designed to extract content features of a set of known malicious attachments,
as well as benign attachments. The features are then used to compose a set of training
data for a supervised learning program that computes a classifier. Fig. 2 displays an
attachment profile including a class label that is either “malicious” or “benign”.
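As an illustration of how such labeled profiles might become training data, here is a minimal Python sketch. The profile fields (`file_name`, `mime_type`, `size_bytes`, `content`) and the use of byte 4-grams as content features are assumptions chosen for the example, not the feature set shown in Fig. 2.

```python
from dataclasses import dataclass
from typing import Dict, Literal, Set, Tuple, Union

Label = Literal["malicious", "benign"]
FeatureValue = Union[str, int, bool]


@dataclass
class AttachmentProfile:
    """Hypothetical attachment profile: a few metadata fields plus a class label."""
    file_name: str
    mime_type: str
    size_bytes: int
    content: bytes
    label: Label


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """All distinct n-byte substrings of the attachment content (assumed feature)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def to_training_row(p: AttachmentProfile, n: int = 4) -> Tuple[Dict[str, FeatureValue], Label]:
    """Turn one labeled profile into a (features, label) pair for a supervised learner."""
    features: Dict[str, FeatureValue] = {
        "mime_type": p.mime_type,
        "size_kb": p.size_bytes // 1024,
    }
    for gram in byte_ngrams(p.content, n):
        features["ngram:" + gram.hex()] = True  # presence of this byte sequence
    return features, p.label
```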
MEF was designed as a component of MET. Each attachment flowing into an email account would first be tested by a previously learned classifier. If the likelihood of "malicious" was deemed high enough, the attachment would be labeled as such, and the rest of the MET machinery would be invoked to communicate the newly discovered malicious attachment by sending reports from MET clients to MET servers.
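The screening step described above amounts to a threshold test on the classifier's output. In this sketch, `classify`, `send_report`, the `Report` record, and the 0.9 threshold are hypothetical stand-ins for the learned classifier and the MET client-to-server reporting channel.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Report:
    """Hypothetical report a client would send to a server about a flagged attachment."""
    attachment_id: str
    p_malicious: float


def screen_attachment(
    attachment_id: str,
    features: Dict[str, object],
    classify: Callable[[Dict[str, object]], float],
    send_report: Callable[[Report], None],
    threshold: float = 0.9,
) -> bool:
    """Score an incoming attachment; if it looks malicious enough, label it and report it.

    Returns True when the attachment is labeled malicious.
    """
    p = classify(features)  # estimated likelihood of "malicious"
    if p >= threshold:
        send_report(Report(attachment_id, p))  # hand off to the reporting machinery
        return True
    return False
```

Passing the classifier and the reporting function in as callables keeps the threshold logic independent of how either is implemented.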
The core elements of MEF are also being integrated into EMT. However, here the features extracted from the training data include content-based features of email bodies (not just attachment features).
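One simple way to obtain content-based body features is a bag of words. The sketch below uses Python's standard `email` package; the tokenizer and the choice to read only `text/plain` parts that carry no filename are assumptions, not EMT's actual feature extractor.

```python
import re
from collections import Counter
from email import message_from_string
from email.message import Message

# Assumed tokenizer: lowercase alphanumeric runs (apostrophes kept).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def body_text(msg: Message) -> str:
    """Concatenate the text/plain parts of a message, skipping named attachments."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and part.get_filename() is None:
            payload = part.get_payload(decode=True)  # bytes after undoing the transfer encoding
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def body_features(raw: str) -> Counter:
    """Word counts of the message body, usable as content-based features."""
    msg = message_from_string(raw)
    return Counter(TOKEN_RE.findall(body_text(msg).lower()))
```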
The Naïve Bayes learning program is used to compute classifiers over email messages that a security analyst has labeled as interesting or malicious. The GUI allows the user to mark emails as interesting or not interesting, and then to learn a classifier that is subsequently used to label the remaining unlabeled emails in the database automatically.
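The labeling workflow can be sketched with a small Naïve Bayes classifier over categorical features, scoring each class c by log P(c) + Σ_i log P(f_i = v_i | c) with add-one smoothing. The class `CategoricalNaiveBayes`, the helper `label_remaining`, and the feature names in the usage example are hypothetical; this is a generic textbook variant, not EMT's code.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Features = Dict[str, Hashable]


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, emails: List[Features], labels: List[str]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (class, feature) -> counts of values
        self.vocab = defaultdict(set)             # feature -> values seen in training
        for feats, label in zip(emails, labels):
            for name, value in feats.items():
                self.value_counts[(label, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def log_score(self, feats: Features, label: str) -> float:
        """log P(label) + sum of log P(feature value | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in feats.items():
            counts = self.value_counts[(label, name)]
            k = len(self.vocab[name]) + 1  # +1 reserves probability mass for unseen values
            score += math.log((counts[value] + 1) / (self.class_counts[label] + k))
        return score

    def predict(self, feats: Features) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(feats, c))


def label_remaining(labeled: List[Tuple[Features, str]], unlabeled: List[Features]) -> List[str]:
    """Train on the analyst's labels, then label every remaining email."""
    model = CategoricalNaiveBayes().fit([f for f, _ in labeled], [y for _, y in labeled])
    return [model.predict(f) for f in unlabeled]
```

Working in log space avoids numeric underflow when many per-feature probabilities are multiplied together.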
A Naïve Bayes [5] classifier computes the likelihood that an email is interesting given a set of features extracted from the set of training emails that are specified by the analyst. In the current version of EMT, the set of features extracted from emails includes a set of static features such as domain name, time, sender email name, number of attachments, the MIME-type of the attachment, the likelihood the attachment is