Information Technology Reference
In-Depth Information
If its birth rate is greater than a specified threshold T AND it is sent from at
least X users.
2.2 Account Statistics and Alerts
This mechanism has been extended to provide alerts based upon deviation from other
baseline user and group models. EMT computes and displays three tables of statistical
information for any selected email account. The first is a set of stationary email account
models, i.e., statistical data represented as a histogram of the average number of
messages sent over all days of the week, divided into three periods: day, evening, and
night. EMT also gathers information on the average size of messages for these time
periods, and the average number of recipients and attachments for these periods.
These statistics can generate alerts when values are above a set threshold as specified
by the rule-based alert logic section.
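The per-period statistics and threshold alerts described above can be sketched as follows. This is a minimal illustration, not EMT's implementation: the hour boundaries for day, evening, and night, the tuple layout of a message, and the names `period_of`, `period_averages`, and `alerts` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def period_of(hour):
    """Map an hour (0-23) to a period. The boundaries are hypothetical;
    the text names the three periods but does not give their hours."""
    if 8 <= hour < 18:
        return "day"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def period_averages(messages, num_days):
    """Compute per-period statistics for one account.

    messages: iterable of (sent_time, size_bytes, n_recipients, n_attachments)
    num_days: number of days covered by the sample
    """
    totals = defaultdict(lambda: {"count": 0, "size": 0, "rcpts": 0, "atts": 0})
    for sent, size, rcpts, atts in messages:
        t = totals[period_of(sent.hour)]
        t["count"] += 1
        t["size"] += size
        t["rcpts"] += rcpts
        t["atts"] += atts

    stats = {}
    for period, t in totals.items():
        n = t["count"]
        stats[period] = {
            "msgs_per_day": n / num_days,
            "avg_size": t["size"] / n,
            "avg_recipients": t["rcpts"] / n,
            "avg_attachments": t["atts"] / n,
        }
    return stats


def alerts(stats, thresholds):
    """Return (period, statistic, value) triples whose value exceeds its threshold."""
    return [
        (period, name, value)
        for period, values in sorted(stats.items())
        for name, value in values.items()
        if name in thresholds and value > thresholds[name]
    ]
```

For example, calling `alerts(stats, {"msgs_per_day": 50})` would list every period in which the account averaged more than 50 messages per day; the threshold value itself would come from the rule-based alert logic.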
We next describe the variety of models available in EMT that may be used to generate
alerts of errant behavior.
2.3 Stationary User Profiles
Histograms are used to model the behavior of a user's email account. Histograms are
compared to find similar or abnormal behavior, both within the same account
(between a long-term profile histogram and a recent, short-term histogram) and
between different accounts.
A histogram depicts the distribution of items in a given sample. EMT employs a
histogram of 24 bins, one for each of the 24 hours in a day. (One may define a different
set of stationary periods as the detection task demands.) Email statistics are allocated
to bins according to their outbound time. The value of each bin can represent
the daily average number of emails sent out in that hour, the daily average total size of
attachments sent out in that hour, or another feature defined over an email account and
computed for some specified period of time.
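As a concrete illustration of the 24-bin profile, the sketch below computes the daily average number of emails sent in each hour. The function name `hourly_histogram` and its inputs (a list of send timestamps and the number of days observed) are assumptions for the example; other features, such as attachment size, would be accumulated in the same way.

```python
from datetime import datetime


def hourly_histogram(send_times, num_days):
    """Build a 24-bin histogram for one account.

    Bin h holds the daily average number of emails whose outbound
    time falls in hour h (0-23), over a sample spanning num_days days.
    """
    if num_days <= 0:
        raise ValueError("num_days must be positive")
    bins = [0.0] * 24
    for t in send_times:
        bins[t.hour] += 1          # allocate each email by its outbound hour
    return [count / num_days for count in bins]
```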
Two histogram comparison functions are implemented in the current version of
EMT, each providing a user-selectable distance function as described below. The first
comparison function is used to identify groups of email accounts that have similar
usage behavior. The second is used to compare an account's recent
behavior to the long-term profile of that account.
2.3.1 Histogram Distance Functions
A distance function is used to measure histogram dissimilarity. For every pair of
histograms $h_1$, $h_2$, there is a corresponding distance $D(h_1, h_2)$, called the
distance between $h_1$ and $h_2$. The distance function is non-negative, symmetric and 0 for identical
histograms. Dissimilarity is proportional to distance. We adapted some of the more
commonly known distance functions: simplified histogram intersection (L1-form),
Euclidean distance (L2-form), quadratic distance [0] and histogram Mahalanobis
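To make the listed properties concrete, here is a minimal sketch of two of the named distances, using the textbook L1 and L2 norms. The exact forms used in EMT (for example, whether the histograms are normalized first, or whether the L2-form is left squared) are not given in this section, so the definitions below are assumptions, as is the helper `deviates_from_profile` that compares a recent histogram against a long-term profile.

```python
import math


def _check(h1, h2):
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")


def l1_distance(h1, h2):
    """Sum of absolute bin differences (assumed L1-form)."""
    _check(h1, h2)
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Euclidean distance between bin vectors (assumed L2-form)."""
    _check(h1, h2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def deviates_from_profile(long_term, recent, threshold, distance=l1_distance):
    """Hypothetical check: flag the recent histogram if its distance
    from the long-term profile exceeds the given threshold."""
    return distance(long_term, recent) > threshold
```

Both functions are non-negative, symmetric, and return 0 for identical histograms, matching the properties stated above.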