distance [0]. These standard measures were modified to be more suitable for email
usage behavior analysis. For concreteness,
L1-form: $D_1(h_1, h_2) = \sum_{i=0}^{n-1} \lvert h_1[i] - h_2[i] \rvert$ (1)
L2-form: $D_2(h_1, h_2) = \sum_{i=0}^{n-1} (h_1[i] - h_2[i])^2$ (2)
Quadratic: $D_3(h_1, h_2) = (h_1 - h_2)^T A (h_1 - h_2)$ (3)
where n is the number of bins in the histogram. In the quadratic function, A is a matrix
where $a_{ij}$ denotes the similarity between bins i and j. In EMT we set
$a_{ij} = |i - j| + 1$, which assumes that the behavior in neighboring hours is more
similar. The Mahalanobis distance is a special case of the quadratic distance, where
A is given by the inverse of the covariance matrix obtained from a set of training
histograms. We will describe this in detail.
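The three measures in equations (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not EMT's implementation: the function names are hypothetical, histograms are assumed to be plain lists of equal length, and the matrix helper simply encodes the $a_{ij} = |i - j| + 1$ choice stated in the text.

```python
from typing import List, Sequence


def _check_lengths(h1: Sequence[float], h2: Sequence[float]) -> None:
    # Both histograms must have the same number of bins n.
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Equation (1): sum of absolute bin-wise differences."""
    _check_lengths(h1, h2)
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Equation (2): sum of squared bin-wise differences (no square root)."""
    _check_lengths(h1, h2)
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def emt_matrix(n: int) -> List[List[int]]:
    """Matrix A with a_ij = |i - j| + 1, as stated in the text."""
    return [[abs(i - j) + 1 for j in range(n)] for i in range(n)]


def quadratic_distance(
    h1: Sequence[float], h2: Sequence[float], A: Sequence[Sequence[float]]
) -> float:
    """Equation (3): (h1 - h2)^T A (h1 - h2)."""
    _check_lengths(h1, h2)
    d = [a - b for a, b in zip(h1, h2)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
```

Passing A as an argument keeps the quadratic form general, so the same function could be reused with a different matrix, such as an inverse covariance matrix.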
2.3.2 Abnormal User Account Behavior
The histogram distance functions are applied to one target email account. (See Fig. 4.)
A long-term profile period is first selected by an analyst as the “normal” behavior
training period. The histogram computed for this period is then compared to another
histogram computed for a more recent period of email behavior. If the histograms are
very different (i.e., they have a high distance), an alert is generated indicating possible
account misuse. We use the weighted Mahalanobis distance function for this detection
task.
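The comparison of a profile period against a recent period can be sketched as follows. This is a simplified illustration under several assumptions that are not from the text: histograms have 24 hourly bins (suggested by the reference to "neighboring hours"), they are normalized to fractions, the L1-form of equation (1) stands in for the weighted Mahalanobis distance, and the `threshold` value is a hypothetical tuning parameter chosen by the analyst.

```python
from datetime import datetime
from typing import Iterable, List, Sequence


def hourly_histogram(timestamps: Iterable[datetime]) -> List[float]:
    """Fraction of emails sent in each hour of the day (24 bins, assumed)."""
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.hour] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24
    return [c / total for c in counts]


def is_suspicious(
    profile_hist: Sequence[float],
    recent_hist: Sequence[float],
    threshold: float,
) -> bool:
    """Raise an alert when the two histograms are far apart.

    Uses the L1-form as a stand-in distance; `threshold` is hypothetical.
    """
    distance = sum(abs(p - r) for p, r in zip(profile_hist, recent_hist))
    return distance > threshold


# Usage: a profile of 9 a.m. activity versus recent 3 a.m. activity.
profile = hourly_histogram([datetime(2003, 1, 1, 9), datetime(2003, 1, 2, 9)])
recent = hourly_histogram([datetime(2003, 2, 1, 3)])
alert = is_suspicious(profile, recent, threshold=0.5)
```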
The long-term profile period, for example a single month, is used as the training
set. We assume the bins in the histogram are random variables that are statistically
independent, so the covariance matrix B is diagonal. Then we get the following formula:
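$D_4(h_1, h) = (h_1 - h)^T A (h_1 - h)$ (4)
$A = B^{-1}, \quad b_{ii} = \mathrm{Cov}(h[i], h[i]) = \mathrm{Var}(h[i]) = \sigma_i^2, \quad B = \begin{pmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{n-1}^2 \end{pmatrix}$ (5)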
Then we get:
$D_4(h_1, h) = \sum_{i=0}^{n-1} \left( (h_1[i] - h[i]) / \sigma_i \right)^2$ (6)
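As a rough sketch of equation (6), the diagonal Mahalanobis distance divides each squared bin difference by that bin's variance. The code below rests on assumptions not stated in the text: the per-bin variances are estimated as population variances over several training histograms (for instance one per day of the profile period), and a small `eps` floor, a hypothetical safeguard, avoids division by zero for bins that never vary. The function names are illustrative.

```python
from statistics import pvariance
from typing import List, Sequence


def bin_variances(training_histograms: Sequence[Sequence[float]]) -> List[float]:
    """Per-bin variance sigma_i^2 across a set of training histograms.

    Assumes the population variance; the text does not specify the estimator.
    """
    n = len(training_histograms[0])
    return [pvariance([h[i] for h in training_histograms]) for i in range(n)]


def diagonal_mahalanobis(
    h1: Sequence[float],
    h: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-9,
) -> float:
    """Equation (6): sum over bins of ((h1[i] - h[i]) / sigma_i)^2.

    `eps` is a hypothetical floor on the variance to avoid dividing by zero.
    """
    if not (len(h1) == len(h) == len(variances)):
        raise ValueError("h1, h and variances must have the same length")
    return sum((a - b) ** 2 / max(v, eps) for a, b, v in zip(h1, h, variances))
```

Because the covariance matrix is diagonal under the independence assumption, no matrix inversion is needed; each bin is rescaled on its own.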
Vector $h$ represents the histogram of the profile period (e.g., one month), while
$h_1$ represents the recent profile period (e.g., one week). $\sigma_i$ describes the dispersion of
usage behavior around the arithmetic mean. We then modify the Mahalanobis distance
function to the weighted version. First we reduce the distance function from the