Example 12.22 Contextual outlier detection when the context can be clearly identified. In customer-
relationship management, we can detect outlier customers in the context of customer
groups. Suppose AllElectronics maintains customer information on four attributes,
namely agegroup (i.e., under25, 25-45, 45-65, and over65), postalcode,
numberoftransactionsperyear, and annualtotaltransactionamount. The attributes agegroup
and postalcode serve as contextual attributes, and the attributes
numberoftransactionsperyear and annualtotaltransactionamount are behavioral attributes.
To detect contextual outliers in this setting, for a customer, c, we can first locate the
context of c using the attributes agegroup and postalcode. We can then compare c with
the other customers in the same group, and use a conventional outlier detection method,
such as some of the ones discussed earlier, to determine whether c is an outlier.
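
The two steps above (locate the context, then compare within the group) can be sketched as follows. This is a minimal illustration, not the book's method: the modified z-score based on the median absolute deviation stands in for "a conventional outlier detection method", only one behavioral attribute is scored, and the function names, the `min_group_size` guard, the threshold of 3.5, and all customer records (including the postal code `V5A`) are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import median


def contextual_outliers(customers, threshold=3.5, min_group_size=5):
    """Return ids of customers whose spending is unusual within their context.

    The context is the pair (agegroup, postalcode). Within each context the
    modified z-score, 0.6745 * |x - median| / MAD, is compared to `threshold`.
    """
    # Step 1: locate each customer's context by grouping on the contextual attributes.
    groups = defaultdict(list)
    for c in customers:
        groups[(c["agegroup"], c["postalcode"])].append(c)

    flagged = []
    for members in groups.values():
        # Assumed guard: skip contexts with too few customers to judge reliably.
        if len(members) < min_group_size:
            continue
        amounts = [m["annualtotaltransactionamount"] for m in members]
        med = median(amounts)
        mad = median([abs(a - med) for a in amounts])
        if mad == 0:
            continue  # no spread in this group, so the score is undefined
        # Step 2: compare each customer with the others in the same group.
        for m in members:
            score = 0.6745 * abs(m["annualtotaltransactionamount"] - med) / mad
            if score > threshold:
                flagged.append(m["id"])
    return flagged


def make_group(agegroup, postalcode, amounts, prefix):
    """Build hypothetical customer records for one context (illustration only)."""
    return [
        {"id": f"{prefix}{i}", "agegroup": agegroup, "postalcode": postalcode,
         "annualtotaltransactionamount": a}
        for i, a in enumerate(amounts)
    ]


# Made-up data: an annual amount of about 1000 is ordinary in the first group
# but far from typical in the second.
customers = (
    make_group("25-45", "V5A", [1000, 1100, 950, 1050, 1020, 980], "a")
    + make_group("over65", "V5A", [200, 220, 210, 190, 205, 1000], "b")
)
flagged = contextual_outliers(customers)
print(flagged)
```

With this made-up data, the same amount is treated differently depending on the group it is compared against, which is the point of conditioning on the contextual attributes.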
Contexts may be specified at different levels of granularity. Suppose AllElectronics
maintains customer information at a more detailed level for the attributes age,
postalcode, numberoftransactionsperyear, and annualtotaltransactionamount. We
can still group customers on age and postalcode, and then mine outliers in each group.
What if the number of customers falling into a group is very small or even zero? For a
customer, c, if the corresponding context contains very few or even no other customers,
the evaluation of whether c is an outlier using the exact context is unreliable or even
impossible.
To overcome this challenge, we can assume that customers of similar age who
live in the same area have similar normal behavior. This assumption
helps to generalize contexts and makes outlier detection more effective. For example,
using a set of training data, we may learn a mixture model, U, of the data on the
contextual attributes, and another mixture model, V, of the data on the behavioral attributes.
A mapping $p(V_i \mid U_j)$ is also learned to capture the probability that a data object $o$
belonging to cluster $U_j$ on the contextual attributes is generated by cluster $V_i$ on the
behavioral attributes. The outlier score can then be calculated as

$$S(o) = \sum_{U_j} p(o \in U_j) \sum_{V_i} p(o \in V_i)\, p(V_i \mid U_j). \qquad (12.20)$$

Thus, the contextual outlier problem is transformed into outlier detection using
mixture models.
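
As a worked illustration of Eq. (12.20), the sketch below evaluates the score for one object from three inputs that are assumed to be already available: the object's membership probabilities in the contextual clusters, its membership probabilities in the behavioral clusters, and the learned mapping. The function name, the list-based layout (`p_map[j][i]` holding the probability of behavioral cluster i given contextual cluster j), and all numbers are hypothetical. Fitting the mixture models and learning the mapping are not shown.

```python
def outlier_score(p_context, p_behavior, p_map):
    """Evaluate S(o) = sum_j p(o in U_j) * sum_i p(o in V_i) * p(V_i | U_j).

    p_context[j]  : probability that o belongs to contextual cluster U_j
    p_behavior[i] : probability that o belongs to behavioral cluster V_i
    p_map[j][i]   : learned probability p(V_i | U_j)
    """
    return sum(
        p_u * sum(p_v * p_map[j][i] for i, p_v in enumerate(p_behavior))
        for j, p_u in enumerate(p_context)
    )


# Hypothetical mapping: context U_0 mostly produces behavior V_0,
# and context U_1 mostly produces behavior V_1.
p_map = [
    [0.8, 0.2],  # p(V_0 | U_0), p(V_1 | U_0)
    [0.3, 0.7],  # p(V_0 | U_1), p(V_1 | U_1)
]

# Two objects that share the same context memberships but behave differently.
p_context = [0.9, 0.1]
score_expected = outlier_score(p_context, [0.95, 0.05], p_map)
score_unexpected = outlier_score(p_context, [0.05, 0.95], p_map)
print(score_expected, score_unexpected)
```

Under these assumed numbers, the object whose behavior matches what its context usually produces receives the larger value. Read this way, a smaller S(o) suggests behavior that is less expected for the object's context, although the passage itself does not state a decision threshold.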
12.7.2 Modeling Normal Behavior with Respect to Contexts
In some applications, it is inconvenient or infeasible to clearly partition the data into
contexts. For example, consider the situation where the online store of AllElectronics
records customer browsing behavior in a search log. For each customer, the data log
contains the sequence of products searched for and browsed by the customer. AllElectronics
is interested in contextual outlier behavior, such as a customer suddenly purchasing a
product that is unrelated to those she recently browsed. However, in this application,
contexts cannot be easily specified because it is unclear how many products browsed
 