Outlier Detection - Data Mining: Concepts and Techniques - page 574

Databases Reference

In-Depth Information

Example 12.22 Contextual outlier detection when the context can be clearly identified. In customer-

relationship management, we can detect outlier customers in the context of customer

groups. Suppose AllElectronics maintains customer information on four attributes,

namely agegroup (i.e., under25 , 25-45 , 45-65 , and over65 ), postalcode , numberof

transactionsperyear , and annualtotaltransactionamount . The attributes agegroup

and postalcode serve as contextual attributes, and the attributes numberof

transactionsperyear and annualtotaltransactionamount are behavioral attributes.

To detect contextual outliers in this setting, for a customer, c , we can first locate the

context of c using the attributes agegroup and postalcode . We can then compare c with

the other customers in the same group, and use a conventional outlier detection method,

such as some of the ones discussed earlier, to determine whether c is an outlier.

Contexts may be specified at different levels of granularity. Suppose AllElectronics

maintains customer information at a more detailed level for the attributes age ,

postalcode , numberoftransactionsperyear , and annualtotaltransactionamount . We

can still group customers on age and postalcode , and then mine outliers in each group.

What if the number of customers falling into a group is very small or even zero? For a

customer, c , if the corresponding context contains very few or even no other customers,

the evaluation of whether c is an outlier using the exact context is unreliable or even

impossible.

To overcome this challenge, we can assume that customers of similar age and who

live within the same area should have similar normal behavior. This assumption can

help to generalize contexts and makes for more effective outlier detection. For example,

using a set of training data, we may learn a mixture model, U , of the data on the con-

textual attributes, and another mixture model, V , of the data on the behavior attributes.

A mapping p

is also learned to capture the probability that a data object o belong-

ing to cluster U j on the contextual attributes is generated by cluster V i on the behavior

attributes. The outlier score can then be calculated as

.

V i j U j /

X

X

S

.

o

/D

p

.

o 2 U j /

p

.

o 2 V i /

p

.

V i j U j /

.

(12.20)

U j

V i

Thus, the contextual outlier problem is transformed into outlier detection using mix-

ture models.

12.7.2 Modeling Normal Behavior with Respect to Contexts

In some applications, it is inconvenient or infeasible to clearly partition the data into

contexts. For example, consider the situation where the online store of AllElectronics

records customer browsing behavior in a search log. For each customer, the data log con-

tains the sequence of products searched for and browsed by the customer. AllElectronics

is interested in contextual outlier behavior, such as if a customer suddenly purchased a

product that is unrelated to those she recently browsed. However, in this application,

contexts cannot be easily specified because it is unclear how many products browsed

Next Page

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home