Outlier Detection - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

earlier should be considered as the context, and this number will likely differ for each

product.

This second category of contextual outlier detection methods models the normal

behavior with respect to contexts. Using a training data set, such a method trains a

model that predicts the expected behavior attribute values with respect to the contextual

attribute values. To determine whether a data object is a contextual outlier, we can then

apply the model to the contextual attributes of the object. If the behavior attribute val-

ues of the object significantly deviate from the values predicted by the model, then the

object can be declared a contextual outlier.

By using a prediction model that links the contexts and behavior, these methods

avoid the explicit identification of specific contexts. A number of classification and

prediction techniques can be used to build such models such as regression, Markov

models, and finite state automaton. Interested readers are referred to Chapters 8 and

9 on classification and the bibliographic notes for further details (Section 12.11).

In summary, contextual outlier detection enhances conventional outlier detection

by considering contexts, which are important in many applications. We may be able

to detect outliers that cannot be detected otherwise. Consider a credit card user

whose income level is low but whose expenditure patterns are similar to those of

millionaires. This user can be detected as a contextual outlier if the income level

is used to define context. Such a user may not be detected as an outlier without

contextual information because she does share expenditure patterns with many mil-

lionaires. Considering contexts in outlier detection can also help to avoid false alarms.

Without considering the context, a millionaire's purchase transaction may be falsely

detected as an outlier if the majority of customers in the training set are not mil-

lionaires. This can be corrected by incorporating contextual information in outlier

detection.

12.7.3 Mining Collective Outliers

A group of data objects forms a collective outlier if the objects as a whole deviate sig-

nificantly from the entire data set, even though each individual object in the group may

not be an outlier (Section 12.1). To detect collective outliers, we have to examine the

structure of the data set, that is, the relationships between multiple data objects. This

makes the problem more difficult than conventional and contextual outlier detection.

“Howcanweexplorethedatasetstructure?” This typically depends on the nature

of the data. For outlier detection in temporal data (e.g., time series and sequences), we

explore the structures formed by time, which occur in segments of the time series or sub-

sequences. To detect collective outliers in spatial data, we explore local areas. Similarly,

in graph and network data, we explore subgraphs. Each of these structures is inherent to

its respective data type.

Contextual outlier detection and collective outlier detection are similar in that they

both explore structures. In contextual outlier detection, the structures are the contexts,

as specified by the contextual attributes explicitly. The critical difference in collective

outlier detection is that the structures are often not explicitly defined, and have to be

discovered as part of the outlier detection process.

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home