Databases Reference
In-Depth Information
earlier should be considered as the context, and this number will likely differ for each
product.
This second category of contextual outlier detection methods models the normal
behavior with respect to contexts. Using a training data set, such a method trains a
model that predicts the expected behavior attribute values with respect to the contextual
attribute values. To determine whether a data object is a contextual outlier, we can then
apply the model to the contextual attributes of the object. If the behavior attribute val-
ues of the object significantly deviate from the values predicted by the model, then the
object can be declared a contextual outlier.
By using a prediction model that links the contexts and behavior, these methods
avoid the explicit identification of specific contexts. A number of classification and
prediction techniques can be used to build such models such as regression, Markov
models, and finite state automaton. Interested readers are referred to Chapters 8 and
9 on classification and the bibliographic notes for further details (Section 12.11).
In summary, contextual outlier detection enhances conventional outlier detection
by considering contexts, which are important in many applications. We may be able
to detect outliers that cannot be detected otherwise. Consider a credit card user
whose income level is low but whose expenditure patterns are similar to those of
millionaires. This user can be detected as a contextual outlier if the income level
is used to define context. Such a user may not be detected as an outlier without
contextual information because she does share expenditure patterns with many mil-
lionaires. Considering contexts in outlier detection can also help to avoid false alarms.
Without considering the context, a millionaire's purchase transaction may be falsely
detected as an outlier if the majority of customers in the training set are not mil-
lionaires. This can be corrected by incorporating contextual information in outlier
detection.
12.7.3 Mining Collective Outliers
A group of data objects forms a collective outlier if the objects as a whole deviate sig-
nificantly from the entire data set, even though each individual object in the group may
not be an outlier (Section 12.1). To detect collective outliers, we have to examine the
structure of the data set, that is, the relationships between multiple data objects. This
makes the problem more difficult than conventional and contextual outlier detection.
“Howcanweexplorethedatasetstructure?” This typically depends on the nature
of the data. For outlier detection in temporal data (e.g., time series and sequences), we
explore the structures formed by time, which occur in segments of the time series or sub-
sequences. To detect collective outliers in spatial data, we explore local areas. Similarly,
in graph and network data, we explore subgraphs. Each of these structures is inherent to
its respective data type.
Contextual outlier detection and collective outlier detection are similar in that they
both explore structures. In contextual outlier detection, the structures are the contexts,
as specified by the contextual attributes explicitly. The critical difference in collective
outlier detection is that the structures are often not explicitly defined, and have to be
discovered as part of the outlier detection process.
 
Search WWH ::




Custom Search