Databases Reference
In-Depth Information
justify why the outliers detected are generated by some other mechanisms. This is often
achieved by making various assumptions on the rest of the data and showing that the
outliers detected violate those assumptions significantly.
Outlier detection is also related to noveltydetection in evolving data sets. For example,
by monitoring a social media web site where new content is incoming, novelty detection
may identify new topics and trends in a timely manner. Novel topics may initially appear
as outliers. To this extent, outlier detection and novelty detection share some similarity
in modeling and detection methods. However, a critical difference between the two is
that in novelty detection, once new topics are confirmed, they are usually incorporated
into the model of normal behavior so that follow-up instances are not treated as outliers
anymore.
12.1.2 Types of Outliers
In general, outliers can be classified into three categories, namely global outliers, con-
textual (or conditional) outliers, and collective outliers. Let's examine each of these
categories.
Global Outliers
In a given data set, a data object is a global outlier if it deviates significantly from the rest
of the data set. Global outliers are sometimes called pointanomalies , and are the simplest
type of outliers. Most outlier detection methods are aimed at finding global outliers.
Example 12.2 Global outliers. Consider the points in Figure 12.1 again. The points in region R signifi-
cantly deviate from the rest of the data set, and hence are examples of global outliers.
To detect global outliers, a critical issue is to find an appropriate measurement of
deviation with respect to the application in question. Various measurements are pro-
posed, and, based on these, outlier detection methods are partitioned into different
categories. We will come to this issue in detail later.
Global outlier detection is important in many applications. Consider intrusion detec-
tion in computer networks, for example. If the communication behavior of a computer
is very different from the normal patterns (e.g., a large number of packages is broad-
cast in a short time), this behavior may be considered as a global outlier and the
corresponding computer is a suspected victim of hacking. As another example, in trad-
ing transaction auditing systems, transactions that do not follow the regulations are
considered as global outliers and should be held for further examination.
Contextual Outliers
“Thetemperaturetodayis 28 C.Isitexceptional(i.e.,anoutlier)?” It depends, for exam-
ple, on the time and location! If it is in winter in Toronto, yes, it is an outlier. If it is a
summer day in Toronto, then it is normal. Unlike global outlier detection, in this case,
 
Search WWH ::




Custom Search