12 Outlier Detection
Imagine that you are a transaction auditor in a credit card company. To protect your customers
from credit card fraud, you pay special attention to card usages that are rather different
from typical cases. For example, if a purchase amount is much bigger than usual for
a card owner, and if the purchase occurs far from the owner's resident city, then the
purchase is suspicious. You want to detect such transactions as soon as they occur and
contact the card owner for verification. This is common practice in many credit card
companies. What data mining techniques can help detect suspicious transactions?
Most credit card transactions are normal. However, if a credit card is stolen, its
transaction pattern usually changes dramatically—the locations of purchases and the
items purchased are often very different from those of the authentic card owner and
other customers. An essential idea behind credit card fraud detection is to identify those
transactions that are very different from the norm.
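
The auditing rule described above (an unusually large amount combined with a purchase far from the owner's home city) can be sketched in a few lines of Python. This is only an illustrative sketch, not a method given in this chapter: the function names, the z-score test on past amounts, and both thresholds (3 standard deviations, 500 km) are assumptions chosen for the example.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean, stdev


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def is_suspicious(amount, location, past_amounts, home,
                  z_threshold=3.0, km_threshold=500.0):
    """Flag a purchase that is both unusually large and far from home.

    past_amounts must hold at least two earlier purchase amounts.
    Both thresholds are hypothetical values chosen for illustration.
    """
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    if sigma > 0:
        z = (amount - mu) / sigma
    else:
        # All past amounts are identical: treat any larger amount as extreme.
        z = float("inf") if amount > mu else 0.0
    far_from_home = haversine_km(*home, *location) > km_threshold
    return z > z_threshold and far_from_home


# Hypothetical card history and coordinates (Vancouver as home, Toronto as remote).
past = [20.0, 35.0, 18.5, 42.0, 27.0, 31.0]
home = (49.28, -123.12)
print(is_suspicious(950.0, (43.65, -79.38), past, home))   # True
print(is_suspicious(30.0, (49.28, -123.12), past, home))   # False
```

The sketch requires both conditions to hold, mirroring the "and" in the example above; a real system would likely weigh many more attributes, such as the items purchased.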
Outlier detection (also known as anomaly detection) is the process of finding data
objects with behaviors that are very different from expectation. Such objects are called
outliers or anomalies. Outlier detection is important in many applications in addition
to fraud detection, such as medical care, public safety and security, industry damage
detection, image processing, sensor/video network surveillance, and intrusion detection.
Outlier detection and clustering analysis are two highly related tasks. Clustering finds
the majority patterns in a data set and organizes the data accordingly, whereas outlier
detection tries to capture those exceptional cases that deviate substantially from the
majority patterns. Although closely related, the two tasks serve different purposes.
In this chapter, we study outlier detection techniques. Section 12.1 defines the different
types of outliers. Section 12.2 presents an overview of outlier detection methods. In
the rest of the chapter, you will learn about outlier detection methods in detail. These
approaches, organized here by category, are statistical (Section 12.3), proximity-based
(Section 12.4), clustering-based (Section 12.5), and classification-based (Section 12.6).
In addition, you will learn about mining contextual and collective outliers (Section 12.7)
and outlier detection in high-dimensional data (Section 12.8).
 