Outlier Detection - Data Mining: Concepts and Techniques


12 Outlier Detection

Imagine that you are a transaction auditor in a credit card company. To protect your customers from credit card fraud, you pay special attention to card usage that is rather different from typical cases. For example, if a purchase amount is much bigger than usual for a card owner, and if the purchase occurs far from the owner's city of residence, then the purchase is suspicious. You want to detect such transactions as soon as they occur and contact the card owner for verification. This is common practice in many credit card companies. What data mining techniques can help detect suspicious transactions?
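The auditor's rule of thumb can be sketched as a simple two-condition check. This is a minimal illustration, not a method from the chapter: the `Transaction` type, the `is_suspicious` function, and the thresholds (three times the owner's usual amount, more than 500 km from home) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float       # purchase amount
    distance_km: float  # distance from the card owner's city of residence


def is_suspicious(tx: Transaction, usual_amount: float,
                  amount_factor: float = 3.0,
                  max_distance_km: float = 500.0) -> bool:
    """Flag a purchase that is BOTH much larger than usual AND far from home.

    amount_factor and max_distance_km are hypothetical thresholds
    chosen only for illustration.
    """
    much_bigger = tx.amount > amount_factor * usual_amount
    far_away = tx.distance_km > max_distance_km
    return much_bigger and far_away


usual = 80.0  # hypothetical typical purchase amount for this owner
print(is_suspicious(Transaction(amount=950.0, distance_km=2300.0), usual))  # True
print(is_suspicious(Transaction(amount=950.0, distance_km=5.0), usual))     # False
print(is_suspicious(Transaction(amount=60.0, distance_km=2300.0), usual))   # False
```

A hard-coded rule like this is easy to audit but brittle; the methods surveyed in the rest of the chapter instead derive what "typical" looks like from the data.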

Most credit card transactions are normal. However, if a credit card is stolen, its transaction pattern usually changes dramatically: the locations of purchases and the items purchased are often very different from those of the authentic card owner and other customers. An essential idea behind credit card fraud detection is to identify those transactions that are very different from the norm.

Outlier detection (also known as anomaly detection) is the process of finding data objects with behaviors that are very different from expectation. Such objects are called outliers or anomalies. In addition to fraud detection, outlier detection is important in many applications, such as medical care, public safety and security, industry damage detection, image processing, sensor/video network surveillance, and intrusion detection.
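One very simple way to make "very different from expectation" concrete is to model the expectation as the mean of a numeric attribute and to measure deviation in units of standard deviation (a z-score). The sketch below assumes a single numeric attribute and an arbitrary cutoff; `zscore_outliers` and its `threshold` parameter are hypothetical names for illustration, not an API from the text.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds threshold.

    "Expectation" is assumed here to be the population mean, and the
    spread is the population standard deviation of the same data.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical: nothing deviates from the norm.
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical purchase amounts with one unusually large purchase.
amounts = [48, 52, 50, 49, 51, 50, 47, 53, 50, 400]
print(zscore_outliers(amounts, threshold=2.5))  # [400]
```

Note that the outlier itself inflates both the mean and the standard deviation (here the z-score of 400 is only about 3.0), which is one reason more robust estimates, such as the median, are often preferred in practice.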

Outlier detection and clustering analysis are two highly related tasks. Clustering finds the majority patterns in a data set and organizes the data accordingly, whereas outlier detection tries to capture those exceptional cases that deviate substantially from the majority patterns. Thus, although related, outlier detection and clustering analysis serve different purposes.

In this chapter, we study outlier detection techniques. Section 12.1 defines the different types of outliers. Section 12.2 presents an overview of outlier detection methods. In the rest of the chapter, you will learn about outlier detection methods in detail. These approaches, organized here by category, are statistical (Section 12.3), proximity-based (Section 12.4), clustering-based (Section 12.5), and classification-based (Section 12.6). In addition, you will learn about mining contextual and collective outliers (Section 12.7) and outlier detection in high-dimensional data (Section 12.8).
