Databases Reference
In-Depth Information
12.4.1 Distance-Based Outlier Detection and a Nested
Loop Method
A representative method of proximity-based outlier detection uses the concept of
distance-based outliers . For a set, D , of data objects to be analyzed, a user can spec-
ify a distance threshold, r , to define a reasonable neighborhood of an object. For each
object, o , we can examine the number of other objects in the r -neighborhood of o . If
most of the objects in D are far from o , that is, not in the r -neighborhood of o , then o
can be regarded as an outlier.
Formally, let r
.
r 0
/
be a distancethreshold and
.
0
< 1
/
be a fraction
threshold. An object, o , is a DB
.
r ,
/
-outlier if
kf o 0 j dist
o , o 0
.
/ r gk
,
(12.10)
k D k
where dist
is a distance measure.
Equivalently, we can determine whether an object, o , is a DB
.,/
-outlier by checking
the distance between o and its k -nearest neighbor, o k , where k Ddk D ke. Object o is an
outlier if dist
.
r ,
/
r , because in such a case, there are fewer than k objects except for
o that are in the r -neighborhood of o .
“HowcanwecomputeDB
.
o , o k />
-outliers?” A straightforward approach is to use nested
loops to check the r -neighborhood for every object, as shown in Figure 12.6. For any
object, o i .
.
r ,
/
, we calculate the distance between o i and the other object, and
count the number of other objects in the r -neighborhood of o i . Once we find
1 i n
/
n other
Algorithm: Distance-based outlier detection.
Input:
a set of objects D Df o 1 ,
:::
, o n g, threshold r
.
r
>
0
/
and
.
0
< 1
/
;
Output: DB
.
r ,
/
outliers in D .
Method:
for i D 1 to n do
count 0
for j D 1 to n do
if i 6D j and dist
o i , o j / r then
count count C1
if count n then
exit f o i cannot be a DB . r ,/ outlierg
endif
endif
endfor
print o i f o i is a DB . r ,/ outlier according to (Eq. 12.10)g
endfor ;
.
Figure 12.6 Nested loop algorithm for DB
.
r ,
/
-outlier detection.
 
Search WWH ::




Custom Search