Databases Reference
In-Depth Information
Now, we can define the localreachabilitydensity of an object, o , as
k N k .
o
/k
P
lrd k .
o
/D
.
(12.13)
o 0 o
reachdist k .
/
o 0 2 N k . o /
There is a critical difference between the density measure here for outlier detection
and that in density-based clustering (Section 12.5). In density-based clustering, to deter-
mine whether an object can be considered a core object in a density-based cluster, we use
two parameters: a radius parameter, r , to specify the range of the neighborhood, and the
minimum number of points in the r -neighborhood. Both parameters are global and are
applied to every object. In contrast, as motivated by the observation that relative density
is the key to finding local outliers, we use the parameter k to quantify the neighborhood
and do not need to specify the minimum number of objects in the neighborhood as a
requirement of density. We instead calculate the local reachability density for an object
and compare it with that of its neighbors to quantify the degree to which the object is
considered an outlier.
Specifically, we define the localoutlierfactor of an object o as
P o 0 2 N k . o /
o 0
lrd k .
/
X
X
o /
lrd k .
o 0
o 0 o
LOF k .
o
/D
D
lrd k .
/
reachdist k .
/
.
(12.14)
k N k .
o
/k
o 0 2 N k . o /
o 0 2 N k . o /
In other words, the local outlier factor is the average of the ratio of the local reachability
density of o and those of o 's k -nearest neighbors. The lower the local reachability density
of o (i.e., the smaller the item P
o 0 o
) and the higher the local
reachability densities of the k -nearest neighbors of o , the higher the LOF value is. This
exactly captures a local outlier of which the local density is relatively low compared to
the local densities of its k -nearest neighbors.
The local outlier factor has some nice properties. First, for an object deep within a
consistent cluster, such as the points in the center of cluster C 2 in Figure 12.8, the local
outlier factor is close to 1. This property ensures that objects inside clusters, no matter
whether the cluster is dense or sparse, will not be mislabeled as outliers.
Second, for an object o , the meaning of LOF
reachdist k .
/
o 0 2 N k . o /
.
o
/
is easy to understand. Consider the
objects in Figure 12.9, for example. For object o , let
o 0 o
/j o 0 2 N k .
direct min .
o
/D min f reachdist k .
o
/g
(12.15)
be the minimum reachability distance from o to its k -nearest neighbors. Similarly, we
can define
o 0 o
/j o 0 2 N k .
direct max .
o
/D max f reachdist k .
o
/g.
(12.16)
We also consider the neighbors of o 's k -nearest neighbors. Let
o 00 o 0
/j o 0 2 N k .
and o 00 2 N k .
o 0
indirect min .
o
/D min f reachdist k .
o
/
/g (12.17)
 
Search WWH ::




Custom Search