Databases Reference
In-Depth Information
high-probability density in the surrounding region. The probability density at a point
depends on the distances from this point to the observed objects.
Formally, let
x
1
,
,
x
n
be an independent and identically distributed sample of a
random variable
f
. The
kernel density approximation of the probability density function
is
:::
x
x
i
h
n
X
1
nh
O
f
h
.
x
/D
K
,
(10.21)
i
D1
where
K
is a kernel and
h
is the bandwidth serving as a smoothing parameter. A
ker-
nel
can be regarded as a function modeling the influence of a sample point within its
neighborhood. Technically, a kernel
K
./
./
is a non-negative real-valued integrable func-
tion that should satisfy two requirements:
R
C1
for all
values of
u
. A frequently used kernel is a standard Gaussian function with a mean of 0
and a variance of 1:
1
K
.
u
/
du
D 1 and
K
.
u
/D
K
.
u
/
x
x
i
h
e
.
x
x
i
/
2
1
p
2
K
D
2
h
2
.
(10.22)
DENCLUE uses a Gaussian kernel to estimate density based on the given set of objects
to be clustered. A point
x
is called a
density attractor
if it is a local maximum of the
estimated density function. To avoid trivial local maximum points, DENCLUE uses a
noise threshold,
, and only considers those density attractors
x
such that
O
f
x
.
/
.
These nontrivial density attractors are the centers of clusters.
Objects under analysis are assigned to clusters through density attractors using a step-
wise hill-climbing procedure. For an object,
x
, the hill-climbing procedure starts from
x
and is guided by the gradient of the estimated density function. That is, the density
attractor for
x
is computed as
x
0
D
x
r
O
f
x
j
.
/
x
j
C
1
D
x
j
C
,
(10.23)
jr
O
f
x
j
.
/j
where
is a parameter to control the speed of convergence, and
1
h
d
C2
n
P
i
D1
K
x
x
i
r
O
f
.
x
/D
.
(10.24)
.
x
i
x
/
h
0 if
O
f
O
f
x
k
+
1
x
k
The hill-climbing procedure stops at step
k
, and assigns
x
to the
density attractor
x
D
x
k
. An object
x
is an outlier or noise if it converges in the hill-
climbing procedure to a local maximum
x
with
O
f
>
.
/<
.
/
x
.
A cluster in DENCLUE is a set of density attractors
X
and a set of input objects
C
such that each object in
C
is assigned to a density attractor in
X
, and there exists a path
between every pair of density attractors where the density is above
.
/<
. By using multiple
density attractors connected by paths, DENCLUE can find clusters of arbitrary shape.