Databases Reference
In-Depth Information
high-probability density in the surrounding region. The probability density at a point
depends on the distances from this point to the observed objects.
Formally, let x 1 ,
, x n be an independent and identically distributed sample of a
random variable f . The kernel density approximation of the probability density function is
:::
x x i
h
n X
1
nh
O f h .
x
/D
K
,
(10.21)
i D1
where K
is a kernel and h is the bandwidth serving as a smoothing parameter. A ker-
nel can be regarded as a function modeling the influence of a sample point within its
neighborhood. Technically, a kernel K
./
./
is a non-negative real-valued integrable func-
tion that should satisfy two requirements: R C1
for all
values of u . A frequently used kernel is a standard Gaussian function with a mean of 0
and a variance of 1:
1 K
.
u
/
du D 1 and K
. u
/D K
.
u
/
x x i
h
e . x x i / 2
1
p 2
K
D
2 h 2
.
(10.22)
DENCLUE uses a Gaussian kernel to estimate density based on the given set of objects
to be clustered. A point x is called a density attractor if it is a local maximum of the
estimated density function. To avoid trivial local maximum points, DENCLUE uses a
noise threshold,
, and only considers those density attractors x such that O f
x
.
/
.
These nontrivial density attractors are the centers of clusters.
Objects under analysis are assigned to clusters through density attractors using a step-
wise hill-climbing procedure. For an object, x , the hill-climbing procedure starts from
x and is guided by the gradient of the estimated density function. That is, the density
attractor for x is computed as
x 0 D x
r O f
x j
.
/
x j C 1 D x j C
,
(10.23)
jr O f
x j
.
/j
where
is a parameter to control the speed of convergence, and
1
h d C2 n P i D1 K x x i
r O f
.
x
/D
.
(10.24)
.
x i x
/
h
0 if O f
O f
x k + 1
x k
The hill-climbing procedure stops at step k
, and assigns x to the
density attractor x D x k . An object x is an outlier or noise if it converges in the hill-
climbing procedure to a local maximum x with O f
>
.
/<
.
/
x
.
A cluster in DENCLUE is a set of density attractors X and a set of input objects C
such that each object in C is assigned to a density attractor in X , and there exists a path
between every pair of density attractors where the density is above
.
/<
. By using multiple
density attractors connected by paths, DENCLUE can find clusters of arbitrary shape.
 
Search WWH ::




Custom Search