Biology Reference
In-Depth Information
In
probabilistic clustering
the cluster membership is expressed by probabil-
ities
p
k
(
x
)=Prob
C
k
.In
probabilistic d-clustering
these probabilities depend on the relevant distances.
Probabilistic d-clustering adjusted for the cluster size
q
, is called
probabilistic
{
x
∈C
k
}
, that a data point
x
belongs to the cluster
{
d,q
-clustering
.
An algorithm for probabilistic
}
-clustering, called the
PDQ Algorithm
,
is presented in Sec. 2.2. It updates the cluster centers, and the cluster sizes (if not
given) using the
extremal principle
of Sec. 2.2.3.
A byproduct of this approach is the
joint distance function
, see Sec. 2.2.2,
a function that captures the data in its low level sets, and provides a continuous
approximation of a discrete data set, using few parameters.
In Sec. 2.4 we apply the PDQ algorithm to estimation of the parameters of
Gaussian mixtures, and report the results in Sec. 2.5.
The PDQ algorithm is also effective for solving capacitated multi-facility lo-
cation problems, see Sec. 2.6.
Another application of the PDQ algorithm is for determining the “right” num-
ber of clusters, see Sec. 2.7.
For other approaches to probabilistic clustering see the surveys in H oppner et
al. [7], Tan et al. [14]. A unified optimization framework for clustering was given
by Teboulle [15].
{
d,q
}
2.2. Probabilistic
{
d,q
}
-Clustering
Let a data set
D⊂
R
n
be partitioned into
K
clusters
{C
k
:
k
=1
,
···
,K
}
,
k
=1
C
k
K
D
=
(2.3)
let
c
k
be the
center
(in some sense) of the cluster
C
k
,andlet
q
k
be the
cluster size
,
which is known in some applications, and an unknown to be estimated in others.
In what follows the cluster sizes, or their estimates, are assumed given wherever
appearing in the right hand side of a formula.
With each data point
x
∈D
,andacluster
C
k
with center
c
k
, we associate:
•
a
distance
d
k
(
x
,
c
k
), also denoted
d
k
(
x
),orjust
d
k
if
x
is understood,
•
a
probability
of membership in
C
k
, denoted
p
k
(
x
),orjust
p
k
.
) are different for different clusters. In
particular, different Mahalanobis distances
In general, the distance functions
d
k
(
·
x
−
c
k
,
Σ
−
k
(
x
−
c
k
)
1
/
2
,
d
k
(
x
)=
(2.4)
Search WWH ::
Custom Search