Biology Reference
In-Depth Information
In probabilistic clustering the cluster membership is expressed by probabil-
ities p k ( x )=Prob
C k .In
probabilistic d-clustering these probabilities depend on the relevant distances.
Probabilistic d-clustering adjusted for the cluster size q , is called probabilistic
{ x ∈C k }
, that a data point x belongs to the cluster
{
d,q
-clustering .
An algorithm for probabilistic
}
-clustering, called the PDQ Algorithm ,
is presented in Sec. 2.2. It updates the cluster centers, and the cluster sizes (if not
given) using the extremal principle of Sec. 2.2.3.
A byproduct of this approach is the joint distance function , see Sec. 2.2.2,
a function that captures the data in its low level sets, and provides a continuous
approximation of a discrete data set, using few parameters.
In Sec. 2.4 we apply the PDQ algorithm to estimation of the parameters of
Gaussian mixtures, and report the results in Sec. 2.5.
The PDQ algorithm is also effective for solving capacitated multi-facility lo-
cation problems, see Sec. 2.6.
Another application of the PDQ algorithm is for determining the “right” num-
ber of clusters, see Sec. 2.7.
For other approaches to probabilistic clustering see the surveys in H oppner et
al. [7], Tan et al. [14]. A unified optimization framework for clustering was given
by Teboulle [15].
{
d,q
}
2.2. Probabilistic
{
d,q
}
-Clustering
Let a data set
D⊂ R
n be partitioned into K clusters
{C k : k =1 ,
···
,K
}
,
k =1 C k
K
D =
(2.3)
let c k be the center (in some sense) of the cluster
C k ,andlet q k be the cluster size ,
which is known in some applications, and an unknown to be estimated in others.
In what follows the cluster sizes, or their estimates, are assumed given wherever
appearing in the right hand side of a formula.
With each data point x ∈D
,andacluster
C k with center c k , we associate:
a distance d k ( x , c k ), also denoted d k ( x ),orjust d k if x is understood,
a probability of membership in
C k , denoted p k ( x ),orjust p k .
) are different for different clusters. In
particular, different Mahalanobis distances
In general, the distance functions d k (
·
x c k , Σ k ( x c k )
1 / 2 ,
d k ( x )=
(2.4)
Search WWH ::




Custom Search