Chapter 2
Probabilistic Distance Clustering: Algorithm and Applications
C. Iyigun
RUTCOR-Rutgers Center for Operations Research, Rutgers University
640 Bartholomew Rd., Piscataway, NJ 08854-8003, USA
Email: iyigun@rutcor.rutgers.edu
A. Ben-Israel
RUTCOR-Rutgers Center for Operations Research, Rutgers University
640 Bartholomew Rd., Piscataway, NJ 08854-8003, USA
Email: adi.benisrael@gmail.com
The probabilistic distance clustering method of the authors [2, 8] assumes that
the cluster membership probabilities are given in terms of the distances of the
data points from the cluster centers and the cluster sizes. A resulting extremal
principle is then used to update the cluster centers (as convex combinations of
the data points) and the cluster sizes (if not given). Progress is monitored by
the joint distance function (JDF), a weighted harmonic mean of the above
distances, which approximates the data by capturing the data points in its lowest
contours. The method is described and applied to clustering, location problems,
and mixtures of distributions, where it is a viable alternative to the
Expectation-Maximization (EM) method. The JDF also helps to determine the
“right” number of clusters for a given data set.
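As a rough illustration of the quantities named above, the following Python sketch computes membership probabilities and a weighted harmonic mean of the distances for a single data point. It rests on assumptions that the summary does not state: the distance is Euclidean, the probability of membership in cluster k is taken proportional to q_k / d_k (cluster size over distance), and the harmonic mean uses the cluster sizes as weights. The function name `membership_and_jdf` is hypothetical, and the authors' exact normalization of the JDF may differ.

```python
import math

def membership_and_jdf(x, centers, sizes):
    """Return (probabilities, jdf) for one data point x.

    Assumptions (not taken from the text): Euclidean distance, and
    membership probabilities proportional to q_k / d_k, where d_k is
    the distance from x to center k and q_k is the size of cluster k.
    """
    dists = [math.dist(x, c) for c in centers]

    # A point lying exactly on a center is assigned to that cluster.
    for k, d in enumerate(dists):
        if d == 0.0:
            probs = [1.0 if j == k else 0.0 for j in range(len(centers))]
            return probs, 0.0

    weights = [q / d for q, d in zip(sizes, dists)]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Weighted harmonic mean of the distances, with weights q_k.
    jdf = sum(sizes) / total
    return probs, jdf

probs, jdf = membership_and_jdf((0.0, 0.0), [(1.0, 0.0), (-3.0, 0.0)], [1.0, 1.0])
print(probs, jdf)
```

Under these assumptions the nearer center receives the larger probability, and the harmonic mean is small whenever the point is close to at least one center, which is consistent with the description of data points lying in the lowest contours.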
2.1. Introduction
We take data points to be vectors $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, and a dataset
$\mathcal{D}$ consisting of $N$ data points $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$. A cluster is a set of data points
that are similar, in some sense, and clustering is a process of partitioning a data
set into disjoint clusters.
In distance clustering (or d-clustering), “similarity” is interpreted in terms
of a distance function $d(\mathbf{x}, \mathbf{y})$ in $\mathbb{R}^n$, such as
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|, \quad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \tag{2.1}$$