Probabilistic Distance Clustering: Algorithm and Applications - Clustering Challenges in Biological Network


Chapter 2

Probabilistic Distance Clustering: Algorithm and Applications

C. Iyigun

RUTCOR-Rutgers Center for Operations Research, Rutgers University

640 Bartholomew Rd., Piscataway, NJ 08854-8003, USA

Email: iyigun@rutcor.rutgers.edu

A. Ben-Israel

RUTCOR-Rutgers Center for Operations Research, Rutgers University

640 Bartholomew Rd., Piscataway, NJ 08854-8003, USA

Email: adi.benisrael@gmail.com

The probabilistic distance clustering method of the authors [2, 8] assumes that the cluster membership probabilities are given in terms of the distances of the data points from the cluster centers, and the cluster sizes. A resulting extremal principle is then used to update the cluster centers (as convex combinations of the data points) and the cluster sizes (if not given). Progress is monitored by the joint distance function (JDF), a weighted harmonic mean of the above distances, which approximates the data by capturing the data points in its lowest contours. The method is described and applied to clustering, location problems, and mixtures of distributions, where it is a viable alternative to the Expectation-Maximization (EM) method. The JDF also helps to determine the “right” number of clusters for a given data set.

2.1. Introduction

We take data points to be vectors $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$, and a data set $\mathcal{D}$ consisting of $N$ data points $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$. A cluster is a set of data points that are similar, in some sense, and clustering is a process of partitioning a data set into disjoint clusters.

In distance clustering (or d-clustering), “similarity” is interpreted in terms of a distance function $d(\mathbf{x}, \mathbf{y})$ in $\mathbb{R}^n$, such as

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|, \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \tag{2.1}$$
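As a minimal sketch of a distance function of the form (2.1), the following Python snippet computes the distance between two points of $\mathbb{R}^n$. It assumes the Euclidean norm, which is only one possible choice of norm; the function name `distance` and the list-of-floats representation of points are illustrative assumptions, not part of the original text.

```python
import math
from typing import Sequence


def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance d(x, y) = ||x - y|| between two points of R^n.

    Assumption: the norm is Euclidean. Any other norm could be
    substituted here without changing the call signature.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension n")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Two points in R^2 whose difference is the vector (3, 4).
    print(distance([0.0, 0.0], [3.0, 4.0]))
```

In a d-clustering setting, a function like this would be evaluated between each data point and each candidate cluster center; swapping in a different norm changes the shape of the clusters that count as "similar".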
