Probabilistic Distance Clustering: Algorithm and Applications - Clustering Challenges in Biological Network - page 30


where $\|\cdot\|$ is a norm. In particular, the Mahalanobis distance with the norm,

$$\|u\| = \langle u, \Sigma^{-1} u \rangle^{1/2}, \qquad (2.2)$$

where Σ is the covariance matrix of the data involved.
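As an illustration of the norm (2.2), the following minimal Python sketch evaluates it for two-dimensional data. The function name `mahalanobis_norm` and the restriction to a 2×2 covariance matrix are assumptions made here for brevity; they are not part of the text.

```python
import math


def mahalanobis_norm(u, cov):
    """Return <u, cov^{-1} u>^{1/2} for a 2-vector u and a 2x2 matrix cov.

    Hypothetical helper for illustration; cov is assumed to be a
    symmetric positive definite covariance matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # w = cov^{-1} u
    w = [inv[0][0] * u[0] + inv[0][1] * u[1],
         inv[1][0] * u[0] + inv[1][1] * u[1]]
    # Inner product <u, w>, then the square root
    return math.sqrt(u[0] * w[0] + u[1] * w[1])
```

When Σ is the identity matrix the norm reduces to the Euclidean norm, so `mahalanobis_norm((3, 4), ((1, 0), (0, 1)))` gives `5.0`.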

Example 2.1. A data set in $\mathbb{R}^2$ with $N = 1100$ data points is shown in Fig. 2.1. The data on the left was simulated from a normal distribution $N(\mu, \Sigma)$, with:

$$\mu_1 = (2, 0), \quad \Sigma_1 = \begin{pmatrix} 0.0005 & 0 \\ 0 & 0.05 \end{pmatrix}, \quad \text{(100 points)},$$

and the data on the right consist of 1000 points, simulated in a circle of diameter 1 centered at $\mu_2 = (3, 0)$ according to a radially symmetric distribution with

$$\text{Prob}\{\|x - \mu_2\| \le r\} = 2r, \quad 0 \le r \le \tfrac{1}{2}.$$

This data will serve as illustration in Examples 2.2-2.3 below.
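A data set of the kind described in Example 2.1 could be simulated roughly as follows. This is only a sketch: the function name `simulate_example_2_1`, the seed handling, and the choice of a uniform angle (which is what radial symmetry suggests) are assumptions, and the points it draws will not reproduce those of Fig. 2.1.

```python
import math
import random


def simulate_example_2_1(seed=0):
    """Draw 100 normal points (left) and 1000 disc points (right)."""
    rng = random.Random(seed)

    # Left cluster: normal with mean (2, 0) and diagonal covariance
    # diag(0.0005, 0.05), so the two coordinates are independent.
    sd_x, sd_y = math.sqrt(0.0005), math.sqrt(0.05)
    left = [(rng.gauss(2.0, sd_x), rng.gauss(0.0, sd_y)) for _ in range(100)]

    # Right cluster: Prob{||x - mu_2|| <= r} = 2r on [0, 1/2] means the
    # radius is uniform on [0, 1/2]; the angle is taken uniform on [0, 2*pi).
    right = []
    for _ in range(1000):
        r = 0.5 * rng.random()
        theta = 2.0 * math.pi * rng.random()
        right.append((3.0 + r * math.cos(theta), r * math.sin(theta)))

    return left, right
```

Because the radius, not the area, is uniform, the points on the right are denser near the center $\mu_2$ than near the boundary of the circle.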

Fig. 2.1. A data set in $\mathbb{R}^2$

In d-clustering each point is typically assigned to the cluster with the nearest center. After each assignment, the cluster centers may change, requiring a re-classification of the data points. A d-clustering algorithm will therefore iterate between updating the centers and re-assigning the points. The best known such method is the k-means clustering algorithm, see Hartigan [5].
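The alternation between assignments and center updates can be sketched in a few lines of Python. The version below follows the common batch form of k-means (often called Lloyd's iteration), which is not necessarily the variant given by Hartigan; the function name `d_clustering`, the Euclidean distance, and the stopping rule are assumptions chosen for illustration.

```python
import math


def d_clustering(points, centers, max_iter=100):
    """Alternate nearest-center assignment and center updates.

    points  : list of coordinate tuples
    centers : initial centers, one tuple per cluster (hypothetical input)
    Returns the final centers and the cluster label of each point.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are stable
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels
```

For instance, with `points = [(0, 0), (0, 1), (10, 0), (10, 1)]` and initial centers `[(1, 0), (9, 0)]`, the sketch stops after the second pass with centers `(0.0, 0.5)` and `(10.0, 0.5)`.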
