Probabilistic Distance Clustering: Algorithm and Applications - Clustering Challenges in Biological Network

(d) The computations stop (in Step 4) when the centers stop moving, at which point the cluster membership probabilities may be computed by (2.7). These probabilities are not used in the algorithm, but can be used later to determine “rigid” clusters, say by assigning each data point to the cluster with the highest probability. In our experience, these final clusters give better estimates of centers and covariance matrices.
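The rigid assignment described in (d) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the membership probabilities from (2.7) are already available as an N-by-K array, and the function name `rigid_clusters` and its argument names are hypothetical.

```python
import numpy as np

def rigid_clusters(points, probs):
    """Assign each point to its most probable cluster, then re-estimate
    the center and covariance of every cluster from its members.

    points: (N, n) array of data points (n >= 2 assumed here).
    probs:  (N, K) array of membership probabilities, assumed given.
    """
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Rigid clusters: index of the highest membership probability per point.
    labels = np.argmax(probs, axis=1)
    centers, covariances = [], []
    for k in range(probs.shape[1]):
        members = points[labels == k]
        if len(members) < 2:
            raise ValueError(f"cluster {k} has too few points for a covariance")
        centers.append(members.mean(axis=0))
        # rowvar=False: each row is an observation, each column a coordinate.
        covariances.append(np.cov(members, rowvar=False))
    return labels, np.array(centers), np.array(covariances)

# Toy data (made up for illustration): two well-separated groups in the plane.
points = [[0, 0], [0, 2], [2, 0], [10, 10], [10, 12], [12, 10]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
         [0.2, 0.8], [0.1, 0.9], [0.4, 0.6]]
labels, centers, covariances = rigid_clusters(points, probs)
```

Here the sample mean and sample covariance of each rigid cluster stand in for the "better estimates" mentioned above; other estimators could be substituted.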

Example 2.3. Figure 2.2(b) shows probability level sets for the data of Example 2.1, as determined by the principle (2.5) using the centers and covariances computed by Algorithm 1.

2.4. Estimation of Parameters of Normal Distribution

The PDQ Algorithm of Sec. 2.3 is an alternative to the well-known Expectation-Maximization (EM) method for de-mixing distributions [10]. Given observations from a density $\phi(x)$ that is itself a mixture of two densities,

$$\phi(x) = \pi\,\phi_1(x) + (1 - \pi)\,\phi_2(x), \tag{2.34}$$

it is required to estimate the weight $\pi$ and the relevant parameters of the distributions $\phi_1$ and $\phi_2$.
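To make the de-mixing setting concrete, the sketch below generates observations from a two-component mixture of the form (2.34): each draw comes from the first component with probability π and from the second otherwise. The function name `sample_mixture`, the one-dimensional normal components, and all numeric values are assumptions chosen only for illustration.

```python
import random

def sample_mixture(n_samples, pi, sample_1, sample_2, rng):
    """Draw n_samples observations from pi*phi_1 + (1 - pi)*phi_2.

    sample_1 and sample_2 are callables that draw one value from
    phi_1 and phi_2 respectively, using the supplied generator rng.
    Returns the observations and the (normally hidden) component labels.
    """
    draws, labels = [], []
    for _ in range(n_samples):
        if rng.random() < pi:          # component 1 with probability pi
            draws.append(sample_1(rng))
            labels.append(1)
        else:                          # component 2 with probability 1 - pi
            draws.append(sample_2(rng))
            labels.append(2)
    return draws, labels

rng = random.Random(0)
xs, labels = sample_mixture(
    20000, 0.3,
    lambda r: r.gauss(0.0, 1.0),   # hypothetical phi_1: N(0, 1)
    lambda r: r.gauss(5.0, 1.0),   # hypothetical phi_2: N(5, 1)
    rng,
)
```

In the estimation problem only `xs` would be observed; the labels are returned here so that an estimate of π can be checked against the true mixing proportion.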

A common situation is when the distribution $\phi$ is a mixture of normal distributions $\phi_k$, each with its mean $c_k$ and covariance $\Sigma_k$ that need to be estimated.

For the purpose of comparison with Algorithm 1, we present here the EM Method for a Gaussian mixture (2.34) of two distributions,

$$\phi_k(x) = \frac{1}{\sqrt{(2\pi)^n\,|\Sigma_k|}}\,\exp\left\{-\frac{1}{2}\,(x - c_k)^T\,\Sigma_k^{-1}\,(x - c_k)\right\}. \tag{2.35}$$

For further detail see, e.g., Hastie et al. [6].
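As a numerical companion to (2.34) and (2.35), the following sketch evaluates a normal density and a two-component mixture of such densities with NumPy. The function names `gaussian_density` and `mixture_density` are hypothetical, and the code assumes each covariance matrix is symmetric positive definite. Note that `pi` in the mixture function is the mixing weight, while `np.pi` is the constant appearing in the normalization of (2.35).

```python
import numpy as np

def gaussian_density(x, c, sigma):
    """Evaluate the normal density (2.35) at x, with mean c and covariance sigma."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = c.shape[0]
    diff = x - c
    # Quadratic form (x - c)^T sigma^{-1} (x - c), via a linear solve
    # rather than an explicit matrix inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm)

def mixture_density(x, pi, c1, sigma1, c2, sigma2):
    """Evaluate the two-component mixture (2.34) with mixing weight pi."""
    return (pi * gaussian_density(x, c1, sigma1)
            + (1.0 - pi) * gaussian_density(x, c2, sigma2))

# Illustrative values only: two planar components with identity covariance.
value = mixture_density([0.0, 0.0], 0.3,
                        [0.0, 0.0], np.eye(2),
                        [5.0, 5.0], np.eye(2))
```

For high-dimensional or nearly singular covariances, working with log-densities (for example via `np.linalg.slogdet`) would be more robust than the direct formula used here.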
