(d) The computations stop (in Step 4) when the centers stop moving, at
which point the cluster membership probabilities may be computed by
(2.7). These probabilities are not used in the algorithm, but can be used
later to determine “rigid” clusters, say by assigning each data point to the
cluster with the highest probability. In our experience, these final clusters
give better estimates of centers and covariance matrices.
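The hard assignment described in (d) can be sketched as follows. This is a minimal illustration, not part of Algorithm 1: the function name `rigid_clusters` is hypothetical, and the probability matrix `P` is assumed to have already been computed from (2.7), one row per data point and one column per cluster.

```python
import numpy as np

def rigid_clusters(X, P):
    """Assign each point to its most probable cluster, then re-estimate.

    X : (N, n) array of data points.
    P : (N, K) array of membership probabilities, assumed to come
        from (2.7); it is taken as given and not computed here.
    """
    labels = np.argmax(P, axis=1)  # "rigid" assignment: highest probability wins
    centers, covariances = [], []
    for k in range(P.shape[1]):
        members = X[labels == k]
        if len(members) < 2:
            raise ValueError(f"cluster {k} has fewer than 2 points")
        centers.append(members.mean(axis=0))
        # rowvar=False: rows are observations; bias=True divides by the
        # cluster size rather than by (size - 1).
        covariances.append(np.cov(members, rowvar=False, bias=True))
    return labels, np.array(centers), np.array(covariances)
```

Ties in `P` are broken in favor of the lower cluster index, which is simply the behavior of `np.argmax`; whether to divide by the cluster size or by one less is a choice the text does not specify.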
Example 2.3. Figure 2.2(b) shows probability level sets for the data of Exam-
ple 2.1, as determined by the principle (2.5) using the centers and covariances
computed by Algorithm 1.
2.4. Estimation of Parameters of Normal Distribution
The PDQ Algorithm of Sec. 2.3 is an alternative to the well-known Expectation-Maximization
(EM) method for de-mixing distributions [10]. Given observations
from a density $\phi(x)$ that is itself a mixture of two densities,
$$\phi(x) = \pi\,\phi_1(x) + (1 - \pi)\,\phi_2(x), \tag{2.34}$$
it is required to estimate the weight $\pi$ and the relevant parameters of the
distributions $\phi_1$ and $\phi_2$.
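To make the de-mixing problem concrete, the sketch below generates observations from a mixture of the form (2.34): each observation comes from the first density with probability equal to the weight and from the second otherwise. The function `sample_mixture` and the sampler callables are hypothetical names introduced for illustration only.

```python
import numpy as np

def sample_mixture(weight, sample_1, sample_2, size, rng):
    """Draw `size` observations from weight*phi_1 + (1 - weight)*phi_2.

    sample_1, sample_2 : callables (rng, m) -> array of m draws; these are
        hypothetical stand-ins for samplers of phi_1 and phi_2.
    Returns the observations and a boolean mask marking draws from phi_1.
    """
    from_first = rng.random(size) < weight  # True with probability `weight`
    m1 = int(from_first.sum())
    draws_1 = np.asarray(sample_1(rng, m1), dtype=float)
    draws_2 = np.asarray(sample_2(rng, size - m1), dtype=float)
    out = np.empty((size,) + draws_1.shape[1:])
    out[from_first] = draws_1
    out[~from_first] = draws_2
    return out, from_first
```

An estimation method sees only the observations; the mask is returned here so that an estimate of the weight can be checked against the true labels in experiments.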
A common situation is when the distribution $\phi$ is a mixture of normal
distributions $\phi_k$, each with its mean $c_k$ and covariance $\Sigma_k$ that need to be estimated.
For the purpose of comparison with Algorithm 1, we present here the EM
Method for a Gaussian mixture (2.34) of two distributions,
$$\phi_k(x) = \frac{1}{\sqrt{(2\pi)^n\,|\Sigma_k|}}\,\exp\left\{-\tfrac{1}{2}\,(x - c_k)^T\,\Sigma_k^{-1}\,(x - c_k)\right\}. \tag{2.35}$$
For further detail see, e.g., Hastie et al. [6].
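As a numerical companion to (2.34) and (2.35), the following sketch evaluates a Gaussian component density and the two-component mixture at a single point. It assumes the standard multivariate normal form, with the inverse covariance in the exponent; the function names are illustrative, and the mixture weight is called `weight` to keep it apart from the constant `np.pi`.

```python
import numpy as np

def gaussian_density(x, center, cov):
    """Evaluate the normal density (2.35) at one point x in R^n."""
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = x.shape[0]
    diff = x - center
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def mixture_density(x, weight, c1, cov1, c2, cov2):
    """Evaluate the two-component mixture (2.34)."""
    return (weight * gaussian_density(x, c1, cov1)
            + (1.0 - weight) * gaussian_density(x, c2, cov2))
```

For a standard normal in the plane, `gaussian_density([0, 0], [0, 0], np.eye(2))` equals 1/(2π), which gives a quick sanity check on the normalizing constant.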