Advanced Cluster Analysis - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

11.1.3 Expectation-MaximizationAlgorithm

“How can we compute fuzzy clusterings and probabilistic model-based clusterings?” In this

section, we introduce a principled approach. Let's start with a review of the k -means

clustering problem and the k -means algorithm studied in Chapter 10.

It can easily be shown that k -means clustering is a special case of fuzzy clustering

(Exercise 11.1). The k -means algorithm iterates until the clustering cannot be improved.

Each iteration consists of two steps:

The expectation step (E-step): Given the current cluster centers, each object is assigned

to the cluster with a center that is closest to the object. Here, an object is expected to

belong to the closest cluster.

The maximization step (M-step): Given the cluster assignment, for each cluster, the

algorithm adjusts the center so that the sum of the distances from the objects

assigned to this cluster and the new center is minimized. That is, the similarity of

objects assigned to a cluster is maximized.

We can generalize this two-step method to tackle fuzzy clustering and probabilistic

model-based clustering. In general, an expectation-maximization (EM) algorithm is

a framework that approaches maximum likelihood or maximum a posteriori estimates

of parameters in statistical models. In the context of fuzzy or probabilistic model-based

clustering, an EM algorithm starts with an initial set of parameters and iterates until

the clustering cannot be improved, that is, until the clustering converges or the change

is sufficiently small (less than a preset threshold). Each iteration also consists of two

steps:

The expectation step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters.

The maximization step finds the new clustering or parameters that maximize the SSE

in fuzzy clustering (Eq. 11.4) or the expected likelihood in probabilistic model-based

clustering.

Example11.7 Fuzzy clustering using the EM algorithm. Consider the six points in Figure 11.2, where

the coordinates of the points are also shown. Let's compute two fuzzy clusters using the

EM algorithm.

We randomly select two points, say c 1 D a and c 2 D b , as the initial centers of the two

clusters. The first iteration conducts the expectation step and the maximization step as

follows.

In the E-step , for each point we calculate its membership degree in each cluster. For

any point, o , we assign o to c 1 and c 2 with membership weights

1

2

dist

.

o , c 2 /

dist

.

o , c 1 /

dist

.

o , c 1 /

D

and

2 ,

1

2 C dist

2

2 C dist

dist

.

o , c 1 /

.

o , c 2 /

dist

.

o , c 1 /

.

o , c 2 /

2 C

2

dist

.

o , c 1 /

dist

.

o , c 2 /

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home