Databases Reference
In-Depth Information
11.1.3 Expectation-MaximizationAlgorithm
“How can we compute fuzzy clusterings and probabilistic model-based clusterings?” In this
section, we introduce a principled approach. Let's start with a review of the k -means
clustering problem and the k -means algorithm studied in Chapter 10.
It can easily be shown that k -means clustering is a special case of fuzzy clustering
(Exercise 11.1). The k -means algorithm iterates until the clustering cannot be improved.
Each iteration consists of two steps:
The expectation step (E-step): Given the current cluster centers, each object is assigned
to the cluster with a center that is closest to the object. Here, an object is expected to
belong to the closest cluster.
The maximization step (M-step): Given the cluster assignment, for each cluster, the
algorithm adjusts the center so that the sum of the distances from the objects
assigned to this cluster and the new center is minimized. That is, the similarity of
objects assigned to a cluster is maximized.
We can generalize this two-step method to tackle fuzzy clustering and probabilistic
model-based clustering. In general, an expectation-maximization (EM) algorithm is
a framework that approaches maximum likelihood or maximum a posteriori estimates
of parameters in statistical models. In the context of fuzzy or probabilistic model-based
clustering, an EM algorithm starts with an initial set of parameters and iterates until
the clustering cannot be improved, that is, until the clustering converges or the change
is sufficiently small (less than a preset threshold). Each iteration also consists of two
steps:
The expectation step assigns objects to clusters according to the current fuzzy
clustering or parameters of probabilistic clusters.
The maximization step finds the new clustering or parameters that maximize the SSE
in fuzzy clustering (Eq. 11.4) or the expected likelihood in probabilistic model-based
clustering.
Example11.7 Fuzzy clustering using the EM algorithm. Consider the six points in Figure 11.2, where
the coordinates of the points are also shown. Let's compute two fuzzy clusters using the
EM algorithm.
We randomly select two points, say c 1 D a and c 2 D b , as the initial centers of the two
clusters. The first iteration conducts the expectation step and the maximization step as
follows.
In the E-step , for each point we calculate its membership degree in each cluster. For
any point, o , we assign o to c 1 and c 2 with membership weights
1
2
2
2
dist
.
o , c 2 /
dist
.
o , c 1 /
dist
.
o , c 1 /
D
and
2 ,
1
1
2 C dist
2
2 C dist
dist
.
o , c 1 /
.
o , c 2 /
dist
.
o , c 1 /
.
o , c 2 /
2 C
2
dist
.
o , c 1 /
dist
.
o , c 2 /
 
Search WWH ::




Custom Search