hard-movMF, even though the final probability values obtained by soft-movMF
are actually very close to 0 and 1; and (ii) why is soft-movMF, which needs
to estimate more parameters, doing better even when there is an insufficient
number of points relative to the dimensionality of the space.
It turns out that both these issues can be understood by taking a closer look
at how soft-movMF converges. In all our experiments, we initialized κ to 10,
and the initial centroids to small random perturbations of the global centroid.
Hence, for soft-movMF, the initial posterior membership distributions of the
data points are almost uniform and the Shannon entropy of the hidden random
variables is very high. The change of this entropy over iterations for the
News20 subsets is presented in Figure 6.8. The behavior is similar for all the
other datasets that we studied. Unlike kmeans-based algorithms where most
of the relocation happens in the first two or three iterations with only minor
adjustments later on, in soft-movMF the data points are non-committal in
the first few iterations, and the entropy remains very high (the maximum
possible entropy for 3 clusters is log₂ 3 ≈ 1.585). The cluster patterns
are discovered only after several iterations, and the entropy drops drastically
within a small number of iterations after that. When the algorithm converges,
the entropy is practically zero and all points are effectively hard-assigned
to their respective clusters. Note that this behavior is strikingly similar to
(locally adaptive) annealing approaches where κ can be considered as the
inverse of the temperature parameter. The drastic drop in entropy after a few
iterations is the typical critical temperature behavior observed in annealing.
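To make the annealing analogy concrete, the following minimal sketch (in Python; the function names soft_posteriors and mean_entropy_bits are our own, and it assumes equal mixing priors and a single shared κ so that the vMF normalizing constants cancel in the E-step) shows that the posterior memberships are then just a softmax over κ times the cosine similarities, with κ playing the role of an inverse temperature:

```python
import numpy as np

def soft_posteriors(X, mus, kappa):
    """E-step memberships under equal priors and one shared kappa:
    a softmax over kappa * cosine similarity, so kappa acts as an
    inverse temperature. X: (n, d) unit-norm points; mus: (k, d)
    unit-norm mean directions."""
    logits = kappa * (X @ mus.T)                  # (n, k)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def mean_entropy_bits(p):
    """Mean Shannon entropy (in bits) of the membership rows."""
    q = np.clip(p, 1e-12, 1.0)                    # guard against log(0)
    return float(np.mean(-np.sum(q * np.log2(q), axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm data points
mus = rng.normal(size=(3, 50))
mus /= np.linalg.norm(mus, axis=1, keepdims=True) # 3 mean directions

for kappa in (1.0, 10.0, 100.0, 1000.0):          # temperature 1/kappa falls
    print(kappa, mean_entropy_bits(soft_posteriors(X, mus, kappa)))
```

At small κ the memberships are nearly uniform (mean entropy close to log₂ 3 ≈ 1.585), and at large κ they are effectively hard (entropy near zero), mirroring the entropy drop described above.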
As text data has only non-negative feature values, all the data points lie in
the first orthant of the d -dimensional hypersphere and, hence, are naturally
very concentrated. Thus, the final κ values on convergence are very high,
reflecting the concentration in the data, and implying a low final tempera-
ture from the annealing perspective. Initializing κ to a low value, or
equivalently a high temperature, is therefore a good idea: as soft-movMF
runs, the κ values keep increasing over successive iterations toward their
final large values, giving the effect of a decreasing temperature in the
process without any explicit deterministic annealing strategy. Also,
different mixture components can take different values of κ, as
automatically determined by the EM updates, and need not be controlled
by any external heuristic. The cost of the added flexibility in soft-movMF
over spkmeans is the extra computation in estimating the κ values. Thus the
soft-movMF algorithm provides a trade-off between modeling power and com-
putational demands, but one that, judging from the empirical results, seems
quite worthwhile. The hard-movMF algorithm, in spite of using the more general
vMF model, suffers because of its hard assignments from the very beginning.
The fskmeans and spkmeans algorithms do not do well on difficult datasets due
to their hard assignment schemes as well as their significantly weaker modeling
capabilities.
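As a sketch of how those κ values are re-estimated inside EM, the M-step for each component can use a widely used closed-form approximation based on the weighted mean resultant length (the function name estimate_kappa is our own):

```python
import numpy as np

def estimate_kappa(X, weights):
    """Approximate concentration update for one vMF component.
    X: (n, d) unit-norm data points; weights: (n,) posterior
    memberships of the points in this component. Uses the
    closed-form approximation
        kappa ~ (r_bar * d - r_bar**3) / (1 - r_bar**2),
    where r_bar is the length of the weighted mean resultant
    vector (r_bar -> 1 as the points concentrate)."""
    d = X.shape[1]
    r_bar = np.linalg.norm(weights @ X) / weights.sum()
    return (r_bar * d - r_bar ** 3) / (1.0 - r_bar ** 2)
```

Since r̄ approaches 1 as a component's points concentrate around its mean direction, the estimated κ grows over successive iterations, which is exactly the decreasing-temperature effect discussed above.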
Finally, a word on model selection, since choosing the number of clusters
remains one of the most widely debated topics in clustering (31). A new objec-