Text Clustering with Mixture of von Mises-Fisher Distributions - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

6.2 Related Work

There has been an enormous amount of work on clustering a wide vari-

ety of datasets across multiple disciplines over the past fifty years (26). The

methods presented in this chapter are tailored for high-dimensional data with

directional characteristics, rather than for arbitrary datasets. In the learn-

ing community, perhaps the most widely studied high-dimensional directional

data stem from text documents represented by vector space models. Much

of the work in this domain uses discriminative approaches (48; 54). For ex-

ample, hierarchical agglomerative methods based on cosine, Jaccard or Dice

coe cients were dominant for text clustering till the mid-1990s (39). Over

the past few years several new approaches, ranging from spectral partitioning

(27; 54), to the use of generative models from the exponential family, e.g.,

mixture of multinomials or Bernoulli distributions (35) etc., have emerged. A

fairly extensive list of references on generative approaches to text clustering

can be found in (55).

Of particular relevance to this work is the spkmeans algorithm (20), which

adapts the kmeans algorithm to normalized data by using the cosine simi-

larity for cluster allocation, and also by re-normalizing the cluster means to

unit length. The spkmeans algorithm is superior to regular kmeans for high-

dimensional text data, and competitive or superior in both performance and

speed to a wide range of other existing alternatives for text clustering (49).

It also provides better characterization of clusters in terms of their top repre-

sentative or discriminative terms.

The vMF distribution is known in the literature on directional statistics

(30), and the maximum likelihood estimates (MLE) of the parameters have

been given for a single distribution. Recently Piater (37) obtained parameter

estimates for a mixture for circular, i.e., 2-dimensional vMFs. In an Appendix

to his thesis, Piater starts on an EM formulation for 2-D vMFs but cites the

diculty of parameter estimation (especially κ ) and eventually avoids doing

EM in favor of another numerical gradient descent based scheme. Mooney et

al. (33) use a mixture of two circular von Mises distributions to estimate the

parameters using a quasi-Newton procedure. Wallace and Dowe (51) perform

mixture modeling for circular von Mises distributions and have produced a

software called Snob that implements their ideas. McLachlan and Peel (31)

discuss mixture analysis of directional data and mention the possibility of us-

ing Fisher distributions (3-dimensional vMFs), but instead use 3-dimensional

Kent distributions (30). They also mention work related to the clustering of

directional data, but all the efforts included by them are restricted to 2-D or

3-D vMFs. Indeed, (31) also draws attention to the diculty of parameter

estimation even for 3-D vMFs.

The connection between a generative model involving vMF distributions

with constant κ and the spkmeans algorithm was first observed by (6). A

Search WWH ::

Custom Search

Home