[Figure here: approximation error (log scale, 10^0 down to 10^-10) for approximations (1.14), (1.15), and (1.17), plotted against the average resultant parameter r from 0 to 1.]
FIGURE 6.3: Comparison of approximations for varying r (with d = 1000).
6.6 Algorithms
Mixture models based on vMF distributions naturally lead to two algorithms
for clustering directional data. The algorithms are centered on soft- and
hard-assignment schemes and are titled soft-moVMF and hard-moVMF,
respectively. The soft-moVMF algorithm (Algorithm 5) estimates the parameters
of the mixture model exactly following the derivations in Section 6.4
using EM. Hence, it assigns soft (or probabilistic) labels to each point,
given by the posterior probabilities of the components of the mixture
conditioned on the point. On termination, the algorithm gives the parameters
Θ = {α_h, μ_h, κ_h}, h = 1, ..., k, of the k vMF distributions that model the
dataset X, as well as the soft clustering, i.e., the posterior probabilities
p(h | x_i, Θ), for all h and i.
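As a concrete illustration, the E-step of soft-moVMF can be sketched as computing, for each point, the posterior responsibility of every component in log space. The sketch below is a minimal, illustrative implementation (function names are our own, not from the text); it assumes unit-norm inputs and evaluates the vMF normalizer via the standard series for the modified Bessel function.

```python
import math

def log_bessel_iv(nu, kappa, terms=60):
    # Series for the modified Bessel function of the first kind:
    # I_nu(k) = sum_m (k/2)^(2m+nu) / (m! * Gamma(m + nu + 1))
    s = 0.0
    for m in range(terms):
        s += (kappa / 2.0) ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1))
    return math.log(s)

def log_vmf_density(x, mu, kappa):
    # log of the vMF density c_d(kappa) * exp(kappa * mu . x), where
    # c_d(kappa) = kappa^(d/2 - 1) / ((2*pi)^(d/2) * I_{d/2-1}(kappa))
    d = len(x)
    nu = d / 2.0 - 1.0
    log_c = nu * math.log(kappa) - (d / 2.0) * math.log(2 * math.pi) - log_bessel_iv(nu, kappa)
    return log_c + kappa * sum(xi * mi for xi, mi in zip(x, mu))

def soft_posteriors(x, alphas, mus, kappas):
    # E-step of soft-moVMF: p(h | x, Theta) is proportional to
    # alpha_h * f(x | mu_h, kappa_h); normalized in log space for stability.
    logs = [math.log(a) + log_vmf_density(x, mu, k)
            for a, mu, k in zip(alphas, mus, kappas)]
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]
    z = sum(weights)
    return [w / z for w in weights]
```

For a unit vector aligned with one component's mean direction, that component receives the larger responsibility, and the responsibilities sum to one.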
The hard-moVMF algorithm (Algorithm 6) estimates the parameters of the
mixture model using a hard-assignment, or winner-takes-all, strategy. In
other words, points are assigned to clusters based on a derived posterior
distribution given by (6.13). After the hard assignments in every iteration,
each point belongs to a single cluster. As before, the component parameters
are updated using the posteriors of the components given the points; the
crucial difference in this case is that the posterior probabilities are
allowed to take only binary (0/1) values. Upon termination, Algorithm 6
yields a hard clustering of the data.
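The binarized-posterior step and the resulting mean-direction update can be sketched as follows. This is an illustrative fragment with hypothetical helper names, not the book's Algorithm 6 verbatim; it assumes posteriors have already been computed as in the E-step.

```python
def hard_assign(posteriors):
    # Convert soft posteriors p(h | x_i, Theta) into binary (0/1) values:
    # each point is assigned wholly to its most probable component.
    h_star = max(range(len(posteriors)), key=lambda h: posteriors[h])
    return [1.0 if h == h_star else 0.0 for h in range(len(posteriors))]

def mean_direction_update(points, assignments, h):
    # M-step sketch under hard assignments: the mean direction mu_h is the
    # normalized resultant vector of the points assigned to cluster h.
    d = len(points[0])
    r = [sum(a[h] * p[j] for p, a in zip(points, assignments)) for j in range(d)]
    norm = sum(rj * rj for rj in r) ** 0.5
    return [rj / norm for rj in r]
```

Because the binary weights zero out all but one component per point, each cluster's update depends only on the points it won, which is exactly the winner-takes-all behavior described above.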
 
 