Text Clustering with Mixture of von Mises-Fisher Distributions - Text Mining: Classification, Clustering, and Applications - page 125

apply the model to real-life text datasets: How to efficiently and accurately

compute κ_h, h = 1, ..., k from (6.11) for high-dimensional data? The problem

of estimating κ_h is analyzed in Section 6.5.1 and experimentally studied in

Section 6.5.2.

6.5.1 Approximating κ

Recall that due to the lack of an analytical solution, it is not possible to

directly estimate the κ values (see (6.5) and (6.11)). One may employ a

non-linear root-finder for estimating κ, but for high dimensional data, problems of

overflows and numerical instabilities plague such root-finders. Therefore, an

asymptotic approximation of κ is the best choice for estimating κ. Such approaches

also have the benefit of taking constant computation time as opposed

to any iterative method.
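To make the overflow concern concrete, the sketch below evaluates the modified Bessel function I_ν(κ) by its standard power series in plain double precision. It is only an illustration under our own assumptions: the helper name `bessel_i_naive`, the truncation at 60 terms, and the example values d = 1000, κ = 500 are hypothetical and not taken from the chapter, and an actual root-finder may fail in other ways as well.

```python
import math

def bessel_i_naive(nu, x, n_terms=60):
    """Truncated power series for I_nu(x), summed term by term in floats.

    Illustrative helper (not from the chapter). Returns None when an
    intermediate quantity exceeds the double-precision range.
    """
    total = 0.0
    try:
        for m in range(n_terms):
            # m-th series term: (x/2)^(2m + nu) / (m! * Gamma(m + nu + 1))
            numerator = (x / 2.0) ** (2 * m + nu)
            denominator = math.factorial(m) * math.gamma(m + nu + 1.0)
            total += numerator / denominator
    except OverflowError:
        # Python raises OverflowError when a float power or gamma value
        # leaves the representable range.
        return None
    return total

# Low order: nu = 1/2 corresponds to d = 3 (order d/2 - 1).
low_dim = bessel_i_naive(0.5, 2.0)

# High-dimensional case (assumed values): d = 1000 gives order 499.
high_dim = bessel_i_naive(1000 / 2 - 1, 500.0)
```

In this sketch `low_dim` is an ordinary float, while `high_dim` comes back as `None` because the very first series term already overflows. Any root-finder that needs such Bessel values directly would inherit the same problem.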

Mardia and Jupp (30) provide approximations for estimating κ for a single

component (6.5) for two limiting cases (Approximations (10.3.7) and (10.3.10)

of (30, p. 198)):

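\[ \kappa \approx \frac{d-1}{2(1-r)} \qquad \text{valid for large } r, \tag{6.14} \]

\[ \kappa \approx d r \left( 1 + \frac{d}{d+2}\, r^2 + \frac{d^2 (d+8)}{(d+2)^2 (d+4)}\, r^4 \right) \qquad \text{valid for small } r, \tag{6.15} \]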

where r is given by (6.5).
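A minimal sketch of approximations (6.14) and (6.15) as plain functions may help when comparing them. The function names are our own. The sanity check assumes that r satisfies A_d(κ) = r and uses the standard closed form A_3(κ) = coth κ − 1/κ for d = 3, which is not part of the chapter.

```python
import math

def kappa_large_r(d, r):
    """Approximation (6.14): kappa ~ (d - 1) / (2 (1 - r)), for r near 1."""
    return (d - 1) / (2.0 * (1.0 - r))

def kappa_small_r(d, r):
    """Approximation (6.15): series in r, for r near 0."""
    r2 = r * r
    return d * r * (
        1.0
        + d / (d + 2.0) * r2
        + d ** 2 * (d + 8.0) / ((d + 2.0) ** 2 * (d + 4.0)) * r2 * r2
    )

def mean_resultant_d3(kappa):
    """Closed form of A_3(kappa) = coth(kappa) - 1/kappa (d = 3 only)."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

# Generate r from a known kappa in d = 3, then try to recover kappa.
r_small = mean_resultant_d3(0.3)    # small kappa gives small r
r_large = mean_resultant_d3(50.0)   # large kappa gives r close to 1

k_small = kappa_small_r(3, r_small)
k_large = kappa_large_r(3, r_large)
```

In this low-dimensional setting each formula recovers the κ that generated r quite closely, but only in its own regime: (6.15) for r near 0 and (6.14) for r near 1.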

These approximations assume that κ

d , which is typically not valid for

high dimensional data (see the discussion in Section 6.8 for an intuition).

Furthermore, the r values corresponding to the text datasets considered in

this chapter are in the mid-range rather than in the two extreme ranges of r

that are catered to by the above approximations. We obtain a more accurate

approximation for κ as described below. With A_d(κ) = I_{d/2}(κ) / I_{d/2−1}(κ), observe

that A_d(κ) is a ratio of Bessel functions that differ in their order by just one.

Fortunately there exists a continued fraction representation of A_d(κ) (52)

given by

\[ A_d(\kappa) = \frac{I_{d/2}(\kappa)}{I_{d/2-1}(\kappa)} = \cfrac{1}{\cfrac{d}{\kappa} + \cfrac{1}{\cfrac{d+2}{\kappa} + \cdots}}. \tag{6.16} \]
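One way to see why (6.16) is convenient is to evaluate it numerically: the recursion only involves the quantities (d + 2j)/κ, so no large Bessel values are ever formed. The sketch below truncates the continued fraction after a fixed number of levels and evaluates it from the innermost level outward. The name `bessel_ratio` and the default of 500 levels are our assumptions. The d = 3 comparison uses the standard closed form coth κ − 1/κ, which is not from the chapter.

```python
import math

def bessel_ratio(d, kappa, n_terms=500):
    """Approximate A_d(kappa) by truncating the continued fraction (6.16).

    Level j contributes the partial denominator (d + 2j) / kappa. The
    truncation depth n_terms is an assumed default; when kappa is much
    larger than d, more levels may be needed.
    """
    tail = 0.0
    # Work from the innermost retained level back to level 0.
    for j in range(n_terms - 1, -1, -1):
        tail = 1.0 / ((d + 2 * j) / kappa + tail)
    return tail

# Low-dimensional check against coth(kappa) - 1/kappa for d = 3.
a3 = bessel_ratio(3, 2.0)
a3_closed_form = 1.0 / math.tanh(2.0) - 1.0 / 2.0

# A high-dimensional evaluation (assumed values) stays well behaved.
a_high = bessel_ratio(1000, 500.0)
```

Here `a3` and `a3_closed_form` should agree to within rounding error, and `a_high` is an ordinary value between 0 and 1 even though the individual Bessel functions at that order are far outside the double-precision range.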

Letting A_d(κ) = r, we can write (6.16) approximately as

\[ \frac{1}{r} \approx \frac{d}{\kappa} + r, \]

which yields

\[ \kappa \approx \frac{d r}{1 - r^2}. \]
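The truncation step can be read as follows: the tail of the continued fraction in (6.16) has the same form with d replaced by d + 2, and the approximation replaces that tail by r itself. The sketch below, under our own naming (`kappa_first_order`, `bessel_ratio`) and with assumed example values d = 1000 and r = 0.5, computes the resulting estimate and plugs it back into a truncated version of (6.16) to see how closely it reproduces r.

```python
def bessel_ratio(d, kappa, n_terms=500):
    """Truncated continued fraction (6.16) for A_d(kappa); depth is assumed."""
    tail = 0.0
    for j in range(n_terms - 1, -1, -1):
        tail = 1.0 / ((d + 2 * j) / kappa + tail)
    return tail

def kappa_first_order(d, r):
    """Closed-form estimate kappa ~ d r / (1 - r^2) obtained from (6.16)."""
    return d * r / (1.0 - r * r)

# Assumed example: a mid-range r in a high-dimensional space.
d, r = 1000, 0.5
kappa = kappa_first_order(d, r)

# If the estimate were exact, A_d(kappa) would equal r.
residual = bessel_ratio(d, kappa) - r
```

For these values the estimate costs a single arithmetic expression and leaves only a small residual. In low dimensions the same check can show a noticeably larger gap, so the residual is worth inspecting before relying on the estimate.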
