themes, e.g. “politics” and/or “culture”. We handle these cases by labeling them with the most related theme. Under the ML criterion, after some rounds of EM iterations, we finally determine the theme of a test document according to formula (6.51).
The EM algorithm is one of the primary parameter estimation approaches for sparse data. It performs the E step and the M step alternately so as to find the most likely parameters. The general process of the EM algorithm is described below; a minimal code sketch of this loop follows the list:
(1) E step: calculate the expectation based on the current parameters;
(2) M step: find the parameters with maximum likelihood based on the expectation from the E step;
(3) Compute the likelihood with the renewed parameters. If the improvement of the likelihood falls below a predefined threshold or the number of iterations exceeds a predefined value, stop. Otherwise, go to Step (1).
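As a sketch, these three steps can be organized as a generic driver loop. The callables e_step, m_step and log_likelihood below are hypothetical placeholders to be supplied by a concrete model (such as the PLSA updates given later in this section); tol and max_iter play the role of the predefined threshold and iteration limit:

```python
def run_em(params, e_step, m_step, log_likelihood, tol=1e-4, max_iter=100):
    """Generic EM driver following steps (1)-(3) above."""
    prev_ll = float("-inf")
    for _ in range(max_iter):
        expectation = e_step(params)   # (1) expectation under current parameters
        params = m_step(expectation)   # (2) maximum-likelihood re-estimation
        ll = log_likelihood(params)    # (3) likelihood with renewed parameters
        if ll - prev_ll < tol:         # improvement below threshold: stop
            break
        prev_ll = ll
    return params
```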
In our algorithm, we adopt the following two steps in each iteration:
(1) In the E step, we obtain the expectation via the following Bayesian formula:
\[
P(z \mid d, w) \;=\; \frac{p(z)\, p(d \mid z)\, p(w \mid z)}{\sum_{z'} p(z')\, p(d \mid z')\, p(w \mid z')} \qquad (6.52)
\]
In terms of probabilistic semantics, this formula gives the probability that word w in document d is generated by the latent theme variable z.
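A minimal numpy sketch of this E step, assuming the parameters are stored as arrays over K themes, D documents and W words (the array names and shapes are our own convention, not from the text):

```python
import numpy as np

def e_step(p_z, p_dz, p_wz):
    """Posterior P(z|d,w) of formula (6.52), computed for all (d, w) pairs.

    p_z:  (K,)   theme prior p(z)
    p_dz: (K, D) p(d|z), one row per theme
    p_wz: (K, W) p(w|z), one row per theme
    """
    # joint[z, d, w] = p(z) * p(d|z) * p(w|z), the numerator of (6.52)
    joint = p_z[:, None, None] * p_dz[:, :, None] * p_wz[:, None, :]
    # Normalize over the theme axis: the sum over z' in the denominator
    return joint / joint.sum(axis=0, keepdims=True)
```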
(2) In the M step, we use the expectation from the above step to re-estimate the parameters:
\[
p(w \mid z) \;=\; \frac{\sum_{d} n(d, w)\, P(z \mid d, w)}{\sum_{d', w'} n(d', w')\, P(z \mid d', w')} \qquad (6.53a)
\]
\[
p(d \mid z) \;=\; \frac{\sum_{w} n(d, w)\, P(z \mid d, w)}{\sum_{d', w'} n(d', w')\, P(z \mid d', w')} \qquad (6.53b)
\]
\[
p(z) \;=\; \frac{\sum_{d, w} n(d, w)\, P(z \mid d, w)}{\sum_{d, w} n(d, w)} \qquad (6.53c)
\]
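The corresponding M-step sketch, continuing the array conventions of the E-step code above, where n_dw holds the word counts n(d, w) of word w in document d:

```python
import numpy as np

def m_step(n_dw, p_z_dw):
    """Re-estimate p(w|z), p(d|z) and p(z) via (6.53a)-(6.53c).

    n_dw:   (D, W) term-frequency matrix n(d, w)
    p_z_dw: (K, D, W) posterior P(z|d,w) from the E step
    """
    # weighted[z, d, w] = n(d, w) * P(z|d,w): the shared numerator term
    weighted = n_dw[None, :, :] * p_z_dw
    totals = weighted.sum(axis=(1, 2))             # denominator of (6.53a/b)
    p_wz = weighted.sum(axis=1) / totals[:, None]  # (6.53a): sum over d
    p_dz = weighted.sum(axis=2) / totals[:, None]  # (6.53b): sum over w
    p_z = totals / n_dw.sum()                      # (6.53c)
    return p_z, p_dz, p_wz
```

Alternating e_step and m_step until the log-likelihood, Σ_{d,w} n(d,w) log Σ_z p(z) p(d|z) p(w|z), stops improving realizes the loop of steps (1)-(3) described earlier.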
Compared with the SVD used in LSA, the EM algorithm has a linear convergence rate. It is simple and easy to implement, and it converges to a local optimum of the likelihood function. Figure 6.4 shows the relation between the number of iterations and the corresponding maximum likelihood in our experiment.