Probabilistic Reasoning - Advanced Artificial Intelligence


themes. Accordingly, we propose the Bayesian latent semantic model for document generation.

Let the document set be $D = \{d_1, d_2, \ldots, d_n\}$ and the word set be $W = \{w_1, w_2, \ldots, w_m\}$. The generation model for a document $d \in D$ can be expressed as follows:

(1) Choose a document $d$ with probability $p(d)$;

(2) Choose a latent theme $z$, which has the prior probability $p(z \mid \theta)$;

(3) Denote by $p(z \mid d, \theta)$ the probability that document $d$ belongs to theme $z$;

(4) Denote by $p(w \mid z, \theta)$ the probability of word $w \in W$ under theme $z$.
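The steps above can be sketched as a sampler that emits one observed pair at a time. This is a minimal illustration under stated assumptions: the parameter tables and the names `P_D`, `P_Z_GIVEN_D`, `P_W_GIVEN_Z`, `draw` and `sample_pair` are hypothetical, the probabilities are invented toy values, and the theme is drawn from $p(z \mid d)$ once $d$ is fixed (the reading that matches the factorization in (6.44) and (6.45)), so the prior $p(z \mid \theta)$ is not sampled separately.

```python
import random

# Hypothetical toy parameters, not taken from the text: 2 documents, 2 themes, 3 words.
P_D = {"d1": 0.5, "d2": 0.5}                               # p(d)
P_Z_GIVEN_D = {"d1": {"z1": 0.9, "z2": 0.1},               # p(z | d)
               "d2": {"z1": 0.2, "z2": 0.8}}
P_W_GIVEN_Z = {"z1": {"w1": 0.7, "w2": 0.2, "w3": 0.1},    # p(w | z)
               "z2": {"w1": 0.1, "w2": 0.3, "w3": 0.6}}


def draw(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def sample_pair(rng):
    """Generate one observed pair: choose d, then a theme z for d, then a word w from z."""
    d = draw(P_D, rng)
    z = draw(P_Z_GIVEN_D[d], rng)  # latent theme: used internally, never returned
    w = draw(P_W_GIVEN_Z[z], rng)
    return d, w


rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [sample_pair(rng) for _ in range(5)]
print(pairs)
```

Only `(d, w)` leaves `sample_pair`; the theme `z` stays hidden, which is exactly why it is called latent.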

After the above process, we obtain the observed pair $(d, w)$. The latent theme $z$ is then marginalized out (summed over), which yields the joint probability model:

$$p(d, w) = p(d)\, p(w \mid d) \qquad (6.44)$$

$$p(w \mid d) = \sum_{z \in Z} p(w \mid z)\, p(z \mid d) \qquad (6.45)$$
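Formulas (6.44) and (6.45) can be checked numerically on a small example. The sketch below is illustrative only: the parameter values and the helper names `p_w_given_d` and `p_joint` are hypothetical. It evaluates the joint probability of every (document, word) pair and confirms that the joint distribution sums to 1.

```python
from itertools import product

# Hypothetical toy parameters, invented for illustration.
P_D = {"d1": 0.5, "d2": 0.5}                               # p(d)
P_Z_GIVEN_D = {"d1": {"z1": 0.9, "z2": 0.1},               # p(z | d)
               "d2": {"z1": 0.2, "z2": 0.8}}
P_W_GIVEN_Z = {"z1": {"w1": 0.7, "w2": 0.2, "w3": 0.1},    # p(w | z)
               "z2": {"w1": 0.1, "w2": 0.3, "w3": 0.6}}
WORDS = ["w1", "w2", "w3"]


def p_w_given_d(w, d):
    """Formula (6.45): sum over themes z of p(w | z) * p(z | d)."""
    return sum(P_W_GIVEN_Z[z][w] * p_z for z, p_z in P_Z_GIVEN_D[d].items())


def p_joint(d, w):
    """Formula (6.44): p(d, w) = p(d) * p(w | d)."""
    return P_D[d] * p_w_given_d(w, d)


total = sum(p_joint(d, w) for d, w in product(P_D, WORDS))
print(p_joint("d1", "w1"))  # 0.5 * (0.7*0.9 + 0.1*0.1) = 0.32, up to rounding
print(total)                # 1.0 up to floating-point rounding
```

The word probability depends on the document only through the weights $p(z \mid d)$; the table `P_W_GIVEN_Z` has no document index at all.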

This model is a probabilistic mixture model under the following independence assumptions:

(1) The observed pairs $(d, w)$ are generated independently of one another; they are related only through the latent themes.

(2) Given the latent theme variable $z$, the generation of word $w$ is independent of the specific document $d$; it depends only on $z$.
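One consequence of these assumptions can be written out explicitly. Because assumption (1) treats the observed pairs as independent, the likelihood of a whole corpus factorizes into a product over pairs, so its logarithm is a sum. Writing $n(d, w)$ for the number of times word $w$ occurs in document $d$ (notation introduced here for illustration) and $Z$ for the set of latent themes, the log-likelihood takes the form below; assumption (2) is what allows $p(w \mid d)$ to be expanded through $p(w \mid z)$ alone.

$$\log L = \sum_{d \in D} \sum_{w \in W} n(d, w) \log p(d, w) = \sum_{d \in D} \sum_{w \in W} n(d, w) \log \Big[ p(d) \sum_{z \in Z} p(w \mid z)\, p(z \mid d) \Big]$$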

Formula (6.45) indicates that, within a given document $d$, the word distribution $p(w \mid d)$ is a convex combination of the theme-specific word distributions $p(w \mid z)$. The weight of each theme in the combination is $p(z \mid d)$, the probability that document $d$ belongs to that theme. Figure 6.3 illustrates the relationships among the factors in the model.
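The convex-combination reading can be illustrated directly. In this sketch the function name `mixture` and all numeric values are hypothetical; it mixes the per-theme word distributions with non-negative weights that sum to 1 and checks a property every convex combination has: each mixed probability lies between the smallest and largest per-theme value for that word.

```python
def mixture(theme_weights, theme_word_dists):
    """Return p(. | d) = sum_z p(z | d) * p(. | z), a convex combination of the rows."""
    if any(a < 0 for a in theme_weights) or abs(sum(theme_weights) - 1.0) > 1e-9:
        raise ValueError("theme weights must be non-negative and sum to 1")
    n_words = len(theme_word_dists[0])
    return [
        sum(a * row[j] for a, row in zip(theme_weights, theme_word_dists))
        for j in range(n_words)
    ]


# Hypothetical values: weights p(z | d) for one document, rows p(w | z) per theme.
p_z_given_d = [0.9, 0.1]
p_w_given_z = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]]

p_w_given_d = mixture(p_z_given_d, p_w_given_z)
print(p_w_given_d)  # approximately [0.64, 0.21, 0.15]

# Each mixed probability lies between the smallest and largest theme value for that word.
for j, p in enumerate(p_w_given_d):
    column = [row[j] for row in p_w_given_z]
    assert min(column) <= p <= max(column)
```

Because the document puts weight 0.9 on the first theme, its word distribution sits close to that theme's row.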

Figure 6.3. Bayesian latent semantic model.

Substituting formula (6.45) into formula (6.44) and applying Bayes' formula, we get:
