p(d, w) = \sum_{z \in Z} p(z|\theta)\, p(w|z, \theta)\, p(d|z, \theta)    (6.46)
Compared with LSA, the Bayesian latent semantic model has a firm statistical foundation and avoids LSA's sensitivity to the data. It also exploits prior information about the latent theme variables, avoiding the over-rigid structure imposed by the SVD. In the Bayesian latent semantic model, formula (6.42) can be rewritten as:
U = \{p(d_i|z_k)\}_{n \times k}, \quad V = \{p(w_i|z_k)\}_{m \times k}, \quad \Sigma = \mathrm{diag}(p(z_1), p(z_2), \ldots, p(z_k))
So it has the same representation form as the SVD.
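To make this correspondence concrete, the following minimal sketch (in Python with NumPy; the array names and dimensions are illustrative assumptions, not from the text) builds U, V, and the diagonal matrix of p(z), and checks that the product U Σ V^T reproduces p(d, w) from formula (6.46):

```python
import numpy as np

# Illustrative sizes: n documents, m words, k latent themes (assumed, not from the text).
n, m, k = 4, 6, 2
rng = np.random.default_rng(0)

p_z = rng.dirichlet(np.ones(k))               # p(z), shape (k,)
p_d_given_z = rng.dirichlet(np.ones(n), k).T  # p(d|z), shape (n, k); each column sums to 1
p_w_given_z = rng.dirichlet(np.ones(m), k).T  # p(w|z), shape (m, k); each column sums to 1

# Formula (6.46): p(d, w) = sum_z p(z) p(w|z) p(d|z)
p_dw = np.einsum('z,dz,wz->dw', p_z, p_d_given_z, p_w_given_z)

# Matrix form of formula (6.42): U * diag(p(z)) * V^T
U = p_d_given_z                     # {p(d_i|z_k)}, n x k
V = p_w_given_z                     # {p(w_i|z_k)}, m x k
Sigma = np.diag(p_z)                # diag(p(z_1), ..., p(z_k))
assert np.allclose(p_dw, U @ Sigma @ V.T)
```

Unlike the SVD, the three factors here are non-negative and normalized, since they are probability distributions.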
In LSA, the criterion for parameter selection is the minimum least-squares loss. From the viewpoint of Bayesian learning, our model admits two criteria: maximum a posteriori (MAP) and maximum likelihood (ML).
MAP estimation finds the proper latent theme variables given the document set D and word set W:
P(Z|D, W) = \prod_{z \in Z} \prod_{d \in D} \prod_{w \in W} p(z|d, w)    (6.47)
According to Bayes' formula, we have:
p(z|d, w) = \frac{p(z)\, p(w|z)\, p(d|z)}{\sum_{z \in Z} p(z)\, p(w|z)\, p(d|z)}    (6.48)
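Formula (6.48) amounts to normalizing the joint term p(z)p(w|z)p(d|z) over the latent themes. A small sketch, with illustrative variable names carried over from the previous snippet:

```python
import numpy as np

def posterior_z(p_z, p_d_given_z, p_w_given_z):
    """Formula (6.48): p(z|d,w) for every (d, w) pair, returned with shape (n, m, k)."""
    # Joint term p(z) p(d|z) p(w|z) for each (d, w, z)
    joint = np.einsum('z,dz,wz->dwz', p_z, p_d_given_z, p_w_given_z)
    # Normalize over the latent-theme axis, as in the denominator of (6.48)
    return joint / joint.sum(axis=2, keepdims=True)
```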
ML estimation seeks the parameters that maximize the following expression:
\prod_{d \in D} \prod_{w \in W} p(d, w)^{n(d, w)}    (6.49)
where n(d, w) denotes the number of occurrences of word w in document d. In practice, we usually take the logarithm of the likelihood, i.e., the log-likelihood:
\sum_{d \in D} \sum_{w \in W} n(d, w) \log p(d, w)    (6.50)
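Given a document-word count matrix n(d, w), formula (6.50) is a single weighted sum over the logarithm of (6.46). A minimal sketch (the epsilon guard is an implementation detail, not part of the model):

```python
import numpy as np

def log_likelihood(counts, p_z, p_d_given_z, p_w_given_z):
    """Formula (6.50): sum_{d,w} n(d,w) log p(d,w), with counts of shape (n, m)."""
    # p(d, w) from formula (6.46)
    p_dw = np.einsum('z,dz,wz->dw', p_z, p_d_given_z, p_w_given_z)
    # Small epsilon guards against log(0)
    return float(np.sum(counts * np.log(p_dw + 1e-12)))
```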
A general approach to maximizing both criteria is the expectation maximization (EM) algorithm, which is discussed in detail in Section 6.7.
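For reference, one EM iteration for this model can be sketched as follows. The E-step is formula (6.48); the M-step re-estimation formulas shown here are the standard PLSA updates and are stated as an assumption, since this section defers the details to Section 6.7:

```python
import numpy as np

def plsa_em_step(counts, p_z, p_d_given_z, p_w_given_z):
    """One EM iteration: E-step is formula (6.48); M-step uses the standard PLSA updates."""
    # E-step: posterior p(z|d,w), shape (n, m, k)
    post = np.einsum('z,dz,wz->dwz', p_z, p_d_given_z, p_w_given_z)
    post /= post.sum(axis=2, keepdims=True)
    # M-step: expected theme counts n(d,w) * p(z|d,w)
    expected = counts[:, :, None] * post
    totals = expected.sum(axis=(0, 1))            # shape (k,)
    p_d_new = expected.sum(axis=1) / totals       # p(d|z), shape (n, k)
    p_w_new = expected.sum(axis=0) / totals       # p(w|z), shape (m, k)
    p_z_new = totals / totals.sum()               # p(z), shape (k,)
    return p_z_new, p_d_new, p_w_new
```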
6.7 Semi-supervised Text Mining Algorithms
6.7.1 Web page clustering
There are currently many algorithms for text classification, and they can achieve satisfactory precision and recall. However, the cost of obtaining labeled training documents is very high. Nigam et al. proposed an approach that uses a mixed corpus of labeled and unlabeled documents to train the classifier and