Based on the above independence assumption and the Bayes formula, equation (6.54) can be rewritten as:
$$
p(c_j \mid d, \theta)
= \frac{p(c_j \mid \theta)\, p(d \mid c_j, \theta)}{p(d \mid \theta)}
= \frac{p(c_j \mid \theta) \prod_{r=1}^{m} p(w_r \mid c_j, \theta)}
       {\sum_{i=1}^{k} p(c_i \mid \theta) \prod_{r=1}^{m} p(w_r \mid c_i, \theta)}
\qquad (6.55)
$$
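To make (6.55) concrete, the following minimal Python sketch evaluates the posterior over classes for a single document. The names (`posterior_over_classes`, `doc_counts`, `priors`, `word_probs`) are illustrative choices of ours, not from the text; the computation runs in log space so that the long products over words do not underflow, with repeated occurrences of a word entering as a count-weighted exponent, as the multinomial model adopted below implies.

```python
import numpy as np

def posterior_over_classes(doc_counts, priors, word_probs):
    """Evaluate p(c_j | d, theta) for every class j, as in equation (6.55).

    doc_counts: length-m vector, doc_counts[t] = n(d, w_t) for this document.
    priors:     length-k vector of class priors p(c_j | theta).
    word_probs: k x m matrix of word probabilities p(w_t | c_j, theta).
    """
    # Numerator of (6.55) in log space:
    #   log p(c_j | theta) + sum_t n(d, w_t) * log p(w_t | c_j, theta)
    log_joint = np.log(priors) + doc_counts @ np.log(word_probs).T
    # The denominator of (6.55) is the sum of the numerators over all
    # classes, so normalizing the exponentiated values gives the posterior.
    log_joint -= log_joint.max()          # guard against underflow
    joint = np.exp(log_joint)
    return joint / joint.sum()
```

Classification then assigns a document to $\arg\max_j p(c_j \mid d, \theta)$ using the parameters estimated in (6.56a) and (6.56b) below.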
The learning task thus becomes learning the parameters of the model from the prior information in the training data. Here we adopt the multinomial distribution together with its conjugate Dirichlet distribution.
à =
|
D
i
|
I
(
c
(
d
)
=
c
)
i
j
θ
=
p
(
c
|
θ
)
=
1
(6.56a)
c j
j
|
D
|
Ã
|
D
|
$$
\theta_{w_t \mid c_j} = p(w_t \mid c_j, \theta)
= \frac{\alpha_{jt} + \sum_{i=1}^{|D|} n(d_i, w_t)\, I\bigl(c(d_i) = c_j\bigr)}
       {\alpha_{j0} + \sum_{k=1}^{m} \sum_{i=1}^{|D|} n(d_i, w_k)\, I\bigl(c(d_i) = c_j\bigr)}
\qquad (6.56b)
$$
where $n(d_i, w_t)$ is the number of occurrences of word $w_t$ in document $d_i$; $\alpha_{jt}$ is the hyperparameter (super-parameter) of the model, with $\alpha_{j0} = \sum_{k=1}^{m} \alpha_{jk}$; $c(\cdot)$ is the class labeling function; and $I(a = b)$ is the characteristic function (if $a = b$, then $I(a = b) = 1$; otherwise $I(a = b) = 0$).
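The two estimators translate directly into code. Below is a sketch under the same illustrative naming, assuming a uniform hyperparameter `alpha` in place of every $\alpha_{jt}$ (so that $\alpha_{j0} = m\alpha$):

```python
import numpy as np

def estimate_parameters(doc_counts, labels, num_classes, alpha=1.0):
    """Estimate theta_{c_j} via (6.56a) and theta_{w_t|c_j} via (6.56b).

    doc_counts: |D| x m matrix with doc_counts[i, t] = n(d_i, w_t).
    labels:     length-|D| integer array, labels[i] = j when c(d_i) = c_j.
    alpha:      uniform Dirichlet hyperparameter standing in for alpha_{jt}.
    """
    num_docs, vocab_size = doc_counts.shape
    # (6.56a): the prior of c_j is the fraction of documents labeled c_j.
    priors = np.bincount(labels, minlength=num_classes) / num_docs
    # (6.56b): per-class word counts, smoothed by the Dirichlet prior.
    class_word_counts = np.zeros((num_classes, vocab_size))
    for j in range(num_classes):
        class_word_counts[j] = doc_counts[labels == j].sum(axis=0)
    numer = alpha + class_word_counts                # alpha_{jt} + counts
    denom = alpha * vocab_size + class_word_counts.sum(axis=1, keepdims=True)
    return priors, numer / denom                     # p(c_j), p(w_t | c_j)
```

With `alpha = 1.0`, the estimator (6.56b) reduces to the familiar Laplace smoothing.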
Although the conditions under which the naïve Bayesian model strictly applies are rather restrictive, numerous experiments demonstrate that even when the independence assumption is not satisfied, the naïve Bayesian model still works robustly. It has become one of the most popular methods for text classification.
Below we will classify the unlabeled documents according to the MAP criterion, based on the knowledge contained in these unlabeled documents.
Consider the entire sample set $D = D^L \cup D^U$, where $D^L$ is the set of documents that were labeled in the first stage and $D^U$ is the set of unlabeled documents. Assume that the generation of all samples in $D$ is mutually independent; then the following equation holds:
$$
p(D \mid \theta)
= \prod_{d_i \in D^U} \sum_{j=1}^{|C|} p(c_j \mid \theta)\, p(d_i \mid c_j, \theta)
\;\cdot \prod_{d_i \in D^L} p\bigl(c(d_i) \mid \theta\bigr)\, p\bigl(d_i \mid c(d_i), \theta\bigr)
\qquad (6.57)
$$
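In log form, (6.57) splits into a labeled part, where each document contributes only its labeled class, and an unlabeled part, where each document contributes a log-sum over all $|C|$ classes. A sketch with the same hypothetical names as before:

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(labeled_counts, labels, unlabeled_counts,
                   priors, word_probs):
    """Compute log p(D | theta) for D = D^L ∪ D^U, following (6.57)."""
    log_priors = np.log(priors)      # log p(c_j | theta)
    log_wp = np.log(word_probs)      # log p(w_t | c_j, theta)
    # Labeled part: d_i in D^L contributes p(c(d_i)|theta) p(d_i|c(d_i),theta).
    labeled_ll = (log_priors[labels]
                  + np.einsum('it,it->i', labeled_counts, log_wp[labels])).sum()
    # Unlabeled part: d_i in D^U is a mixture, summed over the |C| classes.
    log_joint = log_priors + unlabeled_counts @ log_wp.T  # shape (|D^U|, |C|)
    unlabeled_ll = logsumexp(log_joint, axis=1).sum()
    return labeled_ll + unlabeled_ll
```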
In the above equation, the unlabeled documents are regarded as being generated by a mixture model. Our learning task is to obtain the maximum a posteriori estimate of the model parameter $\theta$ from the sample set $D$. According to the Bayes theorem, we have:
$$
p(\theta \mid D) = \frac{p(\theta)\, p(D \mid \theta)}{p(D)}
\qquad (6.58)
$$
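Since $p(D)$ does not depend on $\theta$, maximizing the posterior in (6.58) is equivalent to maximizing the sum of the log prior and the log likelihood; restating this standard step explicitly:

$$
\hat{\theta}_{\mathrm{MAP}}
= \arg\max_{\theta}\, p(\theta \mid D)
= \arg\max_{\theta}\, \bigl[\log p(\theta) + \log p(D \mid \theta)\bigr]
$$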