The methods implemented in this chapter are Naïve Bayes and Maximum Entropy. Both have been applied successfully to text classification; the literature contains many examples of Bayesian probabilistic classifiers [1, 15, 28, 33] and Maximum Entropy classifiers [2, 18, 23].
2.2.5.1 Naïve Bayes Method
The Binary Independence Model was developed by Yu and Salton [34] and Robertson and Jones [24] in the 1970s, and was one of the first models used in probabilistic information retrieval. The Naïve Bayes method can be briefly reviewed as follows.
Let $x$ be a vector to be classified and $c_k$ a possible class. The quantity we want to know is the probability that the vector $x$ belongs to the class $c_k$. First, the probability $P(c_k \mid x)$ is transformed using Bayes' rule:
$$P(c_k \mid x) = P(c_k) \times \frac{P(x \mid c_k)}{P(x)} \qquad (2.1)$$
Here $P(c_k)$, i.e., the class probability, can be estimated from training data. Due to the sparsity of training data, in most cases direct estimation of $P(c_k \mid x)$ is impossible.
$P(x \mid c_k)$ is therefore decomposed as

$$P(x \mid c_k) = \prod_{j=1}^{d} P(x_j \mid c_k) \qquad (2.2)$$
where $x_j$ is the $j$th element of the $d$-dimensional vector $x$. So $P(c_k \mid x)$ becomes:
$$P(c_k \mid x) = P(c_k) \times \frac{\prod_{j=1}^{d} P(x_j \mid c_k)}{P(x)} \qquad (2.3)$$
By using this equation, $P(c_k \mid x)$ can be calculated for each class, and $x$ can be assigned to the class with the highest $P(c_k \mid x)$.
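The classification rule in Eqs. (2.1)-(2.3) can be made concrete with a short sketch. The following is a minimal illustration, assuming a bag-of-words multinomial event model with Laplace smoothing (the chapter fixes neither choice) and computing in log space so the product in Eq. (2.3) does not underflow; all function and variable names are illustrative. Since $P(x)$ is the same for every class, it is dropped from the comparison.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate P(c_k) and P(x_j | c_k) from labeled documents.
    docs: list of token lists; labels: parallel list of class labels;
    alpha: Laplace smoothing constant (an assumption, not from the chapter)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)          # class -> token -> count
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(labels) for c, n in class_counts.items()}  # P(c_k)
    cond = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        cond[c] = {w: (word_counts[c][w] + alpha) / (total + alpha * len(vocab))
                   for w in vocab}              # P(x_j | c_k), factors of Eq. (2.2)
    return priors, cond, vocab

def classify(tokens, priors, cond, vocab):
    """Return argmax_k P(c_k | x); P(x) is constant across classes and dropped."""
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)                 # log P(c_k)
        # Sum of logs implements the product of Eq. (2.2) in log space.
        score += sum(math.log(cond[c][w]) for w in tokens if w in vocab)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage with hypothetical data:
docs = [["good", "film"], ["bad", "film"], ["good", "plot"]]
labels = ["pos", "neg", "pos"]
model = train_naive_bayes(docs, labels)
print(classify(["good", "plot", "film"], *model))   # -> pos
```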
2.2.5.2 Maximum Entropy Method
Nigam et al. [20] defined maximum entropy as a technique for estimating probability distributions from data. The most important principle in maximum entropy is that when nothing is known, the distribution should be kept uniform; in other words, the distribution should have maximal entropy. In order to gather a set of constraints for the model, which describe class-specific expectations for the distribution, labeled training data is used.
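The conditional maximum entropy classifier with bag-of-words feature functions is equivalent to multinomial logistic regression, so a minimal sketch can lean on scikit-learn (up to the L2 regularization that LogisticRegression applies by default, which pure maximum entropy does not include). The toy data below is hypothetical, not from the chapter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled documents; replace with a real training corpus.
train_texts = ["great plot and acting", "dull and predictable story",
               "wonderful film", "boring film"]
train_labels = ["pos", "neg", "pos", "neg"]

# Word counts play the role of the feature functions; fitting the
# logistic-regression weights yields the log-linear (maximum entropy)
# model whose expectations match the training data constraints.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

print(model.predict(["a wonderful and great story"]))        # e.g. ['pos']
print(model.predict_proba(["a wonderful and great story"]))  # class distribution
```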