The methods implemented in this chapter are Naïve Bayes and Maximum Entropy. Both have been applied effectively to text classification; the literature contains many successful examples of Bayesian probabilistic classifiers [1, 15, 28, 33] and Maximum Entropy classifiers [2, 18, 23].
2.2.5.1 Naïve Bayes Method
The Binary Independence Model was developed by Yu and Salton [34] and Robertson and Jones [24] in the 1970s, and it was one of the first models used in probabilistic information retrieval. The Naïve Bayes method can be briefly reviewed as follows:
Let $x$ be a vector to be classified, and $c_k$ a possible class. The information to be known is the probability that the vector $x$ belongs to the class $c_k$. First, the probability $P(c_k \mid x)$ is transformed using Bayes' rule:

$$
P(c_k \mid x) = P(c_k) \times \frac{P(x \mid c_k)}{P(x)} \qquad (2.1)
$$
$P(c_k)$, i.e., the class probability, can be estimated from training data. Due to the sparsity of training data, however, direct estimation of $P(c_k \mid x)$ is in most cases impossible. Under the naïve assumption that the elements of $x$ are conditionally independent given the class, $P(x \mid c_k)$ is therefore decomposed as

$$
P(x \mid c_k) = \prod_{j=1}^{d} P(x_j \mid c_k) \qquad (2.2)
$$
where $x_j$ is the $j$th element of the vector $x$. So $P(c_k \mid x)$ becomes:

$$
P(c_k \mid x) = P(c_k) \times \frac{\prod_{j=1}^{d} P(x_j \mid c_k)}{P(x)} \qquad (2.3)
$$
By using this equation, $P(c_k \mid x)$ can be calculated for each class, and $x$ can be classified into the class with the highest $P(c_k \mid x)$.
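As a concrete illustration, the following minimal sketch implements Eqs. (2.1)–(2.3) for binary feature vectors. The add-one (Laplace) smoothing and the toy data are assumptions added here to keep the example runnable; they are not part of the equations above. Since $P(x)$ is the same for every class, it is dropped from the comparison, and the product in Eq. (2.2) is computed in log space to avoid underflow.

```python
import math

def train(X, y):
    """Estimate P(c_k) and P(x_j = 1 | c_k) from training data.
    Add-one (Laplace) smoothing is an assumption added here to
    avoid zero probabilities; it is not part of Eqs. (2.1)-(2.3)."""
    classes = set(y)
    n, d = len(X), len(X[0])
    prior = {c: sum(1 for label in y if label == c) / n for c in classes}
    cond = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(d)]
    return prior, cond

def classify(x, prior, cond):
    """Return the class with the highest P(c_k | x).  P(x) is the
    same for every class, so it is dropped from the comparison;
    the product of Eq. (2.2) is accumulated in log space."""
    best, best_logp = None, float("-inf")
    for c, p_c in prior.items():
        logp = math.log(p_c)
        for j, xj in enumerate(x):
            p = cond[c][j] if xj else 1.0 - cond[c][j]
            logp += math.log(p)
        if logp > best_logp:
            best, best_logp = c, logp
    return best

# Toy usage: two classes, three binary features (illustrative data).
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = ["a", "a", "b", "b"]
prior, cond = train(X, y)
print(classify([1, 0, 0], prior, cond))  # -> "a"
```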
2.2.5.2 Maximum Entropy Method
Nigam et al. [20] defined maximum entropy as a technique for estimating probability distributions from data. The most important rule in maximum entropy is that when nothing is known, the distribution should be kept uniform; in other words, the distribution should have maximal entropy. In order to gather a set of constraints for the model, which describe class-specific expectations for the distribution, labeled training data is used.