Note that in the case of a probabilistic classifier, the crisp classification y_k(x) is usually obtained as follows:

    y_k(x) = argmax_{c_i ∈ dom(y)} P_{M_k}(y = c_i | x),    (9.3)

where M_k denotes classifier k and P_{M_k}(y = c_i | x) denotes the probability of y obtaining the value c_i given an instance x.
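As a minimal sketch of Eq. (9.3), the crisp label can be read off a classifier's estimated probability vector; the class labels and probability values below are illustrative assumptions, not taken from the text:

```python
# Sketch of Eq. (9.3): return the class c_i with the highest
# estimated probability P_Mk(y = c_i | x).
def crisp_classification(prob_by_class):
    """prob_by_class: dict mapping class label c_i -> P_Mk(y = c_i | x)."""
    return max(prob_by_class, key=prob_by_class.get)

# Illustrative two-class probability vector for one instance x.
probs = {"spam": 0.7, "ham": 0.3}
print(crisp_classification(probs))  # -> spam
```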
9.3.1.2
Performance Weighting
The weight of each classifier can be set proportional to its accuracy
performance on a validation set [Opitz and Shavlik (1996)]:
    α_i = (1 − E_i) / Σ_{j=1}^{T} (1 − E_j),    (9.4)

where E_i is the performance evaluation (error) of classifier i on a validation set, T is the number of classifiers, and the sum in the denominator normalizes the weights so that they total one.
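Eq. (9.4) can be sketched directly; the validation-error values below are illustrative assumptions:

```python
# Sketch of Eq. (9.4): set each weight proportional to validation
# accuracy (1 - E_i), normalized so the weights sum to one.
def performance_weights(errors):
    """errors: validation errors E_1..E_T; returns weights alpha_1..alpha_T."""
    total = sum(1.0 - e for e in errors)
    return [(1.0 - e) / total for e in errors]

# Illustrative errors for three classifiers: the most accurate
# classifier (lowest error) receives the largest weight.
weights = performance_weights([0.1, 0.2, 0.3])
print(weights)
```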
9.3.1.3
Distribution Summation
The idea of the distribution summation combining method is to sum up the conditional probability vector obtained from each classifier [Clark and Boswell (1991)]. The selected class is chosen according to the highest value in the total vector. Mathematically, it can be written as:
    Class(x) = argmax_{c_i ∈ dom(y)} Σ_k P_{M_k}(y = c_i | x).    (9.5)
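A minimal sketch of distribution summation, Eq. (9.5); the per-classifier probability vectors below are illustrative assumptions:

```python
# Sketch of Eq. (9.5): sum the conditional probability vectors from
# all classifiers and select the class with the largest total.
def distribution_summation(prob_vectors):
    """prob_vectors: one dict per classifier, mapping c_i -> P_Mk(y = c_i | x)."""
    totals = {}
    for vec in prob_vectors:
        for c, p in vec.items():
            totals[c] = totals.get(c, 0.0) + p
    return max(totals, key=totals.get)

# Three illustrative classifiers voting on one instance:
# totals are a: 1.7, b: 1.3, so class "a" is selected.
votes = [{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}, {"a": 0.6, "b": 0.4}]
print(distribution_summation(votes))  # -> a
```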
9.3.1.4
Bayesian Combination
In the Bayesian combination method the weight associated with each
classifier is the posterior probability of the classifier given the training set
[Buntine (1990)].
    Class(x) = argmax_{c_i ∈ dom(y)} Σ_k P(M_k | S) · P_{M_k}(y = c_i | x),    (9.6)
where P(M_k | S) denotes the probability that the classifier M_k is correct given the training set S. The estimation of P(M_k | S) depends on the classifier's representation; to estimate this value for decision trees the reader is referred to [Buntine (1990)].
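Bayesian combination, Eq. (9.6), extends distribution summation by weighting each classifier's probability vector with its posterior P(M_k | S). The posteriors and probability vectors below are illustrative assumptions; estimating the true posteriors is classifier-specific, as the text notes:

```python
# Sketch of Eq. (9.6): weight each classifier's conditional probability
# vector by its posterior P(M_k | S) before summing over classifiers.
def bayesian_combination(posteriors, prob_vectors):
    """posteriors: P(M_k | S) per classifier;
    prob_vectors: one dict per classifier, c_i -> P_Mk(y = c_i | x)."""
    totals = {}
    for w, vec in zip(posteriors, prob_vectors):
        for c, p in vec.items():
            totals[c] = totals.get(c, 0.0) + w * p
    return max(totals, key=totals.get)

# Illustrative values: weighted totals are a: 0.53, b: 0.47, so "a" wins
# even though the first (most trusted) classifier favors "b".
print(bayesian_combination([0.5, 0.3, 0.2],
                           [{"a": 0.2, "b": 0.8},
                            {"a": 0.9, "b": 0.1},
                            {"a": 0.8, "b": 0.2}]))  # -> a
```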