• Information-Gain criterion

Denoting $n_r = n_{1r} + n_{0r}$, $n_l = n_{1l} + n_{0l}$, we have:
Entropy of the right node:

$$H_r = -\frac{n_{1r}}{n_r}\ln\frac{n_{1r}}{n_r} - \frac{n_{0r}}{n_r}\ln\frac{n_{0r}}{n_r}.$$
Entropy of the left node:

$$H_l = -\frac{n_{1l}}{n_l}\ln\frac{n_{1l}}{n_l} - \frac{n_{0l}}{n_l}\ln\frac{n_{0l}}{n_l}.$$
Average entropy to be minimized:

$$IG = \frac{n_r}{n}H_r + \frac{n_l}{n}H_l.$$
Denoting $\bar{\alpha} = 1 - \alpha$ and $\bar{\beta} = 1 - \beta$, one derives after simple manipulations:

$$IG(p,\alpha,\beta) = -\alpha p \ln\frac{\alpha p}{\alpha p + \bar{\beta} q} - \bar{\beta} q \ln\frac{\bar{\beta} q}{\alpha p + \bar{\beta} q} - \bar{\alpha} p \ln\frac{\bar{\alpha} p}{\bar{\alpha} p + \beta q} - \beta q \ln\frac{\beta q}{\bar{\alpha} p + \beta q}. \tag{4.47}$$
Formula 4.47 clearly shows that $IG(p, \alpha, \beta)$ is independent of the predicted class labeling: swapping $\alpha$ with $\beta$ together with $p$ with $q$ in formula 4.47 leads to the same result.
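As a quick numerical check of formula 4.47, the sketch below (a minimal illustration; the function and variable names are ours, not the book's) evaluates $IG(p, \alpha, \beta)$ directly and confirms the labeling invariance: swapping $\alpha$ with $\beta$ together with $p$ with $q$ leaves the value unchanged.

```python
import math

def ig(p, alpha, beta):
    """Average entropy of a split, formula 4.47.

    p: prevalence of class omega_1 (q = 1 - p);
    alpha = n1r / n1 (fraction of class-1 cases sent to the right node);
    beta  = n0l / n0 (fraction of class-0 cases sent to the left node).
    """
    q, a_bar, b_bar = 1 - p, 1 - alpha, 1 - beta

    def plogp(x, tot):
        # x * ln(x / tot), with the usual convention 0 ln 0 = 0
        return 0.0 if x == 0 else x * math.log(x / tot)

    n_r = alpha * p + b_bar * q   # right-node probability mass
    n_l = a_bar * p + beta * q    # left-node probability mass
    return -(plogp(alpha * p, n_r) + plogp(b_bar * q, n_r)
             + plogp(a_bar * p, n_l) + plogp(beta * q, n_l))

p, alpha, beta = 0.3, 0.8, 0.6
# labeling invariance: IG(p, alpha, beta) = IG(q, beta, alpha)
print(abs(ig(p, alpha, beta) - ig(1 - p, beta, alpha)) < 1e-12)  # True
```

A pure split ($\alpha = \beta = 1$) yields `ig(p, 1.0, 1.0) == 0`, the minimum average entropy, as expected.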
• Twoing criterion

The original twoing criterion in terms of the $n$'s is expressed as:

$$TWO = \frac{n_r n_l}{n^2}\left(\left|\frac{n_{1r}}{n_r} - \frac{n_{1l}}{n_l}\right| + \left|\frac{n_{0r}}{n_r} - \frac{n_{0l}}{n_l}\right|\right)^2. \tag{4.48}$$
Performing the same manipulations as above, one arrives at

$$TWO(p,\alpha,\beta) = (\alpha p + \bar{\beta} q)(\bar{\alpha} p + \beta q)\left(\left|\frac{\alpha p}{\alpha p + \bar{\beta} q} - \frac{\bar{\alpha} p}{\bar{\alpha} p + \beta q}\right| + \left|\frac{\bar{\beta} q}{\alpha p + \bar{\beta} q} - \frac{\beta q}{\bar{\alpha} p + \beta q}\right|\right)^2. \tag{4.49}$$
$TWO(p, \alpha, \beta)$ is also independent of the predicted class labeling and its maximum value is $4pq$; therefore, we minimize $4pq - TWO(p, \alpha, \beta)$.
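Formula 4.49 can be checked numerically in the same spirit (again a minimal sketch with names of our own choosing): a pure split, $\alpha = \beta = 1$, attains the maximum $4pq$, while a non-informative split, $\alpha = \beta = 1/2$, scores zero.

```python
def two(p, alpha, beta):
    """Twoing criterion TWO(p, alpha, beta) of formula 4.49.

    p: prevalence of class omega_1 (q = 1 - p);
    alpha = n1r / n1, beta = n0l / n0, as in the text.
    Assumes both child nodes are non-empty.
    """
    q, a_bar, b_bar = 1 - p, 1 - alpha, 1 - beta
    n_r = alpha * p + b_bar * q                    # right-node probability mass
    n_l = a_bar * p + beta * q                     # left-node probability mass
    d1 = abs(alpha * p / n_r - a_bar * p / n_l)    # class-1 proportion gap
    d0 = abs(b_bar * q / n_r - beta * q / n_l)     # class-0 proportion gap
    return n_r * n_l * (d1 + d0) ** 2

p = 0.3
print(abs(two(p, 1.0, 1.0) - 4 * p * (1 - p)) < 1e-12)  # True: pure split reaches 4pq
print(two(p, 0.5, 0.5))                                 # 0.0: non-informative split
```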
The following two criteria are not independent of the predicted classification.
• Probability-of-Error criterion

Assuming the $r = \omega_1$, $l = \omega_0$ assignment and denoting by $PE_1$, $PE_0$ the class error rates ($PE_1 = n_{1l}/n_1 = 1 - \alpha$, $PE_0 = n_{0r}/n_0 = 1 - \beta$), we have:
$$PE(p,\alpha,\beta) = p\,PE_1 + q\,PE_0 = p(1-\alpha) + q(1-\beta). \tag{4.50}$$
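Unlike $IG$ and $TWO$, the value of formula 4.50 changes when the node labels are swapped, which is easy to verify numerically (a minimal sketch; the names are ours):

```python
def pe(p, alpha, beta):
    """Probability-of-error criterion (4.50) under r = omega_1, l = omega_0."""
    return p * (1 - alpha) + (1 - p) * (1 - beta)

p, alpha, beta = 0.3, 0.8, 0.6
# Under the swapped assignment (r = omega_0, l = omega_1) the error rates
# become PE_1 = alpha and PE_0 = beta, so the criterion value differs:
print(pe(p, alpha, beta) != p * alpha + (1 - p) * beta)  # True
```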