Information-Gain criterion
Denoting $n_r = n_{1r} + n_{0r}$, $n_l = n_{1l} + n_{0l}$, we have:

Entropy of the right node: $H_r = -\dfrac{n_{1r}}{n_r}\ln\dfrac{n_{1r}}{n_r} - \dfrac{n_{0r}}{n_r}\ln\dfrac{n_{0r}}{n_r}$.

Entropy of the left node: $H_l = -\dfrac{n_{1l}}{n_l}\ln\dfrac{n_{1l}}{n_l} - \dfrac{n_{0l}}{n_l}\ln\dfrac{n_{0l}}{n_l}$.

Average entropy to be minimized: $IG = \dfrac{n_r}{n}H_r + \dfrac{n_l}{n}H_l$.
Denoting $\bar{\alpha} = 1 - \alpha$ and $\bar{\beta} = 1 - \beta$, one derives after simple manipulations:

$$IG(p,\alpha,\beta) = -\alpha p \ln\frac{\alpha p}{\alpha p + \bar{\beta}q} - \bar{\beta}q \ln\frac{\bar{\beta}q}{\alpha p + \bar{\beta}q} - \bar{\alpha}p \ln\frac{\bar{\alpha}p}{\bar{\alpha}p + \beta q} - \beta q \ln\frac{\beta q}{\bar{\alpha}p + \beta q}. \qquad (4.47)$$
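The "simple manipulations" amount to the following substitutions, assuming (consistently with the error rates $1-\alpha = n_{1l}/n_1$ and $1-\beta = n_{0r}/n_0$ used in (4.50) below) that $\alpha = n_{1r}/n_1$, $\beta = n_{0l}/n_0$, $p = n_1/n$ and $q = n_0/n$:

$$\frac{n_{1r}}{n} = \alpha p,\quad \frac{n_{0r}}{n} = \bar{\beta}q,\quad \frac{n_{1l}}{n} = \bar{\alpha}p,\quad \frac{n_{0l}}{n} = \beta q,\quad \frac{n_r}{n} = \alpha p + \bar{\beta}q,\quad \frac{n_l}{n} = \bar{\alpha}p + \beta q,$$

so that, for instance, $\dfrac{n_r}{n}H_r = -\alpha p\,\ln\dfrac{\alpha p}{\alpha p + \bar{\beta}q} - \bar{\beta}q\,\ln\dfrac{\bar{\beta}q}{\alpha p + \bar{\beta}q}$, which yields the first two terms of (4.47).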
Formula (4.47) clearly shows that $IG(p,\alpha,\beta)$ is independent of the predicted class labeling: swapping $\alpha$ with $\beta$ together with $p$ with $q$ in formula (4.47) leads to the same result.
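As a quick numerical check, here is a minimal Python sketch (the function name `ig` is ours, not from the text) that evaluates formula (4.47) and verifies the labeling symmetry:

```python
import math

def ig(p, alpha, beta):
    """Average split entropy IG(p, alpha, beta) of formula (4.47)."""
    q, a_bar, b_bar = 1 - p, 1 - alpha, 1 - beta
    n_r = alpha * p + b_bar * q      # right-node probability mass
    n_l = a_bar * p + beta * q       # left-node probability mass
    def h(x, s):                     # -x ln(x/s), with the convention 0 ln 0 = 0
        return 0.0 if x == 0 else -x * math.log(x / s)
    return (h(alpha * p, n_r) + h(b_bar * q, n_r)
            + h(a_bar * p, n_l) + h(beta * q, n_l))

p, alpha, beta = 0.3, 0.8, 0.6
# Labeling symmetry: swapping alpha <-> beta together with p <-> q
assert abs(ig(p, alpha, beta) - ig(1 - p, beta, alpha)) < 1e-12
print(ig(p, alpha, beta))
```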
Twoing criterion
The original twoing criterion in terms of the $n$'s is expressed as:
$$TWO = \frac{n_r n_l}{n^2}\left[\,\left|\frac{n_{1r}}{n_r} - \frac{n_{1l}}{n_l}\right| + \left|\frac{n_{0r}}{n_r} - \frac{n_{0l}}{n_l}\right|\,\right]^2. \qquad (4.48)$$
Performing the same manipulations as above, one arrives at

$$TWO(p,\alpha,\beta) = (\alpha p + \bar{\beta}q)(\beta q + \bar{\alpha}p)\left[\,\left|\frac{\alpha p}{\alpha p + \bar{\beta}q} - \frac{\bar{\alpha}p}{\beta q + \bar{\alpha}p}\right| + \left|\frac{\bar{\beta}q}{\alpha p + \bar{\beta}q} - \frac{\beta q}{\beta q + \bar{\alpha}p}\right|\,\right]^2. \qquad (4.49)$$
$TWO(p,\alpha,\beta)$ is also independent of the predicted class labeling and its maximum value is $4pq$; therefore, we minimize $4pq - TWO(p,\alpha,\beta)$.
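A similar Python sketch (again with a function name of our own choosing) evaluates (4.49) and checks that the maximum $4pq$ is attained for a perfect split ($\alpha = \beta = 1$):

```python
def two(p, alpha, beta):
    """Twoing criterion TWO(p, alpha, beta) of formula (4.49)."""
    q, a_bar, b_bar = 1 - p, 1 - alpha, 1 - beta
    n_r = alpha * p + b_bar * q      # right-node probability mass
    n_l = a_bar * p + beta * q       # left-node probability mass
    d1 = abs(alpha * p / n_r - a_bar * p / n_l)   # class omega_1 difference
    d0 = abs(b_bar * q / n_r - beta * q / n_l)    # class omega_0 difference
    return n_r * n_l * (d1 + d0) ** 2

p = 0.3
q = 1 - p
assert abs(two(p, 1.0, 1.0) - 4 * p * q) < 1e-12   # perfect split: TWO = 4pq
print(4 * p * q - two(p, 0.8, 0.6))                # quantity to be minimized
```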
The following two criteria are not independent of the predicted classification.
Probability-of-Error criterion:
Assuming the $r = \omega_1$, $l = \omega_0$ assignment and denoting by $PE_1$, $PE_0$ the class error rates ($PE_1 = n_{1l}/n_1 = 1 - \alpha$, $PE_0 = n_{0r}/n_0 = 1 - \beta$), we have:

$$PE(p,\alpha,\beta) = p\,PE_1 + q\,PE_0 = p(1-\alpha) + q(1-\beta). \qquad (4.50)$$
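As a small worked example (with values of our own choosing): for $p = 0.3$, $\alpha = 0.8$ and $\beta = 0.6$, formula (4.50) gives $PE = 0.3 \times 0.2 + 0.7 \times 0.4 = 0.34$. Under the opposite assignment ($r = \omega_0$, $l = \omega_1$) the class error rates become $\alpha$ and $\beta$ themselves, giving $0.3 \times 0.8 + 0.7 \times 0.6 = 0.66$, which illustrates the dependence of this criterion on the predicted labeling.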