7.7.2 Attribute selection
In Algorithm 7.4, we have only an empty decision tree when learning starts, and we do not yet know how to classify examples according to their attributes. From the training set we must construct a decision tree that partitions the whole example space according to attribute values. Let the training set be X, and suppose it is to be divided into n classes. Assume that the number of training examples belonging to the i-th class is C_i and that the total number of training examples in X is |X|. If the probability that an example belongs to the i-th class is written as P(C_i), then:

P(C_i) = \frac{C_i}{|X|}    (7.25)
At this time, the uncertainty of the decision tree's partition with respect to the classes C is:

H(X, C) = -\sum_i P(C_i) \log P(C_i)    (7.26)
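As a concrete illustration, here is a minimal Python sketch of Eqs. (7.25)-(7.26): it estimates each P(C_i) as C_i/|X| from a list of class labels and accumulates the entropy. The function name, the choice of base-2 logarithms, and the sample data are our own assumptions, not taken from the text.

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum_i P(C_i) * log2 P(C_i), with P(C_i) = C_i / |X| (Eqs. 7.25-7.26)."""
    total = len(labels)              # |X|
    counts = Counter(labels)         # C_i for every class
    h = 0.0
    for c_i in counts.values():
        p = c_i / total              # P(C_i), Eq. (7.25)
        h -= p * math.log2(p)
    return h

# Example: a training set with 9 positive and 5 negative examples
labels = ["+"] * 9 + ["-"] * 5
print(entropy(labels))               # about 0.940 bits
```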
Where no confusion can arise, H(X, C) is simply written as H(X). The decision tree learning procedure is a procedure in which the uncertainty of the partition is progressively reduced. If we select attribute a to test, then once the value a = a_j of the test is known, the remaining uncertainty of the partition is the conditional entropy:
H(X/a) = -\sum_j \sum_i p(C_i, a = a_j) \log p(C_i / a = a_j)
       = -\sum_j \sum_i p(a = a_j)\, p(C_i / a = a_j) \log p(C_i / a = a_j)
       = -\sum_j p(a = a_j) \sum_i p(C_i / a = a_j) \log p(C_i / a = a_j)    (7.27)
Assume that the number of examples that belong to the i-th class and satisfy a = a_j is C_ij, so that

P(C_i, a = a_j) = \frac{C_{ij}}{|X|}

i.e. P(C_i, a = a_j) is the probability that an example belongs to the i-th class
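Under Eq. (7.27), H(X/a) can be computed directly from the joint counts C_ij. Below is a hedged Python sketch; the function name, the dictionary-based example format, and the field names are illustrative assumptions, not from the text.

```python
import math
from collections import Counter

def conditional_entropy(examples, attr, label):
    """H(X/a) = -sum_j sum_i p(C_i, a=a_j) * log2 p(C_i / a=a_j)   (Eq. 7.27)."""
    total = len(examples)                                       # |X|
    joint = Counter((ex[attr], ex[label]) for ex in examples)   # C_ij
    per_value = Counter(ex[attr] for ex in examples)            # number of examples with a = a_j
    h = 0.0
    for (a_j, c_i), c_ij in joint.items():
        p_joint = c_ij / total            # p(C_i, a = a_j) = C_ij / |X|
        p_cond = c_ij / per_value[a_j]    # p(C_i / a = a_j)
        h -= p_joint * math.log2(p_cond)
    return h

# Usage with examples stored as dicts, e.g.
# examples = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}, ...]
# conditional_entropy(examples, attr="outlook", label="play")
```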
when the test attribute takes the value a_j. After the test attribute a has been selected, every leaf node X_j contains the examples satisfying a = a_j, and the entropy of the classification information at X_j is:

H(X_j) = -\sum_i p(C_i / a = a_j) \log p(C_i / a = a_j)    (7.28)
At this time, the uncertainty of the decision tree classification is the conditional entropy of the training set X with respect to attribute a:

H(X/a) = \sum_j p(a = a_j) H(X_j)    (7.29)
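Eqs. (7.28)-(7.29) suggest an equivalent leaf-by-leaf evaluation: compute the entropy of each leaf X_j and weight it by p(a = a_j). A minimal sketch reusing the entropy helper from the first snippet (again, the names are our own); it yields the same value as the direct form of Eq. (7.27).

```python
def conditional_entropy_by_leaves(examples, attr, label):
    """H(X/a) = sum_j p(a = a_j) * H(X_j), with H(X_j) as in Eq. (7.28)."""
    total = len(examples)
    h = 0.0
    for a_j in {ex[attr] for ex in examples}:
        leaf = [ex[label] for ex in examples if ex[attr] == a_j]   # X_j: examples with a = a_j
        h += (len(leaf) / total) * entropy(leaf)                   # p(a = a_j) * H(X_j)
    return h
```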
The amount of information about the classification that is provided by testing attribute a is I(X; a):

I(X; a) = H(X) - H(X/a)
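The attribute whose test yields the largest I(X; a), i.e. the greatest reduction in entropy, is then the natural choice for the next test node. A short sketch in the style of the snippets above, reusing the earlier helpers (the function names are ours, not the book's):

```python
def information_gain(examples, attr, label):
    """I(X; a) = H(X) - H(X/a): the entropy reduction obtained by testing attribute a."""
    labels = [ex[label] for ex in examples]
    return entropy(labels) - conditional_entropy_by_leaves(examples, attr, label)

def best_attribute(examples, attrs, label):
    """Select the attribute with the largest information gain (ID3-style selection)."""
    return max(attrs, key=lambda a: information_gain(examples, a, label))
```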