Digital Signal Processing Reference
Fig. 7.1 Exemplary DT: A two-class problem is shown with three features. Circles represent the root and inner nodes, rectangles represent the leaves with the class labels. [Figure: binary tree with threshold tests x_2 < 0.5, x_1 < 0.2, and x_3 < 0.4; leaves labelled −1 and +1.]

node v then has J_{a(v)} outgoing edges. The leaves are assigned the corresponding class labels.
In the recognition phase of an unknown pattern vector x = (x_1, ..., x_N)^T, one starts at the root w and follows the path through the tree as follows: at each node v along the path, choose the edge for which x_{a(v)} lies within this edge's interval, until a leaf is reached. The class to decide for is then this leaf's class.
An example of a DT is shown in Fig. 7.1 . In this example, quantisation of feature
values was chosen as binary. This results in a simple threshold decision at each node.
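The recognition pass described above can be sketched in code. Note that the node representation and the exact branch arrangement below are illustrative assumptions (the layout of Fig. 7.1 is not fully recoverable from the text); only the thresholds x_2 < 0.5, x_1 < 0.2, x_3 < 0.4 and the leaf labels −1/+1 come from the figure.

```python
# Sketch of a binary-threshold DT in the style of Fig. 7.1.
# Each inner node tests one feature against a threshold (binary
# quantisation); each leaf carries a class label.

def make_node(feature_index, threshold, below, above):
    """Inner node: follow `below` if x[feature_index] < threshold, else `above`."""
    return ("node", feature_index, threshold, below, above)

def make_leaf(label):
    """Leaf carrying a class label (-1 or +1 in the two-class example)."""
    return ("leaf", label)

# One plausible arrangement of the Fig. 7.1 thresholds
# (features are 1-indexed in the text, 0-indexed here):
TREE = make_node(1, 0.5,                                        # x_2 < 0.5 ?
                 make_node(0, 0.2, make_leaf(-1), make_leaf(+1)),  # then test x_1
                 make_node(2, 0.4, make_leaf(-1), make_leaf(+1)))  # else test x_3

def classify(tree, x):
    """Recognition phase: at each node v follow the edge whose
    interval contains x_{a(v)} until a leaf is reached."""
    while tree[0] == "node":
        _, n, thr, below, above = tree
        tree = below if x[n] < thr else above
    return tree[1]
```

For instance, `classify(TREE, [0.1, 0.3, 0.9])` descends via x_2 < 0.5 and x_1 < 0.2 to the leaf labelled −1.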
An optimisation criterion is now to maximise the information gain, in view of the correct classification, with the remaining features at each node. The Shannon entropy H(Y) of the distribution of the class probabilities (Y_1, ..., Y_M) can be used to this end:

H(Y_1, ..., Y_M) = − Σ_{i=1}^{M} Y_i ld(Y_i).    (7.1)
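Eq. (7.1) translates directly into code; `ld` denotes the binary logarithm, and terms with Y_i = 0 are taken as 0 (their limit). The function name is illustrative.

```python
import math

def shannon_entropy(probs):
    """Eq. (7.1): H(Y_1, ..., Y_M) = -sum_i Y_i ld(Y_i),
    where ld is the logarithm to base 2.
    Terms with Y_i = 0 contribute 0 (the limit of p * ld(p) as p -> 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)
```

A uniform two-class distribution gives the maximum of 1 bit: `shannon_entropy([0.5, 0.5])` returns `1.0`, while a certain outcome `shannon_entropy([1.0])` returns `0.0`.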
For a training set L of pattern vectors x with known class attribution y, the needed average information H(L) to assign an instance in L to a class i ∈ {1, ..., M} is determined according to:

H(L) = − Σ_{i=1}^{M} Y_i ld(Y_i),   Y_i = |L_i| / |L|,    (7.2)

where L_i is the set of elements in L with class attribution i.

In order to determine the contribution of each individual feature n to the aimed-at class assignment, for each n the set L is divided into the subsets L_{n,j}, j = 1, ..., J_n, based on the different values of n. The remaining average information H(L|n) needed after observation of the feature n for the class assignment results as the weighted
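A minimal sketch of H(L) from Eq. (7.2) and of the conditional entropy H(L|n): the weighted-sum form over the subsets L_{n,j} is an assumption here (it is the standard ID3-style combination), as are all function names.

```python
import math
from collections import Counter

def entropy_of_labels(labels):
    """Eq. (7.2): H(L) with Y_i = |L_i| / |L| estimated from class counts."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """H(L|n): partition L into subsets L_{n,j} by the value of feature n,
    then take the weighted sum |L_{n,j}|/|L| of the subset entropies
    (standard ID3 weighting, assumed here)."""
    total = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    return sum((len(subset) / total) * entropy_of_labels(subset)
               for subset in by_value.values())

def information_gain(feature_values, labels):
    """Information gain of feature n: H(L) - H(L|n)."""
    return entropy_of_labels(labels) - conditional_entropy(feature_values, labels)
```

A feature that separates the classes perfectly yields the full gain of H(L); a feature independent of the labels yields a gain of 0.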