average of the information $H(\mathcal{L}_{n,j})$, as required to classify an element of the subset $\mathcal{L}_{n,j}$:

$$H(\mathcal{L} \mid n) = \sum_{j=1}^{J_n} \frac{|\mathcal{L}_{n,j}|}{|\mathcal{L}|}\, H(\mathcal{L}_{n,j}). \qquad (7.3)$$
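As an illustration, Eq. (7.3) might be computed as in the following minimal Python sketch; the names (`entropy`, `conditional_entropy`, `values` for the value of feature $n$ on every element of $\mathcal{L}$, `labels` for the class labels) are our own, not from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a class-label sequence, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """H(L | n) per Eq. (7.3): the |L_{n,j}|/|L|-weighted average of
    the entropies of the subsets L_{n,j} induced by the feature values."""
    total = len(labels)
    subsets = {}                      # feature value j -> labels of L_{n,j}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(ys) / total * entropy(ys) for ys in subsets.values())
```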
By this equation the IG can be defined. It describes how the entropy, i.e., the information needed for the assignment, is reduced by the addition of the feature $n$:

$$\mathrm{IG}(\mathcal{L}, n) = H(\mathcal{L}) - H(\mathcal{L} \mid n). \qquad (7.4)$$
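Continuing the sketch above, the IG of Eq. (7.4) then follows directly:

```python
def information_gain(values, labels):
    """IG(L, n) = H(L) - H(L | n), Eq. (7.4)."""
    return entropy(labels) - conditional_entropy(values, labels)

# A feature that separates the classes perfectly yields the maximal gain:
# information_gain(["a", "a", "b", "b"], [0, 0, 1, 1]) == 1.0
```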
However, this definition tends to favour features with a high number of different values $J_n$: if all elements in $\mathcal{L}$ whose feature $n$ has the same value belong to the same class (which is in particular the case if a feature takes a different value for each element in $\mathcal{L}$), then $H(\mathcal{L} \mid n)$ equals zero, and one obtains a maximal $\mathrm{IG}(\mathcal{L}, n)$.
By introducing the Information Gain Ratio (IGR), this can be compensated:

$$\mathrm{IGR}(\mathcal{L}, n) = \frac{\mathrm{IG}(\mathcal{L}, n)}{H\left(\frac{|\mathcal{L}_{n,1}|}{|\mathcal{L}|}, \ldots, \frac{|\mathcal{L}_{n,J_n}|}{|\mathcal{L}|}\right)}. \qquad (7.5)$$
The term in the denominator is called the split information and is computed according to Eq. (7.1). It is the information one obtains by the described split of the set $\mathcal{L}$ according to the values of the feature $n$.
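Again continuing the sketch (so `Counter`, `log2`, and `information_gain` are in scope), Eq. (7.5) might be computed as follows; the guard against a zero split information is our own addition for the degenerate case of a feature that takes only one value on $\mathcal{L}$:

```python
def split_information(values):
    """H(|L_{n,1}|/|L|, ..., |L_{n,J_n}|/|L|): the entropy of the split
    of L induced by the feature values (Eq. (7.1) on the proportions)."""
    total = len(values)
    return -sum((c / total) * log2(c / total)
                for c in Counter(values).values())

def information_gain_ratio(values, labels):
    """IGR(L, n) = IG(L, n) / split information, Eq. (7.5)."""
    split = split_information(values)
    if split == 0.0:         # single-valued feature: IG is zero as well
        return 0.0
    return information_gain(values, labels) / split
```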
A popular method for the training of a DT based on a training set $\mathcal{L}$ is the Iterative Dichotomiser 3 (ID3) algorithm [3]. ID3 constructs the DT recursively for the overall feature set by concatenation of sub-trees for each subset of the features. For a given set of features $\mathcal{M} \subseteq \{1, \ldots, N\}$ and training set $\mathcal{L}$, the steps are as follows (a code sketch is given after the list):
1. If all elements in $\mathcal{L}$ belong to class $i$, return a leaf labelled $i$.
2. If $\mathcal{M}$ is empty, return a leaf labelled by the most frequent class in $\mathcal{L}$.
3. Else search for the feature $n$ with the highest IG(R), i.e., $n = \arg\max_{n \in \mathcal{M}} \mathrm{IG}(\mathcal{L}, n)$.
4. For all $j = 1, \ldots, J_n$ construct a DT by ID3 on the feature set $\mathcal{M} \setminus \{n\}$ and the training set $\mathcal{L}_{n,j}$. Return a tree whose root is labelled by the feature $n$ and whose edges lead to the constructed DTs (cf. Fig. 7.2).
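The four steps above can be condensed into a short recursive sketch. The data layout is an assumption of ours: each training example is a pair of a feature dictionary and a class label, and `information_gain` and `Counter` come from the earlier sketches.

```python
def id3(examples, features):
    """ID3 sketch: `examples` is a list of (feature_dict, label) pairs,
    `features` a set of feature names still available for splitting."""
    labels = [y for _, y in examples]
    # Step 1: all elements belong to one class -> leaf labelled with it.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 2: no features left -> leaf labelled with the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: select the feature with the highest information gain.
    best = max(features, key=lambda n: information_gain(
        [x[n] for x, _ in examples], labels))
    # Step 4: recurse on M - {n} and each subset L_{n,j}.
    branches = {}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = id3(subset, features - {best})
    return (best, branches)
```

Here a leaf is represented by a bare class label and an inner node by a pair of the selected feature and a dictionary mapping its values to sub-trees; how to handle a feature value unseen during training is not specified by the basic algorithm.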
ID3 is a greedy algorithm, as in every step a feature is selected by a local optimisation criterion. A global optimum is not guaranteed. Further, ID3 always terminates, as with every recursive call the remaining set of features decreases, and the case of an empty feature set is handled separately.
Extensions of ID3 are the C4.5 and J48 variants, which introduce pruning of sub-trees [2, 4] for increased efficiency. During pruning, a whole sub-tree can be replaced by a leaf if the error probability is not significantly increased by this substitution. Note the training set