Because computing with real numbers is expensive, every term frequency is
converted from a real number into a nominal value:
1. 0 ≤ frequency < 0.2: low
2. 0.2 ≤ frequency < 0.5: medium
3. 0.5 ≤ frequency: high
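The three thresholds above can be sketched as a small function. This is a minimal illustration only: the name `to_nominal` is hypothetical, and rejecting negative input is an assumption, since the text only defines frequencies from 0 upward.

```python
def to_nominal(frequency: float) -> str:
    """Map a real-valued term frequency to 'low', 'medium', or 'high'."""
    if frequency < 0:
        # Assumption: the text defines no bin below 0, so reject such input.
        raise ValueError("frequency must be non-negative")
    if frequency < 0.2:
        return "low"      # 0 <= frequency < 0.2
    if frequency < 0.5:
        return "medium"   # 0.2 <= frequency < 0.5
    return "high"         # 0.5 <= frequency
```

Each boundary value falls into the upper bin, so a frequency of exactly 0.2 is medium and exactly 0.5 is high.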
Table 5 Nominal term frequencies

          computer  programming language  algorithm  derivative  class
doc1.txt  high      medium                low        low         computer
doc2.txt  low       low                   medium     high        math
doc3.txt  medium    low                   medium     high        math
doc4.txt  medium    high                  low        medium      computer
doc5.txt  low       low                   medium     medium      math
doc6.txt  medium    low                   medium     low         computer
The basic idea of generating a decision tree [Mitchell 1997] is to split the
tree into sub-trees at the most informative node. That node is chosen by
computing the entropy, or information gain, of each candidate attribute. The
following figure shows the decision tree generated from our training data.
derivative
  low: computer science
  high: math
  medium: computer
    low: math
    medium or high: computer science

Fig. 5 Decision tree
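To make the choice of the root node concrete, the following sketch computes the information gain of each attribute over the six training documents of Table 5. It is a minimal illustration, not necessarily the exact procedure used to produce the figure. The names `entropy`, `information_gain`, `TRAINING_DATA`, and `ATTRIBUTES` are hypothetical, and treating "programming language" as a single column is an assumption about the table layout.

```python
from collections import Counter
from math import log2

# Column names for Table 5 (assumed layout: four terms, then the class).
ATTRIBUTES = ["computer", "programming language", "algorithm", "derivative"]

# Rows of Table 5: four nominal term frequencies followed by the class label.
TRAINING_DATA = [
    ("high", "medium", "low", "low", "computer"),      # doc1.txt
    ("low", "low", "medium", "high", "math"),          # doc2.txt
    ("medium", "low", "medium", "high", "math"),       # doc3.txt
    ("medium", "high", "low", "medium", "computer"),   # doc4.txt
    ("low", "low", "medium", "medium", "math"),        # doc5.txt
    ("medium", "low", "medium", "low", "computer"),    # doc6.txt
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class labels minus the weighted entropy after
    partitioning the rows by the value found in the given column."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {
    name: information_gain(TRAINING_DATA, index)
    for index, name in enumerate(ATTRIBUTES)
}
best_attribute = max(gains, key=gains.get)
```

On this data the attribute with the largest gain is "derivative" (about 0.667 bits, against about 0.541 for "computer"), which agrees with the root of the tree in Fig. 5. Within the derivative = medium branch only doc4.txt and doc5.txt remain, and "computer" separates them.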