Information Technology Reference
In-Depth Information
Because the expense of real number computation is so high, all term
frequencies are changed from real number into nominal value:
0
frequency
1.
< 0.2: low
0
.
2
frequency
2.
< 0.5: medium
0
.
5
frequency
3.
: high
Table 5 Nominal term frequencies
computer programming
language algorithm derivative class
doc1.txt high medium low low computer
doc2.txt low low medium high math
doc3.txt medium low medium high math
doc4.txt medium high low medium computer
doc5.txt low low medium medium math
doc6.txt medium low medium low computer
The basic idea of generating decision tree [Mitchell 1997] is to split the tree
into two sub-trees at the most informative node. Such node is chosen by
computing its entropy or information gain. Following figure shows the decision
tree generated from our training data.
derivative
low
high
Computer
science
math
medium
computer
low
medium or high
Computer
science
math
Fig. 5 Decision tree
Search WWH ::




Custom Search