Discovering User Interests by Document Classification - Mining and Analyzing Social Networks

Because real-number computation is expensive, all term frequencies are converted from real numbers into nominal values:

1. 0 ≤ frequency < 0.2: low
2. 0.2 ≤ frequency < 0.5: medium
3. 0.5 ≤ frequency: high
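The three thresholds above can be sketched as a small mapping function. This is a minimal illustration, not the authors' code: the name `to_nominal` and the sample frequencies are hypothetical, and frequencies are assumed to be non-negative.

```python
def to_nominal(frequency: float) -> str:
    """Map a real-valued term frequency to 'low', 'medium', or 'high'."""
    if frequency < 0:
        raise ValueError("term frequency must be non-negative")
    if frequency < 0.2:
        return "low"      # 0 <= frequency < 0.2
    if frequency < 0.5:
        return "medium"   # 0.2 <= frequency < 0.5
    return "high"         # 0.5 <= frequency


# Hypothetical frequencies for one document (illustrative, not from the text).
example = {"computer": 0.62, "algorithm": 0.05, "derivative": 0.2}
nominal = {term: to_nominal(value) for term, value in example.items()}
print(nominal)
```

Each boundary value falls into the higher bin, matching the "≤" on the left side of every interval.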

Table 5 Nominal term frequencies

            computer  programming language  algorithm  derivative  class
doc1.txt    high      medium                low        low         computer
doc2.txt    low       low                   medium     high        math
doc3.txt    medium    low                   medium     high        math
doc4.txt    medium    high                  low        medium      computer
doc5.txt    low       low                   medium     medium      math
doc6.txt    medium    low                   medium     low         computer

The basic idea of generating a decision tree [Mitchell 1997] is to split the tree into sub-trees at the most informative node. Such a node is chosen by computing the information gain (the reduction in entropy) of each candidate term. Fig. 5 shows the decision tree generated from our training data.
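To make the selection criterion concrete, the following sketch computes the information gain of each term over the six rows of Table 5. It assumes the standard entropy-based definition of information gain; the names `TABLE_5`, `FEATURES`, `entropy`, and `information_gain` are illustrative and not taken from the text.

```python
from collections import Counter
from math import log2

FEATURES = ["computer", "programming language", "algorithm", "derivative"]

# Rows of Table 5: four nominal term frequencies followed by the class label.
TABLE_5 = [
    ("high", "medium", "low", "low", "computer"),      # doc1.txt
    ("low", "low", "medium", "high", "math"),          # doc2.txt
    ("medium", "low", "medium", "high", "math"),       # doc3.txt
    ("medium", "high", "low", "medium", "computer"),   # doc4.txt
    ("low", "low", "medium", "medium", "math"),        # doc5.txt
    ("medium", "low", "medium", "low", "computer"),    # doc6.txt
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class labels minus the weighted entropy after splitting."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(TABLE_5, i) for i, name in enumerate(FEATURES)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("most informative term:", best)
```

On these six rows the sketch ranks `derivative` highest (a gain of about 0.667), which agrees with the root of the tree in Fig. 5.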

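[Figure: decision tree with root node "derivative" (branches: low, medium, high) and a second-level node "computer" (branches: low; medium or high); the leaves are labeled "Computer science" or "math".]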

Fig. 5 Decision tree
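As a rough illustration of how such a tree can be grown and then used, the sketch below applies a plain ID3-style recursion to the rows of Table 5 and classifies a document by walking the result. It is an assumption-laden sketch, not the authors' procedure: the helper names (`build_tree`, `classify`, `gain`) are hypothetical, ties between equally informative terms are broken by column order, and branches are created only for values seen in the training rows.

```python
from collections import Counter
from math import log2

FEATURES = ["computer", "programming language", "algorithm", "derivative"]

# Rows of Table 5: four nominal term frequencies followed by the class label.
TABLE_5 = [
    ("high", "medium", "low", "low", "computer"),      # doc1.txt
    ("low", "low", "medium", "high", "math"),          # doc2.txt
    ("medium", "low", "medium", "high", "math"),       # doc3.txt
    ("medium", "high", "low", "medium", "computer"),   # doc4.txt
    ("low", "low", "medium", "medium", "math"),        # doc5.txt
    ("medium", "low", "medium", "low", "computer"),    # doc6.txt
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, column):
    """Information gain obtained by splitting the rows on one column."""
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


def build_tree(rows, columns):
    """Return a class label (leaf) or a (feature name, branches) pair."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                              # pure node
    if not columns:
        return Counter(labels).most_common(1)[0][0]   # majority vote
    best = max(columns, key=lambda c: gain(rows, c))  # first column wins ties
    rest = [c for c in columns if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, rest)
    return (FEATURES[best], branches)


def classify(tree, document):
    """Walk the tree; raises KeyError for a value never seen in training."""
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[document[feature]]
    return tree


tree = build_tree(TABLE_5, list(range(len(FEATURES))))
print(tree)
```

On Table 5 the sketch splits first on `derivative` and then, for documents with a medium `derivative` frequency, on `computer`. Because it only creates branches for values that occur in the training rows, its second-level branches need not match the grouped "medium or high" branch of Fig. 5.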
