is the gain of the attribute. If the gain-ratio of the attribute with the maximal gain-ratio exceeds a predefined threshold λ, the tree is expanded by adding the selected subcategory to the current root.
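The selection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the attribute records, the `gain_ratio` helper, and the candidate list are all hypothetical names, and the gain and split-information values are assumed to have been computed already.

```python
# Hypothetical sketch: expand the current root only when the best
# attribute's gain-ratio clears the predefined threshold lambda.

def gain_ratio(gain, split_info):
    """Gain-ratio = gain / split-information (0 for a trivial split)."""
    return gain / split_info if split_info > 0 else 0.0

def select_attribute(candidates, lam):
    """Pick the attribute with the maximal gain-ratio; return None
    when even the best ratio does not exceed the threshold lambda."""
    best = max(candidates,
               key=lambda a: gain_ratio(a["gain"], a["split_info"]))
    if gain_ratio(best["gain"], best["split_info"]) > lam:
        return best["name"]
    return None

attrs = [
    {"name": "genre",  "gain": 0.8, "split_info": 1.0},   # ratio 0.8
    {"name": "author", "gain": 0.9, "split_info": 3.0},   # ratio 0.3
]
print(select_attribute(attrs, lam=0.5))  # genre
```

With a higher threshold (e.g. λ = 0.9) no attribute qualifies and the node is not expanded further.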
Algorithm Solution
Based on the solutions mentioned above, we can now describe how we construct a category tree. Since the problem of finding a tree with minimal category cost is NP-hard, we propose an approximate algorithm (see Algorithm 6).
After building the category tree, the user can follow the branches of the tree to find the interesting answers. As mentioned above, the user can explore the category tree in two modes, i.e., showing tuples (option 'ShowTuples') and showing categories (option 'ShowCat'). When the user chooses the option 'ShowTuples' on a node (an intermediate node or a leaf node), the system provides the items satisfying all conditions from the root to the current node. The category tree accessing algorithm is shown in Algorithm 7.
Partition Criteria and Cost Estimation
We first describe how to compute the information gain which acts as the partition criteria, and then we
give the cost estimation of visiting the intermediate nodes and the leaves.
Partition Criteria
Existing decision tree construction algorithms such as C4.5 compute an information gain to measure
how well an attribute classifies the data. Given a decision tree T with N tuples and n classes, where each class C_i in T has N_i tuples, the entropy is defined as follows,

E(T) = -\sum_{i=1}^{n} \frac{N_i}{N} \log \frac{N_i}{N}    (6)
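As a sketch, equation (6) and the information gain of a candidate split can be computed as below. The function names are illustrative, and base-2 logarithms are assumed (the usual convention in decision tree construction, though the text does not fix the base).

```python
import math

def entropy(class_counts):
    """E(T) = -sum_{i=1..n} (N_i/N) * log2(N_i/N) for class sizes N_i,
    per equation (6); empty classes contribute nothing."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def information_gain(parent_counts, partitions):
    """Gain of a split: parent entropy minus the size-weighted
    entropy of the child partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - weighted

print(entropy([5, 5]))                             # 1.0 (two equal classes)
print(information_gain([5, 5], [[5, 0], [0, 5]]))  # 1.0 (a perfect split)
```

A split that separates the classes perfectly attains the maximal gain, equal to the parent's entropy; a split that leaves the class proportions unchanged has gain 0.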
Algorithm 7. The algorithm of accessing the category tree
1. For each intermediate node
2. If option = 'ShowTuples'
3. List the tuples satisfying all conditions from the root to the current node
4. Else
5. List the subcategories of the current node
6. End For
7. For each leaf node
8. Only the option 'ShowTuples' is available
9. List the tuples satisfying all conditions from the root to the current node
10. End For
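Algorithm 7 can be sketched in a few lines of Python. The `Node` class, the per-node `condition` predicates, and the `collect` helper are assumptions introduced for illustration; the source does not prescribe a concrete tuple or condition representation.

```python
# Hypothetical sketch of Algorithm 7: visiting one node of the category
# tree under the two exploration options.

class Node:
    def __init__(self, condition, children=None, tuples=None):
        self.condition = condition      # predicate contributed by this node
        self.children = children or []  # subcategories (empty at a leaf)
        self.tuples = tuples or []      # tuples stored at this node

def collect(node):
    """All tuples in the subtree rooted at `node`."""
    out = list(node.tuples)
    for child in node.children:
        out.extend(collect(child))
    return out

def explore(node, option, path_conditions=()):
    """What the user sees at `node`; `path_conditions` holds the
    predicates of the node's ancestors, root first."""
    conditions = path_conditions + (node.condition,)
    if option == "ShowTuples" or not node.children:
        # Leaves support only 'ShowTuples': list the tuples satisfying
        # all conditions from the root down to the current node.
        return [t for t in collect(node) if all(c(t) for c in conditions)]
    # 'ShowCat' on an intermediate node: list its subcategories.
    return [child.condition for child in node.children]

# Example: an even-number subcategory under a root that accepts everything.
root = Node(lambda t: True,
            children=[Node(lambda t: t % 2 == 0, tuples=[2, 4, 5])])
print(explore(root.children[0], "ShowTuples", (root.condition,)))  # [2, 4]
```

Selecting 'ShowCat' on the root instead would list its single subcategory, matching steps 1-5 of the pseudocode.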