Figure 3. Total actual cost
Figure 4. Average number of selected houses per subject
$$
\mathit{ActCost} = \sum_{\forall\ \text{leaf } v \text{ visited by a subject}} \left( K_1\, N(v) + K_2 \sum_{v_i \in Anc(v)} \left( |Sib(v_i)| + \sum_{j=1}^{|Sib(v_i)|} P_{st}(N(v_j)) \right) \right) \tag{13}
$$
Unlike the category cost in Definition 2, this cost is the actual count of intermediate nodes (including siblings)
and tuples visited by a subject. We assume that the weights for visiting an intermediate node and for visiting a tuple
are equal, i.e., $K_1 = K_2 = 1$. In general, the lower the total cost, the better the categorization method.
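To make the cost concrete, the following is a minimal Python sketch of one possible reading of the actual cost: for every leaf a subject visits, add K_1 times the number of tuples in the leaf, plus K_2 times, for each ancestor of the leaf, the number of that ancestor's siblings plus the sum of P_st over those siblings. Several details are assumptions and do not come from the text: `Node`, `ancestors`, `siblings`, and `actual_cost` are hypothetical names, N(v) is taken to be the tuple count stored on a node, Anc(v) and Sib(v) are taken to exclude the node itself, and P_st is supplied by the caller as a plain function of a tuple count.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical category-tree node; `tuples` plays the role of N(v)."""
    label: str
    tuples: int
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, label: str, tuples: int) -> "Node":
        """Create a child node, attach it to this node, and return it."""
        child = Node(label, tuples, parent=self)
        self.children.append(child)
        return child


def ancestors(v: Node) -> List[Node]:
    """Anc(v): proper ancestors of v, nearest first (assumed to exclude v)."""
    out = []
    node = v.parent
    while node is not None:
        out.append(node)
        node = node.parent
    return out


def siblings(v: Node) -> List[Node]:
    """Sib(v): the other children of v's parent (assumed to exclude v)."""
    if v.parent is None:
        return []
    return [c for c in v.parent.children if c is not v]


def actual_cost(
    visited_leaves: Iterable[Node],
    p_st: Callable[[int], float],
    k1: float = 1.0,
    k2: float = 1.0,
) -> float:
    """Sum the tuple cost and the navigation cost over all visited leaves."""
    total = 0.0
    for leaf in visited_leaves:
        navigation = 0.0
        for anc in ancestors(leaf):
            sibs = siblings(anc)
            # |Sib(v_i)| plus the sum of P_st(N(v_j)) over the siblings v_j
            navigation += len(sibs) + sum(p_st(s.tuples) for s in sibs)
        total += k1 * leaf.tuples + k2 * navigation
    return total


# Toy tree with made-up counts, for illustration only.
root = Node("All", 100)
a = root.add("A", 60)
b = root.add("B", 40)
a1 = a.add("A1", 25)
a2 = a.add("A2", 35)

cost_no_peek = actual_cost([a1], p_st=lambda n: 0.0)
cost_half_peek = actual_cost([a1], p_st=lambda n: 0.5)
```

With K_1 = K_2 = 1 and P_st fixed at 0, visiting leaf A1 costs its 25 tuples plus one sibling (B) seen at the level of its ancestor A, for a total of 26. Raising P_st to 0.5 adds the expected cost of inspecting that sibling and gives 26.5.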
Figure 3 shows the total actual cost, averaged over all the subjects, for the Cost-based, C4.5-Categorization,
and Greedy algorithms. Figure 4 reports the average number of houses selected by each subject. Figure
5 reports the average category cost per selected house for these algorithms.
The results show that the category trees generated by the Cost-based algorithm have the lowest actual
cost and the lowest average cost per selected house (the number of query clusters k was set to 30). Users