Figure 5. Average category cost per selected house
Table 3. Results of survey
Categorization algorithm    No. of subjects who rated it best
Cost-based                  16
C4.5-Categorization          4
Greedy                       2
have also found more houses worth considering to buy using our algorithm than the other two algorithms,
suggesting that our method makes it easier for users to find interesting houses. The tree generated by the Greedy
algorithm has the worst results. This is expected, because the Greedy algorithm ignores differences among user
preferences and does not consider future partitions when generating category trees. The C4.5-Categorization
algorithm also has a higher cost than our method, because our algorithm uses a partitioning criterion that
considers the cost of visiting the tuples in intermediate nodes, while the C4.5-Categorization algorithm does
not. Moreover, our algorithm can use a few clusters to represent a large number of tuples without losing
accuracy (this is tested in the next experiment).
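The difference between the partitioning criteria can be illustrated with a simplified expected-cost model. The sketch below is only an illustration under stated assumptions, not the chapter's actual cost function: at each intermediate node a user either lists all tuples under that node (with a hypothetical probability `p_show`) or reads every child label at unit cost and then opens each child with a hypothetical probability `p_explore`. The `Node` class, both probabilities, and the unit label cost are all assumptions. The `p_show * num_tuples` term models the cost of visiting tuples in an intermediate node, which is the kind of term the text says the C4.5-Categorization criterion leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A category-tree node (hypothetical structure, for illustration only)."""
    label: str
    num_tuples: int            # tuples that fall under this category
    p_show: float = 1.0        # assumed prob. the user lists tuples here instead of drilling down
    p_explore: float = 1.0     # assumed prob. the user opens this node from its parent
    children: list[Node] = field(default_factory=list)


def expected_cost(node: Node, label_cost: float = 1.0) -> float:
    """Expected number of items (category labels + tuples) a user examines below `node`."""
    if not node.children:
        # Leaf: the only option is to scan every tuple in it.
        return float(node.num_tuples)
    # Option 1: list all tuples of this intermediate node directly.
    show_cost = float(node.num_tuples)
    # Option 2: read every child label, then explore each child with its probability.
    drill_cost = label_cost * len(node.children) + sum(
        child.p_explore * expected_cost(child, label_cost) for child in node.children
    )
    return node.p_show * show_cost + (1.0 - node.p_show) * drill_cost


# Usage: a root holding 100 tuples, split into two leaf categories (made-up numbers).
root = Node("root", 100, p_show=0.2, children=[
    Node("A", 60, p_explore=0.7),
    Node("B", 40, p_explore=0.5),
])
cost = expected_cost(root)
```

With these made-up numbers the tree's expected cost is about 71.2, below the cost of 100 for scanning the flat result list; a cost-based criterion in this spirit would prefer the splits that minimize such a quantity.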
The results show that, using our approach, a subject on average needs to visit no more than 8 tuples or
intermediate nodes for queries Q1, Q2, Q3, and Q4 to find the first relevant tuple, and about 18 tuples or
intermediate nodes for Q5. The total navigational cost for our algorithm is less than 45 for the first four
queries and less than 80 for Q5. At the end of the study, we asked subjects which categorization algorithm
worked best for them across all the queries they tried. The results of that survey are reported in Table 3
and show that a majority of subjects (16 of 22) considered our algorithm the best.
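The two measures reported above, the number of tuples or intermediate nodes visited up to the first relevant tuple and the total navigational cost, could be computed from per-subject navigation logs roughly as follows. This is a minimal sketch: the `Visit` record and log format are hypothetical, and both the choice to count each visited node or tuple as one unit of cost and the choice to include the relevant tuple itself in the count are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Visit:
    """One step in a hypothetical navigation log."""
    kind: str               # "node" = intermediate category, "tuple" = result row
    relevant: bool = False  # set only for tuples the subject judged relevant


def cost_to_first_relevant(session: List[Visit]) -> Optional[int]:
    """Items visited up to and including the first relevant tuple (None if none was found)."""
    for position, visit in enumerate(session, start=1):
        if visit.kind == "tuple" and visit.relevant:
            return position
    return None


def total_cost(session: List[Visit]) -> int:
    """Assumption: every visited node or tuple counts as one unit of navigational cost."""
    return len(session)


def average_costs(sessions: List[List[Visit]]) -> Tuple[float, float]:
    """Average both measures over subjects.

    Sessions without any relevant tuple are skipped for the first measure;
    statistics.mean raises StatisticsError if no session found one.
    """
    firsts = [c for c in (cost_to_first_relevant(s) for s in sessions) if c is not None]
    totals = [total_cost(s) for s in sessions]
    return mean(firsts), mean(totals)


# Usage with two made-up sessions.
N, T, R = Visit("node"), Visit("tuple"), Visit("tuple", relevant=True)
sessions = [
    [N, N, T, R, T],  # first relevant tuple at step 4, total cost 5
    [N, R, N, R],     # first relevant tuple at step 2, total cost 4
]
avg_first, avg_total = average_costs(sessions)
```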
Query Clustering Experiment
This experiment tests the quality of the query clustering algorithm, whose accuracy strongly affects the
accuracy of the tuple clusters. We first translated each query in the query history into its corresponding
vector representation, and then adopted the following strategies to generate synthetic datasets. Every
dataset is characterized by four parameters: n, m, l, and noise. Here the n