Database Reference
In-Depth Information
We propose a clustering approach to cluster queries and summarize preferences of all users in the
system using the query history. This approach uses query pruning, pre-computation and query
clustering to deal with large query histories and large data sets.
We propose a cost-based algorithm to construct a category tree over these clusters pre-formulated
in the offline processing phrase. Unlike the existing categorization and decision tree construction
approaches, our approach shows tuples for intermediate nodes and considers the cost for users to
visit both intermediate nodes and leaves.
The rest of this chapter is organized as follows. Section 2 reviews some related work. Section 3
formally defines some notions. Section 4 describes the queries and tuples clustering method. Section 5
proposes the algorithm for the category tree construction. Section 6 shows the experimental results. The
chapter is concluded in Section 7.
RELATED WORK
Two kinds of automatic categorization approaches have been proposed by Chakrabarti et . al . (Chakrab-
arti, Chaudhuri & Hwang, 2004) and Chen et . al . (Chen & Li, 2007), respectively. Chakrabarti et . al .
proposed a greedy algorithm to construct a category tree. This algorithm uses query history of all users
in the system to infer an overall user preference as the probabilities that users are interested in each at-
tribute. Taking advantage of C4.5 decision tree constructing algorithm, Che n (Chen & Li, 2007) proposed
a two-step solution which first clusters user query history and then constructs the navigated tree for
resolving the user's personalized query. We make use of some of these ideas, but enhance the category
tree with the feature of showing tuples in the intermediate nodes and focus on how the clusters of query
history and the cost of visiting both intermediate nodes and leaves have impact on the categorization.
For providing query personalization, several approaches have been proposed to define a user profile
for each user and use the profile to decide his preferences (Koutrika &Ioannidis, 2004; Kieβling, 2002).
As Chen (Chen & Li, 2007) pointed out that, however, in real life, user profiles may not be available
because users do not want to or cannot specify their preferences (if they can, they can form the appropri-
ate query and there is no need for either ranking or categorizing). The profile may be derived from the
query history of a certain user, but this method does not work if the user is new to the system, which is
exactly true when the user needs help.
There has been a rich body of work on categorizing text documents (Dhillon, Mallela, & Kumar,
2002; Joachims, 1998; Koller,& Sahami, 1997) and Web search results (Liu, Yu & Meng, 2002; Zeng,
He, Chen, Ma & Ma, 2004). But categorizing relational data presents unique challenges and opportuni-
ties. First, relational data contains numerical values while text categorization methods treat documents
as bags of words. This chapter tries to minimize the overhead for users to navigate the generated tree
(it will be defined in Section 3), which is not considered in the existing text categorization methods.
Also there has been a rich body of work on information visualization techniques (Card, MacKinlay
& Shneiderman, 1999). Two popular techniques are dynamic query slider (Ahlberg & Shneiderman,
1994) and brushing histogram (Tweedie, Spence, Williams & Bhogal, 1994). The former allows users
to visualize dynamic query results by using sliders to represent range search conditions, and the latter
employs interactive histograms to represent each attribute and helps users exploring correlations between
attributes. Note that they do not take query history into account. Furthermore, information visualization
Search WWH ::




Custom Search