Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

•

We propose a clustering approach to cluster queries and summarize preferences of all users in the

system using the query history. This approach uses query pruning, pre-computation and query

clustering to deal with large query histories and large data sets.

•

We propose a cost-based algorithm to construct a category tree over these clusters pre-formulated

in the offline processing phrase. Unlike the existing categorization and decision tree construction

approaches, our approach shows tuples for intermediate nodes and considers the cost for users to

visit both intermediate nodes and leaves.

The rest of this chapter is organized as follows. Section 2 reviews some related work. Section 3

formally defines some notions. Section 4 describes the queries and tuples clustering method. Section 5

proposes the algorithm for the category tree construction. Section 6 shows the experimental results. The

chapter is concluded in Section 7.

RELATED WORK

Two kinds of automatic categorization approaches have been proposed by Chakrabarti et . al . (Chakrab-

arti, Chaudhuri & Hwang, 2004) and Chen et . al . (Chen & Li, 2007), respectively. Chakrabarti et . al .

proposed a greedy algorithm to construct a category tree. This algorithm uses query history of all users

in the system to infer an overall user preference as the probabilities that users are interested in each at-

tribute. Taking advantage of C4.5 decision tree constructing algorithm, Che n (Chen & Li, 2007) proposed

a two-step solution which first clusters user query history and then constructs the navigated tree for

resolving the user's personalized query. We make use of some of these ideas, but enhance the category

tree with the feature of showing tuples in the intermediate nodes and focus on how the clusters of query

history and the cost of visiting both intermediate nodes and leaves have impact on the categorization.

For providing query personalization, several approaches have been proposed to define a user profile

for each user and use the profile to decide his preferences (Koutrika &Ioannidis, 2004; Kieβling, 2002).

As Chen (Chen & Li, 2007) pointed out that, however, in real life, user profiles may not be available

because users do not want to or cannot specify their preferences (if they can, they can form the appropri-

ate query and there is no need for either ranking or categorizing). The profile may be derived from the

query history of a certain user, but this method does not work if the user is new to the system, which is

exactly true when the user needs help.

There has been a rich body of work on categorizing text documents (Dhillon, Mallela, & Kumar,

2002; Joachims, 1998; Koller,& Sahami, 1997) and Web search results (Liu, Yu & Meng, 2002; Zeng,

He, Chen, Ma & Ma, 2004). But categorizing relational data presents unique challenges and opportuni-

ties. First, relational data contains numerical values while text categorization methods treat documents

as bags of words. This chapter tries to minimize the overhead for users to navigate the generated tree

(it will be defined in Section 3), which is not considered in the existing text categorization methods.

Also there has been a rich body of work on information visualization techniques (Card, MacKinlay

& Shneiderman, 1999). Two popular techniques are dynamic query slider (Ahlberg & Shneiderman,

1994) and brushing histogram (Tweedie, Spence, Williams & Bhogal, 1994). The former allows users

to visualize dynamic query results by using sliders to represent range search conditions, and the latter

employs interactive histograms to represent each attribute and helps users exploring correlations between

attributes. Note that they do not take query history into account. Furthermore, information visualization

Search WWH ::

Custom Search

Home