Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

techniques require users to specify what information to visualize (e.g., by setting the slider or selecting

histogram buckets). Since our approach generates the information to visualize, i.e., the category tree,

our approach is a complementary to visualization techniques. If the leaf of the tree still contains many

tuples, for example, a query slider or brushing histogram can be used to further narrow down the scope.

Concerning the ranked retrieval from databases, user relevance feedback (Rui, Huang & Merhotra,

1997; Wu, Faloutsos, Sycara & Payne, 2000) is employed to learn the similarity between a result tuple

and the query, which is used to rank the query results in relational multimedia databases. The SQL query

language is extended to allow the user to specify the ranking function according to their preference for

the attributes (Kieβling, 2002; Roussos, Stavrakas & Pavlaki, 2005). Also, the importance scores of

result tuples (Agrawal, Chaudhuri, Das & Gionis, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004;

Geerts, Mannila & Terzim, 2004) are extracted automatically by analyzing the past workloads, which

can reveal what users are looking for and what they consider as important. According to the scores, the

tuples can be ranked. Ranking is a complementary to categorization. We can use ranking in addition to

our techniques (e.g., we rank tuples stored in the intermediate nodes and leaves). However, most exist-

ing work does not consider the diversity issue of user preferences. In contrast, we focus on addressing

the diversity issue of user preferences for the categorization approach.

Also there has been a lot of work on information retrieval (Card, MacKinlay & Shneiderman, 1999;

Shen, Tan & Zhai, 2005; Finkelstein &Gabrilovich, 2001; Joachims, 2006; Sugiyama & Hatano, 2004)

using query history or other implicit feedbacks. However, these work focuses on searching text docu-

ments, while this chapter focuses on searching relational data. In addition, these studies typically rank

query results, while this chapter categorizes the results. Of course, ones could use the existing hierar-

chical clustering techniques (Mitchell, 1997) to create the category tree. But the generated trees are not

easy for users to navigate. For example, how do we describe the tuples contained in a node? We can

use a representative tuple, but such a tuple may contain many attributes. It is difficult for users to read.

On the contrary, the category tree used in this chapter is easy to understand because each node just uses

one attribute.

BASICS OF CATEGORIZATION

This section introduces the query history firstly, and then defines the category tree and the category cost.

The categorical space and exploration model are finally described.

Query History

Consider a database relation D with n tuples D = { t 1 ,..., t n } with schema R { A 1 ,..., A m }. Let Dom( A i )

represent the active domain of attribute A i .

Let H be a query history {( Q 1 , U 1 , F 1 ),..., ( Q k , U k , F k )} in chronological order, where Q i is a query,

U i is a session ID (a session starts when a user connects to the database and ends when the user discon-

nects), and F i is the importance weight of the query, which is evaluated by the frequency of the query in

H . Assume that the queries in the same session are asked by the same user, which will be used later to

prune queries. The query history can be collected using the query log of commercial database systems.

Search WWH ::

Custom Search

Home