Database Reference
In-Depth Information
techniques require users to specify what information to visualize (e.g., by setting the slider or selecting
histogram buckets). Since our approach generates the information to visualize, i.e., the category tree,
our approach is a complementary to visualization techniques. If the leaf of the tree still contains many
tuples, for example, a query slider or brushing histogram can be used to further narrow down the scope.
Concerning the ranked retrieval from databases, user relevance feedback (Rui, Huang & Merhotra,
1997; Wu, Faloutsos, Sycara & Payne, 2000) is employed to learn the similarity between a result tuple
and the query, which is used to rank the query results in relational multimedia databases. The SQL query
language is extended to allow the user to specify the ranking function according to their preference for
the attributes (Kieβling, 2002; Roussos, Stavrakas & Pavlaki, 2005). Also, the importance scores of
result tuples (Agrawal, Chaudhuri, Das & Gionis, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004;
Geerts, Mannila & Terzim, 2004) are extracted automatically by analyzing the past workloads, which
can reveal what users are looking for and what they consider as important. According to the scores, the
tuples can be ranked. Ranking is a complementary to categorization. We can use ranking in addition to
our techniques (e.g., we rank tuples stored in the intermediate nodes and leaves). However, most exist-
ing work does not consider the diversity issue of user preferences. In contrast, we focus on addressing
the diversity issue of user preferences for the categorization approach.
Also there has been a lot of work on information retrieval (Card, MacKinlay & Shneiderman, 1999;
Shen, Tan & Zhai, 2005; Finkelstein &Gabrilovich, 2001; Joachims, 2006; Sugiyama & Hatano, 2004)
using query history or other implicit feedbacks. However, these work focuses on searching text docu-
ments, while this chapter focuses on searching relational data. In addition, these studies typically rank
query results, while this chapter categorizes the results. Of course, ones could use the existing hierar-
chical clustering techniques (Mitchell, 1997) to create the category tree. But the generated trees are not
easy for users to navigate. For example, how do we describe the tuples contained in a node? We can
use a representative tuple, but such a tuple may contain many attributes. It is difficult for users to read.
On the contrary, the category tree used in this chapter is easy to understand because each node just uses
one attribute.
BASICS OF CATEGORIZATION
This section introduces the query history firstly, and then defines the category tree and the category cost.
The categorical space and exploration model are finally described.
Query History
Consider a database relation D with n tuples D = { t 1 ,..., t n } with schema R { A 1 ,..., A m }. Let Dom( A i )
represent the active domain of attribute A i .
Let H be a query history {( Q 1 , U 1 , F 1 ),..., ( Q k , U k , F k )} in chronological order, where Q i is a query,
U i is a session ID (a session starts when a user connects to the database and ends when the user discon-
nects), and F i is the importance weight of the query, which is evaluated by the frequency of the query in
H . Assume that the queries in the same session are asked by the same user, which will be used later to
prune queries. The query history can be collected using the query log of commercial database systems.
Search WWH ::




Custom Search