Figure 2. Tree generated by the C4.5-Categorization method
to create the navigational tree. However, the resulting category tree (Figure 2) has two drawbacks: (i) users
cannot explore the tuples under intermediate nodes, i.e., they can access only the tuples under the leaf
nodes; (ii) if a user chooses to explore the tuples of an intermediate node, the cost of visiting those
tuples is not considered.
User preferences are often difficult to obtain because users do not want to spend extra effort specifying
them. Addressing the diversity of user preferences therefore raises two major challenges: (i) how to
summarize the different kinds of user preferences from the behavior of all users already in the system,
and (ii) how to categorize or rank query results according to a specific user's preferences.
Query history has been widely applied to infer the preferences of all users in the system (Agrawal,
Chaudhuri, Das & Gionis, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004; Chakrabarti, Chaudhuri
& Hwang, 2004; Das, Hristidis, Kapoor & Sudarshan, 2006).
In this chapter, we present techniques to automatically categorize the results of user queries on Web
databases in order to reduce information overload. We propose a two-step approach that addresses both
challenges for the categorization case. The first step analyzes the query history of all users already in
the system offline and generates a set of clusters over the data. Each cluster corresponds to one type of
user preference and is associated with the probability that users are interested in it. We assume that an
individual user's preferences can be represented as a subset of these clusters. When a specific user
submits a query, the second step first computes the similarity between the query and the representative
queries of the query clusters, and from this infers the data clusters the user is likely to be interested
in. Next, the data clusters generated in the first step are intersected with the query answers, and a
labeled hierarchical category structure is generated automatically based on the contents of the tuples in
the answer set. In this way, a category tree is constructed on the fly over the intersected clusters and
finally presented to the user.
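The two steps described above can be illustrated with a small Python sketch. This is only a sketch under stated assumptions and not the chapter's own algorithm. Queries are modeled as sets of hypothetical `attribute=value` terms. Jaccard similarity stands in for the unspecified query-similarity measure. Each query cluster is mapped to exactly one data cluster. The interest probability of a data cluster is estimated as the fraction of past answer sets that overlap it. All names (`DataCluster`, `QueryCluster`, `estimate_probabilities`, `relevant_clusters`, `threshold`) are hypothetical. Building the labeled category tree over the returned clusters is omitted.

```python
from dataclasses import dataclass

# Hypothetical representation: a query is a set of "attribute=value" terms.
Query = frozenset


@dataclass
class DataCluster:
    cluster_id: str
    tuple_ids: set            # ids of the tuples in this data cluster
    probability: float = 0.0  # estimated chance that a user is interested in it


@dataclass
class QueryCluster:
    representative: Query     # representative query of this query cluster
    data_cluster_id: str      # data cluster this preference type maps to (assumption)


def estimate_probabilities(data_clusters, history_answers):
    """Offline step (assumed estimator): fraction of historical answer sets
    that share at least one tuple with each data cluster."""
    n = len(history_answers)
    for c in data_clusters.values():
        hits = sum(1 for ans in history_answers if c.tuple_ids & ans)
        c.probability = hits / n if n else 0.0


def jaccard(a, b):
    """Assumed similarity measure between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevant_clusters(query, answer_ids, query_clusters, data_clusters, threshold=0.5):
    """Online step: infer data clusters from similar representative queries,
    intersect each with the answer set, and order them by probability."""
    result = {}
    for qc in query_clusters:
        if jaccard(query, qc.representative) >= threshold:
            dc = data_clusters[qc.data_cluster_id]
            overlap = dc.tuple_ids & answer_ids
            if overlap:  # keep only clusters that intersect the answers
                result[dc.cluster_id] = overlap
    # Most likely preference first; ties keep insertion order.
    return sorted(result.items(), key=lambda kv: -data_clusters[kv[0]].probability)


# Toy usage with made-up data (not from the chapter).
data = {
    "cheap": DataCluster("cheap", {1, 2, 3}),
    "luxury": DataCluster("luxury", {4, 5}),
}
history = [{1, 2}, {2, 3}, {4}]  # answer sets of past queries
estimate_probabilities(data, history)  # cheap -> 2/3, luxury -> 1/3

qcs = [
    QueryCluster(Query({"city=Seattle", "price<200k"}), "cheap"),
    QueryCluster(Query({"city=Seattle", "view=water"}), "luxury"),
]
q = Query({"city=Seattle", "price<200k", "bedrooms=3"})
answers = {2, 3, 5}
print(relevant_clusters(q, answers, qcs, data, threshold=0.25))
# [('cheap', {2, 3}), ('luxury', {5})]
```

Sorting the intersected clusters by their offline probability is one plausible way to show the most likely preferences first. The cost-aware tree construction that the approach relies on may weigh the clusters differently.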
This chapter presents a domain-independent approach to addressing the information overload problem.
The contributions are summarized as follows: