Database Reference
In-Depth Information
We assume that all queries only contain point or range conditions, and the query is of the form: Q =
i m ( A i θ a i ), where a i Dom( A i ), θ {>, <, =, ≥, ≤, between, in}. Note that if θ is the operator between ,
A i θ a i has the format of “ A i between a i 1 and a i 2 ”or “ a i 1 A i a i 2 ”, where a i 1 , a i 2 Dom( A i ).
D can be partitioned into a set of disjoint preference-based clusters C = { C 1 ,..., C q }, where each
cluster C j corresponds to one type of user preferences. Each C j is associated with a probability P j that
users are interested in C j . This set of clusters over D is inferred from the query history. We assume that
the dataset D is fixed. But, in practice D may get modified from time to time. For the purpose of this
chapter, we will assume that the clusters are generated periodically (e.g., once a month) as the set of
queries evolve and database is updated.
Category Tree
Category Tree
Definition 1 (Category tree). A category tree T ( V , E , L ) consists of a node set V , an edge set E , and a
label set L . Each node v V has a label lab( v ) L which specifies the condition on an attribute such that
the following should be satisfied: (i) such conditions are point or range conditions, and the bounds in
the range conditions are called partition points; (ii) v contains a set of tuples N ( v ) that satisfy all condi-
tions on its ancestors including itself, in other words, N ( v ) is the subset of tuples in D that satisfies the
conjunction of catalog labels of all nodes on the path from the root to v ; (iii) conditions associated with
subcategory of an intermediate node v are on the same attribute (called partition attribute), and define
a partition of the tuples in v .
The label of a category (or a node), therefore, solely and unambiguously describes to the user which
tuples, among those in the tuple set of the parent of v , appear under v . Hence, user can determine whether
v contains any item that is relevant to her or not by looking just at the label and hence decide whether
to explore or ignore v . The lab ( v ) has the following structure:
If the categorizing attribute A is a categorical attribute: lab ( v ) is of the form ' A S ' where S
Dom( A ) (Dom( A ) denotes the domain of values of attribute A in D ). A tuple t satisfied the predicate lab
( v ) if t . A S , otherwise it is false ( t . A denotes the value of tuple t on attribute A ).
If the categorizing attribute A is a numeric attribute: lab ( v ) is of the form ' a 1 A < a 2 ' where a 1 ,
a 2 Dom( A ). A tuple t satisfies the predicate lab ( v ) is true if a 1 t . A < a 2 , otherwise it is false.
Exploration Model
Given a category tree T over the query results, the user starts the exploration by exploring the root node.
Suppose that she has decided to explore the node v , if v is an intermediate node, she non-deterministically
(i.e., not known in advance) chooses one of the two options:
Option 'ShowTuples': Browse through the tuples in N ( v ). Note that the user needs to examine all
tuples in N ( v ) to make sure that she finds every tuple relevant to her.
Option 'ShowCat': Examine the labels of all the n subcategories of v , exploring the ones relevant
to her and ignoring the rest. More specifically, she examines the label of each subcategory v i of v star-
ing form the first subcategory and no-deterministically chooses to either explore it or ignore it. If she
chooses to ignore v i , she simply proceeds and examines the next label (of v i +1 ). If she chooses to explore
Search WWH ::




Custom Search