Database Reference
In-Depth Information
We assume that all queries only contain point or range conditions, and the query is of the form:
Q
=
∧
i
∧
m
(
A
i
θ a
i
), where
a
i
∧
Dom(
A
i
),
θ
∧
{>, <, =, ≥, ≤, between, in}. Note that if
θ
is the operator
between
,
A
i
θ a
i
has the format of “
A
i
between
a
i
1
and
a
i
2
”or “
a
i
1
≤
A
i
≤
a
i
2
”, where
a
i
1
,
a
i
2
∧
Dom(
A
i
).
D
can be partitioned into a set of disjoint preference-based clusters
C
= {
C
1
,...,
C
q
}, where each
cluster
C
j
corresponds to one type of user preferences. Each
C
j
is associated with a probability
P
j
that
users are interested in
C
j
. This set of clusters over
D
is inferred from the query history. We assume that
the dataset
D
is fixed. But, in practice
D
may get modified from time to time. For the purpose of this
chapter, we will assume that the clusters are generated periodically (e.g., once a month) as the set of
queries evolve and database is updated.
Category Tree
Category Tree
Definition 1 (Category tree).
A category tree
T
(
V
,
E
,
L
) consists of a node set
V
, an edge set
E
, and a
label set
L
. Each node
v
∧
V
has a label lab(
v
)
∧
L
which specifies the condition on an attribute such that
the following should be satisfied: (i) such conditions are point or range conditions, and the bounds in
the range conditions are called partition points; (ii)
v
contains a set of tuples
N
(
v
) that satisfy all condi-
tions on its ancestors including itself, in other words,
N
(
v
) is the subset of tuples in
D
that satisfies the
conjunction of catalog labels of all nodes on the path from the root to
v
; (iii) conditions associated with
subcategory of an intermediate node
v
are on the same attribute (called partition attribute), and define
a partition of the tuples in
v
.
The label of a category (or a node), therefore, solely and unambiguously describes to the user which
tuples, among those in the tuple set of the parent of
v
, appear under
v
. Hence, user can determine whether
v
contains any item that is relevant to her or not by looking just at the label and hence decide whether
to explore or ignore
v
. The
lab
(
v
) has the following structure:
If the categorizing attribute
A
is a categorical attribute:
lab
(
v
) is of the form '
A
∧
S
' where
S
∧
Dom(
A
) (Dom(
A
) denotes the domain of values of attribute
A
in
D
). A tuple
t
satisfied the predicate
lab
(
v
) if
t
.
A
∧
S
, otherwise it is false (
t
.
A
denotes the value of tuple
t
on attribute
A
).
If the categorizing attribute
A
is a numeric attribute:
lab
(
v
) is of the form '
a
1
≤
A
<
a
2
' where
a
1
,
a
2
∧
Dom(
A
). A tuple t satisfies the predicate
lab
(
v
) is true if
a
1
≤
t
.
A
<
a
2
, otherwise it is false.
Exploration Model
Given a category tree
T
over the query results, the user starts the exploration by exploring the root node.
Suppose that she has decided to explore the node
v
, if
v
is an intermediate node, she non-deterministically
(i.e., not known in advance) chooses one of the two options:
Option 'ShowTuples':
Browse through the tuples in
N
(
v
). Note that the user needs to examine all
tuples in
N
(
v
) to make sure that she finds every tuple relevant to her.
Option 'ShowCat':
Examine the labels of all the
n
subcategories of
v
, exploring the ones relevant
to her and ignoring the rest. More specifically, she examines the label of each subcategory
v
i
of
v
star-
ing form the first subcategory and no-deterministically chooses to either explore it or ignore it. If she
chooses to ignore
v
i
, she simply proceeds and examines the next label (of
v
i
+1
). If she chooses to explore