Database Reference
In-Depth Information
v i , she does so recursively based on the same exploration model, i.e., by choosing either 'ShowTuples'
or 'ShowCat' if it is an intermediate node or by choosing 'ShowTuples' if it is a leaf node. After she
finishes the exploration of v i , she goes ahead and examines the label of the next subcategory of v (of
v i +1 ). When the user reaches the end of the subcategory list, she is done. Note that we assume that the
user examines the subcategories in the order it appears under v ; it can be from top to bottom or from left
to right depending on how the tree is rendered by the user interface.
Category Cost
We assume that a user visits T in a top-bottom fashion, and stops at a node (intermediate node or leaf
node) that contains the tuples that she is interested in.
Let v be a node (intermediate node or leaf node) of T with N ( v ) tuples and C j be a cluster in C . C j
v denotes that v contains tuples in C j . Anc ( v ) denotes the set of ancestors of v including v itself, but
excluding the root. Sib ( v ) denotes the set of nodes at the same level as the node v including itself. Let
K 1 and K 2 represent the weights of visiting a tuples in the node and visiting an intermediate tree node,
respectively. Let P j be the probability that users will be interested in cluster C j , and let P st be the prob-
ability that user goes for option 'ShowTuples' for an intermediate node v given that she explores v . The
category cost is defined as follows.
Definition 2 (category cost). The category cost
Cost ( ,
T C
)
=
|
Sib v
(
)|
(1)
i
P K N v
(
|
( ) |
+
K
(|
Sib v
( ) |
+
P N v
(
(
))
))
j
1
2
i
st
j
j
v Node T
(
)
C v
∩ ≠
f
v Anc v
( )
j
=
1
j
i
The category cost of a leaf node v consists of three terms: the cost of visiting tuples in leaf node v ,
the cost of visiting intermediate nodes, and the cost of visiting tuples in intermediate nodes if the user
chooses to explore it. Users need to examine the labels of all sibling nodes to select a node on the path
from the root to v , thus users have to visit
i intermediate tree nodes. Users may also
like to examine the tuples of some sibling nodes on the path from the root to v , thus users have to visit
( ) Sib v i
|
( ) |
v Anc v
|
Sib v
(
)|
i
P N v
st
(
(
))
tuples of intermediate tree nodes. When users reach the node v which they
j
j
v Anc v
( )
j
=
1
i
would like to explore it, they have to look at N ( v ) tuples in v . P st is the probability that the user exploring
v using 'ShowTuples', P st = N ( A v )/ N , where N ( A v ) denotes the number of queries in the query history
that contain selection condition on attribute A of node v and N is the total number of queries in the
query history. Definition 2 computes the expected cost over all clusters and nodes.
Query Results Categorization
For resolve the problem of query results categorization, we propose a solution which consists of two
steps, the offline data clustering step and the online category tree construction step. In this Section, we
first describe data clustering and then present the category tree construction approach.
Search WWH ::




Custom Search