Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

v i , she does so recursively based on the same exploration model, i.e., by choosing either 'ShowTuples'

or 'ShowCat' if it is an intermediate node or by choosing 'ShowTuples' if it is a leaf node. After she

finishes the exploration of v i , she goes ahead and examines the label of the next subcategory of v (of

v i +1 ). When the user reaches the end of the subcategory list, she is done. Note that we assume that the

user examines the subcategories in the order it appears under v ; it can be from top to bottom or from left

to right depending on how the tree is rendered by the user interface.

Category Cost

We assume that a user visits T in a top-bottom fashion, and stops at a node (intermediate node or leaf

node) that contains the tuples that she is interested in.

Let v be a node (intermediate node or leaf node) of T with N ( v ) tuples and C j be a cluster in C . C j ∩

v ≠ ∧ denotes that v contains tuples in C j . Anc ( v ) denotes the set of ancestors of v including v itself, but

excluding the root. Sib ( v ) denotes the set of nodes at the same level as the node v including itself. Let

K 1 and K 2 represent the weights of visiting a tuples in the node and visiting an intermediate tree node,

respectively. Let P j be the probability that users will be interested in cluster C j , and let P st be the prob-

ability that user goes for option 'ShowTuples' for an intermediate node v given that she explores v . The

category cost is defined as follows.

Definition 2 (category cost). The category cost

Cost ( ,

T C

)

Sib v

(

(1)

∑

P K N v

(

( ) |

Sib v

( ) |

P N v

(

))

v Node T

∈

(

)

C v

∩ ≠

v Anc v

∈

( )

The category cost of a leaf node v consists of three terms: the cost of visiting tuples in leaf node v ,

the cost of visiting intermediate nodes, and the cost of visiting tuples in intermediate nodes if the user

chooses to explore it. Users need to examine the labels of all sibling nodes to select a node on the path

from the root to v , thus users have to visit

i ∈ ∑ intermediate tree nodes. Users may also

like to examine the tuples of some sibling nodes on the path from the root to v , thus users have to visit

( ) Sib v i

( ) |

v Anc v

Sib v

(

∑

P N v

(

))

tuples of intermediate tree nodes. When users reach the node v which they

v Anc v

∈

( )

would like to explore it, they have to look at N ( v ) tuples in v . P st is the probability that the user exploring

v using 'ShowTuples', P st = N ( A v )/ N , where N ( A v ) denotes the number of queries in the query history

that contain selection condition on attribute A of node v and N is the total number of queries in the

query history. Definition 2 computes the expected cost over all clusters and nodes.

Query Results Categorization

For resolve the problem of query results categorization, we propose a solution which consists of two

steps, the offline data clustering step and the online category tree construction step. In this Section, we

first describe data clustering and then present the category tree construction approach.

Advanced Database Query Systems

Search WWH ::

Custom Search

Home