Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

We assume that all queries only contain point or range conditions, and the query is of the form: Q =

∧ i ∧ m ( A i θ a i ), where a i ∧ Dom( A i ), θ ∧ {>, <, =, ≥, ≤, between, in}. Note that if θ is the operator between ,

A i θ a i has the format of “ A i between a i 1 and a i 2 ”or “ a i 1 ≤ A i ≤ a i 2 ”, where a i 1 , a i 2 ∧ Dom( A i ).

D can be partitioned into a set of disjoint preference-based clusters C = { C 1 ,..., C q }, where each

cluster C j corresponds to one type of user preferences. Each C j is associated with a probability P j that

users are interested in C j . This set of clusters over D is inferred from the query history. We assume that

the dataset D is fixed. But, in practice D may get modified from time to time. For the purpose of this

chapter, we will assume that the clusters are generated periodically (e.g., once a month) as the set of

queries evolve and database is updated.

Category Tree

Definition 1 (Category tree). A category tree T ( V , E , L ) consists of a node set V , an edge set E , and a

label set L . Each node v ∧ V has a label lab( v ) ∧ L which specifies the condition on an attribute such that

the following should be satisfied: (i) such conditions are point or range conditions, and the bounds in

the range conditions are called partition points; (ii) v contains a set of tuples N ( v ) that satisfy all condi-

tions on its ancestors including itself, in other words, N ( v ) is the subset of tuples in D that satisfies the

conjunction of catalog labels of all nodes on the path from the root to v ; (iii) conditions associated with

subcategory of an intermediate node v are on the same attribute (called partition attribute), and define

a partition of the tuples in v .

The label of a category (or a node), therefore, solely and unambiguously describes to the user which

tuples, among those in the tuple set of the parent of v , appear under v . Hence, user can determine whether

v contains any item that is relevant to her or not by looking just at the label and hence decide whether

to explore or ignore v . The lab ( v ) has the following structure:

If the categorizing attribute A is a categorical attribute: lab ( v ) is of the form ' A ∧ S ' where S ∧

Dom( A ) (Dom( A ) denotes the domain of values of attribute A in D ). A tuple t satisfied the predicate lab

( v ) if t . A ∧ S , otherwise it is false ( t . A denotes the value of tuple t on attribute A ).

If the categorizing attribute A is a numeric attribute: lab ( v ) is of the form ' a 1 ≤ A < a 2 ' where a 1 ,

a 2 ∧ Dom( A ). A tuple t satisfies the predicate lab ( v ) is true if a 1 ≤ t . A < a 2 , otherwise it is false.

Exploration Model

Given a category tree T over the query results, the user starts the exploration by exploring the root node.

Suppose that she has decided to explore the node v , if v is an intermediate node, she non-deterministically

(i.e., not known in advance) chooses one of the two options:

Option 'ShowTuples': Browse through the tuples in N ( v ). Note that the user needs to examine all

tuples in N ( v ) to make sure that she finds every tuple relevant to her.

Option 'ShowCat': Examine the labels of all the n subcategories of v , exploring the ones relevant

to her and ignoring the rest. More specifically, she examines the label of each subcategory v i of v star-

ing form the first subcategory and no-deterministically chooses to either explore it or ignore it. If she

chooses to ignore v i , she simply proceeds and examines the next label (of v i +1 ). If she chooses to explore

Search WWH ::

Custom Search

Home