Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

. . are estimated by computing the confidences of pair-wise association rules (Agraw-

al, Mannila, Srikant, Toivonen & Verkamo, 1996) in W and R , respectively. Note that the score in the

above formula is composed of two large factors. The first factor is the global part of the score, while the

second one is the conditional part of the score. This approach is implemented in STAR (Kapoor, Das,

Hristidis, Sudarshan & Weikum, 2007).

Note that in both Chaudhuri's system (Chaudhuri & Das, 2003) and the PIR system (Chaudhuri,

Das, Hristidis & Weikum, 2004), the atomic quantities F(x) , P(x|W) , P(x|R) , P(y|x,W) and P(y|x,R) are

pre-computed and stored in special auxiliary tables for all distinct values x and y in the workload and

the database. Then at query time, both approaches first select the tuples that satisfy the query condition,

then scan and compute the score for each such tuple using the information in the auxiliary tables, and

finally returns the top-k tuples. The main drawback of both (Chaudhuri & Das, 2003) and (Chaudhuri,

Das, Hristidis & Weikum, 2004) is their high storage and maintenance costs of auxiliary tables. More-

over, they require a workload containing past user queries as input, which is not always available (e.g.,

new online databases).

and P(t A | t A ,R)

if

k

QRRE

In the QRRE xi system (Su, Wang, Huang & Lochovsky, 2006), the authors proposed an automatic rank-

ing method, which can rank the query results from an E-commerce Web database R using only data

analysis techniques. Consider a tuple t = t A ,

. ... . 1 in the result set T q of a query q that is submit-

ted by a buyer. QRRE assigns a weight w if to each attribute A if that reflects its importance to the user. w if

is evaluated by the difference (e.g., The Kullback-Leibler divergence (Duda, Hart & Stork, 2000)) be-

tween the distribution (histogram) of A if 's values over the result set T q and their distribution (histogram)

over the whole database R . The bigger the divergence, the more A if is important for a buyer. For instance,

suppose the database HouseDB contains houses for sale in France and consider the query q with the

condition 'View = Waterfront'. Intuitively, the Price values of the tuples in the result set T q distribute in

a small and dense range with a relatively high average, while the Price values of tuples in HouseDB

distribute in a large range with a relatively low average. The distribution difference shows a close cor-

relation between the unspecified attribute, namely, Price, and the query 'View = Waterfront'. In contrast,

attribute Size is less important for the user since its distribution in houses with a waterfront view may

be similar to its distribution in the entire database HouseDB. Besides the attribute weight, QRRE also

assigns a preference score p if to each attribute value t.A if . p if is computed based on the following two as-

sumptions:

,t A N

•

a product with a lower price is always more desired by buyers than a product with a higher price

if the other attributes of the two products have the same values. For example, between two houses

that differ only in their price, the cheapest one is preferred. Hence, QRRE assigns a small prefer-

ence score to a high Price value and a large preference score to a low Price value;

•

a non-Price attribute value with higher 'desirableness' for the user corresponds to a higher price.

For example, a large house, which most buyers prefer, is usually more expensive than a small one.

Thus, in the case of a non-Price attribute A if , QRRE first converts its value t.A if to a Price value pv

which is the average price of the products for A if = t.A if in the database R . Then, QRRE assigns a

large preference score to t.A if if pv is large.

Advanced Database Query Systems

Search WWH ::

Custom Search

Home