Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

the top-k problem, T q could alternatively be seen as a set of N sorted lists L i of T q (the number of tuples

in T q ) pairs (t,s ),t i Î . Hence, for each attribute A i , there is a sorted list L i in which all T q results

are ranked in descendant order. Entries in the lists could be accessed randomly from the tuple identifier

or sequentially from the sorted score. The main issue for top-k query processing is then to obtain the k

tuples with the highest overall scores computed according to a given aggregation function agg(s ,

...

,s )

of the attribute scores s i . The aggregation function agg used to combine ranking criteria has to be mono-

tone; that is, agg must satisfy the following property:

agg(s ,

...

,s )

agg(s' ,

...

,s' )

for every i

The naive algorithm consists in looking at every entry t,s i in each of the sorted lists L i , computing

the overall grade of every object t , and returning the top k answers. Obviously, this approach is un-

necessarily expensive as it does not take advantage of the fact that only the k best answers are part of

the query answer and the remaining answers do not need to be processed. Several query answering

algorithms have been proposed in the literature to efficiently process top-k queries. The most popular is

the Threshold Algorithm (TA) independently proposed by several groups (Fagin, Lotem & Naor, 2001;

Nepal & Ramakrishna, 1999; Güntzer, Balke & Kießling, 2000).

The TA algorithm works as follows:

1. do sorted access in parallel to each of the N sorted lists. As a tuple t is seen under sorted access in

some list, do random access to the other lists to find the score of t in every list and compute the

overall score of t . Maintain in a set TOP the k seen tuples whose overall scores are the highest

among all tuples seen so far;

2. for each list L i , let s i be the last score seen under sorted access in L i . Define the threshold to be

Ä =

agg ... 1 . If TOP involves k tuples whose overall scores are higher than or equal to τ ,

then stop doing sorted access to the lists. Otherwise, go to step 1;

3. return TOP.

(s ,

,s )

Table 2 shows an example with three Lists L 1 , L 2 and L 3 . Assume that the top-k query requests the

top-2 tuples and the aggregation function agg is the summation function SUM. TA first scans the first

tuples in all lists which are t 5 , t 4 , and t 3 . Hence the threshold value at this time is τ = 21+34+30 = 85.

Then TA calculates the aggregated score for each tuple seen so far by random accesses to the three lists.

We get the aggregated score for t 5 SUM( t 5 ) = 21+9+7 = 37, for t 4 SUM( t 4 ) = 11+34+14 = 59 and for

t 3 SUM( t 3 ) = 11+26+30 = 67. TA maintains the top-2 tuples seen so far which are t 3 and t 4 . As neither

of them has an aggregated score greater than the current threshold value τ = 85, TA continues to scan

the tuples at the second positions of all lists. At this time, the threshold value is recomputed as τ = 17

+ 29 + 14 = 60. The new tuples seen are t 1 and t 2 . Their aggregated scores are retrieved and calculated

as SUM( t 1 ) = 0 + 29 + 0 = 29 and SUM( t 2 ) = 17 + 0 + 1 = 18. TA still keeps tuples t 3 and t 4 since their

aggregated scores are higher than those of both t 1 and t 2 . Since only t 3 has an aggregated score greater

than the current threshold value τ = 60, TA algorithm continues to scan the tuples in the third positions.

Now the threshold value is τ = 11 + 29 + 9 = 49 and the new tuple seen is t 0 . TA computes the aggregated

score for t 0 which is 38. t 3 and t 4 still maintain the two highest aggregated scores which are now greater

Advanced Database Query Systems

Search WWH ::

Custom Search

Home