Database Reference
In-Depth Information
the top-k problem, T q could alternatively be seen as a set of N sorted lists L i of T q (the number of tuples
in T q ) pairs (t,s ),t i Î . Hence, for each attribute A i , there is a sorted list L i in which all T q results
are ranked in descendant order. Entries in the lists could be accessed randomly from the tuple identifier
or sequentially from the sorted score. The main issue for top-k query processing is then to obtain the k
tuples with the highest overall scores computed according to a given aggregation function agg(s ,
...
,s )
N
1
of the attribute scores s i . The aggregation function agg used to combine ranking criteria has to be mono-
tone; that is, agg must satisfy the following property:
agg(s ,
...
,s )
£
agg(s' ,
...
,s' )
if
s
£
s'
for every i
1
N
1
N
i
i
The naive algorithm consists in looking at every entry t,s i in each of the sorted lists L i , computing
the overall grade of every object t , and returning the top k answers. Obviously, this approach is un-
necessarily expensive as it does not take advantage of the fact that only the k best answers are part of
the query answer and the remaining answers do not need to be processed. Several query answering
algorithms have been proposed in the literature to efficiently process top-k queries. The most popular is
the Threshold Algorithm (TA) independently proposed by several groups (Fagin, Lotem & Naor, 2001;
Nepal & Ramakrishna, 1999; Güntzer, Balke & Kießling, 2000).
The TA algorithm works as follows:
1. do sorted access in parallel to each of the N sorted lists. As a tuple t is seen under sorted access in
some list, do random access to the other lists to find the score of t in every list and compute the
overall score of t . Maintain in a set TOP the k seen tuples whose overall scores are the highest
among all tuples seen so far;
2. for each list L i , let s i be the last score seen under sorted access in L i . Define the threshold to be
Ä =
agg ... 1 . If TOP involves k tuples whose overall scores are higher than or equal to τ ,
then stop doing sorted access to the lists. Otherwise, go to step 1;
3. return TOP.
(s ,
,s )
N
Table 2 shows an example with three Lists L 1 , L 2 and L 3 . Assume that the top-k query requests the
top-2 tuples and the aggregation function agg is the summation function SUM. TA first scans the first
tuples in all lists which are t 5 , t 4 , and t 3 . Hence the threshold value at this time is τ = 21+34+30 = 85.
Then TA calculates the aggregated score for each tuple seen so far by random accesses to the three lists.
We get the aggregated score for t 5 SUM( t 5 ) = 21+9+7 = 37, for t 4 SUM( t 4 ) = 11+34+14 = 59 and for
t 3 SUM( t 3 ) = 11+26+30 = 67. TA maintains the top-2 tuples seen so far which are t 3 and t 4 . As neither
of them has an aggregated score greater than the current threshold value τ = 85, TA continues to scan
the tuples at the second positions of all lists. At this time, the threshold value is recomputed as τ = 17
+ 29 + 14 = 60. The new tuples seen are t 1 and t 2 . Their aggregated scores are retrieved and calculated
as SUM( t 1 ) = 0 + 29 + 0 = 29 and SUM( t 2 ) = 17 + 0 + 1 = 18. TA still keeps tuples t 3 and t 4 since their
aggregated scores are higher than those of both t 1 and t 2 . Since only t 3 has an aggregated score greater
than the current threshold value τ = 60, TA algorithm continues to scan the tuples in the third positions.
Now the threshold value is τ = 11 + 29 + 9 = 49 and the new tuple seen is t 0 . TA computes the aggregated
score for t 0 which is 38. t 3 and t 4 still maintain the two highest aggregated scores which are now greater
Search WWH ::




Custom Search