Evaluating Top-k Skyline Queries Efficiently - Advanced Database Query Systems

Database Reference

In-Depth Information

Table 1. Estimated Skyline Cardinality

#Dimensions

Cardinality

2

191

3

2,637

4

36,431

5

503,309

6

6,953,471

7

96,065,749

8

1,327,197,371

9

18,335,909,288

10

253,319,948,365

whose complexity depends on the size of the database. Particularly, the task of evaluating queries based

on user preferences may be considerably affected by this situation.

Skyline (Börzsönyi et al., 2001) approaches have been successfully used to naturally express user

preference conditions useful to characterize relevant data in large datasets. Even though, Skyline may

be a good choice for huge data sets its cardinality may become very large as the number of criteria or

dimensions increases. The estimated cardinality of the Skyline is O(ln d-1 n) when the dimensions are

independent where n is the size of the input data and d the number of dimensions (Bentley et al., 1978).

Consider Table 1 that shows estimates of the skyline cardinality when the number of dimensions

ranges from 2 to 10 in a database comprised of 1,000,000 tuples. We may observe in Table 1 that Skyline

cardinality rapidly increases making unfeasible for users to process the whole skyline set. In conse-

quence, users may have to discard useless data manually and consider just a small subset or a subset of

the Skyline that best meet the multidimensional criteria. To identify these points, the Top-k Skyline has

been proposed (Goncalves and Vidal, 2009; Chan et al., 2006b; Lin et al., 2007). Top-k Skyline uses a

score function to induce a total order of the Skyline points, and recognizes the top-k objects based on

these criteria.

Several algorithms have been defined to compute the Top-k Skyline, but they may be very costly

(Goncalves and Vidal, 2009; Chan et al., 2006b; Lin et al., 2007; Vlachou and Vazirgiannis, 2007). First,

they require the computation of the whole Skyline; second, they execute probes of the multidimen-

sional function over the whole Skyline points. Thus, if k is much smaller than the cardinality of the

Skyline, these solutions may be very inefficient because a large number of non-necessary probes may

be evaluated, i.e., at least Skyline size minus k performed probes will be non-necessaries.

Top-k Skyline has become necessary in many real-world situations (Vlachou and Vazirgiannis, 2007),

and a wide range of ranking metrics to measure the interestingness of each Skyline tuple has been pro-

posed. Examples of these ranking metrics are skyline frequency (Chan et al., 2006b), k-dominant skyline

(Chan et al., 2006a), and k representative skyline (Lin et al., 2007). Skyline frequency ranks Skyline

in terms of the number of times in which a Skyline tuple belongs to a non-empty subset or subspace of

the multi-dimensional criteria. k-dominant skyline metric identifies Skyline points in k ≤ d dimensions

of multi-dimensional criteria. Finally, k representative skyline metric produces the k Skyline points that

have the maximal number of dominated object.

Advanced Database Query Systems

Search WWH ::

Custom Search

Home