Database Reference
In-Depth Information
Table 1. Estimated Skyline Cardinality
#Dimensions
Cardinality
2
191
3
2,637
4
36,431
5
503,309
6
6,953,471
7
96,065,749
8
1,327,197,371
9
18,335,909,288
10
253,319,948,365
whose complexity depends on the size of the database. Particularly, the task of evaluating queries based
on user preferences may be considerably affected by this situation.
Skyline (Börzsönyi et al., 2001) approaches have been successfully used to naturally express user
preference conditions useful to characterize relevant data in large datasets. Even though, Skyline may
be a good choice for huge data sets its cardinality may become very large as the number of criteria or
dimensions increases. The estimated cardinality of the Skyline is O(ln d-1 n) when the dimensions are
independent where n is the size of the input data and d the number of dimensions (Bentley et al., 1978).
Consider Table 1 that shows estimates of the skyline cardinality when the number of dimensions
ranges from 2 to 10 in a database comprised of 1,000,000 tuples. We may observe in Table 1 that Skyline
cardinality rapidly increases making unfeasible for users to process the whole skyline set. In conse-
quence, users may have to discard useless data manually and consider just a small subset or a subset of
the Skyline that best meet the multidimensional criteria. To identify these points, the Top-k Skyline has
been proposed (Goncalves and Vidal, 2009; Chan et al., 2006b; Lin et al., 2007). Top-k Skyline uses a
score function to induce a total order of the Skyline points, and recognizes the top-k objects based on
these criteria.
Several algorithms have been defined to compute the Top-k Skyline, but they may be very costly
(Goncalves and Vidal, 2009; Chan et al., 2006b; Lin et al., 2007; Vlachou and Vazirgiannis, 2007). First,
they require the computation of the whole Skyline; second, they execute probes of the multidimen-
sional function over the whole Skyline points. Thus, if k is much smaller than the cardinality of the
Skyline, these solutions may be very inefficient because a large number of non-necessary probes may
be evaluated, i.e., at least Skyline size minus k performed probes will be non-necessaries.
Top-k Skyline has become necessary in many real-world situations (Vlachou and Vazirgiannis, 2007),
and a wide range of ranking metrics to measure the interestingness of each Skyline tuple has been pro-
posed. Examples of these ranking metrics are skyline frequency (Chan et al., 2006b), k-dominant skyline
(Chan et al., 2006a), and k representative skyline (Lin et al., 2007). Skyline frequency ranks Skyline
in terms of the number of times in which a Skyline tuple belongs to a non-empty subset or subspace of
the multi-dimensional criteria. k-dominant skyline metric identifies Skyline points in k ≤ d dimensions
of multi-dimensional criteria. Finally, k representative skyline metric produces the k Skyline points that
have the maximal number of dominated object.
 
Search WWH ::




Custom Search