Database Reference
In-Depth Information
value indicates that a point may be dominated on smaller subsets of the multidimensional criteria, and
it can be considered a very good point because it may dominate in many of the other subsets; in con-
trast, a skyline point with a low Skyline Frequency value shows that other skyline points dominate it in
subsets of the multidimensional criteria. Approaches in (Pei et al., 2006; Yuan et al., 2005) propose two
algorithms to compute Skyline Frequency values by building the Skycube or the union of the skylines
of the non-empty subsets of the multidimensional criteria. The Bottom-Up Skycube algorithm (BUS)
(Yuan et al., 2005) identifies the Skycube of d dimensions in a bottom-up fashion. BUS sorts dataset on
each dimension of the multidimensional criteria in a list and it calculates the skyline points from one
to d dimensions. BUS makes use of the skyline point properties to speed the Skycube computation. On
the other hand, the Top-Down Skycube algorithm (TDS) (Pei et al., 2006) computes the Skycube in a
top-down manner based on a Divide and Conquer (DC) Skyline algorithm (Börzsönyi et al., 2001). TDS
computes a minimal set of paths in the lattice structure of the Skycube and then, it identifies skylines
in these paths. Thus, multiple related skylines are built simultaneously. BUS and TDS can be used to
compute the Top-k Skyline. However, some overhead may have to be paid, because both algorithms
compute the Skycube completely. (Goncalves and Vidal, 2009) propose an index-based technique called
TKSI, to compute the Top-k Sky-line points by just probing the minimal subset of incomparable points
and using a given score function to distinguish the best points.
TOP-K SKYLINE
The Top-k Skyline approach identifies the objects that best meet the multi-dimensional criteria based
on the Skyline Frequency Metric (SFM). The problem of efficient implementation of Top-k Skyline
(EITKS) is defined as the problem of building the set ξ <sf,m,k> minimizing the number of non-necessary
probes; a probe p on the multidimensional criteria function m and the Skyline Frequency metric sf is
necessary if and only if p is perfomed on an object o ξ <sf,m,k> .
Skyline frequency of an object o returns the number of times in which o belongs to a non-empty
subset the multidimensional criteria. There are 2 q - 1 non-empty subspaces for multidimensional criteria
m with q dimensions. In Figure 1, we illustrate that the structure of lattice of for our example contains
2 4 - 1 subspaces. Moreover, there exists a containment relationship property between the skylines of
subspaces when data are non-duplicated: Given two subspaces U , V where U V , then SKY U SKY V .
Since o SKY U , none object may be better than o in all criteria of V (Chan et al., 2006b). For example,
the object 8846 SKY {VLDB} also belongs to SKY {VLDB,SIGMOD} because of Skyline definition formula,
i.e., any object o in SKY {VLDB,SIGMOD} may dominate 8846 in {SIGMOD} but not in {VLDB, SIGMOD}.
BUS is based on the containment relationship property in order to save probes among subspaces.
Instead of constructing Skylines of each subspace individually, BUS builds the lattice in a bottom-up
fashion sharing results of the skylines in order to minimize probes of the multidimensional criteria. To
illustrate the behavior of the BUS with an example, consider the following query: the top-1 candidates
with maximum number of papers in EDBT, ICDE, VLDB, and SIGMOD. To answer this query, BUS
calculates the Skyline for each subspace of 1-dimension EDBT, ICDE, VLDB, SIGMOD; then shares
results for the skyline of subspaces of 2-dimensions {EDBT,ICDE},{EDBT,VLDB}, {EDBT,SIGMOD},
{ICDE,VLDB}, {ICDE,SIGMOD}, {VLDB, SIGMOD} using the containment relationship property;
so and so until 2 4 - 1 all Skylines for subspaces of multidimensional criteria are built. The Skylines of
each subspace are in Table 3.
Search WWH ::




Custom Search