Database Reference
In-Depth Information
Finally, the probes of the functions m and sf required to identify the top-k objects in the Skyline
correspond to necessary probes, i.e., a probe p of the functions m or f is necessary if and only if p is
performed on an object o ξ <sf,m,k> . In this work, we define an algorithm that minimizes the number of
non-necessary probes, while computing the Top-k Skyline objects with respect to the functions m and sf .
Related Work
Skyline (Börzsönyi et al., 2001) and Top-k (Carey and Kossmann, 1997) approaches have been defined
in the context of databases to distinguish the best points that satisfy a given ranking condition. A Skyline-
based technique identifies a partially ordered set of points whose order is induced by criteria comprised
of conditions on equally important parameters. Top-k approaches select the top-k elements based on a
score function or discriminatory criteria that induce a totally ordered of the input set.
(Bentley et al., 1978) proposed the first Skyline algorithm, referred to as the maximum vector prob-
lem and it is based on the divide & conquer principle. Progress has been made as of recent on how to
compute efficiently such queries in a relational system and over large datasets. Block-Nested-Loops
(BNL) (Börzsönyi et al., 2001), Sort-Filter-Skyline (SFS) (Godfrey et al., 2005) and LESS (Linear
Elimination Sort for Skyline) (Godfrey et al., 2005) are three algorithms that identify the Skyline by
scanning the whole dataset.
On the other hand, progressive (or online) algorithms for computing Skyline have been introduced:
the Tan et al.'s algorithm, NN (Nearest Neighbor) and BBS (Branch-and-Bound Skyline) (Kossmann et
al., 2002; Papadias et al., 2003; Tan et al., 2001). A progressive algorithm returns the first results without
having to read the entire input and produces more results during execution time. Although these strategies
could be used to implement our approach, they may be inefficient because they may perform a number
of non-necessary probes or require index structures which are not accessible in Web data sources.
In order to process Skyline queries against Web data sources, efficient algorithms have been designed
considering sequential and random accesses. Each data source contains object identifiers and their scores.
A sequential access retrieves an object from a sorted data source while a random access returns the score
from a given object identifier. The Basic Distributed Skyline (BDS) defined by (Balke et al., 2004) is
one of the algorithms to solve this kind of Skyline queries. BDS is a twofold solution which builds a
Skyline superset in a first phase and then, it discards the dominated points in a second phase. A second
algorithm known as Basic Multi-Objective Retrieval (BMOR) is presented by (Balke and Güntzer, 2004);
in contrast to BDS, BMOR compares all the seen objects until a seen object that dominates the virtual
object is found. The virtual object is constantly updated and is comprised of all the highest values seen
so far. Both algorithms avoid to scan the whole dataset, and minimize the number of probes.
A new hybrid approach that combines the benefits of Skyline and Top-k has been proposed and it is
known as Top-k Skyline (Goncalves and Vidal, 2009). Top-k Skyline identifies the top-k objects using
discriminatory criteria that induces a total order of the objects that compose the skyline of points that satisfy
a given multi-dimensional criteria. Top-k Skyline has become necessary in many real-world situations
(Vlachou and Vazirgiannis, 2007), and a variety of ranking metrics have been proposed to discriminate
among the points in the Skyline, e.g., Skyline Frequency (Chan et al., 2006b), k-dominant sky-line (Chan
et al., 2006a), and k representative skyline (Lin et al., 2007). The Skyline Frequency Metric is one of the
most significant metrics that ranks skyline points in terms of how many times a skyline point belongs to
the skyline induced by the subsets of the multidimensional criteria; it measures how much a skyline point
satisfies the different parameters in the multidimensional criteria. Intuitively, a high Skyline Frequency
Search WWH ::




Custom Search