Database Reference
In-Depth Information
Skyline frequency is one of the most significant metrics that measures interestingness of each Skyline
point in the answer. Intuitively, a high Skyline frequency value indicates that the point is dominated
only in few subsets of multidimensional criteria and therefore, it is considered a very interesting point
because it may dominate in many of the subsets.
In (Chan et al., 2006b), the authors proposed an efficient approximated algorithm to estimate the
skyline frequency values. (Yuan et al., 2005) define two algorithms to efficiently calculate the Skycube
or the union of the Skylines of the non-empty subsets of multidimensional criteria. Both algorithms
make uses of the Skyline point properties to speed the Skycube computation. But, they compute the
Skycube completely.
To overcome these limitations, we propose an algorithm that takes advantages of the properties of
the skyline frequency metric, and identifies the subset of the Skyline points that are needed to compute
the top-k ones in the answer. In this chapter, we will address the problem of computing Top-k Skyline
queries efficiently (Goncalves and Vidal, 2009; Chan et al., 2006b) in a way that number of probes of
the multidimensional function or score function is minimized.
This chapter comprises five sections in addition to section 1 that motivates the problem. Section 2
presents the background required to understand the Top-k Skyline problem. Section 3 will present our
Top-k Skyline approach. We will describe an algorithm that is able to compute only the subset of the
Skyline that will be required to produce the top-k objects. In Section 4, the quality and performance of
the proposed technique will be empirically evaluated against the state-of-the-art solutions. Finally, the
conclusions of our work will be pointed out in Section 5.
BACKGROUND
In this section, we present a motivating example and preliminary definitions, and we summarize existing
approaches to compute the Skyline and Top-k points. Then, we will outline the advantages and limitations
of each approach, and we will consider existing solutions defined to calculate Top-k Skyline. Finally,
we will present some metrics proposed to rank the Skyline which allows to score the importance of the
Skyline without necessity of defining a score function.
Motivating Example
To motivate the problem of computing Top-k Skyline queries efficiently, consider the DBLP Computer
Science Bibliography database (Ley, 2010) and a research institute which offers a travel fellowship to
the best three researchers based on their number of publications on the main four database conferences:
EDBT, ICDE, VLDB and SIGMOD. The DBLP database provides information on researcher's perfor-
mance, which include number of papers in each conference. The summarized information is organized
in the Researcher relational table, where the candidates are described by an identifier, author name, total
number journals in SIGMODR, VLDBJ, TODS, TKDE, and DKE, and number of papers in EDBT,
ICDE, VLDB SIGMOD.
According to the research institute policy, all criteria are equally important and relevant; hence, either
a weight or a score function cannot be assigned. A candidate can be chosen for granting an award, if and
only if, there is no other candidate with more papers in EDBT, ICDE, VLDB and SIGMOD. To nominate
a candidate, the research institute must identify the set of all the candidates that are non-dominated by any
Search WWH ::




Custom Search