Evaluating Top-k Skyline Queries Efficiently - Advanced Database Query Systems

Database Reference

In-Depth Information

Skyline frequency is one of the most significant metrics that measures interestingness of each Skyline

point in the answer. Intuitively, a high Skyline frequency value indicates that the point is dominated

only in few subsets of multidimensional criteria and therefore, it is considered a very interesting point

because it may dominate in many of the subsets.

In (Chan et al., 2006b), the authors proposed an efficient approximated algorithm to estimate the

skyline frequency values. (Yuan et al., 2005) define two algorithms to efficiently calculate the Skycube

or the union of the Skylines of the non-empty subsets of multidimensional criteria. Both algorithms

make uses of the Skyline point properties to speed the Skycube computation. But, they compute the

Skycube completely.

To overcome these limitations, we propose an algorithm that takes advantages of the properties of

the skyline frequency metric, and identifies the subset of the Skyline points that are needed to compute

the top-k ones in the answer. In this chapter, we will address the problem of computing Top-k Skyline

queries efficiently (Goncalves and Vidal, 2009; Chan et al., 2006b) in a way that number of probes of

the multidimensional function or score function is minimized.

This chapter comprises five sections in addition to section 1 that motivates the problem. Section 2

presents the background required to understand the Top-k Skyline problem. Section 3 will present our

Top-k Skyline approach. We will describe an algorithm that is able to compute only the subset of the

Skyline that will be required to produce the top-k objects. In Section 4, the quality and performance of

the proposed technique will be empirically evaluated against the state-of-the-art solutions. Finally, the

conclusions of our work will be pointed out in Section 5.

BACKGROUND

In this section, we present a motivating example and preliminary definitions, and we summarize existing

approaches to compute the Skyline and Top-k points. Then, we will outline the advantages and limitations

of each approach, and we will consider existing solutions defined to calculate Top-k Skyline. Finally,

we will present some metrics proposed to rank the Skyline which allows to score the importance of the

Skyline without necessity of defining a score function.

Motivating Example

To motivate the problem of computing Top-k Skyline queries efficiently, consider the DBLP Computer

Science Bibliography database (Ley, 2010) and a research institute which offers a travel fellowship to

the best three researchers based on their number of publications on the main four database conferences:

EDBT, ICDE, VLDB and SIGMOD. The DBLP database provides information on researcher's perfor-

mance, which include number of papers in each conference. The summarized information is organized

in the Researcher relational table, where the candidates are described by an identifier, author name, total

number journals in SIGMODR, VLDBJ, TODS, TKDE, and DKE, and number of papers in EDBT,

ICDE, VLDB SIGMOD.

According to the research institute policy, all criteria are equally important and relevant; hence, either

a weight or a score function cannot be assigned. A candidate can be chosen for granting an award, if and

only if, there is no other candidate with more papers in EDBT, ICDE, VLDB and SIGMOD. To nominate

a candidate, the research institute must identify the set of all the candidates that are non-dominated by any

Search WWH ::

Custom Search

Home