Database Reference
In-Depth Information
Table 6. DBLP features
Name
#nodes
#edges
Size (MB)
DBLP
876,110
4,166,626
3,950
Theorem 1. Given a set of sorted indices I 1 , …, I q on each attribute of multidimensional criteria m ; an
index I q+1 defined on values of SFM sf ; the Skyline SKY S ; and an integer k such as 0 < k ≤ |SKY S |. Then,
a lower bound of the number of probes performed by TKSI is 2k.
The best case for the TKSI algorithm is each object o accessed by the index I q+1 is compared against
a only object o ' of some index I 1 , …, I q and each object o is in the answer. Thus, 2k probes are necessary
because k objects are compared and TKSI verifies by each object in I q+1 if each object o dominates o ' ,
and o ' dominates o .
EXPERIMENTAL STUDY
Dataset and Query Benchmark: We shredded the downloaded DBLP file (Ley, 2010) into the relational
database; DBLP features are shows in Table 6. We randomly generated 25 queries by restricting the
numbers of papers by each author in the DBLP dataset; the queries are characterized by the following
properties: (a) only one table in the FROM clause; (b) the attributes in the multicriteria function were
selected following a uniform distribution; (c) directives for each attribute of the multicriteria function
were selected considering only maximizing; (d) the number of attributes of the multicriteria function is
five, six and seven; and (e) k is 3.
Evaluation Metrics: we report on the Number of Probes (NP), the ratio of the skyline size and
the Normalized Skyline Frequency value (NSF). NP is the number of the probes of the multidi-
mensional criteria and Skyline Frequency Metric evaluations performed by the algorithm. NSF is
a quality metric that represents a percent of non-empty subspaces of the multidimensional criteria;
it indicates how good a Skyline object is. NSF is computed as follows: SFM / (2 q -1).
Implementations: TKSI and BUS algorithms were developed in Java (64-bit JDK version 1.5.0
12) on top of Oracle 9i. A set of sorted queries are executed for each criterion of the multicriteria
and the score functions, and the resultsets are stored on indices. The resultsets are sorted descen-
dantly according to the MAX criteria of the multicriteria function. Each resultset is accessed
on-demand.
Furthermore, a set of hash maps are built, one for each index. These hash maps are comprised of
objects accessed by each index. Also, a variable for each index is updated with the last value seen in
that index. Initially, these variables are set with the best values. Lately, they are updated according to
the last object accessed by each index.
Thus, TKSI accesses the first object o from the index over the score function. It selects which is the
index I i that has the lowest gap with respect to o . The resultset of the selected index I i is scanned until
Search WWH ::




Custom Search