Database Reference
In-Depth Information
Table 5. The TKSI Algorithm Execution
Id
Publication/GPA/Experience
Index
8846
12
I
5
8846
10
I
1
19660
9
I
1
Property 1.
Given a set of sorted indices I
1
, …, I
q
on each attribute of multidimensional criteria
m
; an
index I
q+1
defined on values of SFM
sf
; the Skyline SKY
S
; an integer
k
such as 0 < k ≤ |SKY
S
|; and object
o
indexed by I
q+1
. Then,
o
is an Top-k Skyline object if not exists an object above
o
in all indices I
1
, …,
I
q
and not exists Top k - 1 Skyline objects with higher SFM value than it.
TKSI focuses on performing sorted accesses on the SFM index I
q+1
firstly and then, verifying if
each accessed object is a Top-k Skyline using the indices on multidimensional criteria I
i
, …, I
q
. Basi-
cally, TKSI receives a set of indices on each attribute of multidimensional criteria
m
and the Skyline
Frequency metric
sf
, and an integer
k
; and it builds the Top-k Skyline using the indices (Table 4). Since
the indices on multidimensional criteria are sorted, TKSI has not to scan the entire index and builds the
whole skyline while
k
is smaller than the Skyline size.
Following with our example, TKSI accesses the objects from I
5
sequentially until the top-1 object is
produced. For each accessed object
o
from I
5
, TKSI verifies that
o
is a Top-k Skyline object. Because
objects are sorted, it is very likely that any object with the higher values in each index of function
m
dominates the next objects in the indices. For this reason, TKSI must select one of the indices I
1
, I
2
, I
3
,
or I
4
in order to minimize the necessary probes over the multicriteria function
m
. The objects could be
accessed in a round robin fashion. However, in order to speed up the computation, TKSI determines
what is the index whose distance with respect to
o
is the lowest, i.e., the index that will avoid the access
of more non-necessary objects. To do this, TKSI computes the distance D
1
as the difference between
the last seen value from I
1
and the value for EDBT of o (min
1
- s
1
(o)), D
2
as the difference between the
last seen value from I
2
and the value for ICDE of o (min
2
- s
2
(o)), D
3
as the difference between the last
seen value from I
3
and the value for VLDB of o (min
3
- s
3
(o)), and D
4
as the difference between the
last seen value from I
4
and the value for SIGMOD of o (min
4
- s
4
(o)). Next, TKSI selects the minimum
value between D
1
, D
2
, D
3
, and D
4
.
Initially, TKSI accesses the first object
8846
from I
5
, and their values for EDBT, ICDE, VLDB and
SIGMOD randomly. Because of the objects from I
1
, I
2
, I
3
, and I
4
have not been seen yet; TKSI assumes
the last seen value is the maximum value possible for the attribute. Therefore, the best distance between
D
1
= 10 - 10, D
2
= 49 - 23, D
3
= 35 - 35, and D
4
= 39 - 28 is calculated. In this case, I
1
and I
3
have the
minimum distance. Note that
8846
is placed in the indices I
1
and I
3
in a lower position than the same
object in I
2
and I
4
. The objects of I
1
are accessed until the object
19660
with a value lower in EDBT is
found. All these objects are compared against
8846
to verify if some of them dominate it. Since, none
of the objects dominates
8846
, the object
8846
is a Top-k Skyline object. If some object indexed by I
1
dominates
8846
, a new object from I
5
is accessed. However, the algorithm decides to stop here because
the objects behind
19660
have worse values in Publication than
8846
, and they may not dominate
8846
.
The detailed TKSI execution is showed in Table 5.