Database Reference
In-Depth Information
Figure 20. Time cost comparison
ing that the end-user benefit is proportional to the number of items (clusters or tuples) the user needs to
examine at any time as well as to the dissimilarity between these items.
We define the 'Compression Rate' (
CR
) as the ratio of the number of clusters (summaries) returned
as a starting point for an online exploration over the total number of cells covered by such summaries.
Note that
CR
= 1 means no compression at all, whereas smaller values represent higher compression. As
expected, Figure 21 shows that
CR
values of
ESRA
and
ESA+SEQ
are quite similar and much smaller
than that of
ESA
. Thus, size reduction is clearly achieved by the
ESRA
algorithm.
We could also see that the dissimilarity (Figure 22) of the first partitioning that
ESRA
presents to the
end-user is greater than that of
ESA
and is in the same order of magnitude than that provided by
ESA+SEQ
.
It means that
ESRA
significantly improves discrimination between items when compared against
ESA
and is as effective as the post-clustering approach
ESA+SEQ
. Furthermore, the dissimilarity of the
ESA
result is quite similar to that of the most specialized partitioning, i.e., the set of cells. Thus the rearrang-
ing process is highly required to provide the end-user with well-founded clusters.
Now assume that the user decides to explore a summary z returned by
ESRA
. We want to examine
the number of hops
NoH
(i.e., the number of SHOWTUPLES/SHOWCAT operations) the user might
employ to reach a relevant tuple.
NoH
ranges from 1 up to
d
z
+ 1
, where
d
z
is the height of the tree
rooted by
z
(i.e., subtree
H
z
). The best case (
NoH
= 1) occurs when
z
exactly fits the user's need, where-
as the worst case (
NoH
=
d
z
+ 1) occurs when the relevant information is reached by following a path
of maximal length in
H
z
. Note that these two scenarios are on opposite extremes of the spectrum of pos-
sible situations: the generalized case (0 ≤
NoH
≤
d
z
+ 1) is that the user's need is successfully served by
a node at height
h
such that 0 ≤
h
≤
d
z
.
In the following experiment (Figure 23), one considers a user query
q
with selectivity of
η
, i.e., the
number of tuples returned by
q
divided by the total number of tuples in the database (33735). We look
for all the possible summaries (subtrees) in the pre-computed summary hierarchy that can be returned