Database Reference
In-Depth Information
Figure 20. Time cost comparison
ing that the end-user benefit is proportional to the number of items (clusters or tuples) the user needs to
examine at any time as well as to the dissimilarity between these items.
We define the 'Compression Rate' ( CR ) as the ratio of the number of clusters (summaries) returned
as a starting point for an online exploration over the total number of cells covered by such summaries.
Note that CR = 1 means no compression at all, whereas smaller values represent higher compression. As
expected, Figure 21 shows that CR values of ESRA and ESA+SEQ are quite similar and much smaller
than that of ESA . Thus, size reduction is clearly achieved by the ESRA algorithm.
We could also see that the dissimilarity (Figure 22) of the first partitioning that ESRA presents to the
end-user is greater than that of ESA and is in the same order of magnitude than that provided by ESA+SEQ .
It means that ESRA significantly improves discrimination between items when compared against ESA
and is as effective as the post-clustering approach ESA+SEQ . Furthermore, the dissimilarity of the ESA
result is quite similar to that of the most specialized partitioning, i.e., the set of cells. Thus the rearrang-
ing process is highly required to provide the end-user with well-founded clusters.
Now assume that the user decides to explore a summary z returned by ESRA . We want to examine
the number of hops NoH (i.e., the number of SHOWTUPLES/SHOWCAT operations) the user might
employ to reach a relevant tuple. NoH ranges from 1 up to d z + 1 , where d z is the height of the tree
rooted by z (i.e., subtree H z ). The best case ( NoH = 1) occurs when z exactly fits the user's need, where-
as the worst case ( NoH = d z + 1) occurs when the relevant information is reached by following a path
of maximal length in H z . Note that these two scenarios are on opposite extremes of the spectrum of pos-
sible situations: the generalized case (0 ≤ NoH d z + 1) is that the user's need is successfully served by
a node at height h such that 0 ≤ h d z .
In the following experiment (Figure 23), one considers a user query q with selectivity of η , i.e., the
number of tuples returned by q divided by the total number of tuples in the database (33735). We look
for all the possible summaries (subtrees) in the pre-computed summary hierarchy that can be returned
Search WWH ::




Custom Search