Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

Figure 20. Time cost comparison

ing that the end-user benefit is proportional to the number of items (clusters or tuples) the user needs to

examine at any time as well as to the dissimilarity between these items.

We define the 'Compression Rate' ( CR ) as the ratio of the number of clusters (summaries) returned

as a starting point for an online exploration over the total number of cells covered by such summaries.

Note that CR = 1 means no compression at all, whereas smaller values represent higher compression. As

expected, Figure 21 shows that CR values of ESRA and ESA+SEQ are quite similar and much smaller

than that of ESA . Thus, size reduction is clearly achieved by the ESRA algorithm.

We could also see that the dissimilarity (Figure 22) of the first partitioning that ESRA presents to the

end-user is greater than that of ESA and is in the same order of magnitude than that provided by ESA+SEQ .

It means that ESRA significantly improves discrimination between items when compared against ESA

and is as effective as the post-clustering approach ESA+SEQ . Furthermore, the dissimilarity of the ESA

result is quite similar to that of the most specialized partitioning, i.e., the set of cells. Thus the rearrang-

ing process is highly required to provide the end-user with well-founded clusters.

Now assume that the user decides to explore a summary z returned by ESRA . We want to examine

the number of hops NoH (i.e., the number of SHOWTUPLES/SHOWCAT operations) the user might

employ to reach a relevant tuple. NoH ranges from 1 up to d z + 1 , where d z is the height of the tree

rooted by z (i.e., subtree H z ). The best case ( NoH = 1) occurs when z exactly fits the user's need, where-

as the worst case ( NoH = d z + 1) occurs when the relevant information is reached by following a path

of maximal length in H z . Note that these two scenarios are on opposite extremes of the spectrum of pos-

sible situations: the generalized case (0 ≤ NoH ≤ d z + 1) is that the user's need is successfully served by

a node at height h such that 0 ≤ h ≤ d z .

In the following experiment (Figure 23), one considers a user query q with selectivity of η , i.e., the

number of tuples returned by q divided by the total number of tuples in the database (33735). We look

for all the possible summaries (subtrees) in the pre-computed summary hierarchy that can be returned

Search WWH ::

Custom Search

Home