Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Figure 18. Summary $z_1$

SHOWCAT option for further expansion). Once the exploration of the children of $z_i$ is complete, proceed to examine $z_{i+1}$.

For instance, suppose a user is looking for expensive houses of medium or large size in the suburb, but issues the broader query $Q_1$ (Example 3.1): 'find medium or large houses in the suburb'. The set of summaries presented to that user is $S = \{z_1\}$, where $z_1$ is a subtree (Figure 18) of the pre-computed summary hierarchy shown in Figure 17. In this situation, the user can explore the subtree rooted at $z_1$ as follows to reach relevant tuples: analyze the intent of $z_1$ and explore it using the SHOWCAT option; analyze the intent of $z_{10}$ and ignore it; analyze the intent of $z_{11}$ and use the SHOWTUPLES option to navigate through the tuples in $R_{z_{11}}$ (i.e., $t_{25}$ to $t_{30}$) to identify each relevant tuple.
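The exploration strategy above (expand $z_i$ with SHOWCAT, finish its children, then move on to $z_{i+1}$) can be sketched as a depth-first walk. This is a minimal sketch under stated assumptions, not SAINTETIQ's implementation. The `Summary` class, the `explore` function and the `decide` callback, which stands in for the user reading each intent, are hypothetical names. The tuples of $z_{10}$ are placeholders because the text does not list them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Summary:
    """Hypothetical minimal model of a node in a summary hierarchy."""
    name: str
    intent: str = ""  # descriptors shown to the user (left blank here)
    children: List["Summary"] = field(default_factory=list)
    tuples: List[str] = field(default_factory=list)  # R_z for a leaf summary

    def extent(self) -> List[str]:
        """All tuples covered by this summary and its descendants."""
        covered = list(self.tuples)
        for child in self.children:
            covered.extend(child.extent())
        return covered

def explore(summaries: List[Summary], decide: Callable[[Summary], str]) -> List[str]:
    """Depth-first exploration: SHOWCAT expands z_i, and its children are
    fully explored before z_{i+1}; SHOWTUPLES lists the tuples the user
    then browses; any other answer ignores the summary."""
    browsed: List[str] = []
    for z in summaries:
        action = decide(z)  # the user reads z.intent and picks an option
        if action == "SHOWCAT":
            browsed.extend(explore(z.children, decide))
        elif action == "SHOWTUPLES":
            browsed.extend(z.extent())
    return browsed

# Figure 18 example: z1 has children z10 and z11; R_z11 = t25..t30.
z10 = Summary("z10", tuples=["x1", "x2"])  # hypothetical placeholder tuples
z11 = Summary("z11", tuples=[f"t{i}" for i in range(25, 31)])
z1 = Summary("z1", children=[z10, z11])

choices: Dict[str, str] = {"z1": "SHOWCAT", "z10": "ignore", "z11": "SHOWTUPLES"}
print(explore([z1], lambda z: choices.get(z.name, "ignore")))
# ['t25', 't26', 't27', 't28', 't29', 't30']
```

With the choices from the example, only the tuples of $R_{z_{11}}$ reach the user, who then identifies the relevant ones among them.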

Note that when the set $S = \{z\}$ is a singleton, i.e., $z$ is a node of the pre-computed clustering tree, its exploration is straightforward. Indeed, given a summary of the tree rooted at $z$ that the user wishes to examine more closely (SHOWCAT option), its children are well separated, since SAINTETIQ is designed to discover summaries (clusters) that locally optimize the objective function $U$. Furthermore, the number of clusters presented to the user at each step is small: it is at most the maximum width of the pre-computed tree. However, since the summary hierarchy is independent of the query, the set of starting-point answers $S$ could be large, and consequently the dissimilarity between summaries is susceptible to skew. This happens when the summary hierarchy is not perfectly adapted to the user query. In this situation, it is hard for the user to separate the interesting summaries from the uninteresting ones, potentially leading to decision paralysis and wasted time and effort.

In the next subsection, we propose an original algorithm for rearranging query results to tackle this problem.

3.4 Rearranging the Result Set

The problem of discrimination (dissimilarity) between ESA's results occurs when these results are scattered over the queried summary hierarchy. This situation is illustrated in Figure 19 (Left), where the set of summaries $S = \{z_{00}, z_{01}, z_{1000}, z_{101}, z_{11}\}$ is returned by ESA as the result of a query $Q$ over the summary hierarchy $H$.
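To make "scattered" concrete, the sketch below measures how far apart the summaries of Figure 19 (Left) lie in $H$. It rests on an assumption suggested by the labels in Figure 18, where $z_1$ has children $z_{10}$ and $z_{11}$. The assumption is that the root is labeled `z` and each child's label extends its parent's by one digit, so a common label prefix is a common ancestor. This only holds with at most ten children per node. The distance measure (edges between two summaries through their lowest common ancestor) and the function names `depth`, `tree_distance` and `scatter_report` are hypothetical, illustrative choices rather than a criterion from the text.

```python
from itertools import combinations
from os.path import commonprefix
from typing import List, Tuple

def depth(label: str) -> int:
    """Depth of a summary, assuming 'z' is the root and every further
    digit in the label descends one level (z -> z1 -> z10 -> ...)."""
    return len(label) - 1

def tree_distance(a: str, b: str) -> int:
    """Number of edges between two summaries via their lowest common
    ancestor, taken here as their longest common label prefix."""
    ancestor = commonprefix([a, b])
    return depth(a) + depth(b) - 2 * depth(ancestor)

def scatter_report(result_set: List[str]) -> Tuple[str, int]:
    """Return the lowest common ancestor of the whole result set and the
    largest pairwise tree distance between its summaries."""
    ancestor = commonprefix(result_set)
    widest = max(tree_distance(a, b) for a, b in combinations(result_set, 2))
    return ancestor, widest

S = ["z00", "z01", "z1000", "z101", "z11"]  # ESA result, Figure 19 (Left)
print(scatter_report(S))  # ('z', 6)
```

Under these assumptions, the only common ancestor of the whole result set is the root itself, and two of the returned summaries ($z_{00}$ and $z_{1000}$) are six edges apart.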

A straightforward way to address this problem would be to first execute the SAINTETIQ summarization service (SEQ) on the cells populated by records of tset(Q), i.e., the cells covered by summaries
