Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

Figure 19. (Left) A pre-computed hierarchy H (Right) Rearranging Q summary results

of S . Then, we present to the user the top-level partition of the result tree. We will refer to this approach

as ESA+SEQ (i.e., search step followed by summarization step) for the remainder of this chapter.

Size reduction and discrimination between items in S are clearly achieved at the expense of an over-

head computational cost. Indeed, the search step time complexity T ESA is in O(L) where L is the number

of leaves of the queried summary hierarchy (see Section 3.2.3). Furthermore, the summarization step

time complexity T SEQ is in O(L′ · log L′) with L′ the number of cells populated by answer records (see

Section 3.1.2). Therefore, the global time complexity T ESA + SEQ of the ESA+SEQ approach is in O(L ·

log L) : LL′ since the query is expected to be broad. Thus, ESA+SEQ doesn't fit the querying process

requirement (see experimental results in Section 3.6), that is to quickly provide the user with concise

and structured answers.

To tackle this problem, we propose an algorithm coined Explore-Select-Rearrange Algorithm ( ESRA )

that rearranges answers, based on the hierarchical structure of the queried summary hierarchy, before

returning them to the user. The main idea of this approach is rather simple. It starts from the summary

partition S (a clustering schema of the query results) and produces a sequence of clustering schemas with

a decreasing number of clusters at each step. Each clustering schema produced at each step results from

the previous one by merging the ' closest ' clusters into a single one. Similar clusters are identified thanks

to the hierarchical structure of the pre-computed summary hierarchy. Intuitively, summaries which are

closely related have a common ancestor lower in the hierarchy, whereas the common ancestor of unre-

lated summaries is near the root. This process stops when it reaches a single hyperrectangle (the root z* ).

Then, we present to the user the top-level partition (i.e., children of z* ) in the obtained tree instead of S .

For instance, when this process is performed on the set of summaries S = { z 00 , z 01 , z 1000 , z 101 , z 11 } shown

in Figure 19, the sequence of clustering schemas in Table 10 is produced.

The hierarchy H′ obtained from the set of query results S is shown in Figure 19 (Right). Thus, the

partition { z′ , z′′′ } is presented to the user instead of S . This partition has a small size and defines well

separated clusters. Indeed, all agglomerative methods, including the above rearranging process, have a

monotonicity property (Hastie, Tibshirani & Friedman, 2001): the dissimilarity between the merged

clusters is monotonically increasing with the level. In the above example, it means that the dissimilar-

ity value of the partition { z′ , z′′′ } is greater than the dissimilarity value of the partition { z 00 , z 01 , z 1000 , z 101 ,

z 11 }.

Search WWH ::

Custom Search

Home