Database Reference
In-Depth Information
Figure 19. (Left) A pre-computed hierarchy H (Right) Rearranging Q summary results
of S . Then, we present to the user the top-level partition of the result tree. We will refer to this approach
as ESA+SEQ (i.e., search step followed by summarization step) for the remainder of this chapter.
Size reduction and discrimination between items in S are clearly achieved at the expense of an over-
head computational cost. Indeed, the search step time complexity T ESA is in O(L) where L is the number
of leaves of the queried summary hierarchy (see Section 3.2.3). Furthermore, the summarization step
time complexity T SEQ is in O(L′ · log L′) with L′ the number of cells populated by answer records (see
Section 3.1.2). Therefore, the global time complexity T ESA + SEQ of the ESA+SEQ approach is in O(L ·
log L) : LL′ since the query is expected to be broad. Thus, ESA+SEQ doesn't fit the querying process
requirement (see experimental results in Section 3.6), that is to quickly provide the user with concise
and structured answers.
To tackle this problem, we propose an algorithm coined Explore-Select-Rearrange Algorithm ( ESRA )
that rearranges answers, based on the hierarchical structure of the queried summary hierarchy, before
returning them to the user. The main idea of this approach is rather simple. It starts from the summary
partition S (a clustering schema of the query results) and produces a sequence of clustering schemas with
a decreasing number of clusters at each step. Each clustering schema produced at each step results from
the previous one by merging the ' closest ' clusters into a single one. Similar clusters are identified thanks
to the hierarchical structure of the pre-computed summary hierarchy. Intuitively, summaries which are
closely related have a common ancestor lower in the hierarchy, whereas the common ancestor of unre-
lated summaries is near the root. This process stops when it reaches a single hyperrectangle (the root z* ).
Then, we present to the user the top-level partition (i.e., children of z* ) in the obtained tree instead of S .
For instance, when this process is performed on the set of summaries S = { z 00 , z 01 , z 1000 , z 101 , z 11 } shown
in Figure 19, the sequence of clustering schemas in Table 10 is produced.
The hierarchy H′ obtained from the set of query results S is shown in Figure 19 (Right). Thus, the
partition { z′ , z′′′ } is presented to the user instead of S . This partition has a small size and defines well
separated clusters. Indeed, all agglomerative methods, including the above rearranging process, have a
monotonicity property (Hastie, Tibshirani & Friedman, 2001): the dissimilarity between the merged
clusters is monotonically increasing with the level. In the above example, it means that the dissimilar-
ity value of the partition { z′ , z′′′ } is greater than the dissimilarity value of the partition { z 00 , z 01 , z 1000 , z 101 ,
z 11 }.
Search WWH ::




Custom Search