Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems


• In a pre-processing step, we compute knowledge-based summaries of the queried data. The underlying summarization technique used in this paper is the SAINTETIQ model (Raschia & Mouaddib, 2002; Saint-Paul, Raschia, & Mouaddib, 2005), a domain knowledge-based approach that enables summarization and classification of structured data stored in a database. SAINTETIQ first transforms raw data into high-level representations (summaries) that fit the user's perception of the domain, by means of linguistic labels (e.g., cheap, reasonable, expensive, very expensive) defined over the attribute domains and provided by a domain expert or even an end-user. It then applies a hierarchical clustering algorithm to these summaries to produce multi-resolution summaries (i.e., a summary hierarchy) that represent the database content at different levels of abstraction. The summary hierarchy can be seen as analogous to the knowledge representation used by the estate agent.

• At query time, we use the summary hierarchy of the data, instead of the data itself, to quickly provide the user with concise, useful and structured answers as a starting point for an online analysis. This is achieved by the Explore-Select Algorithm (ESA), which extracts query-relevant entries from the summary hierarchy. Each answer item describes a subset of the result set in a human-readable form using linguistic labels. Moreover, the answers to a given query are nodes of the summary hierarchy, and every subtree rooted at an answer offers the user a 'guided tour' of a data subset. The user then navigates this tree top-down, exploring the summaries of interest while ignoring the rest. Note that the database is accessed only when the user requests to download (Upload) the original data that a potentially relevant summary describes. Hence, this framework is intended to help the user iteratively refine her/his information need, in the same way as the estate agent does.
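The pre-processing step in the first item above starts by rewriting raw tuples with linguistic labels. The following is a minimal sketch of that rewriting only, under stated assumptions: the vocabulary (`VOCAB`), the trapezoidal membership functions and the min-based weighting are hypothetical choices for illustration, not SAINTETIQ's exact mapping, and the subsequent hierarchical clustering of the resulting cells is not shown.

```python
from collections import defaultdict
from itertools import product

INF = float("inf")

# Hypothetical vocabulary for an estate-agent table. Each linguistic label is
# a trapezoidal fuzzy set (a, b, c, d): membership rises on [a, b], equals 1
# on [b, c] and falls on [c, d]. Prices in thousands, areas in square metres.
VOCAB = {
    "price": {
        "cheap": (0, 0, 100, 150),
        "reasonable": (100, 150, 250, 300),
        "expensive": (250, 300, 450, 500),
        "very expensive": (450, 500, INF, INF),
    },
    "area": {
        "small": (0, 0, 50, 70),
        "medium": (50, 70, 100, 120),
        "large": (100, 120, INF, INF),
    },
}


def membership(x, a, b, c, d):
    """Degree to which value x belongs to the trapezoid (a, b, c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def rewrite(value, labels):
    """Express a raw value as the labels it matches, with their degrees."""
    degrees = {}
    for name, (a, b, c, d) in labels.items():
        mu = membership(value, a, b, c, d)
        if mu > 0:
            degrees[name] = mu
    return degrees


def summarize(rows, vocab):
    """Group rows into label combinations, weighted by min membership.

    Each key is a fine-grained, human-readable description such as
    ("cheap", "small"); its value is the fuzzy count of rows it covers.
    """
    cells = defaultdict(float)
    for row in rows:
        per_attr = [list(rewrite(row[attr], labels).items())
                    for attr, labels in vocab.items()]
        for combo in product(*per_attr):
            key = tuple(name for name, _ in combo)
            cells[key] += min(mu for _, mu in combo)
    return dict(cells)
```

For instance, a flat priced at 120 with an area of 40 is partly cheap (0.6) and partly reasonable (0.4), so it contributes 0.6 to the cell (cheap, small) and 0.4 to (reasonable, small). Such cells are the raw material that a hierarchical clustering step would then organize into a summary hierarchy.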
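The query-time step in the second item above extracts answer nodes from the summary hierarchy instead of scanning the data. Below is a simplified sketch of an explore/select traversal in that spirit; the `Summary` class, the `explore_select` function and the three-way discard/select/explore rule are assumptions made for illustration, not the ESA definition itself. A node is discarded when it shares no acceptable label with the query on some attribute, selected when all of its labels are acceptable, and otherwise explored through its children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Summary:
    """Node of a hypothetical summary hierarchy.

    intent maps each attribute to the linguistic labels that describe the
    tuples covered by the node; children are finer-grained summaries.
    """
    name: str
    intent: Dict[str, Set[str]]
    children: List["Summary"] = field(default_factory=list)


def explore_select(node: Summary, query: Dict[str, Set[str]]) -> List[Summary]:
    """Return the most general nodes whose description satisfies the query.

    query maps each constrained attribute to its acceptable labels: labels
    are OR-ed within an attribute and attributes are AND-ed together.
    """
    # Discard: on some queried attribute the node shares no acceptable label,
    # so none of its tuples can match and the whole subtree is pruned.
    if any(not (node.intent[attr] & labels) for attr, labels in query.items()):
        return []
    # Select: every label the node carries is acceptable, so its whole
    # subtree answers the query and is returned as a single answer item.
    if all(node.intent[attr] <= labels for attr, labels in query.items()):
        return [node]
    # Explore: the node matches only partially; descend to its children
    # (a partially matching leaf therefore yields no answer).
    answers: List[Summary] = []
    for child in node.children:
        answers.extend(explore_select(child, query))
    return answers
```

With a hypothetical hierarchy whose node z1 covers small flats that are cheap or reasonable, the query price in {cheap, reasonable} returns z1 as one answer item, and the user can then browse z1's subtree; only a download request would touch the underlying table.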

However, since the summary hierarchy is independent of the query, the set of starting-point answers could be large and, consequently, the dissimilarity between answer items is liable to be skewed. This occurs when the summary hierarchy is not perfectly adapted to the user query. To tackle this problem, we first propose a straightforward approach (ESA-SEQ) that uses the clustering algorithm of SAINTETIQ to optimize the high-level answers. This optimization requires post-processing and therefore incurs a time overhead. We then develop an efficient and effective algorithm (ESRA, i.e., ES-Rearrange Algorithm) that rearranges answers based on the structure of the pre-computed summary hierarchy, so that no post-processing task (other than the query evaluation itself) has to be performed at query time.

The rest of this section is organized as follows. First, in Section 3.1, we present the SAINTETIQ model and its properties and illustrate the process with a toy example. Then, in Section 3.2, we detail the use of SAINTETIQ outputs in query processing and describe the formulation of queries and the retrieval of clusters. Thereafter, in Section 3.3, we discuss how such results help address the many-answers problem. The algorithm that addresses the problem of dissimilarity (discrimination) between the starting-point answers by rearranging them is presented in Section 3.4. Section 3.5 discusses an extension of this process that allows each user to query the database with her/his own vocabulary. An experimental study using real data is presented in Section 3.6.

3.1 Overview of the SAINTETIQ System

In this subsection, we first introduce the main ideas of SAINTETIQ (Raschia & Mouaddib, 2002; Saint-Paul, Raschia, & Mouaddib, 2005). Then, we briefly discuss some other data clustering techniques and argue that SAINTETIQ is more suitable for interactive and exploratory data retrieval.
