Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

accepted distance value. ARES then accesses dissimilarity relations to produce a boolean query which

will be processed by a conventional database system. For example, the vague condition A ≈ v is trans-

formed to the boolean one A{x D A | (v, x, dist) DR A ∧ dist ≤ τ} , where τ is the maximum allowed distance

given by the user on D A . In other words, x and v are considered somewhat close as far as dist ≤ τ . The

produced query will then select acceptable tuples for which a global distance is calculated, by summing

up the elementary distances tied to each vague condition in the query. Finally, the tuples are sorted in

ascending order according to their global distance (dissimilarity) values and the system will output as

many tuples as possible within the limit that has been specified by the user.

The main drawback of ARES is its high storage and maintenance costs of dissimilarity relations: each

dissimilarity relation needs m 2 entries with respect to m different attribute values in the corresponding

conventional relation; and when a new attribute value is added, 2m + 1 additional entries are necessary for

the corresponding dissimilarity relation. Moreover, ARES does not allow defining dissimilarity between

attribute values for infinite domains because the dissimilarities can only be defined by means of tables.

VAGUE

VAGUE v (Motro, 1988) is a system that resembles ARES in its overall goals. It is an extension to the

relational data model with data metrics and the SQL language with a comparator ≈. Indeed, each attri-

bute domain D is endowed with a metric M D to define distance (dissimilarity) between its values. M D is

a mapping from the cartesian product D×D to the set of non-negative reals which is:

•

reflexive, , i.e., M D (x, x) = 0 , for every value x in D ;

•

symmetric , i.e., M D (x, y) = M D (y, x) , for all values x and y in D ; and

•

transitive , i.e., M D (x, y) ≤ M D (x, z) + M D (z, y) , for all values x , y and z in D .

Furthermore, M D is provided with a radius r . This notion is very similar to the maximum dissimilarity

allowed in ARES. Thus, two values v 1 and v 2 in D are considered to be similar if M D (v 1 , v 2 ) ≤ r . During

query processing, each vague condition expressed in the query is translated (in a similar way to ARES)

into a boolean one using the appropriate metric and the resulting query is used to select tuples. Then, an

ordering process takes place, relying on the calculation of distances (by means of associated metrics) for

the elementary vague conditions. The global distance attached to a selected tuple in case of a disjunctive

query is the smallest of distances related to each vague condition. For conjunctive queries, the global

distance is obtained as the root of the sum of the squares (i.e., the Euclidean distance) of distances tied

to each vague condition.

Note that in VAGUE, the users cannot provide their own similarity thresholds for each vague condition

but when a vague query does not match any data, VAGUE doubles all searching radii simultaneously.

Thus the search performance can be considerably deteriorated.

IQE

Another system that supports similarity-based search over relational databases is the IQE vi system

(Nambiar & Kambhampati, 2003). IQE converts the imprecise query (i.e., conditions involving the

similarity operator ≈) into equivalent precise queries that appear in an existing query workload. Such

queries are then used to answer the user given imprecise query. More precisely, given the workload, the

main idea of IQE is to map the user's imprecise query q i to a precise query q p by tightening the opera-

tor in the query condition. For example, tightening the operator 'similar-to' (≈) to 'equal-to' (=) in the

Advanced Database Query Systems

Search WWH ::

Custom Search

Home