Database Reference
In-Depth Information
accepted distance value. ARES then accesses dissimilarity relations to produce a boolean query which
will be processed by a conventional database system. For example, the vague condition A ≈ v is trans-
formed to the boolean one A{x D A | (v, x, dist) DR A dist ≤ τ} , where τ is the maximum allowed distance
given by the user on D A . In other words, x and v are considered somewhat close as far as dist ≤ τ . The
produced query will then select acceptable tuples for which a global distance is calculated, by summing
up the elementary distances tied to each vague condition in the query. Finally, the tuples are sorted in
ascending order according to their global distance (dissimilarity) values and the system will output as
many tuples as possible within the limit that has been specified by the user.
The main drawback of ARES is its high storage and maintenance costs of dissimilarity relations: each
dissimilarity relation needs m 2 entries with respect to m different attribute values in the corresponding
conventional relation; and when a new attribute value is added, 2m + 1 additional entries are necessary for
the corresponding dissimilarity relation. Moreover, ARES does not allow defining dissimilarity between
attribute values for infinite domains because the dissimilarities can only be defined by means of tables.
VAGUE
VAGUE v (Motro, 1988) is a system that resembles ARES in its overall goals. It is an extension to the
relational data model with data metrics and the SQL language with a comparator ≈. Indeed, each attri-
bute domain D is endowed with a metric M D to define distance (dissimilarity) between its values. M D is
a mapping from the cartesian product D×D to the set of non-negative reals which is:
reflexive, , i.e., M D (x, x) = 0 , for every value x in D ;
symmetric , i.e., M D (x, y) = M D (y, x) , for all values x and y in D ; and
transitive , i.e., M D (x, y) ≤ M D (x, z) + M D (z, y) , for all values x , y and z in D .
Furthermore, M D is provided with a radius r . This notion is very similar to the maximum dissimilarity
allowed in ARES. Thus, two values v 1 and v 2 in D are considered to be similar if M D (v 1 , v 2 ) ≤ r . During
query processing, each vague condition expressed in the query is translated (in a similar way to ARES)
into a boolean one using the appropriate metric and the resulting query is used to select tuples. Then, an
ordering process takes place, relying on the calculation of distances (by means of associated metrics) for
the elementary vague conditions. The global distance attached to a selected tuple in case of a disjunctive
query is the smallest of distances related to each vague condition. For conjunctive queries, the global
distance is obtained as the root of the sum of the squares (i.e., the Euclidean distance) of distances tied
to each vague condition.
Note that in VAGUE, the users cannot provide their own similarity thresholds for each vague condition
but when a vague query does not match any data, VAGUE doubles all searching radii simultaneously.
Thus the search performance can be considerably deteriorated.
IQE
Another system that supports similarity-based search over relational databases is the IQE vi system
(Nambiar & Kambhampati, 2003). IQE converts the imprecise query (i.e., conditions involving the
similarity operator ≈) into equivalent precise queries that appear in an existing query workload. Such
queries are then used to answer the user given imprecise query. More precisely, given the workload, the
main idea of IQE is to map the user's imprecise query q i to a precise query q p by tightening the opera-
tor in the query condition. For example, tightening the operator 'similar-to' (≈) to 'equal-to' (=) in the
Search WWH ::




Custom Search