Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

imprecise query 'Salary ≈ 40K€' gives us the precise query 'Salary = 40K€'. Then IQE computes the

similarity of q p to all queries in the workload. To estimate the similarity between two queries, IQE uses

the document Jaccard similarity metric over the answer sets of the two queries. A minimal similarity

threshold τ is used to prune the number of queries similar to q p . Finally, the answer to q i is the union of

the answers of the precise queries similar to q p , with each tuple in the union inheriting the similarity of

its generating precise query. Although IQE is a useful system, a workload containing past user queries

is required, which is unavailable for new online databases.

Nearest Neighbors

In the approach known as nearest neighbors (Roussopoulos, Kelley, & Vincent, 2003), database records

and queries are viewed as points (i.e., feature vectors) in a multidimensional space S with a metric M S

(e.g., the Euclidean distance). Here a typical query is given by an example (Moshé, 1975) and its result

set corresponds to the set of database records which are close to it according to M S . For instance, in

image databases, the user may pose a query asking for the images most similar to a given image. This

type of query is known as nearest neighbor query and it has been extensively studied in the past (Kevin,

Jonathan, Raghu & Uri, 1999).

The two most important types of nearest neighbor queries (NNQ) in databases are:

•

ε -Range Query. The user specifies a query object q S and a query radius ε . The system retrieves

all objects from the database DB S that have a distance from q not exceeding ε (Figure 4-(a)).

More formally, the result set RQ q e is defined as follows: RQ

q

= ∈

{

t DB M q t

|

( , )

≤

e

}

e

S

•

k -Nearest Neighbor Query. The user specifies a query object q and the cardinality k of the result

set. The system retrieves the k objects from the database DB S that have the least distance from q

(Figure 4-(b)). More formally, the result set NN q is defined as follows:

∀ ∈

t NN

q

,

t

' DB - NN

q

,

M q t M q t

( , )

'

∀ ∈

k

S

A naive solution for answering a given NNQ query is to scan the entire database and test for each

object if it is currently among the results. Obviously, this solution is very expensive and not feasible for

a very large set of objects. Several multidimensional index structures, that enable to prune large parts

of the search space, were proposed. The most popular are R-Tree and its variants R*-tree, X-Tree, SS-

Figure 4. Nearest neighbor query types

Advanced Database Query Systems

Search WWH ::

Custom Search

Home