Database Reference
In-Depth Information
imprecise query 'Salary ≈ 40K€' gives us the precise query 'Salary = 40K€'. Then IQE computes the
similarity of q p to all queries in the workload. To estimate the similarity between two queries, IQE uses
the document Jaccard similarity metric over the answer sets of the two queries. A minimal similarity
threshold τ is used to prune the number of queries similar to q p . Finally, the answer to q i is the union of
the answers of the precise queries similar to q p , with each tuple in the union inheriting the similarity of
its generating precise query. Although IQE is a useful system, a workload containing past user queries
is required, which is unavailable for new online databases.
Nearest Neighbors
In the approach known as nearest neighbors (Roussopoulos, Kelley, & Vincent, 2003), database records
and queries are viewed as points (i.e., feature vectors) in a multidimensional space S with a metric M S
(e.g., the Euclidean distance). Here a typical query is given by an example (Moshé, 1975) and its result
set corresponds to the set of database records which are close to it according to M S . For instance, in
image databases, the user may pose a query asking for the images most similar to a given image. This
type of query is known as nearest neighbor query and it has been extensively studied in the past (Kevin,
Jonathan, Raghu & Uri, 1999).
The two most important types of nearest neighbor queries (NNQ) in databases are:
ε -Range Query. The user specifies a query object q S and a query radius ε . The system retrieves
all objects from the database DB S that have a distance from q not exceeding ε (Figure 4-(a)).
More formally, the result set RQ q e is defined as follows: RQ
q
= ∈
{
t DB M q t
|
( , )
e
}
e
S
k -Nearest Neighbor Query. The user specifies a query object q and the cardinality k of the result
set. The system retrieves the k objects from the database DB S that have the least distance from q
(Figure 4-(b)). More formally, the result set NN q is defined as follows:
∀ ∈
t NN
q
,
t
' DB - NN
q
,
M q t M q t
( , )
( , )
'
∀ ∈
k
k
S
S
A naive solution for answering a given NNQ query is to scan the entire database and test for each
object if it is currently among the results. Obviously, this solution is very expensive and not feasible for
a very large set of objects. Several multidimensional index structures, that enable to prune large parts
of the search space, were proposed. The most popular are R-Tree and its variants R*-tree, X-Tree, SS-
Figure 4. Nearest neighbor query types
Search WWH ::




Custom Search