Database Reference
In-Depth Information
functional dependency to get the importance degree of the schema attributes in a database, according to
which the order of the relaxed attributes is specified. The data samples used to compute the importance
degree are also chosen randomly. Note that both Muslea et al's and Nambiar et al's approaches reduce
the search space considerably, but the output relaxed queries, while succeed, are not necessary XSSs.
2.1.2 Similarity Search
A similarity-based search uses a notion of non-exact matching. In other words, when available data
stored in the database do not exactly match a user's query, database records which best fit (i.e., which
are the most similar to) the query are retrieved and ranked according to their similarity to the query.
Consider a tourist looking for a hotel, with a rent price at 100€ a day, and located in the city center.
She/he will unfortunately fail to find such a hotel by means of the traditional database systems if the city
center does not have any hotel rented at that price. However, a similarity-based search system would
return the most similar hotels (i.e., hotels having attributes values very close to those specified in the
query) instead of the empty result set. In fact, the user might accept a hotel near the city center and the
rent price can also be a little lower or higher 100€ per day.
In the literature, several approaches have been proposed to enhance conventional databases with
similarity search capabilities, such as ARES (Ichikawa & Hirakawa, 1986), VAGUE (Motro, 1988), IQE
(Nambiar & Kambhampati, 2003) and Nearest-Neighbors (Roussopoulos, Kelley, & Vincent, 2003). The
former three approaches (ARES, VAGUE and IQE) make use of an explicit operator 'similar-to', which
extends the usual equality. The latter (Roussopoulos, Kelley, & Vincent, 2003) views the records in the
database as points in a multidimensional space and the queries about these records are transformed into
the queries over this set of points.
ARES
ARES iv (Ichikawa & Hirakawa, 1986) is the first system that has addressed the basic issue of similar-
ity matching. It introduces a new operator named 'similar-to' and denoted ≈, meaning 'approximately
equal to'. ≈ can be used as a comparison operator inside queries instead of the usual equality operator
(=) in order to express vague conditions, e.g., A ≈ v will select values of an attribute A that are similar
to a constant v . The interpretation of ≈ is based on dissimilarity relations tied to each domain. A dis-
similarity relation, DR A (A 1 ,A 2 ,Distance) , on the domain D A of attribute A contains triples of the form
(v 1 , v 2 , dist) , where v 1 D A , v 2 D A and dist represents the distance (dissimilarity) value between v 1 and v 2
(a smaller value means v 1 and v 2 are more similar). Table 1 illustrates an example dissimilarity relation
for the attribute Job of a relation EmpDB.
In a given query, which contains vague conditions (i.e., conditions involving the similarity operator
≈), the following process takes place. First of all, for each vague condition, the user gives a maximum
Table 1. Dissimilarity relation defined on the attribute Job
Job 1
Job 2
Distance
Student
Student
Supervisor
PhD. Student
Supervisor
PhD. Student
1
3
2
Search WWH ::




Custom Search