Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

functional dependency to get the importance degree of the schema attributes in a database, according to

which the order of the relaxed attributes is specified. The data samples used to compute the importance

degree are also chosen randomly. Note that both Muslea et al's and Nambiar et al's approaches reduce

the search space considerably, but the output relaxed queries, while succeed, are not necessary XSSs.

2.1.2 Similarity Search

A similarity-based search uses a notion of non-exact matching. In other words, when available data

stored in the database do not exactly match a user's query, database records which best fit (i.e., which

are the most similar to) the query are retrieved and ranked according to their similarity to the query.

Consider a tourist looking for a hotel, with a rent price at 100€ a day, and located in the city center.

She/he will unfortunately fail to find such a hotel by means of the traditional database systems if the city

center does not have any hotel rented at that price. However, a similarity-based search system would

return the most similar hotels (i.e., hotels having attributes values very close to those specified in the

query) instead of the empty result set. In fact, the user might accept a hotel near the city center and the

rent price can also be a little lower or higher 100€ per day.

In the literature, several approaches have been proposed to enhance conventional databases with

similarity search capabilities, such as ARES (Ichikawa & Hirakawa, 1986), VAGUE (Motro, 1988), IQE

(Nambiar & Kambhampati, 2003) and Nearest-Neighbors (Roussopoulos, Kelley, & Vincent, 2003). The

former three approaches (ARES, VAGUE and IQE) make use of an explicit operator 'similar-to', which

extends the usual equality. The latter (Roussopoulos, Kelley, & Vincent, 2003) views the records in the

database as points in a multidimensional space and the queries about these records are transformed into

the queries over this set of points.

ARES

ARES iv (Ichikawa & Hirakawa, 1986) is the first system that has addressed the basic issue of similar-

ity matching. It introduces a new operator named 'similar-to' and denoted ≈, meaning 'approximately

equal to'. ≈ can be used as a comparison operator inside queries instead of the usual equality operator

(=) in order to express vague conditions, e.g., A ≈ v will select values of an attribute A that are similar

to a constant v . The interpretation of ≈ is based on dissimilarity relations tied to each domain. A dis-

similarity relation, DR A (A 1 ,A 2 ,Distance) , on the domain D A of attribute A contains triples of the form

(v 1 , v 2 , dist) , where v 1 D A , v 2 D A and dist represents the distance (dissimilarity) value between v 1 and v 2

(a smaller value means v 1 and v 2 are more similar). Table 1 illustrates an example dissimilarity relation

for the attribute Job of a relation EmpDB.

In a given query, which contains vague conditions (i.e., conditions involving the similarity operator

≈), the following process takes place. First of all, for each vague condition, the user gives a maximum

Table 1. Dissimilarity relation defined on the attribute Job

Job 1

Job 2

Distance

Student

Supervisor

…

PhD. Student

Supervisor

PhD. Student

…

1

3

2

…

Advanced Database Query Systems

Search WWH ::

Custom Search

Home