Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

Tree, etc. For a more detailed elaboration on multidimensional access methods and on the corresponding

query processing techniques, we refer the interested reader to (Volker & Oliver, 1998) and (Christian,

Stefan & Daniel, 2001).

While the approaches described in both previous subsections differ in their implementation details,

their overall goal is the same - allowing the database system to return answers, related to the failed query,

which is more convenient than returning nothing. In the following section, we review some works ad-

dressing the many-answers problem, i.e., the situation where the original query results in overabundant

answers.

2.2 Handling the Many-Answers Problem

Due to the ever increasing amount of information stored each day into databases, user's queries often

result in too many answers - many of them irrelevant to the user. This phenomenon is also commonly

referred to as 'information overload', as the user expends a huge amount of effort sifting through the

result set looking for interesting results.

The many-answers problem often stems from the specificity of the user query that is too general. A

solution to reduce the number of answers consists in narrowing the query by adding new conditions.

However, it is quite impossible to find conditions that can effectively restrict the retrieved data without

any knowledge of the database content. To circumvent this initial issue, several techniques have been

proposed, including automated ranking (Section 2.2.1) and clustering (Section 2.2.2) of query results.

2.2.1 Automated Ranking of Query Results

Automated ranking of query results is a popular aspect of the query model in Information Retrieval. It

consists in providing a user with a list ranked in descending order of relevance (though the user may not

have explicitly specified how) of either all query results or only the top-k subset, rather than thousands

of answers ordered in a completely uninformative way. In contrast, the Boolean query model used in

traditional database systems does not allow for any form of relevance ranking of the retrieved tuple set.

Indeed, all tuples that satisfy the query conditions are returned without any relevance distinction between

them. Thus, the result of any query is simply a partition of the database into a set of retrieved tuples,

and a set of not-retrieved tuples. It is only recently that top-k queries have been introduced (see (Ilyas,

Beskales & Soliman, 2008) for a survey). Top-k queries provide the user with the k most relevant results

of a given query ranked according to some scoring function. Consider a realtor database HouseDB with

information on houses for sale, including their Price, Size, Age, City, Zip, Location, SchoolDistrict, View,

#Bedrooms, #Bathrooms, Garage and BoatDock. The following query is an example of a top-k query:

SELECT Price AS p , Size AS s , Age AS a

FROM HouseDB

WHERE Zip = 75000

ORDER BY f(d Price (p), d Size (s), d Age (a))

LIMIT 5

where LIMIT limits the number of results reported to the user, d A (x) measures the extent (score) to which

the value x of attribute A is relevant to the user and f determines how to combine the ranking according

Search WWH ::

Custom Search

Home