Database Reference
In-Depth Information
Tree, etc. For a more detailed elaboration on multidimensional access methods and on the corresponding
query processing techniques, we refer the interested reader to (Volker & Oliver, 1998) and (Christian,
Stefan & Daniel, 2001).
While the approaches described in both previous subsections differ in their implementation details,
their overall goal is the same - allowing the database system to return answers, related to the failed query,
which is more convenient than returning nothing. In the following section, we review some works ad-
dressing the many-answers problem, i.e., the situation where the original query results in overabundant
answers.
2.2 Handling the Many-Answers Problem
Due to the ever increasing amount of information stored each day into databases, user's queries often
result in too many answers - many of them irrelevant to the user. This phenomenon is also commonly
referred to as 'information overload', as the user expends a huge amount of effort sifting through the
result set looking for interesting results.
The many-answers problem often stems from the specificity of the user query that is too general. A
solution to reduce the number of answers consists in narrowing the query by adding new conditions.
However, it is quite impossible to find conditions that can effectively restrict the retrieved data without
any knowledge of the database content. To circumvent this initial issue, several techniques have been
proposed, including automated ranking (Section 2.2.1) and clustering (Section 2.2.2) of query results.
2.2.1 Automated Ranking of Query Results
Automated ranking of query results is a popular aspect of the query model in Information Retrieval. It
consists in providing a user with a list ranked in descending order of relevance (though the user may not
have explicitly specified how) of either all query results or only the top-k subset, rather than thousands
of answers ordered in a completely uninformative way. In contrast, the Boolean query model used in
traditional database systems does not allow for any form of relevance ranking of the retrieved tuple set.
Indeed, all tuples that satisfy the query conditions are returned without any relevance distinction between
them. Thus, the result of any query is simply a partition of the database into a set of retrieved tuples,
and a set of not-retrieved tuples. It is only recently that top-k queries have been introduced (see (Ilyas,
Beskales & Soliman, 2008) for a survey). Top-k queries provide the user with the k most relevant results
of a given query ranked according to some scoring function. Consider a realtor database HouseDB with
information on houses for sale, including their Price, Size, Age, City, Zip, Location, SchoolDistrict, View,
#Bedrooms, #Bathrooms, Garage and BoatDock. The following query is an example of a top-k query:
SELECT Price AS p , Size AS s , Age AS a
FROM HouseDB
WHERE Zip = 75000
ORDER BY f(d Price (p), d Size (s), d Age (a))
LIMIT 5
where LIMIT limits the number of results reported to the user, d A (x) measures the extent (score) to which
the value x of attribute A is relevant to the user and f determines how to combine the ranking according
Search WWH ::




Custom Search