Practical Approaches to the Many-Answer Problem - Advanced Database Query Systems

Database Reference

In-Depth Information

where Re l denotes the set of relevant documents, Re

l

=

(

D

−

Re )

l

the set of irrelevant ones and

(Re | ) ) the probability of relevance (resp., non-relevance) of document d w.r.t.

the query q . The higher the ratio of the probability of relevance to non-relevance is w.r.t. a document d ,

then the more likely document d is to be relevant to a user query q .

In the above formula, the second equality is obtained us ing the Bayes formula, whereas the final

simplification follows from the fact that P(

P

(Re | ) (resp., P

l d

Re are the same for every document d and

thus are mere constant valu es that do not influence the ranking of documents.

Note that Re l and Re l are unknown at query time and consequently are estimated as accurately as

possible, on the basis of whatever data has been made available to the system for this purpose. The

usual techniques in Information Retriev al m ake some simplifying assumptions, such as estimating Re l

through user feedback, approximating Re l as D (since Re l is usually small compared to D ) and as-

suming some form of independence between query terms (e.g., the Binary Independence Model, the

Linked Dependence Model, or the Tree Dependence Model. INQUERY (Callan, Croft & Harding, 1992)

is an example of this model.

Re and P(

l)

l )

Link-Based Ranking Methods

The link-based ranking methods are based on how the pages on the Internet link to each other (Brin &

Page, 1998; Bharat & Henzinger, 1998; Kleinberg, 1999; Borodin, Roberts, Rosenthal & Tsaparas, 2001;

Ng, Zheng & Jordan, 2001). Indeed, the relevance of a page is not only decided by the page content, but

is also based on the linkage among pages. An example of a ranking algorithm based on link analysis is

the PageRank algorithm (Brin & Page, 1998) introduced by Google vii . PageRank computesWeb page

scores by exploiting the graph inferred from the link structure of the Web. Its underlying motivation is

that pages with many backlinks are more important than pages with only a few backlinks. So basically,

a page's rank in Google's search results is higher if many, preferably important, pages link to that page.

The higher the PageRank is, the more relevant the page is (according to Google). Another example of a

ranking algorithm using link analysis is the HITS viii algorithm (Kleinberg, 1999). The HITS algorithm

suggests that each page should have a separate 'authority' rating (based on the links going to the page)

and a 'hub' rating (based on the links going from the page). The intuition behind the algorithm is that

important hubs have links to important authorities and important authorities are linked by important hubs.

For more details and more general sources about Information Retrieval ranking methods, please refer

to (Manning, Raghavan & Schtze, 2008).

2.2.1.2 DB Ranking Techniques

Automated ranking of database query results has been extensively investigated in recent years. Examples

include (Chaudhuri & Das, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004; Su, Wang, Huang &

Lochovsky, 2006; Wu, Faloutsos, Sycara & Payne, 2000; MacArthur, Brodley, Ka & Broderick, 2002).

These approaches take a user's query - which typically specify simple selection conditions on a small

set of attributes - and use diverse knowledge sources to automatically estimate the ranking of its results.

For instance, the systems proposed in (Chaudhuri & Das, 2003; Chaudhuri, Das, Hristidis & Weikum,

2004; Su, Wang, Huang & Lochovsky, 2006) use workload and/or database statistics while (Wu, Falout-

Advanced Database Query Systems

Search WWH ::

Custom Search

Home