Database Reference
In-Depth Information
where Re l denotes the set of relevant documents, Re
l
=
(
D
Re )
l
the set of irrelevant ones and
(Re | ) ) the probability of relevance (resp., non-relevance) of document d w.r.t.
the query q . The higher the ratio of the probability of relevance to non-relevance is w.r.t. a document d ,
then the more likely document d is to be relevant to a user query q .
In the above formula, the second equality is obtained us ing the Bayes formula, whereas the final
simplification follows from the fact that P(
P
(Re | ) (resp., P
l d
l d
Re are the same for every document d and
thus are mere constant valu es that do not influence the ranking of documents.
Note that Re l and Re l are unknown at query time and consequently are estimated as accurately as
possible, on the basis of whatever data has been made available to the system for this purpose. The
usual techniques in Information Retriev al m ake some simplifying assumptions, such as estimating Re l
through user feedback, approximating Re l as D (since Re l is usually small compared to D ) and as-
suming some form of independence between query terms (e.g., the Binary Independence Model, the
Linked Dependence Model, or the Tree Dependence Model. INQUERY (Callan, Croft & Harding, 1992)
is an example of this model.
Re and P(
l)
l )
Link-Based Ranking Methods
The link-based ranking methods are based on how the pages on the Internet link to each other (Brin &
Page, 1998; Bharat & Henzinger, 1998; Kleinberg, 1999; Borodin, Roberts, Rosenthal & Tsaparas, 2001;
Ng, Zheng & Jordan, 2001). Indeed, the relevance of a page is not only decided by the page content, but
is also based on the linkage among pages. An example of a ranking algorithm based on link analysis is
the PageRank algorithm (Brin & Page, 1998) introduced by Google vii . PageRank computesWeb page
scores by exploiting the graph inferred from the link structure of the Web. Its underlying motivation is
that pages with many backlinks are more important than pages with only a few backlinks. So basically,
a page's rank in Google's search results is higher if many, preferably important, pages link to that page.
The higher the PageRank is, the more relevant the page is (according to Google). Another example of a
ranking algorithm using link analysis is the HITS viii algorithm (Kleinberg, 1999). The HITS algorithm
suggests that each page should have a separate 'authority' rating (based on the links going to the page)
and a 'hub' rating (based on the links going from the page). The intuition behind the algorithm is that
important hubs have links to important authorities and important authorities are linked by important hubs.
For more details and more general sources about Information Retrieval ranking methods, please refer
to (Manning, Raghavan & Schtze, 2008).
2.2.1.2 DB Ranking Techniques
Automated ranking of database query results has been extensively investigated in recent years. Examples
include (Chaudhuri & Das, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004; Su, Wang, Huang &
Lochovsky, 2006; Wu, Faloutsos, Sycara & Payne, 2000; MacArthur, Brodley, Ka & Broderick, 2002).
These approaches take a user's query - which typically specify simple selection conditions on a small
set of attributes - and use diverse knowledge sources to automatically estimate the ranking of its results.
For instance, the systems proposed in (Chaudhuri & Das, 2003; Chaudhuri, Das, Hristidis & Weikum,
2004; Su, Wang, Huang & Lochovsky, 2006) use workload and/or database statistics while (Wu, Falout-
 
Search WWH ::




Custom Search