6.3.1.3 Ranking Function: Cosine and InQuery
Vector-space models have an intuitive ranking function in the form of the cosine
measure. In particular, the cosine ranking function for a document D and a query Q
is given by (6.2), where D_q and Q_q denote the weights of word q in D and Q, and
the sums run over all words q.
\[
\cos(D, Q) = \frac{D \cdot Q}{|D|\,|Q|}
           = \frac{\sum_{q} Q_q D_q}{\sqrt{\sum_{q} Q_q^2}\,\sqrt{\sum_{q} D_q^2}}
\tag{6.2}
\]
The only questions are whether the vectors should be normalized to have a
Euclidean length of 1 and whether the query terms themselves should be weighted.
We investigate both options. The classical cosine is referred to as cosine; it
normalizes the vector lengths and weights both the query terms and the document
terms by BM25. The version without normalization is called inquery, after the
InQuery system (Allan et al. 2000). The inquery ranking function is the same as
cosine except that it omits normalization and treats each word in the query as
having uniform weight.
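The difference between the two ranking functions can be illustrated with a minimal Python sketch. It assumes that documents and queries are sparse dictionaries mapping words to precomputed weights (for example BM25 weights, which are not computed here); the helper names dot and norm and the toy vectors are hypothetical and are not taken from the InQuery system.

```python
import math


def dot(d, q):
    """Sum of D_q * Q_q over the words shared by two sparse vectors."""
    return sum(weight * d[word] for word, weight in q.items() if word in d)


def norm(v):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(weight * weight for weight in v.values()))


def cosine(d, q):
    """Cosine ranking as in (6.2): dot product divided by both lengths."""
    denom = norm(d) * norm(q)
    return dot(d, q) / denom if denom else 0.0


def inquery(d, q):
    """Unnormalized variant: each query word counts with uniform weight 1."""
    return sum(d.get(word, 0.0) for word in q)


# Toy example with made-up weights.
doc = {"vector": 3.0, "space": 4.0}
query = {"vector": 1.0}
print(cosine(doc, query))
print(inquery(doc, query))
```

In this sketch, inquery ignores the query weights entirely, which is one reading of "uniform weight"; a variant that keeps the document weighting scheme but drops only the length normalization would differ only in how d is built.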
6.3.1.4 Relevance Feedback Algorithms: Okapi, LCA, and Ponte
There are quite a few options for expanding queries in a vector-space model.
One popular and straightforward method, first proposed by Rocchio (1971) and at
one point used by the Okapi system (Robertson et al. 1994), is to average the j
relevant document models D ∈ R and then simply replace the query Q with the top m
words from the averaged relevant document model. This process is given by (6.3)
and is referred to as okapi:
\[
\mathrm{okapi}(Q) = \frac{1}{j} \sum_{D \in R} D
\tag{6.3}
\]
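As a rough illustration of (6.3), the following Python sketch averages the relevant document vectors and keeps the top m words as the new query. It assumes each document model is a dictionary from words to weights; the function name okapi_expand and the alphabetical tie-breaking are assumptions made for this example, not part of the Okapi system.

```python
from collections import defaultdict


def okapi_expand(relevant_docs, m):
    """Average the j relevant document vectors and keep the top m words."""
    j = len(relevant_docs)
    if j == 0:
        return {}
    averaged = defaultdict(float)
    for doc in relevant_docs:
        for word, weight in doc.items():
            averaged[word] += weight / j
    # Highest average weight first; ties broken alphabetically (an assumption).
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:m])


# Toy example with made-up weights.
relevant = [{"a": 2.0, "b": 1.0}, {"a": 4.0, "c": 1.0}]
print(okapi_expand(relevant, 2))
```

Note that the original query words survive only if they rank among the top m words of the averaged model, since the method replaces Q rather than adding to it.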
Another state-of-the-art query expansion technique is known as Local Context
Analysis (lca) (Xu and Croft 1996). Given a query Q with query terms q_1 ... q_k,
a set of results D, and a set of relevant documents R, lca ranks every word w ∈ V
by (6.4), where n is the number of relevant documents in R, idf_w is the inverse
document frequency of word w, and D_q and D_w are the frequencies of the words
q ∈ Q and w in relevant document D ∈ R.
\[
\mathrm{lca}(w; Q) = \prod_{q \in Q}
  \left( 0.1 + \frac{\log\left( \sum_{D \in R} D_q D_w \right) \mathit{idf}_w}{\log n} \right)^{\mathit{idf}_q}
\tag{6.4}
\]
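The scoring in (6.4) can be sketched in Python as follows. This is a minimal sketch that assumes the form of the formula given by Xu and Croft (1996) with a smoothing constant of 0.1, integer term frequencies, and a precomputed idf table; the function name lca_score, the parameter names, and the handling of zero co-occurrence are assumptions made for illustration.

```python
import math


def lca_score(w, query_terms, relevant_docs, idf, delta=0.1):
    """Score candidate word w against the query over the relevant documents."""
    n = len(relevant_docs)
    score = 1.0
    for q in query_terms:
        # Co-occurrence of q and w: sum of D_q * D_w over relevant documents.
        co = sum(doc.get(q, 0) * doc.get(w, 0) for doc in relevant_docs)
        # Assumption: when log(co) or log(n) is unusable (no co-occurrence,
        # or a single document), only the smoothing constant contributes.
        if co >= 1 and n > 1:
            strength = math.log(co) * idf[w] / math.log(n)
        else:
            strength = 0.0
        score *= (delta + strength) ** idf[q]
    return score


# Toy example with made-up counts and idf values.
docs = [{"apple": 2, "pie": 1}, {"apple": 1, "pie": 3, "car": 1}]
idf = {"apple": 1.0, "pie": 1.0, "car": 1.0}
candidates = ["pie", "car"]
ranked = sorted(candidates, key=lambda w: -lca_score(w, ["apple"], docs, idf))
print(ranked)
```

Because the score is a product over the query terms, a candidate word that co-occurs with only some of the query terms is penalized by the factors near 0.1, which favors words that co-occur with all of them.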
After each word w ∈ V has been ranked by lca, the query expanded by LCA is
just the top m words given by lca. Local Context Analysis attempts to select words
 