6.3.1.3 Ranking Function: Cosine and InQuery
The vector-space models have an intuitive ranking function in the form of cosine measurements. In particular, the cosine ranking function is given by (6.2) for a document D with query Q, where both D and Q contain q words, iterating over all words.
$$\cos(D, Q) = \frac{D \cdot Q}{|D|\,|Q|} = \frac{\sum_{q} D_q Q_q}{\sqrt{\sum_{q} D_q^2}\,\sqrt{\sum_{q} Q_q^2}} \qquad (6.2)$$
The only question is whether or not the vectors should be normalized to have a Euclidean weight of 1, and whether or not the query terms themselves should be weighted. We investigate both options. The classical cosine is given as cosine, which normalizes the vector lengths and then proceeds to weight both the query terms and the vector terms by BM25. The version without normalization is called inquery after the InQuery system (Allan et al. 2000). The inquery ranking function is the same as cosine except that, without normalization, each word in the query can be considered to have uniform weighting.
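As an illustrative sketch, the two ranking functions can be written as follows. This uses raw term frequencies as the vector weights for brevity; the BM25 weighting discussed above is omitted, so this is not the exact implementation used in the experiments.

```python
from collections import Counter
from math import sqrt

def cosine(d, q):
    # Eq. (6.2): dot product of document and query vectors,
    # normalized by their Euclidean lengths
    dot = sum(d[w] * q[w] for w in q)
    norm = sqrt(sum(v * v for v in d.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def inquery(d, q):
    # Same dot product, but without length normalization;
    # each query word contributes with uniform weight
    return sum(d[w] * q[w] for w in q)

doc = Counter("the cat sat on the mat".split())
query = Counter("cat mat".split())
```

Note that cosine of a document with itself is 1.0, while the inquery score grows with document term frequencies since nothing is normalized away.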
6.3.1.4 Relevance Feedback Algorithms: Okapi, LCA, and Ponte
There are quite a few options on how to expand queries in a vector-space model. One popular and straightforward method, first proposed by Rocchio (1971) and at one point used by the Okapi system (Robertson et al. 1994), is to expand the query by taking the average of the j total relevant document models R, with each document D ∈ R, and then simply replacing the query Q with the top m words from the averaged relevant document models. This process is given by (6.3) and is referred to as okapi:
$$\mathrm{okapi}(Q) = \frac{1}{j} \sum_{D \in R} D \qquad (6.3)$$
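A minimal sketch of this expansion step, assuming each relevant document is represented as a bag-of-words frequency vector (the function name okapi_expand and parameter m are illustrative, not from the original system):

```python
from collections import Counter

def okapi_expand(relevant_docs, m):
    # Eq. (6.3): average the j relevant document vectors ...
    j = len(relevant_docs)
    averaged = Counter()
    for doc in relevant_docs:
        for word, freq in doc.items():
            averaged[word] += freq / j
    # ... then replace the query with the top-m words of the averaged model
    return [word for word, _ in averaged.most_common(m)]

relevant = [Counter({"jaguar": 2, "car": 1}),
            Counter({"jaguar": 1, "speed": 3})]
```

Words that appear frequently across many relevant documents dominate the averaged model and so survive the top-m cut.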
Another state-of-the-art query expansion technique is known as Local Context Analysis (lca) (Xu and Croft 1996). Given a query Q with query terms q_1 ... q_k, a set of results D, and a set of relevant documents R, lca ranks every w ∈ V by (6.4), where n is the size of the relevant document set R, idf_w is the inverse document frequency of word w, and D_q and D_w are the frequencies of the words q ∈ Q and w in relevant document D ∈ R.
$$\mathrm{lca}(w; Q) = \prod_{q \in Q} \left( 0.1 + \log\Bigl(\sum_{D \in R} D_q D_w\Bigr) \frac{\mathrm{idf}_w}{\log n} \right)^{\mathrm{idf}_q} \qquad (6.4)$$
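The ranking in (6.4) can be sketched as follows. This is a minimal illustration, not the original implementation: the idf values are assumed to be precomputed and passed in, and the 0.1 smoothing constant follows the formula as reconstructed above.

```python
from collections import Counter
from math import log

def lca(w, query_terms, relevant_docs, idf, delta=0.1):
    # Eq. (6.4): product over query terms of a smoothed co-occurrence belief.
    # Assumes len(relevant_docs) >= 2 so that log(n) is nonzero.
    n = len(relevant_docs)
    score = 1.0
    for q in query_terms:
        # co-occurrence of w and q over the relevant set: sum of D_q * D_w
        co = sum(d[q] * d[w] for d in relevant_docs)
        belief = delta + (log(co) * idf[w] / log(n) if co > 0 else 0.0)
        score *= belief ** idf[q]
    return score

docs = [Counter({"q1": 1, "strong": 2}),
        Counter({"q1": 1, "strong": 1, "weak": 1})]
idf = {"q1": 1.0, "strong": 1.0, "weak": 1.0}
```

A word that co-occurs heavily with the query terms across the relevant documents ("strong" above) receives a higher score than one that barely co-occurs ("weak").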
After each word w ∈ V has been ranked by lca, the query expanded by LCA is just the top m words given by lca. Local Context Analysis attempts to select words