of 0.2 and an m of 10,000, which produced an average precision of 0.8611, which was significantly higher than the language model baseline of 0.5043 (p < 0.05). The highest performing language model was tf, using again an m of 10,000 for document models and with a cross-entropy ε of 0.99. Rather interestingly, tf always outperformed rm, and rm's best performance had a MAP of 0.7223 using an ε of 0.1 and an m of 10,000.
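The cross-entropy comparison between a relevance model and a document model, as referred to above, can be sketched in miniature. The ε floor for unseen words and the toy word distributions below are illustrative assumptions, not the exact setup of these experiments:

```python
import math

def cross_entropy(rel_model, doc_model, epsilon=0.01):
    # Cross-entropy of a relevance model against a document model:
    # -sum_w P(w|R) * log P(w|D). A lower score means a better match.
    # epsilon is an assumed floor probability for words the document
    # model has not seen (illustrative smoothing only).
    score = 0.0
    for word, p_rel in rel_model.items():
        p_doc = doc_model.get(word, epsilon)
        score -= p_rel * math.log(p_doc)
    return score

# Toy example: a relevance model for an 'eiffel'-style query,
# compared against a matching and a non-matching document model.
rel = {"paris": 0.5, "tourist": 0.5}
good_doc = {"paris": 0.4, "tourist": 0.4, "eiffel": 0.2}
bad_doc = {"cat": 0.9, "dog": 0.1}
# cross_entropy(rel, good_doc) is lower than cross_entropy(rel, bad_doc),
# i.e. the matching document is ranked higher.
```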
6.5.1.2 Discussion
Of all parameter combinations, the okapi relevance feedback works best in combination with a moderate-sized word-window (m = 100) and with the inquery weighting scheme. It should be noted that its performance is statistically identical to that of ponte, but as both relevance feedback components are similar and both use inquery comparison and BM25 weighting, it is not surprising that the algorithms behave very similarly. Why would inquery and BM25 be the best performing?
The area of optimizing information retrieval is infamously a black art. In fact, BM25 and inquery combined represent the height of heuristic-driven information retrieval algorithms as explored in Robertson and Sparck Jones (1976). While their performance increase over lca is well-known and not surprising, it is interesting that BM25 and inquery perform significantly better than the language model approach.
The answer is rather subtle, and another observation is in order: note that for vector models, inquery always outperformed cosine, and that for language models tf always outperformed rm. Despite the differing frameworks of vector-space models and language models, both cosine and rm share the common characteristic of normalization. In essence, both cosine and rm normalize by documents: cosine normalizes term frequencies per vector before comparing vectors, while rm constructs a relevance model on a per-relevant-document basis before creating the average relevance model. In contrast, inquery and tf do not normalize: inquery compares weighted term frequencies, and tf constructs a relevance model by combining all the relevant documents and then creating the relevance model from the raw pool of all relevant document models.
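The contrast between normalized and unnormalized comparison can be sketched in miniature. The scoring functions and toy documents below are illustrative assumptions, not the actual implementations evaluated above:

```python
import math

def cosine_score(query_tf, doc_tf):
    # Normalizes both vectors to unit length before comparison,
    # as in the cosine scheme described above.
    dot = sum(query_tf.get(t, 0) * doc_tf.get(t, 0) for t in query_tf)
    q_norm = math.sqrt(sum(v * v for v in query_tf.values()))
    d_norm = math.sqrt(sum(v * v for v in doc_tf.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)

def raw_tf_score(query_tf, doc_tf):
    # Compares raw (unnormalized) term frequencies, as the
    # unnormalized tf-style approach described above does.
    return sum(query_tf.get(t, 0) * doc_tf.get(t, 0) for t in query_tf)

# Two documents that mention the crucial keywords equally often,
# but one is long and bursty (as with messy hypertext extraction).
query = {"paris": 1, "tourist": 1}
short_doc = {"paris": 3, "tourist": 1}
long_doc = {"paris": 3, "tourist": 1, "filler": 40}
# Under raw_tf_score both documents match the query equally well;
# cosine normalization heavily dampens the long document's score,
# illustrating how length normalization can bury crucial keywords.
```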
Thus it appears the answer is that any kind of normalization by document length hurts performance. The reason is likely that the text automatically extracted from hypertext documents is 'messy,' being of low quality and bursty, with highly varying document lengths. As observed informally earlier (Ding and Finin 2006) and more formally later (Halpin 2009a), the number of triples in Semantic Web documents follows a power law, so there are wildly varying document lengths for both the relevance model and the document models. Due to these factors, it is unwise to normalize the models, as that will almost certainly dampen the effect of valuable features like crucial keywords (such as 'Paris' and 'tourist' in disambiguating various 'eiffel'-related queries).
The reason BM25-based vector models in particular perform so well is that, due to their heuristics, they are able to accurately keep track of a term's document frequency and inverse document frequency. Also, unlike most other
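A standard BM25 term weight, combining a saturating term-frequency component with inverse document frequency, can be sketched as follows. The k1 and b values are the conventional defaults, and this is an illustrative sketch rather than necessarily the exact variant used in these experiments:

```python
import math

def bm25_weight(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    # Classic BM25 term weight: idf rewards rare terms, while the
    # term-frequency component saturates so repeated occurrences
    # yield diminishing returns. b controls document-length scaling.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    tf_component = (tf * (k1 + 1)) / (
        tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    )
    return idf * tf_component

# A term appearing in few documents outweighs one appearing in many,
# all else being equal, which is how BM25 tracks inverse document
# frequency without normalizing away strong keyword evidence.
rare_term = bm25_weight(tf=3, doc_len=100, avg_doc_len=100.0,
                        df=5, n_docs=10000)
common_term = bm25_weight(tf=3, doc_len=100, avg_doc_len=100.0,
                          df=5000, n_docs=10000)
```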