The highest performing language model was tf with a cross-entropy ε of 0.2 and an m of 10,000, which produced an average precision of 0.8611, significantly higher than the language model baseline of 0.5043 (p < 0.05), again using an m of 10,000 for the document models and a cross-entropy of 0.99. Rather interestingly, tf always outperformed rm, and rm's best performance had a MAP of 0.7223, using an ε of 0.1 and an m of 10,000.
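Since these language model runs are compared by cross-entropy, it may help to recall the usual form of that comparison in the relevance-model framework; this is a standard formulation, not one taken from this chapter, and the smoothing of P(w | D) is left implicit:

```latex
% Documents are ranked by the (negative) cross-entropy between the
% relevance model R and each document model D over vocabulary V:
\operatorname{score}(D) = -H(R, D) = \sum_{w \in V} P(w \mid R) \, \log P(w \mid D)
```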
6.5.1.2 Discussion
Of all parameter combinations, okapi relevance feedback works best in combination with a moderate-sized word window (m = 100) and with the inquery weighting scheme. It should be noted that its performance is statistically identical to that of ponte; since both relevance feedback components are similar, and both use inquery comparison and BM25 weighting, it is not surprising that the two algorithms behave very much alike. Why would inquery and BM25 be the best performing?
The area of optimizing information retrieval is infamously a black art. In fact, BM25 and inquery combined represent the height of heuristic-driven information retrieval algorithms, as explored in Robertson and Sparck Jones (1976). While their performance increase over lca is well known and not surprising, it is interesting that BM25 and inquery perform significantly better than the language model approach.
The answer is rather subtle. Another observation is in order: note that for vector models, inquery always outperformed cosine, and that for language models, tf always outperformed rm. Despite the differing frameworks of vector-space models and language models, cosine and rm share the common characteristic of normalization. In essence, both normalize by document: cosine normalizes term frequencies per vector before comparing vectors, while rm constructs a relevance model on a per-document basis for each relevant document before averaging these into the final relevance model. In contrast, inquery and tf do not normalize: inquery compares weighted term frequencies directly, and tf constructs its relevance model by combining all the relevant documents and building the model from the raw pool of all the relevant document models.
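To make this contrast concrete, below is a minimal Python sketch; it is not the experimental code, and the function names and bag-of-words representation are assumptions for illustration. It contrasts document-normalized scoring (cosine-style) with raw weighted term-frequency comparison (inquery-style), and rm-style per-document model averaging with tf-style pooling.

```python
from collections import Counter
from math import sqrt

def cosine_score(doc_tf, query_tf):
    """Normalized comparison: divides by vector lengths, as cosine does."""
    dot = sum(doc_tf[t] * query_tf[t] for t in query_tf)
    doc_len = sqrt(sum(v * v for v in doc_tf.values()))
    query_len = sqrt(sum(v * v for v in query_tf.values()))
    return dot / (doc_len * query_len) if doc_len and query_len else 0.0

def raw_score(doc_tf, query_tf):
    """Unnormalized comparison of weighted term frequencies."""
    return sum(doc_tf[t] * query_tf[t] for t in query_tf)

def rm_style(relevant_docs):
    """Build one model per relevant document, then average the models."""
    models = []
    for d in relevant_docs:
        total = sum(d.values())
        models.append({t: c / total for t, c in d.items()})  # per-doc normalization
    vocab = {t for m in models for t in m}
    return {t: sum(m.get(t, 0.0) for m in models) / len(models) for t in vocab}

def tf_style(relevant_docs):
    """Pool raw counts of all relevant documents, then build one model."""
    pooled = Counter()
    for d in relevant_docs:
        pooled.update(d)
    total = sum(pooled.values())
    return {t: c / total for t, c in pooled.items()}

short_doc = Counter({"eiffel": 2, "paris": 2})
long_doc = Counter({"eiffel": 3, "lattice": 40, "iron": 38})
print(rm_style([short_doc, long_doc])["paris"])  # 0.25: each document weighs equally
print(tf_style([short_doc, long_doc])["paris"])  # ~0.024: the long document dominates
```

Note how per-document normalization gives a short relevant document the same influence as a long one, while pooling lets raw counts, and hence document length, carry through.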
Thus it appears the answer is that any kind of normalization by document length hurts performance. The reason is likely that the text automatically extracted from hypertext documents is 'messy': it is of low quality and bursty, with highly varying document lengths. As observed informally earlier (Ding and Finin 2006) and more formally later (Halpin 2009a), the number of triples in Semantic Web documents follows a power law, so the lengths of both the relevance model and the document models vary wildly. Given these factors, it is unwise to normalize the models, as doing so will almost certainly dampen the effect of valuable features like crucial keywords (such as 'Paris' and 'tourist' in disambiguating various 'eiffel'-related queries).
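A toy calculation, with invented numbers, shows the dampening effect under power-law length variation:

```python
from collections import Counter

# Invented example: the crucial keyword 'paris' occurs three times in
# both a short and a very long document, mimicking power-law lengths.
short_doc = Counter({"paris": 3, "tourist": 2})               # length 5
long_doc = Counter({"paris": 3, "triple": 500, "rdf": 400})   # length 903

for name, doc in (("short", short_doc), ("long", long_doc)):
    length = sum(doc.values())
    print(name, doc["paris"], round(doc["paris"] / length, 4))
# short 3 0.6  versus  long 3 0.0033: raw counts treat 'paris' equally,
# while length normalization nearly erases it in the long document.
```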
Then the reason BM25-based vector models in particular perform so well is that, due to their heuristics, they are able to keep accurate track of a term's document frequency and inverse document frequency.
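For reference, one standard formulation of BM25 term weighting is sketched below; the exact variant and parameter values used in these experiments are not given in this section, so the constants shown are the usual defaults rather than the authors' settings.

```latex
\operatorname{BM25}(D, Q) = \sum_{t \in Q}
  \log \frac{N - df_t + 0.5}{df_t + 0.5}
  \cdot
  \frac{tf_{t,D} \, (k_1 + 1)}
       {tf_{t,D} + k_1 \left( 1 - b + b \, \frac{|D|}{avgdl} \right)}
```

Here N is the collection size, df_t the document frequency of term t, and tf_{t,D} its frequency in document D; k_1 (typically 1.2 to 2.0) controls term-frequency saturation, while b (typically 0.75) controls the degree of length normalization, which in light of the discussion above would presumably be kept low for this kind of bursty data.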
Also, unlike most other