The highest performing language model was tf with a cross-entropy ε of 0.2 and an m of 10,000, which produced an average precision of 0.8611, significantly higher than the language model baseline of 0.5043 (p < 0.05), again using an m of 10,000 for the document models and a cross-entropy of 0.99. Rather interestingly, tf always outperformed rm, and rm's best performance had a MAP of 0.7223, using an ε of 0.1 and an m of 10,000.
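Since these language model runs are compared by cross-entropy, it may help to recall the usual form of that comparison in the relevance-model framework; this is a standard formulation, not one taken from this chapter, and the smoothing of P(w | D) is left implicit:

```latex
% Documents are ranked by the (negative) cross-entropy between the
% relevance model R and each document model D over vocabulary V:
\operatorname{score}(D) = -H(R, D) = \sum_{w \in V} P(w \mid R) \, \log P(w \mid D)
```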
6.5.1.2 Discussion
Of all parameter combinations, okapi relevance feedback works best in combination with a moderate-sized word window (m = 100) and with the inquery weighting scheme. It should be noted that its performance is statistically identical to that of ponte; since both relevance feedback components are similar, and both use inquery comparison and BM25 weighting, it is not surprising that the two algorithms behave very much alike. Why would inquery and BM25 be the best performing?
The area of optimizing information retrieval is infamously a black art. In fact, BM25 and inquery combined represent the height of heuristic-driven information retrieval algorithms, as explored in Robertson and Sparck Jones (1976). While their performance increase over lca is well known and not surprising, it is interesting that BM25 and inquery perform significantly better than the language model approach.
The answer is rather subtle. Another observation is in order: note that for vector models, inquery always outperformed cosine, and that for language models, tf always outperformed rm. Despite the differing frameworks of vector-space models and language models, cosine and rm share the common characteristic of normalization. In essence, both normalize by document: cosine normalizes term frequencies per vector before comparing vectors, while rm constructs a relevance model on a per-document basis for each relevant document before averaging these into the final relevance model. In contrast, inquery and tf do not normalize: inquery compares weighted term frequencies directly, and tf constructs its relevance model by combining all the relevant documents and building the model from the raw pool of all the relevant document models.
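To make this contrast concrete, below is a minimal Python sketch; it is not the experimental code, and the function names and bag-of-words representation are assumptions for illustration. It contrasts document-normalized scoring (cosine-style) with raw weighted term-frequency comparison (inquery-style), and rm-style per-document model averaging with tf-style pooling.

```python
from collections import Counter
from math import sqrt

def cosine_score(doc_tf, query_tf):
    """Normalized comparison: divides by vector lengths, as cosine does."""
    dot = sum(doc_tf[t] * query_tf[t] for t in query_tf)
    doc_len = sqrt(sum(v * v for v in doc_tf.values()))
    query_len = sqrt(sum(v * v for v in query_tf.values()))
    return dot / (doc_len * query_len) if doc_len and query_len else 0.0

def raw_score(doc_tf, query_tf):
    """Unnormalized comparison of weighted term frequencies."""
    return sum(doc_tf[t] * query_tf[t] for t in query_tf)

def rm_style(relevant_docs):
    """Build one model per relevant document, then average the models."""
    models = []
    for d in relevant_docs:
        total = sum(d.values())
        models.append({t: c / total for t, c in d.items()})  # per-doc normalization
    vocab = {t for m in models for t in m}
    return {t: sum(m.get(t, 0.0) for m in models) / len(models) for t in vocab}

def tf_style(relevant_docs):
    """Pool raw counts of all relevant documents, then build one model."""
    pooled = Counter()
    for d in relevant_docs:
        pooled.update(d)
    total = sum(pooled.values())
    return {t: c / total for t, c in pooled.items()}

short_doc = Counter({"eiffel": 2, "paris": 2})
long_doc = Counter({"eiffel": 3, "lattice": 40, "iron": 38})
print(rm_style([short_doc, long_doc])["paris"])  # 0.25: each document weighs equally
print(tf_style([short_doc, long_doc])["paris"])  # ~0.024: the long document dominates
```

Note how per-document normalization gives a short relevant document the same influence as a long one, while pooling lets raw counts, and hence document length, carry through.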
Thus it appears the answer is that any kind of normalization by document length hurts performance. The reason is likely that the text automatically extracted from hypertext documents is 'messy': it is of low quality and bursty, with highly varying document lengths. As observed informally earlier (Ding and Finin 2006) and more formally later (Halpin 2009a), the number of triples in Semantic Web documents follows a power law, so the lengths of both the relevance model and the document models vary wildly. Given these factors, it is unwise to normalize the models, as doing so will almost certainly dampen the effect of valuable features like crucial keywords (such as 'Paris' and 'tourist' in disambiguating various 'eiffel'-related queries).
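A toy calculation, with invented numbers, shows the dampening effect under power-law length variation:

```python
from collections import Counter

# Invented example: the crucial keyword 'paris' occurs three times in
# both a short and a very long document, mimicking power-law lengths.
short_doc = Counter({"paris": 3, "tourist": 2})               # length 5
long_doc = Counter({"paris": 3, "triple": 500, "rdf": 400})   # length 903

for name, doc in (("short", short_doc), ("long", long_doc)):
    length = sum(doc.values())
    print(name, doc["paris"], round(doc["paris"] / length, 4))
# short 3 0.6  versus  long 3 0.0033: raw counts treat 'paris' equally,
# while length normalization nearly erases it in the long document.
```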
Then the reason BM25-based vector models in particular perform so well is that, due to their heuristics, they are able to keep accurate track of a term's document frequency and inverse document frequency.
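For reference, one standard formulation of BM25 term weighting is sketched below; the exact variant and parameter values used in these experiments are not given in this section, so the constants shown are the usual defaults rather than the authors' settings.

```latex
\operatorname{BM25}(D, Q) = \sum_{t \in Q}
  \log \frac{N - df_t + 0.5}{df_t + 0.5}
  \cdot
  \frac{tf_{t,D} \, (k_1 + 1)}
       {tf_{t,D} + k_1 \left( 1 - b + b \, \frac{|D|}{avgdl} \right)}
```

Here N is the collection size, df_t the document frequency of term t, and tf_{t,D} its frequency in document D; k_1 (typically 1.2 to 2.0) controls term-frequency saturation, while b (typically 0.75) controls the degree of length normalization, which in light of the discussion above would presumably be kept low for this kind of bursty data.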
Also, unlike most other