Another finding is that average precision may not perform as well as previously believed. However, the experimental results in this paper do not necessarily conflict with those of previous research, for two reasons. First, the methodologies we adopted differ slightly from those used in earlier studies. Second, the four metrics AP, RP, NDCG, and P10 were used in our investigation in their expanded forms for graded relevance judgment, whereas previous research used their original forms with binary relevance judgment. Nevertheless, the experimental results reported in this paper provide new evidence for the evaluation and comparison of these commonly used metrics.
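The distinction between the expanded (graded) and original (binary) forms of these metrics is easy to make concrete for NDCG. The following is a minimal sketch rather than this paper's exact definitions: it scores one hypothetical ranked list under graded judgments (0 to 3) and again after collapsing them to binary. The gain values and the common log2(i+1) discount are assumptions for illustration.

import math

def dcg(gains, k=None):
    # Discounted cumulative gain: each document's relevance gain is
    # discounted by the log of its rank (rank 1 is undiscounted).
    if k is not None:
        gains = gains[:k]
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=None):
    # Normalize by the ideal DCG, i.e. the same gains sorted descending.
    ideal = sorted(gains, reverse=True)
    denom = dcg(ideal, k)
    return dcg(gains, k) / denom if denom > 0 else 0.0

# Graded judgments for one hypothetical ranked list
# (0 = not relevant, ..., 3 = highly relevant).
graded = [3, 2, 0, 1, 2, 0, 0, 1, 0, 0]
# The same list under binary judgments (relevant / not relevant).
binary = [1 if g > 0 else 0 for g in graded]

print(f"NDCG@10 (graded): {ndcg(graded, 10):.4f}")
print(f"NDCG@10 (binary): {ndcg(binary, 10):.4f}")

The two scores for the same ranking differ once graded distinctions are collapsed to binary, which illustrates why results obtained with the expanded metrics need not agree with earlier findings based on binary relevance.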