Standard searching: The system ranks the results without relevance feedback;
Explicit feedback: The results interface allows the user to mark relevant documents as feedback;
Blind feedback: The system itself selects the top-ranked documents as relevant, without user intervention (sketched below).
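As a minimal sketch of the blind (pseudo-relevance) feedback mode, assuming a simple term-frequency query expansion; the function name, the top-k cutoff and the expansion size are illustrative, not the system's actual parameters:

from collections import Counter

def blind_feedback_expand(query_terms, ranked_docs, top_k=5, n_expansion=3):
    """Expand the query with frequent terms from the top-k results,
    which the system assumes to be relevant (no user input needed)."""
    term_counts = Counter()
    for doc in ranked_docs[:top_k]:  # pseudo-relevant set
        term_counts.update(doc.lower().split())
    # Drop terms already in the query, keep the most frequent candidates.
    candidates = [t for t, _ in term_counts.most_common()
                  if t not in query_terms]
    return query_terms + candidates[:n_expansion]

# Example: the top two ranked documents contribute expansion terms.
docs = ["java virtual machine bytecode", "java coffee arabica beans",
        "java island indonesia travel"]
print(blind_feedback_expand(["java"], docs, top_k=2))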
The user interface allows the insertion of keywords and also enables the setting of a number of parameters, namely: the search engines used to fetch Web pages; the number of links to be returned by the underlying search engines; the relative weights of the title, description, keywords and body tags; and the relative weights of the Syntactic-Semantic grade and the Semantic grade. A sketch of how such weights could enter a rank value follows.
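The chapter does not specify the combination formula, so the following is only a sketch under the assumption that per-tag match scores are combined linearly and then modulated by the two grades; all names and weight values here are hypothetical:

def page_score(tag_matches, tag_weights, syn_sem_grade, sem_grade,
               grade_weights=(0.5, 0.5)):
    """Combine per-tag match scores and the two semantic grades into a
    single rank value; the weights mirror the parameters the UI exposes."""
    tag_part = sum(tag_weights[t] * tag_matches.get(t, 0.0)
                   for t in tag_weights)
    w_syn, w_sem = grade_weights
    return tag_part * (w_syn * syn_sem_grade + w_sem * sem_grade)

# Hypothetical weights favouring matches in the title tag.
weights = {"title": 3.0, "description": 2.0, "keywords": 1.5, "body": 1.0}
matches = {"title": 1.0, "body": 0.4}
print(page_score(matches, weights, syn_sem_grade=0.8, sem_grade=0.6))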
Experimental Results
The need for a suitable evaluation of information retrieval systems imposes the adoption of methodologies that answer the questions of why, what, and how to evaluate. Several authors address these questions (Cleverdon, Mills & Keen, 1966; Vakkari & Hakala, 2000). The techniques used to measure effectiveness are often affected by the retrieval strategy adopted and by the way results are presented.
We use a test collection to evaluate our system. A test collection consists of a set of documents, a set of queries, and a list of the documents in the collection that are relevant to each query. We use it to compare the results of our system under the ranking strategies described previously. It is important to have standard parameters for IR system evaluation; for this reason we use precision and recall curves. Recall is the fraction of all relevant material that is returned by a search; precision is the fraction of the documents returned by a search that are relevant. We built the test set from the directory service of the Yahoo search engine (search.yahoo.com/dir). The directory service supplies the category assigned to each Web page, which gives us a relevance assessment against which to compare our results. The test collection has more than 800 pages, retrieved using highly polysemous words so that the documents belong to different categories. We chose keywords on both general and specific subjects. This class distinction is useful for measuring the performance differences between the ranking strategies when using a general knowledge base and when adding relevance feedback. A small worked example of the two measures follows.
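To make the two measures concrete, here is a minimal sketch of computing precision and recall over the top-k results of a ranked list against a set of relevance judgements; the document identifiers are invented for illustration:

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k results of a ranked list."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for d in retrieved if d in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids)
    return precision, recall

# Example: 4 documents are relevant overall; varying k traces the curve.
ranking = ["d1", "d7", "d3", "d9", "d4", "d2"]
relevant = {"d1", "d3", "d4", "d2"}
for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"P@{k}={p:.2f}  R@{k}={r:.2f}")

Evaluating at several cutoffs k, as in the loop above, yields the points of the precision-recall curve used to compare the ranking strategies.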
Ruthven and Lalmas (2003) collect some important considerations, derived from the analysis of earlier work, criticising the use of the precision-recall measure for RF (Borlund & Ingwersen, 1997; Chang, Cirillo & Razon, 1971; Frei, Meienberg & Schauble, 1991). When relevance feedback is used, the documents marked as relevant are pushed to the top of the result list, artificially improving the recall-precision curve (the ranking effect), rather than taking into account the feedback effect, that is, the ability to push unseen relevant documents to the top of the ranked list.
The proposed alternatives that account for the feedback effect on unseen relevant documents are:
Residual ranking: This strategy removes from the collection those items which were assessed for relevance for feedback purposes, and it evaluates two runs (with and without feedback) on the reduced collection (see the sketch after this list).
Freezing: The documents examined for relevance before feedback are retained as the top-ranking documents in the feedback run.
Test and control groups: The collection is randomly split into two collections: the test group and the control group. Relevance feedback information is taken from the test group, but the recall-precision evaluation is performed only on the control group, so there is no ranking effect.
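As a minimal sketch of residual-collection evaluation, assuming set-based relevance judgements as in the earlier example; the helper name and identifiers are illustrative:

def residual_ranking(ranked_ids, relevant_ids, judged_ids):
    """Residual-collection evaluation: documents already judged during
    feedback are removed before computing precision and recall, so the
    ranking effect cannot inflate the scores."""
    residual = [d for d in ranked_ids if d not in judged_ids]
    residual_relevant = relevant_ids - judged_ids
    hits = sum(1 for d in residual if d in residual_relevant)
    precision = hits / len(residual) if residual else 0.0
    recall = (hits / len(residual_relevant)) if residual_relevant else 0.0
    return precision, recall

# Example: d1 was judged during feedback, so it is excluded from both
# the ranked list and the relevant set before scoring.
print(residual_ranking(["d1", "d3", "d9", "d4"], {"d1", "d3", "d4"}, {"d1"}))

Running the same function on the feedback and no-feedback rankings compares the two runs on the reduced collection, which is exactly what isolates the feedback effect from the ranking effect.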