Standard searching: The system ranks the results without relevance feedback;
Explicit feedback: The results interface allows the user to mark relevant documents as feedback;
Blind feedback: The system itself selects the top-ranked documents as relevant, without user intervention (sketched below).
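As a minimal sketch of the blind (pseudo-relevance) feedback mode, assuming a simple term-frequency query expansion; the function name, the top-k cutoff and the expansion size are illustrative, not the system's actual parameters:

from collections import Counter

def blind_feedback_expand(query_terms, ranked_docs, top_k=5, n_expansion=3):
    """Expand the query with frequent terms from the top-k results,
    which the system assumes to be relevant (no user input needed)."""
    term_counts = Counter()
    for doc in ranked_docs[:top_k]:  # pseudo-relevant set
        term_counts.update(doc.lower().split())
    # Drop terms already in the query, keep the most frequent candidates.
    candidates = [t for t, _ in term_counts.most_common()
                  if t not in query_terms]
    return query_terms + candidates[:n_expansion]

# Example: the top two ranked documents contribute expansion terms.
docs = ["java virtual machine bytecode", "java coffee arabica beans",
        "java island indonesia travel"]
print(blind_feedback_expand(["java"], docs, top_k=2))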
The user interface allows the insertion of keywords and also enables the setting of a number of parameters, namely: the search engines used to fetch Web pages; the number of links to be returned by the underlying search engines; the relative weights of the title, description, keywords and body tags; and the relative weights of the Syntactic-Semantic grade and the Semantic grade. A sketch of how such weights could enter a rank value follows.
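The chapter does not specify the combination formula, so the following is only a sketch under the assumption that per-tag match scores are combined linearly and then modulated by the two grades; all names and weight values here are hypothetical:

def page_score(tag_matches, tag_weights, syn_sem_grade, sem_grade,
               grade_weights=(0.5, 0.5)):
    """Combine per-tag match scores and the two semantic grades into a
    single rank value; the weights mirror the parameters the UI exposes."""
    tag_part = sum(tag_weights[t] * tag_matches.get(t, 0.0)
                   for t in tag_weights)
    w_syn, w_sem = grade_weights
    return tag_part * (w_syn * syn_sem_grade + w_sem * sem_grade)

# Hypothetical weights favouring matches in the title tag.
weights = {"title": 3.0, "description": 2.0, "keywords": 1.5, "body": 1.0}
matches = {"title": 1.0, "body": 0.4}
print(page_score(matches, weights, syn_sem_grade=0.8, sem_grade=0.6))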
Experimental Results
The need for a suitable evaluation of information retrieval systems imposes the adoption of methodologies that answer the questions of why, what, and how to evaluate. Several authors address these questions (Cleverdon, Mills & Keen, 1966; Vakkari & Hakala, 2000). The techniques used to measure effectiveness are often affected by the retrieval strategy adopted and by the way results are presented.
We use a test collection to evaluate our system. A test collection consists of a set of documents, a set of queries, and a list of the documents in the collection that are relevant to each query. We use it to compare the results of our system under the ranking strategies described previously. It is important to have standard parameters for IR system evaluation; for this reason we use precision and recall curves. Recall is the fraction of all relevant material that is returned by a search; precision is the fraction of the documents returned by a search that are relevant. We built the test set from the directory service of the Yahoo search engine (search.yahoo.com/dir). The directory service supplies the category assigned to each Web page, which gives us a relevance assessment against which to compare our results. The test collection has more than 800 pages, retrieved using highly polysemous words so that the documents belong to different categories. We chose keywords on both general and specific subjects. This class distinction is useful for measuring the performance differences between the ranking strategies when using a general knowledge base and when adding relevance feedback. A small worked example of the two measures follows.
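To make the two measures concrete, here is a minimal sketch of computing precision and recall over the top-k results of a ranked list against a set of relevance judgements; the document identifiers are invented for illustration:

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k results of a ranked list."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for d in retrieved if d in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids)
    return precision, recall

# Example: 4 documents are relevant overall; varying k traces the curve.
ranking = ["d1", "d7", "d3", "d9", "d4", "d2"]
relevant = {"d1", "d3", "d4", "d2"}
for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"P@{k}={p:.2f}  R@{k}={r:.2f}")

Evaluating at several cutoffs k, as in the loop above, yields the points of the precision-recall curve used to compare the ranking strategies.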
Ruthven and Lalmas (2003) collect some important considerations, derived from the analysis of earlier work, criticising the use of the precision-recall measure for RF (Borlund & Ingwersen, 1997; Chang, Cirillo & Razon, 1971; Frei, Meienberg & Schauble, 1991). When relevance feedback is used, the documents marked as relevant are pushed to the top of the result list, artificially improving the recall-precision curve (the ranking effect), rather than taking into account the feedback effect, that is, the ability to push unseen relevant documents to the top of the ranked list.
The proposed alternatives that account for the feedback effect on unseen relevant documents are:
Residual ranking: This strategy removes from the collection those items which were assessed for relevance for feedback purposes, and it evaluates two runs (with and without feedback) on the reduced collection (see the sketch after this list).
Freezing: The documents examined for relevance before feedback are retained as the top-ranking documents in the feedback run.
Test and control groups: The collection is randomly split into two collections: the test group and the control group. Relevance feedback information is taken from the test group, but the recall-precision evaluation is performed only on the control group, so there is no ranking effect.
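As a minimal sketch of residual-collection evaluation, assuming set-based relevance judgements as in the earlier example; the helper name and identifiers are illustrative:

def residual_ranking(ranked_ids, relevant_ids, judged_ids):
    """Residual-collection evaluation: documents already judged during
    feedback are removed before computing precision and recall, so the
    ranking effect cannot inflate the scores."""
    residual = [d for d in ranked_ids if d not in judged_ids]
    residual_relevant = relevant_ids - judged_ids
    hits = sum(1 for d in residual if d in residual_relevant)
    precision = hits / len(residual) if residual else 0.0
    recall = (hits / len(residual_relevant)) if residual_relevant else 0.0
    return precision, recall

# Example: d1 was judged during feedback, so it is excluded from both
# the ranked list and the relevant set before scoring.
print(residual_ranking(["d1", "d3", "d9", "d4"], {"d1", "d3", "d4"}, {"d1"}))

Running the same function on the feedback and no-feedback rankings compares the two runs on the reduced collection, which is exactly what isolates the feedback effect from the ranking effect.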