A User-Centered Approach for Information Retrieval - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications


where poly(i) is the polysemy (number of senses) of i. For example, the word music has five senses in WordNet, so the probability that it is used to express a specific meaning is equal to 1/5.
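The centrality computation can be illustrated with a minimal sketch, assuming centrality is simply 1/poly(i) as the sentence above suggests. The `SENSE_COUNTS` table and the function name `centrality` are hypothetical; only the value for "music" comes from the text, and a real system would read the sense counts from WordNet.

```python
from fractions import Fraction

# Hypothetical lookup table of sense counts; only "music" (five senses)
# is taken from the text.
SENSE_COUNTS = {"music": 5}


def centrality(term, sense_counts):
    """Return 1 / poly(term): the chance that the term carries one given sense."""
    poly = sense_counts[term]
    if poly < 1:
        raise ValueError("a term must have at least one sense")
    return Fraction(1, poly)


print(centrality("music", SENSE_COUNTS))  # 1/5
```

Using `Fraction` keeps the value exact; a float would serve equally well for ranking.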

Therefore, we build a representation of the retrieved Web pages using the DSN: each word in the page that matches a term in the DSN becomes a component of the document representation, and the links between these components are the relations in the DSN.
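As a rough illustration of this representation step, the sketch below keeps the DSN terms that occur in a page together with the DSN relations that connect them. The toy DSN, the relation labels, and the function name `represent_document` are assumptions made for the example, and matching is limited to single lowercase words.

```python
import re


def represent_document(page_text, dsn_terms, dsn_relations):
    """Return the DSN terms found in the page and the DSN relations among them."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    nodes = words & set(dsn_terms)
    # Keep a relation only when both of its endpoints occur in the page.
    edges = {(a, rel, b) for (a, rel, b) in dsn_relations
             if a in nodes and b in nodes}
    return nodes, edges


# Hypothetical toy DSN: terms plus (source, relation, target) triples.
dsn_terms = {"music", "song", "melody", "opera"}
dsn_relations = {
    ("song", "hyponym_of", "music"),
    ("opera", "hyponym_of", "music"),
    ("melody", "part_of", "song"),
}

nodes, edges = represent_document(
    "A song is a short piece of music.", dsn_terms, dsn_relations)
print(sorted(nodes))
print(sorted(edges))
```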

Similarity Metric

Given a conceptual domain, identifying the pages of interest with a DSN requires a grading system that assigns a score to each document on the basis of its syntactic and semantic content. To measure the relevance of a given document, we therefore consider the semantic relatedness between terms and, using relevance feedback techniques, statistical information about them.

The proposed measure combines two types of information: a syntactic component, based on word frequency and term centrality, and a semantic component, calculated on each set of words in the document. The relevance feedback techniques we use take into account two types of feedback: explicit and blind.

Explicit feedback is performed after the first presentation of results. Using the ranking metric described below, the system presents a result list to the user and shows, for each result, the two top-ranked sentences from the corresponding page. The top sentences are found by applying the system metric to each sentence in the document and sorting the sentences by score. With this information the user can manually choose the relevant documents or open the whole page.
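The sentence selection step could be sketched as follows. The scorer `count_domain_terms` is only a stand-in for the system metric, which is not reproduced here; the sentence splitter, the term set, and all names are likewise assumptions for illustration.

```python
import re


def top_sentences(page_text, score, n=2):
    """Score every sentence with `score` and return the n best, highest first."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", page_text)
                 if s.strip()]
    # sorted() is stable, so equally scored sentences keep their page order.
    return sorted(sentences, key=score, reverse=True)[:n]


def count_domain_terms(sentence, terms=frozenset({"music", "song", "melody"})):
    """Placeholder scorer: number of domain-term occurrences in the sentence."""
    return sum(1 for w in re.findall(r"[a-z]+", sentence.lower()) if w in terms)


page = ("The weather was fine. A song is a piece of music. "
        "The melody of the song made the music famous. Tickets were cheap.")
top = top_sentences(page, count_domain_terms)
print(top)
```

Passing the scorer as a parameter keeps the selection logic independent of whichever metric is used for ranking.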

With the blind approach, the user lets the system perform the relevance feedback automatically on a defined number of documents.
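The difference between the two feedback modes can be sketched in a few lines, assuming the result list is already ranked. The function names and page identifiers are hypothetical, and treating "a defined number of documents" as the first n results is an assumption of this sketch.

```python
def blind_feedback_documents(ranked_docs, n_docs):
    """Blind feedback: take the first n_docs of the ranked list as relevant."""
    if n_docs < 0:
        raise ValueError("n_docs must be non-negative")
    return list(ranked_docs[:n_docs])


def explicit_feedback_documents(ranked_docs, chosen):
    """Explicit feedback: keep only the documents the user picked, in rank order."""
    chosen = set(chosen)
    return [doc for doc in ranked_docs if doc in chosen]


ranked = ["page_a", "page_b", "page_c", "page_d"]
blind = blind_feedback_documents(ranked, 2)
explicit = explicit_feedback_documents(ranked, {"page_d", "page_a"})
print(blind, explicit)
```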

The first contribution to the metric is called the Syntactic-Semantic Grade (SSG). In this chapter we propose a new approach to calculating the SSG and compare it with the one proposed in Albanese, Picariello & Rinaldi (2004); the metric proposed there serves as our standard metric. We can define the relevance of a word in a given conceptual domain and, if the feedback functions are chosen, in the set of selected documents. We therefore use a hybrid approach that exploits both statistical and semantic information. The statistical information is obtained by applying the relevance feedback technique described in Weiss, Vélez & Sheldon (1996), and it is enriched with the semantic information provided by the centrality of the terms (Equation 1). In this way we divide the terms into classes on the basis of their centrality:

$$
SSG_{i,k} = \frac{\left(0.5 + 0.5\,\dfrac{TF_{i,k}}{TF_{\max,k}}\right)\varpi_i}{\sqrt{\sum_{i \in k}\left(0.5 + 0.5\,\dfrac{TF_{i,k}}{TF_{\max,k}}\right)^{2}\varpi_i^{2}}} \tag{2}
$$

where k is the k-th document, i is the i-th term, $TF_{i,k}$ is the term frequency of i in k, $TF_{\max,k}$ is the maximum term frequency in k, and $\varpi_i$ is the centrality of i.
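A minimal sketch of the SSG computation follows. It assumes that Equation 2 is the augmented term frequency (0.5 + 0.5 TF/TF_max) multiplied by the centrality of the term and cosine-normalized over the terms of the document. It also assumes that only terms with a known centrality belong to the document representation, so TF_max is taken over those terms. The function name `ssg` and every centrality value except the one for "music" are hypothetical.

```python
import math
from collections import Counter


def ssg(doc_terms, centrality):
    """Return {term: grade} for one document.

    doc_terms  -- list of tokens in the document
    centrality -- {term: centrality}, assumed positive for every term
    """
    # Count only the terms that have a centrality value.
    tf = Counter(t for t in doc_terms if t in centrality)
    if not tf:
        return {}
    tf_max = max(tf.values())
    # Augmented term frequency weighted by centrality.
    weights = {t: (0.5 + 0.5 * f / tf_max) * centrality[t]
               for t, f in tf.items()}
    # Cosine normalization over the terms of the document.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}


# Only the centrality of "music" (1/5) comes from the text; the rest is made up.
toy_centrality = {"music": 0.2, "song": 0.5, "melody": 1.0}
grades = ssg(["music", "music", "song", "melody", "the"], toy_centrality)
print(grades)
```

Because of the normalization, the squared grades of a document sum to 1, which makes grades comparable across documents of different lengths.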

We use this approach to improve the precision of the model of the domain of interest and to overcome the lack of very specific terms in WordNet (e.g., computer science terminology). Thus, the use of relevance feedback re-weights and expands the semantic network by adding new terms -not
