A User-Centered Approach for Information Retrieval - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications

present in the DSN- from relevant documents. After the relevance feedback step, the system assigns a

i = 1 to the new terms, thus considering them important in the context.

The other contribution is based on a combination of the path length (l) between pairs of terms and the depth (d) of their subsumer (i.e., the first common ancestor), expressed as the number of hops. The correlation between terms constitutes the semantic relatedness, and it is computed through a nonlinear function. The choice of a nonlinear function derives from several considerations. By definition, the length and the depth of a path may vary from 0 to infinity, while the relatedness between two terms should be expressed as a number in the [0,1] interval. In particular, as the path length decreases to 0 the relatedness should increase monotonically to 1, and as the path length goes to infinity it should decrease monotonically to 0.

We also need a scaling effect on the depth, because words in the upper levels of a semantic hierarchy express more general concepts than words in the lower levels. We use a nonlinear function to scale down the contribution of subsumers in the upper levels and to scale up the contribution of those in the lower ones.

Given two words $w_1$ and $w_2$, the length $l$ of the path between $w_1$ and $w_2$ is computed using the DSN and is defined as:

$$l(w_1, w_2) = \min_j \sum_{i=1}^{h_j(w_1, w_2)} \frac{1}{\sigma_i} \qquad (3)$$

where $j$ spans over all the paths between $w_1$ and $w_2$, $h_j(w_1, w_2)$ is the number of hops in the $j$-th path, and $\sigma_i$ is the weight assigned to the $i$-th hop in the $j$-th path according to the linguistic property of that hop. As an example, let us consider three concepts X, Y and Z and some possible paths between them. The paths, represented by arcs, are labelled with their linguistic properties $\sigma$, and the concepts have a common subsumer S at a distance of 8 levels from the WordNet root. Now suppose that $\sigma_i = \sigma_j = 0.8$ and $\sigma_t = 0.3$, where $\sigma_i$ labels the path between X and Z, $\sigma_j$ the one between Y and Z, and $\sigma_t$ the one between X and Y. In this case the best path is the one traversing Z, with a value of l = 1.58.

The depth $d$ of the subsumer of $w_1$ and $w_2$ is also computed using WordNet. To this aim only the hyponymy and hyperonymy relations (i.e., the IS-A hierarchy) are considered; $d(w_1, w_2)$ is computed as the number of hops from the subsumer of $w_1$ and $w_2$ to the root of the hierarchy.
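
The two quantities just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each hop is assumed to contribute the reciprocal of its weight σ to the path length, the IS-A hierarchy is a hypothetical toy tree rather than WordNet, and the names `path_length`, `ancestors`, `subsumer_depth` and `PARENT` are invented for the example.

```python
from typing import Dict, List, Optional, Sequence


def path_length(paths: Sequence[Sequence[float]]) -> float:
    """Weighted length of the best path between two words.

    `paths` holds one entry per candidate path; each entry lists the
    hop weights (sigma) along that path. ASSUMPTION: a hop contributes
    1/sigma, so strongly weighted relations make a path shorter.
    """
    if not paths:
        return float("inf")  # no connecting path: treat the words as unrelated
    return min(sum(1.0 / sigma for sigma in hops) for hops in paths)


def ancestors(word: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Chain from `word` up to the root of the IS-A hierarchy (inclusive)."""
    chain = [word]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def subsumer_depth(w1: str, w2: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of hops from the first common ancestor of w1 and w2 to the root."""
    up2 = set(ancestors(w2, parent))
    # First node on w1's upward chain that is also w2 or one of its ancestors.
    subsumer = next(node for node in ancestors(w1, parent) if node in up2)
    return len(ancestors(subsumer, parent)) - 1


# Hypothetical toy hierarchy, not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "plant": "entity",
}

# Two hypothetical candidate paths: one weak direct hop (1/0.25 = 4.0)
# and two stronger hops (1/0.5 + 1/1.0 = 3.0); the minimum is kept.
print(path_length([[0.25], [0.5, 1.0]]))
print(subsumer_depth("dog", "cat", PARENT))
```

In this sketch the two-hop path wins because its hops carry larger weights, which mirrors the idea that the best path is not necessarily the one with the fewest hops.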

Given the above considerations, we selected an exponential function, which satisfies the constraints discussed previously; our choice is also supported by the studies of Shepard (1987), who demonstrated that exponential-decay functions are a universal law in psychological science.
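
As a quick check of these constraints, consider a decay of the form $f(l) = e^{-\alpha l}$ for the path length and the ratio used by Li, Bandar & McLean (2003) for the depth. The names $f$ and $g$, and the restriction to $\alpha > 0$ and $\beta > 0$, are assumptions made here for illustration only.

```latex
% Path length: equals 1 at l = 0 and decays monotonically to 0.
f(l) = e^{-\alpha l}, \qquad
f(0) = 1, \qquad
f'(l) = -\alpha e^{-\alpha l} < 0, \qquad
\lim_{l \to \infty} f(l) = 0 .

% Depth: grows monotonically from 0 towards 1, so deeper subsumers count more.
g(d) = \frac{e^{\beta d} - e^{-\beta d}}{e^{\beta d} + e^{-\beta d}} = \tanh(\beta d), \qquad
g(0) = 0, \qquad
\lim_{d \to \infty} g(d) = 1 .
```

Both factors stay in the [0,1] interval for non-negative arguments, so their product does as well.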

We can now introduce the definition of Semantic Grade (SeG), which extends a metric proposed by Li, Bandar & McLean (2003):

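$$\mathrm{SeG}(\nu) = \sum_{(w_i, w_j)} e^{-\alpha \cdot l(w_i, w_j)} \, \frac{e^{\beta \cdot d(w_i, w_j)} - e^{-\beta \cdot d(w_i, w_j)}}{e^{\beta \cdot d(w_i, w_j)} + e^{-\beta \cdot d(w_i, w_j)}} \qquad (4)$$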

where $\nu$ is the document under consideration, $(w_i, w_j)$ are the pairs of words in the pre-processed document, and $\alpha \geq 0$ and $\beta > 0$ are two scaling parameters whose values are determined experimentally.
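
A minimal Python sketch of the Semantic Grade follows, assuming the path length $l$ and subsumer depth $d$ of each word pair have already been computed. The function name `seg`, the sample pairs, and the values chosen for `alpha` and `beta` are hypothetical, since the text only states that the parameters are set experimentally.

```python
import math
from typing import Iterable, Tuple


def seg(pairs: Iterable[Tuple[float, float]], alpha: float, beta: float) -> float:
    """Semantic Grade of a document from per-pair (l, d) values.

    Each pair contributes exp(-alpha * l) * tanh(beta * d); tanh(x) is
    (e^x - e^-x) / (e^x + e^-x), the depth-scaling ratio in the formula.
    """
    if alpha < 0 or beta <= 0:
        raise ValueError("expected alpha >= 0 and beta > 0")
    return sum(math.exp(-alpha * l) * math.tanh(beta * d) for l, d in pairs)


# Hypothetical (l, d) values for three word pairs; alpha and beta are
# placeholders, not values reported in the text.
pairs = [(1.0, 8), (2.5, 8), (4.0, 2)]
print(seg(pairs, alpha=0.2, beta=0.6))
```

Writing the ratio as `math.tanh` avoids overflow for deep subsumers, where computing the two exponentials separately could produce very large intermediate values.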

This formula was used in our previous work (Albanese, Picariello & Rinaldi, 2004) with good results, and its good performance is highlighted in Varelas et al. (2005).
