The Semantics of Search - Social Semantics: The Search for Meaning on the Web


6.3.2.4 Ranking Function: Cross Entropy

We now want to re-compute the retrieval score of document D based on the estimated language model of the relevant class, $\theta_R$. What is needed is a principled way of comparing a relevance model $\theta_R$ against a document language model $\theta_D$. One way of comparing probability distributions that has shown the best performance in empirical information retrieval research (Lavrenko 2008) is cross entropy. Intuitively, cross entropy is an information-theoretic measure of the average number of bits needed to identify an event drawn from a distribution p if it is encoded using a code optimized for a different distribution q rather than for p itself. For the discrete case it is defined as:

$$ H(p, q) = -\sum_{x} p(x) \log q(x) \tag{6.10} $$
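As a minimal sketch of (6.10), assuming each distribution is stored as a Python dictionary mapping outcomes to probabilities, the cross entropy can be computed directly. The helper name `cross_entropy` is hypothetical; base-2 logarithms are used so that the result is in bits, matching the intuition above.

```python
import math

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) log2 q(x), in bits (Eq. 6.10).

    p and q are dicts mapping outcomes to probabilities. Outcomes with
    p(x) = 0 contribute nothing; if q(x) = 0 for some x with p(x) > 0,
    the cross entropy is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total -= px * math.log2(qx)
    return total

fair = {"heads": 0.5, "tails": 0.5}
biased = {"heads": 0.25, "tails": 0.75}
print(cross_entropy(fair, fair))    # 1.0: p encoded with its own code
print(cross_entropy(fair, biased))  # larger: p encoded with a code built for q
```

Because H(p, q) = H(p) + KL(p || q), the cross entropy is smallest when q equals p, so a lower value indicates a closer match between the two distributions.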

If one considers that $\theta_R = p$ and that the document model distribution $\theta_D = q$, then the two models can be compared directly using cross entropy, as shown in (6.11); documents with a higher score (that is, a lower cross entropy) are ranked higher. This use of cross entropy also fulfills the Probability Ranking Principle and so is directly comparable to vector-space ranking via cosine (Lavrenko 2008).

$$ -H(\theta_R \,\|\, \theta_D) = \sum_{w \in V} \theta_R(w) \log \theta_D(w) \tag{6.11} $$

where V is the vocabulary.
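A minimal sketch of ranking by (6.11) follows, assuming $\theta_R$ is a dictionary over words and each document is a list of tokens. Since $\log \theta_D(w)$ is undefined whenever a relevance-model word does not occur in D, the sketch smooths each document model with Jelinek-Mercer interpolation against the collection model; that smoothing choice, the weight `lam = 0.5`, and all function names are assumptions for illustration, not taken from this section.

```python
import math
from collections import Counter

def jm_doc_model(doc_tokens, coll_counts, coll_len, lam=0.5):
    """Jelinek-Mercer smoothed document model (assumed smoothing):
    theta_D(w) = lam * tf(w, D) / |D| + (1 - lam) * cf(w) / |C|."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    return lambda w: lam * tf[w] / doc_len + (1 - lam) * coll_counts[w] / coll_len

def neg_cross_entropy(theta_r, theta_d):
    """Score from Eq. 6.11: -H(theta_R || theta_D) = sum_w theta_R(w) log theta_D(w)."""
    return sum(p * math.log(theta_d(w)) for w, p in theta_r.items() if p > 0)

def rank_documents(theta_r, docs, lam=0.5):
    """Rank docs (id -> token list) by decreasing -H, i.e. increasing cross entropy."""
    coll_counts = Counter(w for tokens in docs.values() for w in tokens)
    coll_len = sum(coll_counts.values())
    # Drop relevance-model words never seen in the collection: their smoothed
    # probability would be 0 in every document, making every score -inf.
    theta_r = {w: p for w, p in theta_r.items() if coll_counts[w] > 0}
    scores = {
        doc_id: neg_cross_entropy(theta_r, jm_doc_model(tokens, coll_counts, coll_len, lam))
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "d1": "cooking recipes pasta".split(),
    "d2": "semantic web search rdf".split(),
}
theta_r = {"semantic": 0.4, "rdf": 0.4, "ontology": 0.2}
print(rank_documents(theta_r, docs))  # ['d2', 'd1']
```

Dropping unseen words without renormalizing scales every document's score by the same factor, so the ranking is unaffected.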

Note that either the averaged relevance model $\theta_{R,avg}$ or the concatenated relevance model $\theta_{R,con}$ can be used in (6.11). We refer to the former as rm and to the latter as tf in the following experiments.
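The two variants can be contrasted with a short sketch. It assumes, for illustration only, that the averaged model is the uniform mean of the relevant documents' maximum-likelihood unigram models and that the concatenated model is the maximum-likelihood model of all relevant documents joined into one text; the chapter's own definitions may differ (for example, by weighting documents).

```python
from collections import Counter

def ml_model(tokens):
    """Maximum-likelihood unigram model of one token list."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def averaged_model(relevant_docs):
    """Assumed theta_R,avg ("rm"): uniform mean of per-document ML models."""
    model = Counter()
    for tokens in relevant_docs:
        for w, p in ml_model(tokens).items():
            model[w] += p / len(relevant_docs)
    return dict(model)

def concatenated_model(relevant_docs):
    """Assumed theta_R,con ("tf"): ML model of all relevant documents joined."""
    return ml_model([w for tokens in relevant_docs for w in tokens])

relevant = [["rdf", "rdf"], ["rdf", "web", "web", "web", "web", "web"]]
print(averaged_model(relevant))      # rdf ~0.583, web ~0.417
print(concatenated_model(relevant))  # {'rdf': 0.375, 'web': 0.625}
```

Under these assumptions, averaging gives every relevant document equal weight, while concatenation lets longer documents dominate the term distribution, as the larger weight on "web" shows.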

6.4 System Description

We present a novel system that uses the same underlying information retrieval system on both hypertext and Semantic Web data, so that relevance feedback can be done in a principled manner from both sources of data with language models. In our system, the query is first run against the hypertext Web, and relevant hypertext results are then used to expand a Semantic Web search query with terms from the resulting hypertext Web pages. The expanded query is then run against the Semantic Web, producing a different ranking of results than the non-expanded query. The process can also be run in reverse, using relevant Semantic Web data as relevance feedback to improve hypertext Web search.
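One way to sketch the expansion step in the language-modelling setting described here is to interpolate the original query's model with a feedback model built from the relevant documents of the other source. This RM3-style interpolation, the weight `alpha`, the term cutoff `k`, and the function names are assumptions for illustration; the text above states only that terms from relevant hypertext pages are used to expand the Semantic Web query.

```python
from collections import Counter

def ml_model(tokens):
    """Maximum-likelihood unigram model of a token list."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def expand_query_model(query_terms, feedback_docs, alpha=0.5, k=10):
    """Interpolate the query model with a feedback model (assumed RM3-style).

    feedback_docs are token lists judged relevant in the *other* source,
    e.g. hypertext pages when expanding a Semantic Web query.
    """
    query = ml_model(query_terms)
    if not feedback_docs:
        return query
    feedback = Counter()
    for tokens in feedback_docs:
        for w, p in ml_model(tokens).items():
            feedback[w] += p / len(feedback_docs)
    top = dict(feedback.most_common(k))  # k most probable feedback terms
    z = sum(top.values())                # renormalise the truncated model
    expanded = Counter({w: alpha * p for w, p in query.items()})
    for w, p in top.items():
        expanded[w] += (1 - alpha) * p / z
    return dict(expanded)

pages = [["eiffel", "eiffel", "eiffel", "tower", "tower", "paris"]]
print(expand_query_model(["eiffel"], pages, k=2))  # approximately {'eiffel': 0.8, 'tower': 0.2}
```

The same function serves both directions: for hypertext search, the feedback documents would simply be token lists drawn from relevant Semantic Web data.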

This process is described in pseudo-code in Fig. 6.7, where the set of all queries to be run on the system is given by the QuerySet parameter. The two kinds of relevance feedback are selected by the SearchType parameter: SearchType=RDF searches over RDF data, using HTML documents as the data for relevance-feedback query expansion, while SearchType=HTML searches over HTML documents, using RDF data for relevance-feedback query expansion.
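The control flow around QuerySet and SearchType can be sketched as follows. This is not the pseudo-code of Fig. 6.7 itself: the function `run_feedback_search`, the `relevant` judgments dictionary, and the toy `append_terms` and `overlap_rank` helpers used in the example are all hypothetical stand-ins for the system's real expansion and language-model ranking steps.

```python
from enum import Enum

class SearchType(Enum):
    RDF = "RDF"    # search RDF data; HTML documents supply the feedback
    HTML = "HTML"  # search HTML documents; RDF data supplies the feedback

def run_feedback_search(query_set, search_type, html_corpus, rdf_corpus,
                        relevant, expand_fn, rank_fn):
    """Hypothetical driver: expand each query with feedback from one source,
    then rank the other source.

    relevant[(query, source)] lists ids of feedback documents judged relevant
    in `source` ("HTML" or "RDF"); expand_fn and rank_fn are caller-supplied.
    """
    if search_type is SearchType.RDF:
        target, feedback, feedback_name = rdf_corpus, html_corpus, "HTML"
    else:
        target, feedback, feedback_name = html_corpus, rdf_corpus, "RDF"
    rankings = {}
    for query in query_set:
        fb_docs = [feedback[d] for d in relevant.get((query, feedback_name), [])]
        expanded = expand_fn(query.split(), fb_docs)
        rankings[query] = rank_fn(expanded, target)
    return rankings

# Toy stand-ins for the expansion and ranking steps (illustration only).
def append_terms(terms, fb_docs):
    return terms + [w for doc in fb_docs for w in doc if w not in terms]

def overlap_rank(terms, corpus):
    return sorted(corpus, key=lambda d: -len(set(terms) & set(corpus[d])))

html = {"h1": ["eiffel", "tower", "paris"]}
rdf = {"r2": ["river", "seine"], "r1": ["tower", "landmark"]}
judged = {("eiffel", "HTML"): ["h1"]}
print(run_feedback_search(["eiffel"], SearchType.RDF, html, rdf,
                          judged, append_terms, overlap_rank))  # {'eiffel': ['r1', 'r2']}
```

Without feedback, neither RDF document shares a term with the query "eiffel"; the term "tower" borrowed from the relevant HTML page is what moves r1 to the top. Passing expansion and ranking in as functions keeps the driver symmetric, so switching SearchType only swaps which corpus is searched and which supplies the feedback.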
