6.3.2.4
Ranking Function: Cross Entropy
We now want to re-compute the retrieval score of document D based on the
estimated language model of the relevant class $\theta_R$. What is needed is a principled
way of comparing a relevance model $\theta_R$ against a document language model $\theta_D$.
One way of comparing probability distributions that has shown the best performance in empirical
information retrieval research (Lavrenko 2008) is cross entropy. Intuitively, cross
entropy is an information-theoretic measure of the average number of bits needed to
identify an event drawn from a distribution p if the events are encoded using a code
optimized for a different distribution q rather than for p itself. For the discrete
case this is defined as:
$$H(p, q) = -\sum_{x} p(x) \log q(x) \qquad (6.10)$$
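As a minimal sketch of (6.10), the function below computes the discrete cross entropy in bits (hence `math.log2`) for two distributions given as dictionaries. The function name and the dictionary representation are illustrative assumptions, not part of the system described here.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x), measured in bits.

    p and q map outcomes to probabilities. Outcomes with p(x) == 0 contribute
    nothing; if q(x) == 0 for some x with p(x) > 0 the result is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # q assigns no probability to an event p can emit
        total -= px * math.log2(qx)
    return total


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(cross_entropy(p, p))  # 1.0 (the entropy of p)
print(cross_entropy(p, q))  # approximately 1.2075
```

Encoding p with its own distribution gives its entropy (1 bit here); by Gibbs' inequality any other q can only increase the cost, which is why a lower cross entropy indicates a closer match.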
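If one considers that $\theta_R = p$ and that the document model distribution $\theta_D = q$,
then the two models can be compared directly using cross entropy, as shown in (6.11):

$$H(\theta_R \,\|\, \theta_D) = -\sum_{w \in V} \theta_R(w) \log \theta_D(w) \qquad (6.11)$$

where V is the vocabulary. Documents are ranked in increasing order of this cross
entropy, so the documents whose language models best predict the relevance model are
ranked highest. This use of cross entropy also fulfills the Probability Ranking Principle and
so is directly comparable to vector-space ranking via cosine (Lavrenko 2008).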
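Ranking with (6.11) requires that $\theta_D(w) > 0$ for every term the relevance model uses, so document models are normally smoothed. The sketch below assumes Jelinek-Mercer interpolation with a collection model and a hypothetical weight `lam = 0.5`; this section does not say which smoothing the system uses, and all function names are illustrative.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood model over every token in the collection."""
    counts = Counter(t for toks in docs.values() for t in toks)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def document_model(tokens, coll, lam=0.5):
    """Jelinek-Mercer smoothed document model (assumed smoothing; lam is hypothetical)."""
    counts = Counter(tokens)
    n = len(tokens)

    def prob(w):
        p_ml = counts[w] / n if n else 0.0
        return lam * p_ml + (1 - lam) * coll.get(w, 0.0)

    return prob


def cross_entropy_score(theta_r, theta_d):
    """H(theta_R || theta_D) = -sum_w theta_R(w) * log2 theta_D(w); lower is better."""
    score = 0.0
    for w, p in theta_r.items():
        if p == 0:
            continue
        pd = theta_d(w)
        if pd == 0:
            return math.inf  # term unseen anywhere in the collection
        score -= p * math.log2(pd)
    return score


def rank(theta_r, docs, lam=0.5):
    """Return (doc_id, cross entropy) pairs sorted by ascending cross entropy."""
    coll = collection_model(docs)
    scored = [(doc_id, cross_entropy_score(theta_r, document_model(toks, coll, lam)))
              for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])


docs = {"d1": "rdf triple store".split(), "d2": "html page link".split()}
theta_r = {"rdf": 0.5, "triple": 0.5}
print(rank(theta_r, docs))  # d1 (about 2.0 bits) ranks ahead of d2 (about 3.585 bits)
```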
Note that either the averaged relevance model $\theta_{R,avg}$ or the concatenated
relevance model $\theta_{R,con}$ can be used in (6.11). We refer to the former as rm and
to the latter as tf in the following experiments.
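To illustrate how the two variants can differ, the sketch below assumes that the averaged model (rm) is a uniform mean of per-document maximum-likelihood models and that the concatenated model (tf) is a single maximum-likelihood model over all relevant documents joined together. The uniform weighting and the function names are assumptions for illustration; the book's own estimators may weight documents differently.

```python
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model of one token list."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}


def averaged_model(relevant_docs):
    """theta_R,avg under an assumed uniform weighting: mean of per-document models."""
    k = len(relevant_docs)
    avg = Counter()
    for doc in relevant_docs:
        for w, p in mle(doc).items():
            avg[w] += p / k
    return dict(avg)


def concatenated_model(relevant_docs):
    """theta_R,con: one maximum-likelihood model over all relevant documents joined."""
    return mle([t for doc in relevant_docs for t in doc])


docs = [["a"], ["b", "b", "b"]]
print(averaged_model(docs))      # {'a': 0.5, 'b': 0.5}
print(concatenated_model(docs))  # {'a': 0.25, 'b': 0.75}
```

Under these assumptions the averaged model gives every relevant document equal weight, while the concatenated model lets longer documents dominate the term distribution.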
6.4
System Description
We present a novel system that uses the same underlying information retrieval
system on both hypertext and Semantic Web data, so that relevance feedback can be
done in a principled manner with language models from both sources of data. In our
system, the query is first run against the hypertext Web, and the relevant hypertext results
are then used to expand a Semantic Web search query with terms from the resulting
hypertext Web pages. The expanded query is then run against the Semantic Web,
producing a different ranking of results from that of the non-expanded query. We can also
run the process in reverse, using relevant Semantic Web data as relevance
feedback to improve hypertext Web search.
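The expansion step can be pictured as mixing the original query with a term distribution taken from the top feedback results. The sketch below is one hypothetical way to do that: it pools the top `k` feedback documents into a concatenated model, interpolates it with the query using a hypothetical weight `mix`, and keeps the `top_m` heaviest terms. The interpolation and truncation are assumptions for illustration, not the system's documented procedure.

```python
from collections import Counter


def expand_query(query_terms, ranked_feedback_docs, k=1, mix=0.5, top_m=5):
    """Hypothetical query expansion from the top-k first-stage results.

    Assumes 0 < mix <= 1 (weight on the original query) and top_m >= 1.
    Returns a normalised term-weight dictionary for the expanded query.
    """
    if not query_terms:
        raise ValueError("query_terms must be non-empty")
    # Original query as a maximum-likelihood model.
    q_model = {w: c / len(query_terms) for w, c in Counter(query_terms).items()}
    # Concatenated model over the top-k feedback documents.
    pooled = Counter(t for doc in ranked_feedback_docs[:k] for t in doc)
    total = sum(pooled.values())
    fb_model = {w: c / total for w, c in pooled.items()} if total else {}
    # Interpolate query and feedback models over their joint vocabulary.
    vocab = set(q_model) | set(fb_model)
    mixed = {w: mix * q_model.get(w, 0.0) + (1 - mix) * fb_model.get(w, 0.0)
             for w in vocab}
    # Keep the top_m terms (ties broken alphabetically) and renormalise.
    top = sorted(mixed.items(), key=lambda kv: (-kv[1], kv[0]))[:top_m]
    z = sum(p for _, p in top)
    return {w: p / z for w, p in top}


feedback = [["semantic", "web", "rdf", "rdf"], ["html", "link"]]
print(expand_query(["semantic", "web"], feedback))
# {'semantic': 0.375, 'web': 0.375, 'rdf': 0.25}
```

With `k=1` only the first feedback document contributes, so "rdf" enters the expanded query while "html" and "link" do not.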
This process is described in pseudo-code in Fig. 6.7, where the set of all
queries to be run on the system is given by the QuerySet parameter. The two
kinds of relevance feedback are selected by the SearchType parameter:
SearchType=RDF searches over RDF data, using HTML documents as the data for
relevance-feedback-based query expansion, and SearchType=HTML searches over
HTML documents, using RDF data for relevance-feedback-based query expansion.
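The dispatch on SearchType can be sketched as below, without reproducing Fig. 6.7 itself. Here `run_query_set`, the index layout (document id to text), and the `overlap_feedback` stand-in are hypothetical; the stand-in uses simple word overlap in place of the language-model feedback described above, purely so the example runs.

```python
from typing import Callable, Dict, List

# Hypothetical types: an index maps document ids to text; a feedback search takes
# (query, feedback_index, target_index) and returns ranked target document ids.
Index = Dict[str, str]
FeedbackSearch = Callable[[str, Index, Index], List[str]]


def run_query_set(query_set: List[str], search_type: str,
                  indexes: Dict[str, Index],
                  feedback_search: FeedbackSearch) -> Dict[str, List[str]]:
    """Run every query with the feedback direction chosen by search_type."""
    if search_type == "RDF":
        # Search RDF data, using HTML documents as relevance feedback.
        feedback_index, target_index = indexes["HTML"], indexes["RDF"]
    elif search_type == "HTML":
        # Search HTML documents, using RDF data as relevance feedback.
        feedback_index, target_index = indexes["RDF"], indexes["HTML"]
    else:
        raise ValueError(f"unknown SearchType: {search_type!r}")
    return {q: feedback_search(q, feedback_index, target_index) for q in query_set}


def overlap_feedback(query: str, feedback_index: Index, target_index: Index) -> List[str]:
    """Toy stand-in: add words from feedback docs sharing a query word, rank by overlap."""
    q = set(query.split())
    expanded = set(q)
    for text in feedback_index.values():
        words = set(text.split())
        if words & q:
            expanded |= words
    return sorted(target_index,
                  key=lambda d: (-len(expanded & set(target_index[d].split())), d))


indexes = {
    "HTML": {"h1": "tim berners-lee web inventor"},
    "RDF": {"r1": "berners-lee inventor foaf", "r2": "paris city"},
}
print(run_query_set(["tim berners-lee"], "RDF", indexes, overlap_feedback))
# {'tim berners-lee': ['r1', 'r2']}
```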