For both hypertext and Semantic Web search combined, there were 71 (18%) unresolved queries
that did not return any relevant results. For hypertext Web search, only 3 (2%) queries
were unresolved, while 68 (34%) of the queries were unresolved for the Semantic
Web. In other words, the hypertext search engines almost always returned at
least one relevant result in the top 10, but for the Semantic Web almost a third
of all queries did not return any relevant result in the top 10. This suggests that
much information does not yet have a relevant representation on the Semantic Web,
unless such information exists but is hidden by the perhaps sub-optimal ranking of FALCON-S.
Another question is how many queries had a relevant result as their top-ranked result. In
general, 197 queries (50%) had top-ranked relevant results over both Semantic Web
and hypertext search. While hypertext Web search had 121 (61%) top-ranked
relevant results, the Semantic Web had only 76 (58%) top-ranked relevant results. What is
more compelling for relevance feedback is the number of relevant results that were
not the top-ranked result. Again for both kinds of searches, there were 132 (33.0%)
queries where a relevant result was not in the top position of the returned results.
For the hypertext Web, there were 76 (39%) queries with a top non-relevant result.
Yet for the Semantic Web there were 56 (42%) queries that had a top non-relevant
result. So queries on the Semantic Web are more likely to turn up no relevant results
in the top 10. When a relevant result is returned in the top 10, it is quite likely
that a non-relevant result will be in the top position for both the hypertext Web and
the Semantic Web.
6.3 Information Retrieval for Web Search
In our evaluation we tested two general kinds of information retrieval frameworks:
vector-space models and language models. In the vector-space model, document
models are considered to be vectors of terms (usually called 'words,' since they are
typically, though not exclusively, drawn from natural language; we transform URIs
into 'pseudo-words'), where the weighting function and query expansion have no
principled basis besides empirical results. Ranking is usually done via a comparison
using the cosine distance, a natural comparison metric between vectors. The key to
success with vector-space models tends to be the tuning of the parameters of their
weighting function. While fine-tuning these parameters has led to much practical
success in information retrieval, the parameters have little formally-proven basis
but are instead based on common-sense heuristics like document length and average
document length.
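To make the vector-space approach concrete, the following is a minimal sketch of TF-IDF weighting and cosine-similarity ranking over a toy corpus; the two documents, the query, and the particular smoothed IDF formula are illustrative assumptions, not the exact weighting function used in our evaluation.

```python
import math
from collections import Counter

# Toy corpus: in practice URIs would first be transformed into
# 'pseudo-words' (e.g. by splitting on '/' and '#') before indexing.
docs = {
    "d1": "semantic web search engine".split(),
    "d2": "hypertext web search ranking".split(),
}
query = "semantic web ranking".split()

def tf_idf(terms, corpus):
    """Weight each term by its frequency times a smoothed inverse document frequency."""
    weights = {}
    for term, freq in Counter(terms).items():
        df = sum(1 for doc in corpus.values() if term in doc)
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1  # one arbitrary smoothed-IDF variant
        weights[term] = freq * idf
    return weights

def cosine(v1, v2):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Rank documents by cosine similarity between the query vector and each document vector.
q_vec = tf_idf(query, docs)
ranking = sorted(docs, key=lambda name: cosine(q_vec, tf_idf(docs[name], docs)), reverse=True)
print(ranking)
```

Production systems typically replace this simple TF-IDF scheme with tuned weighting functions such as BM25, whose document-length normalization parameters are exactly the kind of empirically tuned heuristics described above.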
Another approach, the language model approach, takes a formally principled and
probabilistic approach to determining the ranking and weighting function. Instead
of each document being considered some parametrized word-frequency vector, the
documents are each considered to be samples from an underlying probabilistic
language model M_D, of which D itself is only a single observation. In this manner,
the query Q can itself also be considered a sample from a language model. In early
language modeling efforts, documents were ranked by the probability that the language model of a
document generated the query, P(Q|M_D).
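As a rough illustration of this query-likelihood idea, the sketch below scores each document by log P(Q|M_D) under a unigram model with Jelinek-Mercer smoothing; the toy corpus and the smoothing weight are assumptions made purely for the example.

```python
import math
from collections import Counter

# Query-likelihood ranking: each document D is treated as a single observation
# of an underlying language model M_D, and documents are scored by how probable
# the query is under that model. Jelinek-Mercer smoothing mixes M_D with a
# collection-wide model so query terms absent from a document do not zero out the score.
docs = {
    "d1": "semantic web search engine".split(),
    "d2": "hypertext web search ranking".split(),
}
query = "semantic web ranking".split()
LAMBDA = 0.5  # smoothing weight, an arbitrary illustrative choice

collection = [t for terms in docs.values() for t in terms]
coll_counts, coll_len = Counter(collection), len(collection)

def log_query_likelihood(query_terms, doc_terms):
    """log P(Q | M_D) under a smoothed unigram language model."""
    doc_counts, doc_len = Counter(doc_terms), len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = LAMBDA * p_doc + (1 - LAMBDA) * p_coll
        score += math.log(p) if p > 0 else float("-inf")
    return score

ranking = sorted(docs, key=lambda name: log_query_likelihood(query, docs[name]), reverse=True)
print(ranking)
```

Unlike the heuristic parameters of the vector-space weighting functions, the smoothing weight here has a direct probabilistic interpretation as an interpolation between the document model and the collection model.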