indexing and retrieval of hypertext documents on the World Wide Web by search
engines like Google and Yahoo! Search. Our experimental hypothesis is that the
statistical semantics of sense created from Semantic Web documents can help
hypertext search and vice versa, and this can be empirically shown via the use of
relevance feedback.
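To make the idea of relevance feedback concrete, the following is a minimal sketch in which terms from documents judged relevant in one collection (for example, Semantic Web documents) are used to expand a query that is then run against the other (for example, hypertext). It uses a simplified Rocchio-style expansion over raw term frequencies, which is an assumption made for illustration and not necessarily the method used in the experiments; the function names, the weights `alpha` and `beta`, and the sample texts are all hypothetical.

```python
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies for lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def expand_query(query, feedback_docs, alpha=1.0, beta=0.5, top_k=3):
    """Rocchio-style expansion (simplified, positive feedback only).

    The query vector is scaled by alpha, and the centroid of the feedback
    documents, scaled by beta, is added to it. The result keeps every
    original query term plus the top_k highest-weighted new terms.
    """
    q = term_vector(query)
    weights = {t: alpha * f for t, f in q.items()}
    for doc in feedback_docs:
        for t, f in term_vector(doc).items():
            weights[t] = weights.get(t, 0.0) + beta * f / len(feedback_docs)
    # Rank new terms by descending weight, breaking ties alphabetically.
    new_terms = sorted(
        (t for t in weights if t not in q),
        key=lambda t: (-weights[t], t),
    )[:top_k]
    return {t: weights[t] for t in list(q) + new_terms}


# Hypothetical feedback texts, standing in for Semantic Web results
# judged relevant to the hypertext query "jaguar".
expanded = expand_query(
    "jaguar",
    ["jaguar cat species", "jaguar big cat"],
    top_k=2,
)
```

The expanded query weights the original term most heavily and adds the terms that recur across the feedback documents, which is the mechanism by which one kind of search could inform the other.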
As an aside, we realize that our reduction of 'semantic search' to keyword-based
information retrieval over the Semantic Web is very restrictive, as many people use
'semantic search' to mean simply search that relies on anything beyond surface
syntax, including the categorization of complex queries (Baeza-Yates and Tiberi
2007) and entity-recognition using Semantic Web ontologies (Guha et al. 2003).
We will not delve into an extended explanation of the diverse kinds of semantic
search, as surveys of this kind already exist (Mangold 2007). Yet given the relative
paucity of publicly accessible data-sets about the wider notion of semantics and the
need to start with a simple rather than complex paradigm, we will restrict ourselves
to the Semantic Web and assume a traditional, keyword-based ad-hoc information
retrieval paradigm for both kinds of search, leaving issues like complex queries and
natural language semantics for future research. Keyword search consisting of 1-2
terms deserves exploration in its own right, as it is the most common kind of query
in today's Web search (Silverstein et al. 1999), regardless of whether the results of
this experiment generalize to other kinds of semantic search. Until recently semantic
search suffered from a lack of a thorough and neutral Cranfield-style evaluation, and
so we carefully explain and employ the traditional information retrieval evaluation
frameworks in our experiment to evaluate semantic search. At the time of the
experiment, our evaluation was the first Cranfield-style evaluation for searching
on the Semantic Web. This evaluation was later generalized into the annual 'Semantic
Search' competition (see footnote 1), which has since become a standard evaluation for
search over RDF data (Blanco et al. 2011b). However, our particular evaluation presented here
is still the only evaluation to determine relevance judgments over both hypertext and
RDF using the same set of queries.
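A Cranfield-style evaluation of this kind can be sketched as follows: a fixed set of queries, a set of relevance judgments per query, and a measure computed over each system's ranked results. The sketch below computes average precision and its mean over queries. The choice of measure, the function names, and the sample identifiers are assumptions for illustration only; the judgments shown are invented and mix hypertext and RDF results under one query solely to mirror the shared-query design described above.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against judged-relevant ids.

    Assumes ranked_ids contains no duplicates.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant result
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of average precision over every judged query.

    runs:  query -> ranked list of result ids returned by the system
    qrels: query -> collection of ids judged relevant by human judges
    """
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments: the same query has relevant results from both
# the hypertext Web ("html:...") and the Semantic Web ("rdf:...").
qrels = {"jaguar": {"html:a", "rdf:c"}}
runs = {"jaguar": ["html:a", "html:b", "rdf:c", "rdf:d"]}
score = mean_average_precision(runs, qrels)
```

Because the queries and judgments are held fixed, different ranking algorithms or feedback parameters can be compared simply by swapping in a different `runs` mapping.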
In Sect. 6.2 we first elucidate the general nature of search, from hypertext
documents to semantic search over Semantic Web documents. A general open-domain
collection of user queries from a real hypertext query log was run against
the Semantic Web. Human judges then constructed a 'gold-standard' collection of
queries and results judged for relevance, from both the Semantic and hypertext Web.
Then in Sect. 6.3 we give a brief overview of information retrieval frameworks
and ranking algorithms. While this section may be of interest to Semantic Web
researchers unfamiliar with such techniques, information retrieval researchers may
wish to proceed immediately past this section. Our system is described in Sect. 6.4.
In Sect. 6.5, these techniques are applied to the 'gold-standard' collection created in
Sect. 6.2 so that the best parameters and algorithms for relevance feedback for both
hypertext and semantic search can be determined. In Sects. 6.6 and 6.7 the effects
of using pseudo-feedback and Semantic Web inference are evaluated. The system
1 Sponsored by Yahoo! Research for both 2010 and 2011.