The Semantics of Search - Social Semantics: The Search for Meaning on the Web

indexing and retrieval of hypertext documents on the World Wide Web by search engines like Google and Yahoo! Search. Our experimental hypothesis is that the statistical semantics of sense created from Semantic Web documents can help hypertext search and vice versa, and that this can be shown empirically through the use of relevance feedback.
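
The chapter's own feedback algorithms and parameters are described from Sect. 6.3 onward. Purely as a rough illustration of the hypothesis, the sketch below uses the classic Rocchio formula to expand a query with terms from documents judged relevant in one corpus (here, hypertext pages) and then ranks documents from the other corpus (here, text drawn from RDF documents) with the expanded query. The helper names (`tf_vector`, `rocchio`, `cosine`, `rank`), the naive whitespace tokenization, the commonly cited weights alpha=1.0, beta=0.75, gamma=0.15, and the toy documents are all assumptions, not the authors' setup.

```python
import math
from collections import Counter


def tf_vector(text):
    """Hypothetical helper: term frequencies from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Terms whose final weight is not positive are dropped (an assumption).
    """
    new = Counter()
    for term, w in tf_vector(query).items():
        new[term] += alpha * w
    for doc in relevant:
        for term, w in tf_vector(doc).items():
            new[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in tf_vector(doc).items():
            new[term] -= gamma * w / len(nonrelevant)
    return {term: w for term, w in new.items() if w > 0}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_vec, docs):
    """Sort documents by descending cosine similarity to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vec, tf_vector(d)), reverse=True)


# Hypothetical toy data: judgments on hypertext pages are fed back into
# a query that is then run over text drawn from Semantic Web documents.
hypertext_relevant = ["paris capital of france eiffel tower"]
hypertext_nonrelevant = ["paris hilton celebrity"]
rdf_docs = ["paris hilton socialite", "paris city france capital"]

baseline = rank(tf_vector("paris"), rdf_docs)
expanded_query = rocchio("paris", hypertext_relevant, hypertext_nonrelevant)
expanded = rank(expanded_query, rdf_docs)
print(baseline[0])  # paris hilton socialite
print(expanded[0])  # paris city france capital
```

The toy data only shows that feedback gathered on one Web can reorder results on the other; whether this actually improves retrieval on real queries is what the experiments in this chapter set out to test.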

As an aside, we realize that our reduction of 'semantic search' to keyword-based information retrieval over the Semantic Web is very restrictive, as many people use 'semantic search' to mean any search that relies on something beyond surface syntax, including the categorization of complex queries (Baeza-Yates and Tiberi 2007) and entity recognition using Semantic Web ontologies (Guha et al. 2003).

We will not delve into an extended explanation of the diverse kinds of semantic search, as surveys of this kind already exist (Mangold 2007). Yet given the relative paucity of publicly accessible data-sets about the wider notion of semantics, and the need to start with a simple rather than a complex paradigm, we restrict ourselves to the Semantic Web and assume a traditional keyword-based ad-hoc information retrieval paradigm for both kinds of search, leaving issues like complex queries and natural language semantics for future research. Keyword search consisting of one or two terms is worth exploring in its own right, since it is the most common kind of query in today's Web search (Silverstein et al. 1999), regardless of whether any results from this experiment generalize to other kinds of semantic search.

Until recently, semantic search lacked a thorough and neutral Cranfield-style evaluation, so we carefully explain and employ the traditional information retrieval evaluation frameworks in our experiment. At the time of the experiment, our evaluation was the first Cranfield-style evaluation for searching on the Semantic Web. This evaluation was later generalized into the annual 'Semantic Search' competition (footnote 1), which has since become a standard evaluation for search over RDF data (Blanco et al. 2011b). However, the evaluation presented here is still the only one to determine relevance judgments over both hypertext and RDF using the same set of queries.
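
A Cranfield-style evaluation fixes a set of queries and a document collection, has human judges mark which results are relevant, and then scores each system's ranked output against those judgments. This section does not say which metrics the chapter reports, so as an assumption the sketch below uses average precision and its mean over queries (MAP), two standard measures in that tradition. Because the same query ids can key judgments for both hypertext and RDF results, one `qrels` mapping could in principle hold both; the function names, query ids, document ids, and judgments are all hypothetical.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of doc ids against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this relevant hit
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP: mean of per-query average precision over every judged query.

    runs maps query id -> ranked list of doc ids returned by a system;
    qrels maps query id -> set of doc ids the judges marked relevant.
    """
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical judgments and system output for two queries.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}
print(round(mean_average_precision(runs, qrels), 4))  # 0.6667
```

Here q1 scores (1/1 + 2/3) / 2 = 5/6 and q2 scores 1/2, so MAP is 2/3. Averaging over a fixed query set is what makes scores from different systems, or from the two kinds of Web, directly comparable.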

In Sect. 6.2 we first elucidate the general nature of search, from search over hypertext documents to semantic search over Semantic Web documents. A general open-domain collection of user queries from a real hypertext query log was run against the Semantic Web. Human judges then constructed a 'gold-standard' collection of queries and results judged for relevance, drawn from both the Semantic Web and the hypertext Web.

Then, in Sect. 6.3, we give a brief overview of information retrieval frameworks and ranking algorithms. While this section may be of interest to Semantic Web researchers unfamiliar with such techniques, information retrieval researchers may wish to skip it. Our system is described in Sect. 6.4. In Sect. 6.5, these techniques are applied to the 'gold-standard' collection created in Sect. 6.2 to determine the best parameters and algorithms for relevance feedback in both hypertext and semantic search. In Sects. 6.6 and 6.7, the effects of using pseudo-feedback and Semantic Web inference are evaluated. The system

1 Sponsored by Yahoo! Research for both 2010 and 2011.
