users of a query in order to expand the query. By 'expand the query,' we mean that
the usually rather short query is expanded into a much larger query by adding words
from known relevant documents. For example, a query on the hypertext Web for
the Eiffel Tower given as 'eiffel' might be expanded into 'paris france eiffel tour.'
If the relevant pages were instead about an Eiffel Tower replica in Texas, the same
query could be expanded into 'paris texas eiffel replica.' The same principle
applies to the Semantic Web, except that the expansion terms may include not only
natural-language terms but also Semantic Web URIs and terms resulting from inference or URI processing. The
hypothesis of relevance feedback, as pioneered by Rocchio in the SMART retrieval
system, is that the relevant documents will disambiguate and in general give a better
description of the information need of the query than the query itself (Rocchio
1971). Relevance feedback has been shown in certain cases to improve retrieval
performance significantly. Extending this classical work, relevance models, as
formalized by Lavrenko (2008), create language models directly from the indexed
documents rather than waiting for the user to make an explicit relevance judgment.
Relevance models are especially well-suited to our hypothesis that multiple kinds
of encodings should be part of the same sense, as relevance models consider each
source of data (query, documents, perhaps even tags and Semantic Web data) as
'snapshots' from some underlying generative model.
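To make the expansion step concrete, the following is a minimal Python sketch of Rocchio-style relevance feedback, not the SMART implementation itself. It assumes raw term-frequency vectors built by whitespace tokenization (practical systems usually use weighted vectors such as tf-idf), the commonly used weights alpha = 1.0, beta = 0.75 and gamma = 0.15, and a hypothetical top-k cutoff for turning the reweighted vector back into a keyword query; none of these choices come from the text above, and the function names are illustrative.

```python
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def rocchio_expand(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15, k=4):
    """Return the top-k terms of the Rocchio-modified query vector:
    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    relevant, nonrelevant = list(relevant), list(nonrelevant)
    weights = Counter()
    for term, tf in tf_vector(query).items():
        weights[term] += alpha * tf
    # Move the query toward the centroid of the judged-relevant documents.
    for doc in relevant:
        for term, tf in tf_vector(doc).items():
            weights[term] += beta * tf / len(relevant)
    # Move it away from the centroid of the judged-non-relevant documents.
    for doc in nonrelevant:
        for term, tf in tf_vector(doc).items():
            weights[term] -= gamma * tf / len(nonrelevant)
    # Keep only positively weighted terms; break ties alphabetically for stable output.
    positive = {t: w for t, w in weights.items() if w > 0}
    return sorted(positive, key=lambda t: (-positive[t], t))[:k]


if __name__ == "__main__":
    print(rocchio_expand("eiffel", ["eiffel tower paris france", "paris eiffel tour"]))
    # ['eiffel', 'paris', 'france', 'tour']
```

With the two hand-made relevant documents in the usage line, the expanded query contains the same four terms as the Paris example in the text; feeding in documents about the Texas replica instead yields 'eiffel', 'replica', 'texas' and 'paris'.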
Since we will use representations from different sources of data, we cannot
simply confine the notion of a resource to a single URI. Because content
negotiation among various encodings is barely deployed on the Web,
hypertext web-pages and Semantic Web documents encoded in RDF almost
always have different URIs. However, a web-page for the Eiffel
Tower encoded in HTML and a Semantic Web document encoded in RDF can still
share the same content about the Eiffel Tower, despite having different URIs. So, the
information pertaining to a resource will be spread amongst multiple co-referential
URIs. Therefore, the best way to determine the set of URIs relevant to a particular
resource is to attach the resource to the information need of an ordinary web user as
expressed by a query in a search engine. The next step is to have humans judge a
set of Web representations (Semantic Web documents, hypertext web-pages,
or both) and to consider the set of these Web representations and their attendant URIs to be
a partial snapshot of the relevant information pertaining to a sense.
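The grouping of judged representations into a sense can be sketched as a small data structure. The following Python sketch is illustrative only: the `Judgment` record, its fields, and the example.org URIs are hypothetical; the text of each HTML page or RDF document is assumed to have been extracted into plain terms beforehand; and the pooled term distribution is a simple uniform average of per-document maximum-likelihood language models, in the spirit of a relevance model rather than a full relevance model as formalized by Lavrenko.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """A human relevance judgment of one Web representation for one query (hypothetical schema)."""
    query: str
    uri: str
    encoding: str  # e.g. "text/html" or "application/rdf+xml"; kept for provenance only
    text: str      # terms assumed to be already extracted from the page or RDF document
    relevant: bool


def build_sense(judgments, query):
    """Return (uris, model): the URIs judged relevant to `query`, whatever their
    encoding, and a unigram model averaging each representation's term distribution."""
    docs = [j for j in judgments if j.query == query and j.relevant]
    uris = {j.uri for j in docs}
    model = Counter()
    for j in docs:
        counts = Counter(j.text.lower().split())
        total = sum(counts.values())
        for term, c in counts.items():
            # Maximum-likelihood P(w|D), each representation weighted equally.
            model[term] += (c / total) / len(docs)
    return uris, dict(model)


if __name__ == "__main__":
    judgments = [
        Judgment("eiffel", "http://example.org/eiffel.html", "text/html", "eiffel tower paris", True),
        Judgment("eiffel", "http://example.org/eiffel.rdf", "application/rdf+xml", "eiffel tower france", True),
        Judgment("eiffel", "http://example.org/replica.html", "text/html", "eiffel replica texas", False),
    ]
    uris, model = build_sense(judgments, "eiffel")
    print(sorted(uris))
    print(sorted(model.items()))
```

Because the sense is keyed on the query rather than on any single URI, the HTML page and the RDF document about the Eiffel Tower land in the same set even though their URIs differ, while the page judged non-relevant is left out.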
This technique can be turned into a testable hypothesis, namely the one put
forward by Baeza-Yates: that search on the Semantic Web can be used to improve
traditional ad-hoc information retrieval for hypertext Web search engines, and vice
versa (Baeza-Yates 2008). Currently, there exist several nascent Semantic Web
search engines that specifically index and return ranked Linked Data in RDF in
response to keyword queries. Yet their rankings are much less well-studied than
hypertext Web rankings, and so are thought likely to be sub-optimal. While we
realize that the amount of structured data on the Web, and the number of its sources, are huge, in order to narrow
the scope and test the hypothesis of Baeza-Yates, from here on we will assume that 'semantic
search' refers to the indexing and retrieval of Linked Data by search engines like
Sindice and FALCON-S (Cheng et al. 2008), and that hypertext search refers to the