relevance, but select on a small subset. However, pseudo-feedback does not take
advantage of users selecting web-pages, but just assumes the top x are relevant.
A better approach would be to consider click-through logs of search engines as
incomplete approximations of manual relevance feedback (Cui et al. 2002). Since we
had only a small sample of the Microsoft Live Query log, this was infeasible for
our experiments, but it would be compelling future work. There is a massive amount
of human user click-through data available to commercial hypertext search engines,
although Semantic Web data has little relevance feedback data itself. While it is
easy enough to use query logs to determine relevant hypertext Web data, no such
option exists for the Semantic Web. However, there are possible methodologies for
determining the 'relevance' of Semantic Web data, even if machines rather than
humans are consuming the data. For example, Semantic Web data that is actually
consumed by applications such as maps and calendar programs can be taken to be
relevant.
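The contrast drawn above between pseudo-feedback and click-through feedback can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not the procedure used in the experiments: the function names, the `clicks` mapping from document to click count, and the frequency-based choice of expansion terms are all hypothetical.

```python
from collections import Counter


def pseudo_feedback(ranked_docs, x):
    """Pseudo-feedback: assume the top x ranked documents are relevant."""
    return ranked_docs[:x]


def click_feedback(ranked_docs, clicks, min_clicks=1):
    """Click-through feedback: treat clicked documents as (noisy) relevant.

    `clicks` is a hypothetical mapping from document id to click count,
    standing in for a search engine's click-through log.
    """
    return [d for d in ranked_docs if clicks.get(d, 0) >= min_clicks]


def expansion_terms(feedback_docs, doc_terms, k):
    """Pick the k most frequent terms across the feedback documents.

    Raw term frequency is an assumption made for brevity; a real system
    would weight terms, for example with a relevance model.
    """
    counts = Counter()
    for d in feedback_docs:
        counts.update(doc_terms[d])
    return [term for term, _ in counts.most_common(k)]
```

The two selection functions differ only in where the evidence of relevance comes from: rank position in the first case, user behaviour in the second. Either result can be passed to `expansion_terms` unchanged.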
Finally, while generic Semantic Web inference may not help in answering simple
keyword-based queries for entities and concepts, further research needs to be done
to determine if inference can help answer complex queries. While in most keyword-
based searches the name of the information need is mentioned directly in the
query, which in our experiment results from choosing the queries via a named
entity recognizer, in complex queries only the type or attributes of the information
need are mentioned directly. The name of particular answers is usually unknown.
Therefore, some kind of inference may be crucial in determining what entities or
concepts match the attributes or type mentioned in the query terms. For example, the
SemSearch 2011 competition's 'complex query' task was very difficult for systems
that did well on keyword search, and the winning system used a customized crawling
of the Wikipedia type hierarchy (Blanco et al. 2011a).
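To illustrate why some form of inference may be needed when a query names only a type, the following minimal sketch matches entities against a query type by walking up a toy type hierarchy. It is not the winning system's method, whose details are not given here; the `parent_of` and `entity_types` mappings and both function names are hypothetical.

```python
def ancestors(type_name, parent_of):
    """Return type_name followed by all of its ancestors in the hierarchy.

    `parent_of` is a hypothetical mapping from a type to its single parent.
    The `seen` check guards against cycles in a malformed hierarchy.
    """
    seen = []
    current = type_name
    while current is not None and current not in seen:
        seen.append(current)
        current = parent_of.get(current)
    return seen


def entities_of_type(query_type, entity_types, parent_of):
    """Return entities whose asserted type is query_type or a subtype of it."""
    return sorted(
        entity
        for entity, asserted in entity_types.items()
        if query_type in ancestors(asserted, parent_of)
    )
```

Without the walk up the hierarchy, an entity asserted only as a subtype would never match a query that mentions the supertype, which is the situation complex queries create.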
6.9.1 Experimental Conclusions
This study features a number of results that impact the larger field of semantic
search. First, it shows that a rigorous information retrieval evaluation, the 'Cranfield
paradigm', can be applied to semantic search despite the differences between the
Semantic Web and hypertext. These differences are well documented in our sample
of the Semantic Web, taken via FALCON-S using a query log, which reveals a
number of large differences between Semantic Web data and hypertext data, in
particular that while relevant data for ordinary open-domain queries does appear
on the Semantic Web, Semantic Web data is in general more sparse than hypertext
data when given a keyword query from an ordinary user's hypertext Web search.
However, when the Semantic Web does contain data relevant to a given query, that
data is likely to be accurate information, a fact we exploit in our techniques.
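A Cranfield-style evaluation scores each system's ranked results against a fixed set of human relevance judgments. As a minimal sketch, assuming binary judgments and average precision as the metric (a common choice, not confirmed by this passage), the computation might look as follows. The function names and data layout are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(runs, judgments):
    """Mean of average precision over all queries in `runs`.

    `runs` maps a query id to its ranked result list; `judgments` maps a
    query id to the set of results judged relevant. Queries without any
    judgments score 0.0.
    """
    scores = [
        average_precision(runs[query], judgments.get(query, set()))
        for query in runs
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the judgments are fixed in advance, the same functions can score hypertext results and Semantic Web results for the same queries, which is what makes the comparison between the two possible.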
Unlike previous work in semantic search that usually focuses on some form of
PageRank or other link-based ranking, we concentrate on using techniques from
information retrieval, including language models and vector-space models, over