The Semantics of Search - Social Semantics: The Search for Meaning on the Web

relevance, but select on a small subset. However, pseudo-feedback does not take

advantage of users selecting web-pages, but just assumes the top x are relevant.

A better approach would be to consider click-through logs of search engines as

incomplete approximations of manual relevance feedback (Cui et al. 2002). As we

only had a small sample of the Microsoft Live Query log, this was unfeasible for

our experiments, but would be compelling future work. There is a massive amount

of human user click-through data available to commercial hypertext search engines,

although Semantic Web data has little relevance feedback data itself. While it is

easy enough to use query logs to determine relevant hypertext Web data, no such

option exists for the Semantic Web. However, there are possible methodologies for

determining the 'relevance' of Semantic Web data, even if machines rather than

humans are consuming the data. For example, Semantic Web data that is consumed

by applications such as maps and calendar programs can be judged relevant on the

grounds that it is actually used.
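
The contrast between pseudo-feedback and click-through feedback can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the experiments reported here: the function names, the log format of (query, clicked URL) pairs, the example URLs, and the min_clicks threshold are all hypothetical.

```python
from collections import Counter


def pseudo_feedback(ranked_urls, x=10):
    """Pseudo-feedback: assume the top x results are relevant, with no user input."""
    return set(ranked_urls[:x])


def clickthrough_feedback(ranked_urls, click_log, query, min_clicks=1):
    """Treat clicked results as an incomplete approximation of manual feedback.

    click_log is a hypothetical list of (query, clicked_url) pairs.
    """
    # Count how often each URL was clicked for this exact query string.
    clicks = Counter(url for q, url in click_log if q == query)
    # Keep only retrieved results that users clicked at least min_clicks times.
    return {url for url in ranked_urls if clicks[url] >= min_clicks}


if __name__ == "__main__":
    ranked = ["a.example/1", "b.example/2", "c.example/3", "d.example/4"]
    log = [
        ("eiffel tower", "c.example/3"),
        ("eiffel tower", "c.example/3"),
        ("paris", "a.example/1"),
    ]
    print(pseudo_feedback(ranked, x=2))
    print(clickthrough_feedback(ranked, log, "eiffel tower"))
```

In this sketch the pseudo-feedback set depends only on rank position, whereas the click-through set depends on what users selected, so a lower-ranked result can count as relevant while the top results do not.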

Finally, while generic Semantic Web inference may not help in answering simple

keyword-based queries for entities and concepts, further research needs to be done

to determine whether inference can help answer complex queries. In most

keyword-based searches the name of the information need is mentioned directly in

the query (in our experiment, a consequence of choosing the queries via a named

entity recognizer), whereas in complex queries only the type or attributes of the

information need are mentioned directly. The names of particular answers are usually unknown.

Therefore, some kind of inference may be crucial in determining what entities or

concepts match the attributes or type mentioned in the query terms. For example, the

SemSearch 2011 competition's 'complex query' task was very difficult for systems

that did well on keyword search, and the winning system used a customized crawling

of the Wikipedia type hierarchy (Blanco et al. 2011a).

6.9.1 Experimental Conclusions

This study features a number of results that impact the larger field of semantic

search. First, it shows that a rigorous information retrieval evaluation, the 'Cranfield

paradigm', can be applied to semantic search despite the differences between the

Semantic Web and hypertext. These differences are well recorded in our sample

of the Semantic Web, taken via FALCON-S using a query log, which reveals a

number of large differences between Semantic Web data and hypertext data, in

particular that while relevant data for ordinary open-domain queries does appear

on the Semantic Web, Semantic Web data is in general more sparse than hypertext

data when given a keyword query from an ordinary user's hypertext Web search.

However, when the Semantic Web does contain data relevant to a given query, that

data is likely to be accurate information, a fact we exploit in our techniques.

Unlike previous work in semantic search, which usually focuses on some form of

PageRank or other link-based ranking, we concentrate on using techniques from

information retrieval, including language models and vector-space models, over
