The Semantics of Search - Social Semantics: The Search for Meaning on the Web

The standard TREC definition for relevance is “If you were writing a report on

the subject of the topic and would use the information contained in the document in

the report, then the document is relevant” (Hawking et al. 2000). As semantic search

is supposed to be about entities and concepts rather than documents, semantic search

needs a definition of relevance based around information about entities or concepts

that is independent of particular terms in queries or documents. In one sense,

this entity-centric relevance should have a wider remit than the document-centric

relevance definition, as any information about the entity that could be relevant

should be included. Yet in another sense, this definition is more restrictive, since if one

considers the world (perhaps fuzzily) partitioned into distinct entities and concepts,

then merely related information would not count. In the instructions to the judges, relevance was

defined as whether or not a result is about the same thing as the query, which can

be determined by whether or not accurate information about the information need

is expressed by the result. The following example was given to the judges: “Given

a query for 'Eiffel Tower,' a result entitled 'Monuments in Paris' would likely be

relevant if there was information about the Eiffel Tower in the page, but a result

entitled 'The Restaurant in the Eiffel Tower' containing only the address and menus

of the restaurant would not be relevant.”

Some kinds of Web results that would ordinarily be considered relevant are therefore

excluded. In particular, there is a restriction that the relevant information must

be present in the result itself. This excludes possibly relevant information that is

accessible via outbound links, even a single link. All manner of results that are

collections of links are thus excluded from relevancy, including both 'link farms'

purposely designed to be highly ranked by PageRank-based search engines and

legitimate directories of high-quality links to relevant information. These hubs are

excluded precisely because the information, even if it is only a link traversal away,

is still not directly present in the retrieved result. By this same principle, results

that merely redirect to another resource via some method besides the standardized

HTTP methods are excluded, since a redirection can be considered a kind of link.

They would be considered relevant only if additional information was included in

the result besides the redirection itself.
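
The exclusion rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the judgments in the study were made by human judges, not by code, and the class name, field names, and the 'Links about the Eiffel Tower' result are hypothetical, introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class JudgedResult:
    """One search result as a judge might annotate it (hypothetical fields)."""
    title: str
    # Accurate information about the query's entity appears in the result itself.
    has_direct_info: bool
    # Information about the entity is reachable through an outbound link or redirect.
    has_linked_info: bool = False


def is_relevant(result: JudgedResult) -> bool:
    """Only information present in the result itself counts toward relevance."""
    # Linked or redirected information is deliberately ignored,
    # even when it is a single link traversal away.
    return result.has_direct_info


# Results for the query 'Eiffel Tower'.
monuments = JudgedResult("Monuments in Paris", has_direct_info=True)
restaurant = JudgedResult("The Restaurant in the Eiffel Tower", has_direct_info=False)
# Hypothetical directory page: links to good pages, but no information of its own.
directory = JudgedResult("Links about the Eiffel Tower",
                         has_direct_info=False, has_linked_info=True)
# A redirecting page that also carries information of its own.
redirect_with_info = JudgedResult("Eiffel Tower (moved)",
                                  has_direct_info=True, has_linked_info=True)

for r in (monuments, restaurant, directory, redirect_with_info):
    print(r.title, "->", "relevant" if is_relevant(r) else "not relevant")
```

The `has_linked_info` field never influences the outcome. It is kept only to make explicit that link farms, directories, and bare redirects are judged on their own content.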

To aid the judges, a Web-based interface was created to present the queries and

results. Although an interface that presented the queries and the search interface

in a manner similar to search engines was created, human judges preferred an

interface that presented the results for judgment one at a time,

forcing them to view a rendering of the web-page associated with each

URI originally offered by the search engine. Each hypertext web-page was

rendered using the Firefox Web Browser and PageSaver Pro 2.0.

For each Semantic Web document, the result was rendered (i.e. the triples and any

associated text in the subject) by using the open-source Disco Hyperdata Browser

with Firefox. 2

In both cases, the resulting rendering of the Web representation was

2 The Disco Hyperdata Browser, a browser that renders Semantic Web data to HTML, is available

at http://www4.wiwiss.fu-berlin.de/bizer/ng4j/disco/
