The standard TREC definition for relevance is “If you were writing a report on
the subject of the topic and would use the information contained in the document in
the report, then the document is relevant” (Hawking et al. 2000). As semantic search
is supposed to be about entities and concepts rather than documents, it
needs a definition of relevance based around information about entities or concepts
that is independent of particular terms in queries or documents. In one sense,
this entity-centric relevance should have a wider remit than the document-centric
relevance definition, as any information about the entity that could be relevant
should be included. Yet in another sense, this definition is more restrictive: if one
considers the world (perhaps fuzzily) partitioned into distinct entities and concepts,
then merely related information would not count. In the instructions, relevance was
defined as whether or not a result is about the same thing as the query, which can
be determined by whether or not the result expresses accurate information about the
information need. The following example was given to the judges: “Given
a query for 'Eiffel Tower,' a result entitled 'Monuments in Paris' would likely be
relevant if there was information about the Eiffel Tower in the page, but a result
entitled 'The Restaurant in the Eiffel Tower' containing only the address and menus
of the restaurant would not be relevant.”
Some kinds of Web results that would ordinarily be considered relevant are therefore
excluded. In particular, there is a restriction that the relevant information must
be present in the result itself. This excludes possibly relevant information that is
accessible via outbound links, even a single link. All manner of results that are
collections of links are thus excluded from relevance, including both 'link farms'
purposely designed to be highly ranked by PageRank-based search engines and
legitimate directories of high-quality links to relevant information. These hubs are
excluded precisely because the information, even if it is only a link traversal away,
is still not directly present in the retrieved result. By this same principle, results
that merely redirect to another resource via some method besides the standardized
HTTP methods are excluded, since a redirection can be considered a kind of link.
They would be considered relevant only if additional information was included in
the result besides the redirection itself.
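The exclusion rules above were applied by human judges, but they can be illustrated with a rough heuristic. The following is a minimal sketch, not the procedure used in the study: the names ResultScanner and flag_for_exclusion, the min_words threshold of 20, and the choice of an HTML meta refresh as the example of a non-HTTP redirection are all assumptions made for illustration. It flags a page whose text outside of links is too short to carry information of its own.

```python
from html.parser import HTMLParser


class ResultScanner(HTMLParser):
    """Hypothetical helper: counts links, words outside links, and meta refreshes."""

    SKIPPED = ("script", "style", "title")  # text here is not counted as page content

    def __init__(self):
        super().__init__()
        self.link_depth = 0       # > 0 while inside an <a> element
        self.skip_depth = 0       # > 0 while inside a skipped element
        self.link_count = 0       # number of <a> elements that carry an href
        self.non_link_words = 0   # words of body text that are not link text
        self.meta_refresh = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.link_depth += 1
            if attrs.get("href"):
                self.link_count += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.meta_refresh = True

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.link_depth and not self.skip_depth:
            self.non_link_words += len(data.split())


def flag_for_exclusion(html, min_words=20):
    """Return a reason string if the page looks excluded, else None.

    min_words is an arbitrary illustrative threshold, not a value from the study.
    """
    scanner = ResultScanner()
    scanner.feed(html)
    scanner.close()
    has_own_content = scanner.non_link_words >= min_words
    if scanner.meta_refresh and not has_own_content:
        return "redirect-only"
    if scanner.link_count > 0 and not has_own_content:
        return "link-collection"
    return None
```

A page flagged this way would still need a human judgment, since the sketch does not detect script-based redirections and cannot tell whether the remaining text is actually about the queried entity.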
To aid the judges, a Web-based interface was created to present the
queries and results. Although an interface that presented the queries
and the search interface in a manner similar to search engines was created, human
judges preferred an interface that presented the results for judgment one at a time,
forcing them to view a rendering of the web page associated with each
URI originally offered by the search engine. Each hypertext web page
was rendered using the Firefox Web Browser and PageSaver Pro 2.0.
For each Semantic Web document, the result (i.e. the triples and any
associated text in the subject) was rendered using the open-source Disco Hyperdata Browser
with Firefox. [2]
In both cases, the resulting rendering of the Web representation was
[2] The Disco Hyperdata Browser, a browser that renders Semantic Web data to HTML, is available
at http://www4.wiwiss.fu-berlin.de/bizer/ng4j/disco/