Fig. 6.3 The interface used to judge web-page results for relevancy
saved at 469 × 631 pixel resolution. The web-page was rendered, rather than
linked directly by its URI, because of the unstable state of the
Web, especially the hypertext Web. Even caching the HTML would have risked
losing much of the graphic element of the hypertext Web. By creating 'snapshot'
renderings, each judge at any given time was guaranteed to be presented with the
result in the same visual form. One side-effect is that web-pages that depend
heavily on non-standardized technologies or plug-ins would not render and were
thus presented as blank screenshots to the judge, but these formed a small minority of
the data. The user-interface divided the evaluation into two steps:
Judging relevant results from a hypertext Web search: The judge was given the
search terms created by an actual human user for a query and an example relevant
web-page whose full snapshot could be viewed by clicking on it. A full rendering
of the retrieved web-page was presented to the user with its title and summary
(as produced by Yahoo! Search) easily viewed by the judge as in Fig. 6.3. The
judge clicked on the check-box if the result was considered relevant. Otherwise,
the web-page was by default recorded as not relevant. The web-page results were
presented to the judge one at a time, ten times for each query.
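
The recording rule in this first step (a result counts as relevant only if its check-box was clicked, otherwise not relevant by default) can be illustrated with a minimal Python sketch. The class name, field names, and example URIs below are hypothetical and are not taken from the original interface; only the ten-results-per-query figure and the default-to-not-relevant rule come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

RESULTS_PER_QUERY = 10  # the text states ten results were judged per query


@dataclass
class QueryJudgments:
    """One judge's relevance judgments for one query (hypothetical structure)."""
    query: str
    result_uris: List[str]
    checked: Set[str] = field(default_factory=set)

    def mark_relevant(self, uri: str) -> None:
        """Record that the judge clicked the check-box for this result."""
        if uri not in self.result_uris:
            raise ValueError(f"{uri} is not a result of this query")
        self.checked.add(uri)

    def as_labels(self) -> Dict[str, bool]:
        """Every result without a checked box defaults to not relevant."""
        return {uri: uri in self.checked for uri in self.result_uris}


# Made-up example data, for illustration only.
uris = [f"http://example.org/result/{i}" for i in range(1, RESULTS_PER_QUERY + 1)]
judgments = QueryJudgments("example query", uris)
judgments.mark_relevant("http://example.org/result/2")
labels = judgments.as_labels()
```

Storing only the checked results and deriving the labels afterwards mirrors the default described above: nothing has to be recorded for a result the judge passes over.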
Judging relevant results from a Semantic Web search: Next, the judge assessed
all the Semantic Web results for relevancy. These results were retrieved from the
Semantic Web and displayed to the judge using the same interface as in the first
step, as shown in Fig. 6.4. A title was displayed by retrieving any literal values of
rdfs:label properties, and a summary by retrieving any literal values of
rdfs:comment properties. As in the first step, the judge
had to determine whether or not the Semantic Web results were relevant.
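
As a rough illustration of how a title and summary might be derived from rdfs:label and rdfs:comment, the sketch below works on a toy in-memory list of triples instead of a real RDF library. The `Literal` marker class, the helper names, the example triples, and the choice to join multiple literals with a space are all assumptions made for this example; the text only says that any literal values of the two properties were retrieved.

```python
from typing import List, Tuple

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDFS_LABEL = RDFS + "label"
RDFS_COMMENT = RDFS + "comment"


class Literal(str):
    """Toy marker for an RDF literal, as opposed to a URI (hypothetical)."""


Triple = Tuple[str, str, str]


def literal_values(triples: List[Triple], predicate: str) -> List[str]:
    """Collect the literal objects of every triple that uses the given predicate."""
    return [str(o) for (_s, p, o) in triples
            if p == predicate and isinstance(o, Literal)]


def title_and_summary(triples: List[Triple]) -> Tuple[str, str]:
    """Title from rdfs:label literals, summary from rdfs:comment literals.

    Joining multiple values with a space is an assumption of this sketch.
    """
    title = " ".join(literal_values(triples, RDFS_LABEL))
    summary = " ".join(literal_values(triples, RDFS_COMMENT))
    return title, summary


# Made-up example data, for illustration only.
subject = "http://example.org/resource/EiffelTower"
example_triples = [
    (subject, RDFS_LABEL, Literal("Eiffel Tower")),
    (subject, RDFS_COMMENT, Literal("A wrought-iron tower in Paris.")),
    # A URI-valued object is ignored because it is not a literal.
    (subject, RDFS + "seeAlso", "http://example.org/resource/Paris"),
]
title, summary = title_and_summary(example_triples)
```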
After the ratings were completed, Fleiss' κ statistic was taken in order to
test the reliability of inter-judge agreement on relevancy judgments (Fleiss 1971).
Simple percentage agreement is not sufficient, as it does not take into account the
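
For reference, Fleiss' κ can be computed from a table of per-result category counts, as in the following minimal sketch of the standard formula (Fleiss 1971). The function name and the toy data are assumptions made for illustration: three hypothetical judges and five hypothetical results, none of which come from the study described here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of judges
    who assigned result i to category j. Every row must sum to the same
    number of judges."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_subjects == 0 or n_raters < 2:
        raise ValueError("need at least one subject and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Proportion of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]

    # Observed agreement per subject: share of agreeing judge pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data: columns are (relevant, not relevant), three hypothetical judges.
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
kappa = fleiss_kappa(toy_counts)
```

The statistic subtracts the chance-expected agreement `p_e` from the mean observed agreement `p_bar` and rescales the difference, which is the correction that a simple percentage agreement lacks.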