3. Third, we combined the two data sets from above (3-15 tags per image).
4. Lastly, we employed (different) experts to directly evaluate the tags produced by the game (the experts' task was to remove wrong tags from the set).
At first, we aimed for the original Corel 5K image dataset only. It consists of the 5,000 images we used in the game. The images are shipped with assigned tags; 65% of the images have four tags assigned. The tags are considered correct [1]. On the other hand, these metadata are very narrow: for some images they do not even cover all major features. Therefore, we aimed to extend the dataset by means of expert work.
Yet even the extended gold standard did not offer full coverage of topics relevant to the content of the images: many concepts have weaker links to an image's content, but may still be considered relevant. The player annotators may look at the images from different perspectives, use different words even when describing the same semantics, and focus on different image aspects. Therefore, we also opted for an a posteriori evaluation of the created tags, which could capture the validity of all of them.
Hypotheses. The image tags acquired through PexAce are correct (i.e., they are confirmed as correct by the gold standard or by experts). For the a posteriori evaluation, we expected a higher precision than for the gold-standard evaluation (experiments 1, 2, and 3); yet even for the latter, we expected at least 50% precision.
Participants. The players of the game were all Slovak native speakers. They played the game mostly in Slovak. Three experts created the gold standard, and three others evaluated the game-acquired tags in the a posteriori evaluation. All experts were familiar with the concept of multimedia metadata.
Data. A set of 400 images (along with their game tags) was randomly selected from the images tagged within the game. Three experts working separately were asked to assign 10-15 tags to each image. When at least two of the three agreed on the same tag, it was added to the image's reference tag set (see the sketch below). The experts creating the gold standard were not aware of the tags already present in the Corel dataset; these were added to the reference sets afterward. In the end, each evaluated image had a reference set of 3-15 tags (most images had seven or eight).
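The agreement rule can be illustrated with a minimal sketch (hypothetical Python, not the authors' code; all function and variable names are our own):

```python
from collections import Counter

def build_reference_set(expert_tags, corel_tags, min_agreement=2):
    """Build the reference tag set for one image.

    expert_tags: one collection of tags per expert (hypothetical format).
    corel_tags: the tags shipped with the Corel 5K image, merged in afterward.
    """
    # Count how many experts proposed each tag (each expert counted once).
    counts = Counter(tag for tags in expert_tags for tag in set(tags))
    # Keep a tag only when at least min_agreement experts agreed on it.
    agreed = {tag for tag, n in counts.items() if n >= min_agreement}
    # The original Corel tags are considered correct [1] and added afterward.
    return agreed | set(corel_tags)

# Example: two of three experts agree on "sea" and "sand"; "water" comes from Corel.
reference = build_reference_set(
    [{"beach", "sea", "sand"}, {"sea", "sky", "sand"}, {"sea", "palm"}],
    {"water"},
)  # -> {"sea", "sand", "water"}
```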
Process. For experiments 1, 2, and 3, the precision was computed automatically against the reference sets. The resulting precision for each setup was computed as the average of the precisions of individual images, where the precision of an image equals the number of correct tags divided by the number of all tags assigned to the image. For experiment 4, the precision was computed in the same way, except that the decision whether a tag describes an image was left to a group of judges. Each judge independently reviewed all tag assignments and marked any tag he considered incorrectly assigned. When all judges had finished, every tag marked at least once was considered invalid.
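A minimal sketch of the precision computation described above (hypothetical Python; names and data shapes are our own assumptions):

```python
def image_precision(assigned_tags, reference_tags):
    """Per-image precision: correct tags divided by all tags assigned to the image."""
    if not assigned_tags:
        return 0.0
    correct = sum(1 for tag in assigned_tags if tag in reference_tags)
    return correct / len(assigned_tags)

def setup_precision(images):
    """Precision of one setup (experiments 1-3): average of per-image precisions.

    images: iterable of (assigned_tags, reference_tags) pairs.
    """
    per_image = [image_precision(assigned, reference) for assigned, reference in images]
    return sum(per_image) / len(per_image)

def tag_is_valid(marks_for_tag):
    """Experiment 4 rule: a tag marked incorrect by at least one judge is invalid."""
    return marks_for_tag == 0
```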
Results. For experiment 1 (Corel 5K dataset only), the precision was only 37.42%; for experiments 2 (our expert gold standard) and 3 (joint gold standard), it was almost twice as high: 65% and 68%, respectively. As expected, the best result was yielded by experiment 4: 94% precision. These results show a high precision of the tags acquired through PexAce as image metadata.