3. Third, we combined the two data sets from above (3-15 tags per image).
4. Lastly, we employed (different) experts to directly evaluate the tags produced by the game (the experts' task was to remove wrong tags from the set).
At first, we aimed for the original Corel 5K image dataset only. It consists of the 5,000 images we used in the game. The images are shipped with assigned tags; 65% of the images have four tags assigned. The tags are considered correct [1]. On the other hand, these metadata are very narrow: for some images they do not even cover all major features. Therefore, we aimed to extend the dataset by means of expert work.
Yet even the extended gold standard did not offer full coverage of topics relevant to the content of the images: many concepts have weaker links to an image's content, but may still be considered relevant. The player annotators may look at the images from different perspectives, use different words even when describing the same semantics, and focus on different image aspects. Therefore, we also opted for an a posteriori evaluation of the created tags, which could capture the validity of all of them.
Hypotheses. The image tags acquired through PexAce are correct (i.e., they are confirmed as correct by the gold standard or by experts). For the a posteriori evaluation, we expected a higher precision than for the gold-standard evaluation (experiments 1, 2, and 3); yet even for the latter, we expected at least 50% precision.
Participants. The players of the game were all Slovak native speakers. They played the game mostly in Slovak. Three experts created the gold standard, and three others evaluated the game-acquired tags in the a posteriori evaluation. All experts were familiar with the concept of multimedia metadata.
Data. A set of 400 images (along with their game tags) was randomly selected from the images tagged within the game. Three experts working separately were asked to assign 10-15 tags to each image. When at least two of the three agreed on the same tag, it was added to the image's reference tag set (see the sketch below). The experts creating the gold standard were not aware of the tags already present in the Corel dataset; these were added to the reference sets afterward. In the end, each evaluated image had a reference set of 3-15 tags (most images had seven or eight).
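The agreement rule can be illustrated with a minimal sketch (hypothetical Python, not the authors' code; all function and variable names are our own):

```python
from collections import Counter

def build_reference_set(expert_tags, corel_tags, min_agreement=2):
    """Build the reference tag set for one image.

    expert_tags: one collection of tags per expert (hypothetical format).
    corel_tags: the tags shipped with the Corel 5K image, merged in afterward.
    """
    # Count how many experts proposed each tag (each expert counted once).
    counts = Counter(tag for tags in expert_tags for tag in set(tags))
    # Keep a tag only when at least min_agreement experts agreed on it.
    agreed = {tag for tag, n in counts.items() if n >= min_agreement}
    # The original Corel tags are considered correct [1] and added afterward.
    return agreed | set(corel_tags)

# Example: two of three experts agree on "sea" and "sand"; "water" comes from Corel.
reference = build_reference_set(
    [{"beach", "sea", "sand"}, {"sea", "sky", "sand"}, {"sea", "palm"}],
    {"water"},
)  # -> {"sea", "sand", "water"}
```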
Process. For experiments 1, 2, and 3, the precision was computed automatically against the reference sets. The resulting precision for each setup was computed as the average of the precisions of individual images, where the precision of an image equals the number of correct tags divided by the number of all tags assigned to the image. For experiment 4, the precision was computed in the same way, except that the decision whether a tag describes an image was left to a group of judges. Each judge independently reviewed all tag assignments and marked any tag he considered incorrectly assigned. When all judges had finished, every tag marked at least once was considered invalid.
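A minimal sketch of the precision computation described above (hypothetical Python; names and data shapes are our own assumptions):

```python
def image_precision(assigned_tags, reference_tags):
    """Per-image precision: correct tags divided by all tags assigned to the image."""
    if not assigned_tags:
        return 0.0
    correct = sum(1 for tag in assigned_tags if tag in reference_tags)
    return correct / len(assigned_tags)

def setup_precision(images):
    """Precision of one setup (experiments 1-3): average of per-image precisions.

    images: iterable of (assigned_tags, reference_tags) pairs.
    """
    per_image = [image_precision(assigned, reference) for assigned, reference in images]
    return sum(per_image) / len(per_image)

def tag_is_valid(marks_for_tag):
    """Experiment 4 rule: a tag marked incorrect by at least one judge is invalid."""
    return marks_for_tag == 0
```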
Results. For experiment 1 (Corel 5K dataset only), the precision was only 37.42%; for experiments 2 (our expert gold standard) and 3 (joint gold standard), it was almost twice as high: 65% and 68%, respectively. As expected, the best result was yielded by experiment 4: 94% precision. These results show a high precision of the tags acquired through PexAce as image metadata.