present in more specific domains or even between individual tasks (the player might
be able to identify one work of van Gogh, yet completely misidentify another).
Ultimately, some tasks can only be executed by certain players.
Of course, the performance of the player (by which we mean the quality and
quantity of usable artifacts) is also influenced by his motivation and by the joy or
entertainment that the gaming process provides (especially in the case of SAGs).
However, these factors are also (at least partially) determined by the task domain
(e.g. the content the user is interacting with).
Therefore, assigning the "right" tasks to the players may arguably improve the
outcome of the crowdsourcing process. If the tasks are assigned to the players
randomly, the quality of the process results suffers [1], e.g. the resulting metadata
are too general [2]. Despite that, only a few works have addressed this problem [1].
8.2.3.2 Experiments with PexAce Logs
To learn about the potential of recognizing and using player expertise informa-
tion, we re-examined the logs of PexAce and ran several synthetic experiments.
Their basic idea was to compute information about each player's expertise, then use
it for weighting the player's term suggestions and see whether this leads to higher
quality results.
We first needed to define measures representing the degree of a player's expertise
in the game. We worked with two measures (a sketch of how both could be computed
from the logs follows the list):
1. Usefulness, defined as the ratio of the number of correct term suggestions made
by the player to all of his suggestions. A "correct term" is a term positively
validated against the golden standard or, alternatively, the judge evaluation.
2. Participation in the consensus, or the consensus ratio, i.e. the ratio of the number
of the player's suggestions that were also suggested by someone else to all of
the player's suggestions.
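As a minimal sketch (not taken from the original study), assuming the game logs can be reduced to player/image/term triples and that the golden standard is available as a mapping from images to sets of correct terms, both measures could be computed along these lines:

```python
from collections import defaultdict

def expertise_measures(suggestions, gold_standard):
    """Compute per-player usefulness and consensus ratio.

    suggestions:   iterable of (player_id, image_id, term) tuples from the logs
    gold_standard: dict mapping image_id -> set of correct terms
                   (available only in laboratory conditions)
    """
    suggestions = list(suggestions)

    # Which players suggested each (image, term) pair -- needed for consensus
    suggesters = defaultdict(set)
    for player, image, term in suggestions:
        suggesters[(image, term)].add(player)

    totals = defaultdict(int)
    correct = defaultdict(int)
    consensual = defaultdict(int)
    for player, image, term in suggestions:
        totals[player] += 1
        # usefulness numerator: term validated against the golden standard
        if term in gold_standard.get(image, set()):
            correct[player] += 1
        # consensus numerator: at least one other player suggested the same term
        if len(suggesters[(image, term)]) > 1:
            consensual[player] += 1

    return {
        player: {
            "usefulness": correct[player] / totals[player],
            "consensus_ratio": consensual[player] / totals[player],
        }
        for player in totals
    }
```

In a real deployment only the second dictionary entry (the consensus ratio) would be obtainable, since the golden standard is missing.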
The first measure exactly captures the value of the player (from the standpoint of
the game's purpose), but it cannot be acquired without a golden dataset. We can use
it in laboratory conditions to evaluate the players, but it would be impossible to
measure it during a real game deployment, where no such dataset is available. On the
other hand, the consensus ratio can be measured during game deployment. However,
though it can reasonably be expected to correlate with the true usefulness (after all,
the mutual player artifact validation is based on it), a bias can be expected.
Having the measures defined, we were seeking answers to the following questions:
1. How do the players of the game differ? How do they differ according to their
usefulness and how according to the consensus ratio? How does the usefulness
correlate with the consensus ratio?
2. Will it help the game's purpose if we start to weight player suggestions according
to the player's usefulness? Will it improve the output quality? Is it sufficient to use
the consensus ratio only, or do we need to seek other ways of acquiring information
about the expertise of the players (other measures)? (A sketch of such weighting
follows the list.)
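As an illustrative sketch only (the player identifiers, weights, and terms below are made up, not drawn from the PexAce logs), weighting could mean that, instead of plainly counting votes, the suggestions for an image are aggregated by summing the expertise weights (e.g. usefulness or consensus ratio) of the players who proposed each term:

```python
from collections import defaultdict

def weighted_term_scores(image_suggestions, player_weights, default_weight=1.0):
    """Aggregate term suggestions for a single image.

    image_suggestions: list of (player_id, term) pairs for that image
    player_weights:    dict player_id -> expertise weight (e.g. usefulness)
    Returns (term, score) pairs sorted by accumulated weight, highest first.
    """
    scores = defaultdict(float)
    for player, term in image_suggestions:
        scores[term] += player_weights.get(player, default_weight)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up example: two "expert" votes outweigh three "novice" votes
# for a too-general term.
weights = {"p1": 0.9, "p2": 0.8, "p3": 0.2, "p4": 0.2, "p5": 0.2}
votes = [("p1", "sunflowers"), ("p2", "sunflowers"),
         ("p3", "painting"), ("p4", "painting"), ("p5", "painting")]
print(weighted_term_scores(votes, weights))
# -> "sunflowers" scores 1.7, ahead of "painting" at roughly 0.6
```

The design question raised above is whether such weights should come from the usefulness (which requires a golden standard) or whether the deployment-time consensus ratio is a sufficient substitute.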