present in more specific domains or even between individual tasks (the player might
be able to identify one work of van Gogh, yet completely misidentify another).
Ultimately, some tasks can only be executed by certain players.
Of course, the performance of the player (by which we mean the quality and
quantity of usable artifacts) is also influenced by his motivation and by the joy or
entertainment that the gaming process provides (especially in the case of SAGs).
However, these factors are also (at least partially) determined by the task domain
(e.g. the content the user is interacting with).
Therefore, assigning the "right" tasks to the players may arguably improve the
outcome of the crowdsourcing process. If the tasks are assigned to the players
randomly, the quality of the process results suffers [1], e.g. the resulting metadata
are too general [2]. Despite that, only a few works have addressed this problem [1].
8.2.3.2 Experiments with PexAce Logs
To learn about the potential of recognizing and using player expertise informa-
tion, we re-examined the logs of PexAce and ran several synthetic experiments.
Their basic idea was to compute information about each player's expertise, then use
it for weighting the player's term suggestions and see whether this leads to higher
quality results.
We first needed to define measures representing the degree of a player's expertise
in the game. We worked with two measures (a sketch of how both could be computed
from the logs follows the list):
1. Usefulness, defined as the ratio of the number of correct term suggestions made
by the player to all of his suggestions. A "correct term" is a term positively
validated against the golden standard or, alternatively, the judge evaluation.
2. Participation in the consensus, or the consensus ratio, i.e. the ratio of the number
of the player's suggestions that were also suggested by someone else to all of
the player's suggestions.
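As a minimal sketch (not taken from the original study), assuming the game logs can be reduced to player/image/term triples and that the golden standard is available as a mapping from images to sets of correct terms, both measures could be computed along these lines:

```python
from collections import defaultdict

def expertise_measures(suggestions, gold_standard):
    """Compute per-player usefulness and consensus ratio.

    suggestions:   iterable of (player_id, image_id, term) tuples from the logs
    gold_standard: dict mapping image_id -> set of correct terms
                   (available only in laboratory conditions)
    """
    suggestions = list(suggestions)

    # Which players suggested each (image, term) pair -- needed for consensus
    suggesters = defaultdict(set)
    for player, image, term in suggestions:
        suggesters[(image, term)].add(player)

    totals = defaultdict(int)
    correct = defaultdict(int)
    consensual = defaultdict(int)
    for player, image, term in suggestions:
        totals[player] += 1
        # usefulness numerator: term validated against the golden standard
        if term in gold_standard.get(image, set()):
            correct[player] += 1
        # consensus numerator: at least one other player suggested the same term
        if len(suggesters[(image, term)]) > 1:
            consensual[player] += 1

    return {
        player: {
            "usefulness": correct[player] / totals[player],
            "consensus_ratio": consensual[player] / totals[player],
        }
        for player in totals
    }
```

In a real deployment only the second dictionary entry (the consensus ratio) would be obtainable, since the golden standard is missing.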
The first measure exactly captures the value of the player (from the standpoint of
the game's purpose), but it cannot be acquired without a golden dataset. We can use
it in laboratory conditions to evaluate the players, but it would be impossible to
measure it during a real game deployment, where no such dataset is available. On the
other hand, the consensus ratio can be measured during game deployment. However,
though it can reasonably be expected to correlate with the true usefulness (after all,
the mutual player artifact validation is based on it), a bias can be expected.
Having the measures defined, we were seeking answers to the following questions:
1. How do the players of the game differ? How do they differ according to their
usefulness and how according to the consensus ratio? How does the usefulness
correlate with the consensus ratio?
2. Will it help the game's purpose if we start to weight player suggestions according
to the player's usefulness? Will it improve the output quality? Is it sufficient to use
the consensus ratio only, or do we need to seek other ways of acquiring information
about the expertise of the players (other measures)? (A sketch of such weighting
follows the list.)
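As an illustrative sketch only (the player identifiers, weights, and terms below are made up, not drawn from the PexAce logs), weighting could mean that, instead of plainly counting votes, the suggestions for an image are aggregated by summing the expertise weights (e.g. usefulness or consensus ratio) of the players who proposed each term:

```python
from collections import defaultdict

def weighted_term_scores(image_suggestions, player_weights, default_weight=1.0):
    """Aggregate term suggestions for a single image.

    image_suggestions: list of (player_id, term) pairs for that image
    player_weights:    dict player_id -> expertise weight (e.g. usefulness)
    Returns (term, score) pairs sorted by accumulated weight, highest first.
    """
    scores = defaultdict(float)
    for player, term in image_suggestions:
        scores[term] += player_weights.get(player, default_weight)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up example: two "expert" votes outweigh three "novice" votes
# for a too-general term.
weights = {"p1": 0.9, "p2": 0.8, "p3": 0.2, "p4": 0.2, "p5": 0.2}
votes = [("p1", "sunflowers"), ("p2", "sunflowers"),
         ("p3", "painting"), ("p4", "painting"), ("p5", "painting")]
print(weighted_term_scores(votes, weights))
# -> "sunflowers" scores 1.7, ahead of "painting" at roughly 0.6
```

The design question raised above is whether such weights should come from the usefulness (which requires a golden standard) or whether the deployment-time consensus ratio is a sufficient substitute.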