Game Development Reference
In-Depth Information
[Plot: consensus ratio and usefulness (y-axis, 0.4 to 1) for individual players (x-axis: Players)]
Fig. 8.1 The players of the PexAce (x-axis). A comparison of consensus ratios and usefulness for
individual players. Surprisingly, there is only a partial correlation between the two measures
We took our game logs, which were essentially a set of unique term suggestions for
images (i.e. in the form player-image-tag). We filtered out players with too few
suggestions (fewer than 30, as a rule of thumb), which left 58 players. We
computed the consensus ratio for each player. Using the gold standard we prepared in
previous experiments, we computed the usefulness of each player (see Fig. 8.1). Then,
using the Pearson correlation coefficient, we computed the correlation between usefulness
and the consensus ratio as 0.496. This correlation is not as high as one would expect; it
is only moderate. Apparently, the consensus ratio carries a significant bias and does not
always represent the true usefulness of the player. Even so, we tested whether it can
help increase the quality of the SAG output.
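The player filtering and correlation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the log layout (a list of player-image-tag tuples) are hypothetical, and only the 30-suggestion cutoff and the use of Pearson's coefficient come from the text. How the consensus ratio and usefulness are computed per player is not shown here.

```python
from collections import Counter
from math import sqrt

MIN_SUGGESTIONS = 30  # rule-of-thumb cutoff mentioned in the text


def active_players(log, min_suggestions=MIN_SUGGESTIONS):
    """Return players with at least `min_suggestions` (player, image, tag) records."""
    counts = Counter(player for player, _image, _tag in log)
    return {p for p, n in counts.items() if n >= min_suggestions}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Given two lists holding the consensus ratio and the usefulness of the same players in the same order, `pearson(consensus_ratios, usefulness_values)` would yield the kind of coefficient reported above.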
We modified the original method of consensus tag filtering by introducing
vote weights based on expertise. First we used the usefulness values, then the
consensus ratio values. Both were already numbers in the interval from 0 to 1, so we
used them directly as weight values. We also set up a threshold parameter that
determined whether a consensus was reached: if the sum of the weights of all
suggestions assigning the same tag to the same image was higher than the threshold,
the tag was accepted; otherwise it was rejected. We initially guessed the optimal
threshold value as twice the average usefulness value of the dataset (which was 0.7);
however, we ran the experiment multiple times with different thresholds.
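A weighted consensus filter of the kind described above might look like the following sketch. The function name, the toy players, images and weights are all hypothetical; treating a player without a known weight as weight 0 is also an assumption, since the text does not say how such cases were handled.

```python
from collections import defaultdict


def weighted_consensus(suggestions, weights, threshold):
    """Accept (image, tag) pairs whose summed player weights exceed `threshold`.

    suggestions: iterable of (player, image, tag) records
    weights: dict mapping player -> weight in [0, 1]
             (usefulness or consensus ratio)
    """
    totals = defaultdict(float)
    # set() ensures each player-image-tag suggestion is counted only once
    for player, image, tag in set(suggestions):
        # Assumption: a player with no known weight contributes nothing
        totals[(image, tag)] += weights.get(player, 0.0)
    return {pair for pair, total in totals.items() if total > threshold}


# Toy data (hypothetical players, images and weights)
suggestions = [
    ("p1", "img1", "dog"),
    ("p2", "img1", "dog"),
    ("p3", "img1", "wolf"),
    ("p1", "img2", "tree"),
    ("p3", "img2", "tree"),
]
weights = {"p1": 0.9, "p2": 0.6, "p3": 0.3}

# Re-run the filter with several thresholds; stricter values accept fewer tags
for threshold in (0.5, 1.0, 1.4):
    accepted = weighted_consensus(suggestions, weights, threshold)
    print(threshold, sorted(accepted))
```

Setting every weight to 1 reduces this to plain vote counting, which is one way to obtain an unweighted baseline for comparison.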
Using the weighting methods, we repeatedly filtered the final tags. Each time,
we compared the tags with the gold standard and computed the output correctness.
The results can be seen in Fig. 8.2, where both approaches are compared with
a baseline correctness achieved without weighting. As we
can observe in the graph, with a growing threshold (increasing strictness), the
usefulness-based approach starts giving significantly more correct results. We can
also see that the performance of the consensus ratio-based method grows, but