tries to avoid risk, a Monster Killer (MK) persona who tries to kill all monsters
and escape the level, and a Treasure Collector (TC) persona who attempts to
collect all treasures and escape the level. The decision making styles are defined
by the utility weights presented in Table 1, and serve as a metaphor for the rela-
tive importance of the affordances to the archetypical player represented by the
persona. When assigned to personas, utility points from a level are normalized by
the maximally attainable utility for the same level. Personas are evolved by, for
each generation, exposing them to 9 of the 10 levels of MiniDungeons, yielding
50 agents in total. For each generation, their fitness is computed as the average of
the normalized utility scores from the seen levels. All subsequent evaluations
presented in this paper are done using 10-fold cross validation, i.e., a persona is
evaluated on the level which it was not exposed to during evolution.
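The persona fitness and leave-one-level-out evaluation described above can be sketched as follows. The function names and data layout are hypothetical; the sketch assumes a persona's utility points are divided by the maximally attainable utility of the same level and averaged over the nine seen levels:

```python
def normalized_utility(raw_utility, max_utility):
    """Normalize a persona's utility score by the level's
    maximally attainable utility (assumed definition)."""
    return raw_utility / max_utility if max_utility else 0.0

def persona_fitness(utilities, max_utilities, held_out):
    """Fitness of a persona in one cross-validation fold: the average
    normalized utility over the 9 seen levels, with the level at index
    `held_out` excluded (it is used only for evaluation)."""
    seen = [normalized_utility(u, m)
            for i, (u, m) in enumerate(zip(utilities, max_utilities))
            if i != held_out]
    return sum(seen) / len(seen)
```

Repeating this for each choice of held-out level gives the 10-fold cross-validation scheme used in all subsequent evaluations.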
Clones: Clones, like personas, are evolved by exposing them to 9 of the 10 levels
of MiniDungeons. Their fitness value is computed as the average normalized
AAR across all 9 seen levels. One clone per player per map is evolved, yielding
380 agents in total. All subsequent tests are done using 10-fold cross validation,
evaluating the clones on unseen levels.
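AAR is defined earlier in the paper; assuming it is the fraction of decision points at which the agent selects the same action as the human player it is cloned from, the clone fitness computation can be sketched as (names hypothetical):

```python
def action_agreement_ratio(agent_actions, human_actions):
    """AAR: fraction of decision points where the clone chooses
    the same action as the human playtrace (assumed definition)."""
    matches = sum(a == h for a, h in zip(agent_actions, human_actions))
    return matches / len(human_actions)

def clone_fitness(per_level_traces):
    """Average AAR over the 9 seen levels; each element of
    per_level_traces is an (agent_actions, human_actions) pair."""
    return (sum(action_agreement_ratio(a, h) for a, h in per_level_traces)
            / len(per_level_traces))
```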
Baseline Agents: In order to evaluate the limits of the perceptron-based
representation, a set of baseline agents is evolved, one agent for each human
playtrace, 380 total. These are exposed to a single level of MiniDungeons. Their
fitness scores are computed directly from AAR in an attempt to establish the
closest fit to each human player that the representation can achieve.
5 Results
This section compares the two presented evaluation metrics, and compares the
ability of personas, clones, and baseline agents to represent human decision mak-
ing styles in MiniDungeons. Table 2 shows the mean of the agreement ratios for
each kind of agent evolved, using both the AAR and TAR metrics. The ratios
indicate that all agents achieve higher agreement with human playtraces when
evaluated with the AAR metric than with the TAR metric. Additionally, they
indicate that when using AAR the clones perform only slightly better than the
personas (t = −3.23, df = 753.00, p < 0.001), while when using TAR the clones
perform substantially better than the personas (t = −39.26, df = 721.51, p < 0.001), as
tested using Welch's t test. Using AAR, the baseline agents perform significantly
better than both personas and clones (df = 2, F = 62.59, p < 0.001), but when
using TAR they perform significantly worse than the clones (df = 2, F = 59.1, p <
0.001), as tested using ANOVA. Table 3 shows which personas exhibited the best
ability to represent human playtraces, for each MiniDungeons level and in total.
For each human playtrace, the personas with the highest AAR and TAR,
respectively, are identified. Both metrics generally favor the Treasure Collector
persona as the best match for most playtraces, although there is some discrepancy
between the two measures in terms of which personas represent the human
playtraces best.
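The Welch's t-test used above accommodates the unequal variances of the two groups of agreement ratios and yields the fractional degrees of freedom reported in the results. A minimal sketch of the statistic, computed here on illustrative data rather than the paper's:

```python
import math

def welch_t(xs, ys):
    """Welch's unequal-variances t statistic and its
    Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se2 = vx / nx + vy / ny                          # squared standard error
    t = (mx - my) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

The same statistic is available as `scipy.stats.ttest_ind(xs, ys, equal_var=False)`; the pure-Python version above only makes the df formula explicit.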