3 System Evaluation
To evaluate the system, n = 10 active smartphone users, five females and five
males aged between 21 and 33 (Mean = 25.8, SD = 3.82), were randomly selected
from the National University of Singapore's staff and students.
None of the participants had previous experience of interacting with similar
technologies such as empathetic virtual agents or emotional robots. The
evaluation was performed in two stages using a simulator. In the first stage,
the PME module was evaluated; in the second stage, the fitness of the virtual
companion behaviors was tested.
The PME was tested by comparing 120 samples of mood data inferred by the
proposed PME model against the self-reported user moods. Each user logged
smartphone usage and context data, such as time, location, running apps, and
call logs, together with their self-perceived mood. The self-perceived mood
state was then compared against the output of the proposed model. The
Mann-Whitney U test was used to evaluate whether there was any significant
difference in the valence-arousal space between the self-perceived and the
automatically inferred results. The two-tailed p value was 0.64 for the valence
space and 0.73 for the arousal space. Both p values are above the 5%
significance level, so it can be concluded that there is no significant
difference between the proposed model's output and the users' self-perceived
affective state. This suggests that the PME model infers the user's mood state
correctly.
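The comparison described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it computes a two-sided Mann-Whitney U test in plain Python using the normal approximation with a tie correction and no continuity correction, which is an assumption because the source does not state which variant was used. The function names and the sample valence values are hypothetical and only show how the test would be called.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    # Variance of U under the null hypothesis, reduced for tied values.
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every observation is identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, min(1.0, 2 * NormalDist().cdf(z))


# Hypothetical valence values, for illustration only (not the study's data).
self_reported = [0.2, -0.1, 0.5, 0.3, -0.4, 0.0]
inferred = [0.1, -0.2, 0.6, 0.3, -0.3, 0.1]
u_stat, p_value = mann_whitney_u(self_reported, inferred)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

A p value above the chosen significance level (5% in the text) means the test found no significant difference between the two samples.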
In the second experiment, the goal was to confirm that the interactive
companion could produce behaviors that correspond to the mood. The fitness of
the virtual companion behaviors refers to how appropriate the auto-generated
agent behavior is from the users' point of view. Since the suitability of a
behavior is subjective and cannot be measured quantitatively, we adopted the
method proposed by [12] to assess the model. In this method, the participants
were given 10 different combinations of the user's mood (their own), the
artificial companion's mood, and the time of day (special time or not). For
each of these scenarios, they observed 5 randomly generated behaviors by the
animated agent followed by 5 behaviors generated by the proposed behavior
network. Afterward, they were asked to rate the appropriateness of each
behavior from 1 (strongly inappropriate) to 5 (strongly appropriate). The mean
fitness scores of each participant were calculated as shown in
Table 1. The fitness scores were analyzed with the Wilcoxon signed-rank test.
The resulting p value was 0.005, below the 5% significance level, which
confirms that the proposed model generated more suitable behaviors than random
generation.
Table 1. Mean fitness ranks
Participant     pa1    pa2    pa3    pa4    pa5    pa6    pa7    pa8    pa9    pa10
Random          2.75   2.25   1.87   2.5    3.125  2.25   2.75   1.62   2.5    2.12
Behavior Net.   4      4.25   3.37   3.25   3.25   3.87   3.62   3.62   4      3.12
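The paired comparison in Table 1 can be reproduced with a short sketch. The code below applies a Wilcoxon signed-rank test to the ten pairs of mean scores, using the normal approximation without tie or continuity correction. That choice of variant is an assumption, since the source does not say how the p value was computed, and the function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Mean fitness scores per participant (pa1..pa10), copied from Table 1.
random_scores = [2.75, 2.25, 1.87, 2.5, 3.125, 2.25, 2.75, 1.62, 2.5, 2.12]
network_scores = [4, 4.25, 3.37, 3.25, 3.25, 3.87, 3.62, 3.62, 4, 3.12]


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (plain normal approximation).

    Returns (W+, W-, p): the positive and negative rank sums and the p value.
    """
    # Paired differences, rounded to suppress floating-point noise;
    # zero differences carry no sign information and are dropped.
    diffs = [round(y - x, 3) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0

    # Rank the absolute differences; ties share the mean of their ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, the smaller rank sum is approximately normal.
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    return w_plus, w_minus, min(1.0, 2 * NormalDist().cdf(z))


w_plus, w_minus, p_value = wilcoxon_signed_rank(random_scores, network_scores)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p_value:.4f}")
```

With the Table 1 values, every participant rated the behavior network higher than the random behaviors, so the negative rank sum is 0 and the approximation gives a p value of about 0.005, consistent with the value reported in the text.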
 