The question of the best value of $s_C$ is difficult; it is a subject of forthcoming investigations and will not be addressed further here.
In closing, let us turn our attention to a further special problem of the conditional version. If a rule is no longer applied for recommendations after exceeding the threshold value $n_{\min}$ (because in the meantime other rules have become preferred), it has, at least in the simplified version (5.26), in general little chance of being applied again, since the conditional probability $p_{s s'}$ is no longer updated. This holds even if its potential acceptance has increased again.
In order to get around this, we introduce a special explorative delivery mode for the DP algorithm. For this, similarly to the $\varepsilon$-greedy policy, a percentage rate $\varepsilon_{DP}$ is specified with which, instead of being delivered according to the action-value function $q_\pi(s, a)$, the recommendations are delivered in descending order according to the following criterion:
$$ r_{s, s_a} := \Delta\left(p_{s, s_a}\right) = p^{\Theta}_{s, s_a} - p_{s, s_a} \qquad (5.27) $$
Thus, the idea is that the difference between the unconditional probability $p^{\Theta}_{s, s_a}$ and the conditional probability $p_{s, s_a}$ is a good indicator of whether a rule has become more attractive again. For if the difference increases, the user is more inclined toward product $s_a$ even without a recommendation, and the necessity of its delivery increases.
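To make the explorative delivery mode concrete, the following minimal Python sketch shows one possible way to combine the two ranking criteria. The names epsilon_dp, p_uncond, p_cond, and q are our own illustrative placeholders for the rate $\varepsilon_{DP}$, the unconditional and conditional transition probabilities, and the action-value function; they are not taken from the original algorithm.

    import random

    def deliver_recommendations(state, candidates, q, p_uncond, p_cond,
                                epsilon_dp=0.1, k=3):
        """Return k recommendations for the current state.

        With probability epsilon_dp the explorative mode is used: candidates
        are ranked by the difference between the unconditional and the
        conditional transition probability, as in criterion (5.27).
        Otherwise they are ranked by the action-value function q.
        All parameter names are illustrative placeholders.
        """
        if random.random() < epsilon_dp:
            # explorative mode: rank by p^Theta_{s,s_a} - p_{s,s_a}
            score = lambda a: p_uncond.get((state, a), 0.0) - p_cond.get((state, a), 0.0)
        else:
            # normal mode: rank by the action-value function q_pi(s, a)
            score = lambda a: q.get((state, a), 0.0)
        return sorted(candidates, key=score, reverse=True)[:k]

In this sketch, rules whose unconditional probability has grown while their conditional estimate remained stale rise to the top of the explorative ranking and thus get a renewed chance of being delivered and re-estimated.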
Let us emphasize that the empirical approach of this section presents only some very first and simple means of handling the crucial problem of statistical stability of the DP version. Surely, much more advanced instruments can be developed. Beyond this, in Chaps. 6, 7, 8, 9, and 10 we will develop mathematically more demanding methods to increase the stability of our RL approach for recommendations.
That concludes our tour of the basic RL methods for our RE framework. Let us now turn to their experimental evaluation.
5.4 Experimental Results
In this section we will present experimental results for the approaches of Sects. 5.2
and 5.3. To this end, we will first verify the central Assumption 5.2 experimentally.
After that we extend the simulation of Sect. 4.4 in such a way that we first model the
environment, i.e., the conditional transition probabilities and rewards. Then, for the
actual simulation, we will use the environment model in order to generate an
arbitrary number of virtual sessions for testing the recommendation algorithms
under conditions close to reality. At the end of this section, we will use the extended
simulation for testing the algorithms introduced in this chapter.
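To indicate what such a simulation may look like, here is a minimal Python sketch under our own simplifying assumptions: a plain first-order transition model estimated from recorded sessions, without rewards and without modeling the effect of recommendations. The actual environment model described above additionally covers the conditional transition probabilities and rewards; all function and variable names here are illustrative.

    import random
    from collections import defaultdict

    def estimate_model(sessions):
        """Estimate transition probabilities from observed sessions.

        Each session is a list of visited states (products). The result maps a
        state s to a list of (successor, probability) pairs.
        """
        counts = defaultdict(lambda: defaultdict(int))
        for session in sessions:
            for s, s_next in zip(session, session[1:]):
                counts[s][s_next] += 1
        model = {}
        for s, succ in counts.items():
            total = sum(succ.values())
            model[s] = [(s_next, n / total) for s_next, n in succ.items()]
        return model

    def generate_session(model, start, max_len=20):
        """Generate one virtual session by sampling successors from the model."""
        session = [start]
        while len(session) < max_len and session[-1] in model:
            successors, probs = zip(*model[session[-1]])
            session.append(random.choices(successors, weights=probs)[0])
        return session

Once such a model has been fitted to real usage data, an arbitrary number of virtual sessions can be sampled from it, which is what allows the recommendation algorithms to be tested under conditions close to reality.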